From 545e18b08e3b0638838ba2ea7de563e712d2494b Mon Sep 17 00:00:00 2001 From: Erfan Noury Date: Wed, 19 Sep 2018 09:27:15 -0400 Subject: [PATCH 001/531] Probability Refresher Translation (#1) --- CONTRIBUTORS | 4 +- README.md | 14 ++--- fa/refresher-probability.md | 119 ++++++++++++++++++------------------ 3 files changed, 71 insertions(+), 66 deletions(-) diff --git a/CONTRIBUTORS b/CONTRIBUTORS index d628ba1be..78dba6908 100644 --- a/CONTRIBUTORS +++ b/CONTRIBUTORS @@ -5,7 +5,9 @@ --es --fa - + Probability: + Translation: Erfan Noury (https://github.com/erfannoury) + Review: Mohammad Karimi (https://github.com/m-karimi) --fr --he diff --git a/README.md b/README.md index c350d9116..562477ef9 100644 --- a/README.md +++ b/README.md @@ -3,13 +3,13 @@ This repository aims at collaboratively translating our [Machine Learning cheatsheets](https://github.com/afshinea/stanford-cs-229-machine-learning) into a ton of languages, so that this content can be enjoyed by anyone from any part of the world! ## Progression -|Cheatsheet topic|[Deutsch](https://github.com/shervinea/cheatsheet-translation/tree/master/de)|[Español](https://github.com/shervinea/cheatsheet-translation/tree/master/es)|[فارسی](https://github.com/shervinea/cheatsheet-translation/tree/master/fa)|[Français](https://github.com/shervinea/cheatsheet-translation/tree/master/fr)|[日本語](https://github.com/shervinea/cheatsheet-translation/tree/master/ja)|[Português](https://github.com/shervinea/cheatsheet-translation/tree/master/pt)|[官话](https://github.com/shervinea/cheatsheet-translation/tree/master/zh)| -|:---|:---:|:---:|:---:|:---:|:---:|:---:|:---:| -|Deep learning|0%|0%|0%|0%|0%|0%|0%| -|Supervised learning|0%|0%|0%|0%|0%|0%|1%| -|Unsupervised learning|0%|0%|0%|0%|0%|0%|0%| -|Probabilities and Statistics|0%|0%|0%|0%|0%|0%|0%| -|Linear algebra|0%|0%|0%|0%|0%|0%|0%| +| Cheatsheet topic | [Deutsch](https://github.com/shervinea/cheatsheet-translation/tree/master/de) | [Español](https://github.com/shervinea/cheatsheet-translation/tree/master/es) | [فارسی](https://github.com/shervinea/cheatsheet-translation/tree/master/fa) | [Français](https://github.com/shervinea/cheatsheet-translation/tree/master/fr) | [日本語](https://github.com/shervinea/cheatsheet-translation/tree/master/ja) | [Português](https://github.com/shervinea/cheatsheet-translation/tree/master/pt) | [官话](https://github.com/shervinea/cheatsheet-translation/tree/master/zh) | +|------------------------------|:-----------------------------------------------------------------------------:|:-----------------------------------------------------------------------------:|:---------------------------------------------------------------------------:|:------------------------------------------------------------------------------:|:----------------------------------------------------------------------------:|:-------------------------------------------------------------------------------:|:--------------------------------------------------------------------------:| +| Deep learning | 0% | 0% | 0% | 0% | 0% | 0% | 0% | +| Supervised learning | 0% | 0% | 0% | 0% | 0% | 0% | 1% | +| Unsupervised learning | 0% | 0% | 0% | 0% | 0% | 0% | 0% | +| Probabilities and Statistics | 0% | 0% | 100% | 0% | 0% | 0% | 0% | +| Linear algebra | 0% | 0% | 0% | 0% | 0% | 0% | 0% | |Cheatsheet 
topic|العَرَبِيَّة|עִבְרִית|[हिन्दी](https://github.com/shervinea/cheatsheet-translation/tree/master/hi)|[ಕನ್ನಡ](https://github.com/shervinea/cheatsheet-translation/tree/master/kn)|[मराठी](https://github.com/shervinea/cheatsheet-translation/tree/master/mr)|[తెలుగు](https://github.com/shervinea/cheatsheet-translation/tree/master/te)| |:---|:---:|:---:|:---:|:---:|:---:|:---:| diff --git a/fa/refresher-probability.md b/fa/refresher-probability.md index db03157d5..ab4cba236 100644 --- a/fa/refresher-probability.md +++ b/fa/refresher-probability.md @@ -1,347 +1,350 @@ **1. Probabilities and Statistics refresher** -⟶ +یادآوری آمار و احتمالات
**2. Introduction to Probability and Combinatorics** -⟶ +مقدمه‌ای بر احتمالات و ترکیبیات
**3. Sample space ― The set of all possible outcomes of an experiment is known as the sample space of the experiment and is denoted by S.** -⟶ +فضای نمونه - مجموعه‌ی همه‌ی پیشامدهای یک آزمایش را فضای نمونه‌ی آن آزمایش گویند که با $S$ نمایش داده می‌شود.
**4. Event ― Any subset E of the sample space is known as an event. That is, an event is a set consisting of possible outcomes of the experiment. If the outcome of the experiment is contained in E, then we say that E has occurred.** -⟶ +رخداد - هر زیرمجموعه‌ی $E$از فضای نمونه یک رخداد در نظر گرفته می‌شود. +به عبارت دیگر، یک رخداد مجموعه‌ای از پیشامدهای یک آزمایش است. +اگر پیشامد یک آزمایش عضوی از مجموعه‌ی $E$ باشد، در این حالت می‌گوییم که رخداد $E$ اتفاق افتاده است.
**5. Axioms of probability For each event E, we denote P(E) as the probability of event E occuring.** -⟶ +اصول موضوعه‌ی احتمالات. +برای هر رخداد $E$، $P(E)$ احتمال اتفاق افتادن رخداد $E$ می‌باشد.
**6. Axiom 1 ― Every probability is between 0 and 1 included, i.e:** -⟶ +اصل ۱ - احتمال عددی بین ۰ و ۱ است.
**7. Axiom 2 ― The probability that at least one of the elementary events in the entire sample space will occur is 1, i.e:** -⟶ +اصل ۲ - احتمال اینکه حداقل یکی از رخدادهای موجود در فضای نمونه اتفاق بیوفتد، ۱ است.
**8. Axiom 3 ― For any sequence of mutually exclusive events E1,...,En, we have:** -⟶ +اصل ۳ - برای هر دنباله از رخدادهایی که دو به دو اشتراک نداشته باشند، داریم:
**9. Permutation ― A permutation is an arrangement of r objects from a pool of n objects, in a given order. The number of such arrangements is given by P(n,r), defined as:** -⟶ +جایگشت - یک جایگشت چیدمانی از $r$ شی از $n$ شی با یک ترتیب خاص است. تعداد این چنین جایگشت‌ها $P(n, r)$ است که به صورت زیر تعریف می‌شود:
**10. Combination ― A combination is an arrangement of r objects from a pool of n objects, where the order does not matter. The number of such arrangements is given by C(n,r), defined as:** -⟶ +ترکیب - یک ترکیب چیدمانی از $r$ شی از $n$ شی است، به طوری که ترتیب اهمیتی نداشته باشد. تعداد این چنین ترکیب‌ها $C(n, r)$ است که به صورت زیر تعریف می‌شود:
**11. Remark: we note that for 0⩽r⩽n, we have P(n,r)⩾C(n,r)** -⟶ +نکته: برای $0 \leq r \leq n$، داریم $P(n, r) \geq C(n, r)$
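As a quick side note (not part of the original cheatsheet): these two counting formulas are easy to sanity-check numerically. A minimal Python sketch, assuming Python 3.8+ for `math.perm` and `math.comb`:

```python
import math

n, r = 5, 3

p = math.perm(n, r)   # P(n, r) = n! / (n - r)!      -> 60 ordered arrangements
c = math.comb(n, r)   # C(n, r) = n! / (r! (n - r)!) -> 10 unordered selections

# For 0 <= r <= n we indeed have P(n, r) >= C(n, r)
assert p >= c
print(p, c)
```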
**12. Conditional Probability** -⟶ +احتمال شرطی
**13. Bayes' rule ― For events A and B such that P(B)>0, we have:** -⟶ +قضیه‌ی بیز - برای رخدادهای $A$ و $B$ به طوری که $P(B) > 0$ داریم:
**14. Remark: we have P(A∩B)=P(A)P(B|A)=P(A|B)P(B)** -⟶ +نکته:‌داریم $P(A \cap B) = P(A) P(B | A) = P(A | B) P(B)$
**15. Partition ― Let {Ai,i∈[[1,n]]} be such that for all i, Ai≠∅. We say that {Ai} is a partition if we have:**

-⟶

+افراز - فرض می‌کنیم برای $\{A_i, i \in [1, n]\}$ به ازای هر $i$ داشته باشیم $A_i \neq \emptyset$. در این صورت می‌گوییم $\{A_i\}$ یک افراز است اگر:

<br>
**16. Remark: for any event B in the sample space, we have P(B)=n∑i=1P(B|Ai)P(Ai).** -⟶ +نکته: برای هر رخداد $B$ در فضای نمونه داریم $P(B) = \sum_{i=1}^n P(B | A_i) P(A_i)$.
**17. Extended form of Bayes' rule ― Let {Ai,i∈[[1,n]]} be a partition of the sample space. We have:**

-⟶

+تعمیم قضیه‌ی بیز - فرض می‌کنیم $\{A_i, i \in [1, n]\}$ یک افراز از فضای نمونه باشد. در این صورت داریم:

<br>
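For readers who want to see the extended rule in action, here is a small illustration with made-up numbers (the partition {A1, A2, A3} and the probabilities below are purely hypothetical):

```python
# Extended Bayes' rule over a toy partition {A1, A2, A3} of the sample space.
priors = [0.5, 0.3, 0.2]        # P(A_i); a partition, so the priors sum to 1
likelihoods = [0.9, 0.5, 0.1]   # P(B | A_i)

# P(B) = sum_i P(B | A_i) P(A_i)
p_b = sum(l * p for l, p in zip(likelihoods, priors))

# P(A_1 | B) = P(B | A_1) P(A_1) / P(B)
posterior_a1 = likelihoods[0] * priors[0] / p_b
print(p_b, posterior_a1)   # 0.62, ~0.726
```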
**18. Independence ― Two events A and B are independent if and only if we have:** -⟶ +استقلال - دو رخداد $A$ و $B$ مستقل هستند اگر و فقط اگر داشته باشیم:
**19. Random Variables** -⟶ +متغیرهای تصادفی
**20. Definitions** -⟶ +تعاریف
**21. Random variable ― A random variable, often noted X, is a function that maps every element in a sample space to a real line.** -⟶ +متغیر تصادفی - یک متغیر تصادفی، که معمولاً با $X$ نمایش داده می‌شود، یک تابع است که هر عضو فضای نمونه را به اعداد حقیقی نگاشت می‌کند.
**22. Cumulative distribution function (CDF) ― The cumulative distribution function F, which is monotonically non-decreasing and is such that limx→−∞F(x)=0 and limx→+∞F(x)=1, is defined as:**

-⟶

+تابع توزیع تجمعی - تابع توزیع تجمعی $F$، که تابعی یکنوا و غیرنزولی است و برای آن $\lim_{x \rightarrow -\infty} F(x) = 0$ و $\lim_{x \rightarrow +\infty} F(x) = 1$ صدق می‌کند، به صورت زیر تعریف می‌شود:

<br>
**23. Remark: we have P(a<X⩽b)=F(b)−F(a)**

-⟶

+نکته: داریم $P(a < X \leq b) = F(b) - F(a)$

<br>

**24. Probability density function (PDF) ― The probability density function f is the probability that X takes on values between two adjacent realizations of the random variable.**

-⟶

+تابع چگالی احتمال (PDF) - تابع چگالی احتمال $f$ احتمال آن است که متغیر تصادفی $X$ مقداری بین دو تحقق همجوار این متغیر تصادفی را بگیرد.

<br>
**25. Relationships involving the PDF and CDF ― Here are the important properties to know in the discrete (D) and the continuous (C) cases.** -⟶ +ارتباط بین PDF و CDF - موارد زیر ویژگی‌های مهمی هستند که باید در مورد حالت گسسته و حالت پیوسته در نظر گرفت.
**26. [Case, CDF F, PDF f, Properties of PDF]** -⟶ +[[CDF F, PDF f, ویژگی‌های PDF]]
**27. Expectation and Moments of the Distribution ― Here are the expressions of the expected value E[X], generalized expected value E[g(X)], kth moment E[Xk] and characteristic function ψ(ω) for the discrete and continuous cases:** -⟶ +امید ریاضی و گشتاورهای یک توزیع - عبارت‌های مربوط به امید ریاضی $E[X]$، امید ریاضی تعمیم یافته $E[g(X)]$، $k$-مین گشتاور $E[X^k]$، و تابع ویژگی $\psi(\omega)$ برای حالات پیوسته و گسسته به صورت زیر هستند:
**28. Variance ― The variance of a random variable, often noted Var(X) or σ2, is a measure of the spread of its distribution function. It is determined as follows:** -⟶ +واریانس - واریانس یک متغیر تصادفی، که معمولاً با $Var(X)$ یا $\sigma^2$ نمایش داده می‌شود، میزانی از پراکندگی یک تابع توزیع است. مقدار واریانس به صورت زیر به دست می‌آید:
**29. Standard deviation ― The standard deviation of a random variable, often noted σ, is a measure of the spread of its distribution function which is compatible with the units of the actual random variable. It is determined as follows:** -⟶ +انحراف معیار - انحراف معیار یک متغیر تصادفی، که با $\sigma$ نمایش داده می‌شود، میزانی از پراکندگی یک تابع توزیع است که با متغیر تصادفی هم‌واحد است. مقدار آن به صورت زیر به دست می‌آید:
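A small numerical illustration of the two definitions above, using a toy discrete distribution (NumPy assumed; not part of the original cheatsheet):

```python
import numpy as np

values = np.array([0.0, 1.0, 2.0])
probs = np.array([0.2, 0.5, 0.3])   # a toy probability mass function

mean = np.sum(values * probs)                 # E[X]
var = np.sum(values**2 * probs) - mean**2     # Var(X) = E[X^2] - E[X]^2
std = np.sqrt(var)                            # standard deviation, same units as X

print(mean, var, std)   # 1.1, 0.49, 0.7
```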
**30. Transformation of random variables ― Let the variables X and Y be linked by some function. By noting fX and fY the distribution function of X and Y respectively, we have:** -⟶ +تبدیلات متغیرهای تصادفی - فرض کنید متغیرهای تصادفی $X$ و $Y$ توسط تابعی به هم مرتبط هستند. با نمایش تابع توزیع متغیرهای تصادفی $X$ و $Y$ با $f_X$ و $f_Y$ داریم:
**31. Leibniz integral rule ― Let g be a function of x and potentially c, and a,b boundaries that may depend on c. We have:** -⟶ +قضیه‌ی انتگرال لایبنیتس - فرض کنید $g$ تابعی از $x$ و $c$ باشد، و $a$ و $b$ کران‌هایی باشند که مقدار آن‌ها وابسته به مقدار $c$ باشد. داریم:
**32. Probability Distributions** -⟶ +توزیع‌های احتمالی
**33. Chebyshev's inequality ― Let X be a random variable with expected value μ. For k,σ>0, we have the following inequality:** -⟶ +نابرابری چبیشف - فرض کنید $X$ متغیری تصادفی با امید ریاضی $\mu$ باشد. برای هر $k$ و $\sigma > 0$ نابرابری زیر را داریم:
**34. Main distributions ― Here are the main distributions to have in mind:** -⟶ +توزیع‌های احتمالی اصلی - توزیع‌های زیر توزیع‌های احتمالی اصلی هستند که بهتر است به خاطر بسپارید:
**35. [Type, Distribution]** -⟶ +[نوع، توزیع]
**36. Jointly Distributed Random Variables** -⟶ +متغیرهای تصادفی با توزیع مشترک
**37. Marginal density and cumulative distribution ― From the joint density probability function fXY , we have** -⟶ +چگالی حاشیه‌ای و توزیع تجمعی - از تابع چگالی احتمالی مشترک $f_{XY}$ داریم
**38. [Case, Marginal density, Cumulative function]** -⟶ +[حالت، چگالی حاشیه‌ای، تابع تجمعی]
**39. Conditional density ― The conditional density of X with respect to Y, often noted fX|Y, is defined as follows:**

-⟶

+چگالی شرطی - چگالی شرطی $X$ نسبت به $Y$، که معمولاً با $f_{X | Y}$ نمایش داده می‌شود، به صورت زیر تعریف می‌شود:

<br>
**40. Independence ― Two random variables X and Y are said to be independent if we have:** -⟶ +استقلال - دو متغیر تصادفی $X$ و $Y$ مستقل هستند اگر داشته باشیم:
**41. Covariance ― We define the covariance of two random variables X and Y, that we note σ2XY or more commonly Cov(X,Y), as follows:**

-⟶

+کواریانس - کواریانس دو متغیر تصادفی $X$ و $Y$ که با $\sigma^2_{XY}$ یا به صورت معمول‌تر با $Cov(X,Y)$ نمایش داده می‌شود، به صورت زیر است:

<br>
**42. Correlation ― By noting σX,σY the standard deviations of X and Y, we define the correlation between the random variables X and Y, noted ρXY, as follows:** -⟶ +همبستگی - با نمایش انحراف معیار $X$ و $Y$ به صورت $\sigma_X$ و $\sigma_Y$، همبستگی مابین دو متغیر تصادفی $X$ و $Y$ که با $\rho_{XY}$ نمایش داده می‌شود به صورت زیر تعریف می‌شود:
**43. Remark 1: we note that for any random variables X,Y, we have ρXY∈[−1,1].** -⟶ +نکته‌ی ۱: برای هر دو متغیر تصادفی دلخواه $X$ و $Y$، داریم $\rho_{XY} \in [-1, 1]$.
**44. Remark 2: If X and Y are independent, then ρXY=0.** -⟶ +نکته‌ی ۲: اگر $X$ و $Y$ مستقل باشند، داریم $\rho_{XY}=0$.
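A quick check of these definitions on simulated data (NumPy assumed; the linear relation between x and y is made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=1000)
y = 2 * x + rng.normal(size=1000)   # y is correlated with x by construction

cov_xy = np.cov(x, y)[0, 1]                                 # Cov(X, Y)
rho_xy = cov_xy / (np.std(x, ddof=1) * np.std(y, ddof=1))   # correlation, in [-1, 1]

# np.corrcoef computes the same quantity directly
assert abs(rho_xy - np.corrcoef(x, y)[0, 1]) < 1e-10
print(cov_xy, rho_xy)
```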
**45. Parameter estimation** -⟶ +تخمین پارامتر
**46. Definitions** -⟶ +تعاریف
**47. Random sample ― A random sample is a collection of n random variables X1,...,Xn that are independent and identically distributed with X.** -⟶ +نمونه‌ی تصادفی - یک نمونه‌ی تصادفی مجموعه‌ای از $n$ متغیر تصادفی $\{X_1, \dots, X_n\}$ است که از هم مستقل هستند و توزیع یکسانی با $X$ دارند.
**48. Estimator ― An estimator is a function of the data that is used to infer the value of an unknown parameter in a statistical model.** -⟶ +تخمین‌گر - یک تخمین‌گر تابعی از داده‌ها است که برای به‌دست‌آوردن مقدار نامشخص یک پارامتر در یک مدل آماری به کار می‌رود.
**49. Bias ― The bias of an estimator ^θ is defined as being the difference between the expected value of the distribution of ^θ and the true value, i.e.:** -⟶ +پیش‌قدر - پیش‌قدر یک تخمین‌گر $\hat{\theta}$ به عنوان اختلاف بین امید ریاضی توزیع $\hat{\theta}$ و مقدار واقعی تعریف می‌شود. یعنی:
**50. Remark: an estimator is said to be unbiased when we have E[^θ]=θ.** -⟶ +نکته: یک تخمین‌گر بدون پیش‌قدر است اگر داشته باشیم $E[\hat{\theta}] = \theta$.
**51. Estimating the mean** -⟶ +تخمین میانگین
**52. Sample mean ― The sample mean of a random sample is used to estimate the true mean μ of a distribution, is often noted ¯¯¯¯¯X and is defined as follows:** -⟶ +میانگین نمونه - میانگین نمونه‌ی یک نمونه‌ی تصادفی که برای تخمین مقدار واقعی میانگین $\mu$ یک توزیع به کار می‌رود، معمولاً با $\bar{X}$ نمایش داده می‌شود و به صورت زیر تعریف می‌شود:
**53. Remark: the sample mean is unbiased, i.e E[¯¯¯¯¯X]=μ.** -⟶ +نکته: میانگین نمونه بدون پیش‌قدر است، یعنی $E[\bar{X}] = \mu$.
**54. Central Limit Theorem ― Let us have a random sample X1,...,Xn following a given distribution with mean μ and variance σ2, then we have:** -⟶ +قضیه‌ی حد مرکزی - یک نمونه‌ی تصادفی $\{X_1, \dots, X_n \}$ که از یک توزیع با میانگین $\mu$ و واریانس $\sigma^2$ به دست آمده‌اند را در نظر بگیرید؛ داریم:
**55. Estimating the variance** -⟶ +تخمین واریانس
**56. Sample variance ― The sample variance of a random sample is used to estimate the true variance σ2 of a distribution, is often noted s2 or ^σ2 and is defined as follows:**

-⟶

+واریانس نمونه - واریانس نمونه‌ی یک نمونه‌ی تصادفی که برای تخمین مقدار واقعی واریانس $\sigma^2$ یک توزیع به کار می‌رود، معمولاً با $s^2$ یا $\hat{\sigma}^2$ نمایش داده می‌شود و به صورت زیر تعریف می‌شود:

<br>
**57. Remark: the sample variance is unbiased, i.e E[s2]=σ2.** -⟶ +نکته: واریانس نمونه بدون پیش‌قدر است، یعنی $E[s^2] = \sigma^2$.
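As an aside, both estimators are one-liners in NumPy; note that the unbiased sample variance divides by n−1, which corresponds to `ddof=1` (toy data, not from the cheatsheet):

```python
import numpy as np

sample = np.array([2.1, 3.4, 1.9, 4.0, 2.7])   # a small random sample

x_bar = sample.mean()       # sample mean, estimates mu
s2 = sample.var(ddof=1)     # unbiased sample variance, divides by n - 1

print(x_bar, s2)
```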
**58. Chi-Squared relation with sample variance ― Let s2 be the sample variance of a random sample. We have:** -⟶ +رابطه‌ی $\chi^2$ با واریانس نمونه - فرض کنید $s^2$ واریانس نمونه‌ی یک نمونه‌ی تصادفی باشد. داریم:
From 09d7ded1f8a1d013b7996636e0ea5facfe9a6ca3 Mon Sep 17 00:00:00 2001 From: Erfan Noury Date: Wed, 19 Sep 2018 09:36:15 -0400 Subject: [PATCH 002/531] Linear Algebra Refresher Translation (#2) --- CONTRIBUTORS | 4 ++ README.md | 2 + fa/refresher-linear-algebra.md | 106 ++++++++++++++++----------------- 3 files changed, 59 insertions(+), 53 deletions(-) diff --git a/CONTRIBUTORS b/CONTRIBUTORS index 78dba6908..544ebf8c8 100644 --- a/CONTRIBUTORS +++ b/CONTRIBUTORS @@ -8,6 +8,10 @@ Probability: Translation: Erfan Noury (https://github.com/erfannoury) Review: Mohammad Karimi (https://github.com/m-karimi) + + Linear Algebra: + Translation: Erfan Noury (https://github.com/erfannoury) + Review: Mohammad Karimi (https://github.com/erfannoury) --fr --he diff --git a/README.md b/README.md index 562477ef9..17dcc2927 100644 --- a/README.md +++ b/README.md @@ -8,6 +8,8 @@ This repository aims at collaboratively translating our [Machine Learning cheats | Deep learning | 0% | 0% | 0% | 0% | 0% | 0% | 0% | | Supervised learning | 0% | 0% | 0% | 0% | 0% | 0% | 1% | | Unsupervised learning | 0% | 0% | 0% | 0% | 0% | 0% | 0% | +| Probabilities and Statistics | 0% | 0% | 0% | 0% | 0% | 0% | 0% | +| Linear algebra | 0% | 0% | 100% | 0% | 0% | 0% | 0% | | Probabilities and Statistics | 0% | 0% | 100% | 0% | 0% | 0% | 0% | | Linear algebra | 0% | 0% | 0% | 0% | 0% | 0% | 0% | diff --git a/fa/refresher-linear-algebra.md b/fa/refresher-linear-algebra.md index a824025f7..36d741893 100644 --- a/fa/refresher-linear-algebra.md +++ b/fa/refresher-linear-algebra.md @@ -1,315 +1,315 @@ **1. Linear Algebra and Calculus refresher** -⟶ +یادآوری جبر خطی و حسابان
**2. General notations** -⟶ +نمادها
**3. Definitions** -⟶ +تعاریف
**4. Vector ― We note x∈Rn a vector with n entries, where xi∈R is the ith entry:** -⟶ +بردار - $x \in \mathbb{R}^n$ یک بردار با $n$ درایه است، که $x_i \in \mathbb{R}$ درایه‌ی $i$ام می‌باشد:
**5. Matrix ― We note A∈Rm×n a matrix with m rows and n columns, where Ai,j∈R is the entry located in the ith row and jth column:**

-⟶

+ماتریس - $A \in \mathbb{R}^{m \times n}$ یک ماتریس با $m$ سطر و $n$ ستون است، که در آن $A_{i, j} \in \mathbb{R}$ درایه‌ای است که در سطر $i$ام و ستون $j$ام قرار دارد:

<br>
**6. Remark: the vector x defined above can be viewed as a n×1 matrix and is more particularly called a column-vector.** -⟶ +نکته: بردار $x$ که در بالا تعریف شد را می‌توان به صورت یک ماتریس $n \times 1$ در نظر گرفت که به طور خاص به آن بردار ستونی گویند.
**7. Main matrices** -⟶ +ماتریس‌های اصلی:
**8. Identity matrix ― The identity matrix I∈Rn×n is a square matrix with ones in its diagonal and zero everywhere else:** -⟶ +ماتریس همانی - ماتریس همانی $I \in \mathbb{R}^{n \times n}$ یک ماتریس مربعی است که درایه‌های قطری آن همه مقدار ۱ و بقیه‌ی درایه‌ها مقدار ۰ دارند:
**9. Remark: for all matrices A∈Rn×n, we have A×I=I×A=A.** -⟶ +نکته: برای همه‌ی ماتریس‌های $A \in \mathbb{R}^{n \times n}$ داریم $A \times I = I \times A = A$.
**10. Diagonal matrix ― A diagonal matrix D∈Rn×n is a square matrix with nonzero values in its diagonal and zero everywhere else:** -⟶ +ماتریس قطری - ماتریس $D \in \mathbb{R} ^ {n \times n}$ یک ماتریس مربعی است که درایه‌های قطری آن مقادیر غیرصفر دارند و بقیه‌ی درایه‌ها صفر هستند:
**11. Remark: we also note D as diag(d1,...,dn).** -⟶ +نکته:‌$D$ همچنین به صورت $\text{diag}(d_1, \dots, d_n)$ هم نمایش داده می‌شود.
**12. Matrix operations** -⟶ +عملیات ماتریسی
**13. Multiplication** -⟶ +ضرب
**14. Vector-vector ― There are two types of vector-vector products:** -⟶ +بردار با بردار - دو نوع عملیات ضرب بردار با بردار وجود دارد:
**15. inner product: for x,y∈Rn, we have:** -⟶ +ضرب داخلی: برای هر $x, y \in \mathbb{R}^n$ داریم:
**16. outer product: for x∈Rm,y∈Rn, we have:** -⟶ +ضرب خارجی: برای هر $x \in \mathbb{R}^m$ و $y \in \mathbb{R}^n$ داریم:
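A short illustration of the two products (NumPy assumed; the vectors are arbitrary):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([4.0, 5.0, 6.0])

inner = x @ y            # scalar: sum_i x_i * y_i  -> 32.0
outer = np.outer(x, y)   # 3x3 matrix with entries x_i * y_j

print(inner, outer.shape)
```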
**17. Matrix-vector ― The product of matrix A∈Rm×n and vector x∈Rn is a vector of size Rn, such that:** -⟶ +ماتریس با بردار - ضرب ماتریس $A \in \mathbb{R}^{m \times n}$ و بردار $x \in \mathbb{R}^n$ برداری با اندازه‌ی $m$ است به طوری که:
**18. where aTr,i are the vector rows and ac,j are the vector columns of A, and xi are the entries of x.**

-⟶

+که $a^T_{r, i}$ بردارهای سطری و $a_{c, j}$ بردارهای ستونی $A$، و $x_i$ درایه‌های $x$ هستند.

<br>
**19. Matrix-matrix ― The product of matrices A∈Rm×n and B∈Rn×p is a matrix of size Rn×p, such that:**

-⟶

+ماتریس با ماتریس - ضرب ماتریس‌های $A \in \mathbb{R}^{m \times n}$ و $B \in \mathbb{R}^{n \times p}$ ماتریسی با اندازه‌ی $n \times p$ است که:

<br>
**20. where aTr,i,bTr,i are the vector rows and ac,j,bc,j are the vector columns of A and B respectively**

-⟶

+که $a^T_{r, i}$ و $b^T_{r, i}$ بردارهای سطری و $a_{c, j}$ و $b_{c, j}$ بردارهای ستونی $A$ و $B$ هستند.

<br>
**21. Other operations** -⟶ +دیگر عملیات
**22. Transpose ― The transpose of a matrix A∈Rm×n, noted AT, is such that its entries are flipped:** -⟶ +ترانهاده - ترانهاده‌ی ماتریس $A \in \mathbb{R}^{m \times n}$ که با $A^T$ نمایش داده می‌شود، ماتریسی است که مکان درایه‌های آن نسبت به قطر ماتریس برعکس شده‌اند:
**23. Remark: for matrices A,B, we have (AB)T=BTAT** -⟶ +نکته: برای ماتریس‌های $A$ و $B$، داریم $(AB)^T = B^T A^T$.
**24. Inverse ― The inverse of an invertible square matrix A is noted A−1 and is the only matrix such that:** -⟶ +معکوس - معکوس یک ماتریس مربعی معکوس‌پذیر $A$ که با $A^{-1}$ نمایش داده می‌شود، تنها ماتریسی است که:
**25. Remark: not all square matrices are invertible. Also, for matrices A,B, we have (AB)−1=B−1A−1** -⟶ +نکته: همه‌ی ماتریس‌های مربعی معکوس‌پذیر نیستند. همچنین، برای ماتریس‌های مربعی معکوس‌پذیر $A$ و $B$ داریم $(AB)^{-1} = B^{-1} A^{-1}$.
**26. Trace ― The trace of a square matrix A, noted tr(A), is the sum of its diagonal entries:** -⟶ +اثر - اثر ماتریس مربعی $A$ که با $tr(A)$ نمایش داده می‌شود، مجموع همه‌ی درایه‌های قطری ماتریس است.
**27. Remark: for matrices A,B, we have tr(AT)=tr(A) and tr(AB)=tr(BA)** -⟶ +نکته: برای ماتریس‌های $A$ و $B$ داریم $tr(A^T) = tr(A)$ و $tr(AB) = tr(BA)$.
**28. Determinant ― The determinant of a square matrix A∈Rn×n, noted |A| or det(A) is expressed recursively in terms of A∖i,∖j, which is the matrix A without its ith row and jth column, as follows:** -⟶ +دترمینان - دترمینان یک ماتریس مربعی $A \in \mathbb{R}^{n \times n}$ که با $|A|$ یا $\det(A)$ نمایش داده می‌شود، به صورت یک عبارت بازگشتی بر روی $A_{\\i, \\j}$، که ماتریس $A$ بدون سطر $i$-ام و ستون $j$-ام است، به صورت زیر تعریف می‌شود:
**29. Remark: A is invertible if and only if |A|≠0. Also, |AB|=|A||B| and |AT|=|A|.** -⟶ +نکته: $A$ معکوس‌پذیر است اگر و فقط اگر $|A| \neq 0$. همچنین $|A B| = |A| |B|$ و $|A^T| = |A|$.
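The identities stated in the last few remarks are easy to verify numerically on random matrices; a minimal sketch (NumPy assumed):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(3, 3))
B = rng.normal(size=(3, 3))

assert np.allclose((A @ B).T, B.T @ A.T)                   # (AB)^T = B^T A^T
assert np.isclose(np.trace(A @ B), np.trace(B @ A))        # tr(AB) = tr(BA)
assert np.isclose(np.linalg.det(A @ B),
                  np.linalg.det(A) * np.linalg.det(B))     # |AB| = |A||B|
assert np.isclose(np.linalg.det(A.T), np.linalg.det(A))    # |A^T| = |A|
```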
**30. Matrix properties** -⟶ +ویژگی‌های ماتریس‌ها
**31. Definitions** -⟶ +تعاریف
**32. Symmetric decomposition ― A given matrix A can be expressed in terms of its symmetric and antisymmetric parts as follows:** -⟶ +تجزیه‌ی متقارن - یک ماتریس دلخواه $A$ را می‌توان با استفاده از اجزای متقارن و غیرمتقارن آن به صورت زیر نشان داد:
**33. [Symmetric, Antisymmetric]** -⟶ +[متقارن، غیرمتقارن]
**34. Norm ― A norm is a function N:V⟶[0,+∞[ where V is a vector space, and such that for all x,y∈V, we have:** -⟶ +نرم - نرم تابع $N: \mathbb{V} \rightarrow [0, +\infty[$ است که $V$ یک فضای برداری است، و به گونه‌ای است که برای هر $x, y \in \mathbb{V}$ داریم:
**35. N(ax)=|a|N(x) for a scalar** -⟶ +$N(a x) = |a| N(x)$ برای عدد اسکالر
**36. if N(x)=0, then x=0**

-⟶

+اگر $N(x) = 0$ باشد در این صورت $x = 0$

<br>
**37. For x∈V, the most commonly used norms are summed up in the table below:** -⟶ +برای $x \in \mathbb{V}$، نرم‌هایی که بیشتر استفاده می‌شوند در جدول زیر آمده‌اند:
**38. [Norm, Notation, Definition, Use case]** -⟶ +[نُرم، نماد، تعریف، کاربرد]
**39. Linearly dependence ― A set of vectors is said to be linearly dependent if one of the vectors in the set can be defined as a linear combination of the others.** -⟶ +وابستگی خطی - مجموعه‌ای از بردارها وابستگی خطی دارند اگر یکی از بردارهای مجموعه را بتوان به صورت ترکیب خطی دیگر بردارها تعریف کرد.
**40. Remark: if no vector can be written this way, then the vectors are said to be linearly independent** -⟶ +نکته: اگر نتوان هیچ برداری را به این شکل تعریف کرد، در این صورت بردارها استقلال خطی دارند.
**41. Matrix rank ― The rank of a given matrix A is noted rank(A) and is the dimension of the vector space generated by its columns. This is equivalent to the maximum number of linearly independent columns of A.** -⟶ +رتبه ماتریس - رتبه‌ی یک ماتریس $A$ که با $\text{rank}(A)$ نمایش داده می‌شود، تعداد ابعاد فضایی است که توسط ستون‌های آن ایجاد می‌شود. این مقدار برابر است با حداکثر تعداد ستون‌های $A$ که استقلال خطی داشته باشند.
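For illustration, the rank can be computed directly with NumPy; in the toy matrix below the third column is a linear combination of the first two, so the rank is 2:

```python
import numpy as np

A = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 2.0],
              [1.0, 1.0, 3.0]])   # third column = first column + 2 * second column

print(np.linalg.matrix_rank(A))  # 2
```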
**42. Positive semi-definite matrix ― A matrix A∈Rn×n is positive semi-definite (PSD) and is noted A⪰0 if we have:** -⟶ +ماتریس مثبت نیمه‌معین - ماتریس $A \in \mathbb{R}^{n \times n}$ یک ماتریس مثبت نیمه‌معین است که با $A \succeq 0$ نمایش داده می‌شود اگر داشته باشیم:
**43. Remark: similarly, a matrix A is said to be positive definite, and is noted A≻0, if it is a PSD matrix which satisfies for all non-zero vector x, xTAx>0.**

-⟶

+نکته: به طور مشابه، یک ماتریس $A$ مثبت معین است ($A \succ 0$)، اگر یک ماتریس مثبت نیمه‌معین باشد که برای هر بردار غیرصفر $x$ داشته باشیم $x^T A x > 0$.

<br>
**44. Eigenvalue, eigenvector ― Given a matrix A∈Rn×n, λ is said to be an eigenvalue of A if there exists a vector z∈Rn∖{0}, called eigenvector, such that we have:**

-⟶

+مقدار ویژه، بردار ویژه - برای یک ماتریس $A \in \mathbb{R}^{n \times n}$، گوییم $\lambda$ یک مقدار ویژه ماتریس $A$ است اگر وجود داشته باشد بردار $z \in \mathbb{R}^n \setminus \{0\}$، که یک بردار ویژه نام دارد، به طوری که:

<br>
**45. Spectral theorem ― Let A∈Rn×n. If A is symmetric, then A is diagonalizable by a real orthogonal matrix U∈Rn×n. By noting Λ=diag(λ1,...,λn), we have:** -⟶ +قضیه‌ی طیفی - فرض کنید $A \in \mathbb{R}^{n \times n}$ باشد. اگر $A$ متقارن باشد، در این صورت $A$ توسط یک ماتریس حقیقی متعامد $U \in \mathbb{R} ^{n \times n}$ قطری‌پذیر است. با نمایش $\Lambda = \diag(\lambda_1, \dots, \lambda_n)$ داریم:
**46. diagonal** -⟶ +قطری
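A numerical check of the spectral theorem on a small symmetric matrix (NumPy assumed; `eigh` is the eigensolver for symmetric/Hermitian matrices and returns an orthogonal U):

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])        # symmetric

eigvals, U = np.linalg.eigh(A)    # real eigenvalues, orthogonal eigenvectors
Lam = np.diag(eigvals)

assert np.allclose(U @ Lam @ U.T, A)    # A = U diag(lambda) U^T
print(eigvals)                          # [1. 3.]
```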
**47. Singular-value decomposition ― For a given matrix A of dimensions m×n, the singular-value decomposition (SVD) is a factorization technique that guarantees the existence of U m×m unitary, Σ m×n diagonal and V n×n unitary matrices, such that:**

-⟶

+تجزیه‌ی مقدار منفرد - برای یک ماتریس $A$ با ابعاد $m \times n$، تجزیه‌ی مقدار منفرد یک تکنیک تجزیه است که تضمین می‌کند یک ماتریس یکانی $U \in \mathbb{R}^{m \times m}$، یک ماتریس قطری $\Sigma \in \mathbb{R}^{m \times n}$، و یک ماتریس یکانی $V \in \mathbb{R}^{n \times n}$ وجود دارند، به طوری که:

<br>
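Likewise, the SVD factorization can be checked in a few lines (NumPy assumed; the matrix is random and only serves as an example):

```python
import numpy as np

A = np.random.default_rng(2).normal(size=(4, 3))   # m x n with m=4, n=3

U, s, Vt = np.linalg.svd(A)            # U: 4x4 unitary, s: singular values, Vt: 3x3 unitary
Sigma = np.zeros_like(A)               # 4x3 "diagonal" matrix
Sigma[:len(s), :len(s)] = np.diag(s)

assert np.allclose(U @ Sigma @ Vt, A)
```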
**48. Matrix calculus** -⟶ +حسابان ماتریسی
**49. Gradient ― Let f:Rm×n→R be a function and A∈Rm×n be a matrix. The gradient of f with respect to A is a m×n matrix, noted ∇Af(A), such that:** -⟶ +گرادیان - فرض کنید $f: \mathbb{R}^{m \times n} \rightarrow \mathbb{R}$ یک تابع و $A \in \mathbb{R}^{m \times n}$ یک ماتریس باشد. گرادیان $f$ نسبت به $A$ یک ماتریس با ابعاد $m \times n$ است و با $\nabla_A f(A)$ نمایش داده می‌شود، به طوری که:
**50. Remark: the gradient of f is only defined when f is a function that returns a scalar.** -⟶ +نکته: گرادیان $f$ تنها زمانی تعریف شده است که $f$ تابعی باشد که یک عدد اسکالر خروجی دهد.
**51. Hessian ― Let f:Rn→R be a function and x∈Rn be a vector. The hessian of f with respect to x is a n×n symmetric matrix, noted ∇2xf(x), such that:** -⟶ +هسیان - فرض کنید $f: \mathbb{R}^n \rightarrow \mathbb{R}$ یک تابع و $x \in \mathbb{R}^n$ یک بردار باشد. هسیان $f$ نسبت به $x$ یک ماتریس متقارن با ابعاد $n \times n$ است و با $\nabla^2_x f(x)$ نمایش داده می‌شود، به طوری که:
**52. Remark: the hessian of f is only defined when f is a function that returns a scalar** -⟶ +نکته: هسیان تابع $f$ تنها زمانی تعریف شده است که $f$ تابعی با خروجی اسکالر باشد.
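To make the notion of a matrix gradient concrete, here is a finite-difference sanity check of one standard result, that the gradient of f(A)=tr(AB) with respect to A is Bᵀ (the matrices are random and purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.normal(size=(3, 3))
B = rng.normal(size=(3, 3))

f = lambda M: np.trace(M @ B)     # scalar-valued function of a matrix
eps = 1e-6
num_grad = np.zeros_like(A)
for i in range(3):
    for j in range(3):
        E = np.zeros_like(A)
        E[i, j] = eps
        num_grad[i, j] = (f(A + E) - f(A - E)) / (2 * eps)   # central difference

assert np.allclose(num_grad, B.T, atol=1e-5)   # gradient of tr(AB) w.r.t. A is B^T
```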
**53. Gradient operations ― For matrices A,B,C, the following gradient properties are worth having in mind:** -⟶ +عملیات گرادیانی - برای ماتریس‌های $A$، $B$، و $C$، ویژگی‌های زیر را به خاطر داشته باشید: From 29eae810ef28605159992fbbc922331c589a045f Mon Sep 17 00:00:00 2001 From: Erfan Noury Date: Wed, 19 Sep 2018 09:37:07 -0400 Subject: [PATCH 003/531] Fix the bug in merge conflict resolve --- README.md | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/README.md b/README.md index 17dcc2927..bd047629f 100644 --- a/README.md +++ b/README.md @@ -8,10 +8,8 @@ This repository aims at collaboratively translating our [Machine Learning cheats | Deep learning | 0% | 0% | 0% | 0% | 0% | 0% | 0% | | Supervised learning | 0% | 0% | 0% | 0% | 0% | 0% | 1% | | Unsupervised learning | 0% | 0% | 0% | 0% | 0% | 0% | 0% | -| Probabilities and Statistics | 0% | 0% | 0% | 0% | 0% | 0% | 0% | -| Linear algebra | 0% | 0% | 100% | 0% | 0% | 0% | 0% | | Probabilities and Statistics | 0% | 0% | 100% | 0% | 0% | 0% | 0% | -| Linear algebra | 0% | 0% | 0% | 0% | 0% | 0% | 0% | +| Linear algebra | 0% | 0% | 100% | 0% | 0% | 0% | 0% | |Cheatsheet topic|العَرَبِيَّة|עִבְרִית|[हिन्दी](https://github.com/shervinea/cheatsheet-translation/tree/master/hi)|[ಕನ್ನಡ](https://github.com/shervinea/cheatsheet-translation/tree/master/kn)|[मराठी](https://github.com/shervinea/cheatsheet-translation/tree/master/mr)|[తెలుగు](https://github.com/shervinea/cheatsheet-translation/tree/master/te)| |:---|:---:|:---:|:---:|:---:|:---:|:---:| From db489c1165276e8c3ef3f4a74e6c9501ad46c122 Mon Sep 17 00:00:00 2001 From: Erfan Noury Date: Fri, 21 Sep 2018 13:32:07 -0400 Subject: [PATCH 004/531] Unsupervised Learning Translation (#4) --- CONTRIBUTORS | 4 + README.md | 14 ++-- fa/cheatsheet-unsupervised-learning.md | 101 +++++++++++++------------ 3 files changed, 62 insertions(+), 57 deletions(-) diff --git a/CONTRIBUTORS b/CONTRIBUTORS index 544ebf8c8..f5dfe025e 100644 --- a/CONTRIBUTORS +++ b/CONTRIBUTORS @@ -5,6 +5,10 @@ --es --fa + Unsupervised Learning: + Translation: Erfan Noury (https://github.com/erfannoury) + Review: Mohammad Karimi (https://github.com/erfannoury) + Probability: Translation: Erfan Noury (https://github.com/erfannoury) Review: Mohammad Karimi (https://github.com/m-karimi) diff --git a/README.md b/README.md index bd047629f..5bed7ad1b 100644 --- a/README.md +++ b/README.md @@ -3,13 +3,13 @@ This repository aims at collaboratively translating our [Machine Learning cheatsheets](https://github.com/afshinea/stanford-cs-229-machine-learning) into a ton of languages, so that this content can be enjoyed by anyone from any part of the world! 
## Progression -| Cheatsheet topic | [Deutsch](https://github.com/shervinea/cheatsheet-translation/tree/master/de) | [Español](https://github.com/shervinea/cheatsheet-translation/tree/master/es) | [فارسی](https://github.com/shervinea/cheatsheet-translation/tree/master/fa) | [Français](https://github.com/shervinea/cheatsheet-translation/tree/master/fr) | [日本語](https://github.com/shervinea/cheatsheet-translation/tree/master/ja) | [Português](https://github.com/shervinea/cheatsheet-translation/tree/master/pt) | [官话](https://github.com/shervinea/cheatsheet-translation/tree/master/zh) | -|------------------------------|:-----------------------------------------------------------------------------:|:-----------------------------------------------------------------------------:|:---------------------------------------------------------------------------:|:------------------------------------------------------------------------------:|:----------------------------------------------------------------------------:|:-------------------------------------------------------------------------------:|:--------------------------------------------------------------------------:| -| Deep learning | 0% | 0% | 0% | 0% | 0% | 0% | 0% | -| Supervised learning | 0% | 0% | 0% | 0% | 0% | 0% | 1% | -| Unsupervised learning | 0% | 0% | 0% | 0% | 0% | 0% | 0% | -| Probabilities and Statistics | 0% | 0% | 100% | 0% | 0% | 0% | 0% | -| Linear algebra | 0% | 0% | 100% | 0% | 0% | 0% | 0% | +|Cheatsheet topic|[Deutsch](https://github.com/shervinea/cheatsheet-translation/tree/master/de)|[Español](https://github.com/shervinea/cheatsheet-translation/tree/master/es)|[فارسی](https://github.com/shervinea/cheatsheet-translation/tree/master/fa)|[Français](https://github.com/shervinea/cheatsheet-translation/tree/master/fr)|[日本語](https://github.com/shervinea/cheatsheet-translation/tree/master/ja)|[Português](https://github.com/shervinea/cheatsheet-translation/tree/master/pt)|[官话](https://github.com/shervinea/cheatsheet-translation/tree/master/zh)| +|:---|:---:|:---:|:---:|:---:|:---:|:---:|:---:| +|Deep learning|0%|0%|0%|0%|0%|0%|0%| +|Supervised learning|0%|0%|0%|0%|0%|0%|1%| +|Unsupervised learning|0%|0%|100%|0%|0%|0%|0%| +|Probabilities and Statistics|0%|0%|100%|0%|0%|0%|0%| +|Linear algebra|0%|0%|100%|0%|0%|0%|0%| |Cheatsheet topic|العَرَبِيَّة|עִבְרִית|[हिन्दी](https://github.com/shervinea/cheatsheet-translation/tree/master/hi)|[ಕನ್ನಡ](https://github.com/shervinea/cheatsheet-translation/tree/master/kn)|[मराठी](https://github.com/shervinea/cheatsheet-translation/tree/master/mr)|[తెలుగు](https://github.com/shervinea/cheatsheet-translation/tree/master/te)| |:---|:---:|:---:|:---:|:---:|:---:|:---:| diff --git a/fa/cheatsheet-unsupervised-learning.md b/fa/cheatsheet-unsupervised-learning.md index 5826ff44b..58d0d9e6a 100644 --- a/fa/cheatsheet-unsupervised-learning.md +++ b/fa/cheatsheet-unsupervised-learning.md @@ -1,299 +1,300 @@ **1. Unsupervised Learning cheatsheet** -⟶ +راهنمای کوتاه یادگیری بدون نظارت
**2. Introduction to Unsupervised Learning** -⟶ +مبانی یادگیری بدون نظارت
**3. Motivation ― The goal of unsupervised learning is to find hidden patterns in unlabeled data {x(1),...,x(m)}.** -⟶ +انگیزه - هدف از یادگیری بدون نظارت کشف الگوهای پنهان در داده‌های بدون برچسب $\{x_1, \dots, x_m\}$ است.
**4. Jensen's inequality ― Let f be a convex function and X a random variable. We have the following inequality:** -⟶ +نابرابری ینسن - فرض کنید $f$ تابعی محدب و $X$ یک متغیر تصادفی باشد. در این صورت نابرابری زیر را داریم:
**5. Clustering** -⟶ +خوشه‌بندی
**6. Expectation-Maximization** -⟶ +بیشینه‌سازی امید ریاضی
**7. Latent variables ― Latent variables are hidden/unobserved variables that make estimation problems difficult, and are often denoted z. Here are the most common settings where there are latent variables:** -⟶ +متغیرهای نهفته - متغیرهای نهفته متغیرهای پنهان یا مشاهده‌نشده‌ای هستند که مسائل تخمین را دشوار می‌کنند، و معمولاً با $z$ نمایش داده می‌شوند. شرایط معمول که در آن‌ها متغیرهای نهفته وجود دارند در زیر آمده‌اند:
**8. [Setting, Latent variable z, Comments]** -⟶ +[موقعیت، متغیر نهفته‌ی $z$، توضیحات]
**9. [Mixture of k Gaussians, Factor analysis]** -⟶ +[ترکیب $k$ توزیع گاوسی، تحلیل عامل]
**10. Algorithm ― The Expectation-Maximization (EM) algorithm gives an efficient method at estimating the parameter θ through maximum likelihood estimation by repeatedly constructing a lower-bound on the likelihood (E-step) and optimizing that lower bound (M-step) as follows:**

-⟶

+الگوریتم - الگوریتم بیشینه‌سازی امید ریاضی (EM) روشی کارا برای تخمین پارامتر $\theta$ از طریق تخمین درست‌نمایی بیشینه در اختیار قرار می‌دهد. این کار با تکرار مرحله‌ی به دست آوردن یک کران پایین برای درست‌نمایی (مرحله‌ی امید ریاضی) و بهینه‌سازی آن کران پایین (مرحله‌ی بیشینه‌سازی) طبق توضیح زیر انجام می‌شود:

<br>
**11. E-step: Evaluate the posterior probability Qi(z(i)) that each data point x(i) came from a particular cluster z(i) as follows:** -⟶ +مرحله‌ی امید ریاضی:‌احتمال پسین $Q_i(z(i))$ که هر نمونه داده $x(i)$ متعلق به خوشه‌ی $z(i)$ باشد به صورت زیر محاسبه می‌شود:
**12. M-step: Use the posterior probabilities Qi(z(i)) as cluster specific weights on data points x(i) to separately re-estimate each cluster model as follows:** -⟶ +مرحله‌ی بیشینه‌سازی: با استفاده از احتمالات پسین $Q_i(z(i))$ به عنوان وزن‌های وابسته به خوشه‌ها برای نمونه‌های داده‌ی $x(i)$، مدل مربوط به هر کدام از خوشه‌ها، طبق توضیح زیر، دوباره تخمین زده می‌شوند:
**13. [Gaussians initialization, Expectation step, Maximization step, Convergence]** -⟶ +[مقداردهی اولیه‌ی توزیع‌های گاوسی، مرحله‌ی امید ریاضی، مرحله‌ی بیشینه‌سازی، هم‌گرایی]
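A minimal 1-D sketch of these two steps for a mixture of two Gaussians (NumPy and SciPy assumed; the data, the initialization, and the fixed number of iterations are all simplifications made for illustration):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-2, 1, 300), rng.normal(3, 1, 200)])   # toy data

phi = np.array([0.5, 0.5])      # mixture weights
mu = np.array([-1.0, 1.0])      # means (crude initialization)
sigma = np.array([1.0, 1.0])    # standard deviations

for _ in range(50):
    # E-step: posterior probability that each point came from each Gaussian
    dens = np.stack([phi[j] * norm.pdf(x, mu[j], sigma[j]) for j in range(2)], axis=1)
    q = dens / dens.sum(axis=1, keepdims=True)

    # M-step: re-estimate each Gaussian using the posteriors as weights
    nk = q.sum(axis=0)
    phi = nk / len(x)
    mu = (q * x[:, None]).sum(axis=0) / nk
    sigma = np.sqrt((q * (x[:, None] - mu) ** 2).sum(axis=0) / nk)

print(phi, mu, sigma)
```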
**14. k-means clustering** -⟶ +خوشه‌بندی $k$-میانگین
**15. We note c(i) the cluster of data point i and μj the center of cluster j.** -⟶ +توجه کنید که $c(i)$ خوشه‌ی نمونه داده‌ی $i$ و $\mu_j$ مرکز خوشه‌ی $j$ است.
**16. Algorithm ― After randomly initializing the cluster centroids μ1,μ2,...,μk∈Rn, the k-means algorithm repeats the following step until convergence:** -⟶ +الگوریتم - بعد از مقداردهی اولیه‌ی تصادفی مراکز خوشه‌ها $\mu_1, \mu_2, \dots, \mu_k \in \mathbb{R}^n$، الگوریتم $k$-میانگین مراحل زیر را تا هم‌گرایی تکرار می‌کند:
**17. [Means initialization, Cluster assignment, Means update, Convergence]** -⟶ +[مقداردهی اولیه‌ی میانگین‌ها، تخصیص خوشه، به‌روزرسانی میانگین‌ها، هم‌گرایی]
**18. Distortion function ― In order to see if the algorithm converges, we look at the distortion function defined as follows:** -⟶ +تابع اعوجاج - برای تشخیص اینکه الگوریتم به هم‌گرایی رسیده است، به تابع اعوجاج که به صورت زیر تعریف می‌شود رجوع می‌کنیم:
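A bare-bones version of this loop, together with the distortion J defined above, might look as follows (NumPy assumed; empty clusters and the convergence test are deliberately ignored in this sketch):

```python
import numpy as np

def kmeans(x, k, iters=20, seed=0):
    rng = np.random.default_rng(seed)
    mu = x[rng.choice(len(x), size=k, replace=False)]               # means initialization
    for _ in range(iters):
        c = ((x[:, None, :] - mu) ** 2).sum(-1).argmin(axis=1)      # cluster assignment
        mu = np.stack([x[c == j].mean(axis=0) for j in range(k)])   # means update
    c = ((x[:, None, :] - mu) ** 2).sum(-1).argmin(axis=1)
    distortion = ((x - mu[c]) ** 2).sum()                           # J(c, mu)
    return c, mu, distortion

rng = np.random.default_rng(1)
x = np.vstack([rng.normal(0, 1, (100, 2)), rng.normal(5, 1, (100, 2))])   # two blobs
c, mu, J = kmeans(x, k=2)
print(mu, J)
```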
**19. Hierarchical clustering** -⟶ +خوشه‌بندی سلسله‌مراتبی
**20. Algorithm ― It is a clustering algorithm with an agglomerative hierarchical approach that build nested clusters in a successive manner.** -⟶ +الگوریتم - یک الگوریتم خوشه‌بندی سلسله‌مراتبی تجمعی است که خوشه‌های تودرتو را به صورت پی‌در‌پی ایجاد می‌کند.
**21. Types ― There are different sorts of hierarchical clustering algorithms that aims at optimizing different objective functions, which is summed up in the table below:** -⟶ +انواع - انواع مختلفی الگوریتم خوشه‌بندی سلسله‌مراتبی وجود دارند که هر کدام به دنبال بهینه‌سازی توابع هدف مختلفی هستند، که در جدول زیر به اختصار آمده‌اند:
**22. [Ward linkage, Average linkage, Complete linkage]** -⟶ +[پیوند بخشی، پیوند میانگین، پیوند کامل]
**23. [Minimize within cluster distance, Minimize average distance between cluster pairs, Minimize maximum distance of between cluster pairs]** -⟶ +[کمینه‌کردن فاصله‌ی درونِ خوشه، کمینه‌کردن فاصله‌ی میانگین بین هر دو جفت خوشه، کمینه‌کردن حداکثر فاصله بین هر دو جفت خوشه]
**24. Clustering assessment metrics** -⟶ +معیارهای ارزیابی خوشه‌بندی
**25. In an unsupervised learning setting, it is often hard to assess the performance of a model since we don't have the ground truth labels as was the case in the supervised learning setting.** -⟶ +در یک وضعیت یادگیری بدون نظارت، معمولاً ارزیابی یک مدل کار دشواری است، زیرا برخلاف حالت یادگیری نظارتی اطلاعاتی در مورد برچسب‌های حقیقی داده‌ها نداریم.
**26. Silhouette coefficient ― By noting a and b the mean distance between a sample and all other points in the same class, and between a sample and all other points in the next nearest cluster, the silhouette coefficient s for a single sample is defined as follows:** -⟶ +ضریب نیم‌رخ - با نمایش $a$ به عنوان میانگین فاصله‌ی یک نمونه با همه‌ی نمونه‌های دیگر در همان کلاس، و با نمایش $b$ به عنوان میانگین فاصله‌ی یک نمونه با همه‌ی نمونه‌های دیگر از نزدیک‌ترین خوشه، ضریب نیم‌رخ $s$ به صورت زیر تعریف می‌شود:
**27. Calinski-Harabaz index ― By noting k the number of clusters, Bk and Wk the between and within-clustering dispersion matrices respectively defined as**

-⟶

+شاخص Calinski-Harabasz - با در نظر گرفتن $k$ به عنوان تعداد خوشه‌ها، ماتریس پراکندگی میان‌خوشه‌ای $B_k$ و ماتریس پراکندگی درون‌خوشه‌ای $W_k$ به صورت زیر تعریف می‌شوند:

<br>
**28. the Calinski-Harabaz index s(k) indicates how well a clustering model defines its clusters, such that the higher the score, the more dense and well separated the clusters are. It is defined as follows:** -⟶ +شاخص Calinski-Harabasz $s(k)$ بیان می‌کند که یک مدل خوشه‌بندی چگونه خوشه‌های خود را مشخص می‌کند، به گونه‌ای که هر چقدر مقدار این شاخص بیشتر باشد، خوشه‌ها متراکم‌تر و از هم تفکیک‌یافته‌تر خواهند بود. این شاخص به صورت زیر تعریف می‌شود:
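Both scores are available in scikit-learn (function names as in current scikit-learn releases); a small usage sketch on two synthetic blobs:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score, calinski_harabasz_score

rng = np.random.default_rng(0)
x = np.vstack([rng.normal(0, 1, (100, 2)), rng.normal(6, 1, (100, 2))])

labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(x)

print(silhouette_score(x, labels))          # close to 1 for well-separated clusters
print(calinski_harabasz_score(x, labels))   # higher means denser, better-separated clusters
```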
**29. Dimension reduction** -⟶ +کاهش ابعاد
**30. Principal component analysis** -⟶ +تحلیل مولفه‌های اصلی
**31. It is a dimension reduction technique that finds the variance maximizing directions onto which to project the data.** -⟶ +روشی برای کاهش ابعاد است که جهت‌هایی را با حداکثر واریانس پیدا می‌کند تا داده‌ها را در آن جهت‌ها تصویر کند.
**32. Eigenvalue, eigenvector ― Given a matrix A∈Rn×n, λ is said to be an eigenvalue of A if there exists a vector z∈Rn∖{0}, called eigenvector, such that we have:**

-⟶

+مقدار ویژه، بردار ویژه - برای ماتریس دلخواه $A \in \mathbb{R}^{n \times n}$، $\lambda$ مقدار ویژه‌ی ماتریس $A$ است اگر وجود داشته باشد بردار $z \in \mathbb{R}^n \setminus \{0\}$ که به آن بردار ویژه می‌گویند، به طوری که:

<br>
**33. Spectral theorem ― Let A∈Rn×n. If A is symmetric, then A is diagonalizable by a real orthogonal matrix U∈Rn×n. By noting Λ=diag(λ1,...,λn), we have:** -⟶ +قضیه‌ی طیفی - فرض کنید $A \in \mathbb{R}^{n \times n}$ باشد. اگر $A$ متقارن باشد، در این صورت $A$ توسط یک ماتریس حقیقی متعامد $U \in \mathbb{R} ^{n \times n}$ قطری‌پذیر است. با نمایش $\Lambda = \diag(\lambda_1, \dots, \lambda_n)$ داریم:
**34. diagonal** -⟶ +قطری
**35. Remark: the eigenvector associated with the largest eigenvalue is called principal eigenvector of matrix A.** -⟶ +نکته: بردار ویژه‌ی متناظر با بزرگ‌ترین مقدار ویژه، بردار ویژه‌ی اصلی ماتریس $A$ نام دارد.
**36. Algorithm ― The Principal Component Analysis (PCA) procedure is a dimension reduction technique that projects the data on k dimensions by maximizing the variance of the data as follows:** -⟶ +الگوریتم - رویه‌ی تحلیل مولفه‌های اصلی یک روش کاهش ابعاد است که داده‌ها را در فضای $k$-بعدی با بیشینه کردن واریانس داده‌ها، به صورت زیر تصویر می‌کند:
**37. Step 1: Normalize the data to have a mean of 0 and standard deviation of 1.** -⟶ +مرحله‌ی ۱: داده‌ها به گونه‌ای نرمال‌سازی می‌شوند که میانگین ۰ و انحراف معیار ۱ داشته باشند.
**38. Step 2: Compute Σ=1mm∑i=1x(i)x(i)T∈Rn×n, which is symmetric with real eigenvalues.** -⟶ +مرحله‌ی ۲: مقدار $\Sigma = \frac{1}{m} \sum_{i=1}^m x(i) x(i)^T \in \mathbb{R}^{n \times n}$، که ماتریسی متقارن با مقادیر ویژه‌ی حقیقی است محاسبه می‌شود.
**39. Step 3: Compute u1,...,uk∈Rn the k orthogonal principal eigenvectors of Σ, i.e. the orthogonal eigenvectors of the k largest eigenvalues.** -⟶ +مرحله‌ی ۳: بردارهای $u_1, \dots, u_k \in \mathbb{R}^n$ که $k$ بردارهای ویژه‌ی اصلی متعامد $\Sigma$ هستند محاسبه می‌شوند. این بردارهای ویژه متناظر با $k$ مقدار ویژه با بزرگ‌ترین مقدار هستند.
**40. Step 4: Project the data on spanR(u1,...,uk).** -⟶ +مرحله‌ی ۴: داده‌ها بر روی فضای $\text{span}_ {\mathbb{R}} (u_1, \dots, u_k)$ تصویر می‌شوند.
**41. This procedure maximizes the variance among all k-dimensional spaces.** -⟶ +این رویه واریانس را در فضای $k$-بعدی به دست آمده بیشینه می‌کند.
**42. [Data in feature space, Find principal components, Data in principal components space]** -⟶ +[داده‌ها در فضای ویژگی، پیدا کردن مولفه‌های اصلی، داده‌ها در فضای مولفه‌های اصلی]
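The four steps above translate almost line by line into NumPy (random data, purely illustrative):

```python
import numpy as np

def pca(x, k):
    x = (x - x.mean(axis=0)) / x.std(axis=0)        # step 1: mean 0, standard deviation 1
    cov = (x.T @ x) / len(x)                        # step 2: Sigma, symmetric with real eigenvalues
    eigvals, eigvecs = np.linalg.eigh(cov)          # eigenpairs in ascending order
    U = eigvecs[:, np.argsort(eigvals)[::-1][:k]]   # step 3: k principal eigenvectors
    return x @ U                                    # step 4: projection on span(u_1, ..., u_k)

rng = np.random.default_rng(0)
x = rng.normal(size=(200, 5))
print(pca(x, k=2).shape)   # (200, 2)
```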
**43. Independent component analysis** -⟶ +تحلیل مولفه‌های مستقل
**44. It is a technique meant to find the underlying generating sources.** -⟶ +روشی است که برای پیدا کردن منابع مولد داده به کار می‌رود.
**45. Assumptions ― We assume that our data x has been generated by the n-dimensional source vector s=(s1,...,sn), where si are independent random variables, via a mixing and non-singular matrix A as follows:** -⟶ +فرضیه‌ها - فرض می‌کنیم که داده‌ی $x$ توسط بردار $n$-بعدی $s=(s_1, \dots, s_n)$ تولید شده است، که $s_i$ها متغیرهای تصادفی مستقل هستند، و این تولید داده از طریق بردار منبع به وسیله‌ی یک ماتریس معکوس‌پذیر و ترکیب‌کننده‌ی $A$ به صورت زیر انجام می‌گیرد:
**46. The goal is to find the unmixing matrix W=A−1.** -⟶ +هدف پیدا کردن ماتریس ضدترکیب $W=A^{-1}$ است.
**47. Bell and Sejnowski ICA algorithm ― This algorithm finds the unmixing matrix W by following the steps below:** -⟶ +الگوریتم تحلیل مولفه‌های مستقل Bell و Sejnowski - این الگوریتم ماتریس ضدترکیب $W$ را در مراحل زیر پیدا می‌کند:
**48. Write the probability of x=As=W−1s as:** -⟶ +احتمال $x = As = W^{-1}s$ به صورت زیر نوشته می‌شود:
**49. Write the log likelihood given our training data {x(i),i∈[[1,m]]} and by noting g the sigmoid function as:**

-⟶

+با نمایش تابع سیگموئید با $g$، لگاریتم درست‌نمایی با توجه به داده‌های $\{x(i), i \in [1, m]\}$ به صورت زیر نوشته می‌شود:

<br>
**50. Therefore, the stochastic gradient ascent learning rule is such that for each training example x(i), we update W as follows:** -⟶ +بنابراین، رویه‌ی یادگیری گرادیان تصادفی افزایشی برای هر نمونه از داده‌های آموزش $x(i)$ به گونه‌ای است که برای به‌روزرسانی $W$ داریم: From e7935cd7e2bb009b124069500dad5956a5cab73d Mon Sep 17 00:00:00 2001 From: Zaid Alyafeai Date: Sat, 20 Oct 2018 21:19:14 +0300 Subject: [PATCH 005/531] Update refresher-linear-algebra.md --- ar/refresher-linear-algebra.md | 20 ++++++++++++-------- 1 file changed, 12 insertions(+), 8 deletions(-) diff --git a/ar/refresher-linear-algebra.md b/ar/refresher-linear-algebra.md index a6b440d1e..360bb13da 100644 --- a/ar/refresher-linear-algebra.md +++ b/ar/refresher-linear-algebra.md @@ -1,25 +1,29 @@ **1. Linear Algebra and Calculus refresher** -⟶ - +
+الجبر الخطي وحساب التفاضل والتكامل +

**2. General notations** - -⟶ +
+الرموز العامة +

**3. Definitions** -⟶ +
+التعريفات +

**4. Vector ― We note x∈Rn a vector with n entries, where xi∈R is the ith entry:** - -⟶ - +
+بردار - $x \in \mathbb{R}^n$ یک بردار با $n$ درایه است، که $x_i \in \mathbb{R}$ درایه‌ی $i$ام می‌باشد: +

**5. Matrix ― We note A∈Rm×n a matrix with m rows and n columns, where Ai,j∈R is the entry located in the ith row and jth column:** From 117e566b84b9133513e41a3fa22e421ae341be2d Mon Sep 17 00:00:00 2001 From: Zaid Alyafeai Date: Sat, 20 Oct 2018 21:23:05 +0300 Subject: [PATCH 006/531] Update refresher-linear-algebra.md --- ar/refresher-linear-algebra.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/ar/refresher-linear-algebra.md b/ar/refresher-linear-algebra.md index 360bb13da..2db1d0cd4 100644 --- a/ar/refresher-linear-algebra.md +++ b/ar/refresher-linear-algebra.md @@ -22,7 +22,7 @@ **4. Vector ― We note x∈Rn a vector with n entries, where xi∈R is the ith entry:**
-بردار - $x \in \mathbb{R}^n$ یک بردار با $n$ درایه است، که $x_i \in \mathbb{R}$ درایه‌ی $i$ام می‌باشد: + متجه- نرمز ل $x \in \mathbb{R^n}$ متجه يحتوي على $n$ مدخلات، حيث $x_i \in \mathbb{R}$ يعتبر المدخل رقم i .

From eb25f62f0add442e1289e5d0d7b5e4ab0e5e1e1d Mon Sep 17 00:00:00 2001 From: Zaid Alyafeai Date: Sat, 20 Oct 2018 21:52:32 +0300 Subject: [PATCH 007/531] Update refresher-linear-algebra.md --- ar/refresher-linear-algebra.md | 30 +++++++++++++++++------------- 1 file changed, 17 insertions(+), 13 deletions(-) diff --git a/ar/refresher-linear-algebra.md b/ar/refresher-linear-algebra.md index 2db1d0cd4..f2d5be6eb 100644 --- a/ar/refresher-linear-algebra.md +++ b/ar/refresher-linear-algebra.md @@ -28,38 +28,42 @@ **5. Matrix ― We note A∈Rm×n a matrix with m rows and n columns, where Ai,j∈R is the entry located in the ith row and jth column:** -⟶ +
+ مصفوفة - نرمز ل $A \in \mathbb{R}^{m\times n}$ مصفوفة تحتوي على $m$ صفوف و $n$ أعمدة، حيث $A_{i,j}$ يرمز للمدخل في الصف i و العمود j +

**6. Remark: the vector x defined above can be viewed as a n×1 matrix and is more particularly called a column-vector.** - -⟶ +
+ملاحظة : المتجه $x$ المعرف مسبقا يمكن اعتباره مصفوفة من الشكل $n \times 1$ والذي يتم تسميته ب مصفوفة من عمود واحد +

**7. Main matrices** -⟶ - +
+المصفوفات الأساسية +

**8. Identity matrix ― The identity matrix I∈Rn×n is a square matrix with ones in its diagonal and zero everywhere else:** - -⟶ - +
مصفوفة الوحدة - مصفوفة الوحدة $I \in \mathbb{R}^{n\times n}$ تعتبر مصفوفة مربعة تحتوي على المدخل 1 في قطر المصفوفة و 0 في بقية المدخلات

**9. Remark: for all matrices A∈Rn×n, we have A×I=I×A=A.** -⟶ - +
+ملاحظة : جميع المصفوفات من الشكل $A \in \mathbb{R^{n\times n}}$ فإن $A \times I = I \times A = A$.

**10. Diagonal matrix ― A diagonal matrix D∈Rn×n is a square matrix with nonzero values in its diagonal and zero everywhere else:** - -⟶ - +
+مصفوفة قطرية - المصفوفة القطرية هي مصفوفة من الشكل $D \in \mathbb{R^{n\times n}}$ حيث أن جميع العناصر الواقعة خارج القطر الرئيسي تساوي الصفر والعناصر على القطر الرئيسي تحتوي أعداد لاتساوي الصفر. +

**11. Remark: we also note D as diag(d1,...,dn).** From d56d7bb4544afc798ff1a971502178712965f44f Mon Sep 17 00:00:00 2001 From: Zaid Alyafeai Date: Sat, 20 Oct 2018 21:52:47 +0300 Subject: [PATCH 008/531] Update refresher-linear-algebra.md From b1967e25da3d28b44e7fcc8f2236ce50f5a7fe51 Mon Sep 17 00:00:00 2001 From: qunaieer Date: Sat, 20 Oct 2018 23:20:02 +0300 Subject: [PATCH 009/531] Update ar/cheatsheet-supervised-learning.md [ar] Supervised Learning --- ar/cheatsheet-supervised-learning.md | 47 ++++++++++++++-------------- 1 file changed, 24 insertions(+), 23 deletions(-) diff --git a/ar/cheatsheet-supervised-learning.md b/ar/cheatsheet-supervised-learning.md index a6b19ea1c..9ff4b7ee9 100644 --- a/ar/cheatsheet-supervised-learning.md +++ b/ar/cheatsheet-supervised-learning.md @@ -1,132 +1,133 @@ -**1. Supervised Learning cheatsheet** +**1. Supervised Learning cheatsheet** -⟶ +مرجع سريع للتعلّم تحت الإشراف
**2. Introduction to Supervised Learning** -⟶ +مقدمة للتعلّم تحت الإشراف
**3. Given a set of data points {x(1),...,x(m)} associated to a set of outcomes {y(1),...,y(m)}, we want to build a classifier that learns how to predict y from x.** -⟶ +إذا كان لدينا مجموعة من نقاط البيانات {x(1),...,x(m)} مرتبطة بمجموعة مخرجات {y(1),...,y(m)}، نريد أن نبني نموذج تصنيف يتعلم كيف يتوقع y من x. +
**4. Type of prediction ― The different types of predictive models are summed up in the table below:** -⟶ +نوع التوقّع - أنواع نماذج التوقّع المختلفة موضحة في الجدول التالي:
**5. [Regression, Classifier, Outcome, Examples]** -⟶ +[الارتباط (Regression)، التصنيف (Classification)، المُخرَج، أمثلة]
**6. [Continuous, Class, Linear regression, Logistic regression, SVM, Naive Bayes]** -⟶ +[مستمر، فئة، ارتباط خطّي (Linear regression)، ارتباط لوجستي (Logistic regression)، SVM، بايز البسيط (Naive Bayes)]
**7. Type of model ― The different models are summed up in the table below:** -⟶ +نوع النموذج - أنواع النماذج المختلفة موضحة في الجدول التالي:
**8. [Discriminative model, Generative model, Goal, What's learned, Illustration, Examples]** -⟶ +[النماذج التمييزية (Discriminative)، النماذج التوليدية (Generative)، الهدف، ماذا تتعلم، توضيح، أمثلة]
**9. [Directly estimate P(y|x), Estimate P(x|y) to then deduce P(y|x), Decision boundary, Probability distributions of the data, Regressions, SVMs, GDA, Naive Bayes]** -⟶ +[التقدير المباشر لـ P(y|x)، تقدير P(x|y) ثم استنتاج P(y|x)، حدود القرار، التوزيع الاحتمالي للبيانات، الارتباط (Regression)، SVM، GDA، بايز البسيط (Naive Bayes)]
**10. Notations and general concepts** -⟶ +تعريفات ومفاهيم أساسية
**11. Hypothesis ― The hypothesis is noted hθ and is the model that we choose. For a given input data x(i) the model prediction output is hθ(x(i)).** -⟶ +الفرضية (Hypothesis) - الفرضية، ويرمز لها بـ hθ، هي النموذج الذي نختاره. إذا كان لدينا المدخل x(i)، فإن المخرج الذي سيتوقعه النموذج هو hθ(x(i)).
**12. Loss function ― A loss function is a function L:(z,y)∈R×Y⟼L(z,y)∈R that takes as inputs the predicted value z corresponding to the real data value y and outputs how different they are. The common loss functions are summed up in the table below:** -⟶ +دالة الفرق (Loss function) - دالة الفرق هي الدالة L:(z,y)∈R×Y⟼L(z,y)∈R التي تأخذ كمدخلات القيمة المتوقعة z والقيمة الحقيقية y وتعطينا الفرق بينهما. الجدول التالي يحتوي على بعض دوال الفرق المستخدمة بكثرة:
**13. [Least squared error, Logistic loss, Hinge loss, Cross-entropy]** -⟶ +[مربع الخطأ الأصغر (Least squared error)، الفرق اللوجستي (Logistic loss)، الفرق المفصلي (Hinge loss)، Cross-entropy]
**14. [Linear regression, Logistic regression, SVM, Neural Network]** -⟶ +[الارتباط الخطّي (Linear regression)، الارتباط اللوجستي (Logistic regression)، SVM، الشبكات العصبية (Neural Network)]
**15. Cost function ― The cost function J is commonly used to assess the performance of a model, and is defined with the loss function L as follows:** -⟶ +دالة التكلفة (Cost function) - دالة التكلفة J تستخدم عادة لتقييم أداء نموذج ما، ويتم تعريفها مع دالة الفرق L كالتالي:
**16. Gradient descent ― By noting α∈R the learning rate, the update rule for gradient descent is expressed with the learning rate and the cost function J as follows:** -⟶ +الهبوط التفاضلي (Gradient descent) - لنعرّف معدل التعلّم α∈R، يمكن تعريف القانون الذي يتم تحديث خوارزمية الهبوط التفاضلي من خلاله باستخدام معدل التعلّم ودالة التكلفة J كالتالي:
**17. Remark: Stochastic gradient descent (SGD) is updating the parameter based on each training example, and batch gradient descent is on a batch of training examples.** -⟶ +ملاحظة: في الهبوط التفاضلي العشوائي (Stochastic gradient descent (SGD)) يتم تحديث المُعاملات (parameters) بناءاً على كل نقطة تدريب على حدة، بينما في الهبوط التفاضلي الحُزَمي (batch gradient descent) يتم تحديثها باستخدام حُزَم من نقاط التدريب.
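As an illustration of the gradient descent update rule above, a small batch gradient descent for least-squares linear regression (NumPy assumed; the data, learning rate α and iteration count are made up):

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.c_[np.ones(100), rng.normal(size=(100, 1))]          # design matrix with intercept
y = 4.0 + 3.0 * X[:, 1] + rng.normal(scale=0.5, size=100)   # toy targets

theta = np.zeros(2)
alpha = 0.1                                   # learning rate
for _ in range(500):
    grad = X.T @ (X @ theta - y) / len(y)     # gradient of the mean squared error cost J
    theta -= alpha * grad                     # theta := theta - alpha * grad

print(theta)   # approximately [4, 3]
```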
**18. Likelihood ― The likelihood of a model L(θ) given parameters θ is used to find the optimal parameters θ through maximizing the likelihood. In practice, we use the log-likelihood ℓ(θ)=log(L(θ)) which is easier to optimize. We have:** -⟶ +الأرجحية (Likelihood) - تستخدم أرجحية النموذج L(θ)، حيث أن θ هي المعاملات، للبحث عن أفضل المُعاملات θ عن طريق تعظيم (maximizing) الأرجحية. عملياً يتم استخدام الأرجحية اللوغاريثمية (log-likelihood) ℓ(θ)=log(L(θ)) حيث أنها أسهل في التحسين (optimize). فيكون لدينا:
**19. Newton's algorithm ― The Newton's algorithm is a numerical method that finds θ such that ℓ′(θ)=0. Its update rule is as follows:** -⟶ +خوارزمية نيوتن (Newton's algorithm) - خوارزمية نيوتن هي طريقة حسابية للعثور على θ بحيث يكون ℓ′(θ)=0. قاعدة التحديث للخوارزمية كالتالي:
**20. Remark: the multidimensional generalization, also known as the Newton-Raphson method, has the following update rule:** -⟶ +ملاحظة: هناك خوارزمية أعم وهي متعددة الأبعاد (multidimensional)، يطلق عليها خوارزمية نيوتن-رافسون (Newton-Raphson)، ويتم تحديثها عبر القانون التالي:
**21. Linear models** -⟶ +النماذج الخطيّة (Linear models)
**22. Linear regression** -⟶ +الارتباط الخطّي (Linear regression)
From 089cc3799a27593b81c8702343a575a1296be50f Mon Sep 17 00:00:00 2001 From: Redouane Lguensat Date: Sun, 21 Oct 2018 14:10:51 +0200 Subject: [PATCH 010/531] [ar] Unsupervised Learning --- ar/cheatsheet-unsupervised-learning.md | 44 ++++++++++++++++---------- 1 file changed, 28 insertions(+), 16 deletions(-) diff --git a/ar/cheatsheet-unsupervised-learning.md b/ar/cheatsheet-unsupervised-learning.md index 1d80c47b5..e47df827c 100644 --- a/ar/cheatsheet-unsupervised-learning.md +++ b/ar/cheatsheet-unsupervised-learning.md @@ -1,61 +1,73 @@ **1. Unsupervised Learning cheatsheet** -⟶ +
+ورقة مراجعة للتعلم بدون إشراف +

**2. Introduction to Unsupervised Learning** -⟶ +
+ مقدمة للتعلم بدون إشراف +

**3. Motivation ― The goal of unsupervised learning is to find hidden patterns in unlabeled data {x(1),...,x(m)}.** -⟶ +
+ {x(1),...,x(m)} الحافز ― الهدف من التعلم بدون إشراف هو إيجاد الأنماط الخفية في البيانات الغير موسومة +

**4. Jensen's inequality ― Let f be a convex function and X a random variable. We have the following inequality:** -⟶ +
+متباينة جينسن ― لتكن f دالة محدبة و X متغير عشوائي. لدينا المتفاوتة التالية +: +

**5. Clustering** -⟶ - +
+ تجميع +

**6. Expectation-Maximization** -⟶ - +
+تحقيق أقصى قدر للتوقع +

**7. Latent variables ― Latent variables are hidden/unobserved variables that make estimation problems difficult, and are often denoted z. Here are the most common settings where there are latent variables:** -⟶ - +
+المتغيرات الكامنة ― المتغيرات الكامنة هي متغيرات باطنية/غير معاينة تزيد من صعوبة مشاكل التقدير، غالبا ما ترمز بالحرف z. في مايلي الإعدادات الشائعة التي تحتوي على متغيرات كامنة.

**8. [Setting, Latent variable z, Comments]** -⟶ - +
+إعداد، متغير كامن z، تعاليق

**9. [Mixture of k Gaussians, Factor analysis]** -⟶ - +
+مزيج من k غاوسيات، تحليل العوامل

**10. Algorithm ― The Expectation-Maximization (EM) algorithm gives an efficient method at estimating the parameter θ through maximum likelihood estimation by repeatedly constructing a lower-bound on the likelihood (E-step) and optimizing that lower bound (M-step) as follows:** -⟶ - +
خوارزمية ― خوارزمية تحقيق أقصى قدر للتوقع هي عبارة عن طريقة فعالة لتقدير المعامل θ عبر تقدير الاحتمال الأرجح، و يتم ذلك بشكل تكراري حيث يتم إيجاد حد أدنى لدالة الإمكان (الخطوة E) ثم يتم استمثال ذلك الحد الأدنى (الخطوة M) كما يلي:

**11. E-step: Evaluate the posterior probability Qi(z(i)) that each data point x(i) came from a particular cluster z(i) as follows:** From 42ff4557b30a039f12c995f6790da1b35825491b Mon Sep 17 00:00:00 2001 From: Zaid Alyafeai Date: Sun, 21 Oct 2018 22:00:17 +0300 Subject: [PATCH 011/531] Update refresher-linear-algebra.md --- ar/refresher-linear-algebra.md | 136 ++++++++++++++++++++++----------- 1 file changed, 90 insertions(+), 46 deletions(-) diff --git a/ar/refresher-linear-algebra.md b/ar/refresher-linear-algebra.md index f2d5be6eb..f5c2090e1 100644 --- a/ar/refresher-linear-algebra.md +++ b/ar/refresher-linear-algebra.md @@ -68,182 +68,226 @@ **11. Remark: we also note D as diag(d1,...,dn).** -⟶ +
ملاحظة: نرمز كذلك ل $D$ ب $\text{diag}(d_1, \dots, d_n)$

**12. Matrix operations** -⟶ +
+ عمليات المصفوفات +

**13. Multiplication** -⟶ +
+ الضرب +

**14. Vector-vector ― There are two types of vector-vector products:** -⟶ +
+ متجه و متجه - هناك نوعين من الضرب ل متجه - متجه : +

**15. inner product: for x,y∈Rn, we have:** -⟶ +
+ ضرب داخلي: ل $x,y \in \mathbb{R}^n$ نستنتج : +

**16. outer product: for x∈Rm,y∈Rn, we have:** -⟶ +
ضرب خارجي: ل $x \in \mathbb{R}^m, y \in \mathbb{R}^n$ نستنتج :

**17. Matrix-vector ― The product of matrix A∈Rm×n and vector x∈Rn is a vector of size Rn, such that:** -⟶ +
مصفوفة - متجه : ضرب المصفوفة $A \in \mathbb{R}^{m\times n}$ والمتجه $x \in \mathbb{R}^n$ ينتج عنه متجه من الشكل $\mathbb{R}^n$ حيث :

**18. where aTr,i are the vector rows and ac,j are the vector columns of A, and xi are the entries of x.** -⟶ +
+ حيث $a^{T}_{r,i}$ يعتبر متجه الصفوف و $a_{c,j}$ يعتبر متجه الأعمدة ل $A$ كذلك $x_i$ يرمز لعناصر $x$. +

**19. Matrix-matrix ― The product of matrices A∈Rm×n and B∈Rn×p is a matrix of size Rn×p, such that:** -⟶ +
+ مصفوفة - مصفوفة : ضرب المصفوفة $A \in \mathbb{R}^{m \times n}$ والمصفوفة $B \in \mathbb{R}^{n \times p}$ ينتج عنه مصفوفة من الشكل $\mathbb{R}^{m \times p}$ حيث أن :

**20. where aTr,i,bTr,i are the vector rows and ac,j,bc,j are the vector columns of A and B respectively** -⟶ +
+حيث $a^T_{r, i}$ و $b^T_{r, i}$ هي متجهات الصفوف و $a_{c, j}$ و $b_{c, j}$ هي متجهات الأعمدة للمصفوفتين $A$ و $B$ على التوالي.

**21. Other operations** -⟶ +
+ عمليات أخرى +

**22. Transpose ― The transpose of a matrix A∈Rm×n, noted AT, is such that its entries are flipped:** -⟶ +
+ المنقول - منقول المصفوفة$A \in \mathbb{R}^{m \times n}$ يرمز له ب $A^T$ حيث الصفوف يتم تبديلها مع الأعمدة : +

**23. Remark: for matrices A,B, we have (AB)T=BTAT** -⟶ - +
+ ملاحظة: لأي مصفوفتين $A$ و $B$، نستنتج $(AB)^T = B^T A^T$. +

**24. Inverse ― The inverse of an invertible square matrix A is noted A−1 and is the only matrix such that:** -⟶ - +
+ المعكوس - معكوس أي مصفوفة $A$ قابلة للعكس يرمز له ب $A^{-1}$ وتعتبر المعكوس المصفوفة الوحيدة التي لديها الخاصية التالية : +

**25. Remark: not all square matrices are invertible. Also, for matrices A,B, we have (AB)−1=B−1A−1** -⟶ +
+ملاحظة: ليست جميع المصفوفات المربعة قابلة للعكس. كذلك لأي مصفوفتين $A$ و $B$ نستنتج $(AB)^{-1} = B^{-1} A^{-1}$.

**26. Trace ― The trace of a square matrix A, noted tr(A), is the sum of its diagonal entries:** -⟶ - +
+أثر المصفوفة (trace) - أثر أي مصفوفة مربعة $A$ يرمز له ب $tr(A)$ يعتبر مجموع العناصر التي في القطر. +

**27. Remark: for matrices A,B, we have tr(AT)=tr(A) and tr(AB)=tr(BA)** -⟶ - +
+ ملاحظة : لأي مصفوفتين $A$ و $B$ لدينا $tr(A^T) = tr(A)$ و $tr(AB) = tr(BA)$. +

**28. Determinant ― The determinant of a square matrix A∈Rn×n, noted |A| or det(A) is expressed recursively in terms of A∖i,∖j, which is the matrix A without its ith row and jth column, as follows:** -⟶ - +
+المحدد - المحدد لأي مصفوفة مربعة من الشكل $A \in \mathbb{R}^{n \times n}$ يرمز له ب $|A|$ أو $\det(A)$ ويتم تعريفه بشكل عودي بإستخدام $A_{\setminus i,\setminus j}$ وهي المصفوفة $A$ بعد حذف الصف $i$ والعمود $j$ كالتالي :

**29. Remark: A is invertible if and only if |A|≠0. Also, |AB|=|A||B| and |AT|=|A|.** -⟶ - +
+ ملاحظة: المصفوفة $A$ قابلة للعكس إذا وفقط إذا $|A| \neq 0$. كذلك $|AB| = |A| |B|$ و $|A^T| = |A|$.

**30. Matrix properties** -⟶ - +
+خواص المصفوفات +

**31. Definitions** -⟶ - +
+التعريفات +

**32. Symmetric decomposition ― A given matrix A can be expressed in terms of its symmetric and antisymmetric parts as follows:** -⟶ - +
+ التفكيك المتماثل - المصفوفة $A$ يمكن التعبير عنها بإستخدام جزأين متماثل وغير متماثل كالتالي :

**33. [Symmetric, Antisymmetric]** -⟶ +
+[متماثل، غير متماثل] +

**34. Norm ― A norm is a function N:V⟶[0,+∞[ where V is a vector space, and such that for all x,y∈V, we have:** -⟶ - +
+المعيار (norm) - المعيار يعتبر دالة $N: V \to [0, +\infty)$ حيث $V$ يعتبر فضاء متجه، حيث أن لكل $x,y \in V$ لدينا :

**35. N(ax)=|a|N(x) for a scalar** -⟶ - +
+لأي عدد $a$ فإن $N(ax) = |a| N(x)$ +

**36. if N(x)=0, then x=0** -⟶ - +
+$N(x) =0 \implies x = 0$ +

**37. For x∈V, the most commonly used norms are summed up in the table below:** -⟶ - +
+لأي $x \in V$ المعايير الأكثر إستخداماً ملخصة في الجدول التالي: +

**38. [Norm, Notation, Definition, Use case]** -⟶ - +
+[المعيار، الرمز، التعريف، مثال للإستخدام] +

**39. Linearly dependence ― A set of vectors is said to be linearly dependent if one of the vectors in the set can be defined as a linear combination of the others.** -⟶ - +
+الإتباع الخطي: مجموعة من المتجهات تعتبر تابعة خطياً إذا كان أحد المتجهات في المجموعة يمكن كتابته كتركيبة خطية من المتجهات الأخرى.

**40. Remark: if no vector can be written this way, then the vectors are said to be linearly independent** -⟶ - +
+ملاحظة: إذا لم يتحقق هذا الشرط فإنها تسمى مستقلة خطياً . +

**41. Matrix rank ― The rank of a given matrix A is noted rank(A) and is the dimension of the vector space generated by its columns. This is equivalent to the maximum number of linearly independent columns of A.** From a7dd240b4e761f660b78cf4221528b43a772273d Mon Sep 17 00:00:00 2001 From: Zaid Alyafeai Date: Tue, 23 Oct 2018 00:32:32 +0300 Subject: [PATCH 012/531] Update refresher-linear-algebra.md --- ar/refresher-linear-algebra.md | 81 +++++++++++++++++++++------------- 1 file changed, 51 insertions(+), 30 deletions(-) diff --git a/ar/refresher-linear-algebra.md b/ar/refresher-linear-algebra.md index f5c2090e1..a85edae94 100644 --- a/ar/refresher-linear-algebra.md +++ b/ar/refresher-linear-algebra.md @@ -292,100 +292,121 @@ $N(x) =0 \implies x = 0$ **41. Matrix rank ― The rank of a given matrix A is noted rank(A) and is the dimension of the vector space generated by its columns. This is equivalent to the maximum number of linearly independent columns of A.** -⟶ - +
+ رتبة المصفوفة (Rank) - رتبة المصفوفة $A$ يرمز له ب $\text{rank}(A)$ وهو يصف حجم الفضاء المتجهي الذي نتج من أعمدة المصفوفة. يمكن وصفه كذلك بأقصى عدد من أعمدة المصفوفة $A$ التي تمتلك خاصية أنها مستقلة خطياً.

**42. Positive semi-definite matrix ― A matrix A∈Rn×n is positive semi-definite (PSD) and is noted A⪰0 if we have:** -⟶ - +
+ مصفوفة شبه معرفة موجبة (positive semi definite) - هي مصفوفة $A \in \mathbb{R}^{n \times n}$ تعتبر مصفوفة شبه معرفة موجبة (PSD) ويرمز لها بالرمز $A \succeq 0$ إذا :

**43. Remark: similarly, a matrix A is said to be positive definite, and is noted A≻0, if it is a PSD matrix which satisfies for all non-zero vector x, xTAx>0.** -⟶ - +
+ ملاحظة: المصفوفة $A$ تعتبر مصفوفة معرفة موجبة، ويرمز لها ب $A \succ 0$، إذا كانت مصفوفة شبه معرفة موجبة (PSD) وتستوفي الشرط $x^TAx>0$ لكل متجه غير صفري $x$.

**44. Eigenvalue, eigenvector ― Given a matrix A∈Rn×n, λ is said to be an eigenvalue of A if there exists a vector z∈Rn∖{0}, called eigenvector, such that we have:** -⟶ - +
+ القيمة الذاتية (eigenvalue)، المتجه الذاتي (eigenvector) - إذا كان لدينا مصفوفة $A \in \mathbb{R}^{n \times n}$، فإن $\lambda$ تعتبر قيمة ذاتية للمصفوفة $A$ إذا وجد متجه $z \in \mathbb{R}^n \setminus \{0\}$ يسمى متجهاً ذاتياً حيث أن :

**45. Spectral theorem ― Let A∈Rn×n. If A is symmetric, then A is diagonalizable by a real orthogonal matrix U∈Rn×n. By noting Λ=diag(λ1,...,λn), we have:** -⟶ - +
+ النظرية الطيفية (spectral theorem) - نفرض $A \in \mathbb{R}^{n \times n}$. إذا كانت المصفوفة $A$ متماثلة فإن $A$ قابلة للتحويل إلى مصفوفة قطرية بإستخدام مصفوفة متعامدة حقيقية (orthogonal) $U \in \mathbb{R}^{n \times n}$. وبترميز $\Lambda = \text{diag}(\lambda_1, \dots, \lambda_n)$ لدينا:

**46. diagonal** -⟶ - +
+ قطرية +

**47. Singular-value decomposition ― For a given matrix A of dimensions m×n, the singular-value decomposition (SVD) is a factorization technique that guarantees the existence of U m×m unitary, Σ m×n diagonal and V n×n unitary matrices, such that:** -⟶ - +
+ تفكيك القيمة المنفردة (singular value decomposition) : لأي مصفوفة $A$ من الشكل $m\times n$ ، تفكيك القيمة المنفردة (SVD) يعتبر طريقة تحليل تضمن وجود مصفوفة أحادية $U \in \mathbb{R}^{m \times m}$ ، مصفوفة قطرية $\Sigma \in \mathbb{R}^{m \times n}$ ومصفوفة أحادية $V \in \mathbb{R}^{n \times n}$ حيث أن :

**48. Matrix calculus** -⟶ - +
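As a quick sanity check of the spectral theorem and the singular-value decomposition above, here is a small NumPy sketch; the matrices `A` and `B` below are made-up examples added for illustration, not part of the cheatsheet.

```python
import numpy as np

# Symmetric matrix: the spectral theorem guarantees A = U diag(lambda) U^T
# with U a real orthogonal matrix.
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
eigenvalues, U = np.linalg.eigh(A)          # eigh is for symmetric/Hermitian matrices
Lambda = np.diag(eigenvalues)
print(np.allclose(A, U @ Lambda @ U.T))     # True: A is diagonalized by U

# Singular-value decomposition of a rectangular m x n matrix:
# B = U_svd Sigma V^T with U_svd (m x m) and V (n x n) unitary, Sigma (m x n) diagonal.
B = np.array([[1.0, 0.0, 2.0],
              [0.0, 3.0, 1.0]])
U_svd, sigma, Vt = np.linalg.svd(B)
Sigma = np.zeros(B.shape)
Sigma[:len(sigma), :len(sigma)] = np.diag(sigma)
print(np.allclose(B, U_svd @ Sigma @ Vt))   # True: B is reconstructed from its SVD
```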
+ حساب المصفوفات +

**49. Gradient ― Let f:Rm×n→R be a function and A∈Rm×n be a matrix. The gradient of f with respect to A is a m×n matrix, noted ∇Af(A), such that:** -⟶ - +
+ المشتقة في فضاءات عالية (gradient) - افترض أن $f: \mathbb{R}^{m \times n} \rightarrow \mathbb{R}$ دالة وأن $A \in \mathbb{R}^{m \times n}$ مصفوفة. المشتقة العليا ل $f$ بالنسبة ل $A$ تعتبر مصفوفة من الشكل $m \times n$ يرمز لها ب $\nabla_A f(A)$ حيث أن:

**50. Remark: the gradient of f is only defined when f is a function that returns a scalar.** -⟶ - +
+ملاحظة : المشتقة العليا معرفة فقط إذا كانت الدالة $f$ لديها مدى ضمن الأعداد الحقيقية. +

**51. Hessian ― Let f:Rn→R be a function and x∈Rn be a vector. The hessian of f with respect to x is a n×n symmetric matrix, noted ∇2xf(x), such that:** -⟶ - +
+هيشيان (Hessian) - افترض أن $f: \mathbb{R}^n \rightarrow \mathbb{R}$ دالة وأن $x \in \mathbb{R}^n$ متجه. الهيشيان ل $f$ بالنسبة ل $x$ يعتبر مصفوفة متماثلة من الشكل $n \times n$ يرمز لها بالرمز $\nabla^2_x f(x)$ حيث أن :

**52. Remark: the hessian of f is only defined when f is a function that returns a scalar** -⟶ +
+ ملاحظة : الهيشيان معرفة فقط إذا كانت الدالة $f$ لديها مدى ضمن الأعداد الحقيقية. +

**53. Gradient operations ― For matrices A,B,C, the following gradient properties are worth having in mind:** -⟶ +
+ الحساب في مشتقة الفضاءات العالية- لأي مصفوفات $A,B,C$ فإن الخواص التالية مهمة : +

**54. [General notations, Definitions, Main matrices]** -⟶ +
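One of the gradient properties referred to above can be checked numerically; the sketch below is an added illustration with arbitrary `A` and `x`, verifying ∇x(xᵀAx) = (A + Aᵀ)x against a finite-difference approximation.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
A = rng.standard_normal((n, n))
x = rng.standard_normal(n)

f = lambda v: v @ A @ v                     # f(x) = x^T A x (returns a scalar)

analytic = (A + A.T) @ x                    # closed-form gradient of x^T A x

# Central finite-difference approximation of the gradient, coordinate by coordinate
eps = 1e-6
numeric = np.zeros(n)
for i in range(n):
    e = np.zeros(n); e[i] = eps
    numeric[i] = (f(x + e) - f(x - e)) / (2 * eps)

print(np.allclose(analytic, numeric, atol=1e-5))   # True
```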
+ [الرموز العامة، التعاريف، المصفوفات الرئيسية] +

**55. [Matrix operations, Multiplication, Other operations]** -⟶ - +
+ [عمليات المصفوفات، الضرب، عمليات أخرى] +

**56. [Matrix properties, Norm, Eigenvalue/Eigenvector, Singular-value decomposition]** -⟶ - +
+ [خواص المصفوفات، المعيار، قيمة ذاتية/متجه ذاتي، تفكيك القيمة المنفردة] +

**57. [Matrix calculus, Gradient, Hessian, Operations]** -⟶ +
+ [حساب المصفوفات، مشتقة الفضاءات العالية، الهيشيان، العمليات] +
From c6149959d15a33b50fd56841301e5f819a5076e0 Mon Sep 17 00:00:00 2001 From: Zaid Alyafeai Date: Tue, 23 Oct 2018 21:04:14 +0300 Subject: [PATCH 013/531] Check some typos Add some English translations --- ar/refresher-linear-algebra.md | 43 +++++++++++++++++----------------- 1 file changed, 22 insertions(+), 21 deletions(-) diff --git a/ar/refresher-linear-algebra.md b/ar/refresher-linear-algebra.md index a85edae94..ca00694fb 100644 --- a/ar/refresher-linear-algebra.md +++ b/ar/refresher-linear-algebra.md @@ -1,7 +1,7 @@ **1. Linear Algebra and Calculus refresher**
-الجبر الخطي وحساب التفاضل والتكامل + ملخص عن الجبر الخطي

@@ -22,21 +22,21 @@ **4. Vector ― We note x∈Rn a vector with n entries, where xi∈R is the ith entry:**
- متجه- نرمز ل $x \in \mathbb{R^n}$ متجه يحتوي على $n$ مدخلات، حيث $x_i \in \mathbb{R}$ يعتبر المدخل رقم i . + متجه (vector) - نرمز ل $x \in \mathbb{R^n}$ متجه يحتوي على $n$ مدخلات، حيث $x_i \in \mathbb{R}$ يعتبر المدخل رقم $i$ .

**5. Matrix ― We note A∈Rm×n a matrix with m rows and n columns, where Ai,j∈R is the entry located in the ith row and jth column:**
- مصفوفة - نرمز ل $A \in \mathbb{R}^{m\times n}$ مصفوفة تحتوي على $m$ صفوف و $n$ أعمدة، حيث $A_{i,j}$ يرمز للمدخل في الصف i و العمود j + مصفوفة (Matrix) - نرمز ل ${A \in \mathbb{R}^{m\times n$ مصفوفة تحتوي على $m$ صفوف و $n$ أعمدة، حيث $A_{i,j}$ يرمز للمدخل في الصف$ i$ و العمود $j$

**6. Remark: the vector x defined above can be viewed as a n×1 matrix and is more particularly called a column-vector.**
-ملاحظة : المتجه $x$ المعرف مسبقا يمكن اعتباره مصفوفة من الشكل $n \times 1$ والذي يتم تسميته ب مصفوفة من عمود واحد +ملاحظة : المتجه $x$ المعرف مسبقا يمكن اعتباره مصفوفة من الشكل $n \times 1$ والذي يتم تسميته ب مصفوفة من عمود واحد.

@@ -50,28 +50,30 @@ **8. Identity matrix ― The identity matrix I∈Rn×n is a square matrix with ones in its diagonal and zero everywhere else:**
-مصفوفة الوحدة - مصفوفة الوحدة $I \in \mathbb{R^{n\times n}$ تعتبر مصفوفة مربعة تحتوي على المدخل 1 في قطر المصفوفة و 0 في بقية المدخلات + مصفوفة الوحدة (Identity) - مصفوفة الوحدة $I \in \mathbb{R^{n\times n}$ تعتبر مصفوفة مربعة تحتوي على المدخل 1 في قطر المصفوفة و 0 في بقية المدخلات: +

**9. Remark: for all matrices A∈Rn×n, we have A×I=I×A=A.**
-ملاحظة : جميع المصفوفات من الشكل $A \in \mathbb{R^{n\times n}}$ فإن $A \times I = I \times A = A$.
+ملاحظة : جميع المصفوفات من الشكل $A \in \mathbb{R^}{n\times n}$ فإن $A \times I = I \times A = A$. +
**10. Diagonal matrix ― A diagonal matrix D∈Rn×n is a square matrix with nonzero values in its diagonal and zero everywhere else:**
-مصفوفة قطرية - المصفوفة القطرية هي مصفوفة من الشكل $D \in \mathbb{R^{n\times n}}$ حيث أن جميع العناصر الواقعة خارج القطر الرئيسي تساوي الصفر والعناصر على القطر الرئيسي تحتوي أعداد لاتساوي الصفر. +مصفوفة قطرية (diagonal) - المصفوفة القطرية هي مصفوفة من الشكل + $D \in \mathbb{R}^{n\times n}$ حيث أن جميع العناصر الواقعة خارج القطر الرئيسي تساوي الصفر والعناصر على القطر الرئيسي تحتوي أعداد لاتساوي الصفر.

**11. Remark: we also note D as diag(d1,...,dn).**
-ملاحظة: نرمز كذلك ل $D$ ب $text{diag}(d_1, \dots, d_n)\$ +ملاحظة: نرمز كذلك ل $D$ ب $text{diag}(d_1, \dots, d_n)\$.
-
**12. Matrix operations** @@ -101,7 +103,7 @@ **15. inner product: for x,y∈Rn, we have:**
- ضرب داخلي: ل $x,y \in \mathbb{R}^n$ نستنتج : + ضرب داخلي (inner product): ل $x,y \in \mathbb{R}^n$ نستنتج :

@@ -109,7 +111,7 @@ **16. outer product: for x∈Rm,y∈Rn, we have:**
- ضرب خارجي: ل $x \in \mathbb{m}, y \in \mathbb{R}^n$ نستنتج : + ضرب خارجي (outer product): ل $x \in \mathbb{m}, y \in \mathbb{R}^n$ نستنتج :

@@ -119,7 +121,6 @@
مصفوفة - متجه : ضرب المصفوفة $A \in \mathbb{R}^{n\times m}$ والمتجه $x \in \mathbb{R}^n$ ينتجه متجه من الشكل $x \in \mathbb{R}^n$ حيث :
-
**18. where aTr,i are the vector rows and ac,j are the vector columns of A, and xi are the entries of x.** @@ -141,7 +142,7 @@ **20. where aTr,i,bTr,i are the vector rows and ac,j,bc,j are the vector columns of A and B respectively**
-حيث $a^T_{r, i}$ و $b^T_{r, i}$ يعتبر متجه الصفوف $a_{c, j}$ و b_{c, j}$ متجه الأعمدة $A$ و $B$ على التوالي. +حيث $a^T_{r, i}$ و $b^T_{r, i}$ يعتبر متجه الصفوف $a_{c, j}$ و b_{c, j}$ متجه الأعمدة ل $A$ و $B$ على التوالي.

@@ -157,7 +158,7 @@ **22. Transpose ― The transpose of a matrix A∈Rm×n, noted AT, is such that its entries are flipped:**
- المنقول - منقول المصفوفة$A \in \mathbb{R}^{m \times n}$ يرمز له ب $A^T$ حيث الصفوف يتم تبديلها مع الأعمدة : + المنقول (Transpose) - منقول المصفوفة$A \in \mathbb{R}^{m \times n}$ يرمز له ب $A^T$ حيث الصفوف يتم تبديلها مع الأعمدة :

@@ -172,7 +173,7 @@ **24. Inverse ― The inverse of an invertible square matrix A is noted A−1 and is the only matrix such that:**
- المعكوس - معكوس أي مصفوفة $A$ قابلة للعكس يرمز له ب $A^{-1}$ وتعتبر المعكوس المصفوفة الوحيدة التي لديها الخاصية التالية : + المعكوس (Inverse)- معكوس أي مصفوفة $A$ قابلة للعكس (Invertible) يرمز له ب $A^{-1}$ ويعتبر المعكوس المصفوفة الوحيدة التي لديها الخاصية التالية :

@@ -187,7 +188,7 @@ **26. Trace ― The trace of a square matrix A, noted tr(A), is the sum of its diagonal entries:**
-أثر المصفوفة (trace) - أثر أي مصفوفة مربعة $A$ يرمز له ب $tr(A)$ يعتبر مجموع العناصر التي في القطر. +أثر المصفوفة (Trace) - أثر أي مصفوفة مربعة $A$ يرمز له ب $tr(A)$ يعتبر مجموع العناصر التي في القطر:

@@ -201,7 +202,7 @@ **28. Determinant ― The determinant of a square matrix A∈Rn×n, noted |A| or det(A) is expressed recursively in terms of A∖i,∖j, which is the matrix A without its ith row and jth column, as follows:**
-المحدد - المحدد لأي مصفوفة مربعة من الشكل $A \in \mathbb{R}^{n \times n}$ يرمز له ب $|A|$ او $det(A)$يتم تعريفه بإستخدام $ِA_{\\i,\\j}$ والذي يعتبر المصفوفة $A$ مع حذف الصف $i$ والعمود $j$ كالتالي : +المحدد (Determinant) - المحدد لأي مصفوفة مربعة من الشكل $A \in \mathbb{R}^{n \times n}$ يرمز له ب $|A|$ او $det(A)$يتم تعريفه بإستخدام $ِA_{\\i,\\j}$ والذي يعتبر المصفوفة $A$ مع حذف الصف $i$ والعمود $j$ كالتالي :

@@ -229,7 +230,7 @@ **32. Symmetric decomposition ― A given matrix A can be expressed in terms of its symmetric and antisymmetric parts as follows:**
- التفكيك المتماثل - المصفوفة $A$ يمكن التعبير عنها بإستخدام جزئين مثماثل وغير متماثل كالتالي : + التفكيك المتماثل (Symmetric Decomposition)- المصفوفة $A$ يمكن التعبير عنها بإستخدام جزئين مثماثل (Symmetric) وغير متماثل(Antisymmetric) كالتالي :

@@ -244,7 +245,7 @@ **34. Norm ― A norm is a function N:V⟶[0,+∞[ where V is a vector space, and such that for all x,y∈V, we have:**
-المعيار (norm) - المعيار يعتبر دالة $N: V \to [0, +\infity)$ حيث $V$ يعتبر فضاء متجه، حيث أن لكل $x,y \in V$ لدينا : +المعيار (Norm) - المعيار يعتبر دالة $N: V \to [0, +\infity)$ حيث $V$ يعتبر فضاء متجه (Vector Space)، حيث أن لكل $x,y \in V$ لدينا :

@@ -279,7 +280,7 @@ $N(x) =0 \implies x = 0$ **39. Linearly dependence ― A set of vectors is said to be linearly dependent if one of the vectors in the set can be defined as a linear combination of the others.**
-الإتباع الخطي: مجموعة المتجهات تعتبر تابعة خطياً إذا وفقط إذا كل متجه يمكن كتابته بشكل خطي بإسخدام مجموعة من المتجهات الأخرى. +الإتباع الخطي (Linear Dependence): مجموعة المتجهات تعتبر تابعة خطياً إذا وفقط إذا كل متجه يمكن كتابته بشكل خطي بإسخدام مجموعة من المتجهات الأخرى.

@@ -300,7 +301,7 @@ $N(x) =0 \implies x = 0$ **42. Positive semi-definite matrix ― A matrix A∈Rn×n is positive semi-definite (PSD) and is noted A⪰0 if we have:**
- مصفوفة شبه معرفة موجبة (positive semi definite) - هي مصفوفة $A \in \mathbb{R}^{n \times n}$ تعتبر مصفوفة شبه معرفة موجبة (PSD) ويرمز لها بالرمز $A \succed 0 $ إذا : + مصفوفة شبه معرفة موجبة (Positive semi-definite) - المصفوفة $A \in \mathbb{R}^{n \times n}$ تعتبر مصفوفة شبه معرفة موجبة (PSD) ويرمز لها بالرمز $A \succed 0 $ إذا :

From 303f33d1dc798749175422db487ca7f3d6170d3e Mon Sep 17 00:00:00 2001 From: Zaid Alyafeai Date: Tue, 23 Oct 2018 21:10:16 +0300 Subject: [PATCH 014/531] Update CONTRIBUTORS --- CONTRIBUTORS | 1 + 1 file changed, 1 insertion(+) diff --git a/CONTRIBUTORS b/CONTRIBUTORS index afd1d1f12..e9adef78d 100644 --- a/CONTRIBUTORS +++ b/CONTRIBUTORS @@ -1,4 +1,5 @@ --ar +Zaid Alyafeai (translation fo linear algebra) --de From e3be0da13a14851d51bbf71a8b4cdce901183ce2 Mon Sep 17 00:00:00 2001 From: Redouane Lguensat Date: Wed, 24 Oct 2018 16:18:54 +0200 Subject: [PATCH 015/531] Update cheatsheet-unsupervised-learning.md --- ar/cheatsheet-unsupervised-learning.md | 27 +++++++++++++++----------- 1 file changed, 16 insertions(+), 11 deletions(-) diff --git a/ar/cheatsheet-unsupervised-learning.md b/ar/cheatsheet-unsupervised-learning.md index e47df827c..8c91dabd3 100644 --- a/ar/cheatsheet-unsupervised-learning.md +++ b/ar/cheatsheet-unsupervised-learning.md @@ -17,7 +17,7 @@ **3. Motivation ― The goal of unsupervised learning is to find hidden patterns in unlabeled data {x(1),...,x(m)}.**
- {x(1),...,x(m)} الحافز ― الهدف من التعلم بدون إشراف هو إيجاد الأنماط الخفية في البيانات الغير موسومة + {x(1),...,x(m)} الحافز ― الهدف من التعلم بدون إشراف هو إيجاد الأنماط الخفية في البيانات غير الموسومة

@@ -72,32 +72,37 @@ **11. E-step: Evaluate the posterior probability Qi(z(i)) that each data point x(i) came from a particular cluster z(i) as follows:** -⟶ - +
+الخطوة E : حساب الاحتمال البعدي Qi(z(i)) بأن تصدر كل نقطة x(i) من التجمع z(i) كما يلي: +

**12. M-step: Use the posterior probabilities Qi(z(i)) as cluster specific weights on data points x(i) to separately re-estimate each cluster model as follows:** -⟶ - +
+الخطوة M : يتم استعمال الاحتمالات البعدية Qi(z(i)) كأثقال خاصة لكل تجمع على النقط x(i) ، لكي يتم تقدير نموذج لكل تجمع بشكل منفصل، و ذلك كما يلي: +

**13. [Gaussians initialization, Expectation step, Maximization step, Convergence]** -⟶ - +
+[تهيئة غاوسية، خطوة التوقع، خطوة التعظيم، التقاء] +

**14. k-means clustering** -⟶ - +
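To make the E-step / M-step loop above concrete, here is a minimal NumPy sketch of EM for a two-component 1-D Gaussian mixture; the data, initial parameters and iteration count are assumptions made only for illustration, not part of the cheatsheet.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic 1-D data drawn from two Gaussian clusters
x = np.concatenate([rng.normal(-2.0, 1.0, 200), rng.normal(3.0, 1.0, 200)])

# Initial guesses for the mixture weights, means and variances
phi = np.array([0.5, 0.5])
mu = np.array([-1.0, 1.0])
var = np.array([1.0, 1.0])

def normal_pdf(x, mean, variance):
    return np.exp(-(x - mean) ** 2 / (2 * variance)) / np.sqrt(2 * np.pi * variance)

for _ in range(50):
    # E-step: posterior probability Q_i(z_i = j) that point x_i came from cluster j
    w = phi * np.vstack([normal_pdf(x, mu[j], var[j]) for j in range(2)]).T
    q = w / w.sum(axis=1, keepdims=True)

    # M-step: re-estimate each cluster's parameters with the posteriors as weights
    nk = q.sum(axis=0)
    phi = nk / len(x)
    mu = (q * x[:, None]).sum(axis=0) / nk
    var = (q * (x[:, None] - mu) ** 2).sum(axis=0) / nk

print(phi, mu, var)   # should approach the true weights, means and variances
```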
+تجميع k-أوساط +

**15. We note c(i) the cluster of data point i and μj the center of cluster j.** -⟶ - +
+نرمز ب c(i) لتجمع النقطة i، ونرمز ب μj لمركز التجمع j.

**16. Algorithm ― After randomly initializing the cluster centroids μ1,μ2,...,μk∈Rn, the k-means algorithm repeats the following step until convergence:** From f38c788454c498f4c2f49678d847ebbe7638726a Mon Sep 17 00:00:00 2001 From: qunaieer Date: Wed, 24 Oct 2018 22:57:26 +0300 Subject: [PATCH 016/531] Update ar/cheatsheet-supervised-learning.md up to line 40 --- ar/cheatsheet-supervised-learning.md | 38 ++++++++++++++-------------- 1 file changed, 19 insertions(+), 19 deletions(-) diff --git a/ar/cheatsheet-supervised-learning.md b/ar/cheatsheet-supervised-learning.md index 9ff4b7ee9..4e5b2c2cd 100644 --- a/ar/cheatsheet-supervised-learning.md +++ b/ar/cheatsheet-supervised-learning.md @@ -133,115 +133,115 @@ **23. We assume here that y|x;θ∼N(μ,σ2)** -⟶ +هنا نفترض أن y|x;θ∼N(μ,σ2)
**24. Normal equations ― By noting X the matrix design, the value of θ that minimizes the cost function is a closed-form solution such that:** -⟶ +معادلة Normal - إذا كان لدينا المصفوفة X، القيمة θ التي تقلل من دالة التكلفة يمكن حلها رياضياً بشكل مغلق (closed-form) عن طريق:
**25. LMS algorithm ― By noting α the learning rate, the update rule of the Least Mean Squares (LMS) algorithm for a training set of m data points, which is also known as the Widrow-Hoff learning rule, is as follows:** -⟶ +خوارزمية LMS - إذا كان لدينا معدل التعلّم α، فإن قانون التحديث لخوارزمية معدل المربعات الأصغر (Least Mean Squares (LMS)) لمجموعة بيانات من m عينة، ويطلق عليه قانون تعلم ويدرو-هوف (Widrow-Hoff)، كالتالي:
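A minimal NumPy sketch of the closed-form solution above, assuming a design matrix X with an added intercept column and synthetic data (both are illustrative assumptions, not from the cheatsheet):

```python
import numpy as np

rng = np.random.default_rng(0)
m = 100
x = rng.uniform(-1, 1, m)
y = 2.0 + 3.0 * x + 0.1 * rng.standard_normal(m)   # true theta = (2, 3) plus noise

X = np.column_stack([np.ones(m), x])               # design matrix with intercept column

# Normal equations: theta = (X^T X)^(-1) X^T y
# (np.linalg.solve is preferred over explicitly inverting X^T X)
theta = np.linalg.solve(X.T @ X, X.T @ y)
print(theta)                                       # close to [2.0, 3.0]
```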
**26. Remark: the update rule is a particular case of the gradient ascent.** -⟶ +ملاحظة: قانون التحديث هذا يعتبر حالة خاصة من الهبوط التفاضلي (Gradient descent).
**27. LWR ― Locally Weighted Regression, also known as LWR, is a variant of linear regression that weights each training example in its cost function by w(i)(x), which is defined with parameter τ∈R as:** -⟶ +LWR - الارتباط الموزون محلّياً (Locally Weighted Regression)، ويعرف بـ LWR، هو نوع من الارتباط الخطي يَزِن كل عينة تدريب أثناء حساب دالة التكلفة باستخدام w(i)(x)، التي يمكن تعريفها باستخدام المعامل τ∈R كالتالي:
**28. Classification and logistic regression** -⟶ +التصنيف والارتباط اللوجستي
**29. Sigmoid function ― The sigmoid function g, also known as the logistic function, is defined as follows:** -⟶ +دالة سيجمويد (Sigmoid) - دالة سيجمويد g، وتعرف كذلك بالدالة اللوجستية، تعرّف كالتالي:
**30. Logistic regression ― We assume here that y|x;θ∼Bernoulli(ϕ). We have the following form:** -⟶ +الارتباط اللوجستي (Logistic regression) - نفترض هنا أن y|x;θ∼Bernoulli(ϕ). فيكون لدينا:
**31. Remark: there is no closed form solution for the case of logistic regressions.** -⟶ +ملاحظة: ليس هناك حل رياضي مغلق للارتباط اللوجستي.
**32. Softmax regression ― A softmax regression, also called a multiclass logistic regression, is used to generalize logistic regression when there are more than 2 outcome classes. By convention, we set θK=0, which makes the Bernoulli parameter ϕi of each class i equal to:** -⟶ +Softmax regression - ويطلق عليه الارتباط اللوجستي متعدد الفئات (multiclass logistic regression)، يستخدم لتعميم الارتباط اللوجستي إذا كان لدينا أكثر من فئتين. في العرف يتم تعيين θK=0، بحيث تجعل معامل بيرنوللي (Bernoulli) ϕi لكل فئة i يساوي:
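Because no closed-form solution exists, θ is usually found iteratively; below is a hedged sketch of batch gradient ascent on the logistic log-likelihood with made-up data (the learning rate and iteration count are arbitrary illustrative choices).

```python
import numpy as np

rng = np.random.default_rng(0)
m = 200
X = np.column_stack([np.ones(m), rng.standard_normal(m)])   # intercept + one feature
true_theta = np.array([-1.0, 2.0])
y = (rng.uniform(size=m) < 1 / (1 + np.exp(-X @ true_theta))).astype(float)

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

theta = np.zeros(2)
alpha = 0.1                                   # learning rate
for _ in range(2000):
    # Gradient of the log-likelihood: X^T (y - h_theta(X))
    grad = X.T @ (y - sigmoid(X @ theta))
    theta += alpha * grad / m                 # gradient ascent step
print(theta)                                  # roughly recovers true_theta
```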
**33. Generalized Linear Models** -⟶ +النماذج الخطية العامة (Generalized Linear Models)
**34. Exponential family ― A class of distributions is said to be in the exponential family if it can be written in terms of a natural parameter, also called the canonical parameter or link function, η, a sufficient statistic T(y) and a log-partition function a(η) as follows:** -⟶ +العائلة الأسيّة (Exponential family) - يطلق على صنف من التوزيعات (distributions) بأنها تنتمي إلى العائلة الأسيّة إذا كان يمكن كتابتها ###########
**35. Remark: we will often have T(y)=y. Also, exp(−a(η)) can be seen as a normalization parameter that will make sure that the probabilities sum to one.** -⟶ +ملاحظة: كثيراً ما سيكون T(y)=y. كذلك فإن exp(−a(η)) يمكن أن تفسر كمُعامل تسوية (normalization) للتأكد من أن الاحتمالات يكون حاصل جمعها واحد.
**36. Here are the most common exponential distributions summed up in the following table:** -⟶ +أكثر التوزيعات الأسيّة استخداماً تم تلخيصها في الجدول التالي:
**37. [Distribution, Bernoulli, Gaussian, Poisson, Geometric]** -⟶ +[التوزيع، بيرنوللي (Bernoulli)، Gaussian، Poisson، Geometric]
**38. Assumptions of GLMs ― Generalized Linear Models (GLM) aim at predicting a random variable y as a function fo x∈Rn+1 and rely on the following 3 assumptions:** -⟶ +افتراضات GLMs - تهدف النماذج الخطيّة العامة (GLM) إلى توقع القيمة العشوائية y كدالة لـ x∈Rn+1، وتستند إلى ثلاثة افتراضات:
**39. Remark: ordinary least squares and logistic regression are special cases of generalized linear models.** -⟶ +ملاحظة: المربعات الصغرى (least squares) الاعتيادية و الارتباط اللوجستي يعتبران من الحالات الخاصة للنماذج الخطيّة العامة.
**40. Support Vector Machines** -⟶ +Support Vector Machines
**41: The goal of support vector machines is to find the line that maximizes the minimum distance to the line.** -⟶ +تهدف Support Vector Machines إلى العثور على الخط الذي يعظم المسافة الدنيا إلى الخط:
From 067e0710be41573764e1d443d8b7e492b068ec96 Mon Sep 17 00:00:00 2001 From: wooil Date: Thu, 25 Oct 2018 23:30:59 +0900 Subject: [PATCH 017/531] Update ko/cheatsheet-machine-learning-tips-and-tricks.md --- ...tsheet-machine-learning-tips-and-tricks.md | 285 ++++++++++++++++++ 1 file changed, 285 insertions(+) create mode 100644 ko/cheatsheet-machine-learning-tips-and-tricks.md diff --git a/ko/cheatsheet-machine-learning-tips-and-tricks.md b/ko/cheatsheet-machine-learning-tips-and-tricks.md new file mode 100644 index 000000000..b32415a20 --- /dev/null +++ b/ko/cheatsheet-machine-learning-tips-and-tricks.md @@ -0,0 +1,285 @@ +**1. Machine Learning tips and tricks cheatsheet** + +⟶머신러닝 팁과 트릭 치트시트 + +
+ +**2. Classification metrics** + +⟶분류 측정 항목 + +
+ +**3. In a context of a binary classification, here are the main metrics that are important to track in order to assess the performance of the model.** + +⟶이진 분류 상황에서 모델의 성능을 평가하기 위해 눈 여겨 봐야하는 주요 측정 항목이 여기에 있다. + +
+ +**4. Confusion matrix ― The confusion matrix is used to have a more complete picture when assessing the performance of a model. It is defined as follows:** + +⟶혼동 행렬 ― 혼동 행렬은 모델의 성능을 평가할 때, 큰 그림을 보기위해 사용된다. 이는 다음과 같이 정의된다.. + +
+ +**5. [Predicted class, Actual class]** + +⟶[예측된 클래스, 실제 클래스] + +
+ +**6. Main metrics ― The following metrics are commonly used to assess the performance of classification models:** + +⟶주요 측정 항목들 ― 다음 측정 항목들은 주로 분류 모델의 성능을 평가할 때 사용된다. + +
+ +**7. [Metric, Formula, Interpretation]** + +⟶[측정 항목, 공식, 해석] + +
+ +**8. Overall performance of model** + +⟶전반적인 모델의 성능 + +
+ +**9. How accurate the positive predictions are** + +⟶예측된 양성이 정확한 정도 + +
+ +**10. Coverage of actual positive sample** + +⟶실제 양성의 예측 정도 + +
+ +**11. Coverage of actual negative sample** + +⟶실제 음성의 예측 정도 + +
+ +**12. Hybrid metric useful for unbalanced classes** + +⟶불균형 클래스에 유용한 하이브리드 측정 항목 + +
+ +**13. ROC ― The receiver operating curve, also noted ROC, is the plot of TPR versus FPR by varying the threshold. These metrics are are summed up in the table below:** + +⟶ROC(Receiver Operating Curve) ― ROC 곡선은 임계값의 변화에 따른 TPR 대 FPR의 플롯이다. 이 측정 항목은 아래 표에 요약되어 있다: + +
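A tiny worked example of these metrics, using made-up confusion-matrix counts (the numbers are purely illustrative):

```python
# Made-up confusion-matrix counts for a binary classifier
tp, fp, fn, tn = 40, 10, 5, 45

accuracy  = (tp + tn) / (tp + fp + fn + tn)   # overall performance of the model
precision = tp / (tp + fp)                    # how accurate the positive predictions are
recall    = tp / (tp + fn)                    # coverage of the actual positive samples
f1        = 2 * precision * recall / (precision + recall)   # hybrid metric

print(accuracy, precision, recall, f1)        # 0.85 0.8 0.888... 0.842...
```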
+ +**14. [Metric, Formula, Equivalent]** + +⟶[측정 항목, 공식, 같은 측도] + +
+ +**15. AUC ― The area under the receiving operating curve, also noted AUC or AUROC, is the area below the ROC as shown in the following figure:** + +⟶AUC(Area Under the receiving operating Curve) ― AUC 또는 AUROC라고도 하는 이 측정 항목은 다음 그림과 같이 ROC 곡선 아래의 영역이다: + +
+ +**16. [Actual, Predicted]** + +⟶[실제값, 예측된 값] + +
+ +**17. Basic metrics ― Given a regression model f, the following metrics are commonly used to assess the performance of the model:** + +⟶기본 측정 항목 ― 회귀 모델 f가 주어졌을때, 다음의 측정 항목들은 모델의 성능을 평가할 때 주로 사용된다: + +
+ +**18. [Total sum of squares, Explained sum of squares, Residual sum of squares]** + +⟶[총 제곱합, 설명된 제곱합, 잔차 제곱합] + +
+ +**19. Coefficient of determination ― The coefficient of determination, often noted R2 or r2, provides a measure of how well the observed outcomes are replicated by the model and is defined as follows:** + +⟶결정 계수 ― 종종 R2 또는 r2로 표시되는 결정 계수는 관측된 결과가 모델에 의해 얼마나 잘 재현되는지를 측정하는 측도로서 다음과 같이 정의된다: + +
+ +**20. Main metrics ― The following metrics are commonly used to assess the performance of regression models, by taking into account the number of variables n that they take into consideration:** + +⟶주요 측정 항목들 ― 다음 측정 항목들은 주로 변수의 수를 고려하여 회귀 모델의 성능을 평가할 때 사용된다: + +
+ +**21. where L is the likelihood and ˆσ2 is an estimate of the variance associated with each response.** + +⟶여기서 L은 가능도이고 ^σ2는 각각의 반응과 관련된 분산의 추정값이다. + +
+ +**22. Model selection** + +⟶모델 선택 + +
+ +**23. Vocabulary ― When selecting a model, we distinguish 3 different parts of the data that we have as follows:** + +⟶어휘 ― 모델을 선택할 때 우리는 다음과 같이 가지고 있는 데이터를 세 부분으로 구분한다: + +
+ +**24. [Training set, Validation set, Testing set]** + +⟶[학습 세트, 검증 세트, 테스트 세트] + +
+ +**25. [Model is trained, Model is assessed, Model gives predictions]** + +⟶[모델 훈련, 모델 평가, 모델 예측] + +
+ +**26. [Usually 80% of the dataset, Usually 20% of the dataset]** + +⟶[주로 데이터 세트의 80%, 주로 데이터 세트의 20%] + +
+ +**27. [Also called hold-out or development set, Unseen data]** + +⟶[홀드아웃 또는 개발 세트라고도하는, 보지 않은 데이터] + +
+ +**28. Once the model has been chosen, it is trained on the entire dataset and tested on the unseen test set. These are represented in the figure below:** + +⟶모델이 선택되면 전체 데이터 세트에 대해 학습을 하고 보지 않은 데이터에서 테스트한다. 이는 아래 그림에 나타나있다. + +
+ +**29. Cross-validation ― Cross-validation, also noted CV, is a method that is used to select a model that does not rely too much on the initial training set. The different types are summed up in the table below:** + +⟶교차-검증 ― CV라고도하는 교차-검증은 초기의 학습 세트에 지나치게 의존하지 않는 모델을 선택하는데 사용되는 방법이다. 다양한 유형이 아래 표에 요약되어 있다: + +
+ +**30. [Training on k−1 folds and assessment on the remaining one, Training on n−p observations and assessment on the p remaining ones]** + +⟶[k-1 폴드에 대한 학습과 나머지 1폴드에 대한 평가, n-p개 관측치에 대한 학습과 나머지 p개 관측치에 대한 평가] + +
+ +**31. [Generally k=5 or 10, Case p=1 is called leave-one-out]** + +⟶[일반적으로 k=5 또는 10, p=1인 케이스는 leave-one-out] + +
+ +**32. The most commonly used method is called k-fold cross-validation and splits the training data into k folds to validate the model on one fold while training the model on the k−1 other folds, all of this k times. The error is then averaged over the k folds and is named cross-validation error.** + +⟶가장 일반적으로 사용되는 방법은 k-폴드 교차-검증이라고하며 이는 학습 데이터를 k개의 폴드로 분할하고, 그 중 k-1개의 폴드로 모델을 학습하는 동시에 나머지 1개의 폴드로 모델을 검증한다. 이 작업을 k번 수행한다. 오류는 k 폴드에 대해 평균화되고 교차-검증 오류라고 부른다. + +
+ +**33. Regularization ― The regularization procedure aims at avoiding the model to overfit the data and thus deals with high variance issues. The following table sums up the different types of commonly used regularization techniques:** + +⟶정규화 ― 정규화 절차는 데이터에 대한 모델의 과적합을 피하고 분산이 커지는 문제를 처리하는 것을 목표로 한다. 다음의 표는 일반적으로 사용되는 정규화 기법의 여러 유형을 요약한 것이다: + +
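A minimal sketch of this k-fold procedure, assuming a plain least-squares model and synthetic data (both are placeholders for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.standard_normal(100)

k = 5
indices = rng.permutation(len(X))
folds = np.array_split(indices, k)

errors = []
for i in range(k):
    val_idx = folds[i]                                   # assess on fold i
    train_idx = np.concatenate([folds[j] for j in range(k) if j != i])  # train on the k-1 others
    theta, *_ = np.linalg.lstsq(X[train_idx], y[train_idx], rcond=None)
    errors.append(np.mean((X[val_idx] @ theta - y[val_idx]) ** 2))

cv_error = np.mean(errors)                               # average over the k folds
print(cv_error)
```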
+ +**34. [Shrinks coefficients to 0, Good for variable selection, Makes coefficients smaller, Tradeoff between variable selection and small coefficients]** + +⟶[계수를 0으로 축소, 변수 선택에 좋음, 계수를 작게 함, 변수 선택과 작은 계수 간의 트래이드오프] + +
+ +**35. Diagnostics** + +⟶진단 + +
+ +**36. Bias ― The bias of a model is the difference between the expected prediction and the correct model that we try to predict for given data points.** + +⟶편향 ― 모델의 편향은 기대되는 예측과 주어진 데이터 포인트에 대해 예측하려고하는 올바른 모델 간의 차이이다. + +
+ +**37. Variance ― The variance of a model is the variability of the model prediction for given data points.** + +⟶분산 ― 모델의 분산은 주어진 데이터 포인트에 대한 모델 예측의 가변성이다. + +
+ +**38. Bias/variance tradeoff ― The simpler the model, the higher the bias, and the more complex the model, the higher the variance.** + +⟶편향/분산 트래이드오프 ― 모델이 간단할수록 편향이 높아지고 모델이 복잡할수록 분산이 커진다. + +
+ +**39. [Symptoms, Regression illustration, classification illustration, deep learning illustration, possible remedies]** + +⟶[증상, 회귀 일러스트레이션, 분류 일러스트레이션, 딥러닝 일러스트레이션, 가능한 처리방법] + +
+ +**40. [High training error, Training error close to test error, High bias, Training error slightly lower than test error, Very low training error, Training error much lower than test error, High variance]** + +⟶[높은 학습 오류, 테스트 오류에 가까운 학습 오류, 높은 편향, 테스트 에러 보다 약간 낮은 학습 오류, 매우 낮은 학습 오류, 테스트 오류보다 훨씬 낮은 학습 오류, 높은 분산] + +
+ +**41. [Complexify model, Add more features, Train longer, Perform regularization, Get more data]** + +⟶[모델 복잡화, 특징 추가, 학습 증대, 정규화 수행, 추가 데이터 수집] + +
+ +**42. Error analysis ― Error analysis is analyzing the root cause of the difference in performance between the current and the perfect models.** + +⟶오류 분석 ― 오류 분석은 현재 모델과 완벽한 모델 간의 성능 차이의 근본 원인을 분석한다. + +
+ +**43. Ablative analysis ― Ablative analysis is analyzing the root cause of the difference in performance between the current and the baseline models.** + +⟶애블러티브 분석 ― 애블러티브 분석은 현재 모델과 베이스라인 모델 간의 성능 차이의 근본 원인을 분석한다. + +
+ +**44. Regression metrics** + +⟶회귀 측정 항목 + +
+ +**45. [Classification metrics, confusion matrix, accuracy, precision, recall, F1 score, ROC]** + +⟶[분류 측정 항목, 혼동 행렬, 정확도, 정밀도, 리콜, F1 스코어, ROC] + +
+ +**46. [Regression metrics, R squared, Mallow's CP, AIC, BIC]** + +⟶[회귀 측정 항목, R 스퀘어, 맬로우의 CP, AIC, BIC] + +
+ +**47. [Model selection, cross-validation, regularization]** + +⟶[모델 선택, 교차-검증, 정규화] + +
+ +**48. [Diagnostics, Bias/variance tradeoff, error/ablative analysis]** + +⟶[진단, 편향/분산 트래이드오프, 오류/애블러티브 분석] From eeeefa06728bcf7561fa4fca99d974e7193b7562 Mon Sep 17 00:00:00 2001 From: Wooil <38076110+WooilJeong@users.noreply.github.com> Date: Thu, 25 Oct 2018 23:45:59 +0900 Subject: [PATCH 018/531] Update cheatsheet-machine-learning-tips-and-tricks.md Changed existing expression to honorific expression. --- ...tsheet-machine-learning-tips-and-tricks.md | 38 +++++++++---------- 1 file changed, 19 insertions(+), 19 deletions(-) diff --git a/ko/cheatsheet-machine-learning-tips-and-tricks.md b/ko/cheatsheet-machine-learning-tips-and-tricks.md index b32415a20..d6732e145 100644 --- a/ko/cheatsheet-machine-learning-tips-and-tricks.md +++ b/ko/cheatsheet-machine-learning-tips-and-tricks.md @@ -12,13 +12,13 @@ **3. In a context of a binary classification, here are the main metrics that are important to track in order to assess the performance of the model.** -⟶이진 분류 상황에서 모델의 성능을 평가하기 위해 눈 여겨 봐야하는 주요 측정 항목이 여기에 있다. +⟶이진 분류 상황에서 모델의 성능을 평가하기 위해 눈 여겨 봐야하는 주요 측정 항목이 여기에 있습니다.
**4. Confusion matrix ― The confusion matrix is used to have a more complete picture when assessing the performance of a model. It is defined as follows:** -⟶혼동 행렬 ― 혼동 행렬은 모델의 성능을 평가할 때, 큰 그림을 보기위해 사용된다. 이는 다음과 같이 정의된다.. +⟶혼동 행렬 ― 혼동 행렬은 모델의 성능을 평가할 때, 보다 큰 그림을 보기위해 사용됩니다. 이는 다음과 같이 정의됩니다.
@@ -30,7 +30,7 @@ **6. Main metrics ― The following metrics are commonly used to assess the performance of classification models:** -⟶주요 측정 항목들 ― 다음 측정 항목들은 주로 분류 모델의 성능을 평가할 때 사용된다. +⟶주요 측정 항목들 ― 다음 측정 항목들은 주로 분류 모델의 성능을 평가할 때 사용됩니다.
@@ -72,7 +72,7 @@ **13. ROC ― The receiver operating curve, also noted ROC, is the plot of TPR versus FPR by varying the threshold. These metrics are are summed up in the table below:** -⟶ROC(Receiver Operating Curve) ― ROC 곡선은 임계값의 변화에 따른 TPR 대 FPR의 플롯이다. 이 측정 항목은 아래 표에 요약되어 있다: +⟶ROC(Receiver Operating Curve) ― ROC 곡선은 임계값의 변화에 따른 TPR 대 FPR의 플롯입니다. 이 측정 항목은 아래 표에 요약되어 있습니다:
@@ -84,7 +84,7 @@ **15. AUC ― The area under the receiving operating curve, also noted AUC or AUROC, is the area below the ROC as shown in the following figure:** -⟶AUC(Area Under the receiving operating Curve) ― AUC 또는 AUROC라고도 하는 이 측정 항목은 다음 그림과 같이 ROC 곡선 아래의 영역이다: +⟶AUC(Area Under the receiving operating Curve) ― AUC 또는 AUROC라고도 하는 이 측정 항목은 다음 그림과 같이 ROC 곡선 아래의 영역입니다:
@@ -96,7 +96,7 @@ **17. Basic metrics ― Given a regression model f, the following metrics are commonly used to assess the performance of the model:** -⟶기본 측정 항목 ― 회귀 모델 f가 주어졌을때, 다음의 측정 항목들은 모델의 성능을 평가할 때 주로 사용된다: +⟶기본 측정 항목 ― 회귀 모델 f가 주어졌을때, 다음의 측정 항목들은 모델의 성능을 평가할 때 주로 사용됩니다:
@@ -108,19 +108,19 @@ **19. Coefficient of determination ― The coefficient of determination, often noted R2 or r2, provides a measure of how well the observed outcomes are replicated by the model and is defined as follows:** -⟶결정 계수 ― 종종 R2 또는 r2로 표시되는 결정 계수는 관측된 결과가 모델에 의해 얼마나 잘 재현되는지를 측정하는 측도로서 다음과 같이 정의된다: +⟶결정 계수 ― 종종 R2 또는 r2로 표시되는 결정 계수는 관측된 결과가 모델에 의해 얼마나 잘 재현되는지를 측정하는 측도로서 다음과 같이 정의됩니다:
**20. Main metrics ― The following metrics are commonly used to assess the performance of regression models, by taking into account the number of variables n that they take into consideration:** -⟶주요 측정 항목들 ― 다음 측정 항목들은 주로 변수의 수를 고려하여 회귀 모델의 성능을 평가할 때 사용된다: +⟶주요 측정 항목들 ― 다음 측정 항목들은 주로 변수의 수를 고려하여 회귀 모델의 성능을 평가할 때 사용됩니다:
**21. where L is the likelihood and ˆσ2 is an estimate of the variance associated with each response.** -⟶여기서 L은 가능도이고 ^σ2는 각각의 반응과 관련된 분산의 추정값이다. +⟶여기서 L은 가능도이고 ^σ2는 각각의 반응과 관련된 분산의 추정값입니다.
@@ -132,7 +132,7 @@ **23. Vocabulary ― When selecting a model, we distinguish 3 different parts of the data that we have as follows:** -⟶어휘 ― 모델을 선택할 때 우리는 다음과 같이 가지고 있는 데이터를 세 부분으로 구분한다: +⟶어휘 ― 모델을 선택할 때 우리는 다음과 같이 가지고 있는 데이터를 세 부분으로 구분합니다:
@@ -162,13 +162,13 @@ **28. Once the model has been chosen, it is trained on the entire dataset and tested on the unseen test set. These are represented in the figure below:** -⟶모델이 선택되면 전체 데이터 세트에 대해 학습을 하고 보지 않은 데이터에서 테스트한다. 이는 아래 그림에 나타나있다. +⟶모델이 선택되면 전체 데이터 세트에 대해 학습을 하고 보지 않은 데이터에서 테스트합니다. 이는 아래 그림에 나타나있습니다.
**29. Cross-validation ― Cross-validation, also noted CV, is a method that is used to select a model that does not rely too much on the initial training set. The different types are summed up in the table below:** -⟶교차-검증 ― CV라고도하는 교차-검증은 초기의 학습 세트에 지나치게 의존하지 않는 모델을 선택하는데 사용되는 방법이다. 다양한 유형이 아래 표에 요약되어 있다: +⟶교차-검증 ― CV라고도하는 교차-검증은 초기의 학습 세트에 지나치게 의존하지 않는 모델을 선택하는데 사용되는 방법입니다. 다양한 유형이 아래 표에 요약되어 있습니다:
@@ -186,13 +186,13 @@ **32. The most commonly used method is called k-fold cross-validation and splits the training data into k folds to validate the model on one fold while training the model on the k−1 other folds, all of this k times. The error is then averaged over the k folds and is named cross-validation error.** -⟶가장 일반적으로 사용되는 방법은 k-폴드 교차-검증이라고하며 이는 학습 데이터를 k개의 폴드로 분할하고, 그 중 k-1개의 폴드로 모델을 학습하는 동시에 나머지 1개의 폴드로 모델을 검증한다. 이 작업을 k번 수행한다. 오류는 k 폴드에 대해 평균화되고 교차-검증 오류라고 부른다. +⟶가장 일반적으로 사용되는 방법은 k-폴드 교차-검증이라고하며 이는 학습 데이터를 k개의 폴드로 분할하고, 그 중 k-1개의 폴드로 모델을 학습하는 동시에 나머지 1개의 폴드로 모델을 검증합니다. 이 작업을 k번 수행합니다. 오류는 k 폴드에 대해 평균화되고 교차-검증 오류라고 부릅니다.
**33. Regularization ― The regularization procedure aims at avoiding the model to overfit the data and thus deals with high variance issues. The following table sums up the different types of commonly used regularization techniques:** -⟶정규화 ― 정규화 절차는 데이터에 대한 모델의 과적합을 피하고 분산이 커지는 문제를 처리하는 것을 목표로 한다. 다음의 표는 일반적으로 사용되는 정규화 기법의 여러 유형을 요약한 것이다: +⟶정규화 ― 정규화 절차는 데이터에 대한 모델의 과적합을 피하고 분산이 커지는 문제를 처리하는 것을 목표로 합니다. 다음의 표는 일반적으로 사용되는 정규화 기법의 여러 유형을 요약한 것입니다:
@@ -210,19 +210,19 @@ **36. Bias ― The bias of a model is the difference between the expected prediction and the correct model that we try to predict for given data points.** -⟶편향 ― 모델의 편향은 기대되는 예측과 주어진 데이터 포인트에 대해 예측하려고하는 올바른 모델 간의 차이이다. +⟶편향 ― 모델의 편향은 기대되는 예측과 주어진 데이터 포인트에 대해 예측하려고하는 올바른 모델 간의 차이입니다.
**37. Variance ― The variance of a model is the variability of the model prediction for given data points.** -⟶분산 ― 모델의 분산은 주어진 데이터 포인트에 대한 모델 예측의 가변성이다. +⟶분산 ― 모델의 분산은 주어진 데이터 포인트에 대한 모델 예측의 가변성입니다.
**38. Bias/variance tradeoff ― The simpler the model, the higher the bias, and the more complex the model, the higher the variance.** -⟶편향/분산 트래이드오프 ― 모델이 간단할수록 편향이 높아지고 모델이 복잡할수록 분산이 커진다. +⟶편향/분산 트래이드오프 ― 모델이 간단할수록 편향이 높아지고 모델이 복잡할수록 분산이 커집니다.
@@ -246,13 +246,13 @@ **42. Error analysis ― Error analysis is analyzing the root cause of the difference in performance between the current and the perfect models.** -⟶오류 분석 ― 오류 분석은 현재 모델과 완벽한 모델 간의 성능 차이의 근본 원인을 분석한다. +⟶오류 분석 ― 오류 분석은 현재 모델과 완벽한 모델 간의 성능 차이의 근본 원인을 분석합니다.
**43. Ablative analysis ― Ablative analysis is analyzing the root cause of the difference in performance between the current and the baseline models.** -⟶애블러티브 분석 ― 애블러티브 분석은 현재 모델과 베이스라인 모델 간의 성능 차이의 근본 원인을 분석한다. +⟶애블러티브 분석 ― 애블러티브 분석은 현재 모델과 베이스라인 모델 간의 성능 차이의 근본 원인을 분석합니다.
From 4c3a7c6f4004a19ad0b3c342858253e79834396f Mon Sep 17 00:00:00 2001 From: wooil Date: Thu, 25 Oct 2018 23:58:18 +0900 Subject: [PATCH 019/531] Update ko/refresher-probability.md --- ko/refresher-probability.md | 381 ++++++++++++++++++++++++++++++++++++ 1 file changed, 381 insertions(+) create mode 100644 ko/refresher-probability.md diff --git a/ko/refresher-probability.md b/ko/refresher-probability.md new file mode 100644 index 000000000..5c9b34656 --- /dev/null +++ b/ko/refresher-probability.md @@ -0,0 +1,381 @@ +**1. Probabilities and Statistics refresher** + +⟶ + +
+ +**2. Introduction to Probability and Combinatorics** + +⟶ + +
+ +**3. Sample space ― The set of all possible outcomes of an experiment is known as the sample space of the experiment and is denoted by S.** + +⟶ + +
+ +**4. Event ― Any subset E of the sample space is known as an event. That is, an event is a set consisting of possible outcomes of the experiment. If the outcome of the experiment is contained in E, then we say that E has occurred.** + +⟶ + +
+ +**5. Axioms of probability For each event E, we denote P(E) as the probability of event E occuring.** + +⟶ + +
+ +**6. Axiom 1 ― Every probability is between 0 and 1 included, i.e:** + +⟶ + +
+ +**7. Axiom 2 ― The probability that at least one of the elementary events in the entire sample space will occur is 1, i.e:** + +⟶ + +
+ +**8. Axiom 3 ― For any sequence of mutually exclusive events E1,...,En, we have:** + +⟶ + +
+ +**9. Permutation ― A permutation is an arrangement of r objects from a pool of n objects, in a given order. The number of such arrangements is given by P(n,r), defined as:** + +⟶ + +
+ +**10. Combination ― A combination is an arrangement of r objects from a pool of n objects, where the order does not matter. The number of such arrangements is given by C(n,r), defined as:** + +⟶ + +
+ +**11. Remark: we note that for 0⩽r⩽n, we have P(n,r)⩾C(n,r)** + +⟶ + +
+ +**12. Conditional Probability** + +⟶ + +
+ +**13. Bayes' rule ― For events A and B such that P(B)>0, we have:** + +⟶ + +
+ +**14. Remark: we have P(A∩B)=P(A)P(B|A)=P(A|B)P(B)** + +⟶ + +
+ +**15. Partition ― Let {Ai,i∈[[1,n]]} be such that for all i, Ai≠∅. We say that {Ai} is a partition if we have:** + +⟶ + +
+ +**16. Remark: for any event B in the sample space, we have P(B)=n∑i=1P(B|Ai)P(Ai).** + +⟶ + +
+ +**17. Extended form of Bayes' rule ― Let {Ai,i∈[[1,n]]} be a partition of the sample space. We have:** + +⟶ + +
+ +**18. Independence ― Two events A and B are independent if and only if we have:** + +⟶ + +
+ +**19. Random Variables** + +⟶ + +
+ +**20. Definitions** + +⟶ + +
+ +**21. Random variable ― A random variable, often noted X, is a function that maps every element in a sample space to a real line.** + +⟶ + +
+ +**22. Cumulative distribution function (CDF) ― The cumulative distribution function F, which is monotonically non-decreasing and is such that limx→−∞F(x)=0 and limx→+∞F(x)=1, is defined as:** + +⟶ + +
+ +**23. Remark: we have P(a + +**24. Probability density function (PDF) ― The probability density function f is the probability that X takes on values between two adjacent realizations of the random variable.** + +⟶ + +
+ +**25. Relationships involving the PDF and CDF ― Here are the important properties to know in the discrete (D) and the continuous (C) cases.** + +⟶ + +
+ +**26. [Case, CDF F, PDF f, Properties of PDF]** + +⟶ + +
+ +**27. Expectation and Moments of the Distribution ― Here are the expressions of the expected value E[X], generalized expected value E[g(X)], kth moment E[Xk] and characteristic function ψ(ω) for the discrete and continuous cases:** + +⟶ + +
+ +**28. Variance ― The variance of a random variable, often noted Var(X) or σ2, is a measure of the spread of its distribution function. It is determined as follows:** + +⟶ + +
+ +**29. Standard deviation ― The standard deviation of a random variable, often noted σ, is a measure of the spread of its distribution function which is compatible with the units of the actual random variable. It is determined as follows:** + +⟶ + +
+ +**30. Transformation of random variables ― Let the variables X and Y be linked by some function. By noting fX and fY the distribution function of X and Y respectively, we have:** + +⟶ + +
+ +**31. Leibniz integral rule ― Let g be a function of x and potentially c, and a,b boundaries that may depend on c. We have:** + +⟶ + +
+ +**32. Probability Distributions** + +⟶ + +
+ +**33. Chebyshev's inequality ― Let X be a random variable with expected value μ. For k,σ>0, we have the following inequality:** + +⟶ + +
+ +**34. Main distributions ― Here are the main distributions to have in mind:** + +⟶ + +
+ +**35. [Type, Distribution]** + +⟶ + +
+ +**36. Jointly Distributed Random Variables** + +⟶ + +
+ +**37. Marginal density and cumulative distribution ― From the joint density probability function fXY , we have** + +⟶ + +
+ +**38. [Case, Marginal density, Cumulative function]** + +⟶ + +
+ +**39. Conditional density ― The conditional density of X with respect to Y, often noted fX|Y, is defined as follows:** + +⟶ + +
+ +**40. Independence ― Two random variables X and Y are said to be independent if we have:** + +⟶ + +
+ +**41. Covariance ― We define the covariance of two random variables X and Y, that we note σ2XY or more commonly Cov(X,Y), as follows:** + +⟶ + +
+ +**42. Correlation ― By noting σX,σY the standard deviations of X and Y, we define the correlation between the random variables X and Y, noted ρXY, as follows:** + +⟶ + +
+ +**43. Remark 1: we note that for any random variables X,Y, we have ρXY∈[−1,1].** + +⟶ + +
+ +**44. Remark 2: If X and Y are independent, then ρXY=0.** + +⟶ + +
+ +**45. Parameter estimation** + +⟶ + +
+ +**46. Definitions** + +⟶ + +
+ +**47. Random sample ― A random sample is a collection of n random variables X1,...,Xn that are independent and identically distributed with X.** + +⟶ + +
+ +**48. Estimator ― An estimator is a function of the data that is used to infer the value of an unknown parameter in a statistical model.** + +⟶ + +
+ +**49. Bias ― The bias of an estimator ^θ is defined as being the difference between the expected value of the distribution of ^θ and the true value, i.e.:** + +⟶ + +
+ +**50. Remark: an estimator is said to be unbiased when we have E[^θ]=θ.** + +⟶ + +
+ +**51. Estimating the mean** + +⟶ + +
+ +**52. Sample mean ― The sample mean of a random sample is used to estimate the true mean μ of a distribution, is often noted ¯¯¯¯¯X and is defined as follows:** + +⟶ + +
+ +**53. Remark: the sample mean is unbiased, i.e E[¯¯¯¯¯X]=μ.** + +⟶ + +
+ +**54. Central Limit Theorem ― Let us have a random sample X1,...,Xn following a given distribution with mean μ and variance σ2, then we have:** + +⟶ + +
+ +**55. Estimating the variance** + +⟶ + +
+ +**56. Sample variance ― The sample variance of a random sample is used to estimate the true variance σ2 of a distribution, is often noted s2 or ^σ2 and is defined as follows:** + +⟶ + +
+ +**57. Remark: the sample variance is unbiased, i.e E[s2]=σ2.** + +⟶ + +
+ +**58. Chi-Squared relation with sample variance ― Let s2 be the sample variance of a random sample. We have:** + +⟶ + +
+ +**59. [Introduction, Sample space, Event, Permutation]** + +⟶ + +
+ +**60. [Conditional probability, Bayes' rule, Independence]** + +⟶ + +
+ +**61. [Random variables, Definitions, Expectation, Variance]** + +⟶ + +
+ +**62. [Probability distributions, Chebyshev's inequality, Main distributions]** + +⟶ + +
+ +**63. [Jointly distributed random variables, Density, Covariance, Correlation]** + +⟶ + +
+ +**64. [Parameter estimation, Mean, Variance]** + +⟶ From 0767f9bc52cd3a07b03080d483d9f4b641f5e995 Mon Sep 17 00:00:00 2001 From: Wooil <38076110+WooilJeong@users.noreply.github.com> Date: Fri, 26 Oct 2018 00:06:41 +0900 Subject: [PATCH 020/531] Update refresher-probability.md --- ko/refresher-probability.md | 382 ++++++++++++++++++++++++++++++++++++ 1 file changed, 382 insertions(+) diff --git a/ko/refresher-probability.md b/ko/refresher-probability.md index 5c9b34656..3f5024280 100644 --- a/ko/refresher-probability.md +++ b/ko/refresher-probability.md @@ -1,3 +1,385 @@ + +**1. Probabilities and Statistics refresher** + +⟶확률과 통계 + +
+ +**2. Introduction to Probability and Combinatorics** + +⟶확률과 조합론에 대한 소개 + +
+ +**3. Sample space ― The set of all possible outcomes of an experiment is known as the sample space of the experiment and is denoted by S.** + +⟶표본 공간 ― 시행의 가능한 모든 결과 집합은 시행의 표본 공간으로 알려져 있으며 S로 표기합니다. + +
+ +**4. Event ― Any subset E of the sample space is known as an event. That is, an event is a set consisting of possible outcomes of the experiment. If the outcome of the experiment is contained in E, then we say that E has occurred.** + +⟶사건 ― 표본 공간의 모든 부분 집합 E를 사건이라고 합니다. 즉, 사건은 시행 가능한 결과로 구성된 집합입니다. 시행 결과가 E에 포함된다면, E가 발생했다고 이야기합니다. + +
+ +**5. Axioms of probability For each event E, we denote P(E) as the probability of event E occuring.** + +⟶ + +
+ +**6. Axiom 1 ― Every probability is between 0 and 1 included, i.e:** + +⟶ + +
+ +**7. Axiom 2 ― The probability that at least one of the elementary events in the entire sample space will occur is 1, i.e:** + +⟶ + +
+ +**8. Axiom 3 ― For any sequence of mutually exclusive events E1,...,En, we have:** + +⟶ + +
+ +**9. Permutation ― A permutation is an arrangement of r objects from a pool of n objects, in a given order. The number of such arrangements is given by P(n,r), defined as:** + +⟶ + +
+ +**10. Combination ― A combination is an arrangement of r objects from a pool of n objects, where the order does not matter. The number of such arrangements is given by C(n,r), defined as:** + +⟶ + +
+ +**11. Remark: we note that for 0⩽r⩽n, we have P(n,r)⩾C(n,r)** + +⟶ + +
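A quick numeric check of the two counting formulas above; `math.perm` and `math.comb` (available in Python 3.8+) compute P(n,r) and C(n,r) directly, and the values here are only an illustrative example.

```python
import math

n, r = 5, 3

p = math.perm(n, r)    # P(n, r) = n! / (n - r)!       -> ordered arrangements
c = math.comb(n, r)    # C(n, r) = n! / (r! (n - r)!)  -> order does not matter

print(p, c)            # 60 10
print(p >= c)          # True, consistent with P(n, r) >= C(n, r) for 0 <= r <= n
```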
+ +**12. Conditional Probability** + +⟶ + +
+ +**13. Bayes' rule ― For events A and B such that P(B)>0, we have:** + +⟶ + +
+ +**14. Remark: we have P(A∩B)=P(A)P(B|A)=P(A|B)P(B)** + +⟶ + +
+ +**15. Partition ― Let {Ai,i∈[[1,n]]} be such that for all i, Ai≠∅. We say that {Ai} is a partition if we have:** + +⟶ + +
+ +**16. Remark: for any event B in the sample space, we have P(B)=n∑i=1P(B|Ai)P(Ai).** + +⟶ + +
+ +**17. Extended form of Bayes' rule ― Let {Ai,i∈[[1,n]]} be a partition of the sample space. We have:** + +⟶ + +
+ +**18. Independence ― Two events A and B are independent if and only if we have:** + +⟶ + +
+ +**19. Random Variables** + +⟶ + +
+ +**20. Definitions** + +⟶ + +
+ +**21. Random variable ― A random variable, often noted X, is a function that maps every element in a sample space to a real line.** + +⟶ + +
+ +**22. Cumulative distribution function (CDF) ― The cumulative distribution function F, which is monotonically non-decreasing and is such that limx→−∞F(x)=0 and limx→+∞F(x)=1, is defined as:** + +⟶ + +
+ +**23. Remark: we have P(a + +**24. Probability density function (PDF) ― The probability density function f is the probability that X takes on values between two adjacent realizations of the random variable.** + +⟶ + +
+ +**25. Relationships involving the PDF and CDF ― Here are the important properties to know in the discrete (D) and the continuous (C) cases.** + +⟶ + +
+ +**26. [Case, CDF F, PDF f, Properties of PDF]** + +⟶ + +
+ +**27. Expectation and Moments of the Distribution ― Here are the expressions of the expected value E[X], generalized expected value E[g(X)], kth moment E[Xk] and characteristic function ψ(ω) for the discrete and continuous cases:** + +⟶ + +
+ +**28. Variance ― The variance of a random variable, often noted Var(X) or σ2, is a measure of the spread of its distribution function. It is determined as follows:** + +⟶ + +
+ +**29. Standard deviation ― The standard deviation of a random variable, often noted σ, is a measure of the spread of its distribution function which is compatible with the units of the actual random variable. It is determined as follows:** + +⟶ + +
+ +**30. Transformation of random variables ― Let the variables X and Y be linked by some function. By noting fX and fY the distribution function of X and Y respectively, we have:** + +⟶ + +
+ +**31. Leibniz integral rule ― Let g be a function of x and potentially c, and a,b boundaries that may depend on c. We have:** + +⟶ + +
+ +**32. Probability Distributions** + +⟶ + +
+ +**33. Chebyshev's inequality ― Let X be a random variable with expected value μ. For k,σ>0, we have the following inequality:** + +⟶ + +
+ +**34. Main distributions ― Here are the main distributions to have in mind:** + +⟶ + +
+ +**35. [Type, Distribution]** + +⟶ + +
+ +**36. Jointly Distributed Random Variables** + +⟶ + +
+ +**37. Marginal density and cumulative distribution ― From the joint density probability function fXY , we have** + +⟶ + +
+ +**38. [Case, Marginal density, Cumulative function]** + +⟶ + +
+ +**39. Conditional density ― The conditional density of X with respect to Y, often noted fX|Y, is defined as follows:** + +⟶ + +
+ +**40. Independence ― Two random variables X and Y are said to be independent if we have:** + +⟶ + +
+ +**41. Covariance ― We define the covariance of two random variables X and Y, that we note σ2XY or more commonly Cov(X,Y), as follows:** + +⟶ + +
+ +**42. Correlation ― By noting σX,σY the standard deviations of X and Y, we define the correlation between the random variables X and Y, noted ρXY, as follows:** + +⟶ + +
+ +**43. Remark 1: we note that for any random variables X,Y, we have ρXY∈[−1,1].** + +⟶ + +
+ +**44. Remark 2: If X and Y are independent, then ρXY=0.** + +⟶ + +
+ +**45. Parameter estimation** + +⟶ + +
+ +**46. Definitions** + +⟶ + +
+ +**47. Random sample ― A random sample is a collection of n random variables X1,...,Xn that are independent and identically distributed with X.** + +⟶ + +
+ +**48. Estimator ― An estimator is a function of the data that is used to infer the value of an unknown parameter in a statistical model.** + +⟶ + +
+ +**49. Bias ― The bias of an estimator ^θ is defined as being the difference between the expected value of the distribution of ^θ and the true value, i.e.:** + +⟶ + +
+ +**50. Remark: an estimator is said to be unbiased when we have E[^θ]=θ.** + +⟶ + +
+ +**51. Estimating the mean** + +⟶ + +
+ +**52. Sample mean ― The sample mean of a random sample is used to estimate the true mean μ of a distribution, is often noted ¯¯¯¯¯X and is defined as follows:** + +⟶ + +
+ +**53. Remark: the sample mean is unbiased, i.e E[¯¯¯¯¯X]=μ.** + +⟶ + +
+ +**54. Central Limit Theorem ― Let us have a random sample X1,...,Xn following a given distribution with mean μ and variance σ2, then we have:** + +⟶ + +
+ +**55. Estimating the variance** + +⟶ + +
+ +**56. Sample variance ― The sample variance of a random sample is used to estimate the true variance σ2 of a distribution, is often noted s2 or ^σ2 and is defined as follows:** + +⟶ + +
+ +**57. Remark: the sample variance is unbiased, i.e E[s2]=σ2.** + +⟶ + +
+ +**58. Chi-Squared relation with sample variance ― Let s2 be the sample variance of a random sample. We have:** + +⟶ + +
+ +**59. [Introduction, Sample space, Event, Permutation]** + +⟶ + +
+ +**60. [Conditional probability, Bayes' rule, Independence]** + +⟶ + +
+ +**61. [Random variables, Definitions, Expectation, Variance]** + +⟶ + +
+ +**62. [Probability distributions, Chebyshev's inequality, Main distributions]** + +⟶ + +
+ +**63. [Jointly distributed random variables, Density, Covariance, Correlation]** + +⟶ + +
+ +**64. [Parameter estimation, Mean, Variance]** + +⟶ **1. Probabilities and Statistics refresher** ⟶ From e7f01968bf0aa8da38614ffe028cbd7ba1581d37 Mon Sep 17 00:00:00 2001 From: kwanghyeokahn Date: Fri, 26 Oct 2018 11:23:23 +0900 Subject: [PATCH 021/531] Update ko/cheatsheet-unsupervised-learning.md --- ko/cheatsheet-unsupervised-learning.md | 340 +++++++++++++++++++++++++ 1 file changed, 340 insertions(+) create mode 100644 ko/cheatsheet-unsupervised-learning.md diff --git a/ko/cheatsheet-unsupervised-learning.md b/ko/cheatsheet-unsupervised-learning.md new file mode 100644 index 000000000..827d815a3 --- /dev/null +++ b/ko/cheatsheet-unsupervised-learning.md @@ -0,0 +1,340 @@ +**1. Unsupervised Learning cheatsheet** + +⟶ + +
+ +**2. Introduction to Unsupervised Learning** + +⟶ + +
+ +**3. Motivation ― The goal of unsupervised learning is to find hidden patterns in unlabeled data {x(1),...,x(m)}.** + +⟶ + +
+ +**4. Jensen's inequality ― Let f be a convex function and X a random variable. We have the following inequality:** + +⟶ + +
+ +**5. Clustering** + +⟶ + +
+ +**6. Expectation-Maximization** + +⟶ + +
+ +**7. Latent variables ― Latent variables are hidden/unobserved variables that make estimation problems difficult, and are often denoted z. Here are the most common settings where there are latent variables:** + +⟶ + +
+ +**8. [Setting, Latent variable z, Comments]** + +⟶ + +
+ +**9. [Mixture of k Gaussians, Factor analysis]** + +⟶ + +
+ +**10. Algorithm ― The Expectation-Maximization (EM) algorithm gives an efficient method at estimating the parameter θ through maximum likelihood estimation by repeatedly constructing a lower-bound on the likelihood (E-step) and optimizing that lower bound (M-step) as follows:** + +⟶ + +
+ +**11. E-step: Evaluate the posterior probability Qi(z(i)) that each data point x(i) came from a particular cluster z(i) as follows:** + +⟶ + +
+ +**12. M-step: Use the posterior probabilities Qi(z(i)) as cluster specific weights on data points x(i) to separately re-estimate each cluster model as follows:** + +⟶ + +
+ +**13. [Gaussians initialization, Expectation step, Maximization step, Convergence]** + +⟶ + +
+ +**14. k-means clustering** + +⟶ + +
+ +**15. We note c(i) the cluster of data point i and μj the center of cluster j.** + +⟶ + +
+ +**16. Algorithm ― After randomly initializing the cluster centroids μ1,μ2,...,μk∈Rn, the k-means algorithm repeats the following step until convergence:** + +⟶ + +
+ +**17. [Means initialization, Cluster assignment, Means update, Convergence]** + +⟶ + +
+ +**18. Distortion function ― In order to see if the algorithm converges, we look at the distortion function defined as follows:** + +⟶ + +
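A minimal NumPy sketch of the loop in items 16 to 18 (random centroid initialization, cluster assignment, means update, distortion J); it assumes X is an (n, d) array and that no cluster ever becomes empty, both assumptions being made only for this illustration.

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    mu = X[rng.choice(len(X), k, replace=False)]  # randomly initialize the centroids μ1,...,μk
    for _ in range(n_iter):
        # cluster assignment: c(i) = index of the closest centroid
        c = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=-1).argmin(axis=1)
        # means update (assumes every cluster keeps at least one point)
        mu = np.array([X[c == j].mean(axis=0) for j in range(k)])
    # distortion function J(c, μ): sum of squared distances to the assigned centroids
    J = ((X - mu[c]) ** 2).sum()
    return c, mu, J
```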
+ +**19. Hierarchical clustering** + +⟶ + +
+
+**20. Algorithm ― It is a clustering algorithm with an agglomerative hierarchical approach that builds nested clusters in a successive manner.**
+
+⟶
+
+<br>
+
+**21. Types ― There are different sorts of hierarchical clustering algorithms that aim at optimizing different objective functions, which are summed up in the table below:**
+
+⟶
+
+<br>
+ +**22. [Ward linkage, Average linkage, Complete linkage]** + +⟶ + +
+ +**23. [Minimize within cluster distance, Minimize average distance between cluster pairs, Minimize maximum distance of between cluster pairs]** + +⟶ + +
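If SciPy is available, the agglomerative procedure and the three linkages listed above can be tried as follows; the random data and the choice of 3 flat clusters are only for illustration.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

X = np.random.rand(50, 2)                        # toy data: 50 points in 2 dimensions
Z = linkage(X, method="ward")                    # also accepts "average" or "complete"
labels = fcluster(Z, t=3, criterion="maxclust")  # cut the nested clustering into 3 clusters
print(labels)
```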
+ +**24. Clustering assessment metrics** + +⟶ + +
+ +**25. In an unsupervised learning setting, it is often hard to assess the performance of a model since we don't have the ground truth labels as was the case in the supervised learning setting.** + +⟶ + +
+ +**26. Silhouette coefficient ― By noting a and b the mean distance between a sample and all other points in the same class, and between a sample and all other points in the next nearest cluster, the silhouette coefficient s for a single sample is defined as follows:** + +⟶ + +
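For reference, with a and b defined as in item 26, the silhouette coefficient of a single sample is:

```latex
s = \frac{b - a}{\max(a, b)}
```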
+ +**27. Calinski-Harabaz index ― By noting k the number of clusters, Bk and Wk the between and within-clustering dispersion matrices respectively defined as** + +⟶ + +
+ +**28. the Calinski-Harabaz index s(k) indicates how well a clustering model defines its clusters, such that the higher the score, the more dense and well separated the clusters are. It is defined as follows:** + +⟶ + +
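For reference, with Bk and Wk as in item 27 and N the total number of points, the index in item 28 is commonly written as:

```latex
s(k) = \frac{\mathrm{Tr}(B_k)}{\mathrm{Tr}(W_k)} \times \frac{N - k}{k - 1}
```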
+ +**29. Dimension reduction** + +⟶ + +
+ +**30. Principal component analysis** + +⟶ + +
+ +**31. It is a dimension reduction technique that finds the variance maximizing directions onto which to project the data.** + +⟶ + +
+ +**32. Eigenvalue, eigenvector ― Given a matrix A∈Rn×n, λ is said to be an eigenvalue of A if there exists a vector z∈Rn∖{0}, called eigenvector, such that we have:** + +⟶ + +
+ +**33. Spectral theorem ― Let A∈Rn×n. If A is symmetric, then A is diagonalizable by a real orthogonal matrix U∈Rn×n. By noting Λ=diag(λ1,...,λn), we have:** + +⟶ + +
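For reference, the two relations that items 32 and 33 point to are:

```latex
Az = \lambda z,
\qquad
A = U \Lambda U^{T} \quad \text{with } U^{T} U = I \text{ and } \Lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_n)
```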
+ +**34. diagonal** + +⟶ + +
+ +**35. Remark: the eigenvector associated with the largest eigenvalue is called principal eigenvector of matrix A.** + +⟶ + +
+ +**36. Algorithm ― The Principal Component Analysis (PCA) procedure is a dimension reduction technique that projects the data on k +dimensions by maximizing the variance of the data as follows:** + +⟶ + +
+ +**37. Step 1: Normalize the data to have a mean of 0 and standard deviation of 1.** + +⟶ + +
+ +**38. Step 2: Compute Σ=1mm∑i=1x(i)x(i)T∈Rn×n, which is symmetric with real eigenvalues.** + +⟶ + +
+ +**39. Step 3: Compute u1,...,uk∈Rn the k orthogonal principal eigenvectors of Σ, i.e. the orthogonal eigenvectors of the k largest eigenvalues.** + +⟶ + +
+ +**40. Step 4: Project the data on spanR(u1,...,uk).** + +⟶ + +
+ +**41. This procedure maximizes the variance among all k-dimensional spaces.** + +⟶ + +
+ +**42. [Data in feature space, Find principal components, Data in principal components space]** + +⟶ + +
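A compact NumPy sketch of the four PCA steps listed in items 36 to 41; the function name and the use of np.linalg.eigh are choices made for this illustration, and it assumes every feature of X has non-zero standard deviation.

```python
import numpy as np

def pca(X, k):
    # Step 1: normalize the data to mean 0 and standard deviation 1
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    # Step 2: Σ = (1/m) Σ_i x(i) x(i)^T, symmetric with real eigenvalues
    sigma = Z.T @ Z / len(Z)
    # Step 3: the k orthogonal principal eigenvectors (largest eigenvalues)
    eigvals, eigvecs = np.linalg.eigh(sigma)      # eigenvalues returned in ascending order
    U = eigvecs[:, np.argsort(eigvals)[::-1][:k]]
    # Step 4: project the data on span(u1, ..., uk)
    return Z @ U
```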
+ +**43. Independent component analysis** + +⟶ + +
+ +**44. It is a technique meant to find the underlying generating sources.** + +⟶ + +
+ +**45. Assumptions ― We assume that our data x has been generated by the n-dimensional source vector s=(s1,...,sn), where si are independent random variables, via a mixing and non-singular matrix A as follows:** + +⟶ + +
+ +**46. The goal is to find the unmixing matrix W=A−1.** + +⟶ + +
+ +**47. Bell and Sejnowski ICA algorithm ― This algorithm finds the unmixing matrix W by following the steps below:** + +⟶ + +
+ +**48. Write the probability of x=As=W−1s as:** + +⟶ + +
+ +**49. Write the log likelihood given our training data {x(i),i∈[[1,m]]} and by noting g the sigmoid function as:** + +⟶ + +
+ +**50. Therefore, the stochastic gradient ascent learning rule is such that for each training example x(i), we update W as follows:** + +⟶ + +
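For reference, the stochastic gradient ascent update that item 50 refers to is usually written, with α the learning rate and g the sigmoid applied elementwise, as:

```latex
W := W + \alpha \left( \big(1 - 2\, g(W x^{(i)})\big) \, x^{(i)T} + \big(W^{T}\big)^{-1} \right)
```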
+ +**51. The Machine Learning cheatsheets are now available in Japanese.** + +⟶ + +
+ +**52. Original authors** + +⟶ + +
+ +**53. Translated by X, Y and Z** + +⟶ + +
+ +**54. Reviewed by X, Y and Z** + +⟶ + +
+ +**55. [Introduction, Motivation, Jensen's inequality]** + +⟶ + +
+ +**56. [Clustering, Expectation-Maximization, k-means, Hierarchical clustering, Metrics]** + +⟶ + +
+ +**57. [Dimension reduction, PCA, ICA]** + +⟶ From 80fa7fdbede7297e7ed2a3ddc3e04e8a54fc08e7 Mon Sep 17 00:00:00 2001 From: kwanghyeokahn <44485235+kwanghyeokahn@users.noreply.github.com> Date: Fri, 26 Oct 2018 11:26:15 +0900 Subject: [PATCH 022/531] Update cheatsheet-unsupervised-learning.md --- ko/cheatsheet-unsupervised-learning.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/ko/cheatsheet-unsupervised-learning.md b/ko/cheatsheet-unsupervised-learning.md index 827d815a3..3f9df3dd0 100644 --- a/ko/cheatsheet-unsupervised-learning.md +++ b/ko/cheatsheet-unsupervised-learning.md @@ -1,6 +1,6 @@ **1. Unsupervised Learning cheatsheet** -⟶ +⟶ 하하
From 3f5f3e74a94353c137848df351fe27ac55b505fe Mon Sep 17 00:00:00 2001 From: sy95lee <37721312+sy95lee@users.noreply.github.com> Date: Fri, 26 Oct 2018 11:31:09 +0900 Subject: [PATCH 023/531] Update ko/refresher-linear-algebra.md --- ko/refresher-linear-algebra.md | 339 +++++++++++++++++++++++++++++++++ 1 file changed, 339 insertions(+) create mode 100644 ko/refresher-linear-algebra.md diff --git a/ko/refresher-linear-algebra.md b/ko/refresher-linear-algebra.md new file mode 100644 index 000000000..a6b440d1e --- /dev/null +++ b/ko/refresher-linear-algebra.md @@ -0,0 +1,339 @@ +**1. Linear Algebra and Calculus refresher** + +⟶ + +
+ +**2. General notations** + +⟶ + +
+ +**3. Definitions** + +⟶ + +
+ +**4. Vector ― We note x∈Rn a vector with n entries, where xi∈R is the ith entry:** + +⟶ + +
+ +**5. Matrix ― We note A∈Rm×n a matrix with m rows and n columns, where Ai,j∈R is the entry located in the ith row and jth column:** + +⟶ + +
+ +**6. Remark: the vector x defined above can be viewed as a n×1 matrix and is more particularly called a column-vector.** + +⟶ + +
+ +**7. Main matrices** + +⟶ + +
+ +**8. Identity matrix ― The identity matrix I∈Rn×n is a square matrix with ones in its diagonal and zero everywhere else:** + +⟶ + +
+ +**9. Remark: for all matrices A∈Rn×n, we have A×I=I×A=A.** + +⟶ + +
+ +**10. Diagonal matrix ― A diagonal matrix D∈Rn×n is a square matrix with nonzero values in its diagonal and zero everywhere else:** + +⟶ + +
+ +**11. Remark: we also note D as diag(d1,...,dn).** + +⟶ + +
+ +**12. Matrix operations** + +⟶ + +
+ +**13. Multiplication** + +⟶ + +
+ +**14. Vector-vector ― There are two types of vector-vector products:** + +⟶ + +
+ +**15. inner product: for x,y∈Rn, we have:** + +⟶ + +
+ +**16. outer product: for x∈Rm,y∈Rn, we have:** + +⟶ + +
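For reference, the two products defined in items 15 and 16 are:

```latex
x^{T} y = \sum_{i=1}^{n} x_i y_i \in \mathbb{R},
\qquad
x y^{T} \in \mathbb{R}^{m \times n}, \quad \big(x y^{T}\big)_{i,j} = x_i y_j
```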
+
+**17. Matrix-vector ― The product of matrix A∈Rm×n and vector x∈Rn is a vector of size Rm, such that:**
+
+⟶
+
+<br>
+ +**18. where aTr,i are the vector rows and ac,j are the vector columns of A, and xi are the entries of x.** + +⟶ + +
+
+**19. Matrix-matrix ― The product of matrices A∈Rm×n and B∈Rn×p is a matrix of size Rm×p, such that:**
+
+⟶
+
+<br>
+ +**20. where aTr,i,bTr,i are the vector rows and ac,j,bc,j are the vector columns of A and B respectively** + +⟶ + +
+ +**21. Other operations** + +⟶ + +
+ +**22. Transpose ― The transpose of a matrix A∈Rm×n, noted AT, is such that its entries are flipped:** + +⟶ + +
+ +**23. Remark: for matrices A,B, we have (AB)T=BTAT** + +⟶ + +
+ +**24. Inverse ― The inverse of an invertible square matrix A is noted A−1 and is the only matrix such that:** + +⟶ + +
+ +**25. Remark: not all square matrices are invertible. Also, for matrices A,B, we have (AB)−1=B−1A−1** + +⟶ + +
+ +**26. Trace ― The trace of a square matrix A, noted tr(A), is the sum of its diagonal entries:** + +⟶ + +
+ +**27. Remark: for matrices A,B, we have tr(AT)=tr(A) and tr(AB)=tr(BA)** + +⟶ + +
+ +**28. Determinant ― The determinant of a square matrix A∈Rn×n, noted |A| or det(A) is expressed recursively in terms of A∖i,∖j, which is the matrix A without its ith row and jth column, as follows:** + +⟶ + +
+ +**29. Remark: A is invertible if and only if |A|≠0. Also, |AB|=|A||B| and |AT|=|A|.** + +⟶ + +
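The identities stated in items 22 to 29 can be checked numerically; here is a small NumPy sketch, where the two matrices are arbitrary invertible examples chosen only for this illustration.

```python
import numpy as np

A = np.array([[2.0, 1.0], [0.0, 3.0]])
B = np.array([[1.0, 4.0], [2.0, 1.0]])

assert np.allclose((A @ B).T, B.T @ A.T)                      # (AB)^T = B^T A^T
assert np.allclose(np.linalg.inv(A @ B),
                   np.linalg.inv(B) @ np.linalg.inv(A))       # (AB)^-1 = B^-1 A^-1
assert np.isclose(np.trace(A @ B), np.trace(B @ A))           # tr(AB) = tr(BA)
assert np.isclose(np.linalg.det(A @ B),
                  np.linalg.det(A) * np.linalg.det(B))        # |AB| = |A||B|
print("all identities hold")
```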
+ +**30. Matrix properties** + +⟶ + +
+ +**31. Definitions** + +⟶ + +
+ +**32. Symmetric decomposition ― A given matrix A can be expressed in terms of its symmetric and antisymmetric parts as follows:** + +⟶ + +
+ +**33. [Symmetric, Antisymmetric]** + +⟶ + +
+ +**34. Norm ― A norm is a function N:V⟶[0,+∞[ where V is a vector space, and such that for all x,y∈V, we have:** + +⟶ + +
+ +**35. N(ax)=|a|N(x) for a scalar** + +⟶ + +
+ +**36. if N(x)=0, then x=0** + +⟶ + +
+ +**37. For x∈V, the most commonly used norms are summed up in the table below:** + +⟶ + +
+ +**38. [Norm, Notation, Definition, Use case]** + +⟶ + +
+
+**39. Linear dependence ― A set of vectors is said to be linearly dependent if one of the vectors in the set can be defined as a linear combination of the others.**
+
+⟶
+
+<br>
+ +**40. Remark: if no vector can be written this way, then the vectors are said to be linearly independent** + +⟶ + +
+ +**41. Matrix rank ― The rank of a given matrix A is noted rank(A) and is the dimension of the vector space generated by its columns. This is equivalent to the maximum number of linearly independent columns of A.** + +⟶ + +
+ +**42. Positive semi-definite matrix ― A matrix A∈Rn×n is positive semi-definite (PSD) and is noted A⪰0 if we have:** + +⟶ + +
+ +**43. Remark: similarly, a matrix A is said to be positive definite, and is noted A≻0, if it is a PSD matrix which satisfies for all non-zero vector x, xTAx>0.** + +⟶ + +
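For reference, the conditions in items 42 and 43 read:

```latex
A \succeq 0 \iff A = A^{T} \ \text{and} \ \forall x \in \mathbb{R}^{n}, \ x^{T} A x \geqslant 0
\qquad\qquad
A \succ 0 \iff A = A^{T} \ \text{and} \ \forall x \neq 0, \ x^{T} A x > 0
```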
+ +**44. Eigenvalue, eigenvector ― Given a matrix A∈Rn×n, λ is said to be an eigenvalue of A if there exists a vector z∈Rn∖{0}, called eigenvector, such that we have:** + +⟶ + +
+ +**45. Spectral theorem ― Let A∈Rn×n. If A is symmetric, then A is diagonalizable by a real orthogonal matrix U∈Rn×n. By noting Λ=diag(λ1,...,λn), we have:** + +⟶ + +
+ +**46. diagonal** + +⟶ + +
+ +**47. Singular-value decomposition ― For a given matrix A of dimensions m×n, the singular-value decomposition (SVD) is a factorization technique that guarantees the existence of U m×m unitary, Σ m×n diagonal and V n×n unitary matrices, such that:** + +⟶ + +
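For reference, the factorization in item 47 reads:

```latex
A = U \Sigma V^{T}, \qquad U \in \mathbb{R}^{m \times m} \ \text{unitary}, \quad \Sigma \in \mathbb{R}^{m \times n} \ \text{diagonal}, \quad V \in \mathbb{R}^{n \times n} \ \text{unitary}
```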
+ +**48. Matrix calculus** + +⟶ + +
+ +**49. Gradient ― Let f:Rm×n→R be a function and A∈Rm×n be a matrix. The gradient of f with respect to A is a m×n matrix, noted ∇Af(A), such that:** + +⟶ + +
+ +**50. Remark: the gradient of f is only defined when f is a function that returns a scalar.** + +⟶ + +
+ +**51. Hessian ― Let f:Rn→R be a function and x∈Rn be a vector. The hessian of f with respect to x is a n×n symmetric matrix, noted ∇2xf(x), such that:** + +⟶ + +
+ +**52. Remark: the hessian of f is only defined when f is a function that returns a scalar** + +⟶ + +
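For reference, the entrywise definitions in items 49 and 51 are:

```latex
\big(\nabla_{A} f(A)\big)_{i,j} = \frac{\partial f(A)}{\partial A_{i,j}},
\qquad
\big(\nabla_{x}^{2} f(x)\big)_{i,j} = \frac{\partial^{2} f(x)}{\partial x_i \, \partial x_j}
```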
+ +**53. Gradient operations ― For matrices A,B,C, the following gradient properties are worth having in mind:** + +⟶ + +
+ +**54. [General notations, Definitions, Main matrices]** + +⟶ + +
+ +**55. [Matrix operations, Multiplication, Other operations]** + +⟶ + +
+ +**56. [Matrix properties, Norm, Eigenvalue/Eigenvector, Singular-value decomposition]** + +⟶ + +
+ +**57. [Matrix calculus, Gradient, Hessian, Operations]** + +⟶ From c4a49eb9d6674f4e540bb45b5229eb210c1b087e Mon Sep 17 00:00:00 2001 From: Wooil <38076110+WooilJeong@users.noreply.github.com> Date: Fri, 26 Oct 2018 13:02:41 +0900 Subject: [PATCH 024/531] Update refresher-probability.md --- ko/refresher-probability.md | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/ko/refresher-probability.md b/ko/refresher-probability.md index 3f5024280..634ba201f 100644 --- a/ko/refresher-probability.md +++ b/ko/refresher-probability.md @@ -7,7 +7,7 @@ **2. Introduction to Probability and Combinatorics** -⟶확률과 조합론에 대한 소개 +⟶확률과 조합론 소개
@@ -23,27 +23,27 @@
-**5. Axioms of probability For each event E, we denote P(E) as the probability of event E occuring.** +**5. Axioms of probability ― For each event E, we denote P(E) as the probability of event E occuring.** -⟶ +⟶확률의 공리 ― 각 사건 E에 대하여, 우리는 사건 E가 발생할 확률을 P(E)로 나타냅니다.
**6. Axiom 1 ― Every probability is between 0 and 1 included, i.e:** -⟶ +⟶공리 1 ― 모든 확률은 0과 1사이에 포함됩니다, 즉:
**7. Axiom 2 ― The probability that at least one of the elementary events in the entire sample space will occur is 1, i.e:** -⟶ +⟶공리 2 ― 전체 표본 공간에서 적어도 하나의 근원 사건이 발생할 확률은 1입니다. 즉:
**8. Axiom 3 ― For any sequence of mutually exclusive events E1,...,En, we have:** -⟶ +⟶공리 3 ―
From 3da6cfa5c1650a47f807e43375f72b498eadd832 Mon Sep 17 00:00:00 2001 From: Wooil <38076110+WooilJeong@users.noreply.github.com> Date: Fri, 26 Oct 2018 13:22:22 +0900 Subject: [PATCH 025/531] Update refresher-probability.md --- ko/refresher-probability.md | 17 +++++++++-------- 1 file changed, 9 insertions(+), 8 deletions(-) diff --git a/ko/refresher-probability.md b/ko/refresher-probability.md index 634ba201f..2c5e2252e 100644 --- a/ko/refresher-probability.md +++ b/ko/refresher-probability.md @@ -1,4 +1,5 @@ + **1. Probabilities and Statistics refresher** ⟶확률과 통계 @@ -43,49 +44,49 @@ **8. Axiom 3 ― For any sequence of mutually exclusive events E1,...,En, we have:** -⟶공리 3 ― +⟶공리 3 ― 서로 배반인 어떤 연속적인 사건 E1,...,En 에 대하여, 우리는:
**9. Permutation ― A permutation is an arrangement of r objects from a pool of n objects, in a given order. The number of such arrangements is given by P(n,r), defined as:** -⟶ +⟶순열(Permutation) ― 순열은 n개의 객체들로부터 r개의 객체들의 순서를 고려한 배열입니다. 그러한 배열의 수는 P (n, r)에 의해 주어지며, 다음과 같이 정의됩니다.
**10. Combination ― A combination is an arrangement of r objects from a pool of n objects, where the order does not matter. The number of such arrangements is given by C(n,r), defined as:** -⟶ +⟶조합(Combination) ― 조합은 n개의 객체들로부터 r개의 객체들의 순서를 고려하지 않은 배열입니다. 그러한 배열의 수는 다음과 같이 정의되는 C(n, r)에 의해 주어집니다.
**11. Remark: we note that for 0⩽r⩽n, we have P(n,r)⩾C(n,r)** -⟶ +⟶비고 :우리는 for 0⩽r⩽n에 대해, P(n,r)⩾C(n,r)를 가집니다.
**12. Conditional Probability** -⟶ +⟶조건부 확률
**13. Bayes' rule ― For events A and B such that P(B)>0, we have:** -⟶ +⟶베이즈 규칙 ― P(B)>0인 사건 A, B에 대해, 우리는:
**14. Remark: we have P(A∩B)=P(A)P(B|A)=P(A|B)P(B)** -⟶ +⟶비고 :우리는 P(A∩B)=P(A)P(B|A)=P(A|B)P(B)를 가집니다.
**15. Partition ― Let {Ai,i∈[[1,n]]} be such that for all i, Ai≠∅. We say that {Ai} is a partition if we have:** -⟶ +⟶분할 ―
From 39edca3e193b9c10e5f8edb92ce9b40f3c145301 Mon Sep 17 00:00:00 2001 From: Wooil <38076110+WooilJeong@users.noreply.github.com> Date: Fri, 26 Oct 2018 13:23:42 +0900 Subject: [PATCH 026/531] Update refresher-probability.md --- ko/refresher-probability.md | 381 ------------------------------------ 1 file changed, 381 deletions(-) diff --git a/ko/refresher-probability.md b/ko/refresher-probability.md index 2c5e2252e..277908e50 100644 --- a/ko/refresher-probability.md +++ b/ko/refresher-probability.md @@ -381,384 +381,3 @@ **64. [Parameter estimation, Mean, Variance]** ⟶ -**1. Probabilities and Statistics refresher** - -⟶ - -
- -**2. Introduction to Probability and Combinatorics** - -⟶ - -
- -**3. Sample space ― The set of all possible outcomes of an experiment is known as the sample space of the experiment and is denoted by S.** - -⟶ - -
- -**4. Event ― Any subset E of the sample space is known as an event. That is, an event is a set consisting of possible outcomes of the experiment. If the outcome of the experiment is contained in E, then we say that E has occurred.** - -⟶ - -
- -**5. Axioms of probability For each event E, we denote P(E) as the probability of event E occuring.** - -⟶ - -
- -**6. Axiom 1 ― Every probability is between 0 and 1 included, i.e:** - -⟶ - -
- -**7. Axiom 2 ― The probability that at least one of the elementary events in the entire sample space will occur is 1, i.e:** - -⟶ - -
- -**8. Axiom 3 ― For any sequence of mutually exclusive events E1,...,En, we have:** - -⟶ - -
- -**9. Permutation ― A permutation is an arrangement of r objects from a pool of n objects, in a given order. The number of such arrangements is given by P(n,r), defined as:** - -⟶ - -
- -**10. Combination ― A combination is an arrangement of r objects from a pool of n objects, where the order does not matter. The number of such arrangements is given by C(n,r), defined as:** - -⟶ - -
- -**11. Remark: we note that for 0⩽r⩽n, we have P(n,r)⩾C(n,r)** - -⟶ - -
- -**12. Conditional Probability** - -⟶ - -
- -**13. Bayes' rule ― For events A and B such that P(B)>0, we have:** - -⟶ - -
- -**14. Remark: we have P(A∩B)=P(A)P(B|A)=P(A|B)P(B)** - -⟶ - -
- -**15. Partition ― Let {Ai,i∈[[1,n]]} be such that for all i, Ai≠∅. We say that {Ai} is a partition if we have:** - -⟶ - -
- -**16. Remark: for any event B in the sample space, we have P(B)=n∑i=1P(B|Ai)P(Ai).** - -⟶ - -
- -**17. Extended form of Bayes' rule ― Let {Ai,i∈[[1,n]]} be a partition of the sample space. We have:** - -⟶ - -
- -**18. Independence ― Two events A and B are independent if and only if we have:** - -⟶ - -
- -**19. Random Variables** - -⟶ - -
- -**20. Definitions** - -⟶ - -
- -**21. Random variable ― A random variable, often noted X, is a function that maps every element in a sample space to a real line.** - -⟶ - -
- -**22. Cumulative distribution function (CDF) ― The cumulative distribution function F, which is monotonically non-decreasing and is such that limx→−∞F(x)=0 and limx→+∞F(x)=1, is defined as:** - -⟶ - -
- -**23. Remark: we have P(a - -**24. Probability density function (PDF) ― The probability density function f is the probability that X takes on values between two adjacent realizations of the random variable.** - -⟶ - -
- -**25. Relationships involving the PDF and CDF ― Here are the important properties to know in the discrete (D) and the continuous (C) cases.** - -⟶ - -
- -**26. [Case, CDF F, PDF f, Properties of PDF]** - -⟶ - -
- -**27. Expectation and Moments of the Distribution ― Here are the expressions of the expected value E[X], generalized expected value E[g(X)], kth moment E[Xk] and characteristic function ψ(ω) for the discrete and continuous cases:** - -⟶ - -
- -**28. Variance ― The variance of a random variable, often noted Var(X) or σ2, is a measure of the spread of its distribution function. It is determined as follows:** - -⟶ - -
- -**29. Standard deviation ― The standard deviation of a random variable, often noted σ, is a measure of the spread of its distribution function which is compatible with the units of the actual random variable. It is determined as follows:** - -⟶ - -
- -**30. Transformation of random variables ― Let the variables X and Y be linked by some function. By noting fX and fY the distribution function of X and Y respectively, we have:** - -⟶ - -
- -**31. Leibniz integral rule ― Let g be a function of x and potentially c, and a,b boundaries that may depend on c. We have:** - -⟶ - -
- -**32. Probability Distributions** - -⟶ - -
- -**33. Chebyshev's inequality ― Let X be a random variable with expected value μ. For k,σ>0, we have the following inequality:** - -⟶ - -
- -**34. Main distributions ― Here are the main distributions to have in mind:** - -⟶ - -
- -**35. [Type, Distribution]** - -⟶ - -
- -**36. Jointly Distributed Random Variables** - -⟶ - -
- -**37. Marginal density and cumulative distribution ― From the joint density probability function fXY , we have** - -⟶ - -
- -**38. [Case, Marginal density, Cumulative function]** - -⟶ - -
- -**39. Conditional density ― The conditional density of X with respect to Y, often noted fX|Y, is defined as follows:** - -⟶ - -
- -**40. Independence ― Two random variables X and Y are said to be independent if we have:** - -⟶ - -
- -**41. Covariance ― We define the covariance of two random variables X and Y, that we note σ2XY or more commonly Cov(X,Y), as follows:** - -⟶ - -
- -**42. Correlation ― By noting σX,σY the standard deviations of X and Y, we define the correlation between the random variables X and Y, noted ρXY, as follows:** - -⟶ - -
- -**43. Remark 1: we note that for any random variables X,Y, we have ρXY∈[−1,1].** - -⟶ - -
- -**44. Remark 2: If X and Y are independent, then ρXY=0.** - -⟶ - -
- -**45. Parameter estimation** - -⟶ - -
- -**46. Definitions** - -⟶ - -
- -**47. Random sample ― A random sample is a collection of n random variables X1,...,Xn that are independent and identically distributed with X.** - -⟶ - -
- -**48. Estimator ― An estimator is a function of the data that is used to infer the value of an unknown parameter in a statistical model.** - -⟶ - -
- -**49. Bias ― The bias of an estimator ^θ is defined as being the difference between the expected value of the distribution of ^θ and the true value, i.e.:** - -⟶ - -
- -**50. Remark: an estimator is said to be unbiased when we have E[^θ]=θ.** - -⟶ - -
- -**51. Estimating the mean** - -⟶ - -
- -**52. Sample mean ― The sample mean of a random sample is used to estimate the true mean μ of a distribution, is often noted ¯¯¯¯¯X and is defined as follows:** - -⟶ - -
- -**53. Remark: the sample mean is unbiased, i.e E[¯¯¯¯¯X]=μ.** - -⟶ - -
- -**54. Central Limit Theorem ― Let us have a random sample X1,...,Xn following a given distribution with mean μ and variance σ2, then we have:** - -⟶ - -
- -**55. Estimating the variance** - -⟶ - -
- -**56. Sample variance ― The sample variance of a random sample is used to estimate the true variance σ2 of a distribution, is often noted s2 or ^σ2 and is defined as follows:** - -⟶ - -
- -**57. Remark: the sample variance is unbiased, i.e E[s2]=σ2.** - -⟶ - -
- -**58. Chi-Squared relation with sample variance ― Let s2 be the sample variance of a random sample. We have:** - -⟶ - -
- -**59. [Introduction, Sample space, Event, Permutation]** - -⟶ - -
- -**60. [Conditional probability, Bayes' rule, Independence]** - -⟶ - -
- -**61. [Random variables, Definitions, Expectation, Variance]** - -⟶ - -
- -**62. [Probability distributions, Chebyshev's inequality, Main distributions]** - -⟶ - -
- -**63. [Jointly distributed random variables, Density, Covariance, Correlation]** - -⟶ - -
- -**64. [Parameter estimation, Mean, Variance]** - -⟶ From dc82cf3e80465cac291ab9c200222d3320347a47 Mon Sep 17 00:00:00 2001 From: Wooil <38076110+WooilJeong@users.noreply.github.com> Date: Fri, 26 Oct 2018 14:14:57 +0900 Subject: [PATCH 027/531] Update refresher-probability.md --- ko/refresher-probability.md | 67 ++++++++++++++++++------------------- 1 file changed, 33 insertions(+), 34 deletions(-) diff --git a/ko/refresher-probability.md b/ko/refresher-probability.md index 277908e50..2328d96f7 100644 --- a/ko/refresher-probability.md +++ b/ko/refresher-probability.md @@ -1,5 +1,4 @@ - **1. Probabilities and Statistics refresher** ⟶확률과 통계 @@ -44,19 +43,19 @@ **8. Axiom 3 ― For any sequence of mutually exclusive events E1,...,En, we have:** -⟶공리 3 ― 서로 배반인 어떤 연속적인 사건 E1,...,En 에 대하여, 우리는: +⟶공리 3 ― 서로 배반인 어떤 연속적인 사건 E1,...,En 에 대하여, 우리는 다음을 가집니다:
**9. Permutation ― A permutation is an arrangement of r objects from a pool of n objects, in a given order. The number of such arrangements is given by P(n,r), defined as:** -⟶순열(Permutation) ― 순열은 n개의 객체들로부터 r개의 객체들의 순서를 고려한 배열입니다. 그러한 배열의 수는 P (n, r)에 의해 주어지며, 다음과 같이 정의됩니다. +⟶순열(Permutation) ― 순열은 n개의 객체들로부터 r개의 객체들의 순서를 고려한 배열입니다. 그러한 배열의 수는 P (n, r)에 의해 주어지며, 다음과 같이 정의됩니다:
**10. Combination ― A combination is an arrangement of r objects from a pool of n objects, where the order does not matter. The number of such arrangements is given by C(n,r), defined as:** -⟶조합(Combination) ― 조합은 n개의 객체들로부터 r개의 객체들의 순서를 고려하지 않은 배열입니다. 그러한 배열의 수는 다음과 같이 정의되는 C(n, r)에 의해 주어집니다. +⟶조합(Combination) ― 조합은 n개의 객체들로부터 r개의 객체들의 순서를 고려하지 않은 배열입니다. 그러한 배열의 수는 다음과 같이 정의되는 C(n, r)에 의해 주어집니다:
@@ -74,7 +73,7 @@ **13. Bayes' rule ― For events A and B such that P(B)>0, we have:** -⟶베이즈 규칙 ― P(B)>0인 사건 A, B에 대해, 우리는: +⟶베이즈 규칙 ― P(B)>0인 사건 A, B에 대해, 우리는 다음을 가집니다:
@@ -86,73 +85,73 @@ **15. Partition ― Let {Ai,i∈[[1,n]]} be such that for all i, Ai≠∅. We say that {Ai} is a partition if we have:** -⟶분할 ― +⟶파티션(Partition)― {Ai, i∈ [[1, n]]}은 모든 i에 대해 Ai ≠ ∅이라고 해봅시다. 우리는 {Ai}가 다음과 같은 경우 파티션이라고 말합니다.
**16. Remark: for any event B in the sample space, we have P(B)=n∑i=1P(B|Ai)P(Ai).** -⟶ +⟶비고 : 표본 공간에서 어떤 사건 B에 대해서 우리는 P(B) = nΣi = 1P (B | Ai) P (Ai)를 가집니다.
**17. Extended form of Bayes' rule ― Let {Ai,i∈[[1,n]]} be a partition of the sample space. We have:** -⟶ +⟶베이즈 규칙의 확장된 형태 ― {Ai,i∈[[1,n]]}를 표본 공간의 파티션이라고 합시다. 우리는 다음을 가집니다.:
**18. Independence ― Two events A and B are independent if and only if we have:** -⟶ +⟶독립성 ― 다음의 경우에만 두 사건 A, B가 독립적입니다:
**19. Random Variables** -⟶ +⟶확률 변수
**20. Definitions** -⟶ +⟶정의
**21. Random variable ― A random variable, often noted X, is a function that maps every element in a sample space to a real line.** -⟶ +⟶확률 변수 ― 주로 X라고 표기된 확률 변수는 표본 공간의 모든 요소를 ​​실선에 대응시키는 함수입니다.
**22. Cumulative distribution function (CDF) ― The cumulative distribution function F, which is monotonically non-decreasing and is such that limx→−∞F(x)=0 and limx→+∞F(x)=1, is defined as:** -⟶ +⟶누적 분포 함수 (CDF) ― 단조 감소하지 않고 limx → -∞F (x) = 0 이고, limx → + ∞F (x) = 1 인 누적 분포 함수 F는 다음과 같이 정의됩니다:
**23. Remark: we have P(a **24. Probability density function (PDF) ― The probability density function f is the probability that X takes on values between two adjacent realizations of the random variable.** -⟶ +⟶확률 밀도 함수 (PDF) ― 확률 밀도 함수 f는 인접한 두 확률 변수의 사이에 X가 포함될 확률입니다.
**25. Relationships involving the PDF and CDF ― Here are the important properties to know in the discrete (D) and the continuous (C) cases.** -⟶ +⟶PDF와 CDF의 관계 ― 이산 (D)과 연속 (C) 예시에서 알아야 할 중요한 특성이 있습니다.
**26. [Case, CDF F, PDF f, Properties of PDF]** -⟶ +⟶[예시, CDF F, PDF f, PDF의 특성]
@@ -194,7 +193,7 @@ **33. Chebyshev's inequality ― Let X be a random variable with expected value μ. For k,σ>0, we have the following inequality:** -⟶ +⟶체비쇼프 부등식
@@ -290,94 +289,94 @@ **49. Bias ― The bias of an estimator ^θ is defined as being the difference between the expected value of the distribution of ^θ and the true value, i.e.:** -⟶ +⟶편향 ― 추정량 ^θ의 편향은
**50. Remark: an estimator is said to be unbiased when we have E[^θ]=θ.** -⟶ +⟶비고 : 추정량은 E [^ θ] = θ 일 때, 비 편향적이라고 말합니다.
**51. Estimating the mean** -⟶ +⟶평균 추정
**52. Sample mean ― The sample mean of a random sample is used to estimate the true mean μ of a distribution, is often noted ¯¯¯¯¯X and is defined as follows:** -⟶ +⟶표본 평균 ― 랜덤 표본의 표본 평균은 분포의 실제 평균 μ를 추정하는 데 사용되며 종종 다음과 같이 정의됩니다:
**53. Remark: the sample mean is unbiased, i.e E[¯¯¯¯¯X]=μ.** -⟶ +⟶비고 : 표본 평균은 비 편향적입니다, 즉i.e E[¯¯¯¯¯X]=μ.
**54. Central Limit Theorem ― Let us have a random sample X1,...,Xn following a given distribution with mean μ and variance σ2, then we have:** -⟶ +⟶중심 극한 정리 ― 평균 μ와 분산 σ2를 갖는 주어진 분포를 따르는 랜덤 표본 X1, ..., Xn을 가정해 봅시다 그러면 우리는 다음을 가집니다:
**55. Estimating the variance** -⟶ +⟶분산 추정
**56. Sample variance ― The sample variance of a random sample is used to estimate the true variance σ2 of a distribution, is often noted s2 or ^σ2 and is defined as follows:** -⟶ +⟶표본 분산 ― 랜덤 표본의 표본 분산은 분포의 실제 분산 σ2를 추정하는 데 사용되며 종종 s2 또는 σ2로 표기되며 다음과 같이 정의됩니다:
**57. Remark: the sample variance is unbiased, i.e E[s2]=σ2.** -⟶ +⟶비고 : 표본 분산은 비 편향적입니다, 즉 E[s2]=σ2.
**58. Chi-Squared relation with sample variance ― Let s2 be the sample variance of a random sample. We have:** -⟶ +⟶표본 분산과 카이 제곱의 관계 ― s2를 랜덤 표본의 표분 분산이라고 합시다. 우리는 다음을 가집니다:
**59. [Introduction, Sample space, Event, Permutation]** -⟶ +⟶[소개, 표본 공간, 사건, 순열]
**60. [Conditional probability, Bayes' rule, Independence]** -⟶ +⟶[조건부 확률, 베이즈 규칙, 독립]
**61. [Random variables, Definitions, Expectation, Variance]** -⟶ +⟶[확률 변수, 정의, 기대값, 분산]
**62. [Probability distributions, Chebyshev's inequality, Main distributions]** -⟶ +⟶[확률 분포, 체비쇼프 부등식, 주요 분포]
**63. [Jointly distributed random variables, Density, Covariance, Correlation]** -⟶ +⟶[결합 분포의 확률 변수, 밀도, 공분산, 상관관계]
**64. [Parameter estimation, Mean, Variance]** -⟶ +⟶[모수 추정, 평균, 분산] From b22ae6281089cb553108b2753d42ae4b31d15dcc Mon Sep 17 00:00:00 2001 From: Wooil <38076110+WooilJeong@users.noreply.github.com> Date: Fri, 26 Oct 2018 15:00:31 +0900 Subject: [PATCH 028/531] Update refresher-probability.md --- ko/refresher-probability.md | 49 ++++++++++++++++++------------------- 1 file changed, 24 insertions(+), 25 deletions(-) diff --git a/ko/refresher-probability.md b/ko/refresher-probability.md index 2328d96f7..53ec90c53 100644 --- a/ko/refresher-probability.md +++ b/ko/refresher-probability.md @@ -157,145 +157,144 @@ **27. Expectation and Moments of the Distribution ― Here are the expressions of the expected value E[X], generalized expected value E[g(X)], kth moment E[Xk] and characteristic function ψ(ω) for the discrete and continuous cases:** -⟶ +⟶분포의 기대값과 적률 ― 이산 혹은 연속일 때, 기대값 E[X], 일반화된 기대값 E[g(X)], k번째 적률 E[Xk] 및 특성 함수 ψ(ω) :
**28. Variance ― The variance of a random variable, often noted Var(X) or σ2, is a measure of the spread of its distribution function. It is determined as follows:** -⟶ +⟶분산 (Variance) ― 주로 Var(X) 또는 σ2이라고 표기된 확률 변수의 분산은 분포 함수의 산포(Spread)를 측정한 값입니다. 이는 다음과 같이 결정됩니다:
**29. Standard deviation ― The standard deviation of a random variable, often noted σ, is a measure of the spread of its distribution function which is compatible with the units of the actual random variable. It is determined as follows:** -⟶ - +⟶표준 편차(Standard Deviation) ― 표준 편차는 실제 확률 변수의 단위를 사용할 수 있는 분포 함수의 산포(Spread)를 측정하는 측도입니다. 이는 다음과 같이 결정됩니다:
**30. Transformation of random variables ― Let the variables X and Y be linked by some function. By noting fX and fY the distribution function of X and Y respectively, we have:** -⟶ +⟶확률 변수의 변환 ― 변수 X와 Y를 어떤 함수로 연결되도록 해봅시다. fX와 fY에 각각 X와 Y의 분포 함수를 표기하면 다음과 같습니다:
**31. Leibniz integral rule ― Let g be a function of x and potentially c, and a,b boundaries that may depend on c. We have:** -⟶ +⟶라이프니츠 적분 규칙 ― g를 x의 함수로, 잠재적으로 c라고 해봅시다. 그리고 c에 종속적인 경계 a, b에 대해 우리는 다음을 가집니다:
**32. Probability Distributions** -⟶ +⟶확률 분포
**33. Chebyshev's inequality ― Let X be a random variable with expected value μ. For k,σ>0, we have the following inequality:** -⟶체비쇼프 부등식 +⟶체비쇼프 부등식 ― X를 기대값 μ의 확률 변수라고 해봅시다. k에 대하여, σ>0이면 다음과 같은 부등식을 가집니다:
**34. Main distributions ― Here are the main distributions to have in mind:** -⟶ +⟶주요 분포들― 기억해야 할 주요 분포들이 여기 있습니다:
**35. [Type, Distribution]** -⟶ +⟶[타입(Type), 분포]
**36. Jointly Distributed Random Variables** -⟶ +⟶결합 분포 확률 변수
**37. Marginal density and cumulative distribution ― From the joint density probability function fXY , we have** -⟶ +⟶주변 밀도와 누적 분포 ― 결합 밀도 확률 함수 fXY로부터 우리는 다음을 가집니다
**38. [Case, Marginal density, Cumulative function]** -⟶ +⟶[예시, 주변 밀도, 누적 함수]
**39. Conditional density ― The conditional density of X with respect to Y, often noted fX|Y, is defined as follows:** -⟶ +⟶조건부 밀도 ― 주로 fX|Y로 표기되는 Y에 대한 X의 조건부 밀도는 다음과 같이 정의됩니다:
**40. Independence ― Two random variables X and Y are said to be independent if we have:** -⟶ +⟶독립성 ― 두 확률 변수 X와 Y는 다음과 같은 경우에 독립적이라고 합니다:
**41. Covariance ― We define the covariance of two random variables X and Y, that we note σ2XY or more commonly Cov(X,Y), as follows:** -⟶ +⟶공분산 ― 다음과 같이 두 확률 변수 X와 Y의 공분산을 σ2XY 혹은 더 일반적으로는 Cov(X,Y)로 정의합니다:
**42. Correlation ― By noting σX,σY the standard deviations of X and Y, we define the correlation between the random variables X and Y, noted ρXY, as follows:** -⟶ +⟶상관관계 ― σX, σY로 X와 Y의 표준 편차를 표기함으로써 ρXY로 표기된 임의의 변수 X와 Y 사이의 상관관계를 다음과 같이 정의합니다:
**43. Remark 1: we note that for any random variables X,Y, we have ρXY∈[−1,1].** -⟶ +⟶비고 1 : 우리는 임의의 확률 변수 X, Y에 대해 ρXY∈ [-1,1]를 가진다고 말합니다.
**44. Remark 2: If X and Y are independent, then ρXY=0.** -⟶ +⟶비고 2 : X와 Y가 독립이라면 ρXY=0입니다.
**45. Parameter estimation** -⟶ +⟶모수 추정
**46. Definitions** -⟶ +⟶정의
**47. Random sample ― A random sample is a collection of n random variables X1,...,Xn that are independent and identically distributed with X.** -⟶ +⟶확률 표본 ― 확률 표본은 X와 독립적으로 동일하게 분포하는 n개의 확률 변수 X1, ..., Xn의 모음입니다.
**48. Estimator ― An estimator is a function of the data that is used to infer the value of an unknown parameter in a statistical model.** -⟶ +⟶추정량 ― 추정량은 통계 모델에서 알 수 없는 모수의 값을 추론하는 데 사용되는 데이터의 함수입니다.
**49. Bias ― The bias of an estimator ^θ is defined as being the difference between the expected value of the distribution of ^θ and the true value, i.e.:** -⟶편향 ― 추정량 ^θ의 편향은 +⟶편향 ― 추정량 ^θ의 편향은 ^θ 분포의 기대값과 실제값 사이의 차이로 정의됩니다. 즉,:
**50. Remark: an estimator is said to be unbiased when we have E[^θ]=θ.** -⟶비고 : 추정량은 E [^ θ] = θ 일 때, 비 편향적이라고 말합니다. +⟶비고 : 추정량은 E [^ θ]=θ 일 때, 비 편향적이라고 말합니다.
From b660e6a3b92f57bdafc00cbee89845ca7f3784a0 Mon Sep 17 00:00:00 2001 From: kwanghyeokahn <44485235+kwanghyeokahn@users.noreply.github.com> Date: Fri, 26 Oct 2018 17:58:53 +0900 Subject: [PATCH 029/531] Update cheatsheet-unsupervised-learning.md --- ko/cheatsheet-unsupervised-learning.md | 92 +++++++++++++------------- 1 file changed, 46 insertions(+), 46 deletions(-) diff --git a/ko/cheatsheet-unsupervised-learning.md b/ko/cheatsheet-unsupervised-learning.md index 3f9df3dd0..c76e2d3af 100644 --- a/ko/cheatsheet-unsupervised-learning.md +++ b/ko/cheatsheet-unsupervised-learning.md @@ -1,204 +1,204 @@ **1. Unsupervised Learning cheatsheet** -⟶ 하하 +⟶ 비지도 학습 cheatsheet
**2. Introduction to Unsupervised Learning** -⟶ +⟶ 비지도 학습 소개
**3. Motivation ― The goal of unsupervised learning is to find hidden patterns in unlabeled data {x(1),...,x(m)}.** -⟶ +⟶ 동기부여 - 비지도학습의 목표는 {x(1),...,x(m)}와 같이 라벨링이 되어있지 않은 데이터 내의 숨겨진 패턴을 찾는것이다.
**4. Jensen's inequality ― Let f be a convex function and X a random variable. We have the following inequality:** -⟶ +⟶ 옌센 부등식 - f를 볼록함수로 하며 X는 확률변수로 두고 아래와 같은 부등식을 따르도록 하자.
**5. Clustering** -⟶ +⟶ 군집화
**6. Expectation-Maximization** -⟶ +⟶ 기댓값 최대화
**7. Latent variables ― Latent variables are hidden/unobserved variables that make estimation problems difficult, and are often denoted z. Here are the most common settings where there are latent variables:** -⟶ +⟶ 잠재변수 - 잠재변수들은 숨겨져있거나 관측되지 않는 변수들을 말하며, 이러한 변수들은 추정문제의 어려움을 가져온다. 그리고 잠재변수는 종종 z로 표기되어진다. 일반적인 잠재변수로 구성되어져있는 형태들을 살펴보자
**8. [Setting, Latent variable z, Comments]** -⟶ +⟶ 표기형태, 잠재변수 z, 주석
**9. [Mixture of k Gaussians, Factor analysis]** -⟶ +⟶ 가우시안 혼합모델, 요인분석
-**10. Algorithm ― The Expectation-Maximization (EM) algorithm gives an efficient method at estimating the parameter θ through maximum likelihood estimation by repeatedly constructing a lower-bound on the likelihood (E-step) and optimizing that lower bound (M-step) as follows:** +**10. Algorithm ― The Expectation-Maximization (EM) algorithm gives an efficient method at estimating the parameter θ through maximum likelihood estimation by repeatedly constructing a lower-bound on the likelihood (E-step) and optimizing that lower bound (M-step) as follows:** -⟶ +⟶ 알고리즘 - 기댓값 최대화 (EM) 알고리즘은 모수 θ를 추정하는 효율적인 방법을 제공해준다. 모수 θ의 추정은 아래와 같이 우도의 아래 경계지점을 구성하는(E-step)과 그 우도의 아래 경계지점을 최적화하는(M-step)들의 반복적인 최대우도측정을 통해 추정된다.
**11. E-step: Evaluate the posterior probability Qi(z(i)) that each data point x(i) came from a particular cluster z(i) as follows:** -⟶ +⟶ E-step : 각각의 데이터 포인트 x(i)은 특정 클러스터 z(i)로 부터 발생한 후 사후확률Qi(z(i))를 평가한다. 아래의 식 참조
**12. M-step: Use the posterior probabilities Qi(z(i)) as cluster specific weights on data points x(i) to separately re-estimate each cluster model as follows:** -⟶ +⟶ M-step : 데이터 포인트 x(i)에 대한 클러스트의 특정 가중치로 사후확률 Qi(z(i))을 사용, 각 클러스트 모델을 개별적으로 재평가한다. 아래의 식 참조
**13. [Gaussians initialization, Expectation step, Maximization step, Convergence]** -⟶ +⟶ Gaussians 초기값, 기대 단계, 최대화 단계, 수렴
**14. k-means clustering** -⟶ +⟶ k-평균 군집화
**15. We note c(i) the cluster of data point i and μj the center of cluster j.** -⟶ +⟶ c(i)는 데이터 포인트 i 와 j군집의 중앙인 μj 들의 군집이다.
**16. Algorithm ― After randomly initializing the cluster centroids μ1,μ2,...,μk∈Rn, the k-means algorithm repeats the following step until convergence:** -⟶ +⟶ 알고리즘 - 군집 중앙에 μ1,μ2,...,μk∈Rn 와 같이 무작위로 초기값을 잡은 후, k-평균 알고리즘이 수렴될때 까지 아래와 같은 단계를 반복한다.
**17. [Means initialization, Cluster assignment, Means update, Convergence]** -⟶ +⟶ 평균 초기값, 군집분할, 평균 재조정, 수렴
**18. Distortion function ― In order to see if the algorithm converges, we look at the distortion function defined as follows:** -⟶ +⟶ 왜곡 함수 - 알고리즘이 수렴하는지를 확인하기 위해서는 아래와 같은 왜곡함수를 정의해야 합니다.
**19. Hierarchical clustering** -⟶ +⟶ 계층적 군집분석
**20. Algorithm ― It is a clustering algorithm with an agglomerative hierarchical approach that build nested clusters in a successive manner.** -⟶ +⟶ 알고리즘 - 연속적 방식으로 중첩된 클러스트를 구축하는 결합형 계층적 접근방식을 사용하는 군집 알고리즘이다.
**21. Types ― There are different sorts of hierarchical clustering algorithms that aims at optimizing different objective functions, which is summed up in the table below:** -⟶ +⟶ 종류 - 다양한 목적함수의 최적화를 목표로하는 다양한 종류의 계층적 군집분석 알고리즘들이 있으며, 아래 표와 같이 요약되어 있다.
**22. [Ward linkage, Average linkage, Complete linkage]** -⟶ +⟶ Ward 연결법, 평균 연결법, 완전 연결법
**23. [Minimize within cluster distance, Minimize average distance between cluster pairs, Minimize maximum distance of between cluster pairs]** -⟶ +⟶ 군집 거리 내에서의 최소화, 한쌍의 군집간 평균거리의 최소화, 한쌍의 군집간 최대거리의 최소화
**24. Clustering assessment metrics** -⟶ +⟶ 군집화 평가 metrics
**25. In an unsupervised learning setting, it is often hard to assess the performance of a model since we don't have the ground truth labels as was the case in the supervised learning setting.** -⟶ +⟶ 비지도학습 환경에서는, 지도학습 환경과는 다르게 실측자료에 라벨링이 없기 때문에 종종 모델에 대한 성능평가가 어렵다.
**26. Silhouette coefficient ― By noting a and b the mean distance between a sample and all other points in the same class, and between a sample and all other points in the next nearest cluster, the silhouette coefficient s for a single sample is defined as follows:** -⟶ +⟶ 실루엣 계수 - a와 b를 같은 클래스의 다른 모든점과 샘플 사이의 평균거리와 다음 가장 가까운 군집의 다른 모든 점과 샘플사이의 평균거리로 표기하면 단일 샘플에 대한 실루엣 계수 s는 다음과 같이 정의할 수 있다.
**27. Calinski-Harabaz index ― By noting k the number of clusters, Bk and Wk the between and within-clustering dispersion matrices respectively defined as** -⟶ +⟶ Calinski-Harabaz 색인 - k개 군집에 Bk와 Wk를 표기하면, 다음과 같이 각각 정의 된 군집간 분산행렬이다.
**28. the Calinski-Harabaz index s(k) indicates how well a clustering model defines its clusters, such that the higher the score, the more dense and well separated the clusters are. It is defined as follows:** -⟶ +⟶ Calinski-Harabaz 색인 s(k)는 군집모델이 군집화를 얼마나 잘 정의하는지를 나타낸다. 가령 높은 점수일수록 군집이 더욱 밀도있으며 잘 분리되는 형태이다. 아래와 같은 정의를 따른다.
**29. Dimension reduction** -⟶ +⟶ 차원 축소
**30. Principal component analysis** -⟶ +⟶ 주성분 분석
**31. It is a dimension reduction technique that finds the variance maximizing directions onto which to project the data.** -⟶ +⟶ 차원축소 기술은 데이터를 반영하는 최대 분산방향을 찾는 기술입니다.
**32. Eigenvalue, eigenvector ― Given a matrix A∈Rn×n, λ is said to be an eigenvalue of A if there exists a vector z∈Rn∖{0}, called eigenvector, such that we have:** -⟶ +⟶ 고유값, 고유벡터 - A∈Rn×n 행렬이 주어질때, λ는 A의 고유값이 되며, 만약 z∈Rn∖{0} 벡터가 있다면 고유함수이다.
**33. Spectral theorem ― Let A∈Rn×n. If A is symmetric, then A is diagonalizable by a real orthogonal matrix U∈Rn×n. By noting Λ=diag(λ1,...,λn), we have:** -⟶ +⟶ 스펠트럼 정리 - A∈Rn×n 이라고 하자 만약 A가 대칭이라면, A는 실수 직교 행렬 U∈Rn×n에 의해 대각행렬로 만들 수 있다.
-**34. diagonal** +**34. diagonal** -⟶ +⟶ 대각선
@@ -211,7 +211,7 @@ **36. Algorithm ― The Principal Component Analysis (PCA) procedure is a dimension reduction technique that projects the data on k dimensions by maximizing the variance of the data as follows:** -⟶ +⟶ 알고리즘 - 주성분 분석
@@ -253,7 +253,7 @@ dimensions by maximizing the variance of the data as follows:** **43. Independent component analysis** -⟶ +⟶ 독립성분분석
@@ -265,13 +265,13 @@ dimensions by maximizing the variance of the data as follows:** **45. Assumptions ― We assume that our data x has been generated by the n-dimensional source vector s=(s1,...,sn), where si are independent random variables, via a mixing and non-singular matrix A as follows:** -⟶ +⟶ 가정 - 우리는 data x가 n차원의 source vector s=(s1,...,sn)에서부터 생성되었음을 가정한다. 이때 si는 독립적인 확률변수에서 나왔으며,
**46. The goal is to find the unmixing matrix W=A−1.** -⟶ +⟶ 목표는
@@ -295,19 +295,19 @@ dimensions by maximizing the variance of the data as follows:** **50. Therefore, the stochastic gradient ascent learning rule is such that for each training example x(i), we update W as follows:** -⟶ +⟶
**51. The Machine Learning cheatsheets are now available in Japanese.** -⟶ +⟶ 머신러닝 cheatsheet들은 일본어로도 이용 가능하다
**52. Original authors** -⟶ +⟶ 원작자
@@ -325,16 +325,16 @@ dimensions by maximizing the variance of the data as follows:** **55. [Introduction, Motivation, Jensen's inequality]** -⟶ +⟶ 소개, 동기부여, 얀센 부등식
**56. [Clustering, Expectation-Maximization, k-means, Hierarchical clustering, Metrics]** -⟶ +⟶ 군집화, 기댓값-최대화, k-means,
**57. [Dimension reduction, PCA, ICA]** -⟶ +⟶ 차원축소, 주성분분석(PCA), From 41c3949dc3776d8d4d57088b436bb2b2d650e8c8 Mon Sep 17 00:00:00 2001 From: sy95lee <37721312+sy95lee@users.noreply.github.com> Date: Fri, 26 Oct 2018 17:58:54 +0900 Subject: [PATCH 030/531] Update refresher-linear-algebra.md --- ko/refresher-linear-algebra.md | 56 +++++++++++++++++----------------- 1 file changed, 28 insertions(+), 28 deletions(-) diff --git a/ko/refresher-linear-algebra.md b/ko/refresher-linear-algebra.md index a6b440d1e..0f648361e 100644 --- a/ko/refresher-linear-algebra.md +++ b/ko/refresher-linear-algebra.md @@ -1,150 +1,150 @@ **1. Linear Algebra and Calculus refresher** -⟶ +⟶ 선형대수와 미적분학 복습
**2. General notations** -⟶ +⟶ 일반적인 개념
**3. Definitions** -⟶ +⟶ 정의
**4. Vector ― We note x∈Rn a vector with n entries, where xi∈R is the ith entry:** -⟶ +⟶ 벡터 - x∈Rn는 n개의 요소를 가진 벡터이고, xi∈R는 i번째 요소이다.
**5. Matrix ― We note A∈Rm×n a matrix with m rows and n columns, where Ai,j∈R is the entry located in the ith row and jth column:** -⟶ +⟶ 행렬 - A∈Rm×n는 m개의 행과 n개의 열을 가진 행렬이고, Ai,j∈R는 i번째 행, j번째 열에 있는 원소이다.
**6. Remark: the vector x defined above can be viewed as a n×1 matrix and is more particularly called a column-vector.** -⟶ +⟶ 위에서 정의된 벡터 x는 n×1행렬로 볼 수 있으며, 열벡터라고도 불린다.
**7. Main matrices** -⟶ +⟶ 주요 행렬
**8. Identity matrix ― The identity matrix I∈Rn×n is a square matrix with ones in its diagonal and zero everywhere else:** -⟶ +⟶ 단위행렬 - 단위행렬 I∈Rn×n는 대각성분이 모두 1이고 대각성분이 아닌 성분은 모두 0인 정사각행렬이다.
**9. Remark: for all matrices A∈Rn×n, we have A×I=I×A=A.** -⟶ +⟶ remark: 모든 행렬 A∈Rn×n에 대하여, A×I=I×A=A를 만족한다.
**10. Diagonal matrix ― A diagonal matrix D∈Rn×n is a square matrix with nonzero values in its diagonal and zero everywhere else:** -⟶ +⟶ 대각행렬 - 대각행렬 D∈Rn×n는 대각성분은 모두 0이 아니고, 대각성분이 아닌 성분은 모두 0인 정사각행렬이다.
**11. Remark: we also note D as diag(d1,...,dn).** -⟶ +⟶ D를 diag(d1,...,dn)라고도 표시한다.
**12. Matrix operations** -⟶ +⟶ 행렬 연산
**13. Multiplication** -⟶ +⟶ 곱셈
**14. Vector-vector ― There are two types of vector-vector products:** -⟶ +⟶ 벡터-벡터 - 벡터간 연산에는 두가지 종류가 있다.
**15. inner product: for x,y∈Rn, we have:** -⟶ +⟶ 내적 : x,y∈Rn에 대하여,
**16. outer product: for x∈Rm,y∈Rn, we have:** -⟶ +⟶ 외적 : x∈Rm,y∈Rn에 대하여,
**17. Matrix-vector ― The product of matrix A∈Rm×n and vector x∈Rn is a vector of size Rn, such that:** -⟶ +⟶ 행렬-벡터 - 행렬 A∈Rm×n와 벡터 x∈Rn의 곱은 다음을 만족하는 Rn크기의 벡터이다.
**18. where aTr,i are the vector rows and ac,j are the vector columns of A, and xi are the entries of x.** -⟶ +⟶ aTr,i는 A의 벡터행, ac,j는 A의 벡터열, xi는 x의 성분이다.
**19. Matrix-matrix ― The product of matrices A∈Rm×n and B∈Rn×p is a matrix of size Rn×p, such that:** -⟶ +⟶ 행렬 A∈Rm×n와 행렬 B∈Rn×p의 곱은 다음을 만족하는 Rn×p크기의 행렬이다.
**20. where aTr,i,bTr,i are the vector rows and ac,j,bc,j are the vector columns of A and B respectively** -⟶ +⟶ aTr,i,bTr,i는 A,B의 벡터행, ac,j,bc,j는 A,B의 벡터열이다.
**21. Other operations** -⟶ +⟶ 그 외 연산
**22. Transpose ― The transpose of a matrix A∈Rm×n, noted AT, is such that its entries are flipped:** -⟶ +⟶ 전치 - 행렬 A∈Rm×n의 전치 AT는 모든 성분을 뒤집은 것이다.
**23. Remark: for matrices A,B, we have (AB)T=BTAT** -⟶ +⟶ 행렬 A,B에 대하여, (AB)T=BTAT가 성립힌다.
**24. Inverse ― The inverse of an invertible square matrix A is noted A−1 and is the only matrix such that:** -⟶ +⟶ 역행렬 - 가역행렬 A의 역행렬은 A-1로 표기하며, 유일하다.
**25. Remark: not all square matrices are invertible. Also, for matrices A,B, we have (AB)−1=B−1A−1** -⟶ +⟶ 모든 정사각행렬이 역행렬을 갖는 것은 아니다. 그리고, 행렬 A,B에 대하여 (AB)−1=B−1A−1가 성립힌다.
@@ -156,13 +156,13 @@ **27. Remark: for matrices A,B, we have tr(AT)=tr(A) and tr(AB)=tr(BA)** -⟶ +⟶ 행렬 A,B에 대하여, tr(AT)=tr(A)와 tr(AB)=tr(BA)가 성립힌다.
**28. Determinant ― The determinant of a square matrix A∈Rn×n, noted |A| or det(A) is expressed recursively in terms of A∖i,∖j, which is the matrix A without its ith row and jth column, as follows:** -⟶ +⟶ 행렬식 - 정사각행렬 A∈Rn×n의 행렬식 |A| 또는 det(A)는
@@ -174,7 +174,7 @@ **30. Matrix properties** -⟶ +⟶ 행렬의 성질
From bcf3cd19c5b3163b7a7649c3fd59726eb6eb38ec Mon Sep 17 00:00:00 2001 From: qunaieer Date: Sun, 28 Oct 2018 00:17:31 +0300 Subject: [PATCH 031/531] Update ar/cheatsheet-supervised-learning.md First draft is finished, I need to go back and heaveliy revise it. It is still not finished. --- ar/cheatsheet-supervised-learning.md | 110 +++++++++++++-------------- 1 file changed, 55 insertions(+), 55 deletions(-) diff --git a/ar/cheatsheet-supervised-learning.md b/ar/cheatsheet-supervised-learning.md index 4e5b2c2cd..2967cb7ca 100644 --- a/ar/cheatsheet-supervised-learning.md +++ b/ar/cheatsheet-supervised-learning.md @@ -241,187 +241,187 @@ Support Vector Machines **41: The goal of support vector machines is to find the line that maximizes the minimum distance to the line.** -تهدف Support Vector Machines إلى العثور على الخط الذي يعظم المسافة الدنيا إلى الخط: +تهدف Support Vector Machines إلى العثور على الخط الذي يعظم أصغر مسافة إلى الخط:
**42: Optimal margin classifier ― The optimal margin classifier h is such that:** -⟶ +خوارزمية تصنيف الهامش الأمثل (Optimal margin classifier) - تعرَّف خوارزمية تصنيف الهامش الأمثل h كالتالي:
**43: where (w,b)∈Rn×R is the solution of the following optimization problem:** -⟶ +حيث (w,b)∈Rn×R هو الحل لمشكلة التحسين (optimization) التالية:
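For reference, the optimization problem that item 43 refers to is, in its usual primal form and with the cheatsheet's convention wTx−b=0 for the line:

```latex
\min_{w, b} \ \frac{1}{2} \|w\|^{2}
\qquad \text{such that} \qquad
y^{(i)} \big(w^{T} x^{(i)} - b\big) \geqslant 1, \quad i = 1, \ldots, m
```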
**44. such that** -⟶ +بحيث
**45. support vectors** -⟶ +المتجهات الداعمة (support vectors)
**46. Remark: the line is defined as wTx−b=0.** -⟶ +ملاحظة: يتم تعريف الخط بهذه المعادلة wTx−b=0.
**47. Hinge loss ― The hinge loss is used in the setting of SVMs and is defined as follows:** -⟶ +الفرق المفصلي (Hinge loss) - يستخدم الفرق المفصلي في حل SVM ويعرف على النحو التالي:
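For reference, the hinge loss mentioned in item 47 is usually written, for a raw model output z and a target y∈{−1,1}, as:

```latex
L(z, y) = \max(0, 1 - yz) = [1 - yz]_{+}
```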
**48. Kernel ― Given a feature mapping ϕ, we define the kernel K to be defined as:** -⟶ +النواة (Kernel) - إذا كان لدينا دالة تحويل الخصائص (features) ϕ، يمكننا تعريف النواة K كالتالي:
**49. In practice, the kernel K defined by K(x,z)=exp(−||x−z||22σ2) is called the Gaussian kernel and is commonly used.** -⟶ +عملياً تُعَرَّف الدالة K عن طريق المعادلة K(x,z)=exp(−||x−z||22σ2)، ويطلق عليها النواة الجاوسية (Gaussian kernel)، وهي من الأكثر استخداماً.
**50. [Non-linear separability, Use of a kernel mapping, Decision boundary in the original space]** -⟶ +[فصل غير خطي، استخدام النواة للتحويل، خط القرار في الفضاء الأصلي]
**51. Remark: we say that we use the "kernel trick" to compute the cost function using the kernel because we actually don't need to know the explicit mapping ϕ, which is often very complicated. Instead, only the values K(x,z) are needed.** -⟶ +ملاحظة: نقول أننا نستخدم "حيلة النواة" لحساب دالة التكلفة عند استخدام النواة لأننا في الحقيقة لا نحتاج أن نعرف التحويل الصريح ϕ، الذي يكون في الغالب شديد التعقيد. ولكن، نحتاج أن فقط أن نحسب القيم K(x,z).
**52. Lagrangian ― We define the Lagrangian L(w,b) as follows:** -⟶ +اللّاغرانجي (Lagrangian) - يتم تعريف اللّاغرانجي L(w,b) على النحو التالي:
**53. Remark: the coefficients βi are called the Lagrange multipliers.** -⟶ +ملاحظة: المعاملات (coefficients) βi يطلق عليها مضروبات لاغرانج (Lagrange multipliers).
**54. Generative Learning** -⟶ +التعلم التوليدي (Generative Learning)
**55. A generative model first tries to learn how the data is generated by estimating P(x|y), which we can then use to estimate P(y|x) by using Bayes' rule.** -⟶ +النموذج التوليدي في البداية يحاول أن يتعلم كيف تم توليد البيانات عن طريق تقدير P(x|y)، التي يمكن حينها استخدامها لتقدير P(y|x) باستخدام قانون بايز (Bayes' rule).
**56. Gaussian Discriminant Analysis** -⟶ +تحليل التمايز الجاوسي (Gaussian Discriminant Analysis)
**57. Setting ― The Gaussian Discriminant Analysis assumes that y and x|y=0 and x|y=1 are such that:** -⟶ +الإطار - تحليل التمايز الجاوسي يفترض أن y و x|y=0 و x|y=1 بحيث يكونوا كالتالي:
**58. Estimation ― The following table sums up the estimates that we find when maximizing the likelihood:** -⟶ +التقدير - الجدول التالي يلخص أهم التي يمكننا التوصل لها عند تعظيم الأرجحية (likelihood):
**59. Naive Bayes** -⟶ +بايز البسيط (Naive Bayes)
**60. Assumption ― The Naive Bayes model supposes that the features of each data point are all independent:** -⟶ +الافتراض - يفترض نموذج بايز البسيط أن جميع الخصائص لكل نقطة بيانات مستقلة (independent):
**61. Solutions ― Maximizing the log-likelihood gives the following solutions, with k∈{0,1},l∈[[1,L]]** -⟶ +الحل - تعظيم الأرجحية اللوغاريثمية (log-likelihood) يعطينا الحلول التالية إذا كان k∈{0,1}، l∈[[1,L]]:
**62. Remark: Naive Bayes is widely used for text classification and spam detection.** -⟶ +ملاحظة: بايز البسيط يستخدم بشكل واسع لتصنيف النصوص واكتشاف البريد الاكتروني المزعج.
**63. Tree-based and ensemble methods** -⟶ +الطرق الشجرية (tree-based) والمجموعية (ensemble)
**64. These methods can be used for both regression and classification problems.** -⟶ +هذه الطرق يمكن استخدامها لكلٍ من مشاكل الارتباط (regression) والتصنيف (classification).
**65. CART ― Classification and Regression Trees (CART), commonly known as decision trees, can be represented as binary trees. They have the advantage to be very interpretable.** -⟶ +CART - التصنيف والارتباط الشجري (CART)، والاسم الشائع له أشجار القرار (decision trees)، يمكن أن يمثل كأشجار ثنائية (binary trees). من المزايا لهذه الطريقة إمكانية تفسيرها بسهولة.
**66. Random forest ― It is a tree-based technique that uses a high number of decision trees built out of randomly selected sets of features. Contrary to the simple decision tree, it is highly uninterpretable but its generally good performance makes it a popular algorithm.** -⟶ +الغابة العشوائية (Random forest) - هي أحد الطرق الشجرية التي تستخدم عدداً كبيراً من أشجار القرار مبنية باستخدام مجموعة عشوائية من الخصائص. بخلاف شجرة القرار البسيطة، لا يمكن تفسير النموذج بسهولة، ولكن أدائها العالي جعلها أحد الخوارزمية المشهورة.
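If scikit-learn is available, a random forest of the kind described in item 66 can be tried as follows; the synthetic dataset and the hyperparameters are only for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# toy data: 200 samples, 10 features, binary labels
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# 100 decision trees, each grown on randomly selected samples and features
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
print(clf.score(X, y))
```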
**67. Remark: random forests are a type of ensemble methods.** -⟶ +ملاحظة: أشجار القرار نوع من الخوارزميات المجموعية (ensemble).
**68. Boosting ― The idea of boosting methods is to combine several weak learners to form a stronger one. The main ones are summed up in the table below:** -⟶ +التعزيز (Boosting) - فكرة خوارزميات التعزيز هي دمج عدة خوارزميات تعلم ضعيفة لتكوين نموذج قوي. الطرق الأساسية ملخصة في الجدول التالي:
**69. [Adaptive boosting, Gradient boosting]** -⟶ +[الدعم المتكيف (Adaptive boosting)، الدعم التفاضلي (Gradient boosting)]
**70. High weights are put on errors to improve at the next boosting step** -⟶ +يتم التركيز على مواطن الخطأ لتحسين النتيجة في الخطوة التالية.
**71. Weak learners trained on remaining errors** -⟶ +يتم تدريب خوارزميات التعلم الضعيفة على الأخطاء المتبقية.
@@ -429,140 +429,140 @@ Support Vector Machines ⟶ -
+طرق أخرى غير حدودية (non-parametric) **73. k-nearest neighbors ― The k-nearest neighbors algorithm, commonly known as k-NN, is a non-parametric approach where the response of a data point is determined by the nature of its k neighbors from the training set. It can be used in both classification and regression settings.** -⟶ +خوارزمية أقرب الجيران (k-nearest neighbors) - تعتبر خوارزمية أقرب الجيران، وتعرف بـ k-NN، طريقة غير حدودية حيث يتم تحديد نتيجة نقطة من البيانات من خلال عدد k من البيانات المجاورة في مجموعة التدريب. ويمكن استخدامها للتصنيف والارتباط.
**74. Remark: The higher the parameter k, the higher the bias, and the lower the parameter k, the higher the variance.** -⟶ +ملاحظة: كلما زاد المُعامل k، كلما زاد الانحياز (bias)، وكلما نقص k، زاد التباين (variance).
**75. Learning Theory** -⟶ +نظرية التعلُّم
**76. Union bound ― Let A1,...,Ak be k events. We have:** -⟶ +حدود الاتّحاد (Union bound) - لنجعل A1,...,Ak تمثل k حدث. فيكون لدينا:
**77. Hoeffding inequality ― Let Z1,..,Zm be m iid variables drawn from a Bernoulli distribution of parameter ϕ. Let ˆϕ be their sample mean and γ>0 fixed. We have:** -⟶ +لا مساواة هوفدينج (Hoeffding) - لنجعل Z1,..,Zm تمثل m متغير مستقلة وموزعة بشكل مماثل (iid) مأخوذة من توزيع برنولي (Bernoulli distribution) ذا معامل ϕ. لنجعل ˆϕ متوسط العينة (sample mean) و γ>0 ثابت. فيكون لدينا:
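For reference, the bound that item 77 refers to is standardly written as:

```latex
P\big(|\phi - \hat{\phi}| > \gamma\big) \leqslant 2 \exp\big(-2 \gamma^{2} m\big)
```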
**78. Remark: this inequality is also known as the Chernoff bound.** -⟶ +ملاحظة: هذه اللا مساواة تعرف كذلك بحد كيرنوف (Chernoff bound).
**79. Training error ― For a given classifier h, we define the training error ˆϵ(h), also known as the empirical risk or empirical error, to be as follows:** -⟶ +خطأ التدريب - ليكن لدينا خوارزمية التصنيف h، يمكن تعريف خطأ التدريب ˆϵ(h)، ويعرف كذلك بالخطر التجريبي أو الخطأ التجريبي، كالتالي:
**80. Probably Approximately Correct (PAC) ― PAC is a framework under which numerous results on learning theory were proved, and has the following set of assumptions: ** -⟶ +Probably Approximately Correct (PAC) - PAC هو إطار يتم من خلاله إثبات العديد من نظريات التعلم، ويحتوي على الافتراضات التالية:
**81: the training and testing sets follow the same distribution ** -⟶ +مجموعتي التدريب والاختبار تتبعان نفس التوزيع.
**82. the training examples are drawn independently** -⟶ +عينات التدريب تؤخذ بشكل مستقل.
**83. Shattering ― Given a set S={x(1),...,x(d)}, and a set of classifiers H, we say that H shatters S if for any set of labels {y(1),...,y(d)}, we have:** -⟶ +Shattering - إذا كان لدينا المجموعة S={x(1),...,x(d)}، ومجموعة نماذج H، نقول أن H shatters S إذا كان لكل مجموعة أهداف (labels) {y(1),...,y(d)} لدينا:
**84. Upper bound theorem ― Let H be a finite hypothesis class such that |H|=k and let δ and the sample size m be fixed. Then, with probability of at least 1−δ, we have:** -⟶ +نظرية الحد الأعلى (Upper bound theorem) - لنجعل H فئة فرضية محدودة (finite hypothesis class) بحيث |H|=k، و δ وحجم العينة m ثابتين. حينها سيكون لدينا، مع احتمال على الأقل 1−δ، التالي:
**85. VC dimension ― The Vapnik-Chervonenkis (VC) dimension of a given infinite hypothesis class H, noted VC(H) is the size of the largest set that is shattered by H.** -⟶ +بُعد VC - بُعد فابنك-شيرفونينكز (Vapnik-Chervonenkis) لفئة فرضية محدودة (finite hypothesis class) H، ويرمز له بـ VC(H)، هو حجم أكبر مجموعة (set) التي shattered by H.
**86. Remark: the VC dimension of H={set of linear classifiers in 2 dimensions} is 3.** -⟶ +ملاحظة: بُعد VC لـ H = {مجموعة التصنيفات الخطية في بُعدين} يساوي 3.
**87. Theorem (Vapnik) ― Let H be given, with VC(H)=d and m the number of training examples. With probability at least 1−δ, we have:** -⟶ +نظرية فابنك (Vapnik) - ليكن لدينا H، مع VC(H)=d وعدد عيّنات التدريب m. سيكون لدينا، مع احتمال على الأقل 1−δ، التالي:
**88. [Introduction, Type of prediction, Type of model]** -⟶ +[مقدمة، نوع التوقع، نوع النموذج]
**89. [Notations and general concepts, loss function, gradient descent, likelihood]** -⟶ +[تعريفات ومفاهيم أساسية، دالة الفرق، الهبوط التفاضلي، الأرجحية]
**90. [Linear models, linear regression, logistic regression, generalized linear models]** -⟶ +[النماذج الخطيّة، الارتباط الخطّي، الارتباط اللوجستي، النماذج الخطية العامة]
**91. [Support vector machines, Optimal margin classifier, Hinge loss, Kernel]** -⟶ +[Support vector machines، خوارزمية تصنيف الهامش الأمثل، الفرق المفصلي، النواة]
**92. [Generative learning, Gaussian Discriminant Analysis, Naive Bayes]** -⟶ +[التعلم التوليدي، تحليل التمايز الجاوسي، بايز البسيط]
**93. [Trees and ensemble methods, CART, Random forest, Boosting]** -⟶ +[الطرق الشجرية والمجموعية، التصنيف والارتباط الشجري (CART)، الغابة العشوائية (Random forest)، التعزيز (Boosting)]
**94. [Other methods, k-NN]** -⟶ +[طرق أخرى، خوارزمية أقرب الجيران (k-NN)]
**95. [Learning theory, Hoeffding inequality, PAC, VC dimension]** -⟶ +[نظرية التعلُّم، لا مساواة هوفدينج (Hoeffding)، PAC، بُعد VC] From cf4827a1359c65aa73b8521c1e60af1274702457 Mon Sep 17 00:00:00 2001 From: kwanghyeokahn <44485235+kwanghyeokahn@users.noreply.github.com> Date: Mon, 29 Oct 2018 09:57:09 +0900 Subject: [PATCH 032/531] Update cheatsheet-unsupervised-learning.md --- ko/cheatsheet-unsupervised-learning.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/ko/cheatsheet-unsupervised-learning.md b/ko/cheatsheet-unsupervised-learning.md index c76e2d3af..76a0d5361 100644 --- a/ko/cheatsheet-unsupervised-learning.md +++ b/ko/cheatsheet-unsupervised-learning.md @@ -192,7 +192,7 @@ **33. Spectral theorem ― Let A∈Rn×n. If A is symmetric, then A is diagonalizable by a real orthogonal matrix U∈Rn×n. By noting Λ=diag(λ1,...,λn), we have:** -⟶ 스펠트럼 정리 - A∈Rn×n 이라고 하자 만약 A가 대칭이라면, A는 실수 직교 행렬 U∈Rn×n에 의해 대각행렬로 만들 수 있다. +⟶ 스펙트럼 정리 - A∈Rn×n 이라고 하자 만약 A가 대칭이라면, A는 실수 직교 행렬 U∈Rn×n에 의해 대각행렬로 만들 수 있다.
@@ -204,7 +204,7 @@ **35. Remark: the eigenvector associated with the largest eigenvalue is called principal eigenvector of matrix A.** -⟶ +⟶
From 5d50d992eedd3d8fae021edcfc3bb5ac9f8f760c Mon Sep 17 00:00:00 2001 From: Wooil <38076110+WooilJeong@users.noreply.github.com> Date: Tue, 30 Oct 2018 08:16:38 +0900 Subject: [PATCH 033/531] Update CONTRIBUTORS --- CONTRIBUTORS | 3 +++ 1 file changed, 3 insertions(+) diff --git a/CONTRIBUTORS b/CONTRIBUTORS index 27d30d4fc..6f55eb95c 100644 --- a/CONTRIBUTORS +++ b/CONTRIBUTORS @@ -65,6 +65,9 @@ --hi +--ko + Wooil Jeong (translation of machine learning tips and tricks) + --ja --pt From fe746f2d431a0f2fbe11c26d89b989126fdd99ed Mon Sep 17 00:00:00 2001 From: Wooil <38076110+WooilJeong@users.noreply.github.com> Date: Tue, 30 Oct 2018 08:18:38 +0900 Subject: [PATCH 034/531] Update CONTRIBUTORS --- CONTRIBUTORS | 3 +++ 1 file changed, 3 insertions(+) diff --git a/CONTRIBUTORS b/CONTRIBUTORS index 27d30d4fc..9c4b2c733 100644 --- a/CONTRIBUTORS +++ b/CONTRIBUTORS @@ -65,6 +65,9 @@ --hi +--ko + Wooil Jeong (translation of probabilities and statistics) + --ja --pt From f2439f278140f68e2364a600c39d850d8778a358 Mon Sep 17 00:00:00 2001 From: kwanghyeokahn <44485235+kwanghyeokahn@users.noreply.github.com> Date: Thu, 1 Nov 2018 15:55:04 +0900 Subject: [PATCH 035/531] Update cheatsheet-unsupervised-learning.md --- ko/cheatsheet-unsupervised-learning.md | 44 +++++++++++++------------- 1 file changed, 22 insertions(+), 22 deletions(-) diff --git a/ko/cheatsheet-unsupervised-learning.md b/ko/cheatsheet-unsupervised-learning.md index 76a0d5361..aab39f85f 100644 --- a/ko/cheatsheet-unsupervised-learning.md +++ b/ko/cheatsheet-unsupervised-learning.md @@ -102,7 +102,7 @@ **18. Distortion function ― In order to see if the algorithm converges, we look at the distortion function defined as follows:** -⟶ 왜곡 함수 - 알고리즘이 수렴하는지를 확인하기 위해서는 아래와 같은 왜곡함수를 정의해야 합니다. +⟶ 왜곡 함수 - 알고리즘이 수렴하는지를 확인하기 위해서는 아래와 같은 왜곡함수를 정의해야 한다.
@@ -120,7 +120,7 @@ **21. Types ― There are different sorts of hierarchical clustering algorithms that aims at optimizing different objective functions, which is summed up in the table below:** -⟶ 종류 - 다양한 목적함수의 최적화를 목표로하는 다양한 종류의 계층적 군집분석 알고리즘들이 있으며, 아래 표와 같이 요약되어 있다. +⟶ 종류 - 다양한 목적함수의 최적화를 목표로하는 다양한 종류의 계층적 군집분석 알고리즘들이 있으며, 아래 표와 같이 요약되어있다.
@@ -180,7 +180,7 @@ **31. It is a dimension reduction technique that finds the variance maximizing directions onto which to project the data.** -⟶ 차원축소 기술은 데이터를 반영하는 최대 분산방향을 찾는 기술입니다. +⟶ 차원축소 기술은 데이터를 반영하는 최대 분산방향을 찾는 기술이다.
@@ -204,50 +204,50 @@ **35. Remark: the eigenvector associated with the largest eigenvalue is called principal eigenvector of matrix A.** -⟶ +⟶ 참조: 가장 큰 고유값과 연관된 고유 벡터를 행렬 A의 주요 고유벡터라고 부른다
**36. Algorithm ― The Principal Component Analysis (PCA) procedure is a dimension reduction technique that projects the data on k dimensions by maximizing the variance of the data as follows:** -⟶ 알고리즘 - 주성분 분석 +⟶ 알고리즘 - 주성분 분석(PCA) 절차는 데이터 분산을 최대화하여 k 차원의 데이터를 투영하는 차원 축소 기술로 다음과 같이 따른다.
**37. Step 1: Normalize the data to have a mean of 0 and standard deviation of 1.** -⟶ +⟶ 1단계: 평균을 0으로 표준편차가 1이되도록 데이터를 표준화한다.
**38. Step 2: Compute Σ=1mm∑i=1x(i)x(i)T∈Rn×n, which is symmetric with real eigenvalues.** -⟶ +⟶ 2단계: 실수 고유값을 가지는 대칭 행렬인 Σ=1mm∑i=1x(i)x(i)T∈Rn×n를 계산한다.<br>
**39. Step 3: Compute u1,...,uk∈Rn the k orthogonal principal eigenvectors of Σ, i.e. the orthogonal eigenvectors of the k largest eigenvalues.** -⟶ +⟶ 3단계: k 직교 고유벡터의 합을 u1,...,uk∈Rn와 같이 계산한다. 다시말하면, 가장 큰 고유값 k의 직교 고유벡터이다.
**40. Step 4: Project the data on spanR(u1,...,uk).** -⟶ +⟶ 4단계: R(u1,...,uk) 범위에 데이터를 투영하자.
**41. This procedure maximizes the variance among all k-dimensional spaces.** -⟶ +⟶ 해당 절차는 모든 k-차원의 공간들 사이에 분산을 최대화 하는것이다.
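Entries 36 to 41 describe the four PCA steps. A minimal NumPy sketch of those steps; the function name pca and the random data are illustrative assumptions:

```python
import numpy as np

def pca(X, k):
    """Sketch of the PCA procedure of entries 36-41 for an (m, n) data matrix X."""
    # Step 1: normalize to mean 0 and standard deviation 1
    X = (X - X.mean(axis=0)) / X.std(axis=0)
    # Step 2: Sigma = (1/m) * sum_i x(i) x(i)^T, symmetric with real eigenvalues
    sigma = X.T @ X / X.shape[0]
    # Step 3: orthogonal eigenvectors of the k largest eigenvalues
    eigvals, eigvecs = np.linalg.eigh(sigma)
    U = eigvecs[:, np.argsort(eigvals)[::-1][:k]]
    # Step 4: project the data on span(u1, ..., uk)
    return X @ U

X = np.random.randn(500, 5) @ np.random.randn(5, 5)   # made-up correlated data
print(pca(X, k=2).shape)                               # (500, 2)
```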
**42. [Data in feature space, Find principal components, Data in principal components space]** -⟶ +⟶ 변수공간의 데이터, 주요성분들 찾기, 주요성분공간의 데이터
@@ -259,67 +259,67 @@ dimensions by maximizing the variance of the data as follows:** **44. It is a technique meant to find the underlying generating sources.** -⟶ +⟶ 근원적인 생성원을 찾기위한 기술을 의미한다.
**45. Assumptions ― We assume that our data x has been generated by the n-dimensional source vector s=(s1,...,sn), where si are independent random variables, via a mixing and non-singular matrix A as follows:** -⟶ 가정 - 우리는 data x가 n차원의 source vector s=(s1,...,sn)에서부터 생성되었음을 가정한다. 이때 si는 독립적인 확률변수에서 나왔으며, +⟶ 가정 - 다음과 같이 우리는 데이터 x가 n차원의 소스벡터 s=(s1,...,sn)에서부터 생성되었음을 가정한다. 이때 si는 독립적인 확률변수에서 나왔으며, 혼합 및 비특이 행렬 A를 통해 생성된다고 가정한다.
**46. The goal is to find the unmixing matrix W=A−1.** -⟶ 목표는 +⟶ 비혼합 행렬 W=A−1를 찾는 것을 목표로 한다.
**47. Bell and Sejnowski ICA algorithm ― This algorithm finds the unmixing matrix W by following the steps below:** -⟶ +⟶ Bell과 Sejnowski 독립성분분석(ICA) 알고리즘 - 다음의 단계들을 따르는 비혼합 행렬 W를 찾는 알고리즘이다.
**48. Write the probability of x=As=W−1s as:** -⟶ +⟶ x=As=W−1s의 확률을 다음과 같이 기술한다.
**49. Write the log likelihood given our training data {x(i),i∈[[1,m]]} and by noting g the sigmoid function as:** -⟶ +⟶ 주어진 학습데이터 {x(i),i∈[[1,m]]}에 로그우도를 기술하고 시그모이드 함수 g를 다음과 같이 표기한다.
**50. Therefore, the stochastic gradient ascent learning rule is such that for each training example x(i), we update W as follows:** -⟶ +⟶ 그러므로, 확률적 경사상승 학습 규칙은 각 학습예제 x(i)에 대해서 다음과 같이 W를 업데이트하는 것과 같다.
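The update rule of entry 50 can be sketched in a few lines. The mixing matrix A, the Laplace sources and the learning rate below are made-up assumptions; only the Bell and Sejnowski update itself follows the entries above:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def ica_update(W, x, lr=0.01):
    """One Bell & Sejnowski stochastic gradient ascent step (entries 47-50):
    W <- W + lr * ((1 - 2*g(Wx)) x^T + (W^T)^-1), with g the sigmoid."""
    y = sigmoid(W @ x)
    return W + lr * (np.outer(1.0 - 2.0 * y, x) + np.linalg.inv(W.T))

rng = np.random.default_rng(0)
S = rng.laplace(size=(2, 1000))          # independent non-Gaussian sources (assumption)
A = np.array([[1.0, 0.5], [0.3, 1.0]])   # made-up non-singular mixing matrix
X = A @ S
W = np.eye(2)
for x in X.T:
    W = ica_update(W, x)
print(W @ A)   # ideally moves toward a scaled permutation of the identity
```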
**51. The Machine Learning cheatsheets are now available in Japanese.** -⟶ 머신러닝 cheatsheet들은 일본어로도 이용 가능하다 +⟶ 머신러닝 cheatsheets는 현재 일본어로 제공된다.
**52. Original authors** -⟶ 원작자 +⟶ 원저자
**53. Translated by X, Y and Z** -⟶ +⟶ X,Y,Z에 의해 번역되다.
**54. Reviewed by X, Y and Z** -⟶ +⟶ X,Y,Z에 의해 검토되다.
From 638941e55b72a919d841b75620c862541aa3288d Mon Sep 17 00:00:00 2001 From: kwanghyeokahn <44485235+kwanghyeokahn@users.noreply.github.com> Date: Thu, 1 Nov 2018 16:00:15 +0900 Subject: [PATCH 036/531] Update CONTRIBUTORS --- CONTRIBUTORS | 3 +++ 1 file changed, 3 insertions(+) diff --git a/CONTRIBUTORS b/CONTRIBUTORS index 27d30d4fc..5bc3ef12f 100644 --- a/CONTRIBUTORS +++ b/CONTRIBUTORS @@ -65,6 +65,9 @@ --hi +--ko + Kwang Hyeok Ahn (translation of Unsupervised Learning) + --ja --pt From 07ebeb3b6b08eab1de89d438434369e6ecd26b6a Mon Sep 17 00:00:00 2001 From: kwanghyeokahn <44485235+kwanghyeokahn@users.noreply.github.com> Date: Thu, 1 Nov 2018 16:19:13 +0900 Subject: [PATCH 037/531] Update cheatsheet-unsupervised-learning.md --- ko/cheatsheet-unsupervised-learning.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/ko/cheatsheet-unsupervised-learning.md b/ko/cheatsheet-unsupervised-learning.md index aab39f85f..b31611788 100644 --- a/ko/cheatsheet-unsupervised-learning.md +++ b/ko/cheatsheet-unsupervised-learning.md @@ -331,10 +331,10 @@ dimensions by maximizing the variance of the data as follows:** **56. [Clustering, Expectation-Maximization, k-means, Hierarchical clustering, Metrics]** -⟶ 군집화, 기댓값-최대화, k-means, +⟶ 군집화, 기댓값-최대화, k-means, 계층 군집화, 측정지표
**57. [Dimension reduction, PCA, ICA]** -⟶ 차원축소, 주성분분석(PCA), +⟶ 차원축소, 주성분분석(PCA), 독립성분분석(ICA) From 19ae7cc1cb4316833a97dc04c167477d74dd1b86 Mon Sep 17 00:00:00 2001 From: kwanghyeokahn <44485235+kwanghyeokahn@users.noreply.github.com> Date: Thu, 1 Nov 2018 16:19:56 +0900 Subject: [PATCH 038/531] Update cheatsheet-unsupervised-learning.md --- ko/cheatsheet-unsupervised-learning.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/ko/cheatsheet-unsupervised-learning.md b/ko/cheatsheet-unsupervised-learning.md index b31611788..acf881d5d 100644 --- a/ko/cheatsheet-unsupervised-learning.md +++ b/ko/cheatsheet-unsupervised-learning.md @@ -331,7 +331,7 @@ dimensions by maximizing the variance of the data as follows:** **56. [Clustering, Expectation-Maximization, k-means, Hierarchical clustering, Metrics]** -⟶ 군집화, 기댓값-최대화, k-means, 계층 군집화, 측정지표 +⟶ 군집화, 기댓값-최대화, k-means, 계층적 군집화, 측정지표
From 23a0005b8f3406db269cd3fcbef515fa9cd3f81e Mon Sep 17 00:00:00 2001 From: Shervine Amidi Date: Thu, 1 Nov 2018 15:20:58 -0700 Subject: [PATCH 039/531] Fix language name on template --- ko/cheatsheet-unsupervised-learning.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/ko/cheatsheet-unsupervised-learning.md b/ko/cheatsheet-unsupervised-learning.md index acf881d5d..47f231a98 100644 --- a/ko/cheatsheet-unsupervised-learning.md +++ b/ko/cheatsheet-unsupervised-learning.md @@ -299,7 +299,7 @@ dimensions by maximizing the variance of the data as follows:**
-**51. The Machine Learning cheatsheets are now available in Japanese.** +**51. The Machine Learning cheatsheets are now available in Korean.** ⟶ 머신러닝 cheatsheets는 현재 일본어로 제공된다. From bbc6889a11ad8aa939d18c112400192606e99ff1 Mon Sep 17 00:00:00 2001 From: kwanghyeokahn <44485235+kwanghyeokahn@users.noreply.github.com> Date: Fri, 2 Nov 2018 10:39:35 +0900 Subject: [PATCH 040/531] Update cheatsheet-unsupervised-learning.md --- ko/cheatsheet-unsupervised-learning.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/ko/cheatsheet-unsupervised-learning.md b/ko/cheatsheet-unsupervised-learning.md index 47f231a98..e961a88cc 100644 --- a/ko/cheatsheet-unsupervised-learning.md +++ b/ko/cheatsheet-unsupervised-learning.md @@ -301,7 +301,7 @@ dimensions by maximizing the variance of the data as follows:** **51. The Machine Learning cheatsheets are now available in Korean.** -⟶ 머신러닝 cheatsheets는 현재 일본어로 제공된다. +⟶ 머신러닝 cheatsheets는 현재 한국어로 제공된다.
From 5b431586dd6c561e7b1b8d0a3ab1ac091b539374 Mon Sep 17 00:00:00 2001 From: sy95lee <37721312+sy95lee@users.noreply.github.com> Date: Mon, 5 Nov 2018 17:57:16 +0900 Subject: [PATCH 041/531] Update refresher-linear-algebra.md --- ko/refresher-linear-algebra.md | 16 ++++++++-------- 1 file changed, 8 insertions(+), 8 deletions(-) diff --git a/ko/refresher-linear-algebra.md b/ko/refresher-linear-algebra.md index 0f648361e..40068181e 100644 --- a/ko/refresher-linear-algebra.md +++ b/ko/refresher-linear-algebra.md @@ -180,19 +180,19 @@ **31. Definitions** -⟶ +⟶ 정의
**32. Symmetric decomposition ― A given matrix A can be expressed in terms of its symmetric and antisymmetric parts as follows:** -⟶ +⟶ 대칭 분해 - 주어진 행렬 A는 다음과 같이 대칭과 비대칭 부분으로 표현될 수 있다.
**33. [Symmetric, Antisymmetric]** -⟶ +⟶ [대칭, 비대칭]
@@ -216,7 +216,7 @@ **37. For x∈V, the most commonly used norms are summed up in the table below:** -⟶ +⟶ x∈V에 대해, 가장 일반적으로 사용되는 규범이 아래 표에 요약되어 있다.
@@ -228,19 +228,19 @@ **39. Linearly dependence ― A set of vectors is said to be linearly dependent if one of the vectors in the set can be defined as a linear combination of the others.** -⟶ +⟶ 일차 종속 - 집합 내의 벡터 중 하나가 다른 벡터들의 선형결합으로 정의될 수 있으면, 그 벡터 집합은 일차 종속이라고 한다.
**40. Remark: if no vector can be written this way, then the vectors are said to be linearly independent** -⟶ +⟶ 비고 : 어느 벡터도 이런 방식으로 표현될 수 없다면, 그 벡터들은 일차 독립이라고 한다.
**41. Matrix rank ― The rank of a given matrix A is noted rank(A) and is the dimension of the vector space generated by its columns. This is equivalent to the maximum number of linearly independent columns of A.** -⟶ +⟶ 행렬 랭크 - 주어진 행렬 A의 랭크는 열에 의해 생성된 벡터공간의 차원이고, rank(A)라고 쓴다.
@@ -258,7 +258,7 @@ **44. Eigenvalue, eigenvector ― Given a matrix A∈Rn×n, λ is said to be an eigenvalue of A if there exists a vector z∈Rn∖{0}, called eigenvector, such that we have:** -⟶ +⟶ 고유값, 고유벡터 - 주어진 행렬 A∈Rn×n에 대하여, 다음을 만족하는 벡터 z∈Rn∖{0}가 존재하면, z를 고유벡터라고 부르고, λ를 A의 고유값이라고 부른다.
From 248dfbdc8e6a1acb3f84bd7e78fad9d1582dd7ff Mon Sep 17 00:00:00 2001 From: Redouane Lguensat Date: Sun, 11 Nov 2018 01:11:19 +0100 Subject: [PATCH 042/531] Update cheatsheet-unsupervised-learning.md --- ar/cheatsheet-unsupervised-learning.md | 28 +++++++++++++++----------- 1 file changed, 16 insertions(+), 12 deletions(-) diff --git a/ar/cheatsheet-unsupervised-learning.md b/ar/cheatsheet-unsupervised-learning.md index 8c91dabd3..b7ffe6002 100644 --- a/ar/cheatsheet-unsupervised-learning.md +++ b/ar/cheatsheet-unsupervised-learning.md @@ -87,14 +87,14 @@ **13. [Gaussians initialization, Expectation step, Maximization step, Convergence]**
-[تهيئة غاوسية، خطوة التوقع، خطوة التعظيم، التقاء] +[ استهلالات غاوسية، خطوة التوقع، خطوة التعظيم، تقارب]

**14. k-means clustering**
-تجميع k-أوساط +تجميع k-متوسطات

@@ -107,32 +107,36 @@ **16. Algorithm ― After randomly initializing the cluster centroids μ1,μ2,...,μk∈Rn, the k-means algorithm repeats the following step until convergence:** -⟶ - +
+بعد الاستهلال العشوائي لمتوسطات التجمعات μ1,μ2,...,μk∈Rn، خوارزمية تجميع k-متوسطات تكرر الخطوة التالية حتى التقارب +

**17. [Means initialization, Cluster assignment, Means update, Convergence]** -⟶ - +
+[استهلال المتوسطات، تعيين تجمع، تحديث المتوسطات، التقارب]

**18. Distortion function ― In order to see if the algorithm converges, we look at the distortion function defined as follows:** -⟶ - +
+ دالة التشويه - لكي نتأكد من أن الخوارزمية تقاربت، ننظر إلى دالة التشويه المعرفة كما يلي: +

**19. Hierarchical clustering** -⟶ - +
+ التجميع الهرمي +

**20. Algorithm ― It is a clustering algorithm with an agglomerative hierarchical approach that build nested clusters in a successive manner.** -⟶ - +
+ خوارزمية - هي عبارة عن خوارزمية تجميع تعتمد على طريقة تجميعية هرمية تبني مجموعات متداخلة بشكل متتال +

**21. Types ― There are different sorts of hierarchical clustering algorithms that aims at optimizing different objective functions, which is summed up in the table below:** From c784af3b29bab655ccd8fe89921c1c20ef2e6baf Mon Sep 17 00:00:00 2001 From: Redouane Lguensat Date: Sun, 11 Nov 2018 13:28:54 +0100 Subject: [PATCH 043/531] Update cheatsheet-unsupervised-learning.md --- ar/cheatsheet-unsupervised-learning.md | 69 +++++++++++++++----------- 1 file changed, 39 insertions(+), 30 deletions(-) diff --git a/ar/cheatsheet-unsupervised-learning.md b/ar/cheatsheet-unsupervised-learning.md index b7ffe6002..05dce41e0 100644 --- a/ar/cheatsheet-unsupervised-learning.md +++ b/ar/cheatsheet-unsupervised-learning.md @@ -141,92 +141,101 @@ **21. Types ― There are different sorts of hierarchical clustering algorithms that aims at optimizing different objective functions, which is summed up in the table below:** -⟶ - +
+أنواع هنالك عدة أنواع من خوارزميات التجميع الهرمي التي ترمي إلى تحسين دوال هدف مختلفة، هاته الأنواع ملخصة في الجدول أسفله +

**22. [Ward linkage, Average linkage, Complete linkage]** -⟶ - +
+[الربط البَينِي، الربط المتوسط، الربط الكامل]

**23. [Minimize within cluster distance, Minimize average distance between cluster pairs, Minimize maximum distance of between cluster pairs]** -⟶ - +
+[تقليل داخل مسافة التجمع، تقليل متوسط المسافات بين أزواج التجمعات، تقليل المسافة القصوى بين أزواج التجمعات]

**24. Clustering assessment metrics** -⟶ - +
+مقاييس تقدير التجميع +

**25. In an unsupervised learning setting, it is often hard to assess the performance of a model since we don't have the ground truth labels as was the case in the supervised learning setting.** -⟶ - +
+في إعداد للتعلم بدون إشراف، من الصعب غالبا تقدير أداء نموذج ما لأننا لا نتوفر على القيم الحقيقية كما كان الحال في إعداد التعلم تحت إشراف +

**26. Silhouette coefficient ― By noting a and b the mean distance between a sample and all other points in the same class, and between a sample and all other points in the next nearest cluster, the silhouette coefficient s for a single sample is defined as follows:** -⟶ - +
+المعامل الظِلِّي - إذا رمزنا aو b متوسط المسافة بين عينة و كل النقط المنتمية لنفس الصنف، و بين عينة و كل النقط المنتمية لأقرب صنف، المعامل الظِلِّي s لعينة وحيدة معرف كالتالي: +

**27. Calinski-Harabaz index ― By noting k the number of clusters, Bk and Wk the between and within-clustering dispersion matrices respectively defined as** -⟶ - +
+مؤشر كالينسكي هاراباز - إذا رمزنا بk لعدد التجمعات، Bk و Wk مصفوفات التشتت بين التجمعات و داخلها معرفة كالتالي:

**28. the Calinski-Harabaz index s(k) indicates how well a clustering model defines its clusters, such that the higher the score, the more dense and well separated the clusters are. It is defined as follows:** -⟶ - +
+مؤشر كالينسكي هاراباز s(k) يعطي تقييما للتجمعات الناتجة عن نموذج تجميعي، بحيث كلما كان التقييم أعلى كلما دل ذلك على أن التجمعات أكثر كثافة و أكثر انفصالا. هذا المؤشر معرّف كالتالي

**29. Dimension reduction** -⟶ - +
+تخفيض الأبعاد

**30. Principal component analysis** -⟶ - +
+تحليل المكون الرئيسي +

**31. It is a dimension reduction technique that finds the variance maximizing directions onto which to project the data.** -⟶ - +
+إنها تقنية لخفض الأبعاد ترمي إلى إيجاد الاتجاهات المكبرة للتباين و التي تسقط عليها البيانات +

**32. Eigenvalue, eigenvector ― Given a matrix A∈Rn×n, λ is said to be an eigenvalue of A if there exists a vector z∈Rn∖{0}, called eigenvector, such that we have:** -⟶ - +
+ قيمة ذاتية، متجه ذاتي - لتكن A∈Rn×n مصفوفة ، نقول أن λ قيمة ذاتية للمصفوفة A إذا وُجِد متجه z∈Rn∖{0} يسمى متجها ذاتيا، بحيث: +

**33. Spectral theorem ― Let A∈Rn×n. If A is symmetric, then A is diagonalizable by a real orthogonal matrix U∈Rn×n. By noting Λ=diag(λ1,...,λn), we have:** -⟶ - +
+ نظرية الطّيف لتكن A∈Rn×n. إذا كانت A متماثلة فإنها شبه قطرية بمصفوفة متعامدة U∈Rn×n. إذا رمزنا Λ=diag(λ1,...,λn) ، لدينا: +

**34. diagonal** -⟶ - +
+قطري +

**35. Remark: the eigenvector associated with the largest eigenvalue is called principal eigenvector of matrix A.** -⟶ - +
+ملحوظة: المتجه الذاتي المرتبط بأكبر قيمة ذاتية يسمى بالمتجه الذاتي الرئيسي للمصفوفة A

**36. Algorithm ― The Principal Component Analysis (PCA) procedure is a dimension reduction technique that projects the data on k From b06a4f6cfe84c4bda6a38f03e7053f7df2280afe Mon Sep 17 00:00:00 2001 From: Taichi Kato Date: Thu, 15 Nov 2018 11:29:58 +0800 Subject: [PATCH 044/531] Created Japanese translation file for deepl learning. --- ja/cheatsheet-deep-learning.md | 321 +++++++++++++++++++++++++++++++++ 1 file changed, 321 insertions(+) create mode 100644 ja/cheatsheet-deep-learning.md diff --git a/ja/cheatsheet-deep-learning.md b/ja/cheatsheet-deep-learning.md new file mode 100644 index 000000000..5ba05ba91 --- /dev/null +++ b/ja/cheatsheet-deep-learning.md @@ -0,0 +1,321 @@ +**1. Deep Learning cheatsheet** + +⟶ ディープラーニングチートシート + +
+ +**2. Neural Networks** + +⟶ ニューラルネットワーク + +
+ +**3. Neural networks are a class of models that are built with layers. Commonly used types of neural networks include convolutional and recurrent neural networks.** + +⟶ ニューラルネットワークとは複数の層を用いて組まれる数学モデルです。代表的なネットワークとして畳み込みと再帰型ニューラルネットワークが挙げられます。 + +
+ +**4. Architecture ― The vocabulary around neural networks architectures is described in the figure below:** + +⟶ 構造 - ニューラルネットワークを組む上で重要な単語: + +
+ +**5. [Input layer, hidden layer, output layer]** + +⟶ [入力層, 隠れ層, 出力層] + +
+ +**6. By noting i the ith layer of the network and j the jth hidden unit of the layer, we have:** + +⟶ [] + +
+ +**7. where we note w, b, z the weight, bias and output respectively.** + +⟶ + +
+ +**8. Activation function ― Activation functions are used at the end of a hidden unit to introduce non-linear complexities to the model. Here are the most common ones:** + +⟶ + +
+ +**9. [Sigmoid, Tanh, ReLU, Leaky ReLU]** + +⟶ + +
+ +**10. Cross-entropy loss ― In the context of neural networks, the cross-entropy loss L(z,y) is commonly used and is defined as follows:** + +⟶ + +
+ +**11. Learning rate ― The learning rate, often noted α or sometimes η, indicates at which pace the weights get updated. This can be fixed or adaptively changed. The current most popular method is called Adam, which is a method that adapts the learning rate.** + +⟶ + +
+ +**12. Backpropagation ― Backpropagation is a method to update the weights in the neural network by taking into account the actual output and the desired output. The derivative with respect to weight w is computed using chain rule and is of the following form:** + +⟶ + +
+ +**13. As a result, the weight is updated as follows:** + +⟶ + +
+ +**14. Updating weights ― In a neural network, weights are updated as follows:** + +⟶ + +
+ +**15. Step 1: Take a batch of training data.** + +⟶ + +
+ +**16. Step 2: Perform forward propagation to obtain the corresponding loss.** + +⟶ + +
+ +**17. Step 3: Backpropagate the loss to get the gradients.** + +⟶ + +
+ +**18. Step 4: Use the gradients to update the weights of the network.** + +⟶ + +
+ +**19. Dropout ― Dropout is a technique meant at preventing overfitting the training data by dropping out units in a neural network. In practice, neurons are either dropped with probability p or kept with probability 1−p** + +⟶ + +
+ +**20. Convolutional Neural Networks** + +⟶ + +
+ +**21. Convolutional layer requirement ― By noting W the input volume size, F the size of the convolutional layer neurons, P the amount of zero padding, then the number of neurons N that fit in a given volume is such that:** + +⟶ + +
+ +**22. Batch normalization ― It is a step of hyperparameter γ,β that normalizes the batch {xi}. By noting μB,σ2B the mean and variance of that we want to correct to the batch, it is done as follows:** + +⟶ + +
+ +**23. It is usually done after a fully connected/convolutional layer and before a non-linearity layer and aims at allowing higher learning rates and reducing the strong dependence on initialization.** + +⟶ + +
+ +**24. Recurrent Neural Networks** + +⟶ + +
+ +**25. Types of gates ― Here are the different types of gates that we encounter in a typical recurrent neural network:** + +⟶ + +
+ +**26. [Input gate, forget gate, gate, output gate]** + +⟶ + +
+ +**27. [Write to cell or not?, Erase a cell or not?, How much to write to cell?, How much to reveal cell?]** + +⟶ + +
+ +**28. LSTM ― A long short-term memory (LSTM) network is a type of RNN model that avoids the vanishing gradient problem by adding 'forget' gates.** + +⟶ + +
+ +**29. Reinforcement Learning and Control** + +⟶ + +
+ +**30. The goal of reinforcement learning is for an agent to learn how to evolve in an environment.** + +⟶ + +
+ +**31. Definitions** + +⟶ + +
+ +**32. Markov decision processes ― A Markov decision process (MDP) is a 5-tuple (S,A,{Psa},γ,R) where:** + +⟶ + +
+ +**33. S is the set of states** + +⟶ + +
+ +**34. A is the set of actions** + +⟶ + +
+ +**35. {Psa} are the state transition probabilities for s∈S and a∈A** + +⟶ + +
+ +**36. γ∈[0,1[ is the discount factor** + +⟶ + +
+ +**37. R:S×A⟶R or R:S⟶R is the reward function that the algorithm wants to maximize** + +⟶ + +
+ +**38. Policy ― A policy π is a function π:S⟶A that maps states to actions.** + +⟶ + +
+ +**39. Remark: we say that we execute a given policy π if given a state s we take the action a=π(s).** + +⟶ + +
+ +**40. Value function ― For a given policy π and a given state s, we define the value function Vπ as follows:** + +⟶ + +
+ +**41. Bellman equation ― The optimal Bellman equations characterizes the value function Vπ∗ of the optimal policy π∗:** + +⟶ + +
+ +**42. Remark: we note that the optimal policy π∗ for a given state s is such that:** + +⟶ + +
+ +**43. Value iteration algorithm ― The value iteration algorithm is in two steps:** + +⟶ + +
+ +**44. 1) We initialize the value:** + +⟶ + +
+ +**45. 2) We iterate the value based on the values before:** + +⟶ + +
+ +**46. Maximum likelihood estimate ― The maximum likelihood estimates for the state transition probabilities are as follows:** + +⟶ + +
+ +**47. times took action a in state s and got to s′** + +⟶ + +
+ +**48. times took action a in state s** + +⟶ + +
+ +**49. Q-learning ― Q-learning is a model-free estimation of Q, which is done as follows:** + +⟶ + +
+ +**50. View PDF version on GitHub** + +⟶ + +
+ +**51. [Neural Networks, Architecture, Activation function, Backpropagation, Dropout]** + +⟶ + +
+ +**52. [Convolutional Neural Networks, Convolutional layer, Batch normalization]** + +⟶ + +
+ +**53. [Recurrent Neural Networks, Gates, LSTM]** + +⟶ + +
+ +**54. [Reinforcement learning, Markov decision processes, Value/policy iteration, Approximate dynamic programming, Policy search]** + +⟶ From 9f4fe2ad0540a484851e38ed66b9a92ffa9e6102 Mon Sep 17 00:00:00 2001 From: Taichi Kato Date: Thu, 15 Nov 2018 18:24:32 +0800 Subject: [PATCH 045/531] Finished Neural Networks --- ja/cheatsheet-deep-learning.md | 30 +++++++++++++++--------------- 1 file changed, 15 insertions(+), 15 deletions(-) diff --git a/ja/cheatsheet-deep-learning.md b/ja/cheatsheet-deep-learning.md index 5ba05ba91..631bdd5cd 100644 --- a/ja/cheatsheet-deep-learning.md +++ b/ja/cheatsheet-deep-learning.md @@ -18,7 +18,7 @@ **4. Architecture ― The vocabulary around neural networks architectures is described in the figure below:** -⟶ 構造 - ニューラルネットワークを組む上で重要な単語: +⟶ 構造 - ニューラルネットワークを組む上で重要な用語は以下の図により説明されます:
@@ -30,85 +30,85 @@ **6. By noting i the ith layer of the network and j the jth hidden unit of the layer, we have:** -⟶ [] +⟶ iをネットワーク上のi層目の層とし、隠れ層のj個目のユニットをjとすると:
**7. where we note w, b, z the weight, bias and output respectively.** -⟶ +⟶ この場合重み付けをw、バイアス項をb、出力をzとします。
**8. Activation function ― Activation functions are used at the end of a hidden unit to introduce non-linear complexities to the model. Here are the most common ones:** -⟶ +⟶ 活性化関数 ー ユニットの出力に非線形性を与える関数を活性化関数といいます。一般的には以下の関数がよく使われます:
**9. [Sigmoid, Tanh, ReLU, Leaky ReLU]** -⟶ +⟶ [Sigmoid(シグモイド関数), Tanh(双曲線関数), ReLU(ランプ関数), Leaky ReLU]
**10. Cross-entropy loss ― In the context of neural networks, the cross-entropy loss L(z,y) is commonly used and is defined as follows:** -⟶ +⟶ 交差エントロピーロス ー ニューラルネットにおいて交差エントロピーロスL(z,y)は頻繁に使われ、以下のように定義されています:
**11. Learning rate ― The learning rate, often noted α or sometimes η, indicates at which pace the weights get updated. This can be fixed or adaptively changed. The current most popular method is called Adam, which is a method that adapts the learning rate.** -⟶ +⟶ 学習率 ー αやηで表される学習率は勾配法による重み付けのアップデートをする速度を表します。学習率は固定または適応的に変更することができます。現在一般的に使われている学習法はAdam(アダム)であり、学習率を適用させる方法です。
**12. Backpropagation ― Backpropagation is a method to update the weights in the neural network by taking into account the actual output and the desired output. The derivative with respect to weight w is computed using chain rule and is of the following form:** -⟶ +⟶ 誤差逆伝播法(backpropagation)ー 誤差逆伝播法はニューラルネットの期待される出力値と実際の出力の差異を考慮し重み付けのアップデートをする方法の一つです。重みwに関する導関数は連鎖規則を使用して計算され、次の形式で表される:
**13. As a result, the weight is updated as follows:** -⟶ +⟶ 結果、重みは以下のようにアップデートされます:
**14. Updating weights ― In a neural network, weights are updated as follows:** -⟶ +⟶ 重みアップデート ー ニューラルネットでは以下のように重みがアップデートされます:
**15. Step 1: Take a batch of training data.** -⟶ +⟶ ステップ1: 訓練データを1バッチ用意する。
**16. Step 2: Perform forward propagation to obtain the corresponding loss.** -⟶ +⟶ ステップ2: フォワードプロパゲーションを行い誤差を求める。
**17. Step 3: Backpropagate the loss to get the gradients.** -⟶ +⟶ ステップ3: 誤差を逆伝播させて勾配を求める。<br>
**18. Step 4: Use the gradients to update the weights of the network.** -⟶ +⟶ ステップ4: 勾配を使い、誤差が小さくなるようにネットワークの重みを更新する。<br>
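A compact sketch of the four training steps of entries 14 to 18, using a one-layer sigmoid model with cross-entropy loss; the data, sizes and learning rate alpha are made-up assumptions rather than anything prescribed by the cheatsheet:

```python
import numpy as np

rng = np.random.default_rng(0)
X, y = rng.normal(size=(256, 3)), rng.integers(0, 2, size=256)   # made-up batch
w, b, alpha = np.zeros(3), 0.0, 0.1

for epoch in range(100):
    # Step 1: take a batch of training data (here, the full toy set)
    z = X @ w + b
    # Step 2: forward propagation and the corresponding cross-entropy loss L(z, y)
    p = 1.0 / (1.0 + np.exp(-z))
    loss = -np.mean(y * np.log(p + 1e-12) + (1 - y) * np.log(1 - p + 1e-12))
    # Step 3: backpropagate the loss to get the gradients (chain rule)
    grad_w = X.T @ (p - y) / len(y)
    grad_b = np.mean(p - y)
    # Step 4: use the gradients to update the weights, w <- w - alpha * dL/dw
    w -= alpha * grad_w
    b -= alpha * grad_b
print(round(loss, 4))
```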
**19. Dropout ― Dropout is a technique meant at preventing overfitting the training data by dropping out units in a neural network. In practice, neurons are either dropped with probability p or kept with probability 1−p** -⟶ +⟶ドロップアウト ー ドロップアウトはニューラルネット内の一部のユニットを非活性化させることにより過学習を防ぐテクニックである。実際には、ニューロンはある確率pで非活性、1-pの確率で活性化されるようになってる。
From 2c18b0461fa89207310a446d725fa21c1f54841d Mon Sep 17 00:00:00 2001 From: Taichi Kato Date: Thu, 15 Nov 2018 18:56:56 +0800 Subject: [PATCH 046/531] Added CNN and RNN --- ja/cheatsheet-deep-learning.md | 30 +++++++++++++++--------------- 1 file changed, 15 insertions(+), 15 deletions(-) diff --git a/ja/cheatsheet-deep-learning.md b/ja/cheatsheet-deep-learning.md index 631bdd5cd..89eaa3039 100644 --- a/ja/cheatsheet-deep-learning.md +++ b/ja/cheatsheet-deep-learning.md @@ -114,61 +114,61 @@ **20. Convolutional Neural Networks** -⟶ +⟶ 畳み込みニューラルネットワーク
**21. Convolutional layer requirement ― By noting W the input volume size, F the size of the convolutional layer neurons, P the amount of zero padding, then the number of neurons N that fit in a given volume is such that:** -⟶ +⟶ 畳み込みレイヤーの条件 ー Wを入力サイズ、Fを畳み込みレイヤーニューロンのサイズ、Pをゼロパディングの量とすると、与えられた体積に収まるニューロン数Nは次のようになります。
**22. Batch normalization ― It is a step of hyperparameter γ,β that normalizes the batch {xi}. By noting μB,σ2B the mean and variance of that we want to correct to the batch, it is done as follows:** -⟶ +⟶ バッチ正規化 ー バッチ{xi}を正規化するハイパーパラメータγ、βのステップです。バッチに修正したい平均値と分散値をμB,σ2Bとすると、正規化は以下のように行われます:
**23. It is usually done after a fully connected/convolutional layer and before a non-linearity layer and aims at allowing higher learning rates and reducing the strong dependence on initialization.** -⟶ +⟶ これは通常、学習率を高め、初期値への依存性を減らすことを目的でFully Connected層と畳み込み層の後、非線形化を行う前に行われます。
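Entries 21 to 23 can be illustrated with two small helpers; the stride argument S and the example numbers are assumptions added only for the sketch:

```python
import numpy as np

def conv_output_size(W, F, P, S=1):
    """Entry 21: number of neurons that fit along one dimension,
    N = (W - F + 2P) / S + 1 (S is a stride assumed here, not listed in the entry)."""
    return (W - F + 2 * P) // S + 1

def batch_norm(x, gamma=1.0, beta=0.0, eps=1e-5):
    """Entry 22: normalize the batch with its mean and variance,
    then rescale with the hyperparameters gamma and beta."""
    mu, var = x.mean(axis=0), x.var(axis=0)
    return gamma * (x - mu) / np.sqrt(var + eps) + beta

print(conv_output_size(W=32, F=5, P=2))             # 32
print(batch_norm(np.random.randn(64, 10)).std())    # close to 1 after normalization
```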
**24. Recurrent Neural Networks** -⟶ +⟶ 再帰型ニューラルネットワーク (RNN)
**25. Types of gates ― Here are the different types of gates that we encounter in a typical recurrent neural network:** -⟶ +⟶ ゲートの種類 ー 典型的なRNNに使われるゲートです:
**26. [Input gate, forget gate, gate, output gate]** -⟶ +⟶ [入力ゲート, 忘却ゲート, ゲート, 出力ゲート]
**27. [Write to cell or not?, Erase a cell or not?, How much to write to cell?, How much to reveal cell?]** -⟶ +⟶ [セルに追加するべき?, セルを削除するべき?, 情報をどれだけセルに追加するべき?, セルをどの程度通すべき?]
**28. LSTM ― A long short-term memory (LSTM) network is a type of RNN model that avoids the vanishing gradient problem by adding 'forget' gates.** -⟶ +⟶ LSTM - A long short-term memory (LSTM) ネットワークは勾配消失問題を解決するために忘却ゲートが追加されているRNNの一種です。
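A minimal single-step LSTM sketch matching the gate roles of entries 25 to 28; the weight shapes and random initialization are illustrative assumptions:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h, c, W, b):
    """One LSTM step; W and b are made-up parameters acting on [h, x]."""
    z = np.concatenate([h, x])
    i = sigmoid(W["i"] @ z + b["i"])   # input gate: write to cell or not?
    f = sigmoid(W["f"] @ z + b["f"])   # forget gate: erase the cell or not?
    g = np.tanh(W["g"] @ z + b["g"])   # gate: how much to write to the cell?
    o = sigmoid(W["o"] @ z + b["o"])   # output gate: how much to reveal the cell?
    c = f * c + i * g                  # the forget gate is what mitigates vanishing gradients
    h = o * np.tanh(c)
    return h, c

n_h, n_x = 4, 3
rng = np.random.default_rng(0)
W = {k: rng.normal(size=(n_h, n_h + n_x)) for k in "ifgo"}
b = {k: np.zeros(n_h) for k in "ifgo"}
h, c = np.zeros(n_h), np.zeros(n_h)
h, c = lstm_step(rng.normal(size=n_x), h, c, W, b)
print(h.round(2))
```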
**29. Reinforcement Learning and Control** -⟶ +⟶ 強化学習と
@@ -294,28 +294,28 @@ **50. View PDF version on GitHub** -⟶ +⟶ GitHubでPDF版を見る
**51. [Neural Networks, Architecture, Activation function, Backpropagation, Dropout]** -⟶ +⟶ [ニューラルネットワーク, アーキテクチャ, 活性化関数, 誤差逆伝播法, ドロップアウト]
**52. [Convolutional Neural Networks, Convolutional layer, Batch normalization]** -⟶ +⟶ [畳み込みニューラルネットワーク, 畳み込み層, バッチノーマライゼーション]
**53. [Recurrent Neural Networks, Gates, LSTM]** -⟶ +⟶ [再帰型ニューラルネットワーク, ゲート, LSTM]
**54. [Reinforcement learning, Markov decision processes, Value/policy iteration, Approximate dynamic programming, Policy search]** -⟶ +⟶ [強化学習, マルコフ決定過程, バリュー/ポリシー反復, 近似動的計画法, ポリシーサーチ] From ab0586d1cb145be740004e983102618d0b31ab3e Mon Sep 17 00:00:00 2001 From: Taichi Kato Date: Thu, 15 Nov 2018 19:29:38 +0800 Subject: [PATCH 047/531] Began Reinforcement learning --- ja/cheatsheet-deep-learning.md | 16 ++++++++-------- 1 file changed, 8 insertions(+), 8 deletions(-) diff --git a/ja/cheatsheet-deep-learning.md b/ja/cheatsheet-deep-learning.md index 89eaa3039..0c81e075b 100644 --- a/ja/cheatsheet-deep-learning.md +++ b/ja/cheatsheet-deep-learning.md @@ -168,49 +168,49 @@ **29. Reinforcement Learning and Control** -⟶ 強化学習と +⟶ 強化学習とコントロール
**30. The goal of reinforcement learning is for an agent to learn how to evolve in an environment.** -⟶ +⟶ 強化学習はある環境内においてエージェントが学習し、進化することを目標とします。
**31. Definitions** -⟶ +⟶ 定義
**32. Markov decision processes ― A Markov decision process (MDP) is a 5-tuple (S,A,{Psa},γ,R) where:** -⟶ +⟶ マルコフ決定過程 ー マルコフ決定過程(Markov decision process; MDP)を5タプル(S,A,{Psa},γ,R)としたとき:
**33. S is the set of states** -⟶ +⟶ Sは状態の有限集合
**34. A is the set of actions** -⟶ +⟶ Aは行動の有限集合
**35. {Psa} are the state transition probabilities for s∈S and a∈A** -⟶ +⟶ {Psa}は状態s∈Sと行動a∈Aの条件付き分布
**36. γ∈[0,1[ is the discount factor** -⟶ +⟶ γ∈[0,1[は割引因子と呼ばれる値
From 3248c6c55934dc473f9142c8151ea9a0c983bc5d Mon Sep 17 00:00:00 2001 From: Taichi Kato Date: Thu, 15 Nov 2018 20:58:28 +0800 Subject: [PATCH 048/531] Finished translating Reinforcement Learning --- ja/cheatsheet-deep-learning.md | 28 ++++++++++++++-------------- 1 file changed, 14 insertions(+), 14 deletions(-) diff --git a/ja/cheatsheet-deep-learning.md b/ja/cheatsheet-deep-learning.md index 0c81e075b..322a9f2cc 100644 --- a/ja/cheatsheet-deep-learning.md +++ b/ja/cheatsheet-deep-learning.md @@ -204,7 +204,7 @@ **35. {Psa} are the state transition probabilities for s∈S and a∈A** -⟶ {Psa}は状態s∈Sと行動a∈Aの条件付き分布 +⟶ {Psa}は状態s∈Sと行動a∈Aの状態遷移確率
@@ -216,79 +216,79 @@ **37. R:S×A⟶R or R:S⟶R is the reward function that the algorithm wants to maximize** -⟶ +⟶ R:S×A⟶R or R:S⟶Rはアルゴリズムが最大化したい報酬関数
**38. Policy ― A policy π is a function π:S⟶A that maps states to actions.** -⟶ +⟶ 政策 - 政策πは状態と行動を写像する関数π:S⟶A
**39. Remark: we say that we execute a given policy π if given a state s we take the action a=π(s).** -⟶ +⟶ 備考: 状態sを与えられた際に行動a=π(s)を行うことを政策πを実行すると言う。
**40. Value function ― For a given policy π and a given state s, we define the value function Vπ as follows:** -⟶ +⟶ 価値関数 - ある政策πとある状態sにおいて価値関数Vπを以下のように定義する:
**41. Bellman equation ― The optimal Bellman equations characterizes the value function Vπ∗ of the optimal policy π∗:** -⟶ +⟶ ベルマン方程式 - 政策πをとった価値関数Vπ∗に対する最適なベルマン方程式:
**42. Remark: we note that the optimal policy π∗ for a given state s is such that:** -⟶ +⟶ 備考: 与えられた状態sに対する最適方針π*はこのようになります:
**43. Value iteration algorithm ― The value iteration algorithm is in two steps:** -⟶ +⟶ 価値反復法アルゴリズム - 価値反復法アルゴリズムは2段階で行われます:
**44. 1) We initialize the value:** -⟶ +⟶ 1) 値を初期化する。
**45. 2) We iterate the value based on the values before:** -⟶ +⟶ 2) 前の値を元に値を繰り返す:
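The two steps of value iteration in entries 43 to 45 on a made-up three-state, two-action MDP; the transition probabilities, rewards and discount factor are assumptions chosen only for illustration:

```python
import numpy as np

# P[a, s, s'] are state transition probabilities, R the per-state reward, gamma the discount.
P = np.array([[[0.8, 0.2, 0.0], [0.1, 0.8, 0.1], [0.0, 0.2, 0.8]],
              [[0.0, 0.9, 0.1], [0.0, 0.1, 0.9], [0.1, 0.0, 0.9]]])
R = np.array([0.0, 0.0, 1.0])
gamma = 0.9

V = np.zeros(3)                               # 1) initialize the value
for _ in range(200):                          # 2) iterate based on the previous values
    V = R + gamma * np.max(P @ V, axis=0)     # Bellman optimality update
pi = np.argmax(P @ V, axis=0)                 # greedy policy extracted from V
print(np.round(V, 3), pi)
```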
**46. Maximum likelihood estimate ― The maximum likelihood estimates for the state transition probabilities are as follows:** -⟶ +⟶ 最尤推定 ー 状態遷移確率の最尤推定(maximum likelihood estimate; MLE):
**47. times took action a in state s and got to s′** -⟶ +⟶ 状態sで行動aを行い状態s′に遷移した回数
**48. times took action a in state s** -⟶ +⟶ 状態sで行動aを行った回数
**49. Q-learning ― Q-learning is a model-free estimation of Q, which is done as follows:** -⟶ +⟶ Q学習 ー Q学習は数学モデルを使わないQ値の評価手法であり、以下のように行われる:
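A sketch of the model-free Q-learning update of entry 49 on a toy chain; the learning rate, discount factor and environment are made-up assumptions:

```python
import numpy as np

def q_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.9):
    """One Q-learning step: Q(s,a) <- Q(s,a) + alpha*(r + gamma*max_a' Q(s',a') - Q(s,a))."""
    Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
    return Q

Q = np.zeros((2, 2))                 # 2 states, 2 actions, purely illustrative
rng = np.random.default_rng(0)
for _ in range(500):
    s, a = rng.integers(2), rng.integers(2)
    s_next = a                       # assumed dynamics: action a moves to state a
    r = 1.0 if s_next == 1 else 0.0  # assumed reward: reaching state 1 pays 1
    Q = q_update(Q, s, a, r, s_next)
print(np.round(Q, 2))
```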
From 1a130bed9ec2c2472168d9734401bc6a1506ef0b Mon Sep 17 00:00:00 2001 From: sy95lee <37721312+sy95lee@users.noreply.github.com> Date: Tue, 20 Nov 2018 12:58:17 +0900 Subject: [PATCH 049/531] Update refresher-linear-algebra.md --- ko/refresher-linear-algebra.md | 67 +++++++++++++++++----------------- 1 file changed, 34 insertions(+), 33 deletions(-) diff --git a/ko/refresher-linear-algebra.md b/ko/refresher-linear-algebra.md index 40068181e..2342a1619 100644 --- a/ko/refresher-linear-algebra.md +++ b/ko/refresher-linear-algebra.md @@ -6,7 +6,7 @@ **2. General notations** -⟶ 일반적인 개념 +⟶ 일반적인 표기법
@@ -30,7 +30,7 @@ **6. Remark: the vector x defined above can be viewed as a n×1 matrix and is more particularly called a column-vector.** -⟶ 위에서 정의된 벡터 x는 n×1행렬로 볼 수 있으며, 열벡터라고도 불린다. +⟶ 비고 : 위에서 정의된 벡터 x는 n×1행렬로 볼 수 있으며, 열벡터라고도 불린다.
@@ -48,7 +48,7 @@ **9. Remark: for all matrices A∈Rn×n, we have A×I=I×A=A.** -⟶ remark: 모든 행렬 A∈Rn×n에 대하여, A×I=I×A=A를 만족한다. +⟶ 비고 : 모든 행렬 A∈Rn×n에 대하여, A×I=I×A=A를 만족한다.
@@ -60,7 +60,7 @@ **11. Remark: we also note D as diag(d1,...,dn).** -⟶ D를 diag(d1,...,dn)라고도 표시한다. +⟶ 비고 : D를 diag(d1,...,dn)라고도 표시한다.
@@ -78,7 +78,7 @@ **14. Vector-vector ― There are two types of vector-vector products:** -⟶ 벡터-벡터 - 벡터간 연산에는 두가지 종류가 있다. +⟶ 벡터-벡터 – 벡터 간 연산에는 두 가지 종류가 있다.
@@ -108,7 +108,7 @@ **19. Matrix-matrix ― The product of matrices A∈Rm×n and B∈Rn×p is a matrix of size Rn×p, such that:** -⟶ 행렬 A∈Rm×n와 행렬 B∈Rn×p의 곱은 다음을 만족하는 Rn×p크기의 행렬이다. +⟶ 행렬-행렬 - 행렬 A∈Rm×n와 행렬 B∈Rn×p의 곱은 다음을 만족하는 Rn×p크기의 행렬이다.
@@ -132,7 +132,7 @@ **23. Remark: for matrices A,B, we have (AB)T=BTAT** -⟶ 행렬 A,B에 대하여, (AB)T=BTAT가 성립힌다. +⟶ 비고 - 행렬 A,B에 대하여, (AB)T=BTAT가 성립힌다.
@@ -150,25 +150,25 @@ **26. Trace ― The trace of a square matrix A, noted tr(A), is the sum of its diagonal entries:** -⟶ +⟶ 대각합 – 정사각행렬 A의 대각합 tr(A)는 대각성분의 합이다.
**27. Remark: for matrices A,B, we have tr(AT)=tr(A) and tr(AB)=tr(BA)** -⟶ 행렬 A,B에 대하여, tr(AT)=tr(A)와 tr(AB)=tr(BA)가 성립힌다. +⟶ 비고 : 행렬 A,B에 대하여, tr(AT)=tr(A)와 tr(AB)=tr(BA)가 성립힌다.
**28. Determinant ― The determinant of a square matrix A∈Rn×n, noted |A| or det(A) is expressed recursively in terms of A∖i,∖j, which is the matrix A without its ith row and jth column, as follows:** -⟶ 행렬식 - 정사각행렬 A∈Rn×n의 행렬식 |A| 또는 det(A)는 +⟶ 행렬식 - 정사각행렬 A∈Rn×n의 행렬식 |A| 또는 det(A)는 i번째 행과 j번째 열이 없는 행렬 A인 A∖i,∖j에 대해 재귀적으로 표현된다.
**29. Remark: A is invertible if and only if |A|≠0. Also, |AB|=|A||B| and |AT|=|A|.** -⟶ +⟶ 비고 : A가 가역일 필요충분조건은 |A|≠0이다. 또한 |AB|=|A||B|와 |AT|=|A|도 그렇다.
@@ -196,21 +196,21 @@
-**34. Norm ― A norm is a function N:V⟶[0,+∞[ where V is a vector space, and such that for all x,y∈V, we have:** +**34. Norm ― A norm is a function N:V⟶[0,+∞] where V is a vector space, and such that for all x,y∈V, we have:** -⟶ +⟶ 노름 – V는 벡터공간일 때, 노름은 모든 x,y∈V에 대해 다음을 만족하는 함수 N:V⟶[0,+∞]이다.
**35. N(ax)=|a|N(x) for a scalar** -⟶ +⟶ scalar a에 대해서 N(ax)=|a|N(x)를 만족한다.
**36. if N(x)=0, then x=0** -⟶ +⟶ N(x)=0이면 x=0이다.
@@ -222,7 +222,7 @@ **38. [Norm, Notation, Definition, Use case]** -⟶ +⟶ [규범, 표기법, 정의, 유스케이스]
@@ -240,19 +240,19 @@ **41. Matrix rank ― The rank of a given matrix A is noted rank(A) and is the dimension of the vector space generated by its columns. This is equivalent to the maximum number of linearly independent columns of A.** -⟶ 행렬 랭크 - 주어진 행렬 A의 랭크는 열에 의해 생성된 벡터공간의 차원이고, rank(A)라고 쓴다. +⟶ 행렬 랭크 - 주어진 행렬 A의 랭크는 열에 의해 생성된 벡터공간의 차원이고, rank(A)라고 쓴다. 이는 A의 선형독립인 열의 최대 수와 동일하다.
**42. Positive semi-definite matrix ― A matrix A∈Rn×n is positive semi-definite (PSD) and is noted A⪰0 if we have:** -⟶ +⟶ 양의 준정부호 행렬 – 행렬 A∈Rn×n는 다음을 만족하면 양의 준정부호(PSD)라고 하고 A⪰0라고 쓴다.
**43. Remark: similarly, a matrix A is said to be positive definite, and is noted A≻0, if it is a PSD matrix which satisfies for all non-zero vector x, xTAx>0.** -⟶ +⟶ 비고 : 마찬가지로 PSD 행렬이 모든 0이 아닌 벡터 x에 대하여 xTAx>0를 만족하면 행렬 A를 양의 정부호라고 말하고 A≻0라고 쓴다.
@@ -264,35 +264,35 @@ **45. Spectral theorem ― Let A∈Rn×n. If A is symmetric, then A is diagonalizable by a real orthogonal matrix U∈Rn×n. By noting Λ=diag(λ1,...,λn), we have:** -⟶ +⟶ 스펙트럼 정리 – A∈Rn×n라고 하자. A가 대칭이면, A는 실수 직교행렬 U∈Rn×n에 의해 대각화 가능하다. Λ=diag(λ1,...,λn)인 것에 주목하면, 다음을 만족한다.
**46. diagonal** -⟶ +⟶ 대각
**47. Singular-value decomposition ― For a given matrix A of dimensions m×n, the singular-value decomposition (SVD) is a factorization technique that guarantees the existence of U m×m unitary, Σ m×n diagonal and V n×n unitary matrices, such that:** -⟶ +⟶ 특이값 분해 – 주어진 m×n차원 행렬 A에 대하여, 특이값 분해(SVD)는 다음과 같이 U m×m 유니터리와 Σ m×n 대각 및 V n×n 유니터리 행렬의 존재를 보증하는 인수분해 기술이다.
**48. Matrix calculus** -⟶ +⟶ 행렬 미적분
**49. Gradient ― Let f:Rm×n→R be a function and A∈Rm×n be a matrix. The gradient of f with respect to A is a m×n matrix, noted ∇Af(A), such that:** -⟶ +⟶ 그라디언트 – f:Rm×n→R는 함수이고 A∈Rm×n는 행렬이라 하자. A에 대한 f의 그라디언트 ∇Af(A)는 다음을 만족하는 m×n 행렬이다.
-**50. Remark: the gradient of f is only defined when f is a function that returns a scalar.** +**50. Remark: the gradient of f is only defined when f is a function that returns a scalar.** 비고 : f의 그라디언트는 f가 스칼라를 반환하는 함수일 때만 정의된다. ⟶ @@ -300,40 +300,41 @@ **51. Hessian ― Let f:Rn→R be a function and x∈Rn be a vector. The hessian of f with respect to x is a n×n symmetric matrix, noted ∇2xf(x), such that:** -⟶ +⟶ 헤시안 – f:Rn→R는 함수이고 x∈Rn는 벡터라고 하자. x에 대한 f의 헤시안 ∇2xf(x)는 다음을 만족하는 n×n 대칭행렬이다.
-**52. Remark: the hessian of f is only defined when f is a function that returns a scalar** +**52. Remark: the hessian of f is only defined when f is a function that returns a scalar** -⟶ +⟶ 비고 : f의 헤시안은 f가 스칼라를 반환하는 함수일 때만 정의된다.
**53. Gradient operations ― For matrices A,B,C, the following gradient properties are worth having in mind:** -⟶ +⟶ 그라디언트 연산 – 행렬 A,B,C에 대하여, 다음 그라디언트 성질을 염두해두는 것이 좋다.
**54. [General notations, Definitions, Main matrices]** -⟶ +⟶ [일반적인 표기법, 정의, 주요 행렬]
**55. [Matrix operations, Multiplication, Other operations]** -⟶ +⟶ [행렬 연산, 곱셈, 다른 연산]
**56. [Matrix properties, Norm, Eigenvalue/Eigenvector, Singular-value decomposition]** -⟶ +⟶ [행렬 성질, 노름, 고유값/고유벡터, 특이값 분해]
**57. [Matrix calculus, Gradient, Hessian, Operations]** -⟶ +⟶ [행렬 미적분, 그라디언트, 헤시안, 연산] + From 16e8675fbea95dbb1c6dd98c41589c16ce12dd81 Mon Sep 17 00:00:00 2001 From: Zaid Alyafeai Date: Tue, 20 Nov 2018 21:30:05 +0300 Subject: [PATCH 050/531] Update CONTRIBUTORS reviewers added as well. --- CONTRIBUTORS | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/CONTRIBUTORS b/CONTRIBUTORS index e9adef78d..ed831caac 100644 --- a/CONTRIBUTORS +++ b/CONTRIBUTORS @@ -1,5 +1,8 @@ --ar -Zaid Alyafeai (translation fo linear algebra) + Amjad Khatabi (translation of deep learning) + Zaid Alyafeai (review of deep learning) + Zaid Alyafeai (translation for linear algebra) + Amjad Khatabi (review of linear algebra) --de From 28a3b7dc287ffada21cfffdd5af359a27ca2c530 Mon Sep 17 00:00:00 2001 From: Zaid Alyafeai Date: Tue, 20 Nov 2018 21:35:34 +0300 Subject: [PATCH 051/531] Update refresher-linear-algebra.md --- ar/refresher-linear-algebra.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/ar/refresher-linear-algebra.md b/ar/refresher-linear-algebra.md index ca00694fb..7037782e7 100644 --- a/ar/refresher-linear-algebra.md +++ b/ar/refresher-linear-algebra.md @@ -180,7 +180,7 @@ **25. Remark: not all square matrices are invertible. Also, for matrices A,B, we have (AB)−1=B−1A−1**
-ملاحظة: ليس جميع المصفوفات يمكن أيجاد معكوس لها. كذلك لأي مصفوفتين $A$ و $B$ نستنتج $(AB)^{-1} = B^{-1} A^{-1}$. +ملاحظة: ليس جميع المصفوفات يمكن إيجاد معكوس لها. كذلك لأي مصفوفتين $A$ و $B$ نستنتج $(AB)^{-1} = B^{-1} A^{-1}$.

@@ -280,7 +280,7 @@ $N(x) =0 \implies x = 0$ **39. Linearly dependence ― A set of vectors is said to be linearly dependent if one of the vectors in the set can be defined as a linear combination of the others.**
-الإتباع الخطي (Linear Dependence): مجموعة المتجهات تعتبر تابعة خطياً إذا وفقط إذا كل متجه يمكن كتابته بشكل خطي بإسخدام مجموعة من المتجهات الأخرى. + الارتباط الخطي (Linear Dependence): مجموعة المتجهات تعتبر تابعة خطياً إذا وفقط إذا كل متجه يمكن كتابته بشكل خطي بإسخدام مجموعة من المتجهات الأخرى.

@@ -336,7 +336,7 @@ $N(x) =0 \implies x = 0$ **47. Singular-value decomposition ― For a given matrix A of dimensions m×n, the singular-value decomposition (SVD) is a factorization technique that guarantees the existence of U m×m unitary, Σ m×n diagonal and V n×n unitary matrices, such that:**
- تفكيك القيمة المنفردة (singular value decomposition) : لأي مصفوفة $A$ من الشكل $n\times m$ ، تفكيك القيمة المنفردة (SVD) يعتبر طريقة تحليل تضمن وجود $U \in \mathbb{R}^{m \times m}$ , مصفوفة قطرية $\Sigma \in \mathbb{R}^{m \times n}$ و $V \in \mathbb{R}^{n \times n}$ حيث أن : + مجزئ القيمة المفرده (singular value decomposition) : لأي مصفوفة $A$ من الشكل $n\times m$ ، تفكيك القيمة المنفردة (SVD) يعتبر طريقة تحليل تضمن وجود $U \in \mathbb{R}^{m \times m}$ , مصفوفة قطرية $\Sigma \in \mathbb{R}^{m \times n}$ و $V \in \mathbb{R}^{n \times n}$ حيث أن :

From bfddc369f062ace5c2e4998fd77f3600840280fa Mon Sep 17 00:00:00 2001 From: Redouane Lguensat Date: Thu, 13 Dec 2018 21:35:45 +0100 Subject: [PATCH 052/531] Update cheatsheet-unsupervised-learning.md --- ar/cheatsheet-unsupervised-learning.md | 125 +++++++++++++++---------- 1 file changed, 78 insertions(+), 47 deletions(-) diff --git a/ar/cheatsheet-unsupervised-learning.md b/ar/cheatsheet-unsupervised-learning.md index 05dce41e0..300fd8dfd 100644 --- a/ar/cheatsheet-unsupervised-learning.md +++ b/ar/cheatsheet-unsupervised-learning.md @@ -235,136 +235,167 @@ **35. Remark: the eigenvector associated with the largest eigenvalue is called principal eigenvector of matrix A.**
-ملحوظة: المتجه الذاتي المرتبط بأكبر قيمة ذاتية يسمى بالمتجه الذاتي الرئيسي للمصفوفة A
+ملحوظة: المتجه الذاتي المرتبط بأكبر قيمة ذاتية يسمى بالمتجه الذاتي الرئيسي للمصفوفة A +
**36. Algorithm ― The Principal Component Analysis (PCA) procedure is a dimension reduction technique that projects the data on k dimensions by maximizing the variance of the data as follows:** -⟶ - +
+خوارزمية - تحليل المكون الرئيسي تقنية لخفض الأبعاد تهدف إلى إسقاط البيانات على k بعد بحيث يتم تكبير التباين، خطواتها كالتالي:

**37. Step 1: Normalize the data to have a mean of 0 and standard deviation of 1.** -⟶ - -
+
+الخطوة 1: تسوية البيانات بحيث تصبح ذات متوسط يساوي صفر و انحراف معياري يساوي واحد +
+
**38. Step 2: Compute Σ=1mm∑i=1x(i)x(i)T∈Rn×n, which is symmetric with real eigenvalues.** -⟶ - +
+الخطوة 2: حساب Σ=1mm∑i=1x(i)x(i)T∈Rn×n ، و هي متماثلة و ذات قيم ذاتية حقيقية +

**39. Step 3: Compute u1,...,uk∈Rn the k orthogonal principal eigenvectors of Σ, i.e. the orthogonal eigenvectors of the k largest eigenvalues.** -⟶ - -
+
+الخطوة 3: حساب u1,...,uk∈Rn المتجهات الذاتية الرئيسية المتعامدة لΣ و عددها k ، يعني k من المتجهات الذاتية المتعامدة ذات القيم الذاتية الأكبر +
+
**40. Step 4: Project the data on spanR(u1,...,uk).** -⟶ - +
+الخطوة 4: إسقاط البيانات على spanR(u1,...,uk) +

**41. This procedure maximizes the variance among all k-dimensional spaces.** -⟶ - +
+هذا الإجراء يضخم التباين بين كل الفضاءات البعدية +

**42. [Data in feature space, Find principal components, Data in principal components space]** -⟶ - +
+[بيانات في فضاء الخصائص, أوجد المكونات الرئيسية, بيانات في فضاء المكونات الرئيسية] +

**43. Independent component analysis** -⟶ - +
+تحليل المكونات المستقلة +

**44. It is a technique meant to find the underlying generating sources.** -⟶ - +
+هي تقنية تهدف إلى إيجاد المصادر التوليدية الكامنة +

**45. Assumptions ― We assume that our data x has been generated by the n-dimensional source vector s=(s1,...,sn), where si are independent random variables, via a mixing and non-singular matrix A as follows:** -⟶ - +
+افتراضات - لنفترض أن بياناتنا x تم توليدها من طرف s=(s1,...,sn) المصدر المتجهي ال n بعدي، بحيث متغيرات عشوائية مستقلة، و ذلك عبر مصفوفة خلط غير منفردة A +كالتالي +

**46. The goal is to find the unmixing matrix W=A−1.** -⟶ - +
+الهدف هو العثور على مصفوفة الفصل W=A−1

**47. Bell and Sejnowski ICA algorithm ― This algorithm finds the unmixing matrix W by following the steps below:** - -⟶ - +
+خوارزمية ICA +Bell و Sejnowski ل +هاته الخوارزمية تجد مصفوفة الفصل W عن طريق الخطوات التالية +

**48. Write the probability of x=As=W−1s as:** -⟶ - +
+اكتب احتمال x=As=W−1s كالتالي

**49. Write the log likelihood given our training data {x(i),i∈[[1,m]]} and by noting g the sigmoid function as:** -⟶ - +
+ لتكن {x(i),i∈[[1,m]]} +بيانات التمرن +و g دالة سيجمويد +اكتب الأرجحية اللوغاريتمية كالتالي +

**50. Therefore, the stochastic gradient ascent learning rule is such that for each training example x(i), we update W as follows:** -⟶ - +
+و منه، قاعدة التعلم للصعود التفاضلي العشوائي تقتضي أن لكل مثال تمرين x(i) ، نقوم بتحديث W كما يلي +

**51. The Machine Learning cheatsheets are now available in Arabic.** -⟶ - +
+ورقات المراجعة للتعلم الآلي متوفرة حاليا باللغة العربية +

**52. Original authors** -⟶ - +
+المحررون الأصليون +

**53. Translated by X, Y and Z** -⟶ - +
+تم ترجمته بواسطة X, Y و Z
+
**54. Reviewed by X, Y and Z** -⟶ - +
+تم مراجعته بواسطة X, Y و Z

**55. [Introduction, Motivation, Jensen's inequality]** -⟶ - +
+[تقديم، تحفيز، متفاوتة جنسن] +

**56. [Clustering, Expectation-Maximization, k-means, Hierarchical clustering, Metrics]** -⟶ - +
+[تجميع, + التوقع-التعظيم + , k-متوسطات + , التجميع الهرمي + , مقاييس] + +

**57. [Dimension reduction, PCA, ICA]** -⟶ +
+[خفض الأبعاد, PCA, ICA] +
+
From 199aef6db8cf5a7b4f0b05e6c24ad45a990239eb Mon Sep 17 00:00:00 2001 From: Redouane Lguensat Date: Thu, 13 Dec 2018 21:40:59 +0100 Subject: [PATCH 053/531] Update CONTRIBUTORS --- CONTRIBUTORS | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/CONTRIBUTORS b/CONTRIBUTORS index afd1d1f12..8dfa394f9 100644 --- a/CONTRIBUTORS +++ b/CONTRIBUTORS @@ -1,5 +1,7 @@ --ar - + + Redouane Lguensat (translation of unsupervised learning) + --de --es From ebd577ba58fcd4c86f685a8f9818793ae1764d28 Mon Sep 17 00:00:00 2001 From: kevingo Date: Sun, 16 Dec 2018 09:48:55 +0800 Subject: [PATCH 054/531] Add unsupervised learning zh-tw translation --- CONTRIBUTORS | 5 +- zh-tw/cheatsheet-unsupervised-learning.md | 298 ++++++++++++++++++++++ 2 files changed, 302 insertions(+), 1 deletion(-) create mode 100644 zh-tw/cheatsheet-unsupervised-learning.md diff --git a/CONTRIBUTORS b/CONTRIBUTORS index 5cef9939b..8be8ed9d6 100644 --- a/CONTRIBUTORS +++ b/CONTRIBUTORS @@ -111,4 +111,7 @@ TobyOoO (review of deep learning) kevingo (translation of supervised learning) - accelsao (review of supervised learning) \ No newline at end of file + accelsao (review of supervised learning) + + kevingo (translation of unsupervised learning) + diff --git a/zh-tw/cheatsheet-unsupervised-learning.md b/zh-tw/cheatsheet-unsupervised-learning.md new file mode 100644 index 000000000..6d4f760ed --- /dev/null +++ b/zh-tw/cheatsheet-unsupervised-learning.md @@ -0,0 +1,298 @@ +1. **Unsupervised Learning cheatsheet** + +⟶ +非監督式學習參考手冊 +
+ +2. **Introduction to Unsupervised Learning** + +⟶ +非監督式學習介紹 +
+ +3. **Motivation ― The goal of unsupervised learning is to find hidden patterns in unlabeled data {x(1),...,x(m)}.** + +⟶ +動機 - 非監督式學習的目的是要找出未標籤資料 {x(1),...,x(m)} 之間的隱藏模式 +
+ +4. **Jensen's inequality ― Let f be a convex function and X a random variable. We have the following inequality:** + +⟶ +Jensen's 不等式 - 另 f 為一個凸函數、X 是一個隨機變數,我們可以得到底下這個不等式: +
+ +5. **Clustering** + +⟶ +分群 +
+ +6. **Expectation-Maximization** + +⟶ +最大期望值 +
+ +7. **Latent variables ― Latent variables are hidden/unobserved variables that make estimation problems difficult, and are often denoted z. Here are the most common settings where there are latent variables:** + +⟶ +潛在變數 (Latent variables) - 潛在變數指的是隱藏/沒有觀察到的變數,這會讓問題的估計變得困難,我們通常使用 z 來代表它。底下是潛在變數的常見設定: +
+ +8. **[Setting, Latent variable z, Comments]** + +⟶ +[設定, 潛在變數 z, 評論] +
+ +9. **[Mixture of k Gaussians, Factor analysis]** + +⟶ +[k 元高斯模型, 因素分析] +
+ +10. **Algorithm ― The Expectation-Maximization (EM) algorithm gives an efficient method at estimating the parameter θ through maximum likelihood estimation by repeatedly constructing a lower-bound on the likelihood (E-step) and optimizing that lower bound (M-step) as follows:** + +⟶ +演算法 - 最大期望演算法 (EM Algorithm) 透過重複建構一個概似函數的下界 (E-step) 和最佳化下界 (M-step) 來進行最大概似估計給出參數 θ 的高效率估計方法: +
+ +11. **E-step: Evaluate the posterior probability Qi(z(i)) that each data point x(i) came from a particular cluster z(i) as follows:** + +⟶ +E-step: 評估後驗機率 Qi(z(i)),其中每個資料點 x(i) 來自於一個特定的群集 z(i),如下: +
+ +12. **M-step: Use the posterior probabilities Qi(z(i)) as cluster specific weights on data points x(i) to separately re-estimate each cluster model as follows:** + +⟶ +M-step: 使用後驗機率 Qi(z(i)) 作為資料點 x(i) 在群集中特定的權重,用來分別重新估計每個群集,如下: +
+ +13. **[Gaussians initialization, Expectation step, Maximization step, Convergence]** + +⟶ +[高斯分佈初始化, E-Step, M-Step, 收斂] +
+ +14. **k-means clustering** + +⟶ +k-means 分群法 +
+ +15. **We note c(i) the cluster of data point i and μj the center of cluster j.** + +⟶ +我們使用 c(i) 表示資料 i 屬於某群,而 μj 則是群 j 的中心 +
+ +16. **Algorithm ― After randomly initializing the cluster centroids μ1,μ2,...,μk∈Rn, the k-means algorithm repeats the following step until convergence:** + +⟶ +演算法 - 在隨機初始化群集中心點 μ1,μ2,...,μk∈Rn 後,k-means 演算法重複以下步驟直到收斂: +
+ +17. **[Means initialization, Cluster assignment, Means update, Convergence]** + +⟶ +[中心點初始化, 指定群集, 更新中心點, 收斂] +
+ +18. **Distortion function ― In order to see if the algorithm converges, we look at the distortion function defined as follows:** + +⟶ +畸變函數 - 為了確認演算法是否收斂,我們定義以下的畸變函數: +
+ +19. **Hierarchical clustering** + +⟶ +階層式分群法 +
+ +20. **Algorithm ― It is a clustering algorithm with an agglomerative hierarchical approach that build nested clusters in a successive manner.** + +⟶ +演算法 - 階層式分群法是透過一種階層架構的方式,將資料建立為一種連續層狀結構的形式。 +
+ +21. **Types ― There are different sorts of hierarchical clustering algorithms that aims at optimizing different objective functions, which is summed up in the table below:** + +⟶ +類型 - 底下是幾種不同類型的階層式分群法,差別在於要最佳化的目標函式的不同,請參考底下: +
+ +22. **[Ward linkage, Average linkage, Complete linkage]** + +⟶ +[Ward 鏈結距離, 平均鏈結距離, 完整鏈結距離] +
+ +23. **[Minimize within cluster distance, Minimize average distance between cluster pairs, Minimize maximum distance of between cluster pairs]** + +⟶ +[最小化群內距離, 最小化各群彼此的平均距離, 最小化各群彼此的最大距離] +
+ +24. **Clustering assessment metrics** + +⟶ +分群衡量指標 +
+ +25. **In an unsupervised learning setting, it is often hard to assess the performance of a model since we don't have the ground truth labels as was the case in the supervised learning setting.** + +⟶ +在無監督式學習中,通常很難去評估一個模型的好壞,因為我們沒有像在監督式學習中正確答案的標籤。 +
+ +26. **Silhouette coefficient ― By noting a and b the mean distance between a sample and all other points in the same class, and between a sample and all other points in the next nearest cluster, the silhouette coefficient s for a single sample is defined as follows:** + +⟶ +輪廓係數 (Silhouette coefficient) - 我們指定 a 為一個樣本點和相同群集中其他資料點的平均距離、b 為一個樣本點和下一個最接近群集其他資料點的平均距離,輪廓係數 s 對於此一樣本點的定義為: +
+ +27. **Calinski-Harabaz index ― By noting k the number of clusters, Bk and Wk the between and within-clustering dispersion matrices respectively defined as** + +⟶ +Calinski-Harabaz 指標 - 定義 k 是群集的數量,Bk 和 Wk 分別是群內和群集之間的離差矩陣 (dispersion matrices): +
+ +28. **the Calinski-Harabaz index s(k) indicates how well a clustering model defines its clusters, such that the higher the score, the more dense and well separated the clusters are. It is defined as follows:** + +⟶ +Calinski-Harabaz 指標 s(k) 指出分群模型的好壞,此指標的值越高,代表分群模型的表現越好。定義如下: +
+ +29. **Dimension reduction** + +⟶ +維度縮減 +
+ +30. **Principal component analysis** + +⟶ +主成份分析 +
+ +31. **It is a dimension reduction technique that finds the variance maximizing directions onto which to project the data.** + +⟶ +這是一個維度縮減的技巧,在於找到投影資料的最大方差 +
+ +32. **Eigenvalue, eigenvector ― Given a matrix A∈Rn×n, λ is said to be an eigenvalue of A if there exists a vector z∈Rn∖{0}, called eigenvector, such that we have:** + +⟶ +特徵值、特徵向量 - 給定一個矩陣 A∈Rn×n,我們說 λ 是 A 的特徵值,當存在一個特徵向量 z∈Rn∖{0},使得: +
+ +33. **Spectral theorem ― Let A∈Rn×n. If A is symmetric, then A is diagonalizable by a real orthogonal matrix U∈Rn×n. By noting Λ=diag(λ1,...,λn), we have:** + +⟶ +譜定理 - 令 A∈Rn×n,如果 A 是對稱的,則 A 可以可以透過正交矩陣 U∈Rn×n 對角化。當 Λ=diag(λ1,...,λn),我們得到: +
+ +34. **diagonal** + +⟶ +對角線 +
+ +35. **Remark: the eigenvector associated with the largest eigenvalue is called principal eigenvector of matrix A.** + +⟶ +注意:與特徵值所關聯的特徵向量就是 A 矩陣的主特徵向量 +
+ +36. **Algorithm ― The Principal Component Analysis (PCA) procedure is a dimension reduction technique that projects the data on k dimensions by maximizing the variance of the data as follows:** + +⟶ +演算法 - 主成份分析 (PCA) 是一種維度縮減的技巧,它會透過尋找資料最大變異的方式,將資料投影在 k 維空間上: +
+ +37. **Step 1: Normalize the data to have a mean of 0 and standard deviation of 1.** + +⟶ +第一步:正規化資料,讓資料平均為 0,變異數為 1 +
+ +38. **Step 2: Compute Σ=1mm∑i=1x(i)x(i)T∈Rn×n, which is symmetric with real eigenvalues.** + +⟶ +第二步:計算 Σ=1mm∑i=1x(i)x(i)T∈Rn×n,即對稱實際特徵值 +
+ +39. **Step 3: Compute u1,...,uk∈Rn the k orthogonal principal eigenvectors of Σ, i.e. the orthogonal eigenvectors of the k largest eigenvalues.** + +⟶ +第三步:計算 u1,...,uk∈Rn,k 個正交主特徵向量的總和 Σ,即是 k 個最大特徵值的正交特徵向量 +
+ +40. **Step 4: Project the data on spanR(u1,...,uk).** + +⟶ +第四部:將資料投影到 spanR(u1,...,uk) +
+ +41. **This procedure maximizes the variance among all k-dimensional spaces.** + +⟶ +這個步驟會最大化所有 k 為空間為空間的變異數 +
+ +42. **[Data in feature space, Find principal components, Data in principal components space]** + +⟶ +[資料在特徵空間, 尋找主成分, 資料在主成分空間] +
+ +43. **Independent component analysis** + +⟶ +獨立成分分析 +
+ +44. **It is a technique meant to find the underlying generating sources.** + +⟶ +這是用來尋找潛在生成來源的技巧 +
+ +45. **Assumptions ― We assume that our data x has been generated by the n-dimensional source vector s=(s1,...,sn), where si are independent random variables, via a mixing and non-singular matrix A as follows:** + +⟶ +假設 - 我們假設資料 x 是從 n 維的來源向量 s=(s1,...,sn) 產生,si 為獨立變數,透過一個混合與非奇異矩陣 A 產生如下: +
+ +46. **The goal is to find the unmixing matrix W=A−1.** + +⟶ +目的在於找到一個 unmixing 矩陣 W=A−1 +
+ +47. **Bell and Sejnowski ICA algorithm ― This algorithm finds the unmixing matrix W by following the steps below:** + +⟶ +Bell 和 Sejnowski 獨立成份分析演算法 - 此演算法透過以下步驟來找到 unmixing 矩陣: +
+ +48. **Write the probability of x=As=W−1s as:** + +⟶ +紀錄 x=As=W−1s 的機率如下: +
+ +49. **Write the log likelihood given our training data {x(i),i∈[[1,m]]} and by noting g the sigmoid function as:** + +⟶ +在給定訓練資料 {x(i),i∈[[1,m]]} 的情況下,其對數概似估計函數與定義 g 為 sigmoid 函數如下: +
+ +50. **Therefore, the stochastic gradient ascent learning rule is such that for each training example x(i), we update W as follows:** + +⟶ +因此,梯度隨機下降學習規則對每個訓練樣本 x(i) 來說,我們透過以下方法來更新 W: From b1325d00a551cf67a39492898ff961ad0d3d6ec9 Mon Sep 17 00:00:00 2001 From: Afshine Amidi <26204670+afshinea@users.noreply.github.com> Date: Sat, 5 Jan 2019 18:45:30 -0800 Subject: [PATCH 055/531] Update recurrent-neural-networks.md --- template/recurrent-neural-networks.md | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/template/recurrent-neural-networks.md b/template/recurrent-neural-networks.md index 9d83fcac2..191e400a1 100644 --- a/template/recurrent-neural-networks.md +++ b/template/recurrent-neural-networks.md @@ -401,6 +401,13 @@
+**57bis. GloVe ― The GloVe model, short for global vectors for word representation, is a word embedding technique that uses a co-occurence matrix X where each Xi,j denotes the number of times that a target i occurred with a context j. Its cost function J is as follows:** + +⟶ + +
+ + **58. where f is a weighting function such that Xi,j=0⟹f(Xi,j)=0. Given the symmetry that e and θ play in this model, the final word embedding e(final)w is given by:** From cb5720bceead4362d81c915a2db34893f9126037 Mon Sep 17 00:00:00 2001 From: Afshine Amidi <26204670+afshinea@users.noreply.github.com> Date: Sat, 5 Jan 2019 18:48:59 -0800 Subject: [PATCH 056/531] Update recurrent-neural-networks.md --- fr/recurrent-neural-networks.md | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/fr/recurrent-neural-networks.md b/fr/recurrent-neural-networks.md index 468653712..88f1ccec3 100644 --- a/fr/recurrent-neural-networks.md +++ b/fr/recurrent-neural-networks.md @@ -401,6 +401,13 @@
+**57bis. GloVe ― The GloVe model, short for global vectors for word representation, is a word embedding technique that uses a co-occurence matrix X where each Xi,j denotes the number of times that a target i occurred with a context j. Its cost function J is as follows:** + +⟶ GloVe ― Le modèle GloVe (en anglais global vectors for word representation) est une technique de représentation des mots qui utilise une matrice de co-occurrence X où chaque Xi,j correspond au nombre de fois qu'une cible i se produit avec un contexte j. Sa fonction de coût J est telle que : + +
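For reference, the cost function J referred to in item 57bis is usually written as follows (notation: V the vocabulary, θi and ej the target and context embeddings, bi and b′j their biases):

```latex
J(\theta) = \frac{1}{2} \sum_{i,j=1}^{|V|} f(X_{ij}) \left( \theta_i^{\top} e_j + b_i + b'_j - \log X_{ij} \right)^2
```

Here f is the weighting function described in item 58, with f(Xij)=0 whenever Xij=0.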
+ + **58. where f is a weighting function such that Xi,j=0⟹f(Xi,j)=0. Given the symmetry that e and θ play in this model, the final word embedding e(final)w is given by:** From 60c3ea8fb51b3f45cd78b5d53ec0e49f0ee64d49 Mon Sep 17 00:00:00 2001 From: Afshine Amidi <26204670+afshinea@users.noreply.github.com> Date: Sun, 6 Jan 2019 14:43:55 -0800 Subject: [PATCH 057/531] Update README.md --- README.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/README.md b/README.md index 6e486f75f..a3bb264ff 100644 --- a/README.md +++ b/README.md @@ -50,9 +50,9 @@ Please make sure to propose the translation of **only one** cheatsheet per pull |Cheatsheet topic|Polski|Suomi|Català|Українська|한국어| |:---|:---:|:---:|:---:|:---:|:---:| -|Convolutional Neural Nets|not started|not started|not started|not started|not started| -|Recurrent Neural Nets|not started|not started|not started|not started|not started| -|DL tips and tricks|not started|not started|not started|not started|not started| +|Convolutional Neural Nets|not started|not started|not started|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/109)| +|Recurrent Neural Nets|not started|not started|not started|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/107)| +|DL tips and tricks|not started|not started|not started|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/108)| ## Progression for CS 229 (Machine Learning) |Cheatsheet topic|Español|فارسی|Français|日本語|Português|中文| From dc4ba0d2a10a94a8fccbf057578d9923a3ba6d91 Mon Sep 17 00:00:00 2001 From: kevingo Date: Sun, 6 Jan 2019 23:26:36 +0800 Subject: [PATCH 058/531] Update content by reviewer --- CONTRIBUTORS | 1 + zh-tw/cheatsheet-unsupervised-learning.md | 6 +- zh-tw/refresher-probability.md | 382 ++++++++++++++++++++++ 3 files changed, 386 insertions(+), 3 deletions(-) create mode 100644 zh-tw/refresher-probability.md diff --git a/CONTRIBUTORS b/CONTRIBUTORS index 8be8ed9d6..b866c9e8a 100644 --- a/CONTRIBUTORS +++ b/CONTRIBUTORS @@ -114,4 +114,5 @@ accelsao (review of supervised learning) kevingo (translation of unsupervised learning) + imironhead (review of unsupervised learning) diff --git a/zh-tw/cheatsheet-unsupervised-learning.md b/zh-tw/cheatsheet-unsupervised-learning.md index 6d4f760ed..0f6d5ee34 100644 --- a/zh-tw/cheatsheet-unsupervised-learning.md +++ b/zh-tw/cheatsheet-unsupervised-learning.md @@ -19,7 +19,7 @@ 4. **Jensen's inequality ― Let f be a convex function and X a random variable. We have the following inequality:** ⟶ -Jensen's 不等式 - 另 f 為一個凸函數、X 是一個隨機變數,我們可以得到底下這個不等式: +Jensen's 不等式 - 令 f 為一個凸函數、X 為一個隨機變數,我們可以得到底下這個不等式:
5. **Clustering** @@ -145,7 +145,7 @@ k-means 分群法 25. **In an unsupervised learning setting, it is often hard to assess the performance of a model since we don't have the ground truth labels as was the case in the supervised learning setting.** ⟶ -在無監督式學習中,通常很難去評估一個模型的好壞,因為我們沒有像在監督式學習中正確答案的標籤。 +在非監督式學習中,通常很難去評估一個模型的好壞,因為我們沒有擁有像在監督式學習任務中正確答案的標籤
26. **Silhouette coefficient ― By noting a and b the mean distance between a sample and all other points in the same class, and between a sample and all other points in the next nearest cluster, the silhouette coefficient s for a single sample is defined as follows:** @@ -241,7 +241,7 @@ Calinski-Harabaz 指標 s(k) 指出分群模型的好壞,此指標的值越高 41. **This procedure maximizes the variance among all k-dimensional spaces.** ⟶ -這個步驟會最大化所有 k 為空間為空間的變異數 +這個步驟會最大化所有 k 維空間的變異數
42. **[Data in feature space, Find principal components, Data in principal components space]** diff --git a/zh-tw/refresher-probability.md b/zh-tw/refresher-probability.md new file mode 100644 index 000000000..4f95609e6 --- /dev/null +++ b/zh-tw/refresher-probability.md @@ -0,0 +1,382 @@ +1. **Probabilities and Statistics refresher** + +⟶ +機率和統計回顧 +
+ +2. **Introduction to Probability and Combinatorics** + +⟶ +幾率與組合數學介紹 +
+ +3. **Sample space ― The set of all possible outcomes of an experiment is known as the sample space of the experiment and is denoted by S.** + +⟶ +樣本空間 - 一個實驗的所有可能結果的集合稱之為這個實驗的樣本空間,記做 S +
+ +4. **Event ― Any subset E of the sample space is known as an event. That is, an event is a set consisting of possible outcomes of the experiment. If the outcome of the experiment is contained in E, then we say that E has occurred.** + +⟶ +事件 - 樣本空間的任何子集合 E 被稱之為一個事件。也就是說,一個事件是實驗的可能結果的集合。如果該實驗的結果包含 E,我們稱我們稱 E 發生 +
+ +5. **Axioms of probability For each event E, we denote P(E) as the probability of event E occuring.** + +⟶ +機率公理。對於每個事件 E,我們用 P(E) 表示事件 E 發生的機率 +
+ +6. **Axiom 1 ― Every probability is between 0 and 1 included, i.e:** + +⟶ +公理 1 - 每一個機率值介於 0 到 1 之間,包含兩端點。即: +
+ +7. **Axiom 2 ― The probability that at least one of the elementary events in the entire sample space will occur is 1, i.e:** + +⟶ +公理 2 - 至少一個基本事件出現在整個樣本空間中的機率是 1。即: +
+ +8. **Axiom 3 ― For any sequence of mutually exclusive events E1,...,En, we have:** + +⟶ +公理 3 - 對於任何互斥的事件 E1,...,En,我們定義如下: +
+ +9. **Permutation ― A permutation is an arrangement of r objects from a pool of n objects, in a given order. The number of such arrangements is given by P(n,r), defined as:** + +⟶ +排列 - 排列指的是從 n 個相異的物件中,取出 r 個物件按照固定順序重新安排,這樣安排的數量用 P(n,r) 來表示,定義為: +
+ +10. **Combination ― A combination is an arrangement of r objects from a pool of n objects, where the order does not matter. The number of such arrangements is given by C(n,r), defined as:** + +⟶ +組合 - 組合指的是從 n 個物件中,取出 r 個物件,但不考慮他的順序。這樣組合要考慮的數量用 C(n,r) 來表示,定義為: +
+ +11. **Remark: we note that for 0⩽r⩽n, we have P(n,r)⩾C(n,r)** + +⟶ +注意:對於 0⩽r⩽n,我們會有 P(n,r)⩾C(n,r) +
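Items 9-11 are easy to check numerically; in the snippet below the values n=5 and r=3 are arbitrary:

```python
from math import perm, comb  # both available in Python 3.8+

n, r = 5, 3
print(perm(n, r))   # P(5,3) = 5!/(5-3)! = 60  (order matters)
print(comb(n, r))   # C(5,3) = 5!/(3!*2!) = 10 (order does not matter)
assert perm(n, r) >= comb(n, r)   # the remark: P(n,r) >= C(n,r) for 0 <= r <= n
```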
+ +12. **Conditional Probability** + +⟶ +條件機率 +
+ +13. **Bayes' rule ― For events A and B such that P(B)>0, we have:** + +⟶ +貝氏定理 - 對於事件 A 和 B 滿足 P(B)>0 時,我們定義如下: +
+ +14. **Remark: we have P(A∩B)=P(A)P(B|A)=P(A|B)P(B)** + +⟶ +注意:P(A∩B)=P(A)P(B|A)=P(A|B)P(B) +
+ +15. **Partition ― Let {Ai,i∈[[1,n]]} be such that for all i, Ai≠∅. We say that {Ai} is a partition if we have:** + +⟶ +分割 - 令 {Ai,i∈[[1,n]]} 對所有的 i,Ai≠∅,我們說 {Ai} 是一個分割,當底下成立時: +
+ +16. **Remark: for any event B in the sample space, we have P(B)=n∑i=1P(B|Ai)P(Ai).** + +⟶ +注意:對於任何在樣本空間的事件 B 來說,P(B)=n∑i=1P(B|Ai)P(Ai) +
+ +17. **Extended form of Bayes' rule ― Let {Ai,i∈[[1,n]]} be a partition of the sample space. We have:** + +⟶ +貝氏定理的擴展 - 令 {Ai,i∈[[1,n]]} 為樣本空間的一個分割,我們定義: +
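A small worked example of the extended form of Bayes' rule, using the partition {D, not D}; all the probabilities below are made up for illustration:

```python
# Hypothetical screening test: 1% prevalence, 99% sensitivity, 5% false-positive rate.
p_d = 0.01                    # P(D)
p_pos_given_d = 0.99          # P(+ | D)
p_pos_given_not_d = 0.05      # P(+ | not D)

# P(+) expanded over the partition {D, not D}
p_pos = p_pos_given_d * p_d + p_pos_given_not_d * (1 - p_d)

# Bayes' rule: P(D | +) = P(+ | D) P(D) / P(+)
p_d_given_pos = p_pos_given_d * p_d / p_pos
print(round(p_d_given_pos, 3))  # 0.167
```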
+ +18. **Independence ― Two events A and B are independent if and only if we have:** + +⟶ +獨立 - 當以下條件滿足時,兩個事件 A 和 B 為獨立事件: +
+ +19. **Random Variables** + +⟶ +隨機變數 +
+ +20. **Definitions** + +⟶ +定義 +
+ +21. **Random variable ― A random variable, often noted X, is a function that maps every element in a sample space to a real line.** + +⟶ +隨機變數 - 一個隨機變數 X,它是一個將樣本空間中的每個元素映射到實數域的函數 +
+ +22. **Cumulative distribution function (CDF) ― The cumulative distribution function F, which is monotonically non-decreasing and is such that limx→−∞F(x)=0 and limx→+∞F(x)=1, is defined as:** + +⟶ +累積分佈函數 (CDF) - 累積分佈函數 F 是單調且不減的函數,其 limx→−∞F(x)=0 且 limx→+∞F(x)=1,定義如下: +
+ +23. **Remark: we have P(a + +24. **Probability density function (PDF) ― The probability density function f is the probability that X takes on values between two adjacent realizations of the random variable.** + +⟶ +機率密度函數 - 機率密度函數 f 是隨機變數 X 在兩個相鄰的實數值附近取值的機率 +
+ +25. **Relationships involving the PDF and CDF ― Here are the important properties to know in the discrete (D) and the continuous (C) cases.** + +⟶ +機率密度函數和累積分佈函數的關係 - 底下是一些關於離散 (D) 和連續 (C) 的情況下的重要屬性 +
+ +26. **[Case, CDF F, PDF f, Properties of PDF]** + +⟶ +[情況, 累積分佈函數 F, 機率密度函數 f, 機率密度函數的屬性] +
+ +27. **Expectation and Moments of the Distribution ― Here are the expressions of the expected value E[X], generalized expected value E[g(X)], kth moment E[Xk] and characteristic function ψ(ω) for the discrete and continuous cases:** + +⟶ +分佈的期望值和動差 - 底下是期望值 E[X]、一般期望值 E[g(X)]、第 k 個動差和特徵函數 ψ(ω) 在離散和連續的情況下的表示式: +
+ +28. **Variance ― The variance of a random variable, often noted Var(X) or σ2, is a measure of the spread of its distribution function. It is determined as follows:** + +⟶ +變異數 - 隨機變數的變異數通常表示為 Var(X) 或 σ2,用來衡量一個分佈離散程度的指標。其表示如下: +
+ +29. **Standard deviation ― The standard deviation of a random variable, often noted σ, is a measure of the spread of its distribution function which is compatible with the units of the actual random variable. It is determined as follows:** + +⟶ +標準差 - 一個隨機變數的標準差通常表示為 σ,用來衡量一個分佈離散程度的指標,其單位和實際的隨機變數相容,表示如下: +
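The definitions of E[X], Var(X) and σ can be illustrated on a small discrete distribution; the values and probabilities below are arbitrary:

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0])   # possible values of X
p = np.array([0.2, 0.5, 0.3])   # their probabilities (sum to 1)

mean = np.sum(x * p)                 # E[X] = 1.1
var = np.sum((x - mean) ** 2 * p)    # Var(X) = E[(X - E[X])^2] = 0.49
std = np.sqrt(var)                   # sigma, in the same units as X = 0.7
print(mean, var, std)
```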
+ +30. **Transformation of random variables ― Let the variables X and Y be linked by some function. By noting fX and fY the distribution function of X and Y respectively, we have:** + +⟶ +隨機變數的轉換 - 令變數 X 和 Y 由某個函式連結在一起。我們定義 fX 和 fY 是 X 和 Y 的分佈函式,可以得到: +
+ +31. **Leibniz integral rule ― Let g be a function of x and potentially c, and a,b boundaries that may depend on c. We have:** + +⟶ +萊布尼茲積分法則 - 令 g 為 x 和 c 的函數,a 和 b 是依賴於 c 的的邊界,我們得到: +
+ +32. **Probability Distributions** + +⟶ +機率分佈 +
+ +33. **Chebyshev's inequality ― Let X be a random variable with expected value μ. For k,σ>0, we have the following inequality:** + +⟶ +柴比雪夫不等式 - 令 X 是一隨機變數,期望值為 μ。對於 k, σ>0,我們有以下不等式: +
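Chebyshev's inequality holds for any distribution with finite variance, so it can be sanity-checked by simulation; the exponential distribution below is an arbitrary choice:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.exponential(scale=1.0, size=100_000)
mu, sigma = x.mean(), x.std()          # sample estimates of the mean and standard deviation

for k in (2, 3):
    empirical = np.mean(np.abs(x - mu) >= k * sigma)
    print(k, empirical, "<=", 1 / k**2)  # the empirical tail mass stays below 1/k^2
```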
+ +34. **Main distributions ― Here are the main distributions to have in mind:** + +⟶ +主要的分佈 - 底下是我們需要熟悉的幾個主要的不等式: +
+ +35. **[Type, Distribution]** + +⟶ +[種類, 分佈] +
+ +36. **Jointly Distributed Random Variables** + +⟶ +聯合分佈隨機變數 +
+ +37. **Marginal density and cumulative distribution ― From the joint density probability function fXY , we have** + +⟶ +邊緣密度和累積分佈 - 從聯合密度機率函數 fXY 中我們可以得到: +
+ +38. **[Case, Marginal density, Cumulative function]** + +⟶ +[種類, 邊緣密度函數, 累積函數] +
+ +39. **Conditional density ― The conditional density of X with respect to Y, often noted fX|Y, is defined as follows:** + +⟶ +條件密度 - X 對於 Y 的條件密度,通常用 fX|Y 表示如下: +
+ +40. **Independence ― Two random variables X and Y are said to be independent if we have:** + +⟶ +獨立 - 當滿足以下條件時,我們稱隨機變數 X 和 Y 互相獨立: +
+ +41. **Covariance ― We define the covariance of two random variables X and Y, that we note σ2XY or more commonly Cov(X,Y), as follows:** + +⟶ +共變異數 - 我們定義隨機變數 X 和 Y 的共變異數為 σ2XY 或 Cov(X,Y) 如下: +
+ +42. **Correlation ― By noting σX,σY the standard deviations of X and Y, we define the correlation between the random variables X and Y, noted ρXY, as follows:** + +⟶ +相關性 - 我們定義 σX、σY 為 X 和 Y 的標準差,而 X 和 Y 的相關係數 ρXY 定義如下: +
+ +43. **Remark 1: we note that for any random variables X,Y, we have ρXY∈[−1,1].** + +⟶ +注意一:對於任何隨機變數 X 和 Y 來說,ρXY∈[−1,1] 成立 +
+ +44. **Remark 2: If X and Y are independent, then ρXY=0.** + +⟶ +注意二:當 X 和 Y 獨立時,ρXY=0 +
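Covariance, correlation and the bound ρXY∈[−1,1] can be illustrated with a simulated pair of variables; the linear relation between x and y below is made up:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=10_000)
y = 2 * x + rng.normal(size=10_000)     # deliberately correlated with x

cov_xy = np.cov(x, y)[0, 1]             # Cov(X, Y), roughly 2 here
rho_xy = np.corrcoef(x, y)[0, 1]        # rho_XY = Cov(X,Y) / (sigma_X * sigma_Y)
print(cov_xy, rho_xy)
assert -1.0 <= rho_xy <= 1.0            # remark 1 above
```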
+ +45. **Parameter estimation** + +⟶ +參數估計 +
+ +46. **Definitions** + +⟶ +定義 +
+ +47. **Random sample ― A random sample is a collection of n random variables X1,...,Xn that are independent and identically distributed with X.** + +⟶ +隨機抽樣 - 隨機抽樣指的是 n 個隨機變數 X1,...,Xn 和 X 獨立且同分佈的集合 +
+ +48. **Estimator ― An estimator is a function of the data that is used to infer the value of an unknown parameter in a statistical model.** + +⟶ +估計量 - 估計量是一個資料的函數,用來推斷在統計模型中未知參數的值 +
+ +49. **Bias ― The bias of an estimator ^θ is defined as being the difference between the expected value of the distribution of ^θ and the true value, i.e.:** + +⟶ +偏差 - 一個估計量的偏差 ^θ 定義為 ^θ 分佈期望值和真實值之間的差距: +
+ +50. **Remark: an estimator is said to be unbiased when we have E[^θ]=θ.** + +⟶ +注意:當 E[^θ]=θ 時,我們稱為不偏估計量 +
+ +51. **Estimating the mean** + +⟶ +預估平均數 +
+ +52. **Sample mean ― The sample mean of a random sample is used to estimate the true mean μ of a distribution, is often noted ¯X and is defined as follows:** + +⟶ +樣本平均 - 一個隨機樣本的樣本平均是用來預估一個分佈的真實平均 μ,通常我們用 ¯X 來表示,定義如下: +
+ +53. **Remark: the sample mean is unbiased, i.e E[¯X]=μ.** + +⟶ +注意:當 E[¯X]=μ 時,則為不偏樣本平均 +
+ +54. **Central Limit Theorem ― Let us have a random sample X1,...,Xn following a given distribution with mean μ and variance σ2, then we have:** + +⟶ +中央極限定理 - 當我們有一個隨機樣本 X1,...,Xn 滿足一個給定的分佈,其平均數為 μ,變異數為 σ2,我們有: +
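The Central Limit Theorem can be illustrated by simulating sample means; Uniform(0,1) below is an arbitrary choice of underlying distribution:

```python
import numpy as np

rng = np.random.default_rng(2)
n, trials = 50, 20_000
means = rng.uniform(0, 1, size=(trials, n)).mean(axis=1)   # 20,000 sample means of size n

mu, sigma = 0.5, np.sqrt(1 / 12)        # true mean and std of Uniform(0,1)
print(means.mean())                     # close to mu = 0.5
print(means.std(), sigma / np.sqrt(n))  # both close to 0.0408, as the CLT predicts
```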
+ +55. **Estimating the variance** + +⟶ +估計變異數 +
+ +56. **Sample variance ― The sample variance of a random sample is used to estimate the true variance σ2 of a distribution, is often noted s2 or ^σ2 and is defined as follows:** + +⟶ +樣本變異數 - 一個隨機樣本的樣本變異數是用來估計一個分佈的真實變異數 σ2,通常使用 s2 或 ^σ2 來表示,定義如下: +
+ +57. **Remark: the sample variance is unbiased, i.e E[s2]=σ2.** + +⟶ +注意:當 E[s2]=σ2 時,稱之為不偏樣本變異數 +
+ +58. **Chi-Squared relation with sample variance ― Let s2 be the sample variance of a random sample. We have:** + +⟶ +與樣本變異數的卡方關聯 - 令 s2 是一個隨機樣本的樣本變異數,我們可以得到: +
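The unbiased sample variance of items 56-57 corresponds to the n−1 (ddof=1) convention; a quick simulation shows E[s2] matching the true variance:

```python
import numpy as np

rng = np.random.default_rng(3)
true_var = 4.0
samples = rng.normal(0.0, np.sqrt(true_var), size=(100_000, 10))  # 100,000 samples of size 10

s2 = samples.var(axis=1, ddof=1)   # divides by n-1, hence unbiased
print(s2.mean())                   # averages out close to 4.0
```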
+ +**59. [Introduction, Sample space, Event, Permutation]** + +⟶ +[介紹, 樣本空間, 事件, 排列] +
+ +**60. [Conditional probability, Bayes' rule, Independence]** + +⟶ +[條件機率, 貝氏定理, 獨立性] +
+ +**61. [Random variables, Definitions, Expectation, Variance]** + +⟶ +[隨機變數, 定義, 期望值, 變異數] +
+ +**62. [Probability distributions, Chebyshev's inequality, Main distributions]** + +⟶ +[機率分佈, 柴比雪夫不等式, 主要分佈] +
+ +**63. [Jointly distributed random variables, Density, Covariance, Correlation]** + +⟶ +[聯合分佈隨機變數, 密度, 共變異數, 相關] +
+ +**64. [Parameter estimation, Mean, Variance]** + +⟶ +[參數估計, 平均數, 變異數] \ No newline at end of file From 7b349e2e55dc1e0302769b4c6786b83b24275a74 Mon Sep 17 00:00:00 2001 From: kevingo Date: Mon, 7 Jan 2019 22:52:15 +0800 Subject: [PATCH 059/531] Add contributor --- CONTRIBUTORS | 1 + 1 file changed, 1 insertion(+) diff --git a/CONTRIBUTORS b/CONTRIBUTORS index b866c9e8a..55bc41c31 100644 --- a/CONTRIBUTORS +++ b/CONTRIBUTORS @@ -115,4 +115,5 @@ kevingo (translation of unsupervised learning) imironhead (review of unsupervised learning) + johnnychhsu (review of unsupervised learning) From 994a5326fa3edbdda2bb032dc5a74090737fa84c Mon Sep 17 00:00:00 2001 From: Shervine Amidi Date: Mon, 7 Jan 2019 12:06:12 -0800 Subject: [PATCH 060/531] Update CONTRIBUTORS --- CONTRIBUTORS | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/CONTRIBUTORS b/CONTRIBUTORS index 55bc41c31..5a3380228 100644 --- a/CONTRIBUTORS +++ b/CONTRIBUTORS @@ -115,5 +115,5 @@ kevingo (translation of unsupervised learning) imironhead (review of unsupervised learning) - johnnychhsu (review of unsupervised learning) + johnnychhsu (review of unsupervised learning) From 1003c173322e062a6d2159cfcb056f96e2308dbc Mon Sep 17 00:00:00 2001 From: Shervine Amidi Date: Mon, 7 Jan 2019 21:22:23 -0800 Subject: [PATCH 061/531] Delete refresher-probability.md --- zh-tw/refresher-probability.md | 382 --------------------------------- 1 file changed, 382 deletions(-) delete mode 100644 zh-tw/refresher-probability.md diff --git a/zh-tw/refresher-probability.md b/zh-tw/refresher-probability.md deleted file mode 100644 index 4f95609e6..000000000 --- a/zh-tw/refresher-probability.md +++ /dev/null @@ -1,382 +0,0 @@ -1. **Probabilities and Statistics refresher** - -⟶ -機率和統計回顧 -
- -2. **Introduction to Probability and Combinatorics** - -⟶ -幾率與組合數學介紹 -
- -3. **Sample space ― The set of all possible outcomes of an experiment is known as the sample space of the experiment and is denoted by S.** - -⟶ -樣本空間 - 一個實驗的所有可能結果的集合稱之為這個實驗的樣本空間,記做 S -
- -4. **Event ― Any subset E of the sample space is known as an event. That is, an event is a set consisting of possible outcomes of the experiment. If the outcome of the experiment is contained in E, then we say that E has occurred.** - -⟶ -事件 - 樣本空間的任何子集合 E 被稱之為一個事件。也就是說,一個事件是實驗的可能結果的集合。如果該實驗的結果包含 E,我們稱我們稱 E 發生 -
- -5. **Axioms of probability For each event E, we denote P(E) as the probability of event E occuring.** - -⟶ -機率公理。對於每個事件 E,我們用 P(E) 表示事件 E 發生的機率 -
- -6. **Axiom 1 ― Every probability is between 0 and 1 included, i.e:** - -⟶ -公理 1 - 每一個機率值介於 0 到 1 之間,包含兩端點。即: -
- -7. **Axiom 2 ― The probability that at least one of the elementary events in the entire sample space will occur is 1, i.e:** - -⟶ -公理 2 - 至少一個基本事件出現在整個樣本空間中的機率是 1。即: -
- -8. **Axiom 3 ― For any sequence of mutually exclusive events E1,...,En, we have:** - -⟶ -公理 3 - 對於任何互斥的事件 E1,...,En,我們定義如下: -
- -9. **Permutation ― A permutation is an arrangement of r objects from a pool of n objects, in a given order. The number of such arrangements is given by P(n,r), defined as:** - -⟶ -排列 - 排列指的是從 n 個相異的物件中,取出 r 個物件按照固定順序重新安排,這樣安排的數量用 P(n,r) 來表示,定義為: -
- -10. **Combination ― A combination is an arrangement of r objects from a pool of n objects, where the order does not matter. The number of such arrangements is given by C(n,r), defined as:** - -⟶ -組合 - 組合指的是從 n 個物件中,取出 r 個物件,但不考慮他的順序。這樣組合要考慮的數量用 C(n,r) 來表示,定義為: -
- -11. **Remark: we note that for 0⩽r⩽n, we have P(n,r)⩾C(n,r)** - -⟶ -注意:對於 0⩽r⩽n,我們會有 P(n,r)⩾C(n,r) -
- -12. **Conditional Probability** - -⟶ -條件機率 -
- -13. **Bayes' rule ― For events A and B such that P(B)>0, we have:** - -⟶ -貝氏定理 - 對於事件 A 和 B 滿足 P(B)>0 時,我們定義如下: -
- -14. **Remark: we have P(A∩B)=P(A)P(B|A)=P(A|B)P(B)** - -⟶ -注意:P(A∩B)=P(A)P(B|A)=P(A|B)P(B) -
- -15. **Partition ― Let {Ai,i∈[[1,n]]} be such that for all i, Ai≠∅. We say that {Ai} is a partition if we have:** - -⟶ -分割 - 令 {Ai,i∈[[1,n]]} 對所有的 i,Ai≠∅,我們說 {Ai} 是一個分割,當底下成立時: -
- -16. **Remark: for any event B in the sample space, we have P(B)=n∑i=1P(B|Ai)P(Ai).** - -⟶ -注意:對於任何在樣本空間的事件 B 來說,P(B)=n∑i=1P(B|Ai)P(Ai) -
- -17. **Extended form of Bayes' rule ― Let {Ai,i∈[[1,n]]} be a partition of the sample space. We have:** - -⟶ -貝氏定理的擴展 - 令 {Ai,i∈[[1,n]]} 為樣本空間的一個分割,我們定義: -
- -18. **Independence ― Two events A and B are independent if and only if we have:** - -⟶ -獨立 - 當以下條件滿足時,兩個事件 A 和 B 為獨立事件: -
- -19. **Random Variables** - -⟶ -隨機變數 -
- -20. **Definitions** - -⟶ -定義 -
- -21. **Random variable ― A random variable, often noted X, is a function that maps every element in a sample space to a real line.** - -⟶ -隨機變數 - 一個隨機變數 X,它是一個將樣本空間中的每個元素映射到實數域的函數 -
- -22. **Cumulative distribution function (CDF) ― The cumulative distribution function F, which is monotonically non-decreasing and is such that limx→−∞F(x)=0 and limx→+∞F(x)=1, is defined as:** - -⟶ -累積分佈函數 (CDF) - 累積分佈函數 F 是單調且不減的函數,其 limx→−∞F(x)=0 且 limx→+∞F(x)=1,定義如下: -
- -23. **Remark: we have P(a - -24. **Probability density function (PDF) ― The probability density function f is the probability that X takes on values between two adjacent realizations of the random variable.** - -⟶ -機率密度函數 - 機率密度函數 f 是隨機變數 X 在兩個相鄰的實數值附近取值的機率 -
- -25. **Relationships involving the PDF and CDF ― Here are the important properties to know in the discrete (D) and the continuous (C) cases.** - -⟶ -機率密度函數和累積分佈函數的關係 - 底下是一些關於離散 (D) 和連續 (C) 的情況下的重要屬性 -
- -26. **[Case, CDF F, PDF f, Properties of PDF]** - -⟶ -[情況, 累積分佈函數 F, 機率密度函數 f, 機率密度函數的屬性] -
- -27. **Expectation and Moments of the Distribution ― Here are the expressions of the expected value E[X], generalized expected value E[g(X)], kth moment E[Xk] and characteristic function ψ(ω) for the discrete and continuous cases:** - -⟶ -分佈的期望值和動差 - 底下是期望值 E[X]、一般期望值 E[g(X)]、第 k 個動差和特徵函數 ψ(ω) 在離散和連續的情況下的表示式: -
- -28. **Variance ― The variance of a random variable, often noted Var(X) or σ2, is a measure of the spread of its distribution function. It is determined as follows:** - -⟶ -變異數 - 隨機變數的變異數通常表示為 Var(X) 或 σ2,用來衡量一個分佈離散程度的指標。其表示如下: -
- -29. **Standard deviation ― The standard deviation of a random variable, often noted σ, is a measure of the spread of its distribution function which is compatible with the units of the actual random variable. It is determined as follows:** - -⟶ -標準差 - 一個隨機變數的標準差通常表示為 σ,用來衡量一個分佈離散程度的指標,其單位和實際的隨機變數相容,表示如下: -
- -30. **Transformation of random variables ― Let the variables X and Y be linked by some function. By noting fX and fY the distribution function of X and Y respectively, we have:** - -⟶ -隨機變數的轉換 - 令變數 X 和 Y 由某個函式連結在一起。我們定義 fX 和 fY 是 X 和 Y 的分佈函式,可以得到: -
- -31. **Leibniz integral rule ― Let g be a function of x and potentially c, and a,b boundaries that may depend on c. We have:** - -⟶ -萊布尼茲積分法則 - 令 g 為 x 和 c 的函數,a 和 b 是依賴於 c 的的邊界,我們得到: -
- -32. **Probability Distributions** - -⟶ -機率分佈 -
- -33. **Chebyshev's inequality ― Let X be a random variable with expected value μ. For k,σ>0, we have the following inequality:** - -⟶ -柴比雪夫不等式 - 令 X 是一隨機變數,期望值為 μ。對於 k, σ>0,我們有以下不等式: -
- -34. **Main distributions ― Here are the main distributions to have in mind:** - -⟶ -主要的分佈 - 底下是我們需要熟悉的幾個主要的不等式: -
- -35. **[Type, Distribution]** - -⟶ -[種類, 分佈] -
- -36. **Jointly Distributed Random Variables** - -⟶ -聯合分佈隨機變數 -
- -37. **Marginal density and cumulative distribution ― From the joint density probability function fXY , we have** - -⟶ -邊緣密度和累積分佈 - 從聯合密度機率函數 fXY 中我們可以得到: -
- -38. **[Case, Marginal density, Cumulative function]** - -⟶ -[種類, 邊緣密度函數, 累積函數] -
- -39. **Conditional density ― The conditional density of X with respect to Y, often noted fX|Y, is defined as follows:** - -⟶ -條件密度 - X 對於 Y 的條件密度,通常用 fX|Y 表示如下: -
- -40. **Independence ― Two random variables X and Y are said to be independent if we have:** - -⟶ -獨立 - 當滿足以下條件時,我們稱隨機變數 X 和 Y 互相獨立: -
- -41. **Covariance ― We define the covariance of two random variables X and Y, that we note σ2XY or more commonly Cov(X,Y), as follows:** - -⟶ -共變異數 - 我們定義隨機變數 X 和 Y 的共變異數為 σ2XY 或 Cov(X,Y) 如下: -
- -42. **Correlation ― By noting σX,σY the standard deviations of X and Y, we define the correlation between the random variables X and Y, noted ρXY, as follows:** - -⟶ -相關性 - 我們定義 σX、σY 為 X 和 Y 的標準差,而 X 和 Y 的相關係數 ρXY 定義如下: -
- -43. **Remark 1: we note that for any random variables X,Y, we have ρXY∈[−1,1].** - -⟶ -注意一:對於任何隨機變數 X 和 Y 來說,ρXY∈[−1,1] 成立 -
- -44. **Remark 2: If X and Y are independent, then ρXY=0.** - -⟶ -注意二:當 X 和 Y 獨立時,ρXY=0 -
- -45. **Parameter estimation** - -⟶ -參數估計 -
- -46. **Definitions** - -⟶ -定義 -
- -47. **Random sample ― A random sample is a collection of n random variables X1,...,Xn that are independent and identically distributed with X.** - -⟶ -隨機抽樣 - 隨機抽樣指的是 n 個隨機變數 X1,...,Xn 和 X 獨立且同分佈的集合 -
- -48. **Estimator ― An estimator is a function of the data that is used to infer the value of an unknown parameter in a statistical model.** - -⟶ -估計量 - 估計量是一個資料的函數,用來推斷在統計模型中未知參數的值 -
- -49. **Bias ― The bias of an estimator ^θ is defined as being the difference between the expected value of the distribution of ^θ and the true value, i.e.:** - -⟶ -偏差 - 一個估計量的偏差 ^θ 定義為 ^θ 分佈期望值和真實值之間的差距: -
- -50. **Remark: an estimator is said to be unbiased when we have E[^θ]=θ.** - -⟶ -注意:當 E[^θ]=θ 時,我們稱為不偏估計量 -
- -51. **Estimating the mean** - -⟶ -預估平均數 -
- -52. **Sample mean ― The sample mean of a random sample is used to estimate the true mean μ of a distribution, is often noted ¯X and is defined as follows:** - -⟶ -樣本平均 - 一個隨機樣本的樣本平均是用來預估一個分佈的真實平均 μ,通常我們用 ¯X 來表示,定義如下: -
- -53. **Remark: the sample mean is unbiased, i.e E[¯X]=μ.** - -⟶ -注意:當 E[¯X]=μ 時,則為不偏樣本平均 -
- -54. **Central Limit Theorem ― Let us have a random sample X1,...,Xn following a given distribution with mean μ and variance σ2, then we have:** - -⟶ -中央極限定理 - 當我們有一個隨機樣本 X1,...,Xn 滿足一個給定的分佈,其平均數為 μ,變異數為 σ2,我們有: -
- -55. **Estimating the variance** - -⟶ -估計變異數 -
- -56. **Sample variance ― The sample variance of a random sample is used to estimate the true variance σ2 of a distribution, is often noted s2 or ^σ2 and is defined as follows:** - -⟶ -樣本變異數 - 一個隨機樣本的樣本變異數是用來估計一個分佈的真實變異數 σ2,通常使用 s2 或 ^σ2 來表示,定義如下: -
- -57. **Remark: the sample variance is unbiased, i.e E[s2]=σ2.** - -⟶ -注意:當 E[s2]=σ2 時,稱之為不偏樣本變異數 -
- -58. **Chi-Squared relation with sample variance ― Let s2 be the sample variance of a random sample. We have:** - -⟶ -與樣本變異數的卡方關聯 - 令 s2 是一個隨機樣本的樣本變異數,我們可以得到: -
- -**59. [Introduction, Sample space, Event, Permutation]** - -⟶ -[介紹, 樣本空間, 事件, 排列] -
- -**60. [Conditional probability, Bayes' rule, Independence]** - -⟶ -[條件機率, 貝氏定理, 獨立性] -
- -**61. [Random variables, Definitions, Expectation, Variance]** - -⟶ -[隨機變數, 定義, 期望值, 變異數] -
- -**62. [Probability distributions, Chebyshev's inequality, Main distributions]** - -⟶ -[機率分佈, 柴比雪夫不等式, 主要分佈] -
- -**63. [Jointly distributed random variables, Density, Covariance, Correlation]** - -⟶ -[聯合分佈隨機變數, 密度, 共變異數, 相關] -
- -**64. [Parameter estimation, Mean, Variance]** - -⟶ -[參數估計, 平均數, 變異數] \ No newline at end of file From 2fb63879d29751c504e5f1819d4ce9cc59656b14 Mon Sep 17 00:00:00 2001 From: kevingo Date: Tue, 8 Jan 2019 13:46:47 +0800 Subject: [PATCH 062/531] add zh-tw Probabilities and Statistics translation --- CONTRIBUTORS | 2 + zh-tw/refresher-probability.md | 382 +++++++++++++++++++++++++++++++++ 2 files changed, 384 insertions(+) create mode 100644 zh-tw/refresher-probability.md diff --git a/CONTRIBUTORS b/CONTRIBUTORS index 5a3380228..f8358432b 100644 --- a/CONTRIBUTORS +++ b/CONTRIBUTORS @@ -117,3 +117,5 @@ imironhead (review of unsupervised learning) johnnychhsu (review of unsupervised learning) + kevingo (translation of probabilities and statistics) + diff --git a/zh-tw/refresher-probability.md b/zh-tw/refresher-probability.md new file mode 100644 index 000000000..4f95609e6 --- /dev/null +++ b/zh-tw/refresher-probability.md @@ -0,0 +1,382 @@ +1. **Probabilities and Statistics refresher** + +⟶ +機率和統計回顧 +
+ +2. **Introduction to Probability and Combinatorics** + +⟶ +幾率與組合數學介紹 +
+ +3. **Sample space ― The set of all possible outcomes of an experiment is known as the sample space of the experiment and is denoted by S.** + +⟶ +樣本空間 - 一個實驗的所有可能結果的集合稱之為這個實驗的樣本空間,記做 S +
+ +4. **Event ― Any subset E of the sample space is known as an event. That is, an event is a set consisting of possible outcomes of the experiment. If the outcome of the experiment is contained in E, then we say that E has occurred.** + +⟶ +事件 - 樣本空間的任何子集合 E 被稱之為一個事件。也就是說,一個事件是實驗的可能結果的集合。如果該實驗的結果包含 E,我們稱我們稱 E 發生 +
+ +5. **Axioms of probability For each event E, we denote P(E) as the probability of event E occuring.** + +⟶ +機率公理。對於每個事件 E,我們用 P(E) 表示事件 E 發生的機率 +
+ +6. **Axiom 1 ― Every probability is between 0 and 1 included, i.e:** + +⟶ +公理 1 - 每一個機率值介於 0 到 1 之間,包含兩端點。即: +
+ +7. **Axiom 2 ― The probability that at least one of the elementary events in the entire sample space will occur is 1, i.e:** + +⟶ +公理 2 - 至少一個基本事件出現在整個樣本空間中的機率是 1。即: +
+ +8. **Axiom 3 ― For any sequence of mutually exclusive events E1,...,En, we have:** + +⟶ +公理 3 - 對於任何互斥的事件 E1,...,En,我們定義如下: +
+ +9. **Permutation ― A permutation is an arrangement of r objects from a pool of n objects, in a given order. The number of such arrangements is given by P(n,r), defined as:** + +⟶ +排列 - 排列指的是從 n 個相異的物件中,取出 r 個物件按照固定順序重新安排,這樣安排的數量用 P(n,r) 來表示,定義為: +
+ +10. **Combination ― A combination is an arrangement of r objects from a pool of n objects, where the order does not matter. The number of such arrangements is given by C(n,r), defined as:** + +⟶ +組合 - 組合指的是從 n 個物件中,取出 r 個物件,但不考慮他的順序。這樣組合要考慮的數量用 C(n,r) 來表示,定義為: +
+ +11. **Remark: we note that for 0⩽r⩽n, we have P(n,r)⩾C(n,r)** + +⟶ +注意:對於 0⩽r⩽n,我們會有 P(n,r)⩾C(n,r) +
+ +12. **Conditional Probability** + +⟶ +條件機率 +
+ +13. **Bayes' rule ― For events A and B such that P(B)>0, we have:** + +⟶ +貝氏定理 - 對於事件 A 和 B 滿足 P(B)>0 時,我們定義如下: +
+ +14. **Remark: we have P(A∩B)=P(A)P(B|A)=P(A|B)P(B)** + +⟶ +注意:P(A∩B)=P(A)P(B|A)=P(A|B)P(B) +
+ +15. **Partition ― Let {Ai,i∈[[1,n]]} be such that for all i, Ai≠∅. We say that {Ai} is a partition if we have:** + +⟶ +分割 - 令 {Ai,i∈[[1,n]]} 對所有的 i,Ai≠∅,我們說 {Ai} 是一個分割,當底下成立時: +
+ +16. **Remark: for any event B in the sample space, we have P(B)=n∑i=1P(B|Ai)P(Ai).** + +⟶ +注意:對於任何在樣本空間的事件 B 來說,P(B)=n∑i=1P(B|Ai)P(Ai) +
+ +17. **Extended form of Bayes' rule ― Let {Ai,i∈[[1,n]]} be a partition of the sample space. We have:** + +⟶ +貝氏定理的擴展 - 令 {Ai,i∈[[1,n]]} 為樣本空間的一個分割,我們定義: +
+ +18. **Independence ― Two events A and B are independent if and only if we have:** + +⟶ +獨立 - 當以下條件滿足時,兩個事件 A 和 B 為獨立事件: +
+ +19. **Random Variables** + +⟶ +隨機變數 +
+ +20. **Definitions** + +⟶ +定義 +
+ +21. **Random variable ― A random variable, often noted X, is a function that maps every element in a sample space to a real line.** + +⟶ +隨機變數 - 一個隨機變數 X,它是一個將樣本空間中的每個元素映射到實數域的函數 +
+ +22. **Cumulative distribution function (CDF) ― The cumulative distribution function F, which is monotonically non-decreasing and is such that limx→−∞F(x)=0 and limx→+∞F(x)=1, is defined as:** + +⟶ +累積分佈函數 (CDF) - 累積分佈函數 F 是單調且不減的函數,其 limx→−∞F(x)=0 且 limx→+∞F(x)=1,定義如下: +
+ +23. **Remark: we have P(a + +24. **Probability density function (PDF) ― The probability density function f is the probability that X takes on values between two adjacent realizations of the random variable.** + +⟶ +機率密度函數 - 機率密度函數 f 是隨機變數 X 在兩個相鄰的實數值附近取值的機率 +
+ +25. **Relationships involving the PDF and CDF ― Here are the important properties to know in the discrete (D) and the continuous (C) cases.** + +⟶ +機率密度函數和累積分佈函數的關係 - 底下是一些關於離散 (D) 和連續 (C) 的情況下的重要屬性 +
+ +26. **[Case, CDF F, PDF f, Properties of PDF]** + +⟶ +[情況, 累積分佈函數 F, 機率密度函數 f, 機率密度函數的屬性] +
+ +27. **Expectation and Moments of the Distribution ― Here are the expressions of the expected value E[X], generalized expected value E[g(X)], kth moment E[Xk] and characteristic function ψ(ω) for the discrete and continuous cases:** + +⟶ +分佈的期望值和動差 - 底下是期望值 E[X]、一般期望值 E[g(X)]、第 k 個動差和特徵函數 ψ(ω) 在離散和連續的情況下的表示式: +
+ +28. **Variance ― The variance of a random variable, often noted Var(X) or σ2, is a measure of the spread of its distribution function. It is determined as follows:** + +⟶ +變異數 - 隨機變數的變異數通常表示為 Var(X) 或 σ2,用來衡量一個分佈離散程度的指標。其表示如下: +
+ +29. **Standard deviation ― The standard deviation of a random variable, often noted σ, is a measure of the spread of its distribution function which is compatible with the units of the actual random variable. It is determined as follows:** + +⟶ +標準差 - 一個隨機變數的標準差通常表示為 σ,用來衡量一個分佈離散程度的指標,其單位和實際的隨機變數相容,表示如下: +
+ +30. **Transformation of random variables ― Let the variables X and Y be linked by some function. By noting fX and fY the distribution function of X and Y respectively, we have:** + +⟶ +隨機變數的轉換 - 令變數 X 和 Y 由某個函式連結在一起。我們定義 fX 和 fY 是 X 和 Y 的分佈函式,可以得到: +
+ +31. **Leibniz integral rule ― Let g be a function of x and potentially c, and a,b boundaries that may depend on c. We have:** + +⟶ +萊布尼茲積分法則 - 令 g 為 x 和 c 的函數,a 和 b 是依賴於 c 的的邊界,我們得到: +
+ +32. **Probability Distributions** + +⟶ +機率分佈 +
+ +33. **Chebyshev's inequality ― Let X be a random variable with expected value μ. For k,σ>0, we have the following inequality:** + +⟶ +柴比雪夫不等式 - 令 X 是一隨機變數,期望值為 μ。對於 k, σ>0,我們有以下不等式: +
+ +34. **Main distributions ― Here are the main distributions to have in mind:** + +⟶ +主要的分佈 - 底下是我們需要熟悉的幾個主要的不等式: +
+ +35. **[Type, Distribution]** + +⟶ +[種類, 分佈] +
+ +36. **Jointly Distributed Random Variables** + +⟶ +聯合分佈隨機變數 +
+ +37. **Marginal density and cumulative distribution ― From the joint density probability function fXY , we have** + +⟶ +邊緣密度和累積分佈 - 從聯合密度機率函數 fXY 中我們可以得到: +
+ +38. **[Case, Marginal density, Cumulative function]** + +⟶ +[種類, 邊緣密度函數, 累積函數] +
+ +39. **Conditional density ― The conditional density of X with respect to Y, often noted fX|Y, is defined as follows:** + +⟶ +條件密度 - X 對於 Y 的條件密度,通常用 fX|Y 表示如下: +
+ +40. **Independence ― Two random variables X and Y are said to be independent if we have:** + +⟶ +獨立 - 當滿足以下條件時,我們稱隨機變數 X 和 Y 互相獨立: +
+ +41. **Covariance ― We define the covariance of two random variables X and Y, that we note σ2XY or more commonly Cov(X,Y), as follows:** + +⟶ +共變異數 - 我們定義隨機變數 X 和 Y 的共變異數為 σ2XY 或 Cov(X,Y) 如下: +
+ +42. **Correlation ― By noting σX,σY the standard deviations of X and Y, we define the correlation between the random variables X and Y, noted ρXY, as follows:** + +⟶ +相關性 - 我們定義 σX、σY 為 X 和 Y 的標準差,而 X 和 Y 的相關係數 ρXY 定義如下: +
+ +43. **Remark 1: we note that for any random variables X,Y, we have ρXY∈[−1,1].** + +⟶ +注意一:對於任何隨機變數 X 和 Y 來說,ρXY∈[−1,1] 成立 +
+ +44. **Remark 2: If X and Y are independent, then ρXY=0.** + +⟶ +注意二:當 X 和 Y 獨立時,ρXY=0 +
+ +45. **Parameter estimation** + +⟶ +參數估計 +
+ +46. **Definitions** + +⟶ +定義 +
+ +47. **Random sample ― A random sample is a collection of n random variables X1,...,Xn that are independent and identically distributed with X.** + +⟶ +隨機抽樣 - 隨機抽樣指的是 n 個隨機變數 X1,...,Xn 和 X 獨立且同分佈的集合 +
+ +48. **Estimator ― An estimator is a function of the data that is used to infer the value of an unknown parameter in a statistical model.** + +⟶ +估計量 - 估計量是一個資料的函數,用來推斷在統計模型中未知參數的值 +
+ +49. **Bias ― The bias of an estimator ^θ is defined as being the difference between the expected value of the distribution of ^θ and the true value, i.e.:** + +⟶ +偏差 - 一個估計量的偏差 ^θ 定義為 ^θ 分佈期望值和真實值之間的差距: +
+ +50. **Remark: an estimator is said to be unbiased when we have E[^θ]=θ.** + +⟶ +注意:當 E[^θ]=θ 時,我們稱為不偏估計量 +
+ +51. **Estimating the mean** + +⟶ +預估平均數 +
+ +52. **Sample mean ― The sample mean of a random sample is used to estimate the true mean μ of a distribution, is often noted ¯X and is defined as follows:** + +⟶ +樣本平均 - 一個隨機樣本的樣本平均是用來預估一個分佈的真實平均 μ,通常我們用 ¯X 來表示,定義如下: +
+ +53. **Remark: the sample mean is unbiased, i.e E[¯X]=μ.** + +⟶ +注意:當 E[¯X]=μ 時,則為不偏樣本平均 +
+ +54. **Central Limit Theorem ― Let us have a random sample X1,...,Xn following a given distribution with mean μ and variance σ2, then we have:** + +⟶ +中央極限定理 - 當我們有一個隨機樣本 X1,...,Xn 滿足一個給定的分佈,其平均數為 μ,變異數為 σ2,我們有: +
+ +55. **Estimating the variance** + +⟶ +估計變異數 +
+ +56. **Sample variance ― The sample variance of a random sample is used to estimate the true variance σ2 of a distribution, is often noted s2 or ^σ2 and is defined as follows:** + +⟶ +樣本變異數 - 一個隨機樣本的樣本變異數是用來估計一個分佈的真實變異數 σ2,通常使用 s2 或 ^σ2 來表示,定義如下: +
+ +57. **Remark: the sample variance is unbiased, i.e E[s2]=σ2.** + +⟶ +注意:當 E[s2]=σ2 時,稱之為不偏樣本變異數 +
+ +58. **Chi-Squared relation with sample variance ― Let s2 be the sample variance of a random sample. We have:** + +⟶ +與樣本變異數的卡方關聯 - 令 s2 是一個隨機樣本的樣本變異數,我們可以得到: +
+ +**59. [Introduction, Sample space, Event, Permutation]** + +⟶ +[介紹, 樣本空間, 事件, 排列] +
+ +**60. [Conditional probability, Bayes' rule, Independence]** + +⟶ +[條件機率, 貝氏定理, 獨立性] +
+ +**61. [Random variables, Definitions, Expectation, Variance]** + +⟶ +[隨機變數, 定義, 期望值, 變異數] +
+ +**62. [Probability distributions, Chebyshev's inequality, Main distributions]** + +⟶ +[機率分佈, 柴比雪夫不等式, 主要分佈] +
+ +**63. [Jointly distributed random variables, Density, Covariance, Correlation]** + +⟶ +[聯合分佈隨機變數, 密度, 共變異數, 相關] +
+ +**64. [Parameter estimation, Mean, Variance]** + +⟶ +[參數估計, 平均數, 變異數] \ No newline at end of file From 3fa6df07f5f04a89b5ae5066ea035d4756a47e9a Mon Sep 17 00:00:00 2001 From: kevingo Date: Thu, 10 Jan 2019 17:12:24 +0800 Subject: [PATCH 063/531] Modify content suggested by reviewer --- zh-tw/refresher-probability.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/zh-tw/refresher-probability.md b/zh-tw/refresher-probability.md index 4f95609e6..0db481cf5 100644 --- a/zh-tw/refresher-probability.md +++ b/zh-tw/refresher-probability.md @@ -127,7 +127,7 @@ 22. **Cumulative distribution function (CDF) ― The cumulative distribution function F, which is monotonically non-decreasing and is such that limx→−∞F(x)=0 and limx→+∞F(x)=1, is defined as:** ⟶ -累積分佈函數 (CDF) - 累積分佈函數 F 是單調且不減的函數,其 limx→−∞F(x)=0 且 limx→+∞F(x)=1,定義如下: +累積分佈函數 (CDF) - 累積分佈函數 F 是單調遞增的函數,其 limx→−∞F(x)=0 且 limx→+∞F(x)=1,定義如下:
23. **Remark: we have P(a Date: Thu, 10 Jan 2019 17:11:38 +0300 Subject: [PATCH 064/531] [tr] Unsupervised Learning all translated --- tr/cheatsheet-unsupervised-learning.md | 114 ++++++++++++------------- 1 file changed, 57 insertions(+), 57 deletions(-) diff --git a/tr/cheatsheet-unsupervised-learning.md b/tr/cheatsheet-unsupervised-learning.md index 5eae29ed8..3d3d17e26 100644 --- a/tr/cheatsheet-unsupervised-learning.md +++ b/tr/cheatsheet-unsupervised-learning.md @@ -1,340 +1,340 @@ **1. Unsupervised Learning cheatsheet** -⟶ +⟶ Gözetimsiz Öğrenme El Kitabı
**2. Introduction to Unsupervised Learning** -⟶ +⟶ Gözetimsiz Öğrenmeye Giriş
**3. Motivation ― The goal of unsupervised learning is to find hidden patterns in unlabeled data {x(1),...,x(m)}.** -⟶ +⟶ Motivasyon ― Gözetimsiz öğrenmenin amacı etiketlenmemiş verilerdeki gizli örüntüleri bulmaktır {x (1), ..., x (m)}.
**4. Jensen's inequality ― Let f be a convex function and X a random variable. We have the following inequality:** -⟶ +⟶ Jensen eşitsizliği - Bir konveks fonksiyon ve X bir rastgele değişken olsun. Aşağıdaki eşitsizliklerimiz:
**5. Clustering** -⟶ +⟶ Kümeleme
**6. Expectation-Maximization** -⟶ +⟶ Beklenti-Enbüyütme (Maksimizasyon)
**7. Latent variables ― Latent variables are hidden/unobserved variables that make estimation problems difficult, and are often denoted z. Here are the most common settings where there are latent variables:** -⟶ +⟶ Gizli değişkenler - Gizli değişkenler, tahmin problemlerini zorlaştıran ve çoğunlukla z olarak adlandırılan gizli / gözlemlenmemiş değişkenlerdir. Gizli değişkenlerin bulunduğu en yaygın ortamlar şunlardır:
**8. [Setting, Latent variable z, Comments]** -⟶ +⟶ Ortam, Gizli değişken z, Yorumlar
**9. [Mixture of k Gaussians, Factor analysis]** -⟶ +⟶ [K Gaussianların karışımı, Faktör analizi]
**10. Algorithm ― The Expectation-Maximization (EM) algorithm gives an efficient method at estimating the parameter θ through maximum likelihood estimation by repeatedly constructing a lower-bound on the likelihood (E-step) and optimizing that lower bound (M-step) as follows:** -⟶ +⟶ Algoritma - Beklenti-Enbüyütme (Maksimizasyon) (BE) algoritması, parametrenin maksimum olabilirlik kestirimiyle, olasılığa (E-adımı) tekrar tekrar bir alt-yapı inşa ederek ve bu alt sınırı (M-adımı) aşağıdaki gibi optimize ederek tahmin etmede etkili bir yöntem sunar:
**11. E-step: Evaluate the posterior probability Qi(z(i)) that each data point x(i) came from a particular cluster z(i) as follows:** -⟶ +⟶ E-adımı: Her bir veri noktasının x(i)'in belirli bir kümeden z(i) aşağıdaki gibi olduğunu gösteren posterior olasılık Qi(z(i)) değerlendiriniz:
**12. M-step: Use the posterior probabilities Qi(z(i)) as cluster specific weights on data points x(i) to separately re-estimate each cluster model as follows:** -⟶ +⟶ M-adımı: Her bir küme modelini aşağıdaki gibi ayrı ayrı yeniden tahmin etmek için x(i) veri noktalarındaki kümeye özgü ağırlıklar olarak posterior olasılıkları Qi(z(i)) kullanın:
**13. [Gaussians initialization, Expectation step, Maximization step, Convergence]** -⟶ +⟶ [Gauss ilklendirme, Beklenti adımı, Maksimizasyon adımı, Yakınsaklık]
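scikit-learn's GaussianMixture implements exactly this E-step/M-step loop for a mixture of Gaussians; the two-blob data set below is made up for illustration:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2, 1, size=(200, 2)),   # blob generated by latent z = 0
               rng.normal(3, 1, size=(200, 2))])   # blob generated by latent z = 1

gm = GaussianMixture(n_components=2, random_state=0).fit(X)  # fit() alternates E and M steps until convergence
print(gm.means_)                 # estimated component means
print(gm.predict_proba(X[:3]))   # posterior Qi(z(i)) for the first few points (the E-step quantities)
```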
**14. k-means clustering** -⟶ +⟶ k-ortalamalar (k-means) kümeleme
**15. We note c(i) the cluster of data point i and μj the center of cluster j.** -⟶ +⟶ C(i), i veri noktasının bulunduğu küme olmak üzere, μj j kümesinin merkez noktasıdır.
**16. Algorithm ― After randomly initializing the cluster centroids μ1,μ2,...,μk∈Rn, the k-means algorithm repeats the following step until convergence:** -⟶ +⟶ Algoritma - Küme ortalamaları μ1, μ2, ..., μk∈Rn rasgele olarak başlatıldıktan sonra, k-ortalama algoritması yakınsayana kadar aşağıdaki adımı tekrar eder:
**17. [Means initialization, Cluster assignment, Means update, Convergence]** -⟶ +⟶ [Başlangıç ortalaması, Küme Tanımlama, Ortalama Güncelleme, Yakınsama]
**18. Distortion function ― In order to see if the algorithm converges, we look at the distortion function defined as follows:** -⟶ +⟶ Bozulma fonksiyonu - Algoritmanın yakınsadığını görmek için aşağıdaki gibi tanımlanan bozulma fonksiyonuna bakarız:
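A bare-bones version of the k-means loop of items 15-18, written for illustration only (no convergence test, no handling of empty clusters):

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    mu = X[rng.choice(len(X), size=k, replace=False)]   # random initialization of the centroids
    for _ in range(n_iter):
        dists = np.linalg.norm(X[:, None, :] - mu[None, :, :], axis=-1)
        c = dists.argmin(axis=1)                                     # cluster assignment step
        mu = np.array([X[c == j].mean(axis=0) for j in range(k)])    # means update step
    distortion = np.sum((X - mu[c]) ** 2)                # J(c, mu), non-increasing over iterations
    return c, mu, distortion
```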
**19. Hierarchical clustering** -⟶ +⟶ Hiyerarşik kümeleme
**20. Algorithm ― It is a clustering algorithm with an agglomerative hierarchical approach that build nested clusters in a successive manner.** -⟶ +⟶ Algoritma - Ardışık olarak iç içe geçmiş kümelerden oluşturan hiyerarşik bir yaklaşıma sahip bir kümeleme algoritmasıdır.
**21. Types ― There are different sorts of hierarchical clustering algorithms that aims at optimizing different objective functions, which is summed up in the table below:** -⟶ +⟶ Türler - Aşağıdaki tabloda özetlenen farklı amaç fonksiyonlarını optimize etmeyi amaçlayan farklı hiyerarşik kümeleme algoritmaları vardır:
**22. [Ward linkage, Average linkage, Complete linkage]** -⟶ +⟶ [Ward bağlantı, Ortalama bağlantı, Tam bağlantı]
**23. [Minimize within cluster distance, Minimize average distance between cluster pairs, Minimize maximum distance of between cluster pairs]** -⟶ +⟶ [Küme mesafesi içinde minimize edin, Küme çiftleri arasındaki ortalama uzaklığı en aza indirin, Küme çiftleri arasındaki maksimum uzaklığı en aza indirin]
**24. Clustering assessment metrics** -⟶ +⟶ Kümeleme değerlendirme metrikleri
**25. In an unsupervised learning setting, it is often hard to assess the performance of a model since we don't have the ground truth labels as was the case in the supervised learning setting.** -⟶ +⟶ Gözetimsiz bir öğrenme ortamında, bir modelin performansını değerlendirmek çoğu zaman zordur, çünkü gözetimli öğrenme ortamında olduğu gibi, gerçek referans etiketlere sahip değiliz.
**26. Silhouette coefficient ― By noting a and b the mean distance between a sample and all other points in the same class, and between a sample and all other points in the next nearest cluster, the silhouette coefficient s for a single sample is defined as follows:** -⟶ +⟶ Siluet katsayısı - Bir örnek ile aynı sınıftaki diğer tüm noktalar arasındaki ortalama mesafeyi ve bir örnek ile bir sonraki en yakın kümedeki diğer tüm noktalar arasındaki ortalama mesafeyi not ederek, tek bir örnek için siluet katsayısı aşağıdaki gibi tanımlanır:
**27. Calinski-Harabaz index ― By noting k the number of clusters, Bk and Wk the between and within-clustering dispersion matrices respectively defined as** -⟶ +⟶ Calinski-Harabaz indeksi - Kümelerin sayısını, Bk ve Wk'yi, sırasıyla, küme olarak adlandırılan küme ve dağılma matrisleri olarak tanımlayarak
**28. the Calinski-Harabaz index s(k) indicates how well a clustering model defines its clusters, such that the higher the score, the more dense and well separated the clusters are. It is defined as follows:** -⟶ +⟶ Calinski-Harabaz indeksi s(k), kümelenme modelinin kümeleri ne kadar iyi tanımladığını gösterir, böylece skor ne kadar yüksek olursa, kümeler daha yoğun ve iyi ayrılır. Aşağıdaki şekilde tanımlanmıştır:
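Both assessment metrics are exposed by recent versions of scikit-learn; the well-separated toy blobs below are chosen so that the scores are easy to interpret:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score, calinski_harabasz_score

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.5, size=(100, 2)), rng.normal(5, 0.5, size=(100, 2))])

labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print(silhouette_score(X, labels))          # close to +1 for dense, well-separated clusters
print(calinski_harabasz_score(X, labels))   # higher is better
```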
**29. Dimension reduction** -⟶ +⟶ Boyut küçültme
**30. Principal component analysis** -⟶ +⟶ Temel bileşenler analizi
**31. It is a dimension reduction technique that finds the variance maximizing directions onto which to project the data.** -⟶ +⟶ Verilerin yansıtılacağı yönleri maksimize eden varyansı bulan bir boyut küçültme tekniğinidir.
**32. Eigenvalue, eigenvector ― Given a matrix A∈Rn×n, λ is said to be an eigenvalue of A if there exists a vector z∈Rn∖{0}, called eigenvector, such that we have:** -⟶ +⟶ Özdeğer, özvektör - Bir matris A∈Rn×n verildiğinde λ'nın, özvektör olarak adlandırılan bir vektör z∈Rn∖{0} varsa, A'nın bir özdeğeri olduğu söylenir:
**33. Spectral theorem ― Let A∈Rn×n. If A is symmetric, then A is diagonalizable by a real orthogonal matrix U∈Rn×n. By noting Λ=diag(λ1,...,λn), we have:** -⟶ +⟶ Spektral teorem - A∈Rn×n olsun. Eğer A simetrik ise, o zaman A gerçek ortogonal matris U∈Rn×n n ile diyagonalleştirilebilir. Λ=diag(λ1, ..., λn) yazarak, bizde:
**34. diagonal** -⟶ +⟶ diyagonal
**35. Remark: the eigenvector associated with the largest eigenvalue is called principal eigenvector of matrix A.** -⟶ +⟶ Not: En büyük özdeğere sahip özvektör, matris A'nın temel özvektörü olarak adlandırılır.
**36. Algorithm ― The Principal Component Analysis (PCA) procedure is a dimension reduction technique that projects the data on k dimensions by maximizing the variance of the data as follows:** -⟶ +⟶ Algoritma - Temel Bileşen Analizi (TBA) yöntemi, verilerin aşağıdaki gibi varyansı en üst düzeye çıkararak veriyi k boyutlarına yansıtan bir boyut azaltma tekniğidir:
**37. Step 1: Normalize the data to have a mean of 0 and standard deviation of 1.** -⟶ +⟶ Adım 1: Verileri ortalama 0 ve standart sapma 1 olacak şekilde normalleştirin.
**38. Step 2: Compute Σ=1mm∑i=1x(i)x(i)T∈Rn×n, which is symmetric with real eigenvalues.** -⟶ +⟶ Adım 2: Gerçek özdeğerler ile simetrik olan Σ=1mm∑i=1x(i)x(i)T∈Rn×n hesaplayın.
**39. Step 3: Compute u1,...,uk∈Rn the k orthogonal principal eigenvectors of Σ, i.e. the orthogonal eigenvectors of the k largest eigenvalues.** -⟶ +⟶ Adım 3: u1, ...,uk∈Rn 'yi hesaplayın, Σ ort'nin ortogonal ana özvektörlerini, yani k en büyük özdeğerlerin ortogonal özvektörlerini.
**40. Step 4: Project the data on spanR(u1,...,uk).** -⟶ +⟶ Adım 4: spanR (u1, ..., uk) üzerindeki verileri gösterin.
**41. This procedure maximizes the variance among all k-dimensional spaces.** -⟶ +⟶ Bu yöntem tüm k-boyutlu uzaylar arasındaki varyansı en üst düzeye çıkarır.
**42. [Data in feature space, Find principal components, Data in principal components space]** -⟶ +⟶ [Öznitelik uzayında veri, Temel bileşenleri bul, Temel bileşenler uzayında veri]
**43. Independent component analysis** -⟶ +⟶ Bağımsız bileşen analizi
**44. It is a technique meant to find the underlying generating sources.** -⟶ +⟶ Temel oluşturan kaynakları bulmak için kullanılan bir tekniktir.
**45. Assumptions ― We assume that our data x has been generated by the n-dimensional source vector s=(s1,...,sn), where si are independent random variables, via a mixing and non-singular matrix A as follows:** -⟶ +⟶ Varsayımlar - Verilerin x'in n boyutlu kaynak vektörü s=(s1, ..., sn) tarafından üretildiğini varsayıyoruz, burada si bağımsız rasgele değişkenler, bir karışım ve tekil olmayan bir matris A ile aşağıdaki gibi:
**46. The goal is to find the unmixing matrix W=A−1.** -⟶ +⟶ Amaç, işlem görmemiş matrisini W=A−1 bulmaktır.
**47. Bell and Sejnowski ICA algorithm ― This algorithm finds the unmixing matrix W by following the steps below:** -⟶ +⟶ Bell ve Sejnowski ICA algoritması - Bu algoritma, aşağıdaki adımları izleyerek işlem görmemiş matrisi W'yi bulur:
**48. Write the probability of x=As=W−1s as:** -⟶ +⟶ X=As=W−1s olasılığını aşağıdaki gibi yazınız:
**49. Write the log likelihood given our training data {x(i),i∈[[1,m]]} and by noting g the sigmoid function as:** -⟶ +⟶ Eğitim verisi {x(i),i∈[[1, m]]} ve sigmoid fonksiyonunu g olarak not ederek log olasılığını yazınız:
**50. Therefore, the stochastic gradient ascent learning rule is such that for each training example x(i), we update W as follows:** -⟶ +⟶ Bu nedenle, rassal (stokastik) eğim yükselme öğrenme kuralı, her bir eğitim örneği için x(i), W'yi aşağıdaki gibi güncelleştiririz:
**51. The Machine Learning cheatsheets are now available in Turkish.** -⟶ +⟶ Makine Öğrenmesi El Kitabı artık Türkçe dilinde mevcuttur.
**52. Original authors** -⟶ +⟶ Orjinal yazarlar
**53. Translated by X, Y and Z** -⟶ +⟶ X, Y ve Z ile çevrilmiştir.
**54. Reviewed by X, Y and Z** -⟶ +⟶ X, Y ve Z tarafından yorumlandı
**55. [Introduction, Motivation, Jensen's inequality]** -⟶ +⟶ [Giriş, Motivasyon, Jensen'in eşitsizliği]
**56. [Clustering, Expectation-Maximization, k-means, Hierarchical clustering, Metrics]** -⟶ +⟶ [Kümeleme, Beklenti-Enbüyütme(Maksimizasyon), k-ortalamalar, Hiyerarşik kümeleme, Metrikler]
**57. [Dimension reduction, PCA, ICA]** -⟶ +⟶ [Boyut küçültme, TBA(PCA), BBA(ICA)] From 5a7224cbce129141485aac38c79a86bcbd42c748 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Yavuz=20K=C3=B6me=C3=A7o=C4=9Flu?= Date: Thu, 10 Jan 2019 17:13:33 +0300 Subject: [PATCH 065/531] [tr] Deep learning deficiencies completed and some minor corrections were made --- tr/cheatsheet-deep-learning.md | 16 ++++++++-------- 1 file changed, 8 insertions(+), 8 deletions(-) diff --git a/tr/cheatsheet-deep-learning.md b/tr/cheatsheet-deep-learning.md index da5226222..7c8b3e29e 100644 --- a/tr/cheatsheet-deep-learning.md +++ b/tr/cheatsheet-deep-learning.md @@ -24,7 +24,7 @@ **5. [Input layer, hidden layer, output layer]** -⟶ [Giriş katmanı, gizli katman, ürün katmanı] +⟶ [Giriş katmanı, gizli katman, çıkış katmanı]
@@ -60,7 +60,7 @@ **11. Learning rate ― The learning rate, often noted α or sometimes η, indicates at which pace the weights get updated. This can be fixed or adaptively changed. The current most popular method is called Adam, which is a method that adapts the learning rate.** -⟶ Öğrenme derecesi ― Öğrenme derecesi, sıklıkla α veya bazen η olarak belirtilir, ağırlıkların hangi tempoda güncellendiğini gösterir. Bu derece sabit olabilir veya uyarlamalı olarak değişebilir. Mevcut en gözde yöntem Adam olarak adlandırılan ve öğrenme oranını uyarlayan bir yöntemdir. +⟶ Öğrenme oranı ― Öğrenme oranı, sıklıkla α veya bazen η olarak belirtilir, ağırlıkların hangi tempoda güncellendiğini gösterir. Bu derece sabit olabilir veya uyarlamalı olarak değişebilir. Mevcut en gözde yöntem Adam olarak adlandırılan ve öğrenme oranını uyarlayan bir yöntemdir.
@@ -150,7 +150,7 @@ **26. [Input gate, forget gate, gate, output gate]** -⟶ [Girdi kapısı, unutma kapısı, kapı, ürün kapısı] +⟶ [Girdi kapısı, unutma kapısı, kapı, çıktı kapısı]
@@ -294,28 +294,28 @@ **50. View PDF version on GitHub** -⟶ +⟶ GitHub'da PDF sürümünü görüntüle
**51. [Neural Networks, Architecture, Activation function, Backpropagation, Dropout]** -⟶ +⟶ [Yapay Sinir Ağları, Mimari, Aktivasyon fonksiyonu, Geri yayılım, Seyreltme]
**52. [Convolutional Neural Networks, Convolutional layer, Batch normalization]** -⟶ +⟶ [Evrişimsel Sinir Ağları, Evreşim katmanı, Toplu normalizasyon]
**53. [Recurrent Neural Networks, Gates, LSTM]** -⟶ +⟶ [Yinelenen Sinir Ağları, Kapılar, LSTM]
**54. [Reinforcement learning, Markov decision processes, Value/policy iteration, Approximate dynamic programming, Policy search]** -⟶ +⟶ [Pekiştirmeli öğrenme, Markov karar süreçleri, Değer/politika iterasyonu, Yaklaşık dinamik programlama, Politika araştırması] From ff0a4120c3bdc7388290b59225a33a89ccc2ef23 Mon Sep 17 00:00:00 2001 From: kevingo Date: Sat, 12 Jan 2019 11:04:40 +0800 Subject: [PATCH 066/531] Update CONTRIBUTOR --- CONTRIBUTORS | 1 + 1 file changed, 1 insertion(+) diff --git a/CONTRIBUTORS b/CONTRIBUTORS index f8358432b..02a5d71e8 100644 --- a/CONTRIBUTORS +++ b/CONTRIBUTORS @@ -118,4 +118,5 @@ johnnychhsu (review of unsupervised learning) kevingo (translation of probabilities and statistics) + johnnychhsu (review of probabilities and statistics) From b2b556145c20106ecad3db28cf64e439d2840141 Mon Sep 17 00:00:00 2001 From: Afshine Amidi <26204670+afshinea@users.noreply.github.com> Date: Sun, 13 Jan 2019 00:18:57 -0800 Subject: [PATCH 067/531] Update README.md --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index a3bb264ff..ee0647276 100644 --- a/README.md +++ b/README.md @@ -84,4 +84,4 @@ Please make sure to propose the translation of **only one** cheatsheet per pull |Linear algebra|not started|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/47)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/95)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/94)| ## Acknowledgements -Thank you everyone for your help! Please do not forget to add your name to the `CONTRIBUTORS` file so that we can give you proper credit in the cheatsheets' [official website](https://stanford.edu/~shervine/teaching.html). +Thank you everyone for your help! Please do not forget to add your name to the `CONTRIBUTORS` file so that we can give you proper credit in the cheatsheets' [official website](https://stanford.edu/~shervine/teaching). From 38d303c797cd13c6fb757e50dd97fa9dcd1a600f Mon Sep 17 00:00:00 2001 From: Zaid Alyafeai Date: Fri, 18 Jan 2019 00:08:38 +0300 Subject: [PATCH 068/531] Respond to review comments --- ar/refresher-linear-algebra.md | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/ar/refresher-linear-algebra.md b/ar/refresher-linear-algebra.md index 7037782e7..d0e88a543 100644 --- a/ar/refresher-linear-algebra.md +++ b/ar/refresher-linear-algebra.md @@ -1,7 +1,7 @@ **1. Linear Algebra and Calculus refresher**
- ملخص عن الجبر الخطي +ملخص الجبر الخطي و التفاضل و التكامل

@@ -36,7 +36,7 @@ **6. Remark: the vector x defined above can be viewed as a n×1 matrix and is more particularly called a column-vector.**
-ملاحظة : المتجه $x$ المعرف مسبقا يمكن اعتباره مصفوفة من الشكل $n \times 1$ والذي يتم تسميته ب مصفوفة من عمود واحد. +ملاحظة : المتجه $x$ المعرف مسبقا يمكن اعتباره مصفوفة من الشكل $n \times 1$ والذي يسمى ب مصفوفة من عمود واحد.

@@ -95,7 +95,7 @@ **14. Vector-vector ― There are two types of vector-vector products:**
- متجه و متجه - هناك نوعين من الضرب ل متجه - متجه : + ضرب المتجهات - توجد طريقتين لضرب متجه بمتجه :

@@ -134,7 +134,7 @@ **19. Matrix-matrix ― The product of matrices A∈Rm×n and B∈Rn×p is a matrix of size Rn×p, such that:**
- مصفوفة - مصفوفة - ضرب المصفوفة $A \in \mathbb{R}^{n \times m}$ و $A \in \mathbb{R}^{n \times p}$ ينتجه عنه المصفوفة $A \in \mathbb{R}^{n \times p}$ حيث أن : + ضرب مصفوفة ومصفوفة - ضرب المصفوفة $A \in \mathbb{R}^{n \times m}$ و $A \in \mathbb{R}^{n \times p}$ ينتجه عنه المصفوفة $A \in \mathbb{R}^{n \times p}$ حيث أن :

From 023b4f07eb7a262267d28034aad97f2954a9fff6 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Yavuz=20K=C3=B6me=C3=A7o=C4=9Flu?= Date: Sun, 20 Jan 2019 23:56:17 +0300 Subject: [PATCH 069/531] Fixed missing in review --- tr/cheatsheet-unsupervised-learning.md | 26 +++++++++++++------------- 1 file changed, 13 insertions(+), 13 deletions(-) diff --git a/tr/cheatsheet-unsupervised-learning.md b/tr/cheatsheet-unsupervised-learning.md index 3d3d17e26..6ef596702 100644 --- a/tr/cheatsheet-unsupervised-learning.md +++ b/tr/cheatsheet-unsupervised-learning.md @@ -18,7 +18,7 @@ **4. Jensen's inequality ― Let f be a convex function and X a random variable. We have the following inequality:** -⟶ Jensen eşitsizliği - Bir konveks fonksiyon ve X bir rastgele değişken olsun. Aşağıdaki eşitsizliklerimiz: +⟶ Jensen eşitsizliği - f bir konveks fonksiyon ve X bir rastgele değişken olsun. Aşağıdaki eşitsizliklerimiz:
@@ -30,37 +30,37 @@ **6. Expectation-Maximization** -⟶ Beklenti-Enbüyütme (Maksimizasyon) +⟶ Beklenti-Ençoklama (Maksimizasyon)
**7. Latent variables ― Latent variables are hidden/unobserved variables that make estimation problems difficult, and are often denoted z. Here are the most common settings where there are latent variables:** -⟶ Gizli değişkenler - Gizli değişkenler, tahmin problemlerini zorlaştıran ve çoğunlukla z olarak adlandırılan gizli / gözlemlenmemiş değişkenlerdir. Gizli değişkenlerin bulunduğu en yaygın ortamlar şunlardır: +⟶ Gizli değişkenler - Gizli değişkenler, tahmin problemlerini zorlaştıran ve çoğunlukla z olarak adlandırılan gizli / gözlemlenmemiş değişkenlerdir. Gizli değişkenlerin bulunduğu yerlerdeki en yaygın ayarlar şöyledir:
**8. [Setting, Latent variable z, Comments]** -⟶ Ortam, Gizli değişken z, Yorumlar +⟶ Yöntem, Gizli değişken z, Açıklamalar
**9. [Mixture of k Gaussians, Factor analysis]** -⟶ [K Gaussianların karışımı, Faktör analizi] +⟶ [K Gaussianların birleşimi, Faktör analizi]
**10. Algorithm ― The Expectation-Maximization (EM) algorithm gives an efficient method at estimating the parameter θ through maximum likelihood estimation by repeatedly constructing a lower-bound on the likelihood (E-step) and optimizing that lower bound (M-step) as follows:** -⟶ Algoritma - Beklenti-Enbüyütme (Maksimizasyon) (BE) algoritması, parametrenin maksimum olabilirlik kestirimiyle, olasılığa (E-adımı) tekrar tekrar bir alt-yapı inşa ederek ve bu alt sınırı (M-adımı) aşağıdaki gibi optimize ederek tahmin etmede etkili bir yöntem sunar: +⟶ Algoritma - Beklenti-Ençoklama (Maksimizasyon) (BE) algoritması, θ parametresinin maksimum olabilirlik kestirimiyle tahmin edilmesinde, olasılığa ard arda alt sınırlar oluşturan (E-adımı) ve bu alt sınırın (M-adımı) aşağıdaki gibi optimize edildiği etkin bir yöntem sunar:
**11. E-step: Evaluate the posterior probability Qi(z(i)) that each data point x(i) came from a particular cluster z(i) as follows:** -⟶ E-adımı: Her bir veri noktasının x(i)'in belirli bir kümeden z(i) aşağıdaki gibi olduğunu gösteren posterior olasılık Qi(z(i)) değerlendiriniz: +⟶ E-adımı: Her bir veri noktasının x(i)'in belirli bir kümeden z(i) geldiğinin sonsal olasılık değerinin Qi(z(i)) hesaplanması aşağıdaki gibidir:
@@ -90,7 +90,7 @@ **16. Algorithm ― After randomly initializing the cluster centroids μ1,μ2,...,μk∈Rn, the k-means algorithm repeats the following step until convergence:** -⟶ Algoritma - Küme ortalamaları μ1, μ2, ..., μk∈Rn rasgele olarak başlatıldıktan sonra, k-ortalama algoritması yakınsayana kadar aşağıdaki adımı tekrar eder: +⟶ Algoritma - Küme ortalamaları μ1, μ2, ..., μk∈Rn rasgele olarak başlatıldıktan sonra, k-ortalamalar algoritması yakınsayana kadar aşağıdaki adımı tekrar eder:
@@ -156,7 +156,7 @@ **27. Calinski-Harabaz index ― By noting k the number of clusters, Bk and Wk the between and within-clustering dispersion matrices respectively defined as** -⟶ Calinski-Harabaz indeksi - Kümelerin sayısını, Bk ve Wk'yi, sırasıyla, küme olarak adlandırılan küme ve dağılma matrisleri olarak tanımlayarak +⟶ Calinski-Harabaz indeksi - k kümelerin sayısını belirtmek üzere Bk ve Wk sırasıyla, kümeler arası ve küme içi dağılım matrisleri olarak aşağıdaki gibi tanımlanır
@@ -229,7 +229,7 @@ dimensions by maximizing the variance of the data as follows:** **39. Step 3: Compute u1,...,uk∈Rn the k orthogonal principal eigenvectors of Σ, i.e. the orthogonal eigenvectors of the k largest eigenvalues.** -⟶ Adım 3: u1, ...,uk∈Rn 'yi hesaplayın, Σ ort'nin ortogonal ana özvektörlerini, yani k en büyük özdeğerlerin ortogonal özvektörlerini. +⟶ u1, ...,uk∈Rn olmak üzere Σ ort'nin ortogonal ana özvektörlerini, yani k en büyük özdeğerlerin ortogonal özvektörlerini hesaplayın.
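Step 3 of the PCA procedure discussed in entry 39 (the k orthogonal principal eigenvectors of Σ) can be sketched in a few lines of NumPy; the synthetic data and the value of k below are arbitrary illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))        # toy data: 200 points in R^5
k = 2                                # number of principal components to keep

Xc = X - X.mean(axis=0)              # centering (earlier steps of the procedure)
Sigma = Xc.T @ Xc / len(Xc)          # empirical covariance matrix

# eigh handles symmetric matrices and returns eigenvalues in ascending order,
# so the k principal eigenvectors are the last k columns.
eigvals, eigvecs = np.linalg.eigh(Sigma)
U = eigvecs[:, -k:][:, ::-1]         # k orthogonal principal eigenvectors of Sigma

Z = Xc @ U                           # data projected on the principal subspace
print(eigvals[::-1][:k])
```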
@@ -247,7 +247,7 @@ dimensions by maximizing the variance of the data as follows:** **42. [Data in feature space, Find principal components, Data in principal components space]** -⟶ [Öznitelik uzayında veri, Temel bileşenleri bul, Temel bileşenler uzayında veri] +⟶ [Öznitelik uzayındaki veri, Temel bileşenleri bul, Temel bileşenler uzayındaki veri]
@@ -289,7 +289,7 @@ dimensions by maximizing the variance of the data as follows:** **49. Write the log likelihood given our training data {x(i),i∈[[1,m]]} and by noting g the sigmoid function as:** -⟶ Eğitim verisi {x(i),i∈[[1, m]]} ve sigmoid fonksiyonunu g olarak not ederek log olasılığını yazınız: +⟶ Eğitim verisi {x(i),i∈[[1, m]]} ve g sigmoid fonksiyonunu not ederek log olasılığını yazınız:
@@ -331,7 +331,7 @@ dimensions by maximizing the variance of the data as follows:** **56. [Clustering, Expectation-Maximization, k-means, Hierarchical clustering, Metrics]** -⟶ [Kümeleme, Beklenti-Enbüyütme(Maksimizasyon), k-ortalamalar, Hiyerarşik kümeleme, Metrikler] +⟶ [Kümeleme, Beklenti-Ençoklama (Maksimizasyon), k-ortalamalar, Hiyerarşik kümeleme, Metrikler]
From 3a5ae2d492af8e405c6069aa177c96d06218537e Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Yavuz=20K=C3=B6me=C3=A7o=C4=9Flu?= Date: Sun, 20 Jan 2019 23:57:48 +0300 Subject: [PATCH 070/531] Fixed missing in review --- tr/cheatsheet-unsupervised-learning.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tr/cheatsheet-unsupervised-learning.md b/tr/cheatsheet-unsupervised-learning.md index 6ef596702..c6392c414 100644 --- a/tr/cheatsheet-unsupervised-learning.md +++ b/tr/cheatsheet-unsupervised-learning.md @@ -66,7 +66,7 @@ **12. M-step: Use the posterior probabilities Qi(z(i)) as cluster specific weights on data points x(i) to separately re-estimate each cluster model as follows:** -⟶ M-adımı: Her bir küme modelini aşağıdaki gibi ayrı ayrı yeniden tahmin etmek için x(i) veri noktalarındaki kümeye özgü ağırlıklar olarak posterior olasılıkları Qi(z(i)) kullanın: +⟶ M-adımı: Her bir küme modelini ayrı ayrı yeniden tahmin etmek için x(i) veri noktalarındaki kümeye özgü ağırlıklar olarak Qi(z(i)) sonsal olasılıklarının kullanımı aşağıdaki gibidir:
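The M-step reworded in the patch above (entry 12) amounts to a responsibility-weighted re-estimation of each cluster model. A minimal 1-D NumPy sketch, with made-up data points and responsibilities standing in for the output of an E-step, could be:

```python
import numpy as np

# Toy 1-D data and responsibilities Qi(z(i)) as they would come out of an E-step.
x = np.array([-2.1, -1.8, 0.2, 1.9, 2.3])
resp = np.array([[0.97, 0.03],
                 [0.95, 0.05],
                 [0.40, 0.60],
                 [0.02, 0.98],
                 [0.01, 0.99]])     # each row sums to 1

# M-step: re-estimate each cluster using the responsibilities as weights.
Nk = resp.sum(axis=0)                                    # effective cluster sizes
phi = Nk / len(x)                                        # mixture weights
mu = (resp * x[:, None]).sum(axis=0) / Nk                # weighted means
var = (resp * (x[:, None] - mu) ** 2).sum(axis=0) / Nk   # weighted variances
print(phi, mu, var)
```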
From dde8ed5889c812a8001978aeececbb3fd268fdf7 Mon Sep 17 00:00:00 2001 From: Shervine Amidi Date: Sun, 20 Jan 2019 13:13:04 -0800 Subject: [PATCH 071/531] Update README.md --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index ee0647276..f927a488d 100644 --- a/README.md +++ b/README.md @@ -68,7 +68,7 @@ Please make sure to propose the translation of **only one** cheatsheet per pull |:---|:---:|:---:|:---:|:---:|:---:|:---:| |Deep learning|done|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/37)|done|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/21)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/78)| |Supervised learning|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/87)|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/46)|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/21)|not started| -|Unsupervised learning|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/88)|not started|not started|not started|not started|not started| +|Unsupervised learning|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/88)|not started|not started|done|not started|not started| |ML tips and tricks|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/83)|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/40)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/39)|not started|not started| |Probabilities and Statistics|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/89)|not started|not started|not started|not started|not started| |Linear algebra|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/85)|not started|not started|done|not started|not started| From 589af7241f4b0a523adfb765ee9129a5d8feacfb Mon Sep 17 00:00:00 2001 From: Shervine Amidi Date: Sun, 20 Jan 2019 13:14:32 -0800 Subject: [PATCH 072/531] Add [tr] contributors --- CONTRIBUTORS | 3 +++ 1 file changed, 3 insertions(+) diff --git a/CONTRIBUTORS b/CONTRIBUTORS index 02a5d71e8..caf2f5495 100644 --- a/CONTRIBUTORS +++ b/CONTRIBUTORS @@ -97,6 +97,9 @@ Kadir Tekeli (translation of linear algebra) Ekrem Çetinkaya (review of linear algebra) + Yavuz Kömeçoğlu (translation of unsupervised learning) + Başak Buluz (review of unsupervised learning) + --uk Gregory Reshetniak (translation of probabilities and statistics) Denys (review of probabilities and statistics) From 0d12ec0b49e518d7deeb3f53fccf06cdc5ed921e Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Ba=C5=9Fak=20Buluz?= <41359672+basakbuluz@users.noreply.github.com> Date: Thu, 24 Jan 2019 13:46:36 +0300 Subject: [PATCH 073/531] [tr] Supervised Learning --- tr/cheatsheet-supervised-learning.md | 258 +++++++++++++-------------- 1 file changed, 129 insertions(+), 129 deletions(-) diff --git a/tr/cheatsheet-supervised-learning.md b/tr/cheatsheet-supervised-learning.md index a6b19ea1c..fe66b2f48 100644 --- a/tr/cheatsheet-supervised-learning.md +++ b/tr/cheatsheet-supervised-learning.md @@ -1,567 +1,567 @@ **1. Supervised Learning cheatsheet** -⟶ +⟶ Gözetimli Öğrenme El kitabı -
+
**2. Introduction to Supervised Learning** -⟶ +⟶ Gözetimli Öğrenmeye Giriş -
+
**3. Given a set of data points {x(1),...,x(m)} associated to a set of outcomes {y(1),...,y(m)}, we want to build a classifier that learns how to predict y from x.** -⟶ +⟶ {y(1),...,y(m)} çıktı kümesi ile ilişkili olan {x(1),...,x(m)} veri noktalarının kümesi göz önüne alındığında, y'den x'i nasıl tahmin edebileceğimizi öğrenen bir sınıflandırıcı tasarlamak istiyoruz. -
+
**4. Type of prediction ― The different types of predictive models are summed up in the table below:** -⟶ +⟶ Tahmin türü ― Farklı tahmin modelleri aşağıdaki tabloda özetlenmiştir: -
+
**5. [Regression, Classifier, Outcome, Examples]** -⟶ +⟶ [Regresyon, Sınıflandırıcı, Çıktı , Örnekler] -
+
**6. [Continuous, Class, Linear regression, Logistic regression, SVM, Naive Bayes]** -⟶ +⟶ Sürekli, Sınıf, Lineer regresyon, Lojistik regresyon, Destek Vektör Makineleri (DVM), Naive Bayes] -
+
**7. Type of model ― The different models are summed up in the table below:** -⟶ +⟶ Model türleri ― Farklı modeller aşağıdaki tabloda özetlenmiştir: -
+
**8. [Discriminative model, Generative model, Goal, What's learned, Illustration, Examples]** -⟶ +⟶ [Ayırt edici model, Üretici model, Amaç, Öğrenilenler, Örnekleme, Örnekler] -
+
**9. [Directly estimate P(y|x), Estimate P(x|y) to then deduce P(y|x), Decision boundary, Probability distributions of the data, Regressions, SVMs, GDA, Naive Bayes]** -⟶ +⟶ [ Doğrudan tahmin P (y|x), P (y|x)'i tahmin etmek için P(x|y)'i tahmin etme, Karar Sınırı, Verilerin olasılık dağılımı, Regresyon, Destek Vektör Makineleri, Gauss Diskriminant Analizi, Naive Bayes] -
+
**10. Notations and general concepts** -⟶ +⟶ Gösterimler ve genel konsept
**11. Hypothesis ― The hypothesis is noted hθ and is the model that we choose. For a given input data x(i) the model prediction output is hθ(x(i)).** -⟶ +⟶ Hipotez ― Hipotez hθ olarak belirtilmiştir ve bu bizim seçtiğimiz modeldir. Verilen x(i) verisi için modelin tahminlediği çıktı hθ(x(i))'dir. -
+
**12. Loss function ― A loss function is a function L:(z,y)∈R×Y⟼L(z,y)∈R that takes as inputs the predicted value z corresponding to the real data value y and outputs how different they are. The common loss functions are summed up in the table below:** -⟶ +⟶ Kayıp fonksiyonu ― L:(z,y)∈R×Y⟼L(z,y)∈R şeklinde tanımlanan bir kayıp fonksiyonu y gerçek değerine karşılık geleceği öngörülen z değerini girdi olarak alan ve ne kadar farklı olduklarını gösteren bir fonksiyondur. Yaygın kayıp fonksiyonları aşağıdaki tabloda özetlenmiştir: -
+
**13. [Least squared error, Logistic loss, Hinge loss, Cross-entropy]** -⟶ +⟶ [En küçük kareler hatası, Lojistik kaybı, Menteşe yitimi, Çapraz entropi]
**14. [Linear regression, Logistic regression, SVM, Neural Network]** -⟶ +⟶ [Lineer regresyon, Lojistik regresyon, Destek Vektör Makineleri, Sinir Ağı]
**15. Cost function ― The cost function J is commonly used to assess the performance of a model, and is defined with the loss function L as follows:** -⟶ +⟶ Maliyet fonksiyonu ― J maliyet fonksiyonu genellikle bir modelin performansını değerlendirmek için kullanılır ve L kayıp fonksiyonu aşağıdaki gibi tanımlanır:
**16. Gradient descent ― By noting α∈R the learning rate, the update rule for gradient descent is expressed with the learning rate and the cost function J as follows:** -⟶ +⟶ Bayır inişi ― α∈R öğrenme oranı olmak üzere, bayır inişi için güncelleme kuralı olarak ifade edilen öğrenme oranı ve J maliyet fonksiyonu aşağıdaki gibi ifade edilir: +
-
**17. Remark: Stochastic gradient descent (SGD) is updating the parameter based on each training example, and batch gradient descent is on a batch of training examples.** -⟶ +⟶ Not: Stokastik bayır inişi her eğitim örneğine bağlı olarak parametreyi günceller, ve yığın bayır inişi bir dizi eğitim örneği üzerindedir.
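To make the update rule of entries 16 and 17 concrete, here is a small self-contained batch gradient-descent sketch on a mean-squared-error cost; the synthetic data, learning rate and iteration count are arbitrary illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.c_[np.ones(100), rng.normal(size=100)]    # design matrix with intercept column
true_theta = np.array([1.0, 3.0])
y = X @ true_theta + 0.1 * rng.normal(size=100)

theta = np.zeros(2)
alpha = 0.1                                      # learning rate

for _ in range(500):                             # batch gradient descent
    grad = X.T @ (X @ theta - y) / len(y)        # gradient of the MSE cost J
    theta = theta - alpha * grad                 # update rule: theta <- theta - alpha * grad J

print(theta)   # should end up close to [1.0, 3.0]
```

Replacing the full-batch gradient with the gradient of a single example (or a small batch) inside the loop gives the stochastic and mini-batch variants mentioned in entry 17.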
**18. Likelihood ― The likelihood of a model L(θ) given parameters θ is used to find the optimal parameters θ through maximizing the likelihood. In practice, we use the log-likelihood ℓ(θ)=log(L(θ)) which is easier to optimize. We have:** -⟶ +⟶ Olabilirlik - θ parametreleri verilen bir L (θ) modelinin olabilirliğini,olabilirliği maksimize ederek en uygun θ parametrelerini bulmak için kullanılır. bulmak için kullanılır. Uygulamada, optimize edilmesi daha kolay olan log-olabilirlik ℓ (θ) = log (L (θ))'i kullanıyoruz. Sahip olduklarımız: -
+
**19. Newton's algorithm ― The Newton's algorithm is a numerical method that finds θ such that ℓ′(θ)=0. Its update rule is as follows:** -⟶ +⟶ Newton'un algoritması - ℓ′(θ)=0 olacak şekilde bir θ bulan nümerik bir yöntemdir. Güncelleme kuralı aşağıdaki gibidir:
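Entry 19's update can be illustrated with a tiny pure-Python example: finding the root of ℓ′(θ) for a Bernoulli log-likelihood with 7 successes out of 10 trials (the counts are made up for the example), where the maximizer is known to be 0.7:

```python
# Newton's method for l(p) = 7*log(p) + 3*log(1 - p):
# iterate p <- p - l'(p) / l''(p) to solve l'(p) = 0.

def l_prime(p):
    return 7 / p - 3 / (1 - p)

def l_second(p):
    return -7 / p ** 2 - 3 / (1 - p) ** 2

p = 0.5                      # arbitrary starting point
for _ in range(5):
    p = p - l_prime(p) / l_second(p)

print(p)   # converges to the maximum likelihood estimate 0.7
```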
**20. Remark: the multidimensional generalization, also known as the Newton-Raphson method, has the following update rule:** -⟶ +⟶ Not: Newton-Raphson yöntemi olarak da bilinen çok boyutlu genelleme aşağıdaki güncelleme kuralına sahiptir:
**21. Linear models** -⟶ +⟶ Lineer modeller
**22. Linear regression** -⟶ +⟶ Lineer regresyon
**23. We assume here that y|x;θ∼N(μ,σ2)** -⟶ +⟶y|x;θ∼N(μ,σ2) olduğunu varsayıyoruz
**24. Normal equations ― By noting X the matrix design, the value of θ that minimizes the cost function is a closed-form solution such that:** -⟶ +⟶ Normal denklemler - X matris tasarımı olmak üzere, maliyet fonksiyonunu en aza indiren θ değeri X'in matris tasarımını not ederek, maliyet fonksiyonunu en aza indiren θ değeri kapalı formlu bir çözümdür: -
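The closed-form solution of entry 24 can be checked numerically in a few lines; the synthetic design matrix and targets below are arbitrary, and solving the linear system is used instead of forming an explicit inverse simply because it is the more stable habit:

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.c_[np.ones(50), rng.normal(size=(50, 2))]            # design matrix with intercept
y = X @ np.array([0.5, 2.0, -1.0]) + 0.05 * rng.normal(size=50)

# Normal equations: theta = (X^T X)^(-1) X^T y, solved as a linear system.
theta = np.linalg.solve(X.T @ X, X.T @ y)
print(theta)   # close to [0.5, 2.0, -1.0]
```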
+
**25. LMS algorithm ― By noting α the learning rate, the update rule of the Least Mean Squares (LMS) algorithm for a training set of m data points, which is also known as the Widrow-Hoff learning rule, is as follows:** -⟶ +⟶ En Küçük Ortalama Kareler algoritması (Least Mean Squares-LMS) - α öğrenme oranı olmak üzere, m veri noktasını içeren eğitim kümesi için Widrow-Hoff öğrenme oranı olarak bilinen En Küçük Ortalama Kareler Algoritmasının güncelleme kuralı aşağıdaki gibidir: -
+
**26. Remark: the update rule is a particular case of the gradient ascent.** -⟶ +⟶ Not: güncelleme kuralı, bayır yükselişinin özel bir halidir. -
+
**27. LWR ― Locally Weighted Regression, also known as LWR, is a variant of linear regression that weights each training example in its cost function by w(i)(x), which is defined with parameter τ∈R as:** -⟶ +⟶ Yerel Ağırlıklı Regresyon (Locally Weighted Regression-LWR) - LWR olarak da bilinen Yerel Ağırlıklı Regresyon ağırlıkları her eğitim örneğini maliyet fonksiyonunda w (i) (x) ile ölçen doğrusal regresyonun bir çeşididir. -
+
**28. Classification and logistic regression** -⟶ +⟶ Sınıflandırma ve lojistik regresyon
**29. Sigmoid function ― The sigmoid function g, also known as the logistic function, is defined as follows:** -⟶ +⟶ Sigmoid fonksiyonu - Lojistik fonksiyonu olarak da bilinen sigmoid fonksiyonu g, aşağıdaki gibi tanımlanır: -
+
**30. Logistic regression ― We assume here that y|x;θ∼Bernoulli(ϕ). We have the following form:** -⟶ +⟶ Lojistik regresyon - y|x;θ∼Bernoulli(ϕ) olduğunu varsayıyoruz. Aşağıdaki forma sahibiz:
**31. Remark: there is no closed form solution for the case of logistic regressions.** -⟶ +⟶ Not: Lojistik regresyon durumunda kapalı form çözümü yoktur. -
+
**32. Softmax regression ― A softmax regression, also called a multiclass logistic regression, is used to generalize logistic regression when there are more than 2 outcome classes. By convention, we set θK=0, which makes the Bernoulli parameter ϕi of each class i equal to:** -⟶ +⟶ Softmax regresyonu - Çok sınıflı lojistik regresyon olarak da adlandırılan Softmax regresyonu 2'den fazla sınıf olduğunda lojistik regresyonu genelleştirmek için kullanılır. Genel kabul olarak, her i sınıfı için Bernoulli parametresi ϕi'nin eşit olmasını sağlaması için θK=0 olarak ayarlanır.
**33. Generalized Linear Models** -⟶ +⟶ Genelleştirilmiş Lineer Modeller
**34. Exponential family ― A class of distributions is said to be in the exponential family if it can be written in terms of a natural parameter, also called the canonical parameter or link function, η, a sufficient statistic T(y) and a log-partition function a(η) as follows:** -⟶ +⟶ Üstel aile - Eğer kanonik parametre veya bağlantı fonksiyonu olarak adlandırılan doğal bir parametre η, yeterli bir istatistik T (y) ve aşağıdaki gibi bir log-partition fonksiyonu a (η) şeklinde yazılabilirse, dağılım sınıfının üstel ailede olduğu söylenir: -
+
**35. Remark: we will often have T(y)=y. Also, exp(−a(η)) can be seen as a normalization parameter that will make sure that the probabilities sum to one.** -⟶ +⟶ Not: Sık sık T (y) = y olur. Ayrıca, exp (−a (η)), olasılıkların birleştiğinden emin olan normalleştirme parametresi olarak görülebilir.
**36. Here are the most common exponential distributions summed up in the following table:** -⟶ +⟶ Aşağıdaki tabloda özetlenen en yaygın üstel dağılımlar:
**37. [Distribution, Bernoulli, Gaussian, Poisson, Geometric]** -⟶ +⟶ [Dağılım, Bernoulli, Gauss, Poisson, Geometrik]
**38. Assumptions of GLMs ― Generalized Linear Models (GLM) aim at predicting a random variable y as a function fo x∈Rn+1 and rely on the following 3 assumptions:** -⟶ +⟶ Genelleştirilmiş Lineer Modellerin (Generalized Linear Models-GLM) Yaklaşımları - Genelleştirilmiş Lineer Modeller x∈Rn+1 için rastgele bir y değişkenini tahminlemeyi hedeflen ve aşağıdaki 3 varsayıma dayanan bir fonksiyondur:
**39. Remark: ordinary least squares and logistic regression are special cases of generalized linear models.** -⟶ +⟶ Not: sıradan en küçük kareler ve lojistik regresyon, genelleştirilmiş doğrusal modellerin özel durumlarıdır.
**40. Support Vector Machines** -⟶ +⟶ Destek Vektör Makineleri
**41: The goal of support vector machines is to find the line that maximizes the minimum distance to the line.** -⟶ +⟶ Destek Vektör Makinelerinin amacı minimum mesafeyi maksimuma çıkaran doğruyu bulmaktır.
**42: Optimal margin classifier ― The optimal margin classifier h is such that:** -⟶ +⟶ Optimal marj sınıflandırıcısı - h optimal marj sınıflandırıcısı şöyledir: -
+
**43: where (w,b)∈Rn×R is the solution of the following optimization problem:** -⟶ +⟶ burada (w,b)∈Rn×R, aşağıdaki optimizasyon probleminin çözümüdür:
**44. such that** -⟶ +⟶ öyle ki
**45. support vectors** -⟶ +⟶ destek vektörleri
**46. Remark: the line is defined as wTx−b=0.** -⟶ +⟶ Not: doğru wTx−b=0 şeklinde tanımlanır.
**47. Hinge loss ― The hinge loss is used in the setting of SVMs and is defined as follows:** -⟶ +⟶ Menteşe yitimi - Menteşe yitimi Destek Vektör Makinelerinin ayarlarında kullanılır ve aşağıdaki gibi tanımlanır:
**48. Kernel ― Given a feature mapping ϕ, we define the kernel K to be defined as:** -⟶ - -
+⟶ Çekirdek - ϕ gibi bir özellik haritası verildiğinde, K olarak tanımlanacak çekirdeği tanımlarız: +
**49. In practice, the kernel K defined by K(x,z)=exp(−||x−z||22σ2) is called the Gaussian kernel and is commonly used.** -⟶ +⟶ Uygulamada, K (x, z) = exp (- || x − z || 22σ2) tarafından tanımlanan çekirdek K, Gauss çekirdeği olarak adlandırılır ve yaygın olarak kullanılır.
**50. [Non-linear separability, Use of a kernel mapping, Decision boundary in the original space]** -⟶ +⟶ [Lineer olmayan ayrılabilirlik, Çekirdek Haritalamının Kullanımı, Orjinal uzayda karar sınırı] -
+
**51. Remark: we say that we use the "kernel trick" to compute the cost function using the kernel because we actually don't need to know the explicit mapping ϕ, which is often very complicated. Instead, only the values K(x,z) are needed.** -⟶ +⟶ Not: Çekirdeği kullanarak maliyet fonksiyonunu hesaplamak için "çekirdek numarası" nı kullandığımızı söylüyoruz çünkü genellikle çok karmaşık olan ϕ açık haritalamasını bilmeye gerek yok. Bunun yerine, yalnızca K(x,z) değerlerine ihtiyacımız vardır. -
+
**52. Lagrangian ― We define the Lagrangian L(w,b) as follows:** -⟶ +⟶ Lagranj - Lagranj L(w,b) şeklinde şöyle tanımlanır:
**53. Remark: the coefficients βi are called the Lagrange multipliers.** -⟶ +⟶ Not: βi katsayılarına Lagranj çarpanları denir.
**54. Generative Learning** -⟶ +⟶ Üretici Öğrenme
**55. A generative model first tries to learn how the data is generated by estimating P(x|y), which we can then use to estimate P(y|x) by using Bayes' rule.** -⟶ +⟶ Üretken bir model, önce Bayes kuralını kullanarak P (y | x) değerini tahmin etmek için kullanabileceğimiz P (x | y) değerini tahmin ederek verilerin nasıl üretildiğini öğrenmeye çalışır.
**56. Gaussian Discriminant Analysis** -⟶ +⟶ Gauss Diskriminant Analizi
**57. Setting ― The Gaussian Discriminant Analysis assumes that y and x|y=0 and x|y=1 are such that:** -⟶ +⟶ Yöntem - Gauss Diskriminant Analizi y ve x|y=0 ve x|y=1 'in şu şekilde olduğunu varsayar:
**58. Estimation ― The following table sums up the estimates that we find when maximizing the likelihood:** -⟶ +⟶ Tahmin - Aşağıdaki tablo, olasılığı en üst düzeye çıkarırken bulduğumuz tahminleri özetlemektedir:
**59. Naive Bayes** -⟶ +⟶ Naive Bayes
**60. Assumption ― The Naive Bayes model supposes that the features of each data point are all independent:** -⟶ +⟶ Varsayım - Naive Bayes modeli, her veri noktasının özelliklerinin tamamen bağımsız olduğunu varsayar:
**61. Solutions ― Maximizing the log-likelihood gives the following solutions, with k∈{0,1},l∈[[1,L]]** -⟶ +⟶ Çözümler - Log-olabilirliğinin k∈{0,1},l∈[[1,L]] ile birlikte aşağıdaki çözümlerle maksimize edilmesi:
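For entries 60 and 61, the maximum-likelihood solutions reduce to simple frequencies, which a short sketch with binary features can show; the toy feature matrix and labels are invented for the example:

```python
import numpy as np

# Toy binary feature matrix (e.g. word presence) and binary labels.
X = np.array([[1, 0, 1],
              [1, 1, 0],
              [0, 1, 1],
              [0, 0, 1],
              [1, 1, 1],
              [0, 1, 0]])
y = np.array([1, 1, 0, 0, 1, 0])

classes = np.array([0, 1])
phi_y = np.array([(y == k).mean() for k in classes])          # class priors P(y=k)
phi_xy = np.array([X[y == k].mean(axis=0) for k in classes])  # P(x_j=1 | y=k) per feature

def predict(x):
    # Posterior up to a constant, using the independence assumption of entry 60.
    likelihood = np.prod(phi_xy ** x * (1 - phi_xy) ** (1 - x), axis=1)
    return classes[np.argmax(likelihood * phi_y)]

print(predict(np.array([1, 1, 0])))
```

With so few points some estimated probabilities come out exactly 0 or 1, which is why Laplace smoothing is usually added in practice.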
**62. Remark: Naive Bayes is widely used for text classification and spam detection.** -⟶ +⟶ Not: Naive Bayes, metin sınıflandırması ve spam tespitinde yaygın olarak kullanılır.
**63. Tree-based and ensemble methods** -⟶ +⟶ Ağaç temelli ve topluluk yöntemleri
**64. These methods can be used for both regression and classification problems.** -⟶ +⟶ Bu yöntemler hem regresyon hem de sınıflandırma problemleri için kullanılabilir.
**65. CART ― Classification and Regression Trees (CART), commonly known as decision trees, can be represented as binary trees. They have the advantage to be very interpretable.** -⟶ +⟶ CART - Sınıflandırma ve Regresyon Ağaçları (Classification and Regression Trees (CART)), genellikle karar ağaçları olarak bilinir, ikili ağaçlar olarak temsil edilirler. -
+
**66. Random forest ― It is a tree-based technique that uses a high number of decision trees built out of randomly selected sets of features. Contrary to the simple decision tree, it is highly uninterpretable but its generally good performance makes it a popular algorithm.** -⟶ +⟶ Rastgele orman - Rastgele seçilen özelliklerden oluşan çok sayıda karar ağacı kullanan ağaç tabanlı bir tekniktir. +Basit karar ağacının tersine, oldukça yorumlanamaz bir yapıdadır ancak genel olarak iyi performansı onu popüler bir algoritma yapar.
**67. Remark: random forests are a type of ensemble methods.** -⟶ +⟶ Not: Rastgele ormanlar topluluk yöntemlerindendir.
**68. Boosting ― The idea of boosting methods is to combine several weak learners to form a stronger one. The main ones are summed up in the table below:** -⟶ +⟶ Artırım - Artırım yöntemlerinin temel fikri bazı zayıf öğrenicileri biraraya getirerek güçlü bir öğrenici oluşturmaktır. Temel yöntemler aşağıdaki tabloda özetlenmiştir: -
+
**69. [Adaptive boosting, Gradient boosting]** -⟶ +⟶ [Adaptif artırma, Gradyan artırma]
**70. High weights are put on errors to improve at the next boosting step** -⟶ +⟶ Yüksek ağırlıklar bir sonraki artırma adımında iyileşmesi için hatalara maruz kalır.
**71. Weak learners trained on remaining errors** -⟶ +⟶ Zayıf öğreniciler kalan hatalar üzerinde eğitildi
**72. Other non-parametric approaches** -⟶ +⟶ Diğer parametrik olmayan yaklaşımlar
**73. k-nearest neighbors ― The k-nearest neighbors algorithm, commonly known as k-NN, is a non-parametric approach where the response of a data point is determined by the nature of its k neighbors from the training set. It can be used in both classification and regression settings.** -⟶ +⟶ k-en yakın komşular - genellikle k-NN olarak adlandırılan k- en yakın komşular algoritması, bir veri noktasının tepkisi eğitim kümesindeki kendi k komşularının doğası ile belirlenen parametrik olmayan bir yaklaşımdır. Hem sınıflandırma hem de regresyon yöntemleri için kullanılabilir.
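A minimal sketch of the k-NN classification setting of entry 73, with a made-up training set and query point, might look like this:

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_new, k=3):
    """Classify x_new by majority vote among its k nearest training points."""
    dists = np.linalg.norm(X_train - x_new, axis=1)
    nearest = np.argsort(dists)[:k]
    return Counter(y_train[nearest]).most_common(1)[0][0]

# Toy training set with two classes.
X_train = np.array([[0.0, 0.0], [0.2, 0.1], [0.1, 0.3],
                    [2.0, 2.0], [2.1, 1.9], [1.8, 2.2]])
y_train = np.array([0, 0, 0, 1, 1, 1])

print(knn_predict(X_train, y_train, np.array([1.9, 2.1]), k=3))   # -> 1
```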
**74. Remark: The higher the parameter k, the higher the bias, and the lower the parameter k, the higher the variance.** -⟶ +⟶ Not: k parametresi ne kadar yüksekse, yanlılık okadar yüksek ve k parametresi ne kadar düşükse, varyans o kadar yüksek olur. -
+
**75. Learning Theory** -⟶ +⟶ Öğrenme Teorisi
**76. Union bound ― Let A1,...,Ak be k events. We have:** -⟶ +⟶ Birleşim sınırı - A1,...,Ak k olayları olsun. Sahip olduklarımız:
**77. Hoeffding inequality ― Let Z1,..,Zm be m iid variables drawn from a Bernoulli distribution of parameter ϕ. Let ˆϕ be their sample mean and γ>0 fixed. We have:** -⟶ +⟶ Hoeffding eşitsizliği - Z1, .., Zm, ϕ parametresinin Bernoulli dağılımından çizilen değişkenler olsun. Örnek ortalamaları mean ve γ>0 sabit olsun. Sahip olduklarımız: -
+
**78. Remark: this inequality is also known as the Chernoff bound.** -⟶ +⟶ Not: Bu eşitsizlik, Chernoff sınırı olarak da bilinir.
**79. Training error ― For a given classifier h, we define the training error ˆϵ(h), also known as the empirical risk or empirical error, to be as follows:** -⟶ +⟶ Eğitim hatası - Belirli bir h sınıflandırıcısı için, ampirik risk veya ampirik hata olarak da bilinen eğitim hatasını ˆϵ (h) şöyle tanımlarız: -
+
**80. Probably Approximately Correct (PAC) ― PAC is a framework under which numerous results on learning theory were proved, and has the following set of assumptions: ** -⟶ +⟶ Muhtemel Yaklaşık Doğru (Probably Approximately Correct (PAC)) ― PAC, öğrenme teorisi üzerine sayısız sonuçların kanıtlandığı ve aşağıdaki varsayımlara sahip olan bir çerçevedir: +
-
**81: the training and testing sets follow the same distribution ** -⟶ +⟶ eğitim ve test kümeleri aynı dağılımı takip ediyor
**82. the training examples are drawn independently** -⟶ +⟶ eğitim örnekleri bağımsız olarak çizilir
**83. Shattering ― Given a set S={x(1),...,x(d)}, and a set of classifiers H, we say that H shatters S if for any set of labels {y(1),...,y(d)}, we have:** -⟶ +⟶ Parçalanma ― S={x(1),...,x(d)} kümesi ve H sınıflandırıcıların kümesi verildiğinde, H herhangi bir etiketler kümesi S'e parçalar.
**84. Upper bound theorem ― Let H be a finite hypothesis class such that |H|=k and let δ and the sample size m be fixed. Then, with probability of at least 1−δ, we have:** -⟶ +⟶ Üst sınır teoremi ― |H|=k , δ ve örneklem sayısı m'nin sabit olduğu sonlu bir hipotez sınıfı H olsun. Ardından, en az 1−δ olasılığı ile elimizde:
**85. VC dimension ― The Vapnik-Chervonenkis (VC) dimension of a given infinite hypothesis class H, noted VC(H) is the size of the largest set that is shattered by H.** -⟶ +⟶ VC boyutu ― VC(H) olarak ifade edilen belirli bir sonsuz H hipotez sınıfının Vapnik-Chervonenkis (VC) boyutu, H tarafından parçalanan en büyük kümenin boyutudur. -
+
**86. Remark: the VC dimension of H={set of linear classifiers in 2 dimensions} is 3.** -⟶ +⟶ Not: H = {2 boyutta doğrusal sınıflandırıcılar kümesi}'nin VC boyutu 3'tür. -
+
**87. Theorem (Vapnik) ― Let H be given, with VC(H)=d and m the number of training examples. With probability at least 1−δ, we have:** -⟶ +⟶ Teorem (Vapnik) - H, VC(H)=d ve eğitim örneği sayısı m verilmiş olsun. En az 1−δ olasılığı ile, sahip olduklarımız:
**88. [Introduction, Type of prediction, Type of model]** -⟶ +⟶ [Giriş, Tahmin türü, Model türü]
**89. [Notations and general concepts, loss function, gradient descent, likelihood]** -⟶ +⟶ [Notasyonlar ve genel kavramlar,kayıp fonksiyonu, bayır inişi, olabilirlik] -
+
**90. [Linear models, linear regression, logistic regression, generalized linear models]** -⟶ +⟶ [Lineer modeller, Lineer regresyon, lojistik regresyon, genelleştirilmiş lineer modeller]
**91. [Support vector machines, Optimal margin classifier, Hinge loss, Kernel]** -⟶ +⟶ [Destek vektör makineleri, optimal marj sınıflandırıcı, Menteşe yitimi, Çekirdek]
**92. [Generative learning, Gaussian Discriminant Analysis, Naive Bayes]** -⟶ +⟶ [Üretici öğrenme, Gauss Diskriminant Analizi, Naive Bayes]
**93. [Trees and ensemble methods, CART, Random forest, Boosting]** -⟶ +⟶ [Ağaçlar ve topluluk yöntemleri, CART, Rastegele orman, Artırma]
**94. [Other methods, k-NN]** -⟶ +⟶ [Diğer yöntemler, k-NN]
**95. [Learning theory, Hoeffding inequality, PAC, VC dimension]** -⟶ +⟶ [Öğrenme teorisi, Hoeffding eşitsizliği, PAC, VC boyutu] From 68e5916b81a479880964c4472e91b5286d4d2135 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?ayy=C3=BCce=20k=C4=B1zrak?= <32331090+ayyucekizrak@users.noreply.github.com> Date: Thu, 24 Jan 2019 14:39:41 +0300 Subject: [PATCH 074/531] Update refresher-probability.md --- tr/refresher-probability.md | 70 ++++++++++++++++++------------------- 1 file changed, 35 insertions(+), 35 deletions(-) diff --git a/tr/refresher-probability.md b/tr/refresher-probability.md index 5c9b34656..9ec9331c5 100644 --- a/tr/refresher-probability.md +++ b/tr/refresher-probability.md @@ -1,204 +1,204 @@ **1. Probabilities and Statistics refresher** -⟶ +⟶ 1. Olasılık ve İstatistik hatırlatma
**2. Introduction to Probability and Combinatorics** -⟶ +⟶ 2. Olasılık ve Kombinasyonlara Giriş
**3. Sample space ― The set of all possible outcomes of an experiment is known as the sample space of the experiment and is denoted by S.** -⟶ +⟶ 3. Örnek alanı - Bir deneyin olası tüm sonuçlarının kümesidir, deneyin örnek alanı olarak bilinir ve S ile gösterilir.
**4. Event ― Any subset E of the sample space is known as an event. That is, an event is a set consisting of possible outcomes of the experiment. If the outcome of the experiment is contained in E, then we say that E has occurred.** -⟶ +⟶ 4. Olay - Örnek alanın herhangi bir E alt kümesi, olay olarak bilinir. Yani bir olay, deneyin olası sonuçlarından oluşan bir kümedir. Deneyin sonucu E'de varsa, E'nin gerçekleştiğini söyleriz.
-**5. Axioms of probability For each event E, we denote P(E) as the probability of event E occuring.** +**5. Axioms of probability: For each event E, we denote P(E) as the probability of event E occuring.** -⟶ +⟶ 5. Olasılık aksiyomları: Her olay E için, E olayının meydana gelme olasılığı olarak P (E) anlamına gelir.
**6. Axiom 1 ― Every probability is between 0 and 1 included, i.e:** -⟶ +⟶ 6. Aksiyom 1 - Her olasılık dahil 0 ile 1 arasındadır, yani:
**7. Axiom 2 ― The probability that at least one of the elementary events in the entire sample space will occur is 1, i.e:** -⟶ +⟶ 7. Aksiyom 2 - Tüm örnek uzayındaki temel olaylardan en az birinin ortaya çıkma olasılığı 1'dir, yani:
**8. Axiom 3 ― For any sequence of mutually exclusive events E1,...,En, we have:** -⟶ +⟶ 8. Aksiyom 3 - Karşılıklı özel olayların herhangi bir dizisi için, E1, ..., En,
**9. Permutation ― A permutation is an arrangement of r objects from a pool of n objects, in a given order. The number of such arrangements is given by P(n,r), defined as:** -⟶ +⟶ 9. Permütasyon - Permütasyon, n nesneler havuzundan r nesnelerinin belirli bir sıra ile düzenlenmesidir. Bu tür düzenlemelerin sayısı P (n, r) tarafından aşağıdaki gibi tanımlanır:
**10. Combination ― A combination is an arrangement of r objects from a pool of n objects, where the order does not matter. The number of such arrangements is given by C(n,r), defined as:** -⟶ +⟶ 10. Kombinasyon - Bir kombinasyon, sıranın önemli olmadığı n nesneler havuzundan r nesnelerinin bir düzenlemesidir. Bu tür düzenlemelerin sayısı C (n, r) tarafından aşağıdaki gibi tanımlanır:
**11. Remark: we note that for 0⩽r⩽n, we have P(n,r)⩾C(n,r)** -⟶ +⟶ 11. Not: 0⩽r⩽n için P (n, r) ⩾C (n, r) değerine sahibiz.
**12. Conditional Probability** -⟶ +⟶ 12. Koşullu Olasılık
**13. Bayes' rule ― For events A and B such that P(B)>0, we have:** -⟶ +⟶13. Bayes kuralı - A ve B olayları için P (B)> 0 olacak şekilde:
**14. Remark: we have P(A∩B)=P(A)P(B|A)=P(A|B)P(B)** -⟶ +⟶ 14. Not: P(A∩B)=P(A)P(B|A)=P(A|B)P(B)
**15. Partition ― Let {Ai,i∈[[1,n]]} be such that for all i, Ai≠∅. We say that {Ai} is a partition if we have:** -⟶ +⟶ 15. Parça - {Ai,i∈[[1,n]]} olsun; {Ai}'nın bir parçası olduğunu söyleriz:
**16. Remark: for any event B in the sample space, we have P(B)=n∑i=1P(B|Ai)P(Ai).** -⟶ +⟶ 16. Not: Örnek uzaydaki herhangi bir B olayı için P(B)=n∑i=1P(B|Ai)P(Ai)'ye sahibiz.
**17. Extended form of Bayes' rule ― Let {Ai,i∈[[1,n]]} be a partition of the sample space. We have:** -⟶ +⟶ 17. Genişletilmiş Bayes kuralı formu - {Ai,i∈[[1,n]]} örneklemenin bir bölümü olsun. Elde edilen:
**18. Independence ― Two events A and B are independent if and only if we have:** -⟶ +⟶ 18. Bağımsızlık - İki olay A ve B birbirinden bağımısz ise, elde edilen:
**19. Random Variables** -⟶ +⟶ 19. Rastgele Değişkenler
**20. Definitions** -⟶ +⟶ 20. Tanımlamalar
**21. Random variable ― A random variable, often noted X, is a function that maps every element in a sample space to a real line.** -⟶ +⟶ 21. 21. Rastgele değişken - Genellikle X işaretli rastgele bir değişken, bir örnek uzayındaki her öğeyi gerçek bir çizgiye eşleyen bir fonksiyondur (işlevdir).
**22. Cumulative distribution function (CDF) ― The cumulative distribution function F, which is monotonically non-decreasing and is such that limx→−∞F(x)=0 and limx→+∞F(x)=1, is defined as:** -⟶ +⟶ 22. Kümülatif dağılım fonksiyonu (KDF/CDF) - Monotonik olarak azalmayan ve limx→−∞F(x)=0 ve limx→+∞F(x)=1 olacak şekilde kümülatif dağılım fonksiyonu F:
**23. Remark: we have P(a<X⩽b)=F(b)−F(a)**

**24. Probability density function (PDF) ― The probability density function f is the probability that X takes on values between two adjacent realizations of the random variable.**

-⟶

+⟶ 24. Olasılık yoğunluğu fonksiyonu (OYF/PDF) - Olasılık yoğunluğu fonksiyonu f, X'in rastgele değişkenin iki bitişik gerçekleşmesi arasındaki değerleri alması ihtimalidir.
**25. Relationships involving the PDF and CDF ― Here are the important properties to know in the discrete (D) and the continuous (C) cases.** -⟶ +⟶ 25. PDF ve CDF'yi içeren ilişkiler - Ayrık (D) ve sürekli (C) olaylarında bilmeniz gereken önemli özellikler.
**26. [Case, CDF F, PDF f, Properties of PDF]** -⟶ +⟶ 26. [Olay, CDF F, PDF f, PDF Özellikleri]
**27. Expectation and Moments of the Distribution ― Here are the expressions of the expected value E[X], generalized expected value E[g(X)], kth moment E[Xk] and characteristic function ψ(ω) for the discrete and continuous cases:** -⟶ +⟶ 27. Beklenti ve Dağılım Momentleri - Burada, ayrık ve sürekli durumlar için beklenen değer E[X], genelleştirilmiş beklenen değer E[g(X)], k. Moment E[Xk] ve karakteristik fonksiyon ψ(ω) ifadeleri verilmiştir :
**28. Variance ― The variance of a random variable, often noted Var(X) or σ2, is a measure of the spread of its distribution function. It is determined as follows:** -⟶ +⟶ 28. Varyans - Genellikle Var(X) veya σ2 olarak not edilen rastgele değişkenin varyansı, dağılım fonksiyonunun yayılmasının bir ölçüsüdür. Aşağıdaki şekilde belirlenir:
**29. Standard deviation ― The standard deviation of a random variable, often noted σ, is a measure of the spread of its distribution function which is compatible with the units of the actual random variable. It is determined as follows:** -⟶ +⟶ 29. Standart sapma - Genellikle σ olarak not edilen rastgele bir değişkenin standart sapması, gerçek rastgele değişkenin birimleriyle uyumlu olan dağılım fonksiyonunun yayılmasının bir ölçüsüdür. Aşağıdaki şekilde belirlenir:
**30. Transformation of random variables ― Let the variables X and Y be linked by some function. By noting fX and fY the distribution function of X and Y respectively, we have:** -⟶ +⟶30. Rastgele değişkenlerin dönüşümü - X ve Y değişkenlerinin bazı fonksiyonlarla bağlanır. FX ve fY'ye sırasıyla X ve Y'nin dağılım fonksiyonu şöyledir:
**31. Leibniz integral rule ― Let g be a function of x and potentially c, and a,b boundaries that may depend on c. We have:** -⟶ +⟶ 31. Leibniz integral kuralı - g, x'e ve potansiyel olarak c'nin, c'ye bağlı olabilecek potansiyel c ve a, b sınırlarının bir fonksiyonu olsun. Elde edilen:
**32. Probability Distributions** -⟶ +⟶ 32. Olasılık Dağılımları
**33. Chebyshev's inequality ― Let X be a random variable with expected value μ. For k,σ>0, we have the following inequality:** -⟶ +⟶ 33. Chebyshev'in eşitsizliği - X'in beklenen değeri value olan rastgele bir değişken olmasına izin verin. K, σ>0 için aşağıdaki eşitsizliği elde edilir:
**34. Main distributions ― Here are the main distributions to have in mind:** -⟶ +⟶ 34. Ana dağıtımlar - İşte akılda tutulması gereken ana dağıtımlar:
From 115e6b86b3086cf6b5cad171c51c20cd56903034 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?ayy=C3=BCce=20k=C4=B1zrak?= <32331090+ayyucekizrak@users.noreply.github.com> Date: Thu, 24 Jan 2019 14:54:32 +0300 Subject: [PATCH 075/531] Update refresher-probability.md --- tr/refresher-probability.md | 68 ++++++++++++++++++------------------- 1 file changed, 34 insertions(+), 34 deletions(-) diff --git a/tr/refresher-probability.md b/tr/refresher-probability.md index 9ec9331c5..bbe81f651 100644 --- a/tr/refresher-probability.md +++ b/tr/refresher-probability.md @@ -1,204 +1,204 @@ **1. Probabilities and Statistics refresher** -⟶ 1. Olasılık ve İstatistik hatırlatma +⟶Olasılık ve İstatistik hatırlatma
**2. Introduction to Probability and Combinatorics** -⟶ 2. Olasılık ve Kombinasyonlara Giriş +⟶Olasılık ve Kombinasyonlara Giriş
**3. Sample space ― The set of all possible outcomes of an experiment is known as the sample space of the experiment and is denoted by S.** -⟶ 3. Örnek alanı - Bir deneyin olası tüm sonuçlarının kümesidir, deneyin örnek alanı olarak bilinir ve S ile gösterilir. +⟶Örnek alanı - Bir deneyin olası tüm sonuçlarının kümesidir, deneyin örnek alanı olarak bilinir ve S ile gösterilir.
**4. Event ― Any subset E of the sample space is known as an event. That is, an event is a set consisting of possible outcomes of the experiment. If the outcome of the experiment is contained in E, then we say that E has occurred.** -⟶ 4. Olay - Örnek alanın herhangi bir E alt kümesi, olay olarak bilinir. Yani bir olay, deneyin olası sonuçlarından oluşan bir kümedir. Deneyin sonucu E'de varsa, E'nin gerçekleştiğini söyleriz. +⟶Olay - Örnek alanın herhangi bir E alt kümesi, olay olarak bilinir. Yani bir olay, deneyin olası sonuçlarından oluşan bir kümedir. Deneyin sonucu E'de varsa, E'nin gerçekleştiğini söyleriz.
**5. Axioms of probability: For each event E, we denote P(E) as the probability of event E occuring.** -⟶ 5. Olasılık aksiyomları: Her olay E için, E olayının meydana gelme olasılığı olarak P (E) anlamına gelir. +⟶Olasılık aksiyomları: Her olay E için, E olayının meydana gelme olasılığı olarak P (E) anlamına gelir.
**6. Axiom 1 ― Every probability is between 0 and 1 included, i.e:** -⟶ 6. Aksiyom 1 - Her olasılık dahil 0 ile 1 arasındadır, yani: +⟶Aksiyom 1 - Her olasılık dahil 0 ile 1 arasındadır, yani:
**7. Axiom 2 ― The probability that at least one of the elementary events in the entire sample space will occur is 1, i.e:** -⟶ 7. Aksiyom 2 - Tüm örnek uzayındaki temel olaylardan en az birinin ortaya çıkma olasılığı 1'dir, yani: +⟶Aksiyom 2 - Tüm örnek uzayındaki temel olaylardan en az birinin ortaya çıkma olasılığı 1'dir, yani:
**8. Axiom 3 ― For any sequence of mutually exclusive events E1,...,En, we have:** -⟶ 8. Aksiyom 3 - Karşılıklı özel olayların herhangi bir dizisi için, E1, ..., En, +⟶Aksiyom 3 - Karşılıklı özel olayların herhangi bir dizisi için, E1, ..., En,
**9. Permutation ― A permutation is an arrangement of r objects from a pool of n objects, in a given order. The number of such arrangements is given by P(n,r), defined as:** -⟶ 9. Permütasyon - Permütasyon, n nesneler havuzundan r nesnelerinin belirli bir sıra ile düzenlenmesidir. Bu tür düzenlemelerin sayısı P (n, r) tarafından aşağıdaki gibi tanımlanır: +⟶Permütasyon - Permütasyon, n nesneler havuzundan r nesnelerinin belirli bir sıra ile düzenlenmesidir. Bu tür düzenlemelerin sayısı P (n, r) tarafından aşağıdaki gibi tanımlanır:
**10. Combination ― A combination is an arrangement of r objects from a pool of n objects, where the order does not matter. The number of such arrangements is given by C(n,r), defined as:** -⟶ 10. Kombinasyon - Bir kombinasyon, sıranın önemli olmadığı n nesneler havuzundan r nesnelerinin bir düzenlemesidir. Bu tür düzenlemelerin sayısı C (n, r) tarafından aşağıdaki gibi tanımlanır: +⟶Kombinasyon - Bir kombinasyon, sıranın önemli olmadığı n nesneler havuzundan r nesnelerinin bir düzenlemesidir. Bu tür düzenlemelerin sayısı C (n, r) tarafından aşağıdaki gibi tanımlanır:
**11. Remark: we note that for 0⩽r⩽n, we have P(n,r)⩾C(n,r)** -⟶ 11. Not: 0⩽r⩽n için P (n, r) ⩾C (n, r) değerine sahibiz. +⟶Not: 0⩽r⩽n için P (n, r) ⩾C (n, r) değerine sahibiz.
**12. Conditional Probability** -⟶ 12. Koşullu Olasılık +⟶Koşullu Olasılık
**13. Bayes' rule ― For events A and B such that P(B)>0, we have:** -⟶13. Bayes kuralı - A ve B olayları için P (B)> 0 olacak şekilde: +⟶Bayes kuralı - A ve B olayları için P (B)> 0 olacak şekilde:
**14. Remark: we have P(A∩B)=P(A)P(B|A)=P(A|B)P(B)** -⟶ 14. Not: P(A∩B)=P(A)P(B|A)=P(A|B)P(B) +⟶Not: P(A∩B)=P(A)P(B|A)=P(A|B)P(B)
**15. Partition ― Let {Ai,i∈[[1,n]]} be such that for all i, Ai≠∅. We say that {Ai} is a partition if we have:** -⟶ 15. Parça - {Ai,i∈[[1,n]]} olsun; {Ai}'nın bir parçası olduğunu söyleriz: +⟶Parça - {Ai,i∈[[1,n]]} olsun; {Ai}'nın bir parçası olduğunu söyleriz:
**16. Remark: for any event B in the sample space, we have P(B)=n∑i=1P(B|Ai)P(Ai).** -⟶ 16. Not: Örnek uzaydaki herhangi bir B olayı için P(B)=n∑i=1P(B|Ai)P(Ai)'ye sahibiz. +⟶Not: Örnek uzaydaki herhangi bir B olayı için P(B)=n∑i=1P(B|Ai)P(Ai)'ye sahibiz.
**17. Extended form of Bayes' rule ― Let {Ai,i∈[[1,n]]} be a partition of the sample space. We have:** -⟶ 17. Genişletilmiş Bayes kuralı formu - {Ai,i∈[[1,n]]} örneklemenin bir bölümü olsun. Elde edilen: +⟶Genişletilmiş Bayes kuralı formu - {Ai,i∈[[1,n]]} örneklemenin bir bölümü olsun. Elde edilen:
**18. Independence ― Two events A and B are independent if and only if we have:** -⟶ 18. Bağımsızlık - İki olay A ve B birbirinden bağımısz ise, elde edilen: +⟶Bağımsızlık - İki olay A ve B birbirinden bağımısz ise, elde edilen:
**19. Random Variables** -⟶ 19. Rastgele Değişkenler +⟶Rastgele Değişkenler
**20. Definitions** -⟶ 20. Tanımlamalar +⟶Tanımlamalar
**21. Random variable ― A random variable, often noted X, is a function that maps every element in a sample space to a real line.** -⟶ 21. 21. Rastgele değişken - Genellikle X işaretli rastgele bir değişken, bir örnek uzayındaki her öğeyi gerçek bir çizgiye eşleyen bir fonksiyondur (işlevdir). +⟶Rastgele değişken - Genellikle X işaretli rastgele bir değişken, bir örnek uzayındaki her öğeyi gerçek bir çizgiye eşleyen bir fonksiyondur (işlevdir).
**22. Cumulative distribution function (CDF) ― The cumulative distribution function F, which is monotonically non-decreasing and is such that limx→−∞F(x)=0 and limx→+∞F(x)=1, is defined as:** -⟶ 22. Kümülatif dağılım fonksiyonu (KDF/CDF) - Monotonik olarak azalmayan ve limx→−∞F(x)=0 ve limx→+∞F(x)=1 olacak şekilde kümülatif dağılım fonksiyonu F: +⟶Kümülatif dağılım fonksiyonu (KDF/CDF) - Monotonik olarak azalmayan ve limx→−∞F(x)=0 ve limx→+∞F(x)=1 olacak şekilde kümülatif dağılım fonksiyonu F:
**23. Remark: we have P(a<X⩽b)=F(b)−F(a)**

**24. Probability density function (PDF) ― The probability density function f is the probability that X takes on values between two adjacent realizations of the random variable.**

-⟶ 24. Olasılık yoğunluğu fonksiyonu (OYF/PDF) - Olasılık yoğunluğu fonksiyonu f, X'in rastgele değişkenin iki bitişik gerçekleşmesi arasındaki değerleri alması ihtimalidir.

+⟶Olasılık yoğunluğu fonksiyonu (OYF/PDF) - Olasılık yoğunluğu fonksiyonu f, X'in rastgele değişkenin iki bitişik gerçekleşmesi arasındaki değerleri alması ihtimalidir.
**25. Relationships involving the PDF and CDF ― Here are the important properties to know in the discrete (D) and the continuous (C) cases.** -⟶ 25. PDF ve CDF'yi içeren ilişkiler - Ayrık (D) ve sürekli (C) olaylarında bilmeniz gereken önemli özellikler. +⟶PDF ve CDF'yi içeren ilişkiler - Ayrık (D) ve sürekli (C) olaylarında bilmeniz gereken önemli özellikler.
**26. [Case, CDF F, PDF f, Properties of PDF]** -⟶ 26. [Olay, CDF F, PDF f, PDF Özellikleri] +⟶[Olay, CDF F, PDF f, PDF Özellikleri]
**27. Expectation and Moments of the Distribution ― Here are the expressions of the expected value E[X], generalized expected value E[g(X)], kth moment E[Xk] and characteristic function ψ(ω) for the discrete and continuous cases:** -⟶ 27. Beklenti ve Dağılım Momentleri - Burada, ayrık ve sürekli durumlar için beklenen değer E[X], genelleştirilmiş beklenen değer E[g(X)], k. Moment E[Xk] ve karakteristik fonksiyon ψ(ω) ifadeleri verilmiştir : +⟶Beklenti ve Dağılım Momentleri - Burada, ayrık ve sürekli durumlar için beklenen değer E[X], genelleştirilmiş beklenen değer E[g(X)], k. Moment E[Xk] ve karakteristik fonksiyon ψ(ω) ifadeleri verilmiştir :
**28. Variance ― The variance of a random variable, often noted Var(X) or σ2, is a measure of the spread of its distribution function. It is determined as follows:** -⟶ 28. Varyans - Genellikle Var(X) veya σ2 olarak not edilen rastgele değişkenin varyansı, dağılım fonksiyonunun yayılmasının bir ölçüsüdür. Aşağıdaki şekilde belirlenir: +⟶Varyans - Genellikle Var(X) veya σ2 olarak not edilen rastgele değişkenin varyansı, dağılım fonksiyonunun yayılmasının bir ölçüsüdür. Aşağıdaki şekilde belirlenir:
**29. Standard deviation ― The standard deviation of a random variable, often noted σ, is a measure of the spread of its distribution function which is compatible with the units of the actual random variable. It is determined as follows:** -⟶ 29. Standart sapma - Genellikle σ olarak not edilen rastgele bir değişkenin standart sapması, gerçek rastgele değişkenin birimleriyle uyumlu olan dağılım fonksiyonunun yayılmasının bir ölçüsüdür. Aşağıdaki şekilde belirlenir: +⟶Standart sapma - Genellikle σ olarak not edilen rastgele bir değişkenin standart sapması, gerçek rastgele değişkenin birimleriyle uyumlu olan dağılım fonksiyonunun yayılmasının bir ölçüsüdür. Aşağıdaki şekilde belirlenir:
**30. Transformation of random variables ― Let the variables X and Y be linked by some function. By noting fX and fY the distribution function of X and Y respectively, we have:** -⟶30. Rastgele değişkenlerin dönüşümü - X ve Y değişkenlerinin bazı fonksiyonlarla bağlanır. FX ve fY'ye sırasıyla X ve Y'nin dağılım fonksiyonu şöyledir: +⟶Rastgele değişkenlerin dönüşümü - X ve Y değişkenlerinin bazı fonksiyonlarla bağlanır. FX ve fY'ye sırasıyla X ve Y'nin dağılım fonksiyonu şöyledir:
**31. Leibniz integral rule ― Let g be a function of x and potentially c, and a,b boundaries that may depend on c. We have:** -⟶ 31. Leibniz integral kuralı - g, x'e ve potansiyel olarak c'nin, c'ye bağlı olabilecek potansiyel c ve a, b sınırlarının bir fonksiyonu olsun. Elde edilen: +⟶Leibniz integral kuralı - g, x'e ve potansiyel olarak c'nin, c'ye bağlı olabilecek potansiyel c ve a, b sınırlarının bir fonksiyonu olsun. Elde edilen:
**32. Probability Distributions** -⟶ 32. Olasılık Dağılımları +⟶Olasılık Dağılımları
**33. Chebyshev's inequality ― Let X be a random variable with expected value μ. For k,σ>0, we have the following inequality:** -⟶ 33. Chebyshev'in eşitsizliği - X'in beklenen değeri value olan rastgele bir değişken olmasına izin verin. K, σ>0 için aşağıdaki eşitsizliği elde edilir: +⟶Chebyshev'in eşitsizliği - X'in beklenen değeri value olan rastgele bir değişken olmasına izin verin. K, σ>0 için aşağıdaki eşitsizliği elde edilir:
**34. Main distributions ― Here are the main distributions to have in mind:** -⟶ 34. Ana dağıtımlar - İşte akılda tutulması gereken ana dağıtımlar: +⟶Ana dağıtımlar - İşte akılda tutulması gereken ana dağıtımlar:
From cbf5fb22119fdb3fc80b0af3c01589f1389e2047 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?ayy=C3=BCce=20k=C4=B1zrak?= <32331090+ayyucekizrak@users.noreply.github.com> Date: Thu, 24 Jan 2019 15:15:18 +0300 Subject: [PATCH 076/531] Update refresher-probability.md --- tr/refresher-probability.md | 128 ++++++++++++++++++------------------ 1 file changed, 64 insertions(+), 64 deletions(-) diff --git a/tr/refresher-probability.md b/tr/refresher-probability.md index bbe81f651..d9aceea77 100644 --- a/tr/refresher-probability.md +++ b/tr/refresher-probability.md @@ -1,381 +1,381 @@ **1. Probabilities and Statistics refresher** -⟶Olasılık ve İstatistik hatırlatma +⟶ Olasılık ve İstatistik hatırlatma
**2. Introduction to Probability and Combinatorics** -⟶Olasılık ve Kombinasyonlara Giriş +⟶ Olasılık ve Kombinasyonlara Giriş
**3. Sample space ― The set of all possible outcomes of an experiment is known as the sample space of the experiment and is denoted by S.** -⟶Örnek alanı - Bir deneyin olası tüm sonuçlarının kümesidir, deneyin örnek alanı olarak bilinir ve S ile gösterilir. +⟶ Örnek alanı - Bir deneyin olası tüm sonuçlarının kümesidir, deneyin örnek alanı olarak bilinir ve S ile gösterilir.
**4. Event ― Any subset E of the sample space is known as an event. That is, an event is a set consisting of possible outcomes of the experiment. If the outcome of the experiment is contained in E, then we say that E has occurred.** -⟶Olay - Örnek alanın herhangi bir E alt kümesi, olay olarak bilinir. Yani bir olay, deneyin olası sonuçlarından oluşan bir kümedir. Deneyin sonucu E'de varsa, E'nin gerçekleştiğini söyleriz. +⟶ Olay - Örnek alanın herhangi bir E alt kümesi, olay olarak bilinir. Yani bir olay, deneyin olası sonuçlarından oluşan bir kümedir. Deneyin sonucu E'de varsa, E'nin gerçekleştiğini söyleriz.
**5. Axioms of probability: For each event E, we denote P(E) as the probability of event E occuring.** -⟶Olasılık aksiyomları: Her olay E için, E olayının meydana gelme olasılığı olarak P (E) anlamına gelir. +⟶ Olasılık aksiyomları: Her olay E için, E olayının meydana gelme olasılığı olarak P (E) anlamına gelir.
**6. Axiom 1 ― Every probability is between 0 and 1 included, i.e:** -⟶Aksiyom 1 - Her olasılık dahil 0 ile 1 arasındadır, yani: +⟶ Aksiyom 1 - Her olasılık dahil 0 ile 1 arasındadır, yani:
**7. Axiom 2 ― The probability that at least one of the elementary events in the entire sample space will occur is 1, i.e:** -⟶Aksiyom 2 - Tüm örnek uzayındaki temel olaylardan en az birinin ortaya çıkma olasılığı 1'dir, yani: +⟶ Aksiyom 2 - Tüm örnek uzayındaki temel olaylardan en az birinin ortaya çıkma olasılığı 1'dir, yani:
**8. Axiom 3 ― For any sequence of mutually exclusive events E1,...,En, we have:** -⟶Aksiyom 3 - Karşılıklı özel olayların herhangi bir dizisi için, E1, ..., En, +⟶ Aksiyom 3 - Karşılıklı özel olayların herhangi bir dizisi için, E1, ..., En,
**9. Permutation ― A permutation is an arrangement of r objects from a pool of n objects, in a given order. The number of such arrangements is given by P(n,r), defined as:** -⟶Permütasyon - Permütasyon, n nesneler havuzundan r nesnelerinin belirli bir sıra ile düzenlenmesidir. Bu tür düzenlemelerin sayısı P (n, r) tarafından aşağıdaki gibi tanımlanır: +⟶ Permütasyon - Permütasyon, n nesneler havuzundan r nesnelerinin belirli bir sıra ile düzenlenmesidir. Bu tür düzenlemelerin sayısı P (n, r) tarafından aşağıdaki gibi tanımlanır:
**10. Combination ― A combination is an arrangement of r objects from a pool of n objects, where the order does not matter. The number of such arrangements is given by C(n,r), defined as:** -⟶Kombinasyon - Bir kombinasyon, sıranın önemli olmadığı n nesneler havuzundan r nesnelerinin bir düzenlemesidir. Bu tür düzenlemelerin sayısı C (n, r) tarafından aşağıdaki gibi tanımlanır: +⟶ Kombinasyon - Bir kombinasyon, sıranın önemli olmadığı n nesneler havuzundan r nesnelerinin bir düzenlemesidir. Bu tür düzenlemelerin sayısı C (n, r) tarafından aşağıdaki gibi tanımlanır:
**11. Remark: we note that for 0⩽r⩽n, we have P(n,r)⩾C(n,r)** -⟶Not: 0⩽r⩽n için P (n, r) ⩾C (n, r) değerine sahibiz. +⟶ Not: 0⩽r⩽n için P (n, r) ⩾C (n, r) değerine sahibiz.
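Entries 9 through 11 can be checked directly with the standard library, assuming Python 3.8+ where math.perm and math.comb are available; n and r below are arbitrary example values:

```python
import math

n, r = 5, 3   # arbitrary example values

p = math.perm(n, r)   # P(n, r) = n! / (n - r)!       ordered arrangements
c = math.comb(n, r)   # C(n, r) = n! / (r! (n - r)!)  order does not matter

print(p, c, p >= c)   # 60 10 True, consistent with the remark P(n,r) >= C(n,r)
```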
**12. Conditional Probability** -⟶Koşullu Olasılık +⟶ Koşullu Olasılık
**13. Bayes' rule ― For events A and B such that P(B)>0, we have:** -⟶Bayes kuralı - A ve B olayları için P (B)> 0 olacak şekilde: +⟶ Bayes kuralı - A ve B olayları için P (B)> 0 olacak şekilde:
**14. Remark: we have P(A∩B)=P(A)P(B|A)=P(A|B)P(B)** -⟶Not: P(A∩B)=P(A)P(B|A)=P(A|B)P(B) +⟶ Not: P(A∩B)=P(A)P(B|A)=P(A|B)P(B)
**15. Partition ― Let {Ai,i∈[[1,n]]} be such that for all i, Ai≠∅. We say that {Ai} is a partition if we have:** -⟶Parça - {Ai,i∈[[1,n]]} olsun; {Ai}'nın bir parçası olduğunu söyleriz: +⟶ Parça - {Ai,i∈[[1,n]]} olsun; {Ai}'nın bir parçası olduğunu söyleriz:
**16. Remark: for any event B in the sample space, we have P(B)=n∑i=1P(B|Ai)P(Ai).** -⟶Not: Örnek uzaydaki herhangi bir B olayı için P(B)=n∑i=1P(B|Ai)P(Ai)'ye sahibiz. +⟶ Not: Örnek uzaydaki herhangi bir B olayı için P(B)=n∑i=1P(B|Ai)P(Ai)'ye sahibiz.
**17. Extended form of Bayes' rule ― Let {Ai,i∈[[1,n]]} be a partition of the sample space. We have:** -⟶Genişletilmiş Bayes kuralı formu - {Ai,i∈[[1,n]]} örneklemenin bir bölümü olsun. Elde edilen: +⟶ Genişletilmiş Bayes kuralı formu - {Ai,i∈[[1,n]]} örneklemenin bir bölümü olsun. Elde edilen:
**18. Independence ― Two events A and B are independent if and only if we have:** -⟶Bağımsızlık - İki olay A ve B birbirinden bağımısz ise, elde edilen: +⟶ Bağımsızlık - İki olay A ve B birbirinden bağımısz ise, elde edilen:
**19. Random Variables** -⟶Rastgele Değişkenler +⟶ Rastgele Değişkenler
**20. Definitions** -⟶Tanımlamalar +⟶ Tanımlamalar
**21. Random variable ― A random variable, often noted X, is a function that maps every element in a sample space to a real line.** -⟶Rastgele değişken - Genellikle X işaretli rastgele bir değişken, bir örnek uzayındaki her öğeyi gerçek bir çizgiye eşleyen bir fonksiyondur (işlevdir). +⟶ Rastgele değişken - Genellikle X işaretli rastgele bir değişken, bir örnek uzayındaki her öğeyi gerçek bir çizgiye eşleyen bir fonksiyondur (işlevdir).
**22. Cumulative distribution function (CDF) ― The cumulative distribution function F, which is monotonically non-decreasing and is such that limx→−∞F(x)=0 and limx→+∞F(x)=1, is defined as:** -⟶Kümülatif dağılım fonksiyonu (KDF/CDF) - Monotonik olarak azalmayan ve limx→−∞F(x)=0 ve limx→+∞F(x)=1 olacak şekilde kümülatif dağılım fonksiyonu F: +⟶ Kümülatif dağılım fonksiyonu (KDF/CDF) - Monotonik olarak azalmayan ve limx→−∞F(x)=0 ve limx→+∞F(x)=1 olacak şekilde kümülatif dağılım fonksiyonu F:
**23. Remark: we have P(a<X⩽b)=F(b)−F(a)**

**24. Probability density function (PDF) ― The probability density function f is the probability that X takes on values between two adjacent realizations of the random variable.**

-⟶Olasılık yoğunluğu fonksiyonu (OYF/PDF) - Olasılık yoğunluğu fonksiyonu f, X'in rastgele değişkenin iki bitişik gerçekleşmesi arasındaki değerleri alması ihtimalidir.

+⟶ Olasılık yoğunluğu fonksiyonu (OYF/PDF) - Olasılık yoğunluğu fonksiyonu f, X'in rastgele değişkenin iki bitişik gerçekleşmesi arasındaki değerleri alması ihtimalidir.
**25. Relationships involving the PDF and CDF ― Here are the important properties to know in the discrete (D) and the continuous (C) cases.** -⟶PDF ve CDF'yi içeren ilişkiler - Ayrık (D) ve sürekli (C) olaylarında bilmeniz gereken önemli özellikler. +⟶ PDF ve CDF'yi içeren ilişkiler - Ayrık (D) ve sürekli (C) olaylarında bilmeniz gereken önemli özellikler.
**26. [Case, CDF F, PDF f, Properties of PDF]** -⟶[Olay, CDF F, PDF f, PDF Özellikleri] +⟶ [Olay, CDF F, PDF f, PDF Özellikleri]
**27. Expectation and Moments of the Distribution ― Here are the expressions of the expected value E[X], generalized expected value E[g(X)], kth moment E[Xk] and characteristic function ψ(ω) for the discrete and continuous cases:** -⟶Beklenti ve Dağılım Momentleri - Burada, ayrık ve sürekli durumlar için beklenen değer E[X], genelleştirilmiş beklenen değer E[g(X)], k. Moment E[Xk] ve karakteristik fonksiyon ψ(ω) ifadeleri verilmiştir : +⟶ Beklenti ve Dağılım Momentleri - Burada, ayrık ve sürekli durumlar için beklenen değer E[X], genelleştirilmiş beklenen değer E[g(X)], k. Moment E[Xk] ve karakteristik fonksiyon ψ(ω) ifadeleri verilmiştir :
**28. Variance ― The variance of a random variable, often noted Var(X) or σ2, is a measure of the spread of its distribution function. It is determined as follows:** -⟶Varyans - Genellikle Var(X) veya σ2 olarak not edilen rastgele değişkenin varyansı, dağılım fonksiyonunun yayılmasının bir ölçüsüdür. Aşağıdaki şekilde belirlenir: +⟶ Varyans - Genellikle Var(X) veya σ2 olarak not edilen rastgele değişkenin varyansı, dağılım fonksiyonunun yayılmasının bir ölçüsüdür. Aşağıdaki şekilde belirlenir:
**29. Standard deviation ― The standard deviation of a random variable, often noted σ, is a measure of the spread of its distribution function which is compatible with the units of the actual random variable. It is determined as follows:** -⟶Standart sapma - Genellikle σ olarak not edilen rastgele bir değişkenin standart sapması, gerçek rastgele değişkenin birimleriyle uyumlu olan dağılım fonksiyonunun yayılmasının bir ölçüsüdür. Aşağıdaki şekilde belirlenir: +⟶ Standart sapma - Genellikle σ olarak not edilen rastgele bir değişkenin standart sapması, gerçek rastgele değişkenin birimleriyle uyumlu olan dağılım fonksiyonunun yayılmasının bir ölçüsüdür. Aşağıdaki şekilde belirlenir:
**30. Transformation of random variables ― Let the variables X and Y be linked by some function. By noting fX and fY the distribution function of X and Y respectively, we have:** -⟶Rastgele değişkenlerin dönüşümü - X ve Y değişkenlerinin bazı fonksiyonlarla bağlanır. FX ve fY'ye sırasıyla X ve Y'nin dağılım fonksiyonu şöyledir: +⟶ Rastgele değişkenlerin dönüşümü - X ve Y değişkenlerinin bazı fonksiyonlarla bağlanır. FX ve fY'ye sırasıyla X ve Y'nin dağılım fonksiyonu şöyledir:
**31. Leibniz integral rule ― Let g be a function of x and potentially c, and a,b boundaries that may depend on c. We have:** -⟶Leibniz integral kuralı - g, x'e ve potansiyel olarak c'nin, c'ye bağlı olabilecek potansiyel c ve a, b sınırlarının bir fonksiyonu olsun. Elde edilen: +⟶ Leibniz integral kuralı - g, x'e ve potansiyel olarak c'nin, c'ye bağlı olabilecek potansiyel c ve a, b sınırlarının bir fonksiyonu olsun. Elde edilen:
**32. Probability Distributions** -⟶Olasılık Dağılımları +⟶ Olasılık Dağılımları
**33. Chebyshev's inequality ― Let X be a random variable with expected value μ. For k,σ>0, we have the following inequality:** -⟶Chebyshev'in eşitsizliği - X'in beklenen değeri value olan rastgele bir değişken olmasına izin verin. K, σ>0 için aşağıdaki eşitsizliği elde edilir: +⟶ Chebyshev'in eşitsizliği - X'in beklenen değeri value olan rastgele bir değişken olmasına izin verin. K, σ>0 için aşağıdaki eşitsizliği elde edilir:
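Entry 33's bound is easy to check empirically; the distribution, sample size and the values of k below are arbitrary illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.exponential(scale=1.0, size=200_000)   # arbitrary example distribution
mu, sigma = X.mean(), X.std()

for k in (1.5, 2.0, 3.0):
    # Empirical P(|X - mu| >= k*sigma) versus the Chebyshev bound 1/k^2.
    empirical = np.mean(np.abs(X - mu) >= k * sigma)
    print(k, empirical, 1 / k ** 2, empirical <= 1 / k ** 2)
```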
**34. Main distributions ― Here are the main distributions to have in mind:** -⟶Ana dağıtımlar - İşte akılda tutulması gereken ana dağıtımlar: +⟶ Ana dağıtımlar - İşte akılda tutulması gereken ana dağıtımlar:
**35. [Type, Distribution]** -⟶ +⟶ [Tür, Dağılım]
**36. Jointly Distributed Random Variables** -⟶ +⟶ Ortak Dağılımlı Rastgele Değişkenler
**37. Marginal density and cumulative distribution ― From the joint density probability function fXY , we have** -⟶ +⟶ Marjinal yoğunluk ve kümülatif dağılım - fXY ortak yoğunluk olasılık fonksiyonundan,
**38. [Case, Marginal density, Cumulative function]** -⟶ +⟶ [Olay, Marjinal yoğunluk, Kümülatif fonksiyon]
**39. Conditional density ― The conditional density of X with respect to Y, often noted fX|Y, is defined as follows:** -⟶ +⟶ Koşullu yoğunluk - Y'ye göre X'in koşullu yoğunluğu, genellikle fX|Y olarak elde edilir:
**40. Independence ― Two random variables X and Y are said to be independent if we have:** -⟶ +⟶ Bağımsızlık - İki rastgele değişkenin X ve Y olması durumunda bağımsız olduğu söylenir:
**41. Covariance ― We define the covariance of two random variables X and Y, that we note σ2XY or more commonly Cov(X,Y), as follows:** -⟶ +⟶ Kovaryans - σ2XY veya daha genel olarak Cov(X,Y) olarak elde ettiğimiz iki rastgele değişken olan X ve Y'nin kovaryansını aşağıdaki gibi tanımlarız:
**42. Correlation ― By noting σX,σY the standard deviations of X and Y, we define the correlation between the random variables X and Y, noted ρXY, as follows:** -⟶ +⟶ Korelasyon - σX, σY, X ve Y'nin standart sapmalarını elde ederek, ρXY olarak belirtilen rastgele X ve Y değişkenleri arasındaki korelasyonu şu şekilde tanımlarız:
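A minimal sketch of the two definitions above in NumPy (the synthetic data is a made-up example):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=10_000)
y = 0.8 * x + 0.6 * rng.normal(size=10_000)        # Y correlated with X

cov_xy = np.mean((x - x.mean()) * (y - y.mean()))  # Cov(X, Y)
rho_xy = cov_xy / (x.std() * y.std())              # correlation, always in [-1, 1]

print(cov_xy, rho_xy)
print(np.corrcoef(x, y)[0, 1])                     # NumPy's built-in estimate, for comparison
```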
**43. Remark 1: we note that for any random variables X,Y, we have ρXY∈[−1,1].**

-⟶

+⟶ Not 1: herhangi X, Y rastgele değişkenleri için ρXY∈[−1,1] olduğuna dikkat edin.

<br>
**44. Remark 2: If X and Y are independent, then ρXY=0.** -⟶ +⟶ Not 2: Eğer X ve Y bağımsızsa, ρXY = 0 olur.
**45. Parameter estimation** -⟶ +⟶ Parametre tahmini (kestirimi)
**46. Definitions** -⟶ +⟶ Tanımlamalar
**47. Random sample ― A random sample is a collection of n random variables X1,...,Xn that are independent and identically distributed with X.**

-⟶

+⟶ Rastgele örneklem - Rastgele bir örneklem, bağımsız ve X ile aynı dağılıma sahip n adet rastgele değişkenden (X1,...,Xn) oluşan bir topluluktur.

<br>
**48. Estimator ― An estimator is a function of the data that is used to infer the value of an unknown parameter in a statistical model.** -⟶ +⟶ Tahminci (Kestirimci) - Tahmin edici, istatistiksel bir modelde bilinmeyen bir parametrenin değerini ortaya çıkarmak için kullanılan verilerin bir fonksiyonudur.
**49. Bias ― The bias of an estimator ^θ is defined as being the difference between the expected value of the distribution of ^θ and the true value, i.e.:** -⟶ +⟶ Önyargı - Bir tahmin edicinin önyargısı ^ θ, ^ θ dağılımının beklenen değeri ile gerçek değer arasındaki fark olarak tanımlanır, yani:
**50. Remark: an estimator is said to be unbiased when we have E[^θ]=θ.** -⟶ +⟶ Not: E [^ θ] = θ olduğunda bir tahmincinin tarafsız olduğu söylenir.
**51. Estimating the mean** -⟶ +⟶ Ortalamayı tahmin etme
**52. Sample mean ― The sample mean of a random sample is used to estimate the true mean μ of a distribution, is often noted ¯¯¯¯¯X and is defined as follows:**

-⟶

+⟶ Örneklem ortalaması - Rastgele bir örneklemin örneklem ortalaması, dağılımın gerçek ortalaması μ'yü tahmin etmek için kullanılır, genellikle ¯¯¯¯¯X olarak belirtilir ve şöyle tanımlanır:

<br>
**53. Remark: the sample mean is unbiased, i.e E[¯¯¯¯¯X]=μ.** -⟶ +⟶ Not: örnek ortalama tarafsız, yani: E[¯¯¯¯¯X]=μ.
**54. Central Limit Theorem ― Let us have a random sample X1,...,Xn following a given distribution with mean μ and variance σ2, then we have:** -⟶ +⟶ Merkezi Limit Teoremi - Ortalama μ ve varyans σ2 ile verilen bir dağılımın ardından rastgele bir X1, ..., Xn örneğine sahip olalım.
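A small simulation that illustrates the statement above (the choice of distribution, n and number of trials is arbitrary); standardized sample means should look approximately standard normal:

```python
import numpy as np

rng = np.random.default_rng(0)
n, trials = 50, 20_000
# draws from a non-Gaussian distribution with mean 2 and variance 4
samples = rng.exponential(scale=2.0, size=(trials, n))

xbar = samples.mean(axis=1)                # one sample mean per trial
z = (xbar - 2.0) / (2.0 / np.sqrt(n))      # standardized sample means

print(z.mean(), z.std())                   # close to 0 and 1, as the CLT predicts
```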
**55. Estimating the variance** -⟶ +⟶ Varyansı tahmin etmek
**56. Sample variance ― The sample variance of a random sample is used to estimate the true variance σ2 of a distribution, is often noted s2 or ^σ2 and is defined as follows:** -⟶ +⟶ Örnek varyansı - Rastgele bir örneğin örnek varyansı, bir dağılımın σ2 gerçek varyansını tahmin etmek için kullanılır, genellikle s2 veya ^σ2 olarak elde edilir ve aşağıdaki gibi tanımlanır:
**57. Remark: the sample variance is unbiased, i.e E[s2]=σ2.**

-⟶

+⟶ Not: Örneklem varyansı yansızdır, yani E[s2]=σ2.

<br>
**58. Chi-Squared relation with sample variance ― Let s2 be the sample variance of a random sample. We have:** -⟶ +⟶ Örnek varyansı ile ki-kare ilişkisi - s2, rastgele bir örneğin örnek varyansı olsun. Elde edilir:
**59. [Introduction, Sample space, Event, Permutation]** -⟶ +⟶ [Giriş, Örnek uzay, Olay, Permütasyon]
**60. [Conditional probability, Bayes' rule, Independence]** -⟶ +⟶ [Koşullu olasılık, Bayes kuralı, Bağımsızlık]
**61. [Random variables, Definitions, Expectation, Variance]** -⟶ +⟶ [Rastgele değişkenler, Tanımlamalar, Beklenti, Varyans]
**62. [Probability distributions, Chebyshev's inequality, Main distributions]** -⟶ +⟶ [Olasılık dağılımları, Chebyshev eşitsizliği, Ana dağılımlar]
**63. [Jointly distributed random variables, Density, Covariance, Correlation]** -⟶ +⟶ [Ortak dağınık rastgele değişkenler, Yoğunluk, Kovaryans, Korelasyon]
**64. [Parameter estimation, Mean, Variance]** -⟶ +⟶ [Parameter tahmini, Ortalama, Varyans] From a2bab052cbf2597bfc9c63e034f6aeb99c93b8af Mon Sep 17 00:00:00 2001 From: Shervine Amidi Date: Thu, 24 Jan 2019 10:59:25 -0800 Subject: [PATCH 077/531] Add [tr] fields --- README.md | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/README.md b/README.md index f927a488d..0938d18f6 100644 --- a/README.md +++ b/README.md @@ -43,9 +43,9 @@ Please make sure to propose the translation of **only one** cheatsheet per pull |Cheatsheet topic|العَرَبِيَّة|עִבְרִית|हिन्दी|Türkçe|Русский|Italiano| |:---|:---:|:---:|:---:|:---:|:---:|:---:| -|Convolutional Neural Nets|not started|not started|not started|not started|not started|not started| -|Recurrent Neural Nets|not started|not started|not started|not started|not started|not started| -|DL tips and tricks|not started|not started|not started|not started|not started|not started| +|Convolutional Neural Nets|not started|not started|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/117)|not started|not started| +|Recurrent Neural Nets|not started|not started|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/118)|not started|not started| +|DL tips and tricks|not started|not started|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/116)|not started|not started| |Cheatsheet topic|Polski|Suomi|Català|Українська|한국어| @@ -67,10 +67,10 @@ Please make sure to propose the translation of **only one** cheatsheet per pull |Cheatsheet topic|العَرَبِيَّة|עִבְרִית|हिन्दी|Türkçe|Русский|Italiano| |:---|:---:|:---:|:---:|:---:|:---:|:---:| |Deep learning|done|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/37)|done|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/21)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/78)| -|Supervised learning|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/87)|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/46)|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/21)|not started| +|Supervised learning|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/87)|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/46)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/114)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/21)|not started| |Unsupervised learning|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/88)|not started|not started|done|not started|not started| |ML tips and tricks|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/83)|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/40)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/39)|not started|not started| -|Probabilities and Statistics|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/89)|not started|not started|not started|not started|not started| +|Probabilities and Statistics|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/89)|not started|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/115)|not started|not started| |Linear algebra|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/85)|not started|not started|done|not 
started|not started| From 7f7cf24e66377a9e75ab90cb55840ddd74784066 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Yavuz=20K=C3=B6me=C3=A7o=C4=9Flu?= Date: Thu, 24 Jan 2019 23:17:19 +0300 Subject: [PATCH 078/531] [tr] Machine learning tips and tricks --- ...tsheet-machine-learning-tips-and-tricks.md | 105 +++++++++--------- 1 file changed, 55 insertions(+), 50 deletions(-) diff --git a/tr/cheatsheet-machine-learning-tips-and-tricks.md b/tr/cheatsheet-machine-learning-tips-and-tricks.md index 9712297b8..b12670229 100644 --- a/tr/cheatsheet-machine-learning-tips-and-tricks.md +++ b/tr/cheatsheet-machine-learning-tips-and-tricks.md @@ -1,285 +1,290 @@ **1. Machine Learning tips and tricks cheatsheet** -⟶ +⟶ Makine Öğrenmesi ipuçları ve püf noktaları el kitabı
**2. Classification metrics** -⟶ +⟶ Sınıflandırma metrikleri
**3. In a context of a binary classification, here are the main metrics that are important to track in order to assess the performance of the model.** -⟶ +⟶ İkili bir sınıflandırma durumunda, modelin performansını değerlendirmek için gerekli olan ana metrikler aşağıda verilmiştir.
**4. Confusion matrix ― The confusion matrix is used to have a more complete picture when assessing the performance of a model. It is defined as follows:** -⟶ +⟶ Karışıklık matrisi - Karışıklık matrisi, bir modelin performansını değerlendirirken daha eksiksiz bir sonuca sahip olmak için kullanılır. Aşağıdaki şekilde tanımlanmıştır:
**5. [Predicted class, Actual class]** -⟶ +⟶ [Tahmini sınıf, Gerçek sınıf]
**6. Main metrics ― The following metrics are commonly used to assess the performance of classification models:** -⟶ +⟶ Ana metrikler - Sınıflandırma modellerinin performansını değerlendirmek için aşağıda verilen metrikler yaygın olarak kullanılmaktadır:
**7. [Metric, Formula, Interpretation]** -⟶ +⟶ [Metrik, Formül, Açıklama]
**8. Overall performance of model** -⟶ +⟶ Modelin genel performansı
**9. How accurate the positive predictions are**

-⟶

+⟶ Pozitif tahminlerin ne kadar kesin olduğu

<br>
**10. Coverage of actual positive sample** -⟶ +⟶ Gerçek pozitif örneklerin oranı
**11. Coverage of actual negative sample** -⟶ +⟶ Gerçek negatif örneklerin oranı
**12. Hybrid metric useful for unbalanced classes** -⟶ +⟶ Dengesiz sınıflar için yararlı hibrit metrik
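To make the metrics in the table above concrete, here is a short sketch computing them directly from confusion-matrix counts (the counts TP, FP, FN, TN are made-up numbers for the example):

```python
# confusion-matrix counts (hypothetical example)
TP, FP, FN, TN = 40, 10, 5, 45

accuracy    = (TP + TN) / (TP + FP + FN + TN)        # overall performance of the model
precision   = TP / (TP + FP)                         # how accurate the positive predictions are
recall      = TP / (TP + FN)                         # coverage of actual positive samples (TPR)
specificity = TN / (TN + FP)                         # coverage of actual negative samples (TNR)
f1 = 2 * precision * recall / (precision + recall)   # hybrid metric, useful for unbalanced classes

print(accuracy, precision, recall, specificity, f1)
```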
**13. ROC ― The receiver operating curve, also noted ROC, is the plot of TPR versus FPR by varying the threshold. These metrics are are summed up in the table below:** -⟶ +⟶ İşlem Karakteristik Eğrisi (ROC) ― İşlem Karakteristik Eğrisi (receiver operating curve), eşik değeri değiştirilerek Doğru Pozitif Oranı-Yanlış Pozitif Oranı grafiğidir. Bu metrikler aşağıdaki tabloda özetlenmiştir:
**14. [Metric, Formula, Equivalent]** - -⟶ + +⟶ [Metrik, Formül, Eşdeğer]
**15. AUC ― The area under the receiving operating curve, also noted AUC or AUROC, is the area below the ROC as shown in the following figure:** -⟶ +⟶ Eğri Altında Kalan Alan (AUC) ― Aynı zamanda AUC veya AUROC olarak belirtilen işlem karakteristik eğrisi altındaki alan, aşağıdaki şekilde gösterildiği gibi İşlem Karakteristik Eğrisi (ROC)'nin altındaki alandır:
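One minimal way to compute the AUC without plotting the curve is its rank interpretation, the probability that a randomly chosen positive is scored above a randomly chosen negative; the scores and labels below are invented for the example:

```python
import numpy as np

scores = np.array([0.1, 0.4, 0.35, 0.8, 0.7, 0.2])   # model scores
labels = np.array([0,   0,   1,    1,   1,   0  ])   # true classes

pos, neg = scores[labels == 1], scores[labels == 0]
# AUC = P(score(positive) > score(negative)), ties counted as 1/2
auc = np.mean((pos[:, None] > neg[None, :]) + 0.5 * (pos[:, None] == neg[None, :]))
print(auc)
```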
**16. [Actual, Predicted]** -⟶ +⟶ [Gerçek, Tahmin Edilen]
**17. Basic metrics ― Given a regression model f, the following metrics are commonly used to assess the performance of the model:** -⟶ +⟶ Temel metrikler - Bir f regresyon modeli verildiğinde aşağıdaki metrikler genellikle modelin performansını değerlendirmek için kullanılır:
**18. [Total sum of squares, Explained sum of squares, Residual sum of squares]**

-⟶

+⟶ [Toplam kareler toplamı, Açıklanan kareler toplamı, Artık kareler toplamı]

<br>
**19. Coefficient of determination ― The coefficient of determination, often noted R2 or r2, provides a measure of how well the observed outcomes are replicated by the model and is defined as follows:** -⟶ +⟶ Belirleme katsayısı - Genellikle R2 veya r2 olarak belirtilen belirleme katsayısı, gözlemlenen sonuçların model tarafından ne kadar iyi kopyalandığının bir ölçütüdür ve aşağıdaki gibi tanımlanır:
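A tiny illustration of this definition, using R² = 1 − SSres/SStot on made-up observations and predictions:

```python
import numpy as np

y      = np.array([3.0, 5.0, 7.0, 9.0])    # observed outcomes
y_pred = np.array([2.8, 5.3, 6.9, 9.1])    # model predictions

ss_tot = np.sum((y - y.mean()) ** 2)       # total sum of squares
ss_res = np.sum((y - y_pred) ** 2)         # residual sum of squares
r2 = 1 - ss_res / ss_tot                   # coefficient of determination

print(r2)
```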
**20. Main metrics ― The following metrics are commonly used to assess the performance of regression models, by taking into account the number of variables n that they take into consideration:** -⟶ +⟶ Ana metrikler - Aşağıdaki metrikler, göz önüne aldıkları değişken sayısını dikkate alarak regresyon modellerinin performansını değerlendirmek için yaygın olarak kullanılır:
**21. where L is the likelihood and ˆσ2 is an estimate of the variance associated with each response.** -⟶ +⟶ burada L olabilirlik ve ˆσ2, her bir yanıtla ilişkili varyansın bir tahminidir.
**22. Model selection** -⟶ +⟶ Model seçimi
**23. Vocabulary ― When selecting a model, we distinguish 3 different parts of the data that we have as follows:** -⟶ +⟶ Kelime Bilgisi - Bir model seçerken, aşağıdaki gibi sahip olduğumuz verileri 3 farklı parçaya ayırırız:
**24. [Training set, Validation set, Testing set]** -⟶ +⟶ [Eğitim seti, Doğrulama seti, Test seti]
**25. [Model is trained, Model is assessed, Model gives predictions]** -⟶ +⟶ [Model eğitildi, Model değerlendirildi, Model tahminleri gerçekleştiriyor]
**26. [Usually 80% of the dataset, Usually 20% of the dataset]** -⟶ +⟶ [Genelde veri kümesinin %80'i, Genelde veri kümesinin %20'si]
**27. [Also called hold-out or development set, Unseen data]** -⟶ +⟶ [Ayrıca doğrulama için bir kısmını bekletme veya geliştirme seti olarak da bilinir, Görülmemiş veri]
**28. Once the model has been chosen, it is trained on the entire dataset and tested on the unseen test set. These are represented in the figure below:** -⟶ +⟶ Model bir kere seçildikten sonra, tüm veri seti üzerinde eğitilir ve görünmeyen test setinde test edilir. Bunlar aşağıdaki şekilde gösterilmiştir:
**29. Cross-validation ― Cross-validation, also noted CV, is a method that is used to select a model that does not rely too much on the initial training set. The different types are summed up in the table below:** -⟶ +⟶ Çapraz doğrulama ― Çapraz doğrulama, başlangıçtaki eğitim setine çok fazla güvenmeyen bir modeli seçmek için kullanılan bir yöntemdir. Farklı tipleri aşağıdaki tabloda özetlenmiştir:
**30. [Training on k−1 folds and assessment on the remaining one, Training on n−p observations and assessment on the p remaining ones]** -⟶ +⟶ [k − 1 katı üzerinde eğitim ve geriye kalanlar üzerinde değerlendirme, n − p gözlemleri üzerine eğitim ve kalan p üzerinde değerlendirme]
**31. [Generally k=5 or 10, Case p=1 is called leave-one-out]** -⟶ +⟶ [Genel olarak k=5 veya 10, Durum p=1'e bir tanesini dışarıda bırak denir]
**32. The most commonly used method is called k-fold cross-validation and splits the training data into k folds to validate the model on one fold while training the model on the k−1 other folds, all of this k times. The error is then averaged over the k folds and is named cross-validation error.** -⟶ +⟶ En yaygın olarak kullanılan yöntem k-kat çapraz doğrulama olarak adlandırılır ve k-1 diğer katlarda olmak üzere, bu k sürelerinin hepsinde model eğitimi yapılırken, modeli bir kat üzerinde doğrulamak için eğitim verilerini k katlarına ayırır. Hata için daha sonra k-katlar üzerinden ortalama alınır ve çapraz doğrulama hatası olarak adlandırılır.
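The sketch below shows the k-fold procedure just described in plain NumPy, with a placeholder "predict the training mean" model standing in for whatever model is being validated (all names and data here are a toy example, not part of the cheatsheet):

```python
import numpy as np

def kfold_cv_error(X, y, k=5, seed=0):
    """Average validation error over k folds (the cross-validation error)."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    folds = np.array_split(idx, k)
    errors = []
    for i in range(k):
        val = folds[i]                                      # held-out fold
        train = np.concatenate(folds[:i] + folds[i + 1:])   # remaining k-1 folds
        prediction = y[train].mean()                        # toy "model" fitted on the k-1 folds
        errors.append(np.mean((y[val] - prediction) ** 2))  # assessed on the held-out fold
    return np.mean(errors)

X = np.arange(100).reshape(-1, 1)
y = 3.0 * X.ravel() + np.random.default_rng(1).normal(size=100)
print(kfold_cv_error(X, y, k=5))
```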
**33. Regularization ― The regularization procedure aims at avoiding the model to overfit the data and thus deals with high variance issues. The following table sums up the different types of commonly used regularization techniques:** -⟶ +⟶ Düzenlileştirme (Regularization) - Düzenlileştirme prosedürü, modelin verileri aşırı öğrenmesinden kaçınılmasını ve dolayısıyla yüksek varyans sorunları ile ilgilenmeyi amaçlamaktadır. Aşağıdaki tablo, yaygın olarak kullanılan düzenlileştirme tekniklerinin farklı türlerini özetlemektedir: +
**34. [Shrinks coefficients to 0, Good for variable selection, Makes coefficients smaller, Tradeoff between variable selection and small coefficients]** -⟶ +⟶ [Değişkenleri 0'a kadra küçült, Değişken seçimi için iyi, Katsayıları daha küçük yap, Değişken seçimi ile küçük katsayılar arasındaki çelişki] +
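As a concrete instance of the "makes coefficients smaller" behaviour mentioned in this table, here is ridge (L2-regularized) regression in closed form; the data and the value of λ are arbitrary illustrations:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + 0.1 * rng.normal(size=50)

lam = 1.0   # regularization strength
# ridge solution: theta = (X^T X + lam * I)^(-1) X^T y
theta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(3), X.T @ y)
theta_ols   = np.linalg.solve(X.T @ X, X.T @ y)   # unregularized fit, for comparison

print(theta_ols)
print(theta_ridge)   # coefficients shrunk towards 0 relative to the OLS fit
```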
**35. Diagnostics** -⟶ +⟶ Tanı
**36. Bias ― The bias of a model is the difference between the expected prediction and the correct model that we try to predict for given data points.** -⟶ +⟶ Önyargı - Bir modelin önyargısı, beklenen tahmin ve verilen veri noktaları için tahmin etmeye çalıştığımız doğru model arasındaki farktır.
**37. Variance ― The variance of a model is the variability of the model prediction for given data points.** -⟶ - +⟶ Varyans - Bir modelin varyansı, belirli veri noktaları için model tahmininin değişkenliğidir. +
**38. Bias/variance tradeoff ― The simpler the model, the higher the bias, and the more complex the model, the higher the variance.** -⟶ +⟶ Önyargı/varyans çelişkisi - Daha basit model, daha yüksek önyargı, ve daha karmaşık model, daha yüksek varyans. +
**39. [Symptoms, Regression illustration, classification illustration, deep learning illustration, possible remedies]** -⟶ +⟶ [Belirtiler, Regresyon illüstrasyonu, sınıflandırma illüstrasyonu, derin öğrenme illüstrasyonu, olası çareler]
**40. [High training error, Training error close to test error, High bias, Training error slightly lower than test error, Very low training error, Training error much lower than test error, High variance]** -⟶ +⟶ [Yüksek eğitim hatası, Test hatasına yakın eğitim hatası, Yüksek önyargı, Eğitim hatasından biraz daha düşük eğitim hatası, Çok düşük eğitim hatası, Eğitim hatası test hatasının çok altında, Yüksek varyans] +
**41. [Complexify model, Add more features, Train longer, Perform regularization, Get more data]**

-⟶

+⟶ [Modeli karmaşıklaştır, Daha fazla özellik ekle, Daha uzun süre eğit, Düzenlileştirme uygula, Daha fazla veri topla]
+

<br>
**42. Error analysis ― Error analysis is analyzing the root cause of the difference in performance between the current and the perfect models.** -⟶ +⟶ Hata analizi - Hata analizinde mevcut ve mükemmel modeller arasındaki performans farkının temel nedeni analiz edilir.
**43. Ablative analysis ― Ablative analysis is analyzing the root cause of the difference in performance between the current and the baseline models.** -⟶ +⟶ Ablatif analiz - Ablatif analizde mevcut ve başlangıç modelleri arasındaki performans farkının temel nedeni analiz edilir.
**44. Regression metrics** -⟶ +⟶ Regresyon metrikleri
**45. [Classification metrics, confusion matrix, accuracy, precision, recall, F1 score, ROC]** -⟶ +⟶ [Sınıflandırma metrikleri, karışıklık matrisi, doğruluk, kesinlik, geri çağırma, F1 skoru, ROC]
**46. [Regression metrics, R squared, Mallow's CP, AIC, BIC]** -⟶ +⟶ [Regresyon metrikleri, R karesi, Mallow'un CP'si, AIC, BIC]
**47. [Model selection, cross-validation, regularization]** -⟶ +⟶ [Model seçimi, çapraz doğrulama, düzenlileştirme]
**48. [Diagnostics, Bias/variance tradeoff, error/ablative analysis]** -⟶ +⟶ [Tanı, Önyargı/varyans çelişkisi, hata/ablatif analiz] From f09337e4957a405907c69455099d90de93ba82a8 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Ba=C5=9Fak=20Buluz?= <41359672+basakbuluz@users.noreply.github.com> Date: Fri, 25 Jan 2019 23:40:56 +0300 Subject: [PATCH 079/531] [tr] Recurrent neural networks --- tr/recurrent-neural-networks.md | 673 ++++++++++++++++++++++++++++++++ 1 file changed, 673 insertions(+) create mode 100644 tr/recurrent-neural-networks.md diff --git a/tr/recurrent-neural-networks.md b/tr/recurrent-neural-networks.md new file mode 100644 index 000000000..6f3a18b36 --- /dev/null +++ b/tr/recurrent-neural-networks.md @@ -0,0 +1,673 @@ +**1. Recurrent Neural Networks cheatsheet** + +⟶ Tekrarlayan Yapay Sinir Ağları (Recurrent Neural Networks-RNN) El Kitabı + +
+ + +**2. CS 230 - Deep Learning** + +⟶ CS 230 - Derin Öğrenme + +
+ + +**3. [Overview, Architecture structure, Applications of RNNs, Loss function, Backpropagation]** + +⟶ [Genel bakış, Mimari yapı, RNN'lerin uygulamaları, Kayıp fonksiyonu, Geriye Yayılım] + +
+ + +**4. [Handling long term dependencies, Common activation functions, Vanishing/exploding gradient, Gradient clipping, GRU/LSTM, Types of gates, Bidirectional RNN, Deep RNN]** + +⟶ [Uzun vadeli bağımlılıkların ele alınması, Ortak aktivasyon fonksiyonları, Gradyanın kaybolması / patlaması, Gradyan kırpma, GRU / LSTM, Kapı tipleri, Çift Yönlü RNN, Derin RNN] + +
+ + +**5. [Learning word representation, Notations, Embedding matrix, Word2vec, Skip-gram, Negative sampling, GloVe]** + +⟶ [Kelime gösterimini öğrenme, Notasyonlar, Gömme matrisi, Word2vec, Skip-gram, Negatif örnekleme, GloVe] + +
+ + +**6. [Comparing words, Cosine similarity, t-SNE]** + +⟶ [Kelimeleri karşılaştırmak, Cosine benzerliği, t-SNE] + +
+ + +**7. [Language model, n-gram, Perplexity]** + +⟶ [Dil modeli, n-gram, Karışıklık] + +
+ + +**8. [Machine translation, Beam search, Length normalization, Error analysis, Bleu score]** + +⟶ [Makine çevirisi, Kiriş arama, Uzunluk normalizasyonu, Hata analizi, Bleu skoru] + +
+ + +**9. [Attention, Attention model, Attention weights]** + +⟶ [Dikkat, Dikkat modeli, Dikkat ağırlıkları] + +
+ + +**10. Overview** + +⟶ Genel Bakış + +
+ + +**11. Architecture of a traditional RNN ? Recurrent neural networks, also known as RNNs, are a class of neural networks that allow previous outputs to be used as inputs while having hidden states. They are typically as follows:** + +⟶ Geleneksel bir RNN mimarisi - RNN'ler olarak da bilinen tekrarlayan sinir ağları, gizli durumlara sahipken önceki çıktıların girdi olarak kullanılmasına izin veren bir sinir ağları sınıfıdır. Tipik olarak aşağıdaki gibidirler: + +
+ + +**12. For each timestep t, the activation a and the output y are expressed as follows:** + +⟶ Her bir t zamanında, a aktivasyonu ve y çıktısı aşağıdaki gibi ifade edilir: + +
+ + +**13. and** + +⟶ ve + +
+ + +**14. where Wax,Waa,Wya,ba,by are coefficients that are shared temporally and g1,g2 activation functions.** + +⟶ burada Wax,Waa,Wya,ba,by geçici olarak paylaşılan katsayılardır ve g1,g2 aktivasyon fonksiyonlarıdır. + +
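A minimal NumPy sketch of one such timestep, following the standard formulation a⟨t⟩ = g1(Waa a⟨t−1⟩ + Wax x⟨t⟩ + ba) and y⟨t⟩ = g2(Wya a⟨t⟩ + by), with tanh and softmax standing in for g1 and g2 (dimensions, seeds and random weights are arbitrary choices for the example):

```python
import numpy as np

def rnn_step(x_t, a_prev, Wax, Waa, Wya, ba, by):
    """One RNN timestep: returns the new hidden activation and the output."""
    a_t = np.tanh(Waa @ a_prev + Wax @ x_t + ba)            # a<t> = g1(Waa a<t-1> + Wax x<t> + ba)
    z = Wya @ a_t + by
    y_t = np.exp(z - z.max()) / np.exp(z - z.max()).sum()   # y<t> = g2(Wya a<t> + by), softmax
    return a_t, y_t

n_x, n_a, n_y = 4, 8, 3
rng = np.random.default_rng(0)
Wax, Waa, Wya = rng.normal(size=(n_a, n_x)), rng.normal(size=(n_a, n_a)), rng.normal(size=(n_y, n_a))
ba, by = np.zeros(n_a), np.zeros(n_y)

a, y = rnn_step(rng.normal(size=n_x), np.zeros(n_a), Wax, Waa, Wya, ba, by)
print(a.shape, y.sum())   # (8,) and 1.0
```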
+ + +**15. The pros and cons of a typical RNN architecture are summed up in the table below:** + +⟶ Tipik bir RNN mimarisinin artıları ve eksileri aşağıdaki tabloda özetlenmiştir: + +
+ + +**16. [Advantages, Possibility of processing input of any length, Model size not increasing with size of input, Computation takes into account historical information, Weights are shared across time]** + +⟶ [Avantajlar, Herhangi bir uzunluktaki girdilerin işlenmesi imkanı, Girdi büyüklüğüyle artmayan model boyutu, Geçmiş bilgileri dikkate alarak hesaplama, Zaman içinde paylaşılan ağırlıklar] + +
+ + +**17. [Drawbacks, Computation being slow, Difficulty of accessing information from a long time ago, Cannot consider any future input for the current state]** + +⟶ [Dezavantajları, Yavaş hesaplama, Uzun zaman önceki bilgiye erişme zorluğu, Mevcut durum için gelecekteki herhangi bir girdinin düşünülememesi] + +
+ + +**18. Applications of RNNs ? RNN models are mostly used in the fields of natural language processing and speech recognition. The different applications are summed up in the table below:** + +⟶ RNN'lerin Uygulamaları - RNN modelleri çoğunlukla doğal dil işleme ve konuşma tanıma alanlarında kullanılır. Farklı uygulamalar aşağıdaki tabloda özetlenmiştir: + +
+ + +**19. [Type of RNN, Illustration, Example]** + +⟶ [RNN Türü, Örnekleme, Örnek] + +
+ + +**20. [One-to-one, One-to-many, Many-to-one, Many-to-many]** + +⟶ [Bire bir, Bire çok, Çoka bir, Çoka çok] + +
+ + +**21. [Traditional neural network, Music generation, Sentiment classification, Name entity recognition, Machine translation]** + +⟶ [Geleneksel sinir ağı, Müzik üretimi, Duygu sınıflandırma, İsim varlık tanıma, Makine çevirisi] + +
+ + +**22. Loss function ? In the case of a recurrent neural network, the loss function L of all time steps is defined based on the loss at every time step as follows:** + +⟶ Kayıp fonksiyonu - Tekrarlayan bir sinir ağı olması durumunda, tüm zaman dilimlerindeki L kayıp fonksiyonu, her zaman dilimindeki kayıbı temel alınarak aşağıdaki gibi tanımlanır: + +
+ + +**23. Backpropagation through time ? Backpropagation is done at each point in time. At timestep T, the derivative of the loss L with respect to weight matrix W is expressed as follows:** + +⟶ Zamanla geri yayılım - Geriye yayılım zamanın her noktasında yapılır. T zaman diliminde, ağırlık matrisi W'ye göre L kaybının türevi aşağıdaki gibi ifade edilir: + +
+ + +**24. Handling long term dependencies** + +⟶ Uzun vadeli bağımlılıkların ele alınması + +
+ + +**25. Commonly used activation functions ? The most common activation functions used in RNN modules are described below:** + +⟶ Yaygın olarak kullanılan aktivasyon fonksiyonları - RNN modüllerinde kullanılan en yaygın aktivasyon fonksiyonları aşağıda açıklanmıştır: + +
+ + +**26. [Sigmoid, Tanh, RELU]** + +⟶ [Sigmoid, Tanh, RELU] + +
+ + +**27. Vanishing/exploding gradient ? The vanishing and exploding gradient phenomena are often encountered in the context of RNNs. The reason why they happen is that it is difficult to capture long term dependencies because of multiplicative gradient that can be exponentially decreasing/increasing with respect to the number of layers.** + +⟶ Kaybolan / patlayan gradyan - Kaybolan ve patlayan gradyan fenomenlerine RNN'ler bağlamında sıklıkla rastlanır. Bunların olmasının nedeni, katman sayısına göre katlanarak azalan / artan olabilen çarpımsal gradyan nedeniyle uzun vadeli bağımlılıkları yakalamanın zor olmasıdır. + +
+ + +**28. Gradient clipping ? It is a technique used to cope with the exploding gradient problem sometimes encountered when performing backpropagation. By capping the maximum value for the gradient, this phenomenon is controlled in practice.** + +⟶ Gradyan kırpma - Geri yayılım işlemi sırasında bazen karşılaşılan patlayan gradyan sorunuyla başa çıkmak için kullanılan bir tekniktir. Gradyan için maksimum değeri sınırlayarak, bu durum pratikte kontrol edilir. + +
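One common way to realize this capping in practice is clipping by global norm; the sketch below is one possible NumPy version (the threshold value is arbitrary):

```python
import numpy as np

def clip_gradients(grads, max_norm=5.0):
    """Rescale a list of gradient arrays so their global L2 norm is at most max_norm."""
    total_norm = np.sqrt(sum(np.sum(g ** 2) for g in grads))
    if total_norm > max_norm:
        grads = [g * (max_norm / total_norm) for g in grads]
    return grads

grads = [np.array([3.0, 4.0]), np.array([12.0])]    # global norm = 13
print(clip_gradients(grads, max_norm=5.0))          # rescaled so the norm becomes 5
```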
+ + +**29. clipped** + +⟶ kırpılmış + +
+ + +**30. Types of gates ? In order to remedy the vanishing gradient problem, specific gates are used in some types of RNNs and usually have a well-defined purpose. They are usually noted ? and are equal to:** + +⟶ Giriş Kapıları Çeşitleri - Kaybolan gradyan problemini çözmek için bazı RNN türlerinde belirli kapılar kullanılır ve genellikle iyi tanımlanmış bir amaca sahiptir. Genellikle ? olarak ifade edilir ve şuna eşittir: + +
+ + +**31. where W,U,b are coefficients specific to the gate and ? is the sigmoid function. The main ones are summed up in the table below:** + +⟶ burada W, U, b kapıya özgü katsayılardır ve ? ise sigmoid fonksiyondur. Temel olanlar aşağıdaki tabloda özetlenmiştir: + +
+ + +**32. [Type of gate, Role, Used in]** + +⟶ [Kapının tipi, Rol, Kullanılan] + +
+ + +**33. [Update gate, Relevance gate, Forget gate, Output gate]** + +⟶ [Güncelleme kapısı, Uygunluk kapısı, Unutma kapısı, Çıkış kapısı] + +
+ + +**34. [How much past should matter now?, Drop previous information?, Erase a cell or not?, How much to reveal of a cell?]** + +⟶ [Şimdi ne kadar geçmiş olması gerekir ?, Önceki bilgiyi bırak?, Bir hücreyi sil ya da silme?, Bir hücreyi ortaya çıkarmak için ne kadar?] + +
+ + +**35. [LSTM, GRU]** + +⟶ [LSTM, GRU] + +
+ + +**36. GRU/LSTM ? Gated Recurrent Unit (GRU) and Long Short-Term Memory units (LSTM) deal with the vanishing gradient problem encountered by traditional RNNs, with LSTM being a generalization of GRU. Below is a table summing up the characterizing equations of each architecture:** + +⟶ GRU / LSTM - Geçitli Tekrarlayan Birim (Gated Recurrent Unit-GRU) ve Uzun Kısa Süreli Bellek Birimleri (Long Short-Term Memory-LSTM), geleneksel RNN'lerin karşılaştığı kaybolan gradyan problemini ele alır, LSTM ise GRU'nun genelleştirilmiş halidir. Her bir mimarinin karakterizasyon denklemlerini özetleyen tablo aşağıdadır: + +
+ + +**37. [Characterization, Gated Recurrent Unit (GRU), Long Short-Term Memory (LSTM), Dependencies]** + +⟶ [Karakterizasyon, Geçitli Tekrarlayan Birim (GRU), Uzun Kısa Süreli Bellek (LSTM), Bağımlılıklar] + +
**38. Remark: the sign ⊙ denotes the element-wise multiplication between two vectors.**

⟶ Not: ⊙ işareti iki vektör arasındaki birimsel (eleman bazında) çarpımı belirtir.

<br>
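To give a concrete feel for how such gates act together, here is a compact LSTM-style cell step in NumPy; it follows the standard LSTM formulation, and all variable names, sizes and random weights are ours, chosen only for illustration:

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def lstm_step(x_t, a_prev, c_prev, W, U, b):
    """One LSTM step; W, U, b hold the parameters of the forget/update/output/candidate parts."""
    f = sigmoid(W["f"] @ x_t + U["f"] @ a_prev + b["f"])        # forget gate: erase the cell or not?
    u = sigmoid(W["u"] @ x_t + U["u"] @ a_prev + b["u"])        # update gate: how much past matters now
    o = sigmoid(W["o"] @ x_t + U["o"] @ a_prev + b["o"])        # output gate: how much of the cell to reveal
    c_tilde = np.tanh(W["c"] @ x_t + U["c"] @ a_prev + b["c"])  # candidate cell state
    c_t = f * c_prev + u * c_tilde                              # element-wise combination of old and new
    a_t = o * np.tanh(c_t)
    return a_t, c_t

n_x, n_a = 3, 5
rng = np.random.default_rng(0)
W = {k: rng.normal(size=(n_a, n_x)) for k in "fuoc"}
U = {k: rng.normal(size=(n_a, n_a)) for k in "fuoc"}
b = {k: np.zeros(n_a) for k in "fuoc"}

a, c = lstm_step(rng.normal(size=n_x), np.zeros(n_a), np.zeros(n_a), W, U, b)
print(a.shape, c.shape)
```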
+ + +**39. Variants of RNNs ? The table below sums up the other commonly used RNN architectures:** + +⟶ RNN varyantları - Aşağıdaki tablo, diğer yaygın kullanılan RNN mimarilerini özetlemektedir: + +
+ + +**40. [Bidirectional (BRNN), Deep (DRNN)]** + +⟶ [Çift Yönlü (Bidirectional-BRNN), Derin (Deep-DRNN)] + +
+ + +**41. Learning word representation** + +⟶ Kelime temsilini öğrenme + +
+ + +**42. In this section, we note V the vocabulary and |V| its size.** + +⟶ Bu bölümde V kelimeleri, |V| ise kelimelerin boyutlarını ifade eder. + +
+ + +**43. Motivation and notations** + +⟶ Motivasyon ve notasyon + +
+ + +**44. Representation techniques ? The two main ways of representing words are summed up in the table below:** + +⟶ Temsil etme teknikleri - Kelimeleri temsil etmenin iki temel yolu aşağıdaki tabloda özetlenmiştir: + +
+ + +**45. [1-hot representation, Word embedding]** + +⟶ [1-hot gösterim, Kelime gömme] + +
+ + +**46. [teddy bear, book, soft]** + +⟶ [oyuncak ayı, kitap, yumuşak] + +
+ + +**47. [Noted ow, Naive approach, no similarity information, Noted ew, Takes into account words similarity]** + +⟶ + +
[ow not edildi, Naive yaklaşım, benzerlik bilgisi yok, ew not edildi, kelime benzerliği dikkate alınır] + + +**48. Embedding matrix ? For a given word w, the embedding matrix E is a matrix that maps its 1-hot representation ow to its embedding ew as follows:** + +⟶ Gömme matrisi - Belirli bir w kelimesi için E gömme matrisi, 1-hot temsilini ew gömmesi sayesinde aşağıdaki gibi eşleştiren bir matristir: + +
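A one-line view of what this mapping does, sketched in NumPy with a tiny made-up vocabulary (sizes and values are placeholders):

```python
import numpy as np

vocab_size, emb_dim = 5, 3
# embedding matrix E of shape (embedding dim, |V|); values here are placeholders
E = np.arange(vocab_size * emb_dim, dtype=float).reshape(emb_dim, vocab_size)

o_w = np.zeros(vocab_size); o_w[2] = 1.0   # 1-hot representation of the word with index 2
e_w = E @ o_w                              # e_w = E o_w, i.e. column 2 of E

print(e_w, E[:, 2])                        # identical vectors
```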
+ + +**49. Remark: learning the embedding matrix can be done using target/context likelihood models.** + +⟶ Not: Gömme matrisinin öğrenilmesi hedef / içerik olabilirlik modelleri kullanılarak yapılabilir. + +
+ + +**50. Word embeddings** + +⟶ Kelime gömmeleri + +
+ + +**51. Word2vec ? Word2vec is a framework aimed at learning word embeddings by estimating the likelihood that a given word is surrounded by other words. Popular models include skip-gram, negative sampling and CBOW.** + +⟶ Word2vec ? Word2vec, belirli bir kelimenin diğer kelimelerle çevrili olma olasılığını tahmin ederek kelime gömmelerini öğrenmeyi amaçlayan bir çerçevedir. Popüler modeller arasında skip-gram, negatif örnekleme ve CBOW bulunur. + +
+ + +**52. [A cute teddy bear is reading, teddy bear, soft, Persian poetry, art]** + +⟶ [Sevimli ayıcık okuyor, ayıcık, yumuşak, Farsça şiir, sanat] + +
+ + +**53. [Train network on proxy task, Extract high-level representation, Compute word embeddings]** + +⟶ [Proxy görevinde ağı eğitme, üst düzey gösterimi çıkartme, Kelime gömme hesaplama] + +
+ + +**54. Skip-gram ? The skip-gram word2vec model is a supervised learning task that learns word embeddings by assessing the likelihood of any given target word t happening with a context word c. By noting ?t a parameter associated with t, the probability P(t|c) is given by:** + +⟶ Skip-gram ? Skip-gram word2vec modeli verilen herhangi bir t hedef kelimesinin c gibi bir bağlam kelimesi ile gerçekleşme olasılığını değerlendirerek kelime gömmelerini öğrenen denetimli bir öğrenme görevidir. + +
+ + +**55. Remark: summing over the whole vocabulary in the denominator of the softmax part makes this model computationally expensive. CBOW is another word2vec model using the surrounding words to predict a given word.** + +⟶ Not: Softmax bölümünün paydasındaki tüm kelime dağarcığını toplamak, bu modeli hesaplama açısından maliyetli kılar. CBOW, verilen bir kelimeyi tahmin etmek için çevreleyen kelimeleri kullanan başka bir word2vec modelidir. + +
+ + +**56. Negative sampling ? It is a set of binary classifiers using logistic regressions that aim at assessing how a given context and a given target words are likely to appear simultaneously, with the models being trained on sets of k negative examples and 1 positive example. Given a context word c and a target word t, the prediction is expressed by:** + +⟶ Negatif örnekleme - Belirli bir bağlamın ve belirli bir hedef kelimenin eşzamanlı olarak ortaya çıkmasının muhtemel olup olmadığının değerlendirilmesini, modellerin k negatif örnek kümeleri ve 1 pozitif örnek kümesinde eğitilmesini hedefleyen, lojistik regresyon kullanan bir ikili sınıflandırma kümesidir. Bağlam sözcüğü c ve hedef sözcüğü t göz önüne alındığında, tahmin şöyle ifade edilir: + +
+ + +**57. Remark: this method is less computationally expensive than the skip-gram model.** + +⟶ Not: Bu yöntem, skip-gram modelinden daha az hesaplamalıdır. + +
+ + +**57bis. GloVe ? The GloVe model, short for global vectors for word representation, is a word embedding technique that uses a co-occurence matrix X where each Xi,j denotes the number of times that a target i occurred with a context j. Its cost function J is as follows:** + +⟶ GloVe ? Kelime gösterimi için Global vektörler tanımının kısaltılmış hali olan GloVe, eşzamanlı bir X matrisi kullanan ki burada her bir Xi, j, bir hedefin bir j bağlamında gerçekleştiği sayısını belirten bir kelime gömme tekniğidir. Maliyet fonksiyonu J aşağıdaki gibidir: + +
+ + +**58. where f is a weighting function such that Xi,j=0?f(Xi,j)=0. +Given the symmetry that e and ? play in this model, the final word embedding e(final)w is given by:** + +⟶ f, Xi, j = 0?f (Xi, j) = 0 olacak şekilde bir ağırlıklandırma fonksiyonudur. +Bu modelde e ve ? 'nin oynadığı simetri göz önüne alındığında, e (final) w'nin kelime gömmesi şöyle ifade edilir: + +
+ + +**59. Remark: the individual components of the learned word embeddings are not necessarily interpretable.** + +⟶ Not: Öğrenilen kelime gömme bileşenlerinin ayrı ayrı bileşenleri tam olarak yorumlanamaz. + +
+ + +**60. Comparing words** + +⟶ Kelimelerin karşılaştırılması + +
+ + +**61. Cosine similarity ? The cosine similarity between words w1 and w2 is expressed as follows:** + +⟶ Kosinüs benzerliği - w1 ve w2 kelimeleri arasındaki kosinüs benzerliği şu şekilde ifade edilir: + +
**62. Remark: θ is the angle between words w1 and w2.**

⟶ Not: θ, w1 ve w2 kelimeleri arasındaki açıdır.

<br>
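A minimal NumPy version of this similarity; the two-dimensional "word vectors" below are invented purely for illustration:

```python
import numpy as np

def cosine_similarity(w1, w2):
    """cos(theta) between two word vectors; 1 means same direction, -1 opposite."""
    return np.dot(w1, w2) / (np.linalg.norm(w1) * np.linalg.norm(w2))

teddy, soft, poetry = np.array([2.0, 0.1]), np.array([1.8, 0.3]), np.array([0.1, 2.2])
print(cosine_similarity(teddy, soft))    # close to 1: similar words
print(cosine_similarity(teddy, poetry))  # close to 0: unrelated words
```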
+ + +**63. t-SNE ? t-SNE (t-distributed Stochastic Neighbor Embedding) is a technique aimed at reducing high-dimensional embeddings into a lower dimensional space. In practice, it is commonly used to visualize word vectors in the 2D space.** + +⟶ t-SNE - t-SNE (t-dağıtımlı Stokastik Komşu Gömme), yüksek boyutlu gömmeleri daha düşük boyutlu bir alana indirmeyi amaçlayan bir tekniktir. Uygulamada, kelime uzaylarını 2B alanda görselleştirmek için yaygın olarak kullanılır. + +
+ + +**64. [literature, art, book, culture, poem, reading, knowledge, entertaining, loveable, childhood, kind, teddy bear, soft, hug, cute, adorable]** + +⟶ [edebiyat, sanat, kitap, kültür, şiir, okuma, bilgi, eğlendirici, sevimli, çocukluk, kibar, ayıcık, yumuşak, sarılmak, sevimli, sevimli] + +
+ + +**65. Language model** + +⟶ Dil modeli + +
+ + +**66. Overview ? A language model aims at estimating the probability of a sentence P(y).** + +⟶ Genel bakış - Bir dil modeli P (y) cümlesinin olasılığını tahmin etmeyi amaçlar. + +
+ + +**67. n-gram model ? This model is a naive approach aiming at quantifying the probability that an expression appears in a corpus by counting its number of appearance in the training data.** + +⟶ n-gram modeli - Bu model, eğitim verilerindeki görünüm sayısını sayarak bir ifadenin bir korpusta ortaya çıkma olasılığını ölçmeyi amaçlayan naif bir yaklaşımdır. + +
+ + +**68. Perplexity ? Language models are commonly assessed using the perplexity metric, also known as PP, which can be interpreted as the inverse probability of the dataset normalized by the number of words T. The perplexity is such that the lower, the better and is defined as follows:** + +⟶ Karışıklık ? Dil modelleri yaygın olarak, PP kelimesi olarak da bilinen, T kelimesi ile normalize edilmiş veri kümesinin ters olasılığı olarak yorumlanabilen, çift yönlü ölçüm ölçüsü kullanılarak değerlendirilir. Karmaşıklık Çift yönlü, daha düşük, daha iyi ve aşağıdaki gibi tanımlandığı gibidir: + +
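One possible way to compute it from per-word model probabilities (the probabilities below are invented):

```python
import numpy as np

# model probability assigned to each of the T words of a held-out sentence
word_probs = np.array([0.2, 0.1, 0.05, 0.3])
T = len(word_probs)

# PP = (product of 1/p)^(1/T) = exp(-mean log p); lower is better
perplexity = np.exp(-np.mean(np.log(word_probs)))
print(perplexity)
```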
+ + +**69. Remark: PP is commonly used in t-SNE.** + +⟶ Not: PP, t-SNE'de yaygın olarak kullanılır. + +
+ + +**70. Machine translation** + +⟶ Makine çevirisi + +
+ + +**71. Overview ? A machine translation model is similar to a language model except it has an encoder network placed before. For this reason, it is sometimes referred as a conditional language model. The goal is to find a sentence y such that:** + +⟶ Genel bakış - Bir makine çeviri modeli, daha önce yerleştirilmiş bir kodlayıcı ağına sahip olması dışında, bir dil modeline benzer. Bu nedenle, bazen koşullu dil modeli olarak da adlandırılır. Amaç şu şekilde bir cümle bulmaktır: + +
+ + +**72. Beam search ? It is a heuristic search algorithm used in machine translation and speech recognition to find the likeliest sentence y given an input x.** + +⟶ Işın arama - Makine çevirisinde ve konuşma tanımada kullanılan ve x girişi verilen en olası cümleyi bulmak için kullanılan sezgisel bir arama algoritmasıdır. + +
+ + +**73. [Step 1: Find top B likely words y<1>, Step 2: Compute conditional probabilities y|x,y<1>,...,y, Step 3: Keep top B combinations x,y<1>,...,y, End process at a stop word]** +⟶ En olası B kelimeleri bulun y <1>, 2. Adım: Koşullu olasılıkları hesaplayın y | x, y <1>, ..., y , 3. Adım: En olası B kombinasyonlarını koruyun x, y <1>, ..., y , İşlemi durdurarak sonlandırın] + +
+ + +**74. Remark: if the beam width is set to 1, then this is equivalent to a naive greedy search.** + +⟶ Not: Eğer ışın genişliği 1 olarak ayarlanmışsa, bu naif (naive) bir açgözlü (greedy) aramaya eşdeğerdir. + +
+ + +**75. Beam width ? The beam width B is a parameter for beam search. Large values of B yield to better result but with slower performance and increased memory. Small values of B lead to worse results but is less computationally intensive. A standard value for B is around 10.** + +⟶ Işın genişliği - Işın genişliği B, ışın araması için bir parametredir. Daha yüksek B değerleri daha iyi sonuç elde edilmesini sağlar fakat daha düşük performans ve daha yüksek hafıza ile. Küçük B değerleri daha kötü sonuçlara neden olur, ancak hesaplama açısından daha az yoğundur. B için standart bir değer 10 civarındadır. + +
+ + +**76. Length normalization ? In order to improve numerical stability, beam search is usually applied on the following normalized objective, often called the normalized log-likelihood objective, defined as:** + +⟶ Uzunluk normalizasyonu - Sayısal stabiliteyi arttırmak için, ışın arama genellikle, aşağıdaki gibi tanımlanan normalize edilmiş log-olabilirlik amacı olarak adlandırılan normalize edilmiş hedefe uygulanır: + +
**77. Remark: the parameter α can be seen as a softener, and its value is usually between 0.5 and 1.**

⟶ Not: α parametresi yumuşatıcı olarak görülebilir ve değeri genellikle 0,5 ile 1 arasındadır.

<br>
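A small sketch of this normalized objective for scoring one candidate sentence; the token probabilities and the value of α are arbitrary:

```python
import numpy as np

def normalized_log_likelihood(token_probs, alpha=0.7):
    """(1 / Ty^alpha) * sum_t log p(y<t> | x, y<1..t-1>); used to rank beam-search candidates."""
    token_probs = np.asarray(token_probs)
    Ty = len(token_probs)
    return np.sum(np.log(token_probs)) / (Ty ** alpha)

short = [0.5, 0.4]                   # 2-token candidate
long_ = [0.5, 0.4, 0.6, 0.55, 0.5]   # 5-token candidate
print(normalized_log_likelihood(short), normalized_log_likelihood(long_))
```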
+ + +**78. Error analysis ? When obtaining a predicted translation ˆy that is bad, one can wonder why we did not get a good translation y* by performing the following error analysis:** + +⟶ Hata analizi - Kötü bir çeviri elde edildiğinde, aşağıdaki hata analizini yaparak neden iyi bir çeviri almadığımızı araştırabiliriz: + +
+ + +**79. [Case, Root cause, Remedies]** + +⟶ [Durum,Ana neden, Çözümler] + +
+ + +**80. [Beam search faulty, RNN faulty, Increase beam width, Try different architecture, Regularize, Get more data]** + +⟶ [Işın arama hatası, RNN hatası, Işın genişliğini artırma, farklı mimariyi deneme, Düzenlileştirme, Daha fazla bilgi edinme] + +
+ + +**81. Bleu score ? The bilingual evaluation understudy (bleu) score quantifies how good a machine translation is by computing a similarity score based on n-gram precision. It is defined as follows:** + +⟶ Bleu puanı - İki dilli değerlendirme alt ölçeği (bleu) puanı, makine çevirisinin ne kadar iyi olduğunu, n-gram hassasiyetine dayalı bir benzerlik puanı hesaplayarak belirler. Aşağıdaki gibi tanımlanır: + +
+ + +**82. where pn is the bleu score on n-gram only defined as follows:** + +⟶ pn, n-gramdaki bleu skorunun sadece aşağıdaki şekilde tanımlandığı durumlarda: + +
+ + +**83. Remark: a brevity penalty may be applied to short predicted translations to prevent an artificially inflated bleu score.** + +⟶ Not: Yapay olarak şişirilmiş bir bleu skorunu önlemek için kısa öngörülen çevirilere küçük bir ceza verilebilir. + +
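The heart of this score is the clipped n-gram precision pn; the sketch below computes it for a toy sentence pair and combines p1 and p2 geometrically (full BLEU typically goes up to 4-grams and multiplies by the brevity penalty mentioned above, so this is only a rough illustration):

```python
from collections import Counter

def ngram_precision(candidate, reference, n):
    """Clipped n-gram precision p_n of a candidate translation against one reference."""
    cand = [tuple(candidate[i:i + n]) for i in range(len(candidate) - n + 1)]
    ref_counts = Counter(tuple(reference[i:i + n]) for i in range(len(reference) - n + 1))
    clipped = sum(min(c, ref_counts[g]) for g, c in Counter(cand).items())
    return clipped / max(len(cand), 1)

candidate = "the cat is on the mat".split()
reference = "the cat sat on the mat".split()

p1 = ngram_precision(candidate, reference, 1)   # unigram precision
p2 = ngram_precision(candidate, reference, 2)   # bigram precision
print(p1, p2, (p1 * p2) ** 0.5)                 # geometric mean of the two precisions
```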
+ + +**84. Attention** + +⟶ Dikkat + +
+ + +**85. Attention model ? This model allows an RNN to pay attention to specific parts of the input that is considered as being important, which improves the performance of the resulting model in practice. By noting ? the amount of attention that the output y should pay to the activation a and c the context at time t, we have:** + +⟶ Dikkat modeli ? Bu model, bir RNN'de girişin önemli olduğu düşünülen belirli kısımlarına dikkat etmesine olanak sağlar,sonuçta ortaya çıkan modelin pratikteki performansını arttırır. + +
+ + +**86. with** + +⟶ ile + +
+ + +**87. Remark: the attention scores are commonly used in image captioning and machine translation.** + +⟶ Not: Dikkat skorları, resim yazılama ve makine çevirisinde yaygın olarak kullanılır. + +
+ + +**88. A cute teddy bear is reading Persian literature.** + +⟶ Sevimli bir oyuncak ayı Fars edebiyatı okuyor. + +
+ + +**89. Attention weight ? The amount of attention that the output y should pay to the activation a is given by ? computed as follows:** + +⟶ Dikkat ağırlığı - y çıktısının a aktivasyonuna vermesi gereken dikkat miktarı, aşağıdaki gibi hesaplanan ? ile ifade edilir: + +
+ + +**90. Remark: computation complexity is quadratic with respect to Tx.** + +⟶ Not: hesaplama karmaşıklığı Tx'e göre ikinci derecedendir. + +
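These weights are a softmax over attention scores for one output step; a minimal version of the weight and context computation in NumPy (the scores and activations are random placeholders):

```python
import numpy as np

rng = np.random.default_rng(0)
Tx, n_a = 6, 4
a = rng.normal(size=(Tx, n_a))     # activations a<t'> over the input sequence
e = rng.normal(size=Tx)            # attention scores e<t,t'> for one output step t

alpha = np.exp(e - e.max()) / np.exp(e - e.max()).sum()   # attention weights, sum to 1
context = alpha @ a                # context c<t> = sum over t' of alpha<t,t'> a<t'>

print(alpha.sum(), context.shape)  # 1.0 and (4,)
```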
+ + +**91. The Deep Learning cheatsheets are now available in [target language].** + +⟶ Derin Öğrenme el kitapları şimdi [hedef dilde] mevcuttur. + +
+ +**92. Original authors** + +⟶Orijinal yazarlar + +
+ +**93. Translated by X, Y and Z** + +⟶ X, Y ve Z tarafından çevrilmiştir. + +
+ +**94. Reviewed by X, Y and Z** + +⟶ X, Y ve Z tarafından gözden geçirilmiştir. + +
+ +**95. View PDF version on GitHub** + +⟶ GitHub'da PDF versiyonunu görüntüleyin. + +
+ +**96. By X and Y** + +⟶ X ve Y tarafından + +
From ea318e7c1879fcb6c18cec9665b8041110965933 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Ba=C5=9Fak=20Buluz?= <41359672+basakbuluz@users.noreply.github.com> Date: Fri, 25 Jan 2019 23:55:16 +0300 Subject: [PATCH 080/531] [tr] Supervised Learning @ayyucekizrak, Thanks for the review. @shervinea, editing of all suggestions is complete. --- tr/cheatsheet-supervised-learning.md | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/tr/cheatsheet-supervised-learning.md b/tr/cheatsheet-supervised-learning.md index fe66b2f48..90d816803 100644 --- a/tr/cheatsheet-supervised-learning.md +++ b/tr/cheatsheet-supervised-learning.md @@ -30,7 +30,7 @@ **6. [Continuous, Class, Linear regression, Logistic regression, SVM, Naive Bayes]** -⟶ Sürekli, Sınıf, Lineer regresyon, Lojistik regresyon, Destek Vektör Makineleri (DVM), Naive Bayes] +⟶ [Sürekli, Sınıf, Lineer regresyon (bağlanım), Lojistik regresyon (bağlanım), Destek Vektör Makineleri (DVM), Naive Bayes]
@@ -72,13 +72,13 @@ **13. [Least squared error, Logistic loss, Hinge loss, Cross-entropy]** -⟶ [En küçük kareler hatası, Lojistik kaybı, Menteşe yitimi, Çapraz entropi] +⟶ [En küçük kareler hatası, Lojistik yitimi (kaybı), Menteşe yitimi (kaybı), Çapraz entropi]
**14. [Linear regression, Logistic regression, SVM, Neural Network]** -⟶ [Lineer regresyon, Lojistik regresyon, Destek Vektör Makineleri, Sinir Ağı] +⟶ [Lineer regresyon (bağlanım), Lojistik regresyon (bağlanım), Destek Vektör Makineleri, Sinir Ağı]
@@ -276,7 +276,7 @@ **47. Hinge loss ― The hinge loss is used in the setting of SVMs and is defined as follows:** -⟶ Menteşe yitimi - Menteşe yitimi Destek Vektör Makinelerinin ayarlarında kullanılır ve aşağıdaki gibi tanımlanır: +⟶ Menteşe yitimi (kaybı) - Menteşe yitimi Destek Vektör Makinelerinin ayarlarında kullanılır ve aşağıdaki gibi tanımlanır:
@@ -329,7 +329,7 @@ **56. Gaussian Discriminant Analysis** -⟶ Gauss Diskriminant Analizi +⟶ Gauss Diskriminant (Ayırtaç) Analizi
@@ -474,7 +474,7 @@ Basit karar ağacının tersine, oldukça yorumlanamaz bir yapıdadır ancak gen **80. Probably Approximately Correct (PAC) ― PAC is a framework under which numerous results on learning theory were proved, and has the following set of assumptions: ** -⟶ Muhtemel Yaklaşık Doğru (Probably Approximately Correct (PAC)) ― PAC, öğrenme teorisi üzerine sayısız sonuçların kanıtlandığı ve aşağıdaki varsayımlara sahip olan bir çerçevedir: +⟶ Olası Yaklaşık Doğru (Probably Approximately Correct (PAC)) ― PAC, öğrenme teorisi üzerine sayısız sonuçların kanıtlandığı ve aşağıdaki varsayımlara sahip olan bir çerçevedir:
From 3a438537055eef1140bbfc4e14b6d8c4e1db281f Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?ayy=C3=BCce=20k=C4=B1zrak?= <32331090+ayyucekizrak@users.noreply.github.com> Date: Sat, 26 Jan 2019 08:21:36 +0300 Subject: [PATCH 081/531] Update refresher-probability.md I changed the suggested. It is OK now! --- tr/refresher-probability.md | 30 +++++++++++++++--------------- 1 file changed, 15 insertions(+), 15 deletions(-) diff --git a/tr/refresher-probability.md b/tr/refresher-probability.md index d9aceea77..5e30fe358 100644 --- a/tr/refresher-probability.md +++ b/tr/refresher-probability.md @@ -24,13 +24,13 @@ **5. Axioms of probability: For each event E, we denote P(E) as the probability of event E occuring.** -⟶ Olasılık aksiyomları: Her olay E için, E olayının meydana gelme olasılığı olarak P (E) anlamına gelir. +⟶ Olasılık aksiyomları: Her E olayı için, E olayının meydana gelme olasılığı P (E) olarak ifade edilir.
**6. Axiom 1 ― Every probability is between 0 and 1 included, i.e:** -⟶ Aksiyom 1 - Her olasılık dahil 0 ile 1 arasındadır, yani: +⟶ Aksiyom 1 - Her olasılık 0 ve 1 de dahil olmak üzere 0 ve 1 arasındadır, yani:
@@ -84,25 +84,25 @@ **15. Partition ― Let {Ai,i∈[[1,n]]} be such that for all i, Ai≠∅. We say that {Ai} is a partition if we have:** -⟶ Parça - {Ai,i∈[[1,n]]} olsun; {Ai}'nın bir parçası olduğunu söyleriz: +⟶ Parça - Tüm i değerleri için Ai≠∅ olmak üzere {Ai,i∈[[1,n]]} olsun. {Ai} bir parça olduğunu söyleriz eğer :
**16. Remark: for any event B in the sample space, we have P(B)=n∑i=1P(B|Ai)P(Ai).** -⟶ Not: Örnek uzaydaki herhangi bir B olayı için P(B)=n∑i=1P(B|Ai)P(Ai)'ye sahibiz. +⟶ Not: Örneklem uzaydaki herhangi bir B olayı için P(B)=n∑i=1P(B|Ai)P(Ai)'ye sahibiz.
**17. Extended form of Bayes' rule ― Let {Ai,i∈[[1,n]]} be a partition of the sample space. We have:** -⟶ Genişletilmiş Bayes kuralı formu - {Ai,i∈[[1,n]]} örneklemenin bir bölümü olsun. Elde edilen: +⟶ Genişletilmiş Bayes kuralı formu - {Ai,i∈[[1,n]]} örneklem uzayının bir bölümü olsun. Elde edilen:
**18. Independence ― Two events A and B are independent if and only if we have:** -⟶ Bağımsızlık - İki olay A ve B birbirinden bağımısz ise, elde edilen: +⟶ Bağımsızlık - İki olay A ve B birbirinden bağımsızdır ancak ve ancak eğer:
@@ -120,13 +120,13 @@ **21. Random variable ― A random variable, often noted X, is a function that maps every element in a sample space to a real line.** -⟶ Rastgele değişken - Genellikle X işaretli rastgele bir değişken, bir örnek uzayındaki her öğeyi gerçek bir çizgiye eşleyen bir fonksiyondur (işlevdir). +⟶ Rastgele değişken - Genellikle X olarak ifade edilen rastgele bir değişken, bir örneklem uzayındaki her öğeyi gerçek bir çizgiye eşleyen bir fonksiyondur.
**22. Cumulative distribution function (CDF) ― The cumulative distribution function F, which is monotonically non-decreasing and is such that limx→−∞F(x)=0 and limx→+∞F(x)=1, is defined as:** -⟶ Kümülatif dağılım fonksiyonu (KDF/CDF) - Monotonik olarak azalmayan ve limx→−∞F(x)=0 ve limx→+∞F(x)=1 olacak şekilde kümülatif dağılım fonksiyonu F: +⟶ Kümülatif dağılım fonksiyonu (KDF/ Cumulative distribution function-CDF) - Monotonik olarak azalmayan ve limx→−∞F(x)=0 ve limx→+∞F(x)=1 olacak şekilde kümülatif dağılım fonksiyonu F şu şekilde tanımlanır:
@@ -138,19 +138,19 @@ **24. Probability density function (PDF) ― The probability density function f is the probability that X takes on values between two adjacent realizations of the random variable.** -⟶ Olasılık yoğunluğu fonksiyonu (OYF/PDF) - Olasılık yoğunluğu fonksiyonu f, X'in rastgele değişkenin iki bitişik gerçekleşmesi arasındaki değerleri alması ihtimalidir. +⟶ Olasılık yoğunluğu fonksiyonu (OYF/Probability density function-PDF) - Olasılık yoğunluğu fonksiyonu f, X'in rastgele değişkenin iki bitişik gerçekleşmesi arasındaki değerleri alması ihtimalidir.
**25. Relationships involving the PDF and CDF ― Here are the important properties to know in the discrete (D) and the continuous (C) cases.** -⟶ PDF ve CDF'yi içeren ilişkiler - Ayrık (D) ve sürekli (C) olaylarında bilmeniz gereken önemli özellikler. +⟶ OYF ve KDF'yi içeren ilişkiler - Ayrık (D) ve sürekli (C) olaylarında bilmeniz gereken önemli özelliklerdir.
**26. [Case, CDF F, PDF f, Properties of PDF]** -⟶ [Olay, CDF F, PDF f, PDF Özellikleri] +⟶ [Olay, KDF F, OYF f, OYF Özellikleri]
@@ -162,19 +162,19 @@ **28. Variance ― The variance of a random variable, often noted Var(X) or σ2, is a measure of the spread of its distribution function. It is determined as follows:** -⟶ Varyans - Genellikle Var(X) veya σ2 olarak not edilen rastgele değişkenin varyansı, dağılım fonksiyonunun yayılmasının bir ölçüsüdür. Aşağıdaki şekilde belirlenir: +⟶ Varyans - Genellikle Var(X) veya σ2 olarak ifade edilen rastgele değişkenin varyansı, dağılım fonksiyonunun yayılmasının bir ölçüsüdür. Aşağıdaki şekilde belirlenir:
**29. Standard deviation ― The standard deviation of a random variable, often noted σ, is a measure of the spread of its distribution function which is compatible with the units of the actual random variable. It is determined as follows:** -⟶ Standart sapma - Genellikle σ olarak not edilen rastgele bir değişkenin standart sapması, gerçek rastgele değişkenin birimleriyle uyumlu olan dağılım fonksiyonunun yayılmasının bir ölçüsüdür. Aşağıdaki şekilde belirlenir: +⟶ Standart sapma - Genellikle σ olarak ifade edilen rastgele bir değişkenin standart sapması, gerçek rastgele değişkenin birimleriyle uyumlu olan dağılım fonksiyonunun yayılmasının bir ölçüsüdür. Aşağıdaki şekilde belirlenir:
**30. Transformation of random variables ― Let the variables X and Y be linked by some function. By noting fX and fY the distribution function of X and Y respectively, we have:** -⟶ Rastgele değişkenlerin dönüşümü - X ve Y değişkenlerinin bazı fonksiyonlarla bağlanır. FX ve fY'ye sırasıyla X ve Y'nin dağılım fonksiyonu şöyledir: +⟶ Rastgele değişkenlerin dönüşümü - X ve Y değişkenlerinin bazı fonksiyonlarla bağlanır. fX ve fY'ye sırasıyla X ve Y'nin dağılım fonksiyonu şöyledir:
@@ -192,7 +192,7 @@ **33. Chebyshev's inequality ― Let X be a random variable with expected value μ. For k,σ>0, we have the following inequality:** -⟶ Chebyshev'in eşitsizliği - X'in beklenen değeri value olan rastgele bir değişken olmasına izin verin. K, σ>0 için aşağıdaki eşitsizliği elde edilir: +⟶ Chebyshev'in eşitsizliği - X beklenen değeri μ olan rastgele bir değişken olsun. K, σ>0 için aşağıdaki eşitsizliği elde edilir:
From 162e1bfac44000e9b3b12122c4b073d581fd7592 Mon Sep 17 00:00:00 2001 From: Shervine Amidi Date: Sat, 26 Jan 2019 15:12:41 -0800 Subject: [PATCH 082/531] Add [tr] contributors --- CONTRIBUTORS | 3 +++ 1 file changed, 3 insertions(+) diff --git a/CONTRIBUTORS b/CONTRIBUTORS index caf2f5495..9aacb2f66 100644 --- a/CONTRIBUTORS +++ b/CONTRIBUTORS @@ -97,6 +97,9 @@ Kadir Tekeli (translation of linear algebra) Ekrem Çetinkaya (review of linear algebra) + Başak Buluz (translation of supervised learning) + Ayyüce Kızrak (review of supervised learning) + Yavuz Kömeçoğlu (translation of unsupervised learning) Başak Buluz (review of unsupervised learning) From 4cc94743e4740422dd63d53bee4cc1ac77c49b1e Mon Sep 17 00:00:00 2001 From: Shervine Amidi Date: Sat, 26 Jan 2019 15:15:03 -0800 Subject: [PATCH 083/531] Add more [tr] contributors --- CONTRIBUTORS | 3 +++ 1 file changed, 3 insertions(+) diff --git a/CONTRIBUTORS b/CONTRIBUTORS index 9aacb2f66..a9c35331f 100644 --- a/CONTRIBUTORS +++ b/CONTRIBUTORS @@ -96,6 +96,9 @@ Kadir Tekeli (translation of linear algebra) Ekrem Çetinkaya (review of linear algebra) + + Ayyüce Kızrak (translation of probabilities and statistics) + Başak Buluz (review of probabilities and statistics) Başak Buluz (translation of supervised learning) Ayyüce Kızrak (review of supervised learning) From c309007c5bd22bf81aad49c5313497cbbb9dfa82 Mon Sep 17 00:00:00 2001 From: Shervine Amidi Date: Sat, 26 Jan 2019 15:21:23 -0800 Subject: [PATCH 084/531] Update progression table --- README.md | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/README.md b/README.md index 0938d18f6..2c577d67b 100644 --- a/README.md +++ b/README.md @@ -44,7 +44,7 @@ Please make sure to propose the translation of **only one** cheatsheet per pull |Cheatsheet topic|العَرَبِيَّة|עִבְרִית|हिन्दी|Türkçe|Русский|Italiano| |:---|:---:|:---:|:---:|:---:|:---:|:---:| |Convolutional Neural Nets|not started|not started|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/117)|not started|not started| -|Recurrent Neural Nets|not started|not started|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/118)|not started|not started| +|Recurrent Neural Nets|not started|not started|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/120)|not started|not started| |DL tips and tricks|not started|not started|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/116)|not started|not started| @@ -67,10 +67,10 @@ Please make sure to propose the translation of **only one** cheatsheet per pull |Cheatsheet topic|العَرَبِيَّة|עִבְרִית|हिन्दी|Türkçe|Русский|Italiano| |:---|:---:|:---:|:---:|:---:|:---:|:---:| |Deep learning|done|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/37)|done|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/21)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/78)| -|Supervised learning|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/87)|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/46)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/114)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/21)|not started| +|Supervised learning|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/87)|not started|[in 
progress](https://github.com/shervinea/cheatsheet-translation/pull/46)|done|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/21)|not started| |Unsupervised learning|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/88)|not started|not started|done|not started|not started| -|ML tips and tricks|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/83)|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/40)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/39)|not started|not started| -|Probabilities and Statistics|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/89)|not started|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/115)|not started|not started| +|ML tips and tricks|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/83)|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/40)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/119)|not started|not started| +|Probabilities and Statistics|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/89)|not started|not started|done|not started|not started| |Linear algebra|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/85)|not started|not started|done|not started|not started| From 1017252d3743095cb5927b0a50ef2ecef34fc424 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Ba=C5=9Fak=20Buluz?= <41359672+basakbuluz@users.noreply.github.com> Date: Sun, 27 Jan 2019 16:21:21 +0300 Subject: [PATCH 085/531] [tr] Recurrent neural networks --- tr/recurrent-neural-networks.md | 145 ++++++++++++++++---------------- 1 file changed, 73 insertions(+), 72 deletions(-) diff --git a/tr/recurrent-neural-networks.md b/tr/recurrent-neural-networks.md index 6f3a18b36..83a78b588 100644 --- a/tr/recurrent-neural-networks.md +++ b/tr/recurrent-neural-networks.md @@ -49,7 +49,7 @@ **8. [Machine translation, Beam search, Length normalization, Error analysis, Bleu score]** -⟶ [Makine çevirisi, Kiriş arama, Uzunluk normalizasyonu, Hata analizi, Bleu skoru] +⟶ [Makine çevirisi, Işın araması, Uzunluk normalizasyonu, Hata analizi, Bleu skoru]
@@ -68,7 +68,7 @@
-**11. Architecture of a traditional RNN ? Recurrent neural networks, also known as RNNs, are a class of neural networks that allow previous outputs to be used as inputs while having hidden states. They are typically as follows:** +**11. Architecture of a traditional RNN ― Recurrent neural networks, also known as RNNs, are a class of neural networks that allow previous outputs to be used as inputs while having hidden states. They are typically as follows:** ⟶ Geleneksel bir RNN mimarisi - RNN'ler olarak da bilinen tekrarlayan sinir ağları, gizli durumlara sahipken önceki çıktıların girdi olarak kullanılmasına izin veren bir sinir ağları sınıfıdır. Tipik olarak aşağıdaki gibidirler: @@ -117,9 +117,9 @@
-**18. Applications of RNNs ? RNN models are mostly used in the fields of natural language processing and speech recognition. The different applications are summed up in the table below:** +**18. Applications of RNNs ― RNN models are mostly used in the fields of natural language processing and speech recognition. The different applications are summed up in the table below:** -⟶ RNN'lerin Uygulamaları - RNN modelleri çoğunlukla doğal dil işleme ve konuşma tanıma alanlarında kullanılır. Farklı uygulamalar aşağıdaki tabloda özetlenmiştir: +⟶ RNN'lerin Uygulamaları ― RNN modelleri çoğunlukla doğal dil işleme ve konuşma tanıma alanlarında kullanılır. Farklı uygulamalar aşağıdaki tabloda özetlenmiştir:
@@ -145,16 +145,16 @@
-**22. Loss function ? In the case of a recurrent neural network, the loss function L of all time steps is defined based on the loss at every time step as follows:** +**22. Loss function ― In the case of a recurrent neural network, the loss function L of all time steps is defined based on the loss at every time step as follows:** -⟶ Kayıp fonksiyonu - Tekrarlayan bir sinir ağı olması durumunda, tüm zaman dilimlerindeki L kayıp fonksiyonu, her zaman dilimindeki kayıbı temel alınarak aşağıdaki gibi tanımlanır: +⟶ Kayıp fonksiyonu ― Tekrarlayan bir sinir ağı olması durumunda, tüm zaman dilimlerindeki L kayıp fonksiyonu, her zaman dilimindeki kayıbı temel alınarak aşağıdaki gibi tanımlanır:
-**23. Backpropagation through time ? Backpropagation is done at each point in time. At timestep T, the derivative of the loss L with respect to weight matrix W is expressed as follows:** +**23. Backpropagation through time ― Backpropagation is done at each point in time. At timestep T, the derivative of the loss L with respect to weight matrix W is expressed as follows:** -⟶ Zamanla geri yayılım - Geriye yayılım zamanın her noktasında yapılır. T zaman diliminde, ağırlık matrisi W'ye göre L kaybının türevi aşağıdaki gibi ifade edilir: +⟶ Zamanla geri yayılım ― Geriye yayılım zamanın her noktasında yapılır. T zaman diliminde, ağırlık matrisi W'ye göre L kaybının türevi aşağıdaki gibi ifade edilir:
@@ -166,9 +166,9 @@
-**25. Commonly used activation functions ? The most common activation functions used in RNN modules are described below:** +**25. Commonly used activation functions ― The most common activation functions used in RNN modules are described below:** -⟶ Yaygın olarak kullanılan aktivasyon fonksiyonları - RNN modüllerinde kullanılan en yaygın aktivasyon fonksiyonları aşağıda açıklanmıştır: +⟶ Yaygın olarak kullanılan aktivasyon fonksiyonları ― RNN modüllerinde kullanılan en yaygın aktivasyon fonksiyonları aşağıda açıklanmıştır:
@@ -180,16 +180,16 @@
-**27. Vanishing/exploding gradient ? The vanishing and exploding gradient phenomena are often encountered in the context of RNNs. The reason why they happen is that it is difficult to capture long term dependencies because of multiplicative gradient that can be exponentially decreasing/increasing with respect to the number of layers.** +**27. Vanishing/exploding gradient ― The vanishing and exploding gradient phenomena are often encountered in the context of RNNs. The reason why they happen is that it is difficult to capture long term dependencies because of multiplicative gradient that can be exponentially decreasing/increasing with respect to the number of layers.** -⟶ Kaybolan / patlayan gradyan - Kaybolan ve patlayan gradyan fenomenlerine RNN'ler bağlamında sıklıkla rastlanır. Bunların olmasının nedeni, katman sayısına göre katlanarak azalan / artan olabilen çarpımsal gradyan nedeniyle uzun vadeli bağımlılıkları yakalamanın zor olmasıdır. +⟶ Kaybolan / patlayan gradyan ― Kaybolan ve patlayan gradyan fenomenlerine RNN'ler bağlamında sıklıkla rastlanır. Bunların olmasının nedeni, katman sayısına göre katlanarak azalan / artan olabilen çarpımsal gradyan nedeniyle uzun vadeli bağımlılıkları yakalamanın zor olmasıdır.
-**28. Gradient clipping ? It is a technique used to cope with the exploding gradient problem sometimes encountered when performing backpropagation. By capping the maximum value for the gradient, this phenomenon is controlled in practice.** +**28. Gradient clipping ― It is a technique used to cope with the exploding gradient problem sometimes encountered when performing backpropagation. By capping the maximum value for the gradient, this phenomenon is controlled in practice.** -⟶ Gradyan kırpma - Geri yayılım işlemi sırasında bazen karşılaşılan patlayan gradyan sorunuyla başa çıkmak için kullanılan bir tekniktir. Gradyan için maksimum değeri sınırlayarak, bu durum pratikte kontrol edilir. +⟶ Gradyan kırpma ― Geri yayılım işlemi sırasında bazen karşılaşılan patlayan gradyan sorunuyla başa çıkmak için kullanılan bir tekniktir. Gradyan için maksimum değeri sınırlayarak, bu durum pratikte kontrol edilir.
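To make the gradient clipping of item 28 concrete, here is a minimal NumPy sketch that rescales a gradient whenever its norm exceeds a cap (the cap of 5.0 is an arbitrary illustrative value, not one prescribed above):

```python
import numpy as np

def clip_gradient(grad, max_norm=5.0):
    """Rescale grad so that its L2 norm never exceeds max_norm."""
    norm = np.linalg.norm(grad)
    if norm > max_norm:
        grad = grad * (max_norm / norm)
    return grad

g = np.array([3.0, 4.0, 12.0])   # norm = 13
print(clip_gradient(g))           # rescaled to norm 5
```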
@@ -201,16 +201,16 @@
-**30. Types of gates ? In order to remedy the vanishing gradient problem, specific gates are used in some types of RNNs and usually have a well-defined purpose. They are usually noted ? and are equal to:** +**30. Types of gates ― In order to remedy the vanishing gradient problem, specific gates are used in some types of RNNs and usually have a well-defined purpose. They are usually noted Γ and are equal to:** -⟶ Giriş Kapıları Çeşitleri - Kaybolan gradyan problemini çözmek için bazı RNN türlerinde belirli kapılar kullanılır ve genellikle iyi tanımlanmış bir amaca sahiptir. Genellikle ? olarak ifade edilir ve şuna eşittir: +⟶ Giriş Kapıları Çeşitleri ― Kaybolan gradyan problemini çözmek için bazı RNN türlerinde belirli kapılar kullanılır ve genellikle iyi tanımlanmış bir amaca sahiptir. Genellikle Γ olarak ifade edilir ve şuna eşittir:
-**31. where W,U,b are coefficients specific to the gate and ? is the sigmoid function. The main ones are summed up in the table below:** +**31. where W,U,b are coefficients specific to the gate and σ is the sigmoid function. The main ones are summed up in the table below:** -⟶ burada W, U, b kapıya özgü katsayılardır ve ? ise sigmoid fonksiyondur. Temel olanlar aşağıdaki tabloda özetlenmiştir: +⟶ burada W, U, b kapıya özgü katsayılardır ve σ ise sigmoid fonksiyondur. Temel olanlar aşağıdaki tabloda özetlenmiştir:
@@ -231,7 +231,7 @@ **34. [How much past should matter now?, Drop previous information?, Erase a cell or not?, How much to reveal of a cell?]** -⟶ [Şimdi ne kadar geçmiş olması gerekir ?, Önceki bilgiyi bırak?, Bir hücreyi sil ya da silme?, Bir hücreyi ortaya çıkarmak için ne kadar?] +⟶ [Şimdi ne kadar geçmiş olması gerekir?, Önceki bilgiyi bırak?, Bir hücreyi sil ya da silme?, Bir hücreyi ortaya çıkarmak için ne kadar?]
@@ -243,9 +243,9 @@
-**36. GRU/LSTM ? Gated Recurrent Unit (GRU) and Long Short-Term Memory units (LSTM) deal with the vanishing gradient problem encountered by traditional RNNs, with LSTM being a generalization of GRU. Below is a table summing up the characterizing equations of each architecture:** +**36. GRU/LSTM ― Gated Recurrent Unit (GRU) and Long Short-Term Memory units (LSTM) deal with the vanishing gradient problem encountered by traditional RNNs, with LSTM being a generalization of GRU. Below is a table summing up the characterizing equations of each architecture:** -⟶ GRU / LSTM - Geçitli Tekrarlayan Birim (Gated Recurrent Unit-GRU) ve Uzun Kısa Süreli Bellek Birimleri (Long Short-Term Memory-LSTM), geleneksel RNN'lerin karşılaştığı kaybolan gradyan problemini ele alır, LSTM ise GRU'nun genelleştirilmiş halidir. Her bir mimarinin karakterizasyon denklemlerini özetleyen tablo aşağıdadır: +⟶ GRU/LSTM ― Geçitli Tekrarlayan Birim (Gated Recurrent Unit-GRU) ve Uzun Kısa Süreli Bellek Birimleri (Long Short-Term Memory-LSTM), geleneksel RNN'lerin karşılaştığı kaybolan gradyan problemini ele alır, LSTM ise GRU'nun genelleştirilmiş halidir. Her bir mimarinin karakterizasyon denklemlerini özetleyen tablo aşağıdadır:
@@ -257,16 +257,16 @@
-**38. Remark: the sign ? denotes the element-wise multiplication between two vectors.** +**38. Remark: the sign ⋆ denotes the element-wise multiplication between two vectors.** -⟶ Not: ? işareti iki vektör arasındaki birimsel çarpımı belirtir. +⟶ Not: ⋆ işareti iki vektör arasındaki birimsel çarpımı belirtir.
-**39. Variants of RNNs ? The table below sums up the other commonly used RNN architectures:** +**39. Variants of RNNs ― The table below sums up the other commonly used RNN architectures:** -⟶ RNN varyantları - Aşağıdaki tablo, diğer yaygın kullanılan RNN mimarilerini özetlemektedir: +⟶ RNN varyantları ― Aşağıdaki tablo, diğer yaygın kullanılan RNN mimarilerini özetlemektedir:
@@ -299,9 +299,9 @@
-**44. Representation techniques ? The two main ways of representing words are summed up in the table below:** +**44. Representation techniques ― The two main ways of representing words are summed up in the table below:** -⟶ Temsil etme teknikleri - Kelimeleri temsil etmenin iki temel yolu aşağıdaki tabloda özetlenmiştir: +⟶ Temsil etme teknikleri ― Kelimeleri temsil etmenin iki temel yolu aşağıdaki tabloda özetlenmiştir:
@@ -327,9 +327,9 @@
[ow not edildi, Naive yaklaşım, benzerlik bilgisi yok, ew not edildi, kelime benzerliği dikkate alınır] -**48. Embedding matrix ? For a given word w, the embedding matrix E is a matrix that maps its 1-hot representation ow to its embedding ew as follows:** +**48. Embedding matrix ― For a given word w, the embedding matrix E is a matrix that maps its 1-hot representation ow to its embedding ew as follows:** -⟶ Gömme matrisi - Belirli bir w kelimesi için E gömme matrisi, 1-hot temsilini ew gömmesi sayesinde aşağıdaki gibi eşleştiren bir matristir: +⟶ Gömme matrisi ― Belirli bir w kelimesi için E gömme matrisi, 1-hot temsilini ew gömmesi sayesinde aşağıdaki gibi eşleştiren bir matristir:
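A tiny NumPy sketch of the embedding lookup of item 48: multiplying the embedding matrix E by a 1-hot vector ow simply selects one column of E as the embedding ew (the vocabulary size, embedding size and values below are made up):

```python
import numpy as np

n_vocab, n_embed = 10, 4
E = np.random.randn(n_embed, n_vocab)   # embedding matrix (illustrative random values)

w = 7                                    # index of the word in the vocabulary
o_w = np.zeros(n_vocab); o_w[w] = 1      # 1-hot representation ow
e_w = E @ o_w                            # embedding ew of the word
print(np.allclose(e_w, E[:, w]))         # the product just selects column w of E
```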
@@ -348,9 +348,9 @@
-**51. Word2vec ? Word2vec is a framework aimed at learning word embeddings by estimating the likelihood that a given word is surrounded by other words. Popular models include skip-gram, negative sampling and CBOW.** +**51. Word2vec ― Word2vec is a framework aimed at learning word embeddings by estimating the likelihood that a given word is surrounded by other words. Popular models include skip-gram, negative sampling and CBOW.** -⟶ Word2vec ? Word2vec, belirli bir kelimenin diğer kelimelerle çevrili olma olasılığını tahmin ederek kelime gömmelerini öğrenmeyi amaçlayan bir çerçevedir. Popüler modeller arasında skip-gram, negatif örnekleme ve CBOW bulunur. +⟶ Word2vec ― Word2vec, belirli bir kelimenin diğer kelimelerle çevrili olma olasılığını tahmin ederek kelime gömmelerini öğrenmeyi amaçlayan bir çerçevedir. Popüler modeller arasında skip-gram, negatif örnekleme ve CBOW bulunur.
@@ -369,9 +369,9 @@
-**54. Skip-gram ? The skip-gram word2vec model is a supervised learning task that learns word embeddings by assessing the likelihood of any given target word t happening with a context word c. By noting ?t a parameter associated with t, the probability P(t|c) is given by:** +**54. Skip-gram ― The skip-gram word2vec model is a supervised learning task that learns word embeddings by assessing the likelihood of any given target word t happening with a context word c. By noting θt a parameter associated with t, the probability P(t|c) is given by:** -⟶ Skip-gram ? Skip-gram word2vec modeli verilen herhangi bir t hedef kelimesinin c gibi bir bağlam kelimesi ile gerçekleşme olasılığını değerlendirerek kelime gömmelerini öğrenen denetimli bir öğrenme görevidir. +⟶ Skip-gram ― Skip-gram word2vec modeli verilen herhangi bir t hedef kelimesinin c gibi bir bağlam kelimesi ile gerçekleşme olasılığını değerlendirerek kelime gömmelerini öğrenen denetimli bir öğrenme görevidir.
@@ -383,7 +383,7 @@
-**56. Negative sampling ? It is a set of binary classifiers using logistic regressions that aim at assessing how a given context and a given target words are likely to appear simultaneously, with the models being trained on sets of k negative examples and 1 positive example. Given a context word c and a target word t, the prediction is expressed by:** +**56. Negative sampling ― It is a set of binary classifiers using logistic regressions that aim at assessing how a given context and a given target words are likely to appear simultaneously, with the models being trained on sets of k negative examples and 1 positive example. Given a context word c and a target word t, the prediction is expressed by:** ⟶ Negatif örnekleme - Belirli bir bağlamın ve belirli bir hedef kelimenin eşzamanlı olarak ortaya çıkmasının muhtemel olup olmadığının değerlendirilmesini, modellerin k negatif örnek kümeleri ve 1 pozitif örnek kümesinde eğitilmesini hedefleyen, lojistik regresyon kullanan bir ikili sınıflandırma kümesidir. Bağlam sözcüğü c ve hedef sözcüğü t göz önüne alındığında, tahmin şöyle ifade edilir: @@ -397,18 +397,18 @@
-**57bis. GloVe ? The GloVe model, short for global vectors for word representation, is a word embedding technique that uses a co-occurence matrix X where each Xi,j denotes the number of times that a target i occurred with a context j. Its cost function J is as follows:** +**57bis. GloVe ― The GloVe model, short for global vectors for word representation, is a word embedding technique that uses a co-occurence matrix X where each Xi,j denotes the number of times that a target i occurred with a context j. Its cost function J is as follows:** -⟶ GloVe ? Kelime gösterimi için Global vektörler tanımının kısaltılmış hali olan GloVe, eşzamanlı bir X matrisi kullanan ki burada her bir Xi, j, bir hedefin bir j bağlamında gerçekleştiği sayısını belirten bir kelime gömme tekniğidir. Maliyet fonksiyonu J aşağıdaki gibidir: +⟶ GloVe ― Kelime gösterimi için Global vektörler tanımının kısaltılmış hali olan GloVe, eşzamanlı bir X matrisi kullanan ki burada her bir Xi,j , bir hedefin bir j bağlamında gerçekleştiği sayısını belirten bir kelime gömme tekniğidir. Maliyet fonksiyonu J aşağıdaki gibidir:
-**58. where f is a weighting function such that Xi,j=0?f(Xi,j)=0. +**58. where f is a weighting function such that Xi,j=0⟹f(Xi,j)=0. Given the symmetry that e and ? play in this model, the final word embedding e(final)w is given by:** -⟶ f, Xi, j = 0?f (Xi, j) = 0 olacak şekilde bir ağırlıklandırma fonksiyonudur. -Bu modelde e ve ? 'nin oynadığı simetri göz önüne alındığında, e (final) w'nin kelime gömmesi şöyle ifade edilir: +⟶ f, Xi,j=0⟹f(Xi,j)=0 olacak şekilde bir ağırlıklandırma fonksiyonudur. +Bu modelde e ve θ'nin oynadığı simetri göz önüne alındığında, e (final) w'nin kelime gömmesi şöyle ifade edilir:
@@ -427,23 +427,23 @@ Bu modelde e ve ? 'nin oynadığı simetri göz önüne alındığında, e (fina
-**61. Cosine similarity ? The cosine similarity between words w1 and w2 is expressed as follows:** +**61. Cosine similarity ― The cosine similarity between words w1 and w2 is expressed as follows:** -⟶ Kosinüs benzerliği - w1 ve w2 kelimeleri arasındaki kosinüs benzerliği şu şekilde ifade edilir: +⟶ Kosinüs benzerliği ― w1 ve w2 kelimeleri arasındaki kosinüs benzerliği şu şekilde ifade edilir:
-**62. Remark: ? is the angle between words w1 and w2.** +**62. Remark: θ is the angle between words w1 and w2.** -⟶ Not: ?, w1 ve w2 kelimeleri arasındaki açıdır. +⟶ Not: θ, w1 ve w2 kelimeleri arasındaki açıdır.
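The cosine similarity of items 61-62 can be sketched directly in NumPy (the two embedding vectors are invented for illustration):

```python
import numpy as np

def cosine_similarity(w1, w2):
    """cos(theta) = (w1 . w2) / (||w1|| * ||w2||)"""
    return np.dot(w1, w2) / (np.linalg.norm(w1) * np.linalg.norm(w2))

e_king  = np.array([0.5, 0.8, 0.1])
e_queen = np.array([0.45, 0.75, 0.2])
print(cosine_similarity(e_king, e_queen))   # close to 1 for similar words
```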
-**63. t-SNE ? t-SNE (t-distributed Stochastic Neighbor Embedding) is a technique aimed at reducing high-dimensional embeddings into a lower dimensional space. In practice, it is commonly used to visualize word vectors in the 2D space.** +**63. t-SNE ― t-SNE (t-distributed Stochastic Neighbor Embedding) is a technique aimed at reducing high-dimensional embeddings into a lower dimensional space. In practice, it is commonly used to visualize word vectors in the 2D space.** -⟶ t-SNE - t-SNE (t-dağıtımlı Stokastik Komşu Gömme), yüksek boyutlu gömmeleri daha düşük boyutlu bir alana indirmeyi amaçlayan bir tekniktir. Uygulamada, kelime uzaylarını 2B alanda görselleştirmek için yaygın olarak kullanılır. +⟶ t-SNE ― t-SNE (t-dağıtımlı Stokastik Komşu Gömme), yüksek boyutlu gömmeleri daha düşük boyutlu bir alana indirmeyi amaçlayan bir tekniktir. Uygulamada, kelime uzaylarını 2B alanda görselleştirmek için yaygın olarak kullanılır.
@@ -462,23 +462,23 @@ Bu modelde e ve ? 'nin oynadığı simetri göz önüne alındığında, e (fina
-**66. Overview ? A language model aims at estimating the probability of a sentence P(y).** +**66. Overview ― A language model aims at estimating the probability of a sentence P(y).** -⟶ Genel bakış - Bir dil modeli P (y) cümlesinin olasılığını tahmin etmeyi amaçlar. +⟶ Genel bakış - Bir dil modeli P(y) cümlesinin olasılığını tahmin etmeyi amaçlar.
-**67. n-gram model ? This model is a naive approach aiming at quantifying the probability that an expression appears in a corpus by counting its number of appearance in the training data.** +**67. n-gram model ― This model is a naive approach aiming at quantifying the probability that an expression appears in a corpus by counting its number of appearance in the training data.** -⟶ n-gram modeli - Bu model, eğitim verilerindeki görünüm sayısını sayarak bir ifadenin bir korpusta ortaya çıkma olasılığını ölçmeyi amaçlayan naif bir yaklaşımdır. +⟶ n-gram modeli ― Bu model, eğitim verilerindeki görünüm sayısını sayarak bir ifadenin bir korpusta ortaya çıkma olasılığını ölçmeyi amaçlayan naif bir yaklaşımdır.
-**68. Perplexity ? Language models are commonly assessed using the perplexity metric, also known as PP, which can be interpreted as the inverse probability of the dataset normalized by the number of words T. The perplexity is such that the lower, the better and is defined as follows:** +**68. Perplexity ― Language models are commonly assessed using the perplexity metric, also known as PP, which can be interpreted as the inverse probability of the dataset normalized by the number of words T. The perplexity is such that the lower, the better and is defined as follows:** -⟶ Karışıklık ? Dil modelleri yaygın olarak, PP kelimesi olarak da bilinen, T kelimesi ile normalize edilmiş veri kümesinin ters olasılığı olarak yorumlanabilen, çift yönlü ölçüm ölçüsü kullanılarak değerlendirilir. Karmaşıklık Çift yönlü, daha düşük, daha iyi ve aşağıdaki gibi tanımlandığı gibidir: +⟶ Karışıklık - Dil modelleri yaygın olarak, PP olarak da bilinen karışıklık metriği kullanılarak değerlendirilir ve bunlar T kelimelerinin sayısıyla normalize edilmiş veri setinin ters olasılığı olarak yorumlanabilir. Karışıklık, daha düşük, daha iyi ve şöyle tanımlanır:
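A small sketch of the perplexity metric of item 68, computed in log space from per-word model probabilities (the probability values are made up):

```python
import numpy as np

def perplexity(word_probs):
    """PP = (prod p_i)^(-1/T), computed in log space for numerical stability."""
    T = len(word_probs)
    log_prob = np.sum(np.log(word_probs))
    return np.exp(-log_prob / T)

probs = [0.2, 0.1, 0.25, 0.05]   # model probability of each word in the sentence
print(perplexity(probs))          # lower is better
```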
@@ -497,22 +497,23 @@ Bu modelde e ve ? 'nin oynadığı simetri göz önüne alındığında, e (fina
-**71. Overview ? A machine translation model is similar to a language model except it has an encoder network placed before. For this reason, it is sometimes referred as a conditional language model. The goal is to find a sentence y such that:** +**71. Overview ― A machine translation model is similar to a language model except it has an encoder network placed before. For this reason, it is sometimes referred as a conditional language model. The goal is to find a sentence y such that:** -⟶ Genel bakış - Bir makine çeviri modeli, daha önce yerleştirilmiş bir kodlayıcı ağına sahip olması dışında, bir dil modeline benzer. Bu nedenle, bazen koşullu dil modeli olarak da adlandırılır. Amaç şu şekilde bir cümle bulmaktır: +⟶ Genel bakış ― Bir makine çeviri modeli, daha önce yerleştirilmiş bir kodlayıcı ağına sahip olması dışında, bir dil modeline benzer. Bu nedenle, bazen koşullu dil modeli olarak da adlandırılır. Amaç şu şekilde bir cümle bulmaktır:
-**72. Beam search ? It is a heuristic search algorithm used in machine translation and speech recognition to find the likeliest sentence y given an input x.** +**72. Beam search ― It is a heuristic search algorithm used in machine translation and speech recognition to find the likeliest sentence y given an input x.** -⟶ Işın arama - Makine çevirisinde ve konuşma tanımada kullanılan ve x girişi verilen en olası cümleyi bulmak için kullanılan sezgisel bir arama algoritmasıdır. +⟶ Işın arama ― Makine çevirisinde ve konuşma tanımada kullanılan ve x girişi verilen en olası cümleyi bulmak için kullanılan sezgisel bir arama algoritmasıdır.
**73. [Step 1: Find top B likely words y<1>, Step 2: Compute conditional probabilities y|x,y<1>,...,y, Step 3: Keep top B combinations x,y<1>,...,y, End process at a stop word]** -⟶ En olası B kelimeleri bulun y <1>, 2. Adım: Koşullu olasılıkları hesaplayın y | x, y <1>, ..., y , 3. Adım: En olası B kombinasyonlarını koruyun x, y <1>, ..., y , İşlemi durdurarak sonlandırın] + +⟶ [Adım 1: En olası B kelimeleri bulun y<1>, 2. Adım: Koşullu olasılıkları hesaplayın y|x,y<1>, ..., y, 3. Adım: En olası B kombinasyonlarını koruyun x,y<1>, ..., y, İşlemi durdurarak sonlandırın]
@@ -524,37 +525,37 @@ Bu modelde e ve ? 'nin oynadığı simetri göz önüne alındığında, e (fina
-**75. Beam width ? The beam width B is a parameter for beam search. Large values of B yield to better result but with slower performance and increased memory. Small values of B lead to worse results but is less computationally intensive. A standard value for B is around 10.** +**75. Beam width ― The beam width B is a parameter for beam search. Large values of B yield to better result but with slower performance and increased memory. Small values of B lead to worse results but is less computationally intensive. A standard value for B is around 10.** -⟶ Işın genişliği - Işın genişliği B, ışın araması için bir parametredir. Daha yüksek B değerleri daha iyi sonuç elde edilmesini sağlar fakat daha düşük performans ve daha yüksek hafıza ile. Küçük B değerleri daha kötü sonuçlara neden olur, ancak hesaplama açısından daha az yoğundur. B için standart bir değer 10 civarındadır. +⟶ Işın genişliği ― Işın genişliği B, ışın araması için bir parametredir. Daha yüksek B değerleri daha iyi sonuç elde edilmesini sağlar fakat daha düşük performans ve daha yüksek hafıza ile. Küçük B değerleri daha kötü sonuçlara neden olur, ancak hesaplama açısından daha az yoğundur. B için standart bir değer 10 civarındadır.
-**76. Length normalization ? In order to improve numerical stability, beam search is usually applied on the following normalized objective, often called the normalized log-likelihood objective, defined as:** +**76. Length normalization ― In order to improve numerical stability, beam search is usually applied on the following normalized objective, often called the normalized log-likelihood objective, defined as:** -⟶ Uzunluk normalizasyonu - Sayısal stabiliteyi arttırmak için, ışın arama genellikle, aşağıdaki gibi tanımlanan normalize edilmiş log-olabilirlik amacı olarak adlandırılan normalize edilmiş hedefe uygulanır: +⟶ Uzunluk normalizasyonu ― Sayısal stabiliteyi arttırmak için, ışın arama genellikle, aşağıdaki gibi tanımlanan normalize edilmiş log-olabilirlik amacı olarak adlandırılan normalize edilmiş hedefe uygulanır:
-**77. Remark: the parameter ? can be seen as a softener, and its value is usually between 0.5 and 1.** +**77. Remark: the parameter α can be seen as a softener, and its value is usually between 0.5 and 1.** -⟶ Not: ? parametresi yumuşatıcı olarak görülebilir ve değeri genellikle 0,5 ile 1 arasındadır. +⟶ Not: α parametresi yumuşatıcı olarak görülebilir ve değeri genellikle 0,5 ile 1 arasındadır.
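The following sketch ties together beam search (items 72-75) and the length-normalized log-likelihood objective of items 76-77. The toy per-step word distributions, the beam width B=2 and α=0.7 are illustrative assumptions only:

```python
import numpy as np

def beam_search(step_log_probs, B=2, alpha=0.7):
    """step_log_probs: list over time steps of dicts {word: log-probability}.
    A toy beam search (conditioning on the prefix is ignored here): keep the
    top-B prefixes at each step, then re-rank the survivors with the
    length-normalized objective (1 / T**alpha) * sum(log p)."""
    beams = [([], 0.0)]                                   # (prefix, total log prob)
    for log_probs in step_log_probs:
        candidates = [(prefix + [w], score + lp)
                      for prefix, score in beams
                      for w, lp in log_probs.items()]
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:B]
    return max(beams, key=lambda c: c[1] / (len(c[0]) ** alpha))

steps = [{"the": np.log(0.6), "a": np.log(0.4)},
         {"cat": np.log(0.5), "dog": np.log(0.5)},
         {"<eos>": np.log(0.9), "sat": np.log(0.1)}]
print(beam_search(steps))
```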
-**78. Error analysis ? When obtaining a predicted translation ˆy that is bad, one can wonder why we did not get a good translation y* by performing the following error analysis:** +**78. Error analysis ― When obtaining a predicted translation ˆy that is bad, one can wonder why we did not get a good translation y∗ by performing the following error analysis:** -⟶ Hata analizi - Kötü bir çeviri elde edildiğinde, aşağıdaki hata analizini yaparak neden iyi bir çeviri almadığımızı araştırabiliriz: +⟶ Hata analizi ― Kötü bir çeviri elde edildiğinde, aşağıdaki hata analizini yaparak neden iyi bir çeviri almadığımızı araştırabiliriz:
**79. [Case, Root cause, Remedies]** -⟶ [Durum,Ana neden, Çözümler] +⟶ [Durum, Ana neden, Çözümler]
@@ -566,9 +567,9 @@ Bu modelde e ve ? 'nin oynadığı simetri göz önüne alındığında, e (fina
-**81. Bleu score ? The bilingual evaluation understudy (bleu) score quantifies how good a machine translation is by computing a similarity score based on n-gram precision. It is defined as follows:** +81. Bleu score ― The bilingual evaluation understudy (bleu) score quantifies how good a machine translation is by computing a similarity score based on n-gram precision. It is defined as follows: -⟶ Bleu puanı - İki dilli değerlendirme alt ölçeği (bleu) puanı, makine çevirisinin ne kadar iyi olduğunu, n-gram hassasiyetine dayalı bir benzerlik puanı hesaplayarak belirler. Aşağıdaki gibi tanımlanır: +⟶ Bleu puanı ― İki dilli değerlendirme alt ölçeği (bleu) puanı, makine çevirisinin ne kadar iyi olduğunu, n-gram hassasiyetine dayalı bir benzerlik puanı hesaplayarak belirler. Aşağıdaki gibi tanımlanır:
@@ -594,9 +595,9 @@ Bu modelde e ve ? 'nin oynadığı simetri göz önüne alındığında, e (fina
-**85. Attention model ? This model allows an RNN to pay attention to specific parts of the input that is considered as being important, which improves the performance of the resulting model in practice. By noting ? the amount of attention that the output y should pay to the activation a and c the context at time t, we have:** +**85. Attention model ― This model allows an RNN to pay attention to specific parts of the input that is considered as being important, which improves the performance of the resulting model in practice. By noting α the amount of attention that the output y should pay to the activation a and c the context at time t, we have:** -⟶ Dikkat modeli ? Bu model, bir RNN'de girişin önemli olduğu düşünülen belirli kısımlarına dikkat etmesine olanak sağlar,sonuçta ortaya çıkan modelin pratikteki performansını arttırır. +⟶ Dikkat modeli ― Bu model, bir RNN'de girişin önemli olduğu düşünülen belirli kısımlarına dikkat etmesine olanak sağlar,sonuçta ortaya çıkan modelin pratikteki performansını arttırır.
@@ -610,7 +611,7 @@ Bu modelde e ve ? 'nin oynadığı simetri göz önüne alındığında, e (fina **87. Remark: the attention scores are commonly used in image captioning and machine translation.** -⟶ Not: Dikkat skorları, resim yazılama ve makine çevirisinde yaygın olarak kullanılır. +⟶ Not: Dikkat skorları, görüntü altyazılama ve makine çevirisinde yaygın olarak kullanılır.
@@ -622,9 +623,9 @@ Bu modelde e ve ? 'nin oynadığı simetri göz önüne alındığında, e (fina
-**89. Attention weight ? The amount of attention that the output y should pay to the activation a is given by ? computed as follows:** +**89. Attention weight ― The amount of attention that the output y should pay to the activation a is given by α computed as follows:** -⟶ Dikkat ağırlığı - y çıktısının a aktivasyonuna vermesi gereken dikkat miktarı, aşağıdaki gibi hesaplanan ? ile ifade edilir: +⟶ Dikkat ağırlığı ― Y çıktısının a aktivasyonuna vermesi gereken dikkat miktarı, aşağıdaki gibi hesaplanan α ile verilir:
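A minimal sketch of how the attention weights α of item 89 can be obtained as a softmax over raw attention scores e (the scores and activations below are random placeholders standing in for the outputs of a small network):

```python
import numpy as np

def attention_weights(scores):
    """alpha<t,t'> = exp(e<t,t'>) / sum_t'' exp(e<t,t''>)  (a softmax over the scores)."""
    exp_scores = np.exp(scores - np.max(scores))   # shift for numerical stability
    return exp_scores / np.sum(exp_scores)

e = np.array([1.2, 0.3, -0.5, 2.0])        # raw scores for each input position
alpha = attention_weights(e)
context = alpha @ np.random.randn(4, 8)    # context c = sum_t' alpha<t,t'> a<t'>
print(alpha, alpha.sum())                   # weights sum to 1
```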
From a69f2910321579bf2cadc386706709ec0a37b69d Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Ba=C5=9Fak=20Buluz?= <41359672+basakbuluz@users.noreply.github.com> Date: Sun, 27 Jan 2019 16:39:25 +0300 Subject: [PATCH 086/531] Update recurrent-neural-networks.md --- tr/recurrent-neural-networks.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/tr/recurrent-neural-networks.md b/tr/recurrent-neural-networks.md index 83a78b588..17536b665 100644 --- a/tr/recurrent-neural-networks.md +++ b/tr/recurrent-neural-networks.md @@ -567,7 +567,7 @@ Bu modelde e ve θ'nin oynadığı simetri göz önüne alındığında, e (fina
-81. Bleu score ― The bilingual evaluation understudy (bleu) score quantifies how good a machine translation is by computing a similarity score based on n-gram precision. It is defined as follows: +**81. Bleu score ― The bilingual evaluation understudy (bleu) score quantifies how good a machine translation is by computing a similarity score based on n-gram precision. It is defined as follows:** ⟶ Bleu puanı ― İki dilli değerlendirme alt ölçeği (bleu) puanı, makine çevirisinin ne kadar iyi olduğunu, n-gram hassasiyetine dayalı bir benzerlik puanı hesaplayarak belirler. Aşağıdaki gibi tanımlanır: @@ -597,7 +597,7 @@ Bu modelde e ve θ'nin oynadığı simetri göz önüne alındığında, e (fina **85. Attention model ― This model allows an RNN to pay attention to specific parts of the input that is considered as being important, which improves the performance of the resulting model in practice. By noting α the amount of attention that the output y should pay to the activation a and c the context at time t, we have:** -⟶ Dikkat modeli ― Bu model, bir RNN'de girişin önemli olduğu düşünülen belirli kısımlarına dikkat etmesine olanak sağlar,sonuçta ortaya çıkan modelin pratikteki performansını arttırır. +⟶ Dikkat modeli ― Bu model, bir RNN'de girişin önemli olduğu düşünülen belirli kısımlarına dikkat etmesine olanak sağlar,sonuçta ortaya çıkan modelin pratikteki performansını arttırır. α ile ifade edilen dikkat miktarı, a aktivasyonu ve t zamanındaki c bağlamını y çıktısı olarak verir.
From eb7fc1694b625f300b4fc200422a21e627eca443 Mon Sep 17 00:00:00 2001 From: Shervine Amidi Date: Sun, 27 Jan 2019 10:59:15 -0800 Subject: [PATCH 087/531] Add [tr] contributors --- CONTRIBUTORS | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/CONTRIBUTORS b/CONTRIBUTORS index a9c35331f..041b73cb1 100644 --- a/CONTRIBUTORS +++ b/CONTRIBUTORS @@ -96,6 +96,10 @@ Kadir Tekeli (translation of linear algebra) Ekrem Çetinkaya (review of linear algebra) + + Seray Beşer (translation of machine learning tips and tricks) + Ayyüce Kızrak (review of machine learning tips and tricks) + Yavuz Kömeçoğlu (review of machine learning tips and tricks) Ayyüce Kızrak (translation of probabilities and statistics) Başak Buluz (review of probabilities and statistics) From f3c3b38a6ac41e78701af9dd00a46987362d56eb Mon Sep 17 00:00:00 2001 From: Afshine Amidi <26204670+afshinea@users.noreply.github.com> Date: Sun, 27 Jan 2019 12:22:45 -0800 Subject: [PATCH 088/531] Update README.md --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index 2c577d67b..f3aa8291b 100644 --- a/README.md +++ b/README.md @@ -69,7 +69,7 @@ Please make sure to propose the translation of **only one** cheatsheet per pull |Deep learning|done|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/37)|done|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/21)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/78)| |Supervised learning|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/87)|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/46)|done|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/21)|not started| |Unsupervised learning|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/88)|not started|not started|done|not started|not started| -|ML tips and tricks|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/83)|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/40)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/119)|not started|not started| +|ML tips and tricks|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/83)|not started|done|not started|not started| |Probabilities and Statistics|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/89)|not started|not started|done|not started|not started| |Linear algebra|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/85)|not started|not started|done|not started|not started| From eccf772a1a2186065b6c371f237d2fec9585a602 Mon Sep 17 00:00:00 2001 From: Afshine Amidi <26204670+afshinea@users.noreply.github.com> Date: Sun, 27 Jan 2019 12:24:23 -0800 Subject: [PATCH 089/531] Update README.md --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index f3aa8291b..752c0e874 100644 --- a/README.md +++ b/README.md @@ -69,7 +69,7 @@ Please make sure to propose the translation of **only one** cheatsheet per pull |Deep learning|done|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/37)|done|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/21)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/78)| |Supervised learning|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/87)|not started|[in 
progress](https://github.com/shervinea/cheatsheet-translation/pull/46)|done|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/21)|not started| |Unsupervised learning|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/88)|not started|not started|done|not started|not started| -|ML tips and tricks|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/83)|not started|done|not started|not started| +|ML tips and tricks|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/83)|not started|done|done|not started|not started| |Probabilities and Statistics|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/89)|not started|not started|done|not started|not started| |Linear algebra|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/85)|not started|not started|done|not started|not started| From 5b1db7353ee643b56f338d030c385f7f77e69325 Mon Sep 17 00:00:00 2001 From: Afshine Amidi <26204670+afshinea@users.noreply.github.com> Date: Sun, 27 Jan 2019 12:25:20 -0800 Subject: [PATCH 090/531] Update README.md --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index 752c0e874..fe1999f98 100644 --- a/README.md +++ b/README.md @@ -69,7 +69,7 @@ Please make sure to propose the translation of **only one** cheatsheet per pull |Deep learning|done|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/37)|done|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/21)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/78)| |Supervised learning|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/87)|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/46)|done|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/21)|not started| |Unsupervised learning|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/88)|not started|not started|done|not started|not started| -|ML tips and tricks|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/83)|not started|done|done|not started|not started| +|ML tips and tricks|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/83)|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/40)|done|not started|not started| |Probabilities and Statistics|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/89)|not started|not started|done|not started|not started| |Linear algebra|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/85)|not started|not started|done|not started|not started| From 839b9efe5269fa62eb1cabf0bb50d82cdc2fc48a Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?ayy=C3=BCce=20k=C4=B1zrak?= <32331090+ayyucekizrak@users.noreply.github.com> Date: Sun, 27 Jan 2019 23:43:33 +0300 Subject: [PATCH 091/531] [tr] Deep Learning tips and tricks I've completed the entire translation of this title. Please note that the review process can be started. Thank you! --- tr/deep-learning-tips-and-tricks.md | 453 ++++++++++++++++++++++++++++ 1 file changed, 453 insertions(+) create mode 100644 tr/deep-learning-tips-and-tricks.md diff --git a/tr/deep-learning-tips-and-tricks.md b/tr/deep-learning-tips-and-tricks.md new file mode 100644 index 000000000..1470a3875 --- /dev/null +++ b/tr/deep-learning-tips-and-tricks.md @@ -0,0 +1,453 @@ +**1. 
Deep Learning Tips and Tricks cheatsheet** + +⟶ Derin öğrenme püf noktaları ve ipuçları el kitabı + +
+ + +**2. CS 230 - Deep Learning** + +⟶ CS 230 - Derin Öğrenme + +
+ + +**3. Tips and tricks** + +⟶ Püf noktaları ve ipuçları + +
+ + +**4. [Data processing, Data augmentation, Batch normalization]** + +⟶ [Veri işleme, Veri artırma, Küme normalizasyonu] + +
+ +**5. [Training a neural network, Epoch, Mini-batch, Cross-entropy loss, Backpropagation, Gradient descent, Updating weights, Gradient checking]** + +⟶ + +**5. [Training a neural network, Epoch, Mini-batch, Cross-entropy loss, Backpropagation, Gradient descent, Updating weights, Gradient checking]** + +⟶ [Bir sinir ağının eğitilmesi, Epoch, Mini-küme, Çapraz-entropy yitimi (kaybı), Geriye yayılım, Gradyan (Bayır) iniş, Ağırlıkların güncellenmesi, Gradyan kontrolü] + +
+ + +**6. [Parameter tuning, Xavier initialization, Transfer learning, Learning rate, Adaptive learning rates]** + +⟶ [Parametrelerin ayarlanması, Xavier başlatma, Transfer öğrenme, Öğrenme oranı, Uyarlamalı öğrenme oranları] + +
+ + +**7. [Regularization, Dropout, Weight regularization, Early stopping]** + +⟶ [Düzenlileştirme, Seyreltme, Ağırlıkların düzeltilmesi, Erken durdurma] + +
+ + +**8. [Good practices, Overfitting small batch, Gradient checking]** + +⟶ [İyi örnekler, Küçük kümelerin ezberlenmesi, Gradyanların kontrolü] + +
+ + +**9. View PDF version on GitHub** + +⟶ GitHub'da PDF sürümünü görüntüleyin + +
+ + +**10. Data processing** + +⟶ Veri işleme + +
+ + +**11. Data augmentation ― Deep learning models usually need a lot of data to be properly trained. It is often useful to get more data from the existing ones using data augmentation techniques. The main ones are summed up in the table below. More precisely, given the following input image, here are the techniques that we can apply:** + +⟶ Veri artırma ― Derin öğrenme modelleri genellikle uygun şekilde eğitilmek için çok fazla veriye ihtiyaç duyar. Veri artırma tekniklerini kullanarak mevcut verilerden daha fazla veri üretmek genellikle yararlıdır. Temel işlemler aşağıdaki tabloda özetlenmiştir. Daha doğrusu, aşağıdaki girdi görüntüsüne bakıldığında, uygulayabileceğimiz teknikler şunlardır: + +
+ + +**12. [Original, Flip, Rotation, Random crop]** + +⟶ [Orijinal, Çevirme, Rotasyon (Yönlendirme), Rastgele kırpma/kesme] + +
+ + +**13. [Image without any modification, Flipped with respect to an axis for which the meaning of the image is preserved, Rotation with a slight angle, Simulates incorrect horizon calibration, Random focus on one part of the image, Several random crops can be done in a row]** + +⟶ [Herhangi bir değişiklik yapmamış resim, Görüntünün anlamının korunduğu bir eksene göre çevrilmiş görüntü, Hafif açılı döndürme, Yanlış yatay kalibrasyonu simule eder, Görüntünün bir bölümüne rastgele odaklanma, Arka arkaya birkaç rasgele kesme yapılabilir] + +
+ + +**14. [Color shift, Noise addition, Information loss, Contrast change]** + +⟶ [Renk kaydırma, Gürültü ekleme, Bilgi kaybı, Kontrast değişimi] + +
+ + +**15. [Nuances of RGB is slightly changed, Captures noise that can occur with light exposure, Addition of noise, More tolerance to quality variation of inputs, Parts of image ignored, Mimics potential loss of parts of image, Luminosity changes, Controls difference in exposition due to time of day]** + +⟶ [RGB'nin nüansları biraz değiştirilmesi, Işığa maruz kalırken oluşabilecek gürültü, Gürültü ekleme, Girdilerin kalite değişkenliğine daha fazla toleranslı olması, Yok sayılan görüntüler, Görüntünün parçalarındaki olası kayıpların taklit edilmesi, Parlaklık değişimleri, Gün içindeki ışık ve renk değişiminin kontrolü] + +<br>
+ + +**16. Remark: data is usually augmented on the fly during training.** + +⟶ Not: Veriler genellikle eğitim sırasında artırılır. + +
+ + +**17. Batch normalization ― It is a step of hyperparameter γ,β that normalizes the batch {xi}. By noting μB,σ2B the mean and variance of that we want to correct to the batch, it is done as follows:** + +⟶ Küme normalleştirme ― Bu, {xi} kümesini normalleştiren, γ,β hiperparametrelerinin bir adımıdır. Kümeyi düzeltmek için kullanmak istediğimiz ortalama ve varyansı μB ve σ2B ile gösterirsek, işlem şu şekilde yapılır: + +<br>
+ + +**18. It is usually done after a fully connected/convolutional layer and before a non-linearity layer and aims at allowing higher learning rates and reducing the strong dependence on initialization.** + +⟶ Genellikle tam-tüm bağlı/evrişimli bir katmandan sonra ve doğrusal olmayan bir katmandan önce yapılır. Daha yüksek öğrenme oranlarına izin vermeyi ve başlangıç durumuna güçlü bir şekilde bağımlılığı azaltmayı amaçlar. + +
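A NumPy sketch of the batch normalization step of items 17-18: normalize the batch with its mean μB and variance σ²B, then scale and shift with γ and β (set here to illustrative constants):

```python
import numpy as np

def batch_norm(x, gamma=1.0, beta=0.0, eps=1e-5):
    """x: (batch_size, features). Returns gamma * x_hat + beta."""
    mu = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mu) / np.sqrt(var + eps)
    return gamma * x_hat + beta

batch = np.random.randn(32, 4) * 10 + 3    # badly scaled activations
out = batch_norm(batch)
print(out.mean(axis=0), out.std(axis=0))   # roughly 0 and 1 per feature
```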
+ + +**19. Training a neural network** + +⟶ Bir sinir ağının eğitilmesi + +
+ + +**20. Definitions** + +⟶ Tanımlamalar + +
+ + +**21. Epoch ― In the context of training a model, epoch is a term used to refer to one iteration where the model sees the whole training set to update its weights.** + +⟶ Epoch ― Bir modelin eğitimi kapsamında, modelin ağırlıklarını güncellemek için tüm eğitim setini gördüğü bir yinelemeyi ifade etmek için kullanılan bir terimdir. + +<br>
+ + +**22. Mini-batch gradient descent ― During the training phase, updating weights is usually not based on the whole training set at once due to computation complexities or one data point due to noise issues. Instead, the update step is done on mini-batches, where the number of data points in a batch is a hyperparameter that we can tune.** + +⟶ Mini-küme gradyan (bayır) iniş ― Eğitim aşamasında, ağırlıkların güncellenmesi genellikle hesaplama karmaşıklıkları nedeniyle bir kerede ayarlanan tüm eğitime veya gürültü sorunları nedeniyle bir veri noktasına dayanmaz. Bunun yerine, güncelleme adımı bir toplu işdeki veri noktalarının sayısının ayarlayabileceğimiz bir hiperparametre olduğu mini kümelerle yapılır. Veriler mini-kümeler halinde alınır. + +
+ + +**23. Loss function ― In order to quantify how a given model performs, the loss function L is usually used to evaluate to what extent the actual outputs y are correctly predicted by the model outputs z.** + +⟶ Yitim fonksiyonu ― Belirli bir modelin nasıl bir performans gösterdiğini ölçmek için, L yitim (kayıp) fonksiyonu genellikle y gerçek çıktıların, z model çıktıları tarafından ne kadar doğru tahmin edildiğini değerlendirmek için kullanılır. + +
+ + +**24. Cross-entropy loss ― In the context of binary classification in neural networks, the cross-entropy loss L(z,y) is commonly used and is defined as follows:** + +⟶ Çapraz-entropi kaybı ― Yapay sinir ağlarında ikili sınıflandırma bağlamında, çapraz entropi kaybı L (z, y) yaygın olarak kullanılır ve şöyle tanımlanır: + +
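A small sketch of the binary cross-entropy loss of item 24, averaged over a batch (the labels and predicted probabilities are made up):

```python
import numpy as np

def binary_cross_entropy(z, y, eps=1e-12):
    """z: predicted probabilities, y: true labels in {0, 1}."""
    z = np.clip(z, eps, 1 - eps)           # avoid log(0)
    return -np.mean(y * np.log(z) + (1 - y) * np.log(1 - z))

y = np.array([1, 0, 1, 1])
z = np.array([0.9, 0.2, 0.7, 0.99])
print(binary_cross_entropy(z, y))
```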
+ + +**25. Finding optimal weights** + +⟶ Optimum ağırlıkların bulunması + +
+ + +**26. Backpropagation ― Backpropagation is a method to update the weights in the neural network by taking into account the actual output and the desired output. The derivative with respect to each weight w is computed using the chain rule.** + +⟶ Geriye yayılım ― Geri yayılım, asıl çıktıyı ve istenen çıktıyı dikkate alarak sinir ağındaki ağırlıkları güncellemek için kullanılan bir yöntemdir. Her bir ağırlığa göre türev, zincir kuralı kullanılarak hesaplanır. + +
+ + +**27. Using this method, each weight is updated with the rule:** + +⟶ Bu yöntemi kullanarak, her ağırlık kurala göre güncellenir: + +
+ + +**28. Updating weights ― In a neural network, weights are updated as follows:** + +⟶ Ağırlıkların güncellenmesi ― Bir sinir ağında, ağırlıklar aşağıdaki gibi güncellenir: + +
+ + +**29. [Step 1: Take a batch of training data and perform forward propagation to compute the loss, Step 2: Backpropagate the loss to get the gradient of the loss with respect to each weight, Step 3: Use the gradients to update the weights of the network.]** + +⟶ [Adım 1: Bir küme eğitim verisi alın ve kaybı hesaplamak için ileriye doğru ilerleyin, Adım 2: Her ağırlığa göre kaybın gradyanını elde etmek için kaybı tekrar geriye doğru yayın, Adım 3: Ağın ağırlıklarını güncellemek için gradyanları kullanın.] + +<br>
+ + +**30. [Forward propagation, Backpropagation, Weights update]** + +⟶ [İleri yayılım, Geriye yayılım, Ağırlıkların güncellenmesi] + +<br>
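The three steps of items 29-30 can be sketched for a single linear layer trained with mini-batch gradient descent; the data, the learning rate and the use of a mean-squared-error loss (instead of cross-entropy, for brevity) are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(64, 3))                   # mini-batch of training data
true_w = np.array([2.0, -1.0, 0.5])
y = X @ true_w

w = np.zeros(3)
alpha = 0.1                                     # learning rate
for step in range(100):
    y_hat = X @ w                               # Step 1: forward propagation
    grad = 2 * X.T @ (y_hat - y) / len(X)       # Step 2: gradient of the MSE loss w.r.t. w
    w = w - alpha * grad                        # Step 3: update the weights
print(w)                                        # approaches true_w
```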
+ + +**31. Parameter tuning** + +⟶ Parametre ayarlama + +
+ + +**32. Weights initialization** + +⟶ Ağırlıkların başlangıçlandırılması + +
+ + +**33. Xavier initialization ― Instead of initializing the weights in a purely random manner, Xavier initialization enables to have initial weights that take into account characteristics that are unique to the architecture.** + +⟶ Xavier başlangıcı ― Ağırlıkları tamamen rastgele bir şekilde başlatmak yerine, Xavier ilklendirme, mimariye özgü özellikleri dikkate alan ilk ağırlıkların alınmasını sağlar. + +
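One common form of Xavier initialization (item 33) draws weights with a variance scaled by the fan-in and fan-out of the layer; the layer sizes below are arbitrary examples:

```python
import numpy as np

def xavier_init(n_in, n_out, rng=np.random.default_rng()):
    """Glorot/Xavier uniform initialization: U(-limit, limit), limit = sqrt(6/(n_in+n_out))."""
    limit = np.sqrt(6.0 / (n_in + n_out))
    return rng.uniform(-limit, limit, size=(n_in, n_out))

W = xavier_init(256, 128)
print(W.shape, W.std())   # std close to sqrt(2 / (n_in + n_out))
```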
+ + +**34. Transfer learning ― Training a deep learning model requires a lot of data and more importantly a lot of time. It is often useful to take advantage of pre-trained weights on huge datasets that took days/weeks to train, and leverage it towards our use case. Depending on how much data we have at hand, here are the different ways to leverage this:** + +⟶ Transfer öğrenme ― Bir derin öğrenme modelini eğitmek çok fazla veri ve daha da önemlisi çok zaman gerektirir. Kullanım durumumuza yönelik eğitim yapmak ve güçlendirmek için günler/haftalar süren dev veri setleri üzerinde önceden eğitilmiş ağırlıklardan yararlanmak genellikle yararlıdır. Elimizdeki ne kadar veri olduğuna bağlı olarak, aşağıdakilerden yararlanmanın farklı yolları: + +
+ + +**35. [Training size, Illustration, Explanation]** + +⟶ [Eğitim boyutu, Görselleştirme, Açıklama] + +
+ + +**36. [Small, Medium, Large]** + +⟶ [Küçük, Orta, Büyük] + +
+ + +**37. [Freezes all layers, trains weights on softmax, Freezes most layers, trains weights on last layers and softmax, Trains weights on layers and softmax by initializing weights on pre-trained ones]** + +⟶ [Tüm katmanlar dondurulur, Softmax'taki ağırlıkları eğitilir, Çoğu katmanlar dondurulur, son katmanlar ve softmax katmanı ağırlıklar ile eğitilir, Önceden eğitilerek elde edilen ağırlıkları kullanarak katmanlar ve softmax için kullanır] + +
+ + +**38. Optimizing convergence** + +⟶ Yakınsamayı optimize etmek + +
+ + +**39. Learning rate ― The learning rate, often noted α or sometimes η, indicates at which pace the weights get updated. It can be fixed or adaptively changed. The current most popular method is called Adam, which is a method that adapts the learning rate.** + +⟶ Öğrenme oranı ― Genellikle α veya bazen η olarak belirtilen öğrenme oranı, ağırlıkların hangi hızda güncellendiğini belirler. Sabitlenebilir veya uyarlanabilir şekilde değiştirilebilir. Mevcut en popüler yöntemin adı Adam'dır ve öğrenme hızını ayarlayan bir yöntemdir. + +<br>
+ +**40. Adaptive learning rates ― Letting the learning rate vary when training a model can reduce the training time and improve the numerical optimal solution. While Adam optimizer is the most commonly used technique, others can also be useful. They are summed up in the table below:** + +⟶ Uyarlanabilir öğrenme oranları ― Bir modelin eğitilmesi sırasında öğrenme oranının değişmesine izin vermek eğitim süresini kısaltabilir ve sayısal optimum çözümü iyileştirebilir. Adam optimizasyonu yöntemi en çok kullanılan teknik olmasına rağmen, diğer yöntemler de faydalı olabilir. Bunlar aşağıdaki tabloda özetlenmiştir: + +<br>
+ + +**41. [Method, Explanation, Update of w, Update of b]** + +⟶ [Yöntem, Açıklama, w'ların güncellenmesi, b'nin güncellenmesi] + +
+ + +**42. [Momentum, Dampens oscillations, Improvement to SGD, 2 parameters to tune]** + +⟶ [Momentum, Osilasyonların azaltılması/yumuşatılması, SGD (Stokastik Gradyan/Bayır İniş) iyileştirmesi, Ayarlanacak 2 parametre] + +
+ + +**43. [RMSprop, Root Mean Square propagation, Speeds up learning algorithm by controlling oscillations]** + +⟶ [RMSprop, Ortalama Karekök yayılımı, Osilasyonları kontrol ederek öğrenme algoritmasını hızlandırır] + +
+ + +**44. [Adam, Adaptive Moment estimation, Most popular method, 4 parameters to tune]** + +⟶ [Adam, Uyarlamalı Moment tahmini/kestirimi, En popüler yöntem, Ayarlanacak 4 parametre] + +
+ + +**45. Remark: other methods include Adadelta, Adagrad and SGD.** + +⟶ Not: diğer yöntemler arasında Adadelta, Adagrad ve SGD sayılabilir. + +<br>
+ + +**46. Regularization** + +⟶ Düzenlileştirme + +
+ + +**47. Dropout ― Dropout is a technique used in neural networks to prevent overfitting the training data by dropping out neurons with probability p>0. It forces the model to avoid relying too much on particular sets of features.** + +⟶ Seyreltme ― Seyreltme, sinir ağlarında, p>0 olasılıklı nöronları silerek eğitim verilerinin fazla kullanılmaması için kullanılan bir tekniktir. Modeli, belirli özellik kümelerine çok fazla güvenmekten kaçınmaya zorlar. + +
+ + +**48. Remark: most deep learning frameworks parametrize dropout through the 'keep' parameter 1−p.** + +⟶ Not: Çoğunlukla derin öğrenme kütüphanleri, 'keep' ('tutma') parametresi 1−p aracılığıyla seyreltmeyi parametrize eder. + +
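A sketch of inverted dropout as described in items 47-48: units are kept with probability 1-p (the 'keep' parameter) and the surviving activations are rescaled so their expected value is unchanged; the keep probability used here is an arbitrary example:

```python
import numpy as np

def dropout(a, keep_prob=0.8, rng=np.random.default_rng()):
    """Inverted dropout: zero out units with probability 1 - keep_prob, rescale the rest."""
    mask = rng.random(a.shape) < keep_prob
    return a * mask / keep_prob          # only applied at training time

activations = np.ones((4, 5))
print(dropout(activations))
```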
+ + +**49. Weight regularization ― In order to make sure that the weights are not too large and that the model is not overfitting the training set, regularization techniques are usually performed on the model weights. The main ones are summed up in the table below:** + +⟶ Ağırlık düzenlileştirme ― Ağırlıkların çok büyük olmadığından ve modelin eğitim setine uygun olmadığından emin olmak için, genellikle model ağırlıklarında düzenlileştirme teknikleri uygulanır. Temel olanlar aşağıdaki tabloda özetlenmiştir: + +
+ + +**50. [LASSO, Ridge, Elastic Net]** + +⟶ [LASSO, Ridge, Elastic Net] + +
+ +**50 bis. [Shrinks coefficients to 0, Good for variable selection, Makes coefficients smaller, Tradeoff between variable selection and small coefficients]** + +⟶ [Katsayıları 0'a düşürür, Değişken seçimi için iyi, Katsayıları daha küçük yapar, Değişken seçimi ile küçük katsayılar arasında ödünleşim sağlar] + +<br>
+ +**51. Early stopping ― This regularization technique stops the training process as soon as the validation loss reaches a plateau or starts to increase.** + +⟶ Erken durdurma ― Bu düzenleme tekniği, onaylama kaybı bir stabilliğe ulaştığında veya artmaya başladığında eğitim sürecini durdurur. + +
+ + +**52. [Error, Validation, Training, early stopping, Epochs]** + +⟶ [Hata, Geçerleme/Doğrulama, Eğitim, erken durdurma, Epochs] + +
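Early stopping (items 51-52) can be sketched as a simple check on the validation loss with a patience counter; the patience value and the loss sequence are made-up examples:

```python
def early_stopping(val_losses, patience=3):
    """Return (stop_epoch, best_epoch): stop when the validation loss has not
    improved for `patience` consecutive epochs."""
    best, best_epoch, waited = float("inf"), 0, 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best, best_epoch, waited = loss, epoch, 0
        else:
            waited += 1
            if waited >= patience:
                return epoch, best_epoch
    return len(val_losses) - 1, best_epoch

print(early_stopping([0.9, 0.7, 0.6, 0.61, 0.62, 0.65, 0.7]))  # stops after the plateau
```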
+ + +**53. Good practices** + +⟶ İyi uygulamalar + +
+ + +**54. Overfitting small batch ― When debugging a model, it is often useful to make quick tests to see if there is any major issue with the architecture of the model itself. In particular, in order to make sure that the model can be properly trained, a mini-batch is passed inside the network to see if it can overfit on it. If it cannot, it means that the model is either too complex or not complex enough to even overfit on a small batch, let alone a normal-sized training set.** + +⟶ Küçük kümelerin ezberlenmesi ― Bir modelde hata ayıklama yaparken, modelin mimarisinde büyük bir sorun olup olmadığını görmek için hızlı testler yapmak genellikle yararlıdır. Özellikle, modelin uygun şekilde eğitilebildiğinden emin olmak için, ezberleyecek mi diye görmek için ağ içinde bir mini küme ile eğitilir. Olmazsa, modelin normal boyutta bir eğitim setini bırakmadan, küçük bir kümeyi bile ezberleyecek kadar çok karmaşık ya da yeterince karmaşık olmadığı anlamına gelir. + +
+ + +**55. Gradient checking ― Gradient checking is a method used during the implementation of the backward pass of a neural network. It compares the value of the analytical gradient to the numerical gradient at given points and plays the role of a sanity-check for correctness.** + +⟶ Gradyanların kontrolü ― Gradyan kontrolü, bir sinir ağının geriye doğru geçişinin uygulanması sırasında kullanılan bir yöntemdir. Analitik gradyanların değerini verilen noktalardaki sayısal gradyanlarla karşılaştırır ve doğruluk için bir kontrol rolü oynar. + +
+ + +**56. [Type, Numerical gradient, Analytical gradient]** + +⟶ [Tip, Sayısal gradyan, Analitik gradyan] + +
+ + +**57. [Formula, Comments]** + +⟶ [Formül, Açıklamalar] + +
+ + +**58. [Expensive; loss has to be computed two times per dimension, Used to verify correctness of analytical implementation, Trade-off in choosing h not too small (numerical instability) nor too large (poor gradient approximation)]** + +⟶ [Maliyetli; Kayıp, boyut başına iki kere hesaplanmalı, Analitik uygulamanın doğruluğunu anlamak için kullanılır, Ne çok küçük (sayısal dengesizlik) ne de çok büyük (zayıf gradyan yaklaşımı) seçimi yapılmalı, bunun için ödünleşim gerekir] + +
+ + +**59. ['Exact' result, Direct computation, Used in the final implementation]** + +⟶ ['Kesin' sonuç, Doğrudan hesaplama, Son uygulamada kullanılır] + +
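A sketch of the gradient check of items 55-58: the analytical gradient is compared against the centered numerical estimate (f(w+h)-f(w-h))/(2h); the quadratic test loss and the step h are illustrative choices:

```python
import numpy as np

def numerical_gradient(f, w, h=1e-5):
    grad = np.zeros_like(w)
    for i in range(w.size):                    # loss computed twice per dimension
        e = np.zeros_like(w); e[i] = h
        grad[i] = (f(w + e) - f(w - e)) / (2 * h)
    return grad

f = lambda w: np.sum(w ** 2)                   # toy loss with known gradient 2w
w = np.array([1.0, -2.0, 3.0])
analytical = 2 * w
print(np.max(np.abs(numerical_gradient(f, w) - analytical)))   # should be tiny
```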
+ + +**60. The Deep Learning cheatsheets are now available in [target language].** + +⟶ Derin Öğrenme el kitabı şimdi [hedef dilde] mevcuttur. + +<br> + +**61. Original authors** + +⟶ Orijinal yazarlar + +<br>
+ +**62.Translated by X, Y and Z** + +⟶ X, Y ve Z tarafından çevrildi + +<br>
+ +**63.Reviewed by X, Y and Z** + +⟶ X, Y ve Z tarafından gözden geçirildi + +
+ +**64.View PDF version on GitHub** + +⟶ GitHub'da PDF sürümünü görüntüleyin + +
+ +**65.By X and Y** + +⟶ X ve Y tarafından + +
From 13bbe1c834623a7de4187e645a71a266edd80c1e Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?ayy=C3=BCce=20k=C4=B1zrak?= <32331090+ayyucekizrak@users.noreply.github.com> Date: Sun, 27 Jan 2019 23:45:38 +0300 Subject: [PATCH 092/531] Update deep-learning-tips-and-tricks.md --- tr/deep-learning-tips-and-tricks.md | 4 ---- 1 file changed, 4 deletions(-) diff --git a/tr/deep-learning-tips-and-tricks.md b/tr/deep-learning-tips-and-tricks.md index 1470a3875..871f2acd8 100644 --- a/tr/deep-learning-tips-and-tricks.md +++ b/tr/deep-learning-tips-and-tricks.md @@ -27,10 +27,6 @@ **5. [Training a neural network, Epoch, Mini-batch, Cross-entropy loss, Backpropagation, Gradient descent, Updating weights, Gradient checking]** -⟶ - -**5. [Training a neural network, Epoch, Mini-batch, Cross-entropy loss, Backpropagation, Gradient descent, Updating weights, Gradient checking]** - ⟶ [Bir sinir ağının eğitilmesi, Epoch, Mini-küme, Çapraz-entropy yitimi (kaybı), Geriye yayılım, Gradyan (Bayır) iniş, Ağırlıkların güncellenmesi, Gradyan kontrolü]
From d1c17f09c7b3fb199c43e50dcaf45b33904301e7 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?ayy=C3=BCce=20k=C4=B1zrak?= <32331090+ayyucekizrak@users.noreply.github.com> Date: Mon, 28 Jan 2019 00:23:21 +0300 Subject: [PATCH 093/531] [tr] Convolutional neural networks As well as I've completed the entire translation of this file. Please note that the review process can be started. Thank you! --- tr/convolutional-neural-networks.md | 712 ++++++++++++++++++++++++++++ 1 file changed, 712 insertions(+) create mode 100644 tr/convolutional-neural-networks.md diff --git a/tr/convolutional-neural-networks.md b/tr/convolutional-neural-networks.md new file mode 100644 index 000000000..ac5660982 --- /dev/null +++ b/tr/convolutional-neural-networks.md @@ -0,0 +1,712 @@ +**1. Convolutional Neural Networks cheatsheet** + +⟶ Evrişimli Sinir Ağları el kitabı + +
+ + +**2. CS 230 - Deep Learning** + +⟶ CS 230 - Derin Öğrenme + +
+ + +**3. [Overview, Architecture structure]** + +⟶ [Genel bakış, Mimari yapı] + +
+ + +**4. [Types of layer, Convolution, Pooling, Fully connected]** + +⟶ [Katman tipleri, Evrişim, Ortaklama, Tam bağlantı] + +
+ + +**5. [Filter hyperparameters, Dimensions, Stride, Padding]** + +⟶ [Filtre hiperparametreleri, Boyut, Adım aralığı/Adım kaydırma, Ekleme/Doldurma] + +
+ + +**6. [Tuning hyperparameters, Parameter compatibility, Model complexity, Receptive field]** + +⟶ [Hiperparametrelerin ayarlanması, Parametre uyumluluğu, Model karmaşıklığı, Receptive field] + +
+ + +**7. [Activation functions, Rectified Linear Unit, Softmax]** + +⟶ [Aktivasyon fonksiyonları, Düzeltilmiş Doğrusal Birim, Softmax] + +
+ + +**8. [Object detection, Types of models, Detection, Intersection over Union, Non-max suppression, YOLO, R-CNN]** + +⟶ [Nesne algılama, Model tipleri, Algılama, Kesiştirilmiş Bölgeler, Maksimum olmayan bastırma, YOLO, R-CNN] + +
+ + +**9. [Face verification/recognition, One shot learning, Siamese network, Triplet loss]** + +⟶ [Yüz doğrulama/tanıma, Tek atış öğrenme, Siamese ağ, Üçlü yitim/kayıp] + +
+ + +**10. [Neural style transfer, Activation, Style matrix, Style/content cost function]** + +⟶ [Sinirsel stil aktarımı, Aktivasyon, Stil matrisi, Stil/içerik maliyet fonksiyonu] + +
+ + +**11. [Computational trick architectures, Generative Adversarial Net, ResNet, Inception Network]** + +⟶ [İşlemsel püf nokta mimarileri, Çekişmeli Üretici Ağ, ResNet, Inception Ağı] + +
+ + +**12. Overview** + +⟶ Genel bakış + +
+ + +**13. Architecture of a traditional CNN ― Convolutional neural networks, also known as CNNs, are a specific type of neural networks that are generally composed of the following layers:** + +⟶ Geleneksel bir CNN (Evrişimli Sinir Ağı) mimarisi - CNN'ler olarak da bilinen evrişimli sinir ağları, genellikle aşağıdaki katmanlardan oluşan belirli bir tür sinir ağıdır: + +
+ + +**14. The convolution layer and the pooling layer can be fine-tuned with respect to hyperparameters that are described in the next sections.** + +⟶ Evrişim katmanı ve ortaklama katmanı, sonraki bölümlerde açıklanan hiperparametreler ile ince ayar (fine-tuned) yapılabilir. + +
+ + +**15. Types of layer** + +⟶ Katman tipleri + +
+ + +**16. Convolution layer (CONV) ― The convolution layer (CONV) uses filters that perform convolution operations as it is scanning the input I with respect to its dimensions. Its hyperparameters include the filter size F and stride S. The resulting output O is called feature map or activation map.** + +⟶ Evrişim katmanı (CONV) ― Evrişim katmanı (CONV) evrişim işlemlerini gerçekleştiren filtreleri, I girişini boyutlarına göre tararken kullanır. Hiperparametreleri F filtre boyutunu ve S adımını içerir. Elde edilen çıktı O, öznitelik haritası veya aktivasyon haritası olarak adlandırılır. + +
+ + +**17. Remark: the convolution step can be generalized to the 1D and 3D cases as well.** + +⟶ Not: evrişim adımı, 1B ve 3B durumlarda da genelleştirilebilir (B: boyut). + +
+ + +**18. Pooling (POOL) ― The pooling layer (POOL) is a downsampling operation, typically applied after a convolution layer, which does some spatial invariance. In particular, max and average pooling are special kinds of pooling where the maximum and average value is taken, respectively.** + +⟶ Ortaklama (POOL) - Ortaklama katmanı (POOL), tipik olarak bir miktar uzamsal değişkenlik gösteren bir evrişim katmanından sonra uygulanan bir örnekleme işlemidir. Özellikle, maksimum ve ortalama ortaklama, sırasıyla maksimum ve ortalama değerin alındığı özel ortaklama türleridir. + +
+ + +**19. [Type, Purpose, Illustration, Comments]** + +⟶ [Tip, Amaç, Görsel Açıklama, Açıklama] + +
+ + +**20. [Max pooling, Average pooling, Each pooling operation selects the maximum value of the current view, Each pooling operation averages the values of the current view]** + +⟶ [Maksimum ortaklama, Ortalama ortaklama, Her ortaklama işlemi, geçerli matrisin maksimum değerini seçer, Her ortaklama işlemi, geçerli matrisin değerlerinin ortalaması alır.] + +
+ + +**21. [Preserves detected features, Most commonly used, Downsamples feature map, Used in LeNet]** + +⟶ [Algılanan özellikleri korur, En çok kullanılan, Boyut azaltarak örneklenmiştelik öznitelik haritası, LeNet'te kullanılmış] + +
+ + +**22. Fully Connected (FC) ― The fully connected layer (FC) operates on a flattened input where each input is connected to all neurons. If present, FC layers are usually found towards the end of CNN architectures and can be used to optimize objectives such as class scores.** + +⟶ Tam Bağlantı (FC) ― Tam bağlı katman (FC), her girişin tüm nöronlara bağlı olduğu bir giriş üzerinde çalışır. Eğer varsa, FC katmanları genellikle CNN mimarisinin sonuna doğru bulunur ve sınıf skorları gibi hedefleri optimize etmek için kullanılabilir. + +
+ + +**23. Filter hyperparameters** + +⟶ Hiperparametrelerin filtrelenmesi + +
+ + +**24. The convolution layer contains filters for which it is important to know the meaning behind its hyperparameters.** + +⟶ Evrişim katmanı, hiperparametrelerinin ardındaki anlamı bilmenin önemli olduğu filtreler içerir. + +
+ + +**25. Dimensions of a filter ― A filter of size F×F applied to an input containing C channels is a F×F×C volume that performs convolutions on an input of size I×I×C and produces an output feature map (also called activation map) of size O×O×1.** + +⟶ Bir filtrenin boyutları - C kanalları içeren bir girişe uygulanan F×F boyutunda bir filtre, I×I×C boyutundaki bir girişte evrişim gerçekleştiren ve aynı zamanda bir çıkış özniteliği haritası üreten F aktivitesi (aktivasyon olarak da adlandırılır) O) O×O×1 boyutunda harita. + +
+ + +**26. Filter** + +⟶ Filtre + +
+ + +**27. Remark: the application of K filters of size F×F results in an output feature map of size O×O×K.** + +⟶ Not: F×F boyutunda K filtrelerinin uygulanması, O×O×K boyutunda bir çıktı öznitelik haritasının oluşmasını sağlar. + +
+ + +**28. Stride ― For a convolutional or a pooling operation, the stride S denotes the number of pixels by which the window moves after each operation.** + +⟶ Adım aralığı ― Evrişimli veya bir ortaklama işlemi için, S adımı (adım aralığı), her işlemden sonra pencerenin hareket ettiği piksel sayısını belirtir. + +
+ + +**29. Zero-padding ― Zero-padding denotes the process of adding P zeroes to each side of the boundaries of the input. This value can either be manually specified or automatically set through one of the three modes detailed below:** + +⟶ Sıfır ekleme/doldurma ― Sıfır ekleme/doldurma, girişin sınırlarının her bir tarafına P sıfır ekleme işlemini belirtir. Bu değer manuel olarak belirlenebilir veya aşağıda detaylandırılan üç moddan biri ile otomatik olarak ayarlanabilir: + +
+ + +**30. [Mode, Value, Illustration, Purpose, Valid, Same, Full]** + +⟶ [Mod, Değer, Görsel Açıklama, Amaç, Geçerli, Aynı, Tüm] + +
+ + +**31. [No padding, Drops last convolution if dimensions do not match, Padding such that feature map size has size ⌈IS⌉, Output size is mathematically convenient, Also called 'half' padding, Maximum padding such that end convolutions are applied on the limits of the input, Filter 'sees' the input end-to-end]** + +⟶ [Ekleme/doldurma yok, Boyutlar uyuşmuyorsa son evrişimi düşürür, Öznitelik harita büyüklüğüne sahip ekleme/doldurma ⌈IS⌉, Çıktı boyutu matematiksel olarak uygundur, 'Yarım' ekleme olarak da bilinir, Son konvolüsyonların giriş sınırlarına uygulandığı maksimum ekleme, Filtre girişi uçtan uca "görür"] + +
+ + +**32. Tuning hyperparameters** + +⟶ Hiperparametreleri ayarlama + +
+ + +**33. Parameter compatibility in convolution layer ― By noting I the length of the input volume size, F the length of the filter, P the amount of zero padding, S the stride, then the output size O of the feature map along that dimension is given by:** + +⟶ Evrişim katmanında parametre uyumu - Girdinin hacim büyüklüğü I uzunluğu, F filtresinin uzunluğu, P sıfır ekleme miktarı, S adım aralığı, daha sonra bu boyut boyunca öznitelik haritasının O çıkış büyüklüğü belirtilir: + +
+ + +**34. [Input, Filter, Output]** + +⟶ [Giriş, Filtre, Çıktı] + +
+ + +**35. Remark: often times, Pstart=Pend≜P, in which case we can replace Pstart+Pend by 2P in the formula above.** + +⟶ Not: çoğunlukla, Pstart=Pend≜P, bu durumda Pstart+Pend'i yukarıdaki formülde 2P ile değiştirebiliriz. + +
+ + +**36. Understanding the complexity of the model ― In order to assess the complexity of a model, it is often useful to determine the number of parameters that its architecture will have. In a given layer of a convolutional neural network, it is done as follows:** + +⟶ Modelin karmaşıklığını anlama - Bir modelin karmaşıklığını değerlendirmek için mimarisinin sahip olacağı parametrelerin sayısını belirlemek genellikle yararlıdır. Bir evrişimsli sinir ağının belirli bir katmanında, aşağıdaki şekilde yapılır: + +
+ + +**37. [Illustration, Input size, Output size, Number of parameters, Remarks]** + +⟶ [Görsel Açıklama, Giriş boyutu, Çıkış boyutu, Parametre sayısı, Not] + +
+ + +**38. [One bias parameter per filter, In most cases, S + + +**39. [Pooling operation done channel-wise, In most cases, S=F]** + +⟶ [Ortaklama işlemi kanal bazında yapılır, Çoğu durumda S=F] + +
+ + +**40. [Input is flattened, One bias parameter per neuron, The number of FC neurons is free of structural constraints]** + +⟶ [Giriş bağlantılanmış, Nöron başına bir bias parametresi, tam bağlantı (FC) nöronlarının sayısı yapısal kısıtlamalardan arındırılmış] + +
+ + +**41. Receptive field ― The receptive field at layer k is the area denoted Rk×Rk of the input that each pixel of the k-th activation map can 'see'. By calling Fj the filter size of layer j and Si the stride value of layer i and with the convention S0=1, the receptive field at layer k can be computed with the formula:** + +⟶ Evrişim sonucu oluşan haritanın boyutu ― K katmanında filtre çıkışı, k-inci aktivasyon haritasının her bir pikselinin 'görebileceği' girişin Rk×Rk olarak belirtilen alanını ifade eder. Fj, j ve Si katmanlarının filtre boyutu, i katmanının adım aralığı ve S0=1 (ilk adım aralığının 1 seçilmesi durumu) kuralıyla, k katmanındaki işlem sonucunda elde edilen aktivasyon haritasının boyutları bu formülle hesaplanabilir: + +
+ + +**42. In the example below, we have F1=F2=3 and S1=S2=1, which gives R2=1+2⋅1+2⋅1=5.** + +⟶ Aşağıdaki örnekte, F1=F2=3 ve S1=S2=1 için R2=1+2⋅1+2⋅1=5 sonucu elde edilir. + +
+ + +**43. Commonly used activation functions** + +⟶ Yaygın olarak kullanılan aktivasyon fonksiyonları + +
+ + +**44. Rectified Linear Unit ― The rectified linear unit layer (ReLU) is an activation function g that is used on all elements of the volume. It aims at introducing non-linearities to the network. Its variants are summarized in the table below:** + +⟶ Düzeltilmiş Doğrusal Birim ― Düzeltilmiş doğrusal birim katmanı (ReLU), (g)'nin tüm elemanlarında kullanılan bir aktivasyon fonksiyonudur. Doğrusal olmamaları ile ağın öğrenmesi amaçlanmaktadır. Çeşitleri aşağıdaki tabloda özetlenmiştir: + +
+ + +**45. [ReLU, Leaky ReLU, ELU, with]** + +⟶[ReLU, Sızıntı ReLU, ELU, ile] + +
+ + +**46. [Non-linearity complexities biologically interpretable, Addresses dying ReLU issue for negative values, Differentiable everywhere]** + +⟶ [Doğrusal olmama karmaşıklığı biyolojik olarak yorumlanabilir, Negatif değerler için ölen ReLU sorununu giderir, Her yerde türevlenebilir] + +
+ + +**47. Softmax ― The softmax step can be seen as a generalized logistic function that takes as input a vector of scores x∈Rn and outputs a vector of output probability p∈Rn through a softmax function at the end of the architecture. It is defined as follows:** + +⟶ Softmax ― Softmax adımı, x∈Rn skorlarının bir vektörünü girdi olarak alan ve mimarinin sonunda softmax fonksiyonundan p∈Rn çıkış olasılık vektörünü oluşturan genelleştirilmiş bir lojistik fonksiyon olarak görülebilir. Aşağıdaki gibi tanımlanır: + +
+ + +**48. where** + +⟶ buna karşılık + +
+ + +**49. Object detection** + +⟶ Nesne tanıma + +
+ + +**50. Types of models ― There are 3 main types of object recognition algorithms, for which the nature of what is predicted is different. They are described in the table below:** + +⟶ Model tipleri ― Burada, nesne tanıma algoritmasının doğası gereği 3 farklı kestirim türü vardır. Aşağıdaki tabloda açıklanmıştır: + +
+ + +**51. [Image classification, Classification w. localization, Detection]** + +⟶ [Görüntü sınıflandırma, Sınıflandırma ve lokalizasyon (konumlama), Tanıma] + +
+ + +**52. [Teddy bear, Book]** + +⟶ [Oyuncak ayı, Kitap] + +
+ + +**53. [Classifies a picture, Predicts probability of object, Detects an object in a picture, Predicts probability of object and where it is located, Detects up to several objects in a picture, Predicts probabilities of objects and where they are located]** + +⟶ [Bir görüntüyü sınıflandırır, Nesnenin olasılığını tahmin eder, Görüntüdeki bir nesneyi algılar/tanır, Nesnenin olasılığını ve bulunduğu yeri tahmin eder, Bir görüntüdeki birden fazla nesneyi algılar, Nesnelerin olasılıklarını ve nerede olduklarını tahmin eder] + +
+ + +**54. [Traditional CNN, Simplified YOLO, R-CNN, YOLO, R-CNN]** + +⟶ [Geleneksel CNN, Basitleştirilmiş YOLO (You-Only-Look-Once), R-CNN (R: Region - Bölge), YOLO, R-CNN] + +
+ + +**55. Detection ― In the context of object detection, different methods are used depending on whether we just want to locate the object or detect a more complex shape in the image. The two main ones are summed up in the table below:** + +⟶ Tanıma ― Nesne tespiti bağlamında, nesneyi konumlandırmak veya görüntüdeki daha karmaşık bir şekli tespit etmek isteyip istemediğimize bağlı olarak farklı yöntemler kullanılır. İki ana tablo aşağıdaki tabloda özetlenmiştir: + +
+ + +**56. [Bounding box detection, Landmark detection]** + +⟶ [Sınırlayıcı kutu ile tespit, Karakteristik nokta tanıma] + +
+ + +**57. [Detects the part of the image where the object is located, Detects a shape or characteristics of an object (e.g. eyes), More granular]** + +⟶ [Görüntüde nesnenin bulunduğu yeri algılar, Bir nesnenin şeklini veya özelliklerini algılar (örneğin gözler), Daha ayrıntılı] + +
+ + +**58. [Box of center (bx,by), height bh and width bw, Reference points (l1x,l1y), ..., (lnx,lny)]** + +⟶ [Kutu merkezi (bx,by), yükseklik bh ve genişlik bw, Referans noktalar (l1x,l1y), ..., (lnx,lny)] + +
+ + +**59. Intersection over Union ― Intersection over Union, also known as IoU, is a function that quantifies how correctly positioned a predicted bounding box Bp is over the actual bounding box Ba. It is defined as:** + +⟶ Kesiştirilmiş Bölgeler - Kesiştirilmiş Bölgeler, IoU (Intersection over Union) olarak da bilinir, Birleştirilmiş sınırlama kutusu, tahmin edilen sınırlama kutusu (Bp) ile gerçek sınırlama kutusu Ba üzerinde ne kadar doğru konumlandırıldığını ölçen bir fonksiyondur. Olarak tanımlanır: + +
+ + +**60. Remark: we always have IoU∈[0,1]. By convention, a predicted bounding box Bp is considered as being reasonably good if IoU(Bp,Ba)⩾0.5.** + +⟶ Not: Her zaman IoU∈ [0,1] ile başlarız. Kural olarak, Öngörülen bir sınırlama kutusu Bp, IoU (Bp, Ba)⩾0.5 olması durumunda makul derecede iyi olarak kabul edilir. + +
+ + +**61. Anchor boxes ― Anchor boxing is a technique used to predict overlapping bounding boxes. In practice, the network is allowed to predict more than one box simultaneously, where each box prediction is constrained to have a given set of geometrical properties. For instance, the first prediction can potentially be a rectangular box of a given form, while the second will be another rectangular box of a different geometrical form.** + +⟶ Öneri (Anchor) kutular, örtüşen sınırlayıcı kutuları öngörmek için kullanılan bir tekniktir. Uygulamada, ağın aynı anda birden fazla kutuyu tahmin etmesine izin verilir, burada her kutu tahmini belirli bir geometrik öznitelik setine sahip olmakla sınırlıdır. Örneğin, ilk tahmin potansiyel olarak verilen bir formun dikdörtgen bir kutusudur, ikincisi ise farklı bir geometrik formun başka bir dikdörtgen kutusudur. + +
+ + +**62. Non-max suppression ― The non-max suppression technique aims at removing duplicate overlapping bounding boxes of a same object by selecting the most representative ones. After having removed all boxes having a probability prediction lower than 0.6, the following steps are repeated while there are boxes remaining:** + +⟶ Maksimum olmayan bastırma - Maksimum olmayan bastırma tekniği, nesne için yinelenen ve örtüşen öneri kutuları içinde en uygun temsilleri seçerek örtüşmesi düşük olan kutuları kaldırmayı amaçlar. Olasılık tahmini 0.6'dan daha düşük olan tüm kutuları çıkardıktan sonra, kalan kutular ile aşağıdaki adımlar tekrarlanır: + +
+ + +**63. [For a given class, Step 1: Pick the box with the largest prediction probability., Step 2: Discard any box having an IoU⩾0.5 with the previous box.]** + +⟶ [Verilen bir sınıf için, Adım 1: En büyük tahmin olasılığı olan kutuyu seçin., Adım 2: Önceki kutuyla IoU⩾0.5 olan herhangi bir kutuyu çıkarın.] + +
+ + +**64. [Box predictions, Box selection of maximum probability, Overlap removal of same class, Final bounding boxes]** + +⟶ [Kutu tahmini/kestirimi, Maksimum olasılığa göre kutu seçimi, Aynı sınıf için örtüşme kaldırma, Son sınırlama kutuları] + +
+ + +**65. YOLO ― You Only Look Once (YOLO) is an object detection algorithm that performs the following steps:** + +⟶ YOLO ― You Only Look Once (YOLO), aşağıdaki adımları uygulayan bir nesne algılama algoritmasıdır: + +
+ + +**66. [Step 1: Divide the input image into a G×G grid., Step 2: For each grid cell, run a CNN that predicts y of the following form:, repeated k times]** + +⟶ [Adım 1: Giriş görüntüsünü G×G kare parçalara (hücrelere) bölün., Adım 2: Her bir hücre için, aşağıdaki formdan y'yi öngören bir CNN çalıştırın: k kez tekrarlayın] + +
+ + +**67. where pc is the probability of detecting an object, bx,by,bh,bw are the properties of the detected bouding box, c1,...,cp is a one-hot representation of which of the p classes were detected, and k is the number of anchor boxes.** + +⟶ pc'nin bir nesneyi algılama olasılığı olduğu durumlarda, bx, by, bh, bw tespit edilen olası sınırlayıcı kutusunun özellikleridir, cl, ..., cp, p sınıflarının tespit edilen one-hot temsildir ve k öneri (anchor) kutularının sayısıdır. + +
+ + +**68. Step 3: Run the non-max suppression algorithm to remove any potential duplicate overlapping bounding boxes.** + +⟶ Adım3: Potansiyel yineli çakışan sınırlayıcı kutuları kaldırmak için maksimum olmayan bastırma algoritmasını çalıştır. + +
+ + +**69. [Original image, Division in GxG grid, Bounding box prediction, Non-max suppression]** + +⟶ [Orijinal görüntü, GxG kare parçalara (hücrelere) bölünmesi, Sınırlayıcı kutu kestirimi, Maksimum olmayan bastırma] + +
+ + +**70. Remark: when pc=0, then the network does not detect any object. In that case, the corresponding predictions bx,...,cp have to be ignored.** + +⟶ Not: pc=0 olduğunda, ağ herhangi bir nesne algılamamaktadır. Bu durumda, ilgili bx, ..., cp tahminleri dikkate alınmamalıdır. + +
+ + +**71. R-CNN ― Region with Convolutional Neural Networks (R-CNN) is an object detection algorithm that first segments the image to find potential relevant bounding boxes and then run the detection algorithm to find most probable objects in those bounding boxes.** + +⟶ R-CNN - Evrişimli Sinir Ağları ile Bölge Bulma (R-CNN), potansiyel olarak sınırlayıcı kutuları bulmak için görüntüyü bölütleyen (segmente eden) ve daha sonra sınırlayıcı kutularda en olası nesneleri bulmak için algılama algoritmasını çalıştıran bir nesne algılama algoritmasıdır. + +
+ + +**72. [Original image, Segmentation, Bounding box prediction, Non-max suppression]** + +⟶ [Orijinal görüntü, Bölütleme (Segmentasyon), Sınırlayıcu kutu kestirimi, Maksimum olmayan bastırma] + +
+ + +**73. Remark: although the original algorithm is computationally expensive and slow, newer architectures enabled the algorithm to run faster, such as Fast R-CNN and Faster R-CNN.** + +⟶ Not: Orijinal algoritma hesaplamalı olarak maliyetli ve yavaş olmasına rağmen, yeni mimariler algoritmanın Hızlı R-CNN ve Daha Hızlı R-CNN gibi daha hızlı çalışmasını sağlamıştır. + +
+ + +**74. Face verification and recognition** + +⟶ Yüz doğrulama ve tanıma + +
+ + +**75. Types of models ― Two main types of model are summed up in table below:** + +⟶ Model tipleri ― İki temel model aşağıdaki tabloda özetlenmiştir: + +
+ + +**76. [Face verification, Face recognition, Query, Reference, Database]** + +⟶ [Yüz doğrulama, Yüz tanıma, Sorgu, Kaynak, Veri tabanı] + +
+ + +**77. [Is this the correct person?, One-to-one lookup, Is this one of the K persons in the database?, One-to-many lookup]** + +⟶ [Bu doğru kişi mi?, Bire bir arama, Veritabanındaki K kişilerden biri mi?, Bire-çok arama] + +
+ + +**78. One Shot Learning ― One Shot Learning is a face verification algorithm that uses a limited training set to learn a similarity function that quantifies how different two given images are. The similarity function applied to two images is often noted d(image 1,image 2).** + +⟶ Tek Atış (Onr-Shot) Öğrenme - Tek Atış Öğrenme, verilen iki görüntünün ne kadar farklı olduğunu belirleyen benzerlik fonksiyonunu öğrenmek için sınırlı bir eğitim seti kullanan bir yüz doğrulama algoritmasıdır. İki resme uygulanan benzerlik fonksiyonu sıklıkla kaydedilir (resim 1, resim 2). + +
+ + +**79. Siamese Network ― Siamese Networks aim at learning how to encode images to then quantify how different two images are. For a given input image x(i), the encoded output is often noted as f(x(i)).** + +⟶ Siyam (Siamese) Ağı - Siyam Ağı, iki görüntünün ne kadar farklı olduğunu ölçmek için görüntülerin nasıl kodlanacağını öğrenmeyi amaçlar. Belirli bir giriş görüntüsü x(i) için kodlanmış çıkış genellikle f(x(i)) olarak alınır. + +
+ + +**80. Triplet loss ― The triplet loss ℓ is a loss function computed on the embedding representation of a triplet of images A (anchor), P (positive) and N (negative). The anchor and the positive example belong to a same class, while the negative example to another one. By calling α∈R+ the margin parameter, this loss is defined as follows:** + +⟶ Üçlü kayıp - Üçlü kayıp ℓ, A (öneri), P (pozitif) ve N (negatif) görüntülerinin üçlüsünün gömülü gösterimde hesaplanan bir kayıp fonksiyonudur. Öneri ve pozitif örnek aynı sınıfa aitken, negatif örnek bir diğerine aittir. α∈R+ marjın parametresini çağırarak, bu kayıp aşağıdaki gibi tanımlanır: + +
+ + +**81. Neural style transfer** + +⟶ Sinirsel stil transferi (aktarımı) + +
+ + +**82. Motivation ― The goal of neural style transfer is to generate an image G based on a given content C and a given style S.** + +⟶ Motivasyon ― Sinirsel stil transferinin amacı, verilen bir C içeriğine ve verilen bir S stiline dayanan bir G görüntüsü oluşturmaktır. + +
+ + +**83. [Content C, Style S, Generated image G]** + +⟶ [İçerik C, Stil S, Oluşturulan görüntü G] + +
+ + +**84. Activation ― In a given layer l, the activation is noted a[l] and is of dimensions nH×nw×nc** + +⟶ Aktivasyon ― Belirli bir l katmanında, aktivasyon [l] olarak gösterilir ve nH×nw×nc boyutlarındadır + +
+ + +**85. Content cost function ― The content cost function Jcontent(C,G) is used to determine how the generated image G differs from the original content image C. It is defined as follows:** + +⟶ İçerik maliyeti fonksiyonu ― İçerik maliyeti fonksiyonu Jcontent(C,G), G oluşturulan görüntüsünün, C orijinal içerik görüntüsünden ne kadar farklı olduğunu belirlemek için kullanılır.Aşağıdaki gibi tanımlanır: + +
+ + +**86. Style matrix ― The style matrix G[l] of a given layer l is a Gram matrix where each of its elements G[l]kk′ quantifies how correlated the channels k and k′ are. It is defined with respect to activations a[l] as follows:** + +⟶ Stil matrisi - Stil matrisi G[l], belirli bir l katmanının her birinin G[l]kk′ elemanlarının k ve k′ kanallarının ne kadar ilişkili olduğunu belirlediği bir Gram matristir. A[l] aktivasyonlarına göre aşağıdaki gibi tanımlanır: + +
+ + +**87. Remark: the style matrix for the style image and the generated image are noted G[l] (S) and G[l] (G) respectively.** + +⟶ Not: Stil görüntüsü ve oluşturulan görüntü için stil matrisi, sırasıyla G[l] (S) ve G[l] (G) olarak belirtilmiştir. + +
+ + +**88. Style cost function ― The style cost function Jstyle(S,G) is used to determine how the generated image G differs from the style S. It is defined as follows:** + +⟶ Stil maliyeti fonksiyonu - Stil maliyeti fonksiyonu Jstyle(S,G), oluşturulan G görüntüsünün S stilinden ne kadar farklı olduğunu belirlemek için kullanılır. Aşağıdaki gibi tanımlanır: + +
+ + +**89. Overall cost function ― The overall cost function is defined as being a combination of the content and style cost functions, weighted by parameters α,β, as follows:** + +⟶ Genel maliyet fonksiyonu - Genel maliyet fonksiyonu, α, β parametreleriyle ağırlıklandırılan içerik ve stil maliyet fonksiyonlarının bir kombinasyonu olarak tanımlanır: + +
+ + +**90. Remark: a higher value of α will make the model care more about the content while a higher value of β will make it care more about the style.** + +⟶ Not: yüksek bir α değeri modelin içeriğe daha fazla önem vermesini sağlarken, yüksek bir β değeri de stile önem verir. + +
+ + +**91. Architectures using computational tricks** + +⟶ Hesaplama ipuçları kullanan mimariler + +
+ + +**92. Generative Adversarial Network ― Generative adversarial networks, also known as GANs, are composed of a generative and a discriminative model, where the generative model aims at generating the most truthful output that will be fed into the discriminative which aims at differentiating the generated and true image.** + +⟶ Çekişmeli Üretici Ağlar - GAN olarak da bilinen çekişmeli üretici ağlar, modelin üretici denen ve gerçek imajı ayırt etmeyi amaçlayan ayırıcıya beslenecek en doğru çıktının oluşturulmasını amaçladığı üretici ve ayırt edici bir modelden oluşur. + +
+ + +**93. [Training, Noise, Real-world image, Generator, Discriminator, Real Fake]** + +⟶ [Eğitim, Gürültü, Gerçek dünya görüntüsü, Üretici, Ayırıcı, Gerçek Sahte] + +
+ + +**94. Remark: use cases using variants of GANs include text to image, music generation and synthesis.** + +⟶ Not: GAN'ın kullanım alanları, yazıdan görüntüye, müzik üretimi ve sentezi. + +
+ + +**95. ResNet ― The Residual Network architecture (also called ResNet) uses residual blocks with a high number of layers meant to decrease the training error. The residual block has the following characterizing equation:** + +⟶ ResNet ― Artık Ağ mimarisi (ResNet olarak da bilinir), eğitim hatasını azaltmak için çok sayıda katman içeren artık bloklar kullanır. Artık blok aşağıdaki karakterizasyon denklemine sahiptir: + +
+ + +**96. Inception Network ― This architecture uses inception modules and aims at giving a try at different convolutions in order to increase its performance through features diversification. In particular, it uses the 1×1 convolution trick to limit the computational burden.** + +⟶ Inception Ağ ― Bu mimari inception modüllerini kullanır ve özelliklerini çeşitlendirme yoluyla performansını artırmak için farklı evrişim kombinasyonları denemeyi amaçlamaktadır. Özellikle, hesaplama yükünü sınırlamak için 1x1 evrişm hilesini kullanır. + +
+ + +**97. The Deep Learning cheatsheets are now available in [target language].** + +⟶ Derinöğrenme el kitabı artık kullanıma hazır [hedef dilde]. + +
+ + +**98. Original authors** + +⟶ Orijinal yazarlar + +
+ + +**99. Translated by X, Y and Z** + +⟶ X, Y ve Z tarafından çevirildi + +
+ + +**100. Reviewed by X, Y and Z** + +⟶ X, Y ve Z tarafından kontrol edildi + +
+ + +**101. View PDF version on GitHub** + +⟶ GitHub'da PDF sürümünü görüntüleyin + +
+ + +**102. By X and Y** + +⟶ X ve Y ile + +
From ef24c10bc018f2f7e1416fc0b6e80f1b78537db8 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?ayy=C3=BCce=20k=C4=B1zrak?= <32331090+ayyucekizrak@users.noreply.github.com> Date: Mon, 28 Jan 2019 23:45:06 +0300 Subject: [PATCH 094/531] Update convolutional-neural-networks.md @shervinea , I made arrangements according to comments. Thank you @yavuzKomecoglu ! --- tr/convolutional-neural-networks.md | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/tr/convolutional-neural-networks.md b/tr/convolutional-neural-networks.md index ac5660982..e1fd03e51 100644 --- a/tr/convolutional-neural-networks.md +++ b/tr/convolutional-neural-networks.md @@ -336,7 +336,7 @@ **49. Object detection** -⟶ Nesne tanıma +⟶ Nesne algılama
@@ -350,7 +350,7 @@ **51. [Image classification, Classification w. localization, Detection]** -⟶ [Görüntü sınıflandırma, Sınıflandırma ve lokalizasyon (konumlama), Tanıma] +⟶ [Görüntü sınıflandırma, Sınıflandırma ve lokalizasyon (konumlama), Algılama]
@@ -378,14 +378,14 @@ **55. Detection ― In the context of object detection, different methods are used depending on whether we just want to locate the object or detect a more complex shape in the image. The two main ones are summed up in the table below:** -⟶ Tanıma ― Nesne tespiti bağlamında, nesneyi konumlandırmak veya görüntüdeki daha karmaşık bir şekli tespit etmek isteyip istemediğimize bağlı olarak farklı yöntemler kullanılır. İki ana tablo aşağıdaki tabloda özetlenmiştir: +⟶ Algılama ― Nesne algılama bağlamında, nesneyi konumlandırmak veya görüntüdeki daha karmaşık bir şekli tespit etmek isteyip istemediğimize bağlı olarak farklı yöntemler kullanılır. İki ana tablo aşağıdaki tabloda özetlenmiştir:
**56. [Bounding box detection, Landmark detection]** -⟶ [Sınırlayıcı kutu ile tespit, Karakteristik nokta tanıma] +⟶ [Sınırlayıcı kutu ile tespit, Karakteristik nokta algılama]
From df852b21069179bc35f5058dd6005992070f6e48 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?ayy=C3=BCce=20k=C4=B1zrak?= <32331090+ayyucekizrak@users.noreply.github.com> Date: Mon, 28 Jan 2019 23:53:25 +0300 Subject: [PATCH 095/531] Update deep-learning-tips-and-tricks.md Updated according to review. --- tr/deep-learning-tips-and-tricks.md | 17 +++++++++-------- 1 file changed, 9 insertions(+), 8 deletions(-) diff --git a/tr/deep-learning-tips-and-tricks.md b/tr/deep-learning-tips-and-tricks.md index 871f2acd8..e5ac0acd9 100644 --- a/tr/deep-learning-tips-and-tricks.md +++ b/tr/deep-learning-tips-and-tricks.md @@ -27,7 +27,7 @@ **5. [Training a neural network, Epoch, Mini-batch, Cross-entropy loss, Backpropagation, Gradient descent, Updating weights, Gradient checking]** -⟶ [Bir sinir ağının eğitilmesi, Epoch, Mini-küme, Çapraz-entropy yitimi (kaybı), Geriye yayılım, Gradyan (Bayır) iniş, Ağırlıkların güncellenmesi, Gradyan kontrolü] +⟶ [Bir sinir ağının eğitilmesi, Dönem (Epok), Mini-küme, Çapraz-entropy yitimi (kaybı), Geriye yayılım, Gradyan (Bayır) iniş, Ağırlıkların güncellenmesi, Gradyan (Bayır) kontrolü]
@@ -83,7 +83,7 @@ **13. [Image without any modification, Flipped with respect to an axis for which the meaning of the image is preserved, Rotation with a slight angle, Simulates incorrect horizon calibration, Random focus on one part of the image, Several random crops can be done in a row]** -⟶ [Herhangi bir değişiklik yapmamış resim, Görüntünün anlamının korunduğu bir eksene göre çevrilmiş görüntü, Hafif açılı döndürme, Yanlış yatay kalibrasyonu simule eder, Görüntünün bir bölümüne rastgele odaklanma, Arka arkaya birkaç rasgele kesme yapılabilir] +⟶ [Herhangi bir değişiklik yapılmamış görüntü, Görüntünün anlamının korunduğu bir eksene göre çevrilmiş görüntü, Hafif açılı döndürme, Yanlış yatay kalibrasyonu simule eder, Görüntünün bir bölümüne rastgele odaklanma, Arka arkaya birkaç rasgele kesme yapılabilir]
@@ -139,7 +139,7 @@ **21. Epoch ― In the context of training a model, epoch is a term used to refer to one iteration where the model sees the whole training set to update its weights.** -⟶ Epoch ― Bir modelin eğitimi kapsamında, modelin ağırlıklarını güncellemek için tüm eğitim setini kullandığı bir yinelemeye ifade etmek için kullanılan bir terimdir. +⟶ Dönem (Epok/Epoch) ― Bir modelin eğitimi kapsamında, modelin ağırlıklarını güncellemek için tüm eğitim setini kullandığı bir yinelemeye ifade etmek için kullanılan bir terimdir.
@@ -193,16 +193,17 @@
-**29. [Step 1: Bir küme eğitim verisi alın ve kaybı hesaplamak için ileriye doğru ilerleyin, Step 2: Her ağırlığa göre kaybın derecesini elde etmek için kaybı tekrar geriye doğru yayın, Step 3: Use the gradients to update the weights of the network.]** +**29. [Step 1: Take a batch of training data and perform forward propagation to compute the loss, Step 2: Backpropagate the loss to get the gradient of the loss with respect to each weight, Step 3: Use the gradients to update the weights of the network.]** + +⟶ [Adım 1: Bir küme eğitim verisi alın ve kaybı hesaplamak için ileriye doğru ilerleyin, Step 2: Her ağırlığa göre kaybın derecesini elde etmek için kaybı tekrar geriye doğru yayın, Adım 3: Ağın ağırlıklarını güncellemek için gradyanları kullanın.] -⟶ [Adım 1: Take a batch of training data and perform forward propagation to compute the loss, Step 2: Backpropagate the loss to get the gradient of the loss with respect to each weight, Step 3: Ağın ağırlıklarını güncellemek için gradyanları kullanın.]
**30. [Forward propagation, Backpropagation, Weights update]** -⟶ [İleri yayılım, Geriye yayılım, Ağırlıkların gğncellenmesi] +⟶ [İleri yayılım, Geriye yayılım, Ağırlıkların güncellenmesi]
@@ -223,7 +224,7 @@ **33. Xavier initialization ― Instead of initializing the weights in a purely random manner, Xavier initialization enables to have initial weights that take into account characteristics that are unique to the architecture.** -⟶ Xavier başlangıcı ― Ağırlıkları tamamen rastgele bir şekilde başlatmak yerine, Xavier ilklendirme, mimariye özgü özellikleri dikkate alan ilk ağırlıkların alınmasını sağlar. +⟶ Xavier başlangıcı (ilklendirme) ― Ağırlıkları tamamen rastgele bir şekilde başlatmak yerine, Xavier başlangıcı, mimariye özgü özellikleri dikkate alan ilk ağırlıkların alınmasını sağlar.
@@ -271,7 +272,7 @@ **40. Adaptive learning rates ― Letting the learning rate vary when training a model can reduce the training time and improve the numerical optimal solution. While Adam optimizer is the most commonly used technique, others can also be useful. They are summed up in the table below:** -⟶ Uyarlanabilir öğrenmeoranları ― Bir modelin eğitilmesi sırasında öğrenme oranının değişmesine izin vermek eğitim süresini kısaltabilir ve sayısal optimum çözümü iyileştirebilir. Adam optimizasyonu yöntemi en çok kullanılan teknik olmasına rağmen, diğer yöntemler de faydalı olabilir. Bunlar aşağıdaki tabloda özetlenmiştir: +⟶ Uyarlanabilir öğrenme oranları ― Bir modelin eğitilmesi sırasında öğrenme oranının değişmesine izin vermek eğitim süresini kısaltabilir ve sayısal optimum çözümü iyileştirebilir. Adam optimizasyonu yöntemi en çok kullanılan teknik olmasına rağmen, diğer yöntemler de faydalı olabilir. Bunlar aşağıdaki tabloda özetlenmiştir:
From 47b748aef5123b5c4f4d33e03d6d61dcf218a259 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?ayy=C3=BCce=20k=C4=B1zrak?= <32331090+ayyucekizrak@users.noreply.github.com> Date: Mon, 28 Jan 2019 23:56:06 +0300 Subject: [PATCH 096/531] Update deep-learning-tips-and-tricks.md Updated according to reviews. --- tr/deep-learning-tips-and-tricks.md | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/tr/deep-learning-tips-and-tricks.md b/tr/deep-learning-tips-and-tricks.md index e5ac0acd9..8bc96d387 100644 --- a/tr/deep-learning-tips-and-tricks.md +++ b/tr/deep-learning-tips-and-tricks.md @@ -48,7 +48,7 @@ **8. [Good practices, Overfitting small batch, Gradient checking]** -⟶ [İyi örnekler, Küçük kümelerin ezberlenmesi, Gradyanların kontrolü] +⟶ [İyi örnekler, Küçük kümelerin aşırı öğrenmesi, Gradyan kontrolü]
@@ -90,7 +90,7 @@ **14. [Color shift, Noise addition, Information loss, Contrast change]** -⟶ [Renk kaydırma, Gürültü ekleme, Bilgi kaybı, Kontrast değişimi] +⟶ [Renk değişimi, Gürültü ekleme, Bilgi kaybı, Kontrast değişimi]
@@ -264,9 +264,9 @@
-**39. Learning rate ― The learning rate, often noted α or sometimes η, indicates at which pace the weights get updated. It can be fixed or adaptively changed. The current most popular method is called Adam, which is a method that adapts the learning rate. -** -⟶ Genellikle α veya bazen η olarak belirtilen öğrenme oranı, ağırlıkların hangi hızda güncellendiğini belirler. Sabitlenebilir veya uyarlanabilir şekilde değiştirilebilir. Mevcut en popüler yöntemin adı Adam'dır ve öğrenme hızını ayarlayan bir yöntemdir. +**39. Learning rate ― The learning rate, often noted α or sometimes η, indicates at which pace the weights get updated. It can be fixed or adaptively changed. The current most popular method is called Adam, which is a method that adapts the learning rate.** + +⟶ Öğrenme oranı (adımı) ― Genellikle α veya bazen η olarak belirtilen öğrenme oranı, ağırlıkların hangi hızda güncellendiğini belirler. Sabitlenebilir veya uyarlanabilir şekilde değiştirilebilir. Mevcut en popüler yöntemin adı Adam'dır ve öğrenme hızını ayarlayan bir yöntemdir.
@@ -348,7 +348,7 @@ **50 bis. Shrinks coefficients to 0, Good for variable selection, Makes coefficients smaller, Tradeoff between variable selection and small coefficients]** -⟶ bis. Katsayıları 0'a düşürür, Değişken seçimi için iyi, Katsayıları daha küçük yapar, Değişken seçimi ile küçük katsayılar arasında ödünleşim sağlar] +⟶ [Katsayıları 0'a düşürür, Değişken seçimi için iyi, Katsayıları daha küçük yapar, Değişken seçimi ile küçük katsayılar arasında ödünleşim sağlar]
From 33c5945f1d76eb972298e1376317cba5f82a9d11 Mon Sep 17 00:00:00 2001 From: Shervine Amidi Date: Mon, 28 Jan 2019 14:05:18 -0800 Subject: [PATCH 097/531] Add [tr] contributors --- CONTRIBUTORS | 3 +++ 1 file changed, 3 insertions(+) diff --git a/CONTRIBUTORS b/CONTRIBUTORS index 041b73cb1..432e08347 100644 --- a/CONTRIBUTORS +++ b/CONTRIBUTORS @@ -91,6 +91,9 @@ Tiago Danin (review of unsupervised learning) --tr + Ayyüce Kızrak (translation of convolutional neural networks) + Yavuz Kömeçoğlu (review of convolutional neural networks) + Ekrem Çetinkaya (translation of deep learning) Omer Bukte (review of deep learning) From ead1d752a7f74c121cf01a42f2a87beb1f2eb499 Mon Sep 17 00:00:00 2001 From: Shervine Amidi Date: Mon, 28 Jan 2019 14:08:17 -0800 Subject: [PATCH 098/531] Add more [tr] contributors --- CONTRIBUTORS | 3 +++ 1 file changed, 3 insertions(+) diff --git a/CONTRIBUTORS b/CONTRIBUTORS index 432e08347..605ae694c 100644 --- a/CONTRIBUTORS +++ b/CONTRIBUTORS @@ -97,6 +97,9 @@ Ekrem Çetinkaya (translation of deep learning) Omer Bukte (review of deep learning) + Ayyüce Kızrak (translation of deep learning tips and tricks) + Yavuz Kömeçoğlu (review of deep learning tips and tricks) + Kadir Tekeli (translation of linear algebra) Ekrem Çetinkaya (review of linear algebra) From 80db8285c943c381ffbcfbcd96e9f6436fb49f2d Mon Sep 17 00:00:00 2001 From: Shervine Amidi Date: Mon, 28 Jan 2019 14:10:16 -0800 Subject: [PATCH 099/531] Add [tr] contributors --- CONTRIBUTORS | 3 +++ 1 file changed, 3 insertions(+) diff --git a/CONTRIBUTORS b/CONTRIBUTORS index 605ae694c..039dcddc9 100644 --- a/CONTRIBUTORS +++ b/CONTRIBUTORS @@ -110,6 +110,9 @@ Ayyüce Kızrak (translation of probabilities and statistics) Başak Buluz (review of probabilities and statistics) + Ayyüce Kızrak (translation of recurrent neural networks) + Yavuz Kömeçoğlu (review of recurrent neural networks) + Başak Buluz (translation of supervised learning) Ayyüce Kızrak (review of supervised learning) From 4372c9633a8c7a044f4e6a38d86db1e0f1867095 Mon Sep 17 00:00:00 2001 From: Shervine Amidi Date: Mon, 28 Jan 2019 14:13:09 -0800 Subject: [PATCH 100/531] Turkish translation finished --- README.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/README.md b/README.md index fe1999f98..9b97cb36b 100644 --- a/README.md +++ b/README.md @@ -43,9 +43,9 @@ Please make sure to propose the translation of **only one** cheatsheet per pull |Cheatsheet topic|العَرَبِيَّة|עִבְרִית|हिन्दी|Türkçe|Русский|Italiano| |:---|:---:|:---:|:---:|:---:|:---:|:---:| -|Convolutional Neural Nets|not started|not started|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/117)|not started|not started| -|Recurrent Neural Nets|not started|not started|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/120)|not started|not started| -|DL tips and tricks|not started|not started|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/116)|not started|not started| +|Convolutional Neural Nets|not started|not started|not started|done|not started|not started| +|Recurrent Neural Nets|not started|not started|not started|done|not started|not started| +|DL tips and tricks|not started|not started|not started|done|not started|not started| |Cheatsheet topic|Polski|Suomi|Català|Українська|한국어| From c30ef3ee4a881a41d6cfb4ec17fdd981b276fcc7 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?ayy=C3=BCce=20k=C4=B1zrak?= <32331090+ayyucekizrak@users.noreply.github.com> Date: Tue, 29 Jan 2019 
23:54:56 +0300 Subject: [PATCH 101/531] Update CONTRIBUTORS The translation of the translation of recurrent neural networks was done by @basakbuluz , so I corrected it. Thank you! --- CONTRIBUTORS | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/CONTRIBUTORS b/CONTRIBUTORS index 039dcddc9..ddcedb310 100644 --- a/CONTRIBUTORS +++ b/CONTRIBUTORS @@ -110,7 +110,7 @@ Ayyüce Kızrak (translation of probabilities and statistics) Başak Buluz (review of probabilities and statistics) - Ayyüce Kızrak (translation of recurrent neural networks) + Başak Buluz (translation of recurrent neural networks) Yavuz Kömeçoğlu (review of recurrent neural networks) Başak Buluz (translation of supervised learning) From 001b982335bdbcf4a3394d77279e8c2ba30ed255 Mon Sep 17 00:00:00 2001 From: Erfan Noury Date: Wed, 6 Feb 2019 00:32:18 -0500 Subject: [PATCH 102/531] Add CS230 templates to fa folder --- fa/convolutional-neural-networks.md | 716 ++++++++++++++++++++++++++++ fa/deep-learning-tips-and-tricks.md | 457 ++++++++++++++++++ fa/recurrent-neural-networks.md | 677 ++++++++++++++++++++++++++ 3 files changed, 1850 insertions(+) create mode 100644 fa/convolutional-neural-networks.md create mode 100644 fa/deep-learning-tips-and-tricks.md create mode 100644 fa/recurrent-neural-networks.md diff --git a/fa/convolutional-neural-networks.md b/fa/convolutional-neural-networks.md new file mode 100644 index 000000000..1b1283628 --- /dev/null +++ b/fa/convolutional-neural-networks.md @@ -0,0 +1,716 @@ +**Convolutional Neural Networks translation** + +
+ +**1. Convolutional Neural Networks cheatsheet** + +⟶ + +
+ + +**2. CS 230 - Deep Learning** + +⟶ + +
+ + +**3. [Overview, Architecture structure]** + +⟶ + +
+ + +**4. [Types of layer, Convolution, Pooling, Fully connected]** + +⟶ + +
+ + +**5. [Filter hyperparameters, Dimensions, Stride, Padding]** + +⟶ + +
+ + +**6. [Tuning hyperparameters, Parameter compatibility, Model complexity, Receptive field]** + +⟶ + +
+ + +**7. [Activation functions, Rectified Linear Unit, Softmax]** + +⟶ + +
+ + +**8. [Object detection, Types of models, Detection, Intersection over Union, Non-max suppression, YOLO, R-CNN]** + +⟶ + +
+ + +**9. [Face verification/recognition, One shot learning, Siamese network, Triplet loss]** + +⟶ + +
+ + +**10. [Neural style transfer, Activation, Style matrix, Style/content cost function]** + +⟶ + +
+ + +**11. [Computational trick architectures, Generative Adversarial Net, ResNet, Inception Network]** + +⟶ + +
+ + +**12. Overview** + +⟶ + +
+ + +**13. Architecture of a traditional CNN ― Convolutional neural networks, also known as CNNs, are a specific type of neural networks that are generally composed of the following layers:** + +⟶ + +
+ + +**14. The convolution layer and the pooling layer can be fine-tuned with respect to hyperparameters that are described in the next sections.** + +⟶ + +
+ + +**15. Types of layer** + +⟶ + +
+ + +**16. Convolution layer (CONV) ― The convolution layer (CONV) uses filters that perform convolution operations as it is scanning the input I with respect to its dimensions. Its hyperparameters include the filter size F and stride S. The resulting output O is called feature map or activation map.** + +⟶ + +
+ + +**17. Remark: the convolution step can be generalized to the 1D and 3D cases as well.** + +⟶ + +
+ + +**18. Pooling (POOL) ― The pooling layer (POOL) is a downsampling operation, typically applied after a convolution layer, which does some spatial invariance. In particular, max and average pooling are special kinds of pooling where the maximum and average value is taken, respectively.** + +⟶ + +
+ + +**19. [Type, Purpose, Illustration, Comments]** + +⟶ + +
+ + +**20. [Max pooling, Average pooling, Each pooling operation selects the maximum value of the current view, Each pooling operation averages the values of the current view]** + +⟶ + +
+ + +**21. [Preserves detected features, Most commonly used, Downsamples feature map, Used in LeNet]** + +⟶ + +
+ + +**22. Fully Connected (FC) ― The fully connected layer (FC) operates on a flattened input where each input is connected to all neurons. If present, FC layers are usually found towards the end of CNN architectures and can be used to optimize objectives such as class scores.** + +⟶ + +
+ + +**23. Filter hyperparameters** + +⟶ + +
+ + +**24. The convolution layer contains filters for which it is important to know the meaning behind its hyperparameters.** + +⟶ + +
+ + +**25. Dimensions of a filter ― A filter of size F×F applied to an input containing C channels is a F×F×C volume that performs convolutions on an input of size I×I×C and produces an output feature map (also called activation map) of size O×O×1.** + +⟶ + +
+ + +**26. Filter** + +⟶ + +
+ + +**27. Remark: the application of K filters of size F×F results in an output feature map of size O×O×K.** + +⟶ + +
+ + +**28. Stride ― For a convolutional or a pooling operation, the stride S denotes the number of pixels by which the window moves after each operation.** + +⟶ + +
+ + +**29. Zero-padding ― Zero-padding denotes the process of adding P zeroes to each side of the boundaries of the input. This value can either be manually specified or automatically set through one of the three modes detailed below:** + +⟶ + +
+ + +**30. [Mode, Value, Illustration, Purpose, Valid, Same, Full]** + +⟶ + +
+ + +**31. [No padding, Drops last convolution if dimensions do not match, Padding such that feature map size has size ⌈IS⌉, Output size is mathematically convenient, Also called 'half' padding, Maximum padding such that end convolutions are applied on the limits of the input, Filter 'sees' the input end-to-end]** + +⟶ + +
+ + +**32. Tuning hyperparameters** + +⟶ + +
+ + +**33. Parameter compatibility in convolution layer ― By noting I the length of the input volume size, F the length of the filter, P the amount of zero padding, S the stride, then the output size O of the feature map along that dimension is given by:** + +⟶ + +
+ + +**34. [Input, Filter, Output]** + +⟶ + +
+ + +**35. Remark: often times, Pstart=Pend≜P, in which case we can replace Pstart+Pend by 2P in the formula above.** + +⟶ + +
+ + +**36. Understanding the complexity of the model ― In order to assess the complexity of a model, it is often useful to determine the number of parameters that its architecture will have. In a given layer of a convolutional neural network, it is done as follows:** + +⟶ + +
+ + +**37. [Illustration, Input size, Output size, Number of parameters, Remarks]** + +⟶ + +
+ + +**38. [One bias parameter per filter, In most cases, S + + +**39. [Pooling operation done channel-wise, In most cases, S=F]** + +⟶ + +
+ + +**40. [Input is flattened, One bias parameter per neuron, The number of FC neurons is free of structural constraints]** + +⟶ + +
+ + +**41. Receptive field ― The receptive field at layer k is the area denoted Rk×Rk of the input that each pixel of the k-th activation map can 'see'. By calling Fj the filter size of layer j and Si the stride value of layer i and with the convention S0=1, the receptive field at layer k can be computed with the formula:** + +⟶ + +
+ + +**42. In the example below, we have F1=F2=3 and S1=S2=1, which gives R2=1+2⋅1+2⋅1=5.** + +⟶ + +
+ + +**43. Commonly used activation functions** + +⟶ + +
+ + +**44. Rectified Linear Unit ― The rectified linear unit layer (ReLU) is an activation function g that is used on all elements of the volume. It aims at introducing non-linearities to the network. Its variants are summarized in the table below:** + +⟶ + +
+ + +**45. [ReLU, Leaky ReLU, ELU, with]** + +⟶ + +
+ + +**46. [Non-linearity complexities biologically interpretable, Addresses dying ReLU issue for negative values, Differentiable everywhere]** + +⟶ + +
+ + +**47. Softmax ― The softmax step can be seen as a generalized logistic function that takes as input a vector of scores x∈Rn and outputs a vector of output probability p∈Rn through a softmax function at the end of the architecture. It is defined as follows:** + +⟶ + +
+ + +**48. where** + +⟶ + +
+ + +**49. Object detection** + +⟶ + +
+ + +**50. Types of models ― There are 3 main types of object recognition algorithms, for which the nature of what is predicted is different. They are described in the table below:** + +⟶ + +
+ + +**51. [Image classification, Classification w. localization, Detection]** + +⟶ + +
+ + +**52. [Teddy bear, Book]** + +⟶ + +
+ + +**53. [Classifies a picture, Predicts probability of object, Detects an object in a picture, Predicts probability of object and where it is located, Detects up to several objects in a picture, Predicts probabilities of objects and where they are located]** + +⟶ + +
+ + +**54. [Traditional CNN, Simplified YOLO, R-CNN, YOLO, R-CNN]** + +⟶ + +
+ + +**55. Detection ― In the context of object detection, different methods are used depending on whether we just want to locate the object or detect a more complex shape in the image. The two main ones are summed up in the table below:** + +⟶ + +
+ + +**56. [Bounding box detection, Landmark detection]** + +⟶ + +
+ + +**57. [Detects the part of the image where the object is located, Detects a shape or characteristics of an object (e.g. eyes), More granular]** + +⟶ + +
+ + +**58. [Box of center (bx,by), height bh and width bw, Reference points (l1x,l1y), ..., (lnx,lny)]** + +⟶ + +
+ + +**59. Intersection over Union ― Intersection over Union, also known as IoU, is a function that quantifies how correctly positioned a predicted bounding box Bp is over the actual bounding box Ba. It is defined as:** + +⟶ + +
+ + +**60. Remark: we always have IoU∈[0,1]. By convention, a predicted bounding box Bp is considered as being reasonably good if IoU(Bp,Ba)⩾0.5.** + +⟶ + +
+ + +**61. Anchor boxes ― Anchor boxing is a technique used to predict overlapping bounding boxes. In practice, the network is allowed to predict more than one box simultaneously, where each box prediction is constrained to have a given set of geometrical properties. For instance, the first prediction can potentially be a rectangular box of a given form, while the second will be another rectangular box of a different geometrical form.** + +⟶ + +
+ + +**62. Non-max suppression ― The non-max suppression technique aims at removing duplicate overlapping bounding boxes of a same object by selecting the most representative ones. After having removed all boxes having a probability prediction lower than 0.6, the following steps are repeated while there are boxes remaining:** + +⟶ + +
+ + +**63. [For a given class, Step 1: Pick the box with the largest prediction probability., Step 2: Discard any box having an IoU⩾0.5 with the previous box.]** + +⟶ + +
+ + +**64. [Box predictions, Box selection of maximum probability, Overlap removal of same class, Final bounding boxes]** + +⟶ + +
+ + +**65. YOLO ― You Only Look Once (YOLO) is an object detection algorithm that performs the following steps:** + +⟶ + +
+ + +**66. [Step 1: Divide the input image into a G×G grid., Step 2: For each grid cell, run a CNN that predicts y of the following form:, repeated k times]** + +⟶ + +
+ + +**67. where pc is the probability of detecting an object, bx,by,bh,bw are the properties of the detected bouding box, c1,...,cp is a one-hot representation of which of the p classes were detected, and k is the number of anchor boxes.** + +⟶ + +
+ + +**68. Step 3: Run the non-max suppression algorithm to remove any potential duplicate overlapping bounding boxes.** + +⟶ + +
+ + +**69. [Original image, Division in GxG grid, Bounding box prediction, Non-max suppression]** + +⟶ + +
+ + +**70. Remark: when pc=0, then the network does not detect any object. In that case, the corresponding predictions bx,...,cp have to be ignored.** + +⟶ + +
+ + +**71. R-CNN ― Region with Convolutional Neural Networks (R-CNN) is an object detection algorithm that first segments the image to find potential relevant bounding boxes and then run the detection algorithm to find most probable objects in those bounding boxes.** + +⟶ + +
+ + +**72. [Original image, Segmentation, Bounding box prediction, Non-max suppression]** + +⟶ + +
+ + +**73. Remark: although the original algorithm is computationally expensive and slow, newer architectures enabled the algorithm to run faster, such as Fast R-CNN and Faster R-CNN.** + +⟶ + +
+ + +**74. Face verification and recognition** + +⟶ + +
+ + +**75. Types of models ― Two main types of model are summed up in table below:** + +⟶ + +
+ + +**76. [Face verification, Face recognition, Query, Reference, Database]** + +⟶ + +
+ + +**77. [Is this the correct person?, One-to-one lookup, Is this one of the K persons in the database?, One-to-many lookup]** + +⟶ + +
+ + +**78. One Shot Learning ― One Shot Learning is a face verification algorithm that uses a limited training set to learn a similarity function that quantifies how different two given images are. The similarity function applied to two images is often noted d(image 1,image 2).** + +⟶ + +
+ + +**79. Siamese Network ― Siamese Networks aim at learning how to encode images to then quantify how different two images are. For a given input image x(i), the encoded output is often noted as f(x(i)).** + +⟶ + +
+ + +**80. Triplet loss ― The triplet loss ℓ is a loss function computed on the embedding representation of a triplet of images A (anchor), P (positive) and N (negative). The anchor and the positive example belong to a same class, while the negative example to another one. By calling α∈R+ the margin parameter, this loss is defined as follows:** + +⟶ + +
+ + +**81. Neural style transfer** + +⟶ + +
+ + +**82. Motivation ― The goal of neural style transfer is to generate an image G based on a given content C and a given style S.** + +⟶ + +
+ + +**83. [Content C, Style S, Generated image G]** + +⟶ + +
+ + +**84. Activation ― In a given layer l, the activation is noted a[l] and is of dimensions nH×nw×nc** + +⟶ + +
+ + +**85. Content cost function ― The content cost function Jcontent(C,G) is used to determine how the generated image G differs from the original content image C. It is defined as follows:** + +⟶ + +
+ + +**86. Style matrix ― The style matrix G[l] of a given layer l is a Gram matrix where each of its elements G[l]kk′ quantifies how correlated the channels k and k′ are. It is defined with respect to activations a[l] as follows:** + +⟶ + +
+ + +**87. Remark: the style matrix for the style image and the generated image are noted G[l] (S) and G[l] (G) respectively.** + +⟶ + +
+ + +**88. Style cost function ― The style cost function Jstyle(S,G) is used to determine how the generated image G differs from the style S. It is defined as follows:** + +⟶ + +
+ + +**89. Overall cost function ― The overall cost function is defined as being a combination of the content and style cost functions, weighted by parameters α,β, as follows:** + +⟶ + +
+ + +**90. Remark: a higher value of α will make the model care more about the content while a higher value of β will make it care more about the style.** + +⟶ + +
+ + +**91. Architectures using computational tricks** + +⟶ + +
+ + +**92. Generative Adversarial Network ― Generative adversarial networks, also known as GANs, are composed of a generative and a discriminative model, where the generative model aims at generating the most truthful output that will be fed into the discriminative which aims at differentiating the generated and true image.** + +⟶ + +
+ + +**93. [Training, Noise, Real-world image, Generator, Discriminator, Real Fake]** + +⟶ + +
+ + +**94. Remark: use cases using variants of GANs include text to image, music generation and synthesis.** + +⟶ + +
+ + +**95. ResNet ― The Residual Network architecture (also called ResNet) uses residual blocks with a high number of layers meant to decrease the training error. The residual block has the following characterizing equation:** + +⟶ + +
+ + +**96. Inception Network ― This architecture uses inception modules and aims at giving a try at different convolutions in order to increase its performance through features diversification. In particular, it uses the 1×1 convolution trick to limit the computational burden.** + +⟶ + +
+ + +**97. The Deep Learning cheatsheets are now available in [target language].** + +⟶ + +
+ + +**98. Original authors** + +⟶ + +
+ + +**99. Translated by X, Y and Z** + +⟶ + +
+ + +**100. Reviewed by X, Y and Z** + +⟶ + +
+ + +**101. View PDF version on GitHub** + +⟶ + +
+ + +**102. By X and Y** + +⟶ + +
diff --git a/fa/deep-learning-tips-and-tricks.md b/fa/deep-learning-tips-and-tricks.md new file mode 100644 index 000000000..347234ec2 --- /dev/null +++ b/fa/deep-learning-tips-and-tricks.md @@ -0,0 +1,457 @@ +**Deep Learning Tips and Tricks translation** + +
+ +**1. Deep Learning Tips and Tricks cheatsheet** + +⟶ + +
+ + +**2. CS 230 - Deep Learning** + +⟶ + +
+ + +**3. Tips and tricks** + +⟶ + +
+ + +**4. [Data processing, Data augmentation, Batch normalization]** + +⟶ + +
+ + +**5. [Training a neural network, Epoch, Mini-batch, Cross-entropy loss, Backpropagation, Gradient descent, Updating weights, Gradient checking]** + +⟶ + +
+ + +**6. [Parameter tuning, Xavier initialization, Transfer learning, Learning rate, Adaptive learning rates]** + +⟶ + +
+ + +**7. [Regularization, Dropout, Weight regularization, Early stopping]** + +⟶ + +
+ + +**8. [Good practices, Overfitting small batch, Gradient checking]** + +⟶ + +
+ + +**9. View PDF version on GitHub** + +⟶ + +
+ + +**10. Data processing** + +⟶ + +
+ + +**11. Data augmentation ― Deep learning models usually need a lot of data to be properly trained. It is often useful to get more data from the existing ones using data augmentation techniques. The main ones are summed up in the table below. More precisely, given the following input image, here are the techniques that we can apply:** + +⟶ + +
+ + +**12. [Original, Flip, Rotation, Random crop]** + +⟶ + +
+ + +**13. [Image without any modification, Flipped with respect to an axis for which the meaning of the image is preserved, Rotation with a slight angle, Simulates incorrect horizon calibration, Random focus on one part of the image, Several random crops can be done in a row]** + +⟶ + +
+ + +**14. [Color shift, Noise addition, Information loss, Contrast change]** + +⟶ + +
+ + +**15. [Nuances of RGB is slightly changed, Captures noise that can occur with light exposure, Addition of noise, More tolerance to quality variation of inputs, Parts of image ignored, Mimics potential loss of parts of image, Luminosity changes, Controls difference in exposition due to time of day]** + +⟶ + +
+ + +**16. Remark: data is usually augmented on the fly during training.** + +⟶ + +
+ + +**17. Batch normalization ― It is a step of hyperparameters γ,β that normalizes the batch {xi}. By noting μB,σ2B the mean and variance of the batch that we want to correct, it is done as follows:** + +⟶ + +
+ + +**18. It is usually done after a fully connected/convolutional layer and before a non-linearity layer and aims at allowing higher learning rates and reducing the strong dependence on initialization.** + +⟶ + +
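For illustration, here is a NumPy sketch of the batch normalization step described above; the small constant ε added for numerical stability and the function name are assumptions.

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    """Normalize a batch x of shape (batch_size, n_features), then scale and shift.

    gamma, beta: per-feature scale and shift parameters.
    """
    mu = x.mean(axis=0)                    # batch mean (mu_B)
    var = x.var(axis=0)                    # batch variance (sigma_B^2)
    x_hat = (x - mu) / np.sqrt(var + eps)  # normalized batch
    return gamma * x_hat + beta
```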
+ + +**19. Training a neural network** + +⟶ + +
+ + +**20. Definitions** + +⟶ + +
+ + +**21. Epoch ― In the context of training a model, epoch is a term used to refer to one iteration where the model sees the whole training set to update its weights.** + +⟶ + +
+ + +**22. Mini-batch gradient descent ― During the training phase, updating weights is usually based neither on the whole training set at once, due to computational complexity, nor on a single data point, due to noise issues. Instead, the update step is done on mini-batches, where the number of data points in a batch is a hyperparameter that we can tune.** + +⟶ + +
+ + +**23. Loss function ― In order to quantify how a given model performs, the loss function L is usually used to evaluate to what extent the actual outputs y are correctly predicted by the model outputs z.** + +⟶ + +
+ + +**24. Cross-entropy loss ― In the context of binary classification in neural networks, the cross-entropy loss L(z,y) is commonly used and is defined as follows:** + +⟶ + +
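A minimal NumPy sketch of the binary cross-entropy loss L(z,y) for a model output z interpreted as a probability; the clipping constant used to avoid log(0) is an assumption.

```python
import numpy as np

def cross_entropy(z, y, eps=1e-12):
    """Binary cross-entropy loss L(z, y) = -[y log(z) + (1 - y) log(1 - z)]."""
    z = np.clip(z, eps, 1.0 - eps)   # avoid log(0)
    return -(y * np.log(z) + (1.0 - y) * np.log(1.0 - z))
```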
+ + +**25. Finding optimal weights** + +⟶ + +
+ + +**26. Backpropagation ― Backpropagation is a method to update the weights in the neural network by taking into account the actual output and the desired output. The derivative with respect to each weight w is computed using the chain rule.** + +⟶ + +
+ + +**27. Using this method, each weight is updated with the rule:** + +⟶ + +
+ + +**28. Updating weights ― In a neural network, weights are updated as follows:** + +⟶ + +
+ + +**29. [Step 1: Take a batch of training data and perform forward propagation to compute the loss, Step 2: Backpropagate the loss to get the gradient of the loss with respect to each weight, Step 3: Use the gradients to update the weights of the network.]** + +⟶ + +
+ + +**30. [Forward propagation, Backpropagation, Weights update]** + +⟶ + +
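As a toy illustration of the three steps above, here is a sketch of one forward/backward/update pass for a single sigmoid unit trained with gradient descent; this stands in for a full network, and all names and the learning rate are illustrative assumptions.

```python
import numpy as np

def train_step(w, b, X, y, lr=0.1):
    """One forward/backward/update step on a mini-batch (X, y) for a sigmoid unit with cross-entropy loss."""
    m = X.shape[0]
    z = 1.0 / (1.0 + np.exp(-(X @ w + b)))   # Step 1: forward propagation (predictions)
    grad_z = (z - y) / m                     # gradient of the loss w.r.t. the pre-activation
    grad_w = X.T @ grad_z                    # Step 2: backpropagation via the chain rule
    grad_b = grad_z.sum()
    w = w - lr * grad_w                      # Step 3: gradient descent update of the weights
    b = b - lr * grad_b
    return w, b
```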
+ + +**31. Parameter tuning** + +⟶ + +
+ + +**32. Weights initialization** + +⟶ + +
+ + +**33. Xavier initialization ― Instead of initializing the weights in a purely random manner, Xavier initialization makes it possible to have initial weights that take into account characteristics that are unique to the architecture.** + +⟶ + +
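A minimal sketch of one common form of Xavier (Glorot) initialization, scaling the weight range by the layer's fan-in and fan-out; the uniform variant shown is an assumption, since frameworks differ in the exact formula.

```python
import numpy as np

def xavier_init(fan_in, fan_out, seed=0):
    """Xavier/Glorot uniform initialization for a (fan_in, fan_out) weight matrix."""
    rng = np.random.default_rng(seed)
    limit = np.sqrt(6.0 / (fan_in + fan_out))   # keeps activation variance roughly constant across layers
    return rng.uniform(-limit, limit, size=(fan_in, fan_out))
```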
+ + +**34. Transfer learning ― Training a deep learning model requires a lot of data and more importantly a lot of time. It is often useful to take advantage of pre-trained weights on huge datasets that took days/weeks to train, and leverage it towards our use case. Depending on how much data we have at hand, here are the different ways to leverage this:** + +⟶ + +
+ + +**35. [Training size, Illustration, Explanation]** + +⟶ + +
+ + +**36. [Small, Medium, Large]** + +⟶ + +
+ + +**37. [Freezes all layers, trains weights on softmax, Freezes most layers, trains weights on last layers and softmax, Trains weights on layers and softmax by initializing weights on pre-trained ones]** + +⟶ + +
+ + +**38. Optimizing convergence** + +⟶ + +
+ + +**39. Learning rate ― The learning rate, often noted α or sometimes η, indicates at which pace the weights get updated. It can be fixed or adaptively changed. The current most popular method is called Adam, which is a method that adapts the learning rate.** + +⟶ + +
+ + +**40. Adaptive learning rates ― Letting the learning rate vary when training a model can reduce the training time and improve the numerical optimal solution. While Adam optimizer is the most commonly used technique, others can also be useful. They are summed up in the table below:** + +⟶ + +
+ + +**41. [Method, Explanation, Update of w, Update of b]** + +⟶ + +
+ + +**42. [Momentum, Dampens oscillations, Improvement to SGD, 2 parameters to tune]** + +⟶ + +
+ + +**43. [RMSprop, Root Mean Square propagation, Speeds up learning algorithm by controlling oscillations]** + +⟶ + +
+ + +**44. [Adam, Adaptive Moment estimation, Most popular method, 4 parameters to tune]** + +⟶ + +
+ + +**45. Remark: other methods include Adadelta, Adagrad and SGD.** + +⟶ + +
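As an illustration of an adaptive method, here is a NumPy sketch of one Adam update for a single parameter array; the default hyperparameter values shown are the commonly quoted ones and are assumptions here.

```python
import numpy as np

def adam_update(w, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam step at iteration t; returns the updated parameters and moment estimates."""
    m = beta1 * m + (1 - beta1) * grad        # first moment (running mean of gradients)
    v = beta2 * v + (1 - beta2) * grad ** 2   # second moment (running mean of squared gradients)
    m_hat = m / (1 - beta1 ** t)              # bias correction
    v_hat = v / (1 - beta2 ** t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v
```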
+ + +**46. Regularization** + +⟶ + +
+ + +**47. Dropout ― Dropout is a technique used in neural networks to prevent overfitting the training data by dropping out neurons with probability p>0. It forces the model to avoid relying too much on particular sets of features.** + +⟶ + +
+ + +**48. Remark: most deep learning frameworks parametrize dropout through the 'keep' parameter 1−p.** + +⟶ + +
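A short NumPy sketch of inverted dropout applied to an activation during training, written in terms of the 'keep' probability 1−p mentioned in the remark; the names and default value are illustrative.

```python
import numpy as np

def dropout(a, keep_prob=0.8, seed=0):
    """Inverted dropout: zero out units with probability 1 - keep_prob, rescale the rest."""
    rng = np.random.default_rng(seed)
    mask = rng.random(a.shape) < keep_prob
    return a * mask / keep_prob   # rescaling keeps the expected activation unchanged
```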
+ + +**49. Weight regularization ― In order to make sure that the weights are not too large and that the model is not overfitting the training set, regularization techniques are usually performed on the model weights. The main ones are summed up in the table below:** + +⟶ + +
+ + +**50. [LASSO, Ridge, Elastic Net]** + +⟶ + +
+ +**50 bis. [Shrinks coefficients to 0, Good for variable selection, Makes coefficients smaller, Tradeoff between variable selection and small coefficients]** + +⟶ + +
+ +**51. Early stopping ― This regularization technique stops the training process as soon as the validation loss reaches a plateau or starts to increase.** + +⟶ + +
+ + +**52. [Error, Validation, Training, early stopping, Epochs]** + +⟶ + +
+ + +**53. Good practices** + +⟶ + +
+ + +**54. Overfitting small batch ― When debugging a model, it is often useful to make quick tests to see if there is any major issue with the architecture of the model itself. In particular, in order to make sure that the model can be properly trained, a mini-batch is passed inside the network to see if it can overfit on it. If it cannot, it means that the model is either too complex or not complex enough to even overfit on a small batch, let alone a normal-sized training set.** + +⟶ + +
+ + +**55. Gradient checking ― Gradient checking is a method used during the implementation of the backward pass of a neural network. It compares the value of the analytical gradient to the numerical gradient at given points and plays the role of a sanity-check for correctness.** + +⟶ + +
+ + +**56. [Type, Numerical gradient, Analytical gradient]** + +⟶ + +
+ + +**57. [Formula, Comments]** + +⟶ + +
+ + +**58. [Expensive; loss has to be computed two times per dimension, Used to verify correctness of analytical implementation, Trade-off in choosing h not too small (numerical instability) nor too large (poor gradient approximation)]** + +⟶ + +
+ + +**59. ['Exact' result, Direct computation, Used in the final implementation]** + +⟶ + +
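A minimal sketch of gradient checking in NumPy, comparing a centered-difference numerical gradient to an analytical one at a given point; the step size h and the example loss are assumptions.

```python
import numpy as np

def check_gradient(loss, grad, w, h=1e-5):
    """Return the largest absolute gap between the numerical and analytical gradients at w."""
    num_grad = np.zeros_like(w)
    for i in range(w.size):
        e = np.zeros_like(w)
        e.flat[i] = h
        # two loss evaluations per dimension (hence the cost mentioned above)
        num_grad.flat[i] = (loss(w + e) - loss(w - e)) / (2 * h)
    return np.max(np.abs(num_grad - grad(w)))   # should be close to 0 if the implementation is correct

# Example: loss(w) = ||w||^2 has analytical gradient 2w
print(check_gradient(lambda w: np.sum(w ** 2), lambda w: 2 * w, np.array([1.0, -2.0, 3.0])))
```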
+ + +**60. The Deep Learning cheatsheets are now available in [target language]. + +⟶ + + +**61. Original authors** + +⟶ + +
+ +**62.Translated by X, Y and Z** + +⟶ + +
+ +**63.Reviewed by X, Y and Z** + +⟶ + +
+ +**64.View PDF version on GitHub** + +⟶ + +
+ +**65.By X and Y** + +⟶ + +
diff --git a/fa/recurrent-neural-networks.md b/fa/recurrent-neural-networks.md new file mode 100644 index 000000000..191e400a1 --- /dev/null +++ b/fa/recurrent-neural-networks.md @@ -0,0 +1,677 @@ +**Recurrent Neural Networks translation** + +
+ +**1. Recurrent Neural Networks cheatsheet** + +⟶ + +
+ + +**2. CS 230 - Deep Learning** + +⟶ + +
+ + +**3. [Overview, Architecture structure, Applications of RNNs, Loss function, Backpropagation]** + +⟶ + +
+ + +**4. [Handling long term dependencies, Common activation functions, Vanishing/exploding gradient, Gradient clipping, GRU/LSTM, Types of gates, Bidirectional RNN, Deep RNN]** + +⟶ + +
+ + +**5. [Learning word representation, Notations, Embedding matrix, Word2vec, Skip-gram, Negative sampling, GloVe]** + +⟶ + +
+ + +**6. [Comparing words, Cosine similarity, t-SNE]** + +⟶ + +
+ + +**7. [Language model, n-gram, Perplexity]** + +⟶ + +
+ + +**8. [Machine translation, Beam search, Length normalization, Error analysis, Bleu score]** + +⟶ + +
+ + +**9. [Attention, Attention model, Attention weights]** + +⟶ + +
+ + +**10. Overview** + +⟶ + +
+ + +**11. Architecture of a traditional RNN ― Recurrent neural networks, also known as RNNs, are a class of neural networks that allow previous outputs to be used as inputs while having hidden states. They are typically as follows:** + +⟶ + +
+ + +**12. For each timestep t, the activation a and the output y are expressed as follows:** + +⟶ + +
+ + +**13. and** + +⟶ + +
+ + +**14. where Wax,Waa,Wya,ba,by are coefficients that are shared temporally and g1,g2 activation functions.** + +⟶ + +
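For illustration, here is one timestep of the vanilla RNN above in NumPy, taking g1=tanh and g2=softmax as assumed choices of activation functions; the names are illustrative.

```python
import numpy as np

def rnn_step(x_t, a_prev, Wax, Waa, Wya, ba, by):
    """One timestep: a_t = tanh(Waa a_{t-1} + Wax x_t + ba), y_t = softmax(Wya a_t + by)."""
    a_t = np.tanh(Waa @ a_prev + Wax @ x_t + ba)
    z = Wya @ a_t + by
    z = z - z.max()                          # numerical stability for the softmax
    y_t = np.exp(z) / np.exp(z).sum()
    return a_t, y_t
```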
+ + +**15. The pros and cons of a typical RNN architecture are summed up in the table below:** + +⟶ + +
+ + +**16. [Advantages, Possibility of processing input of any length, Model size not increasing with size of input, Computation takes into account historical information, Weights are shared across time]** + +⟶ + +
+ + +**17. [Drawbacks, Computation being slow, Difficulty of accessing information from a long time ago, Cannot consider any future input for the current state]** + +⟶ + +
+ + +**18. Applications of RNNs ― RNN models are mostly used in the fields of natural language processing and speech recognition. The different applications are summed up in the table below:** + +⟶ + +
+ + +**19. [Type of RNN, Illustration, Example]** + +⟶ + +
+ + +**20. [One-to-one, One-to-many, Many-to-one, Many-to-many]** + +⟶ + +
+ + +**21. [Traditional neural network, Music generation, Sentiment classification, Named entity recognition, Machine translation]** + +⟶ + +
+ + +**22. Loss function ― In the case of a recurrent neural network, the loss function L of all time steps is defined based on the loss at every time step as follows:** + +⟶ + +
+ + +**23. Backpropagation through time ― Backpropagation is done at each point in time. At timestep T, the derivative of the loss L with respect to weight matrix W is expressed as follows:** + +⟶ + +
+ + +**24. Handling long term dependencies** + +⟶ + +
+ + +**25. Commonly used activation functions ― The most common activation functions used in RNN modules are described below:** + +⟶ + +
+ + +**26. [Sigmoid, Tanh, RELU]** + +⟶ + +
+ + +**27. Vanishing/exploding gradient ― The vanishing and exploding gradient phenomena are often encountered in the context of RNNs. The reason why they happen is that it is difficult to capture long term dependencies because of multiplicative gradient that can be exponentially decreasing/increasing with respect to the number of layers.** + +⟶ + +
+ + +**28. Gradient clipping ― It is a technique used to cope with the exploding gradient problem sometimes encountered when performing backpropagation. By capping the maximum value for the gradient, this phenomenon is controlled in practice.** + +⟶ + +
+ + +**29. clipped** + +⟶ + +
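A small NumPy sketch of gradient clipping by norm; the maximum norm value is an assumed hyperparameter.

```python
import numpy as np

def clip_gradient(grad, max_norm=5.0):
    """Rescale the gradient so that its norm never exceeds max_norm (mitigates exploding gradients)."""
    norm = np.linalg.norm(grad)
    if norm > max_norm:
        grad = grad * (max_norm / norm)
    return grad
```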
+ + +**30. Types of gates ― In order to remedy the vanishing gradient problem, specific gates are used in some types of RNNs and usually have a well-defined purpose. They are usually noted Γ and are equal to:** + +⟶ + +
+ + +**31. where W,U,b are coefficients specific to the gate and σ is the sigmoid function. The main ones are summed up in the table below:** + +⟶ + +
+ + +**32. [Type of gate, Role, Used in]** + +⟶ + +
+ + +**33. [Update gate, Relevance gate, Forget gate, Output gate]** + +⟶ + +
+ + +**34. [How much past should matter now?, Drop previous information?, Erase a cell or not?, How much to reveal of a cell?]** + +⟶ + +
+ + +**35. [LSTM, GRU]** + +⟶ + +
+ + +**36. GRU/LSTM ― Gated Recurrent Unit (GRU) and Long Short-Term Memory units (LSTM) deal with the vanishing gradient problem encountered by traditional RNNs, with LSTM being a generalization of GRU. Below is a table summing up the characterizing equations of each architecture:** + +⟶ + +
+ + +**37. [Characterization, Gated Recurrent Unit (GRU), Long Short-Term Memory (LSTM), Dependencies]** + +⟶ + +
+ + +**38. Remark: the sign ⋆ denotes the element-wise multiplication between two vectors.** + +⟶ + +
+ + +**39. Variants of RNNs ― The table below sums up the other commonly used RNN architectures:** + +⟶ + +
+ + +**40. [Bidirectional (BRNN), Deep (DRNN)]** + +⟶ + +
+ + +**41. Learning word representation** + +⟶ + +
+ + +**42. In this section, we note V the vocabulary and |V| its size.** + +⟶ + +
+ + +**43. Motivation and notations** + +⟶ + +
+ + +**44. Representation techniques ― The two main ways of representing words are summed up in the table below:** + +⟶ + +
+ + +**45. [1-hot representation, Word embedding]** + +⟶ + +
+ + +**46. [teddy bear, book, soft]** + +⟶ + +
+ + +**47. [Noted ow, Naive approach, no similarity information, Noted ew, Takes into account words similarity]** + +⟶ + +
+ + +**48. Embedding matrix ― For a given word w, the embedding matrix E is a matrix that maps its 1-hot representation ow to its embedding ew as follows:** + +⟶ + +
+ + +**49. Remark: learning the embedding matrix can be done using target/context likelihood models.** + +⟶ + +
+ + +**50. Word embeddings** + +⟶ + +
+ + +**51. Word2vec ― Word2vec is a framework aimed at learning word embeddings by estimating the likelihood that a given word is surrounded by other words. Popular models include skip-gram, negative sampling and CBOW.** + +⟶ + +
+ + +**52. [A cute teddy bear is reading, teddy bear, soft, Persian poetry, art]** + +⟶ + +
+ + +**53. [Train network on proxy task, Extract high-level representation, Compute word embeddings]** + +⟶ + +
+ + +**54. Skip-gram ― The skip-gram word2vec model is a supervised learning task that learns word embeddings by assessing the likelihood of any given target word t happening with a context word c. By noting θt a parameter associated with t, the probability P(t|c) is given by:** + +⟶ + +
+ + +**55. Remark: summing over the whole vocabulary in the denominator of the softmax part makes this model computationally expensive. CBOW is another word2vec model using the surrounding words to predict a given word.** + +⟶ + +
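A rough NumPy sketch of the skip-gram softmax probability P(t|c), assuming θ is stored as a matrix with one row per vocabulary word and e_c is the embedding of the context word; the names are illustrative.

```python
import numpy as np

def skipgram_prob(theta, e_c):
    """P(t | c) = exp(theta_t . e_c) / sum over the vocabulary of exp(theta_j . e_c)."""
    scores = theta @ e_c                 # one score per vocabulary word
    scores = scores - scores.max()       # numerical stability
    probs = np.exp(scores) / np.exp(scores).sum()
    return probs                         # probs[t] is the estimated P(t | c)
```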
+ + +**56. Negative sampling ― It is a set of binary classifiers using logistic regression that aim at assessing how likely a given context word and a given target word are to appear simultaneously, with the models being trained on sets of k negative examples and 1 positive example. Given a context word c and a target word t, the prediction is expressed by:** + +⟶ + +
+ + +**57. Remark: this method is less computationally expensive than the skip-gram model.** + +⟶ + +
+ + +**57bis. GloVe ― The GloVe model, short for global vectors for word representation, is a word embedding technique that uses a co-occurrence matrix X where each Xi,j denotes the number of times that a target i occurred with a context j. Its cost function J is as follows:** + +⟶ + +
+ + +**58. where f is a weighting function such that Xi,j=0⟹f(Xi,j)=0. +Given the symmetry that e and θ play in this model, the final word embedding e(final)w is given by:** + +⟶ + +
+ + +**59. Remark: the individual components of the learned word embeddings are not necessarily interpretable.** + +⟶ + +
+ + +**60. Comparing words** + +⟶ + +
+ + +**61. Cosine similarity ― The cosine similarity between words w1 and w2 is expressed as follows:** + +⟶ + +
+ + +**62. Remark: θ is the angle between words w1 and w2.** + +⟶ + +
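A one-function NumPy sketch of the cosine similarity between two word vectors:

```python
import numpy as np

def cosine_similarity(w1, w2):
    """cos(theta) = (w1 . w2) / (||w1|| * ||w2||), a value in [-1, 1]."""
    return w1 @ w2 / (np.linalg.norm(w1) * np.linalg.norm(w2))
```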
+ + +**63. t-SNE ― t-SNE (t-distributed Stochastic Neighbor Embedding) is a technique aimed at reducing high-dimensional embeddings into a lower dimensional space. In practice, it is commonly used to visualize word vectors in the 2D space.** + +⟶ + +
+ + +**64. [literature, art, book, culture, poem, reading, knowledge, entertaining, loveable, childhood, kind, teddy bear, soft, hug, cute, adorable]** + +⟶ + +
+ + +**65. Language model** + +⟶ + +
+ + +**66. Overview ― A language model aims at estimating the probability of a sentence P(y).** + +⟶ + +
+ + +**67. n-gram model ― This model is a naive approach aiming at quantifying the probability that an expression appears in a corpus by counting its number of appearance in the training data.** + +⟶ + +
+ + +**68. Perplexity ― Language models are commonly assessed using the perplexity metric, also known as PP, which can be interpreted as the inverse probability of the dataset normalized by the number of words T. The perplexity is such that the lower, the better and is defined as follows:** + +⟶ + +
+ + +**69. Remark: PP is commonly used in t-SNE.** + +⟶ + +
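As an illustration, perplexity can be computed from the probabilities a model assigns to the T words of a dataset, as sketched below; working in log space is a choice made here for numerical stability.

```python
import numpy as np

def perplexity(word_probs):
    """PP = (product of 1 / p_i)^(1/T): the geometric-mean inverse probability; lower is better."""
    word_probs = np.asarray(word_probs, dtype=float)
    T = word_probs.size
    return float(np.exp(-np.sum(np.log(word_probs)) / T))
```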
+ + +**70. Machine translation** + +⟶ + +
+ + +**71. Overview ― A machine translation model is similar to a language model except it has an encoder network placed before it. For this reason, it is sometimes referred to as a conditional language model. The goal is to find a sentence y such that:** + +⟶ + +
+ + +**72. Beam search ― It is a heuristic search algorithm used in machine translation and speech recognition to find the likeliest sentence y given an input x.** + +⟶ + +
+ + +**73. [Step 1: Find top B likely words y<1>, Step 2: Compute conditional probabilities y|x,y<1>,...,y, Step 3: Keep top B combinations x,y<1>,...,y, End process at a stop word]** + +⟶ + +
+ + +**74. Remark: if the beam width is set to 1, then this is equivalent to a naive greedy search.** + +⟶ + +
+ + +**75. Beam width ― The beam width B is a parameter for beam search. Large values of B yield better results but with slower performance and increased memory usage. Small values of B lead to worse results but are less computationally intensive. A standard value for B is around 10.** + +⟶ + +
+ + +**76. Length normalization ― In order to improve numerical stability, beam search is usually applied on the following normalized objective, often called the normalized log-likelihood objective, defined as:** + +⟶ + +
+ + +**77. Remark: the parameter α can be seen as a softener, and its value is usually between 0.5 and 1.** + +⟶ + +
+ + +**78. Error analysis ― When obtaining a predicted translation ˆy that is bad, one can wonder why we did not get a good translation y∗ by performing the following error analysis:** + +⟶ + +
+ + +**79. [Case, Root cause, Remedies]** + +⟶ + +
+ + +**80. [Beam search faulty, RNN faulty, Increase beam width, Try different architecture, Regularize, Get more data]** + +⟶ + +
+ + +**81. Bleu score ― The bilingual evaluation understudy (bleu) score quantifies how good a machine translation is by computing a similarity score based on n-gram precision. It is defined as follows:** + +⟶ + +
+ + +**82. where pn is the bleu score on n-gram only defined as follows:** + +⟶ + +
+ + +**83. Remark: a brevity penalty may be applied to short predicted translations to prevent an artificially inflated bleu score.** + +⟶ + +
+ + +**84. Attention** + +⟶ + +
+ + +**85. Attention model ― This model allows an RNN to pay attention to specific parts of the input that are considered important, which improves the performance of the resulting model in practice. By noting α the amount of attention that the output y should pay to the activation a and c the context at time t, we have:** + +⟶ + +
+ + +**86. with** + +⟶ + +
+ + +**87. Remark: the attention scores are commonly used in image captioning and machine translation.** + +⟶ + +
+ + +**88. A cute teddy bear is reading Persian literature.** + +⟶ + +
+ + +**89. Attention weight ― The amount of attention that the output y should pay to the activation a is given by α computed as follows:** + +⟶ + +
+ + +**90. Remark: computation complexity is quadratic with respect to Tx.** + +⟶ + +
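A minimal NumPy sketch of the attention weights and context described above: raw scores are turned into weights α with a softmax over the Tx input positions, and the context is the weighted sum of activations; how the scores themselves are produced (a small learned network) is left out of this sketch.

```python
import numpy as np

def attention_context(scores, a):
    """scores: raw attention scores of shape (Tx,); a: activations of shape (Tx, n).

    Returns the attention weights alpha and the context vector c = sum over t' of alpha_t' * a_t'.
    """
    alpha = np.exp(scores - scores.max())
    alpha = alpha / alpha.sum()   # softmax over the Tx positions (quadratic in Tx overall)
    c = alpha @ a                 # weighted sum of the activations
    return alpha, c
```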
+ + +**91. The Deep Learning cheatsheets are now available in [target language].** + +⟶ + +
+ +**92. Original authors** + +⟶ + +
+ +**93. Translated by X, Y and Z** + +⟶ + +
+ +**94. Reviewed by X, Y and Z** + +⟶ + +
+ +**95. View PDF version on GitHub** + +⟶ + +
+ +**96. By X and Y** + +⟶ + +
From 9b7647f6dd7e247b6c5bc1dadf921d9b90ebcf2a Mon Sep 17 00:00:00 2001 From: Shervine Amidi Date: Tue, 5 Feb 2019 23:51:57 -0800 Subject: [PATCH 103/531] Update [fa] progress links --- README.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/README.md b/README.md index 9b97cb36b..46c22f32a 100644 --- a/README.md +++ b/README.md @@ -37,9 +37,9 @@ Please make sure to propose the translation of **only one** cheatsheet per pull ## Progression for CS 230 (Deep Learning) |Cheatsheet topic|Español|فارسی|Français|日本語|Português|中文| |:---|:---:|:---:|:---:|:---:|:---:|:---:| -|Convolutional Neural Nets|not started|not started|done|not started|not started|not started| -|Recurrent Neural Nets|not started|not started|done|not started|not started|not started| -|DL tips and tricks|not started|not started|done|not started|not started|not started| +|Convolutional Neural Nets|not started|[in progress](https://github.com/erfannoury/cheatsheet-translation/issues/9)|done|not started|not started|not started| +|Recurrent Neural Nets|not started|[in progress](https://github.com/erfannoury/cheatsheet-translation/issues/10)|done|not started|not started|not started| +|DL tips and tricks|not started|[in progress](https://github.com/erfannoury/cheatsheet-translation/issues/11)|done|not started|not started|not started| |Cheatsheet topic|العَرَبِيَّة|עִבְרִית|हिन्दी|Türkçe|Русский|Italiano| |:---|:---:|:---:|:---:|:---:|:---:|:---:| From 8d813889d0a60bc486bbb7882ae546897576f054 Mon Sep 17 00:00:00 2001 From: AlisterTA <19950298+AlisterTA@users.noreply.github.com> Date: Thu, 7 Feb 2019 21:55:55 +0330 Subject: [PATCH 104/531] Update deep-learning-tips-and-tricks.md --- fa/deep-learning-tips-and-tricks.md | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/fa/deep-learning-tips-and-tricks.md b/fa/deep-learning-tips-and-tricks.md index 347234ec2..c765d7e54 100644 --- a/fa/deep-learning-tips-and-tricks.md +++ b/fa/deep-learning-tips-and-tricks.md @@ -4,7 +4,9 @@ **1. Deep Learning Tips and Tricks cheatsheet** -⟶ +
+راهنمای کوتاه نکات و ترفندهای یادگیری عمیق +

From c076d700eb39d0a771e20e4ea6e4d53c3be3a8e9 Mon Sep 17 00:00:00 2001 From: AlisterTA <19950298+AlisterTA@users.noreply.github.com> Date: Thu, 7 Feb 2019 21:56:09 +0330 Subject: [PATCH 105/531] Update deep-learning-tips-and-tricks.md --- fa/deep-learning-tips-and-tricks.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fa/deep-learning-tips-and-tricks.md b/fa/deep-learning-tips-and-tricks.md index c765d7e54..7a2ba914d 100644 --- a/fa/deep-learning-tips-and-tricks.md +++ b/fa/deep-learning-tips-and-tricks.md @@ -6,7 +6,7 @@
راهنمای کوتاه نکات و ترفندهای یادگیری عمیق -
+

From d16c9552101320c922e89a3088d992c2fd6ef644 Mon Sep 17 00:00:00 2001 From: AlisterTA <19950298+AlisterTA@users.noreply.github.com> Date: Thu, 7 Feb 2019 22:15:30 +0330 Subject: [PATCH 106/531] Update deep-learning-tips-and-tricks.md --- fa/deep-learning-tips-and-tricks.md | 263 +++++++++++++++++++++------- 1 file changed, 195 insertions(+), 68 deletions(-) diff --git a/fa/deep-learning-tips-and-tricks.md b/fa/deep-learning-tips-and-tricks.md index 7a2ba914d..21341f4a7 100644 --- a/fa/deep-learning-tips-and-tricks.md +++ b/fa/deep-learning-tips-and-tricks.md @@ -13,259 +13,332 @@ **2. CS 230 - Deep Learning** -⟶ +
+کلاس CS 230 - یادگیری عمیق +

**3. Tips and tricks** -⟶ +
+نکات و ترفندها +

**4. [Data processing, Data augmentation, Batch normalization]** -⟶ +
+] پردازش¬ داده، داده¬افزایی، نرمال¬سازی دسته¬ای[ +

**5. [Training a neural network, Epoch, Mini-batch, Cross-entropy loss, Backpropagation, Gradient descent, Updating weights, Gradient checking]** -⟶ +
+]آموزش یک شبکه عصبی، تکرار(Epoch)، دسته¬ی¬کوچک، خطای آنتروپی متقاطع، انتشار معکوس، گرادیان نزولی، بروزرسانی وزن¬ها، وارسی گرادیان[ +

**6. [Parameter tuning, Xavier initialization, Transfer learning, Learning rate, Adaptive learning rates]** -⟶ +
+]تنظیم پارامتر، مقداردهی¬اولیه ژاویر،یادگیری انتقالی، نرخ یادگیری، نرخ یادگیری سازگار¬شونده [ +

**7. [Regularization, Dropout, Weight regularization, Early stopping]** -⟶ +
+] نظام‌بخشی، برون¬اندازی، نظام¬بخشی وزن، توقف¬زودهنگام[ +

**8. [Good practices, Overfitting small batch, Gradient checking]** -⟶ +
+]تمرینات خوب، برارزش دسته کوچک، وارسی گرادیان[ +

**9. View PDF version on GitHub** -⟶ +
+نسخه پی¬دی¬اف را در گیت¬هاب ببینید +

**10. Data processing** -⟶ +
+پردازش¬ داده +

**11. Data augmentation ― Deep learning models usually need a lot of data to be properly trained. It is often useful to get more data from the existing ones using data augmentation techniques. The main ones are summed up in the table below. More precisely, given the following input image, here are the techniques that we can apply:** -⟶ +
+داده¬افزایی ― مدل¬های یادگیری عمیق معمولا به داده زیادی نیاز دارند تا بتوانند به خوبی آموزش ببینند. اغلب، استفاده از روش¬های داده افزایی برای گرفتن داده بیشتر از داده موجود، مفید است. اصلی¬ترین آنها در جدول زیر به اختصار آمده اند. به عبارت دقیق¬تر، با در نظر گرفتن تصویر ورودی زیر، روش¬هایی که میتوانم اعمال کرد بدین شرح هستند: +

**12. [Original, Flip, Rotation, Random crop]** -⟶ +
+]آغازین، قرینه، چرخش، برش تصادفی[ +

**13. [Image without any modification, Flipped with respect to an axis for which the meaning of the image is preserved, Rotation with a slight angle, Simulates incorrect horizon calibration, Random focus on one part of the image, Several random crops can be done in a row]** -⟶ - +
+]تصویر (آغازین) بدون هیچ¬گونه اصلاحی، قرینه شده به نسبت یک محور بطوریکه معنای تصویر حفظ شده است، چرخش با یک زاویه کم، ............، تمرکز تصادفی بر روی یک بخش از تصویر، چندین برش تصادفی را میتوان پشت¬سر¬هم انجام داد[ +

**14. [Color shift, Noise addition, Information loss, Contrast change]** -⟶ +
+]تغییر رنگ، افزودگی نویز، هدر¬رفت اطلاعات، تغییر تباین(کُنتراست) [ +

**15. [Nuances of RGB is slightly changed, Captures noise that can occur with light exposure, Addition of noise, More tolerance to quality variation of inputs, Parts of image ignored, Mimics potential loss of parts of image, Luminosity changes, Controls difference in exposition due to time of day]** -⟶ +
+]عناصر RGB کمی تغییر کرده است، نویزی که در هنگام مواجهه با نور رخ می¬دهد را میگیرد، افزودگی نویز، مقاومت بیشتر به تغییر کیفیت ورودی¬ها، بخش¬هایی از تصویر نادیده گرفته میشوند، تقلید (شبیه سازی) هدر¬رفت بالقوه بخش هایی از تصویر، تغییر درخشندگی، با توجه به زمان روز، تفاوت نمایش (تصویر) را کنترل می¬کند[ +

**16. Remark: data is usually augmented on the fly during training.** -⟶ +
+نکته: داده معمولا در جریان فرآیند آموزش (به صورت درجا) افزایش پیدا می¬کند. +

**17. Batch normalization ― It is a step of hyperparameter γ,β that normalizes the batch {xi}. By noting μB,σ2B the mean and variance of that we want to correct to the batch, it is done as follows:** -⟶ +
+نرمال¬سازی دسته ای ― یک مرحله از فراعامل‌های γ و β که دسته‌ی {xi} را نرمال می‌کند. نماد μB و σ2B به میانگین و واریانس دسته‌ای که میخواهیم آن را اصلاح کنیم اشاره دارد که به صورت زیر است: +

**18. It is usually done after a fully connected/convolutional layer and before a non-linearity layer and aims at allowing higher learning rates and reducing the strong dependence on initialization.** -⟶ +
+معمولا بعد از یک لایه‌ی تمام‌متصل یا لایه‌ی کانولوشنی و قبل از یک لایه‌ی غیرخطی اعمال می‌شود و امکان استفاده از نرخ یادگیری بالاتر را می‌دهد و همچنین باعث می‌شود که وابستگی شدید مدل به مقداردهی اولیه کاهش یابد. +

**19. Training a neural network** -⟶ +
+آموزش یک شبکه عصبی +

**20. Definitions** -⟶ +
+تعاریف +

**21. Epoch ― In the context of training a model, epoch is a term used to refer to one iteration where the model sees the whole training set to update its weights.** -⟶ +
+تکرار (epoch) ― در مضمون آموزش یک مدل، تکرار اصطلاحی است که مدل در یک دوره تکرار تمامی نمونه¬های آموزشی را برای بروزرسانی وزن¬ها می¬بیند. +

**22. Mini-batch gradient descent ― During the training phase, updating weights is usually not based on the whole training set at once due to computation complexities or one data point due to noise issues. Instead, the update step is done on mini-batches, where the number of data points in a batch is a hyperparameter that we can tune.** -⟶ +
+گرادیان نزولی دسته¬کوچک ― در فاز آموزش، بروزرسانی وزن¬ها معمولا بر مبنای تمامی مجموعه آموزش به علت پیچیدگی¬های محاسباتی، یا یک نمونه داده به علت مشکل نویز نیست. در عوض، گام بروزرسانی بر روی دسته¬های کوچک انجام می شود، که تعداد نمونه¬های داده در یک دسته فراعاملی است که می¬توان آن را تنظیم کرد. +

**23. Loss function ― In order to quantify how a given model performs, the loss function L is usually used to evaluate to what extent the actual outputs y are correctly predicted by the model outputs z.** -⟶ +
+تابع خطا ― به منظور سنجش کارایی یک مدل مفروض، معمولا از تابع خطا L برای ارزیابی اینکه تا چه حد خروجی حقیقی y به شکل صحیح توسط خروجی z مدل پیش¬بینی شده¬اند، استفاده میشود. +

**24. Cross-entropy loss ― In the context of binary classification in neural networks, the cross-entropy loss L(z,y) is commonly used and is defined as follows:** -⟶ +
+خطای آنتروپی متقاطع – در مضمون دسته¬بندی دودویی در شبکه¬های عصبی، عموما از تابع خطای آنتروپی متقاطع L(z,y) استفاده و به صورت زیر تعریف می¬شود: +

**25. Finding optimal weights** -⟶ +
+یافتن وزن¬های بهینه +

**26. Backpropagation ― Backpropagation is a method to update the weights in the neural network by taking into account the actual output and the desired output. The derivative with respect to each weight w is computed using the chain rule.** -⟶ +
+انتشار معکوس ― انتشار معکوس روشی برای بروزرسانی وزن‌ها با توجه به خروجی واقعی و خروجی مورد انتظار در شبکه‌ی عصبی است. مشتق نسبت به هر وزن w توسط قاعده‌ی زنجیری محاسبه می‌شود. +

**27. Using this method, each weight is updated with the rule:** -⟶ +
+با استفاده از این روش، هر وزن با قانون زیر بروزرسانی می¬شود: +

**28. Updating weights ― In a neural network, weights are updated as follows:** -⟶ +
+بروزرسانی وزن¬ها – در یک شبکه عصبی، وزن¬ها به شکل زیر بروزرسانی می¬شوند: +

**29. [Step 1: Take a batch of training data and perform forward propagation to compute the loss, Step 2: Backpropagate the loss to get the gradient of the loss with respect to each weight, Step 3: Use the gradients to update the weights of the network.]** -⟶ +
+]گام 1: یک دسته از داده¬های آموزشی را بگیر و انتشارمستقیم را برای محاسبه خطا اجرا کن، گام 2: خطا را برای گرفتن گرادیان آن به نسبت هر وزن انتشارمعکوس بده، گام 3: از گرادیان¬ها برای بروزرسانی وزن¬های شبکه استفاده کن.[ +

**30. [Forward propagation, Backpropagation, Weights update]** -⟶ +
+]انتشارمستقیم، انتشار معکوس، بروزرسانی وزن¬ها[ +

**31. Parameter tuning** -⟶ +
+تنظیم پارامتر +

**32. Weights initialization** -⟶ +
+مقداردهی¬اولیه وزن¬ها +

**33. Xavier initialization ― Instead of initializing the weights in a purely random manner, Xavier initialization enables to have initial weights that take into account characteristics that are unique to the architecture.** -⟶ +
+مقداردهی¬اولیه ژاویر ― به¬جای مقداردهی¬اولیه کردن وزن¬ها به شیوه¬ای کاملا تصادفی، مقداردهی¬اولیه ژاویر این امکان را فراهم می¬سازد تا وزن¬های اولیه¬ایی داشته باشیم که ویژگی¬های منحصر به فرد معماری را به حساب می¬آورد. +

**34. Transfer learning ― Training a deep learning model requires a lot of data and more importantly a lot of time. It is often useful to take advantage of pre-trained weights on huge datasets that took days/weeks to train, and leverage it towards our use case. Depending on how much data we have at hand, here are the different ways to leverage this:** -⟶ +
+یادگیری انتقالی ― آموزش یک مدل یادگیری عمیق به داده¬های زیاد و مهم تر از آن به زمان زیادی احتیاج دارد. اغلب استفاده از مزیت وزن¬های ازقبل¬آموزش داده شده بر¬روی پایگاه داده¬های عظیم که روزها/هفته¬ها طول می¬کشند تا آموزش ببینند، مفید است، و میتوان از قدرت آن برای مورد استفاده-مان بهره جست. بسته به میزان داده¬ای که در اختیار داریم ، در زیر روش¬های مختلفی که میتوان از آنها بهره جست آورده شده¬اند: +

**35. [Training size, Illustration, Explanation]** -⟶ +
+]اندازه آموزش، نگاره، توضیح[ +

**36. [Small, Medium, Large]** -⟶ +
+]کوچک، متوسط، بزرگ[ +

**37. [Freezes all layers, trains weights on softmax, Freezes most layers, trains weights on last layers and softmax, Trains weights on layers and softmax by initializing weights on pre-trained ones]** -⟶ +
+]منجمد کردن تمامی لایه¬ها، آموزش وزن¬ها در بیشینه¬ی¬ هموار، منجمد کردن اکثر لایه¬ها، آموزش وزن¬ها در لایه¬های آخر و بیشینه¬ی هموار، آموزش وزن¬ها در (تمامی) لایه¬ها و بیشینه¬ی هموار با مقداردهی¬اولیه کردن وزن¬ها بر روی مقادیر از¬قبل¬آموزش داده شده[ +

**38. Optimizing convergence** -⟶ +
+بهینه¬سازی همگرایی +

@@ -273,187 +346,241 @@ **39. Learning rate ― The learning rate, often noted α or sometimes η, indicates at which pace the weights get updated. It can be fixed or adaptively changed. The current most popular method is called Adam, which is a method that adapts the learning rate. ** -⟶ +
+نرخ یادگیری – نرخ یادگیری اغلب با نماد α و گاهی اوقات با نماد η نمایش داده می‌شود و بیانگر سرعت (گام) بروزرسانی وزن‌ها است که میتواند مقداری ثابت یا به سازگارشونده تغییر کند. محبوب‌ترین روش حال حاضر Adam نام دارد، متدی است که نرخ یادگیری را در حین فرآیند آموزش تنظیم می‌کند. +

**40. Adaptive learning rates ― Letting the learning rate vary when training a model can reduce the training time and improve the numerical optimal solution. While Adam optimizer is the most commonly used technique, others can also be useful. They are summed up in the table below:** -⟶ +
+نرخ های یادگیری سازگارشونده ― داشتن نرخ یادگیری متغیر در فرآیند آموزش یک مدل، می¬تواند زمان آموزش را کاهش دهد و راه¬حل بهینه عددی را بهبود ببخشد. با آنکه بهینه ساز Adam محبوب¬ترین متد مورد استفاده است، دیگر متد¬ها نیز می¬توانند مفید باشند. این متد ها در جدول زیر به اختصار آمده اند: +

**41. [Method, Explanation, Update of w, Update of b]** -⟶ +
+]روش، توضیح، بروزرسانی w، بروزرسانی b[ +

**42. [Momentum, Dampens oscillations, Improvement to SGD, 2 parameters to tune]** -⟶ +
+]Momentum، نوسانات را تعدیل میدهد، بهبود SGD، 2 پارامتر نیاز به تنظیم دارند[ +

**43. [RMSprop, Root Mean Square propagation, Speeds up learning algorithm by controlling oscillations]** -⟶ +
+]RMSprop، انتشار جذر میانگین مربعات، سرعت بخشیدن به الگوریتم یادگیری با کنترل نوسانات[ +

**44. [Adam, Adaptive Moment estimation, Most popular method, 4 parameters to tune]** -⟶ +
+]Adam، تخمین سازگارشونده گشتاور، محبوب¬ترین متد، 4 پارامتر نیاز به تنظیم دارند[ +

**45. Remark: other methods include Adadelta, Adagrad and SGD.** -⟶ +
+نکته: سایر متد¬ها شامل Adadelta، Adagrad و SGD هستند. +

**46. Regularization** -⟶ +
+نظام¬بخشی +

**47. Dropout ― Dropout is a technique used in neural networks to prevent overfitting the training data by dropping out neurons with probability p>0. It forces the model to avoid relying too much on particular sets of features.** -⟶ +
+برون¬اندازی – برون¬اندازی روشی است که در شبکه های عصبی برای جلوگیری از برارزش شدن بر روی داده¬های آموزشی با حذف تصادفی نورون¬ها با احتمال p>0 استفاده میشود. این روش مدل را مجبور می¬کند تا از تکیه کردن بیش از حد بر روی مجموعه¬ خاصی از ویژگی¬ها خودداری کند. +

**48. Remark: most deep learning frameworks parametrize dropout through the 'keep' parameter 1−p.** -⟶ +
+نکته: بیشتر فریم ورک¬های یادگیری عمیق برون¬اندازی را به شکل پارامتر ‘keep’ 1-p در¬می-آورند. +

**49. Weight regularization ― In order to make sure that the weights are not too large and that the model is not overfitting the training set, regularization techniques are usually performed on the model weights. The main ones are summed up in the table below:** -⟶ +
+نظام¬بخشی وزن – برای اطمینان از اینکه (مقادیر) وزن¬ها زیادی بزرگ نیستند و مدل به مجموعه آموزش بیش¬برارزش نیست، روش¬های نظام¬بخشی معمولا بر روی وزن¬های مدل اجرا می شوند. اصلی¬ترین آنها در جدول زیر به اختصار آمده اند: +

**50. [LASSO, Ridge, Elastic Net]** -⟶ - +
+[LASSO, Ridge, Elastic Net] +

**50 bis. Shrinks coefficients to 0, Good for variable selection, Makes coefficients smaller, Tradeoff between variable selection and small coefficients]** -⟶ +
+ضرایب را تا ۰ کاهش می‌دهد، برای انتخاب متغیر مناسب است، ضرایب را کوچکتر می‌کند، بین انتخاب متغیر و ضرایب کوچک مصالحه می‌کند +

**51. Early stopping ― This regularization technique stops the training process as soon as the validation loss reaches a plateau or starts to increase.** -⟶ +
+توقف زودهنگام: این روش نظام¬بخشی، فرآیند آموزش را به محض اینکه خطای اعتبارسنجی ثابت یا شروع به افزایش پیدا کند، متوقف می¬کند. +

**52. [Error, Validation, Training, early stopping, Epochs]** -⟶ +
+]خطا، اعتبارسنجی، آموزش، توقف زودهنگام، تکرارها[ +

**53. Good practices** -⟶ +
+تمرینات خوب +

**54. Overfitting small batch ― When debugging a model, it is often useful to make quick tests to see if there is any major issue with the architecture of the model itself. In particular, in order to make sure that the model can be properly trained, a mini-batch is passed inside the network to see if it can overfit on it. If it cannot, it means that the model is either too complex or not complex enough to even overfit on a small batch, let alone a normal-sized training set.** -⟶ +
+بیش¬برارزش کردن دسته¬کوچک ― هنگام اشکالزدایی یک مدل، اغلب مفید است که یک سری آزمایش¬های سریع برای اطمینان از اینکه هیچ مشکل عمده ای در معماری مدل وجود دارد انجام شود. به طورخاص، برای اطمینان از اینکه مدل می¬تواند به شکل صحیح آموزش ببیند، یک دسته-ی¬کوچک (از داده¬ها) به شبکه داده می¬شود تا دریابیم که مدل میتواند به آنها بیش¬برارزش کند. اگر نتوانید، بدین معناست که مدل یا خیلی پیچیده است یا پیچیدگی لازم برای بیش¬برارزش شدن برروی دسته¬ی¬کوچک را ندارد، چه برسد به یک مجموعه آموزشی با اندازه عادی. +

**55. Gradient checking ― Gradient checking is a method used during the implementation of the backward pass of a neural network. It compares the value of the analytical gradient to the numerical gradient at given points and plays the role of a sanity-check for correctness.** -⟶ +
+وارسی گرادیان – وارسی گرادیان متدی است که در طول پیاده سازی گذر روبه¬عقب یک شبکه عصبی استفاده می شود. این متد مقدار گرادیان تحلیلی را با گرادیان عددی در نقطه¬های مفروض را مقایسه میکند و نقش بررسی¬درستی را ایفا میکند. +

**56. [Type, Numerical gradient, Analytical gradient]** -⟶ +
+]نوع، گرادیان عددی، گرادیان تحلیلی[ +

**57. [Formula, Comments]** -⟶ +
+]فرمول، توضیحات[ +

**58. [Expensive; loss has to be computed two times per dimension, Used to verify correctness of analytical implementation, Trade-off in choosing h not too small (numerical instability) nor too large (poor gradient approximation)]** -⟶ +
+]گران (محاسباتی)، خطا باید دو بار در هر بُعد محاسبه شود، برای تایید صحت پیاده¬سازی تحلیلی استفاده می¬شود، مصالحه در انتخاب h: نه بسیار کوچک (ناپایداری عددی) و نه خیلی بزرگ (تخمین گرادیان ضعیف) باشد[ +

**59. ['Exact' result, Direct computation, Used in the final implementation]** -⟶ +
+]نتیجه ‘عینی، محاسبه مستقیم، در پیاده¬سازی نهایی استفاده می¬شود[ +

**60. The Deep Learning cheatsheets are now available in [target language]. -⟶ - +
+راهنمای¬ یادگیری عمیق هم اکنون به زبان ]فارسی[ در دسترس است. +
**61. Original authors** -⟶ +
+متن اصلی از +

**62.Translated by X, Y and Z** -⟶ +
+ترجمه شده توسط X،Y و Z +

**63.Reviewed by X, Y and Z** -⟶ +
+بازبینی شده توسط توسط X،Y و Z +

**64.View PDF version on GitHub** -⟶ +
+نسخه پی¬دی¬اف را در گیت¬هاب ببینید +

**65.By X and Y** -⟶ +
+توسط X و Y +

From 4aa37de55e66f2e09f1e33b91bd1b44a1c3c65ce Mon Sep 17 00:00:00 2001 From: AlisterTA <19950298+AlisterTA@users.noreply.github.com> Date: Thu, 7 Feb 2019 22:42:31 +0330 Subject: [PATCH 107/531] Update deep-learning-tips-and-tricks.md --- fa/deep-learning-tips-and-tricks.md | 102 ++++++++++++++-------------- 1 file changed, 51 insertions(+), 51 deletions(-) diff --git a/fa/deep-learning-tips-and-tricks.md b/fa/deep-learning-tips-and-tricks.md index 21341f4a7..6bfed3e51 100644 --- a/fa/deep-learning-tips-and-tricks.md +++ b/fa/deep-learning-tips-and-tricks.md @@ -32,7 +32,7 @@ **4. [Data processing, Data augmentation, Batch normalization]**
-] پردازش¬ داده، داده¬افزایی، نرمال¬سازی دسته¬ای[ +[پردازش‌داده، داده‌افزایی، نرمال‌سازی دسته‌ای]

@@ -41,7 +41,7 @@ **5. [Training a neural network, Epoch, Mini-batch, Cross-entropy loss, Backpropagation, Gradient descent, Updating weights, Gradient checking]**
-]آموزش یک شبکه عصبی، تکرار(Epoch)، دسته¬ی¬کوچک، خطای آنتروپی متقاطع، انتشار معکوس، گرادیان نزولی، بروزرسانی وزن¬ها، وارسی گرادیان[ +[آموزش یک شبکه عصبی، تکرار(Epoch)، دسته‌ی‌کوچک، خطای آنتروپی متقاطع، انتشار معکوس، گرادیان نزولی، بروزرسانی وزن‌ها، وارسی گرادیان]

@@ -50,7 +50,7 @@ **6. [Parameter tuning, Xavier initialization, Transfer learning, Learning rate, Adaptive learning rates]**
-]تنظیم پارامتر، مقداردهی¬اولیه ژاویر،یادگیری انتقالی، نرخ یادگیری، نرخ یادگیری سازگار¬شونده [ +[تنظیم پارامتر، مقداردهی‌اولیه ژاویر،یادگیری انتقالی، نرخ یادگیری، نرخ یادگیری سازگارشونده]

@@ -59,7 +59,7 @@ **7. [Regularization, Dropout, Weight regularization, Early stopping]**
-] نظام‌بخشی، برون¬اندازی، نظام¬بخشی وزن، توقف¬زودهنگام[ +[نظام‌بخشی، برون‌اندازی، نظام‌بخشی وزن، توقف‌زودهنگام]

@@ -68,7 +68,7 @@ **8. [Good practices, Overfitting small batch, Gradient checking]**
-]تمرینات خوب، برارزش دسته کوچک، وارسی گرادیان[ +[تمرینات خوب، برارزش دسته کوچک، وارسی گرادیان]

@@ -77,7 +77,7 @@ **9. View PDF version on GitHub**
-نسخه پی¬دی¬اف را در گیت¬هاب ببینید +نسخه پی‌دی‌اف را در گیت‌هاب ببینید

@@ -86,7 +86,7 @@ **10. Data processing**
-پردازش¬ داده +پردازش داده

@@ -95,7 +95,7 @@ **11. Data augmentation ― Deep learning models usually need a lot of data to be properly trained. It is often useful to get more data from the existing ones using data augmentation techniques. The main ones are summed up in the table below. More precisely, given the following input image, here are the techniques that we can apply:**
-داده¬افزایی ― مدل¬های یادگیری عمیق معمولا به داده زیادی نیاز دارند تا بتوانند به خوبی آموزش ببینند. اغلب، استفاده از روش¬های داده افزایی برای گرفتن داده بیشتر از داده موجود، مفید است. اصلی¬ترین آنها در جدول زیر به اختصار آمده اند. به عبارت دقیق¬تر، با در نظر گرفتن تصویر ورودی زیر، روش¬هایی که میتوانم اعمال کرد بدین شرح هستند: +داده‌افزایی ― مدل‌های یادگیری عمیق معمولا به داده زیادی نیاز دارند تا بتوانند به خوبی آموزش ببینند. اغلب، استفاده از روشهای داده افزایی برای گرفتن داده بیشتر از داده موجود، مفید است. اصلی‌ترین آنها در جدول زیر به اختصار آمده اند. به عبارت دقیق‌تر، با در نظر گرفتن تصویر ورودی زیر، روش‌هایی که میتوانم اعمال کرد بدین شرح هستند:

@@ -104,7 +104,7 @@ **12. [Original, Flip, Rotation, Random crop]**
-]آغازین، قرینه، چرخش، برش تصادفی[ +[آغازین، قرینه، چرخش، برش تصادفی]

@@ -113,7 +113,7 @@ **13. [Image without any modification, Flipped with respect to an axis for which the meaning of the image is preserved, Rotation with a slight angle, Simulates incorrect horizon calibration, Random focus on one part of the image, Several random crops can be done in a row]**
-]تصویر (آغازین) بدون هیچ¬گونه اصلاحی، قرینه شده به نسبت یک محور بطوریکه معنای تصویر حفظ شده است، چرخش با یک زاویه کم، ............، تمرکز تصادفی بر روی یک بخش از تصویر، چندین برش تصادفی را میتوان پشت¬سر¬هم انجام داد[ +[تصویر (آغازین) بدون هیچ‌گونه اصلاحی، قرینه شده به نسبت یک محور بطوری‌که معنای تصویر حفظ شده است، چرخش با یک زاویه کم، ............، تمرکز تصادفی بر روی یک بخش از تصویر، چندین برش تصادفی را میتوان پشت‌سرهم انجام داد]

@@ -121,7 +121,7 @@ **14. [Color shift, Noise addition, Information loss, Contrast change]**
-]تغییر رنگ، افزودگی نویز، هدر¬رفت اطلاعات، تغییر تباین(کُنتراست) [ +[تغییر رنگ، افزودگی نویز، هدررفت اطلاعات، تغییر تباین(کُنتراست)]

@@ -130,7 +130,7 @@ **15. [Nuances of RGB is slightly changed, Captures noise that can occur with light exposure, Addition of noise, More tolerance to quality variation of inputs, Parts of image ignored, Mimics potential loss of parts of image, Luminosity changes, Controls difference in exposition due to time of day]**
-]عناصر RGB کمی تغییر کرده است، نویزی که در هنگام مواجهه با نور رخ می¬دهد را میگیرد، افزودگی نویز، مقاومت بیشتر به تغییر کیفیت ورودی¬ها، بخش¬هایی از تصویر نادیده گرفته میشوند، تقلید (شبیه سازی) هدر¬رفت بالقوه بخش هایی از تصویر، تغییر درخشندگی، با توجه به زمان روز، تفاوت نمایش (تصویر) را کنترل می¬کند[ +[عناصر RGB کمی تغییر کرده است، نویزی که در هنگام مواجهه با نور رخ می‌دهد را می‌گیرد، افزودگی نویز، مقاومت بیشتر به تغییر کیفیت ورودی‌ها، بخش‌هایی از تصویر نادیده گرفته میشوند، تقلید (شبیه سازی) هدررفت بالقوه بخش هایی از تصویر، تغییر درخشندگی، با توجه به زمان روز، تفاوت نمایش (تصویر) را کنترل می‌کند]

@@ -139,7 +139,7 @@ **16. Remark: data is usually augmented on the fly during training.**
-نکته: داده معمولا در جریان فرآیند آموزش (به صورت درجا) افزایش پیدا می¬کند. +نکته: داده معمولا در فرآیند آموزش (به صورت درجا) افزایش پیدا می‌کند.

@@ -148,7 +148,7 @@ **17. Batch normalization ― It is a step of hyperparameter γ,β that normalizes the batch {xi}. By noting μB,σ2B the mean and variance of that we want to correct to the batch, it is done as follows:**
-نرمال¬سازی دسته ای ― یک مرحله از فراعامل‌های γ و β که دسته‌ی {xi} را نرمال می‌کند. نماد μB و σ2B به میانگین و واریانس دسته‌ای که میخواهیم آن را اصلاح کنیم اشاره دارد که به صورت زیر است: +نرمال‌سازی دسته‌ای ― یک مرحله از فراعامل‌های γ و β که دسته‌ی {xi} را نرمال می‌کند. نماد μB و σ2B به میانگین و واریانس دسته‌ای که می‌خواهیم آن را اصلاح کنیم اشاره دارد که به صورت زیر است:

@@ -184,7 +184,7 @@ **21. Epoch ― In the context of training a model, epoch is a term used to refer to one iteration where the model sees the whole training set to update its weights.**
-تکرار (epoch) ― در مضمون آموزش یک مدل، تکرار اصطلاحی است که مدل در یک دوره تکرار تمامی نمونه¬های آموزشی را برای بروزرسانی وزن¬ها می¬بیند. +تکرار (epoch) ― در مضمون آموزش یک مدل، تکرار اصطلاحی است که مدل در یک دوره تکرار تمامی نمونه‌های آموزشی را برای بروزرسانی وزن‌ها می‌بیند.

@@ -193,7 +193,7 @@ **22. Mini-batch gradient descent ― During the training phase, updating weights is usually not based on the whole training set at once due to computation complexities or one data point due to noise issues. Instead, the update step is done on mini-batches, where the number of data points in a batch is a hyperparameter that we can tune.**
-گرادیان نزولی دسته¬کوچک ― در فاز آموزش، بروزرسانی وزن¬ها معمولا بر مبنای تمامی مجموعه آموزش به علت پیچیدگی¬های محاسباتی، یا یک نمونه داده به علت مشکل نویز نیست. در عوض، گام بروزرسانی بر روی دسته¬های کوچک انجام می شود، که تعداد نمونه¬های داده در یک دسته فراعاملی است که می¬توان آن را تنظیم کرد. +گرادیان نزولی دسته‌ی‌کوچک ― در فاز آموزش، بروزرسانی وزن‌ها معمولا بر مبنای تمامی مجموعه آموزش به علت پیچیدگی‌های محاسباتی، یا یک نمونه داده به علت مشکل نویز نیست. در عوض، گام بروزرسانی بر روی دسته‌های کوچک انجام می شود، که تعداد نمونه‌های داده در یک دسته فراعاملی است که میتوان آن را تنظیم کرد.

@@ -202,7 +202,7 @@ **23. Loss function ― In order to quantify how a given model performs, the loss function L is usually used to evaluate to what extent the actual outputs y are correctly predicted by the model outputs z.**
-تابع خطا ― به منظور سنجش کارایی یک مدل مفروض، معمولا از تابع خطا L برای ارزیابی اینکه تا چه حد خروجی حقیقی y به شکل صحیح توسط خروجی z مدل پیش¬بینی شده¬اند، استفاده میشود. +تابع خطا ― به منظور سنجش کارایی یک مدل مفروض، معمولا از تابع خطا L برای ارزیابی اینکه تا چه حد خروجی حقیقی y به شکل صحیح توسط خروجی z مدل پیش‌بینی شدهاند، استفاده می‌شود.

@@ -211,7 +211,7 @@ **24. Cross-entropy loss ― In the context of binary classification in neural networks, the cross-entropy loss L(z,y) is commonly used and is defined as follows:**
-خطای آنتروپی متقاطع – در مضمون دسته¬بندی دودویی در شبکه¬های عصبی، عموما از تابع خطای آنتروپی متقاطع L(z,y) استفاده و به صورت زیر تعریف می¬شود: +خطای آنتروپی متقاطع – در مضمون دسته‌بندی دودویی در شبکه‌های عصبی، عموما از تابع خطای آنتروپی متقاطع L(z,y) استفاده و به صورت زیر تعریف میشود:

@@ -220,7 +220,7 @@ **25. Finding optimal weights**
-یافتن وزن¬های بهینه +یافتن وزن‌های بهینه

@@ -238,7 +238,7 @@ **27. Using this method, each weight is updated with the rule:**
-با استفاده از این روش، هر وزن با قانون زیر بروزرسانی می¬شود: +با استفاده از این روش، هر وزن با قانون زیر بروزرسانی می‌شود:

@@ -247,7 +247,7 @@ **28. Updating weights ― In a neural network, weights are updated as follows:**
-بروزرسانی وزن¬ها – در یک شبکه عصبی، وزن¬ها به شکل زیر بروزرسانی می¬شوند: +بروزرسانی وزنها – در یک شبکه عصبی، وزن‌ها به شکل زیر بروزرسانی میشوند:

@@ -256,7 +256,7 @@ **29. [Step 1: Take a batch of training data and perform forward propagation to compute the loss, Step 2: Backpropagate the loss to get the gradient of the loss with respect to each weight, Step 3: Use the gradients to update the weights of the network.]**
-]گام 1: یک دسته از داده¬های آموزشی را بگیر و انتشارمستقیم را برای محاسبه خطا اجرا کن، گام 2: خطا را برای گرفتن گرادیان آن به نسبت هر وزن انتشارمعکوس بده، گام 3: از گرادیان¬ها برای بروزرسانی وزن¬های شبکه استفاده کن.[ +[گام 1: یک دسته از داده‌های آموزشی را بگیر و انتشارمستقیم را برای محاسبه خطا اجرا کن، گام 2: خطا را برای گرفتن گرادیان آن به نسبت هر وزن انتشارمعکوس بده، گام 3: از گرادیان‌ها برای بروزرسانی وزن‌های شبکه استفاده کن.]

@@ -265,7 +265,7 @@ **30. [Forward propagation, Backpropagation, Weights update]**
-]انتشارمستقیم، انتشار معکوس، بروزرسانی وزن¬ها[ +[انتشارمستقیم، انتشار معکوس، بروزرسانی وزنها]

@@ -283,7 +283,7 @@ **32. Weights initialization**
-مقداردهی¬اولیه وزن¬ها +مقداردهی‌اولیه وزنها

@@ -292,7 +292,7 @@ **33. Xavier initialization ― Instead of initializing the weights in a purely random manner, Xavier initialization enables to have initial weights that take into account characteristics that are unique to the architecture.**
-مقداردهی¬اولیه ژاویر ― به¬جای مقداردهی¬اولیه کردن وزن¬ها به شیوه¬ای کاملا تصادفی، مقداردهی¬اولیه ژاویر این امکان را فراهم می¬سازد تا وزن¬های اولیه¬ایی داشته باشیم که ویژگی¬های منحصر به فرد معماری را به حساب می¬آورد. +مقداردهی‌اولیه ژاویر ― به‌جای مقداردهی‌اولیه کردن وزن‌ها به شیوه‌ی کاملا تصادفی، مقداردهی‌اولیه ژاویر این امکان را فراهم میسازد تا وزن‌های اولیه‌ایی داشته باشیم که ویژگی‌های منحصر به فرد معماری را به حساب می‌آورند.

@@ -301,7 +301,7 @@ **34. Transfer learning ― Training a deep learning model requires a lot of data and more importantly a lot of time. It is often useful to take advantage of pre-trained weights on huge datasets that took days/weeks to train, and leverage it towards our use case. Depending on how much data we have at hand, here are the different ways to leverage this:**
-یادگیری انتقالی ― آموزش یک مدل یادگیری عمیق به داده¬های زیاد و مهم تر از آن به زمان زیادی احتیاج دارد. اغلب استفاده از مزیت وزن¬های ازقبل¬آموزش داده شده بر¬روی پایگاه داده¬های عظیم که روزها/هفته¬ها طول می¬کشند تا آموزش ببینند، مفید است، و میتوان از قدرت آن برای مورد استفاده-مان بهره جست. بسته به میزان داده¬ای که در اختیار داریم ، در زیر روش¬های مختلفی که میتوان از آنها بهره جست آورده شده¬اند: +یادگیری انتقالی ― آموزش یک مدل یادگیری عمیق به داده‌های زیاد و مهم تر از آن به زمان زیادی احتیاج دارد. اغلب استفاده از مزیت وزنهای ازقبل‌آموزش داده شده برروی پایگاه داده‌های عظیم که روزها/هفته‌ها طول می‌کشند تا آموزش ببینند، مفید است، و می‌توان از قدرت آن برای مورد استفاده‌مان بهره جست. بسته به میزان داده‌هایی که در اختیار داریم ، در زیر روشهای مختلفی که می‌توان از آنها بهره جست آورده شده‌اند:

@@ -310,7 +310,7 @@ **35. [Training size, Illustration, Explanation]**
-]اندازه آموزش، نگاره، توضیح[ +[اندازه آموزش، نگاره، توضیح]

@@ -319,7 +319,7 @@ **36. [Small, Medium, Large]**
-]کوچک، متوسط، بزرگ[ +[کوچک، متوسط، بزرگ]

@@ -328,7 +328,7 @@ **37. [Freezes all layers, trains weights on softmax, Freezes most layers, trains weights on last layers and softmax, Trains weights on layers and softmax by initializing weights on pre-trained ones]**
-]منجمد کردن تمامی لایه¬ها، آموزش وزن¬ها در بیشینه¬ی¬ هموار، منجمد کردن اکثر لایه¬ها، آموزش وزن¬ها در لایه¬های آخر و بیشینه¬ی هموار، آموزش وزن¬ها در (تمامی) لایه¬ها و بیشینه¬ی هموار با مقداردهی¬اولیه کردن وزن¬ها بر روی مقادیر از¬قبل¬آموزش داده شده[ +[منجمد کردن تمامی لایه‌ها، آموزش وزن‌ها در بیشینه‌ی هموار، منجمد کردن اکثر لایه‌ها، آموزش وزن‌ها در لایه‌های آخر و بیشینه‌ی هموار، آموزش وزن‌ها در (تمامی) لایه‌ها و بیشینه‌ی هموار با مقداردهی‌اولیه کردن وزن‌ها بر روی مقادیر ازقبل‌آموزش داده شده]

@@ -337,7 +337,7 @@ **38. Optimizing convergence**
-بهینه¬سازی همگرایی +بهینه‌سازی همگرایی

@@ -347,7 +347,7 @@ **
-نرخ یادگیری – نرخ یادگیری اغلب با نماد α و گاهی اوقات با نماد η نمایش داده می‌شود و بیانگر سرعت (گام) بروزرسانی وزن‌ها است که میتواند مقداری ثابت یا به سازگارشونده تغییر کند. محبوب‌ترین روش حال حاضر Adam نام دارد، متدی است که نرخ یادگیری را در حین فرآیند آموزش تنظیم می‌کند. +نرخ یادگیری – نرخ یادگیری اغلب با نماد α و گاهی اوقات با نماد η نمایش داده می‌شود و بیانگر سرعت (گام) بروزرسانی وزن‌ها است که می‌تواند مقداری ثابت یا به سازگارشونده تغییر کند. محبوب‌ترین روش حال حاضر Adam نام دارد، متدی است که نرخ یادگیری را در حین فرآیند آموزش تنظیم می‌کند.

@@ -356,7 +356,7 @@ **40. Adaptive learning rates ― Letting the learning rate vary when training a model can reduce the training time and improve the numerical optimal solution. While Adam optimizer is the most commonly used technique, others can also be useful. They are summed up in the table below:**
-نرخ های یادگیری سازگارشونده ― داشتن نرخ یادگیری متغیر در فرآیند آموزش یک مدل، می¬تواند زمان آموزش را کاهش دهد و راه¬حل بهینه عددی را بهبود ببخشد. با آنکه بهینه ساز Adam محبوب¬ترین متد مورد استفاده است، دیگر متد¬ها نیز می¬توانند مفید باشند. این متد ها در جدول زیر به اختصار آمده اند: +نرخ‌های یادگیری سازگارشونده ― داشتن نرخ یادگیری متغیر در فرآیند آموزش یک مدل، می‌تواند زمان آموزش را کاهش دهد و راه‌حل بهینه عددی را بهبود ببخشد. با آنکه بهینه ساز Adam محبوب‌ترین متد مورد استفاده است، دیگر متدها نیز میتوانند مفید باشند. این متد ها در جدول زیر به اختصار آمده‌اند:

@@ -365,7 +365,7 @@ **41. [Method, Explanation, Update of w, Update of b]**
-]روش، توضیح، بروزرسانی w، بروزرسانی b[ +[روش، توضیح، بروزرسانی w، بروزرسانی b]

@@ -374,7 +374,7 @@ **42. [Momentum, Dampens oscillations, Improvement to SGD, 2 parameters to tune]**
-]Momentum، نوسانات را تعدیل میدهد، بهبود SGD، 2 پارامتر نیاز به تنظیم دارند[ +[Momentum، نوسانات را تعدیل می‌دهد، بهبود SGD، 2 پارامتر نیاز به تنظیم دارند]

@@ -383,7 +383,7 @@ **43. [RMSprop, Root Mean Square propagation, Speeds up learning algorithm by controlling oscillations]**
-]RMSprop، انتشار جذر میانگین مربعات، سرعت بخشیدن به الگوریتم یادگیری با کنترل نوسانات[ +[RMSprop، انتشار جذر میانگین مربعات، سرعت بخشیدن به الگوریتم یادگیری با کنترل نوسانات]

@@ -392,7 +392,7 @@ **44. [Adam, Adaptive Moment estimation, Most popular method, 4 parameters to tune]**
-]Adam، تخمین سازگارشونده گشتاور، محبوب¬ترین متد، 4 پارامتر نیاز به تنظیم دارند[ +[Adam، تخمین سازگارشونده گشتاور، محبوب‌ترین متد، 4 پارامتر نیاز به تنظیم دارند]

@@ -401,7 +401,7 @@ **45. Remark: other methods include Adadelta, Adagrad and SGD.**
-نکته: سایر متد¬ها شامل Adadelta، Adagrad و SGD هستند. +نکته: سایر متدها شامل Adadelta، Adagrad و SGD هستند.

@@ -410,7 +410,7 @@ **46. Regularization**
-نظام¬بخشی +نظام‌بخشی

@@ -419,7 +419,7 @@ **47. Dropout ― Dropout is a technique used in neural networks to prevent overfitting the training data by dropping out neurons with probability p>0. It forces the model to avoid relying too much on particular sets of features.**
-برون¬اندازی – برون¬اندازی روشی است که در شبکه های عصبی برای جلوگیری از برارزش شدن بر روی داده¬های آموزشی با حذف تصادفی نورون¬ها با احتمال p>0 استفاده میشود. این روش مدل را مجبور می¬کند تا از تکیه کردن بیش از حد بر روی مجموعه¬ خاصی از ویژگی¬ها خودداری کند. +برون‌اندازی – برون‌اندازی روشی است که در شبکه های عصبی برای جلوگیری از برارزش شدن بر روی داده‌های آموزشی با حذف تصادفی نورون‌ها با احتمال p>0 استفاده میشود. این روش مدل را مجبور میکند تا از تکیه کردن بیش از حد بر روی مجموعه خاصی از ویژگی‌ها خودداری کند.

@@ -428,7 +428,7 @@ **48. Remark: most deep learning frameworks parametrize dropout through the 'keep' parameter 1−p.**
-نکته: بیشتر فریم ورک¬های یادگیری عمیق برون¬اندازی را به شکل پارامتر ‘keep’ 1-p در¬می-آورند. +نکته: بیشتر فریم ورکهای یادگیری عمیق برون‌اندازی را به شکل پارامتر ‘keep’ 1-p درمی-آورند.

@@ -437,7 +437,7 @@ **49. Weight regularization ― In order to make sure that the weights are not too large and that the model is not overfitting the training set, regularization techniques are usually performed on the model weights. The main ones are summed up in the table below:**
-نظام¬بخشی وزن – برای اطمینان از اینکه (مقادیر) وزن¬ها زیادی بزرگ نیستند و مدل به مجموعه آموزش بیش¬برارزش نیست، روش¬های نظام¬بخشی معمولا بر روی وزن¬های مدل اجرا می شوند. اصلی¬ترین آنها در جدول زیر به اختصار آمده اند: +نظام‌بخشی وزن – برای اطمینان از اینکه (مقادیر) وزن‌ها زیادی بزرگ نیستند و مدل به مجموعه آموزش بیش‌برارزش نیست، روشهای نظام‌بخشی معمولا بر روی وزن‌های مدل اجرا می‌شوند. اصلی‌ترین آنها در جدول زیر به اختصار آمده اند:

@@ -461,7 +461,7 @@ **51. Early stopping ― This regularization technique stops the training process as soon as the validation loss reaches a plateau or starts to increase.**
-توقف زودهنگام: این روش نظام¬بخشی، فرآیند آموزش را به محض اینکه خطای اعتبارسنجی ثابت یا شروع به افزایش پیدا کند، متوقف می¬کند. +توقف زودهنگام ― این روش نظام‌بخشی، فرآیند آموزش را به محض اینکه خطای اعتبارسنجی ثابت یا شروع به افزایش پیدا کند، متوقف می‌کند.

@@ -470,7 +470,7 @@ **52. [Error, Validation, Training, early stopping, Epochs]**
-]خطا، اعتبارسنجی، آموزش، توقف زودهنگام، تکرارها[ +[خطا، اعتبارسنجی، آموزش، توقف زودهنگام، تکرارها]

@@ -488,7 +488,7 @@ **54. Overfitting small batch ― When debugging a model, it is often useful to make quick tests to see if there is any major issue with the architecture of the model itself. In particular, in order to make sure that the model can be properly trained, a mini-batch is passed inside the network to see if it can overfit on it. If it cannot, it means that the model is either too complex or not complex enough to even overfit on a small batch, let alone a normal-sized training set.**
-بیش¬برارزش کردن دسته¬کوچک ― هنگام اشکالزدایی یک مدل، اغلب مفید است که یک سری آزمایش¬های سریع برای اطمینان از اینکه هیچ مشکل عمده ای در معماری مدل وجود دارد انجام شود. به طورخاص، برای اطمینان از اینکه مدل می¬تواند به شکل صحیح آموزش ببیند، یک دسته-ی¬کوچک (از داده¬ها) به شبکه داده می¬شود تا دریابیم که مدل میتواند به آنها بیش¬برارزش کند. اگر نتوانید، بدین معناست که مدل یا خیلی پیچیده است یا پیچیدگی لازم برای بیش¬برارزش شدن برروی دسته¬ی¬کوچک را ندارد، چه برسد به یک مجموعه آموزشی با اندازه عادی. +بیش‌برارزش کردن دسته‌ی‌کوچک ― هنگام اشکال‌زدایی یک مدل، اغلب مفید است که یک سری آزمایش‌های سریع برای اطمینان از اینکه هیچ مشکل عمده‌ای در معماری مدل وجود دارد انجام شود. به طورخاص، برای اطمینان از اینکه مدل می‌تواند به شکل صحیح آموزش ببیند، یک دسته‌ی‌کوچک (از داده‌ها) به شبکه داده می‌شود تا دریابیم که مدل می‌تواند به آنها بیش‌برارزش کند. اگر نتوانید، بدین معناست که مدل یا خیلی پیچیده است یا پیچیدگی لازم برای بیش‌برارزش شدن برروی دسته‌ی‌کوچک را ندارد، چه برسد به یک مجموعه آموزشی با اندازه عادی.

@@ -497,7 +497,7 @@ **55. Gradient checking ― Gradient checking is a method used during the implementation of the backward pass of a neural network. It compares the value of the analytical gradient to the numerical gradient at given points and plays the role of a sanity-check for correctness.**
-وارسی گرادیان – وارسی گرادیان متدی است که در طول پیاده سازی گذر روبه¬عقب یک شبکه عصبی استفاده می شود. این متد مقدار گرادیان تحلیلی را با گرادیان عددی در نقطه¬های مفروض را مقایسه میکند و نقش بررسی¬درستی را ایفا میکند. +وارسی گرادیان – وارسی گرادیان متدی است که در طول پیاده سازی گذر روبه‌عقب یک شبکه عصبی استفاده می شود. این متد مقدار گرادیان تحلیلی را با گرادیان عددی در نقطه‌های مفروض را مقایسه میکند و نقش بررسی‌درستی را ایفا میکند.

@@ -506,7 +506,7 @@ **56. [Type, Numerical gradient, Analytical gradient]**
-]نوع، گرادیان عددی، گرادیان تحلیلی[ +[نوع، گرادیان عددی، گرادیان تحلیلی]

@@ -515,7 +515,7 @@ **57. [Formula, Comments]**
-]فرمول، توضیحات[ +[فرمول، توضیحات]

@@ -524,7 +524,7 @@ **58. [Expensive; loss has to be computed two times per dimension, Used to verify correctness of analytical implementation, Trade-off in choosing h not too small (numerical instability) nor too large (poor gradient approximation)]**
-]گران (محاسباتی)، خطا باید دو بار در هر بُعد محاسبه شود، برای تایید صحت پیاده¬سازی تحلیلی استفاده می¬شود، مصالحه در انتخاب h: نه بسیار کوچک (ناپایداری عددی) و نه خیلی بزرگ (تخمین گرادیان ضعیف) باشد[ +[گران (محاسباتی)، خطا باید دو بار در هر بُعد محاسبه شود، برای تایید صحت پیاده‌سازی تحلیلی استفاده میشود، مصالحه در انتخاب h: نه بسیار کوچک (ناپایداری عددی) و نه خیلی بزرگ (تخمین گرادیان ضعیف) باشد]

@@ -533,7 +533,7 @@ **59. ['Exact' result, Direct computation, Used in the final implementation]**
-]نتیجه ‘عینی، محاسبه مستقیم، در پیاده¬سازی نهایی استفاده می¬شود[ +[نتیجه ‘عینی، محاسبه مستقیم، در پیاده‌سازی نهایی استفاده میشود]

@@ -542,7 +542,7 @@ **60. The Deep Learning cheatsheets are now available in [target language].
-راهنمای¬ یادگیری عمیق هم اکنون به زبان ]فارسی[ در دسترس است. +راهنمای یادگیری عمیق هم اکنون به زبان [فارسی] در دسترس است.
**61. Original authors** @@ -572,7 +572,7 @@ **64.View PDF version on GitHub**
-نسخه پی¬دی¬اف را در گیت¬هاب ببینید +نسخه پی‌دی‌اف را در گیت‌هاب ببینید

From 20329f6ed524970239130b763167fbacdcb3cee9 Mon Sep 17 00:00:00 2001 From: AlisterTA <19950298+AlisterTA@users.noreply.github.com> Date: Thu, 7 Feb 2019 22:43:21 +0330 Subject: [PATCH 108/531] Update deep-learning-tips-and-tricks.md --- fa/deep-learning-tips-and-tricks.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fa/deep-learning-tips-and-tricks.md b/fa/deep-learning-tips-and-tricks.md index 6bfed3e51..badfecdec 100644 --- a/fa/deep-learning-tips-and-tricks.md +++ b/fa/deep-learning-tips-and-tricks.md @@ -539,7 +539,7 @@
-**60. The Deep Learning cheatsheets are now available in [target language]. +**60. The Deep Learning cheatsheets are now available in [target language].**
راهنمای یادگیری عمیق هم اکنون به زبان [فارسی] در دسترس است. From f07ce042c74c0596c7254f1a7bfb6ccafd1f352f Mon Sep 17 00:00:00 2001 From: AlisterTA <19950298+AlisterTA@users.noreply.github.com> Date: Thu, 7 Feb 2019 22:46:46 +0330 Subject: [PATCH 109/531] fix some grammatical mistakes --- fa/deep-learning-tips-and-tricks.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fa/deep-learning-tips-and-tricks.md b/fa/deep-learning-tips-and-tricks.md index badfecdec..953556648 100644 --- a/fa/deep-learning-tips-and-tricks.md +++ b/fa/deep-learning-tips-and-tricks.md @@ -247,7 +247,7 @@ **28. Updating weights ― In a neural network, weights are updated as follows:**
-بروزرسانی وزنها – در یک شبکه عصبی، وزن‌ها به شکل زیر بروزرسانی میشوند: +بروزرسانی وزن‌ها – در یک شبکه عصبی، وزن‌ها به شکل زیر بروزرسانی میشوند:

From 80b3c6ce8c6344143c17f032acb4f83a67006859 Mon Sep 17 00:00:00 2001 From: AlisterTA <19950298+AlisterTA@users.noreply.github.com> Date: Thu, 7 Feb 2019 22:48:57 +0330 Subject: [PATCH 110/531] fix some grammatical mistakes --- fa/deep-learning-tips-and-tricks.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fa/deep-learning-tips-and-tricks.md b/fa/deep-learning-tips-and-tricks.md index 953556648..5fbc05718 100644 --- a/fa/deep-learning-tips-and-tricks.md +++ b/fa/deep-learning-tips-and-tricks.md @@ -497,7 +497,7 @@ **55. Gradient checking ― Gradient checking is a method used during the implementation of the backward pass of a neural network. It compares the value of the analytical gradient to the numerical gradient at given points and plays the role of a sanity-check for correctness.**
-وارسی گرادیان – وارسی گرادیان متدی است که در طول پیاده سازی گذر روبه‌عقب یک شبکه عصبی استفاده می شود. این متد مقدار گرادیان تحلیلی را با گرادیان عددی در نقطه‌های مفروض را مقایسه میکند و نقش بررسی‌درستی را ایفا میکند. +وارسی گرادیان – وارسی گرادیان متدی است که در طول پیاده سازی گذر روبه‌عقبِ یک شبکه عصبی استفاده می شود. این متد مقدار گرادیان تحلیلی را با گرادیان عددی در نقطه‌های مفروض را مقایسه میکند و نقش بررسی‌درستی را ایفا میکند.

From e271f88b5c2760e57be59bf9a5a9f464195fffe6 Mon Sep 17 00:00:00 2001 From: AlisterTA <19950298+AlisterTA@users.noreply.github.com> Date: Thu, 7 Feb 2019 22:49:54 +0330 Subject: [PATCH 111/531] fixed some grammatical mistakes --- fa/deep-learning-tips-and-tricks.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fa/deep-learning-tips-and-tricks.md b/fa/deep-learning-tips-and-tricks.md index 5fbc05718..23c49d024 100644 --- a/fa/deep-learning-tips-and-tricks.md +++ b/fa/deep-learning-tips-and-tricks.md @@ -533,7 +533,7 @@ **59. ['Exact' result, Direct computation, Used in the final implementation]**
-[نتیجه ‘عینی، محاسبه مستقیم، در پیاده‌سازی نهایی استفاده میشود] +[نتیجه 'عینی'، محاسبه مستقیم، در پیاده‌سازی نهایی استفاده میشود]

From 378af371ea06f656d2e0addf09aa8e776fb3bda0 Mon Sep 17 00:00:00 2001 From: AlisterTA <19950298+AlisterTA@users.noreply.github.com> Date: Thu, 7 Feb 2019 22:51:34 +0330 Subject: [PATCH 112/531] fixed some grammatical mistakes --- fa/deep-learning-tips-and-tricks.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fa/deep-learning-tips-and-tricks.md b/fa/deep-learning-tips-and-tricks.md index 23c49d024..fe9809aaa 100644 --- a/fa/deep-learning-tips-and-tricks.md +++ b/fa/deep-learning-tips-and-tricks.md @@ -68,7 +68,7 @@ **8. [Good practices, Overfitting small batch, Gradient checking]**
-[تمرینات خوب، برارزش دسته کوچک، وارسی گرادیان] +[تمرینات خوب، برارزش دسته‌ی‌کوچک، وارسی گرادیان]

From a5d535f2acf38bbb0a20817f409bf3ce436ea9bf Mon Sep 17 00:00:00 2001 From: AlisterTA <19950298+AlisterTA@users.noreply.github.com> Date: Thu, 7 Feb 2019 22:54:21 +0330 Subject: [PATCH 113/531] fixed some grammatical mistakes --- fa/deep-learning-tips-and-tricks.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fa/deep-learning-tips-and-tricks.md b/fa/deep-learning-tips-and-tricks.md index fe9809aaa..e8d419930 100644 --- a/fa/deep-learning-tips-and-tricks.md +++ b/fa/deep-learning-tips-and-tricks.md @@ -130,7 +130,7 @@ **15. [Nuances of RGB is slightly changed, Captures noise that can occur with light exposure, Addition of noise, More tolerance to quality variation of inputs, Parts of image ignored, Mimics potential loss of parts of image, Luminosity changes, Controls difference in exposition due to time of day]**
-[عناصر RGB کمی تغییر کرده است، نویزی که در هنگام مواجهه با نور رخ می‌دهد را می‌گیرد، افزودگی نویز، مقاومت بیشتر به تغییر کیفیت ورودی‌ها، بخش‌هایی از تصویر نادیده گرفته میشوند، تقلید (شبیه سازی) هدررفت بالقوه بخش هایی از تصویر، تغییر درخشندگی، با توجه به زمان روز، تفاوت نمایش (تصویر) را کنترل می‌کند] +[عناصر RGB کمی تغییر کرده است، نویزی که در هنگام مواجهه شدن با نور رخ می‌دهد را می‌گیرد، افزودگی نویز، مقاومت بیشتر به تغییر کیفیت ورودی‌ها، بخش‌هایی از تصویر نادیده گرفته میشوند، تقلید (شبیه سازی) هدررفت بالقوه بخش هایی از تصویر، تغییر درخشندگی، با توجه به زمان روز، تفاوت نمایش (تصویر) را کنترل می‌کند]

From ed593a369b0ace37bec496175dfd75d1bca23937 Mon Sep 17 00:00:00 2001 From: AlisterTA <19950298+AlisterTA@users.noreply.github.com> Date: Thu, 7 Feb 2019 22:55:30 +0330 Subject: [PATCH 114/531] fixed some grammatical mistakes --- fa/deep-learning-tips-and-tricks.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fa/deep-learning-tips-and-tricks.md b/fa/deep-learning-tips-and-tricks.md index e8d419930..7bcaec52d 100644 --- a/fa/deep-learning-tips-and-tricks.md +++ b/fa/deep-learning-tips-and-tricks.md @@ -130,7 +130,7 @@ **15. [Nuances of RGB is slightly changed, Captures noise that can occur with light exposure, Addition of noise, More tolerance to quality variation of inputs, Parts of image ignored, Mimics potential loss of parts of image, Luminosity changes, Controls difference in exposition due to time of day]**
-[عناصر RGB کمی تغییر کرده است، نویزی که در هنگام مواجهه شدن با نور رخ می‌دهد را می‌گیرد، افزودگی نویز، مقاومت بیشتر به تغییر کیفیت ورودی‌ها، بخش‌هایی از تصویر نادیده گرفته میشوند، تقلید (شبیه سازی) هدررفت بالقوه بخش هایی از تصویر، تغییر درخشندگی، با توجه به زمان روز، تفاوت نمایش (تصویر) را کنترل می‌کند] +[عناصر RGB کمی تغییر کرده است، نویزی که در هنگام مواجهه شدن با نور رخ می‌دهد را می‌گیرد، افزودگی نویز، مقاومت بیشتر به تغییر کیفیت ورودی‌ها، بخش‌هایی از تصویر نادیده گرفته می‌شوند، تقلید (شبیه سازی) هدررفت بالقوه بخش هایی از تصویر، تغییر درخشندگی، با توجه به زمان روز، تفاوت نمایش (تصویر) را کنترل می‌کند]

From 6fcdacf20f95605e88323deb595dcfc2515563a8 Mon Sep 17 00:00:00 2001 From: AlisterTA <19950298+AlisterTA@users.noreply.github.com> Date: Thu, 7 Feb 2019 22:56:39 +0330 Subject: [PATCH 115/531] fixed some grammatical mistakes --- fa/deep-learning-tips-and-tricks.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fa/deep-learning-tips-and-tricks.md b/fa/deep-learning-tips-and-tricks.md index 7bcaec52d..84391211f 100644 --- a/fa/deep-learning-tips-and-tricks.md +++ b/fa/deep-learning-tips-and-tricks.md @@ -130,7 +130,7 @@ **15. [Nuances of RGB is slightly changed, Captures noise that can occur with light exposure, Addition of noise, More tolerance to quality variation of inputs, Parts of image ignored, Mimics potential loss of parts of image, Luminosity changes, Controls difference in exposition due to time of day]**
-[عناصر RGB کمی تغییر کرده است، نویزی که در هنگام مواجهه شدن با نور رخ می‌دهد را می‌گیرد، افزودگی نویز، مقاومت بیشتر به تغییر کیفیت ورودی‌ها، بخش‌هایی از تصویر نادیده گرفته می‌شوند، تقلید (شبیه سازی) هدررفت بالقوه بخش هایی از تصویر، تغییر درخشندگی، با توجه به زمان روز، تفاوت نمایش (تصویر) را کنترل می‌کند] +[عناصر RGB کمی تغییر کرده است، نویزی که در هنگام مواجهه شدن با نور رخ می‌دهد را می‌گیرد، افزودگی نویز، مقاومت بیشتر به تغییر کیفیت ورودی‌ها، بخش‌هایی از تصویر نادیده گرفته می‌شوند، تقلید (شبیه سازی) هدررفت بالقوه بخش هایی از تصویر، تغییر درخشندگی، با توجه به زمان روز تفاوت نمایش (تصویر) را کنترل می‌کند]

From ff5c4f0c47c4723458114671454452387dc0c55e Mon Sep 17 00:00:00 2001 From: AlisterTA <19950298+AlisterTA@users.noreply.github.com> Date: Thu, 7 Feb 2019 22:59:38 +0330 Subject: [PATCH 116/531] fixed some grammatical mistakes --- fa/deep-learning-tips-and-tricks.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/fa/deep-learning-tips-and-tricks.md b/fa/deep-learning-tips-and-tricks.md index 84391211f..f41e35f63 100644 --- a/fa/deep-learning-tips-and-tricks.md +++ b/fa/deep-learning-tips-and-tricks.md @@ -193,7 +193,7 @@ **22. Mini-batch gradient descent ― During the training phase, updating weights is usually not based on the whole training set at once due to computation complexities or one data point due to noise issues. Instead, the update step is done on mini-batches, where the number of data points in a batch is a hyperparameter that we can tune.**
-گرادیان نزولی دسته‌ی‌کوچک ― در فاز آموزش، بروزرسانی وزن‌ها معمولا بر مبنای تمامی مجموعه آموزش به علت پیچیدگی‌های محاسباتی، یا یک نمونه داده به علت مشکل نویز نیست. در عوض، گام بروزرسانی بر روی دسته‌های کوچک انجام می شود، که تعداد نمونه‌های داده در یک دسته فراعاملی است که میتوان آن را تنظیم کرد. +گرادیان نزولی دسته‌ی‌کوچک ― در فاز آموزش، بروزرسانی وزن‌ها معمولا بر مبنای تمامی مجموعه آموزش به علت پیچیدگی‌های محاسباتی، یا یک نمونه داده به علت مشکل نویز، نیست. در عوض، گام بروزرسانی بر روی دسته‌های کوچک انجام می شود، که تعداد نمونه‌های داده در یک دسته فراعاملی است که میتوان آن را تنظیم کرد.

@@ -202,7 +202,7 @@ **23. Loss function ― In order to quantify how a given model performs, the loss function L is usually used to evaluate to what extent the actual outputs y are correctly predicted by the model outputs z.**
-تابع خطا ― به منظور سنجش کارایی یک مدل مفروض، معمولا از تابع خطا L برای ارزیابی اینکه تا چه حد خروجی حقیقی y به شکل صحیح توسط خروجی z مدل پیش‌بینی شدهاند، استفاده می‌شود. +تابع خطا ― به منظور سنجش کارایی یک مدل مفروض، معمولا از تابع خطا L برای ارزیابی اینکه تا چه حد خروجی حقیقی y به شکل صحیح توسط خروجی z مدل پیش‌بینی شده‌اند، استفاده می‌شود.

From 368d824a224f798854c14dd83d4ffa04b7586ab5 Mon Sep 17 00:00:00 2001 From: AlisterTA <19950298+AlisterTA@users.noreply.github.com> Date: Thu, 7 Feb 2019 23:01:55 +0330 Subject: [PATCH 117/531] fixed some grammatical mistakes --- fa/deep-learning-tips-and-tricks.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fa/deep-learning-tips-and-tricks.md b/fa/deep-learning-tips-and-tricks.md index f41e35f63..329676008 100644 --- a/fa/deep-learning-tips-and-tricks.md +++ b/fa/deep-learning-tips-and-tricks.md @@ -247,7 +247,7 @@ **28. Updating weights ― In a neural network, weights are updated as follows:**
-بروزرسانی وزن‌ها – در یک شبکه عصبی، وزن‌ها به شکل زیر بروزرسانی میشوند: +بروزرسانی وزن‌ها – در یک شبکه عصبی، وزن‌ها به شکل زیر بروزرسانی می‌شوند:

From 2507ce7d111baf02168816518750cf2a67ee9f8f Mon Sep 17 00:00:00 2001 From: AlisterTA <19950298+AlisterTA@users.noreply.github.com> Date: Thu, 7 Feb 2019 23:08:14 +0330 Subject: [PATCH 118/531] fix some grammatical mistakes --- fa/deep-learning-tips-and-tricks.md | 18 +++++++++--------- 1 file changed, 9 insertions(+), 9 deletions(-) diff --git a/fa/deep-learning-tips-and-tricks.md b/fa/deep-learning-tips-and-tricks.md index 329676008..0164b5e0e 100644 --- a/fa/deep-learning-tips-and-tricks.md +++ b/fa/deep-learning-tips-and-tricks.md @@ -283,7 +283,7 @@ **32. Weights initialization**
-مقداردهی‌اولیه وزنها +مقداردهی‌اولیه وزن‌ها

@@ -301,7 +301,7 @@ **34. Transfer learning ― Training a deep learning model requires a lot of data and more importantly a lot of time. It is often useful to take advantage of pre-trained weights on huge datasets that took days/weeks to train, and leverage it towards our use case. Depending on how much data we have at hand, here are the different ways to leverage this:**
-یادگیری انتقالی ― آموزش یک مدل یادگیری عمیق به داده‌های زیاد و مهم تر از آن به زمان زیادی احتیاج دارد. اغلب استفاده از مزیت وزنهای ازقبل‌آموزش داده شده برروی پایگاه داده‌های عظیم که روزها/هفته‌ها طول می‌کشند تا آموزش ببینند، مفید است، و می‌توان از قدرت آن برای مورد استفاده‌مان بهره جست. بسته به میزان داده‌هایی که در اختیار داریم ، در زیر روشهای مختلفی که می‌توان از آنها بهره جست آورده شده‌اند: +یادگیری انتقالی ― آموزش یک مدل یادگیری عمیق به داده‌های زیاد و مهم تر از آن به زمان زیادی احتیاج دارد. اغلب استفاده از مزیت وزنهای ازقبل‌آموزش داده شده برروی پایگاه داده‌های عظیم که روزها/هفته‌ها طول می‌کشند تا آموزش ببینند، مفید است، و می‌توان از قدرت آن برای مورد استفاده‌مان بهره جست. بسته به میزان داده‌هایی که در اختیار داریم ، در زیر روش‌های مختلفی که می‌توان از آنها بهره جست آورده شده‌اند:

@@ -356,7 +356,7 @@ **40. Adaptive learning rates ― Letting the learning rate vary when training a model can reduce the training time and improve the numerical optimal solution. While Adam optimizer is the most commonly used technique, others can also be useful. They are summed up in the table below:**
-نرخ‌های یادگیری سازگارشونده ― داشتن نرخ یادگیری متغیر در فرآیند آموزش یک مدل، می‌تواند زمان آموزش را کاهش دهد و راه‌حل بهینه عددی را بهبود ببخشد. با آنکه بهینه ساز Adam محبوب‌ترین متد مورد استفاده است، دیگر متدها نیز میتوانند مفید باشند. این متد ها در جدول زیر به اختصار آمده‌اند: +نرخ‌های یادگیری سازگارشونده ― داشتن نرخ یادگیری متغیر در فرآیند آموزش یک مدل، می‌تواند زمان آموزش را کاهش دهد و راه‌حل بهینه عددی را بهبود ببخشد. با آنکه بهینه ساز Adam محبوب‌ترین متد مورد استفاده است، دیگر متدها نیز می‌توانند مفید باشند. این متد ها در جدول زیر به اختصار آمده‌اند:

@@ -419,7 +419,7 @@ **47. Dropout ― Dropout is a technique used in neural networks to prevent overfitting the training data by dropping out neurons with probability p>0. It forces the model to avoid relying too much on particular sets of features.**
-برون‌اندازی – برون‌اندازی روشی است که در شبکه های عصبی برای جلوگیری از برارزش شدن بر روی داده‌های آموزشی با حذف تصادفی نورون‌ها با احتمال p>0 استفاده میشود. این روش مدل را مجبور میکند تا از تکیه کردن بیش از حد بر روی مجموعه خاصی از ویژگی‌ها خودداری کند. +برون‌اندازی – برون‌اندازی روشی است که در شبکه‌های عصبی برای جلوگیری از برارزش شدن بر روی داده‌های آموزشی با حذف تصادفی نورون‌ها با احتمال p>0 استفاده می‌شود. این روش مدل را مجبور می‌کند تا از تکیه کردن بیش از حد بر روی مجموعه خاصی از ویژگی‌ها خودداری کند.

@@ -428,7 +428,7 @@ **48. Remark: most deep learning frameworks parametrize dropout through the 'keep' parameter 1−p.**
-نکته: بیشتر فریم ورکهای یادگیری عمیق برون‌اندازی را به شکل پارامتر ‘keep’ 1-p درمی-آورند. +نکته: بیشتر فریم ورک‌های یادگیری عمیق برون‌اندازی را به شکل پارامتر ‘keep’ 1-p درمی-آورند.

@@ -437,7 +437,7 @@ **49. Weight regularization ― In order to make sure that the weights are not too large and that the model is not overfitting the training set, regularization techniques are usually performed on the model weights. The main ones are summed up in the table below:**
-نظام‌بخشی وزن – برای اطمینان از اینکه (مقادیر) وزن‌ها زیادی بزرگ نیستند و مدل به مجموعه آموزش بیش‌برارزش نیست، روشهای نظام‌بخشی معمولا بر روی وزن‌های مدل اجرا می‌شوند. اصلی‌ترین آنها در جدول زیر به اختصار آمده اند: +نظام‌بخشی وزن – برای اطمینان از اینکه (مقادیر) وزن‌ها زیادی بزرگ نیستند و مدل به مجموعه آموزش بیش‌برارزش نیست، روشهای نظام‌بخشی معمولا بر روی وزن‌های مدل اجرا می‌شوند. اصلی‌ترین آنها در جدول زیر به اختصار آمده‌اند:

@@ -497,7 +497,7 @@ **55. Gradient checking ― Gradient checking is a method used during the implementation of the backward pass of a neural network. It compares the value of the analytical gradient to the numerical gradient at given points and plays the role of a sanity-check for correctness.**
-وارسی گرادیان – وارسی گرادیان متدی است که در طول پیاده سازی گذر روبه‌عقبِ یک شبکه عصبی استفاده می شود. این متد مقدار گرادیان تحلیلی را با گرادیان عددی در نقطه‌های مفروض را مقایسه میکند و نقش بررسی‌درستی را ایفا میکند. +وارسی گرادیان – وارسی گرادیان متدی است که در طول پیاده سازی گذر روبه‌عقبِ یک شبکه عصبی استفاده می‌شود. این متد مقدار گرادیان تحلیلی را با گرادیان عددی در نقطه‌های مفروض را مقایسه می‌کند و نقش بررسی‌درستی را ایفا میکند.

@@ -524,7 +524,7 @@ **58. [Expensive; loss has to be computed two times per dimension, Used to verify correctness of analytical implementation, Trade-off in choosing h not too small (numerical instability) nor too large (poor gradient approximation)]**
-[گران (محاسباتی)، خطا باید دو بار در هر بُعد محاسبه شود، برای تایید صحت پیاده‌سازی تحلیلی استفاده میشود، مصالحه در انتخاب h: نه بسیار کوچک (ناپایداری عددی) و نه خیلی بزرگ (تخمین گرادیان ضعیف) باشد] +[گران (محاسباتی)، خطا باید دو بار در هر بُعد محاسبه شود، برای تایید صحت پیاده‌سازی تحلیلی استفاده می‌شود، مصالحه در انتخاب h: نه بسیار کوچک (ناپایداری عددی) و نه خیلی بزرگ (تخمین گرادیان ضعیف) باشد]

@@ -533,7 +533,7 @@ **59. ['Exact' result, Direct computation, Used in the final implementation]**
-[نتیجه 'عینی'، محاسبه مستقیم، در پیاده‌سازی نهایی استفاده میشود] +[نتیجه 'عینی'، محاسبه مستقیم، در پیاده‌سازی نهایی استفاده می‌شود]

From 72dd52d90814559f4076379caca2c2128456cf8a Mon Sep 17 00:00:00 2001 From: AlisterTA <19950298+AlisterTA@users.noreply.github.com> Date: Thu, 7 Feb 2019 23:28:48 +0330 Subject: [PATCH 119/531] Update deep-learning-tips-and-tricks.md --- fa/deep-learning-tips-and-tricks.md | 1 - 1 file changed, 1 deletion(-) diff --git a/fa/deep-learning-tips-and-tricks.md b/fa/deep-learning-tips-and-tricks.md index 0164b5e0e..d81e964b2 100644 --- a/fa/deep-learning-tips-and-tricks.md +++ b/fa/deep-learning-tips-and-tricks.md @@ -100,7 +100,6 @@
- **12. [Original, Flip, Rotation, Random crop]**
From f3c8548d62fc683fe74d0d14eb8c7c83fc133912 Mon Sep 17 00:00:00 2001 From: AlisterTA <19950298+AlisterTA@users.noreply.github.com> Date: Sun, 10 Feb 2019 12:18:28 +0330 Subject: [PATCH 120/531] Ready to merge --- fa/deep-learning-tips-and-tricks.md | 71 +++++++++++++++-------------- 1 file changed, 38 insertions(+), 33 deletions(-) diff --git a/fa/deep-learning-tips-and-tricks.md b/fa/deep-learning-tips-and-tricks.md index d81e964b2..9cb0410b7 100644 --- a/fa/deep-learning-tips-and-tricks.md +++ b/fa/deep-learning-tips-and-tricks.md @@ -1,3 +1,4 @@ + **Deep Learning Tips and Tricks translation**
@@ -41,7 +42,7 @@ **5. [Training a neural network, Epoch, Mini-batch, Cross-entropy loss, Backpropagation, Gradient descent, Updating weights, Gradient checking]**
-[آموزش یک شبکه عصبی، تکرار(Epoch)، دسته‌ی‌کوچک، خطای آنتروپی متقاطع، انتشار معکوس، گرادیان نزولی، بروزرسانی وزن‌ها، وارسی گرادیان] +[آموزش یک شبکه‌ی عصبی، تکرار(Epoch)، دسته‌ی‌کوچک، خطای آنتروپی متقاطع، انتشار معکوس، گرادیان نزولی، بروزرسانی وزن‌ها، وارسی گرادیان]

@@ -50,7 +51,7 @@ **6. [Parameter tuning, Xavier initialization, Transfer learning, Learning rate, Adaptive learning rates]**
-[تنظیم پارامتر، مقداردهی‌اولیه ژاویر،یادگیری انتقالی، نرخ یادگیری، نرخ یادگیری سازگارشونده] +[تنظیم فراسنج، مقداردهی اولیه Xavier،یادگیری انتقالی، نرخ یادگیری، نرخ یادگیری سازگارشونده]

@@ -59,7 +60,7 @@ **7. [Regularization, Dropout, Weight regularization, Early stopping]**
-[نظام‌بخشی، برون‌اندازی، نظام‌بخشی وزن، توقف‌زودهنگام] +[نظام‌بخشی، برون‌اندازی، نظام‌بخشی وزن، توقف زودهنگام]

@@ -68,7 +69,7 @@ **8. [Good practices, Overfitting small batch, Gradient checking]**
-[تمرینات خوب، برارزش دسته‌ی‌کوچک، وارسی گرادیان] +[عادت‌های خوب، بیش‌برارزش دسته‌ی کوچک، وارسی گرادیان]

@@ -95,7 +96,7 @@ **11. Data augmentation ― Deep learning models usually need a lot of data to be properly trained. It is often useful to get more data from the existing ones using data augmentation techniques. The main ones are summed up in the table below. More precisely, given the following input image, here are the techniques that we can apply:**
-داده‌افزایی ― مدل‌های یادگیری عمیق معمولا به داده زیادی نیاز دارند تا بتوانند به خوبی آموزش ببینند. اغلب، استفاده از روشهای داده افزایی برای گرفتن داده بیشتر از داده موجود، مفید است. اصلی‌ترین آنها در جدول زیر به اختصار آمده اند. به عبارت دقیق‌تر، با در نظر گرفتن تصویر ورودی زیر، روش‌هایی که میتوانم اعمال کرد بدین شرح هستند: +داده‌افزایی ― مدل‌های یادگیری عمیق معمولا به داده‌های زیادی نیاز دارند تا بتوانند به خوبی آموزش ببینند. اغلب، استفاده از روشهای داده‌افزایی برای گرفتن داده بیشتر از داده‌های موجود، مفید است. اصلی‌ترین آنها در جدول زیر به اختصار آمده‌اند. به عبارت دقیق‌تر، با در نظر گرفتن تصویر ورودی زیر، روش‌هایی که می‌توانم اعمال کرد بدین شرح هستند:

@@ -103,7 +104,7 @@ **12. [Original, Flip, Rotation, Random crop]**
-[آغازین، قرینه، چرخش، برش تصادفی] +[تصویر اصلی، قرینه، چرخش، برش تصادفی]

@@ -112,7 +113,7 @@ **13. [Image without any modification, Flipped with respect to an axis for which the meaning of the image is preserved, Rotation with a slight angle, Simulates incorrect horizon calibration, Random focus on one part of the image, Several random crops can be done in a row]**
-[تصویر (آغازین) بدون هیچ‌گونه اصلاحی، قرینه شده به نسبت یک محور بطوری‌که معنای تصویر حفظ شده است، چرخش با یک زاویه کم، ............، تمرکز تصادفی بر روی یک بخش از تصویر، چندین برش تصادفی را میتوان پشت‌سرهم انجام داد] +[تصویر (آغازین) بدون هیچ‌گونه تغییری، قرینه‌شده نسبت به محوری که معنای (محتوای) تصویر را حفظ می‌کند، چرخش با زاویه‌ی اندک، خط افق نادرست را شبیه‌سازی می‌کند، روی ناحیه‌ای تصادفی از تصویر متمرکز می‌شود، چندین برش تصادفی را میتوان پشت‌سرهم انجام داد]

@@ -120,7 +121,7 @@ **14. [Color shift, Noise addition, Information loss, Contrast change]**
-[تغییر رنگ، افزودگی نویز، هدررفت اطلاعات، تغییر تباین(کُنتراست)] +[تغییر رنگ، اضافه‌کردن نویز، هدررفت اطلاعات، تغییر تباین(کُنتراست)]

@@ -129,7 +130,7 @@ **15. [Nuances of RGB is slightly changed, Captures noise that can occur with light exposure, Addition of noise, More tolerance to quality variation of inputs, Parts of image ignored, Mimics potential loss of parts of image, Luminosity changes, Controls difference in exposition due to time of day]**
-[عناصر RGB کمی تغییر کرده است، نویزی که در هنگام مواجهه شدن با نور رخ می‌دهد را می‌گیرد، افزودگی نویز، مقاومت بیشتر به تغییر کیفیت ورودی‌ها، بخش‌هایی از تصویر نادیده گرفته می‌شوند، تقلید (شبیه سازی) هدررفت بالقوه بخش هایی از تصویر، تغییر درخشندگی، با توجه به زمان روز تفاوت نمایش (تصویر) را کنترل می‌کند] +[عناصر RGB کمی تغییر کرده است، نویزی که در هنگام مواجه شدن با نور رخ می‌دهد را شبیه‌سازی می‌کند، افزودگی نویز، مقاومت بیشتر نسبت به تغییر کیفیت تصاویر ورودی، بخش‌هایی از تصویر نادیده گرفته می‌شوند، تقلید (شبیه سازی) هدررفت بالقوه بخش هایی از تصویر، تغییر درخشندگی، با توجه به زمان روز تفاوت نمایش (تصویر) را کنترل می‌کند]

@@ -138,7 +139,7 @@ **16. Remark: data is usually augmented on the fly during training.**
-نکته: داده معمولا در فرآیند آموزش (به صورت درجا) افزایش پیدا می‌کند. +نکته: داده‌ها معمولا در فرآیند آموزش (به صورت درجا) افزایش پیدا می‌کنند.
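A rough NumPy sketch of the augmentation operations listed in items 12-15 above (flip, rotation, random crop, added noise, luminosity change); the image size, crop size and noise level are arbitrary illustrative values, not numbers taken from the cheatsheet:

```python
import numpy as np

rng = np.random.default_rng(0)
image = rng.random((64, 64, 3))          # stand-in for an input image, values in [0, 1]

flipped  = image[:, ::-1, :]             # flip with respect to the vertical axis
rotated  = np.rot90(image)               # 90-degree rotation; slight-angle rotations would need interpolation
top, left = rng.integers(0, 64 - 48, size=2)
cropped  = image[top:top + 48, left:left + 48, :]                   # random 48x48 crop
noisy    = np.clip(image + rng.normal(0, 0.05, image.shape), 0, 1)  # additive noise
brighter = np.clip(image * 1.2, 0, 1)    # simple luminosity / contrast change
```

In practice these transforms are applied on the fly while mini-batches are assembled, as item 16 above notes.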

@@ -147,7 +148,7 @@ **17. Batch normalization ― It is a step of hyperparameter γ,β that normalizes the batch {xi}. By noting μB,σ2B the mean and variance of that we want to correct to the batch, it is done as follows:**
-نرمال‌سازی دسته‌ای ― یک مرحله از فراعامل‌های γ و β که دسته‌ی {xi} را نرمال می‌کند. نماد μB و σ2B به میانگین و واریانس دسته‌ای که می‌خواهیم آن را اصلاح کنیم اشاره دارد که به صورت زیر است: +نرمال‌سازی دسته‌ای ― یک مرحله از فراسنج‌های γ و β که دسته‌ی {xi} را نرمال می‌کند. نماد μB و σ2B به میانگین و وردایی دسته‌ای که می‌خواهیم آن را اصلاح کنیم اشاره دارد که به صورت زیر است:
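A minimal NumPy version of the normalization step in item 17, assuming γ and β start at 1 and 0 and using the usual small ε for numerical stability:

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    # x: mini-batch of activations, shape (batch_size, features)
    mu = x.mean(axis=0)                        # per-feature batch mean
    var = x.var(axis=0)                        # per-feature batch variance
    x_hat = (x - mu) / np.sqrt(var + eps)      # normalize the batch
    return gamma * x_hat + beta                # learnable scale and shift

x = np.random.randn(32, 8) * 3.0 + 5.0
out = batch_norm(x, gamma=np.ones(8), beta=np.zeros(8))
print(out.mean(axis=0).round(3), out.std(axis=0).round(3))   # roughly 0 and 1 per feature
```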

@@ -165,7 +166,7 @@ **19. Training a neural network**
-آموزش یک شبکه عصبی +آموزش یک شبکه‌ی عصبی

@@ -192,7 +193,7 @@ **22. Mini-batch gradient descent ― During the training phase, updating weights is usually not based on the whole training set at once due to computation complexities or one data point due to noise issues. Instead, the update step is done on mini-batches, where the number of data points in a batch is a hyperparameter that we can tune.**
-گرادیان نزولی دسته‌ی‌کوچک ― در فاز آموزش، بروزرسانی وزن‌ها معمولا بر مبنای تمامی مجموعه آموزش به علت پیچیدگی‌های محاسباتی، یا یک نمونه داده به علت مشکل نویز، نیست. در عوض، گام بروزرسانی بر روی دسته‌های کوچک انجام می شود، که تعداد نمونه‌های داده در یک دسته فراعاملی است که میتوان آن را تنظیم کرد. +گرادیان نزولی دسته‌ی‌کوچک ― در فاز آموزش، بروزرسانی وزن‌ها معمولا بر مبنای تمامی مجموعه آموزش به علت پیچیدگی‌های محاسباتی، یا یک نمونه داده به علت مشکل نویز، نیست. در عوض، گام بروزرسانی بر روی دسته‌های کوچک انجام می شود، که تعداد نمونه‌های داده در یک دسته ابرفراسنج است که میتوان آن را تنظیم کرد.
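A minimal sketch of the epoch / mini-batch loop described in items 21-22; the batch size of 32 is only an example value for the tunable hyperparameter:

```python
import numpy as np

X = np.random.randn(1000, 10)    # toy training set
y = np.random.randn(1000, 1)
batch_size = 32                  # the hyperparameter mentioned above

for epoch in range(3):                          # one epoch = one full pass over the training set
    perm = np.random.permutation(len(X))        # reshuffle every epoch
    for start in range(0, len(X), batch_size):
        idx = perm[start:start + batch_size]
        x_batch, y_batch = X[idx], y[idx]       # the mini-batch used for a single update step
        # ... forward pass, loss, backward pass and weight update go here ...
```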

@@ -201,7 +202,7 @@ **23. Loss function ― In order to quantify how a given model performs, the loss function L is usually used to evaluate to what extent the actual outputs y are correctly predicted by the model outputs z.**
-تابع خطا ― به منظور سنجش کارایی یک مدل مفروض، معمولا از تابع خطا L برای ارزیابی اینکه تا چه حد خروجی حقیقی y به شکل صحیح توسط خروجی z مدل پیش‌بینی شده‌اند، استفاده می‌شود. +تابع خطا ― به منظور سنجش کارایی یک مدل مفروض، معمولا از تابع خطای L برای ارزیابی اینکه تا چه حد خروجی حقیقی y به شکل صحیح توسط خروجی z مدل پیش‌بینی شده‌اند، استفاده می‌شود.
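Item 23 compares the actual outputs y with the model outputs z through a loss L; as an example, a cross-entropy loss can be written directly from its definition (the probability values below are made up for the sketch):

```python
import numpy as np

def cross_entropy(y, z, eps=1e-12):
    # y: one-hot labels, z: predicted class probabilities, both of shape (batch, classes)
    z = np.clip(z, eps, 1.0)                       # avoid log(0)
    return -np.mean(np.sum(y * np.log(z), axis=1))

y = np.array([[1, 0, 0], [0, 1, 0]])
z = np.array([[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]])
print(cross_entropy(y, z))   # small value: the predictions match the labels fairly well
```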

@@ -246,7 +247,7 @@ **28. Updating weights ― In a neural network, weights are updated as follows:**
-بروزرسانی وزن‌ها – در یک شبکه عصبی، وزن‌ها به شکل زیر بروزرسانی می‌شوند: +بروزرسانی وزن‌ها – در یک شبکه‌ی عصبی، وزن‌ها به شکل زیر بروزرسانی می‌شوند:

@@ -255,7 +256,7 @@ **29. [Step 1: Take a batch of training data and perform forward propagation to compute the loss, Step 2: Backpropagate the loss to get the gradient of the loss with respect to each weight, Step 3: Use the gradients to update the weights of the network.]**
-[گام 1: یک دسته از داده‌های آموزشی را بگیر و انتشارمستقیم را برای محاسبه خطا اجرا کن، گام 2: خطا را برای گرفتن گرادیان آن به نسبت هر وزن انتشارمعکوس بده، گام 3: از گرادیان‌ها برای بروزرسانی وزن‌های شبکه استفاده کن.] +[گام 1: یک دسته از داده‌های آموزشی گرفته شده و با استفاده از انتشار مستقیم خطا محاسبه می‌شود، گام 2: با استفاده از انتشار معکوس مشتق خطا نسبت به هر وزن محاسبه می‌شود، گام 3: با استفاده از مشتقات، وزن‌های شبکه به‌روزرسانی می‌شوند.]
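The three steps can be made concrete on a tiny linear model with a mean-squared loss, the gradients written out by hand (the shapes and the learning rate α=0.1 are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(32, 4))               # one batch of training data
y = rng.normal(size=(32, 1))
W, b = rng.normal(size=(4, 1)), np.zeros(1)
alpha = 0.1                                # learning rate

# Step 1: forward propagation to compute the loss
z = X @ W + b
loss = np.mean((z - y) ** 2)

# Step 2: backpropagate the loss to get its gradient with respect to each weight
dz = 2 * (z - y) / len(X)
dW = X.T @ dz
db = dz.sum(axis=0)

# Step 3: use the gradients to update the weights (w <- w - alpha * dL/dw)
W -= alpha * dW
b -= alpha * db
```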

@@ -264,7 +265,7 @@ **30. [Forward propagation, Backpropagation, Weights update]**
-[انتشارمستقیم، انتشار معکوس، بروزرسانی وزنها] +[انتشار مستقیم، انتشار معکوس، به‌روزرسانی وزنها]

@@ -273,7 +274,7 @@ **31. Parameter tuning**
-تنظیم پارامتر +تنظیم فراسنج

@@ -282,7 +283,7 @@ **32. Weights initialization**
-مقداردهی‌اولیه وزن‌ها +مقداردهی اولیه وزن‌ها

@@ -291,7 +292,7 @@ **33. Xavier initialization ― Instead of initializing the weights in a purely random manner, Xavier initialization enables to have initial weights that take into account characteristics that are unique to the architecture.**
-مقداردهی‌اولیه ژاویر ― به‌جای مقداردهی‌اولیه کردن وزن‌ها به شیوه‌ی کاملا تصادفی، مقداردهی‌اولیه ژاویر این امکان را فراهم میسازد تا وزن‌های اولیه‌ایی داشته باشیم که ویژگی‌های منحصر به فرد معماری را به حساب می‌آورند. +مقداردهی‌ اولیه Xavier ― به‌جای مقداردهی اولیه‌ی وزن‌ها به شیوه‌ی کاملا تصادفی، مقداردهی اولیه Xavier این امکان را فراهم می‌سازد تا وزن‌های اولیه‌ای داشته باشیم که ویژگی‌های منحصر به فرد معماری را به حساب می‌آورند.
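One common form of Xavier (Glorot) initialization draws weights with variance 2/(n_in+n_out), so the scale of the signal is preserved across the layer; the layer sizes below are arbitrary:

```python
import numpy as np

def xavier_init(n_in, n_out, rng=None):
    # variance 2 / (n_in + n_out): one common Xavier/Glorot scheme
    rng = rng or np.random.default_rng()
    std = np.sqrt(2.0 / (n_in + n_out))
    return rng.normal(0.0, std, size=(n_in, n_out))

W = xavier_init(256, 128)
print(W.std())   # close to sqrt(2 / (256 + 128)) ~ 0.072
```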

@@ -300,7 +301,7 @@ **34. Transfer learning ― Training a deep learning model requires a lot of data and more importantly a lot of time. It is often useful to take advantage of pre-trained weights on huge datasets that took days/weeks to train, and leverage it towards our use case. Depending on how much data we have at hand, here are the different ways to leverage this:**
-یادگیری انتقالی ― آموزش یک مدل یادگیری عمیق به داده‌های زیاد و مهم تر از آن به زمان زیادی احتیاج دارد. اغلب استفاده از مزیت وزنهای ازقبل‌آموزش داده شده برروی پایگاه داده‌های عظیم که روزها/هفته‌ها طول می‌کشند تا آموزش ببینند، مفید است، و می‌توان از قدرت آن برای مورد استفاده‌مان بهره جست. بسته به میزان داده‌هایی که در اختیار داریم ، در زیر روش‌های مختلفی که می‌توان از آنها بهره جست آورده شده‌اند: +یادگیری انتقالی ― آموزش یک مدل یادگیری عمیق به داده‌های زیاد و مهم تر از آن به زمان زیادی احتیاج دارد. اغلب بهتر است که از وزن‌های پیش‌آموخته روی پایگاه داده‌های عظیم که آموزش بر روی آن‌ها روزها یا هفته‌ها طول می‌کشند استفاده کرد، و آن‌ها را برای موارد استفاده‌ی خود به کار برد. بسته به میزان داده‌هایی که در اختیار داریم، در زیر روش‌های مختلفی که می‌توان از آنها بهره جست آورده شده‌اند:
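A hedged PyTorch-style sketch of one of those options (keep the pre-trained layers frozen and train only a new last layer); torch/torchvision availability, the resnet18 backbone, the 10-class head and the learning rate are all assumptions made for the illustration:

```python
import torch
import torch.nn as nn
import torchvision.models as models

model = models.resnet18(pretrained=True)        # weights pre-trained on a huge dataset

for param in model.parameters():                # freeze all existing layers
    param.requires_grad = False

model.fc = nn.Linear(model.fc.in_features, 10)  # new trainable last layer (softmax applied in the loss)

# only the parameters of the new layer are updated during training
optimizer = torch.optim.Adam(
    [p for p in model.parameters() if p.requires_grad], lr=1e-3
)
```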

@@ -309,7 +310,7 @@ **35. [Training size, Illustration, Explanation]**
-[اندازه آموزش، نگاره، توضیح] +[تعداد داده‌های آموزش، نگاره، توضیح]

@@ -346,7 +347,7 @@ **
-نرخ یادگیری – نرخ یادگیری اغلب با نماد α و گاهی اوقات با نماد η نمایش داده می‌شود و بیانگر سرعت (گام) بروزرسانی وزن‌ها است که می‌تواند مقداری ثابت یا به سازگارشونده تغییر کند. محبوب‌ترین روش حال حاضر Adam نام دارد، متدی است که نرخ یادگیری را در حین فرآیند آموزش تنظیم می‌کند. +نرخ یادگیری – نرخ یادگیری اغلب با نماد α و گاهی اوقات با نماد η نمایش داده می‌شود و بیانگر سرعت (گام) بروزرسانی وزن‌ها است که می‌تواند مقداری ثابت داشته باشد یا به سازگارشونده تغییر کند. محبوب‌ترین روش حال حاضر Adam نام دارد، روشی است که نرخ یادگیری را در حین فرآیند آموزش تنظیم می‌کند.

@@ -355,7 +356,7 @@ **40. Adaptive learning rates ― Letting the learning rate vary when training a model can reduce the training time and improve the numerical optimal solution. While Adam optimizer is the most commonly used technique, others can also be useful. They are summed up in the table below:**
-نرخ‌های یادگیری سازگارشونده ― داشتن نرخ یادگیری متغیر در فرآیند آموزش یک مدل، می‌تواند زمان آموزش را کاهش دهد و راه‌حل بهینه عددی را بهبود ببخشد. با آنکه بهینه ساز Adam محبوب‌ترین متد مورد استفاده است، دیگر متدها نیز می‌توانند مفید باشند. این متد ها در جدول زیر به اختصار آمده‌اند: +نرخ‌های یادگیری سازگارشونده ― داشتن نرخ یادگیری متغیر در فرآیند آموزش یک مدل، می‌تواند زمان آموزش را کاهش دهد و راه‌حل بهینه عددی را بهبود ببخشد. با آنکه بهینه‌ساز Adam محبوب‌ترین روش مورد استفاده است، دیگر روش‌ها نیز می‌توانند مفید باشند. این روش‌ها در جدول زیر به اختصار آمده‌اند:

@@ -364,7 +365,7 @@ **41. [Method, Explanation, Update of w, Update of b]**
-[روش، توضیح، بروزرسانی w، بروزرسانی b] +[روش، توضیح، به‌روزرسانی w، به‌روزرسانی b]

@@ -373,7 +374,7 @@ **42. [Momentum, Dampens oscillations, Improvement to SGD, 2 parameters to tune]**
-[Momentum، نوسانات را تعدیل می‌دهد، بهبود SGD، 2 پارامتر نیاز به تنظیم دارند] +[تکانه، نوسانات را تعدیل می‌دهد، بهبود SGD، دو فراسنج که نیاز به تنظیم دارند]

@@ -391,7 +392,7 @@ **44. [Adam, Adaptive Moment estimation, Most popular method, 4 parameters to tune]**
-[Adam، تخمین سازگارشونده گشتاور، محبوب‌ترین متد، 4 پارامتر نیاز به تنظیم دارند] +[Adam، تخمین سازگارشونده گشتاور، محبوب‌ترین متد، چهار فراسنج که نیاز به تنظیم دارند]

@@ -427,7 +428,7 @@ **48. Remark: most deep learning frameworks parametrize dropout through the 'keep' parameter 1−p.**
-نکته: بیشتر فریم ورک‌های یادگیری عمیق برون‌اندازی را به شکل پارامتر ‘keep’ 1-p درمی-آورند. +نکته: بیشتر فریم ورک‌های یادگیری عمیق برون‌اندازی را به شکل فراسنج ‘keep’ 1-p درمی-آورند.

@@ -478,7 +479,7 @@ **53. Good practices**
-تمرینات خوب +عادت‌های خوب

@@ -487,7 +488,7 @@ **54. Overfitting small batch ― When debugging a model, it is often useful to make quick tests to see if there is any major issue with the architecture of the model itself. In particular, in order to make sure that the model can be properly trained, a mini-batch is passed inside the network to see if it can overfit on it. If it cannot, it means that the model is either too complex or not complex enough to even overfit on a small batch, let alone a normal-sized training set.**
-بیش‌برارزش کردن دسته‌ی‌کوچک ― هنگام اشکال‌زدایی یک مدل، اغلب مفید است که یک سری آزمایش‌های سریع برای اطمینان از اینکه هیچ مشکل عمده‌ای در معماری مدل وجود دارد انجام شود. به طورخاص، برای اطمینان از اینکه مدل می‌تواند به شکل صحیح آموزش ببیند، یک دسته‌ی‌کوچک (از داده‌ها) به شبکه داده می‌شود تا دریابیم که مدل می‌تواند به آنها بیش‌برارزش کند. اگر نتوانید، بدین معناست که مدل یا خیلی پیچیده است یا پیچیدگی لازم برای بیش‌برارزش شدن برروی دسته‌ی‌کوچک را ندارد، چه برسد به یک مجموعه آموزشی با اندازه عادی. +بیش‌برارزش روی دسته‌ی ‌کوچک ― هنگام اشکال‌زدایی یک مدل، اغلب مفید است که یک سری آزمایش‌های سریع برای اطمینان از اینکه هیچ مشکل عمده‌ای در معماری مدل وجود ندارد، انجام شود. به طورخاص، برای اطمینان از اینکه مدل می‌تواند به شکل صحیح آموزش ببیند، یک دسته‌ی‌ کوچک (از داده‌ها) به شبکه داده می‌شود تا دریابیم که مدل می‌تواند به آنها بیش‌برارزش کند. اگر نتواند، بدین معناست که مدل از پیچیدگی بالایی برخوردار است یا پیچیدگی کافی برای بیش‌برارزش شدن روی دستهی کوچک ندارد، چه برسد به یک مجموعه آموزشی با اندازه عادی.

@@ -496,7 +497,7 @@ **55. Gradient checking ― Gradient checking is a method used during the implementation of the backward pass of a neural network. It compares the value of the analytical gradient to the numerical gradient at given points and plays the role of a sanity-check for correctness.**
-وارسی گرادیان – وارسی گرادیان متدی است که در طول پیاده سازی گذر روبه‌عقبِ یک شبکه عصبی استفاده می‌شود. این متد مقدار گرادیان تحلیلی را با گرادیان عددی در نقطه‌های مفروض را مقایسه می‌کند و نقش بررسی‌درستی را ایفا میکند. +وارسی گرادیان – وارسی گرادیان روشی است که در طول پیادهسازی گذر روبه‌عقبِ یک شبکهی عصبی استفاده می‌شود. این روش مقدار گرادیان تحلیلی را با گرادیان عددی در نقطه‌های مفروض مقایسه می‌کند و نقش بررسی درستی را ایفا میکند.

@@ -523,7 +524,7 @@ **58. [Expensive; loss has to be computed two times per dimension, Used to verify correctness of analytical implementation, Trade-off in choosing h not too small (numerical instability) nor too large (poor gradient approximation)]**
-[گران (محاسباتی)، خطا باید دو بار در هر بُعد محاسبه شود، برای تایید صحت پیاده‌سازی تحلیلی استفاده می‌شود، مصالحه در انتخاب h: نه بسیار کوچک (ناپایداری عددی) و نه خیلی بزرگ (تخمین گرادیان ضعیف) باشد] +[پرهزینه (از نظر محاسباتی)، خطا باید دو بار برای هر بُعد محاسبه شود، برای تایید صحت پیاده‌سازی تحلیلی استفاده می‌شود، مصالحه در انتخاب h: نه بسیار کوچک (ناپایداری عددی) و نه خیلی بزرگ (تخمین گرادیان ضعیف) باشد]

@@ -547,7 +548,7 @@ **61. Original authors**
-متن اصلی از +نویسندگان اصلی

@@ -583,3 +584,7 @@

+ + + + From 97b46e53fdc9906f4b298678aeabd142fb19a05d Mon Sep 17 00:00:00 2001 From: AlisterTA <19950298+AlisterTA@users.noreply.github.com> Date: Sun, 10 Feb 2019 13:06:04 +0330 Subject: [PATCH 121/531] Ready to merge #2 --- fa/deep-learning-tips-and-tricks.md | 28 ++++++++++++---------------- 1 file changed, 12 insertions(+), 16 deletions(-) diff --git a/fa/deep-learning-tips-and-tricks.md b/fa/deep-learning-tips-and-tricks.md index 9cb0410b7..9acd12a30 100644 --- a/fa/deep-learning-tips-and-tricks.md +++ b/fa/deep-learning-tips-and-tricks.md @@ -1,4 +1,4 @@ - + **Deep Learning Tips and Tricks translation**
@@ -42,7 +42,7 @@ **5. [Training a neural network, Epoch, Mini-batch, Cross-entropy loss, Backpropagation, Gradient descent, Updating weights, Gradient checking]**
-[آموزش یک شبکه‌ی عصبی، تکرار(Epoch)، دسته‌ی‌کوچک، خطای آنتروپی متقاطع، انتشار معکوس، گرادیان نزولی، بروزرسانی وزن‌ها، وارسی گرادیان] +[آموزش یک شبکه‌ی عصبی، تکرار(Epoch)، دسته‌ی‌کوچک، خطای آنتروپی متقاطع، انتشار معکوس، گرادیان نزولی، به‌روزرسانیوزن‌ها، وارسی گرادیان]

@@ -184,7 +184,7 @@ **21. Epoch ― In the context of training a model, epoch is a term used to refer to one iteration where the model sees the whole training set to update its weights.**
-تکرار (epoch) ― در مضمون آموزش یک مدل، تکرار اصطلاحی است که مدل در یک دوره تکرار تمامی نمونه‌های آموزشی را برای بروزرسانی وزن‌ها می‌بیند. +تکرار (epoch) ― در مضمون آموزش یک مدل، تکرار اصطلاحی است که مدل در یک دوره تکرار تمامی نمونه‌های آموزشی را برای به‌روزرسانیوزن‌ها می‌بیند.

@@ -193,7 +193,7 @@ **22. Mini-batch gradient descent ― During the training phase, updating weights is usually not based on the whole training set at once due to computation complexities or one data point due to noise issues. Instead, the update step is done on mini-batches, where the number of data points in a batch is a hyperparameter that we can tune.**
-گرادیان نزولی دسته‌ی‌کوچک ― در فاز آموزش، بروزرسانی وزن‌ها معمولا بر مبنای تمامی مجموعه آموزش به علت پیچیدگی‌های محاسباتی، یا یک نمونه داده به علت مشکل نویز، نیست. در عوض، گام بروزرسانی بر روی دسته‌های کوچک انجام می شود، که تعداد نمونه‌های داده در یک دسته ابرفراسنج است که میتوان آن را تنظیم کرد. +گرادیان نزولی دسته‌ی‌کوچک ― در فاز آموزش، به‌روزرسانیوزن‌ها معمولا بر مبنای تمامی مجموعه آموزش به علت پیچیدگی‌های محاسباتی، یا یک نمونه داده به علت مشکل نویز، نیست. در عوض، گام به‌روزرسانیبر روی دسته‌های کوچک انجام می شود، که تعداد نمونه‌های داده در یک دسته ابرفراسنج است که میتوان آن را تنظیم کرد.

@@ -229,7 +229,7 @@ **26. Backpropagation ― Backpropagation is a method to update the weights in the neural network by taking into account the actual output and the desired output. The derivative with respect to each weight w is computed using the chain rule.**
-انتشار معکوس ― انتشار معکوس روشی برای بروزرسانی وزن‌ها با توجه به خروجی واقعی و خروجی مورد انتظار در شبکه‌ی عصبی است. مشتق نسبت به هر وزن w توسط قاعده‌ی زنجیری محاسبه می‌شود. +انتشار معکوس ― انتشار معکوس روشی برای به‌روزرسانیوزن‌ها با توجه به خروجی واقعی و خروجی مورد انتظار در شبکه‌ی عصبی است. مشتق نسبت به هر وزن w توسط قاعده‌ی زنجیری محاسبه می‌شود.

@@ -238,7 +238,7 @@ **27. Using this method, each weight is updated with the rule:**
-با استفاده از این روش، هر وزن با قانون زیر بروزرسانی می‌شود: +با استفاده از این روش، هر وزن با قانون زیر به‌روزرسانیمی‌شود:

@@ -247,7 +247,7 @@ **28. Updating weights ― In a neural network, weights are updated as follows:**
-بروزرسانی وزن‌ها – در یک شبکه‌ی عصبی، وزن‌ها به شکل زیر بروزرسانی می‌شوند: +به‌روزرسانیوزن‌ها – در یک شبکه‌ی عصبی، وزن‌ها به شکل زیر به‌روزرسانیمی‌شوند:

@@ -347,7 +347,7 @@ **
-نرخ یادگیری – نرخ یادگیری اغلب با نماد α و گاهی اوقات با نماد η نمایش داده می‌شود و بیانگر سرعت (گام) بروزرسانی وزن‌ها است که می‌تواند مقداری ثابت داشته باشد یا به سازگارشونده تغییر کند. محبوب‌ترین روش حال حاضر Adam نام دارد، روشی است که نرخ یادگیری را در حین فرآیند آموزش تنظیم می‌کند. +نرخ یادگیری – نرخ یادگیری اغلب با نماد α و گاهی اوقات با نماد η نمایش داده می‌شود و بیانگر سرعت (گام) به‌روزرسانیوزن‌ها است که می‌تواند مقداری ثابت داشته باشد یا به سازگارشونده تغییر کند. محبوب‌ترین روش حال حاضر Adam نام دارد، روشی است که نرخ یادگیری را در حین فرآیند آموزش تنظیم می‌کند.

@@ -419,7 +419,7 @@ **47. Dropout ― Dropout is a technique used in neural networks to prevent overfitting the training data by dropping out neurons with probability p>0. It forces the model to avoid relying too much on particular sets of features.**
-برون‌اندازی – برون‌اندازی روشی است که در شبکه‌های عصبی برای جلوگیری از برارزش شدن بر روی داده‌های آموزشی با حذف تصادفی نورون‌ها با احتمال p>0 استفاده می‌شود. این روش مدل را مجبور می‌کند تا از تکیه کردن بیش از حد بر روی مجموعه خاصی از ویژگی‌ها خودداری کند. +برون‌اندازی – برون‌اندازی روشی است که در شبکه‌های عصبی برای جلوگیری از بیش‌برارزش شدن بر روی داده‌های آموزشی با حذف تصادفی نورون‌ها با احتمال p>0 استفاده می‌شود. این روش مدل را مجبور می‌کند تا از تکیه کردن بیش از حد بر روی مجموعه خاصی از ویژگی‌ها خودداری کند.

@@ -437,7 +437,7 @@ **49. Weight regularization ― In order to make sure that the weights are not too large and that the model is not overfitting the training set, regularization techniques are usually performed on the model weights. The main ones are summed up in the table below:**
-نظام‌بخشی وزن – برای اطمینان از اینکه (مقادیر) وزن‌ها زیادی بزرگ نیستند و مدل به مجموعه آموزش بیش‌برارزش نیست، روشهای نظام‌بخشی معمولا بر روی وزن‌های مدل اجرا می‌شوند. اصلی‌ترین آنها در جدول زیر به اختصار آمده‌اند: +نظام‌بخشی وزن – برای اطمینان از اینکه (مقادیر) وزن‌ها بیش‌ازحد بزرگ نیستند و مدل به مجموعه‌ی آموزش بیش‌برارزش نمی‌کند، روشهای نظام‌بخشی معمولا بر روی وزن‌های مدل اجرا می‌شوند. اصلی‌ترین آنها در جدول زیر به اختصار آمده‌اند:

@@ -453,7 +453,7 @@ **50 bis. Shrinks coefficients to 0, Good for variable selection, Makes coefficients smaller, Tradeoff between variable selection and small coefficients]**
-ضرایب را تا ۰ کاهش می‌دهد، برای انتخاب متغیر مناسب است، ضرایب را کوچکتر می‌کند، بین انتخاب متغیر و ضرایب کوچک مصالحه می‌کند +ضرایب را تا صفر کاهش می‌دهد، برای انتخاب متغیر مناسب است، ضرایب را کوچکتر می‌کند، بین انتخاب متغیر و ضرایب کوچک مصالحه می‌کند
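The penalties in that table (shrinking some coefficients to zero versus making all of them smaller, or a mix of the two) can be added to the loss as follows; λ and the 0.5/0.5 mix are arbitrary example values:

```python
import numpy as np

def regularized_loss(base_loss, W, lam=1e-3, kind="l2"):
    if kind == "l1":    # tends to push some coefficients exactly to 0 (variable selection)
        return base_loss + lam * np.sum(np.abs(W))
    if kind == "l2":    # makes all coefficients smaller
        return base_loss + lam * np.sum(W ** 2)
    # mixing the two trades off variable selection against small coefficients
    return base_loss + lam * (0.5 * np.sum(np.abs(W)) + 0.5 * np.sum(W ** 2))
```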

@@ -461,7 +461,7 @@ **51. Early stopping ― This regularization technique stops the training process as soon as the validation loss reaches a plateau or starts to increase.**
-توقف زودهنگام ― این روش نظام‌بخشی، فرآیند آموزش را به محض اینکه خطای اعتبارسنجی ثابت یا شروع به افزایش پیدا کند، متوقف می‌کند. +توقف زودهنگام ― این روش نظام‌بخشی، فرآیند آموزش را به محض اینکه خطای اعتبارسنجی ثابت می‌شود یا شروع به افزایش پیدا کند، متوقف می‌کند.
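A patience-based rendering of that rule; the patience of 3 epochs and the dummy validation losses are made up:

```python
best_loss, patience, wait = float("inf"), 3, 0

for epoch, val_loss in enumerate([0.9, 0.7, 0.6, 0.61, 0.62, 0.63, 0.64]):
    if val_loss < best_loss:
        best_loss, wait = val_loss, 0      # validation loss still improving
    else:
        wait += 1                          # plateau or increase
        if wait >= patience:
            print(f"early stopping at epoch {epoch}")
            break
```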

@@ -584,7 +584,3 @@

- - - - From cd3939d031a5b4e55804d9cbcc03c15d91da9639 Mon Sep 17 00:00:00 2001 From: AlisterTA <19950298+AlisterTA@users.noreply.github.com> Date: Sun, 10 Feb 2019 13:15:48 +0330 Subject: [PATCH 122/531] Ready to merge #3 --- fa/deep-learning-tips-and-tricks.md | 14 +++++++------- 1 file changed, 7 insertions(+), 7 deletions(-) diff --git a/fa/deep-learning-tips-and-tricks.md b/fa/deep-learning-tips-and-tricks.md index 9acd12a30..80c0dd5c8 100644 --- a/fa/deep-learning-tips-and-tricks.md +++ b/fa/deep-learning-tips-and-tricks.md @@ -42,7 +42,7 @@ **5. [Training a neural network, Epoch, Mini-batch, Cross-entropy loss, Backpropagation, Gradient descent, Updating weights, Gradient checking]**
-[آموزش یک شبکه‌ی عصبی، تکرار(Epoch)، دسته‌ی‌کوچک، خطای آنتروپی متقاطع، انتشار معکوس، گرادیان نزولی، به‌روزرسانیوزن‌ها، وارسی گرادیان] +[آموزش یک شبکه‌ی عصبی، تکرار(Epoch)، دسته‌ی‌کوچک، خطای آنتروپی متقاطع، انتشار معکوس، گرادیان نزولی، به‌روزرسانی وزن‌ها، وارسی گرادیان]

@@ -184,7 +184,7 @@ **21. Epoch ― In the context of training a model, epoch is a term used to refer to one iteration where the model sees the whole training set to update its weights.**
-تکرار (epoch) ― در مضمون آموزش یک مدل، تکرار اصطلاحی است که مدل در یک دوره تکرار تمامی نمونه‌های آموزشی را برای به‌روزرسانیوزن‌ها می‌بیند. +تکرار (epoch) ― در مضمون آموزش یک مدل، تکرار اصطلاحی است که مدل در یک دوره تکرار تمامی نمونه‌های آموزشی را برای به‌روزرسانی وزن‌ها می‌بیند.

@@ -193,7 +193,7 @@ **22. Mini-batch gradient descent ― During the training phase, updating weights is usually not based on the whole training set at once due to computation complexities or one data point due to noise issues. Instead, the update step is done on mini-batches, where the number of data points in a batch is a hyperparameter that we can tune.**
-گرادیان نزولی دسته‌ی‌کوچک ― در فاز آموزش، به‌روزرسانیوزن‌ها معمولا بر مبنای تمامی مجموعه آموزش به علت پیچیدگی‌های محاسباتی، یا یک نمونه داده به علت مشکل نویز، نیست. در عوض، گام به‌روزرسانیبر روی دسته‌های کوچک انجام می شود، که تعداد نمونه‌های داده در یک دسته ابرفراسنج است که میتوان آن را تنظیم کرد. +گرادیان نزولی دسته‌ی‌کوچک ― در فاز آموزش، به‌روزرسانی وزن‌ها معمولا بر مبنای تمامی مجموعه آموزش به علت پیچیدگی‌های محاسباتی، یا یک نمونه داده به علت مشکل نویز، نیست. در عوض، گام به‌روزرسانی بر روی دسته‌های کوچک انجام می شود، که تعداد نمونه‌های داده در یک دسته ابرفراسنج است که میتوان آن را تنظیم کرد.

@@ -229,7 +229,7 @@ **26. Backpropagation ― Backpropagation is a method to update the weights in the neural network by taking into account the actual output and the desired output. The derivative with respect to each weight w is computed using the chain rule.**
-انتشار معکوس ― انتشار معکوس روشی برای به‌روزرسانیوزن‌ها با توجه به خروجی واقعی و خروجی مورد انتظار در شبکه‌ی عصبی است. مشتق نسبت به هر وزن w توسط قاعده‌ی زنجیری محاسبه می‌شود. +انتشار معکوس ― انتشار معکوس روشی برای به‌روزرسانی وزن‌ها با توجه به خروجی واقعی و خروجی مورد انتظار در شبکه‌ی عصبی است. مشتق نسبت به هر وزن w توسط قاعده‌ی زنجیری محاسبه می‌شود.

@@ -238,7 +238,7 @@ **27. Using this method, each weight is updated with the rule:**
-با استفاده از این روش، هر وزن با قانون زیر به‌روزرسانیمی‌شود: +با استفاده از این روش، هر وزن با قانون زیر به‌روزرسانی می‌شود:

@@ -247,7 +247,7 @@ **28. Updating weights ― In a neural network, weights are updated as follows:**
-به‌روزرسانیوزن‌ها – در یک شبکه‌ی عصبی، وزن‌ها به شکل زیر به‌روزرسانیمی‌شوند: +به‌روزرسانی وزن‌ها – در یک شبکه‌ی عصبی، وزن‌ها به شکل زیر به‌روزرسانی می‌شوند:

@@ -347,7 +347,7 @@ **
-نرخ یادگیری – نرخ یادگیری اغلب با نماد α و گاهی اوقات با نماد η نمایش داده می‌شود و بیانگر سرعت (گام) به‌روزرسانیوزن‌ها است که می‌تواند مقداری ثابت داشته باشد یا به سازگارشونده تغییر کند. محبوب‌ترین روش حال حاضر Adam نام دارد، روشی است که نرخ یادگیری را در حین فرآیند آموزش تنظیم می‌کند. +نرخ یادگیری – نرخ یادگیری اغلب با نماد α و گاهی اوقات با نماد η نمایش داده می‌شود و بیانگر سرعت (گام) به‌روزرسانی وزن‌ها است که می‌تواند مقداری ثابت داشته باشد یا به سازگارشونده تغییر کند. محبوب‌ترین روش حال حاضر Adam نام دارد، روشی است که نرخ یادگیری را در حین فرآیند آموزش تنظیم می‌کند.
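A fixed rate versus a simple step-decay schedule, as a tiny illustration (the initial rate, decay factor and interval are arbitrary choices):

```python
def step_decay(initial_lr, epoch, drop=0.5, epochs_per_drop=10):
    # halve the learning rate every `epochs_per_drop` epochs
    return initial_lr * (drop ** (epoch // epochs_per_drop))

for epoch in (0, 9, 10, 25):
    print(epoch, step_decay(0.1, epoch))   # 0.1, 0.1, 0.05, 0.025
```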

From e84c523ef5cf419c1531bf4ecf1326386c774b80 Mon Sep 17 00:00:00 2001 From: AlisterTA <19950298+AlisterTA@users.noreply.github.com> Date: Sun, 10 Feb 2019 13:32:16 +0330 Subject: [PATCH 123/531] =?UTF-8?q?Fix=20=D9=85=DB=8C=D8=AA=D9=88=D8=A7?= =?UTF-8?q?=D9=86=D9=85?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit --- fa/deep-learning-tips-and-tricks.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fa/deep-learning-tips-and-tricks.md b/fa/deep-learning-tips-and-tricks.md index 80c0dd5c8..823214a54 100644 --- a/fa/deep-learning-tips-and-tricks.md +++ b/fa/deep-learning-tips-and-tricks.md @@ -96,7 +96,7 @@ **11. Data augmentation ― Deep learning models usually need a lot of data to be properly trained. It is often useful to get more data from the existing ones using data augmentation techniques. The main ones are summed up in the table below. More precisely, given the following input image, here are the techniques that we can apply:**
-داده‌افزایی ― مدل‌های یادگیری عمیق معمولا به داده‌های زیادی نیاز دارند تا بتوانند به خوبی آموزش ببینند. اغلب، استفاده از روشهای داده‌افزایی برای گرفتن داده بیشتر از داده‌های موجود، مفید است. اصلی‌ترین آنها در جدول زیر به اختصار آمده‌اند. به عبارت دقیق‌تر، با در نظر گرفتن تصویر ورودی زیر، روش‌هایی که می‌توانم اعمال کرد بدین شرح هستند: +داده‌افزایی ― مدل‌های یادگیری عمیق معمولا به داده‌های زیادی نیاز دارند تا بتوانند به خوبی آموزش ببینند. اغلب، استفاده از روشهای داده‌افزایی برای گرفتن داده بیشتر از داده‌های موجود، مفید است. اصلی‌ترین آنها در جدول زیر به اختصار آمده‌اند. به عبارت دقیق‌تر، با در نظر گرفتن تصویر ورودی زیر، روش‌هایی که می‌توان اعمال کرد بدین شرح هستند:

From 0141c6f356db98b83dce690b36f7877d7dce7bdc Mon Sep 17 00:00:00 2001 From: AlisterTA <19950298+AlisterTA@users.noreply.github.com> Date: Sun, 10 Feb 2019 13:44:50 +0330 Subject: [PATCH 124/531] Fix #44 --- fa/deep-learning-tips-and-tricks.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fa/deep-learning-tips-and-tricks.md b/fa/deep-learning-tips-and-tricks.md index 823214a54..2e27a0014 100644 --- a/fa/deep-learning-tips-and-tricks.md +++ b/fa/deep-learning-tips-and-tricks.md @@ -392,7 +392,7 @@ **44. [Adam, Adaptive Moment estimation, Most popular method, 4 parameters to tune]**
-[Adam، تخمین سازگارشونده گشتاور، محبوب‌ترین متد، چهار فراسنج که نیاز به تنظیم دارند] +[Adam، تخمین سازگارشونده ممان، محبوب‌ترین روش، چهار فراسنج که نیاز به تنظیم دارند]
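The update rules behind items 42-44 (momentum, RMSprop, Adam) in one common NumPy formulation; β1, β2, ε and the learning rates are the usual textbook defaults, used here purely as an illustration:

```python
import numpy as np

def sgd_momentum(w, grad, v, lr=0.01, beta=0.9):
    v = beta * v + (1 - beta) * grad             # running average dampens oscillations
    return w - lr * v, v

def rmsprop(w, grad, s, lr=0.001, beta=0.9, eps=1e-8):
    s = beta * s + (1 - beta) * grad ** 2        # running average of squared gradients
    return w - lr * grad / (np.sqrt(s) + eps), s

def adam(w, grad, v, s, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    v = beta1 * v + (1 - beta1) * grad           # first-moment estimate
    s = beta2 * s + (1 - beta2) * grad ** 2      # second-moment estimate
    v_hat = v / (1 - beta1 ** t)                 # bias correction (t is the step number)
    s_hat = s / (1 - beta2 ** t)
    return w - lr * v_hat / (np.sqrt(s_hat) + eps), v, s

w, v, s = np.array([1.0, -2.0]), np.zeros(2), np.zeros(2)
w, v, s = adam(w, np.array([0.3, -0.1]), v, s, t=1)
print(w)
```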

From 1811d08954da78279f49805a2c8366209753d618 Mon Sep 17 00:00:00 2001 From: AlisterTA <19950298+AlisterTA@users.noreply.github.com> Date: Sun, 10 Feb 2019 13:53:07 +0330 Subject: [PATCH 125/531] Fix #48 --- fa/deep-learning-tips-and-tricks.md | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/fa/deep-learning-tips-and-tricks.md b/fa/deep-learning-tips-and-tricks.md index 2e27a0014..0fb55ccfb 100644 --- a/fa/deep-learning-tips-and-tricks.md +++ b/fa/deep-learning-tips-and-tricks.md @@ -428,7 +428,9 @@ **48. Remark: most deep learning frameworks parametrize dropout through the 'keep' parameter 1−p.**
-نکته: بیشتر فریم ورک‌های یادگیری عمیق برون‌اندازی را به شکل فراسنج ‘keep’ 1-p درمی-آورند. +نکته: بیشتر کتابخانه‌های یادگیری عمیق برون‌اندازی را با استفاده از فراسنج 'نگه‌داشتن' 1-p کنترل می‌کنند + +

From 93d4d42d09eb33700311328b2e590c5cff7f1d46 Mon Sep 17 00:00:00 2001 From: AlisterTA <19950298+AlisterTA@users.noreply.github.com> Date: Sun, 10 Feb 2019 13:59:44 +0330 Subject: [PATCH 126/531] Mirror fix --- fa/deep-learning-tips-and-tricks.md | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/fa/deep-learning-tips-and-tricks.md b/fa/deep-learning-tips-and-tricks.md index 0fb55ccfb..21de0700d 100644 --- a/fa/deep-learning-tips-and-tricks.md +++ b/fa/deep-learning-tips-and-tricks.md @@ -419,7 +419,7 @@ **47. Dropout ― Dropout is a technique used in neural networks to prevent overfitting the training data by dropping out neurons with probability p>0. It forces the model to avoid relying too much on particular sets of features.**
-برون‌اندازی – برون‌اندازی روشی است که در شبکه‌های عصبی برای جلوگیری از بیش‌برارزش شدن بر روی داده‌های آموزشی با حذف تصادفی نورون‌ها با احتمال p>0 استفاده می‌شود. این روش مدل را مجبور می‌کند تا از تکیه کردن بیش از حد بر روی مجموعه خاصی از ویژگی‌ها خودداری کند. +برون‌اندازی – برون‌اندازی روشی است که در شبکه‌های عصبی برای جلوگیری از بیش‌برارزش شدن بر روی داده‌های آموزشی با حذف تصادفی نورون‌ها با احتمال p>0 استفاده می‌شود. این روش مدل را مجبور می‌کند تا از تکیه کردن بیش‌از‌حد بر روی مجموعه خاصی از ویژگی‌ها خودداری کند.
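An "inverted dropout" sketch in NumPy: neurons are dropped with probability p and the surviving activations are rescaled by the keep probability 1−p, so nothing needs to change at test time (p=0.3 is an arbitrary example):

```python
import numpy as np

def dropout(a, p=0.3, training=True):
    # a: activations of a layer; p: probability of dropping a neuron
    if not training:
        return a                                   # no dropout at test time
    keep_prob = 1 - p                              # the 'keep' parameter used by most frameworks
    mask = (np.random.rand(*a.shape) < keep_prob).astype(a.dtype)
    return a * mask / keep_prob                    # rescale so the expected activation is unchanged

print(dropout(np.ones((2, 5))))
```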

@@ -428,7 +428,9 @@ **48. Remark: most deep learning frameworks parametrize dropout through the 'keep' parameter 1−p.**
-نکته: بیشتر کتابخانه‌های یادگیری عمیق برون‌اندازی را با استفاده از فراسنج 'نگه‌داشتن' 1-p کنترل می‌کنند + + +نکته: بیشتر کتابخانه‌های یادگیری عمیق برون‌اندازی را با استفاده از فراسنج 'نگه‌داشتن' 1−p کنترل می‌کنند.
From 269c97cf82d7001b89be5bf02f54e12a1ca51b56 Mon Sep 17 00:00:00 2001 From: AlisterTA <19950298+AlisterTA@users.noreply.github.com> Date: Sun, 10 Feb 2019 14:14:44 +0330 Subject: [PATCH 127/531] Minor fix --- fa/deep-learning-tips-and-tricks.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/fa/deep-learning-tips-and-tricks.md b/fa/deep-learning-tips-and-tricks.md index 21de0700d..6906fb2f3 100644 --- a/fa/deep-learning-tips-and-tricks.md +++ b/fa/deep-learning-tips-and-tricks.md @@ -42,7 +42,7 @@ **5. [Training a neural network, Epoch, Mini-batch, Cross-entropy loss, Backpropagation, Gradient descent, Updating weights, Gradient checking]**
-[آموزش یک شبکه‌ی عصبی، تکرار(Epoch)، دسته‌ی‌کوچک، خطای آنتروپی متقاطع، انتشار معکوس، گرادیان نزولی، به‌روزرسانی وزن‌ها، وارسی گرادیان] +[آموزش یک شبکه‌ی عصبی، تکرار(Epoch)، دسته‌ی کوچک، خطای آنتروپی متقاطع، انتشار معکوس، گرادیان نزولی، به‌روزرسانی وزن‌ها، وارسی گرادیان]

@@ -492,7 +492,7 @@ **54. Overfitting small batch ― When debugging a model, it is often useful to make quick tests to see if there is any major issue with the architecture of the model itself. In particular, in order to make sure that the model can be properly trained, a mini-batch is passed inside the network to see if it can overfit on it. If it cannot, it means that the model is either too complex or not complex enough to even overfit on a small batch, let alone a normal-sized training set.**
-بیش‌برارزش روی دسته‌ی ‌کوچک ― هنگام اشکال‌زدایی یک مدل، اغلب مفید است که یک سری آزمایش‌های سریع برای اطمینان از اینکه هیچ مشکل عمده‌ای در معماری مدل وجود ندارد، انجام شود. به طورخاص، برای اطمینان از اینکه مدل می‌تواند به شکل صحیح آموزش ببیند، یک دسته‌ی‌ کوچک (از داده‌ها) به شبکه داده می‌شود تا دریابیم که مدل می‌تواند به آنها بیش‌برارزش کند. اگر نتواند، بدین معناست که مدل از پیچیدگی بالایی برخوردار است یا پیچیدگی کافی برای بیش‌برارزش شدن روی دستهی کوچک ندارد، چه برسد به یک مجموعه آموزشی با اندازه عادی. +بیش‌برارزش روی دسته‌ی ‌کوچک ― هنگام اشکال‌زدایی یک مدل، اغلب مفید است که یک سری آزمایش‌های سریع برای اطمینان از اینکه هیچ مشکل عمده‌ای در معماری مدل وجود ندارد، انجام شود. به طورخاص، برای اطمینان از اینکه مدل می‌تواند به شکل صحیح آموزش ببیند، یک دسته‌ی‌ کوچک (از داده‌ها) به شبکه داده می‌شود تا دریابیم که مدل می‌تواند به آنها بیش‌برارزش کند. اگر نتواند، بدین معناست که مدل از پیچیدگی بالایی برخوردار است یا پیچیدگی کافی برای بیش‌برارزش شدن روی دسته‌ی‌ کوچک ندارد، چه برسد به یک مجموعه آموزشی با اندازه عادی.
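A self-contained version of that sanity check: repeatedly fit a single small batch with a tiny two-layer network and verify the loss can be driven close to zero (sizes, learning rate and iteration count are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 4))                 # a single mini-batch of 8 examples
y = rng.normal(size=(8, 1))

W1, b1 = rng.normal(size=(4, 16)) * 0.5, np.zeros(16)
W2, b2 = rng.normal(size=(16, 1)) * 0.5, np.zeros(1)

for step in range(2000):
    h = np.tanh(x @ W1 + b1)                # forward pass
    pred = h @ W2 + b2
    loss = np.mean((pred - y) ** 2)

    dpred = 2 * (pred - y) / len(x)         # backward pass (chain rule by hand)
    dW2, db2 = h.T @ dpred, dpred.sum(0)
    dh = dpred @ W2.T * (1 - h ** 2)
    dW1, db1 = x.T @ dh, dh.sum(0)

    for p, g in ((W1, dW1), (b1, db1), (W2, dW2), (b2, db2)):
        p -= 0.05 * g                       # plain gradient descent step

print(loss)   # should end up near zero if the model can be trained at all
```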

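Item 54 above describes overfitting a single mini-batch as a quick architecture sanity check. A rough sketch of that check, assuming PyTorch and purely illustrative shapes and data:

```python
import torch
import torch.nn as nn

x = torch.randn(16, 10)              # one fixed mini-batch of fake data
y = torch.randint(0, 3, (16,))

model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 3))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)
loss_fn = nn.CrossEntropyLoss()

for _ in range(200):                 # iterate on the same batch over and over
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()

print(loss.item())                   # should approach 0 if the architecture can learn at all
```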
@@ -501,7 +501,7 @@ **55. Gradient checking ― Gradient checking is a method used during the implementation of the backward pass of a neural network. It compares the value of the analytical gradient to the numerical gradient at given points and plays the role of a sanity-check for correctness.**
-وارسی گرادیان – وارسی گرادیان روشی است که در طول پیادهسازی گذر روبه‌عقبِ یک شبکهی عصبی استفاده می‌شود. این روش مقدار گرادیان تحلیلی را با گرادیان عددی در نقطه‌های مفروض مقایسه می‌کند و نقش بررسی درستی را ایفا میکند. +وارسی گرادیان – وارسی گرادیان روشی است که در طول پیاده‌سازی گذر روبه‌عقبِ یک شبکه‌ی عصبی استفاده می‌شود. این روش مقدار گرادیان تحلیلی را با گرادیان عددی در نقطه‌های مفروض مقایسه می‌کند و نقش بررسی درستی را ایفا میکند.

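Item 55 (gradient checking) amounts to comparing an analytical gradient with a central-difference estimate at a few points. A small self-contained sketch on a toy function rather than a real backward pass:

```python
import numpy as np

def numerical_grad(f, x, eps=1e-6):
    """Central-difference estimate of the gradient of f at x."""
    grad = np.zeros_like(x)
    for i in range(x.size):
        step = np.zeros_like(x)
        step.flat[i] = eps
        grad.flat[i] = (f(x + step) - f(x - step)) / (2 * eps)
    return grad

x = np.array([1.0, -2.0, 0.5])
analytical = 2 * x                                     # gradient of f(x) = sum(x**2)
numerical = numerical_grad(lambda v: np.sum(v ** 2), x)
rel_error = np.abs(analytical - numerical) / (np.abs(analytical) + np.abs(numerical))
print(rel_error)                                       # tiny values mean the backward pass is sane
```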
From 793b3269b43527a5a247d71c2e6587d03b7eaf7c Mon Sep 17 00:00:00 2001 From: AlisterTA <19950298+AlisterTA@users.noreply.github.com> Date: Sun, 10 Feb 2019 15:40:15 +0330 Subject: [PATCH 128/531] a real quick fix --- fa/deep-learning-tips-and-tricks.md | 10 +++------- 1 file changed, 3 insertions(+), 7 deletions(-) diff --git a/fa/deep-learning-tips-and-tricks.md b/fa/deep-learning-tips-and-tricks.md index 6906fb2f3..c29d6a4cf 100644 --- a/fa/deep-learning-tips-and-tricks.md +++ b/fa/deep-learning-tips-and-tricks.md @@ -96,7 +96,7 @@ **11. Data augmentation ― Deep learning models usually need a lot of data to be properly trained. It is often useful to get more data from the existing ones using data augmentation techniques. The main ones are summed up in the table below. More precisely, given the following input image, here are the techniques that we can apply:**
-داده‌افزایی ― مدل‌های یادگیری عمیق معمولا به داده‌های زیادی نیاز دارند تا بتوانند به خوبی آموزش ببینند. اغلب، استفاده از روشهای داده‌افزایی برای گرفتن داده بیشتر از داده‌های موجود، مفید است. اصلی‌ترین آنها در جدول زیر به اختصار آمده‌اند. به عبارت دقیق‌تر، با در نظر گرفتن تصویر ورودی زیر، روش‌هایی که می‌توان اعمال کرد بدین شرح هستند: +داده‌افزایی ― مدل‌های یادگیری عمیق معمولا به داده‌های زیادی نیاز دارند تا بتوانند به خوبی آموزش ببینند. اغلب، استفاده از روش‌های داده‌افزایی برای گرفتن داده بیشتر از داده‌های موجود، مفید است. اصلی‌ترین آنها در جدول زیر به اختصار آمده‌اند. به عبارت دقیق‌تر، با در نظر گرفتن تصویر ورودی زیر، روش‌هایی که می‌توان اعمال کرد بدین شرح هستند:

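Item 11 (data augmentation) covers flips, crops, noise and color or luminosity changes. The sketch below, assuming images as H x W x C NumPy arrays with values in [0, 1], only illustrates the idea; real pipelines usually rely on a library.

```python
import numpy as np

def augment(image, rng=np.random.default_rng(0)):
    """Return a randomly perturbed copy of an H x W x C image with values in [0, 1]."""
    out = image.copy()
    if rng.random() < 0.5:                                  # random horizontal flip
        out = out[:, ::-1, :]
    top, left = rng.integers(0, 8), rng.integers(0, 8)      # small random crop, padded back
    out = np.pad(out[top:, left:, :], ((top, 0), (left, 0), (0, 0)), mode="edge")
    out = out + rng.normal(0.0, 0.02, out.shape)            # noise addition
    out = out * rng.uniform(0.8, 1.2)                       # luminosity change
    return np.clip(out, 0.0, 1.0)

fake_image = np.random.rand(64, 64, 3)
print(augment(fake_image).shape)                            # (64, 64, 3): same shape, new sample
```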
@@ -328,7 +328,7 @@ **37. [Freezes all layers, trains weights on softmax, Freezes most layers, trains weights on last layers and softmax, Trains weights on layers and softmax by initializing weights on pre-trained ones]**
-[منجمد کردن تمامی لایه‌ها، آموزش وزن‌ها در بیشینه‌ی هموار، منجمد کردن اکثر لایه‌ها، آموزش وزن‌ها در لایه‌های آخر و بیشینه‌ی هموار، آموزش وزن‌ها در (تمامی) لایه‌ها و بیشینه‌ی هموار با مقداردهی‌اولیه کردن وزن‌ها بر روی مقادیر ازقبل‌آموزش داده شده] +[منجمد کردن تمامی لایه‌ها، آموزش وزن‌ها در بیشینه‌ی هموار، منجمد کردن اکثر لایه‌ها، آموزش وزن‌ها در لایه‌های آخر و بیشینه‌ی هموار، آموزش وزن‌ها در (تمامی) لایه‌ها و بیشینه‌ی هموار با مقداردهی‌اولیه کردن وزن‌ها بر روی مقادیر پیش‌آموخته]

@@ -428,11 +428,7 @@ **48. Remark: most deep learning frameworks parametrize dropout through the 'keep' parameter 1−p.**
- - -نکته: بیشتر کتابخانه‌های یادگیری عمیق برون‌اندازی را با استفاده از فراسنج 'نگه‌داشتن' 1−p کنترل می‌کنند. - - +نکته: بیشتر کتابخانه‌های یادگیری عمیق برون‌اندازی را با استفاده از فراسنج 'نگه‌داشتن' 1-p کنترل می‌کنند.

From 9c93724773e300f846243fac55c3b8df5c3214bc Mon Sep 17 00:00:00 2001 From: AlisterTA <19950298+AlisterTA@users.noreply.github.com> Date: Sun, 10 Feb 2019 16:11:45 +0330 Subject: [PATCH 129/531] Delete convolutional-neural-networks.md --- fa/convolutional-neural-networks.md | 716 ---------------------------- 1 file changed, 716 deletions(-) delete mode 100644 fa/convolutional-neural-networks.md diff --git a/fa/convolutional-neural-networks.md b/fa/convolutional-neural-networks.md deleted file mode 100644 index 1b1283628..000000000 --- a/fa/convolutional-neural-networks.md +++ /dev/null @@ -1,716 +0,0 @@ -**Convolutional Neural Networks translation** - -
- -**1. Convolutional Neural Networks cheatsheet** - -⟶ - -
- - -**2. CS 230 - Deep Learning** - -⟶ - -
- - -**3. [Overview, Architecture structure]** - -⟶ - -
- - -**4. [Types of layer, Convolution, Pooling, Fully connected]** - -⟶ - -
- - -**5. [Filter hyperparameters, Dimensions, Stride, Padding]** - -⟶ - -
- - -**6. [Tuning hyperparameters, Parameter compatibility, Model complexity, Receptive field]** - -⟶ - -
- - -**7. [Activation functions, Rectified Linear Unit, Softmax]** - -⟶ - -
- - -**8. [Object detection, Types of models, Detection, Intersection over Union, Non-max suppression, YOLO, R-CNN]** - -⟶ - -
- - -**9. [Face verification/recognition, One shot learning, Siamese network, Triplet loss]** - -⟶ - -
- - -**10. [Neural style transfer, Activation, Style matrix, Style/content cost function]** - -⟶ - -
- - -**11. [Computational trick architectures, Generative Adversarial Net, ResNet, Inception Network]** - -⟶ - -
- - -**12. Overview** - -⟶ - -
- - -**13. Architecture of a traditional CNN ― Convolutional neural networks, also known as CNNs, are a specific type of neural networks that are generally composed of the following layers:** - -⟶ - -
- - -**14. The convolution layer and the pooling layer can be fine-tuned with respect to hyperparameters that are described in the next sections.** - -⟶ - -
- - -**15. Types of layer** - -⟶ - -
- - -**16. Convolution layer (CONV) ― The convolution layer (CONV) uses filters that perform convolution operations as it is scanning the input I with respect to its dimensions. Its hyperparameters include the filter size F and stride S. The resulting output O is called feature map or activation map.** - -⟶ - -
- - -**17. Remark: the convolution step can be generalized to the 1D and 3D cases as well.** - -⟶ - -
- - -**18. Pooling (POOL) ― The pooling layer (POOL) is a downsampling operation, typically applied after a convolution layer, which does some spatial invariance. In particular, max and average pooling are special kinds of pooling where the maximum and average value is taken, respectively.** - -⟶ - -
- - -**19. [Type, Purpose, Illustration, Comments]** - -⟶ - -
- - -**20. [Max pooling, Average pooling, Each pooling operation selects the maximum value of the current view, Each pooling operation averages the values of the current view]** - -⟶ - -
- - -**21. [Preserves detected features, Most commonly used, Downsamples feature map, Used in LeNet]** - -⟶ - -
- - -**22. Fully Connected (FC) ― The fully connected layer (FC) operates on a flattened input where each input is connected to all neurons. If present, FC layers are usually found towards the end of CNN architectures and can be used to optimize objectives such as class scores.** - -⟶ - -
- - -**23. Filter hyperparameters** - -⟶ - -
- - -**24. The convolution layer contains filters for which it is important to know the meaning behind its hyperparameters.** - -⟶ - -
- - -**25. Dimensions of a filter ― A filter of size F×F applied to an input containing C channels is a F×F×C volume that performs convolutions on an input of size I×I×C and produces an output feature map (also called activation map) of size O×O×1.** - -⟶ - -
- - -**26. Filter** - -⟶ - -
- - -**27. Remark: the application of K filters of size F×F results in an output feature map of size O×O×K.** - -⟶ - -
- - -**28. Stride ― For a convolutional or a pooling operation, the stride S denotes the number of pixels by which the window moves after each operation.** - -⟶ - -
- - -**29. Zero-padding ― Zero-padding denotes the process of adding P zeroes to each side of the boundaries of the input. This value can either be manually specified or automatically set through one of the three modes detailed below:** - -⟶ - -
- - -**30. [Mode, Value, Illustration, Purpose, Valid, Same, Full]** - -⟶ - -
- - -**31. [No padding, Drops last convolution if dimensions do not match, Padding such that feature map size has size ⌈IS⌉, Output size is mathematically convenient, Also called 'half' padding, Maximum padding such that end convolutions are applied on the limits of the input, Filter 'sees' the input end-to-end]** - -⟶ - -
- - -**32. Tuning hyperparameters** - -⟶ - -
- - -**33. Parameter compatibility in convolution layer ― By noting I the length of the input volume size, F the length of the filter, P the amount of zero padding, S the stride, then the output size O of the feature map along that dimension is given by:** - -⟶ - -
- - -**34. [Input, Filter, Output]** - -⟶ - -
- - -**35. Remark: often times, Pstart=Pend≜P, in which case we can replace Pstart+Pend by 2P in the formula above.** - -⟶ - -
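The parameter-compatibility rule referenced in items 33-35 is the usual one-dimensional output-size formula O = (I - F + Pstart + Pend)/S + 1. A tiny helper, with illustrative values, makes it easy to check a layer configuration:

```python
def conv_output_size(i, f, p_start, p_end, s):
    """O = (I - F + P_start + P_end) / S + 1 along one spatial dimension."""
    return (i - f + p_start + p_end) // s + 1

# With P_start = P_end = P this reduces to O = (I - F + 2P) / S + 1.
print(conv_output_size(i=32, f=5, p_start=2, p_end=2, s=1))   # 32 ('same'-style padding)
print(conv_output_size(i=32, f=3, p_start=0, p_end=0, s=2))   # 15 ('valid' padding, stride 2)
```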
- - -**36. Understanding the complexity of the model ― In order to assess the complexity of a model, it is often useful to determine the number of parameters that its architecture will have. In a given layer of a convolutional neural network, it is done as follows:** - -⟶ - -
- - -**37. [Illustration, Input size, Output size, Number of parameters, Remarks]** - -⟶ - -
- - -**38. [One bias parameter per filter, In most cases, S<F, A common choice for K is 2C]** - -⟶ - -
- - -**39. [Pooling operation done channel-wise, In most cases, S=F]** - -⟶ - -
- - -**40. [Input is flattened, One bias parameter per neuron, The number of FC neurons is free of structural constraints]** - -⟶ - -
- - -**41. Receptive field ― The receptive field at layer k is the area denoted Rk×Rk of the input that each pixel of the k-th activation map can 'see'. By calling Fj the filter size of layer j and Si the stride value of layer i and with the convention S0=1, the receptive field at layer k can be computed with the formula:** - -⟶ - -
- - -**42. In the example below, we have F1=F2=3 and S1=S2=1, which gives R2=1+2⋅1+2⋅1=5.** - -⟶ - -
- - -**43. Commonly used activation functions** - -⟶ - -
- - -**44. Rectified Linear Unit ― The rectified linear unit layer (ReLU) is an activation function g that is used on all elements of the volume. It aims at introducing non-linearities to the network. Its variants are summarized in the table below:** - -⟶ - -
- - -**45. [ReLU, Leaky ReLU, ELU, with]** - -⟶ - -
- - -**46. [Non-linearity complexities biologically interpretable, Addresses dying ReLU issue for negative values, Differentiable everywhere]** - -⟶ - -
- - -**47. Softmax ― The softmax step can be seen as a generalized logistic function that takes as input a vector of scores x∈Rn and outputs a vector of output probability p∈Rn through a softmax function at the end of the architecture. It is defined as follows:** - -⟶ - -
- - -**48. where** - -⟶ - -
- - -**49. Object detection** - -⟶ - -
- - -**50. Types of models ― There are 3 main types of object recognition algorithms, for which the nature of what is predicted is different. They are described in the table below:** - -⟶ - -
- - -**51. [Image classification, Classification w. localization, Detection]** - -⟶ - -
- - -**52. [Teddy bear, Book]** - -⟶ - -
- - -**53. [Classifies a picture, Predicts probability of object, Detects an object in a picture, Predicts probability of object and where it is located, Detects up to several objects in a picture, Predicts probabilities of objects and where they are located]** - -⟶ - -
- - -**54. [Traditional CNN, Simplified YOLO, R-CNN, YOLO, R-CNN]** - -⟶ - -
- - -**55. Detection ― In the context of object detection, different methods are used depending on whether we just want to locate the object or detect a more complex shape in the image. The two main ones are summed up in the table below:** - -⟶ - -
- - -**56. [Bounding box detection, Landmark detection]** - -⟶ - -
- - -**57. [Detects the part of the image where the object is located, Detects a shape or characteristics of an object (e.g. eyes), More granular]** - -⟶ - -
- - -**58. [Box of center (bx,by), height bh and width bw, Reference points (l1x,l1y), ..., (lnx,lny)]** - -⟶ - -
- - -**59. Intersection over Union ― Intersection over Union, also known as IoU, is a function that quantifies how correctly positioned a predicted bounding box Bp is over the actual bounding box Ba. It is defined as:** - -⟶ - -
- - -**60. Remark: we always have IoU∈[0,1]. By convention, a predicted bounding box Bp is considered as being reasonably good if IoU(Bp,Ba)⩾0.5.** - -⟶ - -
- - -**61. Anchor boxes ― Anchor boxing is a technique used to predict overlapping bounding boxes. In practice, the network is allowed to predict more than one box simultaneously, where each box prediction is constrained to have a given set of geometrical properties. For instance, the first prediction can potentially be a rectangular box of a given form, while the second will be another rectangular box of a different geometrical form.** - -⟶ - -
- - -**62. Non-max suppression ― The non-max suppression technique aims at removing duplicate overlapping bounding boxes of a same object by selecting the most representative ones. After having removed all boxes having a probability prediction lower than 0.6, the following steps are repeated while there are boxes remaining:** - -⟶ - -
- - -**63. [For a given class, Step 1: Pick the box with the largest prediction probability., Step 2: Discard any box having an IoU⩾0.5 with the previous box.]** - -⟶ - -
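Items 59 and 62-63 above (IoU and non-max suppression) translate almost line for line into code. A plain-Python sketch with illustrative box coordinates, using the 0.6 probability and 0.5 IoU thresholds quoted in the text:

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

def non_max_suppression(boxes, scores, score_thresh=0.6, iou_thresh=0.5):
    """Keep the most representative boxes of a single class."""
    remaining = [(s, b) for s, b in zip(scores, boxes) if s >= score_thresh]
    remaining.sort(key=lambda sb: sb[0], reverse=True)
    kept = []
    while remaining:
        _, best = remaining.pop(0)                       # Step 1: largest prediction probability
        kept.append(best)
        remaining = [(s, b) for s, b in remaining
                     if iou(b, best) < iou_thresh]        # Step 2: discard boxes with IoU >= 0.5
    return kept

boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
print(non_max_suppression(boxes, scores))                 # the two overlapping boxes collapse to one
```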
- - -**64. [Box predictions, Box selection of maximum probability, Overlap removal of same class, Final bounding boxes]** - -⟶ - -
- - -**65. YOLO ― You Only Look Once (YOLO) is an object detection algorithm that performs the following steps:** - -⟶ - -
- - -**66. [Step 1: Divide the input image into a G×G grid., Step 2: For each grid cell, run a CNN that predicts y of the following form:, repeated k times]** - -⟶ - -
- - -**67. where pc is the probability of detecting an object, bx,by,bh,bw are the properties of the detected bouding box, c1,...,cp is a one-hot representation of which of the p classes were detected, and k is the number of anchor boxes.** - -⟶ - -
- - -**68. Step 3: Run the non-max suppression algorithm to remove any potential duplicate overlapping bounding boxes.** - -⟶ - -
- - -**69. [Original image, Division in GxG grid, Bounding box prediction, Non-max suppression]** - -⟶ - -
- - -**70. Remark: when pc=0, then the network does not detect any object. In that case, the corresponding predictions bx,...,cp have to be ignored.** - -⟶ - -
- - -**71. R-CNN ― Region with Convolutional Neural Networks (R-CNN) is an object detection algorithm that first segments the image to find potential relevant bounding boxes and then run the detection algorithm to find most probable objects in those bounding boxes.** - -⟶ - -
- - -**72. [Original image, Segmentation, Bounding box prediction, Non-max suppression]** - -⟶ - -
- - -**73. Remark: although the original algorithm is computationally expensive and slow, newer architectures enabled the algorithm to run faster, such as Fast R-CNN and Faster R-CNN.** - -⟶ - -
- - -**74. Face verification and recognition** - -⟶ - -
- - -**75. Types of models ― Two main types of model are summed up in table below:** - -⟶ - -
- - -**76. [Face verification, Face recognition, Query, Reference, Database]** - -⟶ - -
- - -**77. [Is this the correct person?, One-to-one lookup, Is this one of the K persons in the database?, One-to-many lookup]** - -⟶ - -
- - -**78. One Shot Learning ― One Shot Learning is a face verification algorithm that uses a limited training set to learn a similarity function that quantifies how different two given images are. The similarity function applied to two images is often noted d(image 1,image 2).** - -⟶ - -
- - -**79. Siamese Network ― Siamese Networks aim at learning how to encode images to then quantify how different two images are. For a given input image x(i), the encoded output is often noted as f(x(i)).** - -⟶ - -
- - -**80. Triplet loss ― The triplet loss ℓ is a loss function computed on the embedding representation of a triplet of images A (anchor), P (positive) and N (negative). The anchor and the positive example belong to a same class, while the negative example to another one. By calling α∈R+ the margin parameter, this loss is defined as follows:** - -⟶ - -
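Item 80 describes the triplet loss on embeddings of an anchor A, a positive P and a negative N with margin α. In its usual hinge form, a sketch with made-up embedding vectors (not the cheatsheet's own code):

```python
import numpy as np

def triplet_loss(anchor, positive, negative, alpha=0.2):
    """max(d(A, P) - d(A, N) + alpha, 0) with squared Euclidean distance d."""
    d_pos = np.sum((anchor - positive) ** 2)
    d_neg = np.sum((anchor - negative) ** 2)
    return max(d_pos - d_neg + alpha, 0.0)

a = np.array([0.10, 0.90])
p = np.array([0.12, 0.88])    # same identity, close to the anchor
n = np.array([0.90, 0.10])    # different identity, far from the anchor
print(triplet_loss(a, p, n))  # 0.0: the margin constraint is already satisfied
```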
- - -**81. Neural style transfer** - -⟶ - -
- - -**82. Motivation ― The goal of neural style transfer is to generate an image G based on a given content C and a given style S.** - -⟶ - -
- - -**83. [Content C, Style S, Generated image G]** - -⟶ - -
- - -**84. Activation ― In a given layer l, the activation is noted a[l] and is of dimensions nH×nw×nc** - -⟶ - -
- - -**85. Content cost function ― The content cost function Jcontent(C,G) is used to determine how the generated image G differs from the original content image C. It is defined as follows:** - -⟶ - -
- - -**86. Style matrix ― The style matrix G[l] of a given layer l is a Gram matrix where each of its elements G[l]kk′ quantifies how correlated the channels k and k′ are. It is defined with respect to activations a[l] as follows:** - -⟶ - -
- - -**87. Remark: the style matrix for the style image and the generated image are noted G[l] (S) and G[l] (G) respectively.** - -⟶ - -
- - -**88. Style cost function ― The style cost function Jstyle(S,G) is used to determine how the generated image G differs from the style S. It is defined as follows:** - -⟶ - -
- - -**89. Overall cost function ― The overall cost function is defined as being a combination of the content and style cost functions, weighted by parameters α,β, as follows:** - -⟶ - -
- - -**90. Remark: a higher value of α will make the model care more about the content while a higher value of β will make it care more about the style.** - -⟶ - -
- - -**91. Architectures using computational tricks** - -⟶ - -
- - -**92. Generative Adversarial Network ― Generative adversarial networks, also known as GANs, are composed of a generative and a discriminative model, where the generative model aims at generating the most truthful output that will be fed into the discriminative which aims at differentiating the generated and true image.** - -⟶ - -
- - -**93. [Training, Noise, Real-world image, Generator, Discriminator, Real Fake]** - -⟶ - -
- - -**94. Remark: use cases using variants of GANs include text to image, music generation and synthesis.** - -⟶ - -
- - -**95. ResNet ― The Residual Network architecture (also called ResNet) uses residual blocks with a high number of layers meant to decrease the training error. The residual block has the following characterizing equation:** - -⟶ - -
- - -**96. Inception Network ― This architecture uses inception modules and aims at giving a try at different convolutions in order to increase its performance through features diversification. In particular, it uses the 1×1 convolution trick to limit the computational burden.** - -⟶ - -
- - -**97. The Deep Learning cheatsheets are now available in [target language].** - -⟶ - -
- - -**98. Original authors** - -⟶ - -
- - -**99. Translated by X, Y and Z** - -⟶ - -
- - -**100. Reviewed by X, Y and Z** - -⟶ - -
- - -**101. View PDF version on GitHub** - -⟶ - -
- - -**102. By X and Y** - -⟶ - -
From b587f11dc4775d7ad39a0f066076dff4a21b3677 Mon Sep 17 00:00:00 2001 From: AlisterTA <19950298+AlisterTA@users.noreply.github.com> Date: Sun, 10 Feb 2019 16:13:46 +0330 Subject: [PATCH 130/531] Delete recurrent-neural-networks.md --- fa/recurrent-neural-networks.md | 677 -------------------------------- 1 file changed, 677 deletions(-) delete mode 100644 fa/recurrent-neural-networks.md diff --git a/fa/recurrent-neural-networks.md b/fa/recurrent-neural-networks.md deleted file mode 100644 index 191e400a1..000000000 --- a/fa/recurrent-neural-networks.md +++ /dev/null @@ -1,677 +0,0 @@ -**Recurrent Neural Networks translation** - -
- -**1. Recurrent Neural Networks cheatsheet** - -⟶ - -
- - -**2. CS 230 - Deep Learning** - -⟶ - -
- - -**3. [Overview, Architecture structure, Applications of RNNs, Loss function, Backpropagation]** - -⟶ - -
- - -**4. [Handling long term dependencies, Common activation functions, Vanishing/exploding gradient, Gradient clipping, GRU/LSTM, Types of gates, Bidirectional RNN, Deep RNN]** - -⟶ - -
- - -**5. [Learning word representation, Notations, Embedding matrix, Word2vec, Skip-gram, Negative sampling, GloVe]** - -⟶ - -
- - -**6. [Comparing words, Cosine similarity, t-SNE]** - -⟶ - -
- - -**7. [Language model, n-gram, Perplexity]** - -⟶ - -
- - -**8. [Machine translation, Beam search, Length normalization, Error analysis, Bleu score]** - -⟶ - -
- - -**9. [Attention, Attention model, Attention weights]** - -⟶ - -
- - -**10. Overview** - -⟶ - -
- - -**11. Architecture of a traditional RNN ― Recurrent neural networks, also known as RNNs, are a class of neural networks that allow previous outputs to be used as inputs while having hidden states. They are typically as follows:** - -⟶ - -
- - -**12. For each timestep t, the activation a and the output y are expressed as follows:** - -⟶ - -
- - -**13. and** - -⟶ - -
- - -**14. where Wax,Waa,Wya,ba,by are coefficients that are shared temporally and g1,g2 activation functions.** - -⟶ - -
- - -**15. The pros and cons of a typical RNN architecture are summed up in the table below:** - -⟶ - -
- - -**16. [Advantages, Possibility of processing input of any length, Model size not increasing with size of input, Computation takes into account historical information, Weights are shared across time]** - -⟶ - -
- - -**17. [Drawbacks, Computation being slow, Difficulty of accessing information from a long time ago, Cannot consider any future input for the current state]** - -⟶ - -
- - -**18. Applications of RNNs ― RNN models are mostly used in the fields of natural language processing and speech recognition. The different applications are summed up in the table below:** - -⟶ - -
- - -**19. [Type of RNN, Illustration, Example]** - -⟶ - -
- - -**20. [One-to-one, One-to-many, Many-to-one, Many-to-many]** - -⟶ - -
- - -**21. [Traditional neural network, Music generation, Sentiment classification, Name entity recognition, Machine translation]** - -⟶ - -
- - -**22. Loss function ― In the case of a recurrent neural network, the loss function L of all time steps is defined based on the loss at every time step as follows:** - -⟶ - -
- - -**23. Backpropagation through time ― Backpropagation is done at each point in time. At timestep T, the derivative of the loss L with respect to weight matrix W is expressed as follows:** - -⟶ - -
- - -**24. Handling long term dependencies** - -⟶ - -
- - -**25. Commonly used activation functions ― The most common activation functions used in RNN modules are described below:** - -⟶ - -
- - -**26. [Sigmoid, Tanh, RELU]** - -⟶ - -
- - -**27. Vanishing/exploding gradient ― The vanishing and exploding gradient phenomena are often encountered in the context of RNNs. The reason why they happen is that it is difficult to capture long term dependencies because of multiplicative gradient that can be exponentially decreasing/increasing with respect to the number of layers.** - -⟶ - -
- - -**28. Gradient clipping ― It is a technique used to cope with the exploding gradient problem sometimes encountered when performing backpropagation. By capping the maximum value for the gradient, this phenomenon is controlled in practice.** - -⟶ - -
- - -**29. clipped** - -⟶ - -
- - -**30. Types of gates ― In order to remedy the vanishing gradient problem, specific gates are used in some types of RNNs and usually have a well-defined purpose. They are usually noted Γ and are equal to:** - -⟶ - -
- - -**31. where W,U,b are coefficients specific to the gate and σ is the sigmoid function. The main ones are summed up in the table below:** - -⟶ - -
- - -**32. [Type of gate, Role, Used in]** - -⟶ - -
- - -**33. [Update gate, Relevance gate, Forget gate, Output gate]** - -⟶ - -
- - -**34. [How much past should matter now?, Drop previous information?, Erase a cell or not?, How much to reveal of a cell?]** - -⟶ - -
- - -**35. [LSTM, GRU]** - -⟶ - -
- - -**36. GRU/LSTM ― Gated Recurrent Unit (GRU) and Long Short-Term Memory units (LSTM) deal with the vanishing gradient problem encountered by traditional RNNs, with LSTM being a generalization of GRU. Below is a table summing up the characterizing equations of each architecture:** - -⟶ - -
- - -**37. [Characterization, Gated Recurrent Unit (GRU), Long Short-Term Memory (LSTM), Dependencies]** - -⟶ - -
- - -**38. Remark: the sign ⋆ denotes the element-wise multiplication between two vectors.** - -⟶ - -
- - -**39. Variants of RNNs ― The table below sums up the other commonly used RNN architectures:** - -⟶ - -
- - -**40. [Bidirectional (BRNN), Deep (DRNN)]** - -⟶ - -
- - -**41. Learning word representation** - -⟶ - -
- - -**42. In this section, we note V the vocabulary and |V| its size.** - -⟶ - -
- - -**43. Motivation and notations** - -⟶ - -
- - -**44. Representation techniques ― The two main ways of representing words are summed up in the table below:** - -⟶ - -
- - -**45. [1-hot representation, Word embedding]** - -⟶ - -
- - -**46. [teddy bear, book, soft]** - -⟶ - -
- - -**47. [Noted ow, Naive approach, no similarity information, Noted ew, Takes into account words similarity]** - -⟶ - -
- - -**48. Embedding matrix ― For a given word w, the embedding matrix E is a matrix that maps its 1-hot representation ow to its embedding ew as follows:** - -⟶ - -
- - -**49. Remark: learning the embedding matrix can be done using target/context likelihood models.** - -⟶ - -
- - -**50. Word embeddings** - -⟶ - -
- - -**51. Word2vec ― Word2vec is a framework aimed at learning word embeddings by estimating the likelihood that a given word is surrounded by other words. Popular models include skip-gram, negative sampling and CBOW.** - -⟶ - -
- - -**52. [A cute teddy bear is reading, teddy bear, soft, Persian poetry, art]** - -⟶ - -
- - -**53. [Train network on proxy task, Extract high-level representation, Compute word embeddings]** - -⟶ - -
- - -**54. Skip-gram ― The skip-gram word2vec model is a supervised learning task that learns word embeddings by assessing the likelihood of any given target word t happening with a context word c. By noting θt a parameter associated with t, the probability P(t|c) is given by:** - -⟶ - -
- - -**55. Remark: summing over the whole vocabulary in the denominator of the softmax part makes this model computationally expensive. CBOW is another word2vec model using the surrounding words to predict a given word.** - -⟶ - -
- - -**56. Negative sampling ― It is a set of binary classifiers using logistic regressions that aim at assessing how a given context and a given target words are likely to appear simultaneously, with the models being trained on sets of k negative examples and 1 positive example. Given a context word c and a target word t, the prediction is expressed by:** - -⟶ - -
- - -**57. Remark: this method is less computationally expensive than the skip-gram model.** - -⟶ - -
- - -**57bis. GloVe ― The GloVe model, short for global vectors for word representation, is a word embedding technique that uses a co-occurence matrix X where each Xi,j denotes the number of times that a target i occurred with a context j. Its cost function J is as follows:** - -⟶ - -
- - -**58. where f is a weighting function such that Xi,j=0⟹f(Xi,j)=0. -Given the symmetry that e and θ play in this model, the final word embedding e(final)w is given by:** - -⟶ - -
- - -**59. Remark: the individual components of the learned word embeddings are not necessarily interpretable.** - -⟶ - -
- - -**60. Comparing words** - -⟶ - -
- - -**61. Cosine similarity ― The cosine similarity between words w1 and w2 is expressed as follows:** - -⟶ - -
- - -**62. Remark: θ is the angle between words w1 and w2.** - -⟶ - -
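Item 61 (cosine similarity between word vectors) is essentially one line of NumPy; the word vectors below are invented for illustration only:

```python
import numpy as np

def cosine_similarity(w1, w2):
    """cos(theta) between two word embedding vectors."""
    return np.dot(w1, w2) / (np.linalg.norm(w1) * np.linalg.norm(w2))

king = np.array([0.80, 0.60, 0.10])
queen = np.array([0.75, 0.65, 0.12])
apple = np.array([0.10, 0.20, 0.90])
print(cosine_similarity(king, queen))   # close to 1: the words are used similarly
print(cosine_similarity(king, apple))   # much smaller: the words are unrelated
```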
- - -**63. t-SNE ― t-SNE (t-distributed Stochastic Neighbor Embedding) is a technique aimed at reducing high-dimensional embeddings into a lower dimensional space. In practice, it is commonly used to visualize word vectors in the 2D space.** - -⟶ - -
- - -**64. [literature, art, book, culture, poem, reading, knowledge, entertaining, loveable, childhood, kind, teddy bear, soft, hug, cute, adorable]** - -⟶ - -
- - -**65. Language model** - -⟶ - -
- - -**66. Overview ― A language model aims at estimating the probability of a sentence P(y).** - -⟶ - -
- - -**67. n-gram model ― This model is a naive approach aiming at quantifying the probability that an expression appears in a corpus by counting its number of appearance in the training data.** - -⟶ - -
- - -**68. Perplexity ― Language models are commonly assessed using the perplexity metric, also known as PP, which can be interpreted as the inverse probability of the dataset normalized by the number of words T. The perplexity is such that the lower, the better and is defined as follows:** - -⟶ - -
- - -**69. Remark: PP is commonly used in t-SNE.** - -⟶ - -
- - -**70. Machine translation** - -⟶ - -
- - -**71. Overview ― A machine translation model is similar to a language model except it has an encoder network placed before. For this reason, it is sometimes referred as a conditional language model. The goal is to find a sentence y such that:** - -⟶ - -
- - -**72. Beam search ― It is a heuristic search algorithm used in machine translation and speech recognition to find the likeliest sentence y given an input x.** - -⟶ - -
- - -**73. [Step 1: Find top B likely words y<1>, Step 2: Compute conditional probabilities y<k>|x,y<1>,...,y<k−1>, Step 3: Keep top B combinations x,y<1>,...,y<k>, End process at a stop word]** - -⟶ - -
- - -**74. Remark: if the beam width is set to 1, then this is equivalent to a naive greedy search.** - -⟶ - -
- - -**75. Beam width ― The beam width B is a parameter for beam search. Large values of B yield to better result but with slower performance and increased memory. Small values of B lead to worse results but is less computationally intensive. A standard value for B is around 10.** - -⟶ - -
- - -**76. Length normalization ― In order to improve numerical stability, beam search is usually applied on the following normalized objective, often called the normalized log-likelihood objective, defined as:** - -⟶ - -
- - -**77. Remark: the parameter α can be seen as a softener, and its value is usually between 0.5 and 1.** - -⟶ - -
- - -**78. Error analysis ― When obtaining a predicted translation ˆy that is bad, one can wonder why we did not get a good translation y∗ by performing the following error analysis:** - -⟶ - -
- - -**79. [Case, Root cause, Remedies]** - -⟶ - -
- - -**80. [Beam search faulty, RNN faulty, Increase beam width, Try different architecture, Regularize, Get more data]** - -⟶ - -
- - -**81. Bleu score ― The bilingual evaluation understudy (bleu) score quantifies how good a machine translation is by computing a similarity score based on n-gram precision. It is defined as follows:** - -⟶ - -
- - -**82. where pn is the bleu score on n-gram only defined as follows:** - -⟶ - -
- - -**83. Remark: a brevity penalty may be applied to short predicted translations to prevent an artificially inflated bleu score.** - -⟶ - -
- - -**84. Attention** - -⟶ - -
- - -**85. Attention model ― This model allows an RNN to pay attention to specific parts of the input that is considered as being important, which improves the performance of the resulting model in practice. By noting α the amount of attention that the output y should pay to the activation a and c the context at time t, we have:** - -⟶ - -
- - -**86. with** - -⟶ - -
- - -**87. Remark: the attention scores are commonly used in image captioning and machine translation.** - -⟶ - -
- - -**88. A cute teddy bear is reading Persian literature.** - -⟶ - -
- - -**89. Attention weight ― The amount of attention that the output y should pay to the activation a is given by α computed as follows:** - -⟶ - -
- - -**90. Remark: computation complexity is quadratic with respect to Tx.** - -⟶ - -
- - -**91. The Deep Learning cheatsheets are now available in [target language].** - -⟶ - -
- -**92. Original authors** - -⟶ - -
- -**93. Translated by X, Y and Z** - -⟶ - -
- -**94. Reviewed by X, Y and Z** - -⟶ - -
- -**95. View PDF version on GitHub** - -⟶ - -
- -**96. By X and Y** - -⟶ - -
From 366c9addcf03c071ec22c99da9d9af05ca832508 Mon Sep 17 00:00:00 2001 From: AlisterTA <19950298+AlisterTA@users.noreply.github.com> Date: Sun, 10 Feb 2019 16:13:59 +0330 Subject: [PATCH 131/531] Delete CONTRIBUTORS --- CONTRIBUTORS | 144 --------------------------------------------------- 1 file changed, 144 deletions(-) delete mode 100644 CONTRIBUTORS diff --git a/CONTRIBUTORS b/CONTRIBUTORS deleted file mode 100644 index 6a7498fe5..000000000 --- a/CONTRIBUTORS +++ /dev/null @@ -1,144 +0,0 @@ ---ar - Amjad Khatabi (translation of deep learning) - Zaid Alyafeai (review of deep learning) - ---de - ---es - Erick Gabriel Mendoza Flores (translation of deep learning) - Fernando Diaz (review of deep learning) - Fernando González-Herrera (review of deep learning) - Mariano Ramirez (review of deep learning) - Juan P. Chavat (review of deep learning) - Alonso Melgar López (review of deep learning) - Gustavo Velasco-Hernández (review of deep learning) - Juan Manuel Nava Zamudio (review of deep learning) - - Fernando González-Herrera (translation of linear algebra) - Fernando Diaz (review of linear algebra) - Gustavo Velasco-Hernández (review of linear algebra) - Juan P. Chavat (review of linear algebra) - - David Jiménez Paredes (translation of machine learning tips and tricks) - Fernando Diaz (translation of machine learning tips and tricks) - Gustavo Velasco-Hernández (review of machine learning tips and tricks) - Alonso Melgar-Lopez (review of machine learning tips and tricks) - - Fermin Ordaz (translation of probabilities and statistics) - Fernando González-Herrera (review of probabilities and statistics) - Alonso Melgar López (review of probabilities and statistics) - - Juan P. Chavat (translation of supervised learning) - Fernando Gonzalez-Herrera (review of supervised learning) - Fernando Diaz (review of supervised learning) - Alonso Melgar-Lopez (review of supervised learning) - - Jaime Noel Alvarez Luna (translation of unsupervised learning) - Alonso Melgar López (review of unsupervised learning) - Fernando Diaz (review of unsupervised learning) - ---fa - AlisterTA (translation of deep learning) - Mohammad Karimi (review of deep learning) - Erfan Noury (review of deep learning) - - Erfan Noury (translation of linear algebra) - Mohammad Karimi (review of linear algebra) - - AlisterTA (translation of machine learning tips and tricks) - Mohammad Reza (translation of machine learning tips and tricks) - Erfan Noury (review of machine learning tips and tricks) - Mohammad Karimi (review of machine learning tips and tricks) - - Erfan Noury (translation of probabilities and statistics) - Mohammad Karimi (review of probabilities and statistics) - - Amirhosein Kazemnejad (translation of supervised learning) - Erfan Noury (review of supervised learning) - Mohammad Karimi (review of supervised learning) - - Erfan Noury (translation of unsupervised learning) - Mohammad Karimi (review of unsupervised learning) - ---fr - Original authors - ---he - ---hi - ---ja - ---pt - Gabriel Fonseca (translation of deep learning) - Leticia Portella (review of deep learning) - - Gabriel Fonseca (translation of linear algebra) - Leticia Portella (review of linear algebra) - - Fernando Santos (translation of machine learning tips and tricks) - Leticia Portella (review of machine learning tips and tricks) - Gabriel Fonseca (review of machine learning tips and tricks) - - Leticia Portella (translation of probabilities and statistics) - Flavio Clesio (review of probabilities and statistics) - - Leticia Portella 
(translation of supervised learning) - Gabriel Fonseca (review of supervised learning) - Flavio Clesio (review of supervised learning) - - Gabriel Fonseca (translation of unsupervised learning) - Tiago Danin (review of unsupervised learning) - ---tr - Ayyüce Kızrak (translation of convolutional neural networks) - Yavuz Kömeçoğlu (review of convolutional neural networks) - - Ekrem Çetinkaya (translation of deep learning) - Omer Bukte (review of deep learning) - - Ayyüce Kızrak (translation of deep learning tips and tricks) - Yavuz Kömeçoğlu (review of deep learning tips and tricks) - - Kadir Tekeli (translation of linear algebra) - Ekrem Çetinkaya (review of linear algebra) - - Seray Beşer (translation of machine learning tips and tricks) - Ayyüce Kızrak (review of machine learning tips and tricks) - Yavuz Kömeçoğlu (review of machine learning tips and tricks) - - Ayyüce Kızrak (translation of probabilities and statistics) - Başak Buluz (review of probabilities and statistics) - - Başak Buluz (translation of recurrent neural networks) - Yavuz Kömeçoğlu (review of recurrent neural networks) - - Başak Buluz (translation of supervised learning) - Ayyüce Kızrak (review of supervised learning) - - Yavuz Kömeçoğlu (translation of unsupervised learning) - Başak Buluz (review of unsupervised learning) - ---uk - Gregory Reshetniak (translation of probabilities and statistics) - Denys (review of probabilities and statistics) - ---zh - Wang Hongnian (translation of supervised learning) - Xiaohu Zhu (朱小虎) (review of supervised learning) - Chaoying Xue (review of supervised learning) - ---zh-tw - kevingo (translation of deep learning) - TobyOoO (review of deep learning) - - kevingo (translation of supervised learning) - accelsao (review of supervised learning) - - kevingo (translation of unsupervised learning) - imironhead (review of unsupervised learning) - johnnychhsu (review of unsupervised learning) - - kevingo (translation of probabilities and statistics) - johnnychhsu (review of probabilities and statistics) - From 6f292806b6f16a286cec3674e3baeaad7a2292cf Mon Sep 17 00:00:00 2001 From: AlisterTA <19950298+AlisterTA@users.noreply.github.com> Date: Mon, 11 Feb 2019 06:50:27 +0330 Subject: [PATCH 132/531] You're all set! --- fa/deep-learning-tips-and-tricks.md | 18 +++++++++--------- 1 file changed, 9 insertions(+), 9 deletions(-) diff --git a/fa/deep-learning-tips-and-tricks.md b/fa/deep-learning-tips-and-tricks.md index c29d6a4cf..1248a06bf 100644 --- a/fa/deep-learning-tips-and-tricks.md +++ b/fa/deep-learning-tips-and-tricks.md @@ -33,7 +33,7 @@ **4. [Data processing, Data augmentation, Batch normalization]**
-[پردازش‌داده، داده‌افزایی، نرمال‌سازی دسته‌ای] +[پردازش داده، داده‌افزایی، نرمال‌سازی دسته‌ای]

@@ -96,7 +96,7 @@ **11. Data augmentation ― Deep learning models usually need a lot of data to be properly trained. It is often useful to get more data from the existing ones using data augmentation techniques. The main ones are summed up in the table below. More precisely, given the following input image, here are the techniques that we can apply:**
-داده‌افزایی ― مدل‌های یادگیری عمیق معمولا به داده‌های زیادی نیاز دارند تا بتوانند به خوبی آموزش ببینند. اغلب، استفاده از روش‌های داده‌افزایی برای گرفتن داده بیشتر از داده‌های موجود، مفید است. اصلی‌ترین آنها در جدول زیر به اختصار آمده‌اند. به عبارت دقیق‌تر، با در نظر گرفتن تصویر ورودی زیر، روش‌هایی که می‌توان اعمال کرد بدین شرح هستند: +داده‌افزایی ― مدل‌های یادگیری عمیق معمولا به داده‌های زیادی نیاز دارند تا بتوانند به خوبی آموزش ببینند. اغلب، استفاده از روش‌های داده‌افزایی برای گرفتن داده‌ی بیشتر از داده‌های موجود، مفید است. اصلی‌ترین آنها در جدول زیر به اختصار آمده‌اند. به عبارت دقیق‌تر، با در نظر گرفتن تصویر ورودی زیر، روش‌هایی که می‌توان اعمال کرد بدین شرح هستند:

@@ -130,7 +130,7 @@ **15. [Nuances of RGB is slightly changed, Captures noise that can occur with light exposure, Addition of noise, More tolerance to quality variation of inputs, Parts of image ignored, Mimics potential loss of parts of image, Luminosity changes, Controls difference in exposition due to time of day]**
-[عناصر RGB کمی تغییر کرده است، نویزی که در هنگام مواجه شدن با نور رخ می‌دهد را شبیه‌سازی می‌کند، افزودگی نویز، مقاومت بیشتر نسبت به تغییر کیفیت تصاویر ورودی، بخش‌هایی از تصویر نادیده گرفته می‌شوند، تقلید (شبیه سازی) هدررفت بالقوه بخش هایی از تصویر، تغییر درخشندگی، با توجه به زمان روز تفاوت نمایش (تصویر) را کنترل می‌کند] +[عناصر RGB کمی تغییر کرده است، نویزی که در هنگام مواجه شدن با نور رخ می‌دهد را شبیه‌سازی می‌کند، افزودگی نویز، مقاومت بیشتر نسبت به تغییر کیفیت تصاویر ورودی، بخش‌هایی از تصویر نادیده گرفته می‌شوند، تقلید (شبیه سازی) هدررفت بالقوه بخش‌هایی از تصویر، تغییر درخشندگی، با توجه به زمان روز تفاوت نمایش (تصویر) را کنترل می‌کند]

@@ -193,7 +193,7 @@ **22. Mini-batch gradient descent ― During the training phase, updating weights is usually not based on the whole training set at once due to computation complexities or one data point due to noise issues. Instead, the update step is done on mini-batches, where the number of data points in a batch is a hyperparameter that we can tune.**
-گرادیان نزولی دسته‌ی‌کوچک ― در فاز آموزش، به‌روزرسانی وزن‌ها معمولا بر مبنای تمامی مجموعه آموزش به علت پیچیدگی‌های محاسباتی، یا یک نمونه داده به علت مشکل نویز، نیست. در عوض، گام به‌روزرسانی بر روی دسته‌های کوچک انجام می شود، که تعداد نمونه‌های داده در یک دسته ابرفراسنج است که میتوان آن را تنظیم کرد. +گرادیان نزولی دسته‌ی‌کوچک ― در فاز آموزش، به‌روزرسانی وزن‌ها معمولا بر مبنای تمامی مجموعه آموزش به علت پیچیدگی‌های محاسباتی، یا یک نمونه داده به علت مشکل نویز، نیست. در عوض، گام به‌روزرسانی بر روی دسته‌های کوچک انجام می شود، که تعداد نمونه‌های داده در یک دسته یک ابرفراسنج است که میتوان آن را تنظیم کرد.

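Item 22 (mini-batch gradient descent) is easiest to see on a toy problem. A NumPy sketch of the update loop, where batch_size is the hyperparameter the text refers to and the data are synthetic:

```python
import numpy as np

def minibatches(X, y, batch_size, rng):
    """Yield shuffled mini-batches of the training set."""
    order = rng.permutation(len(X))
    for start in range(0, len(X), batch_size):
        idx = order[start:start + batch_size]
        yield X[idx], y[idx]

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.01 * rng.normal(size=1000)

w, lr = np.zeros(3), 0.1
for epoch in range(20):
    for xb, yb in minibatches(X, y, batch_size=32, rng=rng):
        grad = 2 * xb.T @ (xb @ w - yb) / len(xb)   # gradient estimated on the mini-batch only
        w -= lr * grad

print(w)   # close to the true coefficients [1.0, -2.0, 0.5]
```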
@@ -283,7 +283,7 @@ **32. Weights initialization**
-مقداردهی اولیه وزن‌ها +مقداردهی اولیه‌ی وزن‌ها

@@ -301,7 +301,7 @@ **34. Transfer learning ― Training a deep learning model requires a lot of data and more importantly a lot of time. It is often useful to take advantage of pre-trained weights on huge datasets that took days/weeks to train, and leverage it towards our use case. Depending on how much data we have at hand, here are the different ways to leverage this:**
-یادگیری انتقالی ― آموزش یک مدل یادگیری عمیق به داده‌های زیاد و مهم تر از آن به زمان زیادی احتیاج دارد. اغلب بهتر است که از وزن‌های پیش‌آموخته روی پایگاه داده‌های عظیم که آموزش بر روی آن‌ها روزها یا هفته‌ها طول می‌کشند استفاده کرد، و آن‌ها را برای موارد استفاده‌ی خود به کار برد. بسته به میزان داده‌هایی که در اختیار داریم، در زیر روش‌های مختلفی که می‌توان از آنها بهره جست آورده شده‌اند: +یادگیری انتقالی ― آموزش یک مدل یادگیری عمیق به داده‌های زیاد و مهم‌تر از آن به زمان زیادی احتیاج دارد. اغلب بهتر است که از وزن‌های پیش‌آموخته روی پایگاه داده‌های عظیم که آموزش بر روی آن‌ها روزها یا هفته‌ها طول می‌کشند استفاده کرد، و آن‌ها را برای موارد استفاده‌ی خود به کار برد. بسته به میزان داده‌هایی که در اختیار داریم، در زیر روش‌های مختلفی که می‌توان از آنها بهره جست آورده شده‌اند:

@@ -328,7 +328,7 @@ **37. [Freezes all layers, trains weights on softmax, Freezes most layers, trains weights on last layers and softmax, Trains weights on layers and softmax by initializing weights on pre-trained ones]**
-[منجمد کردن تمامی لایه‌ها، آموزش وزن‌ها در بیشینه‌ی هموار، منجمد کردن اکثر لایه‌ها، آموزش وزن‌ها در لایه‌های آخر و بیشینه‌ی هموار، آموزش وزن‌ها در (تمامی) لایه‌ها و بیشینه‌ی هموار با مقداردهی‌اولیه کردن وزن‌ها بر روی مقادیر پیش‌آموخته] +[منجمد کردن تمامی لایه‌ها، آموزش وزن‌ها در بیشینه‌ی هموار، منجمد کردن اکثر لایه‌ها، آموزش وزن‌ها در لایه‌های آخر و بیشینه‌ی هموار، آموزش وزن‌ها در (تمامی) لایه‌ها و بیشینه‌ی هموار با مقداردهی‌اولیه‌ی وزن‌ها بر طبق مقادیر پیش‌آموخته]

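Items 34 and 37 (transfer learning by freezing pre-trained layers) look roughly like this in practice. The sketch assumes PyTorch and torchvision (the exact `weights` argument can differ across torchvision versions) and a hypothetical 10-class task; it freezes the whole backbone and trains only a new softmax head, i.e. the small-data row of the table:

```python
import torch.nn as nn
from torchvision import models

backbone = models.resnet18(weights="IMAGENET1K_V1")      # weights pre-trained on a huge dataset
for param in backbone.parameters():
    param.requires_grad = False                           # "freezes all layers"

backbone.fc = nn.Linear(backbone.fc.in_features, 10)      # new head, trained from scratch

trainable = [p for p in backbone.parameters() if p.requires_grad]
print(sum(p.numel() for p in trainable))                  # only the new head's weights get updated
```

With more data, the same pattern applies but fewer layers are frozen, or none at all and the pre-trained weights only serve as the initialization.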
@@ -347,7 +347,7 @@ **
-نرخ یادگیری – نرخ یادگیری اغلب با نماد α و گاهی اوقات با نماد η نمایش داده می‌شود و بیانگر سرعت (گام) به‌روزرسانی وزن‌ها است که می‌تواند مقداری ثابت داشته باشد یا به سازگارشونده تغییر کند. محبوب‌ترین روش حال حاضر Adam نام دارد، روشی است که نرخ یادگیری را در حین فرآیند آموزش تنظیم می‌کند. +نرخ یادگیری – نرخ یادگیری اغلب با نماد α و گاهی اوقات با نماد η نمایش داده می‌شود و بیانگر سرعت (گام) به‌روزرسانی وزن‌ها است که می‌تواند مقداری ثابت داشته باشد یا به صورت سازگارشونده تغییر کند. محبوب‌ترین روش حال حاضر Adam نام دارد، روشی است که نرخ یادگیری را در حین فرآیند آموزش تنظیم می‌کند.

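The learning-rate item above mentions Adam as the most popular adaptive method. Its update rule fits in a few lines; a NumPy sketch on a toy quadratic, with the usual default beta values:

```python
import numpy as np

def adam_step(w, grad, state, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update: the step size adapts per parameter as training progresses."""
    state["t"] += 1
    state["m"] = beta1 * state["m"] + (1 - beta1) * grad          # first-moment (momentum) estimate
    state["v"] = beta2 * state["v"] + (1 - beta2) * grad ** 2     # second-moment estimate
    m_hat = state["m"] / (1 - beta1 ** state["t"])                # bias correction
    v_hat = state["v"] / (1 - beta2 ** state["t"])
    return w - lr * m_hat / (np.sqrt(v_hat) + eps)

w = np.array([1.0, -1.0])
state = {"t": 0, "m": np.zeros_like(w), "v": np.zeros_like(w)}
for _ in range(500):
    grad = 2 * w                       # gradient of f(w) = ||w||^2
    w = adam_step(w, grad, state, lr=0.05)
print(w)                               # approaches the minimum at [0, 0]
```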
@@ -419,7 +419,7 @@ **47. Dropout ― Dropout is a technique used in neural networks to prevent overfitting the training data by dropping out neurons with probability p>0. It forces the model to avoid relying too much on particular sets of features.**
-برون‌اندازی – برون‌اندازی روشی است که در شبکه‌های عصبی برای جلوگیری از بیش‌برارزش شدن بر روی داده‌های آموزشی با حذف تصادفی نورون‌ها با احتمال p>0 استفاده می‌شود. این روش مدل را مجبور می‌کند تا از تکیه کردن بیش‌از‌حد بر روی مجموعه خاصی از ویژگی‌ها خودداری کند. +برون‌اندازی – برون‌اندازی روشی است که در شبکه‌های عصبی برای جلوگیری از بیش‌برارزش بر روی داده‌های آموزشی با حذف تصادفی نورون‌ها با احتمال p>0 استفاده می‌شود. این روش مدل را مجبور می‌کند تا از تکیه کردن بیش‌از‌حد بر روی مجموعه خاصی از ویژگی‌ها خودداری کند.

From b3cf27ddab5cb01f113e5383001271cc5ae9e273 Mon Sep 17 00:00:00 2001 From: Shervine Amidi Date: Sun, 10 Feb 2019 19:36:03 -0800 Subject: [PATCH 133/531] Add CONTRIBUTORS --- CONTRIBUTORS | 143 +++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 143 insertions(+) create mode 100644 CONTRIBUTORS diff --git a/CONTRIBUTORS b/CONTRIBUTORS new file mode 100644 index 000000000..7956ce345 --- /dev/null +++ b/CONTRIBUTORS @@ -0,0 +1,143 @@ +--ar + Amjad Khatabi (translation of deep learning) + Zaid Alyafeai (review of deep learning) + +--de + +--es + Erick Gabriel Mendoza Flores (translation of deep learning) + Fernando Diaz (review of deep learning) + Fernando González-Herrera (review of deep learning) + Mariano Ramirez (review of deep learning) + Juan P. Chavat (review of deep learning) + Alonso Melgar López (review of deep learning) + Gustavo Velasco-Hernández (review of deep learning) + Juan Manuel Nava Zamudio (review of deep learning) + + Fernando González-Herrera (translation of linear algebra) + Fernando Diaz (review of linear algebra) + Gustavo Velasco-Hernández (review of linear algebra) + Juan P. Chavat (review of linear algebra) + + David Jiménez Paredes (translation of machine learning tips and tricks) + Fernando Diaz (translation of machine learning tips and tricks) + Gustavo Velasco-Hernández (review of machine learning tips and tricks) + Alonso Melgar-Lopez (review of machine learning tips and tricks) + + Fermin Ordaz (translation of probabilities and statistics) + Fernando González-Herrera (review of probabilities and statistics) + Alonso Melgar López (review of probabilities and statistics) + + Juan P. Chavat (translation of supervised learning) + Fernando Gonzalez-Herrera (review of supervised learning) + Fernando Diaz (review of supervised learning) + Alonso Melgar-Lopez (review of supervised learning) + + Jaime Noel Alvarez Luna (translation of unsupervised learning) + Alonso Melgar López (review of unsupervised learning) + Fernando Diaz (review of unsupervised learning) + +--fa + AlisterTA (translation of deep learning) + Mohammad Karimi (review of deep learning) + Erfan Noury (review of deep learning) + + Erfan Noury (translation of linear algebra) + Mohammad Karimi (review of linear algebra) + + AlisterTA (translation of machine learning tips and tricks) + Mohammad Reza (translation of machine learning tips and tricks) + Erfan Noury (review of machine learning tips and tricks) + Mohammad Karimi (review of machine learning tips and tricks) + + Erfan Noury (translation of probabilities and statistics) + Mohammad Karimi (review of probabilities and statistics) + + Amirhosein Kazemnejad (translation of supervised learning) + Erfan Noury (review of supervised learning) + Mohammad Karimi (review of supervised learning) + + Erfan Noury (translation of unsupervised learning) + Mohammad Karimi (review of unsupervised learning) + +--fr + Original authors + +--he + +--hi + +--ja + +--pt + Gabriel Fonseca (translation of deep learning) + Leticia Portella (review of deep learning) + + Gabriel Fonseca (translation of linear algebra) + Leticia Portella (review of linear algebra) + + Fernando Santos (translation of machine learning tips and tricks) + Leticia Portella (review of machine learning tips and tricks) + Gabriel Fonseca (review of machine learning tips and tricks) + + Leticia Portella (translation of probabilities and statistics) + Flavio Clesio (review of probabilities and statistics) + + Leticia Portella (translation of supervised learning) + Gabriel Fonseca 
(review of supervised learning) + Flavio Clesio (review of supervised learning) + + Gabriel Fonseca (translation of unsupervised learning) + Tiago Danin (review of unsupervised learning) + +--tr + Ayyüce Kızrak (translation of convolutional neural networks) + Yavuz Kömeçoğlu (review of convolutional neural networks) + + Ekrem Çetinkaya (translation of deep learning) + Omer Bukte (review of deep learning) + + Ayyüce Kızrak (translation of deep learning tips and tricks) + Yavuz Kömeçoğlu (review of deep learning tips and tricks) + + Kadir Tekeli (translation of linear algebra) + Ekrem Çetinkaya (review of linear algebra) + + Seray Beşer (translation of machine learning tips and tricks) + Ayyüce Kızrak (review of machine learning tips and tricks) + Yavuz Kömeçoğlu (review of machine learning tips and tricks) + + Ayyüce Kızrak (translation of probabilities and statistics) + Başak Buluz (review of probabilities and statistics) + + Başak Buluz (translation of recurrent neural networks) + Yavuz Kömeçoğlu (review of recurrent neural networks) + + Başak Buluz (translation of supervised learning) + Ayyüce Kızrak (review of supervised learning) + + Yavuz Kömeçoğlu (translation of unsupervised learning) + Başak Buluz (review of unsupervised learning) + +--uk + Gregory Reshetniak (translation of probabilities and statistics) + Denys (review of probabilities and statistics) + +--zh + Wang Hongnian (translation of supervised learning) + Xiaohu Zhu (朱小虎) (review of supervised learning) + Chaoying Xue (review of supervised learning) + +--zh-tw + kevingo (translation of deep learning) + TobyOoO (review of deep learning) + + kevingo (translation of supervised learning) + accelsao (review of supervised learning) + + kevingo (translation of unsupervised learning) + imironhead (review of unsupervised learning) + johnnychhsu (review of unsupervised learning) + + kevingo (translation of probabilities and statistics) + johnnychhsu (review of probabilities and statistics) From a307a9b610f32e17fafc54af94639f4d0e28022b Mon Sep 17 00:00:00 2001 From: Shervine Amidi Date: Sun, 10 Feb 2019 19:37:45 -0800 Subject: [PATCH 134/531] Update CONTRIBUTORS --- CONTRIBUTORS | 3 +++ 1 file changed, 3 insertions(+) diff --git a/CONTRIBUTORS b/CONTRIBUTORS index 7956ce345..8b9b18062 100644 --- a/CONTRIBUTORS +++ b/CONTRIBUTORS @@ -41,6 +41,9 @@ AlisterTA (translation of deep learning) Mohammad Karimi (review of deep learning) Erfan Noury (review of deep learning) + + AlisterTA (translation of deep learning tips and tricks) + Erfan Noury (review of deep learning tips and tricks) Erfan Noury (translation of linear algebra) Mohammad Karimi (review of linear algebra) From 5994a6aa63918c2912e3f318c32144a7582a72d3 Mon Sep 17 00:00:00 2001 From: Afshine Amidi <26204670+afshinea@users.noreply.github.com> Date: Wed, 13 Feb 2019 21:12:32 -0800 Subject: [PATCH 135/531] Update README.md --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index 46c22f32a..691a5e4d1 100644 --- a/README.md +++ b/README.md @@ -37,7 +37,7 @@ Please make sure to propose the translation of **only one** cheatsheet per pull ## Progression for CS 230 (Deep Learning) |Cheatsheet topic|Español|فارسی|Français|日本語|Português|中文| |:---|:---:|:---:|:---:|:---:|:---:|:---:| -|Convolutional Neural Nets|not started|[in progress](https://github.com/erfannoury/cheatsheet-translation/issues/9)|done|not started|not started|not started| +|Convolutional Neural Nets|not started|[in 
progress](https://github.com/erfannoury/cheatsheet-translation/issues/9)|done|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/128)|not started| |Recurrent Neural Nets|not started|[in progress](https://github.com/erfannoury/cheatsheet-translation/issues/10)|done|not started|not started|not started| |DL tips and tricks|not started|[in progress](https://github.com/erfannoury/cheatsheet-translation/issues/11)|done|not started|not started|not started| From 6d7a45980e388cedaa3f7e2fe67e9b9028657f67 Mon Sep 17 00:00:00 2001 From: Afshine Amidi <26204670+afshinea@users.noreply.github.com> Date: Wed, 13 Feb 2019 21:14:55 -0800 Subject: [PATCH 136/531] Update README.md --- README.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/README.md b/README.md index 691a5e4d1..2c88fef85 100644 --- a/README.md +++ b/README.md @@ -74,8 +74,8 @@ Please make sure to propose the translation of **only one** cheatsheet per pull |Linear algebra|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/85)|not started|not started|done|not started|not started| -|Cheatsheet topic|Polski|Suomi|Català|Українська|한국어| -|:---|:---:|:---:|:---:|:---:|:---:| +|Cheatsheet topic|Polski|Suomi|Català|Українська|한국어|Magyar| +|:---|:---:|:---:|:---:|:---:|:---:|:---:| |Deep learning|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/8)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/34)|not started|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/80)| |Supervised learning|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/8)|not started|not started|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/90)| |Unsupervised learning|not started|not started|not started|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/93)| From 02e1e16e1721c8fde12b16b551f1fdbe4d2a5a33 Mon Sep 17 00:00:00 2001 From: Afshine Amidi <26204670+afshinea@users.noreply.github.com> Date: Wed, 13 Feb 2019 21:15:38 -0800 Subject: [PATCH 137/531] Update README.md --- README.md | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/README.md b/README.md index 2c88fef85..23168c484 100644 --- a/README.md +++ b/README.md @@ -76,12 +76,12 @@ Please make sure to propose the translation of **only one** cheatsheet per pull |Cheatsheet topic|Polski|Suomi|Català|Українська|한국어|Magyar| |:---|:---:|:---:|:---:|:---:|:---:|:---:| -|Deep learning|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/8)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/34)|not started|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/80)| -|Supervised learning|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/8)|not started|not started|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/90)| -|Unsupervised learning|not started|not started|not started|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/93)| -|ML tips and tricks|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/8)|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/47)|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/91)| -|Probabilities and Statistics|not started|not started|[in 
progress](https://github.com/shervinea/cheatsheet-translation/pull/47)|done|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/92)| -|Linear algebra|not started|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/47)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/95)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/94)| +|Deep learning|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/8)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/34)|not started|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/80)|not started| +|Supervised learning|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/8)|not started|not started|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/90)|not started| +|Unsupervised learning|not started|not started|not started|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/93)|not started| +|ML tips and tricks|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/8)|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/47)|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/91)|not started| +|Probabilities and Statistics|not started|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/47)|done|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/92)|not started| +|Linear algebra|not started|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/47)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/95)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/94)|not started| ## Acknowledgements Thank you everyone for your help! Please do not forget to add your name to the `CONTRIBUTORS` file so that we can give you proper credit in the cheatsheets' [official website](https://stanford.edu/~shervine/teaching). 
From a2b80e6759d2334b98fd7b02266fc93448f92d49 Mon Sep 17 00:00:00 2001 From: Afshine Amidi <26204670+afshinea@users.noreply.github.com> Date: Wed, 13 Feb 2019 21:21:39 -0800 Subject: [PATCH 138/531] Update README.md --- README.md | 14 +++++++------- 1 file changed, 7 insertions(+), 7 deletions(-) diff --git a/README.md b/README.md index 23168c484..1f937522e 100644 --- a/README.md +++ b/README.md @@ -47,7 +47,7 @@ Please make sure to propose the translation of **only one** cheatsheet per pull |Recurrent Neural Nets|not started|not started|not started|done|not started|not started| |DL tips and tricks|not started|not started|not started|done|not started|not started| - +started |Cheatsheet topic|Polski|Suomi|Català|Українська|한국어| |:---|:---:|:---:|:---:|:---:|:---:| |Convolutional Neural Nets|not started|not started|not started|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/109)| @@ -76,12 +76,12 @@ Please make sure to propose the translation of **only one** cheatsheet per pull |Cheatsheet topic|Polski|Suomi|Català|Українська|한국어|Magyar| |:---|:---:|:---:|:---:|:---:|:---:|:---:| -|Deep learning|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/8)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/34)|not started|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/80)|not started| -|Supervised learning|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/8)|not started|not started|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/90)|not started| -|Unsupervised learning|not started|not started|not started|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/93)|not started| -|ML tips and tricks|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/8)|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/47)|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/91)|not started| -|Probabilities and Statistics|not started|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/47)|done|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/92)|not started| -|Linear algebra|not started|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/47)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/95)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/94)|not started| +|Deep learning|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/8)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/34)|not started|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/80)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/124/files#diff-5e2ba65ef08acd57024e82d0ae94b923)| +|Supervised learning|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/8)|not started|not started|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/90)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/124/files#diff-5e2ba65ef08acd57024e82d0ae94b923)| +|Unsupervised learning|not started|not started|not started|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/93)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/124/files#diff-5e2ba65ef08acd57024e82d0ae94b923)| 
+|ML tips and tricks|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/8)|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/47)|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/91)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/124/files#diff-5e2ba65ef08acd57024e82d0ae94b923)| +|Probabilities and Statistics|not started|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/47)|done|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/92)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/124/files#diff-5e2ba65ef08acd57024e82d0ae94b923)| +|Linear algebra|not started|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/47)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/95)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/94)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/124/files#diff-5e2ba65ef08acd57024e82d0ae94b923)| ## Acknowledgements Thank you everyone for your help! Please do not forget to add your name to the `CONTRIBUTORS` file so that we can give you proper credit in the cheatsheets' [official website](https://stanford.edu/~shervine/teaching). From b4714d402178ea0878492ca75bae27ff0bbae49d Mon Sep 17 00:00:00 2001 From: Afshine Amidi <26204670+afshinea@users.noreply.github.com> Date: Wed, 13 Feb 2019 21:32:43 -0800 Subject: [PATCH 139/531] Update README.md --- README.md | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/README.md b/README.md index 1f937522e..9c54161fa 100644 --- a/README.md +++ b/README.md @@ -78,10 +78,10 @@ started |:---|:---:|:---:|:---:|:---:|:---:|:---:| |Deep learning|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/8)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/34)|not started|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/80)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/124/files#diff-5e2ba65ef08acd57024e82d0ae94b923)| |Supervised learning|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/8)|not started|not started|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/90)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/124/files#diff-5e2ba65ef08acd57024e82d0ae94b923)| -|Unsupervised learning|not started|not started|not started|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/93)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/124/files#diff-5e2ba65ef08acd57024e82d0ae94b923)| -|ML tips and tricks|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/8)|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/47)|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/91)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/124/files#diff-5e2ba65ef08acd57024e82d0ae94b923)| -|Probabilities and Statistics|not started|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/47)|done|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/92)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/124/files#diff-5e2ba65ef08acd57024e82d0ae94b923)| -|Linear algebra|not started|not started|[in 
progress](https://github.com/shervinea/cheatsheet-translation/pull/47)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/95)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/94)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/124/files#diff-5e2ba65ef08acd57024e82d0ae94b923)| +|Unsupervised learning|not started|not started|not started|not started|done|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/124/files#diff-5e2ba65ef08acd57024e82d0ae94b923)| +|ML tips and tricks|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/8)|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/47)|not started|done|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/124/files#diff-5e2ba65ef08acd57024e82d0ae94b923)| +|Probabilities and Statistics|not started|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/47)|done|done|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/124/files#diff-5e2ba65ef08acd57024e82d0ae94b923)| +|Linear algebra|not started|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/47)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/95)|done|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/124/files#diff-5e2ba65ef08acd57024e82d0ae94b923)| ## Acknowledgements Thank you everyone for your help! Please do not forget to add your name to the `CONTRIBUTORS` file so that we can give you proper credit in the cheatsheets' [official website](https://stanford.edu/~shervine/teaching). From 56aaf518373f407a4501ce397500bcf3d9382a17 Mon Sep 17 00:00:00 2001 From: Afshine Amidi <26204670+afshinea@users.noreply.github.com> Date: Wed, 13 Feb 2019 21:39:47 -0800 Subject: [PATCH 140/531] Update CONTRIBUTORS --- CONTRIBUTORS | 2 ++ 1 file changed, 2 insertions(+) diff --git a/CONTRIBUTORS b/CONTRIBUTORS index 89224fee1..f1c9a4747 100644 --- a/CONTRIBUTORS +++ b/CONTRIBUTORS @@ -72,7 +72,9 @@ --ko Wooil Jeong (translation of machine learning tips and tricks) + Wooil Jeong (translation of probabilities and statistics) + Kwang Hyeok Ahn (translation of Unsupervised Learning) --ja From cfcedc365595054b97cc07ead63681d9488d8d7e Mon Sep 17 00:00:00 2001 From: Afshine Amidi <26204670+afshinea@users.noreply.github.com> Date: Wed, 13 Feb 2019 21:41:29 -0800 Subject: [PATCH 141/531] Update README.md --- README.md | 1 - 1 file changed, 1 deletion(-) diff --git a/README.md b/README.md index 9c54161fa..b70e6359d 100644 --- a/README.md +++ b/README.md @@ -47,7 +47,6 @@ Please make sure to propose the translation of **only one** cheatsheet per pull |Recurrent Neural Nets|not started|not started|not started|done|not started|not started| |DL tips and tricks|not started|not started|not started|done|not started|not started| -started |Cheatsheet topic|Polski|Suomi|Català|Українська|한국어| |:---|:---:|:---:|:---:|:---:|:---:| |Convolutional Neural Nets|not started|not started|not started|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/109)| From 16e77ae9e45495bed79f34edc617150c73c785ba Mon Sep 17 00:00:00 2001 From: Leticia Portella Date: Mon, 11 Feb 2019 23:49:25 +0000 Subject: [PATCH 142/531] 80% of pt translation of CNNs --- pt/convolutional-neural-networks.md | 718 ++++++++++++++++++++++++++++ 1 file changed, 718 insertions(+) create mode 100644 pt/convolutional-neural-networks.md diff --git 
a/pt/convolutional-neural-networks.md b/pt/convolutional-neural-networks.md new file mode 100644 index 000000000..27201dd91 --- /dev/null +++ b/pt/convolutional-neural-networks.md @@ -0,0 +1,718 @@ +**Convolutional Neural Networks translation** + +
+ +**1. Convolutional Neural Networks cheatsheet** + +⟶ Dicas de Redes Neurais Convolucionais + +
+ + +**2. CS 230 - Deep Learning** + +⟶ CS 230 - Aprendizagem profunda + +
+ + +**3. [Overview, Architecture structure]** + +⟶ [Visão geral, Estrutura arquitetural] + +
+ + +**4. [Types of layer, Convolution, Pooling, Fully connected]** + +⟶ [Tipos de camadas, Convolução, Pooling, Totalmente conectada] + +
+ + +**5. [Filter hyperparameters, Dimensions, Stride, Padding]** + +⟶ [Hiperparâmetros de filtro, Dimensões, Passo, Preenchimento] + +
+ + +**6. [Tuning hyperparameters, Parameter compatibility, Model complexity, Receptive field]** + +⟶[Ajustando hiperparâmetros, Compatibilidade de parâmetros, Complexidade de modelo, Campo receptivo] + +
+ + +**7. [Activation functions, Rectified Linear Unit, Softmax]** + +⟶ [Funções de Ativação, Unidade Linear Retificada, Softmax] + +
+ + +**8. [Object detection, Types of models, Detection, Intersection over Union, Non-max suppression, YOLO, R-CNN]** + +⟶[Detecção de objetos, Tipos de modelos, Detecção, Intersecção por União, Supressão não-máxima, YOLO, R-CNN] + +
+ + +**9. [Face verification/recognition, One shot learning, Siamese network, Triplet loss]** + +⟶ [Verificação / reconhecimento facial, Aprendizado de um tiro, Rede siamesa, Perda tripla] + +
+ + +**10. [Neural style transfer, Activation, Style matrix, Style/content cost function]** + +⟶ [Transferência de estilo neural, Ativação, Matriz de estilo, Função de custo de estilo/conteúdo] + +
+ + +**11. [Computational trick architectures, Generative Adversarial Net, ResNet, Inception Network]** + +⟶ [Arquiteturas de truques computacionais, Rede Adversarial Generativa, ResNet, Rede de Iniciação] + +
+ + +**12. Overview** + +⟶ Visão geral + +
+ + +**13. Architecture of a traditional CNN ― Convolutional neural networks, also known as CNNs, are a specific type of neural networks that are generally composed of the following layers:** + +⟶ Arquitetura de uma RNC (CNN) - Redes neurais convolucionais, também conhecidas como CNN (em inglês), são tipos específicos de redes neurais que geralmente são compostas pelas seguintes camadas: + +
+ + +**14. The convolution layer and the pooling layer can be fine-tuned with respect to hyperparameters that are described in the next sections.** + +⟶ A camada convolucional e a camadas de pooling podem ter um ajuste fino considerando os hiperparâmetros que estão descritos na próxima seção. + +
+ + +**15. Types of layer** + +⟶ Tipos de camadas + +
+ + +**16. Convolution layer (CONV) ― The convolution layer (CONV) uses filters that perform convolution operations as it is scanning the input I with respect to its dimensions. Its hyperparameters include the filter size F and stride S. The resulting output O is called feature map or activation map.** + +⟶ Camada convolucional (CONV) - A camada convolucional (CONV) usa filtros que realizam operações de convolução conforme eles escabeuan a entrada I com relação a suas dimensões. Seus hiperparâmetros incluem o tamanho do filtro F e o passo S. O resultado O é chamado de mapa de recursos (feature map) ou mapa de ativação. + +
+ + +**17. Remark: the convolution step can be generalized to the 1D and 3D cases as well.** + +⟶ Observação: o passo de convolução também pode ser generalizado para os casos 1D e 3D. + +
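As an illustrative aside (not part of the original cheatsheet or its translation), item 16 can be seen concretely in a minimal NumPy sketch of a single-channel convolution with filter size F and stride S; all names below are ours:

```python
import numpy as np

def conv2d(x, w, stride=1):
    """Scan a 2D input x with a filter w and sum the element-wise products (no padding)."""
    I, F = x.shape[0], w.shape[0]              # input size and filter size (square, one channel)
    O = (I - F) // stride + 1                  # size of the resulting feature/activation map
    out = np.zeros((O, O))
    for i in range(O):
        for j in range(O):
            patch = x[i*stride:i*stride+F, j*stride:j*stride+F]
            out[i, j] = np.sum(patch * w)
    return out

x = np.arange(25, dtype=float).reshape(5, 5)
w = np.ones((3, 3))
print(conv2d(x, w).shape)                      # (3, 3): a 3x3 feature map from a 5x5 input
```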
+ + +**18. Pooling (POOL) ― The pooling layer (POOL) is a downsampling operation, typically applied after a convolution layer, which does some spatial invariance. In particular, max and average pooling are special kinds of pooling where the maximum and average value is taken, respectively.** + +⟶ Pooling (POOL) - A camada de pooling (POOL) é uma operação de amostragem, tipicamente aplicada depois de uma camada convolucional, que faz alguma invariância espacial. Em particular, pooling máximo e médio são casos especiais de pooling onde o máximo e o médio valor são obtidos, respectivamente. + +
+ + +**19. [Type, Purpose, Illustration, Comments]** + +⟶ [Tipo, Propósito, Ilustração, Comentários] + +
+ + +**20. [Max pooling, Average pooling, Each pooling operation selects the maximum value of the current view, Each pooling operation averages the values of the current view]** + +⟶ [Pooling máximo, Pooling médio, Cada operação de pooling seleciona o valor máximo da exibição atual, Cada operação de pooling calcula a média dos valores da exibição atual] + +
+ + +**21. [Preserves detected features, Most commonly used, Downsamples feature map, Used in LeNet]** + +⟶ [Preserva os recursos detectados, Mais comumente usados, Mapa de recursos de amostragem, Usado no LeNet] + + +
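As a companion to the pooling table above, here is a small sketch (ours, with assumed window and stride defaults) of max and average pooling on a single channel:

```python
import numpy as np

def pool2d(x, size=2, stride=2, mode="max"):
    """Downsample x by taking the max or the mean of each size-by-size window."""
    O = (x.shape[0] - size) // stride + 1
    out = np.zeros((O, O))
    for i in range(O):
        for j in range(O):
            window = x[i*stride:i*stride+size, j*stride:j*stride+size]
            out[i, j] = window.max() if mode == "max" else window.mean()
    return out

x = np.arange(16, dtype=float).reshape(4, 4)
print(pool2d(x, mode="max"))        # keeps the largest value of each 2x2 window
print(pool2d(x, mode="average"))    # keeps the mean of each 2x2 window
```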
+ + +**22. Fully Connected (FC) ― The fully connected layer (FC) operates on a flattened input where each input is connected to all neurons. If present, FC layers are usually found towards the end of CNN architectures and can be used to optimize objectives such as class scores.** + +⟶ Totalmente Conectado (FC) - A camada totalmente conectada (FC opera em uma entrada achatada, onde cada entrada é conectada a todos os neurônios. Se estiver presente, as camadas FC geralmente são encontradas no final das arquiteturas da CNN e podem ser usadas para otimizar objetivos, como pontuações de classes. + +
+ + +**23. Filter hyperparameters** + +⟶ Hiperparâmetros de filtros + +
+ + +**24. The convolution layer contains filters for which it is important to know the meaning behind its hyperparameters.** + +⟶ A camada de convolução contém filtros para os quais é importante conhecer o significado por trás de seus hiperparâmetros. + +
+ + +**25. Dimensions of a filter ― A filter of size F×F applied to an input containing C channels is a F×F×C volume that performs convolutions on an input of size I×I×C and produces an output feature map (also called activation map) of size O×O×1.** + +⟶ Dimensões de um filtro - Um filtro de tamanho F×F aplicado a uma entrada contendo C canais é um volume de tamanho F×F×C que executa convoluções em uma entrada de tamanho I×I×C e produz um mapa de recursos (também chamado de mapa de ativação) da saída de tamanho O×O×1. + +
+ + +**26. Filter** + +⟶ Filtros + +
+ + +**27. Remark: the application of K filters of size F×F results in an output feature map of size O×O×K.** + +⟶ Observação: a aplicação de K filtros de tamanho F×F resulta em um mapa de recursos de saída de tamanho O×O×K. + +
+ + +**28. Stride ― For a convolutional or a pooling operation, the stride S denotes the number of pixels by which the window moves after each operation.** + +⟶ Passo - Para uma operação convolucional ou de pooling, o passo S denota o número de pixels que a janela se move após cada operação. + +
+ + +**29. Zero-padding ― Zero-padding denotes the process of adding P zeroes to each side of the boundaries of the input. This value can either be manually specified or automatically set through one of the three modes detailed below:** + +⟶ Zero preenchimento (Zero-padding) - Zero preenchimento denota o processo de adicionar P zeros em cada lado das fronteiras do input. Esse valor pode ser especificado manualmente ou automaticamente ajustado através de um dos três modelos abaixo: + +
+ + +**30. [Mode, Value, Illustration, Purpose, Valid, Same, Full]** + +⟶ [Modo, Valor, Ilustração, Propósito, Válido, Idêntico, Completo] + +
+ + +**31. [No padding, Drops last convolution if dimensions do not match, Padding such that feature map size has size ⌈IS⌉, Output size is mathematically convenient, Also called 'half' padding, Maximum padding such that end convolutions are applied on the limits of the input, Filter 'sees' the input end-to-end]** + +⟶ [Sem preenchimento, Descarta a última convolução se as dimensões não corresponderem, Preenchimento de tal forma que o tamanho do mapa de recursos tenha tamanho ⌈IS⌉, Tamanho da saída é matematicamente conveniente, Também chamado de 'meio' preenchimento, Preenchimento máximo de tal forma que convoluções finais são aplicadas nos limites de a entrada, Filtro 'vê' a entrada de ponta a ponta] + +
+ + +**32. Tuning hyperparameters** + +⟶ Ajuste de hiperparâmetros + +
+ + +**33. Parameter compatibility in convolution layer ― By noting I the length of the input volume size, F the length of the filter, P the amount of zero padding, S the stride, then the output size O of the feature map along that dimension is given by:** + +⟶ Compatibilidade de parâmetro na camada convolucional - Considerando I o comprimento do tamanho do volume da entrada, F o tamanho do filtro, P a quantidade de preenchimento de zero (zero-padding) e S o tamanho do passo, então o tamanho de saída O do mapa de recursos ao longo dessa dimensão é dado por: + + +
+ + +**34. [Input, Filter, Output]** + +⟶ [Entrada, Filtro, Saída] + +
+ + +**35. Remark: often times, Pstart=Pend≜P, in which case we can replace Pstart+Pend by 2P in the formula above.** + +⟶ Observação: diversas vezes, Pstart=Pend≜P, em cujo caso podemos substituir Pstart+Pen por 2P na fórmula acima. + +
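The formula referenced in item 33 is not written out in this text-only file; assuming the standard relation O = (I - F + Pstart + Pend) / S + 1, a tiny helper (ours) reads:

```python
def conv_output_size(I, F, S, P_start=0, P_end=0):
    """One spatial dimension of the feature map: O = (I - F + P_start + P_end) / S + 1."""
    return (I - F + P_start + P_end) // S + 1

print(conv_output_size(I=32, F=5, S=1))                      # 28: 'valid', no padding
print(conv_output_size(I=32, F=5, S=1, P_start=2, P_end=2))  # 32: padded so the size is preserved
```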
+ + +**36. Understanding the complexity of the model ― In order to assess the complexity of a model, it is often useful to determine the number of parameters that its architecture will have. In a given layer of a convolutional neural network, it is done as follows:** + +⟶ Entendendo a complexidade do modelo - Para avaliar a complexidade de um modelo, é geralmente útil determinar o número de parâmetros que a arquitetura deverá ter. Em uma determinada camada de uma rede neural convolucional, ela é dada da seguinte forma: + +
+ + +**37. [Illustration, Input size, Output size, Number of parameters, Remarks]** + +⟶ [Ilustração, Tamanho da entrada, Tamanho da saída, Número de parâmetros, Observações] + +
+ + +**38. [One bias parameter per filter, In most cases, S<F, A common choice for K is 2C]** + +⟶ + +
+ + +**39. [Pooling operation done channel-wise, In most cases, S=F]** + +⟶ [Operação de pooling feita pelo canal, Na maior parte dos casos, S=F] + +
+ + +**40. [Input is flattened, One bias parameter per neuron, The number of FC neurons is free of structural constraints]** + +⟶ [Entrada é achatada, Um parâmetro de viés (bias parameter) por neurônio, O número de neurônios FC está livre de restrições estruturais] + +
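To make item 36 concrete, a short sketch of the usual parameter counts for the layer types in the table above (stated here as an aside; the formulas are the standard ones for CONV and FC layers, not quoted from this file):

```python
def conv_params(F, C, K):
    """CONV: K filters of size F*F*C plus one bias per filter -> (F*F*C + 1) * K parameters."""
    return (F * F * C + 1) * K

def fc_params(n_in, n_out):
    """FC: flattened input of size n_in, one bias per neuron -> (n_in + 1) * n_out parameters."""
    return (n_in + 1) * n_out

print(conv_params(F=3, C=3, K=16))   # 448
print(fc_params(4096, 1000))         # 4097000; pooling layers add no learnable parameters
```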
+ + +**41. Receptive field ― The receptive field at layer k is the area denoted Rk×Rk of the input that each pixel of the k-th activation map can 'see'. By calling Fj the filter size of layer j and Si the stride value of layer i and with the convention S0=1, the receptive field at layer k can be computed with the formula:** + +⟶ Campo receptive - O campo receptivo na camada k é a área denotada por Rk×Rk da entrada que cada pixel do k-ésimo mapa de ativação pode 'ver'. Ao chamar Fj o tamanho do filtro da camada j e Si o valor do passo da camada i e com a convenção S0=1, o campo receptivo na camada k pode ser calculado com a fórmula: + +
+ + +**42. In the example below, we have F1=F2=3 and S1=S2=1, which gives R2=1+2⋅1+2⋅1=5.** + +⟶ No exemplo abaixo, temos que F1=F2=3 e S1=S2=1, que resulta em R2=1+2⋅1+2⋅1=5. + +
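A small helper (ours) for the receptive-field formula of item 41, checked against the worked example of item 42:

```python
def receptive_field(filter_sizes, strides):
    """R_k = 1 + sum_j (F_j - 1) * prod_{i<j} S_i, with the convention S_0 = 1."""
    R, jump = 1, 1                      # 'jump' accumulates the product of earlier strides
    for F, S in zip(filter_sizes, strides):
        R += (F - 1) * jump
        jump *= S
    return R

print(receptive_field([3, 3], [1, 1]))  # 5, matching R2 = 1 + 2*1 + 2*1 in item 42
```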
+ + +**43. Commonly used activation functions** + +⟶ Funções de ativação comumente usadas + +
+ + +**44. Rectified Linear Unit ― The rectified linear unit layer (ReLU) is an activation function g that is used on all elements of the volume. It aims at introducing non-linearities to the network. Its variants are summarized in the table below:** + +⟶ Unidade Linear Retificada (Rectified Linear Unit) - A camada unitária linear retificada (ReLU) é uma função de ativação g que é usada em todos os elementos do volume. Tem como objetivo introduzir não linearidades na rede. Suas variantes estão resumidas na tabela abaixo: + +
+ + +**45. [ReLU, Leaky ReLU, ELU, with]** + +⟶ [ReLU, Leaky ReLU, ELU, com] + +
+ + +**46. [Non-linearity complexities biologically interpretable, Addresses dying ReLU issue for negative values, Differentiable everywhere]** + +⟶ [Complexidades de não-linearidade biologicamente interpretáveis, Endereça o problema da ReLU para valores negativos, Diferenciável em todos os lugares] + +
+ + +**47. Softmax ― The softmax step can be seen as a generalized logistic function that takes as input a vector of scores x∈Rn and outputs a vector of output probability p∈Rn through a softmax function at the end of the architecture. It is defined as follows:** + +⟶ Softmax - O passo de softmax pode ser visto como uma função logística generalizada que pega como entrada um vetor de pontuações x∈Rn e retorna um vetor de probabilidades p∈Rn através de uma função softmax no final da arquitetura. É definida como: + +
+ + +**48. where** + +⟶ onde + +
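For reference, a compact NumPy sketch (ours) of the activation functions in the table above and of the softmax of item 47; the leak and alpha constants are illustrative defaults, not values taken from the cheatsheet:

```python
import numpy as np

def relu(z):
    return np.maximum(0, z)

def leaky_relu(z, eps=0.01):
    return np.where(z > 0, z, eps * z)

def elu(z, alpha=1.0):
    return np.where(z > 0, z, alpha * (np.exp(z) - 1))

def softmax(x):
    """Generalized logistic: p_i = exp(x_i) / sum_j exp(x_j); shifting by max(x) keeps it stable."""
    e = np.exp(x - np.max(x))
    return e / e.sum()

scores = np.array([2.0, 1.0, -1.0])
print(softmax(scores), softmax(scores).sum())   # a probability vector summing to 1
```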
+ + +**49. Object detection** + +⟶ Detecção de objeto + +
+ + +**50. Types of models ― There are 3 main types of object recognition algorithms, for which the nature of what is predicted is different. They are described in the table below:** + +⟶ Tipos de modelos - Existem 3 tipos de algoritmos de reconhecimento de objetos, para o qual a natureza do que é previsto é diferente para cada um. Eles estão descritos na tabela abaixo: + +
+ + +**51. [Image classification, Classification w. localization, Detection]** + +⟶ [Classificação de imagem, Classificação com localização, Detecção] + +
+ + +**52. [Teddy bear, Book]** + +⟶ [Urso de pelúcia, Livro] + +
+ + +**53. [Classifies a picture, Predicts probability of object, Detects an object in a picture, Predicts probability of object and where it is located, Detects up to several objects in a picture, Predicts probabilities of objects and where they are located]** + +⟶ [Classifica uma imagem, Prevê a probabilidade de um objeto, Detecta um objeto em uma imagem, Prevê a probabilidade de objeto e onde ele está localizado, Detecta vários objetos em uma imagem, Prevê probabilidades de objetos e onde eles estão localizados] + +
+ + +**54. [Traditional CNN, Simplified YOLO, R-CNN, YOLO, R-CNN]** + +⟶ [CNN tradicional, YOLO simplificado, R-CNN, YOLO, R-CNN] + +
+ + +**55. Detection ― In the context of object detection, different methods are used depending on whether we just want to locate the object or detect a more complex shape in the image. The two main ones are summed up in the table below:** + +⟶ Detecção - No contexto da detecção de objetos, diferentes métodos são usados dependendo se apenas queremos localizar o objeto ou detectar uma forma mais complexa na imagem. Os dois principais são resumidos na tabela abaixo: + +
+ + +**56. [Bounding box detection, Landmark detection]** + +⟶ [Detecção de caixa limite, Detecção de marco] + +
+ + +**57. [Detects the part of the image where the object is located, Detects a shape or characteristics of an object (e.g. eyes), More granular]** + +⟶ [Detecta parte da imagem onde o objeto está localizado, Detecta a forma ou característica de um objeto (e.g. olhos), Mais granular] + +
+ + +**58. [Box of center (bx,by), height bh and width bw, Reference points (l1x,l1y), ..., (lnx,lny)]** + +⟶ [Caixa central (bx,by), altura bh e largura bw, Pontos de referência (l1x,l1y), ..., (lnx,lny)] + +
+ + +**59. Intersection over Union ― Intersection over Union, also known as IoU, is a function that quantifies how correctly positioned a predicted bounding box Bp is over the actual bounding box Ba. It is defined as:** + +⟶ Interseção sobre União (Intersection over Union) - Interseção sobre União, também conhecida como IoU, é uma funçãi que quantifica quão corretamente posicionado uma caixa de delimitação predita Bp está sobre a caixa de delimitação real Ba. É definida por: + +
+ + +**60. Remark: we always have IoU∈[0,1]. By convention, a predicted bounding box Bp is considered as being reasonably good if IoU(Bp,Ba)⩾0.5.** + +⟶ Observação: temos que IoU∈[0,1]. Por convenção, uma caixa de delimitação predita Bp é considerada razoavelmente boa se IoU(Bp,Ba)⩾0.5. + +
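Item 59's IoU fits in a few lines; the sketch below (ours) assumes boxes are given as (x1, y1, x2, y2) corner coordinates:

```python
def iou(box_p, box_a):
    """Intersection over Union of a predicted box and an actual box."""
    x1, y1 = max(box_p[0], box_a[0]), max(box_p[1], box_a[1])
    x2, y2 = min(box_p[2], box_a[2]), min(box_p[3], box_a[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_p = (box_p[2] - box_p[0]) * (box_p[3] - box_p[1])
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    return inter / (area_p + area_a - inter)

print(iou((0, 0, 2, 2), (1, 1, 3, 3)))   # 1/7, about 0.14: below the usual 0.5 'reasonably good' bar
```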
+ + +**61. Anchor boxes ― Anchor boxing is a technique used to predict overlapping bounding boxes. In practice, the network is allowed to predict more than one box simultaneously, where each box prediction is constrained to have a given set of geometrical properties. For instance, the first prediction can potentially be a rectangular box of a given form, while the second will be another rectangular box of a different geometrical form.** + +⟶ Caixas de ancoragem (Anchor boxes) - Caixas de ancoragem é uma técnica usada para predizer caixas de delimitação que se sobrepões. Na prática, a rede tem permissão para predizer mais de uma caixa simultaneamente, oonde cada caixa prevista é restrita a ter um dado conjunto de propriedades geométricas. Por exemplo, a primeira predição pode ser potencialmente uma caixa retangular de uma determinada forma, enquanto a segunda pode ser outra caixa retangular de uma forma geométrica diferente. + +
+ + +**62. Non-max suppression ― The non-max suppression technique aims at removing duplicate overlapping bounding boxes of a same object by selecting the most representative ones. After having removed all boxes having a probability prediction lower than 0.6, the following steps are repeated while there are boxes remaining:** + +⟶ Supressão não máxima (Non-max suppression) - A técnica supressão não máxima visa remover caixas de delimitação de um mesmo objeto que estão duplicadas e se sobrepõe, selecionando as mais representativas. Depois de ter removido todas as caixas que contém uma predição menor que 0.6. os seguintes passos são repetidos enquanto existem caixas remanescentes: + +
+ + +**63. [For a given class, Step 1: Pick the box with the largest prediction probability., Step 2: Discard any box having an IoU⩾0.5 with the previous box.]** + +⟶ [Para uma dada classe, Passo 1: Pegue a caixa com a maior predição de probabilidade., Passo 2: Descarte todas as caixas que tem IoU⩾0.5 com a caixa anterior.] + +
+ + +**64. [Box predictions, Box selection of maximum probability, Overlap removal of same class, Final bounding boxes]** + +⟶ [Predição de caixa, Seleção de caixa com máxima probabilidade, Remoção de sobreposições da mesma classe, Caixas de delimitação final] + +
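The two steps of item 63 translate almost directly into code; a self-contained per-class sketch (ours), with a compact IoU helper repeated so the snippet runs on its own:

```python
def iou(bp, ba):
    """IoU of two (x1, y1, x2, y2) boxes, as in item 59."""
    x1, y1 = max(bp[0], ba[0]), max(bp[1], ba[1])
    x2, y2 = min(bp[2], ba[2]), min(bp[3], ba[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    union = (bp[2]-bp[0])*(bp[3]-bp[1]) + (ba[2]-ba[0])*(ba[3]-ba[1]) - inter
    return inter / union

def non_max_suppression(boxes, prob_threshold=0.6, iou_threshold=0.5):
    """boxes: list of (probability, (x1, y1, x2, y2)) predictions for a single class."""
    boxes = [b for b in boxes if b[0] >= prob_threshold]   # drop low-probability boxes first
    boxes.sort(key=lambda b: b[0], reverse=True)
    kept = []
    while boxes:
        best = boxes.pop(0)                                 # Step 1: largest prediction probability
        kept.append(best)
        boxes = [b for b in boxes                           # Step 2: discard boxes with IoU >= 0.5
                 if iou(b[1], best[1]) < iou_threshold]
    return kept

preds = [(0.9, (0, 0, 10, 10)), (0.8, (1, 1, 11, 11)), (0.7, (20, 20, 30, 30)), (0.4, (0, 0, 5, 5))]
print(non_max_suppression(preds))   # keeps the 0.9 box and the far-away 0.7 box
```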
+ + +**65. YOLO ― You Only Look Once (YOLO) is an object detection algorithm that performs the following steps:** + +⟶ YOLO - Você Apenas Vê Uma Vez (You Only Look Once - YOLO) é um algoritmo de detecção de objeto que realiza os seguintes passos: + +
+ + +**66. [Step 1: Divide the input image into a G×G grid., Step 2: For each grid cell, run a CNN that predicts y of the following form:, repeated k times]** + +⟶ [Passo 1: Divide a imagem de input em uma grade G×G., Passo 2: Para cada célula da grade, rode uma CNN que prevê o valor y da seguinte forma:, repita k vezes] + +
+ + +**67. where pc is the probability of detecting an object, bx,by,bh,bw are the properties of the detected bouding box, c1,...,cp is a one-hot representation of which of the p classes were detected, and k is the number of anchor boxes.** + +⟶ onde pc é a probabilidade de detecção do objeto, bx,by,bh,bw são as proprioedades das caixas delimitadoras detectadas, c1,...,cp é uma representação única (one-hot representation) de quais das classes p foram detectadas, e k é o número de caixas de ancoragem. + +
+ + +**68. Step 3: Run the non-max suppression algorithm to remove any potential duplicate overlapping bounding boxes.** + +⟶ Passo 3: Rode o algoritmo de supressão não máximo para remover qualquer caixa delimitadora duplicada e que se sobrepõe. + +
+ + +**69. [Original image, Division in GxG grid, Bounding box prediction, Non-max suppression]** + +⟶ [Imagem original, Divisão em uma grade GxG, Caixa delimitadora prevista, Supressão não máxima] + +
+ + +**70. Remark: when pc=0, then the network does not detect any object. In that case, the corresponding predictions bx,...,cp have to be ignored.** + +⟶ + +
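To visualise the prediction vector of items 66 and 67, a toy sketch (ours) of the output tensor's layout; the grid size, number of anchor boxes and number of classes below are arbitrary illustrative values:

```python
import numpy as np

G, k, p = 7, 2, 20                        # grid cells per side, anchor boxes, classes (assumed)
y = np.random.rand(G, G, k * (5 + p))     # stand-in for the network output over the GxG grid

y = y.reshape(G, G, k, 5 + p)             # per cell and per anchor: [pc, bx, by, bh, bw, c1..cp]
pc = y[..., 0]                            # objectness; predictions with pc near 0 are ignored (item 70)
boxes = y[..., 1:5]                       # bx, by, bh, bw of each detected bounding box
classes = y[..., 5:]                      # c1..cp, a one-hot representation in item 67's notation
print(pc.shape, boxes.shape, classes.shape)   # (7, 7, 2) (7, 7, 2, 4) (7, 7, 2, 20)
```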
+ + +**71. R-CNN ― Region with Convolutional Neural Networks (R-CNN) is an object detection algorithm that first segments the image to find potential relevant bounding boxes and then run the detection algorithm to find most probable objects in those bounding boxes.** + +⟶ + +
+ + +**72. [Original image, Segmentation, Bounding box prediction, Non-max suppression]** + +⟶ + +
+ + +**73. Remark: although the original algorithm is computationally expensive and slow, newer architectures enabled the algorithm to run faster, such as Fast R-CNN and Faster R-CNN.** + +⟶ + +
+ + +**74. Face verification and recognition** + +⟶ + +
+ + +**75. Types of models ― Two main types of model are summed up in table below:** + +⟶ + +
+ + +**76. [Face verification, Face recognition, Query, Reference, Database]** + +⟶ + +
+ + +**77. [Is this the correct person?, One-to-one lookup, Is this one of the K persons in the database?, One-to-many lookup]** + +⟶ + +
+ + +**78. One Shot Learning ― One Shot Learning is a face verification algorithm that uses a limited training set to learn a similarity function that quantifies how different two given images are. The similarity function applied to two images is often noted d(image 1,image 2).** + +⟶ + +
+ + +**79. Siamese Network ― Siamese Networks aim at learning how to encode images to then quantify how different two images are. For a given input image x(i), the encoded output is often noted as f(x(i)).** + +⟶ + +
+ + +**80. Triplet loss ― The triplet loss ℓ is a loss function computed on the embedding representation of a triplet of images A (anchor), P (positive) and N (negative). The anchor and the positive example belong to a same class, while the negative example to another one. By calling α∈R+ the margin parameter, this loss is defined as follows:** + +⟶ + +
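Item 80's formula is not reproduced in this text-only file; as an aside, a sketch (ours) of the usual form max(d(A,P) - d(A,N) + alpha, 0), taking squared Euclidean distance as the similarity d (one common choice, not mandated by the cheatsheet):

```python
import numpy as np

def triplet_loss(f_a, f_p, f_n, alpha=0.2):
    """Margin-based loss on the embeddings f(A), f(P), f(N); alpha is the margin parameter."""
    d_ap = np.sum((f_a - f_p) ** 2)       # anchor vs. positive (same class)
    d_an = np.sum((f_a - f_n) ** 2)       # anchor vs. negative (different class)
    return max(d_ap - d_an + alpha, 0.0)

f_a, f_p, f_n = np.array([0.0, 1.0]), np.array([0.1, 0.9]), np.array([1.0, 0.0])
print(triplet_loss(f_a, f_p, f_n))        # 0.0: the negative is already far enough from the anchor
```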
+ + +**81. Neural style transfer** + +⟶ Transferência de estilo neural + +
+ + +**82. Motivation ― The goal of neural style transfer is to generate an image G based on a given content C and a given style S.** + +⟶ Motivação - O objetivo da transferência de estilo neural é gerar uma imagem G baseada num dado conteúdo C com um estilo S. + +
+ + +**83. [Content C, Style S, Generated image G]** + +⟶ [Conteúdo C, Estulo S, Imagem gerada G] + +
+ + +**84. Activation ― In a given layer l, the activation is noted a[l] and is of dimensions nH×nw×nc** + +⟶ Ativação - Em uma dada camada l, a ativação é denotada como a[l] e suas dimensões são nH×nw×nc + +
+ + +**85. Content cost function ― The content cost function Jcontent(C,G) is used to determine how the generated image G differs from the original content image C. It is defined as follows:** + +⟶ + +
+ + +**86. Style matrix ― The style matrix G[l] of a given layer l is a Gram matrix where each of its elements G[l]kk′ quantifies how correlated the channels k and k′ are. It is defined with respect to activations a[l] as follows:** + +⟶ + +
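Item 86's Gram matrix has a very short NumPy equivalent; this sketch (ours) takes an activation volume of shape nH x nW x nC and correlates its channels:

```python
import numpy as np

def style_matrix(a):
    """G[l]_{kk'} = sum over all spatial positions of a[..., k] * a[..., k']."""
    nH, nW, nC = a.shape
    flat = a.reshape(nH * nW, nC)         # one row per pixel, one column per channel
    return flat.T @ flat                  # (nC, nC) channel-correlation (Gram) matrix

a = np.random.rand(4, 4, 3)               # toy activation map a[l]
print(style_matrix(a).shape)               # (3, 3)
```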
+ + +**87. Remark: the style matrix for the style image and the generated image are noted G[l] (S) and G[l] (G) respectively.** + +⟶ + +
+ + +**88. Style cost function ― The style cost function Jstyle(S,G) is used to determine how the generated image G differs from the style S. It is defined as follows:** + +⟶ + +
+ + +**89. Overall cost function ― The overall cost function is defined as being a combination of the content and style cost functions, weighted by parameters α,β, as follows:** + +⟶ + +
+ + +**90. Remark: a higher value of α will make the model care more about the content while a higher value of β will make it care more about the style.** + +⟶ + +
+ + +**91. Architectures using computational tricks** + +⟶ + +
+ + +**92. Generative Adversarial Network ― Generative adversarial networks, also known as GANs, are composed of a generative and a discriminative model, where the generative model aims at generating the most truthful output that will be fed into the discriminative which aims at differentiating the generated and true image.** + +⟶ + +
+ + +**93. [Training, Noise, Real-world image, Generator, Discriminator, Real Fake]** + +⟶ + +
+ + +**94. Remark: use cases using variants of GANs include text to image, music generation and synthesis.** + +⟶ + +
+ + +**95. ResNet ― The Residual Network architecture (also called ResNet) uses residual blocks with a high number of layers meant to decrease the training error. The residual block has the following characterizing equation:** + +⟶ + +
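Item 95 refers to a characterizing equation that is not written out in this file; assuming the usual form a[l+2] = g(a[l] + z[l+2]) with g a ReLU, a toy sketch (ours) of the skip connection, with arbitrary stand-in weights:

```python
import numpy as np

def relu(z):
    return np.maximum(0, z)

def residual_block(a_l, layer1, layer2):
    """The block's input is added back in just before the final activation."""
    z1 = layer1(a_l)
    z2 = layer2(relu(z1))
    return relu(a_l + z2)                 # a[l+2] = g(a[l] + z[l+2])

W1, W2 = np.eye(4) * 0.5, np.eye(4) * 0.5
a = np.arange(4.0)
print(residual_block(a, lambda x: W1 @ x, lambda x: W2 @ x))   # [0.  1.25 2.5  3.75]
```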
+ + +**96. Inception Network ― This architecture uses inception modules and aims at giving a try at different convolutions in order to increase its performance through features diversification. In particular, it uses the 1×1 convolution trick to limit the computational burden.** + +⟶ + +
+ + +**97. The Deep Learning cheatsheets are now available in [target language].** + +⟶ Os resumos de Aprendizagem Profunda estão disponíveis em português. + +
+ + +**98. Original authors** + +⟶ Autores Originais + +
+ + +**99. Translated by X, Y and Z** + +⟶ Traduzido por Leticia Portella + +
+ + +**100. Reviewed by X, Y and Z** + +⟶ Revisado por + +
+ + +**101. View PDF version on GitHub** + +⟶ Ver versão em PDF no GitHub. + +
+ + +**102. By X and Y** + +⟶ + +
From 75b252f10dab70c20ab20beb70975faa910afd21 Mon Sep 17 00:00:00 2001 From: Leticia Portella Date: Mon, 11 Feb 2019 23:52:06 +0000 Subject: [PATCH 143/531] Update README and CONTRIBUTORS to CNNs in pt --- CONTRIBUTORS | 2 ++ 1 file changed, 2 insertions(+) diff --git a/CONTRIBUTORS b/CONTRIBUTORS index f1c9a4747..f8e720dfc 100644 --- a/CONTRIBUTORS +++ b/CONTRIBUTORS @@ -100,6 +100,8 @@ Gabriel Fonseca (translation of unsupervised learning) Tiago Danin (review of unsupervised learning) + Leticia Portella (translation of CNNs) + --tr Ayyüce Kızrak (translation of convolutional neural networks) Yavuz Kömeçoğlu (review of convolutional neural networks) From f5337aac405e0f7bac1c6096a204deaae22532b3 Mon Sep 17 00:00:00 2001 From: Leticia Portella Date: Fri, 15 Feb 2019 16:30:39 +0000 Subject: [PATCH 144/531] Added 100% on PT CNN --- pt/convolutional-neural-networks.md | 50 ++++++++++++++--------------- 1 file changed, 25 insertions(+), 25 deletions(-) diff --git a/pt/convolutional-neural-networks.md b/pt/convolutional-neural-networks.md index 27201dd91..5d7565a7c 100644 --- a/pt/convolutional-neural-networks.md +++ b/pt/convolutional-neural-networks.md @@ -489,77 +489,77 @@ **70. Remark: when pc=0, then the network does not detect any object. In that case, the corresponding predictions bx,...,cp have to be ignored.** -⟶ +⟶ Observação: Quando pc=0, então a rede não detecta nenhum objeto. Nesse caso, as predições correspondentes bx,...,cp devem ser ignoradas.
**71. R-CNN ― Region with Convolutional Neural Networks (R-CNN) is an object detection algorithm that first segments the image to find potential relevant bounding boxes and then run the detection algorithm to find most probable objects in those bounding boxes.** -⟶ +⟶ R-CNN - Região com Redes Neurais Convolucionais (R-CNN) é um algoritmo de detecção de objetos que primeiro segmenta a imagem para encontrar potenciais caixas de delimitação relevantes e então roda o algoritmo de detecção para encontrar os objetos mais prováveis dentro das caixas de delimitação.
**72. [Original image, Segmentation, Bounding box prediction, Non-max suppression]** -⟶ +⟶ [Imagem original, Segmentação, Predição da caixa delimitadora, Supressão não-máxima]
**73. Remark: although the original algorithm is computationally expensive and slow, newer architectures enabled the algorithm to run faster, such as Fast R-CNN and Faster R-CNN.** -⟶ +⟶ Observação: embora o algoritmo original seja computacionalmente caro e lento, arquiteturas mais recentes, como o Fast R-CNN e o Faster R-CNN, permitiram que o algoritmo fosse executado mais rapidamente.
**74. Face verification and recognition** -⟶ +⟶ Verificação facial e reconhecimento
**75. Types of models ― Two main types of model are summed up in table below:** -⟶ +⟶ Tipos de modelos - Os dois principais tipos de modelos são resumidos na tabela abaixo:
**76. [Face verification, Face recognition, Query, Reference, Database]** -⟶ +⟶ [Verificação facial, Reconhecimento facial, Consulta, Referência, Banco de dados]
**77. [Is this the correct person?, One-to-one lookup, Is this one of the K persons in the database?, One-to-many lookup]** -⟶ +⟶ [Esta é a pessoa correta?, Pesquisa um-para-um, Esta é uma das K pessoas no banco de dados?, Pesquisa um-para-muitos]
**78. One Shot Learning ― One Shot Learning is a face verification algorithm that uses a limited training set to learn a similarity function that quantifies how different two given images are. The similarity function applied to two images is often noted d(image 1,image 2).** -⟶ +⟶ Aprendizado de Tiro Único (One Shot Learning) - One Shot Learning é um algoritmo de verificação facial que utiliza um conjunto de treinamento limitado para aprender uma função de similaridade que quantifica o quão diferentes são as duas imagens. A função de similaridade aplicada a duas imagens é frequentemente denotada como d(imagem 1, imagem 2).
**79. Siamese Network ― Siamese Networks aim at learning how to encode images to then quantify how different two images are. For a given input image x(i), the encoded output is often noted as f(x(i)).** -⟶ +⟶ Rede Siamesa (Siamese Network) - Siamese Networks buscam aprender como codificar imagens para depois quantificar quão diferentes são as duas imagens. Para uma imagem de entrada x(i), o resultado codificado é normalmente denotado como f(x(i)).
**80. Triplet loss ― The triplet loss ℓ is a loss function computed on the embedding representation of a triplet of images A (anchor), P (positive) and N (negative). The anchor and the positive example belong to a same class, while the negative example to another one. By calling α∈R+ the margin parameter, this loss is defined as follows:** -⟶ +⟶ Perda tripla (Triplet loss) - A perda tripla ℓ é uma função de perda (loss function) computada na representação da encorporação de três imagens A (âncora), P (positiva) e N (negativa). O exemplo da âncora e positivo pertencem à mesma classe, enquanto o exemplo negativo pertence a uma classe diferente. Chamando o parâmetro de margem de α∈R+, essa função de perda é calculada da seguinte forma:
@@ -594,84 +594,84 @@ **85. Content cost function ― The content cost function Jcontent(C,G) is used to determine how the generated image G differs from the original content image C. It is defined as follows:** -⟶ +⟶ Função de custo de conteúdo (Content cost function) - A função de custo de conteúdo Jcontent(C,G) é usada para determinar como a imagem gerada G difere da imagem de conteúdo original C. Ela é definida da seguinte forma:
**86. Style matrix ― The style matrix G[l] of a given layer l is a Gram matrix where each of its elements G[l]kk′ quantifies how correlated the channels k and k′ are. It is defined with respect to activations a[l] as follows:** -⟶ +⟶ Matriz de estilo - A matriz de estilo G[l] de uma determinada camada l é a matriz de Gram em que cada um dos seus elementos G[l]kk′ quantificam quão correlacionados são os canais k e k′. Ela é definida com respeito às ativações a[l] da seguinte forma:
**87. Remark: the style matrix for the style image and the generated image are noted G[l] (S) and G[l] (G) respectively.** -⟶ +⟶ Observação: a matriz de estilo para a imagem estilizada e para a imagem gerada são denotadas como G[l] (S) e G[l] (G), respectivamente.
**88. Style cost function ― The style cost function Jstyle(S,G) is used to determine how the generated image G differs from the style S. It is defined as follows:** -⟶ +⟶ Função de custo de estilo (Style cost function) - A função de custo de estilo Jstyle(S,G) é usada para determinar como a imagem gerada G difere do estilo S. Ela é definida da seguinte forma:
**89. Overall cost function ― The overall cost function is defined as being a combination of the content and style cost functions, weighted by parameters α,β, as follows:** -⟶ +⟶ Função de custo geral (Overall cost function) é definida como sendo a combinação das funções de custo do conteúdo e do estilo, ponderada pelos parâmetros α,β, como mostrado abaixo:
**90. Remark: a higher value of α will make the model care more about the content while a higher value of β will make it care more about the style.** -⟶ +⟶ Observação: um valor de α maior irá fazer com que o modelo se preocupe mais com o conteúdo enquanto um maior valor de β irá fazer com que ele se preocupe mais com o estilo.
**91. Architectures using computational tricks** -⟶ +⟶ Arquiteturas usando truques computacionais
**92. Generative Adversarial Network ― Generative adversarial networks, also known as GANs, are composed of a generative and a discriminative model, where the generative model aims at generating the most truthful output that will be fed into the discriminative which aims at differentiating the generated and true image.** -⟶ +⟶ Rede Adversarial Gerativa (Generative Adversarial Network) - As Generaive Adversarial Networks, também conhecidas como GANs, são compostas de um modelo generativo e um modelo discriminativo, onde o modelo generativo visa gerar a saída mais verdadeira que será alimentada na discriminativa que visa diferenciar a imagem gerada e verdadeira.
**93. [Training, Noise, Real-world image, Generator, Discriminator, Real Fake]** -⟶ +⟶ [Treinamento, Ruído, Imagem real, Gerador, Discriminador, Falsa real]
**94. Remark: use cases using variants of GANs include text to image, music generation and synthesis.** -⟶ +⟶ Observação: casos de uso usando variações de GANs incluem texto para imagem, geração de música e síntese.
**95. ResNet ― The Residual Network architecture (also called ResNet) uses residual blocks with a high number of layers meant to decrease the training error. The residual block has the following characterizing equation:** -⟶ +⟶ ResNet - A arquitetura de Rede Residual (também chamada de ResNet) usa blocos residuais com um alto número de camadas para diminuir o erro de treinamento. O bloco residual possui a seguinte equação caracterizadora:
**96. Inception Network ― This architecture uses inception modules and aims at giving a try at different convolutions in order to increase its performance through features diversification. In particular, it uses the 1×1 convolution trick to limit the computational burden.** -⟶ +⟶ Rede de Iniciação - Esta arquitetura utiliza módulos de iniciação e visa experimentar diferentes convoluções, a fim de aumentar seu desempenho através da diversificação de recursos. Em particular, ele usa o truque de convolução 1×1 para limitar a carga computacional.
@@ -699,7 +699,7 @@ **100. Reviewed by X, Y and Z** -⟶ Revisado por +⟶ Revisado por X, Y e Z
@@ -713,6 +713,6 @@ **102. By X and Y** -⟶ +⟶ Por X e Y
From 9853e1c8a1abcf1f21104cd7e5154603688c2407 Mon Sep 17 00:00:00 2001 From: Leticia Portella Date: Sat, 16 Feb 2019 14:00:41 +0000 Subject: [PATCH 145/531] Added revision fixes for CNN in pt --- pt/convolutional-neural-networks.md | 30 ++++++++++++++--------------- 1 file changed, 15 insertions(+), 15 deletions(-) diff --git a/pt/convolutional-neural-networks.md b/pt/convolutional-neural-networks.md index 5d7565a7c..d31fb0695 100644 --- a/pt/convolutional-neural-networks.md +++ b/pt/convolutional-neural-networks.md @@ -60,7 +60,7 @@ **9. [Face verification/recognition, One shot learning, Siamese network, Triplet loss]** -⟶ [Verificação / reconhecimento facial, Aprendizado de um tiro, Rede siamesa, Perda tripla] +⟶ [Verificação / reconhecimento facial, Aprendizado de disparo único, Rede siamesa, Perda tripla]
@@ -88,14 +88,14 @@ **13. Architecture of a traditional CNN ― Convolutional neural networks, also known as CNNs, are a specific type of neural networks that are generally composed of the following layers:** -⟶ Arquitetura de uma RNC (CNN) - Redes neurais convolucionais, também conhecidas como CNN (em inglês), são tipos específicos de redes neurais que geralmente são compostas pelas seguintes camadas: +⟶ Arquitetura de uma RNC tradicional (CNN) - Redes neurais convolucionais, também conhecidas como CNN (em inglês), são tipos específicos de redes neurais que geralmente são compostas pelas seguintes camadas:
**14. The convolution layer and the pooling layer can be fine-tuned with respect to hyperparameters that are described in the next sections.** -⟶ A camada convolucional e a camadas de pooling podem ter um ajuste fino considerando os hiperparâmetros que estão descritos na próxima seção. +⟶ A camada convolucional e a camadas de pooling podem ter um ajuste fino considerando os hiperparâmetros que estão descritos nas próximas seções.
@@ -109,7 +109,7 @@ **16. Convolution layer (CONV) ― The convolution layer (CONV) uses filters that perform convolution operations as it is scanning the input I with respect to its dimensions. Its hyperparameters include the filter size F and stride S. The resulting output O is called feature map or activation map.** -⟶ Camada convolucional (CONV) - A camada convolucional (CONV) usa filtros que realizam operações de convolução conforme eles escabeuan a entrada I com relação a suas dimensões. Seus hiperparâmetros incluem o tamanho do filtro F e o passo S. O resultado O é chamado de mapa de recursos (feature map) ou mapa de ativação. +⟶ Camada convolucional (CONV) - A camada convolucional (CONV) usa filtros que realizam operações de convolução conforme eles escaneiam a entrada I com relação a suas dimensões. Seus hiperparâmetros incluem o tamanho do filtro F e o passo S. O resultado O é chamado de mapa de recursos (feature map) ou mapa de ativação.
@@ -123,7 +123,7 @@ **18. Pooling (POOL) ― The pooling layer (POOL) is a downsampling operation, typically applied after a convolution layer, which does some spatial invariance. In particular, max and average pooling are special kinds of pooling where the maximum and average value is taken, respectively.** -⟶ Pooling (POOL) - A camada de pooling (POOL) é uma operação de amostragem, tipicamente aplicada depois de uma camada convolucional, que faz alguma invariância espacial. Em particular, pooling máximo e médio são casos especiais de pooling onde o máximo e o médio valor são obtidos, respectivamente. +⟶ Pooling (POOL) - A camada de pooling (POOL) é uma operação de amostragem (downsampling), tipicamente aplicada depois de uma camada convolucional, que faz alguma invariância espacial. Em particular, pooling máximo e médio são casos especiais de pooling onde o máximo e o médio valor são obtidos, respectivamente.
@@ -144,7 +144,7 @@ **21. [Preserves detected features, Most commonly used, Downsamples feature map, Used in LeNet]** -⟶ [Preserva os recursos detectados, Mais comumente usados, Mapa de recursos de amostragem, Usado no LeNet] +⟶ [Preserva os recursos detectados, Mais comumente usados, Mapa de recursos de amostragem (downsample), Usado no LeNet]
@@ -201,7 +201,7 @@ **29. Zero-padding ― Zero-padding denotes the process of adding P zeroes to each side of the boundaries of the input. This value can either be manually specified or automatically set through one of the three modes detailed below:** -⟶ Zero preenchimento (Zero-padding) - Zero preenchimento denota o processo de adicionar P zeros em cada lado das fronteiras do input. Esse valor pode ser especificado manualmente ou automaticamente ajustado através de um dos três modelos abaixo: +⟶ Zero preenchimento (Zero-padding) - Zero preenchimento denota o processo de adicionar P zeros em cada lado das fronteiras de entrada. Esse valor pode ser especificado manualmente ou automaticamente ajustado através de um dos três modelos abaixo:
@@ -286,14 +286,14 @@ **41. Receptive field ― The receptive field at layer k is the area denoted Rk×Rk of the input that each pixel of the k-th activation map can 'see'. By calling Fj the filter size of layer j and Si the stride value of layer i and with the convention S0=1, the receptive field at layer k can be computed with the formula:** -⟶ Campo receptive - O campo receptivo na camada k é a área denotada por Rk×Rk da entrada que cada pixel do k-ésimo mapa de ativação pode 'ver'. Ao chamar Fj o tamanho do filtro da camada j e Si o valor do passo da camada i e com a convenção S0=1, o campo receptivo na camada k pode ser calculado com a fórmula: +⟶ Campo receptivo - O campo receptivo na camada k é a área denotada por Rk×Rk da entrada que cada pixel do k-ésimo mapa de ativação pode 'ver'. Ao chamar Fj o tamanho do filtro da camada j e Si o valor do passo da camada i e com a convenção S0=1, o campo receptivo na camada k pode ser calculado com a fórmula:
**42. In the example below, we have F1=F2=3 and S1=S2=1, which gives R2=1+2⋅1+2⋅1=5.** -⟶ No exemplo abaixo, temos que F1=F2=3 e S1=S2=1, que resulta em R2=1+2⋅1+2⋅1=5. +⟶ No exemplo abaixo, temos que F1=F2=3 e S1=S2=1, o que resulta em R2=1+2⋅1+2⋅1=5.
@@ -412,7 +412,7 @@ **59. Intersection over Union ― Intersection over Union, also known as IoU, is a function that quantifies how correctly positioned a predicted bounding box Bp is over the actual bounding box Ba. It is defined as:** -⟶ Interseção sobre União (Intersection over Union) - Interseção sobre União, também conhecida como IoU, é uma funçãi que quantifica quão corretamente posicionado uma caixa de delimitação predita Bp está sobre a caixa de delimitação real Ba. É definida por: +⟶ Interseção sobre União (Intersection over Union) - Interseção sobre União, também conhecida como IoU, é uma função que quantifica quão corretamente posicionado uma caixa de delimitação predita Bp está sobre a caixa de delimitação real Ba. É definida por:
@@ -426,14 +426,14 @@ **61. Anchor boxes ― Anchor boxing is a technique used to predict overlapping bounding boxes. In practice, the network is allowed to predict more than one box simultaneously, where each box prediction is constrained to have a given set of geometrical properties. For instance, the first prediction can potentially be a rectangular box of a given form, while the second will be another rectangular box of a different geometrical form.** -⟶ Caixas de ancoragem (Anchor boxes) - Caixas de ancoragem é uma técnica usada para predizer caixas de delimitação que se sobrepões. Na prática, a rede tem permissão para predizer mais de uma caixa simultaneamente, oonde cada caixa prevista é restrita a ter um dado conjunto de propriedades geométricas. Por exemplo, a primeira predição pode ser potencialmente uma caixa retangular de uma determinada forma, enquanto a segunda pode ser outra caixa retangular de uma forma geométrica diferente. +⟶ Caixas de ancoragem (Anchor boxes) - Caixas de ancoragem é uma técnica usada para predizer caixas de delimitação que se sobrepõem. Na prática, a rede tem permissão para predizer mais de uma caixa simultaneamente, onde cada caixa prevista é restrita a ter um dado conjunto de propriedades geométricas. Por exemplo, a primeira predição pode ser potencialmente uma caixa retangular de uma determinada forma, enquanto a segunda pode ser outra caixa retangular de uma forma geométrica diferente.
**62. Non-max suppression ― The non-max suppression technique aims at removing duplicate overlapping bounding boxes of a same object by selecting the most representative ones. After having removed all boxes having a probability prediction lower than 0.6, the following steps are repeated while there are boxes remaining:** -⟶ Supressão não máxima (Non-max suppression) - A técnica supressão não máxima visa remover caixas de delimitação de um mesmo objeto que estão duplicadas e se sobrepõe, selecionando as mais representativas. Depois de ter removido todas as caixas que contém uma predição menor que 0.6. os seguintes passos são repetidos enquanto existem caixas remanescentes: +⟶ Supressão não máxima (Non-max suppression) - A técnica supressão não máxima visa remover caixas de delimitação de um mesmo objeto que estão duplicadas e se sobrepõem, selecionando as mais representativas. Depois de ter removido todas as caixas que contém uma predição menor que 0.6. os seguintes passos são repetidos enquanto existem caixas remanescentes:
@@ -461,14 +461,14 @@ **66. [Step 1: Divide the input image into a G×G grid., Step 2: For each grid cell, run a CNN that predicts y of the following form:, repeated k times]** -⟶ [Passo 1: Divide a imagem de input em uma grade G×G., Passo 2: Para cada célula da grade, rode uma CNN que prevê o valor y da seguinte forma:, repita k vezes] +⟶ [Passo 1: Divide a imagem de entrada em uma grade G×G., Passo 2: Para cada célula da grade, roda uma CNN que prevê o valor y da seguinte forma:, repita k vezes]
**67. where pc is the probability of detecting an object, bx,by,bh,bw are the properties of the detected bouding box, c1,...,cp is a one-hot representation of which of the p classes were detected, and k is the number of anchor boxes.** -⟶ onde pc é a probabilidade de detecção do objeto, bx,by,bh,bw são as proprioedades das caixas delimitadoras detectadas, c1,...,cp é uma representação única (one-hot representation) de quais das classes p foram detectadas, e k é o número de caixas de ancoragem. +⟶ onde pc é a probabilidade de detecção do objeto, bx,by,bh,bw são as propriedades das caixas delimitadoras detectadas, c1,...,cp é uma representação única (one-hot representation) de quais das classes p foram detectadas, e k é o número de caixas de ancoragem.
@@ -545,7 +545,7 @@ **78. One Shot Learning ― One Shot Learning is a face verification algorithm that uses a limited training set to learn a similarity function that quantifies how different two given images are. The similarity function applied to two images is often noted d(image 1,image 2).** -⟶ Aprendizado de Tiro Único (One Shot Learning) - One Shot Learning é um algoritmo de verificação facial que utiliza um conjunto de treinamento limitado para aprender uma função de similaridade que quantifica o quão diferentes são as duas imagens. A função de similaridade aplicada a duas imagens é frequentemente denotada como d(imagem 1, imagem 2). +⟶ Aprendizado de Disparo Único (One Shot Learning) - One Shot Learning é um algoritmo de verificação facial que utiliza um conjunto de treinamento limitado para aprender uma função de similaridade que quantifica o quão diferentes são as duas imagens. A função de similaridade aplicada a duas imagens é frequentemente denotada como d(imagem 1, imagem 2).
From 8b9b0c47cda406c3bd5258e4ce56585ffa92e4af Mon Sep 17 00:00:00 2001 From: Shervine Amidi Date: Sun, 17 Feb 2019 11:11:02 -0800 Subject: [PATCH 146/531] Merge #129 --- fa/convolutional-neural-networks.md | 923 ++++++++++++++++++++++++++++ 1 file changed, 923 insertions(+) create mode 100644 fa/convolutional-neural-networks.md diff --git a/fa/convolutional-neural-networks.md b/fa/convolutional-neural-networks.md new file mode 100644 index 000000000..ee4201100 --- /dev/null +++ b/fa/convolutional-neural-networks.md @@ -0,0 +1,923 @@ +**Convolutional Neural Networks translation** + +
+ +**1. Convolutional Neural Networks cheatsheet** + +
+راهنمای کوتاه شبکه‌های عصبی پیچشی (کانولوشنی) +
+ +
+ + +**2. CS 230 - Deep Learning** + +
+کلاس CS 230 - یادگیری عمیق +
+
+ +
+ + +**3. [Overview, Architecture structure]** + +
+[نمای کلی، ساختار معماری] +
+ +
+ + +**4. [Types of layer, Convolution, Pooling, Fully connected]** + +
+[انواع لایه، کانولوشنی، ادغام، تمام‌متصل] +
+ +
+ + +**5. [Filter hyperparameters, Dimensions, Stride, Padding]** + +
+[ابرفراسنج‌های فیلتر، ابعاد، گام، حاشیه] +
+
+ +
+ + +**6. [Tuning hyperparameters, Parameter compatibility, Model complexity, Receptive field]** + +
+[تنظیم ابرفراسنج‌ها، سازش‌پذیری فراسنج، پیچیدگی مدل، ناحیه‌ی تاثیر] +
+ +
+ + +**7. [Activation functions, Rectified Linear Unit, Softmax]** + +
+[توابع فعال‌سازی، تابع یکسوساز خطی، تابع بیشینه‌ی هموار] +
+ +
+ + +**8. [Object detection, Types of models, Detection, Intersection over Union, Non-max suppression, YOLO, R-CNN]** + +
+[شناسایی شیء، انواع مدل‌ها، شناسایی، نسبت هم‌پوشانی اشتراک به اجتماع، فروداشت غیربیشینه، YOLO، R-CNN] +
+ +
+ + +**9. [Face verification/recognition, One shot learning, Siamese network, Triplet loss]** + +
+[تایید/بازشناسایی چهره، یادگیری یک‌باره‌ای (One shot)، شبکه‌ی Siamese، خطای سه‌گانه] +
+ +
+ + +**10. [Neural style transfer, Activation, Style matrix, Style/content cost function]** + +
+[انتقالِ سبکِ عصبی، فعال سازی، ماتریسِ سبک، تابع هزینه‌ی محتوا/سبک] +
+ +
+ + +**11. [Computational trick architectures, Generative Adversarial Net, ResNet, Inception Network]** + +
+[معماری‌های با ترفندهای محاسباتی، شبکه‌ی هم‌آوردِ مولد، ResNet، شبکه‌ی Inception] +
+ +
+ + +**12. Overview** + +
+نمای کلی +
+ +
+ + +**13. Architecture of a traditional CNN ― Convolutional neural networks, also known as CNNs, are a specific type of neural networks that are generally composed of the following layers:** + +
+معماری یک CNN سنتی – شبکه‌های عصبی مصنوعی پیچشی، که همچنین با عنوان CNN شناخته می شوند، یک نوع خاص از شبکه های عصبی هستند که عموما از لایه‌های زیر تشکیل شده‌اند: +
+ +
+ + +**14. The convolution layer and the pooling layer can be fine-tuned with respect to hyperparameters that are described in the next sections.** + +
+لایه‌ی کانولوشنی و لایه‌ی ادغام می‌توانند به نسبت ابرفراسنج‌هایی که در بخش‌های بعدی بیان شده‌اند تنظیم و تعدیل شوند. +
+ +
+ + +**15. Types of layer** + +
+انواع لایه‌ها +
+ +
+ + +**16. Convolution layer (CONV) ― The convolution layer (CONV) uses filters that perform convolution operations as it is scanning the input I with respect to its dimensions. Its hyperparameters include the filter size F and stride S. The resulting output O is called feature map or activation map.** + +
+لایه کانولوشنی (CONV) - لایه کانولوشنی (CONV) از فیلترهایی استفاده می‌کند که عملیات کانولوشنی را در هنگام پویش ورودی I به نسبت ابعادش، اجرا می‌کند. ابرفراسنج‌های آن شامل اندازه فیلتر F و گام S هستند. خروجی حاصل شده O نگاشت ویژگی یا نگاشت فعال‌سازی نامیده می‌شود. +
+ +
+ + +**17. Remark: the convolution step can be generalized to the 1D and 3D cases as well.** + +
+نکته: مرحله کانولوشنی همچنین می‌تواند به موارد یک بُعدی و سه بُعدی تعمیم داده شود. +
+ +
+ + +**18. Pooling (POOL) ― The pooling layer (POOL) is a downsampling operation, typically applied after a convolution layer, which does some spatial invariance. In particular, max and average pooling are special kinds of pooling where the maximum and average value is taken, respectively.** + +
+لایه ادغام (POOL) - لایه ادغام (POOL) یک عمل نمونه‌کاهی است، که معمولا بعد از یک لایه کانولوشنی اعمال می‌شود، که تا حدی منجر به ناوردایی مکانی می‌شود. به طور خاص، ادغام بیشینه و میانگین انواع خاص ادغام هستند که به ترتیب مقدار بیشینه و میانگین گرفته می‌شود. +
+ +
+ + +**19. [Type, Purpose, Illustration, Comments]** + +
+[نوع، هدف، نگاره، توضیحات] +
+ +
+ + +**20. [Max pooling, Average pooling, Each pooling operation selects the maximum value of the current view, Each pooling operation averages the values of the current view]** + +
+[ادغام بیشینه، ادغام میانگین، هر عمل ادغام مقدار بیشینه‌ی نمای فعلی را انتخاب می‌کند، هر عمل ادغام مقدار میانگینِ نمای فعلی را انتخاب می‌کند] +
+ +
+ + +**21. [Preserves detected features, Most commonly used, Downsamples feature map, Used in LeNet]** + +
+[ویژگی‌های شناسایی شده را حفظ می‌کند، اغلب مورد استفاده قرار می‌گیرد، کاستن نگاشت ویژگی، در (معماری) LeNet استفاده شده است] +
+ +
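As an editorial illustration (not part of the original cheatsheet), here is a minimal NumPy sketch of the CONV and max-pooling operations described above, restricted to a single channel and a single filter; the array sizes and values are made up:

```python
import numpy as np

def conv2d_single_filter(x, w, b=0.0, stride=1):
    """Valid convolution of one F x F filter over a 2D input, producing one feature map."""
    f = w.shape[0]
    out = (x.shape[0] - f) // stride + 1
    fmap = np.zeros((out, out))
    for i in range(out):
        for j in range(out):
            patch = x[i*stride:i*stride+f, j*stride:j*stride+f]
            fmap[i, j] = np.sum(patch * w) + b
    return fmap

def max_pool(x, f=2, stride=2):
    """Max pooling: keep the maximum value of each f x f view."""
    out = (x.shape[0] - f) // stride + 1
    pooled = np.zeros((out, out))
    for i in range(out):
        for j in range(out):
            pooled[i, j] = x[i*stride:i*stride+f, j*stride:j*stride+f].max()
    return pooled

x = np.arange(36, dtype=float).reshape(6, 6)   # toy 6x6 input
w = np.array([[1., 0.], [0., -1.]])            # toy 2x2 filter
fmap = conv2d_single_filter(x, w, stride=1)    # 5x5 feature map
print(max_pool(fmap).shape)                    # (2, 2) after 2x2 max pooling with stride 2
```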
+ + +**22. Fully Connected (FC) ― The fully connected layer (FC) operates on a flattened input where each input is connected to all neurons. If present, FC layers are usually found towards the end of CNN architectures and can be used to optimize objectives such as class scores.** + +
+تمام‌متصل (FC) - لایه‌ی تمام‌متصل (FC) بر روی یک ورودی مسطح به طوری ‌که هر ورودی به تمامی نورون‌ها متصل است، عمل می‌کند. در صورت وجود، لایه‌های FC معمولا در انتهای معماری‌های CNN یافت می‌شوند و می‌توان آن‌ها را برای بهینه‌سازی اهدافی مثل امتیازات کلاس به‌ کار برد. +
+
+ + +**23. Filter hyperparameters** + +
+ابرفراسنج‌های فیلتر +
+ +
+ + +**24. The convolution layer contains filters for which it is important to know the meaning behind its hyperparameters.** + +
+لایه کانولوشنی شامل فیلترهایی است که دانستن مفهوم نهفته در فراسنج‌های آن اهمیت دارد. +
+ +
+ + +**25. Dimensions of a filter ― A filter of size F×F applied to an input containing C channels is a F×F×C volume that performs convolutions on an input of size I×I×C and produces an output feature map (also called activation map) of size O×O×1.** + +
+ابعاد یک فیلتر - یک فیلتر به اندازه F×F اعمال شده بر روی یک ورودیِ حاوی C کانال، یک توده F×F×C است که (عملیات) پیچشی بر روی یک ورودی به اندازه I×I×C اعمال می‌کند و یک نگاشت ویژگی خروجی (که همچنین نگاشت فعال‌سازی نامیده می‌شود) به اندازه O×O×1 تولید می‌کند. +
+ +
+ + +**26. Filter** + +
+فیلتر +
+ +
+ + +**27. Remark: the application of K filters of size F×F results in an output feature map of size O×O×K.** + +
+نکته: اعمال K فیلتر به اندازه‌ی F×F، منتج به یک نگاشت ویژگی خروجی به اندازه O×O×K می‌شود. +
+ +
+ + +**28. Stride ― For a convolutional or a pooling operation, the stride S denotes the number of pixels by which the window moves after each operation.** + +
+گام – در یک عملیات ادغام یا پیچشی، اندازه گام S به تعداد پیکسل‌هایی که پنجره بعد از هر عملیات جابه‌جا می‌شود، اشاره دارد. +
+ +
+ + +**29. Zero-padding ― Zero-padding denotes the process of adding P zeroes to each side of the boundaries of the input. This value can either be manually specified or automatically set through one of the three modes detailed below:** + +
+حاشیه‌ی صفر – حاشیه‌ی صفر به فرآیند افزودن P صفر به هر طرف از کرانه‌های ورودی اشاره دارد. این مقدار می‌تواند به طور دستی مشخص شود یا به طور خودکار به سه روش زیر تعیین گردد: +
+ +
+ + +**30. [Mode, Value, Illustration, Purpose, Valid, Same, Full]** + +
+[نوع، مقدار، نگاره، هدف، Valid، Same، Full] +
+ +
+ + +**31. [No padding, Drops last convolution if dimensions do not match, Padding such that feature map size has size ⌈IS⌉, Output size is mathematically convenient, Also called 'half' padding, Maximum padding such that end convolutions are applied on the limits of the input, Filter 'sees' the input end-to-end]** + +
+[فاقد حاشیه، اگر ابعاد مطابقت ندارد آخرین کانولوشنی را رها کن، (اعمال) حاشیه به طوری که اندازه نگاشت ویژگی ⌈IS⌉ باشد، (محاسبه) اندازه خروجی به لحاظ ریاضیاتی آسان است، همچنین حاشیه‌ی 'نیمه' نامیده می‌شود، بالاترین حاشیه (اعمال می‌شود) به طوری که (عملیات) کانولوشنی انتهایی بر روی مرزهای ورودی اعمال می‌شود، فیلتر ورودی را به صورت پکپارچه 'می‌پیماید'] +
+ +
+ + +**32. Tuning hyperparameters** + +
+تنظیم ابرفراسنج‌ها +
+ +
+ + +**33. Parameter compatibility in convolution layer ― By noting I the length of the input volume size, F the length of the filter, P the amount of zero padding, S the stride, then the output size O of the feature map along that dimension is given by:** + +
+سازش‌پذیری فراسنج در لایه کانولوشنی – با ذکر I به عنوان طول اندازه توده ورودی، F طول فیلتر، P میزان حاشیه‌ی صفر، S گام، اندازه خروجی نگاشت ویژگی O در امتداد ابعاد خواهد بود: +
+ +
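A small companion sketch of the output-size relation above; the helper name and the sample sizes are ours, not from the cheatsheet:

```python
def conv_output_size(i, f, s, p_start=0, p_end=0):
    # O = (I - F + P_start + P_end) / S + 1, along one spatial dimension
    return (i - f + p_start + p_end) // s + 1

# e.g. a 32-pixel input, 5x5 filter, stride 1 and padding of 2 on each side keeps the size:
print(conv_output_size(32, 5, 1, 2, 2))  # 32
```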
+ + +**34. [Input, Filter, Output]** + +
+[ورودی، فیلتر، خروجی] +
+ +
+ + +**35. Remark: often times, Pstart=Pend≜P, in which case we can replace Pstart+Pend by 2P in the formula above.** + +
+نکته: اغلب Pstart=Pend≜P است، در این صورت Pstart+Pend را می‌توان با 2 Pدر فرمول بالا جایگزین کرد. +
+ +
+ + +**36. Understanding the complexity of the model ― In order to assess the complexity of a model, it is often useful to determine the number of parameters that its architecture will have. In a given layer of a convolutional neural network, it is done as follows:** + +
+درک پیچیدگی مدل – برای برآورد پیچیدگی مدل، اغلب تعیین تعداد فراسنج‌هایی که معماری آن می‌تواند داشته باشد، مفید است. در یک لایه مفروض شبکه پیچشی عصبی این امر به صورت زیر انجام می‌شود: +
+ +
+ + +**37. [Illustration, Input size, Output size, Number of parameters, Remarks]** + +
+[نگاره، اندازه ورودی، اندازه خروجی، تعداد فراسنج‌ها، ملاحظات] +
+ +
+ + +**38. [One bias parameter per filter, In most cases, S<F, A common choice for K is 2C]** + +[یک پیش‌قدر به ازای هر فیلتر، در بیشتر موارد S<F است، یک انتخاب رایج برای K، 2C است] +
+ + +
+ + +**39. [Pooling operation done channel-wise, In most cases, S=F]** + +
+[عملیات ادغام به صورت کانال‌به‌کانال انجام میشود، در بیشتر موارد S=F است] +
+ +
+ +**40. [Input is flattened, One bias parameter per neuron, The number of FC neurons is free of structural constraints]** + +
+[ورودی مسطح شده است، یک پیش‌قدر به ازای هر نورون، تعداد نورون‌های FC فاقد محدودیت‌های ساختاری‌ست] +
+ +
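The parameter counts behind the table above can be reproduced with a short script; the layer sizes used below are arbitrary examples:

```python
def conv_params(f, c, k):
    # CONV layer: K filters of size F x F x C, plus one bias per filter
    return (f * f * c + 1) * k

def fc_params(n_in, n_out):
    # FC layer: one weight per connection plus one bias per neuron
    return (n_in + 1) * n_out

# pooling layers have no learnable parameters
print(conv_params(f=3, c=3, k=16))      # 448
print(fc_params(n_in=1024, n_out=10))   # 10250
```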
+ + +**41. Receptive field ― The receptive field at layer k is the area denoted Rk×Rk of the input that each pixel of the k-th activation map can 'see'. By calling Fj the filter size of layer j and Si the stride value of layer i and with the convention S0=1, the receptive field at layer k can be computed with the formula:** + +
+ناحیه تاثیر – ناحیه تاثیر در لایه k محدوده‌ای از ورودی Rk×Rk است که هر پیکسلِ kاٌم نگاشت ویژگی می‌تواند 'ببیند'. با ذکر Fj به عنوان اندازه فیلتر لایه j و Si مقدار گام لایه i و با این توافق که S0=1 است، ناحیه تاثیر در لایه k با فرمول زیر محاسبه می‌شود: +
+ +
+ + +**42. In the example below, we have F1=F2=3 and S1=S2=1, which gives R2=1+2⋅1+2⋅1=5.** + +
+در مثال زیر داریم، F1=F2=3 و S1=S2=1 که منتج به R2=1+2⋅1+2⋅1=5 می‌شود. +
+ +
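A sketch of the receptive-field formula, reproducing the worked example above (F1=F2=3 and S1=S2=1 give R2=5); the function name is an assumption of this note:

```python
def receptive_field(filter_sizes, strides):
    # R_k = 1 + sum_j (F_j - 1) * prod_{i<j} S_i, with S_0 = 1 by convention
    r, jump = 1, 1
    for f, s in zip(filter_sizes, strides):
        r += (f - 1) * jump
        jump *= s
    return r

print(receptive_field([3, 3], [1, 1]))  # 5
```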
+ + +**43. Commonly used activation functions** + +
+توابع فعال‌سازی پرکاربرد +
+ +
+ + +**44. Rectified Linear Unit ― The rectified linear unit layer (ReLU) is an activation function g that is used on all elements of the volume. It aims at introducing non-linearities to the network. Its variants are summarized in the table below:** + +
+تابع یکسوساز خطی – تابع یکسوساز خطی (ReLU) یک تابع فعال‌سازی g است که بر روی تمامی عناصر توده اعمال می‌شود. هدف آن ارائه (رفتار) غیرخطی به شبکه است. انواع آن در جدول زیر به‌صورت خلاصه آمده‌اند: +
+ +
+ + +**45. [ReLU, Leaky ReLU, ELU, with]** + +
+[ReLU، ReLU نشت‌دار، ELU، با] +
+ +
+ + +**46. [Non-linearity complexities biologically interpretable, Addresses dying ReLU issue for negative values, Differentiable everywhere]** + +
+[پیچیدگی‌های غیر خطی که از دیدگاه زیستی قابل تفسیر هستند، مسئله افول ReLU برای مقادیر منفی را مهار می‌کند، در تمامی نقاط مشتق‌پذیر است] +
+ +
+ + +**47. Softmax ― The softmax step can be seen as a generalized logistic function that takes as input a vector of scores x∈Rn and outputs a vector of output probability p∈Rn through a softmax function at the end of the architecture. It is defined as follows:** + +
+بیشینه‌ی هموار – مرحله بیشینه‌ی هموار را می‌توان به عنوان یک تابع لجستیکی تعمیم داده شده که یک بردار x∈Rn را از ورودی می‌گیرد و یک بردار خروجی احتمال p∈Rn، به‌واسطه‌ی تابع بیشینه‌ی هموار در انتهای معماری، تولید می‌کند. این تابع به‌صورت زیر تعریف می‌شود: +
+ +
+ + +**48. where** + +
+که +
+ +
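An illustrative NumPy version of the activation functions above (the ReLU variants and the softmax step); the leaky slope and α defaults are arbitrary choices:

```python
import numpy as np

def relu(x):
    return np.maximum(0, x)

def leaky_relu(x, slope=0.01):
    # small non-zero slope for negative values addresses the dying ReLU issue
    return np.where(x > 0, x, slope * x)

def elu(x, alpha=1.0):
    # smooth for negative values, differentiable everywhere
    return np.where(x > 0, x, alpha * (np.exp(x) - 1))

def softmax(x):
    # generalized logistic function turning a vector of scores into probabilities
    e = np.exp(x - np.max(x))  # shift for numerical stability
    return e / e.sum()

scores = np.array([2.0, 1.0, -1.0])
print(softmax(scores).sum())  # 1.0 (probabilities sum to one)
```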
+ + +**49. Object detection** + +
+شناسایی شیء +
+ +
+ + +**50. Types of models ― There are 3 main types of object recognition algorithms, for which the nature of what is predicted is different. They are described in the table below:** + +
+انواع مدل‌ – سه نوع اصلی از الگوریتم‌های بازشناسایی وجود دارد، که ماهیت آنچه‌که شناسایی شده متفاوت است. این الگوریتم‌ها در جدول زیر توضیح داده شده‌اند: +
+ +
+ + +**51. [Image classification, Classification w. localization, Detection]** + +
+[دسته‌بندی تصویر، دسته‌بندی با موقعیت‌یابی، شناسایی] +
+ +
+ + +**52. [Teddy bear, Book]** + +
+[خرس تدی، کتاب] +
+ +
+ + +**53. [Classifies a picture, Predicts probability of object, Detects an object in a picture, Predicts probability of object and where it is located, Detects up to several objects in a picture, Predicts probabilities of objects and where they are located]** + +
+[یک عکس را دسته‌بندی می‌کند، احتمال شیء را پیش‌بینی می‌کند، یک شیء را در یک عکس شناسایی می‌کند، احتمال یک شیء و موقعیت آن را پیش‌بینی میکند، چندین شیء در یک عکس را شناسایی می‌کند، احتمال اشیاء و موقعیت آنها را پیش‌بینی می‌کند] +
+ +
+ + +**54. [Traditional CNN, Simplified YOLO, R-CNN, YOLO, R-CNN]** + +
+[CNN سنتی، YOLO ساده شده، R-CNN، YOLO، R-CNN] +
+ +
+ + +**55. Detection ― In the context of object detection, different methods are used depending on whether we just want to locate the object or detect a more complex shape in the image. The two main ones are summed up in the table below:** + +
+شناسایی – در مضمون شناسایی شیء، روشهای مختلفی بسته به اینکه آیا فقط می‌خواهیم موقعیت قرارگیری شیء را پیدا کنیم یا شکل پیچیده‌تری در تصویر را شناسایی کنیم، استفاده می‌شوند. دو مورد از اصلی ترین آنها در جدول زیر به‌صورت خلاصه آورده‌ شده‌اند: +
+ +
+ + +**56. [Bounding box detection, Landmark detection]** + +
+[پیش‌بینی کادر محصورکننده، شناسایی نقاط (برجسته)] +
+ +
+ + +**57. [Detects the part of the image where the object is located, Detects a shape or characteristics of an object (e.g. eyes), More granular]** + +
+[بخشی از تصویر که شیء در آن قرار گرفته را شناسایی می‌کند، یک شکل یا مشخصات یک شیء (مثل چشم‌ها) را شناسایی می‌کند، موشکافانه‌تر] +
+ +
+ + +**58. [Box of center (bx,by), height bh and width bw, Reference points (l1x,l1y), ..., (lnx,lny)]** + +
+[مرکزِ کادر (bx,by)، ارتفاع bh و عرض bw، نقاط مرجع (l1x,l1y), ..., (lnx,lny)] +
+ +
+ + +**59. Intersection over Union ― Intersection over Union, also known as IoU, is a function that quantifies how correctly positioned a predicted bounding box Bp is over the actual bounding box Ba. It is defined as:** + +
+نسبت هم‌پوشانی اشتراک به اجتماع - نسبت هم‌پوشانی اشتراک به اجتماع، همچنین به عنوان IoU شناخته می‌شود، تابعی‌ است که میزان موقعیت دقیق کادر محصورکننده Bp نسبت به کادر محصورکننده حقیقی Ba را می‌سنجد. این تابع به‌صورت زیر تعریف می‌شود: +
+ +
+ + +**60. Remark: we always have IoU∈[0,1]. By convention, a predicted bounding box Bp is considered as being reasonably good if IoU(Bp,Ba)⩾0.5.** + +
+نکته: همواره داریم IoU∈[0,1]. به صورت قرارداد، یک کادر محصورکننده Bp را می‌توان نسبتا خوب در نظر گرفت اگر IoU(Bp,Ba)⩾0.5 باشد. +
+ +
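A minimal IoU helper for axis-aligned boxes given as (x1, y1, x2, y2) corners; the corner format is our assumption, since the cheatsheet describes boxes by center, height and width:

```python
def iou(a, b):
    """Intersection over Union of two (x1, y1, x2, y2) boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2]-a[0])*(a[3]-a[1]) + (b[2]-b[0])*(b[3]-b[1]) - inter
    return inter / union

print(iou((0, 0, 2, 2), (1, 1, 3, 3)))  # 1/7 ≈ 0.14, too small to count as a good prediction
```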
+ + +**61. Anchor boxes ― Anchor boxing is a technique used to predict overlapping bounding boxes. In practice, the network is allowed to predict more than one box simultaneously, where each box prediction is constrained to have a given set of geometrical properties. For instance, the first prediction can potentially be a rectangular box of a given form, while the second will be another rectangular box of a different geometrical form.** + +
+کادرهای محوری – کادر بندی محوری روشی است که برای پیش‌بینی کادرهای محصورکننده هم‌پوشان استفاده می‌شود. در عمل، شبکه این اجازه را دارد که بیش از یک کادر به‌صورت هم‌زمان پیش‌بینی کند جایی‌که هر پیش‌بینی کادر مقید به داشتن یک مجموعه خصوصیات هندسی مفروض است. به عنوان مثال، اولین پیش‌بینی می‌تواند یک کادر مستطیلی با قالب خاص باشد حال آنکه کادر دوم، یک کادر مستطیلی محوری با قالب هندسی متفاوتی خواهد بود. +
+ +
+ + +**62. Non-max suppression ― The non-max suppression technique aims at removing duplicate overlapping bounding boxes of a same object by selecting the most representative ones. After having removed all boxes having a probability prediction lower than 0.6, the following steps are repeated while there are boxes remaining:** + +
+فروداشت غیربیشینه – هدف روش فروداشت غیربیشینه، حذف کادرهای محصورکننده هم‌پوشان تکراریِ دسته یکسان با انتخاب معرف‌ترین‌ها است. بعد از حذف همه کادرهایی که احتمال پیش‌بینی پایین‌تر از 0.6 دارند، مراحل زیر با وجود آنکه کادرهایی باقی می‌مانند، تکرار می‌شوند: +
+ +
+ + +**63. [For a given class, Step 1: Pick the box with the largest prediction probability., Step 2: Discard any box having an IoU⩾0.5 with the previous box.]** + +
+[برای یک دسته مفروض، گام اول: کادر با بالاترین احتمال پیش‌بینی را انتخاب کن، گام دوم: هر کادری که IoU≥0.5 نسبت به کادر پیشین دارد را رها کن.] +
+ +
+ + +**64. [Box predictions, Box selection of maximum probability, Overlap removal of same class, Final bounding boxes]** + +
+[پیش‌بینی کادرها، انتخاب کادرِ با احتمال بیشینه، حذف (کادر) همپوشان دسته یکسان، کادرهای محصورکننده نهایی] +
+ +
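A sketch of the non-max suppression steps above for a single class, using the 0.6 probability and 0.5 IoU thresholds from the text; box coordinates and probabilities are illustrative:

```python
def iou(a, b):
    # Intersection over Union of two (x1, y1, x2, y2) boxes
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2]-a[0])*(a[3]-a[1]) + (b[2]-b[0])*(b[3]-b[1]) - inter
    return inter / union

def non_max_suppression(boxes, probs, prob_thresh=0.6, iou_thresh=0.5):
    # keep boxes with probability >= 0.6, then repeat Steps 1-2 while boxes remain
    candidates = sorted([(p, b) for p, b in zip(probs, boxes) if p >= prob_thresh],
                        key=lambda pb: pb[0], reverse=True)
    kept = []
    while candidates:
        best_prob, best_box = candidates.pop(0)          # Step 1: largest probability
        kept.append((best_prob, best_box))
        candidates = [(p, b) for p, b in candidates      # Step 2: drop IoU >= 0.5 overlaps
                      if iou(b, best_box) < iou_thresh]
    return kept

boxes = [(0, 0, 2, 2), (0.2, 0.2, 2.2, 2.2), (5, 5, 7, 7)]
print(non_max_suppression(boxes, [0.9, 0.8, 0.7]))  # the two overlapping boxes collapse to one
```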
+ + +**65. YOLO ― You Only Look Once (YOLO) is an object detection algorithm that performs the following steps:** + +
+YOLO - «شما فقط یک‌بار نگاه می‌کنید» (YOLO) یک الگوریتم شناسایی شیء است که مراحل زیر را اجرا می‌کند: +
+ +
+ + +**66. [Step 1: Divide the input image into a G×G grid., Step 2: For each grid cell, run a CNN that predicts y of the following form:, repeated k times]** + +
+[گام اول: تصویر ورودی را به یک مشبک G×G تقسیم کن، گام دوم: برای هر سلول مشبک، یک CNN که y را به شکل زیر پیش‌بینی می‌کند، اجرا کن:، k مرتبه تکرارشده] +
+ +
+ + +**67. where pc is the probability of detecting an object, bx,by,bh,bw are the properties of the detected bouding box, c1,...,cp is a one-hot representation of which of the p classes were detected, and k is the number of anchor boxes.** + +
+که pc احتمال شناسایی یک شیء است، bx,by,bh,bw اندازه‌های نسبی کادر محیطی شناسایی شده است، c1,...,cp نمایش «تک‌فعال» یک دسته از p دسته که تشخیص داده شده است، و k تعداد کادرهای محوری است. + +
+ +
+ + +**68. Step 3: Run the non-max suppression algorithm to remove any potential duplicate overlapping bounding boxes.** + +
+گام سوم: الگوریتم فروداشت غیربیشینه را برای حذف هر کادر محصورکننده هم‌پوشان تکراری بالقوه، اجرا کن. +
+ +
+ + +**69. [Original image, Division in GxG grid, Bounding box prediction, Non-max suppression]** + +
+[تصویر اصلی، تقسیم به GxG مشبک، پیش‌بینی کادر محصورکننده، فروداشت غیربیشینه] +
+ +
+ + +**70. Remark: when pc=0, then the network does not detect any object. In that case, the corresponding predictions bx,...,cp have to be ignored.** + +
+نکته: زمانی‌که pc=0 است، شبکه هیچ شیئی را شناسایی نمی‌کند. در چنین حالتی، پیش‌بینی‌های متناظر bx,…,cp بایستی نادیده گرفته شوند. +
+ +
+ + +**71. R-CNN ― Region with Convolutional Neural Networks (R-CNN) is an object detection algorithm that first segments the image to find potential relevant bounding boxes and then run the detection algorithm to find most probable objects in those bounding boxes.** + +
+R-CNN - ناحیه با شبکه‌های عصبی پیچشی (R-CNN) یک الگوریتم شناسایی شیء است که ابتدا تصویر را برای یافتن کادرهای محصورکننده مربوط بالقوه قطعه‌بندی می‌کند و سپس الگوریتم شناسایی را برای یافتن محتمل‌ترین اشیاء در این کادرهای محصور کننده اجرا می‌کند. +
+ +
+ + +**72. [Original image, Segmentation, Bounding box prediction, Non-max suppression]** + +
+[تصویر اصلی، قطعه بندی، پیش‌بینی کادر محصور کننده، فروداشت غیربیشینه] +
+ +
+ + +**73. Remark: although the original algorithm is computationally expensive and slow, newer architectures enabled the algorithm to run faster, such as Fast R-CNN and Faster R-CNN.** + +
+نکته: هرچند الگوریتم اصلی به لحاظ محاسباتی پرهزینه و کند است، معماری‌های جدید از قبیل Fast R-CNN و Faster R-CNN باعث شدند که الگوریتم سریعتر اجرا شود. +
+ +
+ + +**74. Face verification and recognition** + +
+تایید چهره و بازشناسایی +
+ +
+ + +**75. Types of models ― Two main types of model are summed up in table below:** + +
+انواع مدل – دو نوع اصلی از مدل در جدول زیر به‌صورت خلاصه آورده‌ شده‌اند: +
+ +
+ + +**76. [Face verification, Face recognition, Query, Reference, Database]** + +
+[تایید چهره، بازشناسایی چهره، جستار، مرجع، پایگاه داده] +
+ +
+ + +**77. [Is this the correct person?, One-to-one lookup, Is this one of the K persons in the database?, One-to-many lookup]** + +
+[فرد مورد نظر است؟، جستجوی یک‌به‌یک، این فرد یکی از K فرد پایگاه داده است؟، جستجوی یک‌به‌چند] +
+ +
+ + +**78. One Shot Learning ― One Shot Learning is a face verification algorithm that uses a limited training set to learn a similarity function that quantifies how different two given images are. The similarity function applied to two images is often noted d(image 1,image 2).** + +
+یادگیری یک‌باره‌ای – یادگیری یک‌باره‌ای یک الگوریتم تایید چهره است که از یک مجموعه آموزشی محدود برای یادگیری یک تابع مشابهت که میزان اختلاف دو تصویر مفروض را تعیین می‌کند، بهره می‌برد. تابع مشابهت اعمال‌شده بر روی دو تصویر اغلب با نماد d(image 1, image 2) نمایش داده می‌شود. +
+ +
+ + +**79. Siamese Network ― Siamese Networks aim at learning how to encode images to then quantify how different two images are. For a given input image x(i), the encoded output is often noted as f(x(i)).** + +
+شبکه‌ی Siamese - هدف شبکه‌ی Siamese یادگیری طریقه رمزنگاری تصاویر و سپس تعیین اختلاف دو تصویر است. برای یک تصویر مفروض ورودی x(i)، خروجی رمزنگاری شده اغلب با نماد f(x(i)) نمایش داده می‌شود. +
+ +
+ + +**80. Triplet loss ― The triplet loss ℓ is a loss function computed on the embedding representation of a triplet of images A (anchor), P (positive) and N (negative). The anchor and the positive example belong to a same class, while the negative example to another one. By calling α∈R+ the margin parameter, this loss is defined as follows:** + +
+خطای سه‌گانه – خطای سه‌گانه ℓ یک تابع خطا است که بر روی بازنمایی تعبیه‌ی سه‌گانه‌ی تصاویر A (محور)، P (مثبت) و N (منفی) محاسبه می‌شود. نمونه‌های محور (anchor) و مثبت به دسته یکسانی تعلق دارند، حال آنکه نمونه منفی به دسته دیگری تعلق دارد. با نامیدن α∈R+ (به عنوان) فراسنج حاشیه، این خطا به‌صورت زیر تعریف می‌شود: +
+ +
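An illustrative NumPy version of the triplet loss on three embedding vectors; the squared Euclidean distance and the margin value below are our choices:

```python
import numpy as np

def triplet_loss(f_a, f_p, f_n, alpha=0.2):
    # l(A, P, N) = max(d(A, P) - d(A, N) + alpha, 0), with d the squared L2 distance
    d_ap = np.sum((f_a - f_p) ** 2)
    d_an = np.sum((f_a - f_n) ** 2)
    return max(d_ap - d_an + alpha, 0.0)

anchor   = np.array([0.20, 0.50, 0.10])
positive = np.array([0.25, 0.45, 0.10])   # same class: close to the anchor
negative = np.array([0.90, 0.10, 0.40])   # other class: far from the anchor
print(triplet_loss(anchor, positive, negative))  # 0.0 here, since the margin is already satisfied
```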
+ + +**81. Neural style transfer** + +
+انتقالِ سبک عصبی +
+ +
+ + +**82. Motivation ― The goal of neural style transfer is to generate an image G based on a given content C and a given style S.** + +
+انگیزه – هدف انتقالِ سبک عصبی تولید یک تصویر G بر مبنای یک محتوای مفروض C و سبک مفروض S است. +
+ +
+ + +**83. [Content C, Style S, Generated image G]** + +
+[محتوای C، سبک S، تصویر تولیدشده‌ی G] +
+ +
+ + +**84. Activation ― In a given layer l, the activation is noted a[l] and is of dimensions nH×nw×nc** + +
+فعال‌سازی – در یک لایه مفروض l، فعال‌سازی با a[l] نمایش داده می‌شود و به ابعاد nH×nw×nc است +
+ +
+ + +**85. Content cost function ― The content cost function Jcontent(C,G) is used to determine how the generated image G differs from the original content image C. It is defined as follows:** + +
+تابع هزینه‌ی محتوا – تابع هزینه‌ی محتوا Jcontent(C,G) برای تعیین میزان اختلاف تصویر تولیدشده G از تصویر اصلی C استفاده می‌شود. این تابع به‌صورت زیر تعریف می‌شود: +
+ +
+ + +**86. Style matrix ― The style matrix G[l] of a given layer l is a Gram matrix where each of its elements G[l]kk′ quantifies how correlated the channels k and k′ are. It is defined with respect to activations a[l] as follows:** + +
+ماتریسِ سبک - ماتریسِ سبک G[l] یک لایه مفروض l، یک ماتریس گرَم (Gram) است که هر کدام از عناصر G[l]kk′ میزان همبستگی کانال‌های k و k′ را می‌سنجند. این ماتریس نسبت به فعال‌سازی‌های a[l] به‌صورت زیر محاسبه می‌شود: +
+ +
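A sketch of the style (Gram) matrix computed from an activation volume of shape nH×nw×nc; the random activation is only a placeholder:

```python
import numpy as np

def style_matrix(a):
    """Gram matrix G of an activation a of shape (nH, nW, nC):
    G[k, k'] sums a[..., k] * a[..., k'] over all spatial positions."""
    n_h, n_w, n_c = a.shape
    flat = a.reshape(n_h * n_w, n_c)
    return flat.T @ flat          # (nC, nC), measures channel correlations

a = np.random.rand(4, 4, 3)       # toy activation with 3 channels
print(style_matrix(a).shape)      # (3, 3)
```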
+ + +**87. Remark: the style matrix for the style image and the generated image are noted G[l] (S) and G[l] (G) respectively.** + +
+نکته: ماتریس سبک برای تصویر سبک و تصویر تولید شده، به ترتیب با G[l] (S) و G[l] (G) نمایش داده می‌شوند. +
+ +
+ + +**88. Style cost function ― The style cost function Jstyle(S,G) is used to determine how the generated image G differs from the style S. It is defined as follows:** + +
+تابع هزینه‌ی سبک – تابع هزینه‌ی سبک Jstyle(S,G) برای تعیین میزان اختلاف تصویر تولیدشده G و سبک S استفاده می‌شود. این تابع به صورت زیر تعریف می‌شود: +
+ +
+ + +**89. Overall cost function ― The overall cost function is defined as being a combination of the content and style cost functions, weighted by parameters α,β, as follows:** + +
+تابع هزینه‌ی کل – تابع هزینه‌ی کل به صورت ترکیبی از توابع هزینه‌ی سبک و محتوا تعریف شده است که با فراسنج‌های α,β, به شکل زیر وزن‌دار شده است: +
+ +
+ + +**90. Remark: a higher value of α will make the model care more about the content while a higher value of β will make it care more about the style.** + +
+نکته: مقدار بیشتر α مدل را به توجه بیشتر به محتوا وا می‌دارد حال آنکه مقدار بیشتر β مدل را به توجه بیشتر به سبک وا می‌دارد. +
+ +
+ + +**91. Architectures using computational tricks** + +
+معماری‌هایی که از ترفندهای محاسباتی استفاده می‌کنند. +
+ +
+ + +**92. Generative Adversarial Network ― Generative adversarial networks, also known as GANs, are composed of a generative and a discriminative model, where the generative model aims at generating the most truthful output that will be fed into the discriminative which aims at differentiating the generated and true image.** + +
+شبکه‌ی هم‌آوردِ مولد – شبکه‌ی هم‌آوردِ مولد، همچنین با نام GANs شناخته می‌شوند، ترکیبی از یک مدل مولد و تمیزدهنده هستند، جایی‌که مدل مولد هدفش تولید واقعی‌ترین خروجی است که به (مدل) تمیزدهنده تغذیه می‌شود و این (مدل) هدفش تفکیک بین تصویر تولیدشده و واقعی است. +
+ +
+ + +**93. [Training, Noise, Real-world image, Generator, Discriminator, Real Fake]** + +
+[آموزش، نویز، تصویر دنیای واقعی، مولد، تمیز دهنده، واقعی بدلی] +
+ +
+ + +**94. Remark: use cases using variants of GANs include text to image, music generation and synthesis.** + +
+نکته: موارد استفاده متنوع GAN ها شامل تبدیل متن به تصویر، تولید موسیقی و تلفیقی از آنهاست. +
+ +
+ + +**95. ResNet ― The Residual Network architecture (also called ResNet) uses residual blocks with a high number of layers meant to decrease the training error. The residual block has the following characterizing equation:** + +
+ResNet – معماری شبکه‌ی پسماند (همچنین با عنوان ResNet شناخته می‌شود) از بلاک‌های پسماند با تعداد لایه‌های زیاد به منظور کاهش خطای آموزش استفاده می‌کند. بلاک پسماند معادله‌ای با خصوصیات زیر دارد: +
+ +
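A toy NumPy sketch of the residual-block equation a[l+2] = g(z[l+2] + a[l]); the dense layers and random weights here stand in for the convolutional, learned layers of a real ResNet:

```python
import numpy as np

def relu(x):
    return np.maximum(0, x)

def residual_block(a_l, w1, w2):
    # two transformations, then the skip connection adds the block input back in
    a1 = relu(w1 @ a_l)
    z2 = w2 @ a1
    return relu(z2 + a_l)         # a[l+2] = g(z[l+2] + a[l])

rng = np.random.default_rng(0)
a_l = rng.normal(size=4)
w1, w2 = rng.normal(size=(4, 4)), rng.normal(size=(4, 4))
print(residual_block(a_l, w1, w2).shape)  # (4,)
```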
+ + +**96. Inception Network ― This architecture uses inception modules and aims at giving a try at different convolutions in order to increase its performance through features diversification. In particular, it uses the 1×1 convolution trick to limit the computational burden.** + +
+شبکه‌ی Inception – این معماری از ماژول‌های inception استفاده می‌کند و هدفش فرصت دادن به (عملیات) کانولوشنی مختلف برای افزایش کارایی از طریق تنوع‌بخشی ویژگی‌ها است. به طور خاص، این معماری از ترفند کانولوشنی 1×1 برای محدود سازی بار محاسباتی استفاده می‌کند. +
+ +
+ + +**97. The Deep Learning cheatsheets are now available in [target language].** + +
+راهنمای یادگیری عمیق هم اکنون به زبان [فارسی] در دسترس است. +
+ +
+ + +**98. Original authors** + +
+نویسندگان اصلی +
+ +
+ + +**99. Translated by X, Y and Z** + +
+ترجمه شده توسط X،Y و Z +
+ +
+ + +**100. Reviewed by X, Y and Z** + +
+بازبینی شده توسط X، Y و Z +
+ +
+ + +**101. View PDF version on GitHub** + +
+نسخه پی‌دی‌اف را در گیت‌هاب ببینید +
+ +
+ + +**102. By X and Y** + +
+توسط X و Y +
+ +
+ From 64ded98f313982523bca1f7721ecc53c32c24e6f Mon Sep 17 00:00:00 2001 From: Shervine Amidi Date: Sun, 17 Feb 2019 11:13:42 -0800 Subject: [PATCH 147/531] Add [fa] contributors of CNNs --- CONTRIBUTORS | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/CONTRIBUTORS b/CONTRIBUTORS index f1c9a4747..c9b71fa5c 100644 --- a/CONTRIBUTORS +++ b/CONTRIBUTORS @@ -38,6 +38,10 @@ Fernando Diaz (review of unsupervised learning) --fa + AlisterTA (translation of convolutional neural networks) + Ehsan Kermani (translation of convolutional neural networks) + Erfan Noury (review of convolutional neural networks) + AlisterTA (translation of deep learning) Mohammad Karimi (review of deep learning) Erfan Noury (review of deep learning) From c1a9e924663a890b8f6b9eb8eb5e97cb04b76140 Mon Sep 17 00:00:00 2001 From: Shervine Amidi Date: Mon, 18 Feb 2019 12:02:42 -0800 Subject: [PATCH 148/531] Merge #126 --- fa/recurrent-neural-networks.md | 868 ++++++++++++++++++++++++++++++++ 1 file changed, 868 insertions(+) create mode 100644 fa/recurrent-neural-networks.md diff --git a/fa/recurrent-neural-networks.md b/fa/recurrent-neural-networks.md new file mode 100644 index 000000000..22a1e2106 --- /dev/null +++ b/fa/recurrent-neural-networks.md @@ -0,0 +1,868 @@ +**Recurrent Neural Networks translation** + +
+ +**1. Recurrent Neural Networks cheatsheet** + +
+راهنمای کوتاه شبکه‌های عصبی برگشتی +
+ +
+ + +**2. CS 230 - Deep Learning** + +
+کلاس CS 230 - یادگیری عمیق +
+ +
+ + +**3. [Overview, Architecture structure, Applications of RNNs, Loss function, Backpropagation]** + +
+[نمای کلی، ساختار معماری، کاربردهایRNN ها، تابع خطا، انتشار معکوس] +
+ +
+ + +**4. [Handling long term dependencies, Common activation functions, Vanishing/exploding gradient, Gradient clipping, GRU/LSTM, Types of gates, Bidirectional RNN, Deep RNN]** + +
+[کنترل وابستگی‌های بلندمدت، توابع فعال‌سازی رایج، مشتق صفرشونده/منفجرشونده، برش گرادیان، GRU/LSTM، انواع دروازه، RNN دوسویه، RNN عمیق] +
+ +
+ + +**5. [Learning word representation, Notations, Embedding matrix, Word2vec, Skip-gram, Negative sampling, GloVe]** + +
+[یادگیری بازنمائی کلمه، نمادها، ماتریس تعبیه، Word2vec،skip-gram، نمونه‌برداری منفی، GloVe] +
+ +
+ + +**6. [Comparing words, Cosine similarity, t-SNE]** + +
+[مقایسه‌ی کلمات، شباهت کسینوسی، t-SNE] +
+ +
+ + +**7. [Language model, n-gram, Perplexity]** + +
+[مدل زبانی،ان‌گرام، سرگشتگی] +
+ +
+ + +**8. [Machine translation, Beam search, Length normalization, Error analysis, Bleu score]** + +
+[ترجمه‌ی ماشینی، جستجوی پرتو، نرمال‌سازی طول، تحلیل خطا، امتیاز Bleu] +
+ +
+ + +**9. [Attention, Attention model, Attention weights]** + +
+[ژرف‌نگری، مدل ژرف‌نگری، وزن‌های ژرف‌نگری] +
+ +
+ + +**10. Overview** + +
+نمای کلی +
+ +
+ + +**11. Architecture of a traditional RNN ― Recurrent neural networks, also known as RNNs, are a class of neural networks that allow previous outputs to be used as inputs while having hidden states. They are typically as follows:** + +
+معماری RNN سنتی ــ شبکه‌های عصبی برگشتی که همچنین با عنوان RNN شناخته می‌شوند، دسته‌ای از شبکه‌های عصبی‌اند که این امکان را می‌دهند خروجی‌های قبلی به‌عنوان ورودی استفاده شوند و در عین حال حالت‌های نهان داشته باشند. این شبکه‌ها به‌طور معمول عبارت‌اند از:
+ +
+ + +**12. For each timestep t, the activation a and the output y are expressed as follows:** + +
+به‌ازای هر گام زمانی t، فعال‌سازی a و خروجی y به‌صورت زیر بیان می‌شود: +
+ +
+ + +**13. and** + +
+و +
+ +
+ + +**14. where Wax,Waa,Wya,ba,by are coefficients that are shared temporally and g1,g2 activation functions.** + +
+که در آن Wax,Waa,Wya,ba,by ضرایبی‌اند که در راستای زمان به ‌اشتراک گذاشته می‌شوند و g1، g2 توابع فعال‌سازی‌ هستند. +
+ +
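A minimal NumPy sketch of the recurrence above, with tanh and softmax standing in for g1 and g2; all dimensions and weights are arbitrary placeholders:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def rnn_forward(x_seq, waa, wax, wya, ba, by):
    """x_seq: list of input vectors; returns the output at every timestep."""
    a = np.zeros(waa.shape[0])                   # initial hidden state
    outputs = []
    for x_t in x_seq:
        a = np.tanh(waa @ a + wax @ x_t + ba)    # hidden state, weights shared over time
        outputs.append(softmax(wya @ a + by))    # output at timestep t
    return outputs

rng = np.random.default_rng(0)
n_a, n_x, n_y, T = 5, 3, 2, 4
params = (rng.normal(size=(n_a, n_a)), rng.normal(size=(n_a, n_x)),
          rng.normal(size=(n_y, n_a)), np.zeros(n_a), np.zeros(n_y))
ys = rnn_forward([rng.normal(size=n_x) for _ in range(T)], *params)
print(len(ys), ys[0].sum())  # 4 outputs, each a probability vector summing to 1
```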
+ + +**15. The pros and cons of a typical RNN architecture are summed up in the table below:** + +
+مزایا و معایب معماری RNN به‌صورت خلاصه در جدول زیر آورده شده‌اند: +
+ +
+ + +**16. [Advantages, Possibility of processing input of any length, Model size not increasing with size of input, Computation takes into account historical information, Weights are shared across time]** + +
+[مزایا، امکان پردازش ورودی با هر طولی، اندازه‌ی مدل مطابق با اندازه‌ی ورودی افزایش نمی‌یابد، اطلاعات (زمان‌های) گذشته در محاسبه در نظر گرفته می‌شود، وزن‌ها در طول زمان به‌ اشتراک گذاشته می‌شوند] +
+ +
+ + +**17. [Drawbacks, Computation being slow, Difficulty of accessing information from a long time ago, Cannot consider any future input for the current state]** + +
+[معایب، محاسبه کند می‌شود، دشوار بودن دسترسی به اطلاعات مدت‌ها پیش، در نظر نگرفتن ورودی‌های بعدی در وضعیت جاری] +
+ +
+ + +**18. Applications of RNNs ― RNN models are mostly used in the fields of natural language processing and speech recognition. The different applications are summed up in the table below:** + +
+کاربردهایRNN ها ــ مدل‌های RNN غالباً در حوزه‌ی پردازش زبان طبیعی و حوزه‌ی بازشناسایی گفتار به کار می‌روند. کاربردهای مختلف آنها به صورت خلاصه در جدول زیر آورده شده‌اند: +
+ +
+ + +**19. [Type of RNN, Illustration, Example]** + +
+[نوع RNN، نگاره، مثال] +
+ +
+ + +**20. [One-to-one, One-to-many, Many-to-one, Many-to-many]** + +
+[یک به یک، یک به چند، چند به یک، چند به چند] +
+ +
+ + +**21. [Traditional neural network, Music generation, Sentiment classification, Name entity recognition, Machine translation]** + +
+[شبکه‌ی عصبی سنتی، تولید موسیقی، دسته‌بندی حالت احساسی، بازشناسایی موجودیت اسمی، ترجمه ماشینی] +
+ +
+ + +**22. Loss function ― In the case of a recurrent neural network, the loss function L of all time steps is defined based on the loss at every time step as follows:** + +
+تابع خطا ــ در شبکه عصبی برگشتی، تابع خطا L برای همه‌ی گام‌های زمانی براساس خطا در هر گام به صورت زیر محاسبه می‌شود: +
+ +
+ + +**23. Backpropagation through time ― Backpropagation is done at each point in time. At timestep T, the derivative of the loss L with respect to weight matrix W is expressed as follows:** + +
+انتشار معکوس در طول زمان ـــ انتشار معکوس در هر نقطه از زمان انجام می‌شود. در گام زمانی T، مشتق خطا L با توجه به ماتریس وزن W به‌صورت زیر بیان می‌شود: +
+ +
+ + +**24. Handling long term dependencies** + +
+کنترل وابستگی‌های بلندمدت +
+ +
+ + +**25. Commonly used activation functions ― The most common activation functions used in RNN modules are described below:** + +
+توابع فعال‌سازی پرکاربرد ـــ رایج‌ترین توابع فعال‌سازی به‌کاررفته در ماژول‌های RNN به شرح زیر است: +
+ +
+ + +**26. [Sigmoid, Tanh, RELU]** + +
+[سیگموید، تانژانت هذلولوی، یکسو ساز] +
+ +
+ + +**27. Vanishing/exploding gradient ― The vanishing and exploding gradient phenomena are often encountered in the context of RNNs. The reason why they happen is that it is difficult to capture long term dependencies because of multiplicative gradient that can be exponentially decreasing/increasing with respect to the number of layers.** + +
+مشتق صفرشونده/منفجرشونده ــ پدیده مشتق صفرشونده و منفجرشونده غالبا در بستر RNNها رخ می‌دهند. علت چنین رخدادی این است که به دلیل گرادیان ضربی، که می‌تواند با توجه به تعداد لایه‌ها به صورت نمایی کاهش/افزایش می‌یابد، به‌دست آوردن وابستگی‌های بلندمدت سخت است. +
+ + +
+ + +**28. Gradient clipping ― It is a technique used to cope with the exploding gradient problem sometimes encountered when performing backpropagation. By capping the maximum value for the gradient, this phenomenon is controlled in practice.** + +
+برش گرادیان ــ یک روش برای مقابله با انفجار گرادیان است که گاهی اوقات هنگام انتشار معکوس رخ می‌دهد. با تعیین حداکثر مقدار برای گرادیان، این پدیده در عمل کنترل می‌شود. +
+ +
+ + +**29. clipped** + +
+برش ‌داده‌شده +
+ +
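A small sketch of clipping by capping the gradient norm; the cap of 5.0 is an arbitrary example value:

```python
import numpy as np

def clip_gradient(grad, max_norm=5.0):
    # rescale the gradient if its norm exceeds the cap, leave it unchanged otherwise
    norm = np.linalg.norm(grad)
    if norm > max_norm:
        grad = grad * (max_norm / norm)
    return grad

g = np.array([30.0, 40.0])                # norm 50, far above the cap
print(np.linalg.norm(clip_gradient(g)))   # 5.0
```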
+ + +**30. Types of gates ― In order to remedy the vanishing gradient problem, specific gates are used in some types of RNNs and usually have a well-defined purpose. They are usually noted Γ and are equal to:** + +
+انواع دروازه ـــ برای حل مشکل مشتق صفرشونده/منفجرشونده، در برخی از انواع RNN ها، دروازه‌های خاصی استفاده می‌شود و این دروازه‌ها عموما هدف معینی دارند. این دروازه‌ها عموما با نمادΓ نمایش داده می‌شوند و برابرند با: +
+ +
+ + +**31. where W,U,b are coefficients specific to the gate and σ is the sigmoid function. The main ones are summed up in the table below:** + +
+که W,U,b ضرایب خاص دروازه و σ تابع سیگموید است. دروازه‌های اصلی به صورت خلاصه در جدول زیر آورده شده‌اند: +
+ +
+ + +**32. [Type of gate, Role, Used in]** + +
+[نوع دروازه، نقش، به‌کار رفته در] +
+ +
+ + +**33. [Update gate, Relevance gate, Forget gate, Output gate]** + +
+[دروازه‌ی به‌روزرسانی، دروازه‌ی ربط (میزان اهمیت)، دروازه‌ی فراموشی، دروازه‌ی خروجی] +
+ +
+ + +**34. [How much past should matter now?, Drop previous information?, Erase a cell or not?, How much to reveal of a cell?]** + +
+[چه میزان از گذشته اکنون اهمیت دارد؟ اطلاعات گذشته رها شوند؟ سلول حذف شود یا خیر؟ چه میزان از (محتوای) سلول آشکار شود؟] +
+ +
+ + +**35. [LSTM, GRU]** + +
+[LSTM، GRU] +
+ +
+ + +**36. GRU/LSTM ― Gated Recurrent Unit (GRU) and Long Short-Term Memory units (LSTM) deal with the vanishing gradient problem encountered by traditional RNNs, with LSTM being a generalization of GRU. Below is a table summing up the characterizing equations of each architecture:** + +
+GRU/LSTM ـــ واحد برگشتی دروازه‌دار (GRU) و واحدهای حافظه‌ی کوتاه‌-مدت طولانی (LSTM) مشکل مشتق صفرشونده که در RNNهای سنتی رخ می‌دهد، را بر طرف می‌کنند، درحالی‌که LSTM شکل عمومی‌تر GRU است. در جدول زیر، معادله‌های توصیف‌کنندهٔ هر معماری به صورت خلاصه آورده شده‌اند: +
+ +
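An illustrative NumPy version of one GRU step following the gate equations summarized above; weight shapes and the random initialization are placeholders:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_step(a_prev, x_t, p):
    """One GRU timestep; p holds the weight matrices and biases of the gate equations."""
    concat = np.concatenate([a_prev, x_t])
    gamma_u = sigmoid(p["Wu"] @ concat + p["bu"])                       # update gate
    gamma_r = sigmoid(p["Wr"] @ concat + p["br"])                       # relevance gate
    c_tilde = np.tanh(p["Wc"] @ np.concatenate([gamma_r * a_prev, x_t]) + p["bc"])
    c_t = gamma_u * c_tilde + (1 - gamma_u) * a_prev                    # new memory cell
    return c_t                                                          # for a GRU, a<t> = c<t>

rng = np.random.default_rng(0)
n_a, n_x = 4, 3
p = {name: rng.normal(size=(n_a, n_a + n_x)) for name in ("Wu", "Wr", "Wc")}
p.update({name: np.zeros(n_a) for name in ("bu", "br", "bc")})
print(gru_step(np.zeros(n_a), rng.normal(size=n_x), p).shape)  # (4,)
```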
+ + +**37. [Characterization, Gated Recurrent Unit (GRU), Long Short-Term Memory (LSTM), Dependencies]** + +
+[توصیف، واحد برگشتی دروازه‌دار (GRU)، حافظه‌ی کوتاه-مدت طولانی (LSTM)، وابستگی‌ها] +
+
+ + +**38. Remark: the sign ⋆ denotes the element-wise multiplication between two vectors.** + +
+نکته: نشانه‌ی * نمایان‌گر ضرب عنصربه‌عنصر دو بردار است. +
+ +
+ + +**39. Variants of RNNs ― The table below sums up the other commonly used RNN architectures:** + +
+انواع RNN ها ــ جدول زیر سایر معماری‌های پرکاربرد RNN را به صورت خلاصه نشان می‌دهد. +
+ +
+ + +**40. [Bidirectional (BRNN), Deep (DRNN)]** + +
+[دوسویه (BRNN)، عمیق (DRNN)] +
+ +
+ + +**41. Learning word representation** + +
+یادگیری بازنمائی کلمه +
+ +
+ + +**42. In this section, we note V the vocabulary and |V| its size.** + +
+در این بخش، برای اشاره به واژگان از V و برای اشاره به اندازه‌ی آن از |V| استفاده می‌کنیم. +
+ +
+ + +**43. Motivation and notations** + +
+انگیزه و نمادها +
+ +
+ + +**44. Representation techniques ― The two main ways of representing words are summed up in the table below:** + +
+روش‌های بازنمائی ― دو روش اصلی برای بازنمائی کلمات به صورت خلاصه در جدول زیر آورده شده‌اند: +
+ +
+ + +**45. [1-hot representation, Word embedding]** + +
+[بازنمائی تک‌فعال، تعبیه‌ی کلمه] +
+ +
+ + +**46. [teddy bear, book, soft]** + +
+[خرس تدی، کتاب، نرم] +
+ +
+ + +**47. [Noted ow, Naive approach, no similarity information, Noted ew, Takes into account words similarity]** + +
+[نشان داده شده با نماد ow، رویکرد ساده، فاقد اطلاعات تشابه، نشان داده شده با نماد ew، به‌حساب‌آوردن تشابه کلمات] +
+ +
+ + +**48. Embedding matrix ― For a given word w, the embedding matrix E is a matrix that maps its 1-hot representation ow to its embedding ew as follows:** + +
+ماتریس تعبیه ـــ به‌ ازای کلمه‌ی مفروض w ، ماتریس تعبیه E ماتریسی است که بازنمائی تک‌فعال ow را به نمایش تعبیه‌ی ew نگاشت می‌دهد: +
+ +
+ + +**49. Remark: learning the embedding matrix can be done using target/context likelihood models.** + +
+نکته: یادگیری ماتریس تعبیه را می‌توان با استفاده از مدل‌های درست‌نمایی هدف/متن(زمینه) انجام داد. +
+ +
+ + +**50. Word embeddings** + +
+(نمایش) تعبیه‌ی کلمه +
+ +
+ + +**51. Word2vec ― Word2vec is a framework aimed at learning word embeddings by estimating the likelihood that a given word is surrounded by other words. Popular models include skip-gram, negative sampling and CBOW.** + +
+Word2vec ― Word2vec چهارچوبی است که با محاسبه‌ی احتمال قرار گرفتن یک کلمه‌ی خاص در میان سایر کلمات، تعبیه‌های کلمه را یاد می‌گیرد. مدل‌های متداول شامل Skip-gram، نمونه‌برداری منفی و CBOW هستند. +
+ +
+ + +**52. [A cute teddy bear is reading, teddy bear, soft, Persian poetry, art]** + +
+[یک خرس تدی بامزه در حال مطالعه است، خرس تدی، نرم، شعر فارسی، هنر] +
+ +
+ + +**53. [Train network on proxy task, Extract high-level representation, Compute word embeddings]** + +
+[آموزش شبکه بر روی مسئله‌ی جایگزین، استخراج بازنمائی سطح بالا، محاسبه‌ی نمایش تعبیه‌ی کلمات] +
+ +
+ + +**54. Skip-gram ― The skip-gram word2vec model is a supervised learning task that learns word embeddings by assessing the likelihood of any given target word t happening with a context word c. By noting θt a parameter associated with t, the probability P(t|c) is given by:** + +
+Skip-gram ــ مدل اسکیپ‌گرام word2vec یک وظیفه‌ی یادگیری بانظارت است که تعبیه‌های کلمه را با ارزیابی احتمال وقوع کلمه‌ی t هدف با کلمه‌ی زمینه c یاد می‌گیرد. با توجه به اینکه نماد θt پارامتری مرتبط با t است، احتمال P(t|c) به‌صورت زیر به‌دست می‌آید: +
+ +
+ + +**55. Remark: summing over the whole vocabulary in the denominator of the softmax part makes this model computationally expensive. CBOW is another word2vec model using the surrounding words to predict a given word.** + +
+نکته: جمع کل واژگان در بخش مقسوم‌الیه بیشینه‌ی‌هموار باعث می‌شود که این مدل از لحاظ محاسباتی گران شود. مدل CBOW مدل word2vec دیگری ست که از کلمات اطراف برای پیش‌بینی یک کلمهٔ مفروض استفاده می‌کند. +
+ +
+ + +**56. Negative sampling ― It is a set of binary classifiers using logistic regressions that aim at assessing how a given context and a given target words are likely to appear simultaneously, with the models being trained on sets of k negative examples and 1 positive example. Given a context word c and a target word t, the prediction is expressed by:** + +
+نمونه‌گیری منفی ― مجموعه‌ای از دسته‌بندی‌های دودویی با استفاده از رگرسیون لجستیک است که مقصودش ارزیابی احتمال ظهور همزمان کلمه‌ی مفروض هدف و کلمه‌ی مفروض زمینه است، که در اینجا مدل‌ها براساس مجموعه k مثال منفی و 1 مثال مثبت آموزش می‌بینند. با توجه به کلمه‌ی مفروض زمینه c و کلمه‌ی مفروض هدف t، پیش‌بینی به صورت زیر بیان می‌شود: +
+ +
+ + +**57. Remark: this method is less computationally expensive than the skip-gram model.** + +
+نکته: این روش از لحاظ محاسباتی ارزان‌تر از مدل skip-gram است. +
+ +
+ + +**57bis. GloVe ― The GloVe model, short for global vectors for word representation, is a word embedding technique that uses a co-occurence matrix X where each Xi,j denotes the number of times that a target i occurred with a context j. Its cost function J is as follows:** + +
+GloVe ― مدل GloVe، مخفف بردارهای سراسری بازنمائی کلمه، یکی از روش‌های تعبیه کلمه است که از ماتریس هم‌رویدادی X استفاده می‌کند که در آن هر Xi,j به تعداد دفعاتی اشاره دارد که هدف i با زمینهٔ j رخ می‌دهد. تابع هزینه‌ی J به‌صورت زیر است: +
+ +
+ + +**58. where f is a weighting function such that Xi,j=0⟹f(Xi,j)=0. +Given the symmetry that e and θ play in this model, the final word embedding e(final)w is given by:** + +
+که در آن f تابع وزن‌دهی است، به‌طوری که Xi,j=0⟹f(Xi,j)=0. با توجه به تقارنی که e و θ در این مدل دارند، نمایش تعبیه‌ی نهایی کلمه‌ e(final)w به صورت زیر محاسبه می‌شود: +
+ +
+ + +**59. Remark: the individual components of the learned word embeddings are not necessarily interpretable.** + +
+تذکر: مولفه‌های مجزا در نمایش تعبیه‌ی یادگرفته‌شده‌ی کلمه الزاما قابل تفسیر نیستند. +
+ +
+ + +**60. Comparing words** + +
+مقایسه‌ی کلمات +
+ +
+ + +**61. Cosine similarity ― The cosine similarity between words w1 and w2 is expressed as follows:** + +
+شباهت کسینوسی - شباهت کسینوسی بین کلمات w1 و w2 به ‌صورت زیر بیان می‌شود: +
+ +
+ + +**62. Remark: θ is the angle between words w1 and w2.** + +
+نکته: θ زاویهٔ بین کلمات w1 و w2 است. +
+ +
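A minimal cosine-similarity helper on invented word vectors (the embeddings below are made up for illustration):

```python
import numpy as np

def cosine_similarity(e1, e2):
    # cos(theta) = e1 . e2 / (||e1|| * ||e2||)
    return float(e1 @ e2 / (np.linalg.norm(e1) * np.linalg.norm(e2)))

teddy_bear = np.array([0.8, 0.3, 0.1])
soft       = np.array([0.7, 0.4, 0.0])
book       = np.array([0.0, 0.2, 0.9])
print(cosine_similarity(teddy_bear, soft))  # close to 1: related words
print(cosine_similarity(teddy_bear, book))  # much smaller: less related words
```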
+ + +**63. t-SNE ― t-SNE (t-distributed Stochastic Neighbor Embedding) is a technique aimed at reducing high-dimensional embeddings into a lower dimensional space. In practice, it is commonly used to visualize word vectors in the 2D space.** + +
+t-SNE ― t-SNE (نمایش تعبیه‌ی همسایه‌ی تصادفی توزیع‌شده توسط توزیع t) روشی است که هدف آن کاهش تعبیه‌های ابعاد بالا به فضایی با ابعاد پایین‌تر است. این روش در تصویرسازی بردارهای کلمه در فضای 2 بعدی کاربرد فراوانی دارد. +
+ +
+ + +**64. [literature, art, book, culture, poem, reading, knowledge, entertaining, loveable, childhood, kind, teddy bear, soft, hug, cute, adorable]** + +
+[ادبیات، هنر، کتاب، فرهنگ، شعر، دانش، مفرح، دوست‌داشتنی، دوران کودکی، مهربان، خرس تدی، نرم، آغوش، بامزه، ناز] +
+ +
+ + +**65. Language model** + +
+مدل زبانی +
+ +
+ + +**66. Overview ― A language model aims at estimating the probability of a sentence P(y).** + +
+نمای کلی ـــ هدف مدل زبان تخمین احتمال جمله‌ی P(y) است. +
+ +
+ + +**67. n-gram model ― This model is a naive approach aiming at quantifying the probability that an expression appears in a corpus by counting its number of appearance in the training data.** + +
+مدل ان‌گرام ــ این مدل یک رویکرد ساده با هدف اندازه‌گیری احتمال نمایش یک عبارت در یک نوشته است که با دفعات تکرار آن در داده‌های آموزشی محاسبه می‌شود. +
+ +
+ + +**68. Perplexity ― Language models are commonly assessed using the perplexity metric, also known as PP, which can be interpreted as the inverse probability of the dataset normalized by the number of words T. The perplexity is such that the lower, the better and is defined as follows:** + +
+سرگشتگی ـــ مدل‌های زبانی معمولاً با معیار سرگشتی، که با PP هم نمایش داده می‌شود، سنجیده می‌شوند، که مقدار آن معکوس احتمال یک مجموعه‌ داده است که تقسیم بر تعداد کلمات T می‌شود. هر چه سرگشتگی کمتر باشد بهتر است و به صورت زیر تعریف می‌شود: +
+ +
+ + +**69. Remark: PP is commonly used in t-SNE.** + +
+نکته: PP عموما در t-SNE کاربرد دارد. +
+ +
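A small sketch of the perplexity of a sentence from the per-word probabilities assigned by some language model; the probabilities below are invented:

```python
import math

def perplexity(word_probs):
    # PP = product over words of (1 / P(w_t))^(1/T); the lower, the better
    t = len(word_probs)
    log_pp = -sum(math.log(p) for p in word_probs) / t
    return math.exp(log_pp)

print(perplexity([0.2, 0.1, 0.25, 0.05]))  # an unsure model, higher perplexity
print(perplexity([0.6, 0.5, 0.7, 0.4]))    # a more confident model, lower perplexity
```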
+ + +**70. Machine translation** + +
+ترجمه ماشینی +
+ +
+ + +**71. Overview ― A machine translation model is similar to a language model except it has an encoder network placed before. For this reason, it is sometimes referred as a conditional language model. The goal is to find a sentence y such that:** + +
+نمای کلی ― مدل ترجمه‌ی ماشینی مشابه مدل زبانی است با این تفاوت که یک شبکه‌ی رمزنگار قبل از آن قرار گرفته است. به همین دلیل، گاهی اوقات به آن مدل زبان شرطی می‌گویند. هدف آن یافتن جمله y است بطوری که: +
+ +
+ + +**72. Beam search ― It is a heuristic search algorithm used in machine translation and speech recognition to find the likeliest sentence y given an input x.** + +
+جستجوی پرتو ― یک الگوریتم جستجوی اکتشافی است که در ترجمه‌ی ماشینی و بازتشخیص گفتار برای یافتن محتمل‌ترین جمله‌ی y باتوجه به ورودی مفروض x بکار برده می‌شود. +
+ +
+ + +**73. [Step 1: Find top B likely words y<1>, Step 2: Compute conditional probabilities y<k>|x,y<1>,...,y<k-1>, Step 3: Keep top B combinations x,y<1>,...,y<k>, End process at a stop word]** + +
+[گام 1: یافتن B کلمه‌ی محتمل برتر y<1>، گام 2: محاسبه احتمالات شرطی y|x,y<1>,...,y، گام 3: نگه‌داشتن B ترکیب برتر x,y<1>,…,y، خاتمه فرآیند با کلمه‌ی توقف] +
+ +
+ + +**74. Remark: if the beam width is set to 1, then this is equivalent to a naive greedy search.** + +
+نکته: اگر پهنای پرتو 1 باشد، آنگاه با جست‌وجوی حریصانهٔ ساده برابر خواهد بود. +
+ +
+ + +**75. Beam width ― The beam width B is a parameter for beam search. Large values of B yield to better result but with slower performance and increased memory. Small values of B lead to worse results but is less computationally intensive. A standard value for B is around 10.** + +
+پهنای پرتو ـــ پهنای پرتوی B پارامتری برای جستجوی پرتو است. مقادیر بزرگ B به نتیجه بهتر منتهی می‌شوند اما عملکرد آهسته‌تری دارند و حافظه را افزایش می‌دهند. مقادیر کوچک B به نتایج بدتر منتهی می‌شوند اما بار محاسباتی پایین‌تری دارند. مقدار استاندارد B حدود 10 است. +
+ +
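A toy beam-search sketch over a hand-written next-word table, keeping the top B partial sentences at each step; a real system would query an RNN decoder instead of this table:

```python
import math

# toy conditional distribution P(next word | previous word)
NEXT = {
    "<s>":    {"the": 0.6, "a": 0.4},
    "the":    {"cat": 0.5, "dog": 0.3, "</s>": 0.2},
    "a":      {"cat": 0.4, "dog": 0.4, "</s>": 0.2},
    "cat":    {"sleeps": 0.7, "</s>": 0.3},
    "dog":    {"sleeps": 0.6, "</s>": 0.4},
    "sleeps": {"</s>": 1.0},
}

def beam_search(B=2, max_len=5):
    beams = [(["<s>"], 0.0)]                        # (partial sentence, log-probability)
    for _ in range(max_len):
        candidates = []
        for words, logp in beams:
            if words[-1] == "</s>":                 # stop word reached: keep as-is
                candidates.append((words, logp))
                continue
            for w, p in NEXT[words[-1]].items():
                candidates.append((words + [w], logp + math.log(p)))
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:B]  # keep top B
    return beams

for words, logp in beam_search():
    print(" ".join(words), round(math.exp(logp), 3))
```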
+ + +**76. Length normalization ― In order to improve numerical stability, beam search is usually applied on the following normalized objective, often called the normalized log-likelihood objective, defined as:** + +
+نرمال‌سازی طول ―‌ برای بهبود ثبات عددی، جستجوی پرتو معمولا با تابع هدف نرمال‌شده‌ی زیر اعمال می‌شود، که اغلب اوقات هدف درست‌نمایی لگاریتمی نرمال‌شده نامیده می‌شود و به‌صورت زیر تعریف می‌شود: +
+ +
+ + +**77. Remark: the parameter α can be seen as a softener, and its value is usually between 0.5 and 1.** + +
+تذکر: پارامتر α را می‌توان تعدیل‌کننده نامید و مقدارش معمولا بین 0.5 و 1 است. +
+ +
+ + +**78. Error analysis ― When obtaining a predicted translation ˆy that is bad, one can wonder why we did not get a good translation y∗ by performing the following error analysis:** + +
+تحلیل خطا ―زمانی‌که ترجمه‌ی پیش‌بینی‌شده‌ی ^y ی به‌دست می‌آید که مطلوب نیست، می‌توان با انجام تحلیل خطای زیر از خود پرسید که چرا ترجمه y* خوب نیست: +
+ +
+ + +**79. [Case, Root cause, Remedies]** + +
+[قضیه، ریشه‌ی مشکل، راه‌حل] +
+ +
+ + +**80. [Beam search faulty, RNN faulty, Increase beam width, Try different architecture, Regularize, Get more data]** + +
+[جستجوی پرتوی معیوب، RNN معیوب، افزایش پهنای پرتو، امتحان معماری‌های مختلف، استفاده از تنظیم‌کننده، جمع‌آوری داده‌های بیشتر]
+ +
+ + +**81. Bleu score ― The bilingual evaluation understudy (bleu) score quantifies how good a machine translation is by computing a similarity score based on n-gram precision. It is defined as follows:** + +
+امتیاز Bleu ― جایگزین ارزشیابی دوزبانه (bleu) میزان خوب بودن ترجمه ماشینی را با محاسبه‌ی امتیاز تشابه برمبنای دقت ان‌گرام اندازه‌گیری می‌کند. (این امتیاز) به صورت زیر تعریف می‌شود: +
+ +
+ + +**82. where pn is the bleu score on n-gram only defined as follows:** + +
+که pn امتیاز bleu تنها براساس ان‌گرام است و به صورت زیر تعریف می‌شود: +
+ +
+ + +**83. Remark: a brevity penalty may be applied to short predicted translations to prevent an artificially inflated bleu score.** + +
+تذکر: ممکن است برای پیشگیری از امتیاز اغراق‌آمیز تصنعی bleu، برای ترجمه‌های پیش‌بینی‌شده‌ی کوتاه از جریمه اختصار استفاده شود.
+ +
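A simplified sketch of the n-gram precisions pn combined into a geometric-mean bleu score with a brevity penalty; real BLEU additionally handles multiple references and smoothing:

```python
from collections import Counter
import math

def ngrams(words, n):
    return [tuple(words[i:i+n]) for i in range(len(words) - n + 1)]

def bleu(candidate, reference, max_n=4):
    cand, ref = candidate.split(), reference.split()
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts, ref_counts = Counter(ngrams(cand, n)), Counter(ngrams(ref, n))
        clipped = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        total = max(len(cand) - n + 1, 0)
        if total == 0 or clipped == 0:
            return 0.0                                    # simplification: no smoothing
        log_precisions.append(math.log(clipped / total))  # log p_n
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / len(cand))  # brevity penalty
    return bp * math.exp(sum(log_precisions) / max_n)

print(bleu("the cat sat on the mat", "the cat sat on the mat"))  # 1.0
print(bleu("the cat sat on the hat", "the cat sat on the mat"))  # lower: one word differs
```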
+ + +**84. Attention** + +
+ژرف‌نگری +
+ +
+ + +**85. Attention model ― This model allows an RNN to pay attention to specific parts of the input that is considered as being important, which improves the performance of the resulting model in practice. By noting α the amount of attention that the output y should pay to the activation a and c the context at time t, we have:** + +
+مدل ژرف‌نگری ― این مدل به RNN این امکان را می‌دهد که به بخش‌های خاصی از ورودی که حائز اهمیت هستند توجه نشان دهد که در عمل باعث بهبود عملکرد مدل حاصل‌شده خواهد شد. اگر α به معنای مقدار توجهی باشد که خروجی y باید به فعال‌سازی a داشته باشد و c نشان‌دهنده‌ی زمینه (متن) در زمان t باشد، داریم: +
+ +
+ + +**86. with** + +
+با +
+ +
+ + +**87. Remark: the attention scores are commonly used in image captioning and machine translation.** + +
+نکته: امتیازات ژرف‌نگری عموما در عنوان‌سازی متنی برای تصویر (image captioning) و ترجمه ماشینی کاربرد دارد. +
+ +
+ + +**88. A cute teddy bear is reading Persian literature.** + +
+یک خرس تدی بامزه در حال خواندن ادبیات فارسی است. +
+ +
+ + +**89. Attention weight ― The amount of attention that the output y should pay to the activation a is given by α computed as follows:** + +
+وزن ژرف‌نگری ― مقدار توجهی که خروجی y باید به فعال‌سازی a داشته باشد به‌وسیله‌ی α به‌دست می‌آید که به‌صورت زیر محاسبه می‌شود: +
+ +
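A NumPy sketch of the attention step: raw relevance scores are softmax-normalized into weights α, which then mix the encoder activations into a context vector; the scores and activations below are random placeholders:

```python
import numpy as np

def attention_context(scores, activations):
    """scores: (Tx,) relevance scores; activations: (Tx, n_a) encoder states."""
    e = np.exp(scores - scores.max())
    alpha = e / e.sum()                   # attention weights, sum to 1
    return alpha, alpha @ activations     # context = weighted sum of activations

rng = np.random.default_rng(0)
Tx, n_a = 6, 4
alpha, context = attention_context(rng.normal(size=Tx), rng.normal(size=(Tx, n_a)))
print(alpha.sum(), context.shape)         # ≈ 1.0 and (4,)
```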
+ + +**90. Remark: computation complexity is quadratic with respect to Tx.** + +
+نکته: پیچیدگی محاسباتی به نسبت Tx از نوع درجه‌ی دوم است. +
+ +
+ + +**91. The Deep Learning cheatsheets are now available in [target language].** + +
+راهنمای یادگیری عمیق هم اکنون به زبان [فارسی] در دسترس است. +
+ +
+ +**92. Original authors** + +
+نویسندگان اصلی +
+ +
+ +**93. Translated by X, Y and Z** + +
+ترجمه شده توسط X،Y و Z +
+ +
+ +**94. Reviewed by X, Y and Z** + +
+بازبینی شده توسط X، Y و Z +
+ +
+ +**95. View PDF version on GitHub** + +
+نسخه پی‌دی‌اف را در گیت‌هاب ببینید +
+ +
+ +**96. By X and Y** + +
+توسط X و Y +
+ +
From 795741eabdd6efa724edfba8a6fef35582ab57d3 Mon Sep 17 00:00:00 2001 From: Shervine Amidi Date: Mon, 18 Feb 2019 12:04:10 -0800 Subject: [PATCH 149/531] Add contributors of #126 --- CONTRIBUTORS | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/CONTRIBUTORS b/CONTRIBUTORS index c9b71fa5c..aad837bbb 100644 --- a/CONTRIBUTORS +++ b/CONTRIBUTORS @@ -59,7 +59,10 @@ Erfan Noury (translation of probabilities and statistics) Mohammad Karimi (review of probabilities and statistics) - + + AlisterTA (translation of recurrent neural networks) + Erfan Noury (review of recurrent neural networks) + Amirhosein Kazemnejad (translation of supervised learning) Erfan Noury (review of supervised learning) Mohammad Karimi (review of supervised learning) From 8b49a25fddc185180263d74f65677e90ae60472b Mon Sep 17 00:00:00 2001 From: Afshine Amidi <26204670+afshinea@users.noreply.github.com> Date: Mon, 18 Feb 2019 17:59:45 -0800 Subject: [PATCH 150/531] Update README.md --- README.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/README.md b/README.md index b70e6359d..c8daa1a25 100644 --- a/README.md +++ b/README.md @@ -37,9 +37,9 @@ Please make sure to propose the translation of **only one** cheatsheet per pull ## Progression for CS 230 (Deep Learning) |Cheatsheet topic|Español|فارسی|Français|日本語|Português|中文| |:---|:---:|:---:|:---:|:---:|:---:|:---:| -|Convolutional Neural Nets|not started|[in progress](https://github.com/erfannoury/cheatsheet-translation/issues/9)|done|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/128)|not started| -|Recurrent Neural Nets|not started|[in progress](https://github.com/erfannoury/cheatsheet-translation/issues/10)|done|not started|not started|not started| -|DL tips and tricks|not started|[in progress](https://github.com/erfannoury/cheatsheet-translation/issues/11)|done|not started|not started|not started| +|Convolutional Neural Nets|not started|done|done|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/128)|not started| +|Recurrent Neural Nets|not started|done|done|not started|not started|not started| +|DL tips and tricks|not started|done|done|not started|not started|not started| |Cheatsheet topic|العَرَبِيَّة|עִבְרִית|हिन्दी|Türkçe|Русский|Italiano| |:---|:---:|:---:|:---:|:---:|:---:|:---:| From fea5c055cc0f323fca74d775bf425c286c6f7815 Mon Sep 17 00:00:00 2001 From: Afshine Amidi <26204670+afshinea@users.noreply.github.com> Date: Mon, 18 Feb 2019 18:01:20 -0800 Subject: [PATCH 151/531] Update README.md --- README.md | 16 ++++++++-------- 1 file changed, 8 insertions(+), 8 deletions(-) diff --git a/README.md b/README.md index c8daa1a25..ae8a805d7 100644 --- a/README.md +++ b/README.md @@ -54,14 +54,14 @@ Please make sure to propose the translation of **only one** cheatsheet per pull |DL tips and tricks|not started|not started|not started|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/108)| ## Progression for CS 229 (Machine Learning) -|Cheatsheet topic|Español|فارسی|Français|日本語|Português|中文| -|:---|:---:|:---:|:---:|:---:|:---:|:---:| -|Deep learning|done|done|done|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/96)|done|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/12)| -|Supervised learning|done|done|done|not started|done|done| -|Unsupervised learning|done|done|done|not started|done|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/48)| -|ML tips and 
tricks|done|done|done|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/99)|done|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/7)| -|Probabilities and Statistics|done|done|done|not started|done|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/73)| -|Linear algebra|done|done|done|not started|done|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/72)| +|Cheatsheet topic|Deutsch|Español|فارسی|Français|日本語|Português|中文| +|:---|:---:|:---:|:---:|:---:|:---:|:---:|:---:| +|Deep learning|not started|done|done|done|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/96)|done|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/12)| +|Supervised learning|not started|done|done|done|not started|done|done| +|Unsupervised learning|not started|done|done|done|not started|done|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/48)| +|ML tips and tricks|not started|done|done|done|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/99)|done|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/7)| +|Probabilities and Statistics|not started|done|done|done|not started|done|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/73)| +|Linear algebra|not started|done|done|done|not started|done|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/72)| |Cheatsheet topic|العَرَبِيَّة|עִבְרִית|हिन्दी|Türkçe|Русский|Italiano| |:---|:---:|:---:|:---:|:---:|:---:|:---:| From 4a50008a1b30a2a5fedeb19a3a01ec372dc809e4 Mon Sep 17 00:00:00 2001 From: Afshine Amidi <26204670+afshinea@users.noreply.github.com> Date: Mon, 18 Feb 2019 18:01:59 -0800 Subject: [PATCH 152/531] Update README.md --- README.md | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/README.md b/README.md index ae8a805d7..cf019c4fe 100644 --- a/README.md +++ b/README.md @@ -35,11 +35,11 @@ Please make sure to propose the translation of **only one** cheatsheet per pull ## Progression for CS 230 (Deep Learning) -|Cheatsheet topic|Español|فارسی|Français|日本語|Português|中文| -|:---|:---:|:---:|:---:|:---:|:---:|:---:| -|Convolutional Neural Nets|not started|done|done|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/128)|not started| -|Recurrent Neural Nets|not started|done|done|not started|not started|not started| -|DL tips and tricks|not started|done|done|not started|not started|not started| +|Cheatsheet topic|Deutsch|Español|فارسی|Français|日本語|Português|中文| +|:---|:---:|:---:|:---:|:---:|:---:|:---:|:---:| +|Convolutional Neural Nets|not started|not started|done|done|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/128)|not started| +|Recurrent Neural Nets|not started|not started|done|done|not started|not started|not started| +|DL tips and tricks|not started|not started|done|done|not started|not started|not started| |Cheatsheet topic|العَرَبِيَّة|עִבְרִית|हिन्दी|Türkçe|Русский|Italiano| |:---|:---:|:---:|:---:|:---:|:---:|:---:| From b4fae286781d378c0ffda159903792ab1a38479f Mon Sep 17 00:00:00 2001 From: Afshine Amidi <26204670+afshinea@users.noreply.github.com> Date: Mon, 18 Feb 2019 18:03:13 -0800 Subject: [PATCH 153/531] Update README.md --- README.md | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/README.md b/README.md index cf019c4fe..ae8a805d7 100644 --- a/README.md +++ b/README.md @@ -35,11 +35,11 @@ Please make sure to 
propose the translation of **only one** cheatsheet per pull ## Progression for CS 230 (Deep Learning) -|Cheatsheet topic|Deutsch|Español|فارسی|Français|日本語|Português|中文| -|:---|:---:|:---:|:---:|:---:|:---:|:---:|:---:| -|Convolutional Neural Nets|not started|not started|done|done|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/128)|not started| -|Recurrent Neural Nets|not started|not started|done|done|not started|not started|not started| -|DL tips and tricks|not started|not started|done|done|not started|not started|not started| +|Cheatsheet topic|Español|فارسی|Français|日本語|Português|中文| +|:---|:---:|:---:|:---:|:---:|:---:|:---:| +|Convolutional Neural Nets|not started|done|done|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/128)|not started| +|Recurrent Neural Nets|not started|done|done|not started|not started|not started| +|DL tips and tricks|not started|done|done|not started|not started|not started| |Cheatsheet topic|العَرَبِيَّة|עִבְרִית|हिन्दी|Türkçe|Русский|Italiano| |:---|:---:|:---:|:---:|:---:|:---:|:---:| From 2a3df5546295992517d9ff3bb9b72445d43ffabf Mon Sep 17 00:00:00 2001 From: Afshine Amidi <26204670+afshinea@users.noreply.github.com> Date: Mon, 18 Feb 2019 18:04:12 -0800 Subject: [PATCH 154/531] Update README.md --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index ae8a805d7..26aa49cbe 100644 --- a/README.md +++ b/README.md @@ -56,7 +56,7 @@ Please make sure to propose the translation of **only one** cheatsheet per pull ## Progression for CS 229 (Machine Learning) |Cheatsheet topic|Deutsch|Español|فارسی|Français|日本語|Português|中文| |:---|:---:|:---:|:---:|:---:|:---:|:---:|:---:| -|Deep learning|not started|done|done|done|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/96)|done|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/12)| +|Deep learning|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/106)|done|done|done|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/96)|done|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/12)| |Supervised learning|not started|done|done|done|not started|done|done| |Unsupervised learning|not started|done|done|done|not started|done|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/48)| |ML tips and tricks|not started|done|done|done|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/99)|done|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/7)| From da6c69f0086c9c4b15979198a911bc68d2f36192 Mon Sep 17 00:00:00 2001 From: Afshine Amidi <26204670+afshinea@users.noreply.github.com> Date: Mon, 18 Feb 2019 18:08:08 -0800 Subject: [PATCH 155/531] Update README.md --- README.md | 37 ++++++++++++++++++++++++------------- 1 file changed, 24 insertions(+), 13 deletions(-) diff --git a/README.md b/README.md index 26aa49cbe..7547b83ee 100644 --- a/README.md +++ b/README.md @@ -54,14 +54,14 @@ Please make sure to propose the translation of **only one** cheatsheet per pull |DL tips and tricks|not started|not started|not started|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/108)| ## Progression for CS 229 (Machine Learning) -|Cheatsheet topic|Deutsch|Español|فارسی|Français|日本語|Português|中文| +|Cheatsheet topic|Español|فارسی|Français|日本語|Português|中文| |:---|:---:|:---:|:---:|:---:|:---:|:---:|:---:| -|Deep learning|[in 
progress](https://github.com/shervinea/cheatsheet-translation/pull/106)|done|done|done|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/96)|done|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/12)| -|Supervised learning|not started|done|done|done|not started|done|done| -|Unsupervised learning|not started|done|done|done|not started|done|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/48)| -|ML tips and tricks|not started|done|done|done|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/99)|done|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/7)| -|Probabilities and Statistics|not started|done|done|done|not started|done|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/73)| -|Linear algebra|not started|done|done|done|not started|done|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/72)| +|Deep learning|done|done|done|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/96)|done|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/12)| +|Supervised learning|done|done|done|not started|done|done| +|Unsupervised learning|done|done|done|not started|done|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/48)| +|ML tips and tricks|done|done|done|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/99)|done|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/7)| +|Probabilities and Statistics|done|done|done|not started|done|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/73)| +|Linear algebra|done|done|done|not started|done|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/72)| |Cheatsheet topic|العَرَبِيَّة|עִבְרִית|हिन्दी|Türkçe|Русский|Italiano| |:---|:---:|:---:|:---:|:---:|:---:|:---:| @@ -75,12 +75,23 @@ Please make sure to propose the translation of **only one** cheatsheet per pull |Cheatsheet topic|Polski|Suomi|Català|Українська|한국어|Magyar| |:---|:---:|:---:|:---:|:---:|:---:|:---:| -|Deep learning|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/8)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/34)|not started|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/80)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/124/files#diff-5e2ba65ef08acd57024e82d0ae94b923)| -|Supervised learning|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/8)|not started|not started|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/90)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/124/files#diff-5e2ba65ef08acd57024e82d0ae94b923)| -|Unsupervised learning|not started|not started|not started|not started|done|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/124/files#diff-5e2ba65ef08acd57024e82d0ae94b923)| -|ML tips and tricks|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/8)|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/47)|not started|done|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/124/files#diff-5e2ba65ef08acd57024e82d0ae94b923)| -|Probabilities and Statistics|not started|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/47)|done|done|[in 
progress](https://github.com/shervinea/cheatsheet-translation/pull/124/files#diff-5e2ba65ef08acd57024e82d0ae94b923)| -|Linear algebra|not started|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/47)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/95)|done|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/124/files#diff-5e2ba65ef08acd57024e82d0ae94b923)| +|Deep learning|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/8)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/34)|not started|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/80)| +|Supervised learning|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/8)|not started|not started|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/90)| +|Unsupervised learning|not started|not started|not started|not started|done| +|ML tips and tricks|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/8)|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/47)|not started|done| +|Probabilities and Statistics|not started|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/47)|done|done| +|Linear algebra|not started|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/47)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/95)|done| + + +|Cheatsheet topic|Magyar|Deutsch| +|:---|:---:|:---:| +|Deep learning|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/124/files#diff-5e2ba65ef08acd57024e82d0ae94b923)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/106)| +|Supervised learning|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/124/files#diff-5e2ba65ef08acd57024e82d0ae94b923)|not started| +|Unsupervised learning|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/124/files#diff-5e2ba65ef08acd57024e82d0ae94b923)|not started| +|ML tips and tricks|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/124/files#diff-5e2ba65ef08acd57024e82d0ae94b923)|not started| +|Probabilities and Statistics|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/124/files#diff-5e2ba65ef08acd57024e82d0ae94b923)|not started| +|Linear algebra|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/124/files#diff-5e2ba65ef08acd57024e82d0ae94b923)|not started| + ## Acknowledgements Thank you everyone for your help! Please do not forget to add your name to the `CONTRIBUTORS` file so that we can give you proper credit in the cheatsheets' [official website](https://stanford.edu/~shervine/teaching). 
From 36dddb0e516d41b15f331530ec3f5d88557a824c Mon Sep 17 00:00:00 2001 From: Afshine Amidi <26204670+afshinea@users.noreply.github.com> Date: Mon, 18 Feb 2019 18:08:36 -0800 Subject: [PATCH 156/531] Update README.md --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index 7547b83ee..08053b0a5 100644 --- a/README.md +++ b/README.md @@ -55,7 +55,7 @@ Please make sure to propose the translation of **only one** cheatsheet per pull ## Progression for CS 229 (Machine Learning) |Cheatsheet topic|Español|فارسی|Français|日本語|Português|中文| -|:---|:---:|:---:|:---:|:---:|:---:|:---:|:---:| +|:---|:---:|:---:|:---:|:---:|:---:|:---:| |Deep learning|done|done|done|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/96)|done|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/12)| |Supervised learning|done|done|done|not started|done|done| |Unsupervised learning|done|done|done|not started|done|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/48)| From 3651f219168b361479dbe516f585e2d9e19689be Mon Sep 17 00:00:00 2001 From: Afshine Amidi <26204670+afshinea@users.noreply.github.com> Date: Mon, 18 Feb 2019 18:09:16 -0800 Subject: [PATCH 157/531] Update README.md --- README.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/README.md b/README.md index 08053b0a5..588f9506c 100644 --- a/README.md +++ b/README.md @@ -73,8 +73,8 @@ Please make sure to propose the translation of **only one** cheatsheet per pull |Linear algebra|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/85)|not started|not started|done|not started|not started| -|Cheatsheet topic|Polski|Suomi|Català|Українська|한국어|Magyar| -|:---|:---:|:---:|:---:|:---:|:---:|:---:| +|Cheatsheet topic|Polski|Suomi|Català|Українська|한국어| +|:---|:---:|:---:|:---:|:---:|:---:| |Deep learning|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/8)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/34)|not started|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/80)| |Supervised learning|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/8)|not started|not started|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/90)| |Unsupervised learning|not started|not started|not started|not started|done| From 92b3de716a3bc170ec72b9105688a07f40818293 Mon Sep 17 00:00:00 2001 From: Leticia Portella Date: Sat, 16 Feb 2019 14:01:31 +0000 Subject: [PATCH 158/531] Add revisor in CNN pt --- pt/convolutional-neural-networks.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/pt/convolutional-neural-networks.md b/pt/convolutional-neural-networks.md index d31fb0695..4934d7c2f 100644 --- a/pt/convolutional-neural-networks.md +++ b/pt/convolutional-neural-networks.md @@ -559,7 +559,7 @@ **80. Triplet loss ― The triplet loss ℓ is a loss function computed on the embedding representation of a triplet of images A (anchor), P (positive) and N (negative). The anchor and the positive example belong to a same class, while the negative example to another one. By calling α∈R+ the margin parameter, this loss is defined as follows:** -⟶ Perda tripla (Triplet loss) - A perda tripla ℓ é uma função de perda (loss function) computada na representação da encorporação de três imagens A (âncora), P (positiva) e N (negativa). 
O exemplo da âncora e positivo pertencem à mesma classe, enquanto o exemplo negativo pertence a uma classe diferente. Chamando o parâmetro de margem de α∈R+, essa função de perda é calculada da seguinte forma: +⟶ Perda tripla (Triplet loss) - A perda tripla ℓ é uma função de perda (loss function) computada na representação da encorporação de três imagens A (âncora), P (positiva) e N (negativa). O exemplo da âncora e positivo pertencem à mesma classe, enquanto o exemplo negativo pertence a uma classe diferente. Chamando o parâmetro de margem de α∈R+, essa função de perda é definida da seguinte forma:
@@ -643,7 +643,7 @@ **92. Generative Adversarial Network ― Generative adversarial networks, also known as GANs, are composed of a generative and a discriminative model, where the generative model aims at generating the most truthful output that will be fed into the discriminative which aims at differentiating the generated and true image.** -⟶ Rede Adversarial Gerativa (Generative Adversarial Network) - As Generaive Adversarial Networks, também conhecidas como GANs, são compostas de um modelo generativo e um modelo discriminativo, onde o modelo generativo visa gerar a saída mais verdadeira que será alimentada na discriminativa que visa diferenciar a imagem gerada e verdadeira. +⟶ Rede Adversarial Gerativa (Generative Adversarial Network) - As Generaive Adversarial Networks, também conhecidas como GANs, são compostas de um modelo generativo e um modelo discriminativo, onde o modelo generativo visa gerar a saída mais verdadeira que será alimentada na discriminativa que visa diferenciar a imagem gerada e a imagem verdadeira.
@@ -699,7 +699,7 @@ **100. Reviewed by X, Y and Z** -⟶ Revisado por X, Y e Z +⟶ Revisado por Gabriel Fonseca
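The triplet loss described in item 80 above maps directly to a few lines of code. A minimal sketch, assuming NumPy and purely illustrative embedding vectors and margin (none of these values come from the cheatsheet or the patch):

```python
import numpy as np

def triplet_loss(anchor, positive, negative, alpha=0.2):
    """Hinge-style triplet loss: max(d(A,P) - d(A,N) + alpha, 0) on embedding vectors."""
    pos_dist = np.sum((anchor - positive) ** 2)   # squared distance to the positive example
    neg_dist = np.sum((anchor - negative) ** 2)   # squared distance to the negative example
    return max(pos_dist - neg_dist + alpha, 0.0)  # zero once the margin is satisfied

# Illustrative embeddings; in practice these come from the network.
a, p, n = np.array([0.1, 0.9]), np.array([0.2, 0.8]), np.array([0.9, 0.1])
print(triplet_loss(a, p, n))
```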
From 4aa0fef962d137d98f5652a937dc9dba6923e06a Mon Sep 17 00:00:00 2001 From: Shervine Amidi Date: Tue, 19 Feb 2019 20:15:53 -0800 Subject: [PATCH 159/531] Update CONTRIBUTORS --- CONTRIBUTORS | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/CONTRIBUTORS b/CONTRIBUTORS index f8e720dfc..c7b22b33f 100644 --- a/CONTRIBUTORS +++ b/CONTRIBUTORS @@ -80,6 +80,9 @@ --ja --pt + Leticia Portella (translation of convolutional neural networks) + Gabriel Aparecido Fonseca (review of convolutional neural networks) + Gabriel Fonseca (translation of deep learning) Leticia Portella (review of deep learning) @@ -100,8 +103,6 @@ Gabriel Fonseca (translation of unsupervised learning) Tiago Danin (review of unsupervised learning) - Leticia Portella (translation of CNNs) - --tr Ayyüce Kızrak (translation of convolutional neural networks) Yavuz Kömeçoğlu (review of convolutional neural networks) From 886dcb0d7fcfd7456acff7a21c369083662e68ef Mon Sep 17 00:00:00 2001 From: kevingo Date: Fri, 22 Feb 2019 09:53:11 +0800 Subject: [PATCH 160/531] Add linear algebra zh-tw translation --- CONTRIBUTORS | 3 + zh-tw/refresher-linear-algebra.md | 338 ++++++++++++++++++++++++++++++ 2 files changed, 341 insertions(+) create mode 100644 zh-tw/refresher-linear-algebra.md diff --git a/CONTRIBUTORS b/CONTRIBUTORS index 35c12147f..88affdffd 100644 --- a/CONTRIBUTORS +++ b/CONTRIBUTORS @@ -161,3 +161,6 @@ kevingo (translation of probabilities and statistics) johnnychhsu (review of probabilities and statistics) + + kevingo (translation of linear algebra) + diff --git a/zh-tw/refresher-linear-algebra.md b/zh-tw/refresher-linear-algebra.md new file mode 100644 index 000000000..03b532385 --- /dev/null +++ b/zh-tw/refresher-linear-algebra.md @@ -0,0 +1,338 @@ +1. **Linear Algebra and Calculus refresher** + +⟶ +線性代數與微積分回顧 +
+ +2. **General notations** + +⟶ +通用符號 +
+ +3. **Definitions** + +⟶ +定義 +
+ +4. **Vector ― We note x∈Rn a vector with n entries, where xi∈R is the ith entry:** + +⟶ +向量 - 我們定義 x∈Rn 是一個向量,包含 n 維元素,xi∈R 是第 i 維元素: +
+ +5. **Matrix ― We note A∈Rm×n a matrix with m rows and n columns, where Ai,j∈R is the entry located in the ith row and jth column:** + +⟶ +矩陣 - 我們定義 A∈Rm×n 是一個 m 行 n 列的矩陣,Ai,j∈R 代表位在第 i 行第 j 列的元素: +
+ +6. **Remark: the vector x defined above can be viewed as a n×1 matrix and is more particularly called a column-vector.** + +⟶ +注意:上述定義的向量 x 可以視為 nx1 的矩陣,或是更常被稱為行向量 +
+ +7. **Main matrices** + +⟶ +主要的矩陣 +
+ +8. **Identity matrix ― The identity matrix I∈Rn×n is a square matrix with ones in its diagonal and zero everywhere else:** + +⟶ +單位矩陣 - 單位矩陣 I∈Rn×n 是一個方陣,其主對角線皆為 1,其餘皆為 0 +
+ +9. **Remark: for all matrices A∈Rn×n, we have A×I=I×A=A.** + +⟶ +注意:對於所有矩陣 A∈Rn×n,我們有 A×I=I×A=A +
+ +10. **Diagonal matrix ― A diagonal matrix D∈Rn×n is a square matrix with nonzero values in its diagonal and zero everywhere else:** + +⟶ +對角矩陣 - 對角矩陣 D∈Rn×n 是一個方陣,其主對角線為非 0,其餘皆為 0 +
+ +11. **Remark: we also note D as diag(d1,...,dn).** + +⟶ +注意:我們令 D 為 diag(d1,...,dn) +
+ +12. **Matrix operations** + +⟶ +矩陣運算 +
+ +13. **Multiplication** + +⟶ +乘法 +
+ +14. **Vector-vector ― There are two types of vector-vector products:** + +⟶ +向量-向量 - 有兩種類型的向量-向量相乘: +
+ +15. **inner product: for x,y∈Rn, we have:** + +⟶ +內積:對於 x,y∈Rn,我們可以得到: +
+ +16. **outer product: for x∈Rm,y∈Rn, we have:** + +⟶ +外積:對於 x∈Rm,y∈Rn,我們可以得到: +
+ +17. **Matrix-vector ― The product of matrix A∈Rm×n and vector x∈Rn is a vector of size Rn, such that:** + +⟶ +矩陣-向量 - 矩陣 A∈Rm×n 和向量 x∈Rn 的乘積是一個大小為 Rn 的向量,使得: +
+ +18. **where aTr,i are the vector rows and ac,j are the vector columns of A, and xi are the entries of x.** + +⟶ +其中 aTr,i 是 A 的行向量、ac,j 是 A 的列向量、xi 是 x 的元素 +
+ +19. **Matrix-matrix ― The product of matrices A∈Rm×n and B∈Rn×p is a matrix of size Rn×p, such that:** + +⟶ +矩陣-矩陣:矩陣 A∈Rm×n 和 B∈Rn×p 為一個大小 Rn×p 的矩陣,使得: +
+ +20. **where aTr,i,bTr,i are the vector rows and ac,j,bc,j are the vector columns of A and B respectively** + +⟶ +其中,aTr, i, bTr, i 和 ac, j, bc, j 是列向量分別是 A 和 B 的行向量與列向量 +
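As a concrete companion to the multiplication rules in items 14-20, here is a minimal NumPy sketch; the vectors and matrices are arbitrary illustrative values, not part of the translated cheatsheet:

```python
import numpy as np

# Arbitrary illustrative values.
x = np.array([1.0, 2.0, 3.0])
y = np.array([4.0, 5.0, 6.0])
A = np.arange(6.0).reshape(2, 3)    # A is 2x3
B = np.arange(12.0).reshape(3, 4)   # B is 3x4

inner = x @ y            # inner product: a scalar, sum_i x_i * y_i
outer = np.outer(x, y)   # outer product: a 3x3 matrix with entries x_i * y_j
Ax = A @ x               # matrix-vector product: a vector of size 2 (= number of rows of A)
AB = A @ B               # matrix-matrix product: a 2x4 matrix

print(inner, outer.shape, Ax.shape, AB.shape)
```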
+ +21. **Other operations** + +⟶ +其他操作 +
+ +22. **Transpose ― The transpose of a matrix A∈Rm×n, noted AT, is such that its entries are flipped:** + +⟶ +轉置 - 一個矩陣的轉置矩陣 A∈Rm×n,記作 AT,指的是其中元素的翻轉: +
+ +23. **Remark: for matrices A,B, we have (AB)T=BTAT** + +⟶ +注意:對於矩陣 A、B,我們有 (AB)T=BTAT +
+ +24. **Inverse ― The inverse of an invertible square matrix A is noted A−1 and is the only matrix such that:** + +⟶ +可逆 - 一個可逆矩陣 A 記作 A−1,並且滿足以下唯一的要求: +
+ +25. **Remark: not all square matrices are invertible. Also, for matrices A,B, we have (AB)−1=B−1A−1** + +⟶ +注意:並非所有的方陣都是可逆的。同樣的,對於矩陣 A、B 來說,我們有 (AB)−1=B−1A−1 +
+ +26. **Trace ― The trace of a square matrix A, noted tr(A), is the sum of its diagonal entries:** + +⟶ +跡 - 一個方陣 A 的跡,記作 tr(A),指的是主對角線元素之合: +
+ +27. **Remark: for matrices A,B, we have tr(AT)=tr(A) and tr(AB)=tr(BA)** + +⟶ +注意:對於矩陣 A、B 來說,我們有 tr(AT)=tr(A) 及 tr(AB)=tr(BA) +
+ +28. **Determinant ― The determinant of a square matrix A∈Rn×n, noted |A| or det(A) is expressed recursively in terms of A∖i,∖j, which is the matrix A without its ith row and jth column, as follows:** + +⟶ +行列式 - 一個方陣 A∈Rn×n 的行列式,記作|A| 或 det(A),可以透過 A∖i,∖j 來遞迴表示,它是一個沒有第 i 行和第 j 列的矩陣 A: +
+ +29. **Remark: A is invertible if and only if |A|≠0. Also, |AB|=|A||B| and |AT|=|A|.** + +⟶ +注意:A 是一個可逆矩陣,若且唯若 |A|≠0。同樣的,|AB|=|A||B| 且 |AT|=|A| +
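The identities stated in items 22-29 (transpose, inverse, trace, determinant) are easy to check numerically. A small sketch, assuming NumPy and random illustrative matrices:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3))   # illustrative random square matrices
B = rng.standard_normal((3, 3))

assert np.allclose((A @ B).T, B.T @ A.T)                                        # (AB)^T = B^T A^T
assert np.allclose(np.linalg.inv(A @ B), np.linalg.inv(B) @ np.linalg.inv(A))   # (AB)^-1 = B^-1 A^-1
assert np.isclose(np.trace(A @ B), np.trace(B @ A))                             # tr(AB) = tr(BA)
assert np.isclose(np.linalg.det(A @ B), np.linalg.det(A) * np.linalg.det(B))    # |AB| = |A||B|
print("all four identities hold")
```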
+ +30. **Matrix properties** + +⟶ +矩陣的性質 +
+ +31. **Definitions** + +⟶ +定義 +
+ +32. **Symmetric decomposition ― A given matrix A can be expressed in terms of its symmetric and antisymmetric parts as follows:** + +⟶ +對稱分解 - 給定一個矩陣 A,它可以透過其對稱和反對稱的部分表示如下: +
+ +33. **[Symmetric, Antisymmetric]** + +⟶ +[對稱, 反對稱] +
+ +34. **Norm ― A norm is a function N:V⟶[0,+∞[ where V is a vector space, and such that for all x,y∈V, we have:** + +⟶ +範數 - 範數指的是一個函式 N:V⟶[0,+∞[,其中 V 是一個向量空間,且對於所有 x,y∈V,我們有: +
+ +35. **N(ax)=|a|N(x) for a scalar** + +⟶ +對一個純量來說,我們有 N(ax)=|a|N(x) +
+ +36. **if N(x)=0, then x=0** + +⟶ +若 N(x)=0 時,則 x=0 +
+ +37. **For x∈V, the most commonly used norms are summed up in the table below:** + +⟶ +對於 x∈V,最常用的範數總結如下表: +
+ +38. **[Norm, Notation, Definition, Use case]** + +⟶ +[範數, 表示法, 定義, 使用情境] +
+
+39. **Linearly dependence ― A set of vectors is said to be linearly dependent if one of the vectors in the set can be defined as a linear combination of the others.**
+
+⟶
+線性相關 - 當集合中的一個向量可以被定義為集合中其他向量的線性組合時,則稱此集合的向量為線性相關
+

+ +40. **Remark: if no vector can be written this way, then the vectors are said to be linearly independent** + +⟶ +注意:如果沒有向量可以如上表示時,則稱此集合的向量彼此為線性獨立 +
+ +41. **Matrix rank ― The rank of a given matrix A is noted rank(A) and is the dimension of the vector space generated by its columns. This is equivalent to the maximum number of linearly independent columns of A.** + +⟶ +矩陣的秩 - 一個矩陣 A 的秩記作 rank(A),指的是其列向量空間所產生的維度,等價於 A 的線性獨立的最大最大列向量 +
+ +42. **Positive semi-definite matrix ― A matrix A∈Rn×n is positive semi-definite (PSD) and is noted A⪰0 if we have:** + +⟶ +半正定矩陣 - 當以下成立時,一個矩陣 A∈Rn×n 是半正定矩陣 (PSD),且記作A⪰0: +
+ +43. **Remark: similarly, a matrix A is said to be positive definite, and is noted A≻0, if it is a PSD matrix which satisfies for all non-zero vector x, xTAx>0.** + +⟶ +注意:同樣的,一個矩陣 A 是一個半正定矩陣 (PSD),且滿足所有非零向量 x,xTAx>0 時,稱之為正定矩陣,記作 A≻0 +
+ +44. **Eigenvalue, eigenvector ― Given a matrix A∈Rn×n, λ is said to be an eigenvalue of A if there exists a vector z∈Rn∖{0}, called eigenvector, such that we have:** + +⟶ +特徵值、特徵向量 - 給定一個矩陣 A∈Rn×n,當存在一個向量 z∈Rn∖{0} 時,此向量被稱為特徵向量,λ 稱之為 A 的特徵值,且滿足: +
+ +45. **Spectral theorem ― Let A∈Rn×n. If A is symmetric, then A is diagonalizable by a real orthogonal matrix U∈Rn×n. By noting Λ=diag(λ1,...,λn), we have:** + +⟶ +譜分解 - 令 A∈Rn×n,如果 A 是對稱的,則 A 可以被一個實數正交矩陣 U∈Rn×n 給對角化。令 Λ=diag(λ1,...,λn),我們得到: +
+ +46. **diagonal** + +⟶ +對角線 +
+ +47. **Singular-value decomposition ― For a given matrix A of dimensions m×n, the singular-value decomposition (SVD) is a factorization technique that guarantees the existence of U m×m unitary, Σ m×n diagonal and V n×n unitary matrices, such that:** + +⟶ +奇異值分解 - 對於給定維度為 mxn 的矩陣 A,其奇異值分解指的是一種因子分解技巧,保證存在 mxm 的單式矩陣 U、對角線矩陣 Σ m×n 和 nxn 的單式矩陣 V,滿足: +
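Items 44-47 (eigenpairs, the spectral theorem and the singular-value decomposition) can also be verified numerically. A minimal sketch with illustrative random matrices, assuming NumPy:

```python
import numpy as np

rng = np.random.default_rng(1)
M = rng.standard_normal((4, 4))   # illustrative random matrix
S = (M + M.T) / 2                 # make it symmetric

# Spectral theorem: S = U diag(lambda) U^T with U real orthogonal.
lam, U = np.linalg.eigh(S)
assert np.allclose(S, U @ np.diag(lam) @ U.T)

# SVD of a rectangular matrix A (m x n): A = U Sigma V^T.
A = rng.standard_normal((3, 5))
U2, sigma, Vt = np.linalg.svd(A, full_matrices=False)
assert np.allclose(A, U2 @ np.diag(sigma) @ Vt)
print("spectral decomposition and SVD reconstruct the originals")
```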
+ +48. **Matrix calculus** + +⟶ +矩陣導數 +
+ +49. **Gradient ― Let f:Rm×n→R be a function and A∈Rm×n be a matrix. The gradient of f with respect to A is a m×n matrix, noted ∇Af(A), such that:** + +⟶ +梯度 - 令 f:Rm×n→R 是一個函式,且 A∈Rm×n 是一個矩陣。f 相對於 A 的梯度是一個 mxn 的矩陣,記作 ∇Af(A),滿足: +
+ +50. **Remark: the gradient of f is only defined when f is a function that returns a scalar.** + +⟶ +注意:f 的梯度僅在 f 為一個函數且該函數回傳一個純量時有效 +
+ +51. **Hessian ― Let f:Rn→R be a function and x∈Rn be a vector. The hessian of f with respect to x is a n×n symmetric matrix, noted ∇2xf(x), such that:** + +⟶ +海森 - 令 f:Rn→R 是一個函式,且 x∈Rn 是一個向量,則一個 f 的海森對於向量 x 是一個 nxn 的對稱矩陣,記作 ∇2xf(x),滿足: +
+ +52. **Remark: the hessian of f is only defined when f is a function that returns a scalar** + +⟶ +注意:f 的海森僅在 f 為一個函數且該函數回傳一個純量時有效 +
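To make the gradient and Hessian definitions of items 49-52 concrete, consider the scalar-valued function f(x)=xᵀAx, whose gradient is (A+Aᵀ)x and whose Hessian is A+Aᵀ. A short sketch with an illustrative random A and a finite-difference check, assuming NumPy:

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((3, 3))   # illustrative matrix
x = rng.standard_normal(3)        # illustrative point

f = lambda v: v @ A @ v           # f: R^3 -> R returns a scalar, so gradient and Hessian are defined

grad = (A + A.T) @ x              # analytic gradient
hess = A + A.T                    # analytic Hessian (constant in x)

eps = 1e-6                        # finite-difference check of the gradient
num_grad = np.array([(f(x + eps * e) - f(x - eps * e)) / (2 * eps) for e in np.eye(3)])
assert np.allclose(grad, num_grad, atol=1e-4)
print(grad, hess)
```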
+
+53. **Gradient operations ― For matrices A,B,C, the following gradient properties are worth having in mind:**
+
+⟶
+梯度運算 - 對於矩陣 A、B、C,下列的梯度性質值得牢牢記住:
+

+
+54. **[General notations, Definitions, Main matrices]**
+
+⟶
+[通用符號, 定義, 主要矩陣]
+

+ +55. **[Matrix operations, Multiplication, Other operations]** + +⟶ +[矩陣運算, 矩陣乘法, 其他運算] +
+ +56. **[Matrix properties, Norm, Eigenvalue/Eigenvector, Singular-value decomposition]** + +⟶ +[矩陣性質, 範數, 特徵值/特徵向量, 奇異值分解] +
+ +57. **[Matrix calculus, Gradient, Hessian, Operations]** + +⟶ +[矩陣導數, 梯度, 海森, 運算] \ No newline at end of file From 096dddf0431eaaa8f76d519dea0e73fae3db73c7 Mon Sep 17 00:00:00 2001 From: Shervine Amidi Date: Wed, 27 Feb 2019 20:21:31 -0800 Subject: [PATCH 161/531] Sync CONTRIBUTORS --- CONTRIBUTORS | 69 +++++++++++++++++++++++++++++++++++++++++++++++----- 1 file changed, 63 insertions(+), 6 deletions(-) diff --git a/CONTRIBUTORS b/CONTRIBUTORS index ed831caac..35c12147f 100644 --- a/CONTRIBUTORS +++ b/CONTRIBUTORS @@ -1,9 +1,7 @@ --ar Amjad Khatabi (translation of deep learning) Zaid Alyafeai (review of deep learning) - Zaid Alyafeai (translation for linear algebra) - Amjad Khatabi (review of linear algebra) - + --de --es @@ -40,9 +38,16 @@ Fernando Diaz (review of unsupervised learning) --fa + AlisterTA (translation of convolutional neural networks) + Ehsan Kermani (translation of convolutional neural networks) + Erfan Noury (review of convolutional neural networks) + AlisterTA (translation of deep learning) Mohammad Karimi (review of deep learning) Erfan Noury (review of deep learning) + + AlisterTA (translation of deep learning tips and tricks) + Erfan Noury (review of deep learning tips and tricks) Erfan Noury (translation of linear algebra) Mohammad Karimi (review of linear algebra) @@ -54,7 +59,10 @@ Erfan Noury (translation of probabilities and statistics) Mohammad Karimi (review of probabilities and statistics) - + + AlisterTA (translation of recurrent neural networks) + Erfan Noury (review of recurrent neural networks) + Amirhosein Kazemnejad (translation of supervised learning) Erfan Noury (review of supervised learning) Mohammad Karimi (review of supervised learning) @@ -69,17 +77,31 @@ --hi +--ko + Wooil Jeong (translation of machine learning tips and tricks) + + Wooil Jeong (translation of probabilities and statistics) + + Kwang Hyeok Ahn (translation of Unsupervised Learning) + --ja --pt + Leticia Portella (translation of convolutional neural networks) + Gabriel Aparecido Fonseca (review of convolutional neural networks) + Gabriel Fonseca (translation of deep learning) Leticia Portella (review of deep learning) Gabriel Fonseca (translation of linear algebra) Leticia Portella (review of linear algebra) + + Fernando Santos (translation of machine learning tips and tricks) + Leticia Portella (review of machine learning tips and tricks) + Gabriel Fonseca (review of machine learning tips and tricks) - Leticia Portella (translation of probability) - Flavio Clesio (review of probability) + Leticia Portella (translation of probabilities and statistics) + Flavio Clesio (review of probabilities and statistics) Leticia Portella (translation of supervised learning) Gabriel Fonseca (review of supervised learning) @@ -89,12 +111,38 @@ Tiago Danin (review of unsupervised learning) --tr + Ayyüce Kızrak (translation of convolutional neural networks) + Yavuz Kömeçoğlu (review of convolutional neural networks) + Ekrem Çetinkaya (translation of deep learning) Omer Bukte (review of deep learning) + Ayyüce Kızrak (translation of deep learning tips and tricks) + Yavuz Kömeçoğlu (review of deep learning tips and tricks) + Kadir Tekeli (translation of linear algebra) Ekrem Çetinkaya (review of linear algebra) + Seray Beşer (translation of machine learning tips and tricks) + Ayyüce Kızrak (review of machine learning tips and tricks) + Yavuz Kömeçoğlu (review of machine learning tips and tricks) + + Ayyüce Kızrak (translation of probabilities and statistics) + Başak Buluz (review of probabilities 
and statistics) + + Başak Buluz (translation of recurrent neural networks) + Yavuz Kömeçoğlu (review of recurrent neural networks) + + Başak Buluz (translation of supervised learning) + Ayyüce Kızrak (review of supervised learning) + + Yavuz Kömeçoğlu (translation of unsupervised learning) + Başak Buluz (review of unsupervised learning) + +--uk + Gregory Reshetniak (translation of probabilities and statistics) + Denys (review of probabilities and statistics) + --zh Wang Hongnian (translation of supervised learning) Xiaohu Zhu (朱小虎) (review of supervised learning) @@ -104,3 +152,12 @@ kevingo (translation of deep learning) TobyOoO (review of deep learning) + kevingo (translation of supervised learning) + accelsao (review of supervised learning) + + kevingo (translation of unsupervised learning) + imironhead (review of unsupervised learning) + johnnychhsu (review of unsupervised learning) + + kevingo (translation of probabilities and statistics) + johnnychhsu (review of probabilities and statistics) From 4f9464286b213d82d2d8875cdaf9fc61101026bf Mon Sep 17 00:00:00 2001 From: Shervine Amidi Date: Wed, 27 Feb 2019 20:23:49 -0800 Subject: [PATCH 162/531] Add contributors --- CONTRIBUTORS | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/CONTRIBUTORS b/CONTRIBUTORS index 35c12147f..138742978 100644 --- a/CONTRIBUTORS +++ b/CONTRIBUTORS @@ -2,6 +2,10 @@ Amjad Khatabi (translation of deep learning) Zaid Alyafeai (review of deep learning) + Zaid Alyafeai (translation of linear algebra) + Amjad Khatabi (review of linear algebra) + Mazen Melibari (review of linear algebra) + --de --es From c3463a038874fafffb5101a76782e86173966f5c Mon Sep 17 00:00:00 2001 From: kevingo Date: Tue, 12 Mar 2019 20:31:31 +0800 Subject: [PATCH 163/531] Update content after reviewing --- zh-tw/refresher-linear-algebra.md | 16 ++++++++-------- 1 file changed, 8 insertions(+), 8 deletions(-) diff --git a/zh-tw/refresher-linear-algebra.md b/zh-tw/refresher-linear-algebra.md index 03b532385..36d4cef5d 100644 --- a/zh-tw/refresher-linear-algebra.md +++ b/zh-tw/refresher-linear-algebra.md @@ -25,7 +25,7 @@ 5. **Matrix ― We note A∈Rm×n a matrix with m rows and n columns, where Ai,j∈R is the entry located in the ith row and jth column:** ⟶ -矩陣 - 我們定義 A∈Rm×n 是一個 m 行 n 列的矩陣,Ai,j∈R 代表位在第 i 行第 j 列的元素: +矩陣 - 我們定義 A∈Rm×n 是一個 m 列 n 行的矩陣,Ai,j∈R 代表位在第 i 列第 j 行的元素:
6. **Remark: the vector x defined above can be viewed as a n×1 matrix and is more particularly called a column-vector.** @@ -97,25 +97,25 @@ 17. **Matrix-vector ― The product of matrix A∈Rm×n and vector x∈Rn is a vector of size Rn, such that:** ⟶ -矩陣-向量 - 矩陣 A∈Rm×n 和向量 x∈Rn 的乘積是一個大小為 Rn 的向量,使得: +矩陣-向量 - 矩陣 A∈Rm×n 和向量 x∈Rn 的乘積是一個大小為 Rm 的向量,使得:
18. **where aTr,i are the vector rows and ac,j are the vector columns of A, and xi are the entries of x.** ⟶ -其中 aTr,i 是 A 的行向量、ac,j 是 A 的列向量、xi 是 x 的元素 +其中 aTr,i 是 A 的列向量、ac,j 是 A 的行向量、xi 是 x 的元素
19. **Matrix-matrix ― The product of matrices A∈Rm×n and B∈Rn×p is a matrix of size Rn×p, such that:** ⟶ -矩陣-矩陣:矩陣 A∈Rm×n 和 B∈Rn×p 為一個大小 Rn×p 的矩陣,使得: +矩陣-矩陣:矩陣 A∈Rm×n 和 B∈Rn×p 的乘積為一個大小 Rm×p 的矩陣,使得:
20. **where aTr,i,bTr,i are the vector rows and ac,j,bc,j are the vector columns of A and B respectively** ⟶ -其中,aTr, i, bTr, i 和 ac, j, bc, j 是列向量分別是 A 和 B 的行向量與列向量 +其中,aTr,i,bTr,i 和 ac,j,bc,j 分別是 A 和 B 的列向量與行向量
21. **Other operations** @@ -139,7 +139,7 @@ 24. **Inverse ― The inverse of an invertible square matrix A is noted A−1 and is the only matrix such that:** ⟶ -可逆 - 一個可逆矩陣 A 記作 A−1,並且滿足以下唯一的要求: +可逆 - 一個可逆矩陣 A 記作 A−1,存在唯一的矩陣,使得:
25. **Remark: not all square matrices are invertible. Also, for matrices A,B, we have (AB)−1=B−1A−1** @@ -163,7 +163,7 @@ 28. **Determinant ― The determinant of a square matrix A∈Rn×n, noted |A| or det(A) is expressed recursively in terms of A∖i,∖j, which is the matrix A without its ith row and jth column, as follows:** ⟶ -行列式 - 一個方陣 A∈Rn×n 的行列式,記作|A| 或 det(A),可以透過 A∖i,∖j 來遞迴表示,它是一個沒有第 i 行和第 j 列的矩陣 A: +行列式 - 一個方陣 A∈Rn×n 的行列式,記作|A| 或 det(A),可以透過 A∖i,∖j 來遞迴表示,它是一個沒有第 i 列和第 j 行的矩陣 A:
 
 29. **Remark: A is invertible if and only if |A|≠0. Also, |AB|=|A||B| and |AT|=|A|.**
 
@@ -241,7 +241,7 @@
 41. **Matrix rank ― The rank of a given matrix A is noted rank(A) and is the dimension of the vector space generated by its columns. This is equivalent to the maximum number of linearly independent columns of A.**
 
 ⟶
-矩陣的秩 - 一個矩陣 A 的秩記作 rank(A),指的是其列向量空間所產生的維度,等價於 A 的線性獨立的最大最大列向量
+矩陣的秩 - 一個矩陣 A 的秩記作 rank(A),指的是其行向量空間所產生的維度,等價於 A 的線性獨立行向量的最大數目
 

42. **Positive semi-definite matrix ― A matrix A∈Rn×n is positive semi-definite (PSD) and is noted A⪰0 if we have:** From 137d3d8e4a9973103af2f208d2e85c4e9e58036f Mon Sep 17 00:00:00 2001 From: kevingo Date: Tue, 12 Mar 2019 20:31:45 +0800 Subject: [PATCH 164/531] Add CONTRIBUTOR --- CONTRIBUTORS | 1 + 1 file changed, 1 insertion(+) diff --git a/CONTRIBUTORS b/CONTRIBUTORS index 88affdffd..bae8c37c4 100644 --- a/CONTRIBUTORS +++ b/CONTRIBUTORS @@ -163,4 +163,5 @@ johnnychhsu (review of probabilities and statistics) kevingo (translation of linear algebra) + Miyaya (review of linear algebra) From 84fe93f72064df3d1b3f909a33718af5b4f0c601 Mon Sep 17 00:00:00 2001 From: Shervine Amidi Date: Sun, 17 Mar 2019 00:38:39 -0700 Subject: [PATCH 165/531] Update CONTRIBUTORS --- CONTRIBUTORS | 13 ++++++------- 1 file changed, 6 insertions(+), 7 deletions(-) diff --git a/CONTRIBUTORS b/CONTRIBUTORS index bae8c37c4..8071830aa 100644 --- a/CONTRIBUTORS +++ b/CONTRIBUTORS @@ -152,16 +152,15 @@ kevingo (translation of deep learning) TobyOoO (review of deep learning) + kevingo (translation of linear algebra) + Miyaya (review of linear algebra) + + kevingo (translation of probabilities and statistics) + johnnychhsu (review of probabilities and statistics) + kevingo (translation of supervised learning) accelsao (review of supervised learning) kevingo (translation of unsupervised learning) imironhead (review of unsupervised learning) johnnychhsu (review of unsupervised learning) - - kevingo (translation of probabilities and statistics) - johnnychhsu (review of probabilities and statistics) - - kevingo (translation of linear algebra) - Miyaya (review of linear algebra) - From 5c47895dc279810aba37d5f297d3f3d11bd6ccc8 Mon Sep 17 00:00:00 2001 From: Robert Altena Date: Fri, 24 May 2019 10:51:55 +0900 Subject: [PATCH 166/531] WIP. First commit. --- ja/refresher-linear-algebra.md | 339 +++++++++++++++++++++++++++++++++ 1 file changed, 339 insertions(+) create mode 100644 ja/refresher-linear-algebra.md diff --git a/ja/refresher-linear-algebra.md b/ja/refresher-linear-algebra.md new file mode 100644 index 000000000..94a393ce7 --- /dev/null +++ b/ja/refresher-linear-algebra.md @@ -0,0 +1,339 @@ +**1. Linear Algebra and Calculus refresher** + +⟶ +線形代数と微積分回顧 +
+ +**2. General notations** + +⟶ +一般的表記 +
+ +**3. Definitions** + +⟶ +定義 +
+ +**4. Vector ― We note x∈Rn a vector with n entries, where xi∈R is the ith entry:** + +⟶ +ベクター - x∈Rnがn個のエントリを持つベクトルです。ここで、xi∈Rはi番目のエントリです。 +
+ +**5. Matrix ― We note A∈Rm×n a matrix with m rows and n columns, where Ai,j∈R is the entry located in the ith row and jth column:** + +⟶ +行列 - A∈Rm×nがm行n列の行列です。 Ai、j∈Rは、i行j列目にあるエントリです。 +
+ +**6. Remark: the vector x defined above can be viewed as a n×1 matrix and is more particularly called a column-vector.** + +⟶ +備考:上で定義されたベクトルxはn×1行列と見なすことができます。 それは列ベクトルと呼ばれます。 +
+ +**7. Main matrices** + +⟶ +主行列 +
+ +**8. Identity matrix ― The identity matrix I∈Rn×n is a square matrix with ones in its diagonal and zero everywhere else:** + +⟶ +単位行列 - 単位行列I∈Rn×nは、対角に1、それ以外ではゼロの正方行列です。 +
+ +**9. Remark: for all matrices A∈Rn×n, we have A×I=I×A=A.** + +⟶ +備考:すべての行列A∈Rn×nに対して、A×I = I×A = Aとなる。 +
+ +**10. Diagonal matrix ― A diagonal matrix D∈Rn×n is a square matrix with nonzero values in its diagonal and zero everywhere else:** + +⟶ +対角行列 - 対角行列D∈Rn×nは、対角にゼロ以外の値があり、それ以外はゼロである正方行列です。 +
+ +**11. Remark: we also note D as diag(d1,...,dn).** + +⟶ + +
+ +**12. Matrix operations** + +⟶ + +
+ +**13. Multiplication** + +⟶ + +
+ +**14. Vector-vector ― There are two types of vector-vector products:** + +⟶ + +
+ +**15. inner product: for x,y∈Rn, we have:** + +⟶ + +
+ +**16. outer product: for x∈Rm,y∈Rn, we have:** + +⟶ + +
+ +**17. Matrix-vector ― The product of matrix A∈Rm×n and vector x∈Rn is a vector of size Rn, such that:** + +⟶ + +
+ +**18. where aTr,i are the vector rows and ac,j are the vector columns of A, and xi are the entries of x.** + +⟶ + +
+ +**19. Matrix-matrix ― The product of matrices A∈Rm×n and B∈Rn×p is a matrix of size Rn×p, such that:** + +⟶ + +
+ +**20. where aTr,i,bTr,i are the vector rows and ac,j,bc,j are the vector columns of A and B respectively** + +⟶ + +
+ +**21. Other operations** + +⟶ + +
+ +**22. Transpose ― The transpose of a matrix A∈Rm×n, noted AT, is such that its entries are flipped:** + +⟶ + +
+ +**23. Remark: for matrices A,B, we have (AB)T=BTAT** + +⟶ + +
+ +**24. Inverse ― The inverse of an invertible square matrix A is noted A−1 and is the only matrix such that:** + +⟶ + +
+ +**25. Remark: not all square matrices are invertible. Also, for matrices A,B, we have (AB)−1=B−1A−1** + +⟶ + +
+ +**26. Trace ― The trace of a square matrix A, noted tr(A), is the sum of its diagonal entries:** + +⟶ + +
+ +**27. Remark: for matrices A,B, we have tr(AT)=tr(A) and tr(AB)=tr(BA)** + +⟶ + +
+ +**28. Determinant ― The determinant of a square matrix A∈Rn×n, noted |A| or det(A) is expressed recursively in terms of A∖i,∖j, which is the matrix A without its ith row and jth column, as follows:** + +⟶ + +
+ +**29. Remark: A is invertible if and only if |A|≠0. Also, |AB|=|A||B| and |AT|=|A|.** + +⟶ + +
+ +**30. Matrix properties** + +⟶ + +
+ +**31. Definitions** + +⟶ + +
+ +**32. Symmetric decomposition ― A given matrix A can be expressed in terms of its symmetric and antisymmetric parts as follows:** + +⟶ + +
+ +**33. [Symmetric, Antisymmetric]** + +⟶ + +
+ +**34. Norm ― A norm is a function N:V⟶[0,+∞[ where V is a vector space, and such that for all x,y∈V, we have:** + +⟶ + +
+ +**35. N(ax)=|a|N(x) for a scalar** + +⟶ + +
+ +**36. if N(x)=0, then x=0** + +⟶ + +
+ +**37. For x∈V, the most commonly used norms are summed up in the table below:** + +⟶ + +
+ +**38. [Norm, Notation, Definition, Use case]** + +⟶ + +
+ +**39. Linearly dependence ― A set of vectors is said to be linearly dependent if one of the vectors in the set can be defined as a linear combination of the others.** + +⟶ + +
+ +**40. Remark: if no vector can be written this way, then the vectors are said to be linearly independent** + +⟶ + +
+ +**41. Matrix rank ― The rank of a given matrix A is noted rank(A) and is the dimension of the vector space generated by its columns. This is equivalent to the maximum number of linearly independent columns of A.** + +⟶ + +
+ +**42. Positive semi-definite matrix ― A matrix A∈Rn×n is positive semi-definite (PSD) and is noted A⪰0 if we have:** + +⟶ + +
+ +**43. Remark: similarly, a matrix A is said to be positive definite, and is noted A≻0, if it is a PSD matrix which satisfies for all non-zero vector x, xTAx>0.** + +⟶ + +
+ +**44. Eigenvalue, eigenvector ― Given a matrix A∈Rn×n, λ is said to be an eigenvalue of A if there exists a vector z∈Rn∖{0}, called eigenvector, such that we have:** + +⟶ + +
+ +**45. Spectral theorem ― Let A∈Rn×n. If A is symmetric, then A is diagonalizable by a real orthogonal matrix U∈Rn×n. By noting Λ=diag(λ1,...,λn), we have:** + +⟶ + +
+ +**46. diagonal** + +⟶ + +
+ +**47. Singular-value decomposition ― For a given matrix A of dimensions m×n, the singular-value decomposition (SVD) is a factorization technique that guarantees the existence of U m×m unitary, Σ m×n diagonal and V n×n unitary matrices, such that:** + +⟶ + +
+ +**48. Matrix calculus** + +⟶ + +
+ +**49. Gradient ― Let f:Rm×n→R be a function and A∈Rm×n be a matrix. The gradient of f with respect to A is a m×n matrix, noted ∇Af(A), such that:** + +⟶ + +
+ +**50. Remark: the gradient of f is only defined when f is a function that returns a scalar.** + +⟶ + +
+ +**51. Hessian ― Let f:Rn→R be a function and x∈Rn be a vector. The hessian of f with respect to x is a n×n symmetric matrix, noted ∇2xf(x), such that:** + +⟶ + +
+ +**52. Remark: the hessian of f is only defined when f is a function that returns a scalar** + +⟶ + +
+ +**53. Gradient operations ― For matrices A,B,C, the following gradient properties are worth having in mind:** + +⟶ + +
+ +**54. [General notations, Definitions, Main matrices]** + +⟶ + +
+ +**55. [Matrix operations, Multiplication, Other operations]** + +⟶ + +
+ +**56. [Matrix properties, Norm, Eigenvalue/Eigenvector, Singular-value decomposition]** + +⟶ + +
+ +**57. [Matrix calculus, Gradient, Hessian, Operations]** + +⟶ From 8c849c0d88707707654d4980a7cd2b9c5c42c3bb Mon Sep 17 00:00:00 2001 From: Robert Altena Date: Sat, 25 May 2019 09:26:23 +0900 Subject: [PATCH 167/531] up to line 130. --- ja/refresher-linear-algebra.md | 24 ++++++++++++------------ 1 file changed, 12 insertions(+), 12 deletions(-) diff --git a/ja/refresher-linear-algebra.md b/ja/refresher-linear-algebra.md index 94a393ce7..a3d5c1885 100644 --- a/ja/refresher-linear-algebra.md +++ b/ja/refresher-linear-algebra.md @@ -61,73 +61,73 @@ **11. Remark: we also note D as diag(d1,...,dn).** ⟶ - +備考:Dをdiag(d 1、...、d n)と呼ばれます。
**12. Matrix operations** ⟶ - +行列演算
**13. Multiplication** ⟶ - +行列乗算
**14. Vector-vector ― There are two types of vector-vector products:** ⟶ - +ベクトル-ベクトル - ベクトル-ベクトル積には2つのタイプがあります。
**15. inner product: for x,y∈Rn, we have:** ⟶ - +内積: x、y∈Rnについては、
**16. outer product: for x∈Rm,y∈Rn, we have:** ⟶ - +外積: x∈Rm,y∈Rnについては、
**17. Matrix-vector ― The product of matrix A∈Rm×n and vector x∈Rn is a vector of size Rn, such that:** ⟶ - +行列-ベクトル - 行列A∈Rm×nとベクトルx∈Rnの積はサイズRnのベクトルで、次のようになります。
**18. where aTr,i are the vector rows and ac,j are the vector columns of A, and xi are the entries of x.** ⟶ - +ここで、aTr、iはAのベクトル行、ac、jはAのベクトル列です。 xiはxのエントリです。
**19. Matrix-matrix ― The product of matrices A∈Rm×n and B∈Rn×p is a matrix of size Rn×p, such that:** ⟶ - +行列-行列 - 行列A∈Rm×nとB∈Rn×pの積は次のようにサイズRm×pの行列です。 (There is a typo in the original: Rn×p)
**20. where aTr,i,bTr,i are the vector rows and ac,j,bc,j are the vector columns of A and B respectively** ⟶ - +aTr、i、bTr、iはベクトル行。 ac、j、bc、jはそれぞれAとBのベクトル列です。
**21. Other operations** ⟶ - +その他の演算
**22. Transpose ― The transpose of a matrix A∈Rm×n, noted AT, is such that its entries are flipped:** ⟶ - +転置 ― A∈Rm×nの転置行列はATと示される。 Aの行列要素が交換されます。
**23. Remark: for matrices A,B, we have (AB)T=BTAT** From e710f6c164cfab4e56a18e2a345f5558890585e1 Mon Sep 17 00:00:00 2001 From: Yuta Kanzawa Date: Sat, 25 May 2019 16:59:56 +0900 Subject: [PATCH 168/531] [ja] Supervised Learning WIP. First commit No 1-20; Page 1 --- ja/cheatsheet-supervised-learning.md | 567 +++++++++++++++++++++++++++ 1 file changed, 567 insertions(+) create mode 100644 ja/cheatsheet-supervised-learning.md diff --git a/ja/cheatsheet-supervised-learning.md b/ja/cheatsheet-supervised-learning.md new file mode 100644 index 000000000..5f3c70235 --- /dev/null +++ b/ja/cheatsheet-supervised-learning.md @@ -0,0 +1,567 @@ +**1. Supervised Learning cheatsheet** + +⟶教師あり学習チートシート + +
+ +**2. Introduction to Supervised Learning** + +⟶教師あり学習入門 + +
+ +**3. Given a set of data points {x(1),...,x(m)} associated to a set of outcomes {y(1),...,y(m)}, we want to build a classifier that learns how to predict y from x.** + +⟶入力が{x(1),...,x(m)}, 出力が{y(1),...,y(m)}であるとき, xからyを予測する分類器を構築したい。 + +
+ +**4. Type of prediction ― The different types of predictive models are summed up in the table below:** + +⟶予測の種類 ― 様々な種類の予測モデルは下表に集約される: + +
+ +**5. [Regression, Classifier, Outcome, Examples]** + +⟶回帰, 分類, 出力, 例 + +
+ +**6. [Continuous, Class, Linear regression, Logistic regression, SVM, Naive Bayes]** + +⟶連続値, クラス, 線形回帰, ロジスティック回帰, SVM, ナイーブベイズ + +
+ +**7. Type of model ― The different models are summed up in the table below:** + +⟶モデルの種類 ― 様々な種類のモデルは下表に集約される: + +
+ +**8. [Discriminative model, Generative model, Goal, What's learned, Illustration, Examples]** + +⟶判別モデル, 生成モデル, 目的, 学習対象, イメージ図, 例 + +
+ +**9. [Directly estimate P(y|x), Estimate P(x|y) to then deduce P(y|x), Decision boundary, Probability distributions of the data, Regressions, SVMs, GDA, Naive Bayes]** + +⟶P(y|x)を直接推定する, P(y|x)を推測するためにP(x|y)を推定する, 決定境界, データの確率分布, 回帰, SVM, GDA, ナイーブベイズ + +
+ +**10. Notations and general concepts** + +⟶記法と概念 + +
+ +**11. Hypothesis ― The hypothesis is noted hθ and is the model that we choose. For a given input data x(i) the model prediction output is hθ(x(i)).** + +⟶仮説 ― 仮説はhθと表され、選択されたモデルのことである。与えられた入力x(i)に対して、モデルの予測結果はhθ(x(i))である。 + +
+ +**12. Loss function ― A loss function is a function L:(z,y)∈R×Y⟼L(z,y)∈R that takes as inputs the predicted value z corresponding to the real data value y and outputs how different they are. The common loss functions are summed up in the table below:** + +⟶損失関数 ― 損失関数とは(z,y)∈R×Y⟼L(z,y)∈Rを満たす関数Lで、予測値zとそれに対応する正解データ値yを入力とし、その誤差を出力するものである。一般的な損失関数は次表に集約される: + +
+ +**13. [Least squared error, Logistic loss, Hinge loss, Cross-entropy]** + +⟶ + +
最小2乗誤差, ロジスティック損失, ヒンジ損失, クロスエントロピー + +**14. [Linear regression, Logistic regression, SVM, Neural Network]** + +⟶ + +
線形回帰, ロジスティック回帰, SVM, ニューラルネットワーク + +**15. Cost function ― The cost function J is commonly used to assess the performance of a model, and is defined with the loss function L as follows:** + +⟶コスト関数 ― コスト関数Jは一般的にモデルの性能を評価するために用いられ、損失関数をLとして次のように定義される: + +
+ +**16. Gradient descent ― By noting α∈R the learning rate, the update rule for gradient descent is expressed with the learning rate and the cost function J as follows:** + +⟶勾配降下法 ― 学習率をα∈Rとし、勾配降下法におけるパラメータの更新は学習率とコスト関数Jを用いて次のように行われる: + +
+
+**17. Remark: Stochastic gradient descent (SGD) is updating the parameter based on each training example, and batch gradient descent is on a batch of training examples.**
+
+⟶備考:確率的勾配降下法(SGD)は各学習サンプルごとにパラメータを更新し、バッチ勾配降下法は学習データのバッチ全体を用いてパラメータを更新する。
+
+

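The update rules of items 16-17 can be sketched for least-squares linear regression as follows; the synthetic data, learning rates and epoch counts are illustrative assumptions, and NumPy is assumed:

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.c_[np.ones(50), rng.uniform(-1, 1, 50)]                 # illustrative design matrix with intercept
y = X @ np.array([2.0, -3.0]) + 0.1 * rng.standard_normal(50)  # synthetic targets

def batch_gradient_descent(X, y, alpha=0.1, epochs=200):
    theta = np.zeros(X.shape[1])
    for _ in range(epochs):
        theta -= alpha * X.T @ (X @ theta - y) / len(y)        # one update per pass over all examples
    return theta

def stochastic_gradient_descent(X, y, alpha=0.05, epochs=50):
    theta = np.zeros(X.shape[1])
    for _ in range(epochs):
        for i in rng.permutation(len(y)):                      # one update per training example
            theta -= alpha * (X[i] @ theta - y[i]) * X[i]
    return theta

print(batch_gradient_descent(X, y), stochastic_gradient_descent(X, y))  # both approach [2, -3]
```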
+ +**18. Likelihood ― The likelihood of a model L(θ) given parameters θ is used to find the optimal parameters θ through maximizing the likelihood. In practice, we use the log-likelihood ℓ(θ)=log(L(θ)) which is easier to optimize. We have:** + +⟶尤度 ― パラメータをθとすると、あるモデルの尤度L(θ)を最大にすることにより最適なパラメータを求められる。実際には、最適化しやすい対数尤度ℓ(θ)=log(L(θ))を用いる。すなわち: + +
+ +**19. Newton's algorithm ― The Newton's algorithm is a numerical method that finds θ such that ℓ′(θ)=0. Its update rule is as follows:** + +⟶ニュートン法 ― ニュートン法とはℓ′(θ)=0となるθを求める数値計算アルゴリズムである。そのパラメータ更新は次のように行われる: + +
+
+**20. Remark: the multidimensional generalization, also known as the Newton-Raphson method, has the following update rule:**
+
+⟶備考:多次元への一般化はニュートン-ラフソン法としても知られ、そのパラメータ更新は次のように行われる:
+
+

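A one-dimensional sketch of the Newton update of items 19-20, applied to the illustrative concave log-likelihood ℓ(θ)=−(θ−3)², so that ℓ′(θ)=0 at θ=3; the function and starting point are assumptions made only for this example:

```python
def newton(dl, d2l, theta=0.0, iters=10):
    """Find theta with dl(theta) = 0 via theta <- theta - dl(theta) / d2l(theta)."""
    for _ in range(iters):
        theta -= dl(theta) / d2l(theta)
    return theta

# Illustrative l(theta) = -(theta - 3)^2, so l'(theta) = -2(theta - 3) and l''(theta) = -2.
print(newton(lambda t: -2.0 * (t - 3.0), lambda t: -2.0))  # converges to 3.0
```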
+ +**21. Linear models** + +⟶ + +
+ +**22. Linear regression** + +⟶ + +
+ +**23. We assume here that y|x;θ∼N(μ,σ2)** + +⟶ + +
+ +**24. Normal equations ― By noting X the matrix design, the value of θ that minimizes the cost function is a closed-form solution such that:** + +⟶ + +
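Item 24's closed-form solution θ=(XᵀX)⁻¹Xᵀy is straightforward to check numerically; the generated data below is an illustrative assumption, and NumPy is assumed:

```python
import numpy as np

rng = np.random.default_rng(3)
X = np.c_[np.ones(100), rng.standard_normal((100, 2))]               # illustrative design matrix
y = X @ np.array([1.0, 2.0, -0.5]) + 0.01 * rng.standard_normal(100)

theta_normal_eq = np.linalg.inv(X.T @ X) @ X.T @ y                   # normal equations
theta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)                  # library least-squares solution

assert np.allclose(theta_normal_eq, theta_lstsq)
print(theta_normal_eq)
```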
+ +**25. LMS algorithm ― By noting α the learning rate, the update rule of the Least Mean Squares (LMS) algorithm for a training set of m data points, which is also known as the Widrow-Hoff learning rule, is as follows:** + +⟶ + +
+ +**26. Remark: the update rule is a particular case of the gradient ascent.** + +⟶ + +
+ +**27. LWR ― Locally Weighted Regression, also known as LWR, is a variant of linear regression that weights each training example in its cost function by w(i)(x), which is defined with parameter τ∈R as:** + +⟶ + +
+ +**28. Classification and logistic regression** + +⟶ + +
+ +**29. Sigmoid function ― The sigmoid function g, also known as the logistic function, is defined as follows:** + +⟶ + +
+ +**30. Logistic regression ― We assume here that y|x;θ∼Bernoulli(ϕ). We have the following form:** + +⟶ + +
+ +**31. Remark: there is no closed form solution for the case of logistic regressions.** + +⟶ + +
+ +**32. Softmax regression ― A softmax regression, also called a multiclass logistic regression, is used to generalize logistic regression when there are more than 2 outcome classes. By convention, we set θK=0, which makes the Bernoulli parameter ϕi of each class i equal to:** + +⟶ + +
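Items 29-32 define the sigmoid and its multiclass generalization. A minimal sketch, assuming NumPy; the max-subtraction in the softmax is a common numerical-stability convention rather than part of the cheatsheet:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    z = z - np.max(z)        # stability shift; does not change the result
    e = np.exp(z)
    return e / e.sum()

# Illustrative inputs.
print(sigmoid(0.0))                          # 0.5
print(softmax(np.array([1.0, 2.0, 3.0])))    # entries sum to 1
```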
+ +**33. Generalized Linear Models** + +⟶ + +
+ +**34. Exponential family ― A class of distributions is said to be in the exponential family if it can be written in terms of a natural parameter, also called the canonical parameter or link function, η, a sufficient statistic T(y) and a log-partition function a(η) as follows:** + +⟶ + +
+ +**35. Remark: we will often have T(y)=y. Also, exp(−a(η)) can be seen as a normalization parameter that will make sure that the probabilities sum to one.** + +⟶ + +
+ +**36. Here are the most common exponential distributions summed up in the following table:** + +⟶ + +
+ +**37. [Distribution, Bernoulli, Gaussian, Poisson, Geometric]** + +⟶ + +
+ +**38. Assumptions of GLMs ― Generalized Linear Models (GLM) aim at predicting a random variable y as a function fo x∈Rn+1 and rely on the following 3 assumptions:** + +⟶ + +
+ +**39. Remark: ordinary least squares and logistic regression are special cases of generalized linear models.** + +⟶ + +
+ +**40. Support Vector Machines** + +⟶ + +
+ +**41: The goal of support vector machines is to find the line that maximizes the minimum distance to the line.** + +⟶ + +
+ +**42: Optimal margin classifier ― The optimal margin classifier h is such that:** + +⟶ + +
+ +**43: where (w,b)∈Rn×R is the solution of the following optimization problem:** + +⟶ + +
+ +**44. such that** + +⟶ + +
+ +**45. support vectors** + +⟶ + +
+ +**46. Remark: the line is defined as wTx−b=0.** + +⟶ + +
+ +**47. Hinge loss ― The hinge loss is used in the setting of SVMs and is defined as follows:** + +⟶ + +
+ +**48. Kernel ― Given a feature mapping ϕ, we define the kernel K to be defined as:** + +⟶ + +
+ +**49. In practice, the kernel K defined by K(x,z)=exp(−||x−z||22σ2) is called the Gaussian kernel and is commonly used.** + +⟶ + +
+ +**50. [Non-linear separability, Use of a kernel mapping, Decision boundary in the original space]** + +⟶ + +
+ +**51. Remark: we say that we use the "kernel trick" to compute the cost function using the kernel because we actually don't need to know the explicit mapping ϕ, which is often very complicated. Instead, only the values K(x,z) are needed.** + +⟶ + +
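The Gaussian kernel of item 49 and the "kernel trick" of item 51 only ever require the values K(x,z), never the mapping ϕ itself. A small sketch with an illustrative σ and sample points, assuming NumPy:

```python
import numpy as np

def gaussian_kernel(x, z, sigma=1.0):
    return np.exp(-np.sum((x - z) ** 2) / (2 * sigma ** 2))

# Kernel (Gram) matrix over a few illustrative points, without ever forming phi(x).
pts = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]])
K = np.array([[gaussian_kernel(a, b) for b in pts] for a in pts])
print(K)
```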
+ +**52. Lagrangian ― We define the Lagrangian L(w,b) as follows:** + +⟶ + +
+ +**53. Remark: the coefficients βi are called the Lagrange multipliers.** + +⟶ + +
+ +**54. Generative Learning** + +⟶ + +
+ +**55. A generative model first tries to learn how the data is generated by estimating P(x|y), which we can then use to estimate P(y|x) by using Bayes' rule.** + +⟶ + +
+ +**56. Gaussian Discriminant Analysis** + +⟶ + +
+ +**57. Setting ― The Gaussian Discriminant Analysis assumes that y and x|y=0 and x|y=1 are such that:** + +⟶ + +
+ +**58. Estimation ― The following table sums up the estimates that we find when maximizing the likelihood:** + +⟶ + +
+ +**59. Naive Bayes** + +⟶ + +
+ +**60. Assumption ― The Naive Bayes model supposes that the features of each data point are all independent:** + +⟶ + +
+ +**61. Solutions ― Maximizing the log-likelihood gives the following solutions, with k∈{0,1},l∈[[1,L]]** + +⟶ + +
+ +**62. Remark: Naive Bayes is widely used for text classification and spam detection.** + +⟶ + +
+ +**63. Tree-based and ensemble methods** + +⟶ + +
+ +**64. These methods can be used for both regression and classification problems.** + +⟶ + +
+ +**65. CART ― Classification and Regression Trees (CART), commonly known as decision trees, can be represented as binary trees. They have the advantage to be very interpretable.** + +⟶ + +
+ +**66. Random forest ― It is a tree-based technique that uses a high number of decision trees built out of randomly selected sets of features. Contrary to the simple decision tree, it is highly uninterpretable but its generally good performance makes it a popular algorithm.** + +⟶ + +
+ +**67. Remark: random forests are a type of ensemble methods.** + +⟶ + +
+ +**68. Boosting ― The idea of boosting methods is to combine several weak learners to form a stronger one. The main ones are summed up in the table below:** + +⟶ + +
+ +**69. [Adaptive boosting, Gradient boosting]** + +⟶ + +
+ +**70. High weights are put on errors to improve at the next boosting step** + +⟶ + +
+ +**71. Weak learners trained on remaining errors** + +⟶ + +
+ +**72. Other non-parametric approaches** + +⟶ + +
+ +**73. k-nearest neighbors ― The k-nearest neighbors algorithm, commonly known as k-NN, is a non-parametric approach where the response of a data point is determined by the nature of its k neighbors from the training set. It can be used in both classification and regression settings.** + +⟶ + +
+ +**74. Remark: The higher the parameter k, the higher the bias, and the lower the parameter k, the higher the variance.** + +⟶ + +
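Item 73's k-nearest-neighbours rule fits in a few lines; the toy training set and the choice k=3 are illustrative assumptions, and NumPy is assumed:

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x, k=3):
    dists = np.linalg.norm(X_train - x, axis=1)            # distance to every training point
    nearest = np.argsort(dists)[:k]                        # indices of the k closest points
    return Counter(y_train[nearest]).most_common(1)[0][0]  # majority label among the neighbours

# Toy illustrative training set with two well-separated classes.
X_train = np.array([[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]])
y_train = np.array([0, 0, 0, 1, 1, 1])
print(knn_predict(X_train, y_train, np.array([0.5, 0.5])))  # -> 0
print(knn_predict(X_train, y_train, np.array([5.5, 5.5])))  # -> 1
```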
+ +**75. Learning Theory** + +⟶ + +
+ +**76. Union bound ― Let A1,...,Ak be k events. We have:** + +⟶ + +
+ +**77. Hoeffding inequality ― Let Z1,..,Zm be m iid variables drawn from a Bernoulli distribution of parameter ϕ. Let ˆϕ be their sample mean and γ>0 fixed. We have:** + +⟶ + +
+ +**78. Remark: this inequality is also known as the Chernoff bound.** + +⟶ + +
+ +**79. Training error ― For a given classifier h, we define the training error ˆϵ(h), also known as the empirical risk or empirical error, to be as follows:** + +⟶ + +
+ +**80. Probably Approximately Correct (PAC) ― PAC is a framework under which numerous results on learning theory were proved, and has the following set of assumptions: ** + +⟶ + +
+ +**81: the training and testing sets follow the same distribution ** + +⟶ + +
+ +**82. the training examples are drawn independently** + +⟶ + +
+ +**83. Shattering ― Given a set S={x(1),...,x(d)}, and a set of classifiers H, we say that H shatters S if for any set of labels {y(1),...,y(d)}, we have:** + +⟶ + +
+ +**84. Upper bound theorem ― Let H be a finite hypothesis class such that |H|=k and let δ and the sample size m be fixed. Then, with probability of at least 1−δ, we have:** + +⟶ + +
+ +**85. VC dimension ― The Vapnik-Chervonenkis (VC) dimension of a given infinite hypothesis class H, noted VC(H) is the size of the largest set that is shattered by H.** + +⟶ + +
+ +**86. Remark: the VC dimension of H={set of linear classifiers in 2 dimensions} is 3.** + +⟶ + +
+ +**87. Theorem (Vapnik) ― Let H be given, with VC(H)=d and m the number of training examples. With probability at least 1−δ, we have:** + +⟶ + +
+ +**88. [Introduction, Type of prediction, Type of model]** + +⟶ + +
+ +**89. [Notations and general concepts, loss function, gradient descent, likelihood]** + +⟶ + +
+ +**90. [Linear models, linear regression, logistic regression, generalized linear models]** + +⟶ + +
+ +**91. [Support vector machines, Optimal margin classifier, Hinge loss, Kernel]** + +⟶ + +
+ +**92. [Generative learning, Gaussian Discriminant Analysis, Naive Bayes]** + +⟶ + +
+ +**93. [Trees and ensemble methods, CART, Random forest, Boosting]** + +⟶ + +
+ +**94. [Other methods, k-NN]** + +⟶ + +
+ +**95. [Learning theory, Hoeffding inequality, PAC, VC dimension]** + +⟶ From 99eeba0cfd2af1cc93c73e66d679cda7c89d1403 Mon Sep 17 00:00:00 2001 From: Yuta Kanzawa Date: Sun, 26 May 2019 14:56:10 +0900 Subject: [PATCH 169/531] [ja] Surpervised Learning WIP. Second commit No 21-44; Page 2 --- ja/cheatsheet-supervised-learning.md | 58 ++++++++++++++-------------- 1 file changed, 29 insertions(+), 29 deletions(-) diff --git a/ja/cheatsheet-supervised-learning.md b/ja/cheatsheet-supervised-learning.md index 5f3c70235..cb5fac160 100644 --- a/ja/cheatsheet-supervised-learning.md +++ b/ja/cheatsheet-supervised-learning.md @@ -72,15 +72,15 @@ **13. [Least squared error, Logistic loss, Hinge loss, Cross-entropy]** -⟶ +⟶最小2乗誤差, ロジスティック損失, ヒンジ損失, クロスエントロピー -
最小2乗誤差, ロジスティック損失, ヒンジ損失, クロスエントロピー +
**14. [Linear regression, Logistic regression, SVM, Neural Network]** -⟶ +⟶線形回帰, ロジスティック回帰, SVM, ニューラルネットワーク -
線形回帰, ロジスティック回帰, SVM, ニューラルネットワーク +
**15. Cost function ― The cost function J is commonly used to assess the performance of a model, and is defined with the loss function L as follows:** @@ -120,145 +120,145 @@ **21. Linear models** -⟶ +⟶線形モデル
**22. Linear regression** -⟶ +⟶線形回帰
**23. We assume here that y|x;θ∼N(μ,σ2)** -⟶ +⟶ここでy|x;θ∼N(μ,σ2)であるとする。
**24. Normal equations ― By noting X the matrix design, the value of θ that minimizes the cost function is a closed-form solution such that:** -⟶ +⟶正規方程式 ― Xを行列とすると、コスト関数を最小化するθの値は次のような閉形式の解である:
**25. LMS algorithm ― By noting α the learning rate, the update rule of the Least Mean Squares (LMS) algorithm for a training set of m data points, which is also known as the Widrow-Hoff learning rule, is as follows:** -⟶ +⟶最小2乗法 ― 学習率をαとすると、m個のデータ点からなる学習データに対する最小2乗法(LMSアルゴリズム)によるパラメータ更新は次のように行われ、これはウィドロウ-ホフの学習規則としても知られている:
**26. Remark: the update rule is a particular case of the gradient ascent.** -⟶ +⟶備考:この更新は勾配上昇法の特殊な例である。
**27. LWR ― Locally Weighted Regression, also known as LWR, is a variant of linear regression that weights each training example in its cost function by w(i)(x), which is defined with parameter τ∈R as:** -⟶ +⟶局所重み付き回帰 ― 局所重み付き回帰は、LWRとも呼ばれ、線形回帰の派生形である。パラメータをτ∈Rとして次のように定義されるw(i)(x)により、個々の学習サンプルをそのコスト関数において重み付けする:
**28. Classification and logistic regression** -⟶ +⟶分類とロジスティック回帰
**29. Sigmoid function ― The sigmoid function g, also known as the logistic function, is defined as follows:** -⟶ +⟶シグモイド関数 ― シグモイド関数gは、ロジスティック関数とも呼ばれ、次のように定義される:
**30. Logistic regression ― We assume here that y|x;θ∼Bernoulli(ϕ). We have the following form:** -⟶ +⟶ロジスティック回帰 ― ここでy|x;θ∼Bernoulli(ϕ)であるとすると、次の形式を得る:
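
Items 29-30 can be illustrated with a small sketch of the sigmoid hypothesis and one gradient-ascent step on the log-likelihood; the toy data and learning rate below are made up for the example:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy data: m examples with an intercept column, binary labels
X = np.array([[1.0, 0.5], [1.0, -1.2], [1.0, 2.3], [1.0, -0.7]])
y = np.array([1, 0, 1, 0])
theta = np.zeros(X.shape[1])
alpha = 0.1   # learning rate

# One batch gradient-ascent step on the log-likelihood
h = sigmoid(X @ theta)            # predicted probabilities
theta += alpha * X.T @ (y - h)    # gradient of the log-likelihood
print(theta)
```
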
**31. Remark: there is no closed form solution for the case of logistic regressions.** -⟶ +⟶備考:ロジスティック回帰については閉形式の解は存在しない。
**32. Softmax regression ― A softmax regression, also called a multiclass logistic regression, is used to generalize logistic regression when there are more than 2 outcome classes. By convention, we set θK=0, which makes the Bernoulli parameter ϕi of each class i equal to:** -⟶ +⟶ソフトマックス回帰 ― ソフトマックス回帰は、多クラス分類ロジスティック回帰とも呼ばれ、3個以上の結果クラスがある場合にロジスティック回帰を一般化するためのものである。慣習的に、θK=0とすると、各クラスiのベルヌーイ分布のパラメータϕiは次と等しくなる:
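
A brief sketch of the softmax probabilities described in item 32, assuming arbitrary class scores:

```python
import numpy as np

def softmax(scores):
    # Subtract the max for numerical stability; probabilities sum to 1
    z = scores - np.max(scores)
    e = np.exp(z)
    return e / e.sum()

scores = np.array([2.0, 1.0, 0.1])   # score of each class for a given input
print(softmax(scores))               # approximately [0.659, 0.242, 0.099]
```
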
**33. Generalized Linear Models** -⟶ +⟶一般化線形モデル
**34. Exponential family ― A class of distributions is said to be in the exponential family if it can be written in terms of a natural parameter, also called the canonical parameter or link function, η, a sufficient statistic T(y) and a log-partition function a(η) as follows:** -⟶ +⟶指数分布族 ― 正準パラメータまたはリンク関数とも呼ばれる自然パラメータη、十分統計量T(y)及び対数分配関数a(η)を用いて、次のように表すことのできる一群の分布は指数分布族と呼ばれる:
**35. Remark: we will often have T(y)=y. Also, exp(−a(η)) can be seen as a normalization parameter that will make sure that the probabilities sum to one.** -⟶ +⟶備考:T(y)=yとすることが多い。また、exp(−a(η))は確率の合計が1になることを担保する正規化定数だと見なせる。
**36. Here are the most common exponential distributions summed up in the following table:** -⟶ +⟶最も一般的な指数分布族は下表に集約される:
**37. [Distribution, Bernoulli, Gaussian, Poisson, Geometric]** -⟶ +⟶分布, ベルヌーイ, ガウス, ポワソン, 幾何
-**38. Assumptions of GLMs ― Generalized Linear Models (GLM) aim at predicting a random variable y as a function fo x∈Rn+1 and rely on the following 3 assumptions:** +**38. Assumptions of GLMs ― Generalized Linear Models (GLM) aim at predicting a random variable y as a function of x∈Rn+1 and rely on the following 3 assumptions:** -⟶ +⟶GLMの仮定 ― 一般化線形モデル(GLM)はランダムな変数yをx∈Rn+1の関数として予測することを目的とし、次の3つの仮定に依拠する:
**39. Remark: ordinary least squares and logistic regression are special cases of generalized linear models.** -⟶ +⟶備考:最小2乗回帰とロジスティック回帰は一般化線形モデルの特殊な例である。
**40. Support Vector Machines** -⟶ +⟶サポートベクターマシン
**41: The goal of support vector machines is to find the line that maximizes the minimum distance to the line.** -⟶ +⟶サポートベクターマシンの目的は、データ点から線への最短距離が最大となる線を求めることである。
**42: Optimal margin classifier ― The optimal margin classifier h is such that:** -⟶ +⟶最適マージン分類器 ― 最適マージン分類器hは次のようなものである:
**43: where (w,b)∈Rn×R is the solution of the following optimization problem:** -⟶ +⟶ここで、(w,b)∈Rn×Rは次の最適化問題の解である:
**44. such that** -⟶ +⟶ただし
From 07b2fc7cc3c484eac82d085f769f8cc87592d99a Mon Sep 17 00:00:00 2001
From: Robert Altena
Date: Sun, 26 May 2019 20:14:28 +0900
Subject: [PATCH 170/531] up to line 175.

---
 ja/refresher-linear-algebra.md | 15 ++++++++-------
 1 file changed, 8 insertions(+), 7 deletions(-)

diff --git a/ja/refresher-linear-algebra.md b/ja/refresher-linear-algebra.md
index a3d5c1885..70eb6cdf4 100644
--- a/ja/refresher-linear-algebra.md
+++ b/ja/refresher-linear-algebra.md
@@ -133,43 +133,44 @@ aTr、i、bTr、iはベクトル行。 ac、j、bc、jはそれぞれAとBの

**23. Remark: for matrices A,B, we have (AB)T=BTAT**

⟶
-
+備考: 行列AとBの場合、(AB)T = BTAT となります。

<br>
**24. Inverse ― The inverse of an invertible square matrix A is noted A−1 and is the only matrix such that:** ⟶ - +逆行列 ― 可逆正方行列Aの逆行列はA − 1と表される。 以下を満たす唯一の行列です。
**25. Remark: not all square matrices are invertible. Also, for matrices A,B, we have (AB)−1=B−1A−1** ⟶ - +備考: すべての正方行列が可逆的なわけではありません。 行列A、Bについては、(AB)−1=B−1A−1
**26. Trace ― The trace of a square matrix A, noted tr(A), is the sum of its diagonal entries:** ⟶ - +跡 ― 正方行列Aの跡は、その対角要素の合計です。 tr(A)と表される。
**27. Remark: for matrices A,B, we have tr(AT)=tr(A) and tr(AB)=tr(BA)** ⟶ - +備考: 行列A、Bの場合: tr(AT)=tr(A)とtr(AB)=tr(BA)となります。
**28. Determinant ― The determinant of a square matrix A∈Rn×n, noted |A| or det(A) is expressed recursively in terms of A∖i,∖j, which is the matrix A without its ith row and jth column, as follows:** ⟶ - +行列式 ― 行列式は|A| または det(A) と表される。 正方行列 A∈Rn×n の行列式はAijによって再帰的に表現されます。 + それはi番目の行とj番目の列のない行列Aです。 次のように:
**29. Remark: A is invertible if and only if |A|≠0. Also, |AB|=|A||B| and |AT|=|A|.** ⟶ - +備考: |A|≠0の場合に限り、行列は可逆行列です。また|AB|=|A||B| と |AT|=|A|。
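
The identities quoted in items 23-29 are easy to sanity-check numerically; the random matrices below are an assumption for illustration only:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(3, 3))
B = rng.normal(size=(3, 3))

assert np.allclose((A @ B).T, B.T @ A.T)                    # (AB)^T = B^T A^T
assert np.allclose(np.linalg.inv(A @ B),
                   np.linalg.inv(B) @ np.linalg.inv(A))     # (AB)^-1 = B^-1 A^-1
assert np.isclose(np.trace(A @ B), np.trace(B @ A))         # tr(AB) = tr(BA)
assert np.isclose(np.linalg.det(A @ B),
                  np.linalg.det(A) * np.linalg.det(B))      # |AB| = |A||B|
```
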
**30. Matrix properties** From 7c8e6ffd7ec75acdcd8435dcaa568cf3c7f2a2f2 Mon Sep 17 00:00:00 2001 From: Takatoshi nao Date: Sat, 25 May 2019 15:02:42 +0900 Subject: [PATCH 171/531] Update refresher-probability.md --- ja/refresher-probability.md | 381 ++++++++++++++++++++++++++++++++++++ 1 file changed, 381 insertions(+) create mode 100644 ja/refresher-probability.md diff --git a/ja/refresher-probability.md b/ja/refresher-probability.md new file mode 100644 index 000000000..b30513fbf --- /dev/null +++ b/ja/refresher-probability.md @@ -0,0 +1,381 @@ +**1. Probabilities and Statistics refresher** + +⟶確率と統計 + +
+ +**2. Introduction to Probability and Combinatorics** + +⟶確率と組合せの紹介 + +
+ +**3. Sample space ― The set of all possible outcomes of an experiment is known as the sample space of the experiment and is denoted by S.** + +⟶標本空間 - 試行可能なすべての結果の集合は標本空間として知られ、Sと表します。 + +
+ +**4. Event ― Any subset E of the sample space is known as an event. That is, an event is a set consisting of possible outcomes of the experiment. If the outcome of the experiment is contained in E, then we say that E has occurred.** + +⟶事象 - 標本空間のすべての部分集合のEを事象と言います。つまり事象は試行可能な結果で構成された集合です。試行結果がEに含まれるなら、Eが発生した言います。 + +
+ +**5. Axioms of probability ― For each event E, we denote P(E) as the probability of event E occuring.** + +⟶確率の公理 - 各事象Eに対して、事象Eが起こる確率をP(E)と書きます。 + +
+
+**6. Axiom 1 ― Every probability is between 0 and 1 included, i.e:**
+
+⟶公理1 - すべての確率は0と1の間に含まれ、次のようになります:
+
+<br>
+ +**7. Axiom 2 ― The probability that at least one of the elementary events in the entire sample space will occur is 1, i.e:** + +⟶公理2 - 全体の標本空間で少なくとも一つの根元事象が起こる確率は1で次のようになります: + +
+ +**8. Axiom 3 ― For any sequence of mutually exclusive events E1,...,En, we have:** + +⟶公理3 - 相互に排他的なとある連続した事象E1,...Enに対し、次のようになります: + +
+ +**9. Permutation ― A permutation is an arrangement of r objects from a pool of n objects, in a given order. The number of such arrangements is given by P(n,r), defined as:** + +⟶順列(Permutation) - 順列はn個の中からr個を順番を考慮して並べられた配列です。このような配列の数はP(n, r)と表し、次のように定義します: + +
+ +**10. Combination ― A combination is an arrangement of r objects from a pool of n objects, where the order does not matter. The number of such arrangements is given by C(n,r), defined as:** + +⟶組合せ(Combination) - 組合せはn個の中からr個の順番を勘案しない配列です。このような配列の数はC(n,r)と表し、次のように定義します: + +
+ +**11. Remark: we note that for 0⩽r⩽n, we have P(n,r)⩾C(n,r)** + +⟶注釈: 0⩽r⩽nに対し、P(n,r)⩾C(n,r)となります。 + +
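
Items 9-11 can be checked directly with the Python standard library (math.perm and math.comb, available from Python 3.8):

```python
import math

n, r = 5, 3
print(math.perm(n, r))    # P(5, 3) = 5!/(5-3)! = 60
print(math.comb(n, r))    # C(5, 3) = 5!/(3!2!) = 10
assert math.perm(n, r) >= math.comb(n, r)   # P(n,r) >= C(n,r) for 0 <= r <= n
```
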
+ +**12. Conditional Probability** + +⟶条件付き確率 + +
+ +**13. Bayes' rule ― For events A and B such that P(B)>0, we have:** + +⟶ベイズの定理 - P(B)>0のような事象A, Bに対して次となります: + +
+ +**14. Remark: we have P(A∩B)=P(A)P(B|A)=P(A|B)P(B)** + +⟶注釈: P(A∩B)=P(A)P(B|A)=P(A|B)P(B)となります。 + +
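
A tiny worked example of Bayes' rule from items 13-14; the prevalence and test accuracies are made-up numbers chosen only for illustration:

```python
# P(A): prior probability of having the condition
p_a = 0.01
# P(B|A): probability of a positive test given the condition
p_b_given_a = 0.95
# P(B|not A): false positive rate
p_b_given_not_a = 0.05

# Total probability: P(B) = P(B|A)P(A) + P(B|not A)P(not A)
p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)

# Bayes' rule: P(A|B) = P(B|A)P(A) / P(B)
p_a_given_b = p_b_given_a * p_a / p_b
print(round(p_a_given_b, 3))   # about 0.161
```
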
+ +**15. Partition ― Let {Ai,i∈[[1,n]]} be such that for all i, Ai≠∅. We say that {Ai} is a partition if we have:** + +⟶分割(Partition) - {Ai,i∈[[1,n]]}はすべてのiに対してAi≠∅としましょう。{Ai}が次のような場合、分割と言います: + +
+ +**16. Remark: for any event B in the sample space, we have P(B)=n∑i=1P(B|Ai)P(Ai).** + +⟶注釈: 標本空間で任意の事象Bに対して、P(B)=n∑i=1P(B|Ai)P(Ai)となります。 + +
+ +**17. Extended form of Bayes' rule ― Let {Ai,i∈[[1,n]]} be a partition of the sample space. We have:** + +⟶ベイズの定理の応用 - {Ai,i∈[[1,n]]}を標本空間の分割としましょう。次のようになります: + +
+ +**18. Independence ― Two events A and B are independent if and only if we have:** + +⟶独立性 - 次の場合のみ事象AとBは独立であるといいます: + +
+ +**19. Random Variables** + +⟶確率変数 + +
+ +**20. Definitions** + +⟶定義 + +
+
+**21. Random variable ― A random variable, often noted X, is a function that maps every element in a sample space to a real line.**
+
+⟶確率変数 - 確率変数は主にXと表記され、標本空間のすべての要素を実数直線上に対応させる関数です。
+
+<br>
+ +**22. Cumulative distribution function (CDF) ― The cumulative distribution function F, which is monotonically non-decreasing and is such that limx→−∞F(x)=0 and limx→+∞F(x)=1, is defined as:** + +⟶累積分布関数(CDF) - 単調非減少の累積分布関数Fはlimx→−∞F(x)=0 and limx→+∞F(x)=1となり次のように定義します: + +
+
+**23. Remark: we have P(a<X⩽b)=F(b)−F(a)**
+
+⟶注釈: P(a<X⩽b)=F(b)−F(a)となります。
+
+<br>
+
+**24. Probability density function (PDF) ― The probability density function f is the probability that X takes on values between two adjacent realizations of the random variable.**
+
+⟶確率密度関数(PDF) - 確率密度関数fは、確率変数の隣接する二つの実現値の間の値をXが取る確率です。
+
+<br>
+ +**25. Relationships involving the PDF and CDF ― Here are the important properties to know in the discrete (D) and the continuous (C) cases.** + +⟶PDFとCDFとの関係性 - 離散(D)と連続(C)の例から知るべき重要な特性があります。 + +
+ +**26. [Case, CDF F, PDF f, Properties of PDF]** + +⟶[例、CDF F、PDF f、PDFの特性] + +
+ +**27. Expectation and Moments of the Distribution ― Here are the expressions of the expected value E[X], generalized expected value E[g(X)], kth moment E[Xk] and characteristic function ψ(ω) for the discrete and continuous cases:** + +⟶分布の期待値と積率 - 離散または連続の場合、期待値E[X]、一般化した期待値E[g(X)]、k次の積率E[Xk]と特性関数ψ(ω): + +
+ +**28. Variance ― The variance of a random variable, often noted Var(X) or σ2, is a measure of the spread of its distribution function. It is determined as follows:** + +⟶分散(Variance) - 確率変数の分散は主にVar(X)またはσ2と表記し、分布関数の散布度を測定したものです。次のように決まります。 + +
+
+**29. Standard deviation ― The standard deviation of a random variable, often noted σ, is a measure of the spread of its distribution function which is compatible with the units of the actual random variable. It is determined as follows:**
+
+⟶標準偏差(Standard deviation) - 確率変数の標準偏差は主にσと表記され、実際の確率変数の単位と整合する、分布関数の散布度を測定したものです。次のように決まります。
+
+<br>
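
Items 28-29 in code form, checking Var(X)=E[X²]−E[X]² on a small sample of arbitrary values:

```python
import numpy as np

x = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
mean = x.mean()
var = np.mean(x**2) - mean**2      # E[X^2] - E[X]^2 (population variance)
std = np.sqrt(var)

assert np.isclose(var, x.var())    # same as NumPy's population variance
print(mean, var, std)              # 5.0 4.0 2.0
```
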
+ +**30. Transformation of random variables ― Let the variables X and Y be linked by some function. By noting fX and fY the distribution function of X and Y respectively, we have:** + +⟶確率変数の変換 - 変数XとYは任意の関数に繋がってるとします。fXとfYに各々XとYの分布関数を表記すると次のようになります: + +
+ +**31. Leibniz integral rule ― Let g be a function of x and potentially c, and a,b boundaries that may depend on c. We have:** + +⟶ライプニッツ積分法 - gをxの関数とし、暫定的にcとしましょう。そしてcに従属的な境界a,bに対して次のようになります。 + +
+ +**32. Probability Distributions** + +⟶確率分布 + +
+ +**33. Chebyshev's inequality ― Let X be a random variable with expected value μ. For k,σ>0, we have the following inequality:** + +⟶チェビシェフの不等式 - Xを期待値μをの確率変数とします。kに対して、σ>0なら次のような不等式を持ちます。 + +
+
+**33. Chebyshev's inequality ― Let X be a random variable with expected value μ. For k,σ>0, we have the following inequality:**
+
+⟶チェビシェフの不等式 - Xを期待値μを持つ確率変数とします。k,σ>0に対して、次のような不等式が成り立ちます。
+
+<br>
+ +**35. [Type, Distribution]** + +⟶[タイプ、分布] + +
+ +**36. Jointly Distributed Random Variables** + +⟶結合確率変数 + +
+ +**37. Marginal density and cumulative distribution ― From the joint density probability function fXY , we have** + +⟶周辺密度と累積分布 - 結合密度確率関数fXYから次のようになります。 + +
+ +**38. [Case, Marginal density, Cumulative function]** + +⟶[例,、周辺密度、累積関数] + +
+ +**39. Conditional density ― The conditional density of X with respect to Y, often noted fX|Y, is defined as follows:** + +⟶条件部密度(Conditional density) - Yに対するXの条件部密度は主にfx|Yと表記され、次のように定義されます: + +
+ +**40. Independence ― Two random variables X and Y are said to be independent if we have:** + +⟶独立性(Independence) - 二つの確率変数XとYは次の場合、独立的と言います。 + +
+ +**41. Covariance ― We define the covariance of two random variables X and Y, that we note σ2XY or more commonly Cov(X,Y), as follows:** + +⟶共分散(Covariance) - 次のようにふたつの確率変数X,Yの共分散をσ2XYまたはさらに一般的にはCov(X,Y)で定義します。 + +
+ +**42. Correlation ― By noting σX,σY the standard deviations of X and Y, we define the correlation between the random variables X and Y, noted ρXY, as follows:** + +⟶相関関係(Correlation) - X, Yの標準変数をσX,σYで表記し、確率変数X,Yの相関関係をρXYで表記し、次のように定義します。 + +
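
A short NumPy check of the covariance and correlation definitions in items 41-42, using two arbitrary samples:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 1.0, 4.0, 3.0, 6.0])

cov_xy = np.mean((x - x.mean()) * (y - y.mean()))    # Cov(X, Y), population form
rho_xy = cov_xy / (x.std() * y.std())                # correlation, always in [-1, 1]

assert np.isclose(rho_xy, np.corrcoef(x, y)[0, 1])   # matches NumPy's estimate
print(cov_xy, rho_xy)
```
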
+ +**43. Remark 1: we note that for any random variables X,Y, we have ρXY∈[−1,1].** + +⟶注釈 1: 任意の確率変数X,Yに対してρXY∈[−1,1]となります。 + +
+ +**44. Remark 2: If X and Y are independent, then ρXY=0.** + +⟶注釈 2: XとYが独立ならρXY=0です。 + +
+ +**45. Parameter estimation** + +⟶母数推定 + +
+ +**46. Definitions** + +⟶定義 + +
+ +**47. Random sample ― A random sample is a collection of n random variables X1,...,Xn that are independent and identically distributed with X.** + +⟶確率標本(Random sample) - 確率標本はXと独立で同一に分布するn個の確率変数X1,...,Xnの集まりです。 + +
+ +**48. Estimator ― An estimator is a function of the data that is used to infer the value of an unknown parameter in a statistical model.** + +⟶推定量(Estimator) - 推定量は統計モデルで未知のパラメータの値を推定するために使用されるデータの関数です。 + +
+ +**49. Bias ― The bias of an estimator ^θ is defined as being the difference between the expected value of the distribution of ^θ and the true value, i.e.:** + +⟶偏り(Bias) - 推定量^θの偏りは^θの期待値と実際の値との差で定義されます。 + +
+ +**50. Remark: an estimator is said to be unbiased when we have E[^θ]=θ.** + +⟶注釈: 推定量はE[^θ]=θの場合、不偏といいます。 + +
+ +**51. Estimating the mean** + +⟶平均の推定 + +
+ +**52. Sample mean ― The sample mean of a random sample is used to estimate the true mean μ of a distribution, is often noted ¯¯¯¯¯X and is defined as follows:** + +⟶標本平均(Sample mean) - 確率標本の標本平均は実の平均μを推定するのに用いられ、主に¯¯¯¯¯Xと表記され次のように定義されます。 + +
+ +**53. Remark: the sample mean is unbiased, i.e E[¯¯¯¯¯X]=μ.** + +⟶注釈: 標本平均は不偏です。すなわちE[¯¯¯¯¯X]=μとなります。 + +
+ +**54. Central Limit Theorem ― Let us have a random sample X1,...,Xn following a given distribution with mean μ and variance σ2, then we have:** + +⟶中心極限定理 - 平均μと分散σ2を持つ分布を従う確率標本X1,...,Xnがある。その場合、次のようになります。 + +
+ +**55. Estimating the variance** + +⟶分散推定 + +
+ +**56. Sample variance ― The sample variance of a random sample is used to estimate the true variance σ2 of a distribution, is often noted s2 or ^σ2 and is defined as follows:** + +⟶標本分散 - 確率標本の標本分散は実の分散σ2を推定するのに用いられ、主にs2または^σ2と表記し次のように定義されます。 + +
+ +**57. Remark: the sample variance is unbiased, i.e E[s2]=σ2.** + +⟶注釈: 標本分散は不偏です。つまりE[s2]=σ2になります。 + +
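
Items 52-57 distinguish population quantities from their unbiased estimators; in NumPy the unbiased sample variance corresponds to ddof=1. The simulation below is only a sketch with assumed parameters:

```python
import numpy as np

rng = np.random.default_rng(0)
true_mu, true_sigma2 = 3.0, 4.0
n = 10

# Average the estimators over many random samples to see the (absence of) bias
n_trials = 20000
means = np.empty(n_trials)
s2 = np.empty(n_trials)
for i in range(n_trials):
    sample = rng.normal(true_mu, np.sqrt(true_sigma2), size=n)
    means[i] = sample.mean()
    s2[i] = sample.var(ddof=1)     # unbiased sample variance (divides by n-1)

print(means.mean())   # close to 3.0
print(s2.mean())      # close to 4.0, whereas var(ddof=0) would average about 3.6
```
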
+ +**58. Chi-Squared relation with sample variance ― Let s2 be the sample variance of a random sample. We have:** + +⟶標本分散とカイ二乗の関係 - 確率標本の標本分散をs2としよう。次のようになります。 + +
+ +**59. [Introduction, Sample space, Event, Permutation]** + +⟶[紹介、標本空間、事象、順列] + +
+ +**60. [Conditional probability, Bayes' rule, Independence]** + +⟶[条件部確率、ベイズの定理、独立] + +
+ +**61. [Random variables, Definitions, Expectation, Variance]** + +⟶[確率変数、定義、期待値、分散] + +
+ +**62. [Probability distributions, Chebyshev's inequality, Main distributions]** + +⟶[確率分布、チェビシェフの不等式、主な分布] + +
+ +**63. [Jointly distributed random variables, Density, Covariance, Correlation]** + +⟶[結合分布の確率変数、密度、共分散、相関関係] + +
+ +**64. [Parameter estimation, Mean, Variance]** + +⟶[母数推定、平均、分散] From 9674982eaf46790b48c3b054aeedae4f976568d0 Mon Sep 17 00:00:00 2001 From: Afshine Amidi <26204670+afshinea@users.noreply.github.com> Date: Sun, 26 May 2019 13:06:21 -0700 Subject: [PATCH 172/531] Update README.md --- README.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/README.md b/README.md index 588f9506c..f9d0620b5 100644 --- a/README.md +++ b/README.md @@ -57,11 +57,11 @@ Please make sure to propose the translation of **only one** cheatsheet per pull |Cheatsheet topic|Español|فارسی|Français|日本語|Português|中文| |:---|:---:|:---:|:---:|:---:|:---:|:---:| |Deep learning|done|done|done|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/96)|done|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/12)| -|Supervised learning|done|done|done|not started|done|done| +|Supervised learning|done|done|done|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/144)|done|done| |Unsupervised learning|done|done|done|not started|done|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/48)| |ML tips and tricks|done|done|done|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/99)|done|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/7)| -|Probabilities and Statistics|done|done|done|not started|done|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/73)| -|Linear algebra|done|done|done|not started|done|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/72)| +|Probabilities and Statistics|done|done|done|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/142)|done|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/73)| +|Linear algebra|done|done|done|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/140)|done|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/72)| |Cheatsheet topic|العَرَبِيَّة|עִבְרִית|हिन्दी|Türkçe|Русский|Italiano| |:---|:---:|:---:|:---:|:---:|:---:|:---:| From b591068b1f54e15e1fda67c3c994f09bb8c1b289 Mon Sep 17 00:00:00 2001 From: Afshine Amidi <26204670+afshinea@users.noreply.github.com> Date: Sun, 26 May 2019 13:21:35 -0700 Subject: [PATCH 173/531] Update README.md --- README.md | 14 +++++++------- 1 file changed, 7 insertions(+), 7 deletions(-) diff --git a/README.md b/README.md index f9d0620b5..2d97f6355 100644 --- a/README.md +++ b/README.md @@ -83,14 +83,14 @@ Please make sure to propose the translation of **only one** cheatsheet per pull |Linear algebra|not started|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/47)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/95)|done| -|Cheatsheet topic|Magyar|Deutsch| +|Cheatsheet topic|Magyar|Deutsch|Deutsch|Bahasa Indonesia| |:---|:---:|:---:| -|Deep learning|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/124/files#diff-5e2ba65ef08acd57024e82d0ae94b923)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/106)| -|Supervised learning|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/124/files#diff-5e2ba65ef08acd57024e82d0ae94b923)|not started| -|Unsupervised learning|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/124/files#diff-5e2ba65ef08acd57024e82d0ae94b923)|not started| -|ML tips and tricks|[in 
progress](https://github.com/shervinea/cheatsheet-translation/pull/124/files#diff-5e2ba65ef08acd57024e82d0ae94b923)|not started| -|Probabilities and Statistics|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/124/files#diff-5e2ba65ef08acd57024e82d0ae94b923)|not started| -|Linear algebra|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/124/files#diff-5e2ba65ef08acd57024e82d0ae94b923)|not started| +|Deep learning|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/124/files#diff-5e2ba65ef08acd57024e82d0ae94b923)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/106)|not started| +|Supervised learning|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/124/files#diff-5e2ba65ef08acd57024e82d0ae94b923)|not started|not started| +|Unsupervised learning|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/124/files#diff-5e2ba65ef08acd57024e82d0ae94b923)|not started|not started| +|ML tips and tricks|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/124/files#diff-5e2ba65ef08acd57024e82d0ae94b923)|not started|not started| +|Probabilities and Statistics|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/124/files#diff-5e2ba65ef08acd57024e82d0ae94b923)|not started|not started| +|Linear algebra|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/124/files#diff-5e2ba65ef08acd57024e82d0ae94b923)|not started|not started| ## Acknowledgements From 990b1d4033177c45c732e8ec25a25f9f89c8bce7 Mon Sep 17 00:00:00 2001 From: Afshine Amidi <26204670+afshinea@users.noreply.github.com> Date: Sun, 26 May 2019 13:22:13 -0700 Subject: [PATCH 174/531] Update README.md --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index 2d97f6355..e19461df5 100644 --- a/README.md +++ b/README.md @@ -84,7 +84,7 @@ Please make sure to propose the translation of **only one** cheatsheet per pull |Cheatsheet topic|Magyar|Deutsch|Deutsch|Bahasa Indonesia| -|:---|:---:|:---:| +|:---|:---:|:---:|:---:| |Deep learning|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/124/files#diff-5e2ba65ef08acd57024e82d0ae94b923)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/106)|not started| |Supervised learning|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/124/files#diff-5e2ba65ef08acd57024e82d0ae94b923)|not started|not started| |Unsupervised learning|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/124/files#diff-5e2ba65ef08acd57024e82d0ae94b923)|not started|not started| From e8cf6493a0c014efa91a8aca37e4e7e5732da1f8 Mon Sep 17 00:00:00 2001 From: Afshine Amidi <26204670+afshinea@users.noreply.github.com> Date: Sun, 26 May 2019 13:22:40 -0700 Subject: [PATCH 175/531] Update README.md --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index e19461df5..c4ccc313a 100644 --- a/README.md +++ b/README.md @@ -83,7 +83,7 @@ Please make sure to propose the translation of **only one** cheatsheet per pull |Linear algebra|not started|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/47)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/95)|done| -|Cheatsheet topic|Magyar|Deutsch|Deutsch|Bahasa Indonesia| +|Cheatsheet topic|Magyar|Deutsch|Bahasa Indonesia| |:---|:---:|:---:|:---:| |Deep learning|[in 
progress](https://github.com/shervinea/cheatsheet-translation/pull/124/files#diff-5e2ba65ef08acd57024e82d0ae94b923)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/106)|not started| |Supervised learning|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/124/files#diff-5e2ba65ef08acd57024e82d0ae94b923)|not started|not started| From 6e279a304e20d112a3c5477c77dfd651e8480af9 Mon Sep 17 00:00:00 2001 From: Afshine Amidi <26204670+afshinea@users.noreply.github.com> Date: Sun, 26 May 2019 13:23:42 -0700 Subject: [PATCH 176/531] Update README.md --- README.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/README.md b/README.md index c4ccc313a..c9f6f8694 100644 --- a/README.md +++ b/README.md @@ -85,12 +85,12 @@ Please make sure to propose the translation of **only one** cheatsheet per pull |Cheatsheet topic|Magyar|Deutsch|Bahasa Indonesia| |:---|:---:|:---:|:---:| -|Deep learning|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/124/files#diff-5e2ba65ef08acd57024e82d0ae94b923)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/106)|not started| +|Deep learning|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/124/files#diff-5e2ba65ef08acd57024e82d0ae94b923)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/106)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/143)| |Supervised learning|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/124/files#diff-5e2ba65ef08acd57024e82d0ae94b923)|not started|not started| |Unsupervised learning|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/124/files#diff-5e2ba65ef08acd57024e82d0ae94b923)|not started|not started| |ML tips and tricks|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/124/files#diff-5e2ba65ef08acd57024e82d0ae94b923)|not started|not started| -|Probabilities and Statistics|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/124/files#diff-5e2ba65ef08acd57024e82d0ae94b923)|not started|not started| -|Linear algebra|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/124/files#diff-5e2ba65ef08acd57024e82d0ae94b923)|not started|not started| +|Probabilities and Statistics|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/124/files#diff-5e2ba65ef08acd57024e82d0ae94b923)|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/143)| +|Linear algebra|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/124/files#diff-5e2ba65ef08acd57024e82d0ae94b923)|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/143)| ## Acknowledgements From 7cb94845f32e0ddf79fb17c751ecdbc7183bfb38 Mon Sep 17 00:00:00 2001 From: Afshine Amidi <26204670+afshinea@users.noreply.github.com> Date: Sun, 26 May 2019 13:24:41 -0700 Subject: [PATCH 177/531] Update README.md --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index c9f6f8694..120ec0f79 100644 --- a/README.md +++ b/README.md @@ -87,7 +87,7 @@ Please make sure to propose the translation of **only one** cheatsheet per pull |:---|:---:|:---:|:---:| |Deep learning|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/124/files#diff-5e2ba65ef08acd57024e82d0ae94b923)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/106)|[in 
progress](https://github.com/shervinea/cheatsheet-translation/pull/143)| |Supervised learning|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/124/files#diff-5e2ba65ef08acd57024e82d0ae94b923)|not started|not started| -|Unsupervised learning|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/124/files#diff-5e2ba65ef08acd57024e82d0ae94b923)|not started|not started| +|Unsupervised learning|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/124/files#diff-5e2ba65ef08acd57024e82d0ae94b923)|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/139)| |ML tips and tricks|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/124/files#diff-5e2ba65ef08acd57024e82d0ae94b923)|not started|not started| |Probabilities and Statistics|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/124/files#diff-5e2ba65ef08acd57024e82d0ae94b923)|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/143)| |Linear algebra|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/124/files#diff-5e2ba65ef08acd57024e82d0ae94b923)|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/143)| From 9e78f9c33f54d1c26f95915811d448e50c870129 Mon Sep 17 00:00:00 2001 From: Afshine Amidi <26204670+afshinea@users.noreply.github.com> Date: Sun, 26 May 2019 13:25:55 -0700 Subject: [PATCH 178/531] Update README.md --- README.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/README.md b/README.md index 120ec0f79..dd151c4c8 100644 --- a/README.md +++ b/README.md @@ -88,9 +88,9 @@ Please make sure to propose the translation of **only one** cheatsheet per pull |Deep learning|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/124/files#diff-5e2ba65ef08acd57024e82d0ae94b923)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/106)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/143)| |Supervised learning|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/124/files#diff-5e2ba65ef08acd57024e82d0ae94b923)|not started|not started| |Unsupervised learning|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/124/files#diff-5e2ba65ef08acd57024e82d0ae94b923)|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/139)| -|ML tips and tricks|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/124/files#diff-5e2ba65ef08acd57024e82d0ae94b923)|not started|not started| +|ML tips and tricks|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/124/files#diff-5e2ba65ef08acd57024e82d0ae94b923)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/135)|not started| |Probabilities and Statistics|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/124/files#diff-5e2ba65ef08acd57024e82d0ae94b923)|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/143)| -|Linear algebra|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/124/files#diff-5e2ba65ef08acd57024e82d0ae94b923)|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/143)| +|Linear algebra|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/124/files#diff-5e2ba65ef08acd57024e82d0ae94b923)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/136)|[in 
progress](https://github.com/shervinea/cheatsheet-translation/pull/143)| ## Acknowledgements From 1cb04c516c0374dfce2a853a2fb9761a6ca40c24 Mon Sep 17 00:00:00 2001 From: scrambleegg7 Date: Mon, 27 May 2019 15:20:22 +0900 Subject: [PATCH 179/531] initial version --- ja/recurrent-neural-networks.md | 677 ++++++++++++++++++++++++++++++++ 1 file changed, 677 insertions(+) create mode 100644 ja/recurrent-neural-networks.md diff --git a/ja/recurrent-neural-networks.md b/ja/recurrent-neural-networks.md new file mode 100644 index 000000000..065bd4ced --- /dev/null +++ b/ja/recurrent-neural-networks.md @@ -0,0 +1,677 @@ +**Recurrent Neural Networks translation** + +
+ +**1. Recurrent Neural Networks cheatsheet** + +⟶リカレントニューラルネットワーク チートシート + +
+ + +**2. CS 230 - Deep Learning** + +⟶ + +
+ + +**3. [Overview, Architecture structure, Applications of RNNs, Loss function, Backpropagation]** + +⟶ + +
+ + +**4. [Handling long term dependencies, Common activation functions, Vanishing/exploding gradient, Gradient clipping, GRU/LSTM, Types of gates, Bidirectional RNN, Deep RNN]** + +⟶ + +
+ + +**5. [Learning word representation, Notations, Embedding matrix, Word2vec, Skip-gram, Negative sampling, GloVe]** + +⟶ + +
+ + +**6. [Comparing words, Cosine similarity, t-SNE]** + +⟶ + +
+ + +**7. [Language model, n-gram, Perplexity]** + +⟶ + +
+ + +**8. [Machine translation, Beam search, Length normalization, Error analysis, Bleu score]** + +⟶ + +
+ + +**9. [Attention, Attention model, Attention weights]** + +⟶ + +
+ + +**10. Overview** + +⟶ + +
+ + +**11. Architecture of a traditional RNN ― Recurrent neural networks, also known as RNNs, are a class of neural networks that allow previous outputs to be used as inputs while having hidden states. They are typically as follows:** + +⟶ + +
+ + +**12. For each timestep t, the activation a and the output y are expressed as follows:** + +⟶ + +
+ + +**13. and** + +⟶ + +
+ + +**14. where Wax,Waa,Wya,ba,by are coefficients that are shared temporally and g1,g2 activation functions.** + +⟶ + +
+ + +**15. The pros and cons of a typical RNN architecture are summed up in the table below:** + +⟶ + +
+ + +**16. [Advantages, Possibility of processing input of any length, Model size not increasing with size of input, Computation takes into account historical information, Weights are shared across time]** + +⟶ + +
+ + +**17. [Drawbacks, Computation being slow, Difficulty of accessing information from a long time ago, Cannot consider any future input for the current state]** + +⟶ + +
+ + +**18. Applications of RNNs ― RNN models are mostly used in the fields of natural language processing and speech recognition. The different applications are summed up in the table below:** + +⟶ + +
+ + +**19. [Type of RNN, Illustration, Example]** + +⟶ + +
+ + +**20. [One-to-one, One-to-many, Many-to-one, Many-to-many]** + +⟶ + +
+ + +**21. [Traditional neural network, Music generation, Sentiment classification, Name entity recognition, Machine translation]** + +⟶ + +
+ + +**22. Loss function ― In the case of a recurrent neural network, the loss function L of all time steps is defined based on the loss at every time step as follows:** + +⟶ + +
+ + +**23. Backpropagation through time ― Backpropagation is done at each point in time. At timestep T, the derivative of the loss L with respect to weight matrix W is expressed as follows:** + +⟶ + +
+ + +**24. Handling long term dependencies** + +⟶ + +
+ + +**25. Commonly used activation functions ― The most common activation functions used in RNN modules are described below:** + +⟶ + +
+ + +**26. [Sigmoid, Tanh, RELU]** + +⟶ + +
+ + +**27. Vanishing/exploding gradient ― The vanishing and exploding gradient phenomena are often encountered in the context of RNNs. The reason why they happen is that it is difficult to capture long term dependencies because of multiplicative gradient that can be exponentially decreasing/increasing with respect to the number of layers.** + +⟶ + +
+ + +**28. Gradient clipping ― It is a technique used to cope with the exploding gradient problem sometimes encountered when performing backpropagation. By capping the maximum value for the gradient, this phenomenon is controlled in practice.** + +⟶ + +
+ + +**29. clipped** + +⟶ + +
+ + +**30. Types of gates ― In order to remedy the vanishing gradient problem, specific gates are used in some types of RNNs and usually have a well-defined purpose. They are usually noted Γ and are equal to:** + +⟶ + +
+ + +**31. where W,U,b are coefficients specific to the gate and σ is the sigmoid function. The main ones are summed up in the table below:** + +⟶ + +
+ + +**32. [Type of gate, Role, Used in]** + +⟶ + +
+ + +**33. [Update gate, Relevance gate, Forget gate, Output gate]** + +⟶ + +
+ + +**34. [How much past should matter now?, Drop previous information?, Erase a cell or not?, How much to reveal of a cell?]** + +⟶ + +
+ + +**35. [LSTM, GRU]** + +⟶ + +
+ + +**36. GRU/LSTM ― Gated Recurrent Unit (GRU) and Long Short-Term Memory units (LSTM) deal with the vanishing gradient problem encountered by traditional RNNs, with LSTM being a generalization of GRU. Below is a table summing up the characterizing equations of each architecture:** + +⟶ + +
+ + +**37. [Characterization, Gated Recurrent Unit (GRU), Long Short-Term Memory (LSTM), Dependencies]** + +⟶ + +
+ + +**38. Remark: the sign ⋆ denotes the element-wise multiplication between two vectors.** + +⟶ + +
+ + +**39. Variants of RNNs ― The table below sums up the other commonly used RNN architectures:** + +⟶ + +
+ + +**40. [Bidirectional (BRNN), Deep (DRNN)]** + +⟶ + +
+ + +**41. Learning word representation** + +⟶ + +
+ + +**42. In this section, we note V the vocabulary and |V| its size.** + +⟶ + +
+ + +**43. Motivation and notations** + +⟶ + +
+ + +**44. Representation techniques ― The two main ways of representing words are summed up in the table below:** + +⟶ + +
+ + +**45. [1-hot representation, Word embedding]** + +⟶ + +
+ + +**46. [teddy bear, book, soft]** + +⟶ + +
+ + +**47. [Noted ow, Naive approach, no similarity information, Noted ew, Takes into account words similarity]** + +⟶ + +
+ + +**48. Embedding matrix ― For a given word w, the embedding matrix E is a matrix that maps its 1-hot representation ow to its embedding ew as follows:** + +⟶ + +
+ + +**49. Remark: learning the embedding matrix can be done using target/context likelihood models.** + +⟶ + +
+ + +**50. Word embeddings** + +⟶ + +
+ + +**51. Word2vec ― Word2vec is a framework aimed at learning word embeddings by estimating the likelihood that a given word is surrounded by other words. Popular models include skip-gram, negative sampling and CBOW.** + +⟶ + +
+ + +**52. [A cute teddy bear is reading, teddy bear, soft, Persian poetry, art]** + +⟶ + +
+ + +**53. [Train network on proxy task, Extract high-level representation, Compute word embeddings]** + +⟶ + +
+ + +**54. Skip-gram ― The skip-gram word2vec model is a supervised learning task that learns word embeddings by assessing the likelihood of any given target word t happening with a context word c. By noting θt a parameter associated with t, the probability P(t|c) is given by:** + +⟶ + +
+ + +**55. Remark: summing over the whole vocabulary in the denominator of the softmax part makes this model computationally expensive. CBOW is another word2vec model using the surrounding words to predict a given word.** + +⟶ + +
+ + +**56. Negative sampling ― It is a set of binary classifiers using logistic regressions that aim at assessing how a given context and a given target words are likely to appear simultaneously, with the models being trained on sets of k negative examples and 1 positive example. Given a context word c and a target word t, the prediction is expressed by:** + +⟶ + +
+ + +**57. Remark: this method is less computationally expensive than the skip-gram model.** + +⟶ + +
+ + +**57bis. GloVe ― The GloVe model, short for global vectors for word representation, is a word embedding technique that uses a co-occurence matrix X where each Xi,j denotes the number of times that a target i occurred with a context j. Its cost function J is as follows:** + +⟶ + +
+ + +**58. where f is a weighting function such that Xi,j=0⟹f(Xi,j)=0. +Given the symmetry that e and θ play in this model, the final word embedding e(final)w is given by:** + +⟶ + +
+ + +**59. Remark: the individual components of the learned word embeddings are not necessarily interpretable.** + +⟶ + +
+ + +**60. Comparing words** + +⟶ + +
+ + +**61. Cosine similarity ― The cosine similarity between words w1 and w2 is expressed as follows:** + +⟶ + +
+ + +**62. Remark: θ is the angle between words w1 and w2.** + +⟶ + +
+ + +**63. t-SNE ― t-SNE (t-distributed Stochastic Neighbor Embedding) is a technique aimed at reducing high-dimensional embeddings into a lower dimensional space. In practice, it is commonly used to visualize word vectors in the 2D space.** + +⟶ + +
+ + +**64. [literature, art, book, culture, poem, reading, knowledge, entertaining, loveable, childhood, kind, teddy bear, soft, hug, cute, adorable]** + +⟶ + +
+ + +**65. Language model** + +⟶ + +
+ + +**66. Overview ― A language model aims at estimating the probability of a sentence P(y).** + +⟶ + +
+ + +**67. n-gram model ― This model is a naive approach aiming at quantifying the probability that an expression appears in a corpus by counting its number of appearance in the training data.** + +⟶ + +
+ + +**68. Perplexity ― Language models are commonly assessed using the perplexity metric, also known as PP, which can be interpreted as the inverse probability of the dataset normalized by the number of words T. The perplexity is such that the lower, the better and is defined as follows:** + +⟶ + +
+ + +**69. Remark: PP is commonly used in t-SNE.** + +⟶ + +
+ + +**70. Machine translation** + +⟶ + +
+ + +**71. Overview ― A machine translation model is similar to a language model except it has an encoder network placed before. For this reason, it is sometimes referred as a conditional language model. The goal is to find a sentence y such that:** + +⟶ + +
+ + +**72. Beam search ― It is a heuristic search algorithm used in machine translation and speech recognition to find the likeliest sentence y given an input x.** + +⟶ + +
+ + +**73. [Step 1: Find top B likely words y<1>, Step 2: Compute conditional probabilities y|x,y<1>,...,y, Step 3: Keep top B combinations x,y<1>,...,y, End process at a stop word]** + +⟶ + +
+ + +**74. Remark: if the beam width is set to 1, then this is equivalent to a naive greedy search.** + +⟶ + +
+ + +**75. Beam width ― The beam width B is a parameter for beam search. Large values of B yield to better result but with slower performance and increased memory. Small values of B lead to worse results but is less computationally intensive. A standard value for B is around 10.** + +⟶ + +
+ + +**76. Length normalization ― In order to improve numerical stability, beam search is usually applied on the following normalized objective, often called the normalized log-likelihood objective, defined as:** + +⟶ + +
+ + +**77. Remark: the parameter α can be seen as a softener, and its value is usually between 0.5 and 1.** + +⟶ + +
+ + +**78. Error analysis ― When obtaining a predicted translation ˆy that is bad, one can wonder why we did not get a good translation y∗ by performing the following error analysis:** + +⟶ + +
+ + +**79. [Case, Root cause, Remedies]** + +⟶ + +
+ + +**80. [Beam search faulty, RNN faulty, Increase beam width, Try different architecture, Regularize, Get more data]** + +⟶ + +
+ + +**81. Bleu score ― The bilingual evaluation understudy (bleu) score quantifies how good a machine translation is by computing a similarity score based on n-gram precision. It is defined as follows:** + +⟶ + +
+ + +**82. where pn is the bleu score on n-gram only defined as follows:** + +⟶ + +
+ + +**83. Remark: a brevity penalty may be applied to short predicted translations to prevent an artificially inflated bleu score.** + +⟶ + +
+ + +**84. Attention** + +⟶ + +
+ + +**85. Attention model ― This model allows an RNN to pay attention to specific parts of the input that is considered as being important, which improves the performance of the resulting model in practice. By noting α the amount of attention that the output y should pay to the activation a and c the context at time t, we have:** + +⟶ + +
+ + +**86. with** + +⟶ + +
+ + +**87. Remark: the attention scores are commonly used in image captioning and machine translation.** + +⟶ + +
+ + +**88. A cute teddy bear is reading Persian literature.** + +⟶ + +
+ + +**89. Attention weight ― The amount of attention that the output y should pay to the activation a is given by α computed as follows:** + +⟶ + +
+ + +**90. Remark: computation complexity is quadratic with respect to Tx.** + +⟶ + +
+ + +**91. The Deep Learning cheatsheets are now available in [target language].** + +⟶ + +
+ +**92. Original authors** + +⟶ + +
+ +**93. Translated by X, Y and Z** + +⟶ + +
+ +**94. Reviewed by X, Y and Z** + +⟶ + +
+ +**95. View PDF version on GitHub** + +⟶ + +
+ +**96. By X and Y** + +⟶ + +
From 20c7190cc647dd3fc050dc3f041e3228847e643b Mon Sep 17 00:00:00 2001 From: scrambleegg7 Date: Mon, 27 May 2019 23:30:56 +0900 Subject: [PATCH 180/531] translate to japanese version --- ja/recurrent-neural-networks.md | 192 ++++++++++++++++---------------- 1 file changed, 96 insertions(+), 96 deletions(-) diff --git a/ja/recurrent-neural-networks.md b/ja/recurrent-neural-networks.md index 065bd4ced..917fb2378 100644 --- a/ja/recurrent-neural-networks.md +++ b/ja/recurrent-neural-networks.md @@ -11,399 +11,399 @@ **2. CS 230 - Deep Learning** -⟶ +⟶ディープラーニング
**3. [Overview, Architecture structure, Applications of RNNs, Loss function, Backpropagation]** -⟶ +⟶概要、アーキテクチャの構造、RNNの応用アプリケーション、損失関数、逆伝播
**4. [Handling long term dependencies, Common activation functions, Vanishing/exploding gradient, Gradient clipping, GRU/LSTM, Types of gates, Bidirectional RNN, Deep RNN]** -⟶ +⟶長期依存性関係の処理、活性化関数、勾配喪失と発散、勾配クリッピング、GRU/LTSM、ゲートの種類、双方向性RNN、ディープ(深層学習)RNN
**5. [Learning word representation, Notations, Embedding matrix, Word2vec, Skip-gram, Negative sampling, GloVe]** -⟶ +⟶単語出現の学習、ノーテーション、埋め込み行列、Word2vec、スキップグラム、ネガティブサンプリング、グローブ
**6. [Comparing words, Cosine similarity, t-SNE]** -⟶ +⟶単語の比較、コサイン類似度、t-SNE
**7. [Language model, n-gram, Perplexity]** -⟶ +⟶言語モデル、n-gramモデル、パープレキシティ
**8. [Machine translation, Beam search, Length normalization, Error analysis, Bleu score]** -⟶ +⟶機械翻訳、ビームサーチ、言語長正規化、エラー分析、ブルースコア(機械翻訳比較スコア)
**9. [Attention, Attention model, Attention weights]** -⟶ +⟶アテンション、アテンションモデル、アテンションウェイト
**10. Overview** -⟶ +⟶概要
**11. Architecture of a traditional RNN ― Recurrent neural networks, also known as RNNs, are a class of neural networks that allow previous outputs to be used as inputs while having hidden states. They are typically as follows:** -⟶ +⟶一般的なRNNのアーキテクチャ - RNNとして知られるリカレントニューラルネットワークは、直前隠れ層の状態を利用しながら、過去の(一時点前の)情報を入力情報と取り扱うことを可能にするニューラルネットワークです。一般的なモデルは下記のようになります。
**12. For each timestep t, the activation a and the output y are expressed as follows:** -⟶ +⟶それぞれの時点t において活性化関数の状態 a と出力 y は下記のように表現されます。 
**13. and** -⟶ +⟶そして

**14. where Wax,Waa,Wya,ba,by are coefficients that are shared temporally and g1,g2 activation functions.**

-⟶
+⟶Wax,Waa,Wya,ba,byは時間方向に共有される係数であり、g1,g2は活性化関数です。

<br>
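
A minimal NumPy sketch of the recurrence in items 12-14; the dimensions and the tanh/softmax choices for g1 and g2 are assumptions made for the example:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

n_x, n_a, n_y = 4, 5, 3                        # input, hidden, output sizes
rng = np.random.default_rng(0)
Wax = rng.normal(size=(n_a, n_x))
Waa = rng.normal(size=(n_a, n_a))
Wya = rng.normal(size=(n_y, n_a))
ba, by = np.zeros(n_a), np.zeros(n_y)

a_prev = np.zeros(n_a)
x_t = rng.normal(size=n_x)

a_t = np.tanh(Waa @ a_prev + Wax @ x_t + ba)   # a<t> = g1(Waa a<t-1> + Wax x<t> + ba)
y_t = softmax(Wya @ a_t + by)                  # y<t> = g2(Wya a<t> + by)
print(a_t.shape, y_t.sum())                    # (5,) and a probability vector summing to 1
```
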
**15. The pros and cons of a typical RNN architecture are summed up in the table below:** -⟶ +⟶一般的なRNNのアーキテクチャ利用の長所・短所については下記の表にまとめられています。
**16. [Advantages, Possibility of processing input of any length, Model size not increasing with size of input, Computation takes into account historical information, Weights are shared across time]** -⟶ +⟶長所、任意の長さの入力を処理できる、入力サイズに比べてモデルサイズは大きくならない、時間軸を考慮した計算パワー、時間軸での重みは共有される
**17. [Drawbacks, Computation being slow, Difficulty of accessing information from a long time ago, Cannot consider any future input for the current state]** -⟶ +⟶短所、遅い計算時間、長期の時間軸にわたるデータ探索が困難、現在の状態から将来の入力を予測不可能
**18. Applications of RNNs ― RNN models are mostly used in the fields of natural language processing and speech recognition. The different applications are summed up in the table below:** -⟶ +⟶RNNの応用 - RNNモデルは主に自然言語処理と音声認識の分野で使用されます。以下の表に、さまざまなアプリケーションの概要が下記のテーブルに示されます。
**19. [Type of RNN, Illustration, Example]** -⟶ +⟶RNNの種類、イラスト、例
**20. [One-to-one, One-to-many, Many-to-one, Many-to-many]** -⟶ +⟶一対一、一対多、多対一、多対多
**21. [Traditional neural network, Music generation, Sentiment classification, Name entity recognition, Machine translation]** -⟶ +⟶伝統的なニューラルネットワーク、音楽生成、感情分類、固有名詞認識、機械翻訳
**22. Loss function ― In the case of a recurrent neural network, the loss function L of all time steps is defined based on the loss at every time step as follows:** -⟶ +⟶損失関数 - リカレントニューラルネットワークの場合、すべての時間軸での損失関数Lは、それぞれの時点での損失に基づき、次のように定義されます
**23. Backpropagation through time ― Backpropagation is done at each point in time. At timestep T, the derivative of the loss L with respect to weight matrix W is expressed as follows:** -⟶ +⟶時間軸での誤差逆伝播法 - 誤差逆伝播法(バックプロパゲーション)は各時点で行われます。時間ステップTにおいて、重み行列Wに関する損失Lの導関数は以下のように表されます。
**24. Handling long term dependencies** -⟶ +⟶長期依存関係の処理
**25. Commonly used activation functions ― The most common activation functions used in RNN modules are described below:** -⟶ +⟶一般的に使用される活性化関数 - RNNモジュールで使用される最も一般的な活性化関数を以下に説明します。

**26. [Sigmoid, Tanh, RELU]**

-⟶
+⟶[シグモイド、Tanh、RELU]

<br>
**27. Vanishing/exploding gradient ― The vanishing and exploding gradient phenomena are often encountered in the context of RNNs. The reason why they happen is that it is difficult to capture long term dependencies because of multiplicative gradient that can be exponentially decreasing/increasing with respect to the number of layers.** -⟶ +⟶勾配消失と勾配爆発について - 勾配消失と勾配爆発の現象は、RNNでよく見られます。これらの現象が起こる理由は、多層にわたり勾配が指数関数的に減少/増加する可能性があるため、長期の依存関係を計算するのには向いていないからです。
**28. Gradient clipping ― It is a technique used to cope with the exploding gradient problem sometimes encountered when performing backpropagation. By capping the maximum value for the gradient, this phenomenon is controlled in practice.** -⟶ +⟶勾配クリッピング - 誤差逆伝播法を実行するときに時折発生する勾配爆発問題に対処するために使用される手法です。勾配の最大値(閾値)を定義することで、この現象が抑制されます。
**29. clipped** -⟶ +⟶クリップド
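
The clipping of item 28 can be sketched as follows, here clipping by the global L2 norm; the threshold value is an arbitrary choice:

```python
import numpy as np

def clip_by_norm(grad, max_norm=5.0):
    """Rescale the gradient if its L2 norm exceeds max_norm."""
    norm = np.linalg.norm(grad)
    if norm > max_norm:
        grad = grad * (max_norm / norm)
    return grad

g = np.array([30.0, -40.0])    # norm 50, well above the threshold
print(clip_by_norm(g))         # [ 3. -4.] -> norm capped at 5
```
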
**30. Types of gates ― In order to remedy the vanishing gradient problem, specific gates are used in some types of RNNs and usually have a well-defined purpose. They are usually noted Γ and are equal to:** -⟶ +⟶ゲートの種類 - 勾配消失問題を解決するために、特定のゲートがいくつかのRNNで使用され、通常明確に定義された目的を持っています。それらは通常Γと記され、以下と同じです。
**31. where W,U,b are coefficients specific to the gate and σ is the sigmoid function. The main ones are summed up in the table below:** -⟶ +⟶ここで、W、U、bはゲート固有の係数、σはシグモイド関数です。主なものは以下の表にまとめられています。
**32. [Type of gate, Role, Used in]** -⟶ +⟶[ゲートの種類、役割、で使用]
**33. [Update gate, Relevance gate, Forget gate, Output gate]** -⟶ +⟶[更新ゲート、関連ゲート、忘却ゲート、出力ゲート]
**34. [How much past should matter now?, Drop previous information?, Erase a cell or not?, How much to reveal of a cell?]** -⟶ +⟶[過去情報はどのくらい重要ですか? 前の情報を削除しますか?、セルを消去しますか? しませんか? セルを表示するコストはどのくらいですか?]
**35. [LSTM, GRU]** -⟶ +⟶[LSTM GRU]
**36. GRU/LSTM ― Gated Recurrent Unit (GRU) and Long Short-Term Memory units (LSTM) deal with the vanishing gradient problem encountered by traditional RNNs, with LSTM being a generalization of GRU. Below is a table summing up the characterizing equations of each architecture:** -⟶ +⟶GRU/LSTM - ゲートリカレントユニット(GRU)およびロングショートタームメモリユニット(LSTM)は、従来のRNNで問題になった勾配消失問題を解決します。LSTMはGRUの一般化名称です。以下は、各アーキテクチャの特性式をまとめた表です。
**37. [Characterization, Gated Recurrent Unit (GRU), Long Short-Term Memory (LSTM), Dependencies]** -⟶ +⟶特性評価、ゲートリカレントユニット(GRU)、ロングショートタームメモリ(LSTM)、依存関係
**38. Remark: the sign ⋆ denotes the element-wise multiplication between two vectors.** -⟶ +⟶備考:記号*は2つのベクトル間の要素ごとの乗算を表します。
**39. Variants of RNNs ― The table below sums up the other commonly used RNN architectures:** -⟶ +⟶RNNの変化版 - 以下の表は、一般的に使用されている他のRNNアーキテクチャをまとめたものです。
**40. [Bidirectional (BRNN), Deep (DRNN)]** -⟶ +⟶[双方向(BRNN)、ディープ(DRNN)]
**41. Learning word representation** -⟶ +⟶単語表現の学習
**42. In this section, we note V the vocabulary and |V| its size.** -⟶ +⟶この節では、Vを語彙、そして|V|を語彙のサイズとして定義します。
**43. Motivation and notations** -⟶ +⟶動機と表記
**44. Representation techniques ― The two main ways of representing words are summed up in the table below:** -⟶ +⟶表現のテクニック - 単語を表現する2つの主な方法は、以下の表にまとめられています。
**45. [1-hot representation, Word embedding]** -⟶ +⟶[1-ホット表現、Wordの埋め込み]
**46. [teddy bear, book, soft]** -⟶ +⟶テディベア、本、ソフト
**47. [Noted ow, Naive approach, no similarity information, Noted ew, Takes into account words similarity]** -⟶ +⟶[表記 ow、ナイーブベイズアプローチ、類似性情報なし、表記 ew、単語の類似性を考慮に入れる]
**48. Embedding matrix ― For a given word w, the embedding matrix E is a matrix that maps its 1-hot representation ow to its embedding ew as follows:** -⟶ +⟶埋め込み行列 - 与えられた単語wに対して、埋め込み行列Eは、1-hot表現owを埋め込み行列ewに写像します。式は以下のようになります。
**49. Remark: learning the embedding matrix can be done using target/context likelihood models.** -⟶ +⟶注:埋め込み行列は、ターゲット/コンテキスト尤度モデルを使用して学習できます。
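
Item 48 in code: multiplying the embedding matrix by a 1-hot vector simply selects one column. The vocabulary size and embedding dimension below are assumptions for the example:

```python
import numpy as np

vocab_size, emb_dim = 10, 4
rng = np.random.default_rng(0)
E = rng.normal(size=(emb_dim, vocab_size))   # embedding matrix

w = 7                                        # index of the word in the vocabulary
o_w = np.zeros(vocab_size)
o_w[w] = 1.0                                 # 1-hot representation

e_w = E @ o_w                                # embedding of the word
assert np.allclose(e_w, E[:, w])             # same as taking column w directly
```
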
**50. Word embeddings** -⟶ +⟶単語の埋め込み
**51. Word2vec ― Word2vec is a framework aimed at learning word embeddings by estimating the likelihood that a given word is surrounded by other words. Popular models include skip-gram, negative sampling and CBOW.** -⟶ +⟶Word2vec - Word2vecは、ある単語が他の周辺単語から導きだされる可能性を推定することで、単語の埋め込みの重みを学習することを目的としたフレームワークです。この一般的なモデルは、スキップグラム、ネガティブサンプリング、およびCBOWがあります。
**52. [A cute teddy bear is reading, teddy bear, soft, Persian poetry, art]** -⟶ +⟶[かわいいテディベアが読んでいる、テディベア、ソフト、ペルシャ詩、芸術]
**53. [Train network on proxy task, Extract high-level representation, Compute word embeddings]** -⟶ +⟶[代理タスク上のネットワークの訓練、高水準表現の抽出、単語埋め込み重みの計算]
**54. Skip-gram ― The skip-gram word2vec model is a supervised learning task that learns word embeddings by assessing the likelihood of any given target word t happening with a context word c. By noting θt a parameter associated with t, the probability P(t|c) is given by:** -⟶ +⟶スキップグラム - スキップグラムword2vecモデルは、あるコンテキスト単語を与え、ターゲット単語t の出現確率を計算することで単語の埋め込みを学習する教師付き学習タスクです。時点tと関係するパラメーターθtと表記すると、確率P(t|c) は下記のように表現されます。
**55. Remark: summing over the whole vocabulary in the denominator of the softmax part makes this model computationally expensive. CBOW is another word2vec model using the surrounding words to predict a given word.** -⟶ +⟶注意:softmax部分の分母全体の語彙全体を合計すると、モデルの計算コストは高くなります。 CBOWは、ある単語を予測するため周辺単語を使用する別のタイプのword2vecモデルです。
**56. Negative sampling ― It is a set of binary classifiers using logistic regressions that aim at assessing how a given context and a given target words are likely to appear simultaneously, with the models being trained on sets of k negative examples and 1 positive example. Given a context word c and a target word t, the prediction is expressed by:** -⟶ +⟶ネガティブサンプリング - k個のネガティブな例と1つのポジティブな例で訓練されたモデルで、ある与えられた文脈とターゲット単語の出現確率を評価するロジスティック回帰を使用するバイナリ分類器です。単語cとターゲット語tが与えられると、予測は次のように表現されます。
**57. Remark: this method is less computationally expensive than the skip-gram model.** -⟶ +⟶注意:この計算コストは、スキップグラムモデルよりも少ないです。
**57bis. GloVe ― The GloVe model, short for global vectors for word representation, is a word embedding technique that uses a co-occurence matrix X where each Xi,j denotes the number of times that a target i occurred with a context j. Its cost function J is as follows:** -⟶ +⟶GloVe - GloVeモデルは、単語表現のためのグローバルベクトルの略で、共起行列Xを使用する単語の埋め込み手法です。ここで、各Xi、jは、ターゲットiがコンテキストjで発生した回数を表します。そのコスト関数Jは以下の通りです。
@@ -411,267 +411,267 @@ **58. where f is a weighting function such that Xi,j=0⟹f(Xi,j)=0. Given the symmetry that e and θ play in this model, the final word embedding e(final)w is given by:** -⟶ +⟶ここで、fはXi、j =0⟹f(Xi、j)= 0となるような重み関数です。このモデルでeとθが果たす対称性を考えると、e(final)wが最後の単語の埋め込みになります。
**59. Remark: the individual components of the learned word embeddings are not necessarily interpretable.** -⟶ +⟶注意:学習された単語の埋め込みの個々の要素は、必ずしも関係性がある必要はないです。
**60. Comparing words** -⟶ +⟶単語の比較
**61. Cosine similarity ― The cosine similarity between words w1 and w2 is expressed as follows:** -⟶ +⟶コサイン類似度 - 単語w1とw2のコサイン類似度は次のように表されます。
**62. Remark: θ is the angle between words w1 and w2.** -⟶ +⟶注意:θはワードw1とw2の間の角度です。
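
A short sketch of the cosine similarity of item 61; the vectors stand in for word embeddings and are made up:

```python
import numpy as np

def cosine_similarity(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

e_teddy = np.array([0.9, 0.1, 0.3])
e_bear = np.array([0.8, 0.2, 0.25])
e_poetry = np.array([-0.1, 0.9, 0.4])

print(cosine_similarity(e_teddy, e_bear))     # close to 1: similar words
print(cosine_similarity(e_teddy, e_poetry))   # much smaller
```
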
**63. t-SNE ― t-SNE (t-distributed Stochastic Neighbor Embedding) is a technique aimed at reducing high-dimensional embeddings into a lower dimensional space. In practice, it is commonly used to visualize word vectors in the 2D space.** -⟶ +⟶ t-SNE − t-SNE(t−分布確率的近傍埋め込み)は、高次元埋め込みから低次元埋め込み空間への次元削減を目的とした技法です。実際には、2次元空間で単語ベクトルを視覚化するために使用されます。
**64. [literature, art, book, culture, poem, reading, knowledge, entertaining, loveable, childhood, kind, teddy bear, soft, hug, cute, adorable]** -⟶ +⟶[文学、芸術、本、文化、詩、読書、知識、娯楽、愛らしい、幼年期、親切、テディベア、ソフト、抱擁、かわいい、愛らしい]
**65. Language model** -⟶ +⟶言語モデル
**66. Overview ― A language model aims at estimating the probability of a sentence P(y).** -⟶ +⟶概要 - 言語モデルは文の確率P(y)を推定することを目的としています。
**67. n-gram model ― This model is a naive approach aiming at quantifying the probability that an expression appears in a corpus by counting its number of appearance in the training data.** -⟶ +⟶n-gramモデル - このモデルは、トレーニングデータでの出現数を数えることによって、コーパス表現の出現確率を定量化することを目的とした単純なアプローチです。
**68. Perplexity ― Language models are commonly assessed using the perplexity metric, also known as PP, which can be interpreted as the inverse probability of the dataset normalized by the number of words T. The perplexity is such that the lower, the better and is defined as follows:** -⟶ +⟶パープレキシティ - 言語モデルは、一般的にPPとも呼ばれるパープレキシティメトリックを使用して評価されます。これは、ワード数Tにより正規化されたデータセットの確率の逆数と解釈できます。パープレキシティの数値はより低いものがより選択しやすい単語として評価されます(訳注:10であれば10個の中から1つ選択される、10000であれば10000個の中から1つ)、評価式は下記のようになります。
**69. Remark: PP is commonly used in t-SNE.** -⟶ +⟶備考:PPはt-SNEで一般的に使用されています。
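
The perplexity of item 68 in code form, computed from per-token probabilities assigned by some model; the probabilities below are made-up numbers:

```python
import numpy as np

# Probability the model assigns to each of the T tokens in the dataset
token_probs = np.array([0.2, 0.1, 0.05, 0.3])

# PP = P(dataset)^(-1/T) = exp(-(1/T) * sum(log p_t)); lower is better
perplexity = np.exp(-np.mean(np.log(token_probs)))
print(perplexity)   # about 7.6
```
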
**70. Machine translation** -⟶ +⟶機械翻訳
**71. Overview ― A machine translation model is similar to a language model except it has an encoder network placed before. For this reason, it is sometimes referred as a conditional language model. The goal is to find a sentence y such that:** -⟶ +⟶概要 - 機械翻訳モデルは、エンコーダーネットワークのロジックが最初に付加されている以外は、言語モデルと似ています。このため、条件付き言語モデルと呼ばれることもあります。目的は次のような文yを見つけることです。
**72. Beam search ― It is a heuristic search algorithm used in machine translation and speech recognition to find the likeliest sentence y given an input x.** -⟶ +⟶ビーム検索 - 入力xが与えられたとき最も可能性の高い文yを見つける、機械翻訳と音声認識で使用されるヒューリスティック探索アルゴリズムです。
**73. [Step 1: Find top B likely words y<1>, Step 2: Compute conditional probabilities y|x,y<1>,...,y, Step 3: Keep top B combinations x,y<1>,...,y, End process at a stop word]** -⟶ +⟶[ステップ1:単語y<1>と高い確率を持つ上位B個の組み合わせを見つける。ステップ2:条件付き確率y|x,y<1>,...,yを計算する。ステップ3:上位B個の組み合わせx,y<1>,...,yを保持しながら、あるストップワードでプロセスを終了する]
**74. Remark: if the beam width is set to 1, then this is equivalent to a naive greedy search.** -⟶ +⟶注意:ビーム幅が1に設定されている場合、これは単純な貪欲法と同等の結果を導きます。
**75. Beam width ― The beam width B is a parameter for beam search. Large values of B yield to better result but with slower performance and increased memory. Small values of B lead to worse results but is less computationally intensive. A standard value for B is around 10.** -⟶ +⟶ビーム幅 - ビーム幅Bはビームサーチのパラメータです。 Bの値を大きくするとより良い結果が得られますが、探索パフォーマンスは低下し、メモリ使用量が増加します。 Bの値が小さいと結果が悪くなりますが、計算量は少なくなります。 Bの標準推奨値は10前後です。
**76. Length normalization ― In order to improve numerical stability, beam search is usually applied on the following normalized objective, often called the normalized log-likelihood objective, defined as:** -⟶ +⟶文章の長さの正規化 - 数値の安定性を向上させるために、ビームサーチは通常次のような正規化、特に対数尤度正規化された探索対象物に対して適用されます。
**77. Remark: the parameter α can be seen as a softener, and its value is usually between 0.5 and 1.** -⟶ +⟶注意:パラメーターαは緩衝パラメーターと見なされ、その値は通常0.5から1の間です。
**78. Error analysis ― When obtaining a predicted translation ˆy that is bad, one can wonder why we did not get a good translation y∗ by performing the following error analysis:** -⟶ +⟶エラー分析 - 予測ˆyの翻訳が誤りである場合、その文の後に続く誤り分析を実行することで訳文y*がなぜ不正解であるかを理解することが可能です。
**79. [Case, Root cause, Remedies]** -⟶ +⟶[症例、根本原因、改善策]
**80. [Beam search faulty, RNN faulty, Increase beam width, Try different architecture, Regularize, Get more data]** -⟶ +⟶[ビーム検索の誤り、RNNの誤り、ビーム幅の拡大、さまざまなアーキテクチャを試す、正規化、データをさらに取得]
**81. Bleu score ― The bilingual evaluation understudy (bleu) score quantifies how good a machine translation is by computing a similarity score based on n-gram precision. It is defined as follows:** -⟶ +⟶Bleuスコア - バイリンガル正確性の代替評価(bleu)スコアは、n-gramの精度に基づき類似性スコアを計算することで、機械翻訳がどれほど優れているかを定量化します。以下のように定義されています。
**82. where pn is the bleu score on n-gram only defined as follows:** -⟶ +⟶ここで、pnは、唯一定義されたn-gramでのbleuスコアです。定義は下記のようになります。
**83. Remark: a brevity penalty may be applied to short predicted translations to prevent an artificially inflated bleu score.** -⟶ +⟶注:人為的に水増しされたブルースコアを防ぐために、短い翻訳評価には簡潔なペナルティが適用される場合があります。
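
A simplified sketch of the n-gram precision and brevity penalty ideas above; real evaluations clip counts over several references and use more n-gram orders, so treat this only as an illustration (names and toy sentences are ours):

```python
import math
from collections import Counter

def ngrams(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(candidate, reference, max_n=2):
    """Geometric mean of clipped n-gram precisions p_n, times a brevity penalty."""
    precisions = []
    for n in range(1, max_n + 1):
        cand, ref = ngrams(candidate, n), ngrams(reference, n)
        clipped = sum(min(count, ref[g]) for g, count in cand.items())
        precisions.append(clipped / max(sum(cand.values()), 1))
    if min(precisions) == 0:
        return 0.0
    brevity = min(1.0, math.exp(1 - len(reference) / len(candidate)))
    return brevity * math.exp(sum(math.log(p) for p in precisions) / max_n)

print(bleu("the cat sat on the mat".split(), "the cat is on the mat".split()))
```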
**84. Attention** -⟶ +⟶アテンション
**85. Attention model ― This model allows an RNN to pay attention to specific parts of the input that is considered as being important, which improves the performance of the resulting model in practice. By noting α the amount of attention that the output y should pay to the activation a and c the context at time t, we have:** -⟶ +⟶アテンションモデル - このモデルはRNNが重要であると考えられる特定の入力部分に注目することで、モデルの実際の性能結果を向上させます。時点tにおける出力yが、活性化関数aおよびコンテキストc に注目するとき、αはアテンション量と定義されます。式は次のようになります。
**86. with** -⟶ +⟶ウェイト
**87. Remark: the attention scores are commonly used in image captioning and machine translation.**

-⟶

+⟶注：アテンションスコアは、一般的に画像のキャプション作成および機械翻訳で使用されています。

<br>
**88. A cute teddy bear is reading Persian literature.** -⟶ +⟶かわいいテディベアがペルシャ文学を読んでいます。
**89. Attention weight ― The amount of attention that the output y should pay to the activation a is given by α computed as follows:** -⟶ +⟶アテンションの重み - 出力yが活性化関数aで表現されるアテンションのウェイト量αは、次のように計算されます。
**90. Remark: computation complexity is quadratic with respect to Tx.** -⟶ +⟶注意:この計算の複雑さはTxの2次関数です。
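
A small NumPy sketch of the computation in entries 85 and 89: a softmax over the scores gives the weights α, and the context is the α-weighted sum of the activations; shapes and names are assumed:

```python
import numpy as np

def attention_context(scores, activations):
    """scores: (Tx,) unnormalized energies e<t,t'>; activations: (Tx, n_a)
    encoder states a<t'>. Returns the weights alpha and the context c<t>."""
    e = np.exp(scores - scores.max())   # numerically stable softmax
    alpha = e / e.sum()
    return alpha, alpha @ activations   # context = weighted sum of activations

scores = np.array([2.0, 0.5, 0.1])     # one score per input position
activations = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
alpha, context = attention_context(scores, activations)
print(alpha.sum(), context)            # weights sum to 1
```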
**91. The Deep Learning cheatsheets are now available in [target language].** -⟶ +⟶ディープラーニングのチートシートが[対象言語]で利用可能になりました。
**92. Original authors** -⟶ +⟶原作者
**93. Translated by X, Y and Z** -⟶ +⟶X,YそしてZにより翻訳されました。
**94. Reviewed by X, Y and Z** -⟶ +⟶X,YそしてZにより校正されました。
**95. View PDF version on GitHub** -⟶ +⟶GitHubでPDF版を見る
**96. By X and Y** -⟶ +⟶XそしてYによる。
From 5bbc3c3e2e6d2a29283015fd2ca83c137a90fc16 Mon Sep 17 00:00:00 2001 From: Afshine Amidi <26204670+afshinea@users.noreply.github.com> Date: Mon, 27 May 2019 13:36:18 -0700 Subject: [PATCH 181/531] Update README.md --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index dd151c4c8..7d8cd9acd 100644 --- a/README.md +++ b/README.md @@ -38,7 +38,7 @@ Please make sure to propose the translation of **only one** cheatsheet per pull |Cheatsheet topic|Español|فارسی|Français|日本語|Português|中文| |:---|:---:|:---:|:---:|:---:|:---:|:---:| |Convolutional Neural Nets|not started|done|done|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/128)|not started| -|Recurrent Neural Nets|not started|done|done|not started|not started|not started| +|Recurrent Neural Nets|not started|done|done|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/146)|not started|not started| |DL tips and tricks|not started|done|done|not started|not started|not started| |Cheatsheet topic|العَرَبِيَّة|עִבְרִית|हिन्दी|Türkçe|Русский|Italiano| From d0e95b348f13ffee783c86b6a317eff6c68114e3 Mon Sep 17 00:00:00 2001 From: tt-anh-eole Date: Tue, 28 May 2019 17:21:09 +0900 Subject: [PATCH 182/531] Start translating --- ja/convolutional-neural-networks.md | 716 ++++++++++++++++++++++++++++ 1 file changed, 716 insertions(+) create mode 100644 ja/convolutional-neural-networks.md diff --git a/ja/convolutional-neural-networks.md b/ja/convolutional-neural-networks.md new file mode 100644 index 000000000..5f13a2577 --- /dev/null +++ b/ja/convolutional-neural-networks.md @@ -0,0 +1,716 @@ +**Convolutional Neural Networks translation** + +
+ +**1. Convolutional Neural Networks cheatsheet** + +⟶ 畳み込み神経の網チートシート + +
+ + +**2. CS 230 - Deep Learning** + +⟶ CS 230 - 深層学習 + +
+ + +**3. [Overview, Architecture structure]** + +⟶ [概要、アーキテクチャ構造] + +
+ + +**4. [Types of layer, Convolution, Pooling, Fully connected]** + +⟶ [層のタイプ、畳み込み、プーリング、完全に接続された] + +


**5. [Filter hyperparameters, Dimensions, Stride, Padding]**

⟶ [フィルタハイパーパラメータ、寸法、ストライド、詰め物]

<br>


**6. [Tuning hyperparameters, Parameter compatibility, Model complexity, Receptive field]**

⟶

<br>
+ + +**7. [Activation functions, Rectified Linear Unit, Softmax]** + +⟶ [活性化関数、修正済み線形単位、ソフトマックス] + +
+ + +**8. [Object detection, Types of models, Detection, Intersection over Union, Non-max suppression, YOLO, R-CNN]** + +⟶ [オブジェクト検出、モデルのタイプ、検出、組合の上の交差点、非最大抑制、YOLO、R-CNN] + +
+ + +**9. [Face verification/recognition, One shot learning, Siamese network, Triplet loss]** + +⟶ + +
+ + +**10. [Neural style transfer, Activation, Style matrix, Style/content cost function]** + +⟶ + +
+ + +**11. [Computational trick architectures, Generative Adversarial Net, ResNet, Inception Network]** + +⟶ + +
+ + +**12. Overview** + +⟶ 概要 + +
+ + +**13. Architecture of a traditional CNN ― Convolutional neural networks, also known as CNNs, are a specific type of neural networks that are generally composed of the following layers:** + +⟶ + +
+ + +**14. The convolution layer and the pooling layer can be fine-tuned with respect to hyperparameters that are described in the next sections.** + +⟶ + +
+ + +**15. Types of layer** + +⟶ 層のタイプ + +
+ + +**16. Convolution layer (CONV) ― The convolution layer (CONV) uses filters that perform convolution operations as it is scanning the input I with respect to its dimensions. Its hyperparameters include the filter size F and stride S. The resulting output O is called feature map or activation map.** + +⟶ + +
+ + +**17. Remark: the convolution step can be generalized to the 1D and 3D cases as well.** + +⟶ + +
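
As an illustration of the CONV hyperparameters F (filter size) and S (stride) described above, a naive single-channel 2D convolution (a cross-correlation, as in most frameworks); the names and the toy filter are ours:

```python
import numpy as np

def conv2d(image, kernel, stride=1):
    """Slide an F x F filter over an I x I input with stride S;
    output has size O = (I - F) / S + 1 (no zero-padding)."""
    I, F = image.shape[0], kernel.shape[0]
    O = (I - F) // stride + 1
    out = np.zeros((O, O))
    for i in range(O):
        for j in range(O):
            patch = image[i * stride:i * stride + F, j * stride:j * stride + F]
            out[i, j] = np.sum(patch * kernel)
    return out

image = np.arange(16, dtype=float).reshape(4, 4)
edge = np.array([[1.0, -1.0], [1.0, -1.0]])   # a 2x2 vertical-edge filter
print(conv2d(image, edge, stride=1).shape)    # (3, 3)
```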
+ + +**18. Pooling (POOL) ― The pooling layer (POOL) is a downsampling operation, typically applied after a convolution layer, which does some spatial invariance. In particular, max and average pooling are special kinds of pooling where the maximum and average value is taken, respectively.** + +⟶ + +
+ + +**19. [Type, Purpose, Illustration, Comments]** + +⟶ + +
+ + +**20. [Max pooling, Average pooling, Each pooling operation selects the maximum value of the current view, Each pooling operation averages the values of the current view]** + +⟶ + +
+ + +**21. [Preserves detected features, Most commonly used, Downsamples feature map, Used in LeNet]** + +⟶ + +
+ + +**22. Fully Connected (FC) ― The fully connected layer (FC) operates on a flattened input where each input is connected to all neurons. If present, FC layers are usually found towards the end of CNN architectures and can be used to optimize objectives such as class scores.** + +⟶ + +
+ + +**23. Filter hyperparameters** + +⟶ フィルタハイパーパラメータ + +
+ + +**24. The convolution layer contains filters for which it is important to know the meaning behind its hyperparameters.** + +⟶ + +
+ + +**25. Dimensions of a filter ― A filter of size F×F applied to an input containing C channels is a F×F×C volume that performs convolutions on an input of size I×I×C and produces an output feature map (also called activation map) of size O×O×1.** + +⟶ + +
+ + +**26. Filter** + +⟶ Filter + +
+ + +**27. Remark: the application of K filters of size F×F results in an output feature map of size O×O×K.** + +⟶ + +
+ + +**28. Stride ― For a convolutional or a pooling operation, the stride S denotes the number of pixels by which the window moves after each operation.** + +⟶ + +
+ + +**29. Zero-padding ― Zero-padding denotes the process of adding P zeroes to each side of the boundaries of the input. This value can either be manually specified or automatically set through one of the three modes detailed below:** + +⟶ + +
+ + +**30. [Mode, Value, Illustration, Purpose, Valid, Same, Full]** + +⟶ [モード, 値, Illustration, 目的, 有効, 同様, Full] + +
+ + +**31. [No padding, Drops last convolution if dimensions do not match, Padding such that feature map size has size ⌈IS⌉, Output size is mathematically convenient, Also called 'half' padding, Maximum padding such that end convolutions are applied on the limits of the input, Filter 'sees' the input end-to-end]** + +⟶ + +
+ + +**32. Tuning hyperparameters** + +⟶ + +
+ + +**33. Parameter compatibility in convolution layer ― By noting I the length of the input volume size, F the length of the filter, P the amount of zero padding, S the stride, then the output size O of the feature map along that dimension is given by:** + +⟶ + +
+ + +**34. [Input, Filter, Output]** + +⟶ [入力, フィルタ, 出力] + +
+ + +**35. Remark: often times, Pstart=Pend≜P, in which case we can replace Pstart+Pend by 2P in the formula above.** + +⟶ + +
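
A quick helper for the output-size relation discussed in entries 33-35, assuming the usual form O = (I - F + Pstart + Pend) / S + 1; the 'valid' / 'same' examples are ours:

```python
def conv_output_size(I, F, S, P_start=0, P_end=0):
    """O = (I - F + P_start + P_end) / S + 1 along one spatial dimension."""
    return (I - F + P_start + P_end) // S + 1

# 'valid' (no padding) vs 'same' (output size preserved) for I=32, F=5, S=1:
print(conv_output_size(32, 5, 1))          # 28
print(conv_output_size(32, 5, 1, 2, 2))    # 32
```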
+ + +**36. Understanding the complexity of the model ― In order to assess the complexity of a model, it is often useful to determine the number of parameters that its architecture will have. In a given layer of a convolutional neural network, it is done as follows:** + +⟶ + +
+ + +**37. [Illustration, Input size, Output size, Number of parameters, Remarks]** + +⟶ + +


**38. [One bias parameter per filter, In most cases, S<F, A common choice for K is 2C]**

⟶

<br>


**39. [Pooling operation done channel-wise, In most cases, S=F]**

⟶

<br>
+ + +**40. [Input is flattened, One bias parameter per neuron, The number of FC neurons is free of structural constraints]** + +⟶ + +
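
The parameter counts implied by the table above, as small helper functions: one bias per filter for CONV, nothing learnable for POOL, and one bias per output neuron for FC; the numbers in the example are ours:

```python
def conv_params(F, C, K):
    return (F * F * C + 1) * K      # weights + one bias per filter

def pool_params():
    return 0                        # pooling has nothing to learn

def fc_params(n_in, n_out):
    return (n_in + 1) * n_out       # weights + one bias per output neuron

print(conv_params(F=3, C=3, K=16))  # 448
print(fc_params(4096, 1000))        # 4_097_000
```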
+ + +**41. Receptive field ― The receptive field at layer k is the area denoted Rk×Rk of the input that each pixel of the k-th activation map can 'see'. By calling Fj the filter size of layer j and Si the stride value of layer i and with the convention S0=1, the receptive field at layer k can be computed with the formula:** + +⟶ + +
+ + +**42. In the example below, we have F1=F2=3 and S1=S2=1, which gives R2=1+2⋅1+2⋅1=5.** + +⟶ + +
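
The receptive-field formula above, checked on the F1=F2=3, S1=S2=1 example from entry 42; the helper name is ours:

```python
def receptive_field(filter_sizes, strides):
    """R_k = 1 + sum_j (F_j - 1) * prod_{i<j} S_i, with S_0 = 1."""
    R, jump = 1, 1                  # jump = product of the strides seen so far
    for F, S in zip(filter_sizes, strides):
        R += (F - 1) * jump
        jump *= S
    return R

print(receptive_field([3, 3], [1, 1]))       # 5, as in the example above
print(receptive_field([3, 3, 3], [2, 2, 2])) # 1 + 2*1 + 2*2 + 2*4 = 15
```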
+ + +**43. Commonly used activation functions** + +⟶ + +
+ + +**44. Rectified Linear Unit ― The rectified linear unit layer (ReLU) is an activation function g that is used on all elements of the volume. It aims at introducing non-linearities to the network. Its variants are summarized in the table below:** + +⟶ + +
+ + +**45. [ReLU, Leaky ReLU, ELU, with]** + +⟶ + +
+ + +**46. [Non-linearity complexities biologically interpretable, Addresses dying ReLU issue for negative values, Differentiable everywhere]** + +⟶ + +
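
The three activation variants from the table, written out with common (but not mandatory) choices of 0.01 for the leaky slope and α=1 for ELU:

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0)

def leaky_relu(z, slope=0.01):
    return np.where(z > 0, z, slope * z)

def elu(z, alpha=1.0):
    return np.where(z > 0, z, alpha * (np.exp(z) - 1))

z = np.array([-2.0, -0.5, 0.0, 1.5])
print(relu(z), leaky_relu(z), elu(z), sep="\n")
```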
+ + +**47. Softmax ― The softmax step can be seen as a generalized logistic function that takes as input a vector of scores x∈Rn and outputs a vector of output probability p∈Rn through a softmax function at the end of the architecture. It is defined as follows:** + +⟶ + +
+ + +**48. where** + +⟶ + +
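
A minimal softmax sketch matching the definition above; subtracting max(x) is a standard numerical-stability trick and does not change the result:

```python
import numpy as np

def softmax(x):
    """p_i = exp(x_i) / sum_j exp(x_j)."""
    e = np.exp(x - np.max(x))
    return e / e.sum()

scores = np.array([2.0, 1.0, 0.1])
p = softmax(scores)
print(p, p.sum())   # probabilities summing to 1
```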
+ + +**49. Object detection** + +⟶ + +
+ + +**50. Types of models ― There are 3 main types of object recognition algorithms, for which the nature of what is predicted is different. They are described in the table below:** + +⟶ + +
+ + +**51. [Image classification, Classification w. localization, Detection]** + +⟶ + +
+ + +**52. [Teddy bear, Book]** + +⟶ + +
+ + +**53. [Classifies a picture, Predicts probability of object, Detects an object in a picture, Predicts probability of object and where it is located, Detects up to several objects in a picture, Predicts probabilities of objects and where they are located]** + +⟶ + +
+ + +**54. [Traditional CNN, Simplified YOLO, R-CNN, YOLO, R-CNN]** + +⟶ + +
+ + +**55. Detection ― In the context of object detection, different methods are used depending on whether we just want to locate the object or detect a more complex shape in the image. The two main ones are summed up in the table below:** + +⟶ + +
+ + +**56. [Bounding box detection, Landmark detection]** + +⟶ + +
+ + +**57. [Detects the part of the image where the object is located, Detects a shape or characteristics of an object (e.g. eyes), More granular]** + +⟶ + +
+ + +**58. [Box of center (bx,by), height bh and width bw, Reference points (l1x,l1y), ..., (lnx,lny)]** + +⟶ + +
+ + +**59. Intersection over Union ― Intersection over Union, also known as IoU, is a function that quantifies how correctly positioned a predicted bounding box Bp is over the actual bounding box Ba. It is defined as:** + +⟶ + +
+ + +**60. Remark: we always have IoU∈[0,1]. By convention, a predicted bounding box Bp is considered as being reasonably good if IoU(Bp,Ba)⩾0.5.** + +⟶ + +
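
A small IoU helper consistent with the definition and the 0.5 rule of thumb above; the corner-based (x1, y1, x2, y2) box format is our assumption (the cheatsheet uses center/width/height):

```python
def iou(box_a, box_b):
    """Intersection over Union of two axis-aligned boxes (x1, y1, x2, y2)."""
    xa, ya = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    xb, yb = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, xb - xa) * max(0.0, yb - ya)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

print(iou((0, 0, 2, 2), (1, 1, 3, 3)))   # 1/7, well below the 0.5 threshold
```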
+ + +**61. Anchor boxes ― Anchor boxing is a technique used to predict overlapping bounding boxes. In practice, the network is allowed to predict more than one box simultaneously, where each box prediction is constrained to have a given set of geometrical properties. For instance, the first prediction can potentially be a rectangular box of a given form, while the second will be another rectangular box of a different geometrical form.** + +⟶ + +
+ + +**62. Non-max suppression ― The non-max suppression technique aims at removing duplicate overlapping bounding boxes of a same object by selecting the most representative ones. After having removed all boxes having a probability prediction lower than 0.6, the following steps are repeated while there are boxes remaining:** + +⟶ + +
+ + +**63. [For a given class, Step 1: Pick the box with the largest prediction probability., Step 2: Discard any box having an IoU⩾0.5 with the previous box.]** + +⟶ + +
+ + +**64. [Box predictions, Box selection of maximum probability, Overlap removal of same class, Final bounding boxes]** + +⟶ + +
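
A sketch of the two non-max suppression steps listed above for a single class, using the 0.6 probability cut-off and 0.5 IoU threshold quoted in the text; the box format and names are our assumptions:

```python
def iou(a, b):
    xa, ya = max(a[0], b[0]), max(a[1], b[1])
    xb, yb = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, xb - xa) * max(0.0, yb - ya)
    union = (a[2]-a[0])*(a[3]-a[1]) + (b[2]-b[0])*(b[3]-b[1]) - inter
    return inter / union

def non_max_suppression(boxes, probs, p_min=0.6, iou_max=0.5):
    """boxes: list of (x1, y1, x2, y2); probs: predicted probability per box."""
    remaining = [(p, b) for p, b in zip(probs, boxes) if p >= p_min]
    remaining.sort(key=lambda x: x[0], reverse=True)   # most confident first
    kept = []
    while remaining:
        p, best = remaining.pop(0)                     # Step 1: largest probability
        kept.append(best)
        remaining = [(q, b) for q, b in remaining
                     if iou(best, b) < iou_max]        # Step 2: drop overlapping boxes
    return kept

boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]
print(non_max_suppression(boxes, [0.9, 0.8, 0.7]))     # keeps 2 of the 3 boxes
```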
+ + +**65. YOLO ― You Only Look Once (YOLO) is an object detection algorithm that performs the following steps:** + +⟶ + +
+ + +**66. [Step 1: Divide the input image into a G×G grid., Step 2: For each grid cell, run a CNN that predicts y of the following form:, repeated k times]** + +⟶ + +
+ + +**67. where pc is the probability of detecting an object, bx,by,bh,bw are the properties of the detected bouding box, c1,...,cp is a one-hot representation of which of the p classes were detected, and k is the number of anchor boxes.** + +⟶ + +
+ + +**68. Step 3: Run the non-max suppression algorithm to remove any potential duplicate overlapping bounding boxes.** + +⟶ + +
+ + +**69. [Original image, Division in GxG grid, Bounding box prediction, Non-max suppression]** + +⟶ + +
+ + +**70. Remark: when pc=0, then the network does not detect any object. In that case, the corresponding predictions bx,...,cp have to be ignored.** + +⟶ + +
+ + +**71. R-CNN ― Region with Convolutional Neural Networks (R-CNN) is an object detection algorithm that first segments the image to find potential relevant bounding boxes and then run the detection algorithm to find most probable objects in those bounding boxes.** + +⟶ + +
+ + +**72. [Original image, Segmentation, Bounding box prediction, Non-max suppression]** + +⟶ + +
+ + +**73. Remark: although the original algorithm is computationally expensive and slow, newer architectures enabled the algorithm to run faster, such as Fast R-CNN and Faster R-CNN.** + +⟶ + +
+ + +**74. Face verification and recognition** + +⟶ + +
+ + +**75. Types of models ― Two main types of model are summed up in table below:** + +⟶ + +
+ + +**76. [Face verification, Face recognition, Query, Reference, Database]** + +⟶ + +
+ + +**77. [Is this the correct person?, One-to-one lookup, Is this one of the K persons in the database?, One-to-many lookup]** + +⟶ + +
+ + +**78. One Shot Learning ― One Shot Learning is a face verification algorithm that uses a limited training set to learn a similarity function that quantifies how different two given images are. The similarity function applied to two images is often noted d(image 1,image 2).** + +⟶ + +
+ + +**79. Siamese Network ― Siamese Networks aim at learning how to encode images to then quantify how different two images are. For a given input image x(i), the encoded output is often noted as f(x(i)).** + +⟶ + +
+ + +**80. Triplet loss ― The triplet loss ℓ is a loss function computed on the embedding representation of a triplet of images A (anchor), P (positive) and N (negative). The anchor and the positive example belong to a same class, while the negative example to another one. By calling α∈R+ the margin parameter, this loss is defined as follows:** + +⟶ + +
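
The triplet loss in its usual hinge form, max(d(A,P) - d(A,N) + α, 0); the squared Euclidean distance and the toy embeddings are our own choices:

```python
import numpy as np

def triplet_loss(f_anchor, f_positive, f_negative, alpha=0.2):
    """max(||f(A)-f(P)||^2 - ||f(A)-f(N)||^2 + alpha, 0)."""
    d_pos = np.sum((f_anchor - f_positive) ** 2)
    d_neg = np.sum((f_anchor - f_negative) ** 2)
    return max(d_pos - d_neg + alpha, 0.0)

a = np.array([0.1, 0.9])
p = np.array([0.2, 0.8])   # same identity: should be close to the anchor
n = np.array([0.9, 0.1])   # different identity: should be far from the anchor
print(triplet_loss(a, p, n))   # 0.0, the margin is already satisfied
```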
+ + +**81. Neural style transfer** + +⟶ + +
+ + +**82. Motivation ― The goal of neural style transfer is to generate an image G based on a given content C and a given style S.** + +⟶ + +
+ + +**83. [Content C, Style S, Generated image G]** + +⟶ + +
+ + +**84. Activation ― In a given layer l, the activation is noted a[l] and is of dimensions nH×nw×nc** + +⟶ + +
+ + +**85. Content cost function ― The content cost function Jcontent(C,G) is used to determine how the generated image G differs from the original content image C. It is defined as follows:** + +⟶ + +
+ + +**86. Style matrix ― The style matrix G[l] of a given layer l is a Gram matrix where each of its elements G[l]kk′ quantifies how correlated the channels k and k′ are. It is defined with respect to activations a[l] as follows:** + +⟶ + +
+ + +**87. Remark: the style matrix for the style image and the generated image are noted G[l] (S) and G[l] (G) respectively.** + +⟶ + +
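
A NumPy sketch of the style (Gram) matrix G[l] described above, taking an activation of shape nH×nW×nC; the shapes and names are assumed:

```python
import numpy as np

def style_matrix(activation):
    """activation: (n_H, n_W, n_C) -> Gram matrix of shape (n_C, n_C),
    where G[k, k'] sums a[i, j, k] * a[i, j, k'] over all positions (i, j)."""
    n_H, n_W, n_C = activation.shape
    flat = activation.reshape(n_H * n_W, n_C)   # one row per spatial position
    return flat.T @ flat

rng = np.random.default_rng(0)
a_l = rng.standard_normal((4, 4, 3))
G = style_matrix(a_l)
print(G.shape, np.allclose(G, G.T))   # (3, 3) and symmetric
```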
+ + +**88. Style cost function ― The style cost function Jstyle(S,G) is used to determine how the generated image G differs from the style S. It is defined as follows:** + +⟶ + +
+ + +**89. Overall cost function ― The overall cost function is defined as being a combination of the content and style cost functions, weighted by parameters α,β, as follows:** + +⟶ + +
+ + +**90. Remark: a higher value of α will make the model care more about the content while a higher value of β will make it care more about the style.** + +⟶ + +
+ + +**91. Architectures using computational tricks** + +⟶ + +
+ + +**92. Generative Adversarial Network ― Generative adversarial networks, also known as GANs, are composed of a generative and a discriminative model, where the generative model aims at generating the most truthful output that will be fed into the discriminative which aims at differentiating the generated and true image.** + +⟶ + +
+ + +**93. [Training, Noise, Real-world image, Generator, Discriminator, Real Fake]** + +⟶ + +
+ + +**94. Remark: use cases using variants of GANs include text to image, music generation and synthesis.** + +⟶ + +
+ + +**95. ResNet ― The Residual Network architecture (also called ResNet) uses residual blocks with a high number of layers meant to decrease the training error. The residual block has the following characterizing equation:** + +⟶ + +
+ + +**96. Inception Network ― This architecture uses inception modules and aims at giving a try at different convolutions in order to increase its performance through features diversification. In particular, it uses the 1×1 convolution trick to limit the computational burden.** + +⟶ + +
+ + +**97. The Deep Learning cheatsheets are now available in [target language].** + +⟶ + +
+ + +**98. Original authors** + +⟶ + +
+ + +**99. Translated by X, Y and Z** + +⟶ + +
+ + +**100. Reviewed by X, Y and Z** + +⟶ + +
+ + +**101. View PDF version on GitHub** + +⟶ + +
+ + +**102. By X and Y** + +⟶ + +
From 906bdd85ac4a0f53ec0b07771e1ccca9da017ad2 Mon Sep 17 00:00:00 2001 From: tt-anh-eole Date: Tue, 28 May 2019 18:19:10 +0900 Subject: [PATCH 183/531] [ja] Convolutional Neural Networks --- ja/convolutional-neural-networks.md | 18 +++++++++--------- 1 file changed, 9 insertions(+), 9 deletions(-) diff --git a/ja/convolutional-neural-networks.md b/ja/convolutional-neural-networks.md index 5f13a2577..71958fddd 100644 --- a/ja/convolutional-neural-networks.md +++ b/ja/convolutional-neural-networks.md @@ -179,7 +179,7 @@ **26. Filter** -⟶ Filter +⟶ フィルタ
@@ -207,7 +207,7 @@ **30. [Mode, Value, Illustration, Purpose, Valid, Same, Full]** -⟶ [モード, 値, Illustration, 目的, 有効, 同様, Full] +⟶ [モード, 値, 図, 目的, 有効, 同様, フル]
@@ -221,7 +221,7 @@ **32. Tuning hyperparameters** -⟶ +⟶ 調律ハイパーパラメータ
@@ -256,7 +256,7 @@ **37. [Illustration, Input size, Output size, Number of parameters, Remarks]** -⟶ +⟶ [図、入力サイズ、出力サイズ、引数の数、備考]
@@ -648,7 +648,7 @@ **93. [Training, Noise, Real-world image, Generator, Discriminator, Real Fake]** -⟶ +⟶ [トレーニング、騒音、現実世界の画像、ジェネレータ、弁別器、偽のリアル]
@@ -683,21 +683,21 @@ **98. Original authors** -⟶ +⟶ 原著者
**99. Translated by X, Y and Z** -⟶ +⟶ X、Y、Zによる翻訳された
**100. Reviewed by X, Y and Z** -⟶ +⟶ X、Y、Zによるレビューされた
@@ -711,6 +711,6 @@ **102. By X and Y** -⟶ +⟶ X、yによる
From 39d490efa848f4e207c9e6c3cdf8c77e8cc89c57 Mon Sep 17 00:00:00 2001 From: tuananhhedspibk Date: Tue, 28 May 2019 23:40:34 +0900 Subject: [PATCH 184/531] Update japanese translating --- ja/convolutional-neural-networks.md | 32 ++++++++++++++--------------- 1 file changed, 16 insertions(+), 16 deletions(-) diff --git a/ja/convolutional-neural-networks.md b/ja/convolutional-neural-networks.md index 71958fddd..2d32c799b 100644 --- a/ja/convolutional-neural-networks.md +++ b/ja/convolutional-neural-networks.md @@ -39,7 +39,7 @@ **6. [Tuning hyperparameters, Parameter compatibility, Model complexity, Receptive field]** -⟶ +⟶ [調律ハイパーパラメータ、パラメータの互換性、モデルの複雑、受容的なフィールド]
@@ -60,14 +60,14 @@ **9. [Face verification/recognition, One shot learning, Siamese network, Triplet loss]** -⟶ +⟶ [顔認証/認識、一発学習、シャムネットワーク、トリプレット損失]
**10. [Neural style transfer, Activation, Style matrix, Style/content cost function]** -⟶ +⟶ [神経スタイル転送、活性化、スタイル行列、スタイル/コンテンツコスト関数]
@@ -88,14 +88,14 @@ **13. Architecture of a traditional CNN ― Convolutional neural networks, also known as CNNs, are a specific type of neural networks that are generally composed of the following layers:** -⟶ +⟶ 伝統的な畳み込み神経の網のアーキテクチャ - CNNとも知られる畳み込み神経の網は一般的に次の層で構成されている特定タイプの神経の網です。
**14. The convolution layer and the pooling layer can be fine-tuned with respect to hyperparameters that are described in the next sections.** -⟶ +⟶ 畳み込み層とプール層は次のセクションで説明されるハイパーパラメータに関して微調整されられる。
@@ -116,7 +116,7 @@ **17. Remark: the convolution step can be generalized to the 1D and 3D cases as well.** -⟶ +⟶ 注意: 畳み込みステップは1D及び3Dの場合にも一般化されられる。
@@ -130,7 +130,7 @@ **19. [Type, Purpose, Illustration, Comments]** -⟶ +⟶ [タイプ、目的、図、コメント]
@@ -165,7 +165,7 @@ **24. The convolution layer contains filters for which it is important to know the meaning behind its hyperparameters.** -⟶ +⟶
@@ -207,7 +207,7 @@ **30. [Mode, Value, Illustration, Purpose, Valid, Same, Full]** -⟶ [モード, 値, 図, 目的, 有効, 同様, フル] +⟶ [モード、値、図、目的、有効、同様、フル]
@@ -235,7 +235,7 @@ **34. [Input, Filter, Output]** -⟶ [入力, フィルタ, 出力] +⟶ [入力、フィルタ、出力]
@@ -333,14 +333,14 @@ **48. where** -⟶ +⟶ どこ
**49. Object detection** -⟶ +⟶ オブジェクト検出
@@ -375,7 +375,7 @@ **54. [Traditional CNN, Simplified YOLO, R-CNN, YOLO, R-CNN]** -⟶ +⟶ [伝統的なCNN、単純されたYOLO、R-CNN、YOLO、R-CNN]
@@ -515,7 +515,7 @@ **74. Face verification and recognition** -⟶ +⟶ 顔認証及び認識
@@ -529,7 +529,7 @@ **76. [Face verification, Face recognition, Query, Reference, Database]** -⟶ +⟶ [顔認証、顔認識、クエリ、参照、データベース]
@@ -711,6 +711,6 @@ **102. By X and Y** -⟶ X、yによる +⟶ X、Yによる
From 262760fb846cd3964d3dc294cedae7d64b4044f7 Mon Sep 17 00:00:00 2001 From: tt-anh-eole Date: Wed, 29 May 2019 12:05:33 +0900 Subject: [PATCH 185/531] [ja] Convolutional Neural Networks --- ja/convolutional-neural-networks.md | 16 ++++++++-------- 1 file changed, 8 insertions(+), 8 deletions(-) diff --git a/ja/convolutional-neural-networks.md b/ja/convolutional-neural-networks.md index 2d32c799b..0f06518d1 100644 --- a/ja/convolutional-neural-networks.md +++ b/ja/convolutional-neural-networks.md @@ -74,7 +74,7 @@ **11. [Computational trick architectures, Generative Adversarial Net, ResNet, Inception Network]** -⟶ +⟶ [計算詭計アーキテクチャ、生成型敵対的ネットワーク、ResNet、インセプションネットワーク]
@@ -524,7 +524,7 @@ ⟶ -
+
モデルのタイプ - 主な二つのモデルは次の表で要約される: **76. [Face verification, Face recognition, Query, Reference, Database]** @@ -564,7 +564,7 @@ **81. Neural style transfer** -⟶ +⟶ 神経のスタイル転送
@@ -578,7 +578,7 @@ **83. [Content C, Style S, Generated image G]** -⟶ +⟶ [コンテンツC、スタイルS、生成された画像G]
@@ -634,7 +634,7 @@ **91. Architectures using computational tricks** -⟶ +⟶ アーキテクチャは計算の詭計を利用している。
@@ -655,7 +655,7 @@ **94. Remark: use cases using variants of GANs include text to image, music generation and synthesis.** -⟶ +⟶ 注意: GANsの変種を使用するユースケースには画像へのテキスト、音楽生成及び合成があります。
@@ -678,7 +678,7 @@ ⟶ -
+
深層学習チートシートは今[ターゲット言語]で利用可能です。 **98. Original authors** @@ -706,7 +706,7 @@ ⟶ -
+
GithubでPDFバージョン見る **102. By X and Y** From 9ec3c962d27e3c4fe2c3cce85529beb11006a007 Mon Sep 17 00:00:00 2001 From: tt-anh-eole Date: Wed, 29 May 2019 16:39:07 +0900 Subject: [PATCH 186/531] [ja] Convolutional Neural Networks --- ja/convolutional-neural-networks.md | 62 ++++++++++++++--------------- 1 file changed, 31 insertions(+), 31 deletions(-) diff --git a/ja/convolutional-neural-networks.md b/ja/convolutional-neural-networks.md index 0f06518d1..0e9d0794f 100644 --- a/ja/convolutional-neural-networks.md +++ b/ja/convolutional-neural-networks.md @@ -18,14 +18,14 @@ **3. [Overview, Architecture structure]** -⟶ [概要、アーキテクチャ構造] +⟶ [概要, アーキテクチャ構造]
**4. [Types of layer, Convolution, Pooling, Fully connected]** -⟶ [層のタイプ、畳み込み、プーリング、完全に接続された] +⟶ [層のタイプ, 畳み込み, プーリング, 完全に接続された]
@@ -34,47 +34,47 @@ ⟶ -
[フィルタハイパーパラメータ、寸法、ストライド、詰め物] +
[フィルタハイパーパラメータ, 寸法, ストライド, 詰め物] **6. [Tuning hyperparameters, Parameter compatibility, Model complexity, Receptive field]** -⟶ [調律ハイパーパラメータ、パラメータの互換性、モデルの複雑、受容的なフィールド] +⟶ [調律ハイパーパラメータ, パラメータの互換性, モデルの複雑, 受容的なフィールド]
**7. [Activation functions, Rectified Linear Unit, Softmax]** -⟶ [活性化関数、修正済み線形単位、ソフトマックス] +⟶ [活性化関数, 修正済み線形単位, ソフトマックス]
**8. [Object detection, Types of models, Detection, Intersection over Union, Non-max suppression, YOLO, R-CNN]** -⟶ [オブジェクト検出、モデルのタイプ、検出、組合の上の交差点、非最大抑制、YOLO、R-CNN] +⟶ [オブジェクト検出, モデルのタイプ, 検出, 組合の上の交差点, 非最大抑制, YOLO, R-CNN]
**9. [Face verification/recognition, One shot learning, Siamese network, Triplet loss]** -⟶ [顔認証/認識、一発学習、シャムネットワーク、トリプレット損失] +⟶ [顔認証/認識, 一発学習, シャムネットワーク, トリプレット損失]
**10. [Neural style transfer, Activation, Style matrix, Style/content cost function]** -⟶ [神経スタイル転送、活性化、スタイル行列、スタイル/コンテンツコスト関数] +⟶ [神経スタイル転送, 活性化, スタイル行列, スタイル/コンテンツコスト関数]
**11. [Computational trick architectures, Generative Adversarial Net, ResNet, Inception Network]** -⟶ [計算詭計アーキテクチャ、生成型敵対的ネットワーク、ResNet、インセプションネットワーク] +⟶ [計算詭計アーキテクチャ, 生成型敵対的ネットワーク, ResNet, インセプションネットワーク]
@@ -109,7 +109,7 @@ **16. Convolution layer (CONV) ― The convolution layer (CONV) uses filters that perform convolution operations as it is scanning the input I with respect to its dimensions. Its hyperparameters include the filter size F and stride S. The resulting output O is called feature map or activation map.** -⟶ +⟶ 畳み込み層 (CONV) - 畳み込み層 (CONV)は入力Iを寸法に関して走査している時畳み込みオペレーションズを行うフィルタを使用する。畳み込み層のハイパーパラメータにはフィルタサイズFとストライドSが含まれる。結果出力0は特徴図及び活性化図で呼ばれる。
@@ -123,21 +123,21 @@ **18. Pooling (POOL) ― The pooling layer (POOL) is a downsampling operation, typically applied after a convolution layer, which does some spatial invariance. In particular, max and average pooling are special kinds of pooling where the maximum and average value is taken, respectively.** -⟶ +⟶ プーリング (POOL) - プール層 (POOL)はダウンサンプリング操作で、通常は空間的に不変な畳み込み層の後に適用される。特に、最大及び平均プーリングはそれぞれ最大と平均値が取られる特別な種類のプールです。
**19. [Type, Purpose, Illustration, Comments]** -⟶ [タイプ、目的、図、コメント] +⟶ [タイプ, 目的, 図, コメント]
**20. [Max pooling, Average pooling, Each pooling operation selects the maximum value of the current view, Each pooling operation averages the values of the current view]** -⟶ +⟶ [最大プール, 平均プール, 各プール操作は現在ビューの最大値を選ぶ, 各プール操作は現在ビューの値を平均する]
@@ -207,7 +207,7 @@ **30. [Mode, Value, Illustration, Purpose, Valid, Same, Full]** -⟶ [モード、値、図、目的、有効、同様、フル] +⟶ [モード, 値, 図, 目的, 有効, 同様, フル]
@@ -235,7 +235,7 @@ **34. [Input, Filter, Output]** -⟶ [入力、フィルタ、出力] +⟶ [入力, フィルタ, 出力]
@@ -256,7 +256,7 @@ **37. [Illustration, Input size, Output size, Number of parameters, Remarks]** -⟶ [図、入力サイズ、出力サイズ、引数の数、備考] +⟶ [図, 入力サイズ, 出力サイズ, 引数の数, 備考]
@@ -298,7 +298,7 @@ **43. Commonly used activation functions** -⟶ +⟶ よく使われる活性化関数
@@ -312,7 +312,7 @@ **45. [ReLU, Leaky ReLU, ELU, with]** -⟶ +⟶[ReLU, Leaky ReLU, ELU, with]
@@ -354,14 +354,14 @@ **51. [Image classification, Classification w. localization, Detection]** -⟶ +⟶ [画像分類, 分類 w. ]
**52. [Teddy bear, Book]** -⟶ +⟶ [テディ熊, 本]
@@ -375,7 +375,7 @@ **54. [Traditional CNN, Simplified YOLO, R-CNN, YOLO, R-CNN]** -⟶ [伝統的なCNN、単純されたYOLO、R-CNN、YOLO、R-CNN] +⟶ [伝統的なCNN, 単純されたYOLO, R-CNN, YOLO, R-CNN]
@@ -389,7 +389,7 @@ **56. [Bounding box detection, Landmark detection]** -⟶ +⟶ [物体検出, ランドマーク検出]
@@ -452,7 +452,7 @@ **65. YOLO ― You Only Look Once (YOLO) is an object detection algorithm that performs the following steps:** -⟶ +⟶ YOLO - 貴方は一度だけ見る (YOLO)は次のステップを実行するオブジェクト検出アルゴリズムです。
@@ -501,7 +501,7 @@ **72. [Original image, Segmentation, Bounding box prediction, Non-max suppression]** -⟶ +⟶ [原画像, セグメンテーション, 物体予測, 非最大抑制]
@@ -529,7 +529,7 @@ **76. [Face verification, Face recognition, Query, Reference, Database]** -⟶ [顔認証、顔認識、クエリ、参照、データベース] +⟶ [顔認証, 顔認識, クエリ, 参照, データベース]
@@ -578,7 +578,7 @@ **83. [Content C, Style S, Generated image G]** -⟶ [コンテンツC、スタイルS、生成された画像G] +⟶ [コンテンツC, スタイルS, 生成された画像G]
@@ -648,14 +648,14 @@ **93. [Training, Noise, Real-world image, Generator, Discriminator, Real Fake]** -⟶ [トレーニング、騒音、現実世界の画像、ジェネレータ、弁別器、偽のリアル] +⟶ [トレーニング, 騒音, 現実世界の画像, ジェネレータ, 弁別器, 偽のリアル]
**94. Remark: use cases using variants of GANs include text to image, music generation and synthesis.** -⟶ 注意: GANsの変種を使用するユースケースには画像へのテキスト、音楽生成及び合成があります。 +⟶ 注意: GANsの変種を使用するユースケースには画像へのテキスト, 音楽生成及び合成があります。
@@ -690,14 +690,14 @@ **99. Translated by X, Y and Z** -⟶ X、Y、Zによる翻訳された +⟶ X, Y, Zによる翻訳された
**100. Reviewed by X, Y and Z** -⟶ X、Y、Zによるレビューされた +⟶ X, Y, Zによるレビューされた
@@ -711,6 +711,6 @@ **102. By X and Y** -⟶ X、Yによる +⟶ X, Yによる
From 05014b63bddf20ff2047087c37baee821ec0aec6 Mon Sep 17 00:00:00 2001 From: Shervine Amidi Date: Wed, 29 May 2019 00:52:15 -0700 Subject: [PATCH 187/531] Add new Japanese entries --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index 7d8cd9acd..70a17c0b6 100644 --- a/README.md +++ b/README.md @@ -37,7 +37,7 @@ Please make sure to propose the translation of **only one** cheatsheet per pull ## Progression for CS 230 (Deep Learning) |Cheatsheet topic|Español|فارسی|Français|日本語|Português|中文| |:---|:---:|:---:|:---:|:---:|:---:|:---:| -|Convolutional Neural Nets|not started|done|done|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/128)|not started| +|Convolutional Neural Nets|not started|done|done|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/145)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/128)|not started| |Recurrent Neural Nets|not started|done|done|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/146)|not started|not started| |DL tips and tricks|not started|done|done|not started|not started|not started| From d2f4b9c48dc475aa7fd05e92fe1e15792838c442 Mon Sep 17 00:00:00 2001 From: tt-anh-eole Date: Wed, 29 May 2019 18:29:27 +0900 Subject: [PATCH 188/531] [ja] Convolutional Neural Networks --- ja/convolutional-neural-networks.md | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/ja/convolutional-neural-networks.md b/ja/convolutional-neural-networks.md index 0e9d0794f..35644f122 100644 --- a/ja/convolutional-neural-networks.md +++ b/ja/convolutional-neural-networks.md @@ -25,7 +25,7 @@ **4. [Types of layer, Convolution, Pooling, Fully connected]** -⟶ [層のタイプ, 畳み込み, プーリング, 完全に接続された] +⟶ [層のタイプ, 畳み込み, プーリング, 完全接続]
@@ -144,14 +144,14 @@ **21. [Preserves detected features, Most commonly used, Downsamples feature map, Used in LeNet]** -⟶ +⟶ [検出された特徴保持, 最も一般的に利用される, ダウンサンプル特徴図, LeNetで利用される]
**22. Fully Connected (FC) ― The fully connected layer (FC) operates on a flattened input where each input is connected to all neurons. If present, FC layers are usually found towards the end of CNN architectures and can be used to optimize objectives such as class scores.** -⟶ +⟶ 完全接続 (FC) - 完全接続層は各入力は全ての神経に接続されているフラット化入力で動く。
@@ -368,7 +368,7 @@ **53. [Classifies a picture, Predicts probability of object, Detects an object in a picture, Predicts probability of object and where it is located, Detects up to several objects in a picture, Predicts probabilities of objects and where they are located]** -⟶ +⟶ [画像分類, オブジェクトの確率予測, 画像内のオブジェクト検出, オブジェクトの確率と所在地予測, 画像内の複数オブジェクト検出, 複数オブジェクトの確率と所在地予測]
From a8048e8d95c2b4667a9fa1d1e760d3e1e638a27a Mon Sep 17 00:00:00 2001 From: Shervine Amidi Date: Wed, 29 May 2019 18:40:22 -0700 Subject: [PATCH 189/531] Update links for Bahasa Indonesia --- README.md | 12 +++++++++--- 1 file changed, 9 insertions(+), 3 deletions(-) diff --git a/README.md b/README.md index 70a17c0b6..b1474508c 100644 --- a/README.md +++ b/README.md @@ -53,6 +53,12 @@ Please make sure to propose the translation of **only one** cheatsheet per pull |Recurrent Neural Nets|not started|not started|not started|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/107)| |DL tips and tricks|not started|not started|not started|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/108)| +|Cheatsheet topic|Bahasa Indonesia| +:---|:---:| +|Convolutional Neural Nets|not started| +|Recurrent Neural Nets|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/152)| +|DL tips and tricks|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/153)| + ## Progression for CS 229 (Machine Learning) |Cheatsheet topic|Español|فارسی|Français|日本語|Português|中文| |:---|:---:|:---:|:---:|:---:|:---:|:---:| @@ -85,12 +91,12 @@ Please make sure to propose the translation of **only one** cheatsheet per pull |Cheatsheet topic|Magyar|Deutsch|Bahasa Indonesia| |:---|:---:|:---:|:---:| -|Deep learning|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/124/files#diff-5e2ba65ef08acd57024e82d0ae94b923)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/106)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/143)| +|Deep learning|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/124/files#diff-5e2ba65ef08acd57024e82d0ae94b923)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/106)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/154)| |Supervised learning|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/124/files#diff-5e2ba65ef08acd57024e82d0ae94b923)|not started|not started| |Unsupervised learning|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/124/files#diff-5e2ba65ef08acd57024e82d0ae94b923)|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/139)| |ML tips and tricks|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/124/files#diff-5e2ba65ef08acd57024e82d0ae94b923)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/135)|not started| -|Probabilities and Statistics|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/124/files#diff-5e2ba65ef08acd57024e82d0ae94b923)|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/143)| -|Linear algebra|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/124/files#diff-5e2ba65ef08acd57024e82d0ae94b923)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/136)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/143)| +|Probabilities and Statistics|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/124/files#diff-5e2ba65ef08acd57024e82d0ae94b923)|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/151)| +|Linear algebra|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/124/files#diff-5e2ba65ef08acd57024e82d0ae94b923)|[in 
progress](https://github.com/shervinea/cheatsheet-translation/pull/136)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/150)| ## Acknowledgements From 10e7f29e91b78652d2982769211dc836c45fd520 Mon Sep 17 00:00:00 2001 From: tt-anh-eole Date: Thu, 30 May 2019 16:45:36 +0900 Subject: [PATCH 190/531] [ja] Convolutional Neural Networks --- ja/convolutional-neural-networks.md | 14 +++++++------- 1 file changed, 7 insertions(+), 7 deletions(-) diff --git a/ja/convolutional-neural-networks.md b/ja/convolutional-neural-networks.md index 35644f122..919efe74b 100644 --- a/ja/convolutional-neural-networks.md +++ b/ja/convolutional-neural-networks.md @@ -151,7 +151,7 @@ **22. Fully Connected (FC) ― The fully connected layer (FC) operates on a flattened input where each input is connected to all neurons. If present, FC layers are usually found towards the end of CNN architectures and can be used to optimize objectives such as class scores.** -⟶ 完全接続 (FC) - 完全接続層は各入力は全ての神経に接続されているフラット化入力で動く。 +⟶ 完全接続 (FC) - 完全接続層は各入力は全ての神経に接続されているフラット化入力で動く。存在する場合、FC層は通常CNNアーキテクチャの終わりに向かって見られ、クラススコアなどの目的を最適化するため利用される。
@@ -263,7 +263,7 @@ **38. [One bias parameter per filter, In most cases, S @@ -349,7 +349,7 @@ ⟶ -
+
モデルの種類 - 物体認識アルゴリズムは主に三つのタイプがあり、予測されるものの性質は異なります。次の表で説明される。 **51. [Image classification, Classification w. localization, Detection]** @@ -382,7 +382,7 @@ **55. Detection ― In the context of object detection, different methods are used depending on whether we just want to locate the object or detect a more complex shape in the image. The two main ones are summed up in the table below:** -⟶ +⟶ 検出 - 物体検出の文脈では、画像内で物体を特定するのかそれとも複雑な形状を検出するのかによって、様々な方法は使用される。二つの主なものは次の表でまとめられる。
@@ -480,7 +480,7 @@ **69. [Original image, Division in GxG grid, Bounding box prediction, Non-max suppression]** -⟶ +⟶ [原画像, GxGグリッドでの分割, 物体検出, 非最大抑制]
@@ -508,8 +508,8 @@ **73. Remark: although the original algorithm is computationally expensive and slow, newer architectures enabled the algorithm to run faster, such as Fast R-CNN and Faster R-CNN.** -⟶ - +⟶ 注意: 原アルゴリズムは計算コストが高くて遅くても、より新たなアーキテクチャでは、 +Fast R-CNNやFaster R-CNNなど、アルゴリズムをより高い速度に実行できる。
From 68cb3af84d6fb83c8814ff92d92c79d9705769b3 Mon Sep 17 00:00:00 2001 From: tuananhhedspibk Date: Thu, 30 May 2019 23:09:20 +0900 Subject: [PATCH 191/531] [ja] Convolutional Neural Network --- ja/convolutional-neural-networks.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/ja/convolutional-neural-networks.md b/ja/convolutional-neural-networks.md index 919efe74b..27c9d3f23 100644 --- a/ja/convolutional-neural-networks.md +++ b/ja/convolutional-neural-networks.md @@ -550,7 +550,7 @@ Fast R-CNNやFaster R-CNNなど、アルゴリズムをより高い速度に実 **79. Siamese Network ― Siamese Networks aim at learning how to encode images to then quantify how different two images are. For a given input image x(i), the encoded output is often noted as f(x(i)).** -⟶ +⟶ シャムネットワー - シャムネットワーは2つの画像の違いを定量化して、画像暗号化方法を学ぶことを目的としている。与えられたインプット画像x(i)に対して暗号化された出力はしばしばf(x(i))と表示される。
@@ -573,7 +573,7 @@ Fast R-CNNやFaster R-CNNなど、アルゴリズムをより高い速度に実 ⟶ -
+
モチベーション - 神経のスタイル転送の目的は与えられたコンテンツCとスタイルSに基づく画像Gを生成する。 **83. [Content C, Style S, Generated image G]** @@ -587,7 +587,7 @@ Fast R-CNNやFaster R-CNNなど、アルゴリズムをより高い速度に実 ⟶ -
+
活性化 - 与えられた層Lで、活性化はa[l]と表示されて、nH×nw×ncの寸法。 **85. Content cost function ― The content cost function Jcontent(C,G) is used to determine how the generated image G differs from the original content image C. It is defined as follows:** From a2916e74acdcbebfc65791822eb07368a2a5be4a Mon Sep 17 00:00:00 2001 From: tt-anh-eole Date: Fri, 31 May 2019 16:44:20 +0900 Subject: [PATCH 192/531] [ja] Convolutional Neural Networks --- ja/convolutional-neural-networks.md | 20 ++++++++++---------- 1 file changed, 10 insertions(+), 10 deletions(-) diff --git a/ja/convolutional-neural-networks.md b/ja/convolutional-neural-networks.md index 27c9d3f23..fbcab4103 100644 --- a/ja/convolutional-neural-networks.md +++ b/ja/convolutional-neural-networks.md @@ -165,7 +165,7 @@ **24. The convolution layer contains filters for which it is important to know the meaning behind its hyperparameters.** -⟶ +⟶ 畳み込み層にはハイパーパラメータの背後にある意味を知ることが重要なフィルタが含まれる。
@@ -193,7 +193,7 @@ **28. Stride ― For a convolutional or a pooling operation, the stride S denotes the number of pixels by which the window moves after each operation.** -⟶ +⟶ ストライド - 畳み込みまたはプール操作に対して、ストライドSはそれぞれの操作の後にウィンドウに移動されるピクセル数を表示する。
@@ -242,7 +242,7 @@ **35. Remark: often times, Pstart=Pend≜P, in which case we can replace Pstart+Pend by 2P in the formula above.** -⟶ +⟶ 注意: しばしば、Pstart=Pend≜P、その場合、上記の式のようにPstart+Pendを2Pに置き換える事ができる。
@@ -270,7 +270,7 @@ **39. [Pooling operation done channel-wise, In most cases, S=F]** -⟶ +⟶ [プール操作はチャネルごとに行われる, ほとんどの場合, S=F]
@@ -291,7 +291,7 @@ **42. In the example below, we have F1=F2=3 and S1=S2=1, which gives R2=1+2⋅1+2⋅1=5.** -⟶ +⟶ 下記の例で、F1=F2=3、S1=S2=1となるのでR2=1+2⋅1+2⋅1=5となる。
@@ -319,14 +319,14 @@ **46. [Non-linearity complexities biologically interpretable, Addresses dying ReLU issue for negative values, Differentiable everywhere]** -⟶ +⟶
**47. Softmax ― The softmax step can be seen as a generalized logistic function that takes as input a vector of scores x∈Rn and outputs a vector of output probability p∈Rn through a softmax function at the end of the architecture. It is defined as follows:** -⟶ +⟶ ソフトマックス - ソフトマックスステップは入力としてx∈Rnスコアのベクターを取り、アーキテクチャの最後にソフトマックス関数を通じてp∈Rn出力確率のベクターを出して、一般化ロジスティック関数として見る事ができる。
@@ -396,7 +396,7 @@ **57. [Detects the part of the image where the object is located, Detects a shape or characteristics of an object (e.g. eyes), More granular]** -⟶ +⟶ [物体が配置されている画像の部分検出, (例: 目)物体の特徴または形状検出, より粒状]
@@ -445,7 +445,7 @@ **64. [Box predictions, Box selection of maximum probability, Overlap removal of same class, Final bounding boxes]** -⟶ +⟶ [ボックス予測, 最大確率のボックス選択, 同じクラスの重なり合う除去, 最後のバウンディングボックス]
@@ -459,7 +459,7 @@ **66. [Step 1: Divide the input image into a G×G grid., Step 2: For each grid cell, run a CNN that predicts y of the following form:, repeated k times]** -⟶ +⟶ [ステップ1: 入力画像をGxGグリッドに分ける。, ステップ2: ]
From 42930d2870b7c8a06e1cd8a485dc453beeb3f875 Mon Sep 17 00:00:00 2001 From: unknown Date: Fri, 31 May 2019 17:45:10 +0900 Subject: [PATCH 193/531] Translating CNN to Indonesian --- id/convolutional-neural-networks.md | 716 ++++++++++++++++++++++++++++ 1 file changed, 716 insertions(+) create mode 100644 id/convolutional-neural-networks.md diff --git a/id/convolutional-neural-networks.md b/id/convolutional-neural-networks.md new file mode 100644 index 000000000..366580e93 --- /dev/null +++ b/id/convolutional-neural-networks.md @@ -0,0 +1,716 @@ +**Convolutional Neural Networks translation** + +
+ +**1. Convolutional Neural Networks cheatsheet** + +⟶Cheatsheet Convolutional Neural Network + +
+ + +**2. CS 230 - Deep Learning** + +⟶Deep Learning + +
+ + +**3. [Overview, Architecture structure]** + +⟶[Overview, Struktur Arsitektur] + +
+ + +**4. [Types of layer, Convolution, Pooling, Fully connected]** + +⟶[Jenis-jenis layer, Covolution, Pooling, Fully connected] + +
+ + +**5. [Filter hyperparameters, Dimensions, Stride, Padding]** + +⟶[Hyperparameters filter, Dimensi, Stride, Padding] + +
+ + +**6. [Tuning hyperparameters, Parameter compatibility, Model complexity, Receptive field]** + +⟶[Hyperparameters tuning, Kompability parameter, Kompleksitas model, Receptive field] + +
+ + +**7. [Activation functions, Rectified Linear Unit, Softmax]** + +⟶[Fungsi-fungsi aktifasi, Rectified Linear Unit, Softmax] + +
+ + +**8. [Object detection, Types of models, Detection, Intersection over Union, Non-max suppression, YOLO, R-CNN]** + +⟶[Deteksi objek, Tipe-tipe model, Deteksi, Intersection over Union, Non-max suppression, YOLO, R-CNN] + +
+ + +**9. [Face verification/recognition, One shot learning, Siamese network, Triplet loss]** + +⟶[Verifikasi/rekognisi wajah, One shot learning, Siamese network, Loss triplet] + +
+ + +**10. [Neural style transfer, Activation, Style matrix, Style/content cost function]** + +⟶[Transfer neural style, Aktifasi, Matriks style, Fungsi cost style/konten] + +


**11. [Computational trick architectures, Generative Adversarial Net, ResNet, Inception Network]**

⟶[Arsitektur trik komputasi, Generative Adversarial Net, ResNet, Inception Network]

<br>
+ + +**12. Overview** + +⟶Overview + +


**13. Architecture of a traditional CNN ― Convolutional neural networks, also known as CNNs, are a specific type of neural networks that are generally composed of the following layers:**

⟶Arsitektur dari sebuah CNN tradisional - Convolutional neural network, juga dikenal sebagai CNN, adalah sebuah tipe khusus dari neural network yang secara umum terdiri dari layer-layer berikut:

<br>
+ + +**14. The convolution layer and the pooling layer can be fine-tuned with respect to hyperparameters that are described in the next sections.** + +⟶Layer convolution and layer pooling dapat disesuaikan terhadap hyperparameters yang dijelaskan pada sesi selanjutnya. + +
+ + +**15. Types of layer** + +⟶Jenis-jenis layer + +
+ + +**16. Convolution layer (CONV) ― The convolution layer (CONV) uses filters that perform convolution operations as it is scanning the input I with respect to its dimensions. Its hyperparameters include the filter size F and stride S. The resulting output O is called feature map or activation map.** + +⟶Layer convolution - Layer convolution (CONV) menggunakan filter yang melakukan operasi konvolusion seakan CONV net menscan masukan I berdasarkan dimensinya. Hyperparameter dari CONV meliputi ukuran filter F dan ukuran stride S. Keluaran hasil O disebut feature map atau activation map. + +
+ + +**17. Remark: the convolution step can be generalized to the 1D and 3D cases as well.** + +⟶Perlu diingat: tahap konvolusion dapat digeneralisasi terhadap masukan 1D dan 3D. + +


**18. Pooling (POOL) ― The pooling layer (POOL) is a downsampling operation, typically applied after a convolution layer, which does some spatial invariance. In particular, max and average pooling are special kinds of pooling where the maximum and average value is taken, respectively.**

⟶Pooling (POOL) - Layer pooling (POOL) adalah sebuah operasi downsampling, biasanya diaplikasikan setelah sebuah layer convolution, yang menyebabkan invariansi spasial. Pada khususnya, pooling max dan average adalah jenis khusus dari pooling yang masing-masing mengambil nilai maksimum dan rata-rata.

<br>
+ + +**19. [Type, Purpose, Illustration, Comments]** + +⟶[Jenis, Tujuan, Ilustrasi, Komentar] + +
+ + +**20. [Max pooling, Average pooling, Each pooling operation selects the maximum value of the current view, Each pooling operation averages the values of the current view]** + +⟶[Pooling max, Pooling average, Setiap operasi pooling mengambil nilai tertinggi dari tinjauan sekarang, Setiap operasi poling menghitung rata-rata dari tinjauan sekarang] + +
+ + +**21. [Preserves detected features, Most commonly used, Downsamples feature map, Used in LeNet]** + +⟶[Mempertahankan fitur yang terdeteksi, Yang biasa dipakai, Downsample feature map, Digunakan di LeNet] + +
+ + +**22. Fully Connected (FC) ― The fully connected layer (FC) operates on a flattened input where each input is connected to all neurons. If present, FC layers are usually found towards the end of CNN architectures and can be used to optimize objectives such as class scores.** + +⟶Fully Connected (FC) - Fully connected layer (FC) menangani sebuah masukan dijadikan 1D dimana setiap elemen masukan terkoneksi keseluruh neuron. Layer FC biasanya ditemukan pada akhir dari arsitektur CNN dan dapat digunakan untuk mengoptimisasi objektif seperti skor kelas (pada kasus klasifikasi). + +
+ + +**23. Filter hyperparameters** + +⟶Hyperparameters filter + +
+ + +**24. The convolution layer contains filters for which it is important to know the meaning behind its hyperparameters.** + +⟶Layer convolutional memuat filter yang mana adalah penting untuk mengerti tentang maksud dari hyperparameter filter tersebut. + +
+ + +**25. Dimensions of a filter ― A filter of size F×F applied to an input containing C channels is a F×F×C volume that performs convolutions on an input of size I×I×C and produces an output feature map (also called activation map) of size O×O×1.** + +⟶Dimensi dari sebuah filter - Sebuah filter dengan ukuran FxF diaplikasikan pada sebuah input yang memuat C channel memiliki volume FxFxC yang melakukan konvolusi pada sebuah input masukan dengan ukuran IxIxC dan menghasilkan sebuah keluaran feature map (juga dikenal activation map) dengan ukuran O×O×1 + +
+ + +**26. Filter** + +⟶Filter + +
+ + +**27. Remark: the application of K filters of size F×F results in an output feature map of size O×O×K.** + +⟶Perlu diperhatikan: aplikasi dari K filter dengan ukuran FxF menhasilkan sebuah keluaran feature map dengan ukuran O×O×K. + +


**28. Stride ― For a convolutional or a pooling operation, the stride S denotes the number of pixels by which the window moves after each operation.**

⟶Stride - Untuk sebuah operasi konvolusi atau pooling, stride S melambangkan jumlah pixel yang dilewati window setelah setiap operasi.

<br>
+ + +**29. Zero-padding ― Zero-padding denotes the process of adding P zeroes to each side of the boundaries of the input. This value can either be manually specified or automatically set through one of the three modes detailed below:** + +⟶Zero-padding - Zero-padding melambangkan proses penambahan P nilai 0 pada setiap sisi akhir dari masukan. Nilai dari zero-padding dapat dispesifikasikan secara manual atau secara otomatis melalui salah satu dari tiga mode yang dijelaskan dibawah ini: + +
+ + +**30. [Mode, Value, Illustration, Purpose, Valid, Same, Full]** + +⟶[Mode, Nilai, Ilustrasi, Tujuan, Valid, Same, Full] + +
+ + +**31. [No padding, Drops last convolution if dimensions do not match, Padding such that feature map size has size ⌈IS⌉, Output size is mathematically convenient, Also called 'half' padding, Maximum padding such that end convolutions are applied on the limits of the input, Filter 'sees' the input end-to-end]** + +⟶[No padding, Hapus konvolusi terakhir jika dimensi tidak sesuai, Padding yang menghasilkan feature map dengan ukuran ⌈IS⌉, Ukuran keluaran cocok secara matematis, Juga disebut 'half' padding, Maximum padding menjadikan akhir konvolusi dipasangkan pada batasan dari input, Filter 'melihat' masukan end-to-end] + +
+ + +**32. Tuning hyperparameters** + +⟶Menyetel hyperparameters + +
+ + +**33. Parameter compatibility in convolution layer ― By noting I the length of the input volume size, F the length of the filter, P the amount of zero padding, S the stride, then the output size O of the feature map along that dimension is given by:** + +⟶Kompabilitas hyperparameter pada layer konvolusion - Dengan menuliskan I sebagai panjang dari ukuran volume masukan, F sebagai panjang dari filter, P sebagai jumlah zero padding, S sebagai stride, maka ukuran keluaran O dari feature map pada dimensi tersebut dituliskan sebagai: + +
+ + +**34. [Input, Filter, Output]** + +⟶[Masukan, Filter, Keluaran] + +
+ + +**35. Remark: often times, Pstart=Pend≜P, in which case we can replace Pstart+Pend by 2P in the formula above.** + +⟶Perlu diperhatikan: sering, Pstart=Pend≜P, yang mana pada kasus tersebut kita dapat mengganti Pstart+Pend dengan 2P pada formula diatas. + +
+ + +**36. Understanding the complexity of the model ― In order to assess the complexity of a model, it is often useful to determine the number of parameters that its architecture will have. In a given layer of a convolutional neural network, it is done as follows:** + +⟶Memahami kompleksitas dari model - Untuk menilai kompleksitas dari sebuah model, sangatlah penting untuk menentukan jumlah parameter yang arsitektur dari model akan miliki. Pada sebuah convolutional neural network, hal tersebut dilakukan sebagai berikut: + +
+ + +**37. [Illustration, Input size, Output size, Number of parameters, Remarks]** + +⟶[Ilustrasi, Ukuran masukan, Ukuran keluaran, Jumlah parameter, Catatan] + +
+ + +**38. [One bias parameter per filter, In most cases, S + + +**39. [Pooling operation done channel-wise, In most cases, S=F]** + +⟶[Operasi pooling dan dilakukan channel-wise, Pada banyak kasus, S=F] + +
+ + +**40. [Input is flattened, One bias parameter per neuron, The number of FC neurons is free of structural constraints]** + +⟶[Input diratakan(menjadi 1D), Satu parameter bias untuk setiap neuron, Jumlah dari neuron FC adalah bebas dari batasan struktural.] + +
+ + +**41. Receptive field ― The receptive field at layer k is the area denoted Rk×Rk of the input that each pixel of the k-th activation map can 'see'. By calling Fj the filter size of layer j and Si the stride value of layer i and with the convention S0=1, the receptive field at layer k can be computed with the formula:** + +⟶Receptive field - Receptive field pada layer k adalah area yang dinotasikan RkxRk dari input yang setiap pixel dari k-th activation map dapat 'melihat'. Dengan menulasikan Fj sebagai ukuran filter dari layer j dan Si sebagai nilai stride pada layer i dan dengan konvensi S0=1, receptive field pada layer K dapat dihitung dengan formula berikut: + +
+ + +**42. In the example below, we have F1=F2=3 and S1=S2=1, which gives R2=1+2⋅1+2⋅1=5.** + +⟶Pada contoh dibawah ini, kita memiliki F1=f2=3 dan S1=S2=1, yang menghasilkan R2=1+2⋅1+2⋅1=5. + +
+ + +**43. Commonly used activation functions** + +⟶Fungsi-fungsi aktifasi yang biasa dipakai + +
+ + +**44. Rectified Linear Unit ― The rectified linear unit layer (ReLU) is an activation function g that is used on all elements of the volume. It aims at introducing non-linearities to the network. Its variants are summarized in the table below:** + +⟶Rectified Linear Unit - Layer rectified linear unit (ReLU) adalah sebuah fungsi aktifasi g yang digunakan pada seluruh elemen. Penggunaan ReLU adalah untuk memasukan non-linearitas ke network. Variasi-variasi dari ReLU dirangkum pada tebel dibawah ini: + +
+ + +**45. [ReLU, Leaky ReLU, ELU, with]** + +⟶[ReLU, Leaky ReLU, ELU, dengan] + +
+ + +**46. [Non-linearity complexities biologically interpretable, Addresses dying ReLU issue for negative values, Differentiable everywhere]** + +⟶[Kompleksitas non-linearitas yang dapat diinterpretasikan secara biologi, Menangani permasalahan dying ReLU yang terjadi untuk nilai negatif, Dapat diturunkan] + +
+ + +**47. Softmax ― The softmax step can be seen as a generalized logistic function that takes as input a vector of scores x∈Rn and outputs a vector of output probability p∈Rn through a softmax function at the end of the architecture. It is defined as follows:** + +⟶Softmax - Langkah softmax dapat dilihat sebagai fungsi logistik yang digeneralisasi yang mengambil masukan sebuah vektor x∈Rn dan mengeluarkan sebuah probabilitas vektor p∈Rn melalui sebuah fungsi softmax pada akhir arsitektur network. Softmax didefinisikan sebagai berikut: + +
+ + +**48. where** + +⟶Dimana + +
+ + +**49. Object detection** + +⟶Deteksi objek + +
+ + +**50. Types of models ― There are 3 main types of object recognition algorithms, for which the nature of what is predicted is different. They are described in the table below:** + +⟶Tipe-tipe model - Ada tiga tipe utama dari algoritma rekognisi objek, yang mana berbeda pada hal yang diprediksi. Tipe-tipe tersebut dijelaskan pada tabel dibawah ini: + +
+ + +**51. [Image classification, Classification w. localization, Detection]** + +⟶[Klasifikasi gambar, Klasifikasi w. lokalisasi, Deteksi] + +
+ + +**52. [Teddy bear, Book]** + +⟶[Boneka beruang, Buku] + +
+ + +**53. [Classifies a picture, Predicts probability of object, Detects an object in a picture, Predicts probability of object and where it is located, Detects up to several objects in a picture, Predicts probabilities of objects and where they are located]** + +⟶[Mengklasifikasikan sebuah gambar, Memprediksi probabilitas dari objek, Mendeteksi objek pada sebuah gambar, Memprediksi probabilitas dari objek dan lokasinya pada gambar, Mendeteksi hingga beberapa objek pada sebuah gambar, Memprediksi probabilitas dari objek-objek dan dimana lokasi mereka] + +
+ + +**54. [Traditional CNN, Simplified YOLO, R-CNN, YOLO, R-CNN]** + +⟶[CNN tradisional, Simplified YOLO, R-CNN, YOLO, R-CNN] + +
+ + +**55. Detection ― In the context of object detection, different methods are used depending on whether we just want to locate the object or detect a more complex shape in the image. The two main ones are summed up in the table below:** + +⟶Deteksi - Pada objek deteksi, metode yang berbeda digunakan tergantung apakah kita hanya ingin untuk mengetahui lokasi objek atau mendeteksi sebuah bentuk yang lebih rumit pada gambar. Dua metode yang utama dirangkum pada tabel dibawah ini: + +
+ + +**56. [Bounding box detection, Landmark detection]** + +⟶[Deteksi bounding box, Deteksi landmark] + +
+ + +**57. [Detects the part of the image where the object is located, Detects a shape or characteristics of an object (e.g. eyes), More granular]** + +⟶[Mendeteksi bagian dari gambar dinama objek berlokasi, Mendetek bentuk atau karakteristik dari sebuah objek (contoh: mata), Lebih granular] + +


**58. [Box of center (bx,by), height bh and width bw, Reference points (l1x,l1y), ..., (lnx,lny)]**

⟶[Pusat dari box (bx,by), tinggi bh dan lebar bw, Poin referensi (l1x,l1y), ..., (lnx,lny)]

<br>
+ + +**59. Intersection over Union ― Intersection over Union, also known as IoU, is a function that quantifies how correctly positioned a predicted bounding box Bp is over the actual bounding box Ba. It is defined as:** + +⟶[Intersection over Union - Intersection over Union, juga dikenal sebagai IoU, adalah sebuah fungsi yang mengkuantifikasi seberapa benar posisi dari sebuah prediksi bounding box Bp terhadap bounding box yang sebenarnya Ba. IoU didefinisikan sebagai berikut:] + +
+ + +**60. Remark: we always have IoU∈[0,1]. By convention, a predicted bounding box Bp is considered as being reasonably good if IoU(Bp,Ba)⩾0.5.** + +⟶Perlu diperhatikan: kita selalu memiliki nilai IoU∈[0,1]. Umumnya, sebuah prediksi bounding box dianggap cukup bagus jika IoU(Bp,Ba)⩾0.5. + +
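
To make the IoU definition above concrete, here is a small illustrative Python sketch that assumes boxes are given as (x1, y1, x2, y2) corner coordinates; that representation is an assumption for the example, since the cheatsheet only defines IoU(Bp,Ba) as intersection area over union area.

```python
def iou(box_p, box_a):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    x1 = max(box_p[0], box_a[0])
    y1 = max(box_p[1], box_a[1])
    x2 = min(box_p[2], box_a[2])
    y2 = min(box_p[3], box_a[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_p = (box_p[2] - box_p[0]) * (box_p[3] - box_p[1])
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    union = area_p + area_a - inter
    return inter / union if union > 0 else 0.0

print(iou((0, 0, 2, 2), (1, 1, 3, 3)))  # 1/7 ≈ 0.143, below the 0.5 "reasonably good" threshold
```
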
+
+
+**61. Anchor boxes ― Anchor boxing is a technique used to predict overlapping bounding boxes. In practice, the network is allowed to predict more than one box simultaneously, where each box prediction is constrained to have a given set of geometrical properties. For instance, the first prediction can potentially be a rectangular box of a given form, while the second will be another rectangular box of a different geometrical form.**
+
+⟶Anchor boxes ― Anchor boxing adalah sebuah teknik yang digunakan untuk memprediksi bounding box yang saling overlap. Pada pengaplikasiannya, network diperbolehkan untuk memprediksi lebih dari satu box secara bersamaan, dimana setiap prediksi box dibatasi untuk memiliki sekumpulan properti geometris tertentu. Contohnya, prediksi pertama dapat berupa sebuah box persegi panjang dengan bentuk tertentu, sedangkan prediksi kedua adalah box persegi panjang lainnya dengan bentuk geometris yang berbeda.
+
+<br>
+
+
+**62. Non-max suppression ― The non-max suppression technique aims at removing duplicate overlapping bounding boxes of a same object by selecting the most representative ones. After having removed all boxes having a probability prediction lower than 0.6, the following steps are repeated while there are boxes remaining:**
+
+⟶Non-max suppression ― Teknik non-max suppression bertujuan untuk menghapus duplikasi bounding box yang saling overlap dari sebuah objek yang sama dengan memilih box yang paling representatif. Setelah menghapus seluruh box dengan probabilitas prediksi lebih kecil dari 0.6, langkah-langkah berikut diulang selama masih terdapat box tersisa:
+
+<br>
+
+
+**63. [For a given class, Step 1: Pick the box with the largest prediction probability., Step 2: Discard any box having an IoU⩾0.5 with the previous box.]**
+
+⟶[Untuk sebuah kelas, Langkah 1: Pilih box dengan probabilitas prediksi tertinggi., Langkah 2: Singkirkan box manapun yang memiliki IoU⩾0.5 dengan box yang dipilih pada tahap 1.]
+
+<br>
+ + +**64. [Box predictions, Box selection of maximum probability, Overlap removal of same class, Final bounding boxes]** + +⟶[Prediksi-prediksi box, Seleksi box dari probabilitas tertinggi, Penghapusan overlap pada kelas yang sama, Bounding box akhir] + +
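
The two steps listed for non-max suppression can be sketched in a few lines of Python. This is an illustrative greedy implementation, assuming each detection is a (probability, box) pair and an IoU helper like the one sketched earlier; the 0.6 and 0.5 thresholds come from the text above.

```python
def non_max_suppression(detections, iou_fn, p_min=0.6, iou_max=0.5):
    """Greedy NMS for one class: detections is a list of (probability, box) pairs."""
    boxes = [d for d in detections if d[0] >= p_min]           # drop low-confidence boxes
    boxes.sort(key=lambda d: d[0], reverse=True)               # highest probability first
    kept = []
    while boxes:
        best = boxes.pop(0)                                    # Step 1: pick the most confident box
        kept.append(best)
        boxes = [d for d in boxes if iou_fn(d[1], best[1]) < iou_max]  # Step 2: discard overlaps
    return kept
```
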
+ + +**65. YOLO ― You Only Look Once (YOLO) is an object detection algorithm that performs the following steps:** + +⟶YOLO - You Only Look Once (YOLO) adalah sebuah algoritma deteksi objek yang melakukan langkah-langkah berikut: + +
+
+
+**66. [Step 1: Divide the input image into a G×G grid., Step 2: For each grid cell, run a CNN that predicts y of the following form:, repeated k times]**
+
+⟶[Langkah 1: Bagi gambar masukan kedalam sebuah grid dengan ukuran GxG., Langkah 2: Untuk setiap sel grid, jalankan sebuah CNN yang memprediksi y dengan bentuk sebagai berikut:, diulang sebanyak k kali]
+
+<br>
+
+
+**67. where pc is the probability of detecting an object, bx,by,bh,bw are the properties of the detected bouding box, c1,...,cp is a one-hot representation of which of the p classes were detected, and k is the number of anchor boxes.**
+
+⟶dimana pc adalah probabilitas terdeteksinya sebuah objek, bx,by,bh,bw adalah properti dari bounding box yang terdeteksi, c1,...,cp adalah representasi one-hot yang menunjukkan kelas mana dari p kelas yang terdeteksi, dan k adalah jumlah anchor box.
+
+<br>
+
+
+**68. Step 3: Run the non-max suppression algorithm to remove any potential duplicate overlapping bounding boxes.**
+
+⟶Langkah 3: Jalankan algoritma non-max suppression untuk menghapus potensi duplikasi bounding box yang saling overlap.
+
+<br>
+
+
+**69. [Original image, Division in GxG grid, Bounding box prediction, Non-max suppression]**
+
+⟶[Gambar asli, Pembagian kedalam grid berukuran GxG, Prediksi bounding box, Non-max suppression]
+
+<br>
+
+
+**70. Remark: when pc=0, then the network does not detect any object. In that case, the corresponding predictions bx,...,cp have to be ignored.**
+
+⟶Perlu diperhatikan: ketika pc=0, maka network tidak mendeteksi objek apapun. Pada kasus seperti itu, prediksi yang bersangkutan bx,...,cp harus diabaikan.
+
+<br>
+
+
+**71. R-CNN ― Region with Convolutional Neural Networks (R-CNN) is an object detection algorithm that first segments the image to find potential relevant bounding boxes and then run the detection algorithm to find most probable objects in those bounding boxes.**
+
+⟶R-CNN ― Region with Convolutional Neural Networks (R-CNN) adalah sebuah algoritma deteksi objek yang pertama-tama mensegmentasi gambar untuk menemukan bounding box yang berpotensi relevan dan selanjutnya menjalankan algoritma deteksi untuk menemukan objek yang paling memungkinkan pada bounding box tersebut.
+
+<br>
+
+
+**72. [Original image, Segmentation, Bounding box prediction, Non-max suppression]**
+
+⟶[Gambar asli, Segmentasi, Prediksi bounding box, Non-max suppression]
+
+<br>
+
+
+**73. Remark: although the original algorithm is computationally expensive and slow, newer architectures enabled the algorithm to run faster, such as Fast R-CNN and Faster R-CNN.**
+
+⟶Perlu diperhatikan: meskipun algoritma asli dari R-CNN membutuhkan sumber daya komputasi yang besar dan lambat, arsitektur terbaru memungkinkan algoritma untuk berjalan lebih cepat, seperti Fast R-CNN dan Faster R-CNN.
+
+<br>
+
+
+**74. Face verification and recognition**
+
+⟶Verifikasi dan rekognisi wajah
+
+<br>
+ + +**75. Types of models ― Two main types of model are summed up in table below:** + +⟶Jenis-jenis model - Dua jenis tipe utama dirangkum pada tabel dibawah ini: + +
+
+
+**76. [Face verification, Face recognition, Query, Reference, Database]**
+
+⟶[Verifikasi wajah, Rekognisi wajah, Query, Referensi, Database]
+
+<br>
+ + +**77. [Is this the correct person?, One-to-one lookup, Is this one of the K persons in the database?, One-to-many lookup]** + +⟶[Apakah ini adalah orang yang sesuai?, One-to-one lookup, Apakah ini salah satu dari K orang pada database?, One-to-many lookup] + +
+ + +**78. One Shot Learning ― One Shot Learning is a face verification algorithm that uses a limited training set to learn a similarity function that quantifies how different two given images are. The similarity function applied to two images is often noted d(image 1,image 2).** + +⟶One Shot Learning ― One Shot Learning adalah sebuah algoritma verifikasi wajah yang menggunakan sebuah training set yang terbatas untuk belajar fungsi kemiripan yang mengkuantifikasi seberapa berbeda dua gambar yang diberikan. Fungsi kemiripan yang diaplikasikan pada dua gambar sering dinotasikan sebagai d(image 1,image 2). + +
+ + +**79. Siamese Network ― Siamese Networks aim at learning how to encode images to then quantify how different two images are. For a given input image x(i), the encoded output is often noted as f(x(i)).** + +⟶Siamese Network ― Siamese Networks didesain untuk mengkodekan gambar dan mengkuantifikasi seberapa berbeda dua buah gambar. Untuk sebuah gambar masukan x(i), keluaran yang dikodekan sering dinotasikan sebagai f(x(i)). + +
+
+
+**80. Triplet loss ― The triplet loss ℓ is a loss function computed on the embedding representation of a triplet of images A (anchor), P (positive) and N (negative). The anchor and the positive example belong to a same class, while the negative example to another one. By calling α∈R+ the margin parameter, this loss is defined as follows:**
+
+⟶Loss triplet - Loss triplet ℓ adalah sebuah fungsi loss yang dihitung pada representasi embedding dari sebuah triplet gambar A (anchor), P (positif) dan N (negatif). Sampel anchor dan positif berasal dari kelas yang sama, sedangkan sampel negatif berasal dari kelas yang lain. Dengan menuliskan α∈R+ sebagai parameter margin, fungsi loss ini didefinisikan sebagai berikut:
+
+<br>
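
For readers who want to see the margin formulation in code, here is a minimal NumPy sketch of ℓ(A,P,N)=max(d(A,P)−d(A,N)+α,0), assuming f(A), f(P), f(N) are already-computed embedding vectors and d is the squared Euclidean distance; these assumptions are illustrative, not part of the cheatsheet.

```python
import numpy as np

def triplet_loss(f_a, f_p, f_n, alpha=0.2):
    """max(d(A,P) - d(A,N) + alpha, 0) on embedding vectors f_a, f_p, f_n."""
    d_ap = np.sum((f_a - f_p) ** 2)   # anchor-positive distance
    d_an = np.sum((f_a - f_n) ** 2)   # anchor-negative distance
    return max(d_ap - d_an + alpha, 0.0)
```
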
+ + +**81. Neural style transfer** + +⟶Transfer neural style + +
+
+
+**82. Motivation ― The goal of neural style transfer is to generate an image G based on a given content C and a given style S.**
+
+⟶Motivasi ― Tujuan dari transfer neural style adalah untuk menghasilkan sebuah gambar G berdasarkan sebuah konten C dan sebuah style S yang diberikan.
+
+<br>
+ + +**83. [Content C, Style S, Generated image G]** + +⟶[Konten C, Style S, gambar yang dihasilkan G] + +
+ + +**84. Activation ― In a given layer l, the activation is noted a[l] and is of dimensions nH×nw×nc** + +⟶Aktifasi - Pada sebuah layer l, aktifasi dinotasikan sebagai a[l] dan berdimensi nH×nw×nc + +
+
+
+**85. Content cost function ― The content cost function Jcontent(C,G) is used to determine how the generated image G differs from the original content image C. It is defined as follows:**
+
+⟶Fungsi cost content - Fungsi cost content Jcontent(C,G) digunakan untuk menghitung perbedaan antara gambar yang dihasilkan G dan gambar konten yang sebenarnya C. Fungsi cost content didefinisikan sebagai berikut:
+
+<br>
+
+
+**86. Style matrix ― The style matrix G[l] of a given layer l is a Gram matrix where each of its elements G[l]kk′ quantifies how correlated the channels k and k′ are. It is defined with respect to activations a[l] as follows:**
+
+⟶Matriks style - Matriks style G[l] dari sebuah layer l adalah sebuah matriks Gram dimana setiap elemennya G[l]kk′ mengkuantifikasi seberapa besar korelasi antara channel k dan k'. Matriks style didefinisikan terhadap aktifasi a[l] sebagai berikut:
+
+<br>
+ + +**87. Remark: the style matrix for the style image and the generated image are noted G[l] (S) and G[l] (G) respectively.** + +⟶Perlu diperhatikan: matriks style untuk gambar style dan gambar yang dihasilkan masing-masing dituliskan sebagai G[l] (S) dan G[l] (G). + +
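
The Gram-matrix definition above translates almost directly into NumPy. The sketch below assumes the activation a[l] is stored as an array of shape (nH, nW, nC) and flattens the spatial dimensions before taking the channel correlation product; the shapes and names are assumptions for illustration.

```python
import numpy as np

def style_matrix(a):
    """Gram matrix G[l] of an activation volume a of shape (nH, nW, nC)."""
    n_h, n_w, n_c = a.shape
    flat = a.reshape(n_h * n_w, n_c)   # one row per spatial position
    return flat.T @ flat               # (nC, nC): entry (k, k') correlates channels k and k'
```
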
+
+
+**88. Style cost function ― The style cost function Jstyle(S,G) is used to determine how the generated image G differs from the style S. It is defined as follows:**
+
+⟶Fungsi cost style - Fungsi cost style Jstyle(S,G) digunakan untuk menentukan perbedaan antara gambar yang dihasilkan G dengan style yang diberikan S. Fungsi tersebut didefinisikan sebagai berikut:
+
+<br>
+
+
+**89. Overall cost function ― The overall cost function is defined as being a combination of the content and style cost functions, weighted by parameters α,β, as follows:**
+
+⟶Fungsi cost overall - Fungsi cost overall didefinisikan sebagai sebuah kombinasi dari fungsi cost konten dan style, dibobotkan oleh parameter α,β, sebagai berikut:
+
+<br>
+
+
+**90. Remark: a higher value of α will make the model care more about the content while a higher value of β will make it care more about the style.**
+
+⟶Perlu diperhatikan: semakin tinggi nilai α akan membuat model lebih memperhatikan konten sedangkan semakin tinggi nilai β akan membuat model lebih memperhatikan style.
+
+<br>
+
+
+**91. Architectures using computational tricks**
+
+⟶Arsitektur yang menggunakan trik komputasi
+
+<br>
+
+
+**92. Generative Adversarial Network ― Generative adversarial networks, also known as GANs, are composed of a generative and a discriminative model, where the generative model aims at generating the most truthful output that will be fed into the discriminative which aims at differentiating the generated and true image.**
+
+⟶Generative Adversarial Network - Generative adversarial networks, juga dikenal sebagai GANs, terdiri dari sebuah model generatif dan sebuah model diskriminatif, dimana model generatif didesain untuk menghasilkan keluaran yang paling mendekati keluaran sebenarnya, yang kemudian diberikan kepada model diskriminatif yang didesain untuk membedakan gambar yang dihasilkan dan gambar sebenarnya.
+
+<br>
+ + +**93. [Training, Noise, Real-world image, Generator, Discriminator, Real Fake]** + +⟶[Training, Noise, Gambar real-world, Generator, Discriminator, Real Fake] + +
+
+
+**94. Remark: use cases using variants of GANs include text to image, music generation and synthesis.**
+
+⟶Perlu diperhatikan: penggunaan dari variasi GANs meliputi sistem yang dapat mengubah teks ke gambar, serta pembuatan dan sintesis musik.
+
+<br>
+ + +**95. ResNet ― The Residual Network architecture (also called ResNet) uses residual blocks with a high number of layers meant to decrease the training error. The residual block has the following characterizing equation:** + +⟶ ResNet ― Arsitektur Residual Network (juga disebut ResNet) menggunakan blok-blok residual dengan jumlah layer yang banyak untuk mengurangi training error. Blok residual memiliki karakteristik formula sebagai berikut: + +
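
As a rough illustration of the characterizing equation a[l+2]=g(z[l+2]+a[l]) mentioned for the residual block, the sketch below shows a skip connection added back in before the final activation; the two dense transforms and the ReLU are stand-ins for the example, not the actual ResNet layers.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def residual_block(a_l, w1, w2):
    """a[l+2] = g(z[l+2] + a[l]): the input a_l skips ahead and is added back in."""
    z1 = w1 @ a_l              # first transform (stand-in for a conv layer)
    a1 = relu(z1)
    z2 = w2 @ a1               # second transform
    return relu(z2 + a_l)      # skip connection before the final activation
```
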
+
+
+**96. Inception Network ― This architecture uses inception modules and aims at giving a try at different convolutions in order to increase its performance through features diversification. In particular, it uses the 1×1 convolution trick to limit the computational burden.**
+
+⟶Inception Network ― Arsitektur ini menggunakan modul inception dan didesain dengan tujuan untuk meningkatkan performa network melalui diversifikasi fitur dengan mencoba berbagai macam konvolusi. Khususnya, arsitektur ini menggunakan trik konvolusi 1×1 untuk membatasi beban komputasi.
+
+<br>
+ + +**97. The Deep Learning cheatsheets are now available in [target language].** + +⟶Deep Learning cheatsheet sekarang tersedia di [Bahasa Indonesia] + +
+ + +**98. Original authors** + +⟶Penulis orisinil + +
+ + +**99. Translated by X, Y and Z** + +⟶Diterjemahkan oleh X, Y dan Z + +
+ + +**100. Reviewed by X, Y and Z** + +⟶Diulas oleh X, Y dan Z + +
+ + +**101. View PDF version on GitHub** + +⟶Lihat versi PDF pada GitHub + +
+ + +**102. By X and Y** + +⟶Oleh X dan Y + +
From 7d0ce941b76905f7b54e926a7c8ea0f5764f4c89 Mon Sep 17 00:00:00 2001 From: tuananhhedspibk Date: Fri, 31 May 2019 23:12:21 +0900 Subject: [PATCH 194/531] [ja] Convolutional Neural Network --- ja/convolutional-neural-networks.md | 14 +++++++------- 1 file changed, 7 insertions(+), 7 deletions(-) diff --git a/ja/convolutional-neural-networks.md b/ja/convolutional-neural-networks.md index fbcab4103..58ae9df33 100644 --- a/ja/convolutional-neural-networks.md +++ b/ja/convolutional-neural-networks.md @@ -354,7 +354,7 @@ **51. [Image classification, Classification w. localization, Detection]** -⟶ [画像分類, 分類 w. ] +⟶ [画像分類, 分類 w. 局地化, 検出]
@@ -403,7 +403,7 @@ **58. [Box of center (bx,by), height bh and width bw, Reference points (l1x,l1y), ..., (lnx,lny)]** -⟶ +⟶ [センターのボックス(bx, by), 縦bhと幅bw, 各参照ポイント (l1x,l1y), ..., (lnx,lny)]
@@ -438,7 +438,7 @@ **63. [For a given class, Step 1: Pick the box with the largest prediction probability., Step 2: Discard any box having an IoU⩾0.5 with the previous box.]** -⟶ +⟶ [与えられたクラス, ステップ1: 最大予測確率があるボックスを取り。, ステップ2: 前のボックスと一緒にIoU⩾0.5のボックスを切り捨てる。]
@@ -459,21 +459,21 @@ **66. [Step 1: Divide the input image into a G×G grid., Step 2: For each grid cell, run a CNN that predicts y of the following form:, repeated k times]** -⟶ [ステップ1: 入力画像をGxGグリッドに分ける。, ステップ2: ] +⟶ [ステップ1: 入力画像をGxGグリッドに分ける。, ステップ2: 各グリッドセルに対して次の形式のyを予測するCNNを実行する:,k回繰り返す]
**67. where pc is the probability of detecting an object, bx,by,bh,bw are the properties of the detected bouding box, c1,...,cp is a one-hot representation of which of the p classes were detected, and k is the number of anchor boxes.** -⟶ +⟶ ここで、pcは物体認識の確率、bx,by,bh,bwはバウンディングボックスのプロパーティ、c1, ..., cpはpクラスのうちどれが検出されたかのワンホット表現です。
**68. Step 3: Run the non-max suppression algorithm to remove any potential duplicate overlapping bounding boxes.** -⟶ +⟶ 潜在的な重複バウンディングボックスを除去する為に非最大抑制アルゴリズムを実行する。
@@ -613,7 +613,7 @@ Fast R-CNNやFaster R-CNNなど、アルゴリズムをより高い速度に実 **88. Style cost function ― The style cost function Jstyle(S,G) is used to determine how the generated image G differs from the style S. It is defined as follows:** -⟶ +⟶ スタイルコスト関数 - スタイルコスト関数Jstyle(S,G)はスタイルSと生成された画像Gどう違うかを決定する為利用される。次のように定義される:
From 0b4c4aeeba4f0f09804631834d3c9ca5dc909525 Mon Sep 17 00:00:00 2001 From: Robert Altena Date: Sat, 1 Jun 2019 08:47:09 +0900 Subject: [PATCH 195/531] up ti line 249. --- ja/refresher-linear-algebra.md | 25 +++++++++++++------------ 1 file changed, 13 insertions(+), 12 deletions(-) diff --git a/ja/refresher-linear-algebra.md b/ja/refresher-linear-algebra.md index 70eb6cdf4..85f381cc6 100644 --- a/ja/refresher-linear-algebra.md +++ b/ja/refresher-linear-algebra.md @@ -176,73 +176,74 @@ aTr、i、bTr、iはベクトル行。 ac、j、bc、jはそれぞれAとBの **30. Matrix properties** ⟶ - +行列の性質
**31. Definitions** ⟶ - +定義
**32. Symmetric decomposition ― A given matrix A can be expressed in terms of its symmetric and antisymmetric parts as follows:** ⟶ - +対称分解 ― 行列Aは次のように対称および反対称部分で表現できます。
**33. [Symmetric, Antisymmetric]** ⟶ - +[対称、反対称]
**34. Norm ― A norm is a function N:V⟶[0,+∞[ where V is a vector space, and such that for all x,y∈V, we have:**

⟶
-
+ノルム ― ノルムとは、Vをベクトル空間として、すべてのx,y∈Vについて次を満たす関数N:V⟶[0,+∞[である:

<br>
**35. N(ax)=|a|N(x) for a scalar** ⟶ - +N(ax)=|a|N(x) スカラー用
**36. if N(x)=0, then x=0** ⟶ - +N(x)= 0の場合、x = 0
**37. For x∈V, the most commonly used norms are summed up in the table below:** ⟶ - +x∈V、一般的に使用されるノルムは、以下の表にまとめられています。
**38. [Norm, Notation, Definition, Use case]** ⟶ - +[ノルム、表記法、定義、使用事例]
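
To accompany the norm table referenced above, here is a small NumPy illustration of the usual choices (Manhattan, Euclidean, p-norm, infinity norm); it is an editorial example under assumed values, not part of the translated cheatsheet.

```python
import numpy as np

x = np.array([3.0, -4.0, 1.0])

l1   = np.sum(np.abs(x))                   # Manhattan norm, used in LASSO regularization
l2   = np.sqrt(np.sum(x ** 2))             # Euclidean norm, used in ridge regularization
p    = 3
lp   = np.sum(np.abs(x) ** p) ** (1 / p)   # p-norm, generalizes the two above
linf = np.max(np.abs(x))                   # infinity norm, largest absolute component

print(l1, l2, lp, linf)                    # 8.0  ~5.099  ~4.514  4.0
```
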
**39. Linearly dependence ― A set of vectors is said to be linearly dependent if one of the vectors in the set can be defined as a linear combination of the others.** ⟶ - +線形依存 ― ベクトルの集合は、その集合内のベクトルのうちの1つが他のベクトルの線形結合として定義できる場合、線形従属であると言われます。
**40. Remark: if no vector can be written this way, then the vectors are said to be linearly independent** ⟶ - +備考:この方法でベクトルを書くことができない場合、ベクトルは線形独立していると言われます。
**41. Matrix rank ― The rank of a given matrix A is noted rank(A) and is the dimension of the vector space generated by its columns. This is equivalent to the maximum number of linearly independent columns of A.** ⟶ - +行列の階数 ― 行列Aの階数をrank(A)と表記します。 それはその列によって生成されたベクトル空間の次元です。これは、Aの線形独立列の最大数に相当します。
**42. Positive semi-definite matrix ― A matrix A∈Rn×n is positive semi-definite (PSD) and is noted A⪰0 if we have:** From 1ed7da40bbab4cab2502c732c6c1e34e3685b828 Mon Sep 17 00:00:00 2001 From: Yuta Kanzawa Date: Sat, 1 Jun 2019 14:17:06 +0900 Subject: [PATCH 196/531] [ja] Supervised Learning WIP. Third commit No 45-68; Page 3 --- ja/cheatsheet-supervised-learning.md | 50 ++++++++++++++-------------- 1 file changed, 25 insertions(+), 25 deletions(-) diff --git a/ja/cheatsheet-supervised-learning.md b/ja/cheatsheet-supervised-learning.md index cb5fac160..11ed450ba 100644 --- a/ja/cheatsheet-supervised-learning.md +++ b/ja/cheatsheet-supervised-learning.md @@ -240,7 +240,7 @@ **41: The goal of support vector machines is to find the line that maximizes the minimum distance to the line.** -⟶サポートベクターマシンの目的は、データ点から線への最短距離が最大となる線を求めることである。 +⟶サポートベクターマシンの目的は、データ点からの最短距離が最大となる境界線を求めることである。
@@ -264,145 +264,145 @@ **45. support vectors** -⟶ +⟶サポートベクター
**46. Remark: the line is defined as wTx−b=0.** -⟶ +⟶備考:直線はwTx−b=0と定義する。
**47. Hinge loss ― The hinge loss is used in the setting of SVMs and is defined as follows:** -⟶ +⟶ヒンジ損失 ― ヒンジ損失はSVMの設定に用いられ、次のように定義される:
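
A one-line NumPy version of the hinge loss L(z,y)=max(0,1−yz) used by SVMs may help here; the vectorized form and variable names are illustrative choices rather than the cheatsheet's notation.

```python
import numpy as np

def hinge_loss(z, y):
    """max(0, 1 - y*z) for raw scores z and labels y in {-1, +1}."""
    return np.maximum(0.0, 1.0 - y * z)

print(hinge_loss(np.array([2.0, 0.3, -0.5]), np.array([1, 1, 1])))  # [0.  0.7 1.5]
```
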
**48. Kernel ― Given a feature mapping ϕ, we define the kernel K to be defined as:** -⟶ +⟶カーネル ― 特徴写像をϕとすると、カーネルKは次のように定義される:
**49. In practice, the kernel K defined by K(x,z)=exp(−||x−z||22σ2) is called the Gaussian kernel and is commonly used.** -⟶ +⟶実際には、K(x,z)=exp(−||x−z||22σ2)と定義され、ガウシアンカーネルと呼ばれるカーネルKがよく使われる。
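
Since the Gaussian (RBF) kernel above is the one most often used in practice, a short NumPy sketch of K(x,z)=exp(−∥x−z∥²/(2σ²)) is included here for reference; sigma and the example vectors are assumed values for illustration.

```python
import numpy as np

def gaussian_kernel(x, z, sigma=1.0):
    """K(x, z) = exp(-||x - z||^2 / (2 sigma^2))."""
    return np.exp(-np.sum((x - z) ** 2) / (2.0 * sigma ** 2))

print(gaussian_kernel(np.array([1.0, 0.0]), np.array([0.0, 1.0])))  # exp(-1) ≈ 0.368
```
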
**50. [Non-linear separability, Use of a kernel mapping, Decision boundary in the original space]** -⟶ +⟶非線形分離問題, カーネル写像の適用, 元の空間における決定境界
**51. Remark: we say that we use the "kernel trick" to compute the cost function using the kernel because we actually don't need to know the explicit mapping ϕ, which is often very complicated. Instead, only the values K(x,z) are needed.** -⟶ +⟶備考:カーネルを用いてコスト関数を計算する「カーネルトリック」を用いる。なぜなら、明示的な写像ϕを実際には知る必要はないし、それはしばしば非常に複雑になってしまうからである。代わりに、K(x,z)の値のみが必要である。
**52. Lagrangian ― We define the Lagrangian L(w,b) as follows:** -⟶ +⟶ラグランジアン ― ラグランジアンL(w,b)を次のように定義する:
**53. Remark: the coefficients βi are called the Lagrange multipliers.** -⟶ +⟶備考:係数βiはラグランジュ乗数と呼ばれる。
**54. Generative Learning** -⟶ +⟶生成学習
**55. A generative model first tries to learn how the data is generated by estimating P(x|y), which we can then use to estimate P(y|x) by using Bayes' rule.** -⟶ +⟶生成モデルは、P(x|y)を推定することによりデータがどのように生成されるのかを学習しようとする。それはその後ベイズの定理を用いてP(y|x)を推定することに使える。
**56. Gaussian Discriminant Analysis** -⟶ +⟶ガウシアン判別分析
**57. Setting ― The Gaussian Discriminant Analysis assumes that y and x|y=0 and x|y=1 are such that:** -⟶ +⟶前提 ― ガウシアン判別分析はyとx|y=0とx|y=1は次のようであることを前提とする:
**58. Estimation ― The following table sums up the estimates that we find when maximizing the likelihood:** -⟶ +⟶推定 ― 尤度を最大にすると得られる推定量は下表に集約される:
**59. Naive Bayes** -⟶ +⟶ナイーブベイズ
**60. Assumption ― The Naive Bayes model supposes that the features of each data point are all independent:** -⟶ +⟶仮定 ― ナイーブベイズモデルは、個々のデータ点の特徴量が全て独立であると仮定する:
**61. Solutions ― Maximizing the log-likelihood gives the following solutions, with k∈{0,1},l∈[[1,L]]** -⟶ +⟶解 ― 対数尤度を最大にすると次の解を得る。ただし、k∈{0,1},l∈[[1,L]]とする。
**62. Remark: Naive Bayes is widely used for text classification and spam detection.** -⟶ +⟶備考:ナイーブベイズはテキスト分類やスパム検知に幅広く使われている。
**63. Tree-based and ensemble methods** -⟶ +⟶ツリーとアンサンブル学習
**64. These methods can be used for both regression and classification problems.** -⟶ +⟶これらの方法は回帰と分類問題の両方に使える。
**65. CART ― Classification and Regression Trees (CART), commonly known as decision trees, can be represented as binary trees. They have the advantage to be very interpretable.** -⟶ +⟶CART ― 分類・回帰ツリー (CART)は、一般には決定木として知られ、二分木として表される。非常に解釈しやすいという利点がある。
**66. Random forest ― It is a tree-based technique that uses a high number of decision trees built out of randomly selected sets of features. Contrary to the simple decision tree, it is highly uninterpretable but its generally good performance makes it a popular algorithm.** -⟶ +⟶ランダムフォレスト ― これはツリーをベースにしたもので、ランダムに選択された特徴量の集合から構築された多数の決定木を用いる。単純な決定木と異なり、非常に解釈しにくいが、一般的に良い性能が出るのでよく使われるアルゴリズムである。
**67. Remark: random forests are a type of ensemble methods.** -⟶ +⟶備考:ランダムフォレストはアンサンブル学習の1種である。
**68. Boosting ― The idea of boosting methods is to combine several weak learners to form a stronger one. The main ones are summed up in the table below:** -⟶ +⟶ブースティング ― ブースティングの考え方は、複数の弱い学習機を束ねることで1つのより強い学習機を作るというものである。主なものは次の表に集約される:
From fdd8b582a369dc712a5a9c9ec468ac14cbfff2db Mon Sep 17 00:00:00 2001 From: Yuta Kanzawa Date: Sat, 1 Jun 2019 14:45:16 +0900 Subject: [PATCH 197/531] [ja] Supervised Learning WIP. Typo correction No 68; Page 3 --- ja/cheatsheet-supervised-learning.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/ja/cheatsheet-supervised-learning.md b/ja/cheatsheet-supervised-learning.md index 11ed450ba..09734db5e 100644 --- a/ja/cheatsheet-supervised-learning.md +++ b/ja/cheatsheet-supervised-learning.md @@ -402,7 +402,7 @@ **68. Boosting ― The idea of boosting methods is to combine several weak learners to form a stronger one. The main ones are summed up in the table below:** -⟶ブースティング ― ブースティングの考え方は、複数の弱い学習機を束ねることで1つのより強い学習機を作るというものである。主なものは次の表に集約される: +⟶ブースティング ― ブースティングの考え方は、複数の弱い学習器を束ねることで1つのより強い学習器を作るというものである。主なものは次の表に集約される:
From 43b169031b157a133606604fd5c5c13f0db59c50 Mon Sep 17 00:00:00 2001 From: Afshine Amidi <26204670+afshinea@users.noreply.github.com> Date: Sat, 1 Jun 2019 12:01:07 -0700 Subject: [PATCH 198/531] Update README.md --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index b1474508c..8eb2845d4 100644 --- a/README.md +++ b/README.md @@ -55,7 +55,7 @@ Please make sure to propose the translation of **only one** cheatsheet per pull |Cheatsheet topic|Bahasa Indonesia| :---|:---:| -|Convolutional Neural Nets|not started| +|Convolutional Neural Nets|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/155)| |Recurrent Neural Nets|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/152)| |DL tips and tricks|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/153)| From 63d49059feb9d632a5e54fdbdecbe4a49941fd40 Mon Sep 17 00:00:00 2001 From: Afshine Amidi <26204670+afshinea@users.noreply.github.com> Date: Sat, 1 Jun 2019 12:02:40 -0700 Subject: [PATCH 199/531] Update README.md --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index 8eb2845d4..c0fe81a14 100644 --- a/README.md +++ b/README.md @@ -71,7 +71,7 @@ Please make sure to propose the translation of **only one** cheatsheet per pull |Cheatsheet topic|العَرَبِيَّة|עִבְרִית|हिन्दी|Türkçe|Русский|Italiano| |:---|:---:|:---:|:---:|:---:|:---:|:---:| -|Deep learning|done|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/37)|done|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/21)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/78)| +|Deep learning|done|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/156)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/37)|done|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/21)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/78)| |Supervised learning|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/87)|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/46)|done|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/21)|not started| |Unsupervised learning|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/88)|not started|not started|done|not started|not started| |ML tips and tricks|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/83)|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/40)|done|not started|not started| From d9b8492afc0f977a0112a83d3a65c347541cad14 Mon Sep 17 00:00:00 2001 From: Robert Altena Date: Sun, 2 Jun 2019 09:40:31 +0900 Subject: [PATCH 200/531] up to line 270. --- ja/refresher-linear-algebra.md | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/ja/refresher-linear-algebra.md b/ja/refresher-linear-algebra.md index 85f381cc6..dba624443 100644 --- a/ja/refresher-linear-algebra.md +++ b/ja/refresher-linear-algebra.md @@ -249,25 +249,25 @@ x∈V、一般的に使用されるノルムは、以下の表にまとめられ **42. Positive semi-definite matrix ― A matrix A∈Rn×n is positive semi-definite (PSD) and is noted A⪰0 if we have:** ⟶ - +半正定値行列 ― 以下の式が成り立つとき、行列 A∈Rn×n、 A⪰0 は半正定値(PSD)
**43. Remark: similarly, a matrix A is said to be positive definite, and is noted A≻0, if it is a PSD matrix which satisfies for all non-zero vector x, xTAx>0.** ⟶ - +備考: 同様に、行列Aは、正定値行列であると言われ、A≻0、それが全ての非ゼロベクトルを満足するPSD行列である場合と表記される。
**44. Eigenvalue, eigenvector ― Given a matrix A∈Rn×n, λ is said to be an eigenvalue of A if there exists a vector z∈Rn∖{0}, called eigenvector, such that we have:** ⟶ - +固有値、固有ベクトル ― 与えられた行列A∈Rn×n。以下の式が成り立つとき、もしベクトルz∈Rn∖{0}、固有ベクトルと呼ばれる、が存在する場合ならばλはAの固有値であると言われる:
**45. Spectral theorem ― Let A∈Rn×n. If A is symmetric, then A is diagonalizable by a real orthogonal matrix U∈Rn×n. By noting Λ=diag(λ1,...,λn), we have:** ⟶ - +スペクトル定理 ― A∈Rn×nとする。 Aが対称ならば、Aは実直交行列U∈Rn×nによって対角化可能です。Λ=diag(λ1,...,λn)と書くと、次のようになります。
**46. diagonal** From 59b83f605ebbeab523e55ab66039329b49257b71 Mon Sep 17 00:00:00 2001 From: tuananhhedspibk Date: Sun, 2 Jun 2019 15:10:32 +0900 Subject: [PATCH 201/531] [ja] Convolutional Neural Networks --- ja/convolutional-neural-networks.md | 15 ++++++++------- 1 file changed, 8 insertions(+), 7 deletions(-) diff --git a/ja/convolutional-neural-networks.md b/ja/convolutional-neural-networks.md index 58ae9df33..2120cde5d 100644 --- a/ja/convolutional-neural-networks.md +++ b/ja/convolutional-neural-networks.md @@ -172,7 +172,8 @@ **25. Dimensions of a filter ― A filter of size F×F applied to an input containing C channels is a F×F×C volume that performs convolutions on an input of size I×I×C and produces an output feature map (also called activation map) of size O×O×1.** -⟶ +⟶ フィルタの寸法 - Cチャネルを含まれている入力に適用されるFxFサイズのフィルタは0x0x1サイズの出力特徴図(活性化マップとも呼ばれている)を作り出し、IxIxCサイズの入力に対して畳み込みを実施するFxFxCボリュームです。 +
@@ -186,7 +187,7 @@ **27. Remark: the application of K filters of size F×F results in an output feature map of size O×O×K.** -⟶ +⟶ 注意: FxFサイズのK個別のフィルタを適用すると、0x0xKサイズの出力特徴図を得られる。
@@ -277,7 +278,7 @@ **40. [Input is flattened, One bias parameter per neuron, The number of FC neurons is free of structural constraints]** -⟶ +⟶ [入力は平坦化される, ニューラルごとにひとつのバイアスパラメータ, FCニューラルの数は構造制約がない]
@@ -431,7 +432,7 @@ **62. Non-max suppression ― The non-max suppression technique aims at removing duplicate overlapping bounding boxes of a same object by selecting the most representative ones. After having removed all boxes having a probability prediction lower than 0.6, the following steps are repeated while there are boxes remaining:** -⟶ +⟶ 非最大抑制 - 非最大抑制技術のねらいは最も代表的なもの選択によって同物体の重複する重なり合う境界ボックスを除去することです。
@@ -599,14 +600,14 @@ Fast R-CNNやFaster R-CNNなど、アルゴリズムをより高い速度に実 **86. Style matrix ― The style matrix G[l] of a given layer l is a Gram matrix where each of its elements G[l]kk′ quantifies how correlated the channels k and k′ are. It is defined with respect to activations a[l] as follows:** -⟶ +⟶ スタイル行列 - 与えられた層lのスタイル行列 G[l]はグラム配列で、各要素G[l]kk′がチャネルkとk′の相関関係を定量化する。
**87. Remark: the style matrix for the style image and the generated image are noted G[l] (S) and G[l] (G) respectively.** -⟶ +⟶ 注意: スタイル画像及び生成された画像に対するスタイル行列はそれぞれG[l] (S)、G[l] (G)と表示される。
@@ -627,7 +628,7 @@ Fast R-CNNやFaster R-CNNなど、アルゴリズムをより高い速度に実 **90. Remark: a higher value of α will make the model care more about the content while a higher value of β will make it care more about the style.** -⟶ +⟶ 注意: αのより高い値はモデルが内容をより気にするようにさせ、βのより高い値はスタイルをより気にするようになる。
From 390d4645afcb8a180f2ef41ff6f861789c2f01e5 Mon Sep 17 00:00:00 2001 From: kamulau Date: Fri, 31 May 2019 18:35:54 +0900 Subject: [PATCH 202/531] Initial add of half the Japanese translation for deep-learning-tips-and-tricks --- jp/deep-learning-tips-and-tricks.md | 457 ++++++++++++++++++++++++++++ 1 file changed, 457 insertions(+) create mode 100644 jp/deep-learning-tips-and-tricks.md diff --git a/jp/deep-learning-tips-and-tricks.md b/jp/deep-learning-tips-and-tricks.md new file mode 100644 index 000000000..8a3757f3e --- /dev/null +++ b/jp/deep-learning-tips-and-tricks.md @@ -0,0 +1,457 @@ +**Deep Learning Tips and Tricks translation** + +
+ +**1. Deep Learning Tips and Tricks cheatsheet** + +⟶ + +
深層学習のアドバイスやコツのチートシート + + +**2. CS 230 - Deep Learning** + +⟶ CS 230 - 深層学習 + +
+ + +**3. Tips and tricks** + +⟶ アドバイスやコツ + +
+ + +**4. [Data processing, Data augmentation, Batch normalization]** + +⟶データ処理、Data augmentation (データ拡張)、Batch normalization (バッチ正規化) + +
+ + +**5. [Training a neural network, Epoch, Mini-batch, Cross-entropy loss, Backpropagation, Gradient descent, Updating weights, Gradient checking]** + +⟶ ニューラルネットワークの学習、エポック、ミニバッチ、交差エントロピー誤差、誤差逆伝播法、勾配降下法、重み更新、勾配チェック + +
+ + +**6. [Parameter tuning, Xavier initialization, Transfer learning, Learning rate, Adaptive learning rates]** + +⟶パラメータチューニング、Xavier初期化、転移学習、学習率、適応学習率 + +
+ + +**7. [Regularization, Dropout, Weight regularization, Early stopping]** + +⟶正規化、Dropout (ドロップアウト)、重みの正規化、Early stopping (学習の早々な終了) + +
+ + +**8. [Good practices, Overfitting small batch, Gradient checking]** + +⟶ + +
+ + +**9. View PDF version on GitHub** + +⟶GitHubでPDF版を見る + +
+ + +**10. Data processing** + +⟶データ処理 + +
+ + +**11. Data augmentation ― Deep learning models usually need a lot of data to be properly trained. It is often useful to get more data from the existing ones using data augmentation techniques. The main ones are summed up in the table below. More precisely, given the following input image, here are the techniques that we can apply:** + +⟶Data augmentation (データ拡張) - 大抵の場合は、深層学習のモデルを得るには大量のデータが必要です。Data augmentation という技術を用いて既存のデータから新しいデータを作り、データを増やすことがよく役立ちます。以下、Data augmentation の主な手法はまとまっています。特に、以下の画像が入力されたら、下記の技術を適用できます。 + +
+ + +**12. [Original, Flip, Rotation, Random crop]** + +⟶元の画像、反転、回転、ランダムな切り抜き + +
+ + +**13. [Image without any modification, Flipped with respect to an axis for which the meaning of the image is preserved, Rotation with a slight angle, Simulates incorrect horizon calibration, Random focus on one part of the image, Several random crops can be done in a row]** + +⟶修正なしの画像、画像の意味が変わらぬ軸における反転、わずかな回転、不正確な水平線の校正(calibration)のシミュレーション、ランダムな部分の拡張、連続のランダムな切り抜きは可能 + +
+ + +**14. [Color shift, Noise addition, Information loss, Contrast change]** + +⟶カラーシフト、ノイズの付加、情報損失、コントラスト(鮮やかさ)の修正 + +
+ + +**15. [Nuances of RGB is slightly changed, Captures noise that can occur with light exposure, Addition of noise, More tolerance to quality variation of inputs, Parts of image ignored, Mimics potential loss of parts of image, Luminosity changes, Controls difference in exposition due to time of day]** + +⟶RGBのわずかな修正、照らされ方によるノイズを捉える、ノイズの付加、入力画像の画質のバリエーションに対する耐性の増加、画像の一部を不使用、画像の一部がないときを真似る、明るさの変化、時間によるコントラストをコントロール + +
+ + +**16. Remark: data is usually augmented on the fly during training.** + +⟶備考:データ拡張は基本的には学習時に行う + +
+ + +**17. Batch normalization ― It is a step of hyperparameter γ,β that normalizes the batch {xi}. By noting μB,σ2B the mean and variance of that we want to correct to the batch, it is done as follows:** + +⟶batch normalization - ハイパーパラメータ γ、β のステップがバッチ {xi}を正規化します。平均と分散をμB,σ2Bと表記すると、以下で行えます。 + +
+ + +**18. It is usually done after a fully connected/convolutional layer and before a non-linearity layer and aims at allowing higher learning rates and reducing the strong dependence on initialization.** + +⟶高い学習率を利用可能にするのと初期化への依存を減らすのが目的で基本的には全結合層・畳み込み層のあとで非線形層の前に行います。 + +
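
For reference next to the batch normalization step above, here is a minimal NumPy sketch of the training-time forward pass, normalizing a mini-batch with its mean μB and variance σ²B and then applying the learnable scale γ and shift β; epsilon and the array shapes are assumptions for the example.

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    """Normalize a mini-batch x of shape (batch, features), then scale and shift."""
    mu = x.mean(axis=0)                     # per-feature batch mean
    var = x.var(axis=0)                     # per-feature batch variance
    x_hat = (x - mu) / np.sqrt(var + eps)   # normalized batch
    return gamma * x_hat + beta             # learnable scale gamma and shift beta
```
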
+ + +**19. Training a neural network** + +⟶ニューラルネットワークの学習 + +
+ + +**20. Definitions** + +⟶定義 + +
+ + +**21. Epoch ― In the context of training a model, epoch is a term used to refer to one iteration where the model sees the whole training set to update its weights.** + +⟶エポック - モデル学習においては、エポックとはモデルが全データで学習した一つのイテレーションのことを指します。 + +
+ + +**22. Mini-batch gradient descent ― During the training phase, updating weights is usually not based on the whole training set at once due to computation complexities or one data point due to noise issues. Instead, the update step is done on mini-batches, where the number of data points in a batch is a hyperparameter that we can tune.** + +⟶ミニバッチの勾配降下法 - 学習時には、計算量が多いため、基本的には全データに基づいて重みを更新しません。また、ノイズの影響のため、1個のデータでも更新しません。それよりむしろ、ミニバッチで重みを更新し、ミニバッチの大きさはチューニングできるハイパーパラメータの一つです。 + +
+ + +**23. Loss function ― In order to quantify how a given model performs, the loss function L is usually used to evaluate to what extent the actual outputs y are correctly predicted by the model outputs z.** + +⟶損失関数 - モデルの精度・良さを数値化するために、基本的には損失関数Lでモデルの出力zがどれくらい正解zを推測するか評価します。 + +
+ + +**24. Cross-entropy loss ― In the context of binary classification in neural networks, the cross-entropy loss L(z,y) is commonly used and is defined as follows:** + +⟶交差エントロピー誤差 - ニューラルネットワークにおける二項分類では、交差エントロピー誤差L(z,y)は多用されており、以下のように定義されています。 + +
+ + +**25. Finding optimal weights** + +⟶最適な重みの探索 + +
+ + +**26. Backpropagation ― Backpropagation is a method to update the weights in the neural network by taking into account the actual output and the desired output. The derivative with respect to each weight w is computed using the chain rule.** + +⟶ + +
+ + +**27. Using this method, each weight is updated with the rule:** + +⟶ + +
+ + +**28. Updating weights ― In a neural network, weights are updated as follows:** + +⟶ + +
+ + +**29. [Step 1: Take a batch of training data and perform forward propagation to compute the loss, Step 2: Backpropagate the loss to get the gradient of the loss with respect to each weight, Step 3: Use the gradients to update the weights of the network.]** + +⟶ + +
+ + +**30. [Forward propagation, Backpropagation, Weights update]** + +⟶ + +
+ + +**31. Parameter tuning** + +⟶ + +
+ + +**32. Weights initialization** + +⟶ + +
+ + +**33. Xavier initialization ― Instead of initializing the weights in a purely random manner, Xavier initialization enables to have initial weights that take into account characteristics that are unique to the architecture.** + +⟶ + +
+ + +**34. Transfer learning ― Training a deep learning model requires a lot of data and more importantly a lot of time. It is often useful to take advantage of pre-trained weights on huge datasets that took days/weeks to train, and leverage it towards our use case. Depending on how much data we have at hand, here are the different ways to leverage this:** + +⟶ + +
+ + +**35. [Training size, Illustration, Explanation]** + +⟶ + +
+ + +**36. [Small, Medium, Large]** + +⟶ + +
+ + +**37. [Freezes all layers, trains weights on softmax, Freezes most layers, trains weights on last layers and softmax, Trains weights on layers and softmax by initializing weights on pre-trained ones]** + +⟶ + +
+ + +**38. Optimizing convergence** + +⟶ + +
+ + +**39. Learning rate ― The learning rate, often noted α or sometimes η, indicates at which pace the weights get updated. It can be fixed or adaptively changed. The current most popular method is called Adam, which is a method that adapts the learning rate. +** + +⟶ + +
+ + +**40. Adaptive learning rates ― Letting the learning rate vary when training a model can reduce the training time and improve the numerical optimal solution. While Adam optimizer is the most commonly used technique, others can also be useful. They are summed up in the table below:** + +⟶ + +
+ + +**41. [Method, Explanation, Update of w, Update of b]** + +⟶ + +
+ + +**42. [Momentum, Dampens oscillations, Improvement to SGD, 2 parameters to tune]** + +⟶ + +
+ + +**43. [RMSprop, Root Mean Square propagation, Speeds up learning algorithm by controlling oscillations]** + +⟶ + +
+ + +**44. [Adam, Adaptive Moment estimation, Most popular method, 4 parameters to tune]** + +⟶ + +
+ + +**45. Remark: other methods include Adadelta, Adagrad and SGD.** + +⟶ + +
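
To make the adaptive learning rate idea above more concrete, here is an illustrative NumPy sketch of one Adam-style parameter update; the hyperparameter values are the commonly quoted defaults, and the function is a sketch rather than the cheatsheet's reference implementation.

```python
import numpy as np

def adam_update(w, dw, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam step: returns the updated weights and moment estimates."""
    m = beta1 * m + (1 - beta1) * dw            # first moment (momentum-like)
    v = beta2 * v + (1 - beta2) * dw ** 2       # second moment (RMSprop-like)
    m_hat = m / (1 - beta1 ** t)                # bias correction
    v_hat = v / (1 - beta2 ** t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v
```
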
+ + +**46. Regularization** + +⟶ + +
+ + +**47. Dropout ― Dropout is a technique used in neural networks to prevent overfitting the training data by dropping out neurons with probability p>0. It forces the model to avoid relying too much on particular sets of features.** + +⟶ + +
+ + +**48. Remark: most deep learning frameworks parametrize dropout through the 'keep' parameter 1−p.** + +⟶ + +
+ + +**49. Weight regularization ― In order to make sure that the weights are not too large and that the model is not overfitting the training set, regularization techniques are usually performed on the model weights. The main ones are summed up in the table below:** + +⟶ + +
+ + +**50. [LASSO, Ridge, Elastic Net]** + +⟶ + +
+ +**50 bis. Shrinks coefficients to 0, Good for variable selection, Makes coefficients smaller, Tradeoff between variable selection and small coefficients]** + +⟶ + +
+ +**51. Early stopping ― This regularization technique stops the training process as soon as the validation loss reaches a plateau or starts to increase.** + +⟶ + +
+ + +**52. [Error, Validation, Training, early stopping, Epochs]** + +⟶ + +
+ + +**53. Good practices** + +⟶ + +
+ + +**54. Overfitting small batch ― When debugging a model, it is often useful to make quick tests to see if there is any major issue with the architecture of the model itself. In particular, in order to make sure that the model can be properly trained, a mini-batch is passed inside the network to see if it can overfit on it. If it cannot, it means that the model is either too complex or not complex enough to even overfit on a small batch, let alone a normal-sized training set.** + +⟶ + +
+ + +**55. Gradient checking ― Gradient checking is a method used during the implementation of the backward pass of a neural network. It compares the value of the analytical gradient to the numerical gradient at given points and plays the role of a sanity-check for correctness.** + +⟶ + +
+ + +**56. [Type, Numerical gradient, Analytical gradient]** + +⟶ + +
+ + +**57. [Formula, Comments]** + +⟶ + +
+ + +**58. [Expensive; loss has to be computed two times per dimension, Used to verify correctness of analytical implementation, Trade-off in choosing h not too small (numerical instability) nor too large (poor gradient approximation)]** + +⟶ + +
+ + +**59. ['Exact' result, Direct computation, Used in the final implementation]** + +⟶ + +
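
The numerical-vs-analytical comparison described in the gradient checking rows above can be sketched as follows; the centered difference (f(x+h)−f(x−h))/(2h) and the toy loss are illustrative choices under assumed values.

```python
import numpy as np

def numerical_gradient(f, x, h=1e-5):
    """Centered-difference estimate of df/dx, one dimension at a time."""
    grad = np.zeros_like(x)
    for i in range(x.size):
        step = np.zeros_like(x)
        step[i] = h
        grad[i] = (f(x + step) - f(x - step)) / (2 * h)   # two evaluations per dimension
    return grad

f = lambda x: np.sum(x ** 2)        # toy loss with analytical gradient 2x
x = np.array([1.0, -2.0, 3.0])
print(numerical_gradient(f, x))     # close to [2., -4., 6.]
```
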
+ + +**60. The Deep Learning cheatsheets are now available in [target language]. + +⟶ + + +**61. Original authors** + +⟶ + +
+ +**62.Translated by X, Y and Z** + +⟶ + +
+ +**63.Reviewed by X, Y and Z** + +⟶ + +
+ +**64.View PDF version on GitHub** + +⟶ + +
+ +**65.By X and Y** + +⟶ + +
From 327999e6ae20d94aac5dfbdcd8e512189ac1c805 Mon Sep 17 00:00:00 2001 From: kamulau Date: Sun, 2 Jun 2019 22:46:15 +0900 Subject: [PATCH 203/531] Completed translations up until #42 --- jp/deep-learning-tips-and-tricks.md | 32 ++++++++++++++--------------- 1 file changed, 16 insertions(+), 16 deletions(-) diff --git a/jp/deep-learning-tips-and-tricks.md b/jp/deep-learning-tips-and-tricks.md index 8a3757f3e..10e3b4939 100644 --- a/jp/deep-learning-tips-and-tricks.md +++ b/jp/deep-learning-tips-and-tricks.md @@ -181,89 +181,89 @@ ⟶ -
+
誤差逆伝播法 - 実際の出力と期待の出力の差に基づいてニューラルネットワークの重みを更新する手法です。チェーンルールを用いて各重みで微分をとります。 **27. Using this method, each weight is updated with the rule:** -⟶ +⟶各重みは以下のルールで重みを更新します。
**28. Updating weights ― In a neural network, weights are updated as follows:** -⟶ +⟶重みの更新 - ニューラルネットワークでは、重みは以下の通り更新します。
**29. [Step 1: Take a batch of training data and perform forward propagation to compute the loss, Step 2: Backpropagate the loss to get the gradient of the loss with respect to each weight, Step 3: Use the gradients to update the weights of the network.]** -⟶ +⟶ステップ1:訓練データのバッチでフォワードプロパゲーションで損失を求めます。ステップ2:損失を用いて逆伝播法を行い勾配をえます。ステップ3:勾配を用いて重みを更新します。
**30. [Forward propagation, Backpropagation, Weights update]** -⟶ +⟶伝播法、逆伝播法、重みの更新
**31. Parameter tuning** -⟶ +⟶パラメータチューニング
**32. Weights initialization** -⟶ +⟶重みの初期化
**33. Xavier initialization ― Instead of initializing the weights in a purely random manner, Xavier initialization enables to have initial weights that take into account characteristics that are unique to the architecture.** -⟶ +⟶Xavier初期化 - ランダムで重みを初期化するよりもむしろニューラルネットワークのアーキテクチャの特徴を用いて重みを初期化する手法です。
**34. Transfer learning ― Training a deep learning model requires a lot of data and more importantly a lot of time. It is often useful to take advantage of pre-trained weights on huge datasets that took days/weeks to train, and leverage it towards our use case. Depending on how much data we have at hand, here are the different ways to leverage this:** -⟶ +⟶転移学習 - 深層学習のモデルを学習させるには大量のデータと、それ以上に時間が必要です。膨大なデータで数日・数週間をかけて学習済みのモデルを利用し、自分のユースケースに活かせることが多いです。データの量次第では、以下の生かす方法があります。
**35. [Training size, Illustration, Explanation]** -⟶ +⟶トレーニングサイズ、イラストレーション、解説
**36. [Small, Medium, Large]** -⟶ +⟶スモール、ミディアム、ラージ
**37. [Freezes all layers, trains weights on softmax, Freezes most layers, trains weights on last layers and softmax, Trains weights on layers and softmax by initializing weights on pre-trained ones]** -⟶ +⟶全層を凍結、softmaxで重みを学習、ほぼ全部の層を凍結、最終層とsoftmaxで学習、学習済みの重みで初期化することで層とsoftmaxで学習
**38. Optimizing convergence** -⟶ +⟶収束の最適化
@@ -273,19 +273,19 @@ ⟶ -
+
学習率 - αやηとよく表記される学習率とは、重みの更新の速さを表現します。固定で指定するか、もしくは適応的に変えます。もっとも多用されている手法は、適切に学習率を変える Adam です。 **40. Adaptive learning rates ― Letting the learning rate vary when training a model can reduce the training time and improve the numerical optimal solution. While Adam optimizer is the most commonly used technique, others can also be useful. They are summed up in the table below:** -⟶ +⟶適応学習率法 - 学習時間の短縮や精度の向上のために学習率を変更することです。Adamがもっとも多用されている手法だが、他の手法も役に立つことがあります。以下の一覧表で適応学習率法がまとまっています。
**41. [Method, Explanation, Update of w, Update of b]** -⟶ +⟶手法、解説、wの更新、bの更新
From 2fe237721bc3247c8cd93d8e2db3540ed520bd38 Mon Sep 17 00:00:00 2001 From: kamulau Date: Sun, 2 Jun 2019 23:03:33 +0900 Subject: [PATCH 204/531] Finished translation up to number 50 --- jp/deep-learning-tips-and-tricks.md | 18 +++++++++--------- 1 file changed, 9 insertions(+), 9 deletions(-) diff --git a/jp/deep-learning-tips-and-tricks.md b/jp/deep-learning-tips-and-tricks.md index 10e3b4939..b5c00b0eb 100644 --- a/jp/deep-learning-tips-and-tricks.md +++ b/jp/deep-learning-tips-and-tricks.md @@ -292,63 +292,63 @@ **42. [Momentum, Dampens oscillations, Improvement to SGD, 2 parameters to tune]** -⟶ +⟶運動量、振動の減少、SGDの改良、チューニングするパラメータが2つある
**43. [RMSprop, Root Mean Square propagation, Speeds up learning algorithm by controlling oscillations]**

-⟶
+⟶RMSprop, 二乗平均平方根のプロパゲーション、振動をコントロールすることで学習アルゴリズムを高速化する

<br>
**44. [Adam, Adaptive Moment estimation, Most popular method, 4 parameters to tune]** -⟶ +⟶Adam, Adaptive Moment estimation, もっとも人気のある手法、チューニングするパラメータが4つある
**45. Remark: other methods include Adadelta, Adagrad and SGD.** -⟶ +⟶備考:他にAdadelta, Adagrad, SGD などの手法があります。
**46. Regularization** -⟶ +⟶正規化
**47. Dropout ― Dropout is a technique used in neural networks to prevent overfitting the training data by dropping out neurons with probability p>0. It forces the model to avoid relying too much on particular sets of features.** -⟶ +⟶ドロップアウト - ドロップアウトとは、ニューラルネットワークで過学習を避けるために p>0の確率でノードをドロップアウト(無効化に)します。モデルを特定の特徴量に依存しすぎることを強制的に避けさせます。
**48. Remark: most deep learning frameworks parametrize dropout through the 'keep' parameter 1−p.** -⟶ +⟶備考:ほとんどの深層学習のフレームワークでは、ドロップアウトを'keep'というパラメータ(1-p)でパラメータ化します。
**49. Weight regularization ― In order to make sure that the weights are not too large and that the model is not overfitting the training set, regularization techniques are usually performed on the model weights. The main ones are summed up in the table below:** -⟶ +⟶重みの最適化 - 重みが大きくなりすぎず、モデルが過学習しないために、モデルの重みに対して正規化を行います。主な正規化手法は以下でまとまっています。
**50. [LASSO, Ridge, Elastic Net]** -⟶ +⟶LASSO, Ridge, Elastic Net
From 60014627763a8cc5d28bdd85e4d4f10436813b76 Mon Sep 17 00:00:00 2001 From: tuananhhedspibk Date: Sun, 2 Jun 2019 23:07:22 +0900 Subject: [PATCH 205/531] [ja] Convolutional Neural Networks --- ja/convolutional-neural-networks.md | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/ja/convolutional-neural-networks.md b/ja/convolutional-neural-networks.md index 2120cde5d..931a029e7 100644 --- a/ja/convolutional-neural-networks.md +++ b/ja/convolutional-neural-networks.md @@ -172,7 +172,7 @@ **25. Dimensions of a filter ― A filter of size F×F applied to an input containing C channels is a F×F×C volume that performs convolutions on an input of size I×I×C and produces an output feature map (also called activation map) of size O×O×1.** -⟶ フィルタの寸法 - Cチャネルを含まれている入力に適用されるFxFサイズのフィルタは0x0x1サイズの出力特徴図(活性化マップとも呼ばれている)を作り出し、IxIxCサイズの入力に対して畳み込みを実施するFxFxCボリュームです。 +⟶ フィルタの寸法 - C個別のチャネルを含む入力に適用されるFxFサイズのフィルタは0x0x1サイズの出力特徴図(活性化マップとも呼ばれている)を作り出し、IxIxCサイズの入力に対して畳み込みを実施するFxFxCボリュームです。
@@ -201,7 +201,7 @@ **29. Zero-padding ― Zero-padding denotes the process of adding P zeroes to each side of the boundaries of the input. This value can either be manually specified or automatically set through one of the three modes detailed below:** -⟶ +⟶ ゼロパディング - ゼロパディングは入力の境界線の各側にP個別のゼロ追加プロセスを表す。この値は手動で指定されることも、以下に詳述する3つのモードのいずれを通じて自動的に設定されることもできる。
@@ -215,7 +215,7 @@ **31. [No padding, Drops last convolution if dimensions do not match, Padding such that feature map size has size ⌈IS⌉, Output size is mathematically convenient, Also called 'half' padding, Maximum padding such that end convolutions are applied on the limits of the input, Filter 'sees' the input end-to-end]** -⟶ +⟶ [パディングなし, もし寸法が一致しなかったら最後の畳み込みを落とす, 特徴図のサイズが[IS]サイズになるようなパディング, 出力サイズは数学的に便利です, ハーフパディングとも呼ばれる, 入力の限界に端部畳み込みが適用されるような最大パディング, フィルタはエンドツーエンド入力を観察する]
@@ -432,7 +432,7 @@ **62. Non-max suppression ― The non-max suppression technique aims at removing duplicate overlapping bounding boxes of a same object by selecting the most representative ones. After having removed all boxes having a probability prediction lower than 0.6, the following steps are repeated while there are boxes remaining:** -⟶ 非最大抑制 - 非最大抑制技術のねらいは最も代表的なもの選択によって同物体の重複する重なり合う境界ボックスを除去することです。 +⟶ 非最大抑制 - 非最大抑制技術のねらいは最も代表的なもの選択によって同物体の重複する重なり合う境界ボックスを除去することです。0.6未満予測確率があるボックスを全て除去した後、残りのボックスがある間に以下のステップが繰り返される。
@@ -467,7 +467,7 @@ **67. where pc is the probability of detecting an object, bx,by,bh,bw are the properties of the detected bouding box, c1,...,cp is a one-hot representation of which of the p classes were detected, and k is the number of anchor boxes.** -⟶ ここで、pcは物体認識の確率、bx,by,bh,bwはバウンディングボックスのプロパーティ、c1, ..., cpはpクラスのうちどれが検出されたかのワンホット表現です。 +⟶ ここで、pcは物体認識の確率、bx,by,bh,bwはバウンディングボックスのプロパーティ、c1, ..., cpはpクラスのうちどれが検出されたかのワンホット表現で、kはアンカーボックスの数です。
From 60758dc212821725b9f19d7336fdf224d72d3b55 Mon Sep 17 00:00:00 2001 From: Afshine Amidi <26204670+afshinea@users.noreply.github.com> Date: Sun, 2 Jun 2019 11:28:11 -0700 Subject: [PATCH 206/531] Update README.md --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index c0fe81a14..dd233ff65 100644 --- a/README.md +++ b/README.md @@ -39,7 +39,7 @@ Please make sure to propose the translation of **only one** cheatsheet per pull |:---|:---:|:---:|:---:|:---:|:---:|:---:| |Convolutional Neural Nets|not started|done|done|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/145)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/128)|not started| |Recurrent Neural Nets|not started|done|done|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/146)|not started|not started| -|DL tips and tricks|not started|done|done|not started|not started|not started| +|DL tips and tricks|not started|done|done|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/157)|not started|not started| |Cheatsheet topic|العَرَبِيَّة|עִבְרִית|हिन्दी|Türkçe|Русский|Italiano| |:---|:---:|:---:|:---:|:---:|:---:|:---:| From c95fe7c763ca1b2d55a7dc2c6096b643eafb6c38 Mon Sep 17 00:00:00 2001 From: kamulau Date: Mon, 3 Jun 2019 12:55:05 +0900 Subject: [PATCH 207/531] Finished rest of translations. --- jp/deep-learning-tips-and-tricks.md | 34 ++++++++++++++--------------- 1 file changed, 17 insertions(+), 17 deletions(-) diff --git a/jp/deep-learning-tips-and-tricks.md b/jp/deep-learning-tips-and-tricks.md index b5c00b0eb..75cfe29a5 100644 --- a/jp/deep-learning-tips-and-tricks.md +++ b/jp/deep-learning-tips-and-tricks.md @@ -6,7 +6,7 @@ ⟶ -
深層学習のアドバイスやコツのチートシート +
深層学習(ディープラーニング)のアドバイスやコツのチートシート **2. CS 230 - Deep Learning** @@ -354,104 +354,104 @@ **50 bis. Shrinks coefficients to 0, Good for variable selection, Makes coefficients smaller, Tradeoff between variable selection and small coefficients]** -⟶ +⟶bis. 係数を0へ小さくする、変数選択に良い、係数を小さくする、変数選択と小さい係数のトレードオフ
**51. Early stopping ― This regularization technique stops the training process as soon as the validation loss reaches a plateau or starts to increase.** -⟶ +⟶Early stopping - バリデーションの損失が収束するか、あるいは増加し始めたときに学習を早々に止める正規方法
**52. [Error, Validation, Training, early stopping, Epochs]** -⟶ +⟶損失、評価、学習、early stopping、エポック
**53. Good practices** -⟶ +⟶おすすめの技法
**54. Overfitting small batch ― When debugging a model, it is often useful to make quick tests to see if there is any major issue with the architecture of the model itself. In particular, in order to make sure that the model can be properly trained, a mini-batch is passed inside the network to see if it can overfit on it. If it cannot, it means that the model is either too complex or not complex enough to even overfit on a small batch, let alone a normal-sized training set.** -⟶ +⟶小さいバッチの過学習 - モデルをデバッグするときに、モデルのアーキテクチャを検証するために小さいテストを作ることが役立つことが多いです。特に、モデルを正しく学習できるのを確認するために、ミニバッチでネットワークを学習し、過学習が発生するかどうかチェックすることがあります。モデルが複雑すぎるか、単純すぎると、普通のトレーニングセットどころか、小さいバッチでさえ過学習できないのです。
**55. Gradient checking ― Gradient checking is a method used during the implementation of the backward pass of a neural network. It compares the value of the analytical gradient to the numerical gradient at given points and plays the role of a sanity-check for correctness.** -⟶ +⟶Gradient checking (勾配チェック) - Gradient checking とは、ニューラルネットワークで逆伝播法時に用いられる手法です。特定の点で数値勾配と逆伝播法時に計算した勾配を比較する手法で、逆伝播法の実装が正しいことなど確認できます。
**56. [Type, Numerical gradient, Analytical gradient]** -⟶ +⟶種類、数値勾配、勾配
**57. [Formula, Comments]** -⟶ +⟶数式、コメント
**58. [Expensive; loss has to be computed two times per dimension, Used to verify correctness of analytical implementation, Trade-off in choosing h not too small (numerical instability) nor too large (poor gradient approximation)]** -⟶ +⟶計算量が多い;損失を次元ごとに2回計算する必要がある、勾配の実装のチェックに用いられる、hが小さすぎると数値的不安定だが、大きすぎると近似が正確でなくなるというトレードオフががある
**59. ['Exact' result, Direct computation, Used in the final implementation]** -⟶ +⟶エグザクトの勾配、直接計算する、最終的な実装で使われる
**60. The Deep Learning cheatsheets are now available in [target language]. -⟶ +⟶深層学習のチートシートは[対象言語]で利用可能になりました。 **61. Original authors** -⟶ +⟶原著者
**62.Translated by X, Y and Z** -⟶ +⟶X,Y,そしてZにより翻訳されました。
**63.Reviewed by X, Y and Z** -⟶ +⟶X,Y,そしてZにより校正されました。
**64.View PDF version on GitHub** -⟶ +⟶GitHubでPDF版を見る
**65.By X and Y** -⟶ +⟶XそしてYによる。
From a0a0f3d5eb379415865201f1c3afdf4e46967a3d Mon Sep 17 00:00:00 2001 From: kamulau Date: Mon, 3 Jun 2019 13:04:04 +0900 Subject: [PATCH 208/531] Added translation for analytical gradient --- jp/deep-learning-tips-and-tricks.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/jp/deep-learning-tips-and-tricks.md b/jp/deep-learning-tips-and-tricks.md index 75cfe29a5..8c776d52c 100644 --- a/jp/deep-learning-tips-and-tricks.md +++ b/jp/deep-learning-tips-and-tricks.md @@ -388,14 +388,14 @@ **55. Gradient checking ― Gradient checking is a method used during the implementation of the backward pass of a neural network. It compares the value of the analytical gradient to the numerical gradient at given points and plays the role of a sanity-check for correctness.** -⟶Gradient checking (勾配チェック) - Gradient checking とは、ニューラルネットワークで逆伝播法時に用いられる手法です。特定の点で数値勾配と逆伝播法時に計算した勾配を比較する手法で、逆伝播法の実装が正しいことなど確認できます。 +⟶Gradient checking (勾配チェック) - Gradient checking とは、ニューラルネットワークで逆伝播法時に用いられる手法です。特定の点で数値計算で計算した勾配と逆伝播法時に計算した勾配を比較する手法で、逆伝播法の実装が正しいことなど確認できます。
**56. [Type, Numerical gradient, Analytical gradient]** -⟶種類、数値勾配、勾配 +⟶種類、数値勾配、勾配の理論値
From 67cab5ad4253f13213944f4c180a01511ce664e9 Mon Sep 17 00:00:00 2001 From: tt-anh-eole Date: Mon, 3 Jun 2019 16:11:49 +0900 Subject: [PATCH 209/531] [ja] Convolutional Neural Networks --- ja/convolutional-neural-networks.md | 22 +++++++++++----------- 1 file changed, 11 insertions(+), 11 deletions(-) diff --git a/ja/convolutional-neural-networks.md b/ja/convolutional-neural-networks.md index 931a029e7..138994a3d 100644 --- a/ja/convolutional-neural-networks.md +++ b/ja/convolutional-neural-networks.md @@ -229,7 +229,7 @@ **33. Parameter compatibility in convolution layer ― By noting I the length of the input volume size, F the length of the filter, P the amount of zero padding, S the stride, then the output size O of the feature map along that dimension is given by:** -⟶ +⟶ 畳み込み層内のパラメータ互換性 - Iを入力ボリュームサイズの長さ、Fをフィルタの長さ、Pをゼロパディングの量, Sをストライドとすると、その寸法に沿った特徴図の出力サイズOは次式で与えられる:
@@ -418,7 +418,7 @@ **60. Remark: we always have IoU∈[0,1]. By convention, a predicted bounding box Bp is considered as being reasonably good if IoU(Bp,Ba)⩾0.5.** -⟶ +⟶ 注意: 常にIoU∈[0,1]を持ってます。慣例により、予測されたバウンディングボックスBpはIoU(Bp,Ba)⩾0.5の場合適度に良いと見なされる。
@@ -481,35 +481,35 @@ **69. [Original image, Division in GxG grid, Bounding box prediction, Non-max suppression]** -⟶ [原画像, GxGグリッドでの分割, 物体検出, 非最大抑制] +⟶ [元の画像, GxGグリッドでの分割, 物体検出, 非最大抑制]
**70. Remark: when pc=0, then the network does not detect any object. In that case, the corresponding predictions bx,...,cp have to be ignored.** -⟶ +⟶ 注意: pc=0時、ネットワークは物体を検出しません。その場合には適当な予測 bx, ..., cpそれぞれは無視する必要があります。
**71. R-CNN ― Region with Convolutional Neural Networks (R-CNN) is an object detection algorithm that first segments the image to find potential relevant bounding boxes and then run the detection algorithm to find most probable objects in those bounding boxes.** -⟶ +⟶ R-CNN - 畳み込みニューラルネットワークを利用した領域は最初に潜在的な関連する境界ボックスを見つけるため画像を分割し、次にそれらの境界ボックス内の最も可能性の高いオブジェクトを見つけるため検出アルゴリズムを実行する物体検出アルゴリズムです。
**72. [Original image, Segmentation, Bounding box prediction, Non-max suppression]** -⟶ [原画像, セグメンテーション, 物体予測, 非最大抑制] +⟶ [元の画像, セグメンテーション, 物体予測, 非最大抑制]
**73. Remark: although the original algorithm is computationally expensive and slow, newer architectures enabled the algorithm to run faster, such as Fast R-CNN and Faster R-CNN.** -⟶ 注意: 原アルゴリズムは計算コストが高くて遅くても、より新たなアーキテクチャでは、 +⟶ 注意: 元のアルゴリズムは計算コストが高くて遅くても、より新たなアーキテクチャでは、 Fast R-CNNやFaster R-CNNなど、アルゴリズムをより高い速度に実行できる。
@@ -537,7 +537,7 @@ Fast R-CNNやFaster R-CNNなど、アルゴリズムをより高い速度に実 **77. [Is this the correct person?, One-to-one lookup, Is this one of the K persons in the database?, One-to-many lookup]** -⟶ +⟶ [これは正しい人ですか?, 一対一見上げる, これはデータベース内のk人のうちの一人ですか, 一対多見上げる]
@@ -593,7 +593,7 @@ Fast R-CNNやFaster R-CNNなど、アルゴリズムをより高い速度に実 **85. Content cost function ― The content cost function Jcontent(C,G) is used to determine how the generated image G differs from the original content image C. It is defined as follows:** -⟶ +⟶ コンテンツコスト関数 - Jcontent(C, G)というコンテンツコスト関数は元のコンテンツ画像Cと生成された画像Gとの違いを決定するため利用される。以下のように定義される:
@@ -621,7 +621,7 @@ Fast R-CNNやFaster R-CNNなど、アルゴリズムをより高い速度に実 **89. Overall cost function ― The overall cost function is defined as being a combination of the content and style cost functions, weighted by parameters α,β, as follows:** -⟶ +⟶ 全体コスト関数 - 全体コスト関数は以下のようにパラメータα,βによって重み付けされ、スタイルコスト関数とコンテンツの組み合わせた物として定義される:
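A minimal sketch putting items 85 and 89 together; the layer activations `a_C`, `a_G` and the weights α=10, β=40 are illustrative choices, not values fixed by the cheatsheet:

```python
import numpy as np

def content_cost(a_C, a_G):
    # J_content(C,G) = 1/2 * ||a(C) - a(G)||^2 on a chosen hidden-layer activation
    return 0.5 * np.sum((a_C - a_G) ** 2)

def overall_cost(j_content, j_style, alpha=10.0, beta=40.0):
    # J(G) = alpha * J_content(C,G) + beta * J_style(S,G)
    return alpha * j_content + beta * j_style

a_C, a_G = np.random.randn(8, 8, 16), np.random.randn(8, 8, 16)
print(overall_cost(content_cost(a_C, a_G), j_style=1.3))
```

<br>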
@@ -663,7 +663,7 @@ Fast R-CNNやFaster R-CNNなど、アルゴリズムをより高い速度に実

**95. ResNet ― The Residual Network architecture (also called ResNet) uses residual blocks with a high number of layers meant to decrease the training error. The residual block has the following characterizing equation:**

-⟶

+⟶ ResNet - 残差ネットワークアーキテクチャ(ResNetとも呼ばれる)はトレーニングエラーを減らすため多数の層がある残差ブロックを使用する。残差ブロックは次の特徴的な方程式を有する:

<br>
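A framework-free sketch of the residual-block equation a[l+2] = g(a[l] + z[l+2]) of item 95; biases and convolutions are omitted, and plain matrices stand in for the two layers:

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0)

def residual_block(a_l, w1, w2):
    # Two layers plus the identity shortcut: a[l+2] = g(z[l+2] + a[l])
    z1 = w1 @ a_l
    a1 = relu(z1)
    z2 = w2 @ a1
    return relu(z2 + a_l)

a = np.random.randn(4)
w1, w2 = np.random.randn(4, 4), np.random.randn(4, 4)
print(residual_block(a, w1, w2))
```

<br>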
From 3835afdb8936a9fa95c5520d3cd29d87af55bd25 Mon Sep 17 00:00:00 2001 From: tt-anh-eole Date: Mon, 3 Jun 2019 18:40:49 +0900 Subject: [PATCH 210/531] [ja] Convolutional Neural Networks --- ja/convolutional-neural-networks.md | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/ja/convolutional-neural-networks.md b/ja/convolutional-neural-networks.md index 138994a3d..df22f8e60 100644 --- a/ja/convolutional-neural-networks.md +++ b/ja/convolutional-neural-networks.md @@ -250,7 +250,7 @@ **36. Understanding the complexity of the model ― In order to assess the complexity of a model, it is often useful to determine the number of parameters that its architecture will have. In a given layer of a convolutional neural network, it is done as follows:** -⟶ +⟶ モデルの複雑さを理解する - モデルの複雑さを評価する為モデルのアーキテクチャが持つことになるパラメータの数を決定することはしばしば有用です。畳み込みニューラルネットワーク内で、以下のように行なわれる。
@@ -285,7 +285,7 @@ **41. Receptive field ― The receptive field at layer k is the area denoted Rk×Rk of the input that each pixel of the k-th activation map can 'see'. By calling Fj the filter size of layer j and Si the stride value of layer i and with the convention S0=1, the receptive field at layer k can be computed with the formula:** -⟶ +⟶ 受容的なフィルド - 層kの受容的なフィルドはk番目の活性化図の各ピックセルが見られる入力のRkxRkを表示されるエリアです。
@@ -306,7 +306,7 @@

**44. Rectified Linear Unit ― The rectified linear unit layer (ReLU) is an activation function g that is used on all elements of the volume. It aims at introducing non-linearities to the network. Its variants are summarized in the table below:**

-⟶

+⟶ 整流線形ユニット - 整流線形ユニット層(ReLU)はボリュームの全ての要素に利用される活性化関数gです。ReLUの目的は非線型性をネットワークに導入することです。ReLUの変種は以下の表でまとめられる:

<br>
@@ -425,7 +425,7 @@ **61. Anchor boxes ― Anchor boxing is a technique used to predict overlapping bounding boxes. In practice, the network is allowed to predict more than one box simultaneously, where each box prediction is constrained to have a given set of geometrical properties. For instance, the first prediction can potentially be a rectangular box of a given form, while the second will be another rectangular box of a different geometrical form.** -⟶ +⟶ アンカーボックス - アンカーボクシングは重複バウンディングボックスを予測する為利用される技術です。実際に、
From bab0f90a521a160ae17f577c1ad94e240fa7643a Mon Sep 17 00:00:00 2001 From: tuananhhedspibk Date: Mon, 3 Jun 2019 23:00:02 +0900 Subject: [PATCH 211/531] [ja] Convolutional Neural Networks --- ja/convolutional-neural-networks.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/ja/convolutional-neural-networks.md b/ja/convolutional-neural-networks.md index df22f8e60..e4bc1ba70 100644 --- a/ja/convolutional-neural-networks.md +++ b/ja/convolutional-neural-networks.md @@ -285,7 +285,7 @@ **41. Receptive field ― The receptive field at layer k is the area denoted Rk×Rk of the input that each pixel of the k-th activation map can 'see'. By calling Fj the filter size of layer j and Si the stride value of layer i and with the convention S0=1, the receptive field at layer k can be computed with the formula:** -⟶ 受容的なフィルド - 層kの受容的なフィルドはk番目の活性化図の各ピックセルが見られる入力のRkxRkを表示されるエリアです。 +⟶ 受容的なフィルド - 層kの受容的なフィルドはk番目の活性化図の各ピックセルが見られる入力のRkxRkを表示されるエリアです。j層のフィルタサイズをFj、i層のストライド値をSi、規約S0=1とすると、k層での受容的なフィルドは式で計算される:
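A small helper for the receptive-field formula of item 41, Rk = 1 + Σj (Fj − 1) ∏i<j Si with S0 = 1; the 3x3, stride-1 numbers below reproduce the worked example of item 42:

```python
def receptive_field(filter_sizes, strides):
    r, jump = 1, 1  # jump accumulates the product of the strides of earlier layers (S0 = 1)
    for f, s in zip(filter_sizes, strides):
        r += (f - 1) * jump
        jump *= s
    return r

print(receptive_field([3, 3], [1, 1]))  # 1 + 2*1 + 2*1 = 5
```

<br>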
@@ -418,14 +418,14 @@ **60. Remark: we always have IoU∈[0,1]. By convention, a predicted bounding box Bp is considered as being reasonably good if IoU(Bp,Ba)⩾0.5.** -⟶ 注意: 常にIoU∈[0,1]を持ってます。慣例により、予測されたバウンディングボックスBpはIoU(Bp,Ba)⩾0.5の場合適度に良いと見なされる。 +⟶ 注意: 常にIoU∈[0,1]を持ってます。規約により、予測されたバウンディングボックスBpはIoU(Bp,Ba)⩾0.5の場合適度に良いと見なされる。
**61. Anchor boxes ― Anchor boxing is a technique used to predict overlapping bounding boxes. In practice, the network is allowed to predict more than one box simultaneously, where each box prediction is constrained to have a given set of geometrical properties. For instance, the first prediction can potentially be a rectangular box of a given form, while the second will be another rectangular box of a different geometrical form.**

-⟶ アンカーボックス - アンカーボクシングは重複バウンディングボックスを予測する為利用される技術です。実際に、

+⟶ アンカーボックス - アンカーボクシングは重複バウンディングボックスを予測する為利用される技術です。実際には、同時に複数のボックスを予測することを許可されており、各ボックス予測は与えられた幾何学的なプロパティのセットを持つように制約される。例えば、最初の予測は与えられたフォームの長方形のボックスになる可能性があり、二番目のボックスは異なる幾何学的なフォームの別の長方形になります。

<br>
From e297c1e8b52f07b33daae4983b9c4933170bb602 Mon Sep 17 00:00:00 2001 From: tt-anh-eole Date: Tue, 4 Jun 2019 18:02:19 +0900 Subject: [PATCH 212/531] [ja] Convolutional Neural Networks --- ja/convolutional-neural-networks.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/ja/convolutional-neural-networks.md b/ja/convolutional-neural-networks.md index e4bc1ba70..b1a5f192b 100644 --- a/ja/convolutional-neural-networks.md +++ b/ja/convolutional-neural-networks.md @@ -320,7 +320,7 @@ **46. [Non-linearity complexities biologically interpretable, Addresses dying ReLU issue for negative values, Differentiable everywhere]** -⟶ +⟶ [生物学的に解釈可能な非線形複雑性, 負の値の為dyingReLUの問題を示す,どこても差別化可能]
@@ -411,7 +411,7 @@

**59. Intersection over Union ― Intersection over Union, also known as IoU, is a function that quantifies how correctly positioned a predicted bounding box Bp is over the actual bounding box Ba. It is defined as:**

-⟶

+⟶ Intersection over Union - Intersection over Union(IoUとも呼ばれる)は予測バウンディングボックスBpが実際のバウンディングボックスBaに対してどれだけ正しく配置されているかを定量化する関数です。次のように定義される:

<br>
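A sketch of the IoU of item 59 for axis-aligned boxes; the (x1, y1, x2, y2) corner format is an assumption of this example:

```python
def iou(box_p, box_a):
    # Intersection rectangle, clipped at zero when the boxes do not overlap
    x1, y1 = max(box_p[0], box_a[0]), max(box_p[1], box_a[1])
    x2, y2 = min(box_p[2], box_a[2]), min(box_p[3], box_a[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_p = (box_p[2] - box_p[0]) * (box_p[3] - box_p[1])
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    return inter / (area_p + area_a - inter)

print(iou((0, 0, 2, 2), (1, 1, 3, 3)))  # 1/7, below the 0.5 convention of item 60
```

<br>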
@@ -544,7 +544,7 @@ Fast R-CNNやFaster R-CNNなど、アルゴリズムをより高い速度に実

**78. One Shot Learning ― One Shot Learning is a face verification algorithm that uses a limited training set to learn a similarity function that quantifies how different two given images are. The similarity function applied to two images is often noted d(image 1,image 2).**

-⟶

+⟶ ワンショット学習 - ワンショット学習は二つの与えられた画像の違いを定量化する類似性関数を学ぶ為に限られたトレーニングセットを利用する顔認証アルゴリズムです。二つの画像に適用される類似性関数はしばしばd(画像1、画像2)と記される。

<br>
From a3e8f5a5f5c4d61db5240c1645aaddcad67e177c Mon Sep 17 00:00:00 2001 From: tuananhhedspibk Date: Tue, 4 Jun 2019 21:26:17 +0900 Subject: [PATCH 213/531] [ja] Convolutional Neural Networks --- ja/convolutional-neural-networks.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/ja/convolutional-neural-networks.md b/ja/convolutional-neural-networks.md index e4bc1ba70..3c5578540 100644 --- a/ja/convolutional-neural-networks.md +++ b/ja/convolutional-neural-networks.md @@ -383,7 +383,7 @@ **55. Detection ― In the context of object detection, different methods are used depending on whether we just want to locate the object or detect a more complex shape in the image. The two main ones are summed up in the table below:** -⟶ 検出 - 物体検出の文脈では、画像内で物体を特定するのかそれとも複雑な形状を検出するのかによって、様々な方法は使用される。二つの主なものは次の表でまとめられる。 +⟶ 検出 - 物体検出の文脈では、画像内で物体を特定するのかそれとも複雑な形状を検出するのかによって、様々な方法は使用される。二つの主なものは次の表でまとめられる:
From 268df74ead1442908068386703fe1d59932462af Mon Sep 17 00:00:00 2001 From: tuananhhedspibk Date: Tue, 4 Jun 2019 23:00:52 +0900 Subject: [PATCH 214/531] [ja] Convolutional Neural Networks --- ja/convolutional-neural-networks.md | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/ja/convolutional-neural-networks.md b/ja/convolutional-neural-networks.md index bfd1c5875..095b94dd3 100644 --- a/ja/convolutional-neural-networks.md +++ b/ja/convolutional-neural-networks.md @@ -432,7 +432,7 @@ **62. Non-max suppression ― The non-max suppression technique aims at removing duplicate overlapping bounding boxes of a same object by selecting the most representative ones. After having removed all boxes having a probability prediction lower than 0.6, the following steps are repeated while there are boxes remaining:** -⟶ 非最大抑制 - 非最大抑制技術のねらいは最も代表的なもの選択によって同物体の重複する重なり合う境界ボックスを除去することです。0.6未満予測確率があるボックスを全て除去した後、残りのボックスがある間に以下のステップが繰り返される。 +⟶ 非最大抑制 - 非最大抑制技術のねらいは最も代表的なもの選択によって同物体の重複する重なり合う境界ボックスを除去することです。0.6未満予測確率があるボックスを全て除去した後、残りのボックスがある間に以下のステップが繰り返される:
@@ -509,8 +509,8 @@ **73. Remark: although the original algorithm is computationally expensive and slow, newer architectures enabled the algorithm to run faster, such as Fast R-CNN and Faster R-CNN.** -⟶ 注意: 元のアルゴリズムは計算コストが高くて遅くても、より新たなアーキテクチャでは、 -Fast R-CNNやFaster R-CNNなど、アルゴリズムをより高い速度に実行できる。 +⟶ 注意: 元のアルゴリズムは計算コストが高くて遅くても、より新たなアーキテクチャでは、Fast R-CNNやFaster R-CNNなど、アルゴリズムをより高い速度に実行できる。 +
@@ -558,7 +558,7 @@ Fast R-CNNやFaster R-CNNなど、アルゴリズムをより高い速度に実 **80. Triplet loss ― The triplet loss ℓ is a loss function computed on the embedding representation of a triplet of images A (anchor), P (positive) and N (negative). The anchor and the positive example belong to a same class, while the negative example to another one. By calling α∈R+ the margin parameter, this loss is defined as follows:** -⟶ +⟶ トリプレット損失 - トリプレット損失ℓはトリプレットの画像A(アンカー)、P(ポジティブ)、N(負)の埋め込み表現で計算する損失関数です。アンカーとポジティブ例は同じクラスに属し、ネガティブ例は別のものに属する。マージンパラメータはα∈R+と呼ぶことによってこの損失は次のように定義される:
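A sketch of the triplet loss of item 80 on embedding vectors; squared Euclidean distance and α=0.2 are common but illustrative choices:

```python
import numpy as np

def triplet_loss(f_a, f_p, f_n, alpha=0.2):
    # max(d(A,P) - d(A,N) + alpha, 0) with d the squared Euclidean distance
    d_ap = np.sum((f_a - f_p) ** 2)
    d_an = np.sum((f_a - f_n) ** 2)
    return max(d_ap - d_an + alpha, 0.0)

a, p, n = np.array([0.0, 1.0]), np.array([0.1, 0.9]), np.array([1.0, 0.0])
print(triplet_loss(a, p, n))  # 0.0: the negative is already far enough from the anchor
```

<br>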
@@ -642,7 +642,7 @@ Fast R-CNNやFaster R-CNNなど、アルゴリズムをより高い速度に実

**92. Generative Adversarial Network ― Generative adversarial networks, also known as GANs, are composed of a generative and a discriminative model, where the generative model aims at generating the most truthful output that will be fed into the discriminative which aims at differentiating the generated and true image.**

-⟶

+⟶ 生成型敵対的ネットワーク - 生成型敵対的ネットワーク(GANsとも呼ばれる)は生成モデルと識別モデルで構成される。生成モデルの目的は、生成された画像と実像を区別することを目的とする識別モデルにフィードされる、最も真実に近い出力を生成することである。

<br>
@@ -670,7 +670,7 @@ Fast R-CNNやFaster R-CNNなど、アルゴリズムをより高い速度に実

**96. Inception Network ― This architecture uses inception modules and aims at giving a try at different convolutions in order to increase its performance through features diversification. In particular, it uses the 1×1 convolution trick to limit the computational burden.**

-⟶

+⟶ インセプションネットワーク - このアーキテクチャはインセプションモジュールを利用し、特徴多様化を通じてパフォーマンス改善の為に別の畳み込みを試すことを目的とする。特に、計算負荷を抑える為1×1畳み込みトリックを使う。

<br>
From 3134b1c90cda1168abafcb59d329f866ec5c237d Mon Sep 17 00:00:00 2001 From: kamulau Date: Thu, 6 Jun 2019 12:22:15 +0900 Subject: [PATCH 215/531] Changed name of jp directory to ja --- {jp => ja}/deep-learning-tips-and-tricks.md | 0 1 file changed, 0 insertions(+), 0 deletions(-) rename {jp => ja}/deep-learning-tips-and-tricks.md (100%) diff --git a/jp/deep-learning-tips-and-tricks.md b/ja/deep-learning-tips-and-tricks.md similarity index 100% rename from jp/deep-learning-tips-and-tricks.md rename to ja/deep-learning-tips-and-tricks.md From 7a25c7a835bcd77b4f6d79c257e4f78d3eb33c49 Mon Sep 17 00:00:00 2001 From: kamulau Date: Thu, 6 Jun 2019 12:26:42 +0900 Subject: [PATCH 216/531] Made revisions from reviewer suggestions. --- ja/deep-learning-tips-and-tricks.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/ja/deep-learning-tips-and-tricks.md b/ja/deep-learning-tips-and-tricks.md index 8c776d52c..f2828dde2 100644 --- a/ja/deep-learning-tips-and-tricks.md +++ b/ja/deep-learning-tips-and-tricks.md @@ -74,7 +74,7 @@ **11. Data augmentation ― Deep learning models usually need a lot of data to be properly trained. It is often useful to get more data from the existing ones using data augmentation techniques. The main ones are summed up in the table below. More precisely, given the following input image, here are the techniques that we can apply:** -⟶Data augmentation (データ拡張) - 大抵の場合は、深層学習のモデルを得るには大量のデータが必要です。Data augmentation という技術を用いて既存のデータから新しいデータを作り、データを増やすことがよく役立ちます。以下、Data augmentation の主な手法はまとまっています。特に、以下の画像が入力されたら、下記の技術を適用できます。 +⟶Data augmentation (データ拡張) - 大抵の場合は、深層学習のモデルを適切に訓練するには大量のデータが必要です。Data augmentation という技術を用いて既存のデータから、データを増やすことがよく役立ちます。以下、Data augmentation の主な手法はまとまっています。より正確には、以下の入力画像に対して、下記の技術を適用できます。
@@ -278,7 +278,7 @@ **40. Adaptive learning rates ― Letting the learning rate vary when training a model can reduce the training time and improve the numerical optimal solution. While Adam optimizer is the most commonly used technique, others can also be useful. They are summed up in the table below:** -⟶適応学習率法 - 学習時間の短縮や精度の向上のために学習率を変更することです。Adamがもっとも多用されている手法だが、他の手法も役に立つことがあります。以下の一覧表で適応学習率法がまとまっています。 +⟶適応学習率法 - 学習時間の短縮や精度の向上のために学習率を変更することです。Adamがもっとも多用されている手法だが、他の手法も役に立つことがあります。適応学習率法を下記の表にまとめました。
From f0446540cb6a8e606ed1502497ddf2a88f4533f1 Mon Sep 17 00:00:00 2001 From: kamulau Date: Thu, 6 Jun 2019 12:29:37 +0900 Subject: [PATCH 217/531] Fixed placement of some translations that followed the
tag --- ja/deep-learning-tips-and-tricks.md | 18 +++++++++--------- 1 file changed, 9 insertions(+), 9 deletions(-) diff --git a/ja/deep-learning-tips-and-tricks.md b/ja/deep-learning-tips-and-tricks.md index f2828dde2..3edd2e975 100644 --- a/ja/deep-learning-tips-and-tricks.md +++ b/ja/deep-learning-tips-and-tricks.md @@ -4,21 +4,21 @@ **1. Deep Learning Tips and Tricks cheatsheet** -⟶ +⟶ 深層学習(ディープラーニング)のアドバイスやコツのチートシート -
深層学習(ディープラーニング)のアドバイスやコツのチートシート +
**2. CS 230 - Deep Learning** -⟶ CS 230 - 深層学習 +⟶CS 230 - 深層学習
**3. Tips and tricks** -⟶ アドバイスやコツ +⟶アドバイスやコツ
@@ -109,7 +109,7 @@ **16. Remark: data is usually augmented on the fly during training.** -⟶備考:データ拡張は基本的には学習時に行う +⟶備考:データ拡張は基本的には学習時に行う。
@@ -179,9 +179,9 @@ **26. Backpropagation ― Backpropagation is a method to update the weights in the neural network by taking into account the actual output and the desired output. The derivative with respect to each weight w is computed using the chain rule.** -⟶ +⟶誤差逆伝播法 - 実際の出力と期待の出力の差に基づいてニューラルネットワークの重みを更新する手法です。チェーンルールを用いて各重みで微分をとります。 -
誤差逆伝播法 - 実際の出力と期待の出力の差に基づいてニューラルネットワークの重みを更新する手法です。チェーンルールを用いて各重みで微分をとります。 +
**27. Using this method, each weight is updated with the rule:** @@ -271,9 +271,9 @@ **39. Learning rate ― The learning rate, often noted α or sometimes η, indicates at which pace the weights get updated. It can be fixed or adaptively changed. The current most popular method is called Adam, which is a method that adapts the learning rate. ** -⟶ +⟶学習率 - αやηとよく表記される学習率とは、重みの更新の速さを表現します。固定で指定するか、もしくは適応的に変えます。もっとも多用されている手法は、適切に学習率を変える Adam です。 -
学習率 - αやηとよく表記される学習率とは、重みの更新の速さを表現します。固定で指定するか、もしくは適応的に変えます。もっとも多用されている手法は、適切に学習率を変える Adam です。 +
**40. Adaptive learning rates ― Letting the learning rate vary when training a model can reduce the training time and improve the numerical optimal solution. While Adam optimizer is the most commonly used technique, others can also be useful. They are summed up in the table below:** From 366218ee6f90de67c9ad25df73d744f409e65ff7 Mon Sep 17 00:00:00 2001 From: kamulau Date: Thu, 6 Jun 2019 12:31:25 +0900 Subject: [PATCH 218/531] Added missing translation --- ja/deep-learning-tips-and-tricks.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/ja/deep-learning-tips-and-tricks.md b/ja/deep-learning-tips-and-tricks.md index 3edd2e975..d3900a3c2 100644 --- a/ja/deep-learning-tips-and-tricks.md +++ b/ja/deep-learning-tips-and-tricks.md @@ -4,7 +4,7 @@ **1. Deep Learning Tips and Tricks cheatsheet** -⟶ 深層学習(ディープラーニング)のアドバイスやコツのチートシート +⟶深層学習(ディープラーニング)のアドバイスやコツのチートシート
@@ -32,7 +32,7 @@ **5. [Training a neural network, Epoch, Mini-batch, Cross-entropy loss, Backpropagation, Gradient descent, Updating weights, Gradient checking]** -⟶ ニューラルネットワークの学習、エポック、ミニバッチ、交差エントロピー誤差、誤差逆伝播法、勾配降下法、重み更新、勾配チェック +⟶ニューラルネットワークの学習、エポック、ミニバッチ、交差エントロピー誤差、誤差逆伝播法、勾配降下法、重み更新、勾配チェック
@@ -53,7 +53,7 @@ **8. [Good practices, Overfitting small batch, Gradient checking]** -⟶ +⟶おすすめの技法、小さいバッチの過学習、勾配チェック
From 82d3090f4b82ef8dbc7ef201d4ae3bb528e212bb Mon Sep 17 00:00:00 2001 From: tuananhhedspibk Date: Thu, 6 Jun 2019 22:54:20 +0900 Subject: [PATCH 219/531] vi translating --- vi/cheatsheet-deep-learning.md | 321 ++++++++ ...tsheet-machine-learning-tips-and-tricks.md | 285 +++++++ vi/cheatsheet-supervised-learning.md | 567 ++++++++++++++ vi/cheatsheet-unsupervised-learning.md | 340 +++++++++ vi/convolutional-neural-networks.md | 716 ++++++++++++++++++ vi/deep-learning-tips-and-tricks.md | 457 +++++++++++ vi/recurrent-neural-networks.md | 677 +++++++++++++++++ vi/refresher-linear-algebra.md | 339 +++++++++ vi/refresher-probability.md | 381 ++++++++++ 9 files changed, 4083 insertions(+) create mode 100644 vi/cheatsheet-deep-learning.md create mode 100644 vi/cheatsheet-machine-learning-tips-and-tricks.md create mode 100644 vi/cheatsheet-supervised-learning.md create mode 100644 vi/cheatsheet-unsupervised-learning.md create mode 100644 vi/convolutional-neural-networks.md create mode 100644 vi/deep-learning-tips-and-tricks.md create mode 100644 vi/recurrent-neural-networks.md create mode 100644 vi/refresher-linear-algebra.md create mode 100644 vi/refresher-probability.md diff --git a/vi/cheatsheet-deep-learning.md b/vi/cheatsheet-deep-learning.md new file mode 100644 index 000000000..ff3b3c508 --- /dev/null +++ b/vi/cheatsheet-deep-learning.md @@ -0,0 +1,321 @@ +**1. Deep Learning cheatsheet** + +⟶ Deep Learning cheatsheet + +
+ +**2. Neural Networks** + +⟶ Mạng Neural + +
+ +**3. Neural networks are a class of models that are built with layers. Commonly used types of neural networks include convolutional and recurrent neural networks.** + +⟶ Mạng Neural là 1 lớp của các models được xây dựng với các tầng (layers). Các loại mạng Neural thường được sử dụng bao gồm: Mạng Neural tích chập (Convolutional Neural Networks) và Mạng Neural hồi quy (Recurrent Neural Networks). + +
+ +**4. Architecture ― The vocabulary around neural networks architectures is described in the figure below:** + +⟶ Kiến trúc - Các thuật ngữ xoay quanh kiến trúc của mạng neural được mô tả như hình phía dưới + +
+ +**5. [Input layer, hidden layer, output layer]** + +⟶ [Tầng đầu vào, tầng ẩn, tầng đầu ra] + +
+ +**6. By noting i the ith layer of the network and j the jth hidden unit of the layer, we have:** + +⟶ Bằng việc kí hiệu i là tầng thứ i của mạng, j là đơn vị ẩn (hidden unit) thứ j của tầng, ta có: + +
+ +**7. where we note w, b, z the weight, bias and output respectively.** + +⟶ Chúng ta kí hiệu w, b, z tương ứng với trọng số (weights), bias và đầu ra. + +
+ +**8. Activation function ― Activation functions are used at the end of a hidden unit to introduce non-linear complexities to the model. Here are the most common ones:** + +⟶ Hàm kích hoạt (Activation function) - Hàm kích hoạt được sử dụng ở phần cuối của đơn vị ẩn để đưa ra độ phức tạp phi tuyến tính (non-linear) cho mô hình (model). Đây là những trường hợp phổ biến nhất: + +
+ +**9. [Sigmoid, Tanh, ReLU, Leaky ReLU]** + +⟶ [Sigmoid, Tanh, ReLU, Leaky ReLU] + +
+ +**10. Cross-entropy loss ― In the context of neural networks, the cross-entropy loss L(z,y) is commonly used and is defined as follows:** + +⟶ Mất mát (loss) Cross-entropy - Trong bối cảnh của mạng neural, mất mát cross-entropy L(z, y) thường được sử dụng và định nghĩa như sau: + +
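A one-line version of the loss of item 10 for a sigmoid output z and a binary label y (toy numbers only):

```python
import numpy as np

def cross_entropy(z, y):
    # L(z, y) = -[y*log(z) + (1 - y)*log(1 - z)]
    return -(y * np.log(z) + (1 - y) * np.log(1 - z))

print(cross_entropy(0.9, 1))  # ~0.105, small loss for a confident correct prediction
```

<br>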
+ +**11. Learning rate ― The learning rate, often noted α or sometimes η, indicates at which pace the weights get updated. This can be fixed or adaptively changed. The current most popular method is called Adam, which is a method that adapts the learning rate.** + +⟶ Tốc độ học (Learning rate) - Tốc độ học, thường được kí hiệu bởi α hoặc đôi khi là η, chỉ ra tốc độ mà trọng số được cập nhật. Thông số này có thể là cố định hoặc được thay đổi tuỳ biến. Phương thức (method) phổ biến nhất hiện tại là Adam, đó là phương thức thay đổi tốc độ học một cách phù hợp nhất có thể. + +
+ +**12. Backpropagation ― Backpropagation is a method to update the weights in the neural network by taking into account the actual output and the desired output. The derivative with respect to weight w is computed using chain rule and is of the following form:** + +⟶ Backpropagation (Lan truyền ngược) - Backpropagation là phương thức dùng để cập nhật trọng số trong mạng neural bằng cách tính toán đầu ra thực sự và đầu ra mong muốn. Đạo hàm liên quan tới trọng số w được tính bằng cách sử dụng quy tắc chuỗi (chain rule) theo như cách dưới đây: + +
+ +**13. As a result, the weight is updated as follows:** + +⟶ Như kết quả, trọng số được cập nhật như sau: + +
+ +**14. Updating weights ― In a neural network, weights are updated as follows:** + +⟶ Cập nhật trọng số - Trong mạng neural, trọng số được cập nhật như sau: + +
+ +**15. Step 1: Take a batch of training data.** + +⟶ Bước 1: Lấy một mẻ (batch) dữ liệu huấn luyện (training data). + +
+ +**16. Step 2: Perform forward propagation to obtain the corresponding loss.** + +⟶ Bước 2: Thực thi lan truyền xuôi (forward propagation) để lấy được mất mát (loss) tương ứng. + +
+ +**17. Step 3: Backpropagate the loss to get the gradients.** + +⟶ Bước 3: Lan truyền ngược mất mát để lấy được gradients (độ dốc). + +
+ +**18. Step 4: Use the gradients to update the weights of the network.** + +⟶ Bước 4: Sử dụng gradients để cập nhật trọng số của mạng (network). + +
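A toy NumPy walk-through of steps 1-4 of item 14 on a logistic-regression "network"; the data, learning rate α=0.1 and batch size are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
X, y = rng.normal(size=(32, 3)), rng.integers(0, 2, size=32)  # step 1: take a batch
w, b, alpha = np.zeros(3), 0.0, 0.1

z = 1 / (1 + np.exp(-(X @ w + b)))                  # step 2: forward propagation
loss = -np.mean(y * np.log(z) + (1 - y) * np.log(1 - z))
grad_w = X.T @ (z - y) / len(y)                     # step 3: backpropagate to get gradients
grad_b = np.mean(z - y)
w, b = w - alpha * grad_w, b - alpha * grad_b       # step 4: update the weights
print(loss)
```

<br>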
**19. Dropout ― Dropout is a technique meant at preventing overfitting the training data by dropping out units in a neural network. In practice, neurons are either dropped with probability p or kept with probability 1−p**

⟶ Dropout - Dropout là một kĩ thuật nhằm tránh overfitting tập dữ liệu huấn luyện bằng cách loại bỏ (drop) các đơn vị trong mạng neural. Trong thực tế, các neuron hoặc bị loại bỏ với xác suất p, hoặc được giữ lại với xác suất 1−p

<br>
+ +**20. Convolutional Neural Networks** + +⟶ Mạng neural tích chập (Convolutional Neural Networks) + +
+ +**21. Convolutional layer requirement ― By noting W the input volume size, F the size of the convolutional layer neurons, P the amount of zero padding, then the number of neurons N that fit in a given volume is such that:** + +⟶ Yêu cầu của tầng tích chập (Convolutional layer) - Bằng việc ghi chú W là kích cỡ của volume đầu vào, F là kích cỡ của neurals thuộc convolutional layer, P là số lượng zero padding, khi đó số lượng neurals N phù hợp với volume cho trước sẽ như sau: + +
+ +**22. Batch normalization ― It is a step of hyperparameter γ,β that normalizes the batch {xi}. By noting μB,σ2B the mean and variance of that we want to correct to the batch, it is done as follows:** + +⟶ Batch normalization (chuẩn hoá) - Đây là bước mà các hyperparameter γ,β chuẩn hoá batch (mẻ) {xi}. Bằng việc kí hiệu μB,σ2B là giá trị trung bình, phương sai mà ta muốn gán cho batch, nó được thực hiện như sau: + +
+ +**23. It is usually done after a fully connected/convolutional layer and before a non-linearity layer and aims at allowing higher learning rates and reducing the strong dependence on initialization.** + +⟶ Nó thường được hoàn thành sau fully connected/convolutional layer và trước non-linearity layer và mục tiêu là cho phép tốc độ học cao hơn cũng như giảm đi sự phụ thuộc mạnh mẽ vào việc khởi tạo. + +
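A sketch of the normalization step of item 22; ε=1e-5 and scalar γ, β are illustrative:

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    # Normalize each feature over the batch, then scale by gamma and shift by beta
    mu, var = x.mean(axis=0), x.var(axis=0)
    return gamma * (x - mu) / np.sqrt(var + eps) + beta

x = 3 * np.random.randn(16, 4) + 7
print(batch_norm(x, gamma=1.0, beta=0.0).mean(axis=0))  # ~0 per feature
```

<br>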
+ +**24. Recurrent Neural Networks** + +⟶ Mạng neural hồi quy (Recurrent Neural Networks) + +
+ +**25. Types of gates ― Here are the different types of gates that we encounter in a typical recurrent neural network:** + +⟶ Các loại cổng - Đây là các loại cổng (gate) khác nhau mà chúng ta sẽ gặp ở một mạng neural hồi quy điển hình: + +
+ +**26. [Input gate, forget gate, gate, output gate]** + +⟶ [Cổng đầu vào, cổng quên, cổng đầu ra] + +
+ +**27. [Write to cell or not?, Erase a cell or not?, How much to write to cell?, How much to reveal cell?]** + +⟶ [Ghi vào cell hay không?, Xoá cell hay không?, Ghi bao nhiêu vào cell?, Cần tiết lộ bao nhiêu về cell?] + +
+ +**28. LSTM ― A long short-term memory (LSTM) network is a type of RNN model that avoids the vanishing gradient problem by adding 'forget' gates.** + +⟶ LSTM - Mạng bộ nhớ ngắn dài (LSTM) là 1 loại RNN model tránh vấn đề vanishing gradient (độ dốc biến mất đột ngột) bằng cách thêm vào cổng 'quên' ('forget' gates). + +
+ +**29. Reinforcement Learning and Control** + +⟶ Reinforcement Learning và Control + +
+ +**30. The goal of reinforcement learning is for an agent to learn how to evolve in an environment.** + +⟶ Mục tiêu của reinforcement learning đó là cho tác tử (agent) học cách làm sao để phát triển trong một môi trường + +
+ +**31. Definitions** + +⟶ Định nghĩa + +
+ +**32. Markov decision processes ― A Markov decision process (MDP) is a 5-tuple (S,A,{Psa},γ,R) where:** + +⟶ Tiến trình quyết định Markov (Markov decision processes) - Tiến trình quyết định Markov (MDP) là một dạng 5-tuple (S,A,{Psa},γ,R) mà ở đó: + +
+ +**33. S is the set of states** + +⟶ S là tập hợp các trạng thái (states) + +
+ +**34. A is the set of actions** + +⟶ A là tập hợp các hành động (actions) + +
+ +**35. {Psa} are the state transition probabilities for s∈S and a∈A** + +⟶ {Psa} là xác suất chuyển tiếp trạng thái cho s∈S và a∈A + +
+ +**36. γ∈[0,1[ is the discount factor** + +⟶ γ∈[0,1[ là discount factor + +
+ +**37. R:S×A⟶R or R:S⟶R is the reward function that the algorithm wants to maximize** + +⟶ R:S×A⟶R hoặc R:S⟶R là reward function (hàm reward) mà giải thuật muốn tối đa hoá. + +
+ +**38. Policy ― A policy π is a function π:S⟶A that maps states to actions.** + +⟶ Policy - Policy π là 1 hàm π:S⟶A có nhiệm vụ ánh xạ states tới actions + +
+ +**39. Remark: we say that we execute a given policy π if given a state s we take the action a=π(s).** + +⟶ Chú ý: Ta quy ước rằng ta thực thi policy π cho trước nếu cho trước state s ta có action a=π(s) + +
+ +**40. Value function ― For a given policy π and a given state s, we define the value function Vπ as follows:** + +⟶ Hàm giá trị (Value function) - Với policy cho trước π và state s, ta định nghĩa value function Vπ như sau: + +
+ +**41. Bellman equation ― The optimal Bellman equations characterizes the value function Vπ∗ of the optimal policy π∗:** + +⟶ Phương trình Bellman - Phương trình tối ưu Bellman đặc trưng hoá value function Vπ∗ của policy tối ưu (optimal policy) π∗: + +
+ +**42. Remark: we note that the optimal policy π∗ for a given state s is such that:** + +⟶ Chú ý: ta quy ước optimal policy π∗ đối với state s cho trước như sau: + +
**43. Value iteration algorithm ― The value iteration algorithm is in two steps:**

⟶ Giải thuật duyệt giá trị (Value iteration) - Giải thuật duyệt giá trị gồm 2 bước:

<br>
**44. 1) We initialize the value:**

⟶ 1) Ta khởi tạo giá trị (value):

<br>
+ +**45. 2) We iterate the value based on the values before:** + +⟶ 2) Ta duyệt qua giá trị dựa theo giá trị phía trước: + +
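A toy run of the two steps of item 43 on a made-up 2-state, 2-action MDP, using the backup V(s) ← R(s) + maxa γ Σs′ Psa(s′) V(s′) (rewards depend only on the state in this sketch):

```python
import numpy as np

P = np.array([[[0.8, 0.2], [0.1, 0.9]],   # P[s, a, s'] (hypothetical transition probabilities)
              [[0.5, 0.5], [0.9, 0.1]]])
R = np.array([1.0, 0.0])                   # reward per state
gamma = 0.9

V = np.zeros(2)                            # step 1: initialize the value
for _ in range(100):                       # step 2: iterate on the previous values
    V = R + gamma * np.max(P @ V, axis=1)
print(V)
```

<br>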
+ +**46. Maximum likelihood estimate ― The maximum likelihood estimates for the state transition probabilities are as follows:** + +⟶ + +
+ +**47. times took action a in state s and got to s′** + +⟶ + +
+ +**48. times took action a in state s** + +⟶ + +
+ +**49. Q-learning ― Q-learning is a model-free estimation of Q, which is done as follows:** + +⟶ + +
+ +**50. View PDF version on GitHub** + +⟶ + +
+ +**51. [Neural Networks, Architecture, Activation function, Backpropagation, Dropout]** + +⟶ + +
+ +**52. [Convolutional Neural Networks, Convolutional layer, Batch normalization]** + +⟶ + +
+ +**53. [Recurrent Neural Networks, Gates, LSTM]** + +⟶ + +
+ +**54. [Reinforcement learning, Markov decision processes, Value/policy iteration, Approximate dynamic programming, Policy search]** + +⟶ diff --git a/vi/cheatsheet-machine-learning-tips-and-tricks.md b/vi/cheatsheet-machine-learning-tips-and-tricks.md new file mode 100644 index 000000000..9712297b8 --- /dev/null +++ b/vi/cheatsheet-machine-learning-tips-and-tricks.md @@ -0,0 +1,285 @@ +**1. Machine Learning tips and tricks cheatsheet** + +⟶ + +
+ +**2. Classification metrics** + +⟶ + +
+ +**3. In a context of a binary classification, here are the main metrics that are important to track in order to assess the performance of the model.** + +⟶ + +
+ +**4. Confusion matrix ― The confusion matrix is used to have a more complete picture when assessing the performance of a model. It is defined as follows:** + +⟶ + +
+ +**5. [Predicted class, Actual class]** + +⟶ + +
+ +**6. Main metrics ― The following metrics are commonly used to assess the performance of classification models:** + +⟶ + +
+ +**7. [Metric, Formula, Interpretation]** + +⟶ + +
+ +**8. Overall performance of model** + +⟶ + +
+ +**9. How accurate the positive predictions are** + +⟶ + +
+ +**10. Coverage of actual positive sample** + +⟶ + +
+ +**11. Coverage of actual negative sample** + +⟶ + +
+ +**12. Hybrid metric useful for unbalanced classes** + +⟶ + +
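The metrics of items 7-12 computed from confusion-matrix counts (the counts below are invented):

```python
def classification_metrics(tp, fp, fn, tn):
    accuracy = (tp + tn) / (tp + fp + fn + tn)          # overall performance
    precision = tp / (tp + fp)                          # how accurate the positive predictions are
    recall = tp / (tp + fn)                             # coverage of actual positives (sensitivity)
    specificity = tn / (tn + fp)                        # coverage of actual negatives
    f1 = 2 * precision * recall / (precision + recall)  # hybrid metric for unbalanced classes
    return accuracy, precision, recall, specificity, f1

print(classification_metrics(tp=40, fp=10, fn=5, tn=45))
```

<br>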
+ +**13. ROC ― The receiver operating curve, also noted ROC, is the plot of TPR versus FPR by varying the threshold. These metrics are are summed up in the table below:** + +⟶ + +
+ +**14. [Metric, Formula, Equivalent]** + +⟶ + +
+ +**15. AUC ― The area under the receiving operating curve, also noted AUC or AUROC, is the area below the ROC as shown in the following figure:** + +⟶ + +
+ +**16. [Actual, Predicted]** + +⟶ + +
+ +**17. Basic metrics ― Given a regression model f, the following metrics are commonly used to assess the performance of the model:** + +⟶ + +
+ +**18. [Total sum of squares, Explained sum of squares, Residual sum of squares]** + +⟶ + +
+ +**19. Coefficient of determination ― The coefficient of determination, often noted R2 or r2, provides a measure of how well the observed outcomes are replicated by the model and is defined as follows:** + +⟶ + +
+ +**20. Main metrics ― The following metrics are commonly used to assess the performance of regression models, by taking into account the number of variables n that they take into consideration:** + +⟶ + +
+ +**21. where L is the likelihood and ˆσ2 is an estimate of the variance associated with each response.** + +⟶ + +
+ +**22. Model selection** + +⟶ + +
+ +**23. Vocabulary ― When selecting a model, we distinguish 3 different parts of the data that we have as follows:** + +⟶ + +
+ +**24. [Training set, Validation set, Testing set]** + +⟶ + +
+ +**25. [Model is trained, Model is assessed, Model gives predictions]** + +⟶ + +
+ +**26. [Usually 80% of the dataset, Usually 20% of the dataset]** + +⟶ + +
+ +**27. [Also called hold-out or development set, Unseen data]** + +⟶ + +
+ +**28. Once the model has been chosen, it is trained on the entire dataset and tested on the unseen test set. These are represented in the figure below:** + +⟶ + +
+ +**29. Cross-validation ― Cross-validation, also noted CV, is a method that is used to select a model that does not rely too much on the initial training set. The different types are summed up in the table below:** + +⟶ + +
+ +**30. [Training on k−1 folds and assessment on the remaining one, Training on n−p observations and assessment on the p remaining ones]** + +⟶ + +
+ +**31. [Generally k=5 or 10, Case p=1 is called leave-one-out]** + +⟶ + +
+ +**32. The most commonly used method is called k-fold cross-validation and splits the training data into k folds to validate the model on one fold while training the model on the k−1 other folds, all of this k times. The error is then averaged over the k folds and is named cross-validation error.** + +⟶ + +
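A bare-bones version of the k-fold procedure of item 32; `fit` and `error` are hypothetical callables supplied by the user, and the constant-predictor usage below is only a smoke test:

```python
import numpy as np

def k_fold_cv_error(X, y, fit, error, k=5):
    folds = np.array_split(np.random.permutation(len(y)), k)
    errors = []
    for i in range(k):
        val = folds[i]                                              # held-out fold
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        model = fit(X[train], y[train])                             # train on the k-1 other folds
        errors.append(error(model, X[val], y[val]))
    return np.mean(errors)                                          # cross-validation error

X, y = np.random.randn(100, 3), np.random.randn(100)
fit = lambda X_tr, y_tr: y_tr.mean()                                # constant predictor
mse = lambda m, X_va, y_va: np.mean((y_va - m) ** 2)
print(k_fold_cv_error(X, y, fit, mse))
```

<br>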
+ +**33. Regularization ― The regularization procedure aims at avoiding the model to overfit the data and thus deals with high variance issues. The following table sums up the different types of commonly used regularization techniques:** + +⟶ + +
+ +**34. [Shrinks coefficients to 0, Good for variable selection, Makes coefficients smaller, Tradeoff between variable selection and small coefficients]** + +⟶ + +
+ +**35. Diagnostics** + +⟶ + +
+ +**36. Bias ― The bias of a model is the difference between the expected prediction and the correct model that we try to predict for given data points.** + +⟶ + +
+ +**37. Variance ― The variance of a model is the variability of the model prediction for given data points.** + +⟶ + +
+ +**38. Bias/variance tradeoff ― The simpler the model, the higher the bias, and the more complex the model, the higher the variance.** + +⟶ + +
+ +**39. [Symptoms, Regression illustration, classification illustration, deep learning illustration, possible remedies]** + +⟶ + +
+ +**40. [High training error, Training error close to test error, High bias, Training error slightly lower than test error, Very low training error, Training error much lower than test error, High variance]** + +⟶ + +
+ +**41. [Complexify model, Add more features, Train longer, Perform regularization, Get more data]** + +⟶ + +
+ +**42. Error analysis ― Error analysis is analyzing the root cause of the difference in performance between the current and the perfect models.** + +⟶ + +
+ +**43. Ablative analysis ― Ablative analysis is analyzing the root cause of the difference in performance between the current and the baseline models.** + +⟶ + +
+ +**44. Regression metrics** + +⟶ + +
+ +**45. [Classification metrics, confusion matrix, accuracy, precision, recall, F1 score, ROC]** + +⟶ + +
+ +**46. [Regression metrics, R squared, Mallow's CP, AIC, BIC]** + +⟶ + +
+ +**47. [Model selection, cross-validation, regularization]** + +⟶ + +
+ +**48. [Diagnostics, Bias/variance tradeoff, error/ablative analysis]** + +⟶ diff --git a/vi/cheatsheet-supervised-learning.md b/vi/cheatsheet-supervised-learning.md new file mode 100644 index 000000000..a6b19ea1c --- /dev/null +++ b/vi/cheatsheet-supervised-learning.md @@ -0,0 +1,567 @@ +**1. Supervised Learning cheatsheet** + +⟶ + +
+ +**2. Introduction to Supervised Learning** + +⟶ + +
+ +**3. Given a set of data points {x(1),...,x(m)} associated to a set of outcomes {y(1),...,y(m)}, we want to build a classifier that learns how to predict y from x.** + +⟶ + +
+ +**4. Type of prediction ― The different types of predictive models are summed up in the table below:** + +⟶ + +
+ +**5. [Regression, Classifier, Outcome, Examples]** + +⟶ + +
+ +**6. [Continuous, Class, Linear regression, Logistic regression, SVM, Naive Bayes]** + +⟶ + +
+ +**7. Type of model ― The different models are summed up in the table below:** + +⟶ + +
+ +**8. [Discriminative model, Generative model, Goal, What's learned, Illustration, Examples]** + +⟶ + +
+ +**9. [Directly estimate P(y|x), Estimate P(x|y) to then deduce P(y|x), Decision boundary, Probability distributions of the data, Regressions, SVMs, GDA, Naive Bayes]** + +⟶ + +
+ +**10. Notations and general concepts** + +⟶ + +
+ +**11. Hypothesis ― The hypothesis is noted hθ and is the model that we choose. For a given input data x(i) the model prediction output is hθ(x(i)).** + +⟶ + +
+ +**12. Loss function ― A loss function is a function L:(z,y)∈R×Y⟼L(z,y)∈R that takes as inputs the predicted value z corresponding to the real data value y and outputs how different they are. The common loss functions are summed up in the table below:** + +⟶ + +
+ +**13. [Least squared error, Logistic loss, Hinge loss, Cross-entropy]** + +⟶ + +
+ +**14. [Linear regression, Logistic regression, SVM, Neural Network]** + +⟶ + +
+ +**15. Cost function ― The cost function J is commonly used to assess the performance of a model, and is defined with the loss function L as follows:** + +⟶ + +
+ +**16. Gradient descent ― By noting α∈R the learning rate, the update rule for gradient descent is expressed with the learning rate and the cost function J as follows:** + +⟶ + +
+ +**17. Remark: Stochastic gradient descent (SGD) is updating the parameter based on each training example, and batch gradient descent is on a batch of training examples.** + +⟶ + +
+ +**18. Likelihood ― The likelihood of a model L(θ) given parameters θ is used to find the optimal parameters θ through maximizing the likelihood. In practice, we use the log-likelihood ℓ(θ)=log(L(θ)) which is easier to optimize. We have:** + +⟶ + +
+ +**19. Newton's algorithm ― The Newton's algorithm is a numerical method that finds θ such that ℓ′(θ)=0. Its update rule is as follows:** + +⟶ + +
+ +**20. Remark: the multidimensional generalization, also known as the Newton-Raphson method, has the following update rule:** + +⟶ + +
+ +**21. Linear models** + +⟶ + +
+ +**22. Linear regression** + +⟶ + +
+ +**23. We assume here that y|x;θ∼N(μ,σ2)** + +⟶ + +
+ +**24. Normal equations ― By noting X the matrix design, the value of θ that minimizes the cost function is a closed-form solution such that:** + +⟶ + +
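The closed-form solution θ = (XTX)−1XTy of item 24 checked on synthetic data (the true coefficients are invented):

```python
import numpy as np

X = np.c_[np.ones(50), np.random.randn(50, 2)]      # design matrix with an intercept column
theta_true = np.array([1.0, 2.0, -3.0])
y = X @ theta_true + 0.01 * np.random.randn(50)

theta = np.linalg.solve(X.T @ X, X.T @ y)            # solves the normal equations
print(theta)                                          # ~ [1, 2, -3]
```

<br>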
+ +**25. LMS algorithm ― By noting α the learning rate, the update rule of the Least Mean Squares (LMS) algorithm for a training set of m data points, which is also known as the Widrow-Hoff learning rule, is as follows:** + +⟶ + +
+ +**26. Remark: the update rule is a particular case of the gradient ascent.** + +⟶ + +
+ +**27. LWR ― Locally Weighted Regression, also known as LWR, is a variant of linear regression that weights each training example in its cost function by w(i)(x), which is defined with parameter τ∈R as:** + +⟶ + +
+ +**28. Classification and logistic regression** + +⟶ + +
+ +**29. Sigmoid function ― The sigmoid function g, also known as the logistic function, is defined as follows:** + +⟶ + +
+ +**30. Logistic regression ― We assume here that y|x;θ∼Bernoulli(ϕ). We have the following form:** + +⟶ + +
+ +**31. Remark: there is no closed form solution for the case of logistic regressions.** + +⟶ + +
+ +**32. Softmax regression ― A softmax regression, also called a multiclass logistic regression, is used to generalize logistic regression when there are more than 2 outcome classes. By convention, we set θK=0, which makes the Bernoulli parameter ϕi of each class i equal to:** + +⟶ + +
+ +**33. Generalized Linear Models** + +⟶ + +
+ +**34. Exponential family ― A class of distributions is said to be in the exponential family if it can be written in terms of a natural parameter, also called the canonical parameter or link function, η, a sufficient statistic T(y) and a log-partition function a(η) as follows:** + +⟶ + +
+ +**35. Remark: we will often have T(y)=y. Also, exp(−a(η)) can be seen as a normalization parameter that will make sure that the probabilities sum to one.** + +⟶ + +
+ +**36. Here are the most common exponential distributions summed up in the following table:** + +⟶ + +
+ +**37. [Distribution, Bernoulli, Gaussian, Poisson, Geometric]** + +⟶ + +
+ +**38. Assumptions of GLMs ― Generalized Linear Models (GLM) aim at predicting a random variable y as a function fo x∈Rn+1 and rely on the following 3 assumptions:** + +⟶ + +
+ +**39. Remark: ordinary least squares and logistic regression are special cases of generalized linear models.** + +⟶ + +
+ +**40. Support Vector Machines** + +⟶ + +
+ +**41: The goal of support vector machines is to find the line that maximizes the minimum distance to the line.** + +⟶ + +
+ +**42: Optimal margin classifier ― The optimal margin classifier h is such that:** + +⟶ + +
+ +**43: where (w,b)∈Rn×R is the solution of the following optimization problem:** + +⟶ + +
+ +**44. such that** + +⟶ + +
+ +**45. support vectors** + +⟶ + +
+ +**46. Remark: the line is defined as wTx−b=0.** + +⟶ + +
+ +**47. Hinge loss ― The hinge loss is used in the setting of SVMs and is defined as follows:** + +⟶ + +
+ +**48. Kernel ― Given a feature mapping ϕ, we define the kernel K to be defined as:** + +⟶ + +
+ +**49. In practice, the kernel K defined by K(x,z)=exp(−||x−z||22σ2) is called the Gaussian kernel and is commonly used.** + +⟶ + +
+ +**50. [Non-linear separability, Use of a kernel mapping, Decision boundary in the original space]** + +⟶ + +
+ +**51. Remark: we say that we use the "kernel trick" to compute the cost function using the kernel because we actually don't need to know the explicit mapping ϕ, which is often very complicated. Instead, only the values K(x,z) are needed.** + +⟶ + +
+ +**52. Lagrangian ― We define the Lagrangian L(w,b) as follows:** + +⟶ + +
+ +**53. Remark: the coefficients βi are called the Lagrange multipliers.** + +⟶ + +
+ +**54. Generative Learning** + +⟶ + +
+ +**55. A generative model first tries to learn how the data is generated by estimating P(x|y), which we can then use to estimate P(y|x) by using Bayes' rule.** + +⟶ + +
+ +**56. Gaussian Discriminant Analysis** + +⟶ + +
+ +**57. Setting ― The Gaussian Discriminant Analysis assumes that y and x|y=0 and x|y=1 are such that:** + +⟶ + +
+ +**58. Estimation ― The following table sums up the estimates that we find when maximizing the likelihood:** + +⟶ + +
+ +**59. Naive Bayes** + +⟶ + +
+ +**60. Assumption ― The Naive Bayes model supposes that the features of each data point are all independent:** + +⟶ + +
+ +**61. Solutions ― Maximizing the log-likelihood gives the following solutions, with k∈{0,1},l∈[[1,L]]** + +⟶ + +
+ +**62. Remark: Naive Bayes is widely used for text classification and spam detection.** + +⟶ + +
+ +**63. Tree-based and ensemble methods** + +⟶ + +
+ +**64. These methods can be used for both regression and classification problems.** + +⟶ + +
+ +**65. CART ― Classification and Regression Trees (CART), commonly known as decision trees, can be represented as binary trees. They have the advantage to be very interpretable.** + +⟶ + +
+ +**66. Random forest ― It is a tree-based technique that uses a high number of decision trees built out of randomly selected sets of features. Contrary to the simple decision tree, it is highly uninterpretable but its generally good performance makes it a popular algorithm.** + +⟶ + +
+ +**67. Remark: random forests are a type of ensemble methods.** + +⟶ + +
+ +**68. Boosting ― The idea of boosting methods is to combine several weak learners to form a stronger one. The main ones are summed up in the table below:** + +⟶ + +
+ +**69. [Adaptive boosting, Gradient boosting]** + +⟶ + +
+ +**70. High weights are put on errors to improve at the next boosting step** + +⟶ + +
+ +**71. Weak learners trained on remaining errors** + +⟶ + +
+ +**72. Other non-parametric approaches** + +⟶ + +
+ +**73. k-nearest neighbors ― The k-nearest neighbors algorithm, commonly known as k-NN, is a non-parametric approach where the response of a data point is determined by the nature of its k neighbors from the training set. It can be used in both classification and regression settings.** + +⟶ + +
+ +**74. Remark: The higher the parameter k, the higher the bias, and the lower the parameter k, the higher the variance.** + +⟶ + +
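A small sketch of the k-NN rule of item 73 with Euclidean distance and majority vote (toy points only):

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x, k=5):
    dists = np.linalg.norm(X_train - x, axis=1)       # distance to every training point
    nearest = np.argsort(dists)[:k]                   # indices of the k nearest neighbors
    return Counter(y_train[nearest]).most_common(1)[0][0]

X_train = np.array([[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]])
y_train = np.array([0, 0, 0, 1, 1, 1])
print(knn_predict(X_train, y_train, np.array([0.2, 0.4]), k=3))  # 0
```

<br>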
+ +**75. Learning Theory** + +⟶ + +
+ +**76. Union bound ― Let A1,...,Ak be k events. We have:** + +⟶ + +
+ +**77. Hoeffding inequality ― Let Z1,..,Zm be m iid variables drawn from a Bernoulli distribution of parameter ϕ. Let ˆϕ be their sample mean and γ>0 fixed. We have:** + +⟶ + +
+ +**78. Remark: this inequality is also known as the Chernoff bound.** + +⟶ + +
+ +**79. Training error ― For a given classifier h, we define the training error ˆϵ(h), also known as the empirical risk or empirical error, to be as follows:** + +⟶ + +
+ +**80. Probably Approximately Correct (PAC) ― PAC is a framework under which numerous results on learning theory were proved, and has the following set of assumptions: ** + +⟶ + +
+ +**81: the training and testing sets follow the same distribution ** + +⟶ + +
+ +**82. the training examples are drawn independently** + +⟶ + +
+ +**83. Shattering ― Given a set S={x(1),...,x(d)}, and a set of classifiers H, we say that H shatters S if for any set of labels {y(1),...,y(d)}, we have:** + +⟶ + +
+ +**84. Upper bound theorem ― Let H be a finite hypothesis class such that |H|=k and let δ and the sample size m be fixed. Then, with probability of at least 1−δ, we have:** + +⟶ + +
+ +**85. VC dimension ― The Vapnik-Chervonenkis (VC) dimension of a given infinite hypothesis class H, noted VC(H) is the size of the largest set that is shattered by H.** + +⟶ + +
+ +**86. Remark: the VC dimension of H={set of linear classifiers in 2 dimensions} is 3.** + +⟶ + +
+ +**87. Theorem (Vapnik) ― Let H be given, with VC(H)=d and m the number of training examples. With probability at least 1−δ, we have:** + +⟶ + +
+ +**88. [Introduction, Type of prediction, Type of model]** + +⟶ + +
+ +**89. [Notations and general concepts, loss function, gradient descent, likelihood]** + +⟶ + +
+ +**90. [Linear models, linear regression, logistic regression, generalized linear models]** + +⟶ + +
+ +**91. [Support vector machines, Optimal margin classifier, Hinge loss, Kernel]** + +⟶ + +
+ +**92. [Generative learning, Gaussian Discriminant Analysis, Naive Bayes]** + +⟶ + +
+ +**93. [Trees and ensemble methods, CART, Random forest, Boosting]** + +⟶ + +
+ +**94. [Other methods, k-NN]** + +⟶ + +
+ +**95. [Learning theory, Hoeffding inequality, PAC, VC dimension]** + +⟶ diff --git a/vi/cheatsheet-unsupervised-learning.md b/vi/cheatsheet-unsupervised-learning.md new file mode 100644 index 000000000..6daab3b21 --- /dev/null +++ b/vi/cheatsheet-unsupervised-learning.md @@ -0,0 +1,340 @@ +**1. Unsupervised Learning cheatsheet** + +⟶ + +
+ +**2. Introduction to Unsupervised Learning** + +⟶ + +
+ +**3. Motivation ― The goal of unsupervised learning is to find hidden patterns in unlabeled data {x(1),...,x(m)}.** + +⟶ + +
+ +**4. Jensen's inequality ― Let f be a convex function and X a random variable. We have the following inequality:** + +⟶ + +
+ +**5. Clustering** + +⟶ + +
+ +**6. Expectation-Maximization** + +⟶ + +
+ +**7. Latent variables ― Latent variables are hidden/unobserved variables that make estimation problems difficult, and are often denoted z. Here are the most common settings where there are latent variables:** + +⟶ + +
+ +**8. [Setting, Latent variable z, Comments]** + +⟶ + +
+ +**9. [Mixture of k Gaussians, Factor analysis]** + +⟶ + +
+ +**10. Algorithm ― The Expectation-Maximization (EM) algorithm gives an efficient method at estimating the parameter θ through maximum likelihood estimation by repeatedly constructing a lower-bound on the likelihood (E-step) and optimizing that lower bound (M-step) as follows:** + +⟶ + +
+ +**11. E-step: Evaluate the posterior probability Qi(z(i)) that each data point x(i) came from a particular cluster z(i) as follows:** + +⟶ + +
+ +**12. M-step: Use the posterior probabilities Qi(z(i)) as cluster specific weights on data points x(i) to separately re-estimate each cluster model as follows:** + +⟶ + +
+ +**13. [Gaussians initialization, Expectation step, Maximization step, Convergence]** + +⟶ + +
+ +**14. k-means clustering** + +⟶ + +
+ +**15. We note c(i) the cluster of data point i and μj the center of cluster j.** + +⟶ + +
+ +**16. Algorithm ― After randomly initializing the cluster centroids μ1,μ2,...,μk∈Rn, the k-means algorithm repeats the following step until convergence:** + +⟶ + +
+ +**17. [Means initialization, Cluster assignment, Means update, Convergence]** + +⟶ + +
+ +**18. Distortion function ― In order to see if the algorithm converges, we look at the distortion function defined as follows:** + +⟶ + +
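A compact version of the alternation described in items 16-18; random data-point initialization and a fixed iteration count are simplifying assumptions:

```python
import numpy as np

def k_means(X, k, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    mu = X[rng.choice(len(X), k, replace=False)]                       # centroid initialization
    for _ in range(n_iter):
        c = np.argmin(((X[:, None, :] - mu) ** 2).sum(-1), axis=1)     # cluster assignment
        mu = np.array([X[c == j].mean(axis=0) if np.any(c == j) else mu[j]
                       for j in range(k)])                             # centroid update
    distortion = ((X - mu[c]) ** 2).sum()                              # item 18
    return c, mu, distortion

X = np.vstack([np.random.randn(50, 2), np.random.randn(50, 2) + 5])
print(k_means(X, 2)[2])
```

<br>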
+ +**19. Hierarchical clustering** + +⟶ + +
+ +**20. Algorithm ― It is a clustering algorithm with an agglomerative hierarchical approach that build nested clusters in a successive manner.** + +⟶ + +
+ +**21. Types ― There are different sorts of hierarchical clustering algorithms that aims at optimizing different objective functions, which is summed up in the table below:** + +⟶ + +
+ +**22. [Ward linkage, Average linkage, Complete linkage]** + +⟶ + +
+ +**23. [Minimize within cluster distance, Minimize average distance between cluster pairs, Minimize maximum distance of between cluster pairs]** + +⟶ + +
+ +**24. Clustering assessment metrics** + +⟶ + +
+ +**25. In an unsupervised learning setting, it is often hard to assess the performance of a model since we don't have the ground truth labels as was the case in the supervised learning setting.** + +⟶ + +
+ +**26. Silhouette coefficient ― By noting a and b the mean distance between a sample and all other points in the same class, and between a sample and all other points in the next nearest cluster, the silhouette coefficient s for a single sample is defined as follows:** + +⟶ + +
+ +**27. Calinski-Harabaz index ― By noting k the number of clusters, Bk and Wk the between and within-clustering dispersion matrices respectively defined as** + +⟶ + +
+ +**28. the Calinski-Harabaz index s(k) indicates how well a clustering model defines its clusters, such that the higher the score, the more dense and well separated the clusters are. It is defined as follows:** + +⟶ + +
+ +**29. Dimension reduction** + +⟶ + +
+ +**30. Principal component analysis** + +⟶ + +
+ +**31. It is a dimension reduction technique that finds the variance maximizing directions onto which to project the data.** + +⟶ + +
+ +**32. Eigenvalue, eigenvector ― Given a matrix A∈Rn×n, λ is said to be an eigenvalue of A if there exists a vector z∈Rn∖{0}, called eigenvector, such that we have:** + +⟶ + +
+ +**33. Spectral theorem ― Let A∈Rn×n. If A is symmetric, then A is diagonalizable by a real orthogonal matrix U∈Rn×n. By noting Λ=diag(λ1,...,λn), we have:** + +⟶ + +
+ +**34. diagonal** + +⟶ + +
+ +**35. Remark: the eigenvector associated with the largest eigenvalue is called principal eigenvector of matrix A.** + +⟶ + +
+ +**36. Algorithm ― The Principal Component Analysis (PCA) procedure is a dimension reduction technique that projects the data on k +dimensions by maximizing the variance of the data as follows:** + +⟶ + +
+ +**37. Step 1: Normalize the data to have a mean of 0 and standard deviation of 1.** + +⟶ + +
+ +**38. Step 2: Compute Σ=1mm∑i=1x(i)x(i)T∈Rn×n, which is symmetric with real eigenvalues.** + +⟶ + +
+ +**39. Step 3: Compute u1,...,uk∈Rn the k orthogonal principal eigenvectors of Σ, i.e. the orthogonal eigenvectors of the k largest eigenvalues.** + +⟶ + +
+ +**40. Step 4: Project the data on spanR(u1,...,uk).** + +⟶ + +
+ +**41. This procedure maximizes the variance among all k-dimensional spaces.** + +⟶ + +
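Steps 1-4 of item 36 in a few NumPy lines (random data only, as a shape check):

```python
import numpy as np

def pca(X, k):
    Z = (X - X.mean(axis=0)) / X.std(axis=0)        # step 1: normalize
    sigma = Z.T @ Z / len(Z)                        # step 2: covariance matrix
    eigval, eigvec = np.linalg.eigh(sigma)          # symmetric, so real eigendecomposition
    U = eigvec[:, np.argsort(eigval)[::-1][:k]]     # step 3: top-k principal eigenvectors
    return Z @ U                                    # step 4: project onto span(u1, ..., uk)

X = np.random.randn(200, 5)
print(pca(X, 2).shape)  # (200, 2)
```

<br>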
+ +**42. [Data in feature space, Find principal components, Data in principal components space]** + +⟶ + +
+ +**43. Independent component analysis** + +⟶ + +
+ +**44. It is a technique meant to find the underlying generating sources.** + +⟶ + +
+ +**45. Assumptions ― We assume that our data x has been generated by the n-dimensional source vector s=(s1,...,sn), where si are independent random variables, via a mixing and non-singular matrix A as follows:** + +⟶ + +
+ +**46. The goal is to find the unmixing matrix W=A−1.** + +⟶ + +
+ +**47. Bell and Sejnowski ICA algorithm ― This algorithm finds the unmixing matrix W by following the steps below:** + +⟶ + +
+ +**48. Write the probability of x=As=W−1s as:** + +⟶ + +
+ +**49. Write the log likelihood given our training data {x(i),i∈[[1,m]]} and by noting g the sigmoid function as:** + +⟶ + +
+ +**50. Therefore, the stochastic gradient ascent learning rule is such that for each training example x(i), we update W as follows:** + +⟶ + +
+ +**51. The Machine Learning cheatsheets are now available in [target language].** + +⟶ + +
+ +**52. Original authors** + +⟶ + +
+ +**53. Translated by X, Y and Z** + +⟶ + +
+ +**54. Reviewed by X, Y and Z** + +⟶ + +
+ +**55. [Introduction, Motivation, Jensen's inequality]** + +⟶ + +
+ +**56. [Clustering, Expectation-Maximization, k-means, Hierarchical clustering, Metrics]** + +⟶ + +
+ +**57. [Dimension reduction, PCA, ICA]** + +⟶ diff --git a/vi/convolutional-neural-networks.md b/vi/convolutional-neural-networks.md new file mode 100644 index 000000000..cb7e676ca --- /dev/null +++ b/vi/convolutional-neural-networks.md @@ -0,0 +1,716 @@ +**Convolutional Neural Networks translation** + +
+ +**1. Convolutional Neural Networks cheatsheet** + +⟶ Convolutional Neural Networks cheatsheet + +
+ + +**2. CS 230 - Deep Learning** + +⟶ CS 230 - Deep Learning + +
+ + +**3. [Overview, Architecture structure]** + +⟶ [Tổng quan, Kiến trúc] + +
+ + +**4. [Types of layer, Convolution, Pooling, Fully connected]** + +⟶ [Loại tầng (layer), Convolution (Tích chập), Pooling, Fully connected] + +
+ + +**5. [Filter hyperparameters, Dimensions, Stride, Padding]** + +⟶ + +
+ + +**6. [Tuning hyperparameters, Parameter compatibility, Model complexity, Receptive field]** + +⟶ + +
+ + +**7. [Activation functions, Rectified Linear Unit, Softmax]** + +⟶ + +
+ + +**8. [Object detection, Types of models, Detection, Intersection over Union, Non-max suppression, YOLO, R-CNN]** + +⟶ + +
+ + +**9. [Face verification/recognition, One shot learning, Siamese network, Triplet loss]** + +⟶ + +
+ + +**10. [Neural style transfer, Activation, Style matrix, Style/content cost function]** + +⟶ + +
+ + +**11. [Computational trick architectures, Generative Adversarial Net, ResNet, Inception Network]** + +⟶ + +
+ + +**12. Overview** + +⟶ + +
+ + +**13. Architecture of a traditional CNN ― Convolutional neural networks, also known as CNNs, are a specific type of neural networks that are generally composed of the following layers:** + +⟶ + +
+ + +**14. The convolution layer and the pooling layer can be fine-tuned with respect to hyperparameters that are described in the next sections.** + +⟶ + +
+ + +**15. Types of layer** + +⟶ + +
+ + +**16. Convolution layer (CONV) ― The convolution layer (CONV) uses filters that perform convolution operations as it is scanning the input I with respect to its dimensions. Its hyperparameters include the filter size F and stride S. The resulting output O is called feature map or activation map.** + +⟶ + +
+ + +**17. Remark: the convolution step can be generalized to the 1D and 3D cases as well.** + +⟶ + +
+ + +**18. Pooling (POOL) ― The pooling layer (POOL) is a downsampling operation, typically applied after a convolution layer, which does some spatial invariance. In particular, max and average pooling are special kinds of pooling where the maximum and average value is taken, respectively.** + +⟶ + +
+ + +**19. [Type, Purpose, Illustration, Comments]** + +⟶ + +
+ + +**20. [Max pooling, Average pooling, Each pooling operation selects the maximum value of the current view, Each pooling operation averages the values of the current view]** + +⟶ + +
+ + +**21. [Preserves detected features, Most commonly used, Downsamples feature map, Used in LeNet]** + +⟶ + +
+ + +**22. Fully Connected (FC) ― The fully connected layer (FC) operates on a flattened input where each input is connected to all neurons. If present, FC layers are usually found towards the end of CNN architectures and can be used to optimize objectives such as class scores.** + +⟶ + +
+ + +**23. Filter hyperparameters** + +⟶ + +
+ + +**24. The convolution layer contains filters for which it is important to know the meaning behind its hyperparameters.** + +⟶ + +
+ + +**25. Dimensions of a filter ― A filter of size F×F applied to an input containing C channels is a F×F×C volume that performs convolutions on an input of size I×I×C and produces an output feature map (also called activation map) of size O×O×1.** + +⟶ + +
+ + +**26. Filter** + +⟶ + +
+ + +**27. Remark: the application of K filters of size F×F results in an output feature map of size O×O×K.** + +⟶ + +
+ + +**28. Stride ― For a convolutional or a pooling operation, the stride S denotes the number of pixels by which the window moves after each operation.** + +⟶ + +
+ + +**29. Zero-padding ― Zero-padding denotes the process of adding P zeroes to each side of the boundaries of the input. This value can either be manually specified or automatically set through one of the three modes detailed below:** + +⟶ + +
+ + +**30. [Mode, Value, Illustration, Purpose, Valid, Same, Full]** + +⟶ + +
+ + +**31. [No padding, Drops last convolution if dimensions do not match, Padding such that feature map size has size ⌈IS⌉, Output size is mathematically convenient, Also called 'half' padding, Maximum padding such that end convolutions are applied on the limits of the input, Filter 'sees' the input end-to-end]** + +⟶ + +
+ + +**32. Tuning hyperparameters** + +⟶ + +
+ + +**33. Parameter compatibility in convolution layer ― By noting I the length of the input volume size, F the length of the filter, P the amount of zero padding, S the stride, then the output size O of the feature map along that dimension is given by:** + +⟶ + +
+ + +**34. [Input, Filter, Output]** + +⟶ + +
+ + +**35. Remark: often times, Pstart=Pend≜P, in which case we can replace Pstart+Pend by 2P in the formula above.** + +⟶ + +
+ + +**36. Understanding the complexity of the model ― In order to assess the complexity of a model, it is often useful to determine the number of parameters that its architecture will have. In a given layer of a convolutional neural network, it is done as follows:** + +⟶ + +
+ + +**37. [Illustration, Input size, Output size, Number of parameters, Remarks]** + +⟶ + +
**38. [One bias parameter per filter, In most cases, S<F]**

⟶

<br>

**39. [Pooling operation done channel-wise, In most cases, S=F]**

⟶

<br>
+ + +**40. [Input is flattened, One bias parameter per neuron, The number of FC neurons is free of structural constraints]** + +⟶ + +
+ + +**41. Receptive field ― The receptive field at layer k is the area denoted Rk×Rk of the input that each pixel of the k-th activation map can 'see'. By calling Fj the filter size of layer j and Si the stride value of layer i and with the convention S0=1, the receptive field at layer k can be computed with the formula:** + +⟶ + +
+ + +**42. In the example below, we have F1=F2=3 and S1=S2=1, which gives R2=1+2⋅1+2⋅1=5.** + +⟶ + +
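+
+A minimal Python sketch of this formula, applied to the example above (assuming layers are listed from the input onwards), could be:
+
+```python
+def receptive_field(filter_sizes, strides):
+    # R_k = 1 + sum_{j=1..k} (F_j - 1) * prod_{i=0..j-1} S_i, with S_0 = 1
+    R, jump = 1, 1  # jump holds the product of the strides of the previous layers
+    for F, S in zip(filter_sizes, strides):
+        R += (F - 1) * jump
+        jump *= S
+    return R
+
+print(receptive_field([3, 3], [1, 1]))  # 5, as in the example above
+```
+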
+ + +**43. Commonly used activation functions** + +⟶ + +
+ + +**44. Rectified Linear Unit ― The rectified linear unit layer (ReLU) is an activation function g that is used on all elements of the volume. It aims at introducing non-linearities to the network. Its variants are summarized in the table below:** + +⟶ + +
+ + +**45. [ReLU, Leaky ReLU, ELU, with]** + +⟶ + +
+ + +**46. [Non-linearity complexities biologically interpretable, Addresses dying ReLU issue for negative values, Differentiable everywhere]** + +⟶ + +
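+
+One possible NumPy sketch of these three variants (the slopes 0.01 and 1.0 below are illustrative default values) could be:
+
+```python
+import numpy as np
+
+def relu(z):
+    return np.maximum(0, z)
+
+def leaky_relu(z, eps=0.01):
+    # small positive slope eps for negative values, addressing the dying ReLU issue
+    return np.where(z >= 0, z, eps * z)
+
+def elu(z, alpha=1.0):
+    # smooth for negative values, differentiable everywhere
+    return np.where(z >= 0, z, alpha * (np.exp(z) - 1))
+```
+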
+ + +**47. Softmax ― The softmax step can be seen as a generalized logistic function that takes as input a vector of scores x∈Rn and outputs a vector of output probability p∈Rn through a softmax function at the end of the architecture. It is defined as follows:** + +⟶ + +
+ + +**48. where** + +⟶ + +
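+
+A minimal NumPy sketch of the softmax step could be (subtracting the maximum is a standard trick for numerical stability):
+
+```python
+import numpy as np
+
+def softmax(x):
+    e = np.exp(x - np.max(x))  # shift by max(x) to avoid overflow
+    return e / e.sum()         # output probabilities sum to 1
+
+print(softmax(np.array([2.0, 1.0, 0.1])))  # approx. [0.659, 0.242, 0.099]
+```
+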
+ + +**49. Object detection** + +⟶ + +
+ + +**50. Types of models ― There are 3 main types of object recognition algorithms, for which the nature of what is predicted is different. They are described in the table below:** + +⟶ + +
+ + +**51. [Image classification, Classification w. localization, Detection]** + +⟶ + +
+ + +**52. [Teddy bear, Book]** + +⟶ + +
+ + +**53. [Classifies a picture, Predicts probability of object, Detects an object in a picture, Predicts probability of object and where it is located, Detects up to several objects in a picture, Predicts probabilities of objects and where they are located]** + +⟶ + +
+ + +**54. [Traditional CNN, Simplified YOLO, R-CNN, YOLO, R-CNN]** + +⟶ + +
+ + +**55. Detection ― In the context of object detection, different methods are used depending on whether we just want to locate the object or detect a more complex shape in the image. The two main ones are summed up in the table below:** + +⟶ + +
+ + +**56. [Bounding box detection, Landmark detection]** + +⟶ + +
+ + +**57. [Detects the part of the image where the object is located, Detects a shape or characteristics of an object (e.g. eyes), More granular]** + +⟶ + +
+ + +**58. [Box of center (bx,by), height bh and width bw, Reference points (l1x,l1y), ..., (lnx,lny)]** + +⟶ + +
+ + +**59. Intersection over Union ― Intersection over Union, also known as IoU, is a function that quantifies how correctly positioned a predicted bounding box Bp is over the actual bounding box Ba. It is defined as:** + +⟶ + +
+ + +**60. Remark: we always have IoU∈[0,1]. By convention, a predicted bounding box Bp is considered as being reasonably good if IoU(Bp,Ba)⩾0.5.** + +⟶ + +
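+
+A minimal Python sketch of IoU for axis-aligned boxes (assuming an illustrative (x1, y1, x2, y2) corner convention) could be:
+
+```python
+def iou(box_p, box_a):
+    # each box is (x1, y1, x2, y2) with x1 < x2 and y1 < y2
+    x1, y1 = max(box_p[0], box_a[0]), max(box_p[1], box_a[1])
+    x2, y2 = min(box_p[2], box_a[2]), min(box_p[3], box_a[3])
+    inter = max(0, x2 - x1) * max(0, y2 - y1)
+    area_p = (box_p[2] - box_p[0]) * (box_p[3] - box_p[1])
+    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
+    union = area_p + area_a - inter
+    return inter / union if union > 0 else 0.0
+
+print(iou((0, 0, 2, 2), (1, 1, 3, 3)))  # 1/7, approx. 0.14
+```
+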
+ + +**61. Anchor boxes ― Anchor boxing is a technique used to predict overlapping bounding boxes. In practice, the network is allowed to predict more than one box simultaneously, where each box prediction is constrained to have a given set of geometrical properties. For instance, the first prediction can potentially be a rectangular box of a given form, while the second will be another rectangular box of a different geometrical form.** + +⟶ + +
+ + +**62. Non-max suppression ― The non-max suppression technique aims at removing duplicate overlapping bounding boxes of a same object by selecting the most representative ones. After having removed all boxes having a probability prediction lower than 0.6, the following steps are repeated while there are boxes remaining:** + +⟶ + +
+ + +**63. [For a given class, Step 1: Pick the box with the largest prediction probability., Step 2: Discard any box having an IoU⩾0.5 with the previous box.]** + +⟶ + +
+ + +**64. [Box predictions, Box selection of maximum probability, Overlap removal of same class, Final bounding boxes]** + +⟶ + +
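+
+A minimal single-class Python sketch of this procedure, reusing the iou function sketched above and the 0.6 and 0.5 thresholds from the text, could be:
+
+```python
+def non_max_suppression(boxes, probs, prob_threshold=0.6, iou_threshold=0.5):
+    # keep only boxes with a high enough probability, sorted by decreasing probability
+    candidates = sorted([(p, b) for p, b in zip(probs, boxes) if p >= prob_threshold],
+                        key=lambda t: t[0], reverse=True)
+    kept = []
+    while candidates:
+        prob, best = candidates.pop(0)   # Step 1: box with the largest probability
+        kept.append((prob, best))
+        candidates = [(p, b) for p, b in candidates
+                      if iou(b, best) < iou_threshold]  # Step 2: discard overlapping boxes
+    return kept
+```
+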
+ + +**65. YOLO ― You Only Look Once (YOLO) is an object detection algorithm that performs the following steps:** + +⟶ + +
+ + +**66. [Step 1: Divide the input image into a G×G grid., Step 2: For each grid cell, run a CNN that predicts y of the following form:, repeated k times]** + +⟶ + +
+
+
+**67. where pc is the probability of detecting an object, bx,by,bh,bw are the properties of the detected bounding box, c1,...,cp is a one-hot representation of which of the p classes were detected, and k is the number of anchor boxes.**
+
+⟶
+
+<br>
+ + +**68. Step 3: Run the non-max suppression algorithm to remove any potential duplicate overlapping bounding boxes.** + +⟶ + +
+ + +**69. [Original image, Division in GxG grid, Bounding box prediction, Non-max suppression]** + +⟶ + +
+ + +**70. Remark: when pc=0, then the network does not detect any object. In that case, the corresponding predictions bx,...,cp have to be ignored.** + +⟶ + +
+
+
+**71. R-CNN ― Region with Convolutional Neural Networks (R-CNN) is an object detection algorithm that first segments the image to find potentially relevant bounding boxes and then runs the detection algorithm to find the most probable objects in those bounding boxes.**
+
+⟶
+
+<br>
+ + +**72. [Original image, Segmentation, Bounding box prediction, Non-max suppression]** + +⟶ + +
+ + +**73. Remark: although the original algorithm is computationally expensive and slow, newer architectures enabled the algorithm to run faster, such as Fast R-CNN and Faster R-CNN.** + +⟶ + +
+ + +**74. Face verification and recognition** + +⟶ + +
+
+
+**75. Types of models ― Two main types of model are summed up in the table below:**
+
+⟶
+
+<br>
+ + +**76. [Face verification, Face recognition, Query, Reference, Database]** + +⟶ + +
+ + +**77. [Is this the correct person?, One-to-one lookup, Is this one of the K persons in the database?, One-to-many lookup]** + +⟶ + +
+ + +**78. One Shot Learning ― One Shot Learning is a face verification algorithm that uses a limited training set to learn a similarity function that quantifies how different two given images are. The similarity function applied to two images is often noted d(image 1,image 2).** + +⟶ + +
+ + +**79. Siamese Network ― Siamese Networks aim at learning how to encode images to then quantify how different two images are. For a given input image x(i), the encoded output is often noted as f(x(i)).** + +⟶ + +
+
+
+**80. Triplet loss ― The triplet loss ℓ is a loss function computed on the embedding representation of a triplet of images A (anchor), P (positive) and N (negative). The anchor and the positive example belong to the same class, while the negative example belongs to another one. By calling α∈R+ the margin parameter, this loss is defined as follows:**
+
+⟶
+
+<br>
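+
+A minimal NumPy sketch of this loss (using the squared Euclidean distance between embeddings and an illustrative margin value) could be:
+
+```python
+import numpy as np
+
+def triplet_loss(f_a, f_p, f_n, alpha=0.2):
+    # f_a, f_p, f_n: embeddings of the anchor, positive and negative images
+    d_ap = np.sum((f_a - f_p) ** 2)  # anchor-positive distance
+    d_an = np.sum((f_a - f_n) ** 2)  # anchor-negative distance
+    return max(d_ap - d_an + alpha, 0.0)
+```
+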
+ + +**81. Neural style transfer** + +⟶ + +
+ + +**82. Motivation ― The goal of neural style transfer is to generate an image G based on a given content C and a given style S.** + +⟶ + +
+ + +**83. [Content C, Style S, Generated image G]** + +⟶ + +
+ + +**84. Activation ― In a given layer l, the activation is noted a[l] and is of dimensions nH×nw×nc** + +⟶ + +
+ + +**85. Content cost function ― The content cost function Jcontent(C,G) is used to determine how the generated image G differs from the original content image C. It is defined as follows:** + +⟶ + +
+ + +**86. Style matrix ― The style matrix G[l] of a given layer l is a Gram matrix where each of its elements G[l]kk′ quantifies how correlated the channels k and k′ are. It is defined with respect to activations a[l] as follows:** + +⟶ + +
+ + +**87. Remark: the style matrix for the style image and the generated image are noted G[l] (S) and G[l] (G) respectively.** + +⟶ + +
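+
+A minimal NumPy sketch of the style matrix computation (assuming the activation volume is stored with shape (nH, nW, nC)) could be:
+
+```python
+import numpy as np
+
+def style_matrix(a):
+    # a: activation volume of shape (n_H, n_W, n_C) at a given layer l
+    n_H, n_W, n_C = a.shape
+    a_flat = a.reshape(n_H * n_W, n_C)
+    return a_flat.T @ a_flat  # (n_C, n_C) Gram matrix of channel correlations
+```
+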
+ + +**88. Style cost function ― The style cost function Jstyle(S,G) is used to determine how the generated image G differs from the style S. It is defined as follows:** + +⟶ + +
+ + +**89. Overall cost function ― The overall cost function is defined as being a combination of the content and style cost functions, weighted by parameters α,β, as follows:** + +⟶ + +
+ + +**90. Remark: a higher value of α will make the model care more about the content while a higher value of β will make it care more about the style.** + +⟶ + +
+ + +**91. Architectures using computational tricks** + +⟶ + +
+
+
+**92. Generative Adversarial Network ― Generative adversarial networks, also known as GANs, are composed of a generative and a discriminative model, where the generative model aims at generating the most truthful output that is then fed into the discriminative model, whose aim is to differentiate between the generated and the true image.**
+
+⟶
+
+<br>
+ + +**93. [Training, Noise, Real-world image, Generator, Discriminator, Real Fake]** + +⟶ + +
+ + +**94. Remark: use cases using variants of GANs include text to image, music generation and synthesis.** + +⟶ + +
+ + +**95. ResNet ― The Residual Network architecture (also called ResNet) uses residual blocks with a high number of layers meant to decrease the training error. The residual block has the following characterizing equation:** + +⟶ + +
+
+
+**96. Inception Network ― This architecture uses inception modules and aims at trying out different convolutions in order to increase its performance through feature diversification. In particular, it uses the 1×1 convolution trick to limit the computational burden.**
+
+⟶
+
+<br>
+ + +**97. The Deep Learning cheatsheets are now available in [target language].** + +⟶ + +
+ + +**98. Original authors** + +⟶ + +
+ + +**99. Translated by X, Y and Z** + +⟶ + +
+ + +**100. Reviewed by X, Y and Z** + +⟶ + +
+ + +**101. View PDF version on GitHub** + +⟶ + +
+ + +**102. By X and Y** + +⟶ + +
diff --git a/vi/deep-learning-tips-and-tricks.md b/vi/deep-learning-tips-and-tricks.md new file mode 100644 index 000000000..347234ec2 --- /dev/null +++ b/vi/deep-learning-tips-and-tricks.md @@ -0,0 +1,457 @@ +**Deep Learning Tips and Tricks translation** + +
+ +**1. Deep Learning Tips and Tricks cheatsheet** + +⟶ + +
+ + +**2. CS 230 - Deep Learning** + +⟶ + +
+ + +**3. Tips and tricks** + +⟶ + +
+ + +**4. [Data processing, Data augmentation, Batch normalization]** + +⟶ + +
+ + +**5. [Training a neural network, Epoch, Mini-batch, Cross-entropy loss, Backpropagation, Gradient descent, Updating weights, Gradient checking]** + +⟶ + +
+ + +**6. [Parameter tuning, Xavier initialization, Transfer learning, Learning rate, Adaptive learning rates]** + +⟶ + +
+ + +**7. [Regularization, Dropout, Weight regularization, Early stopping]** + +⟶ + +
+ + +**8. [Good practices, Overfitting small batch, Gradient checking]** + +⟶ + +
+ + +**9. View PDF version on GitHub** + +⟶ + +
+ + +**10. Data processing** + +⟶ + +
+ + +**11. Data augmentation ― Deep learning models usually need a lot of data to be properly trained. It is often useful to get more data from the existing ones using data augmentation techniques. The main ones are summed up in the table below. More precisely, given the following input image, here are the techniques that we can apply:** + +⟶ + +
+ + +**12. [Original, Flip, Rotation, Random crop]** + +⟶ + +
+ + +**13. [Image without any modification, Flipped with respect to an axis for which the meaning of the image is preserved, Rotation with a slight angle, Simulates incorrect horizon calibration, Random focus on one part of the image, Several random crops can be done in a row]** + +⟶ + +
+ + +**14. [Color shift, Noise addition, Information loss, Contrast change]** + +⟶ + +
+
+
+**15. [Nuances of RGB are slightly changed, Captures noise that can occur with light exposure, Addition of noise, More tolerance to quality variation of inputs, Parts of image ignored, Mimics potential loss of parts of image, Luminosity changes, Controls difference in exposure due to time of day]**
+
+⟶
+
+<br>
+ + +**16. Remark: data is usually augmented on the fly during training.** + +⟶ + +
+
+
+**17. Batch normalization ― It is a step, parametrized by hyperparameters γ,β, that normalizes the batch {xi}. By noting μB,σ2B the mean and variance of the batch that we want to correct, it is done as follows:**
+
+⟶
+
+<br>
+ + +**18. It is usually done after a fully connected/convolutional layer and before a non-linearity layer and aims at allowing higher learning rates and reducing the strong dependence on initialization.** + +⟶ + +
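+
+A minimal NumPy sketch of this normalization at training time (inference typically uses running averages of the batch statistics instead) could be:
+
+```python
+import numpy as np
+
+def batch_norm(x, gamma, beta, eps=1e-5):
+    # x: mini-batch of shape (batch_size, n_features); gamma, beta: learned parameters
+    mu = x.mean(axis=0)
+    var = x.var(axis=0)
+    x_hat = (x - mu) / np.sqrt(var + eps)  # normalize each feature over the batch
+    return gamma * x_hat + beta            # scale and shift
+```
+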
+ + +**19. Training a neural network** + +⟶ + +
+ + +**20. Definitions** + +⟶ + +
+ + +**21. Epoch ― In the context of training a model, epoch is a term used to refer to one iteration where the model sees the whole training set to update its weights.** + +⟶ + +
+
+
+**22. Mini-batch gradient descent ― During the training phase, updating weights is usually not based on the whole training set at once due to computational complexity, nor on a single data point due to noise issues. Instead, the update step is done on mini-batches, where the number of data points in a batch is a hyperparameter that we can tune.**
+
+⟶
+
+<br>
+ + +**23. Loss function ― In order to quantify how a given model performs, the loss function L is usually used to evaluate to what extent the actual outputs y are correctly predicted by the model outputs z.** + +⟶ + +
+ + +**24. Cross-entropy loss ― In the context of binary classification in neural networks, the cross-entropy loss L(z,y) is commonly used and is defined as follows:** + +⟶ + +
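+
+A minimal NumPy sketch of this loss for a single prediction (the clipping is only there to avoid log(0)) could be:
+
+```python
+import numpy as np
+
+def cross_entropy(z, y, eps=1e-12):
+    # z: predicted probability in (0, 1); y: true label in {0, 1}
+    z = np.clip(z, eps, 1 - eps)
+    return -(y * np.log(z) + (1 - y) * np.log(1 - z))
+
+print(cross_entropy(0.9, 1))  # approx. 0.105
+```
+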
+ + +**25. Finding optimal weights** + +⟶ + +
+ + +**26. Backpropagation ― Backpropagation is a method to update the weights in the neural network by taking into account the actual output and the desired output. The derivative with respect to each weight w is computed using the chain rule.** + +⟶ + +
+ + +**27. Using this method, each weight is updated with the rule:** + +⟶ + +
+ + +**28. Updating weights ― In a neural network, weights are updated as follows:** + +⟶ + +
+ + +**29. [Step 1: Take a batch of training data and perform forward propagation to compute the loss, Step 2: Backpropagate the loss to get the gradient of the loss with respect to each weight, Step 3: Use the gradients to update the weights of the network.]** + +⟶ + +
+ + +**30. [Forward propagation, Backpropagation, Weights update]** + +⟶ + +
+ + +**31. Parameter tuning** + +⟶ + +
+ + +**32. Weights initialization** + +⟶ + +
+
+
+**33. Xavier initialization ― Instead of initializing the weights in a purely random manner, Xavier initialization makes it possible to have initial weights that take into account characteristics that are unique to the architecture.**
+
+⟶
+
+<br>
+ + +**34. Transfer learning ― Training a deep learning model requires a lot of data and more importantly a lot of time. It is often useful to take advantage of pre-trained weights on huge datasets that took days/weeks to train, and leverage it towards our use case. Depending on how much data we have at hand, here are the different ways to leverage this:** + +⟶ + +
+ + +**35. [Training size, Illustration, Explanation]** + +⟶ + +
+ + +**36. [Small, Medium, Large]** + +⟶ + +
+ + +**37. [Freezes all layers, trains weights on softmax, Freezes most layers, trains weights on last layers and softmax, Trains weights on layers and softmax by initializing weights on pre-trained ones]** + +⟶ + +
+ + +**38. Optimizing convergence** + +⟶ + +
+
+
+**39. Learning rate ― The learning rate, often noted α or sometimes η, indicates at which pace the weights get updated. It can be fixed or adaptively changed. The current most popular method is called Adam, which is a method that adapts the learning rate.**
+
+⟶
+
+<br>
+ + +**40. Adaptive learning rates ― Letting the learning rate vary when training a model can reduce the training time and improve the numerical optimal solution. While Adam optimizer is the most commonly used technique, others can also be useful. They are summed up in the table below:** + +⟶ + +
+ + +**41. [Method, Explanation, Update of w, Update of b]** + +⟶ + +
+ + +**42. [Momentum, Dampens oscillations, Improvement to SGD, 2 parameters to tune]** + +⟶ + +
+ + +**43. [RMSprop, Root Mean Square propagation, Speeds up learning algorithm by controlling oscillations]** + +⟶ + +
+ + +**44. [Adam, Adaptive Moment estimation, Most popular method, 4 parameters to tune]** + +⟶ + +
+ + +**45. Remark: other methods include Adadelta, Adagrad and SGD.** + +⟶ + +
+ + +**46. Regularization** + +⟶ + +
+ + +**47. Dropout ― Dropout is a technique used in neural networks to prevent overfitting the training data by dropping out neurons with probability p>0. It forces the model to avoid relying too much on particular sets of features.** + +⟶ + +
+ + +**48. Remark: most deep learning frameworks parametrize dropout through the 'keep' parameter 1−p.** + +⟶ + +
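+
+A minimal NumPy sketch of (inverted) dropout at training time could be (at test time, the layer is simply the identity):
+
+```python
+import numpy as np
+
+def dropout(a, p=0.3):
+    # p: probability of dropping a neuron; 1 - p is the 'keep' probability
+    mask = (np.random.rand(*a.shape) >= p).astype(a.dtype)
+    return a * mask / (1 - p)  # rescaling keeps the expected activation unchanged
+```
+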
+ + +**49. Weight regularization ― In order to make sure that the weights are not too large and that the model is not overfitting the training set, regularization techniques are usually performed on the model weights. The main ones are summed up in the table below:** + +⟶ + +
+ + +**50. [LASSO, Ridge, Elastic Net]** + +⟶ + +
+
+**50 bis. [Shrinks coefficients to 0, Good for variable selection, Makes coefficients smaller, Tradeoff between variable selection and small coefficients]**
+
+⟶
+
+<br>
+ +**51. Early stopping ― This regularization technique stops the training process as soon as the validation loss reaches a plateau or starts to increase.** + +⟶ + +
+ + +**52. [Error, Validation, Training, early stopping, Epochs]** + +⟶ + +
+ + +**53. Good practices** + +⟶ + +
+ + +**54. Overfitting small batch ― When debugging a model, it is often useful to make quick tests to see if there is any major issue with the architecture of the model itself. In particular, in order to make sure that the model can be properly trained, a mini-batch is passed inside the network to see if it can overfit on it. If it cannot, it means that the model is either too complex or not complex enough to even overfit on a small batch, let alone a normal-sized training set.** + +⟶ + +
+ + +**55. Gradient checking ― Gradient checking is a method used during the implementation of the backward pass of a neural network. It compares the value of the analytical gradient to the numerical gradient at given points and plays the role of a sanity-check for correctness.** + +⟶ + +
+ + +**56. [Type, Numerical gradient, Analytical gradient]** + +⟶ + +
+ + +**57. [Formula, Comments]** + +⟶ + +
+ + +**58. [Expensive; loss has to be computed two times per dimension, Used to verify correctness of analytical implementation, Trade-off in choosing h not too small (numerical instability) nor too large (poor gradient approximation)]** + +⟶ + +
+ + +**59. ['Exact' result, Direct computation, Used in the final implementation]** + +⟶ + +
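+
+A minimal Python sketch of the numerical side of this check, on a one-dimensional example, could be:
+
+```python
+def numerical_gradient(loss, w, h=1e-4):
+    # centered difference: two loss evaluations per checked dimension
+    return (loss(w + h) - loss(w - h)) / (2 * h)
+
+# Example: the analytical gradient of f(w) = w**2 is 2w, i.e. 6 at w = 3
+print(numerical_gradient(lambda w: w ** 2, 3.0))  # close to 6.0
+```
+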
+
+
+**60. The Deep Learning cheatsheets are now available in [target language].**
+
+⟶
+
+<br>
+
+**61. Original authors**
+
+⟶
+
+<br>
+
+**62. Translated by X, Y and Z**
+
+⟶
+
+<br>
+
+**63. Reviewed by X, Y and Z**
+
+⟶
+
+<br>
+
+**64. View PDF version on GitHub**
+
+⟶
+
+<br>
+
+**65. By X and Y**
+
+⟶
+
+<br>
diff --git a/vi/recurrent-neural-networks.md b/vi/recurrent-neural-networks.md new file mode 100644 index 000000000..191e400a1 --- /dev/null +++ b/vi/recurrent-neural-networks.md @@ -0,0 +1,677 @@ +**Recurrent Neural Networks translation** + +
+ +**1. Recurrent Neural Networks cheatsheet** + +⟶ + +
+ + +**2. CS 230 - Deep Learning** + +⟶ + +
+ + +**3. [Overview, Architecture structure, Applications of RNNs, Loss function, Backpropagation]** + +⟶ + +
+ + +**4. [Handling long term dependencies, Common activation functions, Vanishing/exploding gradient, Gradient clipping, GRU/LSTM, Types of gates, Bidirectional RNN, Deep RNN]** + +⟶ + +
+ + +**5. [Learning word representation, Notations, Embedding matrix, Word2vec, Skip-gram, Negative sampling, GloVe]** + +⟶ + +
+ + +**6. [Comparing words, Cosine similarity, t-SNE]** + +⟶ + +
+ + +**7. [Language model, n-gram, Perplexity]** + +⟶ + +
+ + +**8. [Machine translation, Beam search, Length normalization, Error analysis, Bleu score]** + +⟶ + +
+ + +**9. [Attention, Attention model, Attention weights]** + +⟶ + +
+ + +**10. Overview** + +⟶ + +
+ + +**11. Architecture of a traditional RNN ― Recurrent neural networks, also known as RNNs, are a class of neural networks that allow previous outputs to be used as inputs while having hidden states. They are typically as follows:** + +⟶ + +
+ + +**12. For each timestep t, the activation a and the output y are expressed as follows:** + +⟶ + +
+ + +**13. and** + +⟶ + +
+ + +**14. where Wax,Waa,Wya,ba,by are coefficients that are shared temporally and g1,g2 activation functions.** + +⟶ + +
+ + +**15. The pros and cons of a typical RNN architecture are summed up in the table below:** + +⟶ + +
+ + +**16. [Advantages, Possibility of processing input of any length, Model size not increasing with size of input, Computation takes into account historical information, Weights are shared across time]** + +⟶ + +
+ + +**17. [Drawbacks, Computation being slow, Difficulty of accessing information from a long time ago, Cannot consider any future input for the current state]** + +⟶ + +
+ + +**18. Applications of RNNs ― RNN models are mostly used in the fields of natural language processing and speech recognition. The different applications are summed up in the table below:** + +⟶ + +
+ + +**19. [Type of RNN, Illustration, Example]** + +⟶ + +
+ + +**20. [One-to-one, One-to-many, Many-to-one, Many-to-many]** + +⟶ + +
+
+
+**21. [Traditional neural network, Music generation, Sentiment classification, Named entity recognition, Machine translation]**
+
+⟶
+
+<br>
+ + +**22. Loss function ― In the case of a recurrent neural network, the loss function L of all time steps is defined based on the loss at every time step as follows:** + +⟶ + +
+ + +**23. Backpropagation through time ― Backpropagation is done at each point in time. At timestep T, the derivative of the loss L with respect to weight matrix W is expressed as follows:** + +⟶ + +
+ + +**24. Handling long term dependencies** + +⟶ + +
+ + +**25. Commonly used activation functions ― The most common activation functions used in RNN modules are described below:** + +⟶ + +
+ + +**26. [Sigmoid, Tanh, RELU]** + +⟶ + +
+ + +**27. Vanishing/exploding gradient ― The vanishing and exploding gradient phenomena are often encountered in the context of RNNs. The reason why they happen is that it is difficult to capture long term dependencies because of multiplicative gradient that can be exponentially decreasing/increasing with respect to the number of layers.** + +⟶ + +
+ + +**28. Gradient clipping ― It is a technique used to cope with the exploding gradient problem sometimes encountered when performing backpropagation. By capping the maximum value for the gradient, this phenomenon is controlled in practice.** + +⟶ + +
+ + +**29. clipped** + +⟶ + +
+ + +**30. Types of gates ― In order to remedy the vanishing gradient problem, specific gates are used in some types of RNNs and usually have a well-defined purpose. They are usually noted Γ and are equal to:** + +⟶ + +
+ + +**31. where W,U,b are coefficients specific to the gate and σ is the sigmoid function. The main ones are summed up in the table below:** + +⟶ + +
+ + +**32. [Type of gate, Role, Used in]** + +⟶ + +
+ + +**33. [Update gate, Relevance gate, Forget gate, Output gate]** + +⟶ + +
+ + +**34. [How much past should matter now?, Drop previous information?, Erase a cell or not?, How much to reveal of a cell?]** + +⟶ + +
+ + +**35. [LSTM, GRU]** + +⟶ + +
+ + +**36. GRU/LSTM ― Gated Recurrent Unit (GRU) and Long Short-Term Memory units (LSTM) deal with the vanishing gradient problem encountered by traditional RNNs, with LSTM being a generalization of GRU. Below is a table summing up the characterizing equations of each architecture:** + +⟶ + +
+ + +**37. [Characterization, Gated Recurrent Unit (GRU), Long Short-Term Memory (LSTM), Dependencies]** + +⟶ + +
+ + +**38. Remark: the sign ⋆ denotes the element-wise multiplication between two vectors.** + +⟶ + +
+ + +**39. Variants of RNNs ― The table below sums up the other commonly used RNN architectures:** + +⟶ + +
+ + +**40. [Bidirectional (BRNN), Deep (DRNN)]** + +⟶ + +
+ + +**41. Learning word representation** + +⟶ + +
+ + +**42. In this section, we note V the vocabulary and |V| its size.** + +⟶ + +
+ + +**43. Motivation and notations** + +⟶ + +
+ + +**44. Representation techniques ― The two main ways of representing words are summed up in the table below:** + +⟶ + +
+ + +**45. [1-hot representation, Word embedding]** + +⟶ + +
+ + +**46. [teddy bear, book, soft]** + +⟶ + +
+ + +**47. [Noted ow, Naive approach, no similarity information, Noted ew, Takes into account words similarity]** + +⟶ + +
+ + +**48. Embedding matrix ― For a given word w, the embedding matrix E is a matrix that maps its 1-hot representation ow to its embedding ew as follows:** + +⟶ + +
+ + +**49. Remark: learning the embedding matrix can be done using target/context likelihood models.** + +⟶ + +
+ + +**50. Word embeddings** + +⟶ + +
+ + +**51. Word2vec ― Word2vec is a framework aimed at learning word embeddings by estimating the likelihood that a given word is surrounded by other words. Popular models include skip-gram, negative sampling and CBOW.** + +⟶ + +
+ + +**52. [A cute teddy bear is reading, teddy bear, soft, Persian poetry, art]** + +⟶ + +
+ + +**53. [Train network on proxy task, Extract high-level representation, Compute word embeddings]** + +⟶ + +
+ + +**54. Skip-gram ― The skip-gram word2vec model is a supervised learning task that learns word embeddings by assessing the likelihood of any given target word t happening with a context word c. By noting θt a parameter associated with t, the probability P(t|c) is given by:** + +⟶ + +
+ + +**55. Remark: summing over the whole vocabulary in the denominator of the softmax part makes this model computationally expensive. CBOW is another word2vec model using the surrounding words to predict a given word.** + +⟶ + +
+
+
+**56. Negative sampling ― It is a set of binary classifiers using logistic regressions that aim at assessing how likely a given context word and a given target word are to appear simultaneously, with the models being trained on sets of k negative examples and 1 positive example. Given a context word c and a target word t, the prediction is expressed by:**
+
+⟶
+
+<br>
+ + +**57. Remark: this method is less computationally expensive than the skip-gram model.** + +⟶ + +
+
+
+**57bis. GloVe ― The GloVe model, short for global vectors for word representation, is a word embedding technique that uses a co-occurrence matrix X where each Xi,j denotes the number of times that a target i occurred with a context j. Its cost function J is as follows:**
+
+⟶
+
+<br>
+ + +**58. where f is a weighting function such that Xi,j=0⟹f(Xi,j)=0. +Given the symmetry that e and θ play in this model, the final word embedding e(final)w is given by:** + +⟶ + +
+ + +**59. Remark: the individual components of the learned word embeddings are not necessarily interpretable.** + +⟶ + +
+ + +**60. Comparing words** + +⟶ + +
+ + +**61. Cosine similarity ― The cosine similarity between words w1 and w2 is expressed as follows:** + +⟶ + +
+ + +**62. Remark: θ is the angle between words w1 and w2.** + +⟶ + +
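+
+A minimal NumPy sketch of this similarity between two embedding vectors could be:
+
+```python
+import numpy as np
+
+def cosine_similarity(e1, e2):
+    # e1, e2: word embedding vectors
+    return float(e1 @ e2 / (np.linalg.norm(e1) * np.linalg.norm(e2)))
+
+print(cosine_similarity(np.array([1.0, 0.0]), np.array([1.0, 1.0])))  # approx. 0.707
+```
+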
+ + +**63. t-SNE ― t-SNE (t-distributed Stochastic Neighbor Embedding) is a technique aimed at reducing high-dimensional embeddings into a lower dimensional space. In practice, it is commonly used to visualize word vectors in the 2D space.** + +⟶ + +
+ + +**64. [literature, art, book, culture, poem, reading, knowledge, entertaining, loveable, childhood, kind, teddy bear, soft, hug, cute, adorable]** + +⟶ + +
+ + +**65. Language model** + +⟶ + +
+ + +**66. Overview ― A language model aims at estimating the probability of a sentence P(y).** + +⟶ + +
+
+
+**67. n-gram model ― This model is a naive approach aiming at quantifying the probability that an expression appears in a corpus by counting its number of appearances in the training data.**
+
+⟶
+
+<br>
+ + +**68. Perplexity ― Language models are commonly assessed using the perplexity metric, also known as PP, which can be interpreted as the inverse probability of the dataset normalized by the number of words T. The perplexity is such that the lower, the better and is defined as follows:** + +⟶ + +
+ + +**69. Remark: PP is commonly used in t-SNE.** + +⟶ + +
+ + +**70. Machine translation** + +⟶ + +
+
+
+**71. Overview ― A machine translation model is similar to a language model except it has an encoder network placed before. For this reason, it is sometimes referred to as a conditional language model. The goal is to find a sentence y such that:**
+
+⟶
+
+<br>
+ + +**72. Beam search ― It is a heuristic search algorithm used in machine translation and speech recognition to find the likeliest sentence y given an input x.** + +⟶ + +
+
+
+**73. [Step 1: Find top B likely words y<1>, Step 2: Compute conditional probabilities y<k>|x,y<1>,...,y<k−1>, Step 3: Keep top B combinations x,y<1>,...,y<k>, End process at a stop word]**
+
+⟶
+
+<br>
+ + +**74. Remark: if the beam width is set to 1, then this is equivalent to a naive greedy search.** + +⟶ + +
+
+
+**75. Beam width ― The beam width B is a parameter for beam search. Large values of B yield better results but with slower performance and increased memory. Small values of B lead to worse results but are less computationally intensive. A standard value for B is around 10.**
+
+⟶
+
+<br>
+ + +**76. Length normalization ― In order to improve numerical stability, beam search is usually applied on the following normalized objective, often called the normalized log-likelihood objective, defined as:** + +⟶ + +
+ + +**77. Remark: the parameter α can be seen as a softener, and its value is usually between 0.5 and 1.** + +⟶ + +
+ + +**78. Error analysis ― When obtaining a predicted translation ˆy that is bad, one can wonder why we did not get a good translation y∗ by performing the following error analysis:** + +⟶ + +
+ + +**79. [Case, Root cause, Remedies]** + +⟶ + +
+ + +**80. [Beam search faulty, RNN faulty, Increase beam width, Try different architecture, Regularize, Get more data]** + +⟶ + +
+ + +**81. Bleu score ― The bilingual evaluation understudy (bleu) score quantifies how good a machine translation is by computing a similarity score based on n-gram precision. It is defined as follows:** + +⟶ + +
+ + +**82. where pn is the bleu score on n-gram only defined as follows:** + +⟶ + +
+ + +**83. Remark: a brevity penalty may be applied to short predicted translations to prevent an artificially inflated bleu score.** + +⟶ + +
+ + +**84. Attention** + +⟶ + +
+
+
+**85. Attention model ― This model allows an RNN to pay attention to specific parts of the input that are considered important, which improves the performance of the resulting model in practice. By noting α the amount of attention that the output y should pay to the activation a and c the context at time t, we have:**
+
+⟶
+
+<br>
+ + +**86. with** + +⟶ + +
+ + +**87. Remark: the attention scores are commonly used in image captioning and machine translation.** + +⟶ + +
+ + +**88. A cute teddy bear is reading Persian literature.** + +⟶ + +
+ + +**89. Attention weight ― The amount of attention that the output y should pay to the activation a is given by α computed as follows:** + +⟶ + +
+ + +**90. Remark: computation complexity is quadratic with respect to Tx.** + +⟶ + +
+ + +**91. The Deep Learning cheatsheets are now available in [target language].** + +⟶ + +
+ +**92. Original authors** + +⟶ + +
+ +**93. Translated by X, Y and Z** + +⟶ + +
+ +**94. Reviewed by X, Y and Z** + +⟶ + +
+ +**95. View PDF version on GitHub** + +⟶ + +
+ +**96. By X and Y** + +⟶ + +
diff --git a/vi/refresher-linear-algebra.md b/vi/refresher-linear-algebra.md new file mode 100644 index 000000000..a6b440d1e --- /dev/null +++ b/vi/refresher-linear-algebra.md @@ -0,0 +1,339 @@ +**1. Linear Algebra and Calculus refresher** + +⟶ + +
+ +**2. General notations** + +⟶ + +
+ +**3. Definitions** + +⟶ + +
+ +**4. Vector ― We note x∈Rn a vector with n entries, where xi∈R is the ith entry:** + +⟶ + +
+ +**5. Matrix ― We note A∈Rm×n a matrix with m rows and n columns, where Ai,j∈R is the entry located in the ith row and jth column:** + +⟶ + +
+ +**6. Remark: the vector x defined above can be viewed as a n×1 matrix and is more particularly called a column-vector.** + +⟶ + +
+ +**7. Main matrices** + +⟶ + +
+ +**8. Identity matrix ― The identity matrix I∈Rn×n is a square matrix with ones in its diagonal and zero everywhere else:** + +⟶ + +
+ +**9. Remark: for all matrices A∈Rn×n, we have A×I=I×A=A.** + +⟶ + +
+ +**10. Diagonal matrix ― A diagonal matrix D∈Rn×n is a square matrix with nonzero values in its diagonal and zero everywhere else:** + +⟶ + +
+ +**11. Remark: we also note D as diag(d1,...,dn).** + +⟶ + +
+ +**12. Matrix operations** + +⟶ + +
+ +**13. Multiplication** + +⟶ + +
+ +**14. Vector-vector ― There are two types of vector-vector products:** + +⟶ + +
+ +**15. inner product: for x,y∈Rn, we have:** + +⟶ + +
+ +**16. outer product: for x∈Rm,y∈Rn, we have:** + +⟶ + +
+
+**17. Matrix-vector ― The product of matrix A∈Rm×n and vector x∈Rn is a vector of size Rm, such that:**
+
+⟶
+
+<br>
+ +**18. where aTr,i are the vector rows and ac,j are the vector columns of A, and xi are the entries of x.** + +⟶ + +
+
+**19. Matrix-matrix ― The product of matrices A∈Rm×n and B∈Rn×p is a matrix of size Rm×p, such that:**
+
+⟶
+
+<br>
+ +**20. where aTr,i,bTr,i are the vector rows and ac,j,bc,j are the vector columns of A and B respectively** + +⟶ + +
+ +**21. Other operations** + +⟶ + +
+ +**22. Transpose ― The transpose of a matrix A∈Rm×n, noted AT, is such that its entries are flipped:** + +⟶ + +
+ +**23. Remark: for matrices A,B, we have (AB)T=BTAT** + +⟶ + +
+ +**24. Inverse ― The inverse of an invertible square matrix A is noted A−1 and is the only matrix such that:** + +⟶ + +
+ +**25. Remark: not all square matrices are invertible. Also, for matrices A,B, we have (AB)−1=B−1A−1** + +⟶ + +
+ +**26. Trace ― The trace of a square matrix A, noted tr(A), is the sum of its diagonal entries:** + +⟶ + +
+ +**27. Remark: for matrices A,B, we have tr(AT)=tr(A) and tr(AB)=tr(BA)** + +⟶ + +
+ +**28. Determinant ― The determinant of a square matrix A∈Rn×n, noted |A| or det(A) is expressed recursively in terms of A∖i,∖j, which is the matrix A without its ith row and jth column, as follows:** + +⟶ + +
+ +**29. Remark: A is invertible if and only if |A|≠0. Also, |AB|=|A||B| and |AT|=|A|.** + +⟶ + +
+ +**30. Matrix properties** + +⟶ + +
+ +**31. Definitions** + +⟶ + +
+ +**32. Symmetric decomposition ― A given matrix A can be expressed in terms of its symmetric and antisymmetric parts as follows:** + +⟶ + +
+ +**33. [Symmetric, Antisymmetric]** + +⟶ + +
+ +**34. Norm ― A norm is a function N:V⟶[0,+∞[ where V is a vector space, and such that for all x,y∈V, we have:** + +⟶ + +
+ +**35. N(ax)=|a|N(x) for a scalar** + +⟶ + +
+ +**36. if N(x)=0, then x=0** + +⟶ + +
+ +**37. For x∈V, the most commonly used norms are summed up in the table below:** + +⟶ + +
+ +**38. [Norm, Notation, Definition, Use case]** + +⟶ + +
+
+**39. Linear dependence ― A set of vectors is said to be linearly dependent if one of the vectors in the set can be defined as a linear combination of the others.**
+
+⟶
+
+<br>
+ +**40. Remark: if no vector can be written this way, then the vectors are said to be linearly independent** + +⟶ + +
+ +**41. Matrix rank ― The rank of a given matrix A is noted rank(A) and is the dimension of the vector space generated by its columns. This is equivalent to the maximum number of linearly independent columns of A.** + +⟶ + +
+ +**42. Positive semi-definite matrix ― A matrix A∈Rn×n is positive semi-definite (PSD) and is noted A⪰0 if we have:** + +⟶ + +
+ +**43. Remark: similarly, a matrix A is said to be positive definite, and is noted A≻0, if it is a PSD matrix which satisfies for all non-zero vector x, xTAx>0.** + +⟶ + +
+ +**44. Eigenvalue, eigenvector ― Given a matrix A∈Rn×n, λ is said to be an eigenvalue of A if there exists a vector z∈Rn∖{0}, called eigenvector, such that we have:** + +⟶ + +
+ +**45. Spectral theorem ― Let A∈Rn×n. If A is symmetric, then A is diagonalizable by a real orthogonal matrix U∈Rn×n. By noting Λ=diag(λ1,...,λn), we have:** + +⟶ + +
+ +**46. diagonal** + +⟶ + +
+ +**47. Singular-value decomposition ― For a given matrix A of dimensions m×n, the singular-value decomposition (SVD) is a factorization technique that guarantees the existence of U m×m unitary, Σ m×n diagonal and V n×n unitary matrices, such that:** + +⟶ + +
+ +**48. Matrix calculus** + +⟶ + +
+ +**49. Gradient ― Let f:Rm×n→R be a function and A∈Rm×n be a matrix. The gradient of f with respect to A is a m×n matrix, noted ∇Af(A), such that:** + +⟶ + +
+ +**50. Remark: the gradient of f is only defined when f is a function that returns a scalar.** + +⟶ + +
+ +**51. Hessian ― Let f:Rn→R be a function and x∈Rn be a vector. The hessian of f with respect to x is a n×n symmetric matrix, noted ∇2xf(x), such that:** + +⟶ + +
+ +**52. Remark: the hessian of f is only defined when f is a function that returns a scalar** + +⟶ + +
+ +**53. Gradient operations ― For matrices A,B,C, the following gradient properties are worth having in mind:** + +⟶ + +
+ +**54. [General notations, Definitions, Main matrices]** + +⟶ + +
+ +**55. [Matrix operations, Multiplication, Other operations]** + +⟶ + +
+ +**56. [Matrix properties, Norm, Eigenvalue/Eigenvector, Singular-value decomposition]** + +⟶ + +
+ +**57. [Matrix calculus, Gradient, Hessian, Operations]** + +⟶ diff --git a/vi/refresher-probability.md b/vi/refresher-probability.md new file mode 100644 index 000000000..5c9b34656 --- /dev/null +++ b/vi/refresher-probability.md @@ -0,0 +1,381 @@ +**1. Probabilities and Statistics refresher** + +⟶ + +
+ +**2. Introduction to Probability and Combinatorics** + +⟶ + +
+ +**3. Sample space ― The set of all possible outcomes of an experiment is known as the sample space of the experiment and is denoted by S.** + +⟶ + +
+ +**4. Event ― Any subset E of the sample space is known as an event. That is, an event is a set consisting of possible outcomes of the experiment. If the outcome of the experiment is contained in E, then we say that E has occurred.** + +⟶ + +
+
+**5. Axioms of probability ― For each event E, we denote P(E) as the probability of event E occurring.**
+
+⟶
+
+<br>
+ +**6. Axiom 1 ― Every probability is between 0 and 1 included, i.e:** + +⟶ + +
+ +**7. Axiom 2 ― The probability that at least one of the elementary events in the entire sample space will occur is 1, i.e:** + +⟶ + +
+ +**8. Axiom 3 ― For any sequence of mutually exclusive events E1,...,En, we have:** + +⟶ + +
+ +**9. Permutation ― A permutation is an arrangement of r objects from a pool of n objects, in a given order. The number of such arrangements is given by P(n,r), defined as:** + +⟶ + +
+ +**10. Combination ― A combination is an arrangement of r objects from a pool of n objects, where the order does not matter. The number of such arrangements is given by C(n,r), defined as:** + +⟶ + +
+ +**11. Remark: we note that for 0⩽r⩽n, we have P(n,r)⩾C(n,r)** + +⟶ + +
+ +**12. Conditional Probability** + +⟶ + +
+ +**13. Bayes' rule ― For events A and B such that P(B)>0, we have:** + +⟶ + +
+ +**14. Remark: we have P(A∩B)=P(A)P(B|A)=P(A|B)P(B)** + +⟶ + +
+ +**15. Partition ― Let {Ai,i∈[[1,n]]} be such that for all i, Ai≠∅. We say that {Ai} is a partition if we have:** + +⟶ + +
+ +**16. Remark: for any event B in the sample space, we have P(B)=n∑i=1P(B|Ai)P(Ai).** + +⟶ + +
+ +**17. Extended form of Bayes' rule ― Let {Ai,i∈[[1,n]]} be a partition of the sample space. We have:** + +⟶ + +
+ +**18. Independence ― Two events A and B are independent if and only if we have:** + +⟶ + +
+ +**19. Random Variables** + +⟶ + +
+ +**20. Definitions** + +⟶ + +
+ +**21. Random variable ― A random variable, often noted X, is a function that maps every element in a sample space to a real line.** + +⟶ + +
+ +**22. Cumulative distribution function (CDF) ― The cumulative distribution function F, which is monotonically non-decreasing and is such that limx→−∞F(x)=0 and limx→+∞F(x)=1, is defined as:** + +⟶ + +
+
+**23. Remark: we have P(a<X⩽b)=F(b)−F(a)**
+
+⟶
+
+<br>
+
+**24. Probability density function (PDF) ― The probability density function f is the probability that X takes on values between two adjacent realizations of the random variable.**
+
+⟶
+
+<br>
+ +**25. Relationships involving the PDF and CDF ― Here are the important properties to know in the discrete (D) and the continuous (C) cases.** + +⟶ + +
+ +**26. [Case, CDF F, PDF f, Properties of PDF]** + +⟶ + +
+ +**27. Expectation and Moments of the Distribution ― Here are the expressions of the expected value E[X], generalized expected value E[g(X)], kth moment E[Xk] and characteristic function ψ(ω) for the discrete and continuous cases:** + +⟶ + +
+ +**28. Variance ― The variance of a random variable, often noted Var(X) or σ2, is a measure of the spread of its distribution function. It is determined as follows:** + +⟶ + +
+ +**29. Standard deviation ― The standard deviation of a random variable, often noted σ, is a measure of the spread of its distribution function which is compatible with the units of the actual random variable. It is determined as follows:** + +⟶ + +
+ +**30. Transformation of random variables ― Let the variables X and Y be linked by some function. By noting fX and fY the distribution function of X and Y respectively, we have:** + +⟶ + +
+ +**31. Leibniz integral rule ― Let g be a function of x and potentially c, and a,b boundaries that may depend on c. We have:** + +⟶ + +
+ +**32. Probability Distributions** + +⟶ + +
+ +**33. Chebyshev's inequality ― Let X be a random variable with expected value μ. For k,σ>0, we have the following inequality:** + +⟶ + +
+ +**34. Main distributions ― Here are the main distributions to have in mind:** + +⟶ + +
+ +**35. [Type, Distribution]** + +⟶ + +
+ +**36. Jointly Distributed Random Variables** + +⟶ + +
+
+**37. Marginal density and cumulative distribution ― From the joint probability density function fXY, we have**
+
+⟶
+
+<br>
+ +**38. [Case, Marginal density, Cumulative function]** + +⟶ + +
+ +**39. Conditional density ― The conditional density of X with respect to Y, often noted fX|Y, is defined as follows:** + +⟶ + +
+ +**40. Independence ― Two random variables X and Y are said to be independent if we have:** + +⟶ + +
+ +**41. Covariance ― We define the covariance of two random variables X and Y, that we note σ2XY or more commonly Cov(X,Y), as follows:** + +⟶ + +
+ +**42. Correlation ― By noting σX,σY the standard deviations of X and Y, we define the correlation between the random variables X and Y, noted ρXY, as follows:** + +⟶ + +
+ +**43. Remark 1: we note that for any random variables X,Y, we have ρXY∈[−1,1].** + +⟶ + +
+ +**44. Remark 2: If X and Y are independent, then ρXY=0.** + +⟶ + +
+ +**45. Parameter estimation** + +⟶ + +
+ +**46. Definitions** + +⟶ + +
+ +**47. Random sample ― A random sample is a collection of n random variables X1,...,Xn that are independent and identically distributed with X.** + +⟶ + +
+ +**48. Estimator ― An estimator is a function of the data that is used to infer the value of an unknown parameter in a statistical model.** + +⟶ + +
+ +**49. Bias ― The bias of an estimator ^θ is defined as being the difference between the expected value of the distribution of ^θ and the true value, i.e.:** + +⟶ + +
+ +**50. Remark: an estimator is said to be unbiased when we have E[^θ]=θ.** + +⟶ + +
+ +**51. Estimating the mean** + +⟶ + +
+ +**52. Sample mean ― The sample mean of a random sample is used to estimate the true mean μ of a distribution, is often noted ¯¯¯¯¯X and is defined as follows:** + +⟶ + +
+ +**53. Remark: the sample mean is unbiased, i.e E[¯¯¯¯¯X]=μ.** + +⟶ + +
+ +**54. Central Limit Theorem ― Let us have a random sample X1,...,Xn following a given distribution with mean μ and variance σ2, then we have:** + +⟶ + +
+ +**55. Estimating the variance** + +⟶ + +
+ +**56. Sample variance ― The sample variance of a random sample is used to estimate the true variance σ2 of a distribution, is often noted s2 or ^σ2 and is defined as follows:** + +⟶ + +
+ +**57. Remark: the sample variance is unbiased, i.e E[s2]=σ2.** + +⟶ + +
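+
+As a quick numerical illustration (NumPy's default_rng and the chosen true parameters are assumptions of this sketch), the unbiased sample variance divides by n-1, i.e. ddof=1:
+
+```python
+import numpy as np
+
+rng = np.random.default_rng(0)
+x = rng.normal(loc=0.0, scale=2.0, size=1000)  # true mean 0, true variance sigma^2 = 4
+
+print(np.mean(x))          # sample mean, close to 0
+print(np.var(x, ddof=1))   # unbiased sample variance, close to 4
+```
+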
+ +**58. Chi-Squared relation with sample variance ― Let s2 be the sample variance of a random sample. We have:** + +⟶ + +
+ +**59. [Introduction, Sample space, Event, Permutation]** + +⟶ + +
+ +**60. [Conditional probability, Bayes' rule, Independence]** + +⟶ + +
+ +**61. [Random variables, Definitions, Expectation, Variance]** + +⟶ + +
+ +**62. [Probability distributions, Chebyshev's inequality, Main distributions]** + +⟶ + +
+ +**63. [Jointly distributed random variables, Density, Covariance, Correlation]** + +⟶ + +
+ +**64. [Parameter estimation, Mean, Variance]** + +⟶ From 65b4de28126e8fade55503df6774a8c3d90a5957 Mon Sep 17 00:00:00 2001 From: tuananhhedspibk Date: Thu, 6 Jun 2019 22:58:37 +0900 Subject: [PATCH 220/531] vi translating for cheatsheet-deep-learning --- ...tsheet-machine-learning-tips-and-tricks.md | 285 ------- vi/cheatsheet-supervised-learning.md | 567 -------------- vi/cheatsheet-unsupervised-learning.md | 340 --------- vi/convolutional-neural-networks.md | 716 ------------------ vi/deep-learning-tips-and-tricks.md | 457 ----------- vi/recurrent-neural-networks.md | 677 ----------------- vi/refresher-linear-algebra.md | 339 --------- vi/refresher-probability.md | 381 ---------- 8 files changed, 3762 deletions(-) delete mode 100644 vi/cheatsheet-machine-learning-tips-and-tricks.md delete mode 100644 vi/cheatsheet-supervised-learning.md delete mode 100644 vi/cheatsheet-unsupervised-learning.md delete mode 100644 vi/convolutional-neural-networks.md delete mode 100644 vi/deep-learning-tips-and-tricks.md delete mode 100644 vi/recurrent-neural-networks.md delete mode 100644 vi/refresher-linear-algebra.md delete mode 100644 vi/refresher-probability.md diff --git a/vi/cheatsheet-machine-learning-tips-and-tricks.md b/vi/cheatsheet-machine-learning-tips-and-tricks.md deleted file mode 100644 index 9712297b8..000000000 --- a/vi/cheatsheet-machine-learning-tips-and-tricks.md +++ /dev/null @@ -1,285 +0,0 @@ -**1. Machine Learning tips and tricks cheatsheet** - -⟶ - -
- -**2. Classification metrics** - -⟶ - -
- -**3. In a context of a binary classification, here are the main metrics that are important to track in order to assess the performance of the model.** - -⟶ - -
- -**4. Confusion matrix ― The confusion matrix is used to have a more complete picture when assessing the performance of a model. It is defined as follows:** - -⟶ - -
- -**5. [Predicted class, Actual class]** - -⟶ - -
- -**6. Main metrics ― The following metrics are commonly used to assess the performance of classification models:** - -⟶ - -
- -**7. [Metric, Formula, Interpretation]** - -⟶ - -
- -**8. Overall performance of model** - -⟶ - -
- -**9. How accurate the positive predictions are** - -⟶ - -
- -**10. Coverage of actual positive sample** - -⟶ - -
- -**11. Coverage of actual negative sample** - -⟶ - -
- -**12. Hybrid metric useful for unbalanced classes** - -⟶ - -
- -**13. ROC ― The receiver operating curve, also noted ROC, is the plot of TPR versus FPR by varying the threshold. These metrics are are summed up in the table below:** - -⟶ - -
- -**14. [Metric, Formula, Equivalent]** - -⟶ - -
- -**15. AUC ― The area under the receiving operating curve, also noted AUC or AUROC, is the area below the ROC as shown in the following figure:** - -⟶ - -
- -**16. [Actual, Predicted]** - -⟶ - -
- -**17. Basic metrics ― Given a regression model f, the following metrics are commonly used to assess the performance of the model:** - -⟶ - -
- -**18. [Total sum of squares, Explained sum of squares, Residual sum of squares]** - -⟶ - -
- -**19. Coefficient of determination ― The coefficient of determination, often noted R2 or r2, provides a measure of how well the observed outcomes are replicated by the model and is defined as follows:** - -⟶ - -
- -**20. Main metrics ― The following metrics are commonly used to assess the performance of regression models, by taking into account the number of variables n that they take into consideration:** - -⟶ - -
- -**21. where L is the likelihood and ˆσ2 is an estimate of the variance associated with each response.** - -⟶ - -
- -**22. Model selection** - -⟶ - -
- -**23. Vocabulary ― When selecting a model, we distinguish 3 different parts of the data that we have as follows:** - -⟶ - -
- -**24. [Training set, Validation set, Testing set]** - -⟶ - -
- -**25. [Model is trained, Model is assessed, Model gives predictions]** - -⟶ - -
- -**26. [Usually 80% of the dataset, Usually 20% of the dataset]** - -⟶ - -
- -**27. [Also called hold-out or development set, Unseen data]** - -⟶ - -
- -**28. Once the model has been chosen, it is trained on the entire dataset and tested on the unseen test set. These are represented in the figure below:** - -⟶ - -
- -**29. Cross-validation ― Cross-validation, also noted CV, is a method that is used to select a model that does not rely too much on the initial training set. The different types are summed up in the table below:** - -⟶ - -
- -**30. [Training on k−1 folds and assessment on the remaining one, Training on n−p observations and assessment on the p remaining ones]** - -⟶ - -
- -**31. [Generally k=5 or 10, Case p=1 is called leave-one-out]** - -⟶ - -
- -**32. The most commonly used method is called k-fold cross-validation and splits the training data into k folds to validate the model on one fold while training the model on the k−1 other folds, all of this k times. The error is then averaged over the k folds and is named cross-validation error.** - -⟶ - -
- -**33. Regularization ― The regularization procedure aims at avoiding the model to overfit the data and thus deals with high variance issues. The following table sums up the different types of commonly used regularization techniques:** - -⟶ - -
- -**34. [Shrinks coefficients to 0, Good for variable selection, Makes coefficients smaller, Tradeoff between variable selection and small coefficients]** - -⟶ - -
- -**35. Diagnostics** - -⟶ - -
- -**36. Bias ― The bias of a model is the difference between the expected prediction and the correct model that we try to predict for given data points.** - -⟶ - -
- -**37. Variance ― The variance of a model is the variability of the model prediction for given data points.** - -⟶ - -
- -**38. Bias/variance tradeoff ― The simpler the model, the higher the bias, and the more complex the model, the higher the variance.** - -⟶ - -
- -**39. [Symptoms, Regression illustration, classification illustration, deep learning illustration, possible remedies]** - -⟶ - -
- -**40. [High training error, Training error close to test error, High bias, Training error slightly lower than test error, Very low training error, Training error much lower than test error, High variance]** - -⟶ - -
- -**41. [Complexify model, Add more features, Train longer, Perform regularization, Get more data]** - -⟶ - -
- -**42. Error analysis ― Error analysis is analyzing the root cause of the difference in performance between the current and the perfect models.** - -⟶ - -
- -**43. Ablative analysis ― Ablative analysis is analyzing the root cause of the difference in performance between the current and the baseline models.** - -⟶ - -
- -**44. Regression metrics** - -⟶ - -
- -**45. [Classification metrics, confusion matrix, accuracy, precision, recall, F1 score, ROC]** - -⟶ - -
- -**46. [Regression metrics, R squared, Mallow's CP, AIC, BIC]** - -⟶ - -
- -**47. [Model selection, cross-validation, regularization]** - -⟶ - -
- -**48. [Diagnostics, Bias/variance tradeoff, error/ablative analysis]** - -⟶ diff --git a/vi/cheatsheet-supervised-learning.md b/vi/cheatsheet-supervised-learning.md deleted file mode 100644 index a6b19ea1c..000000000 --- a/vi/cheatsheet-supervised-learning.md +++ /dev/null @@ -1,567 +0,0 @@ -**1. Supervised Learning cheatsheet** - -⟶ - -
- -**2. Introduction to Supervised Learning** - -⟶ - -
- -**3. Given a set of data points {x(1),...,x(m)} associated to a set of outcomes {y(1),...,y(m)}, we want to build a classifier that learns how to predict y from x.** - -⟶ - -
- -**4. Type of prediction ― The different types of predictive models are summed up in the table below:** - -⟶ - -
- -**5. [Regression, Classifier, Outcome, Examples]** - -⟶ - -
- -**6. [Continuous, Class, Linear regression, Logistic regression, SVM, Naive Bayes]** - -⟶ - -
- -**7. Type of model ― The different models are summed up in the table below:** - -⟶ - -
- -**8. [Discriminative model, Generative model, Goal, What's learned, Illustration, Examples]** - -⟶ - -
- -**9. [Directly estimate P(y|x), Estimate P(x|y) to then deduce P(y|x), Decision boundary, Probability distributions of the data, Regressions, SVMs, GDA, Naive Bayes]** - -⟶ - -
- -**10. Notations and general concepts** - -⟶ - -
- -**11. Hypothesis ― The hypothesis is noted hθ and is the model that we choose. For a given input data x(i) the model prediction output is hθ(x(i)).** - -⟶ - -
- -**12. Loss function ― A loss function is a function L:(z,y)∈R×Y⟼L(z,y)∈R that takes as inputs the predicted value z corresponding to the real data value y and outputs how different they are. The common loss functions are summed up in the table below:** - -⟶ - -
- -**13. [Least squared error, Logistic loss, Hinge loss, Cross-entropy]** - -⟶ - -
- -**14. [Linear regression, Logistic regression, SVM, Neural Network]** - -⟶ - -
- -**15. Cost function ― The cost function J is commonly used to assess the performance of a model, and is defined with the loss function L as follows:** - -⟶ - -
- -**16. Gradient descent ― By noting α∈R the learning rate, the update rule for gradient descent is expressed with the learning rate and the cost function J as follows:** - -⟶ - -
- -**17. Remark: Stochastic gradient descent (SGD) is updating the parameter based on each training example, and batch gradient descent is on a batch of training examples.** - -⟶ - -
- -**18. Likelihood ― The likelihood of a model L(θ) given parameters θ is used to find the optimal parameters θ through maximizing the likelihood. In practice, we use the log-likelihood ℓ(θ)=log(L(θ)) which is easier to optimize. We have:** - -⟶ - -
- -**19. Newton's algorithm ― The Newton's algorithm is a numerical method that finds θ such that ℓ′(θ)=0. Its update rule is as follows:** - -⟶ - -
- -**20. Remark: the multidimensional generalization, also known as the Newton-Raphson method, has the following update rule:** - -⟶ - -
- -**21. Linear models** - -⟶ - -
- -**22. Linear regression** - -⟶ - -
- -**23. We assume here that y|x;θ∼N(μ,σ2)** - -⟶ - -
- -**24. Normal equations ― By noting X the matrix design, the value of θ that minimizes the cost function is a closed-form solution such that:** - -⟶ - -
- -**25. LMS algorithm ― By noting α the learning rate, the update rule of the Least Mean Squares (LMS) algorithm for a training set of m data points, which is also known as the Widrow-Hoff learning rule, is as follows:** - -⟶ - -
- -**26. Remark: the update rule is a particular case of the gradient ascent.** - -⟶ - -
- -**27. LWR ― Locally Weighted Regression, also known as LWR, is a variant of linear regression that weights each training example in its cost function by w(i)(x), which is defined with parameter τ∈R as:** - -⟶ - -
- -**28. Classification and logistic regression** - -⟶ - -
- -**29. Sigmoid function ― The sigmoid function g, also known as the logistic function, is defined as follows:** - -⟶ - -
- -**30. Logistic regression ― We assume here that y|x;θ∼Bernoulli(ϕ). We have the following form:** - -⟶ - -
- -**31. Remark: there is no closed form solution for the case of logistic regressions.** - -⟶ - -
- -**32. Softmax regression ― A softmax regression, also called a multiclass logistic regression, is used to generalize logistic regression when there are more than 2 outcome classes. By convention, we set θK=0, which makes the Bernoulli parameter ϕi of each class i equal to:** - -⟶ - -
- -**33. Generalized Linear Models** - -⟶ - -
- -**34. Exponential family ― A class of distributions is said to be in the exponential family if it can be written in terms of a natural parameter, also called the canonical parameter or link function, η, a sufficient statistic T(y) and a log-partition function a(η) as follows:** - -⟶ - -
- -**35. Remark: we will often have T(y)=y. Also, exp(−a(η)) can be seen as a normalization parameter that will make sure that the probabilities sum to one.** - -⟶ - -
- -**36. Here are the most common exponential distributions summed up in the following table:** - -⟶ - -
- -**37. [Distribution, Bernoulli, Gaussian, Poisson, Geometric]** - -⟶ - -
- -**38. Assumptions of GLMs ― Generalized Linear Models (GLM) aim at predicting a random variable y as a function fo x∈Rn+1 and rely on the following 3 assumptions:** - -⟶ - -
- -**39. Remark: ordinary least squares and logistic regression are special cases of generalized linear models.** - -⟶ - -
- -**40. Support Vector Machines** - -⟶ - -
- -**41: The goal of support vector machines is to find the line that maximizes the minimum distance to the line.** - -⟶ - -
- -**42: Optimal margin classifier ― The optimal margin classifier h is such that:** - -⟶ - -
- -**43: where (w,b)∈Rn×R is the solution of the following optimization problem:** - -⟶ - -
- -**44. such that** - -⟶ - -
- -**45. support vectors** - -⟶ - -
- -**46. Remark: the line is defined as wTx−b=0.** - -⟶ - -
- -**47. Hinge loss ― The hinge loss is used in the setting of SVMs and is defined as follows:** - -⟶ - -
- -**48. Kernel ― Given a feature mapping ϕ, we define the kernel K to be defined as:** - -⟶ - -
- -**49. In practice, the kernel K defined by K(x,z)=exp(−||x−z||22σ2) is called the Gaussian kernel and is commonly used.** - -⟶ - -
- -**50. [Non-linear separability, Use of a kernel mapping, Decision boundary in the original space]** - -⟶ - -
- -**51. Remark: we say that we use the "kernel trick" to compute the cost function using the kernel because we actually don't need to know the explicit mapping ϕ, which is often very complicated. Instead, only the values K(x,z) are needed.** - -⟶ - -
- -**52. Lagrangian ― We define the Lagrangian L(w,b) as follows:** - -⟶ - -
- -**53. Remark: the coefficients βi are called the Lagrange multipliers.** - -⟶ - -
- -**54. Generative Learning** - -⟶ - -
- -**55. A generative model first tries to learn how the data is generated by estimating P(x|y), which we can then use to estimate P(y|x) by using Bayes' rule.** - -⟶ - -
- -**56. Gaussian Discriminant Analysis** - -⟶ - -
- -**57. Setting ― The Gaussian Discriminant Analysis assumes that y and x|y=0 and x|y=1 are such that:** - -⟶ - -
- -**58. Estimation ― The following table sums up the estimates that we find when maximizing the likelihood:** - -⟶ - -
- -**59. Naive Bayes** - -⟶ - -
- -**60. Assumption ― The Naive Bayes model supposes that the features of each data point are all independent:** - -⟶ - -
- -**61. Solutions ― Maximizing the log-likelihood gives the following solutions, with k∈{0,1},l∈[[1,L]]** - -⟶ - -
- -**62. Remark: Naive Bayes is widely used for text classification and spam detection.** - -⟶ - -
- -**63. Tree-based and ensemble methods** - -⟶ - -
- -**64. These methods can be used for both regression and classification problems.** - -⟶ - -
- -**65. CART ― Classification and Regression Trees (CART), commonly known as decision trees, can be represented as binary trees. They have the advantage to be very interpretable.** - -⟶ - -
- -**66. Random forest ― It is a tree-based technique that uses a high number of decision trees built out of randomly selected sets of features. Contrary to the simple decision tree, it is highly uninterpretable but its generally good performance makes it a popular algorithm.** - -⟶ - -
- -**67. Remark: random forests are a type of ensemble methods.** - -⟶ - -
- -**68. Boosting ― The idea of boosting methods is to combine several weak learners to form a stronger one. The main ones are summed up in the table below:** - -⟶ - -
- -**69. [Adaptive boosting, Gradient boosting]** - -⟶ - -
- -**70. High weights are put on errors to improve at the next boosting step** - -⟶ - -
- -**71. Weak learners trained on remaining errors** - -⟶ - -
- -**72. Other non-parametric approaches** - -⟶ - -
- -**73. k-nearest neighbors ― The k-nearest neighbors algorithm, commonly known as k-NN, is a non-parametric approach where the response of a data point is determined by the nature of its k neighbors from the training set. It can be used in both classification and regression settings.** - -⟶ - -
- -**74. Remark: The higher the parameter k, the higher the bias, and the lower the parameter k, the higher the variance.** - -⟶ - -
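A minimal NumPy sketch of k-NN classification by majority vote with Euclidean distance; the training points, labels and query point are illustrative.

```python
import numpy as np

def knn_predict(X_train, y_train, x_query, k=3):
    dists = np.linalg.norm(X_train - x_query, axis=1)          # distance to every training point
    nearest = np.argsort(dists)[:k]                            # indices of the k closest points
    labels, counts = np.unique(y_train[nearest], return_counts=True)
    return labels[np.argmax(counts)]                           # majority vote of the k neighbors

X_train = np.array([[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [0.9, 1.1]])
y_train = np.array([0, 0, 1, 1])
print(knn_predict(X_train, y_train, np.array([0.95, 1.0]), k=3))  # -> 1
```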
- -**75. Learning Theory** - -⟶ - -
- -**76. Union bound ― Let A1,...,Ak be k events. We have:** - -⟶ - -
- -**77. Hoeffding inequality ― Let Z1,..,Zm be m iid variables drawn from a Bernoulli distribution of parameter ϕ. Let ˆϕ be their sample mean and γ>0 fixed. We have:** - -⟶ - -
- -**78. Remark: this inequality is also known as the Chernoff bound.** - -⟶ - -
- -**79. Training error ― For a given classifier h, we define the training error ˆϵ(h), also known as the empirical risk or empirical error, to be as follows:** - -⟶ - -
- -**80. Probably Approximately Correct (PAC) ― PAC is a framework under which numerous results on learning theory were proved, and has the following set of assumptions: ** - -⟶ - -
- -**81: the training and testing sets follow the same distribution ** - -⟶ - -
- -**82. the training examples are drawn independently** - -⟶ - -
- -**83. Shattering ― Given a set S={x(1),...,x(d)}, and a set of classifiers H, we say that H shatters S if for any set of labels {y(1),...,y(d)}, we have:** - -⟶ - -
- -**84. Upper bound theorem ― Let H be a finite hypothesis class such that |H|=k and let δ and the sample size m be fixed. Then, with probability of at least 1−δ, we have:** - -⟶ - -
- -**85. VC dimension ― The Vapnik-Chervonenkis (VC) dimension of a given infinite hypothesis class H, noted VC(H) is the size of the largest set that is shattered by H.** - -⟶ - -
- -**86. Remark: the VC dimension of H={set of linear classifiers in 2 dimensions} is 3.** - -⟶ - -
- -**87. Theorem (Vapnik) ― Let H be given, with VC(H)=d and m the number of training examples. With probability at least 1−δ, we have:** - -⟶ - -
- -**88. [Introduction, Type of prediction, Type of model]** - -⟶ - -
- -**89. [Notations and general concepts, loss function, gradient descent, likelihood]** - -⟶ - -
- -**90. [Linear models, linear regression, logistic regression, generalized linear models]** - -⟶ - -
- -**91. [Support vector machines, Optimal margin classifier, Hinge loss, Kernel]** - -⟶ - -
- -**92. [Generative learning, Gaussian Discriminant Analysis, Naive Bayes]** - -⟶ - -
- -**93. [Trees and ensemble methods, CART, Random forest, Boosting]** - -⟶ - -
- -**94. [Other methods, k-NN]** - -⟶ - -
- -**95. [Learning theory, Hoeffding inequality, PAC, VC dimension]** - -⟶ diff --git a/vi/cheatsheet-unsupervised-learning.md b/vi/cheatsheet-unsupervised-learning.md deleted file mode 100644 index 6daab3b21..000000000 --- a/vi/cheatsheet-unsupervised-learning.md +++ /dev/null @@ -1,340 +0,0 @@ -**1. Unsupervised Learning cheatsheet** - -⟶ - -
- -**2. Introduction to Unsupervised Learning** - -⟶ - -
- -**3. Motivation ― The goal of unsupervised learning is to find hidden patterns in unlabeled data {x(1),...,x(m)}.** - -⟶ - -
- -**4. Jensen's inequality ― Let f be a convex function and X a random variable. We have the following inequality:** - -⟶ - -
- -**5. Clustering** - -⟶ - -
- -**6. Expectation-Maximization** - -⟶ - -
- -**7. Latent variables ― Latent variables are hidden/unobserved variables that make estimation problems difficult, and are often denoted z. Here are the most common settings where there are latent variables:** - -⟶ - -
- -**8. [Setting, Latent variable z, Comments]** - -⟶ - -
- -**9. [Mixture of k Gaussians, Factor analysis]** - -⟶ - -
- -**10. Algorithm ― The Expectation-Maximization (EM) algorithm gives an efficient method for estimating the parameter θ through maximum likelihood estimation by repeatedly constructing a lower-bound on the likelihood (E-step) and optimizing that lower bound (M-step) as follows:** - -⟶ - -
- -**11. E-step: Evaluate the posterior probability Qi(z(i)) that each data point x(i) came from a particular cluster z(i) as follows:** - -⟶ - -
- -**12. M-step: Use the posterior probabilities Qi(z(i)) as cluster specific weights on data points x(i) to separately re-estimate each cluster model as follows:** - -⟶ - -
- -**13. [Gaussians initialization, Expectation step, Maximization step, Convergence]** - -⟶ - -
- -**14. k-means clustering** - -⟶ - -
- -**15. We note c(i) the cluster of data point i and μj the center of cluster j.** - -⟶ - -
- -**16. Algorithm ― After randomly initializing the cluster centroids μ1,μ2,...,μk∈Rn, the k-means algorithm repeats the following step until convergence:** - -⟶ - -
- -**17. [Means initialization, Cluster assignment, Means update, Convergence]** - -⟶ - -
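A minimal NumPy sketch of these two alternating steps (cluster assignment, then means update); the synthetic data, the choice k=2 and the fixed iteration count are illustrative, and convergence checking is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.1, (20, 2)), rng.normal(3, 0.1, (20, 2))])  # two well-separated blobs
k = 2
mu = X[rng.choice(len(X), size=k, replace=False)]   # random centroid initialization

for _ in range(10):
    # Cluster assignment: each point is assigned to its closest centroid.
    c = np.argmin(np.linalg.norm(X[:, None, :] - mu[None, :, :], axis=2), axis=1)
    # Means update: each centroid becomes the mean of its assigned points.
    mu = np.array([X[c == j].mean(axis=0) for j in range(k)])

print(mu)  # one centroid near (0, 0), the other near (3, 3)
```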
- -**18. Distortion function ― In order to see if the algorithm converges, we look at the distortion function defined as follows:** - -⟶ - -
- -**19. Hierarchical clustering** - -⟶ - -
- -**20. Algorithm ― It is a clustering algorithm with an agglomerative hierarchical approach that builds nested clusters in a successive manner.** - -⟶ - -
- -**21. Types ― There are different sorts of hierarchical clustering algorithms that aim at optimizing different objective functions, which are summed up in the table below:** - -⟶ - -
- -**22. [Ward linkage, Average linkage, Complete linkage]** - -⟶ - -
- -**23. [Minimize within cluster distance, Minimize average distance between cluster pairs, Minimize maximum distance between cluster pairs]** - -⟶ - -
- -**24. Clustering assessment metrics** - -⟶ - -
- -**25. In an unsupervised learning setting, it is often hard to assess the performance of a model since we don't have the ground truth labels as was the case in the supervised learning setting.** - -⟶ - -
- -**26. Silhouette coefficient ― By noting a and b the mean distance between a sample and all other points in the same class, and between a sample and all other points in the next nearest cluster, the silhouette coefficient s for a single sample is defined as follows:** - -⟶ - -
- -**27. Calinski-Harabaz index ― By noting k the number of clusters, Bk and Wk the between and within-clustering dispersion matrices respectively defined as** - -⟶ - -
- -**28. the Calinski-Harabaz index s(k) indicates how well a clustering model defines its clusters, such that the higher the score, the more dense and well separated the clusters are. It is defined as follows:** - -⟶ - -
- -**29. Dimension reduction** - -⟶ - -
- -**30. Principal component analysis** - -⟶ - -
- -**31. It is a dimension reduction technique that finds the variance maximizing directions onto which to project the data.** - -⟶ - -
- -**32. Eigenvalue, eigenvector ― Given a matrix A∈Rn×n, λ is said to be an eigenvalue of A if there exists a vector z∈Rn∖{0}, called eigenvector, such that we have:** - -⟶ - -
- -**33. Spectral theorem ― Let A∈Rn×n. If A is symmetric, then A is diagonalizable by a real orthogonal matrix U∈Rn×n. By noting Λ=diag(λ1,...,λn), we have:** - -⟶ - -
- -**34. diagonal** - -⟶ - -
- -**35. Remark: the eigenvector associated with the largest eigenvalue is called principal eigenvector of matrix A.** - -⟶ - -
- -**36. Algorithm ― The Principal Component Analysis (PCA) procedure is a dimension reduction technique that projects the data on k -dimensions by maximizing the variance of the data as follows:** - -⟶ - -
- -**37. Step 1: Normalize the data to have a mean of 0 and standard deviation of 1.** - -⟶ - -
- -**38. Step 2: Compute Σ=1mm∑i=1x(i)x(i)T∈Rn×n, which is symmetric with real eigenvalues.** - -⟶ - -
- -**39. Step 3: Compute u1,...,uk∈Rn the k orthogonal principal eigenvectors of Σ, i.e. the orthogonal eigenvectors of the k largest eigenvalues.** - -⟶ - -
- -**40. Step 4: Project the data on spanR(u1,...,uk).** - -⟶ - -
- -**41. This procedure maximizes the variance among all k-dimensional spaces.** - -⟶ - -
- -**42. [Data in feature space, Find principal components, Data in principal components space]** - -⟶ - -
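A minimal NumPy sketch of the four steps above (normalize, covariance matrix, top-k eigenvectors, projection); the random data and the choice k=2 are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
X = (X - X.mean(axis=0)) / X.std(axis=0)       # Step 1: zero mean, unit standard deviation

Sigma = X.T @ X / len(X)                       # Step 2: empirical covariance matrix
eigvals, eigvecs = np.linalg.eigh(Sigma)       # Sigma is symmetric, so eigh applies
k = 2
U = eigvecs[:, np.argsort(eigvals)[::-1][:k]]  # Step 3: eigenvectors of the k largest eigenvalues

Z = X @ U                                      # Step 4: projection on span(u1, ..., uk)
print(Z.shape)                                 # (100, 2)
```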
- -**43. Independent component analysis** - -⟶ - -
- -**44. It is a technique meant to find the underlying generating sources.** - -⟶ - -
- -**45. Assumptions ― We assume that our data x has been generated by the n-dimensional source vector s=(s1,...,sn), where si are independent random variables, via a mixing and non-singular matrix A as follows:** - -⟶ - -
- -**46. The goal is to find the unmixing matrix W=A−1.** - -⟶ - -
- -**47. Bell and Sejnowski ICA algorithm ― This algorithm finds the unmixing matrix W by following the steps below:** - -⟶ - -
- -**48. Write the probability of x=As=W−1s as:** - -⟶ - -
- -**49. Write the log likelihood given our training data {x(i),i∈[[1,m]]} and by noting g the sigmoid function as:** - -⟶ - -
- -**50. Therefore, the stochastic gradient ascent learning rule is such that for each training example x(i), we update W as follows:** - -⟶ - -
- -**51. The Machine Learning cheatsheets are now available in [target language].** - -⟶ - -
- -**52. Original authors** - -⟶ - -
- -**53. Translated by X, Y and Z** - -⟶ - -
- -**54. Reviewed by X, Y and Z** - -⟶ - -
- -**55. [Introduction, Motivation, Jensen's inequality]** - -⟶ - -
- -**56. [Clustering, Expectation-Maximization, k-means, Hierarchical clustering, Metrics]** - -⟶ - -
- -**57. [Dimension reduction, PCA, ICA]** - -⟶ diff --git a/vi/convolutional-neural-networks.md b/vi/convolutional-neural-networks.md deleted file mode 100644 index cb7e676ca..000000000 --- a/vi/convolutional-neural-networks.md +++ /dev/null @@ -1,716 +0,0 @@ -**Convolutional Neural Networks translation** - -
- -**1. Convolutional Neural Networks cheatsheet** - -⟶ Convolutional Neural Networks cheatsheet - -
- - -**2. CS 230 - Deep Learning** - -⟶ CS 230 - Deep Learning - -
- - -**3. [Overview, Architecture structure]** - -⟶ [Tổng quan, Kiến trúc] - -
- - -**4. [Types of layer, Convolution, Pooling, Fully connected]** - -⟶ [Loại tầng (layer), Convolution (Tích chập), Pooling, Fully connected] - -
- - -**5. [Filter hyperparameters, Dimensions, Stride, Padding]** - -⟶ - -
- - -**6. [Tuning hyperparameters, Parameter compatibility, Model complexity, Receptive field]** - -⟶ - -
- - -**7. [Activation functions, Rectified Linear Unit, Softmax]** - -⟶ - -
- - -**8. [Object detection, Types of models, Detection, Intersection over Union, Non-max suppression, YOLO, R-CNN]** - -⟶ - -
- - -**9. [Face verification/recognition, One shot learning, Siamese network, Triplet loss]** - -⟶ - -
- - -**10. [Neural style transfer, Activation, Style matrix, Style/content cost function]** - -⟶ - -
- - -**11. [Computational trick architectures, Generative Adversarial Net, ResNet, Inception Network]** - -⟶ - -
- - -**12. Overview** - -⟶ - -
- - -**13. Architecture of a traditional CNN ― Convolutional neural networks, also known as CNNs, are a specific type of neural networks that are generally composed of the following layers:** - -⟶ - -
- - -**14. The convolution layer and the pooling layer can be fine-tuned with respect to hyperparameters that are described in the next sections.** - -⟶ - -
- - -**15. Types of layer** - -⟶ - -
- - -**16. Convolution layer (CONV) ― The convolution layer (CONV) uses filters that perform convolution operations as it is scanning the input I with respect to its dimensions. Its hyperparameters include the filter size F and stride S. The resulting output O is called feature map or activation map.** - -⟶ - -
- - -**17. Remark: the convolution step can be generalized to the 1D and 3D cases as well.** - -⟶ - -
- - -**18. Pooling (POOL) ― The pooling layer (POOL) is a downsampling operation, typically applied after a convolution layer, which introduces some spatial invariance. In particular, max and average pooling are special kinds of pooling where the maximum and average value is taken, respectively.** - -⟶ - -
- - -**19. [Type, Purpose, Illustration, Comments]** - -⟶ - -
- - -**20. [Max pooling, Average pooling, Each pooling operation selects the maximum value of the current view, Each pooling operation averages the values of the current view]** - -⟶ - -
- - -**21. [Preserves detected features, Most commonly used, Downsamples feature map, Used in LeNet]** - -⟶ - -
- - -**22. Fully Connected (FC) ― The fully connected layer (FC) operates on a flattened input where each input is connected to all neurons. If present, FC layers are usually found towards the end of CNN architectures and can be used to optimize objectives such as class scores.** - -⟶ - -
- - -**23. Filter hyperparameters** - -⟶ - -
- - -**24. The convolution layer contains filters for which it is important to know the meaning behind its hyperparameters.** - -⟶ - -
- - -**25. Dimensions of a filter ― A filter of size F×F applied to an input containing C channels is a F×F×C volume that performs convolutions on an input of size I×I×C and produces an output feature map (also called activation map) of size O×O×1.** - -⟶ - -
- - -**26. Filter** - -⟶ - -
- - -**27. Remark: the application of K filters of size F×F results in an output feature map of size O×O×K.** - -⟶ - -
- - -**28. Stride ― For a convolutional or a pooling operation, the stride S denotes the number of pixels by which the window moves after each operation.** - -⟶ - -
- - -**29. Zero-padding ― Zero-padding denotes the process of adding P zeroes to each side of the boundaries of the input. This value can either be manually specified or automatically set through one of the three modes detailed below:** - -⟶ - -
- - -**30. [Mode, Value, Illustration, Purpose, Valid, Same, Full]** - -⟶ - -
- - -**31. [No padding, Drops last convolution if dimensions do not match, Padding such that feature map size has size ⌈IS⌉, Output size is mathematically convenient, Also called 'half' padding, Maximum padding such that end convolutions are applied on the limits of the input, Filter 'sees' the input end-to-end]** - -⟶ - -
- - -**32. Tuning hyperparameters** - -⟶ - -
- - -**33. Parameter compatibility in convolution layer ― By noting I the length of the input volume size, F the length of the filter, P the amount of zero padding, S the stride, then the output size O of the feature map along that dimension is given by:** - -⟶ - -
- - -**34. [Input, Filter, Output]** - -⟶ - -
- - -**35. Remark: often times, Pstart=Pend≜P, in which case we can replace Pstart+Pend by 2P in the formula above.** - -⟶ - -
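A minimal sketch of this output-size formula as a helper function; the example values (I=32, F=5, S=1, P=2) are illustrative.

```python
def conv_output_size(I, F, S, P_start=0, P_end=0):
    # O = (I - F + P_start + P_end) / S + 1 along one spatial dimension
    return (I - F + P_start + P_end) // S + 1

print(conv_output_size(I=32, F=5, S=1, P_start=2, P_end=2))  # 32: 'same'-style padding keeps the size
print(conv_output_size(I=32, F=5, S=1))                      # 28: 'valid' (no padding)
```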
- - -**36. Understanding the complexity of the model ― In order to assess the complexity of a model, it is often useful to determine the number of parameters that its architecture will have. In a given layer of a convolutional neural network, it is done as follows:** - -⟶ - -
- - -**37. [Illustration, Input size, Output size, Number of parameters, Remarks]** - -⟶ - -
- -**38. [One bias parameter per filter, In most cases, S<F]** - -⟶ - -
- - -**39. [Pooling operation done channel-wise, In most cases, S=F]** - -⟶ - -
- - -**40. [Input is flattened, One bias parameter per neuron, The number of FC neurons is free of structural constraints]** - -⟶ - -
- - -**41. Receptive field ― The receptive field at layer k is the area denoted Rk×Rk of the input that each pixel of the k-th activation map can 'see'. By calling Fj the filter size of layer j and Si the stride value of layer i and with the convention S0=1, the receptive field at layer k can be computed with the formula:** - -⟶ - -
- - -**42. In the example below, we have F1=F2=3 and S1=S2=1, which gives R2=1+2⋅1+2⋅1=5.** - -⟶ - -
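A minimal sketch of the receptive-field formula that reproduces the example above (F1=F2=3 and S1=S2=1 give R2=5); the helper name is an illustrative choice.

```python
def receptive_field(filter_sizes, strides):
    # R_k = 1 + sum_j (F_j - 1) * prod_{i<j} S_i, with the convention S_0 = 1
    R, jump = 1, 1
    for F, S in zip(filter_sizes, strides):
        R += (F - 1) * jump   # contribution of layer j
        jump *= S             # running product of strides up to layer j
    return R

print(receptive_field([3, 3], [1, 1]))  # 5, as in the example above
```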
- - -**43. Commonly used activation functions** - -⟶ - -
- - -**44. Rectified Linear Unit ― The rectified linear unit layer (ReLU) is an activation function g that is used on all elements of the volume. It aims at introducing non-linearities to the network. Its variants are summarized in the table below:** - -⟶ - -
- - -**45. [ReLU, Leaky ReLU, ELU, with]** - -⟶ - -
- - -**46. [Non-linearity complexities biologically interpretable, Addresses dying ReLU issue for negative values, Differentiable everywhere]** - -⟶ - -
- - -**47. Softmax ― The softmax step can be seen as a generalized logistic function that takes as input a vector of scores x∈Rn and outputs a vector of output probability p∈Rn through a softmax function at the end of the architecture. It is defined as follows:** - -⟶ - -
- - -**48. where** - -⟶ - -
- - -**49. Object detection** - -⟶ - -
- - -**50. Types of models ― There are 3 main types of object recognition algorithms, for which the nature of what is predicted is different. They are described in the table below:** - -⟶ - -
- - -**51. [Image classification, Classification w. localization, Detection]** - -⟶ - -
- - -**52. [Teddy bear, Book]** - -⟶ - -
- - -**53. [Classifies a picture, Predicts probability of object, Detects an object in a picture, Predicts probability of object and where it is located, Detects up to several objects in a picture, Predicts probabilities of objects and where they are located]** - -⟶ - -
- - -**54. [Traditional CNN, Simplified YOLO, R-CNN, YOLO, R-CNN]** - -⟶ - -
- - -**55. Detection ― In the context of object detection, different methods are used depending on whether we just want to locate the object or detect a more complex shape in the image. The two main ones are summed up in the table below:** - -⟶ - -
- - -**56. [Bounding box detection, Landmark detection]** - -⟶ - -
- - -**57. [Detects the part of the image where the object is located, Detects a shape or characteristics of an object (e.g. eyes), More granular]** - -⟶ - -
- - -**58. [Box of center (bx,by), height bh and width bw, Reference points (l1x,l1y), ..., (lnx,lny)]** - -⟶ - -
- - -**59. Intersection over Union ― Intersection over Union, also known as IoU, is a function that quantifies how correctly positioned a predicted bounding box Bp is over the actual bounding box Ba. It is defined as:** - -⟶ - -
- - -**60. Remark: we always have IoU∈[0,1]. By convention, a predicted bounding box Bp is considered as being reasonably good if IoU(Bp,Ba)⩾0.5.** - -⟶ - -
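A minimal sketch of IoU for two axis-aligned boxes given as (x1, y1, x2, y2) corner coordinates; the box coordinates below are illustrative.

```python
def iou(box_a, box_b):
    # Intersection rectangle (empty if the boxes do not overlap).
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)   # intersection divided by union

print(iou((0, 0, 2, 2), (1, 1, 3, 3)))  # ≈ 0.143, below the 0.5 threshold mentioned above
```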
- - -**61. Anchor boxes ― Anchor boxing is a technique used to predict overlapping bounding boxes. In practice, the network is allowed to predict more than one box simultaneously, where each box prediction is constrained to have a given set of geometrical properties. For instance, the first prediction can potentially be a rectangular box of a given form, while the second will be another rectangular box of a different geometrical form.** - -⟶ - -
- - -**62. Non-max suppression ― The non-max suppression technique aims at removing duplicate overlapping bounding boxes of a same object by selecting the most representative ones. After having removed all boxes having a probability prediction lower than 0.6, the following steps are repeated while there are boxes remaining:** - -⟶ - -
- - -**63. [For a given class, Step 1: Pick the box with the largest prediction probability., Step 2: Discard any box having an IoU⩾0.5 with the previous box.]** - -⟶ - -
- - -**64. [Box predictions, Box selection of maximum probability, Overlap removal of same class, Final bounding boxes]** - -⟶ - -
- - -**65. YOLO ― You Only Look Once (YOLO) is an object detection algorithm that performs the following steps:** - -⟶ - -
- - -**66. [Step 1: Divide the input image into a G×G grid., Step 2: For each grid cell, run a CNN that predicts y of the following form:, repeated k times]** - -⟶ - -
- - -**67. where pc is the probability of detecting an object, bx,by,bh,bw are the properties of the detected bounding box, c1,...,cp is a one-hot representation of which of the p classes were detected, and k is the number of anchor boxes.** - -⟶ - -
- - -**68. Step 3: Run the non-max suppression algorithm to remove any potential duplicate overlapping bounding boxes.** - -⟶ - -
- - -**69. [Original image, Division in GxG grid, Bounding box prediction, Non-max suppression]** - -⟶ - -
- - -**70. Remark: when pc=0, then the network does not detect any object. In that case, the corresponding predictions bx,...,cp have to be ignored.** - -⟶ - -
- - -**71. R-CNN ― Region with Convolutional Neural Networks (R-CNN) is an object detection algorithm that first segments the image to find potentially relevant bounding boxes and then runs the detection algorithm to find the most probable objects in those bounding boxes.** - -⟶ - -
- - -**72. [Original image, Segmentation, Bounding box prediction, Non-max suppression]** - -⟶ - -
- - -**73. Remark: although the original algorithm is computationally expensive and slow, newer architectures enabled the algorithm to run faster, such as Fast R-CNN and Faster R-CNN.** - -⟶ - -
- - -**74. Face verification and recognition** - -⟶ - -
- - -**75. Types of models ― Two main types of model are summed up in table below:** - -⟶ - -
- - -**76. [Face verification, Face recognition, Query, Reference, Database]** - -⟶ - -
- - -**77. [Is this the correct person?, One-to-one lookup, Is this one of the K persons in the database?, One-to-many lookup]** - -⟶ - -
- - -**78. One Shot Learning ― One Shot Learning is a face verification algorithm that uses a limited training set to learn a similarity function that quantifies how different two given images are. The similarity function applied to two images is often noted d(image 1,image 2).** - -⟶ - -
- - -**79. Siamese Network ― Siamese Networks aim at learning how to encode images to then quantify how different two images are. For a given input image x(i), the encoded output is often noted as f(x(i)).** - -⟶ - -
- - -**80. Triplet loss ― The triplet loss ℓ is a loss function computed on the embedding representation of a triplet of images A (anchor), P (positive) and N (negative). The anchor and the positive example belong to the same class, while the negative example belongs to another one. By calling α∈R+ the margin parameter, this loss is defined as follows:** - -⟶ - -
- - -**81. Neural style transfer** - -⟶ - -
- - -**82. Motivation ― The goal of neural style transfer is to generate an image G based on a given content C and a given style S.** - -⟶ - -
- - -**83. [Content C, Style S, Generated image G]** - -⟶ - -
- - -**84. Activation ― In a given layer l, the activation is noted a[l] and is of dimensions nH×nw×nc** - -⟶ - -
- - -**85. Content cost function ― The content cost function Jcontent(C,G) is used to determine how the generated image G differs from the original content image C. It is defined as follows:** - -⟶ - -
- - -**86. Style matrix ― The style matrix G[l] of a given layer l is a Gram matrix where each of its elements G[l]kk′ quantifies how correlated the channels k and k′ are. It is defined with respect to activations a[l] as follows:** - -⟶ - -
- - -**87. Remark: the style matrix for the style image and the generated image are noted G[l] (S) and G[l] (G) respectively.** - -⟶ - -
- - -**88. Style cost function ― The style cost function Jstyle(S,G) is used to determine how the generated image G differs from the style S. It is defined as follows:** - -⟶ - -
- - -**89. Overall cost function ― The overall cost function is defined as being a combination of the content and style cost functions, weighted by parameters α,β, as follows:** - -⟶ - -
- - -**90. Remark: a higher value of α will make the model care more about the content while a higher value of β will make it care more about the style.** - -⟶ - -
- - -**91. Architectures using computational tricks** - -⟶ - -
- - -**92. Generative Adversarial Network ― Generative adversarial networks, also known as GANs, are composed of a generative and a discriminative model, where the generative model aims at generating the most truthful output that will be fed into the discriminative which aims at differentiating the generated and true image.** - -⟶ - -
- - -**93. [Training, Noise, Real-world image, Generator, Discriminator, Real Fake]** - -⟶ - -
- - -**94. Remark: use cases using variants of GANs include text to image, music generation and synthesis.** - -⟶ - -
- - -**95. ResNet ― The Residual Network architecture (also called ResNet) uses residual blocks with a high number of layers meant to decrease the training error. The residual block has the following characterizing equation:** - -⟶ - -
- - -**96. Inception Network ― This architecture uses inception modules and aims at giving a try at different convolutions in order to increase its performance through features diversification. In particular, it uses the 1×1 convolution trick to limit the computational burden.** - -⟶ - -
- - -**97. The Deep Learning cheatsheets are now available in [target language].** - -⟶ - -
- - -**98. Original authors** - -⟶ - -
- - -**99. Translated by X, Y and Z** - -⟶ - -
- - -**100. Reviewed by X, Y and Z** - -⟶ - -
- - -**101. View PDF version on GitHub** - -⟶ - -
- - -**102. By X and Y** - -⟶ - -
diff --git a/vi/deep-learning-tips-and-tricks.md b/vi/deep-learning-tips-and-tricks.md deleted file mode 100644 index 347234ec2..000000000 --- a/vi/deep-learning-tips-and-tricks.md +++ /dev/null @@ -1,457 +0,0 @@ -**Deep Learning Tips and Tricks translation** - -
- -**1. Deep Learning Tips and Tricks cheatsheet** - -⟶ - -
- - -**2. CS 230 - Deep Learning** - -⟶ - -
- - -**3. Tips and tricks** - -⟶ - -
- - -**4. [Data processing, Data augmentation, Batch normalization]** - -⟶ - -
- - -**5. [Training a neural network, Epoch, Mini-batch, Cross-entropy loss, Backpropagation, Gradient descent, Updating weights, Gradient checking]** - -⟶ - -
- - -**6. [Parameter tuning, Xavier initialization, Transfer learning, Learning rate, Adaptive learning rates]** - -⟶ - -
- - -**7. [Regularization, Dropout, Weight regularization, Early stopping]** - -⟶ - -
- - -**8. [Good practices, Overfitting small batch, Gradient checking]** - -⟶ - -
- - -**9. View PDF version on GitHub** - -⟶ - -
- - -**10. Data processing** - -⟶ - -
- - -**11. Data augmentation ― Deep learning models usually need a lot of data to be properly trained. It is often useful to get more data from the existing ones using data augmentation techniques. The main ones are summed up in the table below. More precisely, given the following input image, here are the techniques that we can apply:** - -⟶ - -
- - -**12. [Original, Flip, Rotation, Random crop]** - -⟶ - -
- - -**13. [Image without any modification, Flipped with respect to an axis for which the meaning of the image is preserved, Rotation with a slight angle, Simulates incorrect horizon calibration, Random focus on one part of the image, Several random crops can be done in a row]** - -⟶ - -
- - -**14. [Color shift, Noise addition, Information loss, Contrast change]** - -⟶ - -
- - -**15. [Nuances of RGB is slightly changed, Captures noise that can occur with light exposure, Addition of noise, More tolerance to quality variation of inputs, Parts of image ignored, Mimics potential loss of parts of image, Luminosity changes, Controls difference in exposition due to time of day]** - -⟶ - -
- - -**16. Remark: data is usually augmented on the fly during training.** - -⟶ - -
- - -**17. Batch normalization ― It is a step of hyperparameters γ,β that normalizes the batch {xi}. By noting μB,σ2B the mean and variance of the batch that we want to correct, it is done as follows:** - -⟶ - -
- - -**18. It is usually done after a fully connected/convolutional layer and before a non-linearity layer and aims at allowing higher learning rates and reducing the strong dependence on initialization.** - -⟶ - -
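A minimal NumPy sketch of the normalization step at training time; gamma, beta, eps and the batch below are illustrative, and the running statistics used at test time are omitted.

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    mu = x.mean(axis=0)                     # mu_B, per-feature batch mean
    var = x.var(axis=0)                     # sigma_B^2, per-feature batch variance
    x_hat = (x - mu) / np.sqrt(var + eps)   # normalized batch
    return gamma * x_hat + beta             # rescale and shift

x = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 9.0]])
print(batch_norm(x, gamma=1.0, beta=0.0))
```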
- - -**19. Training a neural network** - -⟶ - -
- - -**20. Definitions** - -⟶ - -
- - -**21. Epoch ― In the context of training a model, epoch is a term used to refer to one iteration where the model sees the whole training set to update its weights.** - -⟶ - -
- - -**22. Mini-batch gradient descent ― During the training phase, updating weights is usually not based on the whole training set at once due to computational complexity, nor on a single data point due to noise issues. Instead, the update step is done on mini-batches, where the number of data points in a batch is a hyperparameter that we can tune.** - -⟶ - -
- - -**23. Loss function ― In order to quantify how a given model performs, the loss function L is usually used to evaluate to what extent the actual outputs y are correctly predicted by the model outputs z.** - -⟶ - -
- - -**24. Cross-entropy loss ― In the context of binary classification in neural networks, the cross-entropy loss L(z,y) is commonly used and is defined as follows:** - -⟶ - -
- - -**25. Finding optimal weights** - -⟶ - -
- - -**26. Backpropagation ― Backpropagation is a method to update the weights in the neural network by taking into account the actual output and the desired output. The derivative with respect to each weight w is computed using the chain rule.** - -⟶ - -
- - -**27. Using this method, each weight is updated with the rule:** - -⟶ - -
- - -**28. Updating weights ― In a neural network, weights are updated as follows:** - -⟶ - -
- - -**29. [Step 1: Take a batch of training data and perform forward propagation to compute the loss, Step 2: Backpropagate the loss to get the gradient of the loss with respect to each weight, Step 3: Use the gradients to update the weights of the network.]** - -⟶ - -
- - -**30. [Forward propagation, Backpropagation, Weights update]** - -⟶ - -
- - -**31. Parameter tuning** - -⟶ - -
- - -**32. Weights initialization** - -⟶ - -
- - -**33. Xavier initialization ― Instead of initializing the weights in a purely random manner, Xavier initialization enables us to have initial weights that take into account characteristics that are unique to the architecture.** - -⟶ - -
- - -**34. Transfer learning ― Training a deep learning model requires a lot of data and more importantly a lot of time. It is often useful to take advantage of pre-trained weights on huge datasets that took days/weeks to train, and leverage it towards our use case. Depending on how much data we have at hand, here are the different ways to leverage this:** - -⟶ - -
- - -**35. [Training size, Illustration, Explanation]** - -⟶ - -
- - -**36. [Small, Medium, Large]** - -⟶ - -
- - -**37. [Freezes all layers, trains weights on softmax, Freezes most layers, trains weights on last layers and softmax, Trains weights on layers and softmax by initializing weights on pre-trained ones]** - -⟶ - -
- - -**38. Optimizing convergence** - -⟶ - -
- - -**39. Learning rate ― The learning rate, often noted α or sometimes η, indicates at which pace the weights get updated. It can be fixed or adaptively changed. The current most popular method is called Adam, which is a method that adapts the learning rate. -** - -⟶ - -
- - -**40. Adaptive learning rates ― Letting the learning rate vary when training a model can reduce the training time and improve the numerical optimal solution. While Adam optimizer is the most commonly used technique, others can also be useful. They are summed up in the table below:** - -⟶ - -
- - -**41. [Method, Explanation, Update of w, Update of b]** - -⟶ - -
- - -**42. [Momentum, Dampens oscillations, Improvement to SGD, 2 parameters to tune]** - -⟶ - -
- - -**43. [RMSprop, Root Mean Square propagation, Speeds up learning algorithm by controlling oscillations]** - -⟶ - -
- - -**44. [Adam, Adaptive Moment estimation, Most popular method, 4 parameters to tune]** - -⟶ - -
- - -**45. Remark: other methods include Adadelta, Adagrad and SGD.** - -⟶ - -
- - -**46. Regularization** - -⟶ - -
- - -**47. Dropout ― Dropout is a technique used in neural networks to prevent overfitting the training data by dropping out neurons with probability p>0. It forces the model to avoid relying too much on particular sets of features.** - -⟶ - -
- - -**48. Remark: most deep learning frameworks parametrize dropout through the 'keep' parameter 1−p.** - -⟶ - -
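A minimal NumPy sketch of inverted dropout at training time, written in terms of the keep probability 1−p mentioned above; p, the seed and the activations are illustrative.

```python
import numpy as np

def dropout(a, p=0.5, seed=0):
    rng = np.random.default_rng(seed)
    keep = 1.0 - p
    mask = rng.random(a.shape) < keep   # keep each neuron with probability 1 - p
    return a * mask / keep              # rescale so the expected activation is unchanged

a = np.ones((2, 4))
print(dropout(a, p=0.5))
```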
- - -**49. Weight regularization ― In order to make sure that the weights are not too large and that the model is not overfitting the training set, regularization techniques are usually performed on the model weights. The main ones are summed up in the table below:** - -⟶ - -
- - -**50. [LASSO, Ridge, Elastic Net]** - -⟶ - -
- -**50 bis. [Shrinks coefficients to 0, Good for variable selection, Makes coefficients smaller, Tradeoff between variable selection and small coefficients]** - -⟶ - -
- -**51. Early stopping ― This regularization technique stops the training process as soon as the validation loss reaches a plateau or starts to increase.** - -⟶ - -
- - -**52. [Error, Validation, Training, early stopping, Epochs]** - -⟶ - -
- - -**53. Good practices** - -⟶ - -
- - -**54. Overfitting small batch ― When debugging a model, it is often useful to make quick tests to see if there is any major issue with the architecture of the model itself. In particular, in order to make sure that the model can be properly trained, a mini-batch is passed inside the network to see if it can overfit on it. If it cannot, it means that the model is either too complex or not complex enough to even overfit on a small batch, let alone a normal-sized training set.** - -⟶ - -
- - -**55. Gradient checking ― Gradient checking is a method used during the implementation of the backward pass of a neural network. It compares the value of the analytical gradient to the numerical gradient at given points and plays the role of a sanity-check for correctness.** - -⟶ - -
- - -**56. [Type, Numerical gradient, Analytical gradient]** - -⟶ - -
- - -**57. [Formula, Comments]** - -⟶ - -
- - -**58. [Expensive; loss has to be computed two times per dimension, Used to verify correctness of analytical implementation, Trade-off in choosing h not too small (numerical instability) nor too large (poor gradient approximation)]** - -⟶ - -
- - -**59. ['Exact' result, Direct computation, Used in the final implementation]** - -⟶ - -
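A minimal sketch of the comparison described above, using a centered difference for the numerical gradient; the function f, the point x and the step h are illustrative.

```python
import numpy as np

def f(x):
    return np.sum(x ** 2)        # toy loss

def analytical_grad(x):
    return 2 * x                 # its exact gradient

x = np.array([1.0, -2.0, 3.0])
h = 1e-5
num_grad = np.array([(f(x + h * e) - f(x - h * e)) / (2 * h) for e in np.eye(len(x))])

# A small maximum difference is the sanity check that the analytical gradient is correct.
print(np.max(np.abs(num_grad - analytical_grad(x))))
```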
- - -**60. The Deep Learning cheatsheets are now available in [target language].** - -⟶ - -
- - -**61. Original authors** - -⟶ - -
- -**62.Translated by X, Y and Z** - -⟶ - -
- -**63.Reviewed by X, Y and Z** - -⟶ - -
- -**64.View PDF version on GitHub** - -⟶ - -
- -**65.By X and Y** - -⟶ - -
diff --git a/vi/recurrent-neural-networks.md b/vi/recurrent-neural-networks.md deleted file mode 100644 index 191e400a1..000000000 --- a/vi/recurrent-neural-networks.md +++ /dev/null @@ -1,677 +0,0 @@ -**Recurrent Neural Networks translation** - -
- -**1. Recurrent Neural Networks cheatsheet** - -⟶ - -
- - -**2. CS 230 - Deep Learning** - -⟶ - -
- - -**3. [Overview, Architecture structure, Applications of RNNs, Loss function, Backpropagation]** - -⟶ - -
- - -**4. [Handling long term dependencies, Common activation functions, Vanishing/exploding gradient, Gradient clipping, GRU/LSTM, Types of gates, Bidirectional RNN, Deep RNN]** - -⟶ - -
- - -**5. [Learning word representation, Notations, Embedding matrix, Word2vec, Skip-gram, Negative sampling, GloVe]** - -⟶ - -
- - -**6. [Comparing words, Cosine similarity, t-SNE]** - -⟶ - -
- - -**7. [Language model, n-gram, Perplexity]** - -⟶ - -
- - -**8. [Machine translation, Beam search, Length normalization, Error analysis, Bleu score]** - -⟶ - -
- - -**9. [Attention, Attention model, Attention weights]** - -⟶ - -
- - -**10. Overview** - -⟶ - -
- - -**11. Architecture of a traditional RNN ― Recurrent neural networks, also known as RNNs, are a class of neural networks that allow previous outputs to be used as inputs while having hidden states. They are typically as follows:** - -⟶ - -
- - -**12. For each timestep t, the activation a and the output y are expressed as follows:** - -⟶ - -
- - -**13. and** - -⟶ - -
- - -**14. where Wax,Waa,Wya,ba,by are coefficients that are shared temporally and g1,g2 activation functions.** - -⟶ - -
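A minimal NumPy sketch of one timestep using the equations above, with tanh and softmax as illustrative choices for g1 and g2 and randomly initialized coefficients.

```python
import numpy as np

rng = np.random.default_rng(0)
n_a, n_x, n_y = 4, 3, 2
Waa, Wax = rng.normal(size=(n_a, n_a)), rng.normal(size=(n_a, n_x))
Wya = rng.normal(size=(n_y, n_a))
ba, by = np.zeros(n_a), np.zeros(n_y)

def rnn_step(a_prev, x_t):
    a_t = np.tanh(Waa @ a_prev + Wax @ x_t + ba)   # hidden state update a<t>
    z = Wya @ a_t + by
    y_t = np.exp(z) / np.sum(np.exp(z))            # softmax output y<t>
    return a_t, y_t

a, y = rnn_step(np.zeros(n_a), rng.normal(size=n_x))
print(a.shape, y.shape)  # (4,) (2,)
```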
- - -**15. The pros and cons of a typical RNN architecture are summed up in the table below:** - -⟶ - -
- - -**16. [Advantages, Possibility of processing input of any length, Model size not increasing with size of input, Computation takes into account historical information, Weights are shared across time]** - -⟶ - -
- - -**17. [Drawbacks, Computation being slow, Difficulty of accessing information from a long time ago, Cannot consider any future input for the current state]** - -⟶ - -
- - -**18. Applications of RNNs ― RNN models are mostly used in the fields of natural language processing and speech recognition. The different applications are summed up in the table below:** - -⟶ - -
- - -**19. [Type of RNN, Illustration, Example]** - -⟶ - -
- - -**20. [One-to-one, One-to-many, Many-to-one, Many-to-many]** - -⟶ - -
- - -**21. [Traditional neural network, Music generation, Sentiment classification, Named entity recognition, Machine translation]** - -⟶ - -
- - -**22. Loss function ― In the case of a recurrent neural network, the loss function L of all time steps is defined based on the loss at every time step as follows:** - -⟶ - -
- - -**23. Backpropagation through time ― Backpropagation is done at each point in time. At timestep T, the derivative of the loss L with respect to weight matrix W is expressed as follows:** - -⟶ - -
- - -**24. Handling long term dependencies** - -⟶ - -
- - -**25. Commonly used activation functions ― The most common activation functions used in RNN modules are described below:** - -⟶ - -
- - -**26. [Sigmoid, Tanh, RELU]** - -⟶ - -
- - -**27. Vanishing/exploding gradient ― The vanishing and exploding gradient phenomena are often encountered in the context of RNNs. The reason why they happen is that it is difficult to capture long term dependencies because of multiplicative gradient that can be exponentially decreasing/increasing with respect to the number of layers.** - -⟶ - -
- - -**28. Gradient clipping ― It is a technique used to cope with the exploding gradient problem sometimes encountered when performing backpropagation. By capping the maximum value for the gradient, this phenomenon is controlled in practice.** - -⟶ - -
- - -**29. clipped** - -⟶ - -
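A minimal sketch of one common variant, clipping by norm: the gradient is rescaled whenever its norm exceeds a chosen threshold. The threshold and gradient values are illustrative.

```python
import numpy as np

def clip_gradient(grad, max_norm=1.0):
    norm = np.linalg.norm(grad)
    if norm > max_norm:
        return grad * (max_norm / norm)   # rescale so the clipped norm equals max_norm
    return grad

g = np.array([3.0, 4.0])                  # norm 5
print(clip_gradient(g, max_norm=1.0))     # [0.6, 0.8], norm 1
```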
- - -**30. Types of gates ― In order to remedy the vanishing gradient problem, specific gates are used in some types of RNNs and usually have a well-defined purpose. They are usually noted Γ and are equal to:** - -⟶ - -
- - -**31. where W,U,b are coefficients specific to the gate and σ is the sigmoid function. The main ones are summed up in the table below:** - -⟶ - -
- - -**32. [Type of gate, Role, Used in]** - -⟶ - -
- - -**33. [Update gate, Relevance gate, Forget gate, Output gate]** - -⟶ - -
- - -**34. [How much past should matter now?, Drop previous information?, Erase a cell or not?, How much to reveal of a cell?]** - -⟶ - -
- - -**35. [LSTM, GRU]** - -⟶ - -
- - -**36. GRU/LSTM ― Gated Recurrent Unit (GRU) and Long Short-Term Memory units (LSTM) deal with the vanishing gradient problem encountered by traditional RNNs, with LSTM being a generalization of GRU. Below is a table summing up the characterizing equations of each architecture:** - -⟶ - -
- - -**37. [Characterization, Gated Recurrent Unit (GRU), Long Short-Term Memory (LSTM), Dependencies]** - -⟶ - -
- - -**38. Remark: the sign ⋆ denotes the element-wise multiplication between two vectors.** - -⟶ - -
- - -**39. Variants of RNNs ― The table below sums up the other commonly used RNN architectures:** - -⟶ - -
- - -**40. [Bidirectional (BRNN), Deep (DRNN)]** - -⟶ - -
- - -**41. Learning word representation** - -⟶ - -
- - -**42. In this section, we note V the vocabulary and |V| its size.** - -⟶ - -
- - -**43. Motivation and notations** - -⟶ - -
- - -**44. Representation techniques ― The two main ways of representing words are summed up in the table below:** - -⟶ - -
- - -**45. [1-hot representation, Word embedding]** - -⟶ - -
- - -**46. [teddy bear, book, soft]** - -⟶ - -
- - -**47. [Noted ow, Naive approach, no similarity information, Noted ew, Takes into account words similarity]** - -⟶ - -
- - -**48. Embedding matrix ― For a given word w, the embedding matrix E is a matrix that maps its 1-hot representation ow to its embedding ew as follows:** - -⟶ - -
- - -**49. Remark: learning the embedding matrix can be done using target/context likelihood models.** - -⟶ - -
- - -**50. Word embeddings** - -⟶ - -
- - -**51. Word2vec ― Word2vec is a framework aimed at learning word embeddings by estimating the likelihood that a given word is surrounded by other words. Popular models include skip-gram, negative sampling and CBOW.** - -⟶ - -
- - -**52. [A cute teddy bear is reading, teddy bear, soft, Persian poetry, art]** - -⟶ - -
- - -**53. [Train network on proxy task, Extract high-level representation, Compute word embeddings]** - -⟶ - -
- - -**54. Skip-gram ― The skip-gram word2vec model is a supervised learning task that learns word embeddings by assessing the likelihood of any given target word t happening with a context word c. By noting θt a parameter associated with t, the probability P(t|c) is given by:** - -⟶ - -
- - -**55. Remark: summing over the whole vocabulary in the denominator of the softmax part makes this model computationally expensive. CBOW is another word2vec model using the surrounding words to predict a given word.** - -⟶ - -
- - -**56. Negative sampling ― It is a set of binary classifiers using logistic regressions that aim at assessing how likely a given context word and a given target word are to appear simultaneously, with the models being trained on sets of k negative examples and 1 positive example. Given a context word c and a target word t, the prediction is expressed by:** - -⟶ - -
- - -**57. Remark: this method is less computationally expensive than the skip-gram model.** - -⟶ - -
- - -**57bis. GloVe ― The GloVe model, short for global vectors for word representation, is a word embedding technique that uses a co-occurence matrix X where each Xi,j denotes the number of times that a target i occurred with a context j. Its cost function J is as follows:** - -⟶ - -
- - -**58. where f is a weighting function such that Xi,j=0⟹f(Xi,j)=0. -Given the symmetry that e and θ play in this model, the final word embedding e(final)w is given by:** - -⟶ - -
- - -**59. Remark: the individual components of the learned word embeddings are not necessarily interpretable.** - -⟶ - -
- - -**60. Comparing words** - -⟶ - -
- - -**61. Cosine similarity ― The cosine similarity between words w1 and w2 is expressed as follows:** - -⟶ - -
- - -**62. Remark: θ is the angle between words w1 and w2.** - -⟶ - -
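A minimal sketch of the formula; the two embedding vectors below are illustrative.

```python
import numpy as np

def cosine_similarity(w1, w2):
    # cos(theta) = (w1 . w2) / (||w1|| ||w2||), a value in [-1, 1]
    return (w1 @ w2) / (np.linalg.norm(w1) * np.linalg.norm(w2))

w1 = np.array([0.2, 0.9, 0.4])
w2 = np.array([0.25, 0.8, 0.5])
print(cosine_similarity(w1, w2))  # close to 1 for vectors pointing in similar directions
```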
- - -**63. t-SNE ― t-SNE (t-distributed Stochastic Neighbor Embedding) is a technique aimed at reducing high-dimensional embeddings into a lower dimensional space. In practice, it is commonly used to visualize word vectors in the 2D space.** - -⟶ - -
- - -**64. [literature, art, book, culture, poem, reading, knowledge, entertaining, loveable, childhood, kind, teddy bear, soft, hug, cute, adorable]** - -⟶ - -
- - -**65. Language model** - -⟶ - -
- - -**66. Overview ― A language model aims at estimating the probability of a sentence P(y).** - -⟶ - -
- - -**67. n-gram model ― This model is a naive approach aiming at quantifying the probability that an expression appears in a corpus by counting the number of times it appears in the training data.** - -⟶ - -
- - -**68. Perplexity ― Language models are commonly assessed using the perplexity metric, also known as PP, which can be interpreted as the inverse probability of the dataset normalized by the number of words T. The perplexity is such that the lower, the better and is defined as follows:** - -⟶ - -
- - -**69. Remark: PP is commonly used in t-SNE.** - -⟶ - -
- - -**70. Machine translation** - -⟶ - -
- - -**71. Overview ― A machine translation model is similar to a language model except it has an encoder network placed before it. For this reason, it is sometimes referred to as a conditional language model. The goal is to find a sentence y such that:** - -⟶ - -
- - -**72. Beam search ― It is a heuristic search algorithm used in machine translation and speech recognition to find the likeliest sentence y given an input x.** - -⟶ - -
- - -**73. [Step 1: Find top B likely words y<1>, Step 2: Compute conditional probabilities y|x,y<1>,...,y, Step 3: Keep top B combinations x,y<1>,...,y, End process at a stop word]** - -⟶ - -
- - -**74. Remark: if the beam width is set to 1, then this is equivalent to a naive greedy search.** - -⟶ - -
- - -**75. Beam width ― The beam width B is a parameter for beam search. Large values of B yield better results but with slower performance and increased memory use. Small values of B lead to worse results but are less computationally intensive. A standard value for B is around 10.** - -⟶ - -
- - -**76. Length normalization ― In order to improve numerical stability, beam search is usually applied on the following normalized objective, often called the normalized log-likelihood objective, defined as:** - -⟶ - -
- - -**77. Remark: the parameter α can be seen as a softener, and its value is usually between 0.5 and 1.** - -⟶ - -
- - -**78. Error analysis ― When obtaining a predicted translation ˆy that is bad, one can wonder why we did not get a good translation y∗ by performing the following error analysis:** - -⟶ - -
- - -**79. [Case, Root cause, Remedies]** - -⟶ - -
- - -**80. [Beam search faulty, RNN faulty, Increase beam width, Try different architecture, Regularize, Get more data]** - -⟶ - -
- - -**81. Bleu score ― The bilingual evaluation understudy (bleu) score quantifies how good a machine translation is by computing a similarity score based on n-gram precision. It is defined as follows:** - -⟶ - -
- - -**82. where pn is the bleu score on n-gram only defined as follows:** - -⟶ - -
- - -**83. Remark: a brevity penalty may be applied to short predicted translations to prevent an artificially inflated bleu score.** - -⟶ - -
- - -**84. Attention** - -⟶ - -
- - -**85. Attention model ― This model allows an RNN to pay attention to specific parts of the input that is considered as being important, which improves the performance of the resulting model in practice. By noting α the amount of attention that the output y should pay to the activation a and c the context at time t, we have:** - -⟶ - -
- - -**86. with** - -⟶ - -
- - -**87. Remark: the attention scores are commonly used in image captioning and machine translation.** - -⟶ - -
- - -**88. A cute teddy bear is reading Persian literature.** - -⟶ - -
- - -**89. Attention weight ― The amount of attention that the output y should pay to the activation a is given by α computed as follows:** - -⟶ - -
- - -**90. Remark: computation complexity is quadratic with respect to Tx.** - -⟶ - -
- - -**91. The Deep Learning cheatsheets are now available in [target language].** - -⟶ - -
- -**92. Original authors** - -⟶ - -
- -**93. Translated by X, Y and Z** - -⟶ - -
- -**94. Reviewed by X, Y and Z** - -⟶ - -
- -**95. View PDF version on GitHub** - -⟶ - -
- -**96. By X and Y** - -⟶ - -
diff --git a/vi/refresher-linear-algebra.md b/vi/refresher-linear-algebra.md deleted file mode 100644 index a6b440d1e..000000000 --- a/vi/refresher-linear-algebra.md +++ /dev/null @@ -1,339 +0,0 @@ -**1. Linear Algebra and Calculus refresher** - -⟶ - -
- -**2. General notations** - -⟶ - -
- -**3. Definitions** - -⟶ - -
- -**4. Vector ― We note x∈Rn a vector with n entries, where xi∈R is the ith entry:** - -⟶ - -
- -**5. Matrix ― We note A∈Rm×n a matrix with m rows and n columns, where Ai,j∈R is the entry located in the ith row and jth column:** - -⟶ - -
- -**6. Remark: the vector x defined above can be viewed as a n×1 matrix and is more particularly called a column-vector.** - -⟶ - -
- -**7. Main matrices** - -⟶ - -
- -**8. Identity matrix ― The identity matrix I∈Rn×n is a square matrix with ones in its diagonal and zero everywhere else:** - -⟶ - -
- -**9. Remark: for all matrices A∈Rn×n, we have A×I=I×A=A.** - -⟶ - -
- -**10. Diagonal matrix ― A diagonal matrix D∈Rn×n is a square matrix with nonzero values in its diagonal and zero everywhere else:** - -⟶ - -
- -**11. Remark: we also note D as diag(d1,...,dn).** - -⟶ - -
- -**12. Matrix operations** - -⟶ - -
- -**13. Multiplication** - -⟶ - -
- -**14. Vector-vector ― There are two types of vector-vector products:** - -⟶ - -
- -**15. inner product: for x,y∈Rn, we have:** - -⟶ - -
- -**16. outer product: for x∈Rm,y∈Rn, we have:** - -⟶ - -
- -**17. Matrix-vector ― The product of matrix A∈Rm×n and vector x∈Rn is a vector of size Rm, such that:** - -⟶ - -
- -**18. where aTr,i are the vector rows and ac,j are the vector columns of A, and xi are the entries of x.** - -⟶ - -
- -**19. Matrix-matrix ― The product of matrices A∈Rm×n and B∈Rn×p is a matrix of size Rm×p, such that:** - -⟶ - -
- -**20. where aTr,i,bTr,i are the vector rows and ac,j,bc,j are the vector columns of A and B respectively** - -⟶ - -
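A minimal NumPy sketch of the products defined above (inner, outer, matrix-vector and matrix-matrix); the vectors and matrices are illustrative.

```python
import numpy as np

x, y = np.array([1.0, 2.0]), np.array([3.0, 4.0])
A = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])   # A is 3x2
B = np.array([[1.0, 0.0, 2.0], [0.0, 1.0, 1.0]])     # B is 2x3

print(x @ y)           # inner product x^T y, a scalar (11.0)
print(np.outer(x, y))  # outer product x y^T, a 2x2 matrix
print(A @ x)           # matrix-vector product, a vector of size 3
print(A @ B)           # matrix-matrix product, a 3x3 matrix
```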
- -**21. Other operations** - -⟶ - -
- -**22. Transpose ― The transpose of a matrix A∈Rm×n, noted AT, is such that its entries are flipped:** - -⟶ - -
- -**23. Remark: for matrices A,B, we have (AB)T=BTAT** - -⟶ - -
- -**24. Inverse ― The inverse of an invertible square matrix A is noted A−1 and is the only matrix such that:** - -⟶ - -
- -**25. Remark: not all square matrices are invertible. Also, for matrices A,B, we have (AB)−1=B−1A−1** - -⟶ - -
- -**26. Trace ― The trace of a square matrix A, noted tr(A), is the sum of its diagonal entries:** - -⟶ - -
- -**27. Remark: for matrices A,B, we have tr(AT)=tr(A) and tr(AB)=tr(BA)** - -⟶ - -
- -**28. Determinant ― The determinant of a square matrix A∈Rn×n, noted |A| or det(A) is expressed recursively in terms of A∖i,∖j, which is the matrix A without its ith row and jth column, as follows:** - -⟶ - -
- -**29. Remark: A is invertible if and only if |A|≠0. Also, |AB|=|A||B| and |AT|=|A|.** - -⟶ - -
- -**30. Matrix properties** - -⟶ - -
- -**31. Definitions** - -⟶ - -
- -**32. Symmetric decomposition ― A given matrix A can be expressed in terms of its symmetric and antisymmetric parts as follows:** - -⟶ - -
- -**33. [Symmetric, Antisymmetric]** - -⟶ - -
- -**34. Norm ― A norm is a function N:V⟶[0,+∞[ where V is a vector space, and such that for all x,y∈V, we have:** - -⟶ - -
- -**35. N(ax)=|a|N(x) for a scalar** - -⟶ - -
- -**36. if N(x)=0, then x=0** - -⟶ - -
- -**37. For x∈V, the most commonly used norms are summed up in the table below:** - -⟶ - -
- -**38. [Norm, Notation, Definition, Use case]** - -⟶ - -
- -**39. Linear dependence ― A set of vectors is said to be linearly dependent if one of the vectors in the set can be defined as a linear combination of the others.** - -⟶ - -
- -**40. Remark: if no vector can be written this way, then the vectors are said to be linearly independent** - -⟶ - -
- -**41. Matrix rank ― The rank of a given matrix A is noted rank(A) and is the dimension of the vector space generated by its columns. This is equivalent to the maximum number of linearly independent columns of A.** - -⟶ - -
- -**42. Positive semi-definite matrix ― A matrix A∈Rn×n is positive semi-definite (PSD) and is noted A⪰0 if we have:** - -⟶ - -
- -**43. Remark: similarly, a matrix A is said to be positive definite, and is noted A≻0, if it is a PSD matrix which satisfies for all non-zero vector x, xTAx>0.** - -⟶ - -
- -**44. Eigenvalue, eigenvector ― Given a matrix A∈Rn×n, λ is said to be an eigenvalue of A if there exists a vector z∈Rn∖{0}, called eigenvector, such that we have:** - -⟶ - -
- -**45. Spectral theorem ― Let A∈Rn×n. If A is symmetric, then A is diagonalizable by a real orthogonal matrix U∈Rn×n. By noting Λ=diag(λ1,...,λn), we have:** - -⟶ - -
- -**46. diagonal** - -⟶ - -
- -**47. Singular-value decomposition ― For a given matrix A of dimensions m×n, the singular-value decomposition (SVD) is a factorization technique that guarantees the existence of U m×m unitary, Σ m×n diagonal and V n×n unitary matrices, such that:** - -⟶ - -
- -**48. Matrix calculus** - -⟶ - -
- -**49. Gradient ― Let f:Rm×n→R be a function and A∈Rm×n be a matrix. The gradient of f with respect to A is a m×n matrix, noted ∇Af(A), such that:** - -⟶ - -
- -**50. Remark: the gradient of f is only defined when f is a function that returns a scalar.** - -⟶ - -
- -**51. Hessian ― Let f:Rn→R be a function and x∈Rn be a vector. The hessian of f with respect to x is a n×n symmetric matrix, noted ∇2xf(x), such that:** - -⟶ - -
- -**52. Remark: the hessian of f is only defined when f is a function that returns a scalar** - -⟶ - -
- -**53. Gradient operations ― For matrices A,B,C, the following gradient properties are worth having in mind:** - -⟶ - -
- -**54. [General notations, Definitions, Main matrices]** - -⟶ - -
- -**55. [Matrix operations, Multiplication, Other operations]** - -⟶ - -
- -**56. [Matrix properties, Norm, Eigenvalue/Eigenvector, Singular-value decomposition]** - -⟶ - -
- -**57. [Matrix calculus, Gradient, Hessian, Operations]** - -⟶ diff --git a/vi/refresher-probability.md b/vi/refresher-probability.md deleted file mode 100644 index 5c9b34656..000000000 --- a/vi/refresher-probability.md +++ /dev/null @@ -1,381 +0,0 @@ -**1. Probabilities and Statistics refresher** - -⟶ - -
- -**2. Introduction to Probability and Combinatorics** - -⟶ - -
- -**3. Sample space ― The set of all possible outcomes of an experiment is known as the sample space of the experiment and is denoted by S.** - -⟶ - -
- -**4. Event ― Any subset E of the sample space is known as an event. That is, an event is a set consisting of possible outcomes of the experiment. If the outcome of the experiment is contained in E, then we say that E has occurred.** - -⟶ - -
- -**5. Axioms of probability ― For each event E, we denote P(E) as the probability of event E occurring.** - -⟶ - -
- -**6. Axiom 1 ― Every probability is between 0 and 1 included, i.e:** - -⟶ - -
- -**7. Axiom 2 ― The probability that at least one of the elementary events in the entire sample space will occur is 1, i.e:** - -⟶ - -
- -**8. Axiom 3 ― For any sequence of mutually exclusive events E1,...,En, we have:** - -⟶ - -
- -**9. Permutation ― A permutation is an arrangement of r objects from a pool of n objects, in a given order. The number of such arrangements is given by P(n,r), defined as:** - -⟶ - -
- -**10. Combination ― A combination is an arrangement of r objects from a pool of n objects, where the order does not matter. The number of such arrangements is given by C(n,r), defined as:** - -⟶ - -
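For reference, the counts referenced in the two definitions above are:

$$P(n,r) = \frac{n!}{(n-r)!}, \qquad C(n,r) = \frac{P(n,r)}{r!} = \frac{n!}{r!\,(n-r)!}$$

For example, with n=5 and r=2 there are P(5,2)=20 ordered arrangements and C(5,2)=10 unordered ones.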
- -**11. Remark: we note that for 0⩽r⩽n, we have P(n,r)⩾C(n,r)** - -⟶ - -
- -**12. Conditional Probability** - -⟶ - -
- -**13. Bayes' rule ― For events A and B such that P(B)>0, we have:** - -⟶ - -
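For reference, Bayes' rule stated above reads:

$$P(A|B) = \frac{P(B|A)\,P(A)}{P(B)}$$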
- -**14. Remark: we have P(A∩B)=P(A)P(B|A)=P(A|B)P(B)** - -⟶ - -
- -**15. Partition ― Let {Ai,i∈[[1,n]]} be such that for all i, Ai≠∅. We say that {Ai} is a partition if we have:** - -⟶ - -
- -**16. Remark: for any event B in the sample space, we have P(B)=n∑i=1P(B|Ai)P(Ai).** - -⟶ - -
- -**17. Extended form of Bayes' rule ― Let {Ai,i∈[[1,n]]} be a partition of the sample space. We have:** - -⟶ - -
- -**18. Independence ― Two events A and B are independent if and only if we have:** - -⟶ - -
- -**19. Random Variables** - -⟶ - -
- -**20. Definitions** - -⟶ - -
- -**21. Random variable ― A random variable, often noted X, is a function that maps every element in a sample space to a real line.** - -⟶ - -
- -**22. Cumulative distribution function (CDF) ― The cumulative distribution function F, which is monotonically non-decreasing and is such that limx→−∞F(x)=0 and limx→+∞F(x)=1, is defined as:** - -⟶ - -
-
-**23. Remark: we have P(a<X⩽b)=F(b)−F(a)**
-
-⟶
-
-<br>
-
-**24. Probability density function (PDF) ― The probability density function f is the probability that X takes on values between two adjacent realizations of the random variable.**
-
-⟶
-
-<br>
- -**25. Relationships involving the PDF and CDF ― Here are the important properties to know in the discrete (D) and the continuous (C) cases.** - -⟶ - -
- -**26. [Case, CDF F, PDF f, Properties of PDF]** - -⟶ - -
- -**27. Expectation and Moments of the Distribution ― Here are the expressions of the expected value E[X], generalized expected value E[g(X)], kth moment E[Xk] and characteristic function ψ(ω) for the discrete and continuous cases:** - -⟶ - -
- -**28. Variance ― The variance of a random variable, often noted Var(X) or σ2, is a measure of the spread of its distribution function. It is determined as follows:** - -⟶ - -
- -**29. Standard deviation ― The standard deviation of a random variable, often noted σ, is a measure of the spread of its distribution function which is compatible with the units of the actual random variable. It is determined as follows:** - -⟶ - -
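For reference, the two spread measures above are given by:

$$\mathrm{Var}(X) = E\left[(X - E[X])^2\right] = E[X^2] - E[X]^2, \qquad \sigma = \sqrt{\mathrm{Var}(X)}$$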
- -**30. Transformation of random variables ― Let the variables X and Y be linked by some function. By noting fX and fY the distribution function of X and Y respectively, we have:** - -⟶ - -
- -**31. Leibniz integral rule ― Let g be a function of x and potentially c, and a,b boundaries that may depend on c. We have:** - -⟶ - -
- -**32. Probability Distributions** - -⟶ - -
- -**33. Chebyshev's inequality ― Let X be a random variable with expected value μ. For k,σ>0, we have the following inequality:** - -⟶ - -
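For reference, Chebyshev's inequality stated above reads:

$$P\big(|X - \mu| \geqslant k\sigma\big) \leqslant \frac{1}{k^2}$$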
- -**34. Main distributions ― Here are the main distributions to have in mind:** - -⟶ - -
- -**35. [Type, Distribution]** - -⟶ - -
- -**36. Jointly Distributed Random Variables** - -⟶ - -
-
-**37. Marginal density and cumulative distribution ― From the joint probability density function fXY, we have**
-
-⟶
-
-<br>
- -**38. [Case, Marginal density, Cumulative function]** - -⟶ - -
- -**39. Conditional density ― The conditional density of X with respect to Y, often noted fX|Y, is defined as follows:** - -⟶ - -
- -**40. Independence ― Two random variables X and Y are said to be independent if we have:** - -⟶ - -
- -**41. Covariance ― We define the covariance of two random variables X and Y, that we note σ2XY or more commonly Cov(X,Y), as follows:** - -⟶ - -
- -**42. Correlation ― By noting σX,σY the standard deviations of X and Y, we define the correlation between the random variables X and Y, noted ρXY, as follows:** - -⟶ - -
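For reference, the covariance and correlation defined above are (when the second moments exist):

$$\mathrm{Cov}(X,Y) = E\left[(X - \mu_X)(Y - \mu_Y)\right] = E[XY] - \mu_X \mu_Y, \qquad \rho_{XY} = \frac{\mathrm{Cov}(X,Y)}{\sigma_X \sigma_Y}$$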
- -**43. Remark 1: we note that for any random variables X,Y, we have ρXY∈[−1,1].** - -⟶ - -
- -**44. Remark 2: If X and Y are independent, then ρXY=0.** - -⟶ - -
- -**45. Parameter estimation** - -⟶ - -
- -**46. Definitions** - -⟶ - -
- -**47. Random sample ― A random sample is a collection of n random variables X1,...,Xn that are independent and identically distributed with X.** - -⟶ - -
- -**48. Estimator ― An estimator is a function of the data that is used to infer the value of an unknown parameter in a statistical model.** - -⟶ - -
- -**49. Bias ― The bias of an estimator ^θ is defined as being the difference between the expected value of the distribution of ^θ and the true value, i.e.:** - -⟶ - -
- -**50. Remark: an estimator is said to be unbiased when we have E[^θ]=θ.** - -⟶ - -
- -**51. Estimating the mean** - -⟶ - -
-
-**52. Sample mean ― The sample mean of a random sample is used to estimate the true mean μ of a distribution, is often noted ¯X and is defined as follows:**
-
-⟶
-
-<br>
-
-**53. Remark: the sample mean is unbiased, i.e. E[¯X]=μ.**
-
-⟶
-
-<br>
- -**54. Central Limit Theorem ― Let us have a random sample X1,...,Xn following a given distribution with mean μ and variance σ2, then we have:** - -⟶ - -
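For reference, the Central Limit Theorem stated above can be written as:

$$\frac{\overline{X} - \mu}{\sigma/\sqrt{n}} \;\xrightarrow[n \to +\infty]{}\; \mathcal{N}(0,1) \quad \text{in distribution, i.e. } \overline{X} \approx \mathcal{N}\!\left(\mu, \tfrac{\sigma^2}{n}\right) \text{ for large } n.$$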
- -**55. Estimating the variance** - -⟶ - -
- -**56. Sample variance ― The sample variance of a random sample is used to estimate the true variance σ2 of a distribution, is often noted s2 or ^σ2 and is defined as follows:** - -⟶ - -
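For reference, the unbiased sample variance referenced above is:

$$s^2 = \hat{\sigma}^2 = \frac{1}{n-1} \sum_{i=1}^{n} \left(X_i - \overline{X}\right)^2$$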
- -**57. Remark: the sample variance is unbiased, i.e E[s2]=σ2.** - -⟶ - -
- -**58. Chi-Squared relation with sample variance ― Let s2 be the sample variance of a random sample. We have:** - -⟶ - -
- -**59. [Introduction, Sample space, Event, Permutation]** - -⟶ - -
- -**60. [Conditional probability, Bayes' rule, Independence]** - -⟶ - -
- -**61. [Random variables, Definitions, Expectation, Variance]** - -⟶ - -
- -**62. [Probability distributions, Chebyshev's inequality, Main distributions]** - -⟶ - -
- -**63. [Jointly distributed random variables, Density, Covariance, Correlation]** - -⟶ - -
- -**64. [Parameter estimation, Mean, Variance]** - -⟶ From 778dff2d7e1e23a9caff460686753f72fb17a13c Mon Sep 17 00:00:00 2001 From: Shervine Amidi Date: Thu, 6 Jun 2019 12:19:45 -0700 Subject: [PATCH 221/531] Add Vietnamese --- README.md | 16 ++++++++-------- 1 file changed, 8 insertions(+), 8 deletions(-) diff --git a/README.md b/README.md index dd233ff65..19cd9baa7 100644 --- a/README.md +++ b/README.md @@ -89,14 +89,14 @@ Please make sure to propose the translation of **only one** cheatsheet per pull |Linear algebra|not started|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/47)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/95)|done| -|Cheatsheet topic|Magyar|Deutsch|Bahasa Indonesia| -|:---|:---:|:---:|:---:| -|Deep learning|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/124/files#diff-5e2ba65ef08acd57024e82d0ae94b923)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/106)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/154)| -|Supervised learning|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/124/files#diff-5e2ba65ef08acd57024e82d0ae94b923)|not started|not started| -|Unsupervised learning|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/124/files#diff-5e2ba65ef08acd57024e82d0ae94b923)|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/139)| -|ML tips and tricks|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/124/files#diff-5e2ba65ef08acd57024e82d0ae94b923)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/135)|not started| -|Probabilities and Statistics|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/124/files#diff-5e2ba65ef08acd57024e82d0ae94b923)|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/151)| -|Linear algebra|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/124/files#diff-5e2ba65ef08acd57024e82d0ae94b923)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/136)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/150)| +|Cheatsheet topic|Magyar|Deutsch|Bahasa Indonesia|Tiếng Việt| +|:---|:---:|:---:|:---:|:---:| +|Deep learning|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/124/files#diff-5e2ba65ef08acd57024e82d0ae94b923)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/106)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/154)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/159)| +|Supervised learning|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/124/files#diff-5e2ba65ef08acd57024e82d0ae94b923)|not started|not started|not started| +|Unsupervised learning|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/124/files#diff-5e2ba65ef08acd57024e82d0ae94b923)|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/139)|not started| +|ML tips and tricks|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/124/files#diff-5e2ba65ef08acd57024e82d0ae94b923)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/135)|not started|not started| +|Probabilities and Statistics|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/124/files#diff-5e2ba65ef08acd57024e82d0ae94b923)|not started|[in 
progress](https://github.com/shervinea/cheatsheet-translation/pull/151)|not started| +|Linear algebra|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/124/files#diff-5e2ba65ef08acd57024e82d0ae94b923)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/136)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/150)|not started| ## Acknowledgements From 943b1fe002ab460078104bb4f2dfd2e41fa1ae90 Mon Sep 17 00:00:00 2001 From: Shervine Amidi Date: Thu, 6 Jun 2019 12:22:17 -0700 Subject: [PATCH 222/531] Reorder progression subparts --- README.md | 55 +++++++++++++++++++++++++++---------------------------- 1 file changed, 27 insertions(+), 28 deletions(-) diff --git a/README.md b/README.md index 19cd9baa7..d187a7a5c 100644 --- a/README.md +++ b/README.md @@ -34,32 +34,8 @@ The translation process of each cheatsheet contains two steps: Please make sure to propose the translation of **only one** cheatsheet per pull request -- it simplifies a lot the review process. -## Progression for CS 230 (Deep Learning) -|Cheatsheet topic|Español|فارسی|Français|日本語|Português|中文| -|:---|:---:|:---:|:---:|:---:|:---:|:---:| -|Convolutional Neural Nets|not started|done|done|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/145)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/128)|not started| -|Recurrent Neural Nets|not started|done|done|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/146)|not started|not started| -|DL tips and tricks|not started|done|done|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/157)|not started|not started| - -|Cheatsheet topic|العَرَبِيَّة|עִבְרִית|हिन्दी|Türkçe|Русский|Italiano| -|:---|:---:|:---:|:---:|:---:|:---:|:---:| -|Convolutional Neural Nets|not started|not started|not started|done|not started|not started| -|Recurrent Neural Nets|not started|not started|not started|done|not started|not started| -|DL tips and tricks|not started|not started|not started|done|not started|not started| - -|Cheatsheet topic|Polski|Suomi|Català|Українська|한국어| -|:---|:---:|:---:|:---:|:---:|:---:| -|Convolutional Neural Nets|not started|not started|not started|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/109)| -|Recurrent Neural Nets|not started|not started|not started|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/107)| -|DL tips and tricks|not started|not started|not started|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/108)| - -|Cheatsheet topic|Bahasa Indonesia| -:---|:---:| -|Convolutional Neural Nets|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/155)| -|Recurrent Neural Nets|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/152)| -|DL tips and tricks|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/153)| - -## Progression for CS 229 (Machine Learning) +## Progression +### CS 229 (Machine Learning) |Cheatsheet topic|Español|فارسی|Français|日本語|Português|中文| |:---|:---:|:---:|:---:|:---:|:---:|:---:| |Deep learning|done|done|done|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/96)|done|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/12)| @@ -78,7 +54,6 @@ Please make sure to propose the translation of **only one** cheatsheet per pull |Probabilities and Statistics|[in 
progress](https://github.com/shervinea/cheatsheet-translation/pull/89)|not started|not started|done|not started|not started| |Linear algebra|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/85)|not started|not started|done|not started|not started| - |Cheatsheet topic|Polski|Suomi|Català|Українська|한국어| |:---|:---:|:---:|:---:|:---:|:---:| |Deep learning|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/8)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/34)|not started|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/80)| @@ -88,7 +63,6 @@ Please make sure to propose the translation of **only one** cheatsheet per pull |Probabilities and Statistics|not started|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/47)|done|done| |Linear algebra|not started|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/47)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/95)|done| - |Cheatsheet topic|Magyar|Deutsch|Bahasa Indonesia|Tiếng Việt| |:---|:---:|:---:|:---:|:---:| |Deep learning|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/124/files#diff-5e2ba65ef08acd57024e82d0ae94b923)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/106)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/154)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/159)| @@ -99,5 +73,30 @@ Please make sure to propose the translation of **only one** cheatsheet per pull |Linear algebra|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/124/files#diff-5e2ba65ef08acd57024e82d0ae94b923)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/136)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/150)|not started| +### CS 230 (Deep Learning) +|Cheatsheet topic|Español|فارسی|Français|日本語|Português|中文| +|:---|:---:|:---:|:---:|:---:|:---:|:---:| +|Convolutional Neural Nets|not started|done|done|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/145)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/128)|not started| +|Recurrent Neural Nets|not started|done|done|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/146)|not started|not started| +|DL tips and tricks|not started|done|done|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/157)|not started|not started| + +|Cheatsheet topic|العَرَبِيَّة|עִבְרִית|हिन्दी|Türkçe|Русский|Italiano| +|:---|:---:|:---:|:---:|:---:|:---:|:---:| +|Convolutional Neural Nets|not started|not started|not started|done|not started|not started| +|Recurrent Neural Nets|not started|not started|not started|done|not started|not started| +|DL tips and tricks|not started|not started|not started|done|not started|not started| + +|Cheatsheet topic|Polski|Suomi|Català|Українська|한국어| +|:---|:---:|:---:|:---:|:---:|:---:| +|Convolutional Neural Nets|not started|not started|not started|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/109)| +|Recurrent Neural Nets|not started|not started|not started|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/107)| +|DL tips and tricks|not started|not started|not started|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/108)| + +|Cheatsheet topic|Bahasa Indonesia| +:---|:---:| 
+|Convolutional Neural Nets|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/155)| +|Recurrent Neural Nets|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/152)| +|DL tips and tricks|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/153)| + ## Acknowledgements Thank you everyone for your help! Please do not forget to add your name to the `CONTRIBUTORS` file so that we can give you proper credit in the cheatsheets' [official website](https://stanford.edu/~shervine/teaching). From 782eab3635acd2ff5d4b5d71718466b4b41c8e81 Mon Sep 17 00:00:00 2001 From: Shervine Amidi Date: Thu, 6 Jun 2019 13:17:21 -0700 Subject: [PATCH 223/531] Change presentation of table for CS 229 --- README.md | 56 ++++++++++++++++++++++--------------------------------- 1 file changed, 22 insertions(+), 34 deletions(-) diff --git a/README.md b/README.md index d187a7a5c..ae46eaee8 100644 --- a/README.md +++ b/README.md @@ -36,41 +36,29 @@ Please make sure to propose the translation of **only one** cheatsheet per pull ## Progression ### CS 229 (Machine Learning) -|Cheatsheet topic|Español|فارسی|Français|日本語|Português|中文| -|:---|:---:|:---:|:---:|:---:|:---:|:---:| -|Deep learning|done|done|done|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/96)|done|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/12)| -|Supervised learning|done|done|done|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/144)|done|done| -|Unsupervised learning|done|done|done|not started|done|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/48)| -|ML tips and tricks|done|done|done|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/99)|done|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/7)| -|Probabilities and Statistics|done|done|done|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/142)|done|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/73)| -|Linear algebra|done|done|done|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/140)|done|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/72)| - -|Cheatsheet topic|العَرَبِيَّة|עִבְרִית|हिन्दी|Türkçe|Русский|Italiano| +| |Deep learning|Supervised|Unsupervised|ML tips|Probabilities|Algebra| |:---|:---:|:---:|:---:|:---:|:---:|:---:| -|Deep learning|done|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/156)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/37)|done|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/21)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/78)| -|Supervised learning|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/87)|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/46)|done|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/21)|not started| -|Unsupervised learning|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/88)|not started|not started|done|not started|not started| -|ML tips and tricks|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/83)|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/40)|done|not started|not started| -|Probabilities and Statistics|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/89)|not started|not started|done|not started|not 
started| -|Linear algebra|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/85)|not started|not started|done|not started|not started| - -|Cheatsheet topic|Polski|Suomi|Català|Українська|한국어| -|:---|:---:|:---:|:---:|:---:|:---:| -|Deep learning|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/8)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/34)|not started|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/80)| -|Supervised learning|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/8)|not started|not started|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/90)| -|Unsupervised learning|not started|not started|not started|not started|done| -|ML tips and tricks|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/8)|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/47)|not started|done| -|Probabilities and Statistics|not started|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/47)|done|done| -|Linear algebra|not started|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/47)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/95)|done| - -|Cheatsheet topic|Magyar|Deutsch|Bahasa Indonesia|Tiếng Việt| -|:---|:---:|:---:|:---:|:---:| -|Deep learning|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/124/files#diff-5e2ba65ef08acd57024e82d0ae94b923)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/106)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/154)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/159)| -|Supervised learning|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/124/files#diff-5e2ba65ef08acd57024e82d0ae94b923)|not started|not started|not started| -|Unsupervised learning|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/124/files#diff-5e2ba65ef08acd57024e82d0ae94b923)|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/139)|not started| -|ML tips and tricks|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/124/files#diff-5e2ba65ef08acd57024e82d0ae94b923)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/135)|not started|not started| -|Probabilities and Statistics|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/124/files#diff-5e2ba65ef08acd57024e82d0ae94b923)|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/151)|not started| -|Linear algebra|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/124/files#diff-5e2ba65ef08acd57024e82d0ae94b923)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/136)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/150)|not started| +|**العَرَبِيَّة**|done|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/87)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/88)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/83)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/89)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/85)| +|**Català**|not started|not started|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/47)|[in 
progress](https://github.com/shervinea/cheatsheet-translation/pull/47)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/47)| +|**Deutsch**|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/106)|not started|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/135)|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/136)| +|**Español**|done|done|done|done|done|done| +|**فارسی**|done|done|done|done|done|done| +|**Suomi**|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/34)|not started|not started|not started|not started|not started| +|**Français**|done|done|done|done|done|done| +|**עִבְרִית**|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/156)|not started|not started|not started|not started|not started| +|**हिन्दी**|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/37)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/46)|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/40)|not started|not started| +|**Magyar**|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/124)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/124)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/124)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/124)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/124)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/124)| +|**Bahasa Indonesia**|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/154)|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/139)|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/151)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/150)| +|**Italiano**|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/78)|not started|not started|not started|not started|not started| +|**日本語**|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/96)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/144)|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/99)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/142)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/140)| +|**한국어**|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/80)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/90)|done|done|done|done| +|**Polski**|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/8)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/8)|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/8)|not started|not started| +|**Português**|done|done|done|done|done|done| +|**Русский**|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/21)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/21)|not started|not started|not started|not started| +|**Türkçe**|done|done|done|done|done|done| +|**Українська**|not started|not started|not started|not started|done|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/95)| +|**Tiếng Việt**|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/159)|not started|not started|not started|not 
started|not started| +|**中文**|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/12)|done|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/48)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/7)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/73)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/72)| ### CS 230 (Deep Learning) From 724d4912b7a4abc9da0f7ea1f4027be0b13d58f1 Mon Sep 17 00:00:00 2001 From: Shervine Amidi Date: Thu, 6 Jun 2019 14:07:58 -0700 Subject: [PATCH 224/531] Change presentation of table for CS 230 --- README.md | 48 +++++++++++++++++++++++------------------------- 1 file changed, 23 insertions(+), 25 deletions(-) diff --git a/README.md b/README.md index ae46eaee8..b55c419cd 100644 --- a/README.md +++ b/README.md @@ -33,7 +33,6 @@ The translation process of each cheatsheet contains two steps: ### Important note Please make sure to propose the translation of **only one** cheatsheet per pull request -- it simplifies a lot the review process. - ## Progression ### CS 229 (Machine Learning) | |Deep learning|Supervised|Unsupervised|ML tips|Probabilities|Algebra| @@ -60,31 +59,30 @@ Please make sure to propose the translation of **only one** cheatsheet per pull |**Tiếng Việt**|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/159)|not started|not started|not started|not started|not started| |**中文**|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/12)|done|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/48)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/7)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/73)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/72)| - ### CS 230 (Deep Learning) -|Cheatsheet topic|Español|فارسی|Français|日本語|Português|中文| -|:---|:---:|:---:|:---:|:---:|:---:|:---:| -|Convolutional Neural Nets|not started|done|done|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/145)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/128)|not started| -|Recurrent Neural Nets|not started|done|done|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/146)|not started|not started| -|DL tips and tricks|not started|done|done|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/157)|not started|not started| - -|Cheatsheet topic|العَرَبِيَّة|עִבְרִית|हिन्दी|Türkçe|Русский|Italiano| -|:---|:---:|:---:|:---:|:---:|:---:|:---:| -|Convolutional Neural Nets|not started|not started|not started|done|not started|not started| -|Recurrent Neural Nets|not started|not started|not started|done|not started|not started| -|DL tips and tricks|not started|not started|not started|done|not started|not started| - -|Cheatsheet topic|Polski|Suomi|Català|Українська|한국어| -|:---|:---:|:---:|:---:|:---:|:---:| -|Convolutional Neural Nets|not started|not started|not started|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/109)| -|Recurrent Neural Nets|not started|not started|not started|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/107)| -|DL tips and tricks|not started|not started|not started|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/108)| - -|Cheatsheet topic|Bahasa Indonesia| -:---|:---:| -|Convolutional Neural Nets|[in 
progress](https://github.com/shervinea/cheatsheet-translation/pull/155)| -|Recurrent Neural Nets|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/152)| -|DL tips and tricks|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/153)| +| |Convolutional Neural Networks|Recurrent Neural Networks|DL tips| +|:---|:---:|:---:|:---:| +|**العَرَبِيَّة**|not started|not started|not started| +|**Català**|not started|not started|not started| +|**Deutsch**|not started|not started|not started| +|**Español**|not started|not started|not started| +|**فارسی**|done|done|done| +|**Suomi**|not started|not started|not started| +|**Français**|done|done|done| +|**עִבְרִית**|not started|not started|not started| +|**हिन्दी**|not started|not started|not started| +|**Magyar**|not started|not started|not started| +|**Bahasa Indonesia**|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/155)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/152)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/153)| +|**Italiano**|not started|not started|not started| +|**日本語**|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/145)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/146)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/157)| +|**한국어**|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/109)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/107)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/108)| +|**Polski**|not started|not started|not started| +|**Português**|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/128)|not started|not started| +|**Русский**|not started|not started|not started| +|**Türkçe**|done|done|done| +|**Українська**|not started|not started|not started| +|**Tiếng Việt**|not started|not started|not started| +|**中文**|not started|not started|not started| ## Acknowledgements Thank you everyone for your help! Please do not forget to add your name to the `CONTRIBUTORS` file so that we can give you proper credit in the cheatsheets' [official website](https://stanford.edu/~shervine/teaching). From 4b38ad210a0da83103622caf19c9f4a82568a263 Mon Sep 17 00:00:00 2001 From: Robert Altena Date: Fri, 7 Jun 2019 11:52:13 +0900 Subject: [PATCH 225/531] Complete. Ready for review. --- ja/refresher-linear-algebra.md | 23 ++++++++++++----------- 1 file changed, 12 insertions(+), 11 deletions(-) diff --git a/ja/refresher-linear-algebra.md b/ja/refresher-linear-algebra.md index dba624443..e72a56fb4 100644 --- a/ja/refresher-linear-algebra.md +++ b/ja/refresher-linear-algebra.md @@ -273,69 +273,70 @@ x∈V、一般的に使用されるノルムは、以下の表にまとめられ **46. diagonal** ⟶ - +対角
 **47. Singular-value decomposition ― For a given matrix A of dimensions m×n, the singular-value decomposition (SVD) is a factorization technique that guarantees the existence of U m×m unitary, Σ m×n diagonal and V n×n unitary matrices, such that:**
 
 ⟶
-
+特異値分解 ― Aをm×nの行列とする。特異値分解(SVD)は、U m×mのユニタリ行列、Σ m×nの対角行列、およびV n×nのユニタリ行列の存在を保証する因数分解手法であり、次のようになります。
 
 <br>
**48. Matrix calculus** ⟶ - +行列微積分
**49. Gradient ― Let f:Rm×n→R be a function and A∈Rm×n be a matrix. The gradient of f with respect to A is a m×n matrix, noted ∇Af(A), such that:** ⟶ - +勾配 ― f:Rm×n→Rを関数とし、A∈Rm×nを行列とする。 Aに対するfの勾配はm×n行列で、∇Af(A)と表記され。次のように:
**50. Remark: the gradient of f is only defined when f is a function that returns a scalar.** ⟶ - +備考: fの勾配は、fがスカラーを返す関数である場合にのみ定義されます。
**51. Hessian ― Let f:Rn→R be a function and x∈Rn be a vector. The hessian of f with respect to x is a n×n symmetric matrix, noted ∇2xf(x), such that:** ⟶ - +ヘッセ行列 ― f:Rn→Rを関数とし、x∈Rnをベクトルとする。 xに関するfのヘッセ行列は、次のように∇2xf(x)と表記されるn×n対称行列です。
**52. Remark: the hessian of f is only defined when f is a function that returns a scalar** ⟶ - +備考: fのヘッセ行列は、fがスカラーを返す関数である場合にのみ定義されます。
**53. Gradient operations ― For matrices A,B,C, the following gradient properties are worth having in mind:** ⟶ - +勾配演算 ― 行列A、B、Cの場合、次の勾配特性があります。
**54. [General notations, Definitions, Main matrices]** ⟶ - +[表記, 定義, 主行列]
**55. [Matrix operations, Multiplication, Other operations]** ⟶ - +[行列演算, 乗算, その他の演算]
**56. [Matrix properties, Norm, Eigenvalue/Eigenvector, Singular-value decomposition]** ⟶ - +[行列特性, 行列ノルム, 固有値/固有ベクトル, 特異値分解]
**57. [Matrix calculus, Gradient, Hessian, Operations]** ⟶ +[行列計算, 勾配, ヘッセ行列, 演算] \ No newline at end of file From d4d48e1c9da305140b36564118edb5e92025292b Mon Sep 17 00:00:00 2001 From: tt-anh-eole Date: Fri, 7 Jun 2019 18:04:17 +0900 Subject: [PATCH 226/531] vi translating --- vi/cheatsheet-deep-learning.md | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/vi/cheatsheet-deep-learning.md b/vi/cheatsheet-deep-learning.md index ff3b3c508..f0d8b477d 100644 --- a/vi/cheatsheet-deep-learning.md +++ b/vi/cheatsheet-deep-learning.md @@ -270,7 +270,7 @@ **46. Maximum likelihood estimate ― The maximum likelihood estimates for the state transition probabilities are as follows:** -⟶ +⟶ Ước lượng khả năng tối đa (Maximum likelihood estimate) - Ước lượng khả năng tối đa cho xác suất chuyển tiếp trạng thái (state) sẽ như sau:
@@ -288,31 +288,31 @@ **49. Q-learning ― Q-learning is a model-free estimation of Q, which is done as follows:** -⟶ +⟶ Q-learning ― Q-learning là 1 dạng phán đoán phi mô hình (model-free) của Q, được thực hiện như sau:
**50. View PDF version on GitHub** -⟶ +⟶ Xem bản PDF trên GitHub
**51. [Neural Networks, Architecture, Activation function, Backpropagation, Dropout]** -⟶ +⟶ [Mạng neural, Kiến trúc, Hàm kích hoạt, Lan truyền ngược, Dropout]
**52. [Convolutional Neural Networks, Convolutional layer, Batch normalization]** -⟶ +⟶ [Mạng neural tích chập, Tầng chập, Chuẩn hoá lô (batch)]
**53. [Recurrent Neural Networks, Gates, LSTM]** -⟶ +⟶ [Mạng neural hồi quy, Gates, LSTM]
From 06f1ad415ee02a22e571529700c362b66d490267 Mon Sep 17 00:00:00 2001 From: tuananhhedspibk Date: Fri, 7 Jun 2019 22:19:34 +0900 Subject: [PATCH 227/531] vi translating for cheatsheet-deep-learning --- vi/cheatsheet-deep-learning.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/vi/cheatsheet-deep-learning.md b/vi/cheatsheet-deep-learning.md index f0d8b477d..0a795fcf8 100644 --- a/vi/cheatsheet-deep-learning.md +++ b/vi/cheatsheet-deep-learning.md @@ -276,13 +276,13 @@ **47. times took action a in state s and got to s′** -⟶ +⟶ thời gian hành động a tiêu tốn cho state s và biến đổi nó thành s′
**48. times took action a in state s** -⟶ +⟶ thời gian hành động a tiêu tốn cho state (trạng thái) s
@@ -318,4 +318,4 @@ **54. [Reinforcement learning, Markov decision processes, Value/policy iteration, Approximate dynamic programming, Policy search]** -⟶ +⟶ [Học tăng cường (Reinforcement learning), Tiến trình quyết định Markov, Lặp Giá trị/policy, Lập trình động xấp xỉ, Tìm kiếm Policy] From b8d82ffaa6b6faaf7823772b255b51c04408d633 Mon Sep 17 00:00:00 2001 From: tuananhhedspibk Date: Sat, 8 Jun 2019 23:28:35 +0900 Subject: [PATCH 228/531] vi translating for cheatsheet-deep-learning --- vi/cheatsheet-deep-learning.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/vi/cheatsheet-deep-learning.md b/vi/cheatsheet-deep-learning.md index 0a795fcf8..da05af89b 100644 --- a/vi/cheatsheet-deep-learning.md +++ b/vi/cheatsheet-deep-learning.md @@ -108,7 +108,7 @@ **19. Dropout ― Dropout is a technique meant at preventing overfitting the training data by dropping out units in a neural network. In practice, neurons are either dropped with probability p or kept with probability 1−p** -⟶ Dropout - Dropout là thuật ngữ kĩ thuật dùng trong việc tránh overfitting tập dữ liệu huấn luyện +⟶ Dropout - Dropout là thuật ngữ kĩ thuật dùng trong việc tránh overfitting tập dữ liệu huấn luyện bằng việc bỏ đi các đơn vị trong mạng neural. Trong thực tế, các neurals hoặc là bị bỏ đi bởi xác suất p hoặc được giữ lại với xác suất 1-p
@@ -174,7 +174,7 @@ **30. The goal of reinforcement learning is for an agent to learn how to evolve in an environment.** -⟶ Mục tiêu của reinforcement learning đó là cho tác tử (agent) học cách làm sao để phát triển trong một môi trường +⟶ Mục tiêu của reinforcement learning đó là cho tác tử (agent) học cách làm sao để tối ưu hoá trong một môi trường.
@@ -252,7 +252,7 @@ **43. Value iteration algorithm ― The value iteration algorithm is in two steps:** -⟶ Giải thuật duyệt giá trị (Value iteration) - Giải thuật duyệt giá trị có 2 loại: +⟶ Giải thuật duyệt giá trị (Value iteration) - Giải thuật duyệt giá trị gồm 2 bước:
From 657befd764339901d9bafb5a152d3dbd91c8cad2 Mon Sep 17 00:00:00 2001 From: Shervine Amidi Date: Sun, 9 Jun 2019 14:55:20 -0700 Subject: [PATCH 229/531] Update progress --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index b55c419cd..7f3f37092 100644 --- a/README.md +++ b/README.md @@ -56,7 +56,7 @@ Please make sure to propose the translation of **only one** cheatsheet per pull |**Русский**|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/21)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/21)|not started|not started|not started|not started| |**Türkçe**|done|done|done|done|done|done| |**Українська**|not started|not started|not started|not started|done|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/95)| -|**Tiếng Việt**|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/159)|not started|not started|not started|not started|not started| +|**Tiếng Việt**|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/159)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/160)|not started|not started|not started|not started| |**中文**|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/12)|done|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/48)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/7)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/73)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/72)| ### CS 230 (Deep Learning) From 9d5ea641ccae172e435ccbad06e26e5efec17eab Mon Sep 17 00:00:00 2001 From: Kamuela Lau <33002774+Kamulau@users.noreply.github.com> Date: Mon, 10 Jun 2019 16:35:28 +0900 Subject: [PATCH 230/531] Update ja/deep-learning-tips-and-tricks.md Co-Authored-By: HIROKI MORI <35646653+Hiroki-Mori360@users.noreply.github.com> --- ja/deep-learning-tips-and-tricks.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/ja/deep-learning-tips-and-tricks.md b/ja/deep-learning-tips-and-tricks.md index d3900a3c2..f6e5a5a5b 100644 --- a/ja/deep-learning-tips-and-tricks.md +++ b/ja/deep-learning-tips-and-tricks.md @@ -186,7 +186,7 @@ **27. Using this method, each weight is updated with the rule:** -⟶各重みは以下のルールで重みを更新します。 +⟶この方法を使用することで、それぞれの重みはそのルールにしたがって更新されます。
From 912d6684f1edf8e7f801a46950a6a0f3e0f464e2 Mon Sep 17 00:00:00 2001 From: Kamuela Lau <33002774+Kamulau@users.noreply.github.com> Date: Mon, 10 Jun 2019 16:35:42 +0900 Subject: [PATCH 231/531] Update ja/deep-learning-tips-and-tricks.md Co-Authored-By: HIROKI MORI <35646653+Hiroki-Mori360@users.noreply.github.com> --- ja/deep-learning-tips-and-tricks.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/ja/deep-learning-tips-and-tricks.md b/ja/deep-learning-tips-and-tricks.md index f6e5a5a5b..9a29524bd 100644 --- a/ja/deep-learning-tips-and-tricks.md +++ b/ja/deep-learning-tips-and-tricks.md @@ -193,7 +193,7 @@ **28. Updating weights ― In a neural network, weights are updated as follows:** -⟶重みの更新 - ニューラルネットワークでは、重みは以下の通り更新します。 +⟶重みの更新 - ニューラルネットワークでは、以下の方法にしたがって重みが更新されます。
From 4de6e2389834ca086bea8549c86b9291b851ae6c Mon Sep 17 00:00:00 2001 From: Kamuela Lau <33002774+Kamulau@users.noreply.github.com> Date: Mon, 10 Jun 2019 16:35:51 +0900 Subject: [PATCH 232/531] Update ja/deep-learning-tips-and-tricks.md Co-Authored-By: HIROKI MORI <35646653+Hiroki-Mori360@users.noreply.github.com> --- ja/deep-learning-tips-and-tricks.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/ja/deep-learning-tips-and-tricks.md b/ja/deep-learning-tips-and-tricks.md index 9a29524bd..171437907 100644 --- a/ja/deep-learning-tips-and-tricks.md +++ b/ja/deep-learning-tips-and-tricks.md @@ -200,7 +200,7 @@ **29. [Step 1: Take a batch of training data and perform forward propagation to compute the loss, Step 2: Backpropagate the loss to get the gradient of the loss with respect to each weight, Step 3: Use the gradients to update the weights of the network.]** -⟶ステップ1:訓練データのバッチでフォワードプロパゲーションで損失を求めます。ステップ2:損失を用いて逆伝播法を行い勾配をえます。ステップ3:勾配を用いて重みを更新します。 +⟶ステップ1:訓練データのバッチでフォワードプロパゲーションで損失を求めます。ステップ2:逆伝播法を用いてそれぞれの重みに対する損失の勾配を求めます。ステップ3:求めた勾配を用いてネットワークの重みを更新します。
From 052410bd2457d60595377f64529e0d3c63ac5d6c Mon Sep 17 00:00:00 2001 From: Kamuela Lau <33002774+Kamulau@users.noreply.github.com> Date: Mon, 10 Jun 2019 16:36:00 +0900 Subject: [PATCH 233/531] Update ja/deep-learning-tips-and-tricks.md Co-Authored-By: HIROKI MORI <35646653+Hiroki-Mori360@users.noreply.github.com> --- ja/deep-learning-tips-and-tricks.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/ja/deep-learning-tips-and-tricks.md b/ja/deep-learning-tips-and-tricks.md index 171437907..ce0c677ce 100644 --- a/ja/deep-learning-tips-and-tricks.md +++ b/ja/deep-learning-tips-and-tricks.md @@ -228,7 +228,7 @@ **33. Xavier initialization ― Instead of initializing the weights in a purely random manner, Xavier initialization enables to have initial weights that take into account characteristics that are unique to the architecture.** -⟶Xavier初期化 - ランダムで重みを初期化するよりもむしろニューラルネットワークのアーキテクチャの特徴を用いて重みを初期化する手法です。 +⟶Xavier初期化 - 完全にランダムな方法で重みを初期化するのではなく、そのアーキテクチャのユニークな特徴を考慮に入れて重みを初期化する方法です。
From c87e156ab4ac8d7a8d6d01c67d0d89cc23b5305f Mon Sep 17 00:00:00 2001 From: Kamuela Lau <33002774+Kamulau@users.noreply.github.com> Date: Mon, 10 Jun 2019 16:36:07 +0900 Subject: [PATCH 234/531] Update ja/deep-learning-tips-and-tricks.md Co-Authored-By: HIROKI MORI <35646653+Hiroki-Mori360@users.noreply.github.com> --- ja/deep-learning-tips-and-tricks.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/ja/deep-learning-tips-and-tricks.md b/ja/deep-learning-tips-and-tricks.md index ce0c677ce..cb98ccb22 100644 --- a/ja/deep-learning-tips-and-tricks.md +++ b/ja/deep-learning-tips-and-tricks.md @@ -235,7 +235,7 @@ **34. Transfer learning ― Training a deep learning model requires a lot of data and more importantly a lot of time. It is often useful to take advantage of pre-trained weights on huge datasets that took days/weeks to train, and leverage it towards our use case. Depending on how much data we have at hand, here are the different ways to leverage this:** -⟶転移学習 - 深層学習のモデルを学習させるには大量のデータと、それ以上に時間が必要です。膨大なデータで数日・数週間をかけて学習済みのモデルを利用し、自分のユースケースに活かせることが多いです。データの量次第では、以下の生かす方法があります。 +⟶転移学習 - 深層学習のモデルを学習させるには大量のデータと何よりも時間が必要です。膨大なデータセットから数日・数週間をかけて構築した学習済みモデルを利用し、自身のユースケースに活かすことは有益であることが多いです。手元にあるデータ量次第ではありますが、これを利用する以下の方法があります。
From 2549d4443389db4a6553e6b7bd9987ad3be4c068 Mon Sep 17 00:00:00 2001 From: Kamuela Lau <33002774+Kamulau@users.noreply.github.com> Date: Mon, 10 Jun 2019 16:36:14 +0900 Subject: [PATCH 235/531] Update ja/deep-learning-tips-and-tricks.md Co-Authored-By: HIROKI MORI <35646653+Hiroki-Mori360@users.noreply.github.com> --- ja/deep-learning-tips-and-tricks.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/ja/deep-learning-tips-and-tricks.md b/ja/deep-learning-tips-and-tricks.md index cb98ccb22..d705f44f8 100644 --- a/ja/deep-learning-tips-and-tricks.md +++ b/ja/deep-learning-tips-and-tricks.md @@ -271,7 +271,7 @@ **39. Learning rate ― The learning rate, often noted α or sometimes η, indicates at which pace the weights get updated. It can be fixed or adaptively changed. The current most popular method is called Adam, which is a method that adapts the learning rate. ** -⟶学習率 - αやηとよく表記される学習率とは、重みの更新の速さを表現します。固定で指定するか、もしくは適応的に変えます。もっとも多用されている手法は、適切に学習率を変える Adam です。 +⟶学習率 - 多くの場合αや時々ηと表記される学習率とは、重みの更新速度を表しています。学習率は固定することもできる上に、適応的に変更することもできます。現在もっとも使用される手法は、学習率を適切に調整するAdamと呼ばれる手法です。
From 3721c9e1a00cd9ff1934843764caa6e5e0b0b6f2 Mon Sep 17 00:00:00 2001 From: Kamuela Lau <33002774+Kamulau@users.noreply.github.com> Date: Mon, 10 Jun 2019 16:36:26 +0900 Subject: [PATCH 236/531] Update ja/deep-learning-tips-and-tricks.md Co-Authored-By: HIROKI MORI <35646653+Hiroki-Mori360@users.noreply.github.com> --- ja/deep-learning-tips-and-tricks.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/ja/deep-learning-tips-and-tricks.md b/ja/deep-learning-tips-and-tricks.md index d705f44f8..84fdc871a 100644 --- a/ja/deep-learning-tips-and-tricks.md +++ b/ja/deep-learning-tips-and-tricks.md @@ -278,7 +278,7 @@ **40. Adaptive learning rates ― Letting the learning rate vary when training a model can reduce the training time and improve the numerical optimal solution. While Adam optimizer is the most commonly used technique, others can also be useful. They are summed up in the table below:** -⟶適応学習率法 - 学習時間の短縮や精度の向上のために学習率を変更することです。Adamがもっとも多用されている手法だが、他の手法も役に立つことがあります。適応学習率法を下記の表にまとめました。 +⟶適応学習率法 - モデルを学習させる際に学習率を変動させることで、学習時間の短縮や精度の向上につながります。Adamがもっとも一般的に使用されている手法ではあるが、他の手法も役立つことがあります。それらの手法を下記の表にまとめました。
From 3317b29e26c970eb2227a039d53454c4bd6c1f80 Mon Sep 17 00:00:00 2001 From: Kamuela Lau <33002774+Kamulau@users.noreply.github.com> Date: Mon, 10 Jun 2019 16:37:04 +0900 Subject: [PATCH 237/531] Update ja/deep-learning-tips-and-tricks.md Co-Authored-By: yoshiyukinakai --- ja/deep-learning-tips-and-tricks.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/ja/deep-learning-tips-and-tricks.md b/ja/deep-learning-tips-and-tricks.md index 84fdc871a..d3542f92e 100644 --- a/ja/deep-learning-tips-and-tricks.md +++ b/ja/deep-learning-tips-and-tricks.md @@ -88,7 +88,7 @@ **13. [Image without any modification, Flipped with respect to an axis for which the meaning of the image is preserved, Rotation with a slight angle, Simulates incorrect horizon calibration, Random focus on one part of the image, Several random crops can be done in a row]** -⟶修正なしの画像、画像の意味が変わらぬ軸における反転、わずかな回転、不正確な水平線の校正(calibration)のシミュレーション、ランダムな部分の拡張、連続のランダムな切り抜きは可能 +⟶何も変更されていない画像、画像の意味が変わらない軸における反転、わずかな角度の回転、不正確な水平線の校正(calibration)をシミュレートする、画像の一部へのランダムなフォーカス、連続して数回のランダムな切り抜きが可能
From 4f2bac9234d0d382fd8a1e5238d575199ff040c3 Mon Sep 17 00:00:00 2001 From: Kamuela Lau <33002774+Kamulau@users.noreply.github.com> Date: Mon, 10 Jun 2019 16:37:15 +0900 Subject: [PATCH 238/531] Update ja/deep-learning-tips-and-tricks.md Co-Authored-By: yoshiyukinakai --- ja/deep-learning-tips-and-tricks.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/ja/deep-learning-tips-and-tricks.md b/ja/deep-learning-tips-and-tricks.md index d3542f92e..db0abab27 100644 --- a/ja/deep-learning-tips-and-tricks.md +++ b/ja/deep-learning-tips-and-tricks.md @@ -102,7 +102,7 @@ **15. [Nuances of RGB is slightly changed, Captures noise that can occur with light exposure, Addition of noise, More tolerance to quality variation of inputs, Parts of image ignored, Mimics potential loss of parts of image, Luminosity changes, Controls difference in exposition due to time of day]** -⟶RGBのわずかな修正、照らされ方によるノイズを捉える、ノイズの付加、入力画像の画質のバリエーションに対する耐性の増加、画像の一部を不使用、画像の一部がないときを真似る、明るさの変化、時間によるコントラストをコントロール +⟶RGBのわずかな修正、照らされ方によるノイズを捉える、ノイズの付加、入力画像の品質のばらつきへの耐性の強化、画像の一部を無視、画像の一部が欠ける可能性を再現する、明るさの変化、時刻による露出の違いをコントロールする
From cc5585a4e7bc7f74728a39b38961f1420a220221 Mon Sep 17 00:00:00 2001 From: Kamuela Lau <33002774+Kamulau@users.noreply.github.com> Date: Mon, 10 Jun 2019 16:37:22 +0900 Subject: [PATCH 239/531] Update ja/deep-learning-tips-and-tricks.md Co-Authored-By: yoshiyukinakai --- ja/deep-learning-tips-and-tricks.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/ja/deep-learning-tips-and-tricks.md b/ja/deep-learning-tips-and-tricks.md index db0abab27..a3c0a3562 100644 --- a/ja/deep-learning-tips-and-tricks.md +++ b/ja/deep-learning-tips-and-tricks.md @@ -109,7 +109,7 @@ **16. Remark: data is usually augmented on the fly during training.** -⟶備考:データ拡張は基本的には学習時に行う。 +⟶備考:データ拡張は基本的には学習時に臨機応変に行われる。
From f2ff067982ddee9d27e3fa9c38dd474d3efb210c Mon Sep 17 00:00:00 2001 From: Kamuela Lau <33002774+Kamulau@users.noreply.github.com> Date: Mon, 10 Jun 2019 16:37:32 +0900 Subject: [PATCH 240/531] Update ja/deep-learning-tips-and-tricks.md Co-Authored-By: yoshiyukinakai --- ja/deep-learning-tips-and-tricks.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/ja/deep-learning-tips-and-tricks.md b/ja/deep-learning-tips-and-tricks.md index a3c0a3562..9ae8a2f92 100644 --- a/ja/deep-learning-tips-and-tricks.md +++ b/ja/deep-learning-tips-and-tricks.md @@ -116,7 +116,7 @@ **17. Batch normalization ― It is a step of hyperparameter γ,β that normalizes the batch {xi}. By noting μB,σ2B the mean and variance of that we want to correct to the batch, it is done as follows:** -⟶batch normalization - ハイパーパラメータ γ、β のステップがバッチ {xi}を正規化します。平均と分散をμB,σ2Bと表記すると、以下で行えます。 +⟶batch normalization - ハイパーパラメータ γ、β によってバッチ {xi} を正規化するステップです。修正を加えたいバッチの平均と分散をμB,σ2Bと表記すると、以下のように行えます。
From 71aa33c0d3c7611168239c107652859b02791e7d Mon Sep 17 00:00:00 2001 From: Kamuela Lau <33002774+Kamulau@users.noreply.github.com> Date: Mon, 10 Jun 2019 16:37:44 +0900 Subject: [PATCH 241/531] Update ja/deep-learning-tips-and-tricks.md Co-Authored-By: yoshiyukinakai --- ja/deep-learning-tips-and-tricks.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/ja/deep-learning-tips-and-tricks.md b/ja/deep-learning-tips-and-tricks.md index 9ae8a2f92..7733d3cbd 100644 --- a/ja/deep-learning-tips-and-tricks.md +++ b/ja/deep-learning-tips-and-tricks.md @@ -123,7 +123,7 @@ **18. It is usually done after a fully connected/convolutional layer and before a non-linearity layer and aims at allowing higher learning rates and reducing the strong dependence on initialization.** -⟶高い学習率を利用可能にするのと初期化への依存を減らすのが目的で基本的には全結合層・畳み込み層のあとで非線形層の前に行います。 +⟶より高い学習率を利用可能にし初期化への強い依存を減らすことを目的として、基本的には全結合層・畳み込み層のあとで非線形層の前に行います。
From a365c2faca53aa76a68810ab064d2d49d9e9cdf0 Mon Sep 17 00:00:00 2001 From: tt-anh-eole Date: Tue, 11 Jun 2019 14:51:43 +0900 Subject: [PATCH 242/531] vi translating for cheatsheet supervised learning --- vi/cheatsheet-supervised-learning.md | 567 +++++++++++++++++++++++++++ 1 file changed, 567 insertions(+) create mode 100644 vi/cheatsheet-supervised-learning.md diff --git a/vi/cheatsheet-supervised-learning.md b/vi/cheatsheet-supervised-learning.md new file mode 100644 index 000000000..91659fc60 --- /dev/null +++ b/vi/cheatsheet-supervised-learning.md @@ -0,0 +1,567 @@ +**1. Supervised Learning cheatsheet** + +⟶ Cheatsheet học có giám sát + +
+ +**2. Introduction to Supervised Learning** + +⟶ Giới thiệu về học có giám sát + +
+ +**3. Given a set of data points {x(1),...,x(m)} associated to a set of outcomes {y(1),...,y(m)}, we want to build a classifier that learns how to predict y from x.** + +⟶ Cho một tập hợp các điểm dữ liệu {x(1),...,x(m)} tương ứng với đó là tập các kết quả {y(1),...,y(m)}, chúng ta muốn xây dựng một bộ phân loại học được các dự đoán y từ x. + +
+ +**4. Type of prediction ― The different types of predictive models are summed up in the table below:** + +⟶ Kiểu dự đoán - Các kiểu khác nhau của mô hình dự đoán được tổng kết trong bảng bên dưới: + +
+ +**5. [Regression, Classifier, Outcome, Examples]** + +⟶ [Hồi quy, Phân loại, Đầu ra, Các ví dụ] + +
+ +**6. [Continuous, Class, Linear regression, Logistic regression, SVM, Naive Bayes]** + +⟶ [Tiếp diễn, Lớp, Hồi quy tuyến tính, Hồi quy Logistic, SVM, Naive Bayes] + +
+ +**7. Type of model ― The different models are summed up in the table below:** + +⟶ Kiểu của mô hình - Các mô hình khác nhau được tổng kết trong bảng bên dưới: + +
+ +**8. [Discriminative model, Generative model, Goal, What's learned, Illustration, Examples]** + +⟶ [Mô hình phân biệt, Mô hình sáng tạo, Mục tiêu, Những gì học được, Hình minh hoạ, Các ví dụ] + +
+ +**9. [Directly estimate P(y|x), Estimate P(x|y) to then deduce P(y|x), Decision boundary, Probability distributions of the data, Regressions, SVMs, GDA, Naive Bayes]** + +⟶ [] + +
+ +**10. Notations and general concepts** + +⟶ Kí hiệu và các khái niệm tổng quát + +
+ +**11. Hypothesis ― The hypothesis is noted hθ and is the model that we choose. For a given input data x(i) the model prediction output is hθ(x(i)).** + +⟶ Hypothesis - Hypothesis được kí hiệu là h0, là một mô hình mà chúng ta chọn. Với dữ liệu đầu vào cho trước x(i), mô hình dự đoaans đầu ra là h0(x(i)). + +
+ +**12. Loss function ― A loss function is a function L:(z,y)∈R×Y⟼L(z,y)∈R that takes as inputs the predicted value z corresponding to the real data value y and outputs how different they are. The common loss functions are summed up in the table below:** + +⟶ Hàm mất mát - Hàm mất mát là một hàm số dạng: L:(z,y)∈R×Y⟼L(z,y)∈R lấy đầu vào là giá trị dự đoán được z tương ứng với đầu ra thực tế là y, hàm có đầu ra là sự khác biệt giữa hai giá trị này. Các hàm mất mát phổ biến được tổng kết ở bảng dưới đây: + +
+ +**13. [Least squared error, Logistic loss, Hinge loss, Cross-entropy]** + +⟶ [] + +
+ +**14. [Linear regression, Logistic regression, SVM, Neural Network]** + +⟶ [Hồi quy tuyến tính, Hồi quy Logistic, SVM, Mạng neural] + +
+ +**15. Cost function ― The cost function J is commonly used to assess the performance of a model, and is defined with the loss function L as follows:** + +⟶ Hàm giá trị (Cost function) - Cost function J thường được sử dụng để đánh giá hiệu năng của mô hình và được định nghĩa với hàm mất mát L như sau: + +
+ +**16. Gradient descent ― By noting α∈R the learning rate, the update rule for gradient descent is expressed with the learning rate and the cost function J as follows:** + +⟶ Gradient descent - Bằng việc kí hiệu α∈R là tốc độ học, việc cập nhật quy tắc/ luật cho gradient descent được mô tả với tốc độ học và cost function J như sau: + +
+ +**17. Remark: Stochastic gradient descent (SGD) is updating the parameter based on each training example, and batch gradient descent is on a batch of training examples.** + +⟶ Chú ý: Stochastic gradient descent (SGD) là việc cập nhật tham số dựa theo mỗi ví dụ huấn luyện, và batch gradient descent là dựa trên một lô (batch) các ví dụ huấn luyện. + +
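The update rule of entry 16 and the stochastic/batch distinction of entry 17 can be sketched on a made-up least-squares problem (illustrative only; step sizes and iteration counts are arbitrary):

```python
import numpy as np

def batch_gradient_descent(X, y, alpha=0.5, n_iters=500):
    """One update per pass over the full training set (batch GD)."""
    theta = np.zeros(X.shape[1])
    for _ in range(n_iters):
        grad = X.T @ (X @ theta - y) / len(y)   # gradient of 1/(2m) * ||X theta - y||^2
        theta -= alpha * grad
    return theta

def stochastic_gradient_descent(X, y, alpha=0.05, n_epochs=50):
    """One update per training example (SGD)."""
    theta = np.zeros(X.shape[1])
    for _ in range(n_epochs):
        for i in np.random.permutation(len(y)):
            grad = (X[i] @ theta - y[i]) * X[i]
            theta -= alpha * grad
    return theta

X = np.c_[np.ones(50), np.linspace(0, 1, 50)]        # design matrix with intercept column
y = 2 + 3 * X[:, 1] + 0.1 * np.random.randn(50)
print(batch_gradient_descent(X, y), stochastic_gradient_descent(X, y))   # both near [2, 3]
```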
+ +**18. Likelihood ― The likelihood of a model L(θ) given parameters θ is used to find the optimal parameters θ through maximizing the likelihood. In practice, we use the log-likelihood ℓ(θ)=log(L(θ)) which is easier to optimize. We have:** + +⟶ + +
+ +**19. Newton's algorithm ― The Newton's algorithm is a numerical method that finds θ such that ℓ′(θ)=0. Its update rule is as follows:** + +⟶ Giải thuật Newton - Giải thuật Newton là một phương thức số tìm θ thoả mãn điều kiện ℓ′(θ)=0. Quy tắc cập nhật của nó là như sau: + +
+ +**20. Remark: the multidimensional generalization, also known as the Newton-Raphson method, has the following update rule:** + +⟶ Chú ý: Tổng quát hoá đa chiều, còn được biết đến như là phương thức Newton-Raphson, có quy tắc cập nhật như sau: + +
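Entry 19's one-dimensional update θ←θ−ℓ′(θ)/ℓ″(θ) takes only a few lines; the function optimized below is an arbitrary example, not taken from the cheatsheet:

```python
def newton(l_prime, l_double_prime, theta=0.0, n_iters=10):
    # Newton's update: theta <- theta - l'(theta) / l''(theta)
    for _ in range(n_iters):
        theta -= l_prime(theta) / l_double_prime(theta)
    return theta

# Example: maximize l(theta) = -(theta - 3)^2, i.e. solve l'(theta) = 0
print(newton(l_prime=lambda t: -2 * (t - 3), l_double_prime=lambda t: -2.0))  # -> 3.0
```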
+ +**21. Linear models** + +⟶ Mô hình tuyến tính + +
+ +**22. Linear regression** + +⟶ Hồi quy tuyến tính + +
+ +**23. We assume here that y|x;θ∼N(μ,σ2)** + +⟶ Chúng ta giả sử ở đây rằng y|x;θ∼N(μ,σ2) + +
+
+**24. Normal equations ― By noting X the matrix design, the value of θ that minimizes the cost function is a closed-form solution such that:**
+
+⟶ Phương trình normal - Bằng việc kí hiệu X là ma trận thiết kế, giá trị của θ mà cực tiểu hoá cost function là một nghiệm dạng đóng sao cho:
+
+<br>
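The closed-form solution θ=(XᵀX)⁻¹Xᵀy of entry 24 can be checked numerically on synthetic data (variable names are ours):

```python
import numpy as np

X = np.c_[np.ones(100), np.random.randn(100, 2)]      # design matrix with intercept column
true_theta = np.array([1.0, 2.0, -3.0])
y = X @ true_theta + 0.01 * np.random.randn(100)

theta = np.linalg.solve(X.T @ X, X.T @ y)              # (X^T X)^(-1) X^T y without an explicit inverse
print(theta)                                           # close to [1, 2, -3]
```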
+ +**25. LMS algorithm ― By noting α the learning rate, the update rule of the Least Mean Squares (LMS) algorithm for a training set of m data points, which is also known as the Widrow-Hoff learning rule, is as follows:** + +⟶ + +
+ +**26. Remark: the update rule is a particular case of the gradient ascent.** + +⟶ Chú ý: Luật cập nhật là một trường hợp đặc biệt của gradient ascent. + +
+ +**27. LWR ― Locally Weighted Regression, also known as LWR, is a variant of linear regression that weights each training example in its cost function by w(i)(x), which is defined with parameter τ∈R as:** + +⟶ + +
+
+**28. Classification and logistic regression**
+
+⟶ Phân loại và hồi quy logistic
+
+<br>
+ +**29. Sigmoid function ― The sigmoid function g, also known as the logistic function, is defined as follows:** + +⟶ Hàm Sigmoid - Hàm sigmoid g, còn được biết đến như là hàm logistic, được định nghĩa như sau: + +
+ +**30. Logistic regression ― We assume here that y|x;θ∼Bernoulli(ϕ). We have the following form:** + +⟶ Hồi quy logistic - Chúng ta giả sử ở đây rằng y|x;θ∼Bernoulli(ϕ). Ta có công thức như sau: + +
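A small sketch of entries 29–30: the sigmoid function and a logistic regression fitted by gradient ascent on the log-likelihood (synthetic data, arbitrary step size and iteration count):

```python
import numpy as np

def sigmoid(z):
    # g(z) = 1 / (1 + e^(-z))
    return 1 / (1 + np.exp(-z))

def logistic_regression(X, y, alpha=1.0, n_iters=2000):
    """Gradient ascent on the log-likelihood; y takes values in {0, 1}."""
    theta = np.zeros(X.shape[1])
    for _ in range(n_iters):
        grad = X.T @ (y - sigmoid(X @ theta))    # gradient of the log-likelihood
        theta += alpha * grad / len(y)
    return theta

X = np.c_[np.ones(200), np.random.randn(200)]
y = (np.random.rand(200) < sigmoid(X @ np.array([-1.0, 2.0]))).astype(float)
print(logistic_regression(X, y))                 # roughly recovers [-1, 2] on most runs
```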
+ +**31. Remark: there is no closed form solution for the case of logistic regressions.** + +⟶ + +
+ +**32. Softmax regression ― A softmax regression, also called a multiclass logistic regression, is used to generalize logistic regression when there are more than 2 outcome classes. By convention, we set θK=0, which makes the Bernoulli parameter ϕi of each class i equal to:** + +⟶ + +
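The softmax mapping behind entry 32 turns K class scores into probabilities that sum to one; a minimal sketch:

```python
import numpy as np

def softmax(scores):
    # Subtract the max score for numerical stability; the outputs sum to 1
    exp_s = np.exp(scores - np.max(scores))
    return exp_s / exp_s.sum()

print(softmax(np.array([2.0, 1.0, 0.1])))   # approximately [0.659, 0.242, 0.099]
```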
+ +**33. Generalized Linear Models** + +⟶ + +
+ +**34. Exponential family ― A class of distributions is said to be in the exponential family if it can be written in terms of a natural parameter, also called the canonical parameter or link function, η, a sufficient statistic T(y) and a log-partition function a(η) as follows:** + +⟶ + +
+ +**35. Remark: we will often have T(y)=y. Also, exp(−a(η)) can be seen as a normalization parameter that will make sure that the probabilities sum to one.** + +⟶ + +
+ +**36. Here are the most common exponential distributions summed up in the following table:** + +⟶ + +
+ +**37. [Distribution, Bernoulli, Gaussian, Poisson, Geometric]** + +⟶ + +
+
+**38. Assumptions of GLMs ― Generalized Linear Models (GLM) aim at predicting a random variable y as a function of x∈Rn+1 and rely on the following 3 assumptions:**
+
+⟶
+
+<br>
+ +**39. Remark: ordinary least squares and logistic regression are special cases of generalized linear models.** + +⟶ + +
+ +**40. Support Vector Machines** + +⟶ + +
+ +**41: The goal of support vector machines is to find the line that maximizes the minimum distance to the line.** + +⟶ + +
+ +**42: Optimal margin classifier ― The optimal margin classifier h is such that:** + +⟶ + +
+ +**43: where (w,b)∈Rn×R is the solution of the following optimization problem:** + +⟶ + +
+ +**44. such that** + +⟶ + +
+ +**45. support vectors** + +⟶ + +
+ +**46. Remark: the line is defined as wTx−b=0.** + +⟶ + +
+ +**47. Hinge loss ― The hinge loss is used in the setting of SVMs and is defined as follows:** + +⟶ + +
+ +**48. Kernel ― Given a feature mapping ϕ, we define the kernel K to be defined as:** + +⟶ + +
+ +**49. In practice, the kernel K defined by K(x,z)=exp(−||x−z||22σ2) is called the Gaussian kernel and is commonly used.** + +⟶ + +
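Reading entry 49's flattened formula as K(x,z)=exp(−∥x−z∥²/(2σ²)), a minimal implementation of the Gaussian kernel (names are ours):

```python
import numpy as np

def gaussian_kernel(x, z, sigma=1.0):
    # K(x, z) = exp(-||x - z||^2 / (2 sigma^2))
    return np.exp(-np.sum((x - z) ** 2) / (2 * sigma ** 2))

x, z = np.array([1.0, 2.0]), np.array([1.5, 1.0])
print(gaussian_kernel(x, z))        # about 0.535; equals 1 when x == z
```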
+ +**50. [Non-linear separability, Use of a kernel mapping, Decision boundary in the original space]** + +⟶ + +
+ +**51. Remark: we say that we use the "kernel trick" to compute the cost function using the kernel because we actually don't need to know the explicit mapping ϕ, which is often very complicated. Instead, only the values K(x,z) are needed.** + +⟶ + +
+ +**52. Lagrangian ― We define the Lagrangian L(w,b) as follows:** + +⟶ + +
+ +**53. Remark: the coefficients βi are called the Lagrange multipliers.** + +⟶ + +
+ +**54. Generative Learning** + +⟶ + +
+ +**55. A generative model first tries to learn how the data is generated by estimating P(x|y), which we can then use to estimate P(y|x) by using Bayes' rule.** + +⟶ + +
+ +**56. Gaussian Discriminant Analysis** + +⟶ + +
+ +**57. Setting ― The Gaussian Discriminant Analysis assumes that y and x|y=0 and x|y=1 are such that:** + +⟶ + +
+ +**58. Estimation ― The following table sums up the estimates that we find when maximizing the likelihood:** + +⟶ + +
+ +**59. Naive Bayes** + +⟶ + +
+ +**60. Assumption ― The Naive Bayes model supposes that the features of each data point are all independent:** + +⟶ + +
+ +**61. Solutions ― Maximizing the log-likelihood gives the following solutions, with k∈{0,1},l∈[[1,L]]** + +⟶ + +
+ +**62. Remark: Naive Bayes is widely used for text classification and spam detection.** + +⟶ + +
+ +**63. Tree-based and ensemble methods** + +⟶ + +
+ +**64. These methods can be used for both regression and classification problems.** + +⟶ + +
+ +**65. CART ― Classification and Regression Trees (CART), commonly known as decision trees, can be represented as binary trees. They have the advantage to be very interpretable.** + +⟶ + +
+ +**66. Random forest ― It is a tree-based technique that uses a high number of decision trees built out of randomly selected sets of features. Contrary to the simple decision tree, it is highly uninterpretable but its generally good performance makes it a popular algorithm.** + +⟶ + +
+ +**67. Remark: random forests are a type of ensemble methods.** + +⟶ + +
+ +**68. Boosting ― The idea of boosting methods is to combine several weak learners to form a stronger one. The main ones are summed up in the table below:** + +⟶ + +
+ +**69. [Adaptive boosting, Gradient boosting]** + +⟶ + +
+ +**70. High weights are put on errors to improve at the next boosting step** + +⟶ + +
+ +**71. Weak learners trained on remaining errors** + +⟶ + +
+ +**72. Other non-parametric approaches** + +⟶ + +
+ +**73. k-nearest neighbors ― The k-nearest neighbors algorithm, commonly known as k-NN, is a non-parametric approach where the response of a data point is determined by the nature of its k neighbors from the training set. It can be used in both classification and regression settings.** + +⟶ + +
+ +**74. Remark: The higher the parameter k, the higher the bias, and the lower the parameter k, the higher the variance.** + +⟶ + +
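Entries 73–74 describe k-NN; a compact majority-vote sketch using Euclidean distance (toy data, names are ours):

```python
import numpy as np

def knn_predict(X_train, y_train, x, k=3):
    """Majority vote among the k nearest training points."""
    dists = np.linalg.norm(X_train - x, axis=1)
    nearest = np.argsort(dists)[:k]
    labels, counts = np.unique(y_train[nearest], return_counts=True)
    return labels[np.argmax(counts)]

X_train = np.array([[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [0.9, 1.1]])
y_train = np.array([0, 0, 1, 1])
print(knn_predict(X_train, y_train, np.array([0.2, 0.1])))   # -> 0
```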
+ +**75. Learning Theory** + +⟶ + +
+ +**76. Union bound ― Let A1,...,Ak be k events. We have:** + +⟶ + +
+ +**77. Hoeffding inequality ― Let Z1,..,Zm be m iid variables drawn from a Bernoulli distribution of parameter ϕ. Let ˆϕ be their sample mean and γ>0 fixed. We have:** + +⟶ + +
+ +**78. Remark: this inequality is also known as the Chernoff bound.** + +⟶ + +
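Entry 77's bound P(|ˆϕ−ϕ|>γ)≤2exp(−2γ²m) can be illustrated with a quick simulation (the parameter values below are chosen arbitrarily):

```python
import numpy as np

# Empirically check Hoeffding's inequality against the bound 2 exp(-2 gamma^2 m)
phi, m, gamma, n_trials = 0.3, 200, 0.05, 20000
samples = np.random.rand(n_trials, m) < phi            # n_trials sample means of m Bernoulli(phi) draws
phi_hat = samples.mean(axis=1)
empirical = np.mean(np.abs(phi_hat - phi) > gamma)
bound = 2 * np.exp(-2 * gamma ** 2 * m)
print(empirical, bound)                                # the empirical frequency stays below the bound
```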
+ +**79. Training error ― For a given classifier h, we define the training error ˆϵ(h), also known as the empirical risk or empirical error, to be as follows:** + +⟶ + +
+ +**80. Probably Approximately Correct (PAC) ― PAC is a framework under which numerous results on learning theory were proved, and has the following set of assumptions: ** + +⟶ + +
+ +**81: the training and testing sets follow the same distribution ** + +⟶ + +
+ +**82. the training examples are drawn independently** + +⟶ + +
+ +**83. Shattering ― Given a set S={x(1),...,x(d)}, and a set of classifiers H, we say that H shatters S if for any set of labels {y(1),...,y(d)}, we have:** + +⟶ + +
+ +**84. Upper bound theorem ― Let H be a finite hypothesis class such that |H|=k and let δ and the sample size m be fixed. Then, with probability of at least 1−δ, we have:** + +⟶ + +
+ +**85. VC dimension ― The Vapnik-Chervonenkis (VC) dimension of a given infinite hypothesis class H, noted VC(H) is the size of the largest set that is shattered by H.** + +⟶ + +
+ +**86. Remark: the VC dimension of H={set of linear classifiers in 2 dimensions} is 3.** + +⟶ + +
+ +**87. Theorem (Vapnik) ― Let H be given, with VC(H)=d and m the number of training examples. With probability at least 1−δ, we have:** + +⟶ + +
+ +**88. [Introduction, Type of prediction, Type of model]** + +⟶ + +
+ +**89. [Notations and general concepts, loss function, gradient descent, likelihood]** + +⟶ + +
+ +**90. [Linear models, linear regression, logistic regression, generalized linear models]** + +⟶ + +
+ +**91. [Support vector machines, Optimal margin classifier, Hinge loss, Kernel]** + +⟶ + +
+ +**92. [Generative learning, Gaussian Discriminant Analysis, Naive Bayes]** + +⟶ + +
+ +**93. [Trees and ensemble methods, CART, Random forest, Boosting]** + +⟶ + +
+ +**94. [Other methods, k-NN]** + +⟶ + +
+ +**95. [Learning theory, Hoeffding inequality, PAC, VC dimension]** + +⟶ From 0681a5059dcd233837d1d10e1cd272d87e5873d1 Mon Sep 17 00:00:00 2001 From: Shervine Amidi Date: Tue, 11 Jun 2019 10:19:06 -0700 Subject: [PATCH 243/531] Add [ja] contributors --- CONTRIBUTORS | 3 +++ 1 file changed, 3 insertions(+) diff --git a/CONTRIBUTORS b/CONTRIBUTORS index dc4167fc2..b544a257c 100644 --- a/CONTRIBUTORS +++ b/CONTRIBUTORS @@ -89,6 +89,9 @@ Kwang Hyeok Ahn (translation of Unsupervised Learning) --ja + Kamuela Lau (translation of deep learning tips and tricks) + Yoshiyuki Nakai (review of deep learning tips and tricks) + Hiroki Mori (review of deep learning tips and tricks) --pt Leticia Portella (translation of convolutional neural networks) From d7ceda88512d92a2ba15b342350b2dba516640e7 Mon Sep 17 00:00:00 2001 From: Shervine Amidi Date: Tue, 11 Jun 2019 10:26:32 -0700 Subject: [PATCH 244/531] Update [ja] status --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index 7f3f37092..fb9ae9f47 100644 --- a/README.md +++ b/README.md @@ -74,7 +74,7 @@ Please make sure to propose the translation of **only one** cheatsheet per pull |**Magyar**|not started|not started|not started| |**Bahasa Indonesia**|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/155)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/152)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/153)| |**Italiano**|not started|not started|not started| -|**日本語**|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/145)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/146)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/157)| +|**日本語**|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/145)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/146)|done| |**한국어**|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/109)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/107)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/108)| |**Polski**|not started|not started|not started| |**Português**|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/128)|not started|not started| From fe02b65498374da12ae91d2490da346ae490a4ab Mon Sep 17 00:00:00 2001 From: tuananhhedspibk Date: Wed, 12 Jun 2019 22:19:59 +0900 Subject: [PATCH 245/531] vi translation for supervised learning --- vi/cheatsheet-supervised-learning.md | 40 ++++++++++++++-------------- 1 file changed, 20 insertions(+), 20 deletions(-) diff --git a/vi/cheatsheet-supervised-learning.md b/vi/cheatsheet-supervised-learning.md index 91659fc60..cfcee9526 100644 --- a/vi/cheatsheet-supervised-learning.md +++ b/vi/cheatsheet-supervised-learning.md @@ -72,7 +72,7 @@ **13. [Least squared error, Logistic loss, Hinge loss, Cross-entropy]** -⟶ [] +⟶
@@ -144,7 +144,7 @@ **25. LMS algorithm ― By noting α the learning rate, the update rule of the Least Mean Squares (LMS) algorithm for a training set of m data points, which is also known as the Widrow-Hoff learning rule, is as follows:** -⟶ +⟶ Giải thuật LMS - Bằng việc kí hiệu α là tốc độ học, quy tắc cập nhật của giải thuật Least Mean Squares (LMS) cho tập huấn luyện của m điểm dữ liệu, còn được biết như là quy tắc học Widrow-Hoff, là như sau:
@@ -156,7 +156,7 @@ **27. LWR ― Locally Weighted Regression, also known as LWR, is a variant of linear regression that weights each training example in its cost function by w(i)(x), which is defined with parameter τ∈R as:** -⟶ +⟶ LWR - Hồi quy trọng số cục bộ, còn được biết như là LWR, là biến thể của hồi quy tuyến tính, nó sẽ đánh trọng số cho mỗi ví dụ huấn luyện trong cost function của nó bởi w(i)(x), đươc định nghĩa với tham số τ∈R như sau:
@@ -180,19 +180,19 @@ **31. Remark: there is no closed form solution for the case of logistic regressions.** -⟶ +⟶ Chú ý: không có giải pháp dạng đóng cho trường hợp của hồi quy logistic.
**32. Softmax regression ― A softmax regression, also called a multiclass logistic regression, is used to generalize logistic regression when there are more than 2 outcome classes. By convention, we set θK=0, which makes the Bernoulli parameter ϕi of each class i equal to:** -⟶ +⟶ Hồi quy Softmax - Hồi quy softmax, còn được gọi là hồi quy logistic đa lớp, được sử dụng để tổng quát hoá hồi quy logistic khi có nhiều hơn 2 lớp đầu ra. Theo quy ước, chúng ta thiết lập θK=0, làm cho tham số Bernoulii ϕi của mỗi lớp i bằng với:
**33. Generalized Linear Models** -⟶ +⟶ Mô hình tuyến tính tổng quát
@@ -210,13 +210,13 @@ **36. Here are the most common exponential distributions summed up in the following table:** -⟶ +⟶ Ở đây là các phân phối mũ phổ biến nhất được tổng kết ở bảng bên dưới:
**37. [Distribution, Bernoulli, Gaussian, Poisson, Geometric]** -⟶ +⟶ [Phân phối, Bernoulli, Gaussian, Poisson, Geometric]
@@ -234,13 +234,13 @@ **40. Support Vector Machines** -⟶ +⟶ Máy vector hỗ trợ
**41: The goal of support vector machines is to find the line that maximizes the minimum distance to the line.** -⟶ +⟶ Mục tiêu của máy vector hỗ trợ là tìm ra dòng tối đa hoá khoảng cách nhỏ nhất tới dòng.
@@ -252,19 +252,19 @@ **43: where (w,b)∈Rn×R is the solution of the following optimization problem:** -⟶ +⟶ với (w,b)∈Rn×R là giải pháp cho vấn đề tối ưu hoá sau đây:
**44. such that** -⟶ +⟶ như là:
**45. support vectors** -⟶ +⟶ vector hỗ trợ
@@ -438,13 +438,13 @@ **74. Remark: The higher the parameter k, the higher the bias, and the lower the parameter k, the higher the variance.** -⟶ +⟶ Chú ý: Tham số k cao hơn, bias cao hơn, tham số k thấp hơn, phương sai cao hơn
**75. Learning Theory** -⟶ +⟶ Lý thuyết học
@@ -472,21 +472,21 @@
-**80. Probably Approximately Correct (PAC) ― PAC is a framework under which numerous results on learning theory were proved, and has the following set of assumptions: ** +**80. Probably Approximately Correct (PAC) ― PAC is a framework under which numerous results on learning theory were proved, and has the following set of assumptions:** ⟶
-**81: the training and testing sets follow the same distribution ** +**81: the training and testing sets follow the same distribution** -⟶ +⟶ tập huấn luyện và test có cùng phân phối
**82. the training examples are drawn independently** -⟶ +⟶ ví dụ huấn luyện được tạo ra độc lập
@@ -522,7 +522,7 @@ **88. [Introduction, Type of prediction, Type of model]** -⟶ +⟶ [Giới thiệu, Loại dự đoán, Loại mô hình]
From 5782d9c47bf5862fe9790d172833c081228baacd Mon Sep 17 00:00:00 2001 From: Shervine Amidi Date: Wed, 12 Jun 2019 08:47:07 -0700 Subject: [PATCH 246/531] Update [vi] progress --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index fb9ae9f47..dfe8edfc8 100644 --- a/README.md +++ b/README.md @@ -56,7 +56,7 @@ Please make sure to propose the translation of **only one** cheatsheet per pull |**Русский**|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/21)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/21)|not started|not started|not started|not started| |**Türkçe**|done|done|done|done|done|done| |**Українська**|not started|not started|not started|not started|done|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/95)| -|**Tiếng Việt**|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/159)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/160)|not started|not started|not started|not started| +|**Tiếng Việt**|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/159)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/162)|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/160)|not started|not started| |**中文**|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/12)|done|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/48)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/7)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/73)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/72)| ### CS 230 (Deep Learning) From c28e3216a6326c3be1e1454a7151c73045bc43e3 Mon Sep 17 00:00:00 2001 From: tuananhhedspibk Date: Thu, 13 Jun 2019 21:16:48 +0900 Subject: [PATCH 247/531] vi translation for deep learning --- vi/cheatsheet-deep-learning.md | 22 +++++++++++----------- 1 file changed, 11 insertions(+), 11 deletions(-) diff --git a/vi/cheatsheet-deep-learning.md b/vi/cheatsheet-deep-learning.md index da05af89b..45dad28fa 100644 --- a/vi/cheatsheet-deep-learning.md +++ b/vi/cheatsheet-deep-learning.md @@ -12,7 +12,7 @@ **3. Neural networks are a class of models that are built with layers. Commonly used types of neural networks include convolutional and recurrent neural networks.** -⟶ Mạng Neural là 1 lớp của các models được xây dựng với các tầng (layers). Các loại mạng Neural thường được sử dụng bao gồm: Mạng Neural tích chập (Convolutional Neural Networks) và Mạng Neural hồi quy (Recurrent Neural Networks). +⟶ Mạng Neural là 1 lớp của các mô hình (models) được xây dựng với các tầng (layers). Các loại mạng Neural thường được sử dụng bao gồm: Mạng Neural tích chập (Convolutional Neural Networks) và Mạng Neural hồi quy (Recurrent Neural Networks).
@@ -30,7 +30,7 @@ **6. By noting i the ith layer of the network and j the jth hidden unit of the layer, we have:** -⟶ Bằng việc kí hiệu i là tầng thứ i của mạng, j là đơn vị ẩn (hidden unit) thứ j của tầng, ta có: +⟶ Bằng việc kí hiệu i là tầng thứ i của mạng, j là hidden unit (đơn vị ẩn) thứ j của tầng, ta có:
@@ -54,7 +54,7 @@ **10. Cross-entropy loss ― In the context of neural networks, the cross-entropy loss L(z,y) is commonly used and is defined as follows:** -⟶ Mất mát (loss) Cross-entropy - Trong bối cảnh của mạng neural, mất mát cross-entropy L(z, y) thường được sử dụng và định nghĩa như sau: +⟶ Lỗi (loss) Cross-entropy - Trong bối cảnh của mạng neural, hàm lỗi cross-entropy L(z, y) thường được sử dụng và định nghĩa như sau:
@@ -90,13 +90,13 @@ **16. Step 2: Perform forward propagation to obtain the corresponding loss.** -⟶ Bước 2: Thực thi lan truyền xuôi (forward propagation) để lấy được mất mát (loss) tương ứng. +⟶ Bước 2: Thực thi lan truyền tiến (forward propagation) để lấy được lỗi (loss) tương ứng.
**17. Step 3: Backpropagate the loss to get the gradients.** -⟶ Bước 3: Lan truyền ngược mất mát để lấy được gradients (độ dốc). +⟶ Bước 3: Lan truyền ngược lỗi để lấy được gradients (độ dốc).
@@ -126,13 +126,13 @@ **22. Batch normalization ― It is a step of hyperparameter γ,β that normalizes the batch {xi}. By noting μB,σ2B the mean and variance of that we want to correct to the batch, it is done as follows:** -⟶ Batch normalization (chuẩn hoá) - Đây là bước mà các hyperparameter γ,β chuẩn hoá batch (mẻ) {xi}. Bằng việc kí hiệu μB,σ2B là giá trị trung bình, phương sai mà ta muốn gán cho batch, nó được thực hiện như sau: +⟶ Batch normalization (chuẩn hoá) - Đây là bước mà các hyperparameter γ,β chuẩn hoá batch {xi}. Bằng việc kí hiệu μB,σ2B là giá trị trung bình, phương sai mà ta muốn gán cho batch, nó được thực hiện như sau:
**23. It is usually done after a fully connected/convolutional layer and before a non-linearity layer and aims at allowing higher learning rates and reducing the strong dependence on initialization.** -⟶ Nó thường được hoàn thành sau fully connected/convolutional layer và trước non-linearity layer và mục tiêu là cho phép tốc độ học cao hơn cũng như giảm đi sự phụ thuộc mạnh mẽ vào việc khởi tạo. +⟶ Nó thường được tính sau fully connected/convolutional layer và trước non-linearity layer và mục tiêu là cho phép tốc độ học cao hơn cũng như giảm đi sự phụ thuộc mạnh mẽ vào việc khởi tạo.
@@ -162,13 +162,13 @@ **28. LSTM ― A long short-term memory (LSTM) network is a type of RNN model that avoids the vanishing gradient problem by adding 'forget' gates.** -⟶ LSTM - Mạng bộ nhớ ngắn dài (LSTM) là 1 loại RNN model tránh vấn đề vanishing gradient (độ dốc biến mất đột ngột) bằng cách thêm vào cổng 'quên' ('forget' gates). +⟶ LSTM - Mạng bộ nhớ ngắn dài (LSTM) là 1 loại RNN model tránh vấn đề vanishing gradient (gradient biến mất đột ngột) bằng cách thêm vào cổng 'quên' ('forget' gates).
**29. Reinforcement Learning and Control** -⟶ Reinforcement Learning và Control +⟶ Reinforcement Learning (Học tăng cường) và điều khiển
@@ -216,7 +216,7 @@ **37. R:S×A⟶R or R:S⟶R is the reward function that the algorithm wants to maximize** -⟶ R:S×A⟶R hoặc R:S⟶R là reward function (hàm reward) mà giải thuật muốn tối đa hoá. +⟶ R:S×A⟶R hoặc R:S⟶R là reward function (hàm định nghĩa phần thưởng) mà giải thuật muốn tối đa hoá.
@@ -306,7 +306,7 @@ **52. [Convolutional Neural Networks, Convolutional layer, Batch normalization]** -⟶ [Mạng neural tích chập, Tầng chập, Chuẩn hoá lô (batch)] +⟶ [Mạng neural tích chập, Tầng chập, Chuẩn hoá batch]
From c8730c9037536e278919bce124a1f3660459ce6b Mon Sep 17 00:00:00 2001 From: Afshine Amidi <26204670+afshinea@users.noreply.github.com> Date: Sat, 15 Jun 2019 16:24:14 -0700 Subject: [PATCH 248/531] Update README.md --- README.md | 16 ++++++++++++++++ 1 file changed, 16 insertions(+) diff --git a/README.md b/README.md index dfe8edfc8..bee92648b 100644 --- a/README.md +++ b/README.md @@ -34,6 +34,22 @@ The translation process of each cheatsheet contains two steps: Please make sure to propose the translation of **only one** cheatsheet per pull request -- it simplifies a lot the review process. ## Progression +### CS 221 (Artificial Intelligence) +| |Reflex|States|Variables|Logic| +|:---|:---:|:---:|:---:|:---:| +|**Deutsch**|not started|not started|not started|not started| +|**Español**|not started|not started|not started|not started| +|**فارسی**|not started|not started|not started|not started| +|**Français**|not started|not started|not started|not started| +|**עִבְרִית**|not started|not started|not started|not started| +|**Italiano**|not started|not started|not started|not started| +|**日本語**|not started|not started|not started|not started| +|**한국어**|not started|not started|not started|not started| +|**Português**|not started|not started|not started|not started| +|**Türkçe**|not started|not started|not started|not started| +|**Tiếng Việt**|not started|not started|not started|not started| +|**中文**|not started|not started|not started|not started| + ### CS 229 (Machine Learning) | |Deep learning|Supervised|Unsupervised|ML tips|Probabilities|Algebra| |:---|:---:|:---:|:---:|:---:|:---:|:---:| From d095a618ca446ff09f8d27586098d923e19f4f19 Mon Sep 17 00:00:00 2001 From: Afshine Amidi <26204670+afshinea@users.noreply.github.com> Date: Sat, 15 Jun 2019 16:25:10 -0700 Subject: [PATCH 249/531] Update README.md --- README.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/README.md b/README.md index bee92648b..6c2a2cadc 100644 --- a/README.md +++ b/README.md @@ -35,7 +35,7 @@ Please make sure to propose the translation of **only one** cheatsheet per pull ## Progression ### CS 221 (Artificial Intelligence) -| |Reflex|States|Variables|Logic| +| |Reflex models|States models|Variables models|Logic models| |:---|:---:|:---:|:---:|:---:| |**Deutsch**|not started|not started|not started|not started| |**Español**|not started|not started|not started|not started| @@ -76,7 +76,7 @@ Please make sure to propose the translation of **only one** cheatsheet per pull |**中文**|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/12)|done|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/48)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/7)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/73)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/72)| ### CS 230 (Deep Learning) -| |Convolutional Neural Networks|Recurrent Neural Networks|DL tips| +| |Convolutional Neural Networks|Recurrent Neural Networks|Deep Learning tips| |:---|:---:|:---:|:---:| |**العَرَبِيَّة**|not started|not started|not started| |**Català**|not started|not started|not started| From 6313a7bb824eb3e9d27e5bbbf847ea7546603018 Mon Sep 17 00:00:00 2001 From: afshinea Date: Sat, 15 Jun 2019 16:56:07 -0700 Subject: [PATCH 250/531] Add template --- template/reflex-models.md | 716 ++++++++++++++++++++++++++++++++++++++ 1 file changed, 716 insertions(+) create mode 100644 template/reflex-models.md diff --git a/template/reflex-models.md 
b/template/reflex-models.md new file mode 100644 index 000000000..29621b5b7 --- /dev/null +++ b/template/reflex-models.md @@ -0,0 +1,716 @@ +**Reflex-based models translation** + +
+ +**1. Reflex-based models with Machine Learning** + +⟶ + +
+ + +**2. Linear predictors** + +⟶ + +
+ + +**3. In this section, we will go through reflex-based models that can improve with experience, by going through samples that have input-output pairs.** + +⟶ + +
+ + +**4. Feature vector ― The feature vector of an input x is noted ϕ(x) and is such that:** + +⟶ + +
+ + +**5. Score ― The score s(x,w) of an example (ϕ(x),y)∈Rd×R associated to a linear model of weights w∈Rd is given by the inner product:** + +⟶ + +
+ + +**6. Classification** + +⟶ + +
+ + +**7. Linear classifier ― Given a weight vector w∈Rd and a feature vector ϕ(x)∈Rd, the binary linear classifier fw is given by:** + +⟶ + +
+ + +**8. if** + +⟶ + +
+ + +**9. Margin ― The margin m(x,y,w)∈R of an example (ϕ(x),y)∈Rd×{−1,+1} associated to a linear model of weights w∈Rd quantifies the confidence of the prediction: larger values are better. It is given by:** + +⟶ + +
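Entries 5–9 above define the score as an inner product, the binary linear classifier as its sign, and the margin as score times label; a small sketch with made-up weights and features:

```python
import numpy as np

def score(phi_x, w):
    # s(x, w) = <w, phi(x)>
    return np.dot(w, phi_x)

def classify(phi_x, w):
    # f_w(x) = +1 if the score is >= 0, else -1
    return 1 if score(phi_x, w) >= 0 else -1

def margin(phi_x, y, w):
    # m(x, y, w) = s(x, w) * y ; larger values mean a more confident prediction
    return score(phi_x, w) * y

w = np.array([0.5, -1.0, 2.0])
phi_x, y = np.array([1.0, 0.0, 1.5]), +1
print(score(phi_x, w), classify(phi_x, w), margin(phi_x, y, w))   # 3.5, 1, 3.5
```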
+ + +**10. Regression** + +⟶ + +
+ + +**11. Linear regression ― Given a weight vector w∈Rd and a feature vector ϕ(x)∈Rd, the output of a linear regression of weights w denoted as fw is given by:** + +⟶ + +
+ + +**12. Residual ― The residual res(x,y,w)∈R is defined as being the amount by which the prediction fw(x) overshoots the target y:** + +⟶ + +
+ + +**13. Loss minimization** + +⟶ + +
+ + +**14. Loss function ― A loss function Loss(x,y,w) quantifies how unhappy we are with the weights w of the model in the prediction task of output y from input x. It is a quantity we want to minimize during the training process.** + +⟶ + +
+ + +**15. Classification case - The classification of a sample x of true label y∈{−1,+1} with a linear model of weights w can be done with the predictor fw(x)≜sign(s(x,w)). In this situation, a metric of interest quantifying the quality of the classification is given by the margin m(x,y,w), and can be used with the following loss functions:** + +⟶ + +
+ + +**16. [Name, Illustration, Zero-one loss, Hinge loss, Logistic loss]** + +⟶ + +
+ + +**17. Regression case - The prediction of a sample x of true label y∈R with a linear model of weights w can be done with the predictor fw(x)≜s(x,w). In this situation, a metric of interest quantifying the quality of the regression is given by the margin res(x,y,w) and can be used with the following loss functions:** + +⟶ + +
+ + +**18. [Name, Squared loss, Absolute deviation loss, Illustration]** + +⟶ + +
+ + +**19. Loss minimization framework ― In order to train a model, we want to minimize the training loss is defined as follows:** + +⟶ + +
+ + +**20. Non-linear predictors** + +⟶ + +
+ + +**21. k-nearest neighbors ― The k-nearest neighbors algorithm, commonly known as k-NN, is a non-parametric approach where the response of a data point is determined by the nature of its k neighbors from the training set. It can be used in both classification and regression settings.** + +⟶ + +
+ + +**22. Remark: the higher the parameter k, the higher the bias, and the lower the parameter k, the higher the variance.** + +⟶ + +
+ + +**23. Neural networks ― Neural networks are a class of models that are built with layers. Commonly used types of neural networks include convolutional and recurrent neural networks. The vocabulary around neural networks architectures is described in the figure below:** + +⟶ + +
+ + +**24. [Input layer, Hidden layer, Output layer]** + +⟶ + +
+ + +**25. By noting i the ith layer of the network and j the jth hidden unit of the layer, we have:** + +⟶ + +
+ + +**26. where we note w, b, x, z the weight, bias, input and non-activated output of the neuron respectively.** + +⟶ + +
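A toy forward pass matching the layer vocabulary of entries 23–26 (one ReLU hidden layer, random weights; shapes and names are ours, not part of the cheatsheet):

```python
import numpy as np

def forward(x, W1, b1, W2, b2):
    # Hidden layer: z = W1 x + b1, activation a = max(0, z) (ReLU)
    a = np.maximum(0, W1 @ x + b1)
    # Output layer: a single linear unit
    return W2 @ a + b2

rng = np.random.default_rng(0)
x = rng.normal(size=4)                            # 4 input features
W1, b1 = rng.normal(size=(3, 4)), np.zeros(3)     # 3 hidden units
W2, b2 = rng.normal(size=(1, 3)), np.zeros(1)     # 1 output unit
print(forward(x, W1, b1, W2, b2))
```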
+ + +**27. For a more detailed overview of the concepts above, check out the Supervised Learning cheatsheets!** + +⟶ + +
+ + +**28. Stochastic gradient descent** + +⟶ + +
+ + +**29. Gradient descent ― By noting η∈R the learning rate (also called step size), the update rule for gradient descent is expressed with the learning rate and the loss function Loss(x,y,w) as follows:** + +⟶ + +
+ + +**30. Stochastic updates ― Stochastic gradient descent (SGD) updates the parameters of the model one training example (ϕ(x),y)∈Dtrain at a time. This method leads to sometimes noisy, but fast updates.** + +⟶ + +
+ + +**31. Batch updates ― Batch gradient descent (BGD) updates the parameters of the model one batch of examples (e.g. the entire training set) at a time. This method computes stable update directions, at a greater computational cost.** + +⟶ + +
+ + +**32. Fine-tuning models** + +⟶ + +
+ + +**33. Hypothesis class ― A hypothesis class F is the set of possible predictors with a fixed ϕ(x) and varying w:** + +⟶ + +
+ + +**34. Logistic function ― The logistic function σ, also called the sigmoid function, is defined as:** + +⟶ + +
+ + +**35. Remark: we have σ′(z)=σ(z)(1−σ(z)).** + +⟶ + +
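The identity of entry 35 can be sanity-checked against a finite-difference derivative (the evaluation point is arbitrary):

```python
import numpy as np

def sigma(z):
    return 1 / (1 + np.exp(-z))

z, h = 0.7, 1e-6
finite_diff = (sigma(z + h) - sigma(z - h)) / (2 * h)    # numerical derivative
identity = sigma(z) * (1 - sigma(z))                     # sigma'(z) = sigma(z)(1 - sigma(z))
print(finite_diff, identity)                             # both about 0.2217
```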
+ + +**36. Backpropagation ― The forward pass is done through fi, which is the value for the subexpression rooted at i, while the backward pass is done through gi=∂out∂fi and represents how fi influences the output.** + +⟶ + +
+ + +**37. Approximation and estimation error ― The approximation error ϵapprox represents how far the entire hypothesis class F is from the target predictor g∗, while the estimation error ϵest quantifies how good the predictor ^f is with respect to the best predictor f∗ of the hypothesis class F.** + +⟶ + +
+ + +**38. Regularization ― The regularization procedure aims at avoiding the model to overfit the data and thus deals with high variance issues. The following table sums up the different types of commonly used regularization techniques:** + +⟶ + +
+ + +**39. [Shrinks coefficients to 0, Good for variable selection, Makes coefficients smaller, Tradeoff between variable selection and small coefficients]** + +⟶ + +
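To illustrate the "makes coefficients smaller" row above, a quick comparison of ordinary least squares with an L2-regularized (ridge) solution θ=(XᵀX+λI)⁻¹Xᵀy on synthetic data (λ chosen arbitrarily):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 5))
y = X @ np.array([3.0, 0.0, 0.0, -2.0, 0.0]) + 0.1 * rng.normal(size=50)

theta_ols = np.linalg.solve(X.T @ X, X.T @ y)                        # unregularized least squares
lam = 10.0
theta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(5), X.T @ y)    # L2 (ridge) regularization

print(np.linalg.norm(theta_ols), np.linalg.norm(theta_ridge))        # the ridge coefficients have smaller norm
```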
+ + +**40. Hyperparameters ― Hyperparameters are the properties of the learning algorithm, and include features, regularization parameter λ, number of iterations T, step size η, etc.** + +⟶ + +
+ + +**41. Sets vocabulary ― When selecting a model, we distinguish 3 different parts of the data that we have as follows:** + +⟶ + +
+ + +**42. [Training set, Validation set, Testing set]** + +⟶ + +
+ + +**43. [Model is trained, Usually 80% of the dataset, Model is assessed, Usually 20% of the dataset, Also called hold-out or development set, Model gives predictions, Unseen data]** + +⟶ + +
+ + +**44. Once the model has been chosen, it is trained on the entire dataset and tested on the unseen test set. These are represented in the figure below:** + +⟶ + +
+ + +**45. [Dataset, Unseen data, train, validation, test]** + +⟶ + +
+ + +**46. For a more detailed overview of the concepts above, check out the Machine Learning tips and tricks cheatsheets!** + +⟶ + +
+ + +**47. Unsupervised Learning** + +⟶ + +
+ + +**48. The class of unsupervised learning methods aims at discovering the structure of the data, which may have of rich latent structures.** + +⟶ + +
+ + +**49. k-means** + +⟶ + +
+ + +**50. Clustering ― Given a training set of input points Dtrain, the goal of a clustering algorithm is to assign each point ϕ(xi) to a cluster zi∈{1,...,k}** + +⟶ + +
+ + +**51. Objective function ― The loss function for one of the main clustering algorithms, k-means, is given by:** + +⟶ + +
+ + +**52. Algorithm ― After randomly initializing the cluster centroids μ1,μ2,...,μk∈Rn, the k-means algorithm repeats the following step until convergence:** + +⟶ + +
+ + +**53. and** + +⟶ + +
+ + +**54. [Means initialization, Cluster assignment, Means update, Convergence]** + +⟶ + +
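The k-means loop of entries 52–54 (assignment step, then centroid update) in a short numpy sketch (random initialization from the data points; toy blobs, names are ours):

```python
import numpy as np

def kmeans(X, k, n_iters=100, seed=0):
    """Alternate cluster assignment and centroid update until the iteration budget runs out."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]     # random initialization
    for _ in range(n_iters):
        # Assignment step: each point goes to its closest centroid
        z = np.argmin(((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2), axis=1)
        # Update step: each centroid becomes the mean of its assigned points
        centroids = np.array([X[z == j].mean(axis=0) for j in range(k)])
    return z, centroids

X = np.vstack([np.random.randn(30, 2), np.random.randn(30, 2) + 5])
z, centroids = kmeans(X, k=2)
print(centroids)          # roughly [0, 0] and [5, 5]
```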
+ + +**55. Principal Component Analysis** + +⟶ + +
+ + +**56. Eigenvalue, eigenvector ― Given a matrix A∈Rn×n, λ is said to be an eigenvalue of A if there exists a vector z∈Rn∖{0}, called eigenvector, such that we have:** + +⟶ + +
+ + +**57. Spectral theorem ― Let A∈Rn×n. If A is symmetric, then A is diagonalizable by a real orthogonal matrix U∈Rn×n. By noting Λ=diag(λ1,...,λn), we have:** + +⟶ + +
+ + +**58. Remark: the eigenvector associated with the largest eigenvalue is called principal eigenvector of matrix A.** + +⟶ + +
+ + +**59. Algorithm ― The Principal Component Analysis (PCA) procedure is a dimension reduction technique that projects the data on k dimensions by maximizing the variance of the data as follows:** + +⟶ + +
+ + +**60. Step 1: Normalize the data to have a mean of 0 and standard deviation of 1.** + +⟶ + +
+ + +**61. [where, and]** + +⟶ + +
+ + +**62. [Step 2: Compute Σ=1mm∑i=1ϕ(xi)ϕ(xi)T∈Rn×n, which is symmetric with real eigenvalues., Step 3: Compute u1,...,uk∈Rn the k orthogonal principal eigenvectors of Σ, i.e. the orthogonal eigenvectors of the k largest eigenvalues., Step 4: Project the data on spanR(u1,...,uk).]** + +⟶ + +
+ + +**63. This procedure maximizes the variance among all k-dimensional spaces.** + +⟶ + +
+ + +**64. [Data in feature space, Find principal components, Data in principal components space]** + +⟶ + +
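The four PCA steps of entries 59–62 condense into a short sketch (synthetic data; `eigh` is used because Σ is symmetric; names are ours):

```python
import numpy as np

def pca(X, k):
    """Project the data onto its k principal components."""
    # Step 1: normalize each feature to zero mean and unit standard deviation
    X = (X - X.mean(axis=0)) / X.std(axis=0)
    # Step 2: empirical covariance matrix (symmetric, real eigenvalues)
    sigma = X.T @ X / len(X)
    # Step 3: eigenvectors of the k largest eigenvalues
    eigvals, eigvecs = np.linalg.eigh(sigma)          # eigh returns eigenvalues in ascending order
    U = eigvecs[:, np.argsort(eigvals)[::-1][:k]]
    # Step 4: project the data on span(u1, ..., uk)
    return X @ U

X = np.random.randn(100, 5)
print(pca(X, k=2).shape)      # (100, 2)
```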
+ + +**65. For a more detailed overview of the concepts above, check out the Unsupervised Learning cheatsheets!** + +⟶ + +
+ + +**66. [Linear predictors, Feature vector, Linear classifier/regression, Margin]** + +⟶ + +
+ + +**67. [Loss minimization, Loss function, Framework]** + +⟶ + +
+ + +**68. [Non-linear predictors, k-nearest neighbors, Neural networks]** + +⟶ + +
+ + +**69. [Stochastic gradient descent, Gradient, Stochastic updates, Batch updates]** + +⟶ + +
+ + +**70. [Fine-tuning models, Hypothesis class, Backpropagation, Regularization, Sets vocabulary]** + +⟶ + +
+ + +**71. [Unsupervised Learning, k-means, Principal components analysis]** + +⟶ + +
+ + +**72. View PDF version on GitHub** + +⟶ + +
+ + +**73. Original authors** + +⟶ + +
+ + +**74. Translated by X, Y and Z** + +⟶ + +
+ + +**75. Reviewed by X, Y and Z** + +⟶ + +
+ + +**76. By X and Y** + +⟶ + +
+ + +**77. The Artificial Intelligence cheatsheets are now available in [target language].** + +⟶ + +
+ + +**78. ** + +⟶ + +
+ + +**79. ** + +⟶ + +
+ + +**80. ** + +⟶ + +
+ + +**81. ** + +⟶ + +
+ + +**82. ** + +⟶ + +
+ + +**83. ** + +⟶ + +
+ + +**84. ** + +⟶ + +
+ + +**85. ** + +⟶ + +
+ + +**86. ** + +⟶ + +
+ + +**87. ** + +⟶ + +
+ + +**88. ** + +⟶ + +
+ + +**89. ** + +⟶ + +
+ + +**90. ** + +⟶ + +
+ + +**91. ** + +⟶ + +
+ + +**92. ** + +⟶ + +
+ + +**93. ** + +⟶ + +
+ + +**94. ** + +⟶ + +
+ + +**95. ** + +⟶ + +
+ + +**96. ** + +⟶ + +
+ + +**97. ** + +⟶ + +
+ + +**98. ** + +⟶ + +
+ + +**99. ** + +⟶ + +
+ + +**100. ** + +⟶ + +
+ + +**101. ** + +⟶ + +
+ + +**102. ** + +⟶ + +
From 9a5046bf370308257a77ca0a0c3853154caaa4ce Mon Sep 17 00:00:00 2001 From: afshinea Date: Sat, 15 Jun 2019 16:57:17 -0700 Subject: [PATCH 251/531] Update template --- template/reflex-models.md | 177 -------------------------------------- 1 file changed, 177 deletions(-) diff --git a/template/reflex-models.md b/template/reflex-models.md index 29621b5b7..a6b338419 100644 --- a/template/reflex-models.md +++ b/template/reflex-models.md @@ -537,180 +537,3 @@ **77. The Artificial Intelligence cheatsheets are now available in [target language].** ⟶ - -
- - -**78. ** - -⟶ - -
- - -**79. ** - -⟶ - -
- - -**80. ** - -⟶ - -
- - -**81. ** - -⟶ - -
- - -**82. ** - -⟶ - -
- - -**83. ** - -⟶ - -
- - -**84. ** - -⟶ - -
- - -**85. ** - -⟶ - -
- - -**86. ** - -⟶ - -
- - -**87. ** - -⟶ - -
- - -**88. ** - -⟶ - -
- - -**89. ** - -⟶ - -
- - -**90. ** - -⟶ - -
- - -**91. ** - -⟶ - -
- - -**92. ** - -⟶ - -
- - -**93. ** - -⟶ - -
- - -**94. ** - -⟶ - -
- - -**95. ** - -⟶ - -
- - -**96. ** - -⟶ - -
- - -**97. ** - -⟶ - -
- - -**98. ** - -⟶ - -
- - -**99. ** - -⟶ - -
- - -**100. ** - -⟶ - -
- - -**101. ** - -⟶ - -
- - -**102. ** - -⟶ - -
From d07037431b2caf772c4a664125e5ac7787d5023e Mon Sep 17 00:00:00 2001 From: Shervine Amidi Date: Sat, 15 Jun 2019 19:19:12 -0700 Subject: [PATCH 252/531] Link cheatsheets to templates --- README.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/README.md b/README.md index 6c2a2cadc..542ec1742 100644 --- a/README.md +++ b/README.md @@ -35,7 +35,7 @@ Please make sure to propose the translation of **only one** cheatsheet per pull ## Progression ### CS 221 (Artificial Intelligence) -| |Reflex models|States models|Variables models|Logic models| +| |[Reflex models](https://github.com/shervinea/cheatsheet-translation/blob/master/template/reflex-models.md)|States models|Variables models|Logic models| |:---|:---:|:---:|:---:|:---:| |**Deutsch**|not started|not started|not started|not started| |**Español**|not started|not started|not started|not started| @@ -51,7 +51,7 @@ Please make sure to propose the translation of **only one** cheatsheet per pull |**中文**|not started|not started|not started|not started| ### CS 229 (Machine Learning) -| |Deep learning|Supervised|Unsupervised|ML tips|Probabilities|Algebra| +| |[Deep learning](https://github.com/shervinea/cheatsheet-translation/blob/master/template/cheatsheet-deep-learning.md)|[Supervised](https://github.com/shervinea/cheatsheet-translation/blob/master/template/cheatsheet-supervised-learning.md)|[Unsupervised](https://github.com/shervinea/cheatsheet-translation/blob/master/template/cheatsheet-unsupervised-learning.md)|[ML tips](https://github.com/shervinea/cheatsheet-translation/blob/master/template/cheatsheet-machine-learning-tips-and-tricks.md)|[Probabilities](https://github.com/shervinea/cheatsheet-translation/blob/master/template/refresher-probability.md)|[Algebra](https://github.com/shervinea/cheatsheet-translation/blob/master/template/refresher-linear-algebra.md)| |:---|:---:|:---:|:---:|:---:|:---:|:---:| |**العَرَبِيَّة**|done|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/87)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/88)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/83)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/89)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/85)| |**Català**|not started|not started|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/47)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/47)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/47)| @@ -76,7 +76,7 @@ Please make sure to propose the translation of **only one** cheatsheet per pull |**中文**|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/12)|done|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/48)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/7)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/73)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/72)| ### CS 230 (Deep Learning) -| |Convolutional Neural Networks|Recurrent Neural Networks|Deep Learning tips| +| |[Convolutional Neural Networks](https://github.com/shervinea/cheatsheet-translation/blob/master/template/convolutional-neural-networks.md)|[Recurrent Neural Networks](https://github.com/shervinea/cheatsheet-translation/blob/master/template/recurrent-neural-networks.md)|[Deep Learning 
tips](https://github.com/shervinea/cheatsheet-translation/blob/master/template/deep-learning-tips-and-tricks.md)| |:---|:---:|:---:|:---:| |**العَرَبِيَّة**|not started|not started|not started| |**Català**|not started|not started|not started| From 50519ea6acb05373010f4733035ebdafbceffe03 Mon Sep 17 00:00:00 2001 From: Shervine Amidi Date: Sat, 15 Jun 2019 19:19:58 -0700 Subject: [PATCH 253/531] Update [fr] progress --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index 542ec1742..15127f416 100644 --- a/README.md +++ b/README.md @@ -40,7 +40,7 @@ Please make sure to propose the translation of **only one** cheatsheet per pull |**Deutsch**|not started|not started|not started|not started| |**Español**|not started|not started|not started|not started| |**فارسی**|not started|not started|not started|not started| -|**Français**|not started|not started|not started|not started| +|**Français**|done|done|done|done| |**עִבְרִית**|not started|not started|not started|not started| |**Italiano**|not started|not started|not started|not started| |**日本語**|not started|not started|not started|not started| From daedd779cb7d5a034d8eaf8e2fd854ebfe74b60e Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?=E4=B8=AD=E4=BA=95=E5=96=9C=E4=B9=8B?= Date: Thu, 27 Jun 2019 09:41:22 +0900 Subject: [PATCH 254/531] Reviewed and edited 21. to 26. --- ja/deep-learning-tips-and-tricks.md | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/ja/deep-learning-tips-and-tricks.md b/ja/deep-learning-tips-and-tricks.md index 7733d3cbd..e465698e6 100644 --- a/ja/deep-learning-tips-and-tricks.md +++ b/ja/deep-learning-tips-and-tricks.md @@ -144,28 +144,28 @@ **21. Epoch ― In the context of training a model, epoch is a term used to refer to one iteration where the model sees the whole training set to update its weights.** -⟶エポック - モデル学習においては、エポックとはモデルが全データで学習した一つのイテレーションのことを指します。 +⟶エポック - モデル学習においてエポックとは学習の繰り返しの中の1回を指す用語で、1エポックの間にモデルは全学習データからその重みを更新します。
**22. Mini-batch gradient descent ― During the training phase, updating weights is usually not based on the whole training set at once due to computation complexities or one data point due to noise issues. Instead, the update step is done on mini-batches, where the number of data points in a batch is a hyperparameter that we can tune.** -⟶ミニバッチの勾配降下法 - 学習時には、計算量が多いため、基本的には全データに基づいて重みを更新しません。また、ノイズの影響のため、1個のデータでも更新しません。それよりむしろ、ミニバッチで重みを更新し、ミニバッチの大きさはチューニングできるハイパーパラメータの一つです。 +⟶ミニバッチ勾配降下法 - 学習段階では、計算が複雑になりすぎるため通常は全データを一度に使って重みを更新することはありません。またノイズが問題になるため1つのデータポイントだけを使って重みを更新することもありません。代わりに、更新はミニバッチごとに行われます。各バッチに含まれるデータポイントの数は調整可能なハイパーパラメータです。
**23. Loss function ― In order to quantify how a given model performs, the loss function L is usually used to evaluate to what extent the actual outputs y are correctly predicted by the model outputs z.** -⟶損失関数 - モデルの精度・良さを数値化するために、基本的には損失関数Lでモデルの出力zがどれくらい正解zを推測するか評価します。 +⟶損失関数 - 得られたモデルの性能を数値化するために、モデルの出力zが実際の出力yをどの程度正確に予測できているかを評価する損失関数Lが通常使われます。
**24. Cross-entropy loss ― In the context of binary classification in neural networks, the cross-entropy loss L(z,y) is commonly used and is defined as follows:** -⟶交差エントロピー誤差 - ニューラルネットワークにおける二項分類では、交差エントロピー誤差L(z,y)は多用されており、以下のように定義されています。 +⟶交差エントロピー誤差 - ニューラルネットワークにおける二項分類では、交差エントロピー誤差L(z,y)が一般的に使用されており、以下のように定義されています。
@@ -179,7 +179,7 @@ **26. Backpropagation ― Backpropagation is a method to update the weights in the neural network by taking into account the actual output and the desired output. The derivative with respect to each weight w is computed using the chain rule.** -⟶誤差逆伝播法 - 実際の出力と期待の出力の差に基づいてニューラルネットワークの重みを更新する手法です。チェーンルールを用いて各重みで微分をとります。 +⟶誤差逆伝播法 - 実際の出力と期待される出力の差に基づいてニューラルネットワークの重みを更新する手法です。各重みwに関する微分は連鎖律を用いて計算されます。
From 1efc4d7676652e20705f9097586ea862adcb7a95 Mon Sep 17 00:00:00 2001 From: Shervine Amidi Date: Sat, 29 Jun 2019 15:49:45 -0700 Subject: [PATCH 255/531] Add AI --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index 15127f416..74756c284 100644 --- a/README.md +++ b/README.md @@ -1,6 +1,6 @@ # Translation of VIP Cheatsheets ## Goal -This repository aims at collaboratively translating our [Machine Learning](https://github.com/afshinea/stanford-cs-229-machine-learning) and [Deep Learning](https://github.com/afshinea/stanford-cs-230-deep-learning) cheatsheets into a ton of languages, so that this content can be enjoyed by anyone from any part of the world! +This repository aims at collaboratively translating our [Machine Learning](https://github.com/afshinea/stanford-cs-229-machine-learning), [Deep Learning](https://github.com/afshinea/stanford-cs-230-deep-learning) and [Artificial Intelligence](https://github.com/afshinea/stanford-cs-221-artificial-intelligence) cheatsheets into a ton of languages, so that this content can be enjoyed by anyone from any part of the world! ## Contribution guidelines The translation process of each cheatsheet contains two steps: From 4fb83102b686a565ab04c46e0c56c71c7de73077 Mon Sep 17 00:00:00 2001 From: afshinea Date: Sun, 30 Jun 2019 11:10:45 -0700 Subject: [PATCH 256/531] Change notations --- template/{reflex-models.md => cs-221-reflex-models.md} | 0 template/{cheatsheet-deep-learning.md => cs-229-deep-learning.md} | 0 .../{refresher-linear-algebra.md => cs-229-linear-algebra.md} | 0 ...s-and-tricks.md => cs-229-machine-learning-tips-and-tricks.md} | 0 template/{refresher-probability.md => cs-229-probability.md} | 0 ...sheet-supervised-learning.md => cs-229-supervised-learning.md} | 0 ...t-unsupervised-learning.md => cs-229-unsupervised-learning.md} | 0 ...neural-networks.md => cs-230-convolutional-neural-networks.md} | 0 ...tips-and-tricks.md => cs-230-deep-learning-tips-and-tricks.md} | 0 ...ent-neural-networks.md => cs-230-recurrent-neural-networks.md} | 0 10 files changed, 0 insertions(+), 0 deletions(-) rename template/{reflex-models.md => cs-221-reflex-models.md} (100%) rename template/{cheatsheet-deep-learning.md => cs-229-deep-learning.md} (100%) rename template/{refresher-linear-algebra.md => cs-229-linear-algebra.md} (100%) rename template/{cheatsheet-machine-learning-tips-and-tricks.md => cs-229-machine-learning-tips-and-tricks.md} (100%) rename template/{refresher-probability.md => cs-229-probability.md} (100%) rename template/{cheatsheet-supervised-learning.md => cs-229-supervised-learning.md} (100%) rename template/{cheatsheet-unsupervised-learning.md => cs-229-unsupervised-learning.md} (100%) rename template/{convolutional-neural-networks.md => cs-230-convolutional-neural-networks.md} (100%) rename template/{deep-learning-tips-and-tricks.md => cs-230-deep-learning-tips-and-tricks.md} (100%) rename template/{recurrent-neural-networks.md => cs-230-recurrent-neural-networks.md} (100%) diff --git a/template/reflex-models.md b/template/cs-221-reflex-models.md similarity index 100% rename from template/reflex-models.md rename to template/cs-221-reflex-models.md diff --git a/template/cheatsheet-deep-learning.md b/template/cs-229-deep-learning.md similarity index 100% rename from template/cheatsheet-deep-learning.md rename to template/cs-229-deep-learning.md diff --git a/template/refresher-linear-algebra.md b/template/cs-229-linear-algebra.md similarity index 100% rename from 
template/refresher-linear-algebra.md rename to template/cs-229-linear-algebra.md diff --git a/template/cheatsheet-machine-learning-tips-and-tricks.md b/template/cs-229-machine-learning-tips-and-tricks.md similarity index 100% rename from template/cheatsheet-machine-learning-tips-and-tricks.md rename to template/cs-229-machine-learning-tips-and-tricks.md diff --git a/template/refresher-probability.md b/template/cs-229-probability.md similarity index 100% rename from template/refresher-probability.md rename to template/cs-229-probability.md diff --git a/template/cheatsheet-supervised-learning.md b/template/cs-229-supervised-learning.md similarity index 100% rename from template/cheatsheet-supervised-learning.md rename to template/cs-229-supervised-learning.md diff --git a/template/cheatsheet-unsupervised-learning.md b/template/cs-229-unsupervised-learning.md similarity index 100% rename from template/cheatsheet-unsupervised-learning.md rename to template/cs-229-unsupervised-learning.md diff --git a/template/convolutional-neural-networks.md b/template/cs-230-convolutional-neural-networks.md similarity index 100% rename from template/convolutional-neural-networks.md rename to template/cs-230-convolutional-neural-networks.md diff --git a/template/deep-learning-tips-and-tricks.md b/template/cs-230-deep-learning-tips-and-tricks.md similarity index 100% rename from template/deep-learning-tips-and-tricks.md rename to template/cs-230-deep-learning-tips-and-tricks.md diff --git a/template/recurrent-neural-networks.md b/template/cs-230-recurrent-neural-networks.md similarity index 100% rename from template/recurrent-neural-networks.md rename to template/cs-230-recurrent-neural-networks.md From fe3a3d90df921712e49f099f5cd3ca8875f7ca72 Mon Sep 17 00:00:00 2001 From: afshinea Date: Sun, 30 Jun 2019 11:14:20 -0700 Subject: [PATCH 257/531] Clean up folder --- ...tsheet-machine-learning-tips-and-tricks.md | 285 --------- ar/cheatsheet-supervised-learning.md | 567 ------------------ ar/cheatsheet-unsupervised-learning.md | 340 ----------- ar/refresher-probability.md | 381 ------------ 4 files changed, 1573 deletions(-) delete mode 100644 ar/cheatsheet-machine-learning-tips-and-tricks.md delete mode 100644 ar/cheatsheet-supervised-learning.md delete mode 100644 ar/cheatsheet-unsupervised-learning.md delete mode 100644 ar/refresher-probability.md diff --git a/ar/cheatsheet-machine-learning-tips-and-tricks.md b/ar/cheatsheet-machine-learning-tips-and-tricks.md deleted file mode 100644 index 9712297b8..000000000 --- a/ar/cheatsheet-machine-learning-tips-and-tricks.md +++ /dev/null @@ -1,285 +0,0 @@ -**1. Machine Learning tips and tricks cheatsheet** - -⟶ - -
- -**2. Classification metrics** - -⟶ - -
- -**3. In a context of a binary classification, here are the main metrics that are important to track in order to assess the performance of the model.** - -⟶ - -
- -**4. Confusion matrix ― The confusion matrix is used to have a more complete picture when assessing the performance of a model. It is defined as follows:** - -⟶ - -
- -**5. [Predicted class, Actual class]** - -⟶ - -
- -**6. Main metrics ― The following metrics are commonly used to assess the performance of classification models:** - -⟶ - -
- -**7. [Metric, Formula, Interpretation]** - -⟶ - -
- -**8. Overall performance of model** - -⟶ - -
- -**9. How accurate the positive predictions are** - -⟶ - -
- -**10. Coverage of actual positive sample** - -⟶ - -
- -**11. Coverage of actual negative sample** - -⟶ - -
- -**12. Hybrid metric useful for unbalanced classes** - -⟶ - -
- -**13. ROC ― The receiver operating curve, also noted ROC, is the plot of TPR versus FPR by varying the threshold. These metrics are are summed up in the table below:** - -⟶ - -
- -**14. [Metric, Formula, Equivalent]** - -⟶ - -
- -**15. AUC ― The area under the receiving operating curve, also noted AUC or AUROC, is the area below the ROC as shown in the following figure:** - -⟶ - -
- -**16. [Actual, Predicted]** - -⟶ - -
- -**17. Basic metrics ― Given a regression model f, the following metrics are commonly used to assess the performance of the model:** - -⟶ - -
- -**18. [Total sum of squares, Explained sum of squares, Residual sum of squares]** - -⟶ - -
- -**19. Coefficient of determination ― The coefficient of determination, often noted R2 or r2, provides a measure of how well the observed outcomes are replicated by the model and is defined as follows:** - -⟶ - -
- -**20. Main metrics ― The following metrics are commonly used to assess the performance of regression models, by taking into account the number of variables n that they take into consideration:** - -⟶ - -
- -**21. where L is the likelihood and ˆσ2 is an estimate of the variance associated with each response.** - -⟶ - -
- -**22. Model selection** - -⟶ - -
- -**23. Vocabulary ― When selecting a model, we distinguish 3 different parts of the data that we have as follows:** - -⟶ - -
- -**24. [Training set, Validation set, Testing set]** - -⟶ - -
- -**25. [Model is trained, Model is assessed, Model gives predictions]** - -⟶ - -
- -**26. [Usually 80% of the dataset, Usually 20% of the dataset]** - -⟶ - -
- -**27. [Also called hold-out or development set, Unseen data]** - -⟶ - -
- -**28. Once the model has been chosen, it is trained on the entire dataset and tested on the unseen test set. These are represented in the figure below:** - -⟶ - -
- -**29. Cross-validation ― Cross-validation, also noted CV, is a method that is used to select a model that does not rely too much on the initial training set. The different types are summed up in the table below:** - -⟶ - -
- -**30. [Training on k−1 folds and assessment on the remaining one, Training on n−p observations and assessment on the p remaining ones]** - -⟶ - -
- -**31. [Generally k=5 or 10, Case p=1 is called leave-one-out]** - -⟶ - -
- -**32. The most commonly used method is called k-fold cross-validation and splits the training data into k folds to validate the model on one fold while training the model on the k−1 other folds, all of this k times. The error is then averaged over the k folds and is named cross-validation error.** - -⟶ - -
- -**33. Regularization ― The regularization procedure aims at avoiding the model to overfit the data and thus deals with high variance issues. The following table sums up the different types of commonly used regularization techniques:** - -⟶ - -
- -**34. [Shrinks coefficients to 0, Good for variable selection, Makes coefficients smaller, Tradeoff between variable selection and small coefficients]** - -⟶ - -
- -**35. Diagnostics** - -⟶ - -
- -**36. Bias ― The bias of a model is the difference between the expected prediction and the correct model that we try to predict for given data points.** - -⟶ - -
- -**37. Variance ― The variance of a model is the variability of the model prediction for given data points.** - -⟶ - -
- -**38. Bias/variance tradeoff ― The simpler the model, the higher the bias, and the more complex the model, the higher the variance.** - -⟶ - -
- -**39. [Symptoms, Regression illustration, classification illustration, deep learning illustration, possible remedies]** - -⟶ - -
- -**40. [High training error, Training error close to test error, High bias, Training error slightly lower than test error, Very low training error, Training error much lower than test error, High variance]** - -⟶ - -
- -**41. [Complexify model, Add more features, Train longer, Perform regularization, Get more data]** - -⟶ - -
- -**42. Error analysis ― Error analysis is analyzing the root cause of the difference in performance between the current and the perfect models.** - -⟶ - -
- -**43. Ablative analysis ― Ablative analysis is analyzing the root cause of the difference in performance between the current and the baseline models.** - -⟶ - -
- -**44. Regression metrics** - -⟶ - -
- -**45. [Classification metrics, confusion matrix, accuracy, precision, recall, F1 score, ROC]** - -⟶ - -
- -**46. [Regression metrics, R squared, Mallow's CP, AIC, BIC]** - -⟶ - -
- -**47. [Model selection, cross-validation, regularization]** - -⟶ - -
- -**48. [Diagnostics, Bias/variance tradeoff, error/ablative analysis]** - -⟶ diff --git a/ar/cheatsheet-supervised-learning.md b/ar/cheatsheet-supervised-learning.md deleted file mode 100644 index a6b19ea1c..000000000 --- a/ar/cheatsheet-supervised-learning.md +++ /dev/null @@ -1,567 +0,0 @@ -**1. Supervised Learning cheatsheet** - -⟶ - -
- -**2. Introduction to Supervised Learning** - -⟶ - -
- -**3. Given a set of data points {x(1),...,x(m)} associated to a set of outcomes {y(1),...,y(m)}, we want to build a classifier that learns how to predict y from x.** - -⟶ - -
- -**4. Type of prediction ― The different types of predictive models are summed up in the table below:** - -⟶ - -
- -**5. [Regression, Classifier, Outcome, Examples]** - -⟶ - -
- -**6. [Continuous, Class, Linear regression, Logistic regression, SVM, Naive Bayes]** - -⟶ - -
- -**7. Type of model ― The different models are summed up in the table below:** - -⟶ - -
- -**8. [Discriminative model, Generative model, Goal, What's learned, Illustration, Examples]** - -⟶ - -
- -**9. [Directly estimate P(y|x), Estimate P(x|y) to then deduce P(y|x), Decision boundary, Probability distributions of the data, Regressions, SVMs, GDA, Naive Bayes]** - -⟶ - -
- -**10. Notations and general concepts** - -⟶ - -
- -**11. Hypothesis ― The hypothesis is noted hθ and is the model that we choose. For a given input data x(i) the model prediction output is hθ(x(i)).** - -⟶ - -
- -**12. Loss function ― A loss function is a function L:(z,y)∈R×Y⟼L(z,y)∈R that takes as inputs the predicted value z corresponding to the real data value y and outputs how different they are. The common loss functions are summed up in the table below:** - -⟶ - -
- -**13. [Least squared error, Logistic loss, Hinge loss, Cross-entropy]** - -⟶ - -
- -**14. [Linear regression, Logistic regression, SVM, Neural Network]** - -⟶ - -
- -**15. Cost function ― The cost function J is commonly used to assess the performance of a model, and is defined with the loss function L as follows:** - -⟶ - -
- -**16. Gradient descent ― By noting α∈R the learning rate, the update rule for gradient descent is expressed with the learning rate and the cost function J as follows:** - -⟶ - -
- -**17. Remark: Stochastic gradient descent (SGD) updates the parameter based on each training example, while batch gradient descent updates it based on a batch of training examples.** - -⟶ - -
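As an illustration, one possible batch gradient-descent loop for the least-squares cost; the learning rate, iteration count and cost function are assumptions of this sketch.

```python
import numpy as np

def gradient_descent(X, y, alpha=0.1, n_iters=1000):
    """Batch gradient descent on the least-squares cost J(theta) = (1/2m)||X theta - y||^2."""
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(n_iters):
        grad = X.T @ (X @ theta - y) / m   # gradient of J at theta
        theta = theta - alpha * grad       # update rule: theta <- theta - alpha * grad
    return theta
```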
- -**18. Likelihood ― The likelihood of a model L(θ) given parameters θ is used to find the optimal parameters θ through maximizing the likelihood. In practice, we use the log-likelihood ℓ(θ)=log(L(θ)) which is easier to optimize. We have:** - -⟶ - -
- -**19. Newton's algorithm ― Newton's algorithm is a numerical method that finds θ such that ℓ′(θ)=0. Its update rule is as follows:** - -⟶ - -
- -**20. Remark: the multidimensional generalization, also known as the Newton-Raphson method, has the following update rule:** - -⟶ - -
- -**21. Linear models** - -⟶ - -
- -**22. Linear regression** - -⟶ - -
- -**23. We assume here that y|x;θ∼N(μ,σ2)** - -⟶ - -
- -**24. Normal equations ― By noting X the design matrix, the value of θ that minimizes the cost function is a closed-form solution such that:** - -⟶ - -
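A quick numerical check of the closed-form solution; the synthetic design matrix, true parameters and noise level are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.c_[np.ones(50), rng.normal(size=(50, 2))]   # design matrix with an intercept column
theta_true = np.array([1.0, 2.0, -3.0])
y = X @ theta_true + 0.1 * rng.normal(size=50)     # noisy synthetic responses

theta_hat = np.linalg.solve(X.T @ X, X.T @ y)      # theta = (X^T X)^{-1} X^T y
print(theta_hat)                                   # close to theta_true
```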
- -**25. LMS algorithm ― By noting α the learning rate, the update rule of the Least Mean Squares (LMS) algorithm for a training set of m data points, which is also known as the Widrow-Hoff learning rule, is as follows:** - -⟶ - -
- -**26. Remark: the update rule is a particular case of the gradient ascent.** - -⟶ - -
- -**27. LWR ― Locally Weighted Regression, also known as LWR, is a variant of linear regression that weights each training example in its cost function by w(i)(x), which is defined with parameter τ∈R as:** - -⟶ - -
- -**28. Classification and logistic regression** - -⟶ - -
- -**29. Sigmoid function ― The sigmoid function g, also known as the logistic function, is defined as follows:** - -⟶ - -
- -**30. Logistic regression ― We assume here that y|x;θ∼Bernoulli(ϕ). We have the following form:** - -⟶ - -
- -**31. Remark: there is no closed form solution for the case of logistic regressions.** - -⟶ - -
- -**32. Softmax regression ― A softmax regression, also called a multiclass logistic regression, is used to generalize logistic regression when there are more than 2 outcome classes. By convention, we set θK=0, which makes the Bernoulli parameter ϕi of each class i equal to:** - -⟶ - -
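One possible numerically stable implementation of the softmax mapping used above; the stabilising shift by the row maximum is an implementation choice, not part of the definition. With two classes and θK=0 this reduces to the sigmoid function.

```python
import numpy as np

def softmax(z):
    """Map scores z to class probabilities; shifting by the max avoids overflow."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

print(softmax(np.array([1.0, 2.0, 3.0])))   # probabilities summing to one
```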
- -**33. Generalized Linear Models** - -⟶ - -
- -**34. Exponential family ― A class of distributions is said to be in the exponential family if it can be written in terms of a natural parameter, also called the canonical parameter or link function, η, a sufficient statistic T(y) and a log-partition function a(η) as follows:** - -⟶ - -
- -**35. Remark: we will often have T(y)=y. Also, exp(−a(η)) can be seen as a normalization parameter that will make sure that the probabilities sum to one.** - -⟶ - -
- -**36. Here are the most common exponential distributions summed up in the following table:** - -⟶ - -
- -**37. [Distribution, Bernoulli, Gaussian, Poisson, Geometric]** - -⟶ - -
- -**38. Assumptions of GLMs ― Generalized Linear Models (GLM) aim at predicting a random variable y as a function of x∈Rn+1 and rely on the following 3 assumptions:** - -⟶ - -
- -**39. Remark: ordinary least squares and logistic regression are special cases of generalized linear models.** - -⟶ - -
- -**40. Support Vector Machines** - -⟶ - -
- -**41. The goal of support vector machines is to find the line that maximizes the minimum distance to the line.** - -⟶ - -
- -**42. Optimal margin classifier ― The optimal margin classifier h is such that:** - -⟶ - -
- -**43. where (w,b)∈Rn×R is the solution of the following optimization problem:** - -⟶ - -
- -**44. such that** - -⟶ - -
- -**45. support vectors** - -⟶ - -
- -**46. Remark: the line is defined as wTx−b=0.** - -⟶ - -
- -**47. Hinge loss ― The hinge loss is used in the setting of SVMs and is defined as follows:** - -⟶ - -
- -**48. Kernel ― Given a feature mapping ϕ, we define the kernel K to be defined as:** - -⟶ - -
- -**49. In practice, the kernel K defined by K(x,z)=exp(−||x−z||2/(2σ2)) is called the Gaussian kernel and is commonly used.** - -⟶ - -
- -**50. [Non-linear separability, Use of a kernel mapping, Decision boundary in the original space]** - -⟶ - -
- -**51. Remark: we say that we use the "kernel trick" to compute the cost function using the kernel because we actually don't need to know the explicit mapping ϕ, which is often very complicated. Instead, only the values K(x,z) are needed.** - -⟶ - -
- -**52. Lagrangian ― We define the Lagrangian L(w,b) as follows:** - -⟶ - -
- -**53. Remark: the coefficients βi are called the Lagrange multipliers.** - -⟶ - -
- -**54. Generative Learning** - -⟶ - -
- -**55. A generative model first tries to learn how the data is generated by estimating P(x|y), which we can then use to estimate P(y|x) by using Bayes' rule.** - -⟶ - -
- -**56. Gaussian Discriminant Analysis** - -⟶ - -
- -**57. Setting ― The Gaussian Discriminant Analysis assumes that y and x|y=0 and x|y=1 are such that:** - -⟶ - -
- -**58. Estimation ― The following table sums up the estimates that we find when maximizing the likelihood:** - -⟶ - -
- -**59. Naive Bayes** - -⟶ - -
- -**60. Assumption ― The Naive Bayes model supposes that the features of each data point are all independent:** - -⟶ - -
- -**61. Solutions ― Maximizing the log-likelihood gives the following solutions, with k∈{0,1},l∈[[1,L]]** - -⟶ - -
- -**62. Remark: Naive Bayes is widely used for text classification and spam detection.** - -⟶ - -
- -**63. Tree-based and ensemble methods** - -⟶ - -
- -**64. These methods can be used for both regression and classification problems.** - -⟶ - -
- -**65. CART ― Classification and Regression Trees (CART), commonly known as decision trees, can be represented as binary trees. They have the advantage of being very interpretable.** - -⟶ - -
- -**66. Random forest ― It is a tree-based technique that uses a high number of decision trees built out of randomly selected sets of features. Contrary to the simple decision tree, it is highly uninterpretable but its generally good performance makes it a popular algorithm.** - -⟶ - -
- -**67. Remark: random forests are a type of ensemble methods.** - -⟶ - -
- -**68. Boosting ― The idea of boosting methods is to combine several weak learners to form a stronger one. The main ones are summed up in the table below:** - -⟶ - -
- -**69. [Adaptive boosting, Gradient boosting]** - -⟶ - -
- -**70. High weights are put on errors to improve at the next boosting step** - -⟶ - -
- -**71. Weak learners trained on remaining errors** - -⟶ - -
- -**72. Other non-parametric approaches** - -⟶ - -
- -**73. k-nearest neighbors ― The k-nearest neighbors algorithm, commonly known as k-NN, is a non-parametric approach where the response of a data point is determined by the nature of its k neighbors from the training set. It can be used in both classification and regression settings.** - -⟶ - -
- -**74. Remark: The higher the parameter k, the higher the bias, and the lower the parameter k, the higher the variance.** - -⟶ - -
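A compact sketch of k-NN classification under Euclidean distance; the distance choice, the value of k and the majority-vote rule are assumptions of this sketch.

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x, k=5):
    """Label x by majority vote among its k nearest training points (Euclidean)."""
    dists = np.linalg.norm(X_train - x, axis=1)     # distances to every training point
    nearest = np.argsort(dists)[:k]                 # indices of the k closest
    return Counter(y_train[nearest].tolist()).most_common(1)[0][0]
```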
- -**75. Learning Theory** - -⟶ - -
- -**76. Union bound ― Let A1,...,Ak be k events. We have:** - -⟶ - -
- -**77. Hoeffding inequality ― Let Z1,..,Zm be m iid variables drawn from a Bernoulli distribution of parameter ϕ. Let ˆϕ be their sample mean and γ>0 fixed. We have:** - -⟶ - -
- -**78. Remark: this inequality is also known as the Chernoff bound.** - -⟶ - -
- -**79. Training error ― For a given classifier h, we define the training error ˆϵ(h), also known as the empirical risk or empirical error, to be as follows:** - -⟶ - -
- -**80. Probably Approximately Correct (PAC) ― PAC is a framework under which numerous results on learning theory were proved, and has the following set of assumptions:** - -⟶ - -
- -**81. the training and testing sets follow the same distribution** - -⟶ - -
- -**82. the training examples are drawn independently** - -⟶ - -
- -**83. Shattering ― Given a set S={x(1),...,x(d)}, and a set of classifiers H, we say that H shatters S if for any set of labels {y(1),...,y(d)}, we have:** - -⟶ - -
- -**84. Upper bound theorem ― Let H be a finite hypothesis class such that |H|=k and let δ and the sample size m be fixed. Then, with probability of at least 1−δ, we have:** - -⟶ - -
- -**85. VC dimension ― The Vapnik-Chervonenkis (VC) dimension of a given infinite hypothesis class H, noted VC(H) is the size of the largest set that is shattered by H.** - -⟶ - -
- -**86. Remark: the VC dimension of H={set of linear classifiers in 2 dimensions} is 3.** - -⟶ - -
- -**87. Theorem (Vapnik) ― Let H be given, with VC(H)=d and m the number of training examples. With probability at least 1−δ, we have:** - -⟶ - -
- -**88. [Introduction, Type of prediction, Type of model]** - -⟶ - -
- -**89. [Notations and general concepts, loss function, gradient descent, likelihood]** - -⟶ - -
- -**90. [Linear models, linear regression, logistic regression, generalized linear models]** - -⟶ - -
- -**91. [Support vector machines, Optimal margin classifier, Hinge loss, Kernel]** - -⟶ - -
- -**92. [Generative learning, Gaussian Discriminant Analysis, Naive Bayes]** - -⟶ - -
- -**93. [Trees and ensemble methods, CART, Random forest, Boosting]** - -⟶ - -
- -**94. [Other methods, k-NN]** - -⟶ - -
- -**95. [Learning theory, Hoeffding inequality, PAC, VC dimension]** - -⟶ diff --git a/ar/cheatsheet-unsupervised-learning.md b/ar/cheatsheet-unsupervised-learning.md deleted file mode 100644 index 1d80c47b5..000000000 --- a/ar/cheatsheet-unsupervised-learning.md +++ /dev/null @@ -1,340 +0,0 @@ -**1. Unsupervised Learning cheatsheet** - -⟶ - -
- -**2. Introduction to Unsupervised Learning** - -⟶ - -
- -**3. Motivation ― The goal of unsupervised learning is to find hidden patterns in unlabeled data {x(1),...,x(m)}.** - -⟶ - -
- -**4. Jensen's inequality ― Let f be a convex function and X a random variable. We have the following inequality:** - -⟶ - -
- -**5. Clustering** - -⟶ - -
- -**6. Expectation-Maximization** - -⟶ - -
- -**7. Latent variables ― Latent variables are hidden/unobserved variables that make estimation problems difficult, and are often denoted z. Here are the most common settings where there are latent variables:** - -⟶ - -
- -**8. [Setting, Latent variable z, Comments]** - -⟶ - -
- -**9. [Mixture of k Gaussians, Factor analysis]** - -⟶ - -
- -**10. Algorithm ― The Expectation-Maximization (EM) algorithm gives an efficient method for estimating the parameter θ through maximum likelihood estimation by repeatedly constructing a lower-bound on the likelihood (E-step) and optimizing that lower bound (M-step) as follows:** - -⟶ - -
- -**11. E-step: Evaluate the posterior probability Qi(z(i)) that each data point x(i) came from a particular cluster z(i) as follows:** - -⟶ - -
- -**12. M-step: Use the posterior probabilities Qi(z(i)) as cluster specific weights on data points x(i) to separately re-estimate each cluster model as follows:** - -⟶ - -
- -**13. [Gaussians initialization, Expectation step, Maximization step, Convergence]** - -⟶ - -
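A possible NumPy sketch of the two EM steps for a two-component one-dimensional Gaussian mixture; the crude initialization scheme and the fixed iteration count are illustrative choices.

```python
import numpy as np

def em_gmm_1d(x, n_iters=100):
    """EM for a two-component 1-D Gaussian mixture (illustrative sketch)."""
    mu = np.array([x.min(), x.max()])
    sigma2 = np.array([x.var(), x.var()])
    phi = np.array([0.5, 0.5])
    for _ in range(n_iters):
        # E-step: posterior Q_i(z) that each point came from each component
        dens = np.exp(-(x[:, None] - mu) ** 2 / (2 * sigma2)) / np.sqrt(2 * np.pi * sigma2)
        q = phi * dens
        q /= q.sum(axis=1, keepdims=True)
        # M-step: re-estimate each component with the posteriors as weights
        nk = q.sum(axis=0)
        phi = nk / len(x)
        mu = (q * x[:, None]).sum(axis=0) / nk
        sigma2 = (q * (x[:, None] - mu) ** 2).sum(axis=0) / nk
    return phi, mu, sigma2
```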
- -**14. k-means clustering** - -⟶ - -
- -**15. We note c(i) the cluster of data point i and μj the center of cluster j.** - -⟶ - -
- -**16. Algorithm ― After randomly initializing the cluster centroids μ1,μ2,...,μk∈Rn, the k-means algorithm repeats the following step until convergence:** - -⟶ - -
- -**17. [Means initialization, Cluster assignment, Means update, Convergence]** - -⟶ - -
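A minimal sketch of the assignment and update steps of k-means described above; the random centroid initialization and the handling of empty clusters are assumptions of this sketch.

```python
import numpy as np

def k_means(X, k, n_iters=100, seed=0):
    """Alternate cluster assignment and centroid update for a fixed number of iterations."""
    rng = np.random.default_rng(seed)
    mu = X[rng.choice(len(X), size=k, replace=False)]                 # random initialization
    for _ in range(n_iters):
        c = np.argmin(((X[:, None, :] - mu) ** 2).sum(-1), axis=1)    # assignment step
        mu = np.array([X[c == j].mean(axis=0) if np.any(c == j) else mu[j]
                       for j in range(k)])                            # centroid update step
    return c, mu
```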
- -**18. Distortion function ― In order to see if the algorithm converges, we look at the distortion function defined as follows:** - -⟶ - -
- -**19. Hierarchical clustering** - -⟶ - -
- -**20. Algorithm ― It is a clustering algorithm with an agglomerative hierarchical approach that builds nested clusters in a successive manner.** - -⟶ - -
- -**21. Types ― There are different sorts of hierarchical clustering algorithms that aim at optimizing different objective functions, which are summed up in the table below:** - -⟶ - -
- -**22. [Ward linkage, Average linkage, Complete linkage]** - -⟶ - -
- -**23. [Minimize within cluster distance, Minimize average distance between cluster pairs, Minimize maximum distance between cluster pairs]** - -⟶ - -
- -**24. Clustering assessment metrics** - -⟶ - -
- -**25. In an unsupervised learning setting, it is often hard to assess the performance of a model since we don't have the ground truth labels as was the case in the supervised learning setting.** - -⟶ - -
- -**26. Silhouette coefficient ― By noting a and b the mean distance between a sample and all other points in the same class, and between a sample and all other points in the next nearest cluster, the silhouette coefficient s for a single sample is defined as follows:** - -⟶ - -
- -**27. Calinski-Harabaz index ― By noting k the number of clusters, Bk and Wk the between and within-clustering dispersion matrices respectively defined as** - -⟶ - -
- -**28. the Calinski-Harabaz index s(k) indicates how well a clustering model defines its clusters, such that the higher the score, the more dense and well separated the clusters are. It is defined as follows:** - -⟶ - -
- -**29. Dimension reduction** - -⟶ - -
- -**30. Principal component analysis** - -⟶ - -
- -**31. It is a dimension reduction technique that finds the variance maximizing directions onto which to project the data.** - -⟶ - -
- -**32. Eigenvalue, eigenvector ― Given a matrix A∈Rn×n, λ is said to be an eigenvalue of A if there exists a vector z∈Rn∖{0}, called eigenvector, such that we have:** - -⟶ - -
- -**33. Spectral theorem ― Let A∈Rn×n. If A is symmetric, then A is diagonalizable by a real orthogonal matrix U∈Rn×n. By noting Λ=diag(λ1,...,λn), we have:** - -⟶ - -
- -**34. diagonal** - -⟶ - -
- -**35. Remark: the eigenvector associated with the largest eigenvalue is called principal eigenvector of matrix A.** - -⟶ - -
- -**36. Algorithm ― The Principal Component Analysis (PCA) procedure is a dimension reduction technique that projects the data on k dimensions by maximizing the variance of the data as follows:** - -⟶ - -
- -**37. Step 1: Normalize the data to have a mean of 0 and standard deviation of 1.** - -⟶ - -
- -**38. Step 2: Compute Σ=(1/m)m∑i=1x(i)x(i)T∈Rn×n, which is symmetric with real eigenvalues.** - -⟶ - -
- -**39. Step 3: Compute u1,...,uk∈Rn the k orthogonal principal eigenvectors of Σ, i.e. the orthogonal eigenvectors of the k largest eigenvalues.** - -⟶ - -
- -**40. Step 4: Project the data on spanR(u1,...,uk).** - -⟶ - -
- -**41. This procedure maximizes the variance among all k-dimensional spaces.** - -⟶ - -
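The four steps above, written as one possible NumPy sketch; the eigh-based eigendecomposition and the in-place normalization are implementation choices.

```python
import numpy as np

def pca(X, k):
    """Project the rows of X onto the k principal directions (steps 1-4 above)."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0)        # step 1: normalize
    Sigma = Z.T @ Z / len(Z)                        # step 2: empirical covariance matrix
    eigvals, eigvecs = np.linalg.eigh(Sigma)        # step 3: eigendecomposition
    U = eigvecs[:, np.argsort(eigvals)[::-1][:k]]   # top-k principal eigenvectors
    return Z @ U                                    # step 4: project the data
```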
- -**42. [Data in feature space, Find principal components, Data in principal components space]** - -⟶ - -
- -**43. Independent component analysis** - -⟶ - -
- -**44. It is a technique meant to find the underlying generating sources.** - -⟶ - -
- -**45. Assumptions ― We assume that our data x has been generated by the n-dimensional source vector s=(s1,...,sn), where si are independent random variables, via a mixing and non-singular matrix A as follows:** - -⟶ - -
- -**46. The goal is to find the unmixing matrix W=A−1.** - -⟶ - -
- -**47. Bell and Sejnowski ICA algorithm ― This algorithm finds the unmixing matrix W by following the steps below:** - -⟶ - -
- -**48. Write the probability of x=As=W−1s as:** - -⟶ - -
- -**49. Write the log likelihood given our training data {x(i),i∈[[1,m]]} and by noting g the sigmoid function as:** - -⟶ - -
- -**50. Therefore, the stochastic gradient ascent learning rule is such that for each training example x(i), we update W as follows:** - -⟶ - -
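One stochastic gradient ascent step of the Bell and Sejnowski rule could look as follows; the learning rate value is an illustrative assumption.

```python
import numpy as np

def ica_update(W, x, alpha=0.01):
    """One stochastic gradient ascent step on the log-likelihood for a single example x."""
    g = 1 / (1 + np.exp(-W @ x))                                # sigmoid of Wx
    return W + alpha * (np.outer(1 - 2 * g, x) + np.linalg.inv(W.T))
```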
- -**51. The Machine Learning cheatsheets are now available in Arabic.** - -⟶ - -
- -**52. Original authors** - -⟶ - -
- -**53. Translated by X, Y and Z** - -⟶ - -
- -**54. Reviewed by X, Y and Z** - -⟶ - -
- -**55. [Introduction, Motivation, Jensen's inequality]** - -⟶ - -
- -**56. [Clustering, Expectation-Maximization, k-means, Hierarchical clustering, Metrics]** - -⟶ - -
- -**57. [Dimension reduction, PCA, ICA]** - -⟶ diff --git a/ar/refresher-probability.md b/ar/refresher-probability.md deleted file mode 100644 index 5c9b34656..000000000 --- a/ar/refresher-probability.md +++ /dev/null @@ -1,381 +0,0 @@ -**1. Probabilities and Statistics refresher** - -⟶ - -
- -**2. Introduction to Probability and Combinatorics** - -⟶ - -
- -**3. Sample space ― The set of all possible outcomes of an experiment is known as the sample space of the experiment and is denoted by S.** - -⟶ - -
- -**4. Event ― Any subset E of the sample space is known as an event. That is, an event is a set consisting of possible outcomes of the experiment. If the outcome of the experiment is contained in E, then we say that E has occurred.** - -⟶ - -
- -**5. Axioms of probability ― For each event E, we denote P(E) as the probability of event E occurring.** - -⟶ - -
- -**6. Axiom 1 ― Every probability is between 0 and 1 included, i.e:** - -⟶ - -
- -**7. Axiom 2 ― The probability that at least one of the elementary events in the entire sample space will occur is 1, i.e:** - -⟶ - -
- -**8. Axiom 3 ― For any sequence of mutually exclusive events E1,...,En, we have:** - -⟶ - -
- -**9. Permutation ― A permutation is an arrangement of r objects from a pool of n objects, in a given order. The number of such arrangements is given by P(n,r), defined as:** - -⟶ - -
- -**10. Combination ― A combination is an arrangement of r objects from a pool of n objects, where the order does not matter. The number of such arrangements is given by C(n,r), defined as:** - -⟶ - -
- -**11. Remark: we note that for 0⩽r⩽n, we have P(n,r)⩾C(n,r)** - -⟶ - -
- -**12. Conditional Probability** - -⟶ - -
- -**13. Bayes' rule ― For events A and B such that P(B)>0, we have:** - -⟶ - -
- -**14. Remark: we have P(A∩B)=P(A)P(B|A)=P(A|B)P(B)** - -⟶ - -
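A small numerical illustration of Bayes' rule combined with the law of total probability; all probabilities below are made-up values.

```python
# Made-up numbers: P(A) is a prior, B is an observed event.
p_a = 0.01
p_b_given_a = 0.95
p_b_given_not_a = 0.05

p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)   # law of total probability
p_a_given_b = p_b_given_a * p_a / p_b                    # Bayes' rule
print(round(p_a_given_b, 3))                             # about 0.161
```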
- -**15. Partition ― Let {Ai,i∈[[1,n]]} be such that for all i, Ai≠∅. We say that {Ai} is a partition if we have:** - -⟶ - -
- -**16. Remark: for any event B in the sample space, we have P(B)=n∑i=1P(B|Ai)P(Ai).** - -⟶ - -
- -**17. Extended form of Bayes' rule ― Let {Ai,i∈[[1,n]]} be a partition of the sample space. We have:** - -⟶ - -
- -**18. Independence ― Two events A and B are independent if and only if we have:** - -⟶ - -
- -**19. Random Variables** - -⟶ - -
- -**20. Definitions** - -⟶ - -
- -**21. Random variable ― A random variable, often noted X, is a function that maps every element in a sample space to a real line.** - -⟶ - -
- -**22. Cumulative distribution function (CDF) ― The cumulative distribution function F, which is monotonically non-decreasing and is such that limx→−∞F(x)=0 and limx→+∞F(x)=1, is defined as:** - -⟶ - -
- -**23. Remark: we have P(a < X ⩽ b) = F(b) − F(a)** - -⟶ - -
- -**24. Probability density function (PDF) ― The probability density function f is the probability that X takes on values between two adjacent realizations of the random variable.** - -⟶ - -
- -**25. Relationships involving the PDF and CDF ― Here are the important properties to know in the discrete (D) and the continuous (C) cases.** - -⟶ - -
- -**26. [Case, CDF F, PDF f, Properties of PDF]** - -⟶ - -
- -**27. Expectation and Moments of the Distribution ― Here are the expressions of the expected value E[X], generalized expected value E[g(X)], kth moment E[Xk] and characteristic function ψ(ω) for the discrete and continuous cases:** - -⟶ - -
- -**28. Variance ― The variance of a random variable, often noted Var(X) or σ2, is a measure of the spread of its distribution function. It is determined as follows:** - -⟶ - -
- -**29. Standard deviation ― The standard deviation of a random variable, often noted σ, is a measure of the spread of its distribution function which is compatible with the units of the actual random variable. It is determined as follows:** - -⟶ - -
- -**30. Transformation of random variables ― Let the variables X and Y be linked by some function. By noting fX and fY the distribution function of X and Y respectively, we have:** - -⟶ - -
- -**31. Leibniz integral rule ― Let g be a function of x and potentially c, and a,b boundaries that may depend on c. We have:** - -⟶ - -
- -**32. Probability Distributions** - -⟶ - -
- -**33. Chebyshev's inequality ― Let X be a random variable with expected value μ. For k,σ>0, we have the following inequality:** - -⟶ - -
- -**34. Main distributions ― Here are the main distributions to have in mind:** - -⟶ - -
- -**35. [Type, Distribution]** - -⟶ - -
- -**36. Jointly Distributed Random Variables** - -⟶ - -
- -**37. Marginal density and cumulative distribution ― From the joint probability density function fXY, we have** - -⟶ - -
- -**38. [Case, Marginal density, Cumulative function]** - -⟶ - -
- -**39. Conditional density ― The conditional density of X with respect to Y, often noted fX|Y, is defined as follows:** - -⟶ - -
- -**40. Independence ― Two random variables X and Y are said to be independent if we have:** - -⟶ - -
- -**41. Covariance ― We define the covariance of two random variables X and Y, that we note σ2XY or more commonly Cov(X,Y), as follows:** - -⟶ - -
- -**42. Correlation ― By noting σX,σY the standard deviations of X and Y, we define the correlation between the random variables X and Y, noted ρXY, as follows:** - -⟶ - -
- -**43. Remark 1: we note that for any random variables X,Y, we have ρXY∈[−1,1].** - -⟶ - -
- -**44. Remark 2: If X and Y are independent, then ρXY=0.** - -⟶ - -
- -**45. Parameter estimation** - -⟶ - -
- -**46. Definitions** - -⟶ - -
- -**47. Random sample ― A random sample is a collection of n random variables X1,...,Xn that are independent and identically distributed with X.** - -⟶ - -
- -**48. Estimator ― An estimator is a function of the data that is used to infer the value of an unknown parameter in a statistical model.** - -⟶ - -
- -**49. Bias ― The bias of an estimator ^θ is defined as being the difference between the expected value of the distribution of ^θ and the true value, i.e.:** - -⟶ - -
- -**50. Remark: an estimator is said to be unbiased when we have E[^θ]=θ.** - -⟶ - -
- -**51. Estimating the mean** - -⟶ - -
- -**52. Sample mean ― The sample mean of a random sample is used to estimate the true mean μ of a distribution, is often noted ¯X and is defined as follows:** - -⟶ - -
- -**53. Remark: the sample mean is unbiased, i.e. E[¯X]=μ.** - -⟶ - -
- -**54. Central Limit Theorem ― Let us have a random sample X1,...,Xn following a given distribution with mean μ and variance σ2, then we have:** - -⟶ - -
- -**55. Estimating the variance** - -⟶ - -
- -**56. Sample variance ― The sample variance of a random sample is used to estimate the true variance σ2 of a distribution, is often noted s2 or ^σ2 and is defined as follows:** - -⟶ - -
- -**57. Remark: the sample variance is unbiased, i.e. E[s2]=σ2.** - -⟶ - -
- -**58. Chi-Squared relation with sample variance ― Let s2 be the sample variance of a random sample. We have:** - -⟶ - -
- -**59. [Introduction, Sample space, Event, Permutation]** - -⟶ - -
- -**60. [Conditional probability, Bayes' rule, Independence]** - -⟶ - -
- -**61. [Random variables, Definitions, Expectation, Variance]** - -⟶ - -
- -**62. [Probability distributions, Chebyshev's inequality, Main distributions]** - -⟶ - -
- -**63. [Jointly distributed random variables, Density, Covariance, Correlation]** - -⟶ - -
- -**64. [Parameter estimation, Mean, Variance]** - -⟶ From 9b27ee37d02e44c79124e1fce149393125ca9ece Mon Sep 17 00:00:00 2001 From: afshinea Date: Sun, 30 Jun 2019 11:22:05 -0700 Subject: [PATCH 258/531] Clean up folder --- de/cheatsheet-deep-learning.md | 321 ---------- ...tsheet-machine-learning-tips-and-tricks.md | 285 --------- de/cheatsheet-supervised-learning.md | 567 ------------------ de/cheatsheet-unsupervised-learning.md | 340 ----------- de/refresher-linear-algebra.md | 339 ----------- de/refresher-probability.md | 381 ------------ he/cheatsheet-deep-learning.md | 321 ---------- ...tsheet-machine-learning-tips-and-tricks.md | 285 --------- he/cheatsheet-supervised-learning.md | 567 ------------------ he/cheatsheet-unsupervised-learning.md | 340 ----------- he/refresher-linear-algebra.md | 339 ----------- he/refresher-probability.md | 381 ------------ hi/cheatsheet-deep-learning.md | 321 ---------- ...tsheet-machine-learning-tips-and-tricks.md | 285 --------- hi/cheatsheet-supervised-learning.md | 567 ------------------ hi/cheatsheet-unsupervised-learning.md | 340 ----------- hi/refresher-linear-algebra.md | 339 ----------- hi/refresher-probability.md | 381 ------------ ru/cheatsheet-deep-learning.md | 321 ---------- ...tsheet-machine-learning-tips-and-tricks.md | 285 --------- ru/cheatsheet-supervised-learning.md | 567 ------------------ ru/cheatsheet-unsupervised-learning.md | 340 ----------- ru/refresher-linear-algebra.md | 339 ----------- ru/refresher-probability.md | 381 ------------ zh/cheatsheet-deep-learning.md | 321 ---------- ...tsheet-machine-learning-tips-and-tricks.md | 285 --------- zh/cheatsheet-unsupervised-learning.md | 339 ----------- zh/refresher-linear-algebra.md | 339 ----------- zh/refresher-probability.md | 381 ------------ 29 files changed, 10597 deletions(-) delete mode 100644 de/cheatsheet-deep-learning.md delete mode 100644 de/cheatsheet-machine-learning-tips-and-tricks.md delete mode 100644 de/cheatsheet-supervised-learning.md delete mode 100644 de/cheatsheet-unsupervised-learning.md delete mode 100644 de/refresher-linear-algebra.md delete mode 100644 de/refresher-probability.md delete mode 100644 he/cheatsheet-deep-learning.md delete mode 100644 he/cheatsheet-machine-learning-tips-and-tricks.md delete mode 100644 he/cheatsheet-supervised-learning.md delete mode 100644 he/cheatsheet-unsupervised-learning.md delete mode 100644 he/refresher-linear-algebra.md delete mode 100644 he/refresher-probability.md delete mode 100644 hi/cheatsheet-deep-learning.md delete mode 100644 hi/cheatsheet-machine-learning-tips-and-tricks.md delete mode 100644 hi/cheatsheet-supervised-learning.md delete mode 100644 hi/cheatsheet-unsupervised-learning.md delete mode 100644 hi/refresher-linear-algebra.md delete mode 100644 hi/refresher-probability.md delete mode 100644 ru/cheatsheet-deep-learning.md delete mode 100644 ru/cheatsheet-machine-learning-tips-and-tricks.md delete mode 100644 ru/cheatsheet-supervised-learning.md delete mode 100644 ru/cheatsheet-unsupervised-learning.md delete mode 100644 ru/refresher-linear-algebra.md delete mode 100644 ru/refresher-probability.md delete mode 100644 zh/cheatsheet-deep-learning.md delete mode 100644 zh/cheatsheet-machine-learning-tips-and-tricks.md delete mode 100644 zh/cheatsheet-unsupervised-learning.md delete mode 100644 zh/refresher-linear-algebra.md delete mode 100644 zh/refresher-probability.md diff --git a/de/cheatsheet-deep-learning.md b/de/cheatsheet-deep-learning.md deleted file mode 100644 index a5aa3756c..000000000 --- 
a/de/cheatsheet-deep-learning.md +++ /dev/null @@ -1,321 +0,0 @@ -**1. Deep Learning cheatsheet** - -⟶ - -
- -**2. Neural Networks** - -⟶ - -
- -**3. Neural networks are a class of models that are built with layers. Commonly used types of neural networks include convolutional and recurrent neural networks.** - -⟶ - -
- -**4. Architecture ― The vocabulary around neural networks architectures is described in the figure below:** - -⟶ - -
- -**5. [Input layer, hidden layer, output layer]** - -⟶ - -
- -**6. By noting i the ith layer of the network and j the jth hidden unit of the layer, we have:** - -⟶ - -
- -**7. where we note w, b, z the weight, bias and output respectively.** - -⟶ - -
- -**8. Activation function ― Activation functions are used at the end of a hidden unit to introduce non-linear complexities to the model. Here are the most common ones:** - -⟶ - -
- -**9. [Sigmoid, Tanh, ReLU, Leaky ReLU]** - -⟶ - -
- -**10. Cross-entropy loss ― In the context of neural networks, the cross-entropy loss L(z,y) is commonly used and is defined as follows:** - -⟶ - -
- -**11. Learning rate ― The learning rate, often noted α or sometimes η, indicates at which pace the weights get updated. This can be fixed or adaptively changed. The current most popular method is called Adam, which is a method that adapts the learning rate.** - -⟶ - -
- -**12. Backpropagation ― Backpropagation is a method to update the weights in the neural network by taking into account the actual output and the desired output. The derivative with respect to weight w is computed using chain rule and is of the following form:** - -⟶ - -
- -**13. As a result, the weight is updated as follows:** - -⟶ - -
- -**14. Updating weights ― In a neural network, weights are updated as follows:** - -⟶ - -
- -**15. Step 1: Take a batch of training data.** - -⟶ - -
- -**16. Step 2: Perform forward propagation to obtain the corresponding loss.** - -⟶ - -
- -**17. Step 3: Backpropagate the loss to get the gradients.** - -⟶ - -
- -**18. Step 4: Use the gradients to update the weights of the network.** - -⟶ - -
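The four steps above for a tiny one-hidden-layer network with a sigmoid output, as a hedged NumPy sketch; the architecture, synthetic batch and learning rate are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(32, 3))                     # step 1: a batch of training data
y = (X.sum(axis=1) > 0).astype(float)

W1, b1 = rng.normal(size=(3, 4)), np.zeros(4)    # one hidden layer of 4 units
W2, b2 = rng.normal(size=4), 0.0
alpha = 0.1

# Step 2: forward propagation and the corresponding cross-entropy loss
h = np.tanh(X @ W1 + b1)
p = 1 / (1 + np.exp(-(h @ W2 + b2)))
loss = -(y * np.log(p) + (1 - y) * np.log(1 - p)).mean()

# Step 3: backpropagate the loss with the chain rule
dz = (p - y) / len(y)
dW2, db2 = h.T @ dz, dz.sum()
dh = np.outer(dz, W2) * (1 - h ** 2)
dW1, db1 = X.T @ dh, dh.sum(axis=0)

# Step 4: gradient step on every weight
W1, b1 = W1 - alpha * dW1, b1 - alpha * db1
W2, b2 = W2 - alpha * dW2, b2 - alpha * db2
```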
- -**19. Dropout ― Dropout is a technique meant to prevent overfitting of the training data by dropping out units in a neural network. In practice, neurons are either dropped with probability p or kept with probability 1−p.** - -⟶ - -
- -**20. Convolutional Neural Networks** - -⟶ - -
- -**21. Convolutional layer requirement ― By noting W the input volume size, F the size of the convolutional layer neurons, P the amount of zero padding, then the number of neurons N that fit in a given volume is such that:** - -⟶ - -
- -**22. Batch normalization ― It is a step of hyperparameters γ,β that normalizes the batch {xi}. By noting μB,σ2B the mean and variance of the batch that we want to correct, it is done as follows:** - -⟶ - -
- -**23. It is usually done after a fully connected/convolutional layer and before a non-linearity layer and aims at allowing higher learning rates and reducing the strong dependence on initialization.** - -⟶ - -
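A minimal sketch of the normalization step, assuming a 2-D batch of activations and learnable per-feature γ and β; ε is the usual numerical-stability constant.

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    """Normalize the batch per feature, then scale by gamma and shift by beta."""
    mu = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mu) / np.sqrt(var + eps)
    return gamma * x_hat + beta
```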
- -**24. Recurrent Neural Networks** - -⟶ - -
- -**25. Types of gates ― Here are the different types of gates that we encounter in a typical recurrent neural network:** - -⟶ - -
- -**26. [Input gate, forget gate, gate, output gate]** - -⟶ - -
- -**27. [Write to cell or not?, Erase a cell or not?, How much to write to cell?, How much to reveal cell?]** - -⟶ - -
- -**28. LSTM ― A long short-term memory (LSTM) network is a type of RNN model that avoids the vanishing gradient problem by adding 'forget' gates.** - -⟶ - -
- -**29. Reinforcement Learning and Control** - -⟶ - -
- -**30. The goal of reinforcement learning is for an agent to learn how to evolve in an environment.** - -⟶ - -
- -**31. Definitions** - -⟶ - -
- -**32. Markov decision processes ― A Markov decision process (MDP) is a 5-tuple (S,A,{Psa},γ,R) where:** - -⟶ - -
- -**33. S is the set of states** - -⟶ - -
- -**34. A is the set of actions** - -⟶ - -
- -**35. {Psa} are the state transition probabilities for s∈S and a∈A** - -⟶ - -
- -**36. γ∈[0,1[ is the discount factor** - -⟶ - -
- -**37. R:S×A⟶R or R:S⟶R is the reward function that the algorithm wants to maximize** - -⟶ - -
- -**38. Policy ― A policy π is a function π:S⟶A that maps states to actions.** - -⟶ - -
- -**39. Remark: we say that we execute a given policy π if given a state s we take the action a=π(s).** - -⟶ - -
- -**40. Value function ― For a given policy π and a given state s, we define the value function Vπ as follows:** - -⟶ - -
- -**41. Bellman equation ― The optimal Bellman equation characterizes the value function Vπ∗ of the optimal policy π∗:** - -⟶ - -
- -**42. Remark: we note that the optimal policy π∗ for a given state s is such that:** - -⟶ - -
- -**43. Value iteration algorithm ― The value iteration algorithm is in two steps:** - -⟶ - -
- -**44. 1) We initialize the value:** - -⟶ - -
- -**45. 2) We iterate the value based on the values before:** - -⟶ - -
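A possible tabular implementation of the two steps above for a finite MDP; the array layout P[a, s, s′] and R[s, a] is an assumption of this sketch.

```python
import numpy as np

def value_iteration(P, R, gamma=0.9, n_iters=500):
    """P[a, s, s'] are transition probabilities, R[s, a] the rewards."""
    n_states = P.shape[1]
    V = np.zeros(n_states)                 # 1) initialize the value
    for _ in range(n_iters):
        Q = R.T + gamma * P @ V            # Q[a, s] = R(s, a) + gamma * sum_s' P_sa(s') V(s')
        V = Q.max(axis=0)                  # 2) iterate with the Bellman update
    return V, Q.argmax(axis=0)             # optimal value and a greedy policy
```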
- -**46. Maximum likelihood estimate ― The maximum likelihood estimates for the state transition probabilities are as follows:** - -⟶ - -
- -**47. times took action a in state s and got to s′** - -⟶ - -
- -**48. times took action a in state s** - -⟶ - -
- -**49. Q-learning ― Q-learning is a model-free estimation of Q, which is done as follows:** - -⟶ - -
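For a tabular Q, a single model-free update could be sketched as follows; the learning rate and discount values are illustrative.

```python
import numpy as np

def q_learning_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.9):
    """One update of a tabular Q (a 2-D array indexed by state and action)."""
    Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
    return Q
```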
- -**50. View PDF version on GitHub** - -⟶ - -
- -**51. [Neural Networks, Architecture, Activation function, Backpropagation, Dropout]** - -⟶ - -
- -**52. [Convolutional Neural Networks, Convolutional layer, Batch normalization]** - -⟶ - -
- -**53. [Recurrent Neural Networks, Gates, LSTM]** - -⟶ - -
- -**54. [Reinforcement learning, Markov decision processes, Value/policy iteration, Approximate dynamic programming, Policy search]** - -⟶ diff --git a/de/cheatsheet-machine-learning-tips-and-tricks.md b/de/cheatsheet-machine-learning-tips-and-tricks.md deleted file mode 100644 index 9712297b8..000000000 --- a/de/cheatsheet-machine-learning-tips-and-tricks.md +++ /dev/null @@ -1,285 +0,0 @@ -**1. Machine Learning tips and tricks cheatsheet** - -⟶ - -
- -**2. Classification metrics** - -⟶ - -
- -**3. In a context of a binary classification, here are the main metrics that are important to track in order to assess the performance of the model.** - -⟶ - -
- -**4. Confusion matrix ― The confusion matrix is used to have a more complete picture when assessing the performance of a model. It is defined as follows:** - -⟶ - -
- -**5. [Predicted class, Actual class]** - -⟶ - -
- -**6. Main metrics ― The following metrics are commonly used to assess the performance of classification models:** - -⟶ - -
- -**7. [Metric, Formula, Interpretation]** - -⟶ - -
- -**8. Overall performance of model** - -⟶ - -
- -**9. How accurate the positive predictions are** - -⟶ - -
- -**10. Coverage of actual positive sample** - -⟶ - -
- -**11. Coverage of actual negative sample** - -⟶ - -
- -**12. Hybrid metric useful for unbalanced classes** - -⟶ - -
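A small sketch computing the confusion-matrix counts and the metrics above from binary label arrays; it assumes labels in {0,1} and non-degenerate counts.

```python
import numpy as np

def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 from the confusion-matrix counts."""
    tp = np.sum((y_true == 1) & (y_pred == 1))
    tn = np.sum((y_true == 0) & (y_pred == 0))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1
```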
- -**13. ROC ― The receiver operating curve, also noted ROC, is the plot of TPR versus FPR by varying the threshold. These metrics are summed up in the table below:** - -⟶ - -
- -**14. [Metric, Formula, Equivalent]** - -⟶ - -
- -**15. AUC ― The area under the receiving operating curve, also noted AUC or AUROC, is the area below the ROC as shown in the following figure:** - -⟶ - -
- -**16. [Actual, Predicted]** - -⟶ - -
- -**17. Basic metrics ― Given a regression model f, the following metrics are commonly used to assess the performance of the model:** - -⟶ - -
- -**18. [Total sum of squares, Explained sum of squares, Residual sum of squares]** - -⟶ - -
- -**19. Coefficient of determination ― The coefficient of determination, often noted R2 or r2, provides a measure of how well the observed outcomes are replicated by the model and is defined as follows:** - -⟶ - -
- -**20. Main metrics ― The following metrics are commonly used to assess the performance of regression models, taking into account the number of variables n that they consider:** - -⟶ - -
- -**21. where L is the likelihood and ˆσ2 is an estimate of the variance associated with each response.** - -⟶ - -
- -**22. Model selection** - -⟶ - -
- -**23. Vocabulary ― When selecting a model, we distinguish 3 different parts of the data that we have as follows:** - -⟶ - -
- -**24. [Training set, Validation set, Testing set]** - -⟶ - -
- -**25. [Model is trained, Model is assessed, Model gives predictions]** - -⟶ - -
- -**26. [Usually 80% of the dataset, Usually 20% of the dataset]** - -⟶ - -
- -**27. [Also called hold-out or development set, Unseen data]** - -⟶ - -
- -**28. Once the model has been chosen, it is trained on the entire dataset and tested on the unseen test set. These are represented in the figure below:** - -⟶ - -
- -**29. Cross-validation ― Cross-validation, also noted CV, is a method that is used to select a model that does not rely too much on the initial training set. The different types are summed up in the table below:** - -⟶ - -
- -**30. [Training on k−1 folds and assessment on the remaining one, Training on n−p observations and assessment on the p remaining ones]** - -⟶ - -
- -**31. [Generally k=5 or 10, Case p=1 is called leave-one-out]** - -⟶ - -
- -**32. The most commonly used method is called k-fold cross-validation and splits the training data into k folds to validate the model on one fold while training the model on the k−1 other folds, all of this k times. The error is then averaged over the k folds and is named cross-validation error.** - -⟶ - -
- -**33. Regularization ― The regularization procedure aims at preventing the model from overfitting the data and thus deals with high variance issues. The following table sums up the different types of commonly used regularization techniques:** - -⟶ - -
- -**34. [Shrinks coefficients to 0, Good for variable selection, Makes coefficients smaller, Tradeoff between variable selection and small coefficients]** - -⟶ - -
- -**35. Diagnostics** - -⟶ - -
- -**36. Bias ― The bias of a model is the difference between the expected prediction and the correct values that we try to predict for given data points.** - -⟶ - -
- -**37. Variance ― The variance of a model is the variability of the model prediction for given data points.** - -⟶ - -
- -**38. Bias/variance tradeoff ― The simpler the model, the higher the bias, and the more complex the model, the higher the variance.** - -⟶ - -
- -**39. [Symptoms, Regression illustration, classification illustration, deep learning illustration, possible remedies]** - -⟶ - -
- -**40. [High training error, Training error close to test error, High bias, Training error slightly lower than test error, Very low training error, Training error much lower than test error, High variance]** - -⟶ - -
- -**41. [Complexify model, Add more features, Train longer, Perform regularization, Get more data]** - -⟶ - -
- -**42. Error analysis ― Error analysis is analyzing the root cause of the difference in performance between the current and the perfect models.** - -⟶ - -
- -**43. Ablative analysis ― Ablative analysis is analyzing the root cause of the difference in performance between the current and the baseline models.** - -⟶ - -
- -**44. Regression metrics** - -⟶ - -
- -**45. [Classification metrics, confusion matrix, accuracy, precision, recall, F1 score, ROC]** - -⟶ - -
- -**46. [Regression metrics, R squared, Mallow's CP, AIC, BIC]** - -⟶ - -
- -**47. [Model selection, cross-validation, regularization]** - -⟶ - -
- -**48. [Diagnostics, Bias/variance tradeoff, error/ablative analysis]** - -⟶ diff --git a/de/cheatsheet-supervised-learning.md b/de/cheatsheet-supervised-learning.md deleted file mode 100644 index a6b19ea1c..000000000 --- a/de/cheatsheet-supervised-learning.md +++ /dev/null @@ -1,567 +0,0 @@ -**1. Supervised Learning cheatsheet** - -⟶ - -
- -**2. Introduction to Supervised Learning** - -⟶ - -
- -**3. Given a set of data points {x(1),...,x(m)} associated to a set of outcomes {y(1),...,y(m)}, we want to build a classifier that learns how to predict y from x.** - -⟶ - -
- -**4. Type of prediction ― The different types of predictive models are summed up in the table below:** - -⟶ - -
- -**5. [Regression, Classifier, Outcome, Examples]** - -⟶ - -
- -**6. [Continuous, Class, Linear regression, Logistic regression, SVM, Naive Bayes]** - -⟶ - -
- -**7. Type of model ― The different models are summed up in the table below:** - -⟶ - -
- -**8. [Discriminative model, Generative model, Goal, What's learned, Illustration, Examples]** - -⟶ - -
- -**9. [Directly estimate P(y|x), Estimate P(x|y) to then deduce P(y|x), Decision boundary, Probability distributions of the data, Regressions, SVMs, GDA, Naive Bayes]** - -⟶ - -
- -**10. Notations and general concepts** - -⟶ - -
- -**11. Hypothesis ― The hypothesis is noted hθ and is the model that we choose. For a given input data x(i) the model prediction output is hθ(x(i)).** - -⟶ - -
- -**12. Loss function ― A loss function is a function L:(z,y)∈R×Y⟼L(z,y)∈R that takes as inputs the predicted value z corresponding to the real data value y and outputs how different they are. The common loss functions are summed up in the table below:** - -⟶ - -
- -**13. [Least squared error, Logistic loss, Hinge loss, Cross-entropy]** - -⟶ - -
- -**14. [Linear regression, Logistic regression, SVM, Neural Network]** - -⟶ - -
- -**15. Cost function ― The cost function J is commonly used to assess the performance of a model, and is defined with the loss function L as follows:** - -⟶ - -
- -**16. Gradient descent ― By noting α∈R the learning rate, the update rule for gradient descent is expressed with the learning rate and the cost function J as follows:** - -⟶ - -
- -**17. Remark: Stochastic gradient descent (SGD) updates the parameter based on each training example, while batch gradient descent updates it based on a batch of training examples.** - -⟶ - -
- -**18. Likelihood ― The likelihood of a model L(θ) given parameters θ is used to find the optimal parameters θ through maximizing the likelihood. In practice, we use the log-likelihood ℓ(θ)=log(L(θ)) which is easier to optimize. We have:** - -⟶ - -
- -**19. Newton's algorithm ― Newton's algorithm is a numerical method that finds θ such that ℓ′(θ)=0. Its update rule is as follows:** - -⟶ - -
- -**20. Remark: the multidimensional generalization, also known as the Newton-Raphson method, has the following update rule:** - -⟶ - -
- -**21. Linear models** - -⟶ - -
- -**22. Linear regression** - -⟶ - -
- -**23. We assume here that y|x;θ∼N(μ,σ2)** - -⟶ - -
- -**24. Normal equations ― By noting X the design matrix, the value of θ that minimizes the cost function is a closed-form solution such that:** - -⟶ - -
- -**25. LMS algorithm ― By noting α the learning rate, the update rule of the Least Mean Squares (LMS) algorithm for a training set of m data points, which is also known as the Widrow-Hoff learning rule, is as follows:** - -⟶ - -
- -**26. Remark: the update rule is a particular case of the gradient ascent.** - -⟶ - -
- -**27. LWR ― Locally Weighted Regression, also known as LWR, is a variant of linear regression that weights each training example in its cost function by w(i)(x), which is defined with parameter τ∈R as:** - -⟶ - -
- -**28. Classification and logistic regression** - -⟶ - -
- -**29. Sigmoid function ― The sigmoid function g, also known as the logistic function, is defined as follows:** - -⟶ - -
- -**30. Logistic regression ― We assume here that y|x;θ∼Bernoulli(ϕ). We have the following form:** - -⟶ - -
- -**31. Remark: there is no closed form solution for the case of logistic regressions.** - -⟶ - -
- -**32. Softmax regression ― A softmax regression, also called a multiclass logistic regression, is used to generalize logistic regression when there are more than 2 outcome classes. By convention, we set θK=0, which makes the Bernoulli parameter ϕi of each class i equal to:** - -⟶ - -
- -**33. Generalized Linear Models** - -⟶ - -
- -**34. Exponential family ― A class of distributions is said to be in the exponential family if it can be written in terms of a natural parameter, also called the canonical parameter or link function, η, a sufficient statistic T(y) and a log-partition function a(η) as follows:** - -⟶ - -
- -**35. Remark: we will often have T(y)=y. Also, exp(−a(η)) can be seen as a normalization parameter that will make sure that the probabilities sum to one.** - -⟶ - -
- -**36. Here are the most common exponential distributions summed up in the following table:** - -⟶ - -
- -**37. [Distribution, Bernoulli, Gaussian, Poisson, Geometric]** - -⟶ - -
- -**38. Assumptions of GLMs ― Generalized Linear Models (GLM) aim at predicting a random variable y as a function of x∈Rn+1 and rely on the following 3 assumptions:** - -⟶ - -
- -**39. Remark: ordinary least squares and logistic regression are special cases of generalized linear models.** - -⟶ - -
- -**40. Support Vector Machines** - -⟶ - -
- -**41. The goal of support vector machines is to find the line that maximizes the minimum distance to the line.** - -⟶ - -
- -**42. Optimal margin classifier ― The optimal margin classifier h is such that:** - -⟶ - -
- -**43. where (w,b)∈Rn×R is the solution of the following optimization problem:** - -⟶ - -
- -**44. such that** - -⟶ - -
- -**45. support vectors** - -⟶ - -
- -**46. Remark: the line is defined as wTx−b=0.** - -⟶ - -
- -**47. Hinge loss ― The hinge loss is used in the setting of SVMs and is defined as follows:** - -⟶ - -
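A one-line sketch of the hinge loss, assuming labels y in {−1,+1} and raw scores z.

```python
import numpy as np

def hinge_loss(z, y):
    """L(z, y) = max(0, 1 - yz) for labels y in {-1, +1} and scores z."""
    return np.maximum(0.0, 1 - y * z)
```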
- -**48. Kernel ― Given a feature mapping ϕ, we define the kernel K to be defined as:** - -⟶ - -
- -**49. In practice, the kernel K defined by K(x,z)=exp(−||x−z||2/(2σ2)) is called the Gaussian kernel and is commonly used.** - -⟶ - -
- -**50. [Non-linear separability, Use of a kernel mapping, Decision boundary in the original space]** - -⟶ - -
- -**51. Remark: we say that we use the "kernel trick" to compute the cost function using the kernel because we actually don't need to know the explicit mapping ϕ, which is often very complicated. Instead, only the values K(x,z) are needed.** - -⟶ - -
- -**52. Lagrangian ― We define the Lagrangian L(w,b) as follows:** - -⟶ - -
- -**53. Remark: the coefficients βi are called the Lagrange multipliers.** - -⟶ - -
- -**54. Generative Learning** - -⟶ - -
- -**55. A generative model first tries to learn how the data is generated by estimating P(x|y), which we can then use to estimate P(y|x) by using Bayes' rule.** - -⟶ - -
- -**56. Gaussian Discriminant Analysis** - -⟶ - -
- -**57. Setting ― The Gaussian Discriminant Analysis assumes that y and x|y=0 and x|y=1 are such that:** - -⟶ - -
- -**58. Estimation ― The following table sums up the estimates that we find when maximizing the likelihood:** - -⟶ - -
- -**59. Naive Bayes** - -⟶ - -
- -**60. Assumption ― The Naive Bayes model supposes that the features of each data point are all independent:** - -⟶ - -
- -**61. Solutions ― Maximizing the log-likelihood gives the following solutions, with k∈{0,1},l∈[[1,L]]** - -⟶ - -
- -**62. Remark: Naive Bayes is widely used for text classification and spam detection.** - -⟶ - -
- -**63. Tree-based and ensemble methods** - -⟶ - -
- -**64. These methods can be used for both regression and classification problems.** - -⟶ - -
- -**65. CART ― Classification and Regression Trees (CART), commonly known as decision trees, can be represented as binary trees. They have the advantage of being very interpretable.** - -⟶ - -
- -**66. Random forest ― It is a tree-based technique that uses a high number of decision trees built out of randomly selected sets of features. Contrary to the simple decision tree, it is highly uninterpretable but its generally good performance makes it a popular algorithm.** - -⟶ - -
- -**67. Remark: random forests are a type of ensemble methods.** - -⟶ - -
- -**68. Boosting ― The idea of boosting methods is to combine several weak learners to form a stronger one. The main ones are summed up in the table below:** - -⟶ - -
- -**69. [Adaptive boosting, Gradient boosting]** - -⟶ - -
- -**70. High weights are put on errors to improve at the next boosting step** - -⟶ - -
- -**71. Weak learners trained on remaining errors** - -⟶ - -
- -**72. Other non-parametric approaches** - -⟶ - -
- -**73. k-nearest neighbors ― The k-nearest neighbors algorithm, commonly known as k-NN, is a non-parametric approach where the response of a data point is determined by the nature of its k neighbors from the training set. It can be used in both classification and regression settings.** - -⟶ - -
- -**74. Remark: The higher the parameter k, the higher the bias, and the lower the parameter k, the higher the variance.** - -⟶ - -
- -**75. Learning Theory** - -⟶ - -
- -**76. Union bound ― Let A1,...,Ak be k events. We have:** - -⟶ - -
- -**77. Hoeffding inequality ― Let Z1,..,Zm be m iid variables drawn from a Bernoulli distribution of parameter ϕ. Let ˆϕ be their sample mean and γ>0 fixed. We have:** - -⟶ - -
- -**78. Remark: this inequality is also known as the Chernoff bound.** - -⟶ - -
- -**79. Training error ― For a given classifier h, we define the training error ˆϵ(h), also known as the empirical risk or empirical error, to be as follows:** - -⟶ - -
- -**80. Probably Approximately Correct (PAC) ― PAC is a framework under which numerous results on learning theory were proved, and has the following set of assumptions:** - -⟶ - -
- -**81. the training and testing sets follow the same distribution** - -⟶ - -
- -**82. the training examples are drawn independently** - -⟶ - -
- -**83. Shattering ― Given a set S={x(1),...,x(d)}, and a set of classifiers H, we say that H shatters S if for any set of labels {y(1),...,y(d)}, we have:** - -⟶ - -
- -**84. Upper bound theorem ― Let H be a finite hypothesis class such that |H|=k and let δ and the sample size m be fixed. Then, with probability of at least 1−δ, we have:** - -⟶ - -
- -**85. VC dimension ― The Vapnik-Chervonenkis (VC) dimension of a given infinite hypothesis class H, noted VC(H) is the size of the largest set that is shattered by H.** - -⟶ - -
- -**86. Remark: the VC dimension of H={set of linear classifiers in 2 dimensions} is 3.** - -⟶ - -
- -**87. Theorem (Vapnik) ― Let H be given, with VC(H)=d and m the number of training examples. With probability at least 1−δ, we have:** - -⟶ - -
- -**88. [Introduction, Type of prediction, Type of model]** - -⟶ - -
- -**89. [Notations and general concepts, loss function, gradient descent, likelihood]** - -⟶ - -
- -**90. [Linear models, linear regression, logistic regression, generalized linear models]** - -⟶ - -
- -**91. [Support vector machines, Optimal margin classifier, Hinge loss, Kernel]** - -⟶ - -
- -**92. [Generative learning, Gaussian Discriminant Analysis, Naive Bayes]** - -⟶ - -
- -**93. [Trees and ensemble methods, CART, Random forest, Boosting]** - -⟶ - -
- -**94. [Other methods, k-NN]** - -⟶ - -
- -**95. [Learning theory, Hoeffding inequality, PAC, VC dimension]** - -⟶ diff --git a/de/cheatsheet-unsupervised-learning.md b/de/cheatsheet-unsupervised-learning.md deleted file mode 100644 index 1bf117d72..000000000 --- a/de/cheatsheet-unsupervised-learning.md +++ /dev/null @@ -1,340 +0,0 @@ -**1. Unsupervised Learning cheatsheet** - -⟶ - -
- -**2. Introduction to Unsupervised Learning** - -⟶ - -
- -**3. Motivation ― The goal of unsupervised learning is to find hidden patterns in unlabeled data {x(1),...,x(m)}.** - -⟶ - -
- -**4. Jensen's inequality ― Let f be a convex function and X a random variable. We have the following inequality:** - -⟶ - -
- -**5. Clustering** - -⟶ - -
- -**6. Expectation-Maximization** - -⟶ - -
- -**7. Latent variables ― Latent variables are hidden/unobserved variables that make estimation problems difficult, and are often denoted z. Here are the most common settings where there are latent variables:** - -⟶ - -
- -**8. [Setting, Latent variable z, Comments]** - -⟶ - -
- -**9. [Mixture of k Gaussians, Factor analysis]** - -⟶ - -
- -**10. Algorithm ― The Expectation-Maximization (EM) algorithm gives an efficient method for estimating the parameter θ through maximum likelihood estimation by repeatedly constructing a lower-bound on the likelihood (E-step) and optimizing that lower bound (M-step) as follows:** - -⟶ - -
- -**11. E-step: Evaluate the posterior probability Qi(z(i)) that each data point x(i) came from a particular cluster z(i) as follows:** - -⟶ - -
- -**12. M-step: Use the posterior probabilities Qi(z(i)) as cluster specific weights on data points x(i) to separately re-estimate each cluster model as follows:** - -⟶ - -
- -**13. [Gaussians initialization, Expectation step, Maximization step, Convergence]** - -⟶ - -
- -**14. k-means clustering** - -⟶ - -
- -**15. We note c(i) the cluster of data point i and μj the center of cluster j.** - -⟶ - -
- -**16. Algorithm ― After randomly initializing the cluster centroids μ1,μ2,...,μk∈Rn, the k-means algorithm repeats the following step until convergence:** - -⟶ - -
- -**17. [Means initialization, Cluster assignment, Means update, Convergence]** - -⟶ - -
- -**18. Distortion function ― In order to see if the algorithm converges, we look at the distortion function defined as follows:** - -⟶ - -
- -**19. Hierarchical clustering** - -⟶ - -
- -**20. Algorithm ― It is a clustering algorithm with an agglomerative hierarchical approach that builds nested clusters in a successive manner.** - -⟶ - -
- -**21. Types ― There are different sorts of hierarchical clustering algorithms that aim at optimizing different objective functions, which are summed up in the table below:** - -⟶ - -
- -**22. [Ward linkage, Average linkage, Complete linkage]** - -⟶ - -
- -**23. [Minimize within cluster distance, Minimize average distance between cluster pairs, Minimize maximum distance between cluster pairs]** - -⟶ - -
- -**24. Clustering assessment metrics** - -⟶ - -
- -**25. In an unsupervised learning setting, it is often hard to assess the performance of a model since we don't have the ground truth labels as was the case in the supervised learning setting.** - -⟶ - -
- -**26. Silhouette coefficient ― By noting a and b the mean distance between a sample and all other points in the same class, and between a sample and all other points in the next nearest cluster, the silhouette coefficient s for a single sample is defined as follows:** - -⟶ - -
- -**27. Calinski-Harabaz index ― By noting k the number of clusters, Bk and Wk the between and within-clustering dispersion matrices respectively defined as** - -⟶ - -
- -**28. the Calinski-Harabaz index s(k) indicates how well a clustering model defines its clusters, such that the higher the score, the more dense and well separated the clusters are. It is defined as follows:** - -⟶ - -
- -**29. Dimension reduction** - -⟶ - -
- -**30. Principal component analysis** - -⟶ - -
- -**31. It is a dimension reduction technique that finds the variance maximizing directions onto which to project the data.** - -⟶ - -
- -**32. Eigenvalue, eigenvector ― Given a matrix A∈Rn×n, λ is said to be an eigenvalue of A if there exists a vector z∈Rn∖{0}, called eigenvector, such that we have:** - -⟶ - -
- -**33. Spectral theorem ― Let A∈Rn×n. If A is symmetric, then A is diagonalizable by a real orthogonal matrix U∈Rn×n. By noting Λ=diag(λ1,...,λn), we have:** - -⟶ - -
- -**34. diagonal** - -⟶ - -
- -**35. Remark: the eigenvector associated with the largest eigenvalue is called principal eigenvector of matrix A.** - -⟶ - -
- -**36. Algorithm ― The Principal Component Analysis (PCA) procedure is a dimension reduction technique that projects the data on k dimensions by maximizing the variance of the data as follows:** - -⟶ - -
- -**37. Step 1: Normalize the data to have a mean of 0 and standard deviation of 1.** - -⟶ - -
- -**38. Step 2: Compute Σ=1mm∑i=1x(i)x(i)T∈Rn×n, which is symmetric with real eigenvalues.** - -⟶ - -
- -**39. Step 3: Compute u1,...,uk∈Rn the k orthogonal principal eigenvectors of Σ, i.e. the orthogonal eigenvectors of the k largest eigenvalues.** - -⟶ - -
- -**40. Step 4: Project the data on spanR(u1,...,uk).** - -⟶ - -
- -**41. This procedure maximizes the variance among all k-dimensional spaces.** - -⟶ - -
- -**42. [Data in feature space, Find principal components, Data in principal components space]** - -⟶ - -
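A minimal NumPy sketch of the four PCA steps above (normalize, form Σ, take the top-k principal eigenvectors, project); the random data and the choice k=2 are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))
k = 2

Xn = (X - X.mean(axis=0)) / X.std(axis=0)       # step 1: zero mean, unit standard deviation
Sigma = Xn.T @ Xn / len(Xn)                     # step 2: empirical covariance, symmetric
eigval, eigvec = np.linalg.eigh(Sigma)          # real eigenpairs of a symmetric matrix
U = eigvec[:, np.argsort(eigval)[::-1][:k]]     # step 3: eigenvectors of the k largest eigenvalues
Z = Xn @ U                                      # step 4: projection on span(u1, ..., uk)
print(Z.shape)                                  # (500, 2)
```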
- -**43. Independent component analysis** - -⟶ - -
- -**44. It is a technique meant to find the underlying generating sources.** - -⟶ - -
- -**45. Assumptions ― We assume that our data x has been generated by the n-dimensional source vector s=(s1,...,sn), where si are independent random variables, via a mixing and non-singular matrix A as follows:** - -⟶ - -
- -**46. The goal is to find the unmixing matrix W=A−1.** - -⟶ - -
- -**47. Bell and Sejnowski ICA algorithm ― This algorithm finds the unmixing matrix W by following the steps below:** - -⟶ - -
- -**48. Write the probability of x=As=W−1s as:** - -⟶ - -
- -**49. Write the log likelihood given our training data {x(i),i∈[[1,m]]} and by noting g the sigmoid function as:** - -⟶ - -
- -**50. Therefore, the stochastic gradient ascent learning rule is such that for each training example x(i), we update W as follows:** - -⟶ - -
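A rough sketch of the stochastic gradient ascent rule described above, assuming the update W ← W + α[(1−2g(Wx(i)))x(i)T + (WT)−1] with g the sigmoid; the mixing matrix, source distribution and learning rate are illustrative and may need tuning.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 2, 5000
S = rng.laplace(size=(m, n))                 # independent, non-Gaussian sources
A = np.array([[1.0, 0.5], [0.3, 1.0]])       # mixing matrix (illustrative)
X = S @ A.T                                  # observed data x = A s

g = lambda u: 1 / (1 + np.exp(-u))
W = np.eye(n)                                # unmixing matrix estimate
alpha = 0.001
for x in X:
    Wx = W @ x
    W += alpha * (np.outer(1 - 2 * g(Wx), x) + np.linalg.inv(W.T))

print(W @ A)   # roughly a scaled permutation matrix if learning succeeds
```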
- -**51. The Machine Learning cheatsheets are now available in German.** - -⟶ - -
- -**52. Original authors** - -⟶ - -
- -**53. Translated by X, Y and Z** - -⟶ - -
- -**54. Reviewed by X, Y and Z** - -⟶ - -
- -**55. [Introduction, Motivation, Jensen's inequality]** - -⟶ - -
- -**56. [Clustering, Expectation-Maximization, k-means, Hierarchical clustering, Metrics]** - -⟶ - -
- -**57. [Dimension reduction, PCA, ICA]** - -⟶ diff --git a/de/refresher-linear-algebra.md b/de/refresher-linear-algebra.md deleted file mode 100644 index a6b440d1e..000000000 --- a/de/refresher-linear-algebra.md +++ /dev/null @@ -1,339 +0,0 @@ -**1. Linear Algebra and Calculus refresher** - -⟶ - -
- -**2. General notations** - -⟶ - -
- -**3. Definitions** - -⟶ - -
- -**4. Vector ― We note x∈Rn a vector with n entries, where xi∈R is the ith entry:** - -⟶ - -
- -**5. Matrix ― We note A∈Rm×n a matrix with m rows and n columns, where Ai,j∈R is the entry located in the ith row and jth column:** - -⟶ - -
- -**6. Remark: the vector x defined above can be viewed as a n×1 matrix and is more particularly called a column-vector.** - -⟶ - -
- -**7. Main matrices** - -⟶ - -
- -**8. Identity matrix ― The identity matrix I∈Rn×n is a square matrix with ones in its diagonal and zero everywhere else:** - -⟶ - -
- -**9. Remark: for all matrices A∈Rn×n, we have A×I=I×A=A.** - -⟶ - -
- -**10. Diagonal matrix ― A diagonal matrix D∈Rn×n is a square matrix with nonzero values in its diagonal and zero everywhere else:** - -⟶ - -
- -**11. Remark: we also note D as diag(d1,...,dn).** - -⟶ - -
- -**12. Matrix operations** - -⟶ - -
- -**13. Multiplication** - -⟶ - -
- -**14. Vector-vector ― There are two types of vector-vector products:** - -⟶ - -
- -**15. inner product: for x,y∈Rn, we have:** - -⟶ - -
- -**16. outer product: for x∈Rm,y∈Rn, we have:** - -⟶ - -
- -**17. Matrix-vector ― The product of matrix A∈Rm×n and vector x∈Rn is a vector of size Rm, such that:** - -⟶ - -<br>
- -**18. where aTr,i are the vector rows and ac,j are the vector columns of A, and xi are the entries of x.** - -⟶ - -
- -**19. Matrix-matrix ― The product of matrices A∈Rm×n and B∈Rn×p is a matrix of size Rm×p, such that:** - -⟶ - -<br>
- -**20. where aTr,i,bTr,i are the vector rows and ac,j,bc,j are the vector columns of A and B respectively** - -⟶ - -
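A quick NumPy check of the shapes produced by the products defined above (inner, outer, matrix-vector and matrix-matrix); the sizes are arbitrary.

```python
import numpy as np

x = np.ones(3); y = np.ones(4)
A = np.ones((2, 3)); B = np.ones((3, 4))

inner = x @ x             # scalar: x^T x
outer = np.outer(x, y)    # (3, 4) matrix: x y^T
Ax = A @ x                # (2,): A in R^{2x3} times x in R^3 gives a vector in R^2
AB = A @ B                # (2, 4): R^{2x3} times R^{3x4} gives a matrix in R^{2x4}
print(inner, outer.shape, Ax.shape, AB.shape)
```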
- -**21. Other operations** - -⟶ - -
- -**22. Transpose ― The transpose of a matrix A∈Rm×n, noted AT, is such that its entries are flipped:** - -⟶ - -
- -**23. Remark: for matrices A,B, we have (AB)T=BTAT** - -⟶ - -
- -**24. Inverse ― The inverse of an invertible square matrix A is noted A−1 and is the only matrix such that:** - -⟶ - -
- -**25. Remark: not all square matrices are invertible. Also, for matrices A,B, we have (AB)−1=B−1A−1** - -⟶ - -
- -**26. Trace ― The trace of a square matrix A, noted tr(A), is the sum of its diagonal entries:** - -⟶ - -
- -**27. Remark: for matrices A,B, we have tr(AT)=tr(A) and tr(AB)=tr(BA)** - -⟶ - -
- -**28. Determinant ― The determinant of a square matrix A∈Rn×n, noted |A| or det(A) is expressed recursively in terms of A∖i,∖j, which is the matrix A without its ith row and jth column, as follows:** - -⟶ - -
- -**29. Remark: A is invertible if and only if |A|≠0. Also, |AB|=|A||B| and |AT|=|A|.** - -⟶ - -
- -**30. Matrix properties** - -⟶ - -
- -**31. Definitions** - -⟶ - -
- -**32. Symmetric decomposition ― A given matrix A can be expressed in terms of its symmetric and antisymmetric parts as follows:** - -⟶ - -
- -**33. [Symmetric, Antisymmetric]** - -⟶ - -
- -**34. Norm ― A norm is a function N:V⟶[0,+∞[ where V is a vector space, and such that for all x,y∈V, we have:** - -⟶ - -
- -**35. N(ax)=|a|N(x) for a scalar** - -⟶ - -
- -**36. if N(x)=0, then x=0** - -⟶ - -
- -**37. For x∈V, the most commonly used norms are summed up in the table below:** - -⟶ - -
- -**38. [Norm, Notation, Definition, Use case]** - -⟶ - -
- -**39. Linear dependence ― A set of vectors is said to be linearly dependent if one of the vectors in the set can be defined as a linear combination of the others.** - -⟶ - -<br>
- -**40. Remark: if no vector can be written this way, then the vectors are said to be linearly independent** - -⟶ - -
- -**41. Matrix rank ― The rank of a given matrix A is noted rank(A) and is the dimension of the vector space generated by its columns. This is equivalent to the maximum number of linearly independent columns of A.** - -⟶ - -
- -**42. Positive semi-definite matrix ― A matrix A∈Rn×n is positive semi-definite (PSD) and is noted A⪰0 if we have:** - -⟶ - -
- -**43. Remark: similarly, a matrix A is said to be positive definite, and is noted A≻0, if it is a PSD matrix which satisfies for all non-zero vector x, xTAx>0.** - -⟶ - -
- -**44. Eigenvalue, eigenvector ― Given a matrix A∈Rn×n, λ is said to be an eigenvalue of A if there exists a vector z∈Rn∖{0}, called eigenvector, such that we have:** - -⟶ - -
- -**45. Spectral theorem ― Let A∈Rn×n. If A is symmetric, then A is diagonalizable by a real orthogonal matrix U∈Rn×n. By noting Λ=diag(λ1,...,λn), we have:** - -⟶ - -
- -**46. diagonal** - -⟶ - -
- -**47. Singular-value decomposition ― For a given matrix A of dimensions m×n, the singular-value decomposition (SVD) is a factorization technique that guarantees the existence of U m×m unitary, Σ m×n diagonal and V n×n unitary matrices, such that:** - -⟶ - -
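A short numerical check that the SVD factors reconstruct A; the random 4×3 matrix is illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(4, 3))
U, s, Vt = np.linalg.svd(A, full_matrices=True)   # U: 4x4, s: singular values, Vt: 3x3

Sigma = np.zeros((4, 3))                          # rectangular "diagonal" matrix
Sigma[:3, :3] = np.diag(s)
print(np.allclose(A, U @ Sigma @ Vt))             # True: A = U Sigma V^T
```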
- -**48. Matrix calculus** - -⟶ - -
- -**49. Gradient ― Let f:Rm×n→R be a function and A∈Rm×n be a matrix. The gradient of f with respect to A is a m×n matrix, noted ∇Af(A), such that:** - -⟶ - -
- -**50. Remark: the gradient of f is only defined when f is a function that returns a scalar.** - -⟶ - -
- -**51. Hessian ― Let f:Rn→R be a function and x∈Rn be a vector. The hessian of f with respect to x is a n×n symmetric matrix, noted ∇2xf(x), such that:** - -⟶ - -
- -**52. Remark: the hessian of f is only defined when f is a function that returns a scalar** - -⟶ - -
- -**53. Gradient operations ― For matrices A,B,C, the following gradient properties are worth having in mind:** - -⟶ - -
- -**54. [General notations, Definitions, Main matrices]** - -⟶ - -
- -**55. [Matrix operations, Multiplication, Other operations]** - -⟶ - -
- -**56. [Matrix properties, Norm, Eigenvalue/Eigenvector, Singular-value decomposition]** - -⟶ - -
- -**57. [Matrix calculus, Gradient, Hessian, Operations]** - -⟶ diff --git a/de/refresher-probability.md b/de/refresher-probability.md deleted file mode 100644 index 5c9b34656..000000000 --- a/de/refresher-probability.md +++ /dev/null @@ -1,381 +0,0 @@ -**1. Probabilities and Statistics refresher** - -⟶ - -
- -**2. Introduction to Probability and Combinatorics** - -⟶ - -
- -**3. Sample space ― The set of all possible outcomes of an experiment is known as the sample space of the experiment and is denoted by S.** - -⟶ - -
- -**4. Event ― Any subset E of the sample space is known as an event. That is, an event is a set consisting of possible outcomes of the experiment. If the outcome of the experiment is contained in E, then we say that E has occurred.** - -⟶ - -
- -**5. Axioms of probability ― For each event E, we denote P(E) as the probability of event E occurring.** - -⟶ - -<br>
- -**6. Axiom 1 ― Every probability is between 0 and 1 included, i.e:** - -⟶ - -
- -**7. Axiom 2 ― The probability that at least one of the elementary events in the entire sample space will occur is 1, i.e:** - -⟶ - -
- -**8. Axiom 3 ― For any sequence of mutually exclusive events E1,...,En, we have:** - -⟶ - -
- -**9. Permutation ― A permutation is an arrangement of r objects from a pool of n objects, in a given order. The number of such arrangements is given by P(n,r), defined as:** - -⟶ - -
- -**10. Combination ― A combination is an arrangement of r objects from a pool of n objects, where the order does not matter. The number of such arrangements is given by C(n,r), defined as:** - -⟶ - -
- -**11. Remark: we note that for 0⩽r⩽n, we have P(n,r)⩾C(n,r)** - -⟶ - -
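A small numerical check of P(n,r) and C(n,r), assuming Python 3.8+ for math.perm and math.comb; the values n=5, r=3 are arbitrary.

```python
import math

n, r = 5, 3
P = math.perm(n, r)    # 60 ordered arrangements: n! / (n - r)!
C = math.comb(n, r)    # 10 unordered arrangements: n! / (r! (n - r)!)
print(P, C, P >= C)    # P(n,r) >= C(n,r) holds for 0 <= r <= n
```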
- -**12. Conditional Probability** - -⟶ - -
- -**13. Bayes' rule ― For events A and B such that P(B)>0, we have:** - -⟶ - -
- -**14. Remark: we have P(A∩B)=P(A)P(B|A)=P(A|B)P(B)** - -⟶ - -
- -**15. Partition ― Let {Ai,i∈[[1,n]]} be such that for all i, Ai≠∅. We say that {Ai} is a partition if we have:** - -⟶ - -
- -**16. Remark: for any event B in the sample space, we have P(B)=n∑i=1P(B|Ai)P(Ai).** - -⟶ - -
- -**17. Extended form of Bayes' rule ― Let {Ai,i∈[[1,n]]} be a partition of the sample space. We have:** - -⟶ - -
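A tiny numeric illustration of the law of total probability and of the extended form of Bayes' rule on a two-event partition; the probabilities are made up.

```python
# Partition {A1, A2} of the sample space, with an event B observed.
p_A = [0.3, 0.7]            # P(A1), P(A2)
p_B_given_A = [0.9, 0.2]    # P(B|A1), P(B|A2)

p_B = sum(pa * pb for pa, pb in zip(p_A, p_B_given_A))   # P(B) = sum_i P(B|Ai) P(Ai) = 0.41
p_A1_given_B = p_A[0] * p_B_given_A[0] / p_B             # Bayes' rule: P(A1|B) ~= 0.659
print(p_B, p_A1_given_B)
```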
- -**18. Independence ― Two events A and B are independent if and only if we have:** - -⟶ - -
- -**19. Random Variables** - -⟶ - -
- -**20. Definitions** - -⟶ - -
- -**21. Random variable ― A random variable, often noted X, is a function that maps every element in a sample space to a real line.** - -⟶ - -
- -**22. Cumulative distribution function (CDF) ― The cumulative distribution function F, which is monotonically non-decreasing and is such that limx→−∞F(x)=0 and limx→+∞F(x)=1, is defined as:** - -⟶ - -
- -**23. Remark: we have P(a - -**24. Probability density function (PDF) ― The probability density function f is the probability that X takes on values between two adjacent realizations of the random variable.** - -⟶ - -
- -**25. Relationships involving the PDF and CDF ― Here are the important properties to know in the discrete (D) and the continuous (C) cases.** - -⟶ - -
- -**26. [Case, CDF F, PDF f, Properties of PDF]** - -⟶ - -
- -**27. Expectation and Moments of the Distribution ― Here are the expressions of the expected value E[X], generalized expected value E[g(X)], kth moment E[Xk] and characteristic function ψ(ω) for the discrete and continuous cases:** - -⟶ - -
- -**28. Variance ― The variance of a random variable, often noted Var(X) or σ2, is a measure of the spread of its distribution function. It is determined as follows:** - -⟶ - -
- -**29. Standard deviation ― The standard deviation of a random variable, often noted σ, is a measure of the spread of its distribution function which is compatible with the units of the actual random variable. It is determined as follows:** - -⟶ - -
- -**30. Transformation of random variables ― Let the variables X and Y be linked by some function. By noting fX and fY the distribution function of X and Y respectively, we have:** - -⟶ - -
- -**31. Leibniz integral rule ― Let g be a function of x and potentially c, and a,b boundaries that may depend on c. We have:** - -⟶ - -
- -**32. Probability Distributions** - -⟶ - -
- -**33. Chebyshev's inequality ― Let X be a random variable with expected value μ. For k,σ>0, we have the following inequality:** - -⟶ - -
- -**34. Main distributions ― Here are the main distributions to have in mind:** - -⟶ - -
- -**35. [Type, Distribution]** - -⟶ - -
- -**36. Jointly Distributed Random Variables** - -⟶ - -
- -**37. Marginal density and cumulative distribution ― From the joint density probability function fXY , we have** - -⟶ - -
- -**38. [Case, Marginal density, Cumulative function]** - -⟶ - -
- -**39. Conditional density ― The conditional density of X with respect to Y, often noted fX|Y, is defined as follows:** - -⟶ - -
- -**40. Independence ― Two random variables X and Y are said to be independent if we have:** - -⟶ - -
- -**41. Covariance ― We define the covariance of two random variables X and Y, that we note σ2XY or more commonly Cov(X,Y), as follows:** - -⟶ - -
- -**42. Correlation ― By noting σX,σY the standard deviations of X and Y, we define the correlation between the random variables X and Y, noted ρXY, as follows:** - -⟶ - -
- -**43. Remark 1: we note that for any random variables X,Y, we have ρXY∈[−1,1].** - -⟶ - -
- -**44. Remark 2: If X and Y are independent, then ρXY=0.** - -⟶ - -
- -**45. Parameter estimation** - -⟶ - -
- -**46. Definitions** - -⟶ - -
- -**47. Random sample ― A random sample is a collection of n random variables X1,...,Xn that are independent and identically distributed with X.** - -⟶ - -
- -**48. Estimator ― An estimator is a function of the data that is used to infer the value of an unknown parameter in a statistical model.** - -⟶ - -
- -**49. Bias ― The bias of an estimator ^θ is defined as being the difference between the expected value of the distribution of ^θ and the true value, i.e.:** - -⟶ - -
- -**50. Remark: an estimator is said to be unbiased when we have E[^θ]=θ.** - -⟶ - -
- -**51. Estimating the mean** - -⟶ - -
- -**52. Sample mean ― The sample mean of a random sample is used to estimate the true mean μ of a distribution, is often noted ¯¯¯¯¯X and is defined as follows:** - -⟶ - -
- -**53. Remark: the sample mean is unbiased, i.e E[¯¯¯¯¯X]=μ.** - -⟶ - -
- -**54. Central Limit Theorem ― Let us have a random sample X1,...,Xn following a given distribution with mean μ and variance σ2, then we have:** - -⟶ - -
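A quick simulation of the Central Limit Theorem: the standardized sample mean of i.i.d. exponential draws is approximately standard normal for moderately large n; the distribution, n and number of trials are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma = 1.0, 1.0          # mean and standard deviation of Exp(1)
n, trials = 100, 10000

samples = rng.exponential(scale=1.0, size=(trials, n))
z = (samples.mean(axis=1) - mu) / (sigma / np.sqrt(n))   # standardized sample mean
print(z.mean(), z.std())      # both close to 0 and 1, as the theorem predicts
```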
- -**55. Estimating the variance** - -⟶ - -
- -**56. Sample variance ― The sample variance of a random sample is used to estimate the true variance σ2 of a distribution, is often noted s2 or ^σ2 and is defined as follows:** - -⟶ - -
- -**57. Remark: the sample variance is unbiased, i.e E[s2]=σ2.** - -⟶ - -
- -**58. Chi-Squared relation with sample variance ― Let s2 be the sample variance of a random sample. We have:** - -⟶ - -
- -**59. [Introduction, Sample space, Event, Permutation]** - -⟶ - -
- -**60. [Conditional probability, Bayes' rule, Independence]** - -⟶ - -
- -**61. [Random variables, Definitions, Expectation, Variance]** - -⟶ - -
- -**62. [Probability distributions, Chebyshev's inequality, Main distributions]** - -⟶ - -
- -**63. [Jointly distributed random variables, Density, Covariance, Correlation]** - -⟶ - -
- -**64. [Parameter estimation, Mean, Variance]** - -⟶ diff --git a/he/cheatsheet-deep-learning.md b/he/cheatsheet-deep-learning.md deleted file mode 100644 index a5aa3756c..000000000 --- a/he/cheatsheet-deep-learning.md +++ /dev/null @@ -1,321 +0,0 @@ -**1. Deep Learning cheatsheet** - -⟶ - -
- -**2. Neural Networks** - -⟶ - -
- -**3. Neural networks are a class of models that are built with layers. Commonly used types of neural networks include convolutional and recurrent neural networks.** - -⟶ - -
- -**4. Architecture ― The vocabulary around neural networks architectures is described in the figure below:** - -⟶ - -
- -**5. [Input layer, hidden layer, output layer]** - -⟶ - -
- -**6. By noting i the ith layer of the network and j the jth hidden unit of the layer, we have:** - -⟶ - -
- -**7. where we note w, b, z the weight, bias and output respectively.** - -⟶ - -
- -**8. Activation function ― Activation functions are used at the end of a hidden unit to introduce non-linear complexities to the model. Here are the most common ones:** - -⟶ - -
- -**9. [Sigmoid, Tanh, ReLU, Leaky ReLU]** - -⟶ - -
- -**10. Cross-entropy loss ― In the context of neural networks, the cross-entropy loss L(z,y) is commonly used and is defined as follows:** - -⟶ - -
- -**11. Learning rate ― The learning rate, often noted α or sometimes η, indicates at which pace the weights get updated. This can be fixed or adaptively changed. The current most popular method is called Adam, which is a method that adapts the learning rate.** - -⟶ - -
- -**12. Backpropagation ― Backpropagation is a method to update the weights in the neural network by taking into account the actual output and the desired output. The derivative with respect to weight w is computed using chain rule and is of the following form:** - -⟶ - -
- -**13. As a result, the weight is updated as follows:** - -⟶ - -
- -**14. Updating weights ― In a neural network, weights are updated as follows:** - -⟶ - -
- -**15. Step 1: Take a batch of training data.** - -⟶ - -
- -**16. Step 2: Perform forward propagation to obtain the corresponding loss.** - -⟶ - -
- -**17. Step 3: Backpropagate the loss to get the gradients.** - -⟶ - -
- -**18. Step 4: Use the gradients to update the weights of the network.** - -⟶ - -
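A minimal NumPy sketch of steps 1 to 4 above for a one-hidden-layer network with sigmoid activations and the cross-entropy loss; the layer sizes, batch and learning rate are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(16, 3))                          # step 1: a batch of training data
y = (X[:, 0] > 0).astype(float).reshape(-1, 1)

W1, b1 = rng.normal(size=(3, 4)) * 0.1, np.zeros(4)
W2, b2 = rng.normal(size=(4, 1)) * 0.1, np.zeros(1)
sigmoid = lambda u: 1 / (1 + np.exp(-u))
alpha = 0.1

a1 = sigmoid(X @ W1 + b1)                             # step 2: forward propagation
a2 = sigmoid(a1 @ W2 + b2)
loss = -(y * np.log(a2) + (1 - y) * np.log(1 - a2)).mean()

dz2 = (a2 - y) / len(X)                               # step 3: backpropagate the loss
dW2, db2 = a1.T @ dz2, dz2.sum(axis=0)
dz1 = (dz2 @ W2.T) * a1 * (1 - a1)
dW1, db1 = X.T @ dz1, dz1.sum(axis=0)

W1 -= alpha * dW1; b1 -= alpha * db1                  # step 4: update the weights
W2 -= alpha * dW2; b2 -= alpha * db2
print(loss)
```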
- -**19. Dropout ― Dropout is a technique meant to prevent overfitting of the training data by dropping out units in a neural network. In practice, neurons are either dropped with probability p or kept with probability 1−p** - -⟶ - -<br>
- -**20. Convolutional Neural Networks** - -⟶ - -
- -**21. Convolutional layer requirement ― By noting W the input volume size, F the size of the convolutional layer neurons, P the amount of zero padding, then the number of neurons N that fit in a given volume is such that:** - -⟶ - -
- -**22. Batch normalization ― It is a step of hyperparameters γ,β that normalizes the batch {xi}. By noting μB,σ2B the mean and variance of the batch that we want to correct, it is done as follows:** - -⟶ - -<br>
- -**23. It is usually done after a fully connected/convolutional layer and before a non-linearity layer and aims at allowing higher learning rates and reducing the strong dependence on initialization.** - -⟶ - -
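A minimal sketch of the batch normalization step described above, assuming per-feature statistics over the batch and learnable γ, β (set to 1 and 0 here); the ε term and sizes are illustrative.

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    mu_b = x.mean(axis=0)                       # batch mean per feature
    var_b = x.var(axis=0)                       # batch variance per feature
    x_hat = (x - mu_b) / np.sqrt(var_b + eps)   # normalize the batch
    return gamma * x_hat + beta                 # scale and shift

x = np.random.default_rng(0).normal(loc=5.0, scale=3.0, size=(32, 4))
out = batch_norm(x, gamma=np.ones(4), beta=np.zeros(4))
print(out.mean(axis=0), out.std(axis=0))        # approximately 0 and 1 per feature
```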
- -**24. Recurrent Neural Networks** - -⟶ - -
- -**25. Types of gates ― Here are the different types of gates that we encounter in a typical recurrent neural network:** - -⟶ - -
- -**26. [Input gate, forget gate, gate, output gate]** - -⟶ - -
- -**27. [Write to cell or not?, Erase a cell or not?, How much to write to cell?, How much to reveal cell?]** - -⟶ - -
- -**28. LSTM ― A long short-term memory (LSTM) network is a type of RNN model that avoids the vanishing gradient problem by adding 'forget' gates.** - -⟶ - -
- -**29. Reinforcement Learning and Control** - -⟶ - -
- -**30. The goal of reinforcement learning is for an agent to learn how to evolve in an environment.** - -⟶ - -
- -**31. Definitions** - -⟶ - -
- -**32. Markov decision processes ― A Markov decision process (MDP) is a 5-tuple (S,A,{Psa},γ,R) where:** - -⟶ - -
- -**33. S is the set of states** - -⟶ - -
- -**34. A is the set of actions** - -⟶ - -
- -**35. {Psa} are the state transition probabilities for s∈S and a∈A** - -⟶ - -
- -**36. γ∈[0,1[ is the discount factor** - -⟶ - -
- -**37. R:S×A⟶R or R:S⟶R is the reward function that the algorithm wants to maximize** - -⟶ - -
- -**38. Policy ― A policy π is a function π:S⟶A that maps states to actions.** - -⟶ - -
- -**39. Remark: we say that we execute a given policy π if given a state s we take the action a=π(s).** - -⟶ - -
- -**40. Value function ― For a given policy π and a given state s, we define the value function Vπ as follows:** - -⟶ - -
- -**41. Bellman equation ― The optimal Bellman equations characterizes the value function Vπ∗ of the optimal policy π∗:** - -⟶ - -
- -**42. Remark: we note that the optimal policy π∗ for a given state s is such that:** - -⟶ - -
- -**43. Value iteration algorithm ― The value iteration algorithm is in two steps:** - -⟶ - -
- -**44. 1) We initialize the value:** - -⟶ - -
- -**45. 2) We iterate the value based on the values before:** - -⟶ - -
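A small sketch of the two value iteration steps on a toy randomly generated MDP; the transition probabilities, rewards and γ are illustrative.

```python
import numpy as np

nS, nA, gamma = 3, 2, 0.9
rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(nS), size=(nS, nA))   # P[s, a] = distribution over next states
R = rng.normal(size=(nS, nA))                   # reward R(s, a)

V = np.zeros(nS)                                # 1) initialize the value
for _ in range(1000):
    Q = R + gamma * P @ V                       # R(s,a) + gamma * sum_s' P(s'|s,a) V(s')
    V_new = Q.max(axis=1)                       # 2) iterate the value (Bellman update)
    if np.max(np.abs(V_new - V)) < 1e-8:
        break
    V = V_new

policy = Q.argmax(axis=1)                       # greedy policy from the converged values
print(V, policy)
```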
- -**46. Maximum likelihood estimate ― The maximum likelihood estimates for the state transition probabilities are as follows:** - -⟶ - -
- -**47. times took action a in state s and got to s′** - -⟶ - -
- -**48. times took action a in state s** - -⟶ - -
- -**49. Q-learning ― Q-learning is a model-free estimation of Q, which is done as follows:** - -⟶ - -
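A minimal sketch of a single model-free Q-learning update, assuming the usual temporal-difference form Q(s,a) ← Q(s,a) + α[r + γ maxa′ Q(s′,a′) − Q(s,a)]; the observed transition and the constants are illustrative.

```python
import numpy as np

nS, nA = 3, 2
Q = np.zeros((nS, nA))
alpha, gamma = 0.1, 0.9

def q_update(Q, s, a, r, s_next):
    target = r + gamma * Q[s_next].max()      # bootstrap from the best next action
    Q[s, a] += alpha * (target - Q[s, a])     # move towards the target
    return Q

Q = q_update(Q, s=0, a=1, r=1.0, s_next=2)    # one observed transition (s, a, r, s')
print(Q)
```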
- -**50. View PDF version on GitHub** - -⟶ - -
- -**51. [Neural Networks, Architecture, Activation function, Backpropagation, Dropout]** - -⟶ - -
- -**52. [Convolutional Neural Networks, Convolutional layer, Batch normalization]** - -⟶ - -
- -**53. [Recurrent Neural Networks, Gates, LSTM]** - -⟶ - -
- -**54. [Reinforcement learning, Markov decision processes, Value/policy iteration, Approximate dynamic programming, Policy search]** - -⟶ diff --git a/he/cheatsheet-machine-learning-tips-and-tricks.md b/he/cheatsheet-machine-learning-tips-and-tricks.md deleted file mode 100644 index 9712297b8..000000000 --- a/he/cheatsheet-machine-learning-tips-and-tricks.md +++ /dev/null @@ -1,285 +0,0 @@ -**1. Machine Learning tips and tricks cheatsheet** - -⟶ - -
- -**2. Classification metrics** - -⟶ - -
- -**3. In a context of a binary classification, here are the main metrics that are important to track in order to assess the performance of the model.** - -⟶ - -
- -**4. Confusion matrix ― The confusion matrix is used to have a more complete picture when assessing the performance of a model. It is defined as follows:** - -⟶ - -
- -**5. [Predicted class, Actual class]** - -⟶ - -
- -**6. Main metrics ― The following metrics are commonly used to assess the performance of classification models:** - -⟶ - -
- -**7. [Metric, Formula, Interpretation]** - -⟶ - -
- -**8. Overall performance of model** - -⟶ - -
- -**9. How accurate the positive predictions are** - -⟶ - -
- -**10. Coverage of actual positive sample** - -⟶ - -
- -**11. Coverage of actual negative sample** - -⟶ - -
- -**12. Hybrid metric useful for unbalanced classes** - -⟶ - -
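A small worked example computing the metrics above from the counts of a binary confusion matrix; the TP/FP/FN/TN values are made up.

```python
TP, FP, FN, TN = 40, 10, 5, 45

accuracy = (TP + TN) / (TP + TN + FP + FN)            # overall performance of the model
precision = TP / (TP + FP)                            # how accurate the positive predictions are
recall = TP / (TP + FN)                               # coverage of actual positive samples
specificity = TN / (TN + FP)                          # coverage of actual negative samples
f1 = 2 * precision * recall / (precision + recall)    # hybrid metric for unbalanced classes
print(accuracy, precision, recall, specificity, f1)
```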
- -**13. ROC ― The receiver operating curve, also noted ROC, is the plot of TPR versus FPR by varying the threshold. These metrics are summed up in the table below:** - -⟶ - -<br>
- -**14. [Metric, Formula, Equivalent]** - -⟶ - -
- -**15. AUC ― The area under the receiver operating curve, also noted AUC or AUROC, is the area below the ROC as shown in the following figure:** - -⟶ - -<br>
- -**16. [Actual, Predicted]** - -⟶ - -
- -**17. Basic metrics ― Given a regression model f, the following metrics are commonly used to assess the performance of the model:** - -⟶ - -
- -**18. [Total sum of squares, Explained sum of squares, Residual sum of squares]** - -⟶ - -
- -**19. Coefficient of determination ― The coefficient of determination, often noted R2 or r2, provides a measure of how well the observed outcomes are replicated by the model and is defined as follows:** - -⟶ - -
- -**20. Main metrics ― The following metrics are commonly used to assess the performance of regression models, by taking into account the number of variables n that they take into consideration:** - -⟶ - -
- -**21. where L is the likelihood and ˆσ2 is an estimate of the variance associated with each response.** - -⟶ - -
- -**22. Model selection** - -⟶ - -
- -**23. Vocabulary ― When selecting a model, we distinguish 3 different parts of the data that we have as follows:** - -⟶ - -
- -**24. [Training set, Validation set, Testing set]** - -⟶ - -
- -**25. [Model is trained, Model is assessed, Model gives predictions]** - -⟶ - -
- -**26. [Usually 80% of the dataset, Usually 20% of the dataset]** - -⟶ - -
- -**27. [Also called hold-out or development set, Unseen data]** - -⟶ - -
- -**28. Once the model has been chosen, it is trained on the entire dataset and tested on the unseen test set. These are represented in the figure below:** - -⟶ - -
- -**29. Cross-validation ― Cross-validation, also noted CV, is a method that is used to select a model that does not rely too much on the initial training set. The different types are summed up in the table below:** - -⟶ - -
- -**30. [Training on k−1 folds and assessment on the remaining one, Training on n−p observations and assessment on the p remaining ones]** - -⟶ - -
- -**31. [Generally k=5 or 10, Case p=1 is called leave-one-out]** - -⟶ - -
- -**32. The most commonly used method is called k-fold cross-validation and splits the training data into k folds to validate the model on one fold while training the model on the k−1 other folds, all of this k times. The error is then averaged over the k folds and is named cross-validation error.** - -⟶ - -
- -**33. Regularization ― The regularization procedure aims at preventing the model from overfitting the data and thus deals with high variance issues. The following table sums up the different types of commonly used regularization techniques:** - -⟶ - -<br>
- -**34. [Shrinks coefficients to 0, Good for variable selection, Makes coefficients smaller, Tradeoff between variable selection and small coefficients]** - -⟶ - -
- -**35. Diagnostics** - -⟶ - -
- -**36. Bias ― The bias of a model is the difference between the expected prediction and the correct model that we try to predict for given data points.** - -⟶ - -
- -**37. Variance ― The variance of a model is the variability of the model prediction for given data points.** - -⟶ - -
- -**38. Bias/variance tradeoff ― The simpler the model, the higher the bias, and the more complex the model, the higher the variance.** - -⟶ - -
- -**39. [Symptoms, Regression illustration, classification illustration, deep learning illustration, possible remedies]** - -⟶ - -
- -**40. [High training error, Training error close to test error, High bias, Training error slightly lower than test error, Very low training error, Training error much lower than test error, High variance]** - -⟶ - -
- -**41. [Complexify model, Add more features, Train longer, Perform regularization, Get more data]** - -⟶ - -
- -**42. Error analysis ― Error analysis is analyzing the root cause of the difference in performance between the current and the perfect models.** - -⟶ - -
- -**43. Ablative analysis ― Ablative analysis is analyzing the root cause of the difference in performance between the current and the baseline models.** - -⟶ - -
- -**44. Regression metrics** - -⟶ - -
- -**45. [Classification metrics, confusion matrix, accuracy, precision, recall, F1 score, ROC]** - -⟶ - -
- -**46. [Regression metrics, R squared, Mallow's CP, AIC, BIC]** - -⟶ - -
- -**47. [Model selection, cross-validation, regularization]** - -⟶ - -
- -**48. [Diagnostics, Bias/variance tradeoff, error/ablative analysis]** - -⟶ diff --git a/he/cheatsheet-supervised-learning.md b/he/cheatsheet-supervised-learning.md deleted file mode 100644 index a6b19ea1c..000000000 --- a/he/cheatsheet-supervised-learning.md +++ /dev/null @@ -1,567 +0,0 @@ -**1. Supervised Learning cheatsheet** - -⟶ - -
- -**2. Introduction to Supervised Learning** - -⟶ - -
- -**3. Given a set of data points {x(1),...,x(m)} associated to a set of outcomes {y(1),...,y(m)}, we want to build a classifier that learns how to predict y from x.** - -⟶ - -
- -**4. Type of prediction ― The different types of predictive models are summed up in the table below:** - -⟶ - -
- -**5. [Regression, Classifier, Outcome, Examples]** - -⟶ - -
- -**6. [Continuous, Class, Linear regression, Logistic regression, SVM, Naive Bayes]** - -⟶ - -
- -**7. Type of model ― The different models are summed up in the table below:** - -⟶ - -
- -**8. [Discriminative model, Generative model, Goal, What's learned, Illustration, Examples]** - -⟶ - -
- -**9. [Directly estimate P(y|x), Estimate P(x|y) to then deduce P(y|x), Decision boundary, Probability distributions of the data, Regressions, SVMs, GDA, Naive Bayes]** - -⟶ - -
- -**10. Notations and general concepts** - -⟶ - -
- -**11. Hypothesis ― The hypothesis is noted hθ and is the model that we choose. For a given input data x(i) the model prediction output is hθ(x(i)).** - -⟶ - -
- -**12. Loss function ― A loss function is a function L:(z,y)∈R×Y⟼L(z,y)∈R that takes as inputs the predicted value z corresponding to the real data value y and outputs how different they are. The common loss functions are summed up in the table below:** - -⟶ - -
- -**13. [Least squared error, Logistic loss, Hinge loss, Cross-entropy]** - -⟶ - -
- -**14. [Linear regression, Logistic regression, SVM, Neural Network]** - -⟶ - -
- -**15. Cost function ― The cost function J is commonly used to assess the performance of a model, and is defined with the loss function L as follows:** - -⟶ - -
- -**16. Gradient descent ― By noting α∈R the learning rate, the update rule for gradient descent is expressed with the learning rate and the cost function J as follows:** - -⟶ - -
- -**17. Remark: Stochastic gradient descent (SGD) is updating the parameter based on each training example, and batch gradient descent is on a batch of training examples.** - -⟶ - -
- -**18. Likelihood ― The likelihood of a model L(θ) given parameters θ is used to find the optimal parameters θ through maximizing the likelihood. In practice, we use the log-likelihood ℓ(θ)=log(L(θ)) which is easier to optimize. We have:** - -⟶ - -
- -**19. Newton's algorithm ― The Newton's algorithm is a numerical method that finds θ such that ℓ′(θ)=0. Its update rule is as follows:** - -⟶ - -
- -**20. Remark: the multidimensional generalization, also known as the Newton-Raphson method, has the following update rule:** - -⟶ - -
- -**21. Linear models** - -⟶ - -
- -**22. Linear regression** - -⟶ - -
- -**23. We assume here that y|x;θ∼N(μ,σ2)** - -⟶ - -
- -**24. Normal equations ― By noting X the design matrix, the value of θ that minimizes the cost function is a closed-form solution such that:** - -⟶ - -<br>
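A short NumPy check of the closed-form solution θ = (XTX)−1XTy on synthetic data (a pseudo-inverse is used for numerical safety); the design matrix and noise level are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.hstack([np.ones((50, 1)), rng.normal(size=(50, 2))])   # design matrix with intercept
true_theta = np.array([1.0, 2.0, -3.0])
y = X @ true_theta + 0.1 * rng.normal(size=50)                # noisy targets

theta = np.linalg.pinv(X.T @ X) @ X.T @ y                     # normal equations
print(theta)                                                  # close to [1, 2, -3]
```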
- -**25. LMS algorithm ― By noting α the learning rate, the update rule of the Least Mean Squares (LMS) algorithm for a training set of m data points, which is also known as the Widrow-Hoff learning rule, is as follows:** - -⟶ - -
- -**26. Remark: the update rule is a particular case of the gradient ascent.** - -⟶ - -
- -**27. LWR ― Locally Weighted Regression, also known as LWR, is a variant of linear regression that weights each training example in its cost function by w(i)(x), which is defined with parameter τ∈R as:** - -⟶ - -
- -**28. Classification and logistic regression** - -⟶ - -
- -**29. Sigmoid function ― The sigmoid function g, also known as the logistic function, is defined as follows:** - -⟶ - -
- -**30. Logistic regression ― We assume here that y|x;θ∼Bernoulli(ϕ). We have the following form:** - -⟶ - -
- -**31. Remark: there is no closed form solution for the case of logistic regressions.** - -⟶ - -
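Since no closed form exists, here is a minimal sketch of fitting logistic regression by gradient ascent on the log-likelihood; the synthetic data, learning rate and iteration count are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.hstack([np.ones((200, 1)), rng.normal(size=(200, 2))])
true_theta = np.array([0.5, 2.0, -1.0])
sigmoid = lambda u: 1 / (1 + np.exp(-u))
y = (rng.random(200) < sigmoid(X @ true_theta)).astype(float)   # Bernoulli labels

theta = np.zeros(3)
alpha = 0.1
for _ in range(2000):
    theta += alpha * X.T @ (y - sigmoid(X @ theta)) / len(y)    # gradient of the log-likelihood
print(theta)   # roughly recovers true_theta
```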
- -**32. Softmax regression ― A softmax regression, also called a multiclass logistic regression, is used to generalize logistic regression when there are more than 2 outcome classes. By convention, we set θK=0, which makes the Bernoulli parameter ϕi of each class i equal to:** - -⟶ - -
- -**33. Generalized Linear Models** - -⟶ - -
- -**34. Exponential family ― A class of distributions is said to be in the exponential family if it can be written in terms of a natural parameter, also called the canonical parameter or link function, η, a sufficient statistic T(y) and a log-partition function a(η) as follows:** - -⟶ - -
- -**35. Remark: we will often have T(y)=y. Also, exp(−a(η)) can be seen as a normalization parameter that will make sure that the probabilities sum to one.** - -⟶ - -
- -**36. Here are the most common exponential distributions summed up in the following table:** - -⟶ - -
- -**37. [Distribution, Bernoulli, Gaussian, Poisson, Geometric]** - -⟶ - -
- -**38. Assumptions of GLMs ― Generalized Linear Models (GLM) aim at predicting a random variable y as a function of x∈Rn+1 and rely on the following 3 assumptions:** - -⟶ - -<br>
- -**39. Remark: ordinary least squares and logistic regression are special cases of generalized linear models.** - -⟶ - -
- -**40. Support Vector Machines** - -⟶ - -
- -**41: The goal of support vector machines is to find the line that maximizes the minimum distance to the line.** - -⟶ - -
- -**42: Optimal margin classifier ― The optimal margin classifier h is such that:** - -⟶ - -
- -**43: where (w,b)∈Rn×R is the solution of the following optimization problem:** - -⟶ - -
- -**44. such that** - -⟶ - -
- -**45. support vectors** - -⟶ - -
- -**46. Remark: the line is defined as wTx−b=0.** - -⟶ - -
- -**47. Hinge loss ― The hinge loss is used in the setting of SVMs and is defined as follows:** - -⟶ - -
- -**48. Kernel ― Given a feature mapping ϕ, we define the kernel K to be defined as:** - -⟶ - -
- -**49. In practice, the kernel K defined by K(x,z)=exp(−||x−z||22σ2) is called the Gaussian kernel and is commonly used.** - -⟶ - -
- -**50. [Non-linear separability, Use of a kernel mapping, Decision boundary in the original space]** - -⟶ - -
- -**51. Remark: we say that we use the "kernel trick" to compute the cost function using the kernel because we actually don't need to know the explicit mapping ϕ, which is often very complicated. Instead, only the values K(x,z) are needed.** - -⟶ - -
- -**52. Lagrangian ― We define the Lagrangian L(w,b) as follows:** - -⟶ - -
- -**53. Remark: the coefficients βi are called the Lagrange multipliers.** - -⟶ - -
- -**54. Generative Learning** - -⟶ - -
- -**55. A generative model first tries to learn how the data is generated by estimating P(x|y), which we can then use to estimate P(y|x) by using Bayes' rule.** - -⟶ - -
- -**56. Gaussian Discriminant Analysis** - -⟶ - -
- -**57. Setting ― The Gaussian Discriminant Analysis assumes that y and x|y=0 and x|y=1 are such that:** - -⟶ - -
- -**58. Estimation ― The following table sums up the estimates that we find when maximizing the likelihood:** - -⟶ - -
- -**59. Naive Bayes** - -⟶ - -
- -**60. Assumption ― The Naive Bayes model supposes that the features of each data point are all independent:** - -⟶ - -
- -**61. Solutions ― Maximizing the log-likelihood gives the following solutions, with k∈{0,1},l∈[[1,L]]** - -⟶ - -
- -**62. Remark: Naive Bayes is widely used for text classification and spam detection.** - -⟶ - -
- -**63. Tree-based and ensemble methods** - -⟶ - -
- -**64. These methods can be used for both regression and classification problems.** - -⟶ - -
- -**65. CART ― Classification and Regression Trees (CART), commonly known as decision trees, can be represented as binary trees. They have the advantage of being very interpretable.** - -⟶ - -<br>
- -**66. Random forest ― It is a tree-based technique that uses a high number of decision trees built out of randomly selected sets of features. Contrary to the simple decision tree, it is highly uninterpretable but its generally good performance makes it a popular algorithm.** - -⟶ - -
- -**67. Remark: random forests are a type of ensemble methods.** - -⟶ - -
- -**68. Boosting ― The idea of boosting methods is to combine several weak learners to form a stronger one. The main ones are summed up in the table below:** - -⟶ - -
- -**69. [Adaptive boosting, Gradient boosting]** - -⟶ - -
- -**70. High weights are put on errors to improve at the next boosting step** - -⟶ - -
- -**71. Weak learners trained on remaining errors** - -⟶ - -
- -**72. Other non-parametric approaches** - -⟶ - -
- -**73. k-nearest neighbors ― The k-nearest neighbors algorithm, commonly known as k-NN, is a non-parametric approach where the response of a data point is determined by the nature of its k neighbors from the training set. It can be used in both classification and regression settings.** - -⟶ - -
- -**74. Remark: The higher the parameter k, the higher the bias, and the lower the parameter k, the higher the variance.** - -⟶ - -
- -**75. Learning Theory** - -⟶ - -
- -**76. Union bound ― Let A1,...,Ak be k events. We have:** - -⟶ - -
- -**77. Hoeffding inequality ― Let Z1,..,Zm be m iid variables drawn from a Bernoulli distribution of parameter ϕ. Let ˆϕ be their sample mean and γ>0 fixed. We have:** - -⟶ - -
- -**78. Remark: this inequality is also known as the Chernoff bound.** - -⟶ - -
- -**79. Training error ― For a given classifier h, we define the training error ˆϵ(h), also known as the empirical risk or empirical error, to be as follows:** - -⟶ - -
- -**80. Probably Approximately Correct (PAC) ― PAC is a framework under which numerous results on learning theory were proved, and has the following set of assumptions: ** - -⟶ - -
- -**81: the training and testing sets follow the same distribution ** - -⟶ - -
- -**82. the training examples are drawn independently** - -⟶ - -
- -**83. Shattering ― Given a set S={x(1),...,x(d)}, and a set of classifiers H, we say that H shatters S if for any set of labels {y(1),...,y(d)}, we have:** - -⟶ - -
- -**84. Upper bound theorem ― Let H be a finite hypothesis class such that |H|=k and let δ and the sample size m be fixed. Then, with probability of at least 1−δ, we have:** - -⟶ - -
- -**85. VC dimension ― The Vapnik-Chervonenkis (VC) dimension of a given infinite hypothesis class H, noted VC(H) is the size of the largest set that is shattered by H.** - -⟶ - -
- -**86. Remark: the VC dimension of H={set of linear classifiers in 2 dimensions} is 3.** - -⟶ - -
- -**87. Theorem (Vapnik) ― Let H be given, with VC(H)=d and m the number of training examples. With probability at least 1−δ, we have:** - -⟶ - -
- -**88. [Introduction, Type of prediction, Type of model]** - -⟶ - -
- -**89. [Notations and general concepts, loss function, gradient descent, likelihood]** - -⟶ - -
- -**90. [Linear models, linear regression, logistic regression, generalized linear models]** - -⟶ - -
- -**91. [Support vector machines, Optimal margin classifier, Hinge loss, Kernel]** - -⟶ - -
- -**92. [Generative learning, Gaussian Discriminant Analysis, Naive Bayes]** - -⟶ - -
- -**93. [Trees and ensemble methods, CART, Random forest, Boosting]** - -⟶ - -
- -**94. [Other methods, k-NN]** - -⟶ - -
- -**95. [Learning theory, Hoeffding inequality, PAC, VC dimension]** - -⟶ diff --git a/he/cheatsheet-unsupervised-learning.md b/he/cheatsheet-unsupervised-learning.md deleted file mode 100644 index 40724eb28..000000000 --- a/he/cheatsheet-unsupervised-learning.md +++ /dev/null @@ -1,340 +0,0 @@ -**1. Unsupervised Learning cheatsheet** - -⟶ - -
- -**2. Introduction to Unsupervised Learning** - -⟶ - -
- -**3. Motivation ― The goal of unsupervised learning is to find hidden patterns in unlabeled data {x(1),...,x(m)}.** - -⟶ - -
- -**4. Jensen's inequality ― Let f be a convex function and X a random variable. We have the following inequality:** - -⟶ - -
- -**5. Clustering** - -⟶ - -
- -**6. Expectation-Maximization** - -⟶ - -
- -**7. Latent variables ― Latent variables are hidden/unobserved variables that make estimation problems difficult, and are often denoted z. Here are the most common settings where there are latent variables:** - -⟶ - -
- -**8. [Setting, Latent variable z, Comments]** - -⟶ - -
- -**9. [Mixture of k Gaussians, Factor analysis]** - -⟶ - -
- -**10. Algorithm ― The Expectation-Maximization (EM) algorithm gives an efficient method for estimating the parameter θ through maximum likelihood estimation by repeatedly constructing a lower-bound on the likelihood (E-step) and optimizing that lower bound (M-step) as follows:** - -⟶ - -<br>
- -**11. E-step: Evaluate the posterior probability Qi(z(i)) that each data point x(i) came from a particular cluster z(i) as follows:** - -⟶ - -
- -**12. M-step: Use the posterior probabilities Qi(z(i)) as cluster specific weights on data points x(i) to separately re-estimate each cluster model as follows:** - -⟶ - -
- -**13. [Gaussians initialization, Expectation step, Maximization step, Convergence]** - -⟶ - -
- -**14. k-means clustering** - -⟶ - -
- -**15. We note c(i) the cluster of data point i and μj the center of cluster j.** - -⟶ - -
- -**16. Algorithm ― After randomly initializing the cluster centroids μ1,μ2,...,μk∈Rn, the k-means algorithm repeats the following step until convergence:** - -⟶ - -
- -**17. [Means initialization, Cluster assignment, Means update, Convergence]** - -⟶ - -
- -**18. Distortion function ― In order to see if the algorithm converges, we look at the distortion function defined as follows:** - -⟶ - -
- -**19. Hierarchical clustering** - -⟶ - -
- -**20. Algorithm ― It is a clustering algorithm with an agglomerative hierarchical approach that builds nested clusters in a successive manner.** - -⟶ - -<br>
- -**21. Types ― There are different sorts of hierarchical clustering algorithms that aim at optimizing different objective functions, which are summed up in the table below:** - -⟶ - -<br>
- -**22. [Ward linkage, Average linkage, Complete linkage]** - -⟶ - -
- -**23. [Minimize within cluster distance, Minimize average distance between cluster pairs, Minimize maximum distance between cluster pairs]** - -⟶ - -<br>
- -**24. Clustering assessment metrics** - -⟶ - -
- -**25. In an unsupervised learning setting, it is often hard to assess the performance of a model since we don't have the ground truth labels as was the case in the supervised learning setting.** - -⟶ - -
- -**26. Silhouette coefficient ― By noting a and b the mean distance between a sample and all other points in the same class, and between a sample and all other points in the next nearest cluster, the silhouette coefficient s for a single sample is defined as follows:** - -⟶ - -
- -**27. Calinski-Harabaz index ― By noting k the number of clusters, Bk and Wk the between and within-clustering dispersion matrices respectively defined as** - -⟶ - -
- -**28. the Calinski-Harabaz index s(k) indicates how well a clustering model defines its clusters, such that the higher the score, the more dense and well separated the clusters are. It is defined as follows:** - -⟶ - -
- -**29. Dimension reduction** - -⟶ - -
- -**30. Principal component analysis** - -⟶ - -
- -**31. It is a dimension reduction technique that finds the variance maximizing directions onto which to project the data.** - -⟶ - -
- -**32. Eigenvalue, eigenvector ― Given a matrix A∈Rn×n, λ is said to be an eigenvalue of A if there exists a vector z∈Rn∖{0}, called eigenvector, such that we have:** - -⟶ - -
- -**33. Spectral theorem ― Let A∈Rn×n. If A is symmetric, then A is diagonalizable by a real orthogonal matrix U∈Rn×n. By noting Λ=diag(λ1,...,λn), we have:** - -⟶ - -
- -**34. diagonal** - -⟶ - -
- -**35. Remark: the eigenvector associated with the largest eigenvalue is called principal eigenvector of matrix A.** - -⟶ - -
- -**36. Algorithm ― The Principal Component Analysis (PCA) procedure is a dimension reduction technique that projects the data on k -dimensions by maximizing the variance of the data as follows:** - -⟶ - -
- -**37. Step 1: Normalize the data to have a mean of 0 and standard deviation of 1.** - -⟶ - -
- -**38. Step 2: Compute Σ=1mm∑i=1x(i)x(i)T∈Rn×n, which is symmetric with real eigenvalues.** - -⟶ - -
- -**39. Step 3: Compute u1,...,uk∈Rn the k orthogonal principal eigenvectors of Σ, i.e. the orthogonal eigenvectors of the k largest eigenvalues.** - -⟶ - -
- -**40. Step 4: Project the data on spanR(u1,...,uk).** - -⟶ - -
- -**41. This procedure maximizes the variance among all k-dimensional spaces.** - -⟶ - -
- -**42. [Data in feature space, Find principal components, Data in principal components space]** - -⟶ - -
- -**43. Independent component analysis** - -⟶ - -
- -**44. It is a technique meant to find the underlying generating sources.** - -⟶ - -
- -**45. Assumptions ― We assume that our data x has been generated by the n-dimensional source vector s=(s1,...,sn), where si are independent random variables, via a mixing and non-singular matrix A as follows:** - -⟶ - -
- -**46. The goal is to find the unmixing matrix W=A−1.** - -⟶ - -
- -**47. Bell and Sejnowski ICA algorithm ― This algorithm finds the unmixing matrix W by following the steps below:** - -⟶ - -
- -**48. Write the probability of x=As=W−1s as:** - -⟶ - -
- -**49. Write the log likelihood given our training data {x(i),i∈[[1,m]]} and by noting g the sigmoid function as:** - -⟶ - -
- -**50. Therefore, the stochastic gradient ascent learning rule is such that for each training example x(i), we update W as follows:** - -⟶ - -
- -**51. The Machine Learning cheatsheets are now available in Hebrew.** - -⟶ - -
- -**52. Original authors** - -⟶ - -
- -**53. Translated by X, Y and Z** - -⟶ - -
- -**54. Reviewed by X, Y and Z** - -⟶ - -
- -**55. [Introduction, Motivation, Jensen's inequality]** - -⟶ - -
- -**56. [Clustering, Expectation-Maximization, k-means, Hierarchical clustering, Metrics]** - -⟶ - -
- -**57. [Dimension reduction, PCA, ICA]** - -⟶ diff --git a/he/refresher-linear-algebra.md b/he/refresher-linear-algebra.md deleted file mode 100644 index a6b440d1e..000000000 --- a/he/refresher-linear-algebra.md +++ /dev/null @@ -1,339 +0,0 @@ -**1. Linear Algebra and Calculus refresher** - -⟶ - -
- -**2. General notations** - -⟶ - -
- -**3. Definitions** - -⟶ - -
- -**4. Vector ― We note x∈Rn a vector with n entries, where xi∈R is the ith entry:** - -⟶ - -
- -**5. Matrix ― We note A∈Rm×n a matrix with m rows and n columns, where Ai,j∈R is the entry located in the ith row and jth column:** - -⟶ - -
- -**6. Remark: the vector x defined above can be viewed as a n×1 matrix and is more particularly called a column-vector.** - -⟶ - -
- -**7. Main matrices** - -⟶ - -
- -**8. Identity matrix ― The identity matrix I∈Rn×n is a square matrix with ones in its diagonal and zero everywhere else:** - -⟶ - -
- -**9. Remark: for all matrices A∈Rn×n, we have A×I=I×A=A.** - -⟶ - -
- -**10. Diagonal matrix ― A diagonal matrix D∈Rn×n is a square matrix with nonzero values in its diagonal and zero everywhere else:** - -⟶ - -
- -**11. Remark: we also note D as diag(d1,...,dn).** - -⟶ - -
- -**12. Matrix operations** - -⟶ - -
- -**13. Multiplication** - -⟶ - -
- -**14. Vector-vector ― There are two types of vector-vector products:** - -⟶ - -
- -**15. inner product: for x,y∈Rn, we have:** - -⟶ - -
- -**16. outer product: for x∈Rm,y∈Rn, we have:** - -⟶ - -
- -**17. Matrix-vector ― The product of matrix A∈Rm×n and vector x∈Rn is a vector of size Rm, such that:** - -⟶ - -<br>
- -**18. where aTr,i are the vector rows and ac,j are the vector columns of A, and xi are the entries of x.** - -⟶ - -
- -**19. Matrix-matrix ― The product of matrices A∈Rm×n and B∈Rn×p is a matrix of size Rm×p, such that:** - -⟶ - -<br>
- -**20. where aTr,i,bTr,i are the vector rows and ac,j,bc,j are the vector columns of A and B respectively** - -⟶ - -
- -**21. Other operations** - -⟶ - -
- -**22. Transpose ― The transpose of a matrix A∈Rm×n, noted AT, is such that its entries are flipped:** - -⟶ - -
- -**23. Remark: for matrices A,B, we have (AB)T=BTAT** - -⟶ - -
- -**24. Inverse ― The inverse of an invertible square matrix A is noted A−1 and is the only matrix such that:** - -⟶ - -
- -**25. Remark: not all square matrices are invertible. Also, for matrices A,B, we have (AB)−1=B−1A−1** - -⟶ - -
- -**26. Trace ― The trace of a square matrix A, noted tr(A), is the sum of its diagonal entries:** - -⟶ - -
- -**27. Remark: for matrices A,B, we have tr(AT)=tr(A) and tr(AB)=tr(BA)** - -⟶ - -
- -**28. Determinant ― The determinant of a square matrix A∈Rn×n, noted |A| or det(A) is expressed recursively in terms of A∖i,∖j, which is the matrix A without its ith row and jth column, as follows:** - -⟶ - -
- -**29. Remark: A is invertible if and only if |A|≠0. Also, |AB|=|A||B| and |AT|=|A|.** - -⟶ - -
- -**30. Matrix properties** - -⟶ - -
- -**31. Definitions** - -⟶ - -
- -**32. Symmetric decomposition ― A given matrix A can be expressed in terms of its symmetric and antisymmetric parts as follows:** - -⟶ - -
- -**33. [Symmetric, Antisymmetric]** - -⟶ - -
- -**34. Norm ― A norm is a function N:V⟶[0,+∞[ where V is a vector space, and such that for all x,y∈V, we have:** - -⟶ - -
- -**35. N(ax)=|a|N(x) for a scalar** - -⟶ - -
- -**36. if N(x)=0, then x=0** - -⟶ - -
- -**37. For x∈V, the most commonly used norms are summed up in the table below:** - -⟶ - -
- -**38. [Norm, Notation, Definition, Use case]** - -⟶ - -
- -**39. Linear dependence ― A set of vectors is said to be linearly dependent if one of the vectors in the set can be defined as a linear combination of the others.** - -⟶ - -<br>
- -**40. Remark: if no vector can be written this way, then the vectors are said to be linearly independent** - -⟶ - -
- -**41. Matrix rank ― The rank of a given matrix A is noted rank(A) and is the dimension of the vector space generated by its columns. This is equivalent to the maximum number of linearly independent columns of A.** - -⟶ - -
- -**42. Positive semi-definite matrix ― A matrix A∈Rn×n is positive semi-definite (PSD) and is noted A⪰0 if we have:** - -⟶ - -
- -**43. Remark: similarly, a matrix A is said to be positive definite, and is noted A≻0, if it is a PSD matrix which satisfies for all non-zero vector x, xTAx>0.** - -⟶ - -
- -**44. Eigenvalue, eigenvector ― Given a matrix A∈Rn×n, λ is said to be an eigenvalue of A if there exists a vector z∈Rn∖{0}, called eigenvector, such that we have:** - -⟶ - -
- -**45. Spectral theorem ― Let A∈Rn×n. If A is symmetric, then A is diagonalizable by a real orthogonal matrix U∈Rn×n. By noting Λ=diag(λ1,...,λn), we have:** - -⟶ - -
- -**46. diagonal** - -⟶ - -
- -**47. Singular-value decomposition ― For a given matrix A of dimensions m×n, the singular-value decomposition (SVD) is a factorization technique that guarantees the existence of U m×m unitary, Σ m×n diagonal and V n×n unitary matrices, such that:** - -⟶ - -
- -**48. Matrix calculus** - -⟶ - -
- -**49. Gradient ― Let f:Rm×n→R be a function and A∈Rm×n be a matrix. The gradient of f with respect to A is a m×n matrix, noted ∇Af(A), such that:** - -⟶ - -
- -**50. Remark: the gradient of f is only defined when f is a function that returns a scalar.** - -⟶ - -
- -**51. Hessian ― Let f:Rn→R be a function and x∈Rn be a vector. The hessian of f with respect to x is a n×n symmetric matrix, noted ∇2xf(x), such that:** - -⟶ - -
- -**52. Remark: the hessian of f is only defined when f is a function that returns a scalar** - -⟶ - -
- -**53. Gradient operations ― For matrices A,B,C, the following gradient properties are worth having in mind:** - -⟶ - -
- -**54. [General notations, Definitions, Main matrices]** - -⟶ - -
- -**55. [Matrix operations, Multiplication, Other operations]** - -⟶ - -
- -**56. [Matrix properties, Norm, Eigenvalue/Eigenvector, Singular-value decomposition]** - -⟶ - -
- -**57. [Matrix calculus, Gradient, Hessian, Operations]** - -⟶ diff --git a/he/refresher-probability.md b/he/refresher-probability.md deleted file mode 100644 index 5c9b34656..000000000 --- a/he/refresher-probability.md +++ /dev/null @@ -1,381 +0,0 @@ -**1. Probabilities and Statistics refresher** - -⟶ - -
- -**2. Introduction to Probability and Combinatorics** - -⟶ - -
- -**3. Sample space ― The set of all possible outcomes of an experiment is known as the sample space of the experiment and is denoted by S.** - -⟶ - -
- -**4. Event ― Any subset E of the sample space is known as an event. That is, an event is a set consisting of possible outcomes of the experiment. If the outcome of the experiment is contained in E, then we say that E has occurred.** - -⟶ - -
- -**5. Axioms of probability ― For each event E, we denote P(E) as the probability of event E occurring.** - -⟶ - -<br>
- -**6. Axiom 1 ― Every probability is between 0 and 1 included, i.e:** - -⟶ - -
- -**7. Axiom 2 ― The probability that at least one of the elementary events in the entire sample space will occur is 1, i.e:** - -⟶ - -
- -**8. Axiom 3 ― For any sequence of mutually exclusive events E1,...,En, we have:** - -⟶ - -
- -**9. Permutation ― A permutation is an arrangement of r objects from a pool of n objects, in a given order. The number of such arrangements is given by P(n,r), defined as:** - -⟶ - -
- -**10. Combination ― A combination is an arrangement of r objects from a pool of n objects, where the order does not matter. The number of such arrangements is given by C(n,r), defined as:** - -⟶ - -
- -**11. Remark: we note that for 0⩽r⩽n, we have P(n,r)⩾C(n,r)** - -⟶ - -
- -**12. Conditional Probability** - -⟶ - -
- -**13. Bayes' rule ― For events A and B such that P(B)>0, we have:** - -⟶ - -
- -**14. Remark: we have P(A∩B)=P(A)P(B|A)=P(A|B)P(B)** - -⟶ - -
- -**15. Partition ― Let {Ai,i∈[[1,n]]} be such that for all i, Ai≠∅. We say that {Ai} is a partition if we have:** - -⟶ - -
- -**16. Remark: for any event B in the sample space, we have P(B)=n∑i=1P(B|Ai)P(Ai).** - -⟶ - -
- -**17. Extended form of Bayes' rule ― Let {Ai,i∈[[1,n]]} be a partition of the sample space. We have:** - -⟶ - -
- -**18. Independence ― Two events A and B are independent if and only if we have:** - -⟶ - -
- -**19. Random Variables** - -⟶ - -
- -**20. Definitions** - -⟶ - -
- -**21. Random variable ― A random variable, often noted X, is a function that maps every element in a sample space to a real line.** - -⟶ - -
- -**22. Cumulative distribution function (CDF) ― The cumulative distribution function F, which is monotonically non-decreasing and is such that limx→−∞F(x)=0 and limx→+∞F(x)=1, is defined as:** - -⟶ - -
- -**23. Remark: we have P(a<X⩽b)=F(b)−F(a).** - -⟶ - -<br>
- -**24. Probability density function (PDF) ― The probability density function f is the probability that X takes on values between two adjacent realizations of the random variable.** - -⟶ - -<br>
- -**25. Relationships involving the PDF and CDF ― Here are the important properties to know in the discrete (D) and the continuous (C) cases.** - -⟶ - -
- -**26. [Case, CDF F, PDF f, Properties of PDF]** - -⟶ - -
- -**27. Expectation and Moments of the Distribution ― Here are the expressions of the expected value E[X], generalized expected value E[g(X)], kth moment E[Xk] and characteristic function ψ(ω) for the discrete and continuous cases:** - -⟶ - -
- -**28. Variance ― The variance of a random variable, often noted Var(X) or σ2, is a measure of the spread of its distribution function. It is determined as follows:** - -⟶ - -
- -**29. Standard deviation ― The standard deviation of a random variable, often noted σ, is a measure of the spread of its distribution function which is compatible with the units of the actual random variable. It is determined as follows:** - -⟶ - -
- -**30. Transformation of random variables ― Let the variables X and Y be linked by some function. By noting fX and fY the distribution function of X and Y respectively, we have:** - -⟶ - -
- -**31. Leibniz integral rule ― Let g be a function of x and potentially c, and a,b boundaries that may depend on c. We have:** - -⟶ - -
- -**32. Probability Distributions** - -⟶ - -
- -**33. Chebyshev's inequality ― Let X be a random variable with expected value μ. For k,σ>0, we have the following inequality:** - -⟶ - -
- -**34. Main distributions ― Here are the main distributions to have in mind:** - -⟶ - -
- -**35. [Type, Distribution]** - -⟶ - -
- -**36. Jointly Distributed Random Variables** - -⟶ - -
- -**37. Marginal density and cumulative distribution ― From the joint density probability function fXY , we have** - -⟶ - -
- -**38. [Case, Marginal density, Cumulative function]** - -⟶ - -
- -**39. Conditional density ― The conditional density of X with respect to Y, often noted fX|Y, is defined as follows:** - -⟶ - -
- -**40. Independence ― Two random variables X and Y are said to be independent if we have:** - -⟶ - -
- -**41. Covariance ― We define the covariance of two random variables X and Y, that we note σ2XY or more commonly Cov(X,Y), as follows:** - -⟶ - -
- -**42. Correlation ― By noting σX,σY the standard deviations of X and Y, we define the correlation between the random variables X and Y, noted ρXY, as follows:** - -⟶ - -
- -**43. Remark 1: we note that for any random variables X,Y, we have ρXY∈[−1,1].** - -⟶ - -
- -**44. Remark 2: If X and Y are independent, then ρXY=0.** - -⟶ - -
- -**45. Parameter estimation** - -⟶ - -
- -**46. Definitions** - -⟶ - -
- -**47. Random sample ― A random sample is a collection of n random variables X1,...,Xn that are independent and identically distributed with X.** - -⟶ - -
- -**48. Estimator ― An estimator is a function of the data that is used to infer the value of an unknown parameter in a statistical model.** - -⟶ - -
- -**49. Bias ― The bias of an estimator ^θ is defined as being the difference between the expected value of the distribution of ^θ and the true value, i.e.:** - -⟶ - -
- -**50. Remark: an estimator is said to be unbiased when we have E[^θ]=θ.** - -⟶ - -
- -**51. Estimating the mean** - -⟶ - -
- -**52. Sample mean ― The sample mean of a random sample is used to estimate the true mean μ of a distribution, is often noted ¯X and is defined as follows:** - -⟶ - -
- -**53. Remark: the sample mean is unbiased, i.e E[¯X]=μ.** - -⟶ - -
- -**54. Central Limit Theorem ― Let us have a random sample X1,...,Xn following a given distribution with mean μ and variance σ2, then we have:** - -⟶ - -
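A minimal simulation sketch of this statement, assuming a Uniform(0,1) population and illustrative values of n and the number of trials; the standardized sample means should have mean close to 0 and standard deviation close to 1:

```python
import numpy as np

# Standardized sample means sqrt(n)*(Xbar - mu)/sigma should be approximately
# standard normal for large n; here the population is Uniform(0, 1).
rng = np.random.default_rng(1)
n, trials = 500, 20_000
mu, sigma = 0.5, np.sqrt(1.0 / 12.0)            # mean and std of Uniform(0, 1)
samples = rng.uniform(0.0, 1.0, size=(trials, n))
z = np.sqrt(n) * (samples.mean(axis=1) - mu) / sigma
print(z.mean(), z.std())                        # both close to 0 and 1
```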
- -**55. Estimating the variance** - -⟶ - -
- -**56. Sample variance ― The sample variance of a random sample is used to estimate the true variance σ2 of a distribution, is often noted s2 or ^σ2 and is defined as follows:** - -⟶ - -
- -**57. Remark: the sample variance is unbiased, i.e E[s2]=σ2.** - -⟶ - -
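A small sketch of this remark, assuming the sample variance uses the 1/(n−1) normalization (ddof=1 in NumPy) and an illustrative Gaussian population with σ2=4:

```python
import numpy as np

# Averaging the sample variance (1/(n-1) normalization, ddof=1) over many
# random samples should recover the true variance sigma^2 = 4.
rng = np.random.default_rng(2)
samples = rng.normal(loc=0.0, scale=2.0, size=(50_000, 10))
s2 = samples.var(axis=1, ddof=1)    # one sample variance per row
print(s2.mean())                    # close to 4.0
```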
- -**58. Chi-Squared relation with sample variance ― Let s2 be the sample variance of a random sample. We have:** - -⟶ - -
- -**59. [Introduction, Sample space, Event, Permutation]** - -⟶ - -
- -**60. [Conditional probability, Bayes' rule, Independence]** - -⟶ - -
- -**61. [Random variables, Definitions, Expectation, Variance]** - -⟶ - -
- -**62. [Probability distributions, Chebyshev's inequality, Main distributions]** - -⟶ - -
- -**63. [Jointly distributed random variables, Density, Covariance, Correlation]** - -⟶ - -
- -**64. [Parameter estimation, Mean, Variance]** - -⟶ diff --git a/hi/cheatsheet-deep-learning.md b/hi/cheatsheet-deep-learning.md deleted file mode 100644 index a5aa3756c..000000000 --- a/hi/cheatsheet-deep-learning.md +++ /dev/null @@ -1,321 +0,0 @@ -**1. Deep Learning cheatsheet** - -⟶ - -
- -**2. Neural Networks** - -⟶ - -
- -**3. Neural networks are a class of models that are built with layers. Commonly used types of neural networks include convolutional and recurrent neural networks.** - -⟶ - -
- -**4. Architecture ― The vocabulary around neural networks architectures is described in the figure below:** - -⟶ - -
- -**5. [Input layer, hidden layer, output layer]** - -⟶ - -
- -**6. By noting i the ith layer of the network and j the jth hidden unit of the layer, we have:** - -⟶ - -
- -**7. where we note w, b, z the weight, bias and output respectively.** - -⟶ - -
- -**8. Activation function ― Activation functions are used at the end of a hidden unit to introduce non-linear complexities to the model. Here are the most common ones:** - -⟶ - -
- -**9. [Sigmoid, Tanh, ReLU, Leaky ReLU]** - -⟶ - -
- -**10. Cross-entropy loss ― In the context of neural networks, the cross-entropy loss L(z,y) is commonly used and is defined as follows:** - -⟶ - -
- -**11. Learning rate ― The learning rate, often noted α or sometimes η, indicates at which pace the weights get updated. This can be fixed or adaptively changed. The current most popular method is called Adam, which is a method that adapts the learning rate.** - -⟶ - -
- -**12. Backpropagation ― Backpropagation is a method to update the weights in the neural network by taking into account the actual output and the desired output. The derivative with respect to weight w is computed using chain rule and is of the following form:** - -⟶ - -
- -**13. As a result, the weight is updated as follows:** - -⟶ - -
- -**14. Updating weights ― In a neural network, weights are updated as follows:** - -⟶ - -
- -**15. Step 1: Take a batch of training data.** - -⟶ - -
- -**16. Step 2: Perform forward propagation to obtain the corresponding loss.** - -⟶ - -
- -**17. Step 3: Backpropagate the loss to get the gradients.** - -⟶ - -
- -**18. Step 4: Use the gradients to update the weights of the network.** - -⟶ - -
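A minimal NumPy sketch tying Steps 1-4 above together on a tiny one-hidden-layer network; the XOR-style toy data, layer sizes and learning rate α are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])   # Step 1: a batch of training data
y = np.array([[0.], [1.], [1.], [0.]])
W1, b1 = rng.normal(size=(2, 8)), np.zeros(8)
W2, b2 = rng.normal(size=(8, 1)), np.zeros(1)
alpha = 0.5                                              # learning rate

for step in range(5000):
    # Step 2: forward propagation and cross-entropy loss
    h = np.tanh(X @ W1 + b1)
    z = 1.0 / (1.0 + np.exp(-(h @ W2 + b2)))             # sigmoid output
    loss = -np.mean(y * np.log(z) + (1 - y) * np.log(1 - z))
    # Step 3: backpropagate the loss to get the gradients (chain rule)
    dz = (z - y) / len(X)
    dW2, db2 = h.T @ dz, dz.sum(axis=0)
    dh = (dz @ W2.T) * (1 - h ** 2)
    dW1, db1 = X.T @ dh, dh.sum(axis=0)
    # Step 4: gradient descent update of the weights
    W1 -= alpha * dW1; b1 -= alpha * db1
    W2 -= alpha * dW2; b2 -= alpha * db2

print(round(float(loss), 4))   # final training loss, should be small
```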
- -**19. Dropout ― Dropout is a technique meant to prevent overfitting the training data by dropping out units in a neural network. In practice, neurons are either dropped with probability p or kept with probability 1−p.** - -⟶ - -
- -**20. Convolutional Neural Networks** - -⟶ - -
- -**21. Convolutional layer requirement ― By noting W the input volume size, F the size of the convolutional layer neurons, P the amount of zero padding, then the number of neurons N that fit in a given volume is such that:** - -⟶ - -
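A small helper illustrating this count; it assumes the usual relation N=(W−F+2P)/S+1 with stride S, which is an assumption here since the item above only names W, F and P:

```python
def conv_output_size(W: int, F: int, P: int, S: int = 1) -> int:
    """Number of neurons along one dimension, assuming N = (W - F + 2P)/S + 1."""
    N = (W - F + 2 * P) / S + 1
    assert N.is_integer(), "W, F, P, S do not tile the input exactly"
    return int(N)

print(conv_output_size(W=32, F=5, P=2, S=1))   # 32: 'same'-style padding for a 5x5 filter
```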
- -**22. Batch normalization ― It is a step, parametrized by hyperparameters γ,β, that normalizes the batch {xi}. By noting μB,σ2B the mean and variance of the batch that we want to correct, it is done as follows:** - -⟶ - -
- -**23. It is usually done after a fully connected/convolutional layer and before a non-linearity layer and aims at allowing higher learning rates and reducing the strong dependence on initialization.** - -⟶ - -
- -**24. Recurrent Neural Networks** - -⟶ - -
- -**25. Types of gates ― Here are the different types of gates that we encounter in a typical recurrent neural network:** - -⟶ - -
- -**26. [Input gate, forget gate, gate, output gate]** - -⟶ - -
- -**27. [Write to cell or not?, Erase a cell or not?, How much to write to cell?, How much to reveal cell?]** - -⟶ - -
- -**28. LSTM ― A long short-term memory (LSTM) network is a type of RNN model that avoids the vanishing gradient problem by adding 'forget' gates.** - -⟶ - -
- -**29. Reinforcement Learning and Control** - -⟶ - -
- -**30. The goal of reinforcement learning is for an agent to learn how to evolve in an environment.** - -⟶ - -
- -**31. Definitions** - -⟶ - -
- -**32. Markov decision processes ― A Markov decision process (MDP) is a 5-tuple (S,A,{Psa},γ,R) where:** - -⟶ - -
- -**33. S is the set of states** - -⟶ - -
- -**34. A is the set of actions** - -⟶ - -
- -**35. {Psa} are the state transition probabilities for s∈S and a∈A** - -⟶ - -
- -**36. γ∈[0,1[ is the discount factor** - -⟶ - -
- -**37. R:S×A⟶R or R:S⟶R is the reward function that the algorithm wants to maximize** - -⟶ - -
- -**38. Policy ― A policy π is a function π:S⟶A that maps states to actions.** - -⟶ - -
- -**39. Remark: we say that we execute a given policy π if given a state s we take the action a=π(s).** - -⟶ - -
- -**40. Value function ― For a given policy π and a given state s, we define the value function Vπ as follows:** - -⟶ - -
- -**41. Bellman equation ― The optimal Bellman equations characterize the value function Vπ∗ of the optimal policy π∗:** - -⟶ - -
- -**42. Remark: we note that the optimal policy π∗ for a given state s is such that:** - -⟶ - -
- -**43. Value iteration algorithm ― The value iteration algorithm is in two steps:** - -⟶ - -
- -**44. 1) We initialize the value:** - -⟶ - -
- -**45. 2) We iterate the value based on the values before:** - -⟶ - -
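A compact sketch of the two steps above on a toy MDP; the random transition tensor P[s,a,s′], the state reward vector R and γ are illustrative assumptions:

```python
import numpy as np

n_states, n_actions, gamma = 3, 2, 0.9
rng = np.random.default_rng(3)
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))  # P[s, a, :] sums to 1
R = np.array([0.0, 0.0, 1.0])                                     # reward per state

V = np.zeros(n_states)                        # 1) initialize the value
for _ in range(200):                          # 2) iterate using the previous values
    V = R + gamma * np.max(P @ V, axis=1)     # Bellman optimality update
print(V)
```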
- -**46. Maximum likelihood estimate ― The maximum likelihood estimates for the state transition probabilities are as follows:** - -⟶ - -
- -**47. times took action a in state s and got to s′** - -⟶ - -
- -**48. times took action a in state s** - -⟶ - -
- -**49. Q-learning ― Q-learning is a model-free estimation of Q, which is done as follows:** - -⟶ - -
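A minimal sketch of the tabular update behind this estimation, Q(s,a) ← Q(s,a) + α[r + γ maxa′ Q(s′,a′) − Q(s,a)]; the table size, α and γ are illustrative assumptions:

```python
import numpy as np

def q_learning_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.9):
    """One tabular Q-learning update from the observed transition (s, a, r, s')."""
    td_target = r + gamma * Q[s_next].max()
    Q[s, a] += alpha * (td_target - Q[s, a])

Q = np.zeros((3, 2))                       # 3 states, 2 actions
q_learning_update(Q, s=0, a=1, r=1.0, s_next=2)
print(Q)
```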
- -**50. View PDF version on GitHub** - -⟶ - -
- -**51. [Neural Networks, Architecture, Activation function, Backpropagation, Dropout]** - -⟶ - -
- -**52. [Convolutional Neural Networks, Convolutional layer, Batch normalization]** - -⟶ - -
- -**53. [Recurrent Neural Networks, Gates, LSTM]** - -⟶ - -
- -**54. [Reinforcement learning, Markov decision processes, Value/policy iteration, Approximate dynamic programming, Policy search]** - -⟶ diff --git a/hi/cheatsheet-machine-learning-tips-and-tricks.md b/hi/cheatsheet-machine-learning-tips-and-tricks.md deleted file mode 100644 index 9712297b8..000000000 --- a/hi/cheatsheet-machine-learning-tips-and-tricks.md +++ /dev/null @@ -1,285 +0,0 @@ -**1. Machine Learning tips and tricks cheatsheet** - -⟶ - -
- -**2. Classification metrics** - -⟶ - -
- -**3. In a context of a binary classification, here are the main metrics that are important to track in order to assess the performance of the model.** - -⟶ - -
- -**4. Confusion matrix ― The confusion matrix is used to have a more complete picture when assessing the performance of a model. It is defined as follows:** - -⟶ - -
- -**5. [Predicted class, Actual class]** - -⟶ - -
- -**6. Main metrics ― The following metrics are commonly used to assess the performance of classification models:** - -⟶ - -
- -**7. [Metric, Formula, Interpretation]** - -⟶ - -
- -**8. Overall performance of model** - -⟶ - -
- -**9. How accurate the positive predictions are** - -⟶ - -
- -**10. Coverage of actual positive sample** - -⟶ - -
- -**11. Coverage of actual negative sample** - -⟶ - -
- -**12. Hybrid metric useful for unbalanced classes** - -⟶ - -
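A short sketch computing these metrics from confusion-matrix counts; the TP/FP/FN/TN values are made-up illustrative numbers:

```python
# Made-up confusion-matrix counts for a binary classifier.
TP, FP, FN, TN = 40, 10, 5, 45

accuracy    = (TP + TN) / (TP + FP + FN + TN)         # overall performance of model
precision   = TP / (TP + FP)                          # how accurate the positive predictions are
recall      = TP / (TP + FN)                          # coverage of actual positive samples
specificity = TN / (TN + FP)                          # coverage of actual negative samples
f1 = 2 * precision * recall / (precision + recall)    # hybrid metric for unbalanced classes
print(accuracy, precision, recall, specificity, f1)
```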
- -**13. ROC ― The receiver operating characteristic curve, also noted ROC, is the plot of TPR versus FPR obtained by varying the threshold. These metrics are summed up in the table below:** - -⟶ - -
- -**14. [Metric, Formula, Equivalent]** - -⟶ - -
- -**15. AUC ― The area under the receiver operating characteristic curve, also noted AUC or AUROC, is the area below the ROC as shown in the following figure:** - -⟶ - -
- -**16. [Actual, Predicted]** - -⟶ - -
- -**17. Basic metrics ― Given a regression model f, the following metrics are commonly used to assess the performance of the model:** - -⟶ - -
- -**18. [Total sum of squares, Explained sum of squares, Residual sum of squares]** - -⟶ - -
- -**19. Coefficient of determination ― The coefficient of determination, often noted R2 or r2, provides a measure of how well the observed outcomes are replicated by the model and is defined as follows:** - -⟶ - -
- -**20. Main metrics ― The following metrics are commonly used to assess the performance of regression models, by taking into account the number of variables n that they take into consideration:** - -⟶ - -
- -**21. where L is the likelihood and ˆσ2 is an estimate of the variance associated with each response.** - -⟶ - -
- -**22. Model selection** - -⟶ - -
- -**23. Vocabulary ― When selecting a model, we distinguish 3 different parts of the data that we have as follows:** - -⟶ - -
- -**24. [Training set, Validation set, Testing set]** - -⟶ - -
- -**25. [Model is trained, Model is assessed, Model gives predictions]** - -⟶ - -
- -**26. [Usually 80% of the dataset, Usually 20% of the dataset]** - -⟶ - -
- -**27. [Also called hold-out or development set, Unseen data]** - -⟶ - -
- -**28. Once the model has been chosen, it is trained on the entire dataset and tested on the unseen test set. These are represented in the figure below:** - -⟶ - -
- -**29. Cross-validation ― Cross-validation, also noted CV, is a method that is used to select a model that does not rely too much on the initial training set. The different types are summed up in the table below:** - -⟶ - -
- -**30. [Training on k−1 folds and assessment on the remaining one, Training on n−p observations and assessment on the p remaining ones]** - -⟶ - -
- -**31. [Generally k=5 or 10, Case p=1 is called leave-one-out]** - -⟶ - -
- -**32. The most commonly used method is called k-fold cross-validation and splits the training data into k folds to validate the model on one fold while training the model on the k−1 other folds, all of this k times. The error is then averaged over the k folds and is named cross-validation error.** - -⟶ - -
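A bare-bones sketch of this procedure; the trivial mean predictor, the squared error and the toy target vector are illustrative assumptions standing in for a real model:

```python
import numpy as np

def k_fold_cv_error(y, k=5, seed=0):
    """Cross-validation error of a trivial mean predictor (stand-in for a real model)."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    errors = []
    for i in range(k):
        val_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        y_hat = y[train_idx].mean()                        # "train" on the k-1 other folds
        errors.append(np.mean((y[val_idx] - y_hat) ** 2))  # assess on the held-out fold
    return float(np.mean(errors))                          # averaged over the k folds

y = np.arange(100.0)
print(k_fold_cv_error(y, k=5))
```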
- -**33. Regularization ― The regularization procedure aims at preventing the model from overfitting the data and thus deals with high variance issues. The following table sums up the different types of commonly used regularization techniques:** - -⟶ - -
- -**34. [Shrinks coefficients to 0, Good for variable selection, Makes coefficients smaller, Tradeoff between variable selection and small coefficients]** - -⟶ - -
- -**35. Diagnostics** - -⟶ - -
- -**36. Bias ― The bias of a model is the difference between the expected prediction and the correct model that we try to predict for given data points.** - -⟶ - -
- -**37. Variance ― The variance of a model is the variability of the model prediction for given data points.** - -⟶ - -
- -**38. Bias/variance tradeoff ― The simpler the model, the higher the bias, and the more complex the model, the higher the variance.** - -⟶ - -
- -**39. [Symptoms, Regression illustration, classification illustration, deep learning illustration, possible remedies]** - -⟶ - -
- -**40. [High training error, Training error close to test error, High bias, Training error slightly lower than test error, Very low training error, Training error much lower than test error, High variance]** - -⟶ - -
- -**41. [Complexify model, Add more features, Train longer, Perform regularization, Get more data]** - -⟶ - -
- -**42. Error analysis ― Error analysis is analyzing the root cause of the difference in performance between the current and the perfect models.** - -⟶ - -
- -**43. Ablative analysis ― Ablative analysis is analyzing the root cause of the difference in performance between the current and the baseline models.** - -⟶ - -
- -**44. Regression metrics** - -⟶ - -
- -**45. [Classification metrics, confusion matrix, accuracy, precision, recall, F1 score, ROC]** - -⟶ - -
- -**46. [Regression metrics, R squared, Mallow's CP, AIC, BIC]** - -⟶ - -
- -**47. [Model selection, cross-validation, regularization]** - -⟶ - -
- -**48. [Diagnostics, Bias/variance tradeoff, error/ablative analysis]** - -⟶ diff --git a/hi/cheatsheet-supervised-learning.md b/hi/cheatsheet-supervised-learning.md deleted file mode 100644 index a6b19ea1c..000000000 --- a/hi/cheatsheet-supervised-learning.md +++ /dev/null @@ -1,567 +0,0 @@ -**1. Supervised Learning cheatsheet** - -⟶ - -
- -**2. Introduction to Supervised Learning** - -⟶ - -
- -**3. Given a set of data points {x(1),...,x(m)} associated to a set of outcomes {y(1),...,y(m)}, we want to build a classifier that learns how to predict y from x.** - -⟶ - -
- -**4. Type of prediction ― The different types of predictive models are summed up in the table below:** - -⟶ - -
- -**5. [Regression, Classifier, Outcome, Examples]** - -⟶ - -
- -**6. [Continuous, Class, Linear regression, Logistic regression, SVM, Naive Bayes]** - -⟶ - -
- -**7. Type of model ― The different models are summed up in the table below:** - -⟶ - -
- -**8. [Discriminative model, Generative model, Goal, What's learned, Illustration, Examples]** - -⟶ - -
- -**9. [Directly estimate P(y|x), Estimate P(x|y) to then deduce P(y|x), Decision boundary, Probability distributions of the data, Regressions, SVMs, GDA, Naive Bayes]** - -⟶ - -
- -**10. Notations and general concepts** - -⟶ - -
- -**11. Hypothesis ― The hypothesis is noted hθ and is the model that we choose. For a given input data x(i) the model prediction output is hθ(x(i)).** - -⟶ - -
- -**12. Loss function ― A loss function is a function L:(z,y)∈R×Y⟼L(z,y)∈R that takes as inputs the predicted value z corresponding to the real data value y and outputs how different they are. The common loss functions are summed up in the table below:** - -⟶ - -
- -**13. [Least squared error, Logistic loss, Hinge loss, Cross-entropy]** - -⟶ - -
- -**14. [Linear regression, Logistic regression, SVM, Neural Network]** - -⟶ - -
- -**15. Cost function ― The cost function J is commonly used to assess the performance of a model, and is defined with the loss function L as follows:** - -⟶ - -
- -**16. Gradient descent ― By noting α∈R the learning rate, the update rule for gradient descent is expressed with the learning rate and the cost function J as follows:** - -⟶ - -
- -**17. Remark: Stochastic gradient descent (SGD) updates the parameter based on each training example, while batch gradient descent updates it based on a batch of training examples.** - -⟶ - -
- -**18. Likelihood ― The likelihood of a model L(θ) given parameters θ is used to find the optimal parameters θ through maximizing the likelihood. In practice, we use the log-likelihood ℓ(θ)=log(L(θ)) which is easier to optimize. We have:** - -⟶ - -
- -**19. Newton's algorithm ― The Newton's algorithm is a numerical method that finds θ such that ℓ′(θ)=0. Its update rule is as follows:** - -⟶ - -
- -**20. Remark: the multidimensional generalization, also known as the Newton-Raphson method, has the following update rule:** - -⟶ - -
- -**21. Linear models** - -⟶ - -
- -**22. Linear regression** - -⟶ - -
- -**23. We assume here that y|x;θ∼N(μ,σ2)** - -⟶ - -
- -**24. Normal equations ― By noting X the design matrix, the value of θ that minimizes the cost function is a closed-form solution such that:** - -⟶ - -
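A small check of the closed-form solution on synthetic data; np.linalg.lstsq is used rather than an explicit matrix inverse for numerical stability, and the data and true θ are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.hstack([np.ones((100, 1)), rng.normal(size=(100, 2))])   # design matrix with intercept
theta_true = np.array([1.0, 2.0, -3.0])
y = X @ theta_true + 0.1 * rng.normal(size=100)

theta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)   # least-squares solution of the normal equations
print(theta_hat)                                    # close to [1, 2, -3]
```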
- -**25. LMS algorithm ― By noting α the learning rate, the update rule of the Least Mean Squares (LMS) algorithm for a training set of m data points, which is also known as the Widrow-Hoff learning rule, is as follows:** - -⟶ - -
- -**26. Remark: the update rule is a particular case of the gradient ascent.** - -⟶ - -
- -**27. LWR ― Locally Weighted Regression, also known as LWR, is a variant of linear regression that weights each training example in its cost function by w(i)(x), which is defined with parameter τ∈R as:** - -⟶ - -
- -**28. Classification and logistic regression** - -⟶ - -
- -**29. Sigmoid function ― The sigmoid function g, also known as the logistic function, is defined as follows:** - -⟶ - -
- -**30. Logistic regression ― We assume here that y|x;θ∼Bernoulli(ϕ). We have the following form:** - -⟶ - -
- -**31. Remark: there is no closed form solution for the case of logistic regressions.** - -⟶ - -
- -**32. Softmax regression ― A softmax regression, also called a multiclass logistic regression, is used to generalize logistic regression when there are more than 2 outcome classes. By convention, we set θK=0, which makes the Bernoulli parameter ϕi of each class i equal to:** - -⟶ - -
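A minimal, numerically stable sketch of the softmax mapping from class scores to class probabilities used by this model; the example scores are illustrative:

```python
import numpy as np

def softmax(scores):
    """Numerically stable softmax over the last axis of an array of class scores."""
    shifted = scores - scores.max(axis=-1, keepdims=True)   # avoid overflow in exp
    e = np.exp(shifted)
    return e / e.sum(axis=-1, keepdims=True)

print(softmax(np.array([2.0, 1.0, 0.1])))   # probabilities summing to 1
```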
- -**33. Generalized Linear Models** - -⟶ - -
- -**34. Exponential family ― A class of distributions is said to be in the exponential family if it can be written in terms of a natural parameter, also called the canonical parameter or link function, η, a sufficient statistic T(y) and a log-partition function a(η) as follows:** - -⟶ - -
- -**35. Remark: we will often have T(y)=y. Also, exp(−a(η)) can be seen as a normalization parameter that will make sure that the probabilities sum to one.** - -⟶ - -
- -**36. Here are the most common exponential distributions summed up in the following table:** - -⟶ - -
- -**37. [Distribution, Bernoulli, Gaussian, Poisson, Geometric]** - -⟶ - -
- -**38. Assumptions of GLMs ― Generalized Linear Models (GLM) aim at predicting a random variable y as a function of x∈Rn+1 and rely on the following 3 assumptions:** - -⟶ - -
- -**39. Remark: ordinary least squares and logistic regression are special cases of generalized linear models.** - -⟶ - -
- -**40. Support Vector Machines** - -⟶ - -
- -**41. The goal of support vector machines is to find the line that maximizes the minimum distance to the line.** - -⟶ - -
- -**42. Optimal margin classifier ― The optimal margin classifier h is such that:** - -⟶ - -
- -**43. where (w,b)∈Rn×R is the solution of the following optimization problem:** - -⟶ - -
- -**44. such that** - -⟶ - -
- -**45. support vectors** - -⟶ - -
- -**46. Remark: the line is defined as wTx−b=0.** - -⟶ - -
- -**47. Hinge loss ― The hinge loss is used in the setting of SVMs and is defined as follows:** - -⟶ - -
- -**48. Kernel ― Given a feature mapping ϕ, we define the kernel K to be defined as:** - -⟶ - -
- -**49. In practice, the kernel K defined by K(x,z)=exp(−||x−z||2/(2σ2)) is called the Gaussian kernel and is commonly used.** - -⟶ - -
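A one-function sketch of this kernel; the example points and σ are illustrative:

```python
import numpy as np

def gaussian_kernel(x, z, sigma=1.0):
    """K(x, z) = exp(-||x - z||^2 / (2 sigma^2))."""
    return float(np.exp(-np.sum((x - z) ** 2) / (2 * sigma ** 2)))

print(gaussian_kernel(np.array([0.0, 0.0]), np.array([1.0, 1.0]), sigma=1.0))
```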
- -**50. [Non-linear separability, Use of a kernel mapping, Decision boundary in the original space]** - -⟶ - -
- -**51. Remark: we say that we use the "kernel trick" to compute the cost function using the kernel because we actually don't need to know the explicit mapping ϕ, which is often very complicated. Instead, only the values K(x,z) are needed.** - -⟶ - -
- -**52. Lagrangian ― We define the Lagrangian L(w,b) as follows:** - -⟶ - -
- -**53. Remark: the coefficients βi are called the Lagrange multipliers.** - -⟶ - -
- -**54. Generative Learning** - -⟶ - -
- -**55. A generative model first tries to learn how the data is generated by estimating P(x|y), which we can then use to estimate P(y|x) by using Bayes' rule.** - -⟶ - -
- -**56. Gaussian Discriminant Analysis** - -⟶ - -
- -**57. Setting ― The Gaussian Discriminant Analysis assumes that y and x|y=0 and x|y=1 are such that:** - -⟶ - -
- -**58. Estimation ― The following table sums up the estimates that we find when maximizing the likelihood:** - -⟶ - -
- -**59. Naive Bayes** - -⟶ - -
- -**60. Assumption ― The Naive Bayes model supposes that the features of each data point are all independent:** - -⟶ - -
- -**61. Solutions ― Maximizing the log-likelihood gives the following solutions, with k∈{0,1},l∈[[1,L]]** - -⟶ - -
- -**62. Remark: Naive Bayes is widely used for text classification and spam detection.** - -⟶ - -
- -**63. Tree-based and ensemble methods** - -⟶ - -
- -**64. These methods can be used for both regression and classification problems.** - -⟶ - -
- -**65. CART ― Classification and Regression Trees (CART), commonly known as decision trees, can be represented as binary trees. They have the advantage of being very interpretable.** - -⟶ - -
- -**66. Random forest ― It is a tree-based technique that uses a high number of decision trees built out of randomly selected sets of features. Contrary to the simple decision tree, it is highly uninterpretable but its generally good performance makes it a popular algorithm.** - -⟶ - -
- -**67. Remark: random forests are a type of ensemble methods.** - -⟶ - -
- -**68. Boosting ― The idea of boosting methods is to combine several weak learners to form a stronger one. The main ones are summed up in the table below:** - -⟶ - -
- -**69. [Adaptive boosting, Gradient boosting]** - -⟶ - -
- -**70. High weights are put on errors to improve at the next boosting step** - -⟶ - -
- -**71. Weak learners trained on remaining errors** - -⟶ - -
- -**72. Other non-parametric approaches** - -⟶ - -
- -**73. k-nearest neighbors ― The k-nearest neighbors algorithm, commonly known as k-NN, is a non-parametric approach where the response of a data point is determined by the nature of its k neighbors from the training set. It can be used in both classification and regression settings.** - -⟶ - -
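A minimal sketch of k-NN classification with Euclidean distance and a majority vote; the toy training set and k are illustrative assumptions:

```python
import numpy as np

def knn_predict(X_train, y_train, x, k=3):
    """Majority vote among the k nearest training points (Euclidean distance)."""
    dists = np.linalg.norm(X_train - x, axis=1)
    nearest = y_train[np.argsort(dists)[:k]]
    values, counts = np.unique(nearest, return_counts=True)
    return values[np.argmax(counts)]

X_train = np.array([[0., 0.], [0., 1.], [5., 5.], [6., 5.]])
y_train = np.array([0, 0, 1, 1])
print(knn_predict(X_train, y_train, np.array([5.5, 5.2]), k=3))   # -> 1
```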
- -**74. Remark: The higher the parameter k, the higher the bias, and the lower the parameter k, the higher the variance.** - -⟶ - -
- -**75. Learning Theory** - -⟶ - -
- -**76. Union bound ― Let A1,...,Ak be k events. We have:** - -⟶ - -
- -**77. Hoeffding inequality ― Let Z1,..,Zm be m iid variables drawn from a Bernoulli distribution of parameter ϕ. Let ˆϕ be their sample mean and γ>0 fixed. We have:** - -⟶ - -
- -**78. Remark: this inequality is also known as the Chernoff bound.** - -⟶ - -
- -**79. Training error ― For a given classifier h, we define the training error ˆϵ(h), also known as the empirical risk or empirical error, to be as follows:** - -⟶ - -
- -**80. Probably Approximately Correct (PAC) ― PAC is a framework under which numerous results on learning theory were proved, and has the following set of assumptions:** - -⟶ - -
- -**81. the training and testing sets follow the same distribution** - -⟶ - -
- -**82. the training examples are drawn independently** - -⟶ - -
- -**83. Shattering ― Given a set S={x(1),...,x(d)}, and a set of classifiers H, we say that H shatters S if for any set of labels {y(1),...,y(d)}, we have:** - -⟶ - -
- -**84. Upper bound theorem ― Let H be a finite hypothesis class such that |H|=k and let δ and the sample size m be fixed. Then, with probability of at least 1−δ, we have:** - -⟶ - -
- -**85. VC dimension ― The Vapnik-Chervonenkis (VC) dimension of a given infinite hypothesis class H, noted VC(H) is the size of the largest set that is shattered by H.** - -⟶ - -
- -**86. Remark: the VC dimension of H={set of linear classifiers in 2 dimensions} is 3.** - -⟶ - -
- -**87. Theorem (Vapnik) ― Let H be given, with VC(H)=d and m the number of training examples. With probability at least 1−δ, we have:** - -⟶ - -
- -**88. [Introduction, Type of prediction, Type of model]** - -⟶ - -
- -**89. [Notations and general concepts, loss function, gradient descent, likelihood]** - -⟶ - -
- -**90. [Linear models, linear regression, logistic regression, generalized linear models]** - -⟶ - -
- -**91. [Support vector machines, Optimal margin classifier, Hinge loss, Kernel]** - -⟶ - -
- -**92. [Generative learning, Gaussian Discriminant Analysis, Naive Bayes]** - -⟶ - -
- -**93. [Trees and ensemble methods, CART, Random forest, Boosting]** - -⟶ - -
- -**94. [Other methods, k-NN]** - -⟶ - -
- -**95. [Learning theory, Hoeffding inequality, PAC, VC dimension]** - -⟶ diff --git a/hi/cheatsheet-unsupervised-learning.md b/hi/cheatsheet-unsupervised-learning.md deleted file mode 100644 index d07b74750..000000000 --- a/hi/cheatsheet-unsupervised-learning.md +++ /dev/null @@ -1,340 +0,0 @@ -**1. Unsupervised Learning cheatsheet** - -⟶ - -
- -**2. Introduction to Unsupervised Learning** - -⟶ - -
- -**3. Motivation ― The goal of unsupervised learning is to find hidden patterns in unlabeled data {x(1),...,x(m)}.** - -⟶ - -
- -**4. Jensen's inequality ― Let f be a convex function and X a random variable. We have the following inequality:** - -⟶ - -
- -**5. Clustering** - -⟶ - -
- -**6. Expectation-Maximization** - -⟶ - -
- -**7. Latent variables ― Latent variables are hidden/unobserved variables that make estimation problems difficult, and are often denoted z. Here are the most common settings where there are latent variables:** - -⟶ - -
- -**8. [Setting, Latent variable z, Comments]** - -⟶ - -
- -**9. [Mixture of k Gaussians, Factor analysis]** - -⟶ - -
- -**10. Algorithm ― The Expectation-Maximization (EM) algorithm gives an efficient method for estimating the parameter θ through maximum likelihood estimation by repeatedly constructing a lower-bound on the likelihood (E-step) and optimizing that lower bound (M-step) as follows:** - -⟶ - -
- -**11. E-step: Evaluate the posterior probability Qi(z(i)) that each data point x(i) came from a particular cluster z(i) as follows:** - -⟶ - -
- -**12. M-step: Use the posterior probabilities Qi(z(i)) as cluster specific weights on data points x(i) to separately re-estimate each cluster model as follows:** - -⟶ - -
- -**13. [Gaussians initialization, Expectation step, Maximization step, Convergence]** - -⟶ - -
- -**14. k-means clustering** - -⟶ - -
- -**15. We note c(i) the cluster of data point i and μj the center of cluster j.** - -⟶ - -
- -**16. Algorithm ― After randomly initializing the cluster centroids μ1,μ2,...,μk∈Rn, the k-means algorithm repeats the following step until convergence:** - -⟶ - -
- -**17. [Means initialization, Cluster assignment, Means update, Convergence]** - -⟶ - -
- -**18. Distortion function ― In order to see if the algorithm converges, we look at the distortion function defined as follows:** - -⟶ - -
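A compact sketch of the k-means loop described above (means initialization, cluster assignment, means update, convergence check); the two-blob toy data and k=2 are illustrative assumptions, and empty clusters are not handled:

```python
import numpy as np

def k_means(X, k=2, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]     # means initialization
    assignments = np.zeros(len(X), dtype=int)
    for _ in range(n_iter):
        d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        assignments = d.argmin(axis=1)                           # cluster assignment step
        new_centroids = np.array([X[assignments == j].mean(axis=0) for j in range(k)])
        if np.allclose(new_centroids, centroids):                # convergence
            break
        centroids = new_centroids                                # means update step
    return assignments, centroids

X = np.vstack([np.random.default_rng(1).normal(0.0, 1.0, (50, 2)),
               np.random.default_rng(2).normal(5.0, 1.0, (50, 2))])
print(k_means(X, k=2)[1])
```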
- -**19. Hierarchical clustering** - -⟶ - -
- -**20. Algorithm ― It is a clustering algorithm with an agglomerative hierarchical approach that builds nested clusters in a successive manner.** - -⟶ - -
- -**21. Types ― There are different sorts of hierarchical clustering algorithms that aim at optimizing different objective functions, which are summed up in the table below:** - -⟶ - -
- -**22. [Ward linkage, Average linkage, Complete linkage]** - -⟶ - -
- -**23. [Minimize within cluster distance, Minimize average distance between cluster pairs, Minimize maximum distance between cluster pairs]** - -⟶ - -
- -**24. Clustering assessment metrics** - -⟶ - -
- -**25. In an unsupervised learning setting, it is often hard to assess the performance of a model since we don't have the ground truth labels as was the case in the supervised learning setting.** - -⟶ - -
- -**26. Silhouette coefficient ― By noting a and b the mean distance between a sample and all other points in the same class, and between a sample and all other points in the next nearest cluster, the silhouette coefficient s for a single sample is defined as follows:** - -⟶ - -
- -**27. Calinski-Harabasz index ― By noting k the number of clusters, Bk and Wk the between and within-clustering dispersion matrices respectively defined as** - -⟶ - -
- -**28. the Calinski-Harabasz index s(k) indicates how well a clustering model defines its clusters, such that the higher the score, the more dense and well separated the clusters are. It is defined as follows:** - -⟶ - -
- -**29. Dimension reduction** - -⟶ - -
- -**30. Principal component analysis** - -⟶ - -
- -**31. It is a dimension reduction technique that finds the variance maximizing directions onto which to project the data.** - -⟶ - -
- -**32. Eigenvalue, eigenvector ― Given a matrix A∈Rn×n, λ is said to be an eigenvalue of A if there exists a vector z∈Rn∖{0}, called eigenvector, such that we have:** - -⟶ - -
- -**33. Spectral theorem ― Let A∈Rn×n. If A is symmetric, then A is diagonalizable by a real orthogonal matrix U∈Rn×n. By noting Λ=diag(λ1,...,λn), we have:** - -⟶ - -
- -**34. diagonal** - -⟶ - -
- -**35. Remark: the eigenvector associated with the largest eigenvalue is called principal eigenvector of matrix A.** - -⟶ - -
- -**36. Algorithm ― The Principal Component Analysis (PCA) procedure is a dimension reduction technique that projects the data on k dimensions by maximizing the variance of the data as follows:** - -⟶ - -
- -**37. Step 1: Normalize the data to have a mean of 0 and standard deviation of 1.** - -⟶ - -
- -**38. Step 2: Compute Σ=(1/m)∑i=1..m x(i)x(i)T∈Rn×n, which is symmetric with real eigenvalues.** - -⟶ - -
- -**39. Step 3: Compute u1,...,uk∈Rn the k orthogonal principal eigenvectors of Σ, i.e. the orthogonal eigenvectors of the k largest eigenvalues.** - -⟶ - -
- -**40. Step 4: Project the data on spanR(u1,...,uk).** - -⟶ - -
- -**41. This procedure maximizes the variance among all k-dimensional spaces.** - -⟶ - -
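A short sketch of Steps 1-4 above using an eigendecomposition of Σ; the toy data and k=2 are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5)) @ rng.normal(size=(5, 5))   # correlated toy data
X = (X - X.mean(axis=0)) / X.std(axis=0)                  # Step 1: normalize
Sigma = (X.T @ X) / len(X)                                # Step 2: symmetric, real eigenvalues
eigvals, eigvecs = np.linalg.eigh(Sigma)                  # Step 3: eigendecomposition
U = eigvecs[:, np.argsort(eigvals)[::-1][:2]]             # top k=2 principal eigenvectors
Z = X @ U                                                 # Step 4: project onto span(u1, u2)
print(Z.shape)                                            # (200, 2)
```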
- -**42. [Data in feature space, Find principal components, Data in principal components space]** - -⟶ - -
- -**43. Independent component analysis** - -⟶ - -
- -**44. It is a technique meant to find the underlying generating sources.** - -⟶ - -
- -**45. Assumptions ― We assume that our data x has been generated by the n-dimensional source vector s=(s1,...,sn), where si are independent random variables, via a mixing and non-singular matrix A as follows:** - -⟶ - -
- -**46. The goal is to find the unmixing matrix W=A−1.** - -⟶ - -
- -**47. Bell and Sejnowski ICA algorithm ― This algorithm finds the unmixing matrix W by following the steps below:** - -⟶ - -
- -**48. Write the probability of x=As=W−1s as:** - -⟶ - -
- -**49. Write the log likelihood given our training data {x(i),i∈[[1,m]]} and by noting g the sigmoid function as:** - -⟶ - -
- -**50. Therefore, the stochastic gradient ascent learning rule is such that for each training example x(i), we update W as follows:** - -⟶ - -
- -**51. The Machine Learning cheatsheets are now available in Hindi.** - -⟶ - -
- -**52. Original authors** - -⟶ - -
- -**53. Translated by X, Y and Z** - -⟶ - -
- -**54. Reviewed by X, Y and Z** - -⟶ - -
- -**55. [Introduction, Motivation, Jensen's inequality]** - -⟶ - -
- -**56. [Clustering, Expectation-Maximization, k-means, Hierarchical clustering, Metrics]** - -⟶ - -
- -**57. [Dimension reduction, PCA, ICA]** - -⟶ diff --git a/hi/refresher-linear-algebra.md b/hi/refresher-linear-algebra.md deleted file mode 100644 index a6b440d1e..000000000 --- a/hi/refresher-linear-algebra.md +++ /dev/null @@ -1,339 +0,0 @@ -**1. Linear Algebra and Calculus refresher** - -⟶ - -
- -**2. General notations** - -⟶ - -
- -**3. Definitions** - -⟶ - -
- -**4. Vector ― We note x∈Rn a vector with n entries, where xi∈R is the ith entry:** - -⟶ - -
- -**5. Matrix ― We note A∈Rm×n a matrix with m rows and n columns, where Ai,j∈R is the entry located in the ith row and jth column:** - -⟶ - -
- -**6. Remark: the vector x defined above can be viewed as a n×1 matrix and is more particularly called a column-vector.** - -⟶ - -
- -**7. Main matrices** - -⟶ - -
- -**8. Identity matrix ― The identity matrix I∈Rn×n is a square matrix with ones in its diagonal and zero everywhere else:** - -⟶ - -
- -**9. Remark: for all matrices A∈Rn×n, we have A×I=I×A=A.** - -⟶ - -
- -**10. Diagonal matrix ― A diagonal matrix D∈Rn×n is a square matrix with nonzero values in its diagonal and zero everywhere else:** - -⟶ - -
- -**11. Remark: we also note D as diag(d1,...,dn).** - -⟶ - -
- -**12. Matrix operations** - -⟶ - -
- -**13. Multiplication** - -⟶ - -
- -**14. Vector-vector ― There are two types of vector-vector products:** - -⟶ - -
- -**15. inner product: for x,y∈Rn, we have:** - -⟶ - -
- -**16. outer product: for x∈Rm,y∈Rn, we have:** - -⟶ - -
- -**17. Matrix-vector ― The product of matrix A∈Rm×n and vector x∈Rn is a vector of size Rm, such that:** - -⟶ - -
- -**18. where aTr,i are the vector rows and ac,j are the vector columns of A, and xi are the entries of x.** - -⟶ - -
- -**19. Matrix-matrix ― The product of matrices A∈Rm×n and B∈Rn×p is a matrix of size Rm×p, such that:** - -⟶ - -
- -**20. where aTr,i,bTr,i are the vector rows and ac,j,bc,j are the vector columns of A and B respectively** - -⟶ - -
- -**21. Other operations** - -⟶ - -
- -**22. Transpose ― The transpose of a matrix A∈Rm×n, noted AT, is such that its entries are flipped:** - -⟶ - -
- -**23. Remark: for matrices A,B, we have (AB)T=BTAT** - -⟶ - -
- -**24. Inverse ― The inverse of an invertible square matrix A is noted A−1 and is the only matrix such that:** - -⟶ - -
- -**25. Remark: not all square matrices are invertible. Also, for matrices A,B, we have (AB)−1=B−1A−1** - -⟶ - -
- -**26. Trace ― The trace of a square matrix A, noted tr(A), is the sum of its diagonal entries:** - -⟶ - -
- -**27. Remark: for matrices A,B, we have tr(AT)=tr(A) and tr(AB)=tr(BA)** - -⟶ - -
- -**28. Determinant ― The determinant of a square matrix A∈Rn×n, noted |A| or det(A) is expressed recursively in terms of A∖i,∖j, which is the matrix A without its ith row and jth column, as follows:** - -⟶ - -
- -**29. Remark: A is invertible if and only if |A|≠0. Also, |AB|=|A||B| and |AT|=|A|.** - -⟶ - -
- -**30. Matrix properties** - -⟶ - -
- -**31. Definitions** - -⟶ - -
- -**32. Symmetric decomposition ― A given matrix A can be expressed in terms of its symmetric and antisymmetric parts as follows:** - -⟶ - -
- -**33. [Symmetric, Antisymmetric]** - -⟶ - -
- -**34. Norm ― A norm is a function N:V⟶[0,+∞[ where V is a vector space, and such that for all x,y∈V, we have:** - -⟶ - -
- -**35. N(ax)=|a|N(x) for a scalar** - -⟶ - -
- -**36. if N(x)=0, then x=0** - -⟶ - -
- -**37. For x∈V, the most commonly used norms are summed up in the table below:** - -⟶ - -
- -**38. [Norm, Notation, Definition, Use case]** - -⟶ - -
- -**39. Linear dependence ― A set of vectors is said to be linearly dependent if one of the vectors in the set can be defined as a linear combination of the others.** - -⟶ - -
- -**40. Remark: if no vector can be written this way, then the vectors are said to be linearly independent** - -⟶ - -
- -**41. Matrix rank ― The rank of a given matrix A is noted rank(A) and is the dimension of the vector space generated by its columns. This is equivalent to the maximum number of linearly independent columns of A.** - -⟶ - -
- -**42. Positive semi-definite matrix ― A matrix A∈Rn×n is positive semi-definite (PSD) and is noted A⪰0 if we have:** - -⟶ - -
- -**43. Remark: similarly, a matrix A is said to be positive definite, and is noted A≻0, if it is a PSD matrix which satisfies for all non-zero vector x, xTAx>0.** - -⟶ - -
- -**44. Eigenvalue, eigenvector ― Given a matrix A∈Rn×n, λ is said to be an eigenvalue of A if there exists a vector z∈Rn∖{0}, called eigenvector, such that we have:** - -⟶ - -
- -**45. Spectral theorem ― Let A∈Rn×n. If A is symmetric, then A is diagonalizable by a real orthogonal matrix U∈Rn×n. By noting Λ=diag(λ1,...,λn), we have:** - -⟶ - -
- -**46. diagonal** - -⟶ - -
- -**47. Singular-value decomposition ― For a given matrix A of dimensions m×n, the singular-value decomposition (SVD) is a factorization technique that guarantees the existence of U m×m unitary, Σ m×n diagonal and V n×n unitary matrices, such that:** - -⟶ - -
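A quick NumPy sketch that factors a small matrix and verifies the reconstruction; the matrix is an illustrative example:

```python
import numpy as np

A = np.arange(12, dtype=float).reshape(3, 4)
U, s, Vt = np.linalg.svd(A, full_matrices=True)   # U: 3x3 unitary, s: singular values, Vt: 4x4 unitary
Sigma = np.zeros_like(A)
Sigma[:len(s), :len(s)] = np.diag(s)              # m x n "diagonal" matrix
print(np.allclose(U @ Sigma @ Vt, A))             # True: A = U Sigma V^T
```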
- -**48. Matrix calculus** - -⟶ - -
- -**49. Gradient ― Let f:Rm×n→R be a function and A∈Rm×n be a matrix. The gradient of f with respect to A is a m×n matrix, noted ∇Af(A), such that:** - -⟶ - -
- -**50. Remark: the gradient of f is only defined when f is a function that returns a scalar.** - -⟶ - -
- -**51. Hessian ― Let f:Rn→R be a function and x∈Rn be a vector. The hessian of f with respect to x is a n×n symmetric matrix, noted ∇2xf(x), such that:** - -⟶ - -
- -**52. Remark: the hessian of f is only defined when f is a function that returns a scalar** - -⟶ - -
- -**53. Gradient operations ― For matrices A,B,C, the following gradient properties are worth having in mind:** - -⟶ - -
- -**54. [General notations, Definitions, Main matrices]** - -⟶ - -
- -**55. [Matrix operations, Multiplication, Other operations]** - -⟶ - -
- -**56. [Matrix properties, Norm, Eigenvalue/Eigenvector, Singular-value decomposition]** - -⟶ - -
- -**57. [Matrix calculus, Gradient, Hessian, Operations]** - -⟶ diff --git a/hi/refresher-probability.md b/hi/refresher-probability.md deleted file mode 100644 index 5c9b34656..000000000 --- a/hi/refresher-probability.md +++ /dev/null @@ -1,381 +0,0 @@ -**1. Probabilities and Statistics refresher** - -⟶ - -
- -**2. Introduction to Probability and Combinatorics** - -⟶ - -
- -**3. Sample space ― The set of all possible outcomes of an experiment is known as the sample space of the experiment and is denoted by S.** - -⟶ - -
- -**4. Event ― Any subset E of the sample space is known as an event. That is, an event is a set consisting of possible outcomes of the experiment. If the outcome of the experiment is contained in E, then we say that E has occurred.** - -⟶ - -
- -**5. Axioms of probability ― For each event E, we denote P(E) as the probability of event E occurring.** - -⟶ - -
- -**6. Axiom 1 ― Every probability is between 0 and 1 included, i.e:** - -⟶ - -
- -**7. Axiom 2 ― The probability that at least one of the elementary events in the entire sample space will occur is 1, i.e:** - -⟶ - -
- -**8. Axiom 3 ― For any sequence of mutually exclusive events E1,...,En, we have:** - -⟶ - -
- -**9. Permutation ― A permutation is an arrangement of r objects from a pool of n objects, in a given order. The number of such arrangements is given by P(n,r), defined as:** - -⟶ - -
- -**10. Combination ― A combination is an arrangement of r objects from a pool of n objects, where the order does not matter. The number of such arrangements is given by C(n,r), defined as:** - -⟶ - -
- -**11. Remark: we note that for 0⩽r⩽n, we have P(n,r)⩾C(n,r)** - -⟶ - -
- -**12. Conditional Probability** - -⟶ - -
- -**13. Bayes' rule ― For events A and B such that P(B)>0, we have:** - -⟶ - -
- -**14. Remark: we have P(A∩B)=P(A)P(B|A)=P(A|B)P(B)** - -⟶ - -
- -**15. Partition ― Let {Ai,i∈[[1,n]]} be such that for all i, Ai≠∅. We say that {Ai} is a partition if we have:** - -⟶ - -
- -**16. Remark: for any event B in the sample space, we have P(B)=∑i=1..n P(B|Ai)P(Ai).** - -⟶ - -
- -**17. Extended form of Bayes' rule ― Let {Ai,i∈[[1,n]]} be a partition of the sample space. We have:** - -⟶ - -
- -**18. Independence ― Two events A and B are independent if and only if we have:** - -⟶ - -
- -**19. Random Variables** - -⟶ - -
- -**20. Definitions** - -⟶ - -
- -**21. Random variable ― A random variable, often noted X, is a function that maps every element in a sample space to a real line.** - -⟶ - -
- -**22. Cumulative distribution function (CDF) ― The cumulative distribution function F, which is monotonically non-decreasing and is such that limx→−∞F(x)=0 and limx→+∞F(x)=1, is defined as:** - -⟶ - -
- -**23. Remark: we have P(a<X⩽b)=F(b)−F(a).** - -⟶ - -
- -**24. Probability density function (PDF) ― The probability density function f is the probability that X takes on values between two adjacent realizations of the random variable.** - -⟶ - -
- -**25. Relationships involving the PDF and CDF ― Here are the important properties to know in the discrete (D) and the continuous (C) cases.** - -⟶ - -
- -**26. [Case, CDF F, PDF f, Properties of PDF]** - -⟶ - -
- -**27. Expectation and Moments of the Distribution ― Here are the expressions of the expected value E[X], generalized expected value E[g(X)], kth moment E[Xk] and characteristic function ψ(ω) for the discrete and continuous cases:** - -⟶ - -
- -**28. Variance ― The variance of a random variable, often noted Var(X) or σ2, is a measure of the spread of its distribution function. It is determined as follows:** - -⟶ - -
- -**29. Standard deviation ― The standard deviation of a random variable, often noted σ, is a measure of the spread of its distribution function which is compatible with the units of the actual random variable. It is determined as follows:** - -⟶ - -
- -**30. Transformation of random variables ― Let the variables X and Y be linked by some function. By noting fX and fY the distribution function of X and Y respectively, we have:** - -⟶ - -
- -**31. Leibniz integral rule ― Let g be a function of x and potentially c, and a,b boundaries that may depend on c. We have:** - -⟶ - -
- -**32. Probability Distributions** - -⟶ - -
- -**33. Chebyshev's inequality ― Let X be a random variable with expected value μ. For k,σ>0, we have the following inequality:** - -⟶ - -
- -**34. Main distributions ― Here are the main distributions to have in mind:** - -⟶ - -
- -**35. [Type, Distribution]** - -⟶ - -
- -**36. Jointly Distributed Random Variables** - -⟶ - -
- -**37. Marginal density and cumulative distribution ― From the joint probability density function fXY, we have** - -⟶ - -
- -**38. [Case, Marginal density, Cumulative function]** - -⟶ - -
- -**39. Conditional density ― The conditional density of X with respect to Y, often noted fX|Y, is defined as follows:** - -⟶ - -
- -**40. Independence ― Two random variables X and Y are said to be independent if we have:** - -⟶ - -
- -**41. Covariance ― We define the covariance of two random variables X and Y, that we note σ2XY or more commonly Cov(X,Y), as follows:** - -⟶ - -
- -**42. Correlation ― By noting σX,σY the standard deviations of X and Y, we define the correlation between the random variables X and Y, noted ρXY, as follows:** - -⟶ - -
- -**43. Remark 1: we note that for any random variables X,Y, we have ρXY∈[−1,1].** - -⟶ - -
- -**44. Remark 2: If X and Y are independent, then ρXY=0.** - -⟶ - -
- -**45. Parameter estimation** - -⟶ - -
- -**46. Definitions** - -⟶ - -
- -**47. Random sample ― A random sample is a collection of n random variables X1,...,Xn that are independent and identically distributed with X.** - -⟶ - -
- -**48. Estimator ― An estimator is a function of the data that is used to infer the value of an unknown parameter in a statistical model.** - -⟶ - -
- -**49. Bias ― The bias of an estimator ^θ is defined as being the difference between the expected value of the distribution of ^θ and the true value, i.e.:** - -⟶ - -
- -**50. Remark: an estimator is said to be unbiased when we have E[^θ]=θ.** - -⟶ - -
- -**51. Estimating the mean** - -⟶ - -
- -**52. Sample mean ― The sample mean of a random sample is used to estimate the true mean μ of a distribution, is often noted ¯X and is defined as follows:** - -⟶ - -
- -**53. Remark: the sample mean is unbiased, i.e E[¯X]=μ.** - -⟶ - -
- -**54. Central Limit Theorem ― Let us have a random sample X1,...,Xn following a given distribution with mean μ and variance σ2, then we have:** - -⟶ - -
- -**55. Estimating the variance** - -⟶ - -
- -**56. Sample variance ― The sample variance of a random sample is used to estimate the true variance σ2 of a distribution, is often noted s2 or ^σ2 and is defined as follows:** - -⟶ - -
- -**57. Remark: the sample variance is unbiased, i.e E[s2]=σ2.** - -⟶ - -
- -**58. Chi-Squared relation with sample variance ― Let s2 be the sample variance of a random sample. We have:** - -⟶ - -
- -**59. [Introduction, Sample space, Event, Permutation]** - -⟶ - -
- -**60. [Conditional probability, Bayes' rule, Independence]** - -⟶ - -
- -**61. [Random variables, Definitions, Expectation, Variance]** - -⟶ - -
- -**62. [Probability distributions, Chebyshev's inequality, Main distributions]** - -⟶ - -
- -**63. [Jointly distributed random variables, Density, Covariance, Correlation]** - -⟶ - -
- -**64. [Parameter estimation, Mean, Variance]** - -⟶ diff --git a/ru/cheatsheet-deep-learning.md b/ru/cheatsheet-deep-learning.md deleted file mode 100644 index a5aa3756c..000000000 --- a/ru/cheatsheet-deep-learning.md +++ /dev/null @@ -1,321 +0,0 @@ -**1. Deep Learning cheatsheet** - -⟶ - -
- -**2. Neural Networks** - -⟶ - -
- -**3. Neural networks are a class of models that are built with layers. Commonly used types of neural networks include convolutional and recurrent neural networks.** - -⟶ - -
- -**4. Architecture ― The vocabulary around neural networks architectures is described in the figure below:** - -⟶ - -
- -**5. [Input layer, hidden layer, output layer]** - -⟶ - -
- -**6. By noting i the ith layer of the network and j the jth hidden unit of the layer, we have:** - -⟶ - -
- -**7. where we note w, b, z the weight, bias and output respectively.** - -⟶ - -
- -**8. Activation function ― Activation functions are used at the end of a hidden unit to introduce non-linear complexities to the model. Here are the most common ones:** - -⟶ - -
- -**9. [Sigmoid, Tanh, ReLU, Leaky ReLU]** - -⟶ - -
- -**10. Cross-entropy loss ― In the context of neural networks, the cross-entropy loss L(z,y) is commonly used and is defined as follows:** - -⟶ - -
- -**11. Learning rate ― The learning rate, often noted α or sometimes η, indicates at which pace the weights get updated. This can be fixed or adaptively changed. The current most popular method is called Adam, which is a method that adapts the learning rate.** - -⟶ - -
- -**12. Backpropagation ― Backpropagation is a method to update the weights in the neural network by taking into account the actual output and the desired output. The derivative with respect to weight w is computed using chain rule and is of the following form:** - -⟶ - -
- -**13. As a result, the weight is updated as follows:** - -⟶ - -
- -**14. Updating weights ― In a neural network, weights are updated as follows:** - -⟶ - -
- -**15. Step 1: Take a batch of training data.** - -⟶ - -
- -**16. Step 2: Perform forward propagation to obtain the corresponding loss.** - -⟶ - -
- -**17. Step 3: Backpropagate the loss to get the gradients.** - -⟶ - -
- -**18. Step 4: Use the gradients to update the weights of the network.** - -⟶ - -
- -**19. Dropout ― Dropout is a technique meant to prevent overfitting the training data by dropping out units in a neural network. In practice, neurons are either dropped with probability p or kept with probability 1−p.** - -⟶ - -
- -**20. Convolutional Neural Networks** - -⟶ - -
- -**21. Convolutional layer requirement ― By noting W the input volume size, F the size of the convolutional layer neurons, P the amount of zero padding, then the number of neurons N that fit in a given volume is such that:** - -⟶ - -
- -**22. Batch normalization ― It is a step, parametrized by hyperparameters γ,β, that normalizes the batch {xi}. By noting μB,σ2B the mean and variance of the batch that we want to correct, it is done as follows:** - -⟶ - -
- -**23. It is usually done after a fully connected/convolutional layer and before a non-linearity layer and aims at allowing higher learning rates and reducing the strong dependence on initialization.** - -⟶ - -
- -**24. Recurrent Neural Networks** - -⟶ - -
- -**25. Types of gates ― Here are the different types of gates that we encounter in a typical recurrent neural network:** - -⟶ - -
- -**26. [Input gate, forget gate, gate, output gate]** - -⟶ - -
- -**27. [Write to cell or not?, Erase a cell or not?, How much to write to cell?, How much to reveal cell?]** - -⟶ - -
- -**28. LSTM ― A long short-term memory (LSTM) network is a type of RNN model that avoids the vanishing gradient problem by adding 'forget' gates.** - -⟶ - -
- -**29. Reinforcement Learning and Control** - -⟶ - -
- -**30. The goal of reinforcement learning is for an agent to learn how to evolve in an environment.** - -⟶ - -
- -**31. Definitions** - -⟶ - -
- -**32. Markov decision processes ― A Markov decision process (MDP) is a 5-tuple (S,A,{Psa},γ,R) where:** - -⟶ - -
- -**33. S is the set of states** - -⟶ - -
- -**34. A is the set of actions** - -⟶ - -
- -**35. {Psa} are the state transition probabilities for s∈S and a∈A** - -⟶ - -
- -**36. γ∈[0,1[ is the discount factor** - -⟶ - -
- -**37. R:S×A⟶R or R:S⟶R is the reward function that the algorithm wants to maximize** - -⟶ - -
- -**38. Policy ― A policy π is a function π:S⟶A that maps states to actions.** - -⟶ - -
- -**39. Remark: we say that we execute a given policy π if given a state s we take the action a=π(s).** - -⟶ - -
- -**40. Value function ― For a given policy π and a given state s, we define the value function Vπ as follows:** - -⟶ - -
- -**41. Bellman equation ― The optimal Bellman equations characterize the value function Vπ∗ of the optimal policy π∗:** - -⟶ - -
- -**42. Remark: we note that the optimal policy π∗ for a given state s is such that:** - -⟶ - -
- -**43. Value iteration algorithm ― The value iteration algorithm is in two steps:** - -⟶ - -
- -**44. 1) We initialize the value:** - -⟶ - -
- -**45. 2) We iterate the value based on the values before:** - -⟶ - -
- -**46. Maximum likelihood estimate ― The maximum likelihood estimates for the state transition probabilities are as follows:** - -⟶ - -
- -**47. times took action a in state s and got to s′** - -⟶ - -
- -**48. times took action a in state s** - -⟶ - -
- -**49. Q-learning ― Q-learning is a model-free estimation of Q, which is done as follows:** - -⟶ - -
- -**50. View PDF version on GitHub** - -⟶ - -
- -**51. [Neural Networks, Architecture, Activation function, Backpropagation, Dropout]** - -⟶ - -
- -**52. [Convolutional Neural Networks, Convolutional layer, Batch normalization]** - -⟶ - -
- -**53. [Recurrent Neural Networks, Gates, LSTM]** - -⟶ - -
- -**54. [Reinforcement learning, Markov decision processes, Value/policy iteration, Approximate dynamic programming, Policy search]** - -⟶ diff --git a/ru/cheatsheet-machine-learning-tips-and-tricks.md b/ru/cheatsheet-machine-learning-tips-and-tricks.md deleted file mode 100644 index 9712297b8..000000000 --- a/ru/cheatsheet-machine-learning-tips-and-tricks.md +++ /dev/null @@ -1,285 +0,0 @@ -**1. Machine Learning tips and tricks cheatsheet** - -⟶ - -
- -**2. Classification metrics** - -⟶ - -
- -**3. In a context of a binary classification, here are the main metrics that are important to track in order to assess the performance of the model.** - -⟶ - -
- -**4. Confusion matrix ― The confusion matrix is used to have a more complete picture when assessing the performance of a model. It is defined as follows:** - -⟶ - -
- -**5. [Predicted class, Actual class]** - -⟶ - -
- -**6. Main metrics ― The following metrics are commonly used to assess the performance of classification models:** - -⟶ - -
- -**7. [Metric, Formula, Interpretation]** - -⟶ - -
- -**8. Overall performance of model** - -⟶ - -
- -**9. How accurate the positive predictions are** - -⟶ - -
- -**10. Coverage of actual positive sample** - -⟶ - -
- -**11. Coverage of actual negative sample** - -⟶ - -
- -**12. Hybrid metric useful for unbalanced classes** - -⟶ - -
- -**13. ROC ― The receiver operating characteristic curve, also noted ROC, is the plot of TPR versus FPR obtained by varying the threshold. These metrics are summed up in the table below:** - -⟶ - -
- -**14. [Metric, Formula, Equivalent]** - -⟶ - -
- -**15. AUC ― The area under the receiver operating characteristic curve, also noted AUC or AUROC, is the area below the ROC as shown in the following figure:** - -⟶ - -
- -**16. [Actual, Predicted]** - -⟶ - -
- -**17. Basic metrics ― Given a regression model f, the following metrics are commonly used to assess the performance of the model:** - -⟶ - -
- -**18. [Total sum of squares, Explained sum of squares, Residual sum of squares]** - -⟶ - -
- -**19. Coefficient of determination ― The coefficient of determination, often noted R2 or r2, provides a measure of how well the observed outcomes are replicated by the model and is defined as follows:** - -⟶ - -
- -**20. Main metrics ― The following metrics are commonly used to assess the performance of regression models, by taking into account the number of variables n that they take into consideration:** - -⟶ - -
- -**21. where L is the likelihood and ˆσ2 is an estimate of the variance associated with each response.** - -⟶ - -
- -**22. Model selection** - -⟶ - -
- -**23. Vocabulary ― When selecting a model, we distinguish 3 different parts of the data that we have as follows:** - -⟶ - -
- -**24. [Training set, Validation set, Testing set]** - -⟶ - -
- -**25. [Model is trained, Model is assessed, Model gives predictions]** - -⟶ - -
- -**26. [Usually 80% of the dataset, Usually 20% of the dataset]** - -⟶ - -
- -**27. [Also called hold-out or development set, Unseen data]** - -⟶ - -
- -**28. Once the model has been chosen, it is trained on the entire dataset and tested on the unseen test set. These are represented in the figure below:** - -⟶ - -
- -**29. Cross-validation ― Cross-validation, also noted CV, is a method that is used to select a model that does not rely too much on the initial training set. The different types are summed up in the table below:** - -⟶ - -
- -**30. [Training on k−1 folds and assessment on the remaining one, Training on n−p observations and assessment on the p remaining ones]** - -⟶ - -
- -**31. [Generally k=5 or 10, Case p=1 is called leave-one-out]** - -⟶ - -
- -**32. The most commonly used method is called k-fold cross-validation and splits the training data into k folds to validate the model on one fold while training the model on the k−1 other folds, all of this k times. The error is then averaged over the k folds and is named cross-validation error.** - -⟶ - -
- -**33. Regularization ― The regularization procedure aims at preventing the model from overfitting the data and thus deals with high variance issues. The following table sums up the different types of commonly used regularization techniques:** - -⟶ - -
- -**34. [Shrinks coefficients to 0, Good for variable selection, Makes coefficients smaller, Tradeoff between variable selection and small coefficients]** - -⟶ - -
- -**35. Diagnostics** - -⟶ - -
- -**36. Bias ― The bias of a model is the difference between the expected prediction and the correct model that we try to predict for given data points.** - -⟶ - -
- -**37. Variance ― The variance of a model is the variability of the model prediction for given data points.** - -⟶ - -
- -**38. Bias/variance tradeoff ― The simpler the model, the higher the bias, and the more complex the model, the higher the variance.** - -⟶ - -
- -**39. [Symptoms, Regression illustration, classification illustration, deep learning illustration, possible remedies]** - -⟶ - -
- -**40. [High training error, Training error close to test error, High bias, Training error slightly lower than test error, Very low training error, Training error much lower than test error, High variance]** - -⟶ - -
- -**41. [Complexify model, Add more features, Train longer, Perform regularization, Get more data]** - -⟶ - -
- -**42. Error analysis ― Error analysis is analyzing the root cause of the difference in performance between the current and the perfect models.** - -⟶ - -
- -**43. Ablative analysis ― Ablative analysis is analyzing the root cause of the difference in performance between the current and the baseline models.** - -⟶ - -
- -**44. Regression metrics** - -⟶ - -
- -**45. [Classification metrics, confusion matrix, accuracy, precision, recall, F1 score, ROC]** - -⟶ - -
- -**46. [Regression metrics, R squared, Mallow's CP, AIC, BIC]** - -⟶ - -
- -**47. [Model selection, cross-validation, regularization]** - -⟶ - -
- -**48. [Diagnostics, Bias/variance tradeoff, error/ablative analysis]** - -⟶ diff --git a/ru/cheatsheet-supervised-learning.md b/ru/cheatsheet-supervised-learning.md deleted file mode 100644 index a6b19ea1c..000000000 --- a/ru/cheatsheet-supervised-learning.md +++ /dev/null @@ -1,567 +0,0 @@ -**1. Supervised Learning cheatsheet** - -⟶ - -
- -**2. Introduction to Supervised Learning** - -⟶ - -
- -**3. Given a set of data points {x(1),...,x(m)} associated to a set of outcomes {y(1),...,y(m)}, we want to build a classifier that learns how to predict y from x.** - -⟶ - -
- -**4. Type of prediction ― The different types of predictive models are summed up in the table below:** - -⟶ - -
- -**5. [Regression, Classifier, Outcome, Examples]** - -⟶ - -
- -**6. [Continuous, Class, Linear regression, Logistic regression, SVM, Naive Bayes]** - -⟶ - -
- -**7. Type of model ― The different models are summed up in the table below:** - -⟶ - -
- -**8. [Discriminative model, Generative model, Goal, What's learned, Illustration, Examples]** - -⟶ - -
- -**9. [Directly estimate P(y|x), Estimate P(x|y) to then deduce P(y|x), Decision boundary, Probability distributions of the data, Regressions, SVMs, GDA, Naive Bayes]** - -⟶ - -
- -**10. Notations and general concepts** - -⟶ - -
- -**11. Hypothesis ― The hypothesis is noted hθ and is the model that we choose. For a given input data x(i) the model prediction output is hθ(x(i)).** - -⟶ - -
- -**12. Loss function ― A loss function is a function L:(z,y)∈R×Y⟼L(z,y)∈R that takes as inputs the predicted value z corresponding to the real data value y and outputs how different they are. The common loss functions are summed up in the table below:** - -⟶ - -
- -**13. [Least squared error, Logistic loss, Hinge loss, Cross-entropy]** - -⟶ - -
- -**14. [Linear regression, Logistic regression, SVM, Neural Network]** - -⟶ - -
- -**15. Cost function ― The cost function J is commonly used to assess the performance of a model, and is defined with the loss function L as follows:** - -⟶ - -
- -**16. Gradient descent ― By noting α∈R the learning rate, the update rule for gradient descent is expressed with the learning rate and the cost function J as follows:** - -⟶ - -
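For illustration, a small sketch of batch gradient descent applied to least-squares linear regression; the quadratic cost, the learning rate and the toy data are assumptions made for the example:

```python
import numpy as np

def gradient_descent(X, y, alpha=0.5, n_iters=1000):
    """Batch gradient descent sketch for least-squares linear regression.

    Assumes X already contains a column of ones for the intercept and that
    the cost is J(theta) = (1/2m) * ||X theta - y||^2.
    """
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(n_iters):
        grad = X.T @ (X @ theta - y) / m   # gradient of the cost J
        theta -= alpha * grad              # update rule: theta := theta - alpha * grad(J)
    return theta

# Toy usage: data generated as y = 1 + 2x, so theta should approach [1, 2]
X = np.c_[np.ones(50), np.linspace(0.0, 1.0, 50)]
y = 1.0 + 2.0 * X[:, 1]
print(gradient_descent(X, y).round(3))
```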
- -**17. Remark: Stochastic gradient descent (SGD) updates the parameter based on each training example, while batch gradient descent updates it based on a batch of training examples.** - -⟶ - -
- -**18. Likelihood ― The likelihood of a model L(θ) given parameters θ is used to find the optimal parameters θ through maximizing the likelihood. In practice, we use the log-likelihood ℓ(θ)=log(L(θ)) which is easier to optimize. We have:** - -⟶ - -
- -**19. Newton's algorithm ― The Newton's algorithm is a numerical method that finds θ such that ℓ′(θ)=0. Its update rule is as follows:** - -⟶ - -
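A one-dimensional sketch of this update, assuming user-supplied derivatives `l_prime` and `l_double_prime` of the log-likelihood:

```python
def newton_1d(l_prime, l_double_prime, theta0, n_iters=25):
    """Newton sketch in one dimension: find theta such that l'(theta) = 0."""
    theta = theta0
    for _ in range(n_iters):
        theta -= l_prime(theta) / l_double_prime(theta)   # theta := theta - l'(theta)/l''(theta)
    return theta

# Toy usage: l(theta) = -(theta - 3)^2 is maximized at theta = 3
print(newton_1d(lambda t: -2.0 * (t - 3.0), lambda t: -2.0, theta0=0.0))   # 3.0
```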
- -**20. Remark: the multidimensional generalization, also known as the Newton-Raphson method, has the following update rule:** - -⟶ - -
- -**21. Linear models** - -⟶ - -
- -**22. Linear regression** - -⟶ - -
- -**23. We assume here that y|x;θ∼N(μ,σ2)** - -⟶ - -
- -**24. Normal equations ― By noting X the design matrix, the value of θ that minimizes the cost function is a closed-form solution such that:** - -⟶ - -
- -**25. LMS algorithm ― By noting α the learning rate, the update rule of the Least Mean Squares (LMS) algorithm for a training set of m data points, which is also known as the Widrow-Hoff learning rule, is as follows:** - -⟶ - -
- -**26. Remark: the update rule is a particular case of the gradient ascent.** - -⟶ - -
- -**27. LWR ― Locally Weighted Regression, also known as LWR, is a variant of linear regression that weights each training example in its cost function by w(i)(x), which is defined with parameter τ∈R as:** - -⟶ - -
- -**28. Classification and logistic regression** - -⟶ - -
- -**29. Sigmoid function ― The sigmoid function g, also known as the logistic function, is defined as follows:** - -⟶ - -
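A direct Python transcription of the definition, shown only for reference:

```python
import math

def sigmoid(z):
    """Logistic (sigmoid) function g(z) = 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

print(sigmoid(0.0))    # 0.5
print(sigmoid(4.0))    # close to 1
print(sigmoid(-4.0))   # close to 0
```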
- -**30. Logistic regression ― We assume here that y|x;θ∼Bernoulli(ϕ). We have the following form:** - -⟶ - -
- -**31. Remark: there is no closed form solution for the case of logistic regressions.** - -⟶ - -
- -**32. Softmax regression ― A softmax regression, also called a multiclass logistic regression, is used to generalize logistic regression when there are more than 2 outcome classes. By convention, we set θK=0, which makes the Bernoulli parameter ϕi of each class i equal to:** - -⟶ - -
- -**33. Generalized Linear Models** - -⟶ - -
- -**34. Exponential family ― A class of distributions is said to be in the exponential family if it can be written in terms of a natural parameter, also called the canonical parameter or link function, η, a sufficient statistic T(y) and a log-partition function a(η) as follows:** - -⟶ - -
- -**35. Remark: we will often have T(y)=y. Also, exp(−a(η)) can be seen as a normalization parameter that will make sure that the probabilities sum to one.** - -⟶ - -
- -**36. Here are the most common exponential distributions summed up in the following table:** - -⟶ - -
- -**37. [Distribution, Bernoulli, Gaussian, Poisson, Geometric]** - -⟶ - -
- -**38. Assumptions of GLMs ― Generalized Linear Models (GLM) aim at predicting a random variable y as a function of x∈Rn+1 and rely on the following 3 assumptions:** - -⟶ - -
- -**39. Remark: ordinary least squares and logistic regression are special cases of generalized linear models.** - -⟶ - -
- -**40. Support Vector Machines** - -⟶ - -
- -**41. The goal of support vector machines is to find the line that maximizes the minimum distance to the line.** - -⟶ - -
- -**42. Optimal margin classifier ― The optimal margin classifier h is such that:** - -⟶ - -
- -**43. where (w,b)∈Rn×R is the solution of the following optimization problem:** - -⟶ - -
- -**44. such that** - -⟶ - -
- -**45. support vectors** - -⟶ - -
- -**46. Remark: the line is defined as wTx−b=0.** - -⟶ - -
- -**47. Hinge loss ― The hinge loss is used in the setting of SVMs and is defined as follows:** - -⟶ - -
- -**48. Kernel ― Given a feature mapping ϕ, we define the kernel K to be defined as:** - -⟶ - -
- -**49. In practice, the kernel K defined by K(x,z)=exp(−||x−z||22σ2) is called the Gaussian kernel and is commonly used.** - -⟶ - -
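A small sketch of the Gaussian kernel K(x,z)=exp(−∥x−z∥²/(2σ²)); the value of σ is an arbitrary choice made for the example:

```python
import numpy as np

def gaussian_kernel(x, z, sigma=1.0):
    """Gaussian (RBF) kernel K(x, z) = exp(-||x - z||^2 / (2 sigma^2))."""
    diff = np.asarray(x, dtype=float) - np.asarray(z, dtype=float)
    return float(np.exp(-diff @ diff / (2.0 * sigma ** 2)))

print(round(gaussian_kernel([0.0, 0.0], [1.0, 1.0]), 4))   # exp(-1) ~ 0.3679
```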
- -**50. [Non-linear separability, Use of a kernel mapping, Decision boundary in the original space]** - -⟶ - -
- -**51. Remark: we say that we use the "kernel trick" to compute the cost function using the kernel because we actually don't need to know the explicit mapping ϕ, which is often very complicated. Instead, only the values K(x,z) are needed.** - -⟶ - -
- -**52. Lagrangian ― We define the Lagrangian L(w,b) as follows:** - -⟶ - -
- -**53. Remark: the coefficients βi are called the Lagrange multipliers.** - -⟶ - -
- -**54. Generative Learning** - -⟶ - -
- -**55. A generative model first tries to learn how the data is generated by estimating P(x|y), which we can then use to estimate P(y|x) by using Bayes' rule.** - -⟶ - -
- -**56. Gaussian Discriminant Analysis** - -⟶ - -
- -**57. Setting ― The Gaussian Discriminant Analysis assumes that y and x|y=0 and x|y=1 are such that:** - -⟶ - -
- -**58. Estimation ― The following table sums up the estimates that we find when maximizing the likelihood:** - -⟶ - -
- -**59. Naive Bayes** - -⟶ - -
- -**60. Assumption ― The Naive Bayes model supposes that the features of each data point are all independent:** - -⟶ - -
- -**61. Solutions ― Maximizing the log-likelihood gives the following solutions, with k∈{0,1},l∈[[1,L]]** - -⟶ - -
- -**62. Remark: Naive Bayes is widely used for text classification and spam detection.** - -⟶ - -
- -**63. Tree-based and ensemble methods** - -⟶ - -
- -**64. These methods can be used for both regression and classification problems.** - -⟶ - -
- -**65. CART ― Classification and Regression Trees (CART), commonly known as decision trees, can be represented as binary trees. They have the advantage of being very interpretable.** - -⟶ - -
- -**66. Random forest ― It is a tree-based technique that uses a high number of decision trees built out of randomly selected sets of features. Contrary to the simple decision tree, it is highly uninterpretable but its generally good performance makes it a popular algorithm.** - -⟶ - -
- -**67. Remark: random forests are a type of ensemble methods.** - -⟶ - -
- -**68. Boosting ― The idea of boosting methods is to combine several weak learners to form a stronger one. The main ones are summed up in the table below:** - -⟶ - -
- -**69. [Adaptive boosting, Gradient boosting]** - -⟶ - -
- -**70. High weights are put on errors to improve at the next boosting step** - -⟶ - -
- -**71. Weak learners trained on remaining errors** - -⟶ - -
- -**72. Other non-parametric approaches** - -⟶ - -
- -**73. k-nearest neighbors ― The k-nearest neighbors algorithm, commonly known as k-NN, is a non-parametric approach where the response of a data point is determined by the nature of its k neighbors from the training set. It can be used in both classification and regression settings.** - -⟶ - -
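An illustrative sketch of k-NN classification by majority vote; the tiny training set and the squared Euclidean distance are assumptions made for the example:

```python
from collections import Counter

def knn_predict(train, x, k=3):
    """k-nearest-neighbours classification sketch (majority vote among the
    k closest training points, using squared Euclidean distance)."""
    def dist2(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    neighbours = sorted(train, key=lambda ex: dist2(ex[0], x))[:k]
    return Counter(label for _, label in neighbours).most_common(1)[0][0]

train = [((0, 0), 'a'), ((0, 1), 'a'), ((1, 0), 'a'), ((5, 5), 'b'), ((5, 6), 'b')]
print(knn_predict(train, (0.5, 0.5), k=3))   # 'a'
```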
- -**74. Remark: The higher the parameter k, the higher the bias, and the lower the parameter k, the higher the variance.** - -⟶ - -
- -**75. Learning Theory** - -⟶ - -
- -**76. Union bound ― Let A1,...,Ak be k events. We have:** - -⟶ - -
- -**77. Hoeffding inequality ― Let Z1,..,Zm be m iid variables drawn from a Bernoulli distribution of parameter ϕ. Let ˆϕ be their sample mean and γ>0 fixed. We have:** - -⟶ - -
- -**78. Remark: this inequality is also known as the Chernoff bound.** - -⟶ - -
- -**79. Training error ― For a given classifier h, we define the training error ˆϵ(h), also known as the empirical risk or empirical error, to be as follows:** - -⟶ - -
- -**80. Probably Approximately Correct (PAC) ― PAC is a framework under which numerous results on learning theory were proved, and has the following set of assumptions:** - -⟶ - -
- -**81. the training and testing sets follow the same distribution** - -⟶ - -
- -**82. the training examples are drawn independently** - -⟶ - -
- -**83. Shattering ― Given a set S={x(1),...,x(d)}, and a set of classifiers H, we say that H shatters S if for any set of labels {y(1),...,y(d)}, we have:** - -⟶ - -
- -**84. Upper bound theorem ― Let H be a finite hypothesis class such that |H|=k and let δ and the sample size m be fixed. Then, with probability of at least 1−δ, we have:** - -⟶ - -
- -**85. VC dimension ― The Vapnik-Chervonenkis (VC) dimension of a given infinite hypothesis class H, noted VC(H) is the size of the largest set that is shattered by H.** - -⟶ - -
- -**86. Remark: the VC dimension of H={set of linear classifiers in 2 dimensions} is 3.** - -⟶ - -
- -**87. Theorem (Vapnik) ― Let H be given, with VC(H)=d and m the number of training examples. With probability at least 1−δ, we have:** - -⟶ - -
- -**88. [Introduction, Type of prediction, Type of model]** - -⟶ - -
- -**89. [Notations and general concepts, loss function, gradient descent, likelihood]** - -⟶ - -
- -**90. [Linear models, linear regression, logistic regression, generalized linear models]** - -⟶ - -
- -**91. [Support vector machines, Optimal margin classifier, Hinge loss, Kernel]** - -⟶ - -
- -**92. [Generative learning, Gaussian Discriminant Analysis, Naive Bayes]** - -⟶ - -
- -**93. [Trees and ensemble methods, CART, Random forest, Boosting]** - -⟶ - -
- -**94. [Other methods, k-NN]** - -⟶ - -
- -**95. [Learning theory, Hoeffding inequality, PAC, VC dimension]** - -⟶ diff --git a/ru/cheatsheet-unsupervised-learning.md b/ru/cheatsheet-unsupervised-learning.md deleted file mode 100644 index e18b3f50f..000000000 --- a/ru/cheatsheet-unsupervised-learning.md +++ /dev/null @@ -1,340 +0,0 @@ -**1. Unsupervised Learning cheatsheet** - -⟶ - -
- -**2. Introduction to Unsupervised Learning** - -⟶ - -
- -**3. Motivation ― The goal of unsupervised learning is to find hidden patterns in unlabeled data {x(1),...,x(m)}.** - -⟶ - -
- -**4. Jensen's inequality ― Let f be a convex function and X a random variable. We have the following inequality:** - -⟶ - -
- -**5. Clustering** - -⟶ - -
- -**6. Expectation-Maximization** - -⟶ - -
- -**7. Latent variables ― Latent variables are hidden/unobserved variables that make estimation problems difficult, and are often denoted z. Here are the most common settings where there are latent variables:** - -⟶ - -
- -**8. [Setting, Latent variable z, Comments]** - -⟶ - -
- -**9. [Mixture of k Gaussians, Factor analysis]** - -⟶ - -
- -**10. Algorithm ― The Expectation-Maximization (EM) algorithm gives an efficient method for estimating the parameter θ through maximum likelihood estimation by repeatedly constructing a lower-bound on the likelihood (E-step) and optimizing that lower bound (M-step) as follows:** - -⟶ - -
- -**11. E-step: Evaluate the posterior probability Qi(z(i)) that each data point x(i) came from a particular cluster z(i) as follows:** - -⟶ - -
- -**12. M-step: Use the posterior probabilities Qi(z(i)) as cluster specific weights on data points x(i) to separately re-estimate each cluster model as follows:** - -⟶ - -
- -**13. [Gaussians initialization, Expectation step, Maximization step, Convergence]** - -⟶ - -
- -**14. k-means clustering** - -⟶ - -
- -**15. We note c(i) the cluster of data point i and μj the center of cluster j.** - -⟶ - -
- -**16. Algorithm ― After randomly initializing the cluster centroids μ1,μ2,...,μk∈Rn, the k-means algorithm repeats the following step until convergence:** - -⟶ - -
- -**17. [Means initialization, Cluster assignment, Means update, Convergence]** - -⟶ - -
- -**18. Distortion function ― In order to see if the algorithm converges, we look at the distortion function defined as follows:** - -⟶ - -
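An illustrative sketch that runs the assignment/update loop and reports the distortion; the random initialization, the empty-cluster handling and the toy data are assumptions made for the example:

```python
import numpy as np

def k_means(X, k, n_iters=100, seed=0):
    """k-means sketch: alternate cluster assignment and means update, then
    report the distortion J(c, mu)."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iters):
        # Cluster assignment: each point goes to its closest centroid
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        c = dists.argmin(axis=1)
        # Means update (keep the old centroid if a cluster became empty)
        centroids = np.array([X[c == j].mean(axis=0) if np.any(c == j) else centroids[j]
                              for j in range(k)])
    distortion = ((X - centroids[c]) ** 2).sum()
    return c, centroids, distortion

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.1, (20, 2)), rng.normal(3.0, 0.1, (20, 2))])
labels, centers, J = k_means(X, k=2)
print(centers.round(1))   # roughly (0, 0) and (3, 3), in some order
```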
- -**19. Hierarchical clustering** - -⟶ - -
- -**20. Algorithm ― It is a clustering algorithm with an agglomerative hierarchical approach that builds nested clusters in a successive manner.** - -⟶ - -
- -**21. Types ― There are different sorts of hierarchical clustering algorithms that aim at optimizing different objective functions, which are summed up in the table below:** - -⟶ - -
- -**22. [Ward linkage, Average linkage, Complete linkage]** - -⟶ - -
- -**23. [Minimize within cluster distance, Minimize average distance between cluster pairs, Minimize maximum distance between cluster pairs]** - -⟶ - -
- -**24. Clustering assessment metrics** - -⟶ - -
- -**25. In an unsupervised learning setting, it is often hard to assess the performance of a model since we don't have the ground truth labels as was the case in the supervised learning setting.** - -⟶ - -
- -**26. Silhouette coefficient ― By noting a and b the mean distance between a sample and all other points in the same class, and between a sample and all other points in the next nearest cluster, the silhouette coefficient s for a single sample is defined as follows:** - -⟶ - -
- -**27. Calinski-Harabaz index ― By noting k the number of clusters, Bk and Wk the between and within-clustering dispersion matrices respectively defined as** - -⟶ - -
- -**28. the Calinski-Harabaz index s(k) indicates how well a clustering model defines its clusters, such that the higher the score, the more dense and well separated the clusters are. It is defined as follows:** - -⟶ - -
- -**29. Dimension reduction** - -⟶ - -
- -**30. Principal component analysis** - -⟶ - -
- -**31. It is a dimension reduction technique that finds the variance maximizing directions onto which to project the data.** - -⟶ - -
- -**32. Eigenvalue, eigenvector ― Given a matrix A∈Rn×n, λ is said to be an eigenvalue of A if there exists a vector z∈Rn∖{0}, called eigenvector, such that we have:** - -⟶ - -
- -**33. Spectral theorem ― Let A∈Rn×n. If A is symmetric, then A is diagonalizable by a real orthogonal matrix U∈Rn×n. By noting Λ=diag(λ1,...,λn), we have:** - -⟶ - -
- -**34. diagonal** - -⟶ - -
- -**35. Remark: the eigenvector associated with the largest eigenvalue is called principal eigenvector of matrix A.** - -⟶ - -
- -**36. Algorithm ― The Principal Component Analysis (PCA) procedure is a dimension reduction technique that projects the data on k dimensions by maximizing the variance of the data as follows:** - -⟶ - -
- -**37. Step 1: Normalize the data to have a mean of 0 and standard deviation of 1.** - -⟶ - -
- -**38. Step 2: Compute Σ=1mm∑i=1x(i)x(i)T∈Rn×n, which is symmetric with real eigenvalues.** - -⟶ - -
- -**39. Step 3: Compute u1,...,uk∈Rn the k orthogonal principal eigenvectors of Σ, i.e. the orthogonal eigenvectors of the k largest eigenvalues.** - -⟶ - -
- -**40. Step 4: Project the data on spanR(u1,...,uk).** - -⟶ - -
- -**41. This procedure maximizes the variance among all k-dimensional spaces.** - -⟶ - -
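An illustrative sketch of these four steps; the toy random data is only there to show the resulting shapes:

```python
import numpy as np

def pca(X, k):
    """PCA sketch following the four steps above."""
    # Step 1: normalize the data to zero mean and unit standard deviation
    Xn = (X - X.mean(axis=0)) / X.std(axis=0)
    # Step 2: Sigma = (1/m) * sum_i x(i) x(i)^T, symmetric with real eigenvalues
    sigma = Xn.T @ Xn / len(Xn)
    # Step 3: orthogonal eigenvectors of the k largest eigenvalues
    eigvals, eigvecs = np.linalg.eigh(sigma)
    U = eigvecs[:, np.argsort(eigvals)[::-1][:k]]
    # Step 4: project the data on span(u1, ..., uk)
    return Xn @ U

X = np.random.default_rng(0).normal(size=(100, 5))
print(pca(X, k=2).shape)   # (100, 2)
```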
- -**42. [Data in feature space, Find principal components, Data in principal components space]** - -⟶ - -
- -**43. Independent component analysis** - -⟶ - -
- -**44. It is a technique meant to find the underlying generating sources.** - -⟶ - -
- -**45. Assumptions ― We assume that our data x has been generated by the n-dimensional source vector s=(s1,...,sn), where si are independent random variables, via a mixing and non-singular matrix A as follows:** - -⟶ - -
- -**46. The goal is to find the unmixing matrix W=A−1.** - -⟶ - -
- -**47. Bell and Sejnowski ICA algorithm ― This algorithm finds the unmixing matrix W by following the steps below:** - -⟶ - -
- -**48. Write the probability of x=As=W−1s as:** - -⟶ - -
- -**49. Write the log likelihood given our training data {x(i),i∈[[1,m]]} and by noting g the sigmoid function as:** - -⟶ - -
- -**50. Therefore, the stochastic gradient ascent learning rule is such that for each training example x(i), we update W as follows:** - -⟶ - -
- -**51. The Machine Learning cheatsheets are now available in Russian.** - -⟶ - -
- -**52. Original authors** - -⟶ - -
- -**53. Translated by X, Y and Z** - -⟶ - -
- -**54. Reviewed by X, Y and Z** - -⟶ - -
- -**55. [Introduction, Motivation, Jensen's inequality]** - -⟶ - -
- -**56. [Clustering, Expectation-Maximization, k-means, Hierarchical clustering, Metrics]** - -⟶ - -
- -**57. [Dimension reduction, PCA, ICA]** - -⟶ diff --git a/ru/refresher-linear-algebra.md b/ru/refresher-linear-algebra.md deleted file mode 100644 index a6b440d1e..000000000 --- a/ru/refresher-linear-algebra.md +++ /dev/null @@ -1,339 +0,0 @@ -**1. Linear Algebra and Calculus refresher** - -⟶ - -
- -**2. General notations** - -⟶ - -
- -**3. Definitions** - -⟶ - -
- -**4. Vector ― We note x∈Rn a vector with n entries, where xi∈R is the ith entry:** - -⟶ - -
- -**5. Matrix ― We note A∈Rm×n a matrix with m rows and n columns, where Ai,j∈R is the entry located in the ith row and jth column:** - -⟶ - -
- -**6. Remark: the vector x defined above can be viewed as a n×1 matrix and is more particularly called a column-vector.** - -⟶ - -
- -**7. Main matrices** - -⟶ - -
- -**8. Identity matrix ― The identity matrix I∈Rn×n is a square matrix with ones in its diagonal and zero everywhere else:** - -⟶ - -
- -**9. Remark: for all matrices A∈Rn×n, we have A×I=I×A=A.** - -⟶ - -
- -**10. Diagonal matrix ― A diagonal matrix D∈Rn×n is a square matrix with nonzero values in its diagonal and zero everywhere else:** - -⟶ - -
- -**11. Remark: we also note D as diag(d1,...,dn).** - -⟶ - -
- -**12. Matrix operations** - -⟶ - -
- -**13. Multiplication** - -⟶ - -
- -**14. Vector-vector ― There are two types of vector-vector products:** - -⟶ - -
- -**15. inner product: for x,y∈Rn, we have:** - -⟶ - -
- -**16. outer product: for x∈Rm,y∈Rn, we have:** - -⟶ - -
- -**17. Matrix-vector ― The product of matrix A∈Rm×n and vector x∈Rn is a vector of size Rm, such that:** - -⟶ - -
- -**18. where aTr,i are the vector rows and ac,j are the vector columns of A, and xi are the entries of x.** - -⟶ - -
- -**19. Matrix-matrix ― The product of matrices A∈Rm×n and B∈Rn×p is a matrix of size Rm×p, such that:** - -⟶ - -
- -**20. where aTr,i,bTr,i are the vector rows and ac,j,bc,j are the vector columns of A and B respectively** - -⟶ - -
- -**21. Other operations** - -⟶ - -
- -**22. Transpose ― The transpose of a matrix A∈Rm×n, noted AT, is such that its entries are flipped:** - -⟶ - -
- -**23. Remark: for matrices A,B, we have (AB)T=BTAT** - -⟶ - -
- -**24. Inverse ― The inverse of an invertible square matrix A is noted A−1 and is the only matrix such that:** - -⟶ - -
- -**25. Remark: not all square matrices are invertible. Also, for matrices A,B, we have (AB)−1=B−1A−1** - -⟶ - -
- -**26. Trace ― The trace of a square matrix A, noted tr(A), is the sum of its diagonal entries:** - -⟶ - -
- -**27. Remark: for matrices A,B, we have tr(AT)=tr(A) and tr(AB)=tr(BA)** - -⟶ - -
- -**28. Determinant ― The determinant of a square matrix A∈Rn×n, noted |A| or det(A) is expressed recursively in terms of A∖i,∖j, which is the matrix A without its ith row and jth column, as follows:** - -⟶ - -
- -**29. Remark: A is invertible if and only if |A|≠0. Also, |AB|=|A||B| and |AT|=|A|.** - -⟶ - -
- -**30. Matrix properties** - -⟶ - -
- -**31. Definitions** - -⟶ - -
- -**32. Symmetric decomposition ― A given matrix A can be expressed in terms of its symmetric and antisymmetric parts as follows:** - -⟶ - -
- -**33. [Symmetric, Antisymmetric]** - -⟶ - -
- -**34. Norm ― A norm is a function N:V⟶[0,+∞[ where V is a vector space, and such that for all x,y∈V, we have:** - -⟶ - -
- -**35. N(ax)=|a|N(x) for a scalar** - -⟶ - -
- -**36. if N(x)=0, then x=0** - -⟶ - -
- -**37. For x∈V, the most commonly used norms are summed up in the table below:** - -⟶ - -
- -**38. [Norm, Notation, Definition, Use case]** - -⟶ - -
- -**39. Linear dependence ― A set of vectors is said to be linearly dependent if one of the vectors in the set can be defined as a linear combination of the others.** - -⟶ - -
- -**40. Remark: if no vector can be written this way, then the vectors are said to be linearly independent** - -⟶ - -
- -**41. Matrix rank ― The rank of a given matrix A is noted rank(A) and is the dimension of the vector space generated by its columns. This is equivalent to the maximum number of linearly independent columns of A.** - -⟶ - -
- -**42. Positive semi-definite matrix ― A matrix A∈Rn×n is positive semi-definite (PSD) and is noted A⪰0 if we have:** - -⟶ - -
- -**43. Remark: similarly, a matrix A is said to be positive definite, and is noted A≻0, if it is a PSD matrix which satisfies for all non-zero vector x, xTAx>0.** - -⟶ - -
- -**44. Eigenvalue, eigenvector ― Given a matrix A∈Rn×n, λ is said to be an eigenvalue of A if there exists a vector z∈Rn∖{0}, called eigenvector, such that we have:** - -⟶ - -
- -**45. Spectral theorem ― Let A∈Rn×n. If A is symmetric, then A is diagonalizable by a real orthogonal matrix U∈Rn×n. By noting Λ=diag(λ1,...,λn), we have:** - -⟶ - -
- -**46. diagonal** - -⟶ - -
- -**47. Singular-value decomposition ― For a given matrix A of dimensions m×n, the singular-value decomposition (SVD) is a factorization technique that guarantees the existence of U m×m unitary, Σ m×n diagonal and V n×n unitary matrices, such that:** - -⟶ - -
- -**48. Matrix calculus** - -⟶ - -
- -**49. Gradient ― Let f:Rm×n→R be a function and A∈Rm×n be a matrix. The gradient of f with respect to A is a m×n matrix, noted ∇Af(A), such that:** - -⟶ - -
- -**50. Remark: the gradient of f is only defined when f is a function that returns a scalar.** - -⟶ - -
- -**51. Hessian ― Let f:Rn→R be a function and x∈Rn be a vector. The hessian of f with respect to x is a n×n symmetric matrix, noted ∇2xf(x), such that:** - -⟶ - -
- -**52. Remark: the hessian of f is only defined when f is a function that returns a scalar** - -⟶ - -
- -**53. Gradient operations ― For matrices A,B,C, the following gradient properties are worth having in mind:** - -⟶ - -
- -**54. [General notations, Definitions, Main matrices]** - -⟶ - -
- -**55. [Matrix operations, Multiplication, Other operations]** - -⟶ - -
- -**56. [Matrix properties, Norm, Eigenvalue/Eigenvector, Singular-value decomposition]** - -⟶ - -
- -**57. [Matrix calculus, Gradient, Hessian, Operations]** - -⟶ diff --git a/ru/refresher-probability.md b/ru/refresher-probability.md deleted file mode 100644 index 5c9b34656..000000000 --- a/ru/refresher-probability.md +++ /dev/null @@ -1,381 +0,0 @@ -**1. Probabilities and Statistics refresher** - -⟶ - -
- -**2. Introduction to Probability and Combinatorics** - -⟶ - -
- -**3. Sample space ― The set of all possible outcomes of an experiment is known as the sample space of the experiment and is denoted by S.** - -⟶ - -
- -**4. Event ― Any subset E of the sample space is known as an event. That is, an event is a set consisting of possible outcomes of the experiment. If the outcome of the experiment is contained in E, then we say that E has occurred.** - -⟶ - -
- -**5. Axioms of probability ― For each event E, we denote P(E) as the probability of event E occurring.** - -⟶ - -
- -**6. Axiom 1 ― Every probability is between 0 and 1 included, i.e:** - -⟶ - -
- -**7. Axiom 2 ― The probability that at least one of the elementary events in the entire sample space will occur is 1, i.e:** - -⟶ - -
- -**8. Axiom 3 ― For any sequence of mutually exclusive events E1,...,En, we have:** - -⟶ - -
- -**9. Permutation ― A permutation is an arrangement of r objects from a pool of n objects, in a given order. The number of such arrangements is given by P(n,r), defined as:** - -⟶ - -
- -**10. Combination ― A combination is an arrangement of r objects from a pool of n objects, where the order does not matter. The number of such arrangements is given by C(n,r), defined as:** - -⟶ - -
- -**11. Remark: we note that for 0⩽r⩽n, we have P(n,r)⩾C(n,r)** - -⟶ - -
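A quick numeric check of the two formulas using `math.factorial`:

```python
from math import factorial

def P(n, r):
    """Number of ordered arrangements of r objects taken from n."""
    return factorial(n) // factorial(n - r)

def C(n, r):
    """Number of unordered arrangements of r objects taken from n."""
    return factorial(n) // (factorial(r) * factorial(n - r))

print(P(5, 2), C(5, 2))   # 20 10, and indeed P(n,r) >= C(n,r)
```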
- -**12. Conditional Probability** - -⟶ - -
- -**13. Bayes' rule ― For events A and B such that P(B)>0, we have:** - -⟶ - -
- -**14. Remark: we have P(A∩B)=P(A)P(B|A)=P(A|B)P(B)** - -⟶ - -
- -**15. Partition ― Let {Ai,i∈[[1,n]]} be such that for all i, Ai≠∅. We say that {Ai} is a partition if we have:** - -⟶ - -
- -**16. Remark: for any event B in the sample space, we have P(B)=n∑i=1P(B|Ai)P(Ai).** - -⟶ - -
- -**17. Extended form of Bayes' rule ― Let {Ai,i∈[[1,n]]} be a partition of the sample space. We have:** - -⟶ - -
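A worked numeric example of the extended rule with a two-event partition; all probabilities below are hypothetical values chosen only for the illustration:

```python
# Hypothetical numbers: a condition with 1% prevalence and a test that is
# positive with probability 0.99 given the condition and 0.05 otherwise;
# the partition is {condition, no condition}.
p_A = [0.01, 0.99]            # P(A_i)
p_B_given_A = [0.99, 0.05]    # P(B | A_i), with B = "positive test"

p_B = sum(pb * pa for pb, pa in zip(p_B_given_A, p_A))   # denominator of the rule
posterior = p_B_given_A[0] * p_A[0] / p_B                # P(A_1 | B)
print(round(posterior, 3))                               # about 0.167
```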
- -**18. Independence ― Two events A and B are independent if and only if we have:** - -⟶ - -
- -**19. Random Variables** - -⟶ - -
- -**20. Definitions** - -⟶ - -
- -**21. Random variable ― A random variable, often noted X, is a function that maps every element in a sample space to a real line.** - -⟶ - -
- -**22. Cumulative distribution function (CDF) ― The cumulative distribution function F, which is monotonically non-decreasing and is such that limx→−∞F(x)=0 and limx→+∞F(x)=1, is defined as:** - -⟶ - -
- -**23. Remark: we have P(a<X⩽b)=F(b)−F(a)** - -⟶ - -
- -**24. Probability density function (PDF) ― The probability density function f is the probability that X takes on values between two adjacent realizations of the random variable.** - -⟶ - -
- -**25. Relationships involving the PDF and CDF ― Here are the important properties to know in the discrete (D) and the continuous (C) cases.** - -⟶ - -
- -**26. [Case, CDF F, PDF f, Properties of PDF]** - -⟶ - -
- -**27. Expectation and Moments of the Distribution ― Here are the expressions of the expected value E[X], generalized expected value E[g(X)], kth moment E[Xk] and characteristic function ψ(ω) for the discrete and continuous cases:** - -⟶ - -
- -**28. Variance ― The variance of a random variable, often noted Var(X) or σ2, is a measure of the spread of its distribution function. It is determined as follows:** - -⟶ - -
- -**29. Standard deviation ― The standard deviation of a random variable, often noted σ, is a measure of the spread of its distribution function which is compatible with the units of the actual random variable. It is determined as follows:** - -⟶ - -
- -**30. Transformation of random variables ― Let the variables X and Y be linked by some function. By noting fX and fY the distribution function of X and Y respectively, we have:** - -⟶ - -
- -**31. Leibniz integral rule ― Let g be a function of x and potentially c, and a,b boundaries that may depend on c. We have:** - -⟶ - -
- -**32. Probability Distributions** - -⟶ - -
- -**33. Chebyshev's inequality ― Let X be a random variable with expected value μ. For k,σ>0, we have the following inequality:** - -⟶ - -
- -**34. Main distributions ― Here are the main distributions to have in mind:** - -⟶ - -
- -**35. [Type, Distribution]** - -⟶ - -
- -**36. Jointly Distributed Random Variables** - -⟶ - -
- -**37. Marginal density and cumulative distribution ― From the joint density probability function fXY , we have** - -⟶ - -
- -**38. [Case, Marginal density, Cumulative function]** - -⟶ - -
- -**39. Conditional density ― The conditional density of X with respect to Y, often noted fX|Y, is defined as follows:** - -⟶ - -
- -**40. Independence ― Two random variables X and Y are said to be independent if we have:** - -⟶ - -
- -**41. Covariance ― We define the covariance of two random variables X and Y, that we note σ2XY or more commonly Cov(X,Y), as follows:** - -⟶ - -
- -**42. Correlation ― By noting σX,σY the standard deviations of X and Y, we define the correlation between the random variables X and Y, noted ρXY, as follows:** - -⟶ - -
- -**43. Remark 1: we note that for any random variables X,Y, we have ρXY∈[−1,1].** - -⟶ - -
- -**44. Remark 2: If X and Y are independent, then ρXY=0.** - -⟶ - -
- -**45. Parameter estimation** - -⟶ - -
- -**46. Definitions** - -⟶ - -
- -**47. Random sample ― A random sample is a collection of n random variables X1,...,Xn that are independent and identically distributed with X.** - -⟶ - -
- -**48. Estimator ― An estimator is a function of the data that is used to infer the value of an unknown parameter in a statistical model.** - -⟶ - -
- -**49. Bias ― The bias of an estimator ^θ is defined as being the difference between the expected value of the distribution of ^θ and the true value, i.e.:** - -⟶ - -
- -**50. Remark: an estimator is said to be unbiased when we have E[^θ]=θ.** - -⟶ - -
- -**51. Estimating the mean** - -⟶ - -
- -**52. Sample mean ― The sample mean of a random sample is used to estimate the true mean μ of a distribution, is often noted ¯X and is defined as follows:** - -⟶ - -
- -**53. Remark: the sample mean is unbiased, i.e. E[¯X]=μ.** - -⟶ - -
- -**54. Central Limit Theorem ― Let us have a random sample X1,...,Xn following a given distribution with mean μ and variance σ2, then we have:** - -⟶ - -
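A quick simulation, assuming uniform(0,1) draws, that shows the sample mean concentrating as the theorem predicts; the sample size and number of trials are arbitrary choices for the example:

```python
import random
import statistics

# Each trial draws n uniform(0,1) variables (mean 0.5, variance 1/12) and
# records their sample mean; the CLT says these means concentrate around 0.5
# with standard deviation sqrt((1/12)/n).
n, trials = 50, 2000
rng = random.Random(0)
means = [statistics.fmean(rng.random() for _ in range(n)) for _ in range(trials)]
print(round(statistics.fmean(means), 3))                                  # about 0.5
print(round(statistics.pstdev(means), 3), round((1 / 12 / n) ** 0.5, 3))  # both about 0.041
```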
- -**55. Estimating the variance** - -⟶ - -
- -**56. Sample variance ― The sample variance of a random sample is used to estimate the true variance σ2 of a distribution, is often noted s2 or ^σ2 and is defined as follows:** - -⟶ - -
- -**57. Remark: the sample variance is unbiased, i.e E[s2]=σ2.** - -⟶ - -
- -**58. Chi-Squared relation with sample variance ― Let s2 be the sample variance of a random sample. We have:** - -⟶ - -
- -**59. [Introduction, Sample space, Event, Permutation]** - -⟶ - -
- -**60. [Conditional probability, Bayes' rule, Independence]** - -⟶ - -
- -**61. [Random variables, Definitions, Expectation, Variance]** - -⟶ - -
- -**62. [Probability distributions, Chebyshev's inequality, Main distributions]** - -⟶ - -
- -**63. [Jointly distributed random variables, Density, Covariance, Correlation]** - -⟶ - -
- -**64. [Parameter estimation, Mean, Variance]** - -⟶ diff --git a/zh/cheatsheet-deep-learning.md b/zh/cheatsheet-deep-learning.md deleted file mode 100644 index a7604ccc6..000000000 --- a/zh/cheatsheet-deep-learning.md +++ /dev/null @@ -1,321 +0,0 @@ -1. **Deep Learning cheatsheet** - -⟶ - -
- -2. **Neural Networks** - -⟶ - -
- -3. **Neural networks are a class of models that are built with layers. Commonly used types of neural networks include convolutional and recurrent neural networks.** - -⟶ - -
- -4. **Architecture ― The vocabulary around neural networks architectures is described in the figure below:** - -⟶ - -
- -5. **[Input layer, hidden layer, output layer]** - -⟶ - -
- -6. **By noting i the ith layer of the network and j the jth hidden unit of the layer, we have:** - -⟶ - -
- -7. **where we note w, b, z the weight, bias and output respectively.** - -⟶ - -
- -8. **Activation function ― Activation functions are used at the end of a hidden unit to introduce non-linear complexities to the model. Here are the most common ones:** - -⟶ - -
- -9. **[Sigmoid, Tanh, ReLU, Leaky ReLU]** - -⟶ - -
- -10. **Cross-entropy loss ― In the context of neural networks, the cross-entropy loss L(z,y) is commonly used and is defined as follows:** - -⟶ - -
- -11. **Learning rate ― The learning rate, often noted α or sometimes η, indicates at which pace the weights get updated. This can be fixed or adaptively changed. The current most popular method is called Adam, which is a method that adapts the learning rate.** - -⟶ - -
- -12. **Backpropagation ― Backpropagation is a method to update the weights in the neural network by taking into account the actual output and the desired output. The derivative with respect to weight w is computed using chain rule and is of the following form:** - -⟶ - -
- -13. **As a result, the weight is updated as follows:** - -⟶ - -
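A toy sketch for a single sigmoid neuron with squared-error loss (a stand-in for a full network, not the cheatsheet's general case), showing the chain rule and the resulting weight update; all numbers are made up for the example:

```python
import math

def backprop_step(w, b, x, y, alpha=0.1):
    """One forward/backward/update step for a single sigmoid neuron with
    squared-error loss L = (z - y)^2 / 2."""
    a = w * x + b
    z = 1.0 / (1.0 + math.exp(-a))           # forward pass
    dL_dz = z - y                            # chain rule, one factor at a time
    dz_da = z * (1.0 - z)
    dL_dw = dL_dz * dz_da * x                # dL/dw = dL/dz * dz/da * da/dw
    dL_db = dL_dz * dz_da
    return w - alpha * dL_dw, b - alpha * dL_db   # w := w - alpha * dL/dw

w, b = 0.0, 0.0
for _ in range(2000):
    w, b = backprop_step(w, b, x=1.0, y=1.0)
print(round(1.0 / (1.0 + math.exp(-(w + b))), 2))   # prediction has moved toward 1
```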
- -14. **Updating weights ― In a neural network, weights are updated as follows:** - -⟶ - -
- -15. **Step 1: Take a batch of training data.** - -⟶ - -
- -16. **Step 2: Perform forward propagation to obtain the corresponding loss.** - -⟶ - -
- -17. **Step 3: Backpropagate the loss to get the gradients.** - -⟶ - -
- -18. **Step 4: Use the gradients to update the weights of the network.** - -⟶ - -
- -19. **Dropout ― Dropout is a technique meant to prevent overfitting of the training data by dropping out units in a neural network. In practice, neurons are either dropped with probability p or kept with probability 1−p.** - -⟶ - -
- -20. **Convolutional Neural Networks** - -⟶ - -
- -21. **Convolutional layer requirement ― By noting W the input volume size, F the size of the convolutional layer neurons, P the amount of zero padding, then the number of neurons N that fit in a given volume is such that:** - -⟶ - -
- -22. **Batch normalization ― It is a step of hyperparameter γ,β that normalizes the batch {xi}. By noting μB,σ2B the mean and variance of the batch that we want to correct, it is done as follows:** - -⟶ - -
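An illustrative sketch of this step; treating γ and β as fixed scalars and adding the small constant `eps` for numerical stability are assumptions made for the example:

```python
import numpy as np

def batch_norm(x, gamma=1.0, beta=0.0, eps=1e-5):
    """Batch-normalization sketch: standardize over the batch, then rescale
    and shift with the learnable parameters gamma and beta."""
    mu = x.mean(axis=0)                       # batch mean mu_B
    var = x.var(axis=0)                       # batch variance sigma_B^2
    x_hat = (x - mu) / np.sqrt(var + eps)     # normalized activations
    return gamma * x_hat + beta

x = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
print(batch_norm(x).mean(axis=0).round(3))    # about [0, 0] per feature
```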
- -23. **It is usually done after a fully connected/convolutional layer and before a non-linearity layer and aims at allowing higher learning rates and reducing the strong dependence on initialization.** - -⟶ - -
- -24. **Recurrent Neural Networks** - -⟶ - -
- -25. **Types of gates ― Here are the different types of gates that we encounter in a typical recurrent neural network:** - -⟶ - -
- -26. **[Input gate, forget gate, gate, output gate]** - -⟶ - -
- -27. **[Write to cell or not?, Erase a cell or not?, How much to write to cell?, How much to reveal cell?]** - -⟶ - -
- -28. **LSTM ― A long short-term memory (LSTM) network is a type of RNN model that avoids the vanishing gradient problem by adding 'forget' gates.** - -⟶ - -
- -29. **Reinforcement Learning and Control** - -⟶ - -
- -30. **The goal of reinforcement learning is for an agent to learn how to evolve in an environment.** - -⟶ - -
- -31. **Definitions** - -⟶ - -
- -32. **Markov decision processes ― A Markov decision process (MDP) is a 5-tuple (S,A,{Psa},γ,R) where:** - -⟶ - -
- -33. **S is the set of states** - -⟶ - -
- -34. **A is the set of actions** - -⟶ - -
- -35. **{Psa} are the state transition probabilities for s∈S and a∈A** - -⟶ - -
- -36. **γ∈[0,1[ is the discount factor** - -⟶ - -
- -37. **R:S×A⟶R or R:S⟶R is the reward function that the algorithm wants to maximize** - -⟶ - -
- -38. **Policy ― A policy π is a function π:S⟶A that maps states to actions.** - -⟶ - -
- -39. **Remark: we say that we execute a given policy π if given a state s we take the action a=π(s).** - -⟶ - -
- -40. **Value function ― For a given policy π and a given state s, we define the value function Vπ as follows:** - -⟶ - -
- -41. **Bellman equation ― The optimal Bellman equations characterize the value function Vπ∗ of the optimal policy π∗:** - -⟶ - -
- -42. **Remark: we note that the optimal policy π∗ for a given state s is such that:** - -⟶ - -
- -43. **Value iteration algorithm ― The value iteration algorithm is in two steps:** - -⟶ - -
- -44. **1) We initialize the value:** - -⟶ - -
- -45. **2) We iterate the value based on the values before:** - -⟶ - -
- -46. **Maximum likelihood estimate ― The maximum likelihood estimates for the state transition probabilities are as follows:** - -⟶ - -
- -47. **times took action a in state s and got to s′** - -⟶ - -
- -48. **times took action a in state s** - -⟶ - -
- -49. **Q-learning ― Q-learning is a model-free estimation of Q, which is done as follows:** - -⟶ - -
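A tabular sketch of the update rule; the toy two-state environment, the rewards and the α and γ values are assumptions made for the example:

```python
def q_learning_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.9):
    """One tabular Q-learning update:
    Q(s,a) := Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    best_next = max(Q.get((s_next, a2), 0.0) for a2 in actions)
    old = Q.get((s, a), 0.0)
    Q[(s, a)] = old + alpha * (r + gamma * best_next - old)

# Toy usage: reward 1 for taking 'go' in state 0; no reward is ever seen in
# state 1, so Q(0, 'go') should approach 1.
Q, actions = {}, ['go', 'stay']
for _ in range(100):
    q_learning_update(Q, s=0, a='go', r=1.0, s_next=1, actions=actions)
print(round(Q[(0, 'go')], 2))   # about 1.0
```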
- -50. **View PDF version on GitHub** - -⟶ - -
- -51. **[Neural Networks, Architecture, Activation function, Backpropagation, Dropout]** - -⟶ - -
- -52. **[Convolutional Neural Networks, Convolutional layer, Batch normalization]** - -⟶ - -
- -53. **[Recurrent Neural Networks, Gates, LSTM]** - -⟶ - -
- -54. **[Reinforcement learning, Markov decision processes, Value/policy iteration, Approximate dynamic programming, Policy search]** - -⟶ diff --git a/zh/cheatsheet-machine-learning-tips-and-tricks.md b/zh/cheatsheet-machine-learning-tips-and-tricks.md deleted file mode 100644 index 61fab788c..000000000 --- a/zh/cheatsheet-machine-learning-tips-and-tricks.md +++ /dev/null @@ -1,285 +0,0 @@ -1. **Machine Learning tips and tricks cheatsheet** - -⟶ - -
- -2. **Classification metrics** - -⟶ - -
- -3. **In a context of a binary classification, here are the main metrics that are important to track in order to assess the performance of the model.** - -⟶ - -
- -4. **Confusion matrix ― The confusion matrix is used to have a more complete picture when assessing the performance of a model. It is defined as follows:** - -⟶ - -
- -5. **[Predicted class, Actual class]** - -⟶ - -
- -6. **Main metrics ― The following metrics are commonly used to assess the performance of classification models:** - -⟶ - -
- -7. **[Metric, Formula, Interpretation]** - -⟶ - -
- -8. **Overall performance of model** - -⟶ - -
- -9. **How accurate the positive predictions are** - -⟶ - -
- -10. **Coverage of actual positive sample** - -⟶ - -
- -11. **Coverage of actual negative sample** - -⟶ - -
- -12. **Hybrid metric useful for unbalanced classes** - -⟶ - -
- -13. **ROC ― The receiver operating curve, also noted ROC, is the plot of TPR versus FPR by varying the threshold. These metrics are summed up in the table below:** - -⟶ - -
- -14. **[Metric, Formula, Equivalent]** - -⟶ - -
- -15. **AUC ― The area under the receiving operating curve, also noted AUC or AUROC, is the area below the ROC as shown in the following figure:** - -⟶ - -
- -16. **[Actual, Predicted]** - -⟶ - -
- -17. **Basic metrics ― Given a regression model f, the following metrics are commonly used to assess the performance of the model:** - -⟶ - -
- -18. **[Total sum of squares, Explained sum of squares, Residual sum of squares]** - -⟶ - -
- -19. **Coefficient of determination ― The coefficient of determination, often noted R2 or r2, provides a measure of how well the observed outcomes are replicated by the model and is defined as follows:** - -⟶ - -
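A short sketch that computes R² from the total and residual sums of squares; the numbers are made up for the example:

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination R^2 = 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)               # total sum of squares
    ss_res = sum((y - z) ** 2 for y, z in zip(y_true, y_pred))    # residual sum of squares
    return 1.0 - ss_res / ss_tot

print(r_squared([1, 2, 3, 4], [1.1, 1.9, 3.2, 3.8]))   # about 0.98
```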
- -20. **Main metrics ― The following metrics are commonly used to assess the performance of regression models, by taking into account the number of variables n that they take into consideration:** - -⟶ - -
- -21. **where L is the likelihood and ˆσ2 is an estimate of the variance associated with each response.** - -⟶ - -
- -22. **Model selection** - -⟶ - -
- -23. **Vocabulary ― When selecting a model, we distinguish 3 different parts of the data that we have as follows:** - -⟶ - -
- -24. **[Training set, Validation set, Testing set]** - -⟶ - -
- -25. **[Model is trained, Model is assessed, Model gives predictions]** - -⟶ - -
- -26. **[Usually 80% of the dataset, Usually 20% of the dataset]** - -⟶ - -
- -27. **[Also called hold-out or development set, Unseen data]** - -⟶ - -
- -28. **Once the model has been chosen, it is trained on the entire dataset and tested on the unseen test set. These are represented in the figure below:** - -⟶ - -
- -29. **Cross-validation ― Cross-validation, also noted CV, is a method that is used to select a model that does not rely too much on the initial training set. The different types are summed up in the table below:** - -⟶ - -
- -30. [**Training on k−1 folds and assessment on the remaining one, Training on n−p observations and assessment on the p remaining ones]** - -⟶ - -
- -31. **[Generally k=5 or 10, Case p=1 is called leave-one-out]** - -⟶ - -
- -32. **The most commonly used method is called k-fold cross-validation and splits the training data into k folds to validate the model on one fold while training the model on the k−1 other folds, all of this k times. The error is then averaged over the k folds and is named cross-validation error.** - -⟶ - -
- -33. **Regularization ― The regularization procedure aims to keep the model from overfitting the data and thus deals with high variance issues. The following table sums up the different types of commonly used regularization techniques:** - -⟶ - -
- -34. **[Shrinks coefficients to 0, Good for variable selection, Makes coefficients smaller, Tradeoff between variable selection and small coefficients]** - -⟶ - -
- -35. **Diagnostics** - -⟶ - -
- -36. **Bias ― The bias of a model is the difference between the expected prediction and the correct model that we try to predict for given data points.** - -⟶ - -
- -37. **Variance ― The variance of a model is the variability of the model prediction for given data points.** - -⟶ - -
- -38. **Bias/variance tradeoff ― The simpler the model, the higher the bias, and the more complex the model, the higher the variance.** - -⟶ - -
- -39. **[Symptoms, Regression illustration, classification illustration, deep learning illustration, possible remedies]** - -⟶ - -
- -40. **[High training error, Training error close to test error, High bias, Training error slightly lower than test error, Very low training error, Training error much lower than test error, High variance]** - -⟶ - -
- -41. **[Complexify model, Add more features, Train longer, Perform regularization, Get more data]** - -⟶ - -
- -42. **Error analysis ― Error analysis is analyzing the root cause of the difference in performance between the current and the perfect models.** - -⟶ - -
- -43. **Ablative analysis ― Ablative analysis is analyzing the root cause of the difference in performance between the current and the baseline models.** - -⟶ - -
- -44. **Regression metrics** - -⟶ - -
- -45. **[Classification metrics, confusion matrix, accuracy, precision, recall, F1 score, ROC]** - -⟶ - -
- -46. **[Regression metrics, R squared, Mallow's CP, AIC, BIC]** - -⟶ - -
- -47. **[Model selection, cross-validation, regularization]** - -⟶ - -
- -48. **[Diagnostics, Bias/variance tradeoff, error/ablative analysis]** - -⟶ diff --git a/zh/cheatsheet-unsupervised-learning.md b/zh/cheatsheet-unsupervised-learning.md deleted file mode 100644 index 93708b826..000000000 --- a/zh/cheatsheet-unsupervised-learning.md +++ /dev/null @@ -1,339 +0,0 @@ -1. **Unsupervised Learning cheatsheet** - -⟶ - -
- -2. **Introduction to Unsupervised Learning** - -⟶ - -
- -3. **Motivation ― The goal of unsupervised learning is to find hidden patterns in unlabeled data {x(1),...,x(m)}.** - -⟶ - -
- -4. **Jensen's inequality ― Let f be a convex function and X a random variable. We have the following inequality:** - -⟶ - -
- -5. **Clustering** - -⟶ - -
- -6. **Expectation-Maximization** - -⟶ - -
- -7. **Latent variables ― Latent variables are hidden/unobserved variables that make estimation problems difficult, and are often denoted z. Here are the most common settings where there are latent variables:** - -⟶ - -
- -8. **[Setting, Latent variable z, Comments]** - -⟶ - -
- -9. **[Mixture of k Gaussians, Factor analysis]** - -⟶ - -
- -10. **Algorithm ― The Expectation-Maximization (EM) algorithm gives an efficient method for estimating the parameter θ through maximum likelihood estimation by repeatedly constructing a lower-bound on the likelihood (E-step) and optimizing that lower bound (M-step) as follows:** - -⟶ - -
- -11. **E-step: Evaluate the posterior probability Qi(z(i)) that each data point x(i) came from a particular cluster z(i) as follows:** - -⟶ - -
- -12. **M-step: Use the posterior probabilities Qi(z(i)) as cluster specific weights on data points x(i) to separately re-estimate each cluster model as follows:** - -⟶ - -
- -13. **[Gaussians initialization, Expectation step, Maximization step, Convergence]** - -⟶ - -
- -14. **k-means clustering** - -⟶ - -
- -15. **We note c(i) the cluster of data point i and μj the center of cluster j.** - -⟶ - -
- -16. **Algorithm ― After randomly initializing the cluster centroids μ1,μ2,...,μk∈Rn, the k-means algorithm repeats the following step until convergence:** - -⟶ - -
- -17. **[Means initialization, Cluster assignment, Means update, Convergence]** - -⟶ - -
- -18. **Distortion function ― In order to see if the algorithm converges, we look at the distortion function defined as follows:** - -⟶ - -
- -19. **Hierarchical clustering** - -⟶ - -
- -20. **Algorithm ― It is a clustering algorithm with an agglomerative hierarchical approach that builds nested clusters in a successive manner.** - -⟶ - -
- -21. **Types ― There are different sorts of hierarchical clustering algorithms that aim at optimizing different objective functions, which are summed up in the table below:** - -⟶ - -
- -22. **[Ward linkage, Average linkage, Complete linkage]** - -⟶ - -
- -23. **[Minimize within cluster distance, Minimize average distance between cluster pairs, Minimize maximum distance between cluster pairs]** - -⟶ - -
- -24. **Clustering assessment metrics** - -⟶ - -
- -25. **In an unsupervised learning setting, it is often hard to assess the performance of a model since we don't have the ground truth labels as was the case in the supervised learning setting.** - -⟶ - -
- -26. **Silhouette coefficient ― By noting a and b the mean distance between a sample and all other points in the same class, and between a sample and all other points in the next nearest cluster, the silhouette coefficient s for a single sample is defined as follows:** - -⟶ - -
- -27. **Calinski-Harabaz index ― By noting k the number of clusters, Bk and Wk the between and within-clustering dispersion matrices respectively defined as** - -⟶ - -
- -28. **the Calinski-Harabaz index s(k) indicates how well a clustering model defines its clusters, such that the higher the score, the more dense and well separated the clusters are. It is defined as follows:** - -⟶ - -
- -29. **Dimension reduction** - -⟶ - -
- -30. **Principal component analysis** - -⟶ - -
- -31. **It is a dimension reduction technique that finds the variance maximizing directions onto which to project the data.** - -⟶ - -
- -32. **Eigenvalue, eigenvector ― Given a matrix A∈Rn×n, λ is said to be an eigenvalue of A if there exists a vector z∈Rn∖{0}, called eigenvector, such that we have:** - -⟶ - -
- -33. **Spectral theorem ― Let A∈Rn×n. If A is symmetric, then A is diagonalizable by a real orthogonal matrix U∈Rn×n. By noting Λ=diag(λ1,...,λn), we have:** - -⟶ - -
- -34. **diagonal** - -⟶ - -
- -35. **Remark: the eigenvector associated with the largest eigenvalue is called principal eigenvector of matrix A.** - -⟶ - -
- -36. **Algorithm ― The Principal Component Analysis (PCA) procedure is a dimension reduction technique that projects the data on k dimensions by maximizing the variance of the data as follows:** - -⟶ - -
- -37. **Step 1: Normalize the data to have a mean of 0 and standard deviation of 1.** - -⟶ - -
- -38. **Step 2: Compute Σ=1mm∑i=1x(i)x(i)T∈Rn×n, which is symmetric with real eigenvalues.** - -⟶ - -
- -39. **Step 3: Compute u1,...,uk∈Rn the k orthogonal principal eigenvectors of Σ, i.e. the orthogonal eigenvectors of the k largest eigenvalues.** - -⟶ - -
- -40. **Step 4: Project the data on spanR(u1,...,uk).** - -⟶ - -
- -41. **This procedure maximizes the variance among all k-dimensional spaces.** - -⟶ - -
- -42. **[Data in feature space, Find principal components, Data in principal components space]** - -⟶ - -
- -43. **Independent component analysis** - -⟶ - -
- -44. **It is a technique meant to find the underlying generating sources.** - -⟶ - -
- -45. **Assumptions ― We assume that our data x has been generated by the n-dimensional source vector s=(s1,...,sn), where si are independent random variables, via a mixing and non-singular matrix A as follows:** - -⟶ - -
- -46. **The goal is to find the unmixing matrix W=A−1.** - -⟶ - -
- -47. **Bell and Sejnowski ICA algorithm ― This algorithm finds the unmixing matrix W by following the steps below:** - -⟶ - -
- -48. **Write the probability of x=As=W−1s as:** - -⟶ - -
- -49. **Write the log likelihood given our training data {x(i),i∈[[1,m]]} and by noting g the sigmoid function as:** - -⟶ - -
- -50. **Therefore, the stochastic gradient ascent learning rule is such that for each training example x(i), we update W as follows:** - -⟶ - -
- -51. **The Machine Learning cheatsheets are now available in Mandarin.** - -⟶ - -
- -52. **Original authors** - -⟶ - -
- -53. **Translated by X, Y and Z** - -⟶ - -
- -54. **Reviewed by X, Y and Z** - -⟶ - -
- -55. **[Introduction, Motivation, Jensen's inequality]** - -⟶ - -
- -56. **[Clustering, Expectation-Maximization, k-means, Hierarchical clustering, Metrics]** - -⟶ - -
- -57. **[Dimension reduction, PCA, ICA]** - -⟶ diff --git a/zh/refresher-linear-algebra.md b/zh/refresher-linear-algebra.md deleted file mode 100644 index 6cef234fe..000000000 --- a/zh/refresher-linear-algebra.md +++ /dev/null @@ -1,339 +0,0 @@ -1. **Linear Algebra and Calculus refresher** - -⟶ - -
- -2. **General notations** - -⟶ - -
- -3. **Definitions** - -⟶ - -
- -4. **Vector ― We note x∈Rn a vector with n entries, where xi∈R is the ith entry:** - -⟶ - -
- -5. **Matrix ― We note A∈Rm×n a matrix with m rows and n columns, where Ai,j∈R is the entry located in the ith row and jth column:** - -⟶ - -
- -6. **Remark: the vector x defined above can be viewed as a n×1 matrix and is more particularly called a column-vector.** - -⟶ - -
- -7. **Main matrices** - -⟶ - -
- -8. **Identity matrix ― The identity matrix I∈Rn×n is a square matrix with ones in its diagonal and zero everywhere else:** - -⟶ - -
- -9. **Remark: for all matrices A∈Rn×n, we have A×I=I×A=A.** - -⟶ - -
- -10. **Diagonal matrix ― A diagonal matrix D∈Rn×n is a square matrix with nonzero values in its diagonal and zero everywhere else:** - -⟶ - -
- -11. **Remark: we also note D as diag(d1,...,dn).** - -⟶ - -
- -12. **Matrix operations** - -⟶ - -
- -13. **Multiplication** - -⟶ - -
- -14. **Vector-vector ― There are two types of vector-vector products:** - -⟶ - -
- -15. **inner product: for x,y∈Rn, we have:** - -⟶ - -
- -16. **outer product: for x∈Rm,y∈Rn, we have:** - -⟶ - -
- -17. **Matrix-vector ― The product of matrix A∈Rm×n and vector x∈Rn is a vector of size Rm, such that:** - -⟶ - -
- -18. **where aTr,i are the vector rows and ac,j are the vector columns of A, and xi are the entries of x.** - -⟶ - -
- -19. **Matrix-matrix ― The product of matrices A∈Rm×n and B∈Rn×p is a matrix of size Rm×p, such that:** - -⟶ - -
- -20. **where aTr,i,bTr,i are the vector rows and ac,j,bc,j are the vector columns of A and B respectively** - -⟶ - -
- -21. **Other operations** - -⟶ - -
- -22. **Transpose ― The transpose of a matrix A∈Rm×n, noted AT, is such that its entries are flipped:** - -⟶ - -
- -23. **Remark: for matrices A,B, we have (AB)T=BTAT** - -⟶ - -
- -24. **Inverse ― The inverse of an invertible square matrix A is noted A−1 and is the only matrix such that:** - -⟶ - -
- -25. **Remark: not all square matrices are invertible. Also, for matrices A,B, we have (AB)−1=B−1A−1** - -⟶ - -
- -26. **Trace ― The trace of a square matrix A, noted tr(A), is the sum of its diagonal entries:** - -⟶ - -
- -27. **Remark: for matrices A,B, we have tr(AT)=tr(A) and tr(AB)=tr(BA)** - -⟶ - -
- -28. **Determinant ― The determinant of a square matrix A∈Rn×n, noted |A| or det(A) is expressed recursively in terms of A∖i,∖j, which is the matrix A without its ith row and jth column, as follows:** - -⟶ - -
- -29. **Remark: A is invertible if and only if |A|≠0. Also, |AB|=|A||B| and |AT|=|A|.** - -⟶ - -
- -30. **Matrix properties** - -⟶ - -
- -31. **Definitions** - -⟶ - -
- -32. **Symmetric decomposition ― A given matrix A can be expressed in terms of its symmetric and antisymmetric parts as follows:** - -⟶ - -
- -33. **[Symmetric, Antisymmetric]** - -⟶ - -
- -34. **Norm ― A norm is a function N:V⟶[0,+∞[ where V is a vector space, and such that for all x,y∈V, we have:** - -⟶ - -
- -35. **N(ax)=|a|N(x) for a scalar** - -⟶ - -
- -36. **if N(x)=0, then x=0** - -⟶ - -
- -37. **For x∈V, the most commonly used norms are summed up in the table below:** - -⟶ - -
- -38. **[Norm, Notation, Definition, Use case]** - -⟶ - -
- -39. **Linear dependence ― A set of vectors is said to be linearly dependent if one of the vectors in the set can be defined as a linear combination of the others.** - -⟶ - -
- -40. **Remark: if no vector can be written this way, then the vectors are said to be linearly independent** - -⟶ - -
- -41. **Matrix rank ― The rank of a given matrix A is noted rank(A) and is the dimension of the vector space generated by its columns. This is equivalent to the maximum number of linearly independent columns of A.** - -⟶ - -
- -42. **Positive semi-definite matrix ― A matrix A∈Rn×n is positive semi-definite (PSD) and is noted A⪰0 if we have:** - -⟶ - -
- -43. **Remark: similarly, a matrix A is said to be positive definite, and is noted A≻0, if it is a PSD matrix which satisfies for all non-zero vector x, xTAx>0.** - -⟶ - -
- -44. **Eigenvalue, eigenvector ― Given a matrix A∈Rn×n, λ is said to be an eigenvalue of A if there exists a vector z∈Rn∖{0}, called eigenvector, such that we have:** - -⟶ - -
- -45. **Spectral theorem ― Let A∈Rn×n. If A is symmetric, then A is diagonalizable by a real orthogonal matrix U∈Rn×n. By noting Λ=diag(λ1,...,λn), we have:** - -⟶ - -
- -46. **diagonal** - -⟶ - -
- -47. **Singular-value decomposition ― For a given matrix A of dimensions m×n, the singular-value decomposition (SVD) is a factorization technique that guarantees the existence of U m×m unitary, Σ m×n diagonal and V n×n unitary matrices, such that:** - -⟶ - -
- -48. **Matrix calculus** - -⟶ - -
- -49. **Gradient ― Let f:Rm×n→R be a function and A∈Rm×n be a matrix. The gradient of f with respect to A is a m×n matrix, noted ∇Af(A), such that:** - -⟶ - -
- -50. **Remark: the gradient of f is only defined when f is a function that returns a scalar.** - -⟶ - -
- -51. **Hessian ― Let f:Rn→R be a function and x∈Rn be a vector. The hessian of f with respect to x is a n×n symmetric matrix, noted ∇2xf(x), such that:** - -⟶ - -
- -52. **Remark: the hessian of f is only defined when f is a function that returns a scalar** - -⟶ - -
- -53. **Gradient operations ― For matrices A,B,C, the following gradient properties are worth having in mind:** - -⟶ - -
- -54. **[General notations, Definitions, Main matrices]** - -⟶ - -
- -55. **[Matrix operations, Multiplication, Other operations]** - -⟶ - -
- -56. **[Matrix properties, Norm, Eigenvalue/Eigenvector, Singular-value decomposition]** - -⟶ - -
- -57. **[Matrix calculus, Gradient, Hessian, Operations]** - -⟶ diff --git a/zh/refresher-probability.md b/zh/refresher-probability.md deleted file mode 100644 index 52e0056e0..000000000 --- a/zh/refresher-probability.md +++ /dev/null @@ -1,381 +0,0 @@ -1. **Probabilities and Statistics refresher** - -⟶ - -
- -2. **Introduction to Probability and Combinatorics** - -⟶ - -
- -3. **Sample space ― The set of all possible outcomes of an experiment is known as the sample space of the experiment and is denoted by S.** - -⟶ - -
- -4. **Event ― Any subset E of the sample space is known as an event. That is, an event is a set consisting of possible outcomes of the experiment. If the outcome of the experiment is contained in E, then we say that E has occurred.** - -⟶ - -
- -5. **Axioms of probability ― For each event E, we denote P(E) as the probability of event E occurring.** - -⟶ - -
- -6. **Axiom 1 ― Every probability is between 0 and 1 included, i.e:** - -⟶ - -
- -7. **Axiom 2 ― The probability that at least one of the elementary events in the entire sample space will occur is 1, i.e:** - -⟶ - -
- -8. **Axiom 3 ― For any sequence of mutually exclusive events E1,...,En, we have:** - -⟶ - -
- -9. **Permutation ― A permutation is an arrangement of r objects from a pool of n objects, in a given order. The number of such arrangements is given by P(n,r), defined as:** - -⟶ - -
- -10. **Combination ― A combination is an arrangement of r objects from a pool of n objects, where the order does not matter. The number of such arrangements is given by C(n,r), defined as:** - -⟶ - -
- -11. **Remark: we note that for 0⩽r⩽n, we have P(n,r)⩾C(n,r)** - -⟶ - -
- -12. **Conditional Probability** - -⟶ - -
- -13. **Bayes' rule ― For events A and B such that P(B)>0, we have:** - -⟶ - -
- -14. **Remark: we have P(A∩B)=P(A)P(B|A)=P(A|B)P(B)** - -⟶ - -
- -15. **Partition ― Let {Ai,i∈[[1,n]]} be such that for all i, Ai≠∅. We say that {Ai} is a partition if we have:** - -⟶ - -
- -16. **Remark: for any event B in the sample space, we have P(B)=n∑i=1P(B|Ai)P(Ai).** - -⟶ - -
- -17. **Extended form of Bayes' rule ― Let {Ai,i∈[[1,n]]} be a partition of the sample space. We have:** - -⟶ - -
- -18. **Independence ― Two events A and B are independent if and only if we have:** - -⟶ - -
- -19. **Random Variables** - -⟶ - -
- -20. **Definitions** - -⟶ - -
- -21. **Random variable ― A random variable, often noted X, is a function that maps every element in a sample space to a real line.** - -⟶ - -
- -22. **Cumulative distribution function (CDF) ― The cumulative distribution function F, which is monotonically non-decreasing and is such that limx→−∞F(x)=0 and limx→+∞F(x)=1, is defined as:** - -⟶ - -
- -23. **Remark: we have P(a - -24. **Probability density function (PDF) ― The probability density function f is the probability that X takes on values between two adjacent realizations of the random variable.** - -⟶ - -
- -25. **Relationships involving the PDF and CDF ― Here are the important properties to know in the discrete (D) and the continuous (C) cases.** - -⟶ - -
- -26. **[Case, CDF F, PDF f, Properties of PDF]** - -⟶ - -
- -27. **Expectation and Moments of the Distribution ― Here are the expressions of the expected value E[X], generalized expected value E[g(X)], kth moment E[Xk] and characteristic function ψ(ω) for the discrete and continuous cases:** - -⟶ - -
- -28. **Variance ― The variance of a random variable, often noted Var(X) or σ2, is a measure of the spread of its distribution function. It is determined as follows:** - -⟶ - -
- -29. **Standard deviation ― The standard deviation of a random variable, often noted σ, is a measure of the spread of its distribution function which is compatible with the units of the actual random variable. It is determined as follows:** - -⟶ - -
- -30. **Transformation of random variables ― Let the variables X and Y be linked by some function. By noting fX and fY the distribution function of X and Y respectively, we have:** - -⟶ - -
- -31. **Leibniz integral rule ― Let g be a function of x and potentially c, and a,b boundaries that may depend on c. We have:** - -⟶ - -
- -32. **Probability Distributions** - -⟶ - -
- -33. **Chebyshev's inequality ― Let X be a random variable with expected value μ. For k,σ>0, we have the following inequality:** - -⟶ - -
- -34. **Main distributions ― Here are the main distributions to have in mind:** - -⟶ - -
- -35. **[Type, Distribution]** - -⟶ - -
- -36. **Jointly Distributed Random Variables** - -⟶ - -
- -37. **Marginal density and cumulative distribution ― From the joint probability density function fXY, we have** - -⟶ - -<br>
- -38. **[Case, Marginal density, Cumulative function]** - -⟶ - -
- -39. **Conditional density ― The conditional density of X with respect to Y, often noted fX|Y, is defined as follows:** - -⟶ - -
- -40. **Independence ― Two random variables X and Y are said to be independent if we have:** - -⟶ - -
- -41. **Covariance ― We define the covariance of two random variables X and Y, that we note σ2XY or more commonly Cov(X,Y), as follows:** - -⟶ - -
- -42. **Correlation ― By noting σX,σY the standard deviations of X and Y, we define the correlation between the random variables X and Y, noted ρXY, as follows:** - -⟶ - -
- -43. **Remark 1: we note that for any random variables X,Y, we have ρXY∈[−1,1].** - -⟶ - -
- -44. **Remark 2: If X and Y are independent, then ρXY=0.** - -⟶ - -
- -45. **Parameter estimation** - -⟶ - -
- -46. **Definitions** - -⟶ - -
- -47. **Random sample ― A random sample is a collection of n random variables X1,...,Xn that are independent and identically distributed with X.** - -⟶ - -
- -48. **Estimator ― An estimator is a function of the data that is used to infer the value of an unknown parameter in a statistical model.** - -⟶ - -
- -49. **Bias ― The bias of an estimator ^θ is defined as being the difference between the expected value of the distribution of ^θ and the true value, i.e.:** - -⟶ - -
- -50. **Remark: an estimator is said to be unbiased when we have E[^θ]=θ.** - -⟶ - -
- -51. **Estimating the mean** - -⟶ - -
- -52. **Sample mean ― The sample mean of a random sample is used to estimate the true mean μ of a distribution, is often noted ¯X and is defined as follows:** - -⟶ - -
- -53. **Remark: the sample mean is unbiased, i.e E[¯X]=μ.** - -⟶ - -
- -54. **Central Limit Theorem ― Let us have a random sample X1,...,Xn following a given distribution with mean μ and variance σ2, then we have:** - -⟶ - -
- -55. **Estimating the variance** - -⟶ - -
- -56. **Sample variance ― The sample variance of a random sample is used to estimate the true variance σ2 of a distribution, is often noted s2 or ^σ2 and is defined as follows:** - -⟶ - -
- -57. **Remark: the sample variance is unbiased, i.e E[s2]=σ2.** - -⟶ - -
- -58. **Chi-Squared relation with sample variance ― Let s2 be the sample variance of a random sample. We have:** - -⟶ - -
- -59. **[Introduction, Sample space, Event, Permutation]** - -⟶ - -
- -60. **[Conditional probability, Bayes' rule, Independence]** - -⟶ - -
- -61. **[Random variables, Definitions, Expectation, Variance]** - -⟶ - -
- -62. **[Probability distributions, Chebyshev's inequality, Main distributions]** - -⟶ - -
- -63. **[Jointly distributed random variables, Density, Covariance, Correlation]** - -⟶ - -
- -64. **[Parameter estimation, Mean, Variance]** - -⟶ From c5f71e056dacbd2223331fe60d944ab49f8344e5 Mon Sep 17 00:00:00 2001 From: afshinea Date: Sun, 30 Jun 2019 11:30:57 -0700 Subject: [PATCH 259/531] Update notations --- ar/{cheatsheet-deep-learning.md => cs-229-deep-learning.md} | 0 ar/{refresher-linear-algebra.md => cs-229-linear-algebra.md} | 0 es/{cheatsheet-deep-learning.md => cs-229-deep-learning.md} | 0 es/{refresher-linear-algebra.md => cs-229-linear-algebra.md} | 0 ...s-and-tricks.md => cs-229-machine-learning-tips-and-tricks.md} | 0 es/{refresher-probability.md => cs-229-probability.md} | 0 ...sheet-supervised-learning.md => cs-229-supervised-learning.md} | 0 ...t-unsupervised-learning.md => cs-229-unsupervised-learning.md} | 0 fa/{cheatsheet-deep-learning.md => cs-229-deep-learning.md} | 0 fa/{refresher-linear-algebra.md => cs-229-linear-algebra.md} | 0 ...s-and-tricks.md => cs-229-machine-learning-tips-and-tricks.md} | 0 fa/{refresher-probability.md => cs-229-probability.md} | 0 ...sheet-supervised-learning.md => cs-229-supervised-learning.md} | 0 ...t-unsupervised-learning.md => cs-229-unsupervised-learning.md} | 0 ...neural-networks.md => cs-230-convolutional-neural-networks.md} | 0 ...tips-and-tricks.md => cs-230-deep-learning-tips-and-tricks.md} | 0 ...ent-neural-networks.md => cs-230-recurrent-neural-networks.md} | 0 fr/{cheatsheet-deep-learning.md => cs-229-deep-learning.md} | 0 fr/{refresher-linear-algebra.md => cs-229-linear-algebra.md} | 0 ...s-and-tricks.md => cs-229-machine-learning-tips-and-tricks.md} | 0 fr/{refresher-probability.md => cs-229-probability.md} | 0 ...sheet-supervised-learning.md => cs-229-supervised-learning.md} | 0 ...t-unsupervised-learning.md => cs-229-unsupervised-learning.md} | 0 ...neural-networks.md => cs-230-convolutional-neural-networks.md} | 0 ...tips-and-tricks.md => cs-230-deep-learning-tips-and-tricks.md} | 0 ...ent-neural-networks.md => cs-230-recurrent-neural-networks.md} | 0 ...tips-and-tricks.md => cs-230-deep-learning-tips-and-tricks.md} | 0 ko/{refresher-linear-algebra.md => cs-229-linear-algebra.md} | 0 ...s-and-tricks.md => cs-229-machine-learning-tips-and-tricks.md} | 0 ko/{refresher-probability.md => cs-229-probability.md} | 0 ...t-unsupervised-learning.md => cs-229-unsupervised-learning.md} | 0 pt/{cheatsheet-deep-learning.md => cs-229-deep-learning.md} | 0 pt/{refresher-linear-algebra.md => cs-229-linear-algebra.md} | 0 ...s-and-tricks.md => cs-229-machine-learning-tips-and-tricks.md} | 0 pt/{refresher-probability.md => cs-229-probability.md} | 0 ...sheet-supervised-learning.md => cs-229-supervised-learning.md} | 0 ...t-unsupervised-learning.md => cs-229-unsupervised-learning.md} | 0 ...neural-networks.md => cs-230-convolutional-neural-networks.md} | 0 tr/{cheatsheet-deep-learning.md => cs-229-deep-learning.md} | 0 tr/{refresher-linear-algebra.md => cs-229-linear-algebra.md} | 0 ...s-and-tricks.md => cs-229-machine-learning-tips-and-tricks.md} | 0 tr/{refresher-probability.md => cs-229-probability.md} | 0 ...sheet-supervised-learning.md => cs-229-supervised-learning.md} | 0 ...t-unsupervised-learning.md => cs-229-unsupervised-learning.md} | 0 ...neural-networks.md => cs-230-convolutional-neural-networks.md} | 0 ...tips-and-tricks.md => cs-230-deep-learning-tips-and-tricks.md} | 0 ...ent-neural-networks.md => cs-230-recurrent-neural-networks.md} | 0 uk/{refresher-probability.md => cs-229-probability.md} | 0 zh-tw/{cheatsheet-deep-learning.md => cs-229-deep-learning.md} | 0 zh-tw/{refresher-linear-algebra.md => 
cs-229-linear-algebra.md} | 0 zh-tw/{refresher-probability.md => cs-229-probability.md} | 0 ...sheet-supervised-learning.md => cs-229-supervised-learning.md} | 0 ...t-unsupervised-learning.md => cs-229-unsupervised-learning.md} | 0 ...sheet-supervised-learning.md => cs-229-supervised-learning.md} | 0 54 files changed, 0 insertions(+), 0 deletions(-) rename ar/{cheatsheet-deep-learning.md => cs-229-deep-learning.md} (100%) rename ar/{refresher-linear-algebra.md => cs-229-linear-algebra.md} (100%) rename es/{cheatsheet-deep-learning.md => cs-229-deep-learning.md} (100%) rename es/{refresher-linear-algebra.md => cs-229-linear-algebra.md} (100%) rename es/{cheatsheet-machine-learning-tips-and-tricks.md => cs-229-machine-learning-tips-and-tricks.md} (100%) rename es/{refresher-probability.md => cs-229-probability.md} (100%) rename es/{cheatsheet-supervised-learning.md => cs-229-supervised-learning.md} (100%) rename es/{cheatsheet-unsupervised-learning.md => cs-229-unsupervised-learning.md} (100%) rename fa/{cheatsheet-deep-learning.md => cs-229-deep-learning.md} (100%) rename fa/{refresher-linear-algebra.md => cs-229-linear-algebra.md} (100%) rename fa/{cheatsheet-machine-learning-tips-and-tricks.md => cs-229-machine-learning-tips-and-tricks.md} (100%) rename fa/{refresher-probability.md => cs-229-probability.md} (100%) rename fa/{cheatsheet-supervised-learning.md => cs-229-supervised-learning.md} (100%) rename fa/{cheatsheet-unsupervised-learning.md => cs-229-unsupervised-learning.md} (100%) rename fa/{convolutional-neural-networks.md => cs-230-convolutional-neural-networks.md} (100%) rename fa/{deep-learning-tips-and-tricks.md => cs-230-deep-learning-tips-and-tricks.md} (100%) rename fa/{recurrent-neural-networks.md => cs-230-recurrent-neural-networks.md} (100%) rename fr/{cheatsheet-deep-learning.md => cs-229-deep-learning.md} (100%) rename fr/{refresher-linear-algebra.md => cs-229-linear-algebra.md} (100%) rename fr/{cheatsheet-machine-learning-tips-and-tricks.md => cs-229-machine-learning-tips-and-tricks.md} (100%) rename fr/{refresher-probability.md => cs-229-probability.md} (100%) rename fr/{cheatsheet-supervised-learning.md => cs-229-supervised-learning.md} (100%) rename fr/{cheatsheet-unsupervised-learning.md => cs-229-unsupervised-learning.md} (100%) rename fr/{convolutional-neural-networks.md => cs-230-convolutional-neural-networks.md} (100%) rename fr/{deep-learning-tips-and-tricks.md => cs-230-deep-learning-tips-and-tricks.md} (100%) rename fr/{recurrent-neural-networks.md => cs-230-recurrent-neural-networks.md} (100%) rename ja/{deep-learning-tips-and-tricks.md => cs-230-deep-learning-tips-and-tricks.md} (100%) rename ko/{refresher-linear-algebra.md => cs-229-linear-algebra.md} (100%) rename ko/{cheatsheet-machine-learning-tips-and-tricks.md => cs-229-machine-learning-tips-and-tricks.md} (100%) rename ko/{refresher-probability.md => cs-229-probability.md} (100%) rename ko/{cheatsheet-unsupervised-learning.md => cs-229-unsupervised-learning.md} (100%) rename pt/{cheatsheet-deep-learning.md => cs-229-deep-learning.md} (100%) rename pt/{refresher-linear-algebra.md => cs-229-linear-algebra.md} (100%) rename pt/{cheatsheet-machine-learning-tips-and-tricks.md => cs-229-machine-learning-tips-and-tricks.md} (100%) rename pt/{refresher-probability.md => cs-229-probability.md} (100%) rename pt/{cheatsheet-supervised-learning.md => cs-229-supervised-learning.md} (100%) rename pt/{cheatsheet-unsupervised-learning.md => cs-229-unsupervised-learning.md} (100%) rename 
pt/{convolutional-neural-networks.md => cs-230-convolutional-neural-networks.md} (100%) rename tr/{cheatsheet-deep-learning.md => cs-229-deep-learning.md} (100%) rename tr/{refresher-linear-algebra.md => cs-229-linear-algebra.md} (100%) rename tr/{cheatsheet-machine-learning-tips-and-tricks.md => cs-229-machine-learning-tips-and-tricks.md} (100%) rename tr/{refresher-probability.md => cs-229-probability.md} (100%) rename tr/{cheatsheet-supervised-learning.md => cs-229-supervised-learning.md} (100%) rename tr/{cheatsheet-unsupervised-learning.md => cs-229-unsupervised-learning.md} (100%) rename tr/{convolutional-neural-networks.md => cs-230-convolutional-neural-networks.md} (100%) rename tr/{deep-learning-tips-and-tricks.md => cs-230-deep-learning-tips-and-tricks.md} (100%) rename tr/{recurrent-neural-networks.md => cs-230-recurrent-neural-networks.md} (100%) rename uk/{refresher-probability.md => cs-229-probability.md} (100%) rename zh-tw/{cheatsheet-deep-learning.md => cs-229-deep-learning.md} (100%) rename zh-tw/{refresher-linear-algebra.md => cs-229-linear-algebra.md} (100%) rename zh-tw/{refresher-probability.md => cs-229-probability.md} (100%) rename zh-tw/{cheatsheet-supervised-learning.md => cs-229-supervised-learning.md} (100%) rename zh-tw/{cheatsheet-unsupervised-learning.md => cs-229-unsupervised-learning.md} (100%) rename zh/{cheatsheet-supervised-learning.md => cs-229-supervised-learning.md} (100%) diff --git a/ar/cheatsheet-deep-learning.md b/ar/cs-229-deep-learning.md similarity index 100% rename from ar/cheatsheet-deep-learning.md rename to ar/cs-229-deep-learning.md diff --git a/ar/refresher-linear-algebra.md b/ar/cs-229-linear-algebra.md similarity index 100% rename from ar/refresher-linear-algebra.md rename to ar/cs-229-linear-algebra.md diff --git a/es/cheatsheet-deep-learning.md b/es/cs-229-deep-learning.md similarity index 100% rename from es/cheatsheet-deep-learning.md rename to es/cs-229-deep-learning.md diff --git a/es/refresher-linear-algebra.md b/es/cs-229-linear-algebra.md similarity index 100% rename from es/refresher-linear-algebra.md rename to es/cs-229-linear-algebra.md diff --git a/es/cheatsheet-machine-learning-tips-and-tricks.md b/es/cs-229-machine-learning-tips-and-tricks.md similarity index 100% rename from es/cheatsheet-machine-learning-tips-and-tricks.md rename to es/cs-229-machine-learning-tips-and-tricks.md diff --git a/es/refresher-probability.md b/es/cs-229-probability.md similarity index 100% rename from es/refresher-probability.md rename to es/cs-229-probability.md diff --git a/es/cheatsheet-supervised-learning.md b/es/cs-229-supervised-learning.md similarity index 100% rename from es/cheatsheet-supervised-learning.md rename to es/cs-229-supervised-learning.md diff --git a/es/cheatsheet-unsupervised-learning.md b/es/cs-229-unsupervised-learning.md similarity index 100% rename from es/cheatsheet-unsupervised-learning.md rename to es/cs-229-unsupervised-learning.md diff --git a/fa/cheatsheet-deep-learning.md b/fa/cs-229-deep-learning.md similarity index 100% rename from fa/cheatsheet-deep-learning.md rename to fa/cs-229-deep-learning.md diff --git a/fa/refresher-linear-algebra.md b/fa/cs-229-linear-algebra.md similarity index 100% rename from fa/refresher-linear-algebra.md rename to fa/cs-229-linear-algebra.md diff --git a/fa/cheatsheet-machine-learning-tips-and-tricks.md b/fa/cs-229-machine-learning-tips-and-tricks.md similarity index 100% rename from fa/cheatsheet-machine-learning-tips-and-tricks.md rename to 
fa/cs-229-machine-learning-tips-and-tricks.md diff --git a/fa/refresher-probability.md b/fa/cs-229-probability.md similarity index 100% rename from fa/refresher-probability.md rename to fa/cs-229-probability.md diff --git a/fa/cheatsheet-supervised-learning.md b/fa/cs-229-supervised-learning.md similarity index 100% rename from fa/cheatsheet-supervised-learning.md rename to fa/cs-229-supervised-learning.md diff --git a/fa/cheatsheet-unsupervised-learning.md b/fa/cs-229-unsupervised-learning.md similarity index 100% rename from fa/cheatsheet-unsupervised-learning.md rename to fa/cs-229-unsupervised-learning.md diff --git a/fa/convolutional-neural-networks.md b/fa/cs-230-convolutional-neural-networks.md similarity index 100% rename from fa/convolutional-neural-networks.md rename to fa/cs-230-convolutional-neural-networks.md diff --git a/fa/deep-learning-tips-and-tricks.md b/fa/cs-230-deep-learning-tips-and-tricks.md similarity index 100% rename from fa/deep-learning-tips-and-tricks.md rename to fa/cs-230-deep-learning-tips-and-tricks.md diff --git a/fa/recurrent-neural-networks.md b/fa/cs-230-recurrent-neural-networks.md similarity index 100% rename from fa/recurrent-neural-networks.md rename to fa/cs-230-recurrent-neural-networks.md diff --git a/fr/cheatsheet-deep-learning.md b/fr/cs-229-deep-learning.md similarity index 100% rename from fr/cheatsheet-deep-learning.md rename to fr/cs-229-deep-learning.md diff --git a/fr/refresher-linear-algebra.md b/fr/cs-229-linear-algebra.md similarity index 100% rename from fr/refresher-linear-algebra.md rename to fr/cs-229-linear-algebra.md diff --git a/fr/cheatsheet-machine-learning-tips-and-tricks.md b/fr/cs-229-machine-learning-tips-and-tricks.md similarity index 100% rename from fr/cheatsheet-machine-learning-tips-and-tricks.md rename to fr/cs-229-machine-learning-tips-and-tricks.md diff --git a/fr/refresher-probability.md b/fr/cs-229-probability.md similarity index 100% rename from fr/refresher-probability.md rename to fr/cs-229-probability.md diff --git a/fr/cheatsheet-supervised-learning.md b/fr/cs-229-supervised-learning.md similarity index 100% rename from fr/cheatsheet-supervised-learning.md rename to fr/cs-229-supervised-learning.md diff --git a/fr/cheatsheet-unsupervised-learning.md b/fr/cs-229-unsupervised-learning.md similarity index 100% rename from fr/cheatsheet-unsupervised-learning.md rename to fr/cs-229-unsupervised-learning.md diff --git a/fr/convolutional-neural-networks.md b/fr/cs-230-convolutional-neural-networks.md similarity index 100% rename from fr/convolutional-neural-networks.md rename to fr/cs-230-convolutional-neural-networks.md diff --git a/fr/deep-learning-tips-and-tricks.md b/fr/cs-230-deep-learning-tips-and-tricks.md similarity index 100% rename from fr/deep-learning-tips-and-tricks.md rename to fr/cs-230-deep-learning-tips-and-tricks.md diff --git a/fr/recurrent-neural-networks.md b/fr/cs-230-recurrent-neural-networks.md similarity index 100% rename from fr/recurrent-neural-networks.md rename to fr/cs-230-recurrent-neural-networks.md diff --git a/ja/deep-learning-tips-and-tricks.md b/ja/cs-230-deep-learning-tips-and-tricks.md similarity index 100% rename from ja/deep-learning-tips-and-tricks.md rename to ja/cs-230-deep-learning-tips-and-tricks.md diff --git a/ko/refresher-linear-algebra.md b/ko/cs-229-linear-algebra.md similarity index 100% rename from ko/refresher-linear-algebra.md rename to ko/cs-229-linear-algebra.md diff --git a/ko/cheatsheet-machine-learning-tips-and-tricks.md 
b/ko/cs-229-machine-learning-tips-and-tricks.md similarity index 100% rename from ko/cheatsheet-machine-learning-tips-and-tricks.md rename to ko/cs-229-machine-learning-tips-and-tricks.md diff --git a/ko/refresher-probability.md b/ko/cs-229-probability.md similarity index 100% rename from ko/refresher-probability.md rename to ko/cs-229-probability.md diff --git a/ko/cheatsheet-unsupervised-learning.md b/ko/cs-229-unsupervised-learning.md similarity index 100% rename from ko/cheatsheet-unsupervised-learning.md rename to ko/cs-229-unsupervised-learning.md diff --git a/pt/cheatsheet-deep-learning.md b/pt/cs-229-deep-learning.md similarity index 100% rename from pt/cheatsheet-deep-learning.md rename to pt/cs-229-deep-learning.md diff --git a/pt/refresher-linear-algebra.md b/pt/cs-229-linear-algebra.md similarity index 100% rename from pt/refresher-linear-algebra.md rename to pt/cs-229-linear-algebra.md diff --git a/pt/cheatsheet-machine-learning-tips-and-tricks.md b/pt/cs-229-machine-learning-tips-and-tricks.md similarity index 100% rename from pt/cheatsheet-machine-learning-tips-and-tricks.md rename to pt/cs-229-machine-learning-tips-and-tricks.md diff --git a/pt/refresher-probability.md b/pt/cs-229-probability.md similarity index 100% rename from pt/refresher-probability.md rename to pt/cs-229-probability.md diff --git a/pt/cheatsheet-supervised-learning.md b/pt/cs-229-supervised-learning.md similarity index 100% rename from pt/cheatsheet-supervised-learning.md rename to pt/cs-229-supervised-learning.md diff --git a/pt/cheatsheet-unsupervised-learning.md b/pt/cs-229-unsupervised-learning.md similarity index 100% rename from pt/cheatsheet-unsupervised-learning.md rename to pt/cs-229-unsupervised-learning.md diff --git a/pt/convolutional-neural-networks.md b/pt/cs-230-convolutional-neural-networks.md similarity index 100% rename from pt/convolutional-neural-networks.md rename to pt/cs-230-convolutional-neural-networks.md diff --git a/tr/cheatsheet-deep-learning.md b/tr/cs-229-deep-learning.md similarity index 100% rename from tr/cheatsheet-deep-learning.md rename to tr/cs-229-deep-learning.md diff --git a/tr/refresher-linear-algebra.md b/tr/cs-229-linear-algebra.md similarity index 100% rename from tr/refresher-linear-algebra.md rename to tr/cs-229-linear-algebra.md diff --git a/tr/cheatsheet-machine-learning-tips-and-tricks.md b/tr/cs-229-machine-learning-tips-and-tricks.md similarity index 100% rename from tr/cheatsheet-machine-learning-tips-and-tricks.md rename to tr/cs-229-machine-learning-tips-and-tricks.md diff --git a/tr/refresher-probability.md b/tr/cs-229-probability.md similarity index 100% rename from tr/refresher-probability.md rename to tr/cs-229-probability.md diff --git a/tr/cheatsheet-supervised-learning.md b/tr/cs-229-supervised-learning.md similarity index 100% rename from tr/cheatsheet-supervised-learning.md rename to tr/cs-229-supervised-learning.md diff --git a/tr/cheatsheet-unsupervised-learning.md b/tr/cs-229-unsupervised-learning.md similarity index 100% rename from tr/cheatsheet-unsupervised-learning.md rename to tr/cs-229-unsupervised-learning.md diff --git a/tr/convolutional-neural-networks.md b/tr/cs-230-convolutional-neural-networks.md similarity index 100% rename from tr/convolutional-neural-networks.md rename to tr/cs-230-convolutional-neural-networks.md diff --git a/tr/deep-learning-tips-and-tricks.md b/tr/cs-230-deep-learning-tips-and-tricks.md similarity index 100% rename from tr/deep-learning-tips-and-tricks.md rename to 
tr/cs-230-deep-learning-tips-and-tricks.md diff --git a/tr/recurrent-neural-networks.md b/tr/cs-230-recurrent-neural-networks.md similarity index 100% rename from tr/recurrent-neural-networks.md rename to tr/cs-230-recurrent-neural-networks.md diff --git a/uk/refresher-probability.md b/uk/cs-229-probability.md similarity index 100% rename from uk/refresher-probability.md rename to uk/cs-229-probability.md diff --git a/zh-tw/cheatsheet-deep-learning.md b/zh-tw/cs-229-deep-learning.md similarity index 100% rename from zh-tw/cheatsheet-deep-learning.md rename to zh-tw/cs-229-deep-learning.md diff --git a/zh-tw/refresher-linear-algebra.md b/zh-tw/cs-229-linear-algebra.md similarity index 100% rename from zh-tw/refresher-linear-algebra.md rename to zh-tw/cs-229-linear-algebra.md diff --git a/zh-tw/refresher-probability.md b/zh-tw/cs-229-probability.md similarity index 100% rename from zh-tw/refresher-probability.md rename to zh-tw/cs-229-probability.md diff --git a/zh-tw/cheatsheet-supervised-learning.md b/zh-tw/cs-229-supervised-learning.md similarity index 100% rename from zh-tw/cheatsheet-supervised-learning.md rename to zh-tw/cs-229-supervised-learning.md diff --git a/zh-tw/cheatsheet-unsupervised-learning.md b/zh-tw/cs-229-unsupervised-learning.md similarity index 100% rename from zh-tw/cheatsheet-unsupervised-learning.md rename to zh-tw/cs-229-unsupervised-learning.md diff --git a/zh/cheatsheet-supervised-learning.md b/zh/cs-229-supervised-learning.md similarity index 100% rename from zh/cheatsheet-supervised-learning.md rename to zh/cs-229-supervised-learning.md From 6c802aac24a9b61d36aeba482ff0855411b4eb29 Mon Sep 17 00:00:00 2001 From: Afshine Amidi <26204670+afshinea@users.noreply.github.com> Date: Sun, 30 Jun 2019 11:51:35 -0700 Subject: [PATCH 260/531] Update README.md --- README.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/README.md b/README.md index 74756c284..e7b28e777 100644 --- a/README.md +++ b/README.md @@ -35,7 +35,7 @@ Please make sure to propose the translation of **only one** cheatsheet per pull ## Progression ### CS 221 (Artificial Intelligence) -| |[Reflex models](https://github.com/shervinea/cheatsheet-translation/blob/master/template/reflex-models.md)|States models|Variables models|Logic models| +| |[Reflex models](https://github.com/shervinea/cheatsheet-translation/blob/master/template/cs-221-reflex-models.md)|States models|Variables models|Logic models| |:---|:---:|:---:|:---:|:---:| |**Deutsch**|not started|not started|not started|not started| |**Español**|not started|not started|not started|not started| @@ -51,7 +51,7 @@ Please make sure to propose the translation of **only one** cheatsheet per pull |**中文**|not started|not started|not started|not started| ### CS 229 (Machine Learning) -| |[Deep learning](https://github.com/shervinea/cheatsheet-translation/blob/master/template/cheatsheet-deep-learning.md)|[Supervised](https://github.com/shervinea/cheatsheet-translation/blob/master/template/cheatsheet-supervised-learning.md)|[Unsupervised](https://github.com/shervinea/cheatsheet-translation/blob/master/template/cheatsheet-unsupervised-learning.md)|[ML tips](https://github.com/shervinea/cheatsheet-translation/blob/master/template/cheatsheet-machine-learning-tips-and-tricks.md)|[Probabilities](https://github.com/shervinea/cheatsheet-translation/blob/master/template/refresher-probability.md)|[Algebra](https://github.com/shervinea/cheatsheet-translation/blob/master/template/refresher-linear-algebra.md)| +| |[Deep 
learning](https://github.com/shervinea/cheatsheet-translation/blob/master/template/cs-229-deep-learning.md)|[Supervised](https://github.com/shervinea/cheatsheet-translation/blob/master/template/cs-229-supervised-learning.md)|[Unsupervised](https://github.com/shervinea/cheatsheet-translation/blob/master/template/cs-229-unsupervised-learning.md)|[ML tips](https://github.com/shervinea/cheatsheet-translation/blob/master/template/cs-229-machine-learning-tips-and-tricks.md)|[Probabilities](https://github.com/shervinea/cheatsheet-translation/blob/master/template/cs-229-probability.md)|[Algebra](https://github.com/shervinea/cheatsheet-translation/blob/master/template/cs-229-linear-algebra.md)| |:---|:---:|:---:|:---:|:---:|:---:|:---:| |**العَرَبِيَّة**|done|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/87)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/88)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/83)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/89)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/85)| |**Català**|not started|not started|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/47)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/47)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/47)| @@ -76,7 +76,7 @@ Please make sure to propose the translation of **only one** cheatsheet per pull |**中文**|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/12)|done|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/48)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/7)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/73)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/72)| ### CS 230 (Deep Learning) -| |[Convolutional Neural Networks](https://github.com/shervinea/cheatsheet-translation/blob/master/template/convolutional-neural-networks.md)|[Recurrent Neural Networks](https://github.com/shervinea/cheatsheet-translation/blob/master/template/recurrent-neural-networks.md)|[Deep Learning tips](https://github.com/shervinea/cheatsheet-translation/blob/master/template/deep-learning-tips-and-tricks.md)| +| |[Convolutional Neural Networks](https://github.com/shervinea/cheatsheet-translation/blob/master/template/cs-230-convolutional-neural-networks.md)|[Recurrent Neural Networks](https://github.com/shervinea/cheatsheet-translation/blob/master/template/cs-230-recurrent-neural-networks.md)|[Deep Learning tips](https://github.com/shervinea/cheatsheet-translation/blob/master/template/cs-230-deep-learning-tips-and-tricks.md)| |:---|:---:|:---:|:---:| |**العَرَبِيَّة**|not started|not started|not started| |**Català**|not started|not started|not started| From e109d8302c4789eada10ce35b5ec2d052207dd5b Mon Sep 17 00:00:00 2001 From: afshinea Date: Sun, 30 Jun 2019 14:10:50 -0700 Subject: [PATCH 261/531] Add template --- template/cs-221-states-models.md | 980 +++++++++++++++++++++++++++++++ 1 file changed, 980 insertions(+) create mode 100644 template/cs-221-states-models.md diff --git a/template/cs-221-states-models.md b/template/cs-221-states-models.md new file mode 100644 index 000000000..a7ea257dc --- /dev/null +++ b/template/cs-221-states-models.md @@ -0,0 +1,980 @@ +**States-based models translation** + +
+ +**1. States-based models with search optimization and MDP** + +⟶ + +
+ + +**2. Search optimization** + +⟶ + +
+ + +**3. In this section, we assume that by accomplishing action a from state s, we deterministically arrive in state Succ(s,a). The goal here is to determine a sequence of actions (a1,a2,a3,a4,...) that starts from an initial state and leads to an end state. In order to solve this kind of problem, our objective will be to find the minimum cost path by using states-based models.** + +⟶ + +
+ + +**4. Tree search** + +⟶ + +
+ + +**5. This category of states-based algorithms explores all possible states and actions. It is quite memory efficient, and is suitable for huge state spaces but the runtime can become exponential in the worst cases.** + +⟶ + +
+ + +**6. [Self-loop, More than a parent, Cycle, More than a root, Valid tree]** + +⟶ + +
+ + +**7. [Search problem ― A search problem is defined with:, a starting state sstart, possible actions Actions(s) from state s, action cost Cost(s,a) from state s with action a, successor Succ(s,a) of state s after action a, whether an end state was reached IsEnd(s)]** + +⟶ + +
+ + +**8. The objective is to find a path that minimizes the cost.** + +⟶ + +
+ + +**9. Backtracking search ― Backtracking search is a naive recursive algorithm that tries all possibilities to find the minimum cost path. Here, action costs can be either positive or negative.** + +⟶ + +
+ + +**10. Breadth-first search (BFS) ― Breadth-first search is a graph search algorithm that does a level-by-level traversal. We can implement it iteratively with the help of a queue that stores at each step future nodes to be visited. For this algorithm, we can assume action costs to be equal to a constant c⩾0.** + +⟶ + +
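A minimal Python sketch of this queue-based implementation is given below; the callables `is_end` and `succ` (yielding `(action, next_state)` pairs) are assumptions of the sketch, not notation from the cheatsheet.

```python
from collections import deque

def bfs(start, is_end, succ):
    """Breadth-first search: level-by-level traversal with a FIFO queue.
    Returns the actions of a path with the fewest edges, which is also a
    minimum-cost path when every action has the same cost c >= 0."""
    parent = {start: None}              # state -> (previous state, action taken)
    frontier = deque([start])
    while frontier:
        s = frontier.popleft()
        if is_end(s):
            path = []
            while parent[s] is not None:
                prev, a = parent[s]
                path.append(a)
                s = prev
            return list(reversed(path))
        for a, s_next in succ(s):
            if s_next not in parent:    # first visit happens at the shallowest level
                parent[s_next] = (s, a)
                frontier.append(s_next)
    return None                         # no end state is reachable
```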
+ + +**11. Depth-first search (DFS) ― Depth-first search is a search algorithm that traverses a graph by following each path as deep as it can. We can implement it recursively, or iteratively with the help of a stack that stores at each step future nodes to be visited. For this algorithm, action costs are assumed to be equal to 0.** + +⟶ + +
+ + +**12. Iterative deepening ― The iterative deepening trick is a modification of the depth-first search algorithm so that it stops after reaching a certain depth, which guarantees optimality when all action costs are equal. Here, we assume that action costs are equal to a constant c⩾0.** + +⟶ + +
+ + +**13. Tree search algorithms summary ― By noting b the number of actions per state, d the solution depth, and D the maximum depth, we have:** + +⟶ + +
+ + +**14. [Algorithm, Action costs, Space, Time]** + +⟶ + +
+ + +**15. [Backtracking search, any, Breadth-first search, Depth-first search, DFS-Iterative deepening]** + +⟶ + +
+ + +**16. Graph search** + +⟶ + +
+ + +**17. This category of states-based algorithms aims at constructing optimal paths, enabling exponential savings. In this section, we will focus on dynamic programming and uniform cost search.** + +⟶ + +
+ + +**18. Graph ― A graph is comprised of a set of vertices V (also called nodes) as well as a set of edges E (also called links).** + +⟶ + +
+ + +**19. Remark: a graph is said to be acyclic when there is no cycle.** + +⟶ + +<br>
+ + +**20. State ― A state is a summary of all past actions sufficient to choose future actions optimally.** + +⟶ + +
+ + +**21. Dynamic programming ― Dynamic programming (DP) is a backtracking search algorithm with memoization (i.e. partial results are saved) whose goal is to find a minimum cost path from state s to an end state send. It can potentially have exponential savings compared to traditional graph search algorithms, and has the property to only work for acyclic graphs. For any given state s, the future cost is computed as follows:** + +⟶ + +
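A possible memoized implementation of this recurrence is sketched below; the callables `is_end`, `actions`, `cost` and `succ` stand in for the problem definition, and states are assumed hashable.

```python
from functools import lru_cache

def make_future_cost(is_end, actions, cost, succ):
    """Dynamic programming on an acyclic graph: backtracking search with memoization.
    Implements FutureCost(s) = 0 if IsEnd(s), else min over a of Cost(s,a) + FutureCost(Succ(s,a))."""
    @lru_cache(maxsize=None)            # memoization: each state is solved only once
    def future_cost(s):
        if is_end(s):
            return 0.0
        return min(cost(s, a) + future_cost(succ(s, a)) for a in actions(s))
    return future_cost
```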
+ + +**22. [if, otherwise]** + +⟶ + +
+ + +**23. Remark: the figure above illustrates a bottom-to-top approach whereas the formula provides the intuition of a top-to-bottom problem resolution.** + +⟶ + +
+ + +**24. Types of states ― The table below presents the terminology when it comes to states in the context of uniform cost search:** + +⟶ + +
+ + +**25. [State, Explanation]** + +⟶ + +
+ + +**26. [Explored, Frontier, Unexplored]** + +⟶ + +
+ + +**27. [States for which the optimal path has already been found, States seen for which we are still figuring out how to get there with the cheapest cost, States not seen yet]** + +⟶ + +
+ + +**28. Uniform cost search ― Uniform cost search (UCS) is a search algorithm that aims at finding the shortest path from a state sstart to an end state send. It explores states s in increasing order of PastCost(s) and relies on the fact that all action costs are non-negative.** + +⟶ + +
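For illustration, here is a Python sketch using a binary heap as the priority queue; `succ(s)` is assumed to yield `(action, next_state, cost)` triples with non-negative costs.

```python
import heapq, itertools

def uniform_cost_search(start, is_end, succ):
    """Pops states in increasing order of PastCost; once a state moves to the
    explored set, its past cost is optimal (all action costs are non-negative)."""
    counter = itertools.count()          # tie-breaker so states are never compared directly
    frontier = [(0.0, next(counter), start)]
    explored = set()
    while frontier:
        past_cost, _, s = heapq.heappop(frontier)
        if s in explored:
            continue
        explored.add(s)
        if is_end(s):
            return past_cost
        for action, s_next, c in succ(s):
            if s_next not in explored:
                heapq.heappush(frontier, (past_cost + c, next(counter), s_next))
    return None                          # no end state is reachable
```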
+ + +**29. Remark 1: the UCS algorithm is logically equivalent to Dijkstra's algorithm.** + +⟶ + +<br>
+ + +**30. Remark 2: the algorithm would not work for a problem with negative action costs, and adding a positive constant to make them non-negative would not solve the problem since this would end up being a different problem.** + +⟶ + +
+ + +**31. Correctness theorem ― When a state s is popped from the frontier F and moved to explored set E, its priority is equal to PastCost(s) which is the minimum cost path from sstart to s.** + +⟶ + +
+ + +**32. Graph search algorithms summary ― By noting N the number of total states, n of which are explored before the end state send, we have:** + +⟶ + +
+ + +**33. [Algorithm, Acyclicity, Costs, Time/space]** + +⟶ + +
+ + +**34. [Dynamic programming, Uniform cost search]** + +⟶ + +
+ + +**35. Remark: the complexity countdown supposes the number of possible actions per state to be constant.** + +⟶ + +
+ + +**36. Learning costs** + +⟶ + +
+ + +**37. Suppose we are not given the values of Cost(s,a), we want to estimate these quantities from a training set of minimizing-cost-path sequence of actions (a1,a2,...,ak).** + +⟶ + +
+ + +**38. [Structured perceptron ― The structured perceptron is an algorithm aiming at iteratively learning the cost of each state-action pair. At each step, it:, decreases the estimated cost of each state-action of the true minimizing path y given by the training data, increases the estimated cost of each state-action of the current predicted path y' inferred from the learned weights.]** + +⟶ + +
+ + +**39. Remark: there are several versions of the algorithm, one of which simplifies the problem to only learning the cost of each action a, and the other parametrizes Cost(s,a) to a feature vector of learnable weights.** + +⟶ + +
+ + +**40. A* search** + +⟶ + +
+ + +**41. Heuristic function ― A heuristic is a function h over states s, where each h(s) aims at estimating FutureCost(s), the cost of the path from s to send.** + +⟶ + +
+ + +**42. Algorithm ― A∗ is a search algorithm that aims at finding the shortest path from a state s to an end state send. It explores states s in increasing order of PastCost(s)+h(s). It is equivalent to a uniform cost search with edge costs Cost′(s,a) given by:** + +⟶ + +
+ + +**43. Remark: this algorithm can be seen as a biased version of UCS exploring states estimated to be closer to the end state.** + +⟶ + +
+ + +**44. [Consistency ― A heuristic h is said to be consistent if it satisfies the two following properties:, For all states s and actions a, The end state verifies the following:]** + +⟶ + +
+ + +**45. Correctness ― If h is consistent, then A∗ returns the minimum cost path.** + +⟶ + +
+ + +**46. Admissibility ― A heuristic h is said to be admissible if we have:** + +⟶ + +
+ + +**47. Theorem ― Let h(s) be a given heuristic. We have:** + +⟶ + +
+ + +**48. [consistent, admissible]** + +⟶ + +
+ + +**49. Efficiency ― A* explores all states s satisfying the following equation:** + +⟶ + +
+ + +**50. Remark: larger values of h(s) is better as this equation shows it will restrict the set of states s going to be explored.** + +⟶ + +
+ + +**51. Relaxation** + +⟶ + +
+ + +**52. It is a framework for producing consistent heuristics. The idea is to find closed-form reduced costs by removing constraints and use them as heuristics.** + +⟶ + +
+ + +**53. Relaxed search problem ― The relaxation of search problem P with costs Cost is noted Prel with costs Costrel, and satisfies the identity:** + +⟶ + +
+ + +**54. Relaxed heuristic ― Given a relaxed search problem Prel, we define the relaxed heuristic h(s)=FutureCostrel(s) as the minimum cost path from s to an end state in the graph of costs Costrel(s,a).** + +⟶ + +
+ + +**55. Consistency of relaxed heuristics ― Let Prel be a given relaxed problem. By theorem, we have:** + +⟶ + +
+ + +**56. consistent** + +⟶ + +
+ + +**57. [Tradeoff when choosing heuristic ― We have to balance two aspects in choosing a heuristic:, Computational efficiency: h(s)=FutureCostrel(s) must be easy to compute. It has to produce a closed form, easier search and independent subproblems., Good enough approximation: the heuristic h(s) should be close to FutureCost(s) and we have thus to not remove too many constraints.]** + +⟶ + +
+ + +**58. Max heuristic ― Let h1(s), h2(s) be two heuristics. We have the following property:** + +⟶ + +
+ + +**59. Markov decision processes** + +⟶ + +
+ + +**60. In this section, we assume that performing action a from state s can lead to several states s′1,s′2,... in a probabilistic manner. In order to find our way between an initial state and an end state, our objective will be to find the maximum value policy by using Markov decision processes that help us cope with randomness and uncertainty.** + +⟶ + +
+ + +**61. Notations** + +⟶ + +
+ + +**62. [Definition ― The objective of a Markov decision process is to maximize rewards. It is defined with:, a starting state sstart, possible actions Actions(s) from state s, transition probabilities T(s,a,s′) from s to s′ with action a, rewards Reward(s,a,s′) from s to s′ with action a, whether an end state was reached IsEnd(s), a discount factor 0⩽γ⩽1]** + +⟶ + +
+ + +**63. Transition probabilities ― The transition probability T(s,a,s′) specifies the probability of going to state s′ after action a is taken in state s. Each s′↦T(s,a,s′) is a probability distribution, which means that:** + +⟶ + +
+ + +**64. states** + +⟶ + +
+ + +**65. Policy ― A policy π is a function that maps each state s to an action a, i.e.** + +⟶ + +
+ + +**66. Utility ― The utility of a path (s0,...,sk) is the discounted sum of the rewards on that path. In other words,** + +⟶ + +
+ + +**67. The figure above is an illustration of the case k=4.** + +⟶ + +
+ + +**68. Q-value ― The Q-value of a policy π at state s with action a, also noted Qπ(s,a), is the expected utility from state s after taking action a and then following policy π. It is defined as follows:** + +⟶ + +
+ + +**69. Value of a policy ― The value of a policy π from state s, also noted Vπ(s), is the expected utility by following policy π from state s over random paths. It is defined as follows:** + +⟶ + +
+ + +**70. Remark: Vπ(s) is equal to 0 if s is an end state.** + +⟶ + +
+ + +**71. Applications** + +⟶ + +
+ + +**72. [Policy evaluation ― Given a policy π, policy evaluation is an iterative algorithm that aims at estimating Vπ. It is done as follows:, Initialization: for all states s, we have:, Iteration: for t from 1 to TPE, we have, with]** + +⟶ + +
+ + +**73. Remark: by noting S the number of states, A the number of actions per state, S′ the number of successors and T the number of iterations, then the time complexity is of O(TPESS′).** + +⟶ + +
+ + +**74. Optimal Q-value ― The optimal Q-value Qopt(s,a) of state s with action a is defined to be the maximum Q-value attained by any policy starting. It is computed as follows:** + +⟶ + +
+ + +**75. Optimal value ― The optimal value Vopt(s) of state s is defined as being the maximum value attained by any policy. It is computed as follows:** + +⟶ + +
+ + +**76. actions** + +⟶ + +
+ + +**77. Optimal policy ― The optimal policy πopt is defined as being the policy that leads to the optimal values. It is defined by:** + +⟶ + +
+ + +**78. [Value iteration ― Value iteration is an algorithm that finds the optimal value Vopt as well as the optimal policy πopt. It is done as follows:, Initialization: for all states s, we have:, Iteration: for t from 1 to TVI, we have:, with]** + +⟶ + +
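One possible implementation is sketched below; `transitions(s, a)` returning `(next_state, probability)` pairs and `reward(s, a, next_state)` are assumed interfaces rather than the cheatsheet's notation.

```python
def value_iteration(states, actions, transitions, reward, is_end, gamma=0.9, n_iters=100):
    """Iteratively applies Vopt(s) <- max_a sum_s' T(s,a,s') * (Reward(s,a,s') + gamma * Vopt(s'))."""
    def q(V, s, a):
        return sum(p * (reward(s, a, s2) + gamma * V[s2]) for s2, p in transitions(s, a))

    V = {s: 0.0 for s in states}
    for _ in range(n_iters):
        V = {s: (0.0 if is_end(s) else max(q(V, s, a) for a in actions(s))) for s in states}
    # Extract the greedy (optimal) policy from the converged values.
    policy = {s: (None if is_end(s) else max(actions(s), key=lambda a: q(V, s, a))) for s in states}
    return V, policy
```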
+ + +**79. Remark: if we have either γ<1 or the MDP graph being acyclic, then the value iteration algorithm is guaranteed to converge to the correct answer.** + +⟶ + +
+ + +**80. When unknown transitions and rewards** + +⟶ + +
+ + +**81. Now, let's assume that the transition probabilities and the rewards are unknown.** + +⟶ + +
+ + +**82. Model-based Monte Carlo ― The model-based Monte Carlo method aims at estimating T(s,a,s′) and Reward(s,a,s′) using Monte Carlo simulation with: ** + +⟶ + +
+ + +**83. [# times (s,a,s′) occurs, and]** + +⟶ + +
+ + +**84. These estimations will be then used to deduce Q-values, including Qπ and Qopt.** + +⟶ + +
+ + +**85. Remark: model-based Monte Carlo is said to be off-policy, because the estimation does not depend on the exact policy.** + +⟶ + +
+ + +**86. Model-free Monte Carlo ― The model-free Monte Carlo method aims at directly estimating Qπ, as follows:** + +⟶ + +
+ + +**87. Qπ(s,a)=average of ut where st−1=s,at=a** + +⟶ + +
+ + +**88. where ut denotes the utility starting at step t of a given episode.** + +⟶ + +
+ + +**89. Remark: model-free Monte Carlo is said to be on-policy, because the estimated value is dependent on the policy π used to generate the data.** + +⟶ + +
+ + +**90. Equivalent formulation ― By introducing the constant η=1/(1+(#updates to (s,a))) and for each (s,a,u) of the training set, the update rule of model-free Monte Carlo has a convex combination formulation:** + +⟶ + +<br>
+ + +**91. as well as a stochastic gradient formulation:** + +⟶ + +
+ + +**92. SARSA ― State-action-reward-state-action (SARSA) is a bootstrapping method estimating Qπ by using both raw data and estimates as part of the update rule. For each (s,a,r,s′,a′), we have:** + +⟶ + +<br>
+ + +**93. Remark: the SARSA estimate is updated on the fly as opposed to the model-free Monte Carlo one where the estimate can only be updated at the end of the episode.** + +⟶ + +
+ + +**94. Q-learning ― Q-learning is an off-policy algorithm that produces an estimate for Qopt. On each (s,a,r,s′,a′), we have:** + +⟶ + +
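A single tabular update could be sketched as follows, with `Q` a `defaultdict(float)` keyed by `(state, action)`; the learning rate `eta` and the list `next_actions` are assumptions of this sketch.

```python
from collections import defaultdict

def q_learning_update(Q, s, a, r, s_next, next_actions, eta=0.1, gamma=0.9):
    """Applies Q(s,a) <- (1 - eta) * Q(s,a) + eta * (r + gamma * max_a' Q(s',a')).
    next_actions lists the actions available in s_next (empty if s_next is an end state)."""
    v_next = max((Q[(s_next, a2)] for a2 in next_actions), default=0.0)
    Q[(s, a)] = (1 - eta) * Q[(s, a)] + eta * (r + gamma * v_next)

Q = defaultdict(float)   # estimates of Qopt, updated on the fly from observed (s,a,r,s') tuples
```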
+ + +**95. Epsilon-greedy ― The epsilon-greedy policy is an algorithm that balances exploration with probability ϵ and exploitation with probability 1−ϵ. For a given state s, the policy πact is computed as follows:** + +⟶ + +
+ + +**96. [with probability, random from Actions(s)]** + +⟶ + +
+ + +**97. Game playing** + +⟶ + +
+ + +**98. In games (e.g. chess, backgammon, Go), other agents are present and need to be taken into account when constructing our policy.** + +⟶ + +
+ + +**99. Game tree ― A game tree is a tree that describes the possibilities of a game. In particular, each node is a decision point for a player and each root-to-leaf path is a possible outcome of the game.** + +⟶ + +
+ + +**100. [Two-player zero-sum game ― It is a game where each state is fully observed and such that players take turns. It is defined with:, a starting state sstart, possible actions Actions(s) from state s, successors Succ(s,a) from states s with actions a, whether an end state was reached IsEnd(s), the agent's utility Utility(s) at end state s, the player Player(s) who controls state s]** + +⟶ + +
+ + +**101. Remark: we will assume that the utility of the agent has the opposite sign of the one of the opponent.** + +⟶ + +
+ + +**102. [Types of policies ― There are two types of policies:, Deterministic policies, noted πp(s), which are actions that player p takes in state s., Stochastic policies, noted πp(s,a)∈[0,1], which are probabilities that player p takes action a in state s.]** + +⟶ + +
+ + +**103. Expectimax ― For a given state s, the expectimax value Vexptmax(s) is the maximum expected utility of any agent policy when playing with respect to a fixed and known opponent policy πopp. It is computed as follows:** + +⟶ + +
+ + +**104. Remark: expectimax is the analog of value iteration for MDPs.** + +⟶ + +
+ + +**105. Minimax ― The goal of minimax policies is to find an optimal policy against an adversary by assuming the worst case, i.e. that the opponent is doing everything to minimize the agent's utility. It is done as follows:** + +⟶ + +
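The recursion can be written compactly; encoding `player(s)` as `'agent'` (maximizer) or `'opp'` (minimizer) is an assumption of this sketch.

```python
def minimax(s, is_end, utility, player, actions, succ):
    """Minimax value of state s in a two-player zero-sum game tree."""
    if is_end(s):
        return utility(s)
    values = (minimax(succ(s, a), is_end, utility, player, actions, succ) for a in actions(s))
    return max(values) if player(s) == 'agent' else min(values)
```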
+ + +**106. Remark: we can extract πmax and πmin from the minimax value Vminimax.** + +⟶ + +
+ + +**107. Minimax properties ― By noting V the value function, there are 3 properties around minimax to have in mind:** + +⟶ + +
+ + +**108. Property 1: if the agent were to change its policy to any πagent, then the agent would be no better off.** + +⟶ + +
+ + +**109. Property 2: if the opponent changes its policy from πmin to πopp, then he will be no better off.** + +⟶ + +
+ + +**110. Property 3: if the opponent is known to be not playing the adversarial policy, then the minimax policy might not be optimal for the agent.** + +⟶ + +
+ + +**111. In the end, we have the following relationship:** + +⟶ + +
+ + +**112. Speeding up minimax** + +⟶ + +
+ + +**113. Evaluation function ― An evaluation function is a domain-specific and approximate estimate of the value Vminimax(s). It is noted Eval(s).** + +⟶ + +
+ + +**114. Remark: FutureCost(s) is an analogy for search problems.** + +⟶ + +
+ + +**115. Alpha-beta pruning ― Alpha-beta pruning is a domain-general exact method optimizing the minimax algorithm by avoiding the unnecessary exploration of parts of the game tree. To do so, each player keeps track of the best value they can hope for (stored in α for the maximizing player and in β for the minimizing player). At a given step, the condition β<α means that the optimal path is not going to be in the current branch as the earlier player had a better option at their disposal.** + +⟶ + +
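Reusing the interface of the minimax sketch above (with the same `'agent'`/`'opp'` encoding as an assumption), alpha-beta pruning could look like this:

```python
def alphabeta(s, is_end, utility, player, actions, succ,
              alpha=float('-inf'), beta=float('inf')):
    """Minimax value with alpha-beta pruning: same result as plain minimax,
    but branches are skipped as soon as beta <= alpha."""
    if is_end(s):
        return utility(s)
    if player(s) == 'agent':                         # maximizing player
        value = float('-inf')
        for a in actions(s):
            value = max(value, alphabeta(succ(s, a), is_end, utility,
                                         player, actions, succ, alpha, beta))
            alpha = max(alpha, value)
            if beta <= alpha:                        # the other player already had a better option
                break
        return value
    value = float('inf')                             # minimizing player
    for a in actions(s):
        value = min(value, alphabeta(succ(s, a), is_end, utility,
                                     player, actions, succ, alpha, beta))
        beta = min(beta, value)
        if beta <= alpha:
            break
    return value
```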
+ + +**116. TD learning ― Temporal difference (TD) learning is used when we don't know the transitions/rewards. The value is based on exploration policy. To be able to use it, we need to know rules of the game Succ(s,a). For each (s,a,r,s′), the update is done as follows:** + +⟶ + +
+ + +**117. Simultaneous games** + +⟶ + +
+ + +**118. This is the contrary of turn-based games, where there is no ordering on the player's moves.** + +⟶ + +
+ + +**119. Single-move simultaneous game ― Let there be two players A and B, with given possible actions. We note V(a,b) to be A's utility if A chooses action a, B chooses action b. V is called the payoff matrix.** + +⟶ + +
+ + +**120. [Strategies ― There are two main types of strategies:, A pure strategy is a single action:, A mixed strategy is a probability distribution over actions:]** + +⟶ + +
+ + +**121. Game evaluation ― The value of the game V(πA,πB) when player A follows πA and player B follows πB is such that:** + +⟶ + +
+ + +**122. Minimax theorem ― By noting πA,πB ranging over mixed strategies, for every simultaneous two-player zero-sum game with a finite number of actions, we have:** + +⟶ + +
+ + +**123. Non-zero-sum games** + +⟶ + +
+ + +**124. Payoff matrix ― We define Vp(πA,πB) to be the utility for player p.** + +⟶ + +
+ + +**125. Nash equilibrium ― A Nash equilibrium is (π∗A,π∗B) such that no player has an incentive to change its strategy. We have:** + +⟶ + +
+ + +**126. and** + +⟶ + +
+ + +**127. Remark: in any finite-player game with finite number of actions, there exists at least one Nash equilibrium.** + +⟶ + +
+ + +**128. [Tree search, Backtracking search, Breadth-first search, Depth-first search, Iterative deepening]** + +⟶ + +
+ + +**129. [Graph search, Dynamic programming, Uniform cost search]** + +⟶ + +
+ + +**130. [Learning costs, Structured perceptron]** + +⟶ + +
+ + +**131. [A star search, Heuristic function, Algorithm, Consistency, correctness, Admissibility, efficiency]** + +⟶ + +
+ + +**132. [Relaxation, Relaxed search problem, Relaxed heuristic, Max heuristic]** + +⟶ + +
+ + +**133. [Markov decision processes, Overview, Policy evaluation, Value iteration, Transitions, rewards]** + +⟶ + +
+ + +**134. [Game playing, Expectimax, Minimax, Speeding up minimax, Simultaneous games, Non-zero-sum games]** + +⟶ + +
+ + +**135. View PDF version on GitHub** + +⟶ + +
+ + +**136. Original authors** + +⟶ + +
+ + +**137. Translated by X, Y and Z** + +⟶ + +
+ + +**138. Reviewed by X, Y and Z** + +⟶ + +
+ + +**139. By X and Y** + +⟶ + +
+ + +**140. The Artificial Intelligence cheatsheets are now available in [target language].** + +⟶ From 6a2dfaf5c1fa3093686fb358668a17890a8ab87f Mon Sep 17 00:00:00 2001 From: Afshine Amidi <26204670+afshinea@users.noreply.github.com> Date: Sun, 30 Jun 2019 14:12:22 -0700 Subject: [PATCH 262/531] Update README.md --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index e7b28e777..e49852d18 100644 --- a/README.md +++ b/README.md @@ -35,7 +35,7 @@ Please make sure to propose the translation of **only one** cheatsheet per pull ## Progression ### CS 221 (Artificial Intelligence) -| |[Reflex models](https://github.com/shervinea/cheatsheet-translation/blob/master/template/cs-221-reflex-models.md)|States models|Variables models|Logic models| +| |[Reflex models](https://github.com/shervinea/cheatsheet-translation/blob/master/template/cs-221-reflex-models.md)|[States models](https://github.com/shervinea/cheatsheet-translation/blob/master/template/cs-221-states-models.md)|Variables models|Logic models| |:---|:---:|:---:|:---:|:---:| |**Deutsch**|not started|not started|not started|not started| |**Español**|not started|not started|not started|not started| From 379afba83303f658f7fab9b421267136d20c8a65 Mon Sep 17 00:00:00 2001 From: afshinea Date: Sun, 30 Jun 2019 15:53:15 -0700 Subject: [PATCH 263/531] Add template --- template/cs-221-variables-models.md | 617 ++++++++++++++++++++++++++++ 1 file changed, 617 insertions(+) create mode 100644 template/cs-221-variables-models.md diff --git a/template/cs-221-variables-models.md b/template/cs-221-variables-models.md new file mode 100644 index 000000000..5a6e394ce --- /dev/null +++ b/template/cs-221-variables-models.md @@ -0,0 +1,617 @@ +**Variables-based models translation** + +
+ +**1. Variables-based models with CSP and Bayesian networks** + +⟶ + +
+ + +**2. Constraint satisfaction problems** + +⟶ + +
+ + +**3. In this section, our objective is to find maximum weight assignments of variable-based models. One advantage compared to states-based models is that these algorithms are more convenient to encode problem-specific constraints.** + +⟶ + +
+ + +**4. Factor graphs** + +⟶ + +
+ + +**5. Definition ― A factor graph, also referred to as a Markov random field, is a set of variables X=(X1,...,Xn) where Xi∈Domaini and m factors f1,...,fm with each fj(X)⩾0.** + +⟶ + +
+ + +**6. Domain** + +⟶ + +
+ + +**7. Scope and arity ― The scope of a factor fj is the set of variables it depends on. The size of this set is called the arity.** + +⟶ + +
+ + +**8. Remark: factors of arity 1 and 2 are called unary and binary respectively.** + +⟶ + +
+ + +**9. Assignment weight ― Each assignment x=(x1,...,xn) yields a weight Weight(x) defined as being the product of all factors fj applied to that assignment. Its expression is given by:** + +⟶ + +
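In code, the weight of a complete assignment could be computed as below; representing each factor as a `(scope, f)` pair is a choice made for this sketch, not something prescribed by the cheatsheet.

```python
import math

def assignment_weight(x, factors):
    """Weight(x): product of all factors applied to the assignment x.
    x maps variable names to values; each factor is a (scope, f) pair where scope is a
    tuple of variable names and f returns a non-negative number for their values."""
    return math.prod(f(*(x[v] for v in scope)) for scope, f in factors)
```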
+ + +**10. Constraint satisfaction problem ― A constraint satisfaction problem (CSP) is a factor graph where all factors are binary; we call them to be constraints:** + +⟶ + +
+ + +**11. Here, the constraint j with assignment x is said to be satisfied if and only if fj(x)=1.** + +⟶ + +
+ + +**12. Consistent assignment ― An assignment x of a CSP is said to be consistent if and only if Weight(x)=1, i.e. all constraints are satisfied.** + +⟶ + +
+ + +**13. Dynamic ordering** + +⟶ + +
+ + +**14. Dependent factors ― The set of dependent factors of variable Xi with partial assignment x is called D(x,Xi), and denotes the set of factors that link Xi to already assigned variables.** + +⟶ + +
+ + +**15. Backtracking search ― Backtracking search is an algorithm used to find maximum weight assignments of a factor graph. At each step, it chooses an unassigned variable and explores its values by recursion. Dynamic ordering (i.e. choice of variables and values) and lookahead (i.e. early elimination of inconsistent options) can be used to explore the graph more efficiently, although the worst-case runtime stays exponential: O(|Domain|n).** + +⟶ + +
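A bare-bones version, without dynamic ordering or lookahead, might look like this; the helper `weight_of`, which scores a complete assignment (for instance with the `assignment_weight` sketch above), is an assumption.

```python
def backtrack(assignment, variables, domains, weight_of):
    """Naive backtracking search: recursively tries every value of the next unassigned
    variable and keeps the best complete assignment found (None if all weights are 0)."""
    unassigned = [v for v in variables if v not in assignment]
    if not unassigned:
        return dict(assignment), weight_of(assignment)
    var = unassigned[0]                  # dynamic ordering would pick more cleverly here
    best, best_weight = None, 0.0
    for value in domains[var]:
        assignment[var] = value
        candidate, w = backtrack(assignment, variables, domains, weight_of)
        if w > best_weight:
            best, best_weight = candidate, w
        del assignment[var]
    return best, best_weight
```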
+ + +**16. [Forward checking ― It is a one-step lookahead heuristic that preemptively removes inconsistent values from the domains of neighboring variables. It has the following characteristics:, After assigning a variable Xi, it eliminates inconsistent values from the domains of all its neighbors., If any of these domains becomes empty, we stop the local backtracking search., If we un-assign a variable Xi, we have to restore the domain of its neighbors.]** + +⟶ + +
+ + +**17. Most constrained variable ― It is a variable-level ordering heuristic that selects the next unassigned variable that has the fewest consistent values. This has the effect of making inconsistent assignments to fail earlier in the search, which enables more efficient pruning.** + +⟶ + +
+ + +**18. Least constrained value ― It is a value-level ordering heuristic that assigns the next value that yields the highest number of consistent values of neighboring variables. Intuitively, this procedure chooses first the values that are most likely to work.** + +⟶ + +
+ + +**19. Remark: in practice, this heuristic is useful when all factors are constraints.** + +⟶ + +
+ + +**20. The example above is an illustration of the 3-color problem with backtracking search coupled with most constrained variable exploration and least constrained value heuristic, as well as forward checking at each step.** + +⟶ + +
+ + +**21. [Arc consistency ― We say that arc consistency of variable Xl with respect to Xk is enforced when for each xl∈Domainl:, unary factors of Xl are non-zero, there exists at least one xk∈Domaink such that any factor between Xl and Xk is non-zero.]** + +⟶ + +
+ + +**22. AC-3 ― The AC-3 algorithm is a multi-step lookahead heuristic that applies forward checking to all relevant variables. After a given assignment, it performs forward checking and then successively enforces arc consistency with respect to the neighbors of variables for which the domain change during the process.** + +⟶ + +
+ + +**23. Remark: AC-3 can be implemented both iteratively and recursively.** + +⟶ + +
+ + +**24. Approximate methods** + +⟶ + +
+ + +**25. Beam search ― Beam search is an approximate algorithm that extends partial assignments of n variables of branching factor b=|Domain| by exploring the K top paths at each step. The beam size K∈{1,...,bn} controls the tradeoff between efficiency and accuracy. This algorithm has a time complexity of O(n⋅Kblog(Kb)).** + +⟶ + +
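A possible rendering in Python follows; the scoring function `partial_weight` (for instance the product of the factors whose scope is fully assigned) is an assumption of this sketch.

```python
def beam_search(variables, domains, partial_weight, K):
    """Extends partial assignments one variable at a time, keeping only the K
    highest-weight candidates at each step."""
    beam = [({}, 1.0)]
    for var in variables:
        candidates = []
        for assignment, _ in beam:
            for value in domains[var]:
                extended = dict(assignment)
                extended[var] = value
                candidates.append((extended, partial_weight(extended)))
        beam = sorted(candidates, key=lambda c: c[1], reverse=True)[:K]
    return beam    # up to K complete assignments with their weights
```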
+ + +**26. The example below illustrates a possible beam search of parameters K=2, b=3 and n=5.** + +⟶ + +
+ + +**27. Remark: K=1 corresponds to greedy search whereas K→+∞ is equivalent to BFS tree search.** + +⟶ + +
+ + +**28. Iterated conditional modes ― Iterated conditional modes (ICM) is an iterative approximate algorithm that modifies the assignment of a factor graph one variable at a time until convergence. At step i, we assign to Xi the value v that maximizes the product of all factors connected to that variable.** + +⟶ + +
+ + +**29. Remark: ICM may get stuck in local minima.** + +⟶ + +
+ + +**30. [Gibbs sampling ― Gibbs sampling is an iterative approximate method that modifies the assignment of a factor graph one variable at a time until convergence. At step i:, we assign to each element u∈Domaini a weight w(u) that is the product of all factors connected to that variable, we sample v from the probability distribution induced by w and assign it to Xi.]** + +⟶ + +
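One sweep over the variables could be sketched as follows; `weight_given(x, var, value)`, returning the product of the factors touching `var` when it is set to `value`, is an assumed helper.

```python
import random

def gibbs_sweep(x, variables, domains, weight_given):
    """Resamples each variable in turn from the distribution induced by the factors
    connected to it, given the current assignment of all other variables."""
    for var in variables:
        weights = [weight_given(x, var, v) for v in domains[var]]
        if sum(weights) == 0:
            continue                     # keep the current value if every option has weight 0
        x[var] = random.choices(domains[var], weights=weights, k=1)[0]
    return x
```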
+ + +**31. Remark: Gibbs sampling can be seen as the probabilistic counterpart of ICM. It has the advantage of being able to escape local minima in most cases.** + +⟶ + +<br>
+ + +**32. Factor graph transformations** + +⟶ + +
+ + +**33. Independence ― Let A,B be a partitioning of the variables X. We say that A and B are independent if there are no edges between A and B and we write:** + +⟶ + +
+ + +**34. Remark: independence is the key property that allows us to solve subproblems in parallel.** + +⟶ + +
+ + +**35. Conditional independence ― We say that A and B are conditionally independent given C if conditioning on C produces a graph in which A and B are independent. In this case, it is written:** + +⟶ + +
+ + +**36. [Conditioning ― Conditioning is a transformation aiming at making variables independent that breaks up a factor graph into smaller pieces that can be solved in parallel and can use backtracking. In order to condition on a variable Xi=v, we do as follows:, Consider all factors f1,...,fk that depend on Xi, Remove Xi and f1,...,fk, Add gj(x) for j∈{1,...,k} defined as:]** + +⟶ + +
+ + +**37. Markov blanket ― Let A⊆X be a subset of variables. We define MarkovBlanket(A) to be the neighbors of A that are not in A.** + +⟶ + +
+ + +**38. Proposition ― Let C=MarkovBlanket(A) and B=X∖(A∪C). Then we have:** + +⟶ + +
+ + +**39. [Elimination ― Elimination is a factor graph transformation that removes Xi from the graph and solves a small subproblem conditioned on its Markov blanket as follows:, Consider all factors fi,1,...,fi,k that depend on Xi, Remove Xi +and fi,1,...,fi,k, Add fnew,i(x) defined as:]** + +⟶ + +
+ + +**40. Treewidth ― The treewidth of a factor graph is the maximum arity of any factor created by variable elimination with the best variable ordering. In other words,** + +⟶ + +
+ + +**41. The example below illustrates the case of a factor graph of treewidth 3.** + +⟶ + +
+ + +**42. Remark: finding the best variable ordering is an NP-hard problem.** + +⟶ + +<br>
+ + +**43. Bayesian networks** + +⟶ + +
+ + +**44. In this section, our goal will be to compute conditional probabilities. What is the probability of a query given evidence?** + +⟶ + +
+ + +**45. Introduction** + +⟶ + +
+ + +**46. Explaining away ― Suppose causes C1 and C2 influence an effect E. Conditioning on the effect E and on one of the causes (say C1) changes the probability of the other cause (say C2). In this case, we say that C1 has explained away C2.** + +⟶ + +
+ + +**47. Directed acyclic graph ― A directed acyclic graph (DAG) is a finite directed graph with no directed cycles.** + +⟶ + +
+ + +**48. Bayesian network ― A Bayesian network is a directed acyclic graph (DAG) that specifies a joint distribution over random variables X=(X1,...,Xn) as a product of local conditional distributions, one for each node:** + +⟶ + +
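As a toy illustration (with made-up probabilities), a two-node network C→E and its joint distribution written as a product of local conditional distributions:

```python
p_c = {1: 0.3, 0: 0.7}                              # P(C)
p_e_given_c = {1: {1: 0.9, 0: 0.1},                 # P(E | C), outer key is the value of C
               0: {1: 0.2, 0: 0.8}}

def joint(c, e):
    """P(C=c, E=e) as the product of the local conditional distributions."""
    return p_c[c] * p_e_given_c[c][e]

print(joint(1, 1))                                        # 0.3 * 0.9 = 0.27
print(sum(joint(c, e) for c in (0, 1) for e in (0, 1)))   # sums to 1 (up to float rounding)
```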
+ + +**49. Remark: Bayesian networks are factor graphs imbued with the language of probability.** + +⟶ + +
+ + +**50. Locally normalized ― For each xParents(i), all factors are local conditional distributions. Hence they have to satisfy:** + +⟶ + +
+ + +**51. As a result, sub-Bayesian networks and conditional distributions are consistent.** + +⟶ + +
+ + +**52. Remark: local conditional distributions are the true conditional distributions.** + +⟶ + +
+ + +**53. Marginalization ― The marginalization of a leaf node yields a Bayesian network without that node.** + +⟶ + +
+ + +**54. Probabilistic programs** + +⟶ + +
+ + +**55. Concept ― A probabilistic program randomizes variables assignment. That way, we can write down complex Bayesian networks that generate assignments without us having to explicitly specify associated probabilities.** + +⟶ + +
+ + +**56. Remark: examples of probabilistic programs include Hidden Markov model (HMM), factorial HMM, naive Bayes, latent Dirichlet allocation, diseases and symptoms and stochastic block models.** + +⟶ + +
+ + +**57. Summary ― The table below summarizes the common probabilistic programs as well as their applications:** + +⟶ + +
+ + +**58. [Program, Algorithm, Illustration, Example]** + +⟶ + +
+ + +**59. [Markov Model, Hidden Markov Model (HMM), Factorial HMM, Naive Bayes, Latent Dirichlet Allocation (LDA)]** + +⟶ + +
+ + +**60. [Generate, distribution]** + +⟶ + +
+ + +**61. [Language modeling, Object tracking, Multiple object tracking, Document classification, Topic modeling]** + +⟶ + +
+ + +**62. Inference** + +⟶ + +
+ + +**63. [General probabilistic inference strategy ― The strategy to compute the probability P(Q|E=e) of query Q given evidence E=e is as follows:, Step 1: Remove variables that are not ancestors of the query Q or the evidence E by marginalization, Step 2: Convert Bayesian network to factor graph, Step 3: Condition on the evidence E=e, Step 4: Remove nodes disconnected from the query Q by marginalization, Step 5: Run a probabilistic inference algorithm (manual, variable elimination, Gibbs sampling, particle filtering)]** + +⟶ + +
+ + +**64. Forward-backward algorithm ― This algorithm computes the exact value of P(H=hk|E=e) (smoothing query) for any k∈{1,...,L} in the case of an HMM of size L. To do so, we proceed in 3 steps:** + +⟶ + +
+ + +**65. Step 1: for ..., compute ...** + +⟶ + +
+ + +**66. with the convention F0=BL+1=1. From this procedure and these notations, we get that** + +⟶ + +
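A compact NumPy sketch of the forward-backward recursions on a tiny HMM; the start, transition and emission probabilities and the observation sequence below are invented for the example:

```python
import numpy as np

start = np.array([0.6, 0.4])                  # p(h1)
trans = np.array([[0.7, 0.3],                 # p(hi | hi-1)
                  [0.2, 0.8]])
emit = np.array([[0.5, 0.4, 0.1],             # p(ei | hi)
                 [0.1, 0.3, 0.6]])
obs = [0, 2, 1]                               # observed evidence e1..eL

L, S = len(obs), len(start)
F = np.zeros((L, S))                          # forward messages Fk
B = np.zeros((L, S))                          # backward messages Bk
F[0] = start * emit[:, obs[0]]
for k in range(1, L):
    F[k] = (F[k - 1] @ trans) * emit[:, obs[k]]
B[-1] = 1.0                                   # convention on the last backward message
for k in range(L - 2, -1, -1):
    B[k] = trans @ (emit[:, obs[k + 1]] * B[k + 1])

posterior = F * B
posterior /= posterior.sum(axis=1, keepdims=True)
print(posterior)                              # row k gives P(Hk = h | E = e) for each state h
```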
+ + +**67. Remark: this algorithm interprets each assignment to be a path where each edge hi−1→hi is of weight p(hi|hi−1)p(ei|hi).** + +⟶ + +
+ + +**68. [Gibbs sampling ― This algorithm is an iterative approximate method that uses a small set of assignments (particles) to represent a large probability distribution. From a random assignment x, Gibbs sampling performs the following steps for i∈{1,...,n} until convergence:, For all u∈Domaini, compute the weight w(u) of assignment x where Xi=u, Sample v from the probability distribution induced by w: v∼P(Xi=v|X−i=x−i), Set Xi=v]** + +⟶ + +
+ + +**69. Remark: X−i denotes X∖{Xi} and x−i represents the corresponding assignment.** + +⟶ + +
+ + +**70. [Particle filtering ― This algorithm approximates the posterior density of state variables given the evidence of observation variables by keeping track of K particles at a time. Starting from a set of particles C of size K, we run the following 3 steps iteratively:, Step 1: proposal - For each old particle xt−1∈C, sample x from the transition probability distribution p(x|xt−1) and add x to a set C′., Step 2: weighting - Weigh each x of the set C′ by w(x)=p(et|x), where et is the evidence observed at time t., Step 3: resampling - Sample K elements from the set C′ using the probability distribution induced by w and store them in C: these are the current particles xt.]** + +⟶ + +
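A rough particle filter sketch for a made-up 1-D tracking problem; the transition and emission models, the observation sequence and the value of K are all invented for illustration:

```python
import math
import random

K = 200                                            # number of particles

def propose(x):                                    # sample from p(x_t | x_{t-1})
    return x + random.gauss(0.0, 1.0)

def emission_weight(e, x):                         # w(x) = p(e_t | x), a Gaussian likelihood
    return math.exp(-0.5 * (e - x) ** 2)

particles = [0.0] * K
for e_t in [0.5, 1.2, 2.0, 2.4]:                   # evidence observed over time
    proposals = [propose(x) for x in particles]                    # Step 1: proposal
    weights = [emission_weight(e_t, x) for x in proposals]         # Step 2: weighting
    particles = random.choices(proposals, weights=weights, k=K)    # Step 3: resampling
    print(sum(particles) / K)                      # crude posterior mean of the tracked state
```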
+ + +**71. Remark: a more expensive version of this algorithm also keeps track of past particles in the proposal step.** + +⟶ + +
+ + +**72. Maximum likelihood ― If we don't know the local conditional distributions, we can learn them using maximum likelihood.** + +⟶ + +
+ + +**73. Laplace smoothing ― For each distribution d and partial assignment (xParents(i),xi), add λ to countd(xParents(i),xi), then normalize to get probability estimates.** + +⟶ + +
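A tiny numerical example of Laplace smoothing for one local conditional distribution; the counts, domain and λ are made up:

```python
from collections import Counter

counts = Counter({"a": 3, "b": 1})          # observed counts; the value "c" was never seen
domain = ["a", "b", "c"]
lam = 1.0                                   # smoothing parameter λ

total = sum(counts[x] + lam for x in domain)
probs = {x: (counts[x] + lam) / total for x in domain}
print(probs)                                # {'a': 4/7, 'b': 2/7, 'c': 1/7}: no value keeps probability 0
```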
+ + +**74. Algorithm ― The Expectation-Maximization (EM) algorithm gives an efficient method for estimating the parameter θ through maximum likelihood estimation by repeatedly constructing a lower bound on the likelihood (E-step) and optimizing that lower bound (M-step) as follows:** + +⟶ + +<br>
+ + +**75. [E-step: Evaluate the posterior probability q(h) that each data point e came from a particular cluster h as follows:, M-step: Use the posterior probabilities q(h) as cluster specific weights on data points e to determine θ through maximum likelihood.]** + +⟶ + +
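As an illustrative sketch (not part of the cheatsheet), EM for a two-component 1-D Gaussian mixture; the synthetic data, initial parameters and number of iterations are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(0)
e = np.concatenate([rng.normal(-2.0, 1.0, 200), rng.normal(3.0, 1.0, 200)])   # synthetic data points

pi = np.array([0.5, 0.5])                   # mixing weights
mu = np.array([-1.0, 1.0])                  # cluster means
var = np.array([1.0, 1.0])                  # cluster variances

def gaussian(x, m, v):
    return np.exp(-0.5 * (x - m) ** 2 / v) / np.sqrt(2 * np.pi * v)

for _ in range(50):
    # E-step: posterior probability q(h) that each point came from cluster h
    q = pi * gaussian(e[:, None], mu, var)
    q /= q.sum(axis=1, keepdims=True)
    # M-step: use q as cluster-specific weights to re-estimate θ by maximum likelihood
    n_h = q.sum(axis=0)
    pi = n_h / len(e)
    mu = (q * e[:, None]).sum(axis=0) / n_h
    var = (q * (e[:, None] - mu) ** 2).sum(axis=0) / n_h

print(mu)                                   # ends up close to the true cluster means (-2 and 3)
```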
+ + +**76. [Factor graphs, Arity, Assignment weight, Constraint satisfaction problem, Consistent assignment]** + +⟶ + +
+ + +**77. [Dynamic ordering, Dependent factors, Backtracking search, Forward checking, Most constrained variable, Least constrained value]** + +⟶ + +
+ + +**78. [Approximate methods, Beam search, Iterated conditional modes, Gibbs sampling]** + +⟶ + +
+ + +**79. [Factor graph transformations, Conditioning, Elimination]** + +⟶ + +
+ + +**80. [Bayesian networks, Definition, Locally normalized, Marginalization]** + +⟶ + +
+ + +**81. [Probabilistic program, Concept, Summary]** + +⟶ + +
+ + +**82. [Inference, Forward-backward algorithm, Gibbs sampling, Laplace smoothing]** + +⟶ + +
+ + +**83. View PDF version on GitHub** + +⟶ + +
+ + +**84. Original authors** + +⟶ + +
+ + +**85. Translated by X, Y and Z** + +⟶ + +
+ + +**86. Reviewed by X, Y and Z** + +⟶ + +
+ + +**87. By X and Y** + +⟶ + +
+ + +**88. The Artificial Intelligence cheatsheets are now available in [target language].** + +⟶ From 0438592d8c0fa5e9373231b45653ed27540ed3b6 Mon Sep 17 00:00:00 2001 From: Afshine Amidi <26204670+afshinea@users.noreply.github.com> Date: Sun, 30 Jun 2019 15:57:42 -0700 Subject: [PATCH 264/531] Update README.md --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index e49852d18..8120a2eff 100644 --- a/README.md +++ b/README.md @@ -35,7 +35,7 @@ Please make sure to propose the translation of **only one** cheatsheet per pull ## Progression ### CS 221 (Artificial Intelligence) -| |[Reflex models](https://github.com/shervinea/cheatsheet-translation/blob/master/template/cs-221-reflex-models.md)|[States models](https://github.com/shervinea/cheatsheet-translation/blob/master/template/cs-221-states-models.md)|Variables models|Logic models| +| |[Reflex models](https://github.com/shervinea/cheatsheet-translation/blob/master/template/cs-221-reflex-models.md)|[States models](https://github.com/shervinea/cheatsheet-translation/blob/master/template/cs-221-states-models.md)|[Variables models](https://github.com/shervinea/cheatsheet-translation/blob/master/template/cs-221-variables-models.md)|Logic models| |:---|:---:|:---:|:---:|:---:| |**Deutsch**|not started|not started|not started|not started| |**Español**|not started|not started|not started|not started| From 1de58f22ee83ca75fd34ae2233680552be26d01a Mon Sep 17 00:00:00 2001 From: Afshine Amidi <26204670+afshinea@users.noreply.github.com> Date: Sun, 30 Jun 2019 15:58:46 -0700 Subject: [PATCH 265/531] Update README.md --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index 8120a2eff..8e8411745 100644 --- a/README.md +++ b/README.md @@ -51,7 +51,7 @@ Please make sure to propose the translation of **only one** cheatsheet per pull |**中文**|not started|not started|not started|not started| ### CS 229 (Machine Learning) -| |[Deep learning](https://github.com/shervinea/cheatsheet-translation/blob/master/template/cs-229-deep-learning.md)|[Supervised](https://github.com/shervinea/cheatsheet-translation/blob/master/template/cs-229-supervised-learning.md)|[Unsupervised](https://github.com/shervinea/cheatsheet-translation/blob/master/template/cs-229-unsupervised-learning.md)|[ML tips](https://github.com/shervinea/cheatsheet-translation/blob/master/template/cs-229-machine-learning-tips-and-tricks.md)|[Probabilities](https://github.com/shervinea/cheatsheet-translation/blob/master/template/cs-229-probability.md)|[Algebra](https://github.com/shervinea/cheatsheet-translation/blob/master/template/cs-229-linear-algebra.md)| +| |[Deep learning](https://github.com/shervinea/cheatsheet-translation/blob/master/template/cs-229-deep-learning.md)|[Supervised](https://github.com/shervinea/cheatsheet-translation/blob/master/template/cs-229-supervised-learning.md)|[Unsupervised](https://github.com/shervinea/cheatsheet-translation/blob/master/template/cs-229-unsupervised-learning.md)|[Tips](https://github.com/shervinea/cheatsheet-translation/blob/master/template/cs-229-machine-learning-tips-and-tricks.md)|[Probabilities](https://github.com/shervinea/cheatsheet-translation/blob/master/template/cs-229-probability.md)|[Algebra](https://github.com/shervinea/cheatsheet-translation/blob/master/template/cs-229-linear-algebra.md)| |:---|:---:|:---:|:---:|:---:|:---:|:---:| |**العَرَبِيَّة**|done|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/87)|[in 
progress](https://github.com/shervinea/cheatsheet-translation/pull/88)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/83)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/89)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/85)| |**Català**|not started|not started|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/47)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/47)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/47)| From f9c4512a0a80be0de74d879710733f666a609134 Mon Sep 17 00:00:00 2001 From: Afshine Amidi <26204670+afshinea@users.noreply.github.com> Date: Sun, 30 Jun 2019 15:59:14 -0700 Subject: [PATCH 266/531] Update README.md --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index 8e8411745..8120a2eff 100644 --- a/README.md +++ b/README.md @@ -51,7 +51,7 @@ Please make sure to propose the translation of **only one** cheatsheet per pull |**中文**|not started|not started|not started|not started| ### CS 229 (Machine Learning) -| |[Deep learning](https://github.com/shervinea/cheatsheet-translation/blob/master/template/cs-229-deep-learning.md)|[Supervised](https://github.com/shervinea/cheatsheet-translation/blob/master/template/cs-229-supervised-learning.md)|[Unsupervised](https://github.com/shervinea/cheatsheet-translation/blob/master/template/cs-229-unsupervised-learning.md)|[Tips](https://github.com/shervinea/cheatsheet-translation/blob/master/template/cs-229-machine-learning-tips-and-tricks.md)|[Probabilities](https://github.com/shervinea/cheatsheet-translation/blob/master/template/cs-229-probability.md)|[Algebra](https://github.com/shervinea/cheatsheet-translation/blob/master/template/cs-229-linear-algebra.md)| +| |[Deep learning](https://github.com/shervinea/cheatsheet-translation/blob/master/template/cs-229-deep-learning.md)|[Supervised](https://github.com/shervinea/cheatsheet-translation/blob/master/template/cs-229-supervised-learning.md)|[Unsupervised](https://github.com/shervinea/cheatsheet-translation/blob/master/template/cs-229-unsupervised-learning.md)|[ML tips](https://github.com/shervinea/cheatsheet-translation/blob/master/template/cs-229-machine-learning-tips-and-tricks.md)|[Probabilities](https://github.com/shervinea/cheatsheet-translation/blob/master/template/cs-229-probability.md)|[Algebra](https://github.com/shervinea/cheatsheet-translation/blob/master/template/cs-229-linear-algebra.md)| |:---|:---:|:---:|:---:|:---:|:---:|:---:| |**العَرَبِيَّة**|done|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/87)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/88)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/83)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/89)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/85)| |**Català**|not started|not started|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/47)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/47)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/47)| From 89650ec92a79a3dd52bc36c2a5838c0985abbd8a Mon Sep 17 00:00:00 2001 From: Afshine Amidi <26204670+afshinea@users.noreply.github.com> Date: Sun, 30 Jun 2019 16:00:31 -0700 Subject: [PATCH 267/531] Update cs-221-reflex-models.md --- template/cs-221-reflex-models.md | 2 +- 1 file changed, 1 
insertion(+), 1 deletion(-) diff --git a/template/cs-221-reflex-models.md b/template/cs-221-reflex-models.md index a6b338419..c58038fe9 100644 --- a/template/cs-221-reflex-models.md +++ b/template/cs-221-reflex-models.md @@ -2,7 +2,7 @@
-**1. Reflex-based models with Machine Learning** +**1. Reflex-based models with Machine Learning** [[webpage]](https://stanford.edu/~shervine/teaching/cs-221/cheatsheet-reflex-models) ⟶ From e5480a859fc2f85512da1bd60c7aef3cd19303f8 Mon Sep 17 00:00:00 2001 From: Afshine Amidi <26204670+afshinea@users.noreply.github.com> Date: Sun, 30 Jun 2019 16:00:44 -0700 Subject: [PATCH 268/531] Update cs-221-reflex-models.md --- template/cs-221-reflex-models.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/template/cs-221-reflex-models.md b/template/cs-221-reflex-models.md index c58038fe9..f64a380b0 100644 --- a/template/cs-221-reflex-models.md +++ b/template/cs-221-reflex-models.md @@ -1,8 +1,8 @@ -**Reflex-based models translation** +**Reflex-based models translation** [[webpage]](https://stanford.edu/~shervine/teaching/cs-221/cheatsheet-reflex-models)
-**1. Reflex-based models with Machine Learning** [[webpage]](https://stanford.edu/~shervine/teaching/cs-221/cheatsheet-reflex-models) +**1. Reflex-based models with Machine Learning** ⟶ From 9100657982c922d59f0459079e5041ca57d41a01 Mon Sep 17 00:00:00 2001 From: Afshine Amidi <26204670+afshinea@users.noreply.github.com> Date: Sun, 30 Jun 2019 16:01:21 -0700 Subject: [PATCH 269/531] Update cs-221-states-models.md --- template/cs-221-states-models.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/template/cs-221-states-models.md b/template/cs-221-states-models.md index a7ea257dc..a945c8632 100644 --- a/template/cs-221-states-models.md +++ b/template/cs-221-states-models.md @@ -1,4 +1,4 @@ -**States-based models translation** +**States-based models translation** [[webpage]](https://stanford.edu/~shervine/teaching/cs-221/cheatsheet-states-models)
From 4ba5edac84a74e46ce5e63ee272d79de8937e4cd Mon Sep 17 00:00:00 2001 From: Afshine Amidi <26204670+afshinea@users.noreply.github.com> Date: Sun, 30 Jun 2019 16:01:41 -0700 Subject: [PATCH 270/531] Update cs-221-variables-models.md --- template/cs-221-variables-models.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/template/cs-221-variables-models.md b/template/cs-221-variables-models.md index 5a6e394ce..f55ef0270 100644 --- a/template/cs-221-variables-models.md +++ b/template/cs-221-variables-models.md @@ -1,4 +1,4 @@ -**Variables-based models translation** +**Variables-based models translation** [[webpage]](https://stanford.edu/~shervine/teaching/cs-221/cheatsheet-variables-models)
From 3f671c12cedf173b745102618905e619d55486ba Mon Sep 17 00:00:00 2001 From: Afshine Amidi <26204670+afshinea@users.noreply.github.com> Date: Sun, 30 Jun 2019 16:02:29 -0700 Subject: [PATCH 271/531] Update cs-229-deep-learning.md --- template/cs-229-deep-learning.md | 2 ++ 1 file changed, 2 insertions(+) diff --git a/template/cs-229-deep-learning.md b/template/cs-229-deep-learning.md index a5aa3756c..9942428f7 100644 --- a/template/cs-229-deep-learning.md +++ b/template/cs-229-deep-learning.md @@ -1,3 +1,5 @@ +**Deep learning translation** [[webpage]](https://stanford.edu/~shervine/teaching/cs-229/cheatsheet-deep-learning) + **1. Deep Learning cheatsheet** ⟶ From e19d99358f4736f633fffa9db050ba48404cee42 Mon Sep 17 00:00:00 2001 From: Afshine Amidi <26204670+afshinea@users.noreply.github.com> Date: Sun, 30 Jun 2019 16:02:58 -0700 Subject: [PATCH 272/531] Update cs-229-deep-learning.md --- template/cs-229-deep-learning.md | 2 ++ 1 file changed, 2 insertions(+) diff --git a/template/cs-229-deep-learning.md b/template/cs-229-deep-learning.md index 9942428f7..a7770a048 100644 --- a/template/cs-229-deep-learning.md +++ b/template/cs-229-deep-learning.md @@ -1,5 +1,7 @@ **Deep learning translation** [[webpage]](https://stanford.edu/~shervine/teaching/cs-229/cheatsheet-deep-learning) +
+ **1. Deep Learning cheatsheet** ⟶ From 752849ef88fe483d3388956acc596c64a8cd9748 Mon Sep 17 00:00:00 2001 From: Afshine Amidi <26204670+afshinea@users.noreply.github.com> Date: Sun, 30 Jun 2019 16:03:29 -0700 Subject: [PATCH 273/531] Update cs-229-linear-algebra.md --- template/cs-229-linear-algebra.md | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/template/cs-229-linear-algebra.md b/template/cs-229-linear-algebra.md index a6b440d1e..dced85397 100644 --- a/template/cs-229-linear-algebra.md +++ b/template/cs-229-linear-algebra.md @@ -1,3 +1,7 @@ +**Linear Algebra and Calculus translation** [[webpage]](https://stanford.edu/~shervine/teaching/cs-229/refresher-algebra-calculus) + +
+ **1. Linear Algebra and Calculus refresher** ⟶ From f639ac1cd55d56f6ba5a697ea3b5ef9fc17f43ee Mon Sep 17 00:00:00 2001 From: Afshine Amidi <26204670+afshinea@users.noreply.github.com> Date: Sun, 30 Jun 2019 16:04:39 -0700 Subject: [PATCH 274/531] Update cs-229-machine-learning-tips-and-tricks.md --- template/cs-229-machine-learning-tips-and-tricks.md | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/template/cs-229-machine-learning-tips-and-tricks.md b/template/cs-229-machine-learning-tips-and-tricks.md index 9712297b8..edba03259 100644 --- a/template/cs-229-machine-learning-tips-and-tricks.md +++ b/template/cs-229-machine-learning-tips-and-tricks.md @@ -1,3 +1,7 @@ +**Machine Learning tips and tricks translation** [[webpage]](https://stanford.edu/~shervine/teaching/cs-229/cheatsheet-machine-learning-tips-and-tricks) + +
+ **1. Machine Learning tips and tricks cheatsheet** ⟶ From f9ca36747508cc58cbf80fe3f82bb8faf3249a75 Mon Sep 17 00:00:00 2001 From: Afshine Amidi <26204670+afshinea@users.noreply.github.com> Date: Sun, 30 Jun 2019 16:05:18 -0700 Subject: [PATCH 275/531] Update cs-229-supervised-learning.md --- template/cs-229-supervised-learning.md | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/template/cs-229-supervised-learning.md b/template/cs-229-supervised-learning.md index a6b19ea1c..d82685e6e 100644 --- a/template/cs-229-supervised-learning.md +++ b/template/cs-229-supervised-learning.md @@ -1,3 +1,7 @@ +**Supervised Learning translation** [[webpage]](https://stanford.edu/~shervine/teaching/cs-229/cheatsheet-supervised-learning) + +
+ **1. Supervised Learning cheatsheet** ⟶ From a528691ba9471d700779325df2a8532d6f788de5 Mon Sep 17 00:00:00 2001 From: Afshine Amidi <26204670+afshinea@users.noreply.github.com> Date: Sun, 30 Jun 2019 16:05:42 -0700 Subject: [PATCH 276/531] Update cs-229-unsupervised-learning.md --- template/cs-229-unsupervised-learning.md | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/template/cs-229-unsupervised-learning.md b/template/cs-229-unsupervised-learning.md index 6daab3b21..18fafef8c 100644 --- a/template/cs-229-unsupervised-learning.md +++ b/template/cs-229-unsupervised-learning.md @@ -1,3 +1,7 @@ +**Unsupervised Learning translation** [[webpage]](https://stanford.edu/~shervine/teaching/cs-229/cheatsheet-unsupervised-learning) + +
+ **1. Unsupervised Learning cheatsheet** ⟶ From 2abf50b1891e286cea0fd4442b5ad91f6a59ac50 Mon Sep 17 00:00:00 2001 From: Afshine Amidi <26204670+afshinea@users.noreply.github.com> Date: Sun, 30 Jun 2019 16:06:34 -0700 Subject: [PATCH 277/531] Update cs-229-probability.md --- template/cs-229-probability.md | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/template/cs-229-probability.md b/template/cs-229-probability.md index 5c9b34656..b8be13004 100644 --- a/template/cs-229-probability.md +++ b/template/cs-229-probability.md @@ -1,3 +1,7 @@ +**Probabilities and Statistics translation** [[webpage]](https://stanford.edu/~shervine/teaching/cs-229/refresher-probabilities-statistics) + +
+ **1. Probabilities and Statistics refresher** ⟶ From c88988661858c32f8fc52472583f54bd22d64fe3 Mon Sep 17 00:00:00 2001 From: Afshine Amidi <26204670+afshinea@users.noreply.github.com> Date: Sun, 30 Jun 2019 16:07:03 -0700 Subject: [PATCH 278/531] Update cs-230-convolutional-neural-networks.md --- template/cs-230-convolutional-neural-networks.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/template/cs-230-convolutional-neural-networks.md b/template/cs-230-convolutional-neural-networks.md index 1b1283628..94006a675 100644 --- a/template/cs-230-convolutional-neural-networks.md +++ b/template/cs-230-convolutional-neural-networks.md @@ -1,4 +1,4 @@ -**Convolutional Neural Networks translation** +**Convolutional Neural Networks translation** [[webpage]](https://stanford.edu/~shervine/teaching/cs-230/cheatsheet-convolutional-neural-networks)
From bf3f0e874484ad18e2efde3007b12e2fdbdeea1f Mon Sep 17 00:00:00 2001 From: Afshine Amidi <26204670+afshinea@users.noreply.github.com> Date: Sun, 30 Jun 2019 16:07:23 -0700 Subject: [PATCH 279/531] Update cs-230-recurrent-neural-networks.md --- template/cs-230-recurrent-neural-networks.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/template/cs-230-recurrent-neural-networks.md b/template/cs-230-recurrent-neural-networks.md index 191e400a1..bd3c638bc 100644 --- a/template/cs-230-recurrent-neural-networks.md +++ b/template/cs-230-recurrent-neural-networks.md @@ -1,4 +1,4 @@ -**Recurrent Neural Networks translation** +**Recurrent Neural Networks translation** [[webpage]](https://stanford.edu/~shervine/teaching/cs-230/cheatsheet-recurrent-neural-networks)
From 55774e0c351c7995cdadff185386efd5ea1ea32e Mon Sep 17 00:00:00 2001 From: Afshine Amidi <26204670+afshinea@users.noreply.github.com> Date: Sun, 30 Jun 2019 16:08:27 -0700 Subject: [PATCH 280/531] Update cs-230-deep-learning-tips-and-tricks.md --- template/cs-230-deep-learning-tips-and-tricks.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/template/cs-230-deep-learning-tips-and-tricks.md b/template/cs-230-deep-learning-tips-and-tricks.md index 347234ec2..75127ac5d 100644 --- a/template/cs-230-deep-learning-tips-and-tricks.md +++ b/template/cs-230-deep-learning-tips-and-tricks.md @@ -1,4 +1,4 @@ -**Deep Learning Tips and Tricks translation** +**Deep Learning Tips and Tricks translation** [[webpage]](https://stanford.edu/~shervine/teaching/cs-230/cheatsheet-deep-learning-tips-and-tricks)
From 9e1ee8cd59da1c7cd568b32cf13fe0d882bc8e27 Mon Sep 17 00:00:00 2001 From: afshinea Date: Sun, 30 Jun 2019 19:18:35 -0700 Subject: [PATCH 281/531] Update template for AI --- README.md | 4 +- template/cs-221-logic-models.md | 462 ++++++++++++++++++++++++++++++++ 2 files changed, 464 insertions(+), 2 deletions(-) create mode 100644 template/cs-221-logic-models.md diff --git a/README.md b/README.md index 8120a2eff..ccc114c7a 100644 --- a/README.md +++ b/README.md @@ -33,9 +33,9 @@ The translation process of each cheatsheet contains two steps: ### Important note Please make sure to propose the translation of **only one** cheatsheet per pull request -- it simplifies a lot the review process. -## Progression +## Progression ### CS 221 (Artificial Intelligence) -| |[Reflex models](https://github.com/shervinea/cheatsheet-translation/blob/master/template/cs-221-reflex-models.md)|[States models](https://github.com/shervinea/cheatsheet-translation/blob/master/template/cs-221-states-models.md)|[Variables models](https://github.com/shervinea/cheatsheet-translation/blob/master/template/cs-221-variables-models.md)|Logic models| +| |[Reflex models](https://github.com/shervinea/cheatsheet-translation/blob/master/template/cs-221-reflex-models.md)|[States models](https://github.com/shervinea/cheatsheet-translation/blob/master/template/cs-221-states-models.md)|[Variables models](https://github.com/shervinea/cheatsheet-translation/blob/master/template/cs-221-variables-models.md)|[Logic models](https://github.com/shervinea/cheatsheet-translation/blob/master/template/cs-221-logic-models.md)| |:---|:---:|:---:|:---:|:---:| |**Deutsch**|not started|not started|not started|not started| |**Español**|not started|not started|not started|not started| diff --git a/template/cs-221-logic-models.md b/template/cs-221-logic-models.md new file mode 100644 index 000000000..844191727 --- /dev/null +++ b/template/cs-221-logic-models.md @@ -0,0 +1,462 @@ +**Logic-based models translation** [[webpage]](https://stanford.edu/~shervine/teaching/cs-221/cheatsheet-logic-models) + +
+ +**1. Logic-based models with propositional and first-order logic ** + +⟶ + +
+ + +**2. Basics** + +⟶ + +
+ + +**3. Syntax of propositional logic ― By noting f,g formulas, and ¬,∧,∨,→,↔ connectives, we can write the following logical expressions:** + +⟶ + +
+ + +**4. [Name, Symbol, Meaning, Illustration]** + +⟶ + +
+ + +**5. [Affirmation, Negation, Conjunction, Disjunction, Implication, Biconditional]** + +⟶ + +
+ + +**6. [not f, f and g, f or g, if f then g, f, that is to say g]** + +⟶ + +
+ + +**7. Remark: formulas can be built up recursively out of these connectives.** + +⟶ + +
+ + +**8. Model ― A model w denotes an assignment of binary weights to propositional symbols.** + +⟶ + +
+ + +**9. Example: the set of truth values w={A:0,B:1,C:0} is one possible model to the propositional symbols A, B and C.** + +⟶ + +
+ + +**10. Interpretation function ― The interpretation function I(f,w) outputs whether model w satisfies formula f:** + +⟶ + +
+ + +**11. Set of models ― M(f) denotes the set of models w that satisfy formula f. Mathematically speaking, we define it as follows:** + +⟶ + +
+ + +**12. Knowledge base** + +⟶ + +
+ + +**13. Definition ― The knowledge base KB is the conjunction of all formulas that have been considered so far. The set of models of the knowledge base is the intersection of the set of models that satisfy each formula. In other words:** + +⟶ + +
+ + +**14. Probabilistic interpretation ― The probability that query f is evaluated to 1 can be seen as the proportion of models w of the knowledge base KB that satisfy f, i.e.:** + +⟶ + +
+ + +**15. Satisfiability ― The knowledge base KB is said to be satisfiable if at least one model w satisfies all its constraints. In other words:** + +⟶ + +
+ + +**16. satisfiable** + +⟶ + +
+ + +**17. Remark: M(KB) denotes the set of models compatible with all the constraints of the knowledge base.** + +⟶ + +
+ + +**18. Relation between formulas and knowledge base - We define the following properties between the knowledge base KB and a new formula f:** + +⟶ + +
+ + +**19. [Name, Mathematical formulation, Illustration, Notes]** + +⟶ + +
+ + +**20. [KB entails f, KB contradicts f, f contingent to KB]** + +⟶ + +
+ + +**21. [f does not bring any new information, Also written KB⊨f, No model satisfies the constraints after adding f, Equivalent to KB⊨¬f, f does not contradict KB, f adds a non-trivial amount of information to KB]** + +⟶ + +
+ + +**22. Model checking ― A model checking algorithm takes as input a knowledge base KB and outputs whether it is satisfiable or not.** + +⟶ + +
+ + +**23. Remark: popular model checking algorithms include DPLL and WalkSat.** + +⟶ + +
+ + +**24. Inference rule ― An inference rule of premises f1,...,fk and conclusion g is written:** + +⟶ + +
+ + +**25. Forward inference algorithm ― From a set of inference rules Rules, this algorithm goes through all possible f1,...,fk and adds g to the knowledge base KB if a matching rule exists. This process is repeated until no more additions can be made to KB.** + +⟶ + +
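A minimal sketch of forward inference with modus ponens on definite clauses; the rules and starting facts are a hypothetical example:

```python
rules = [                                   # (premises, conclusion) definite clauses
    ({"rain", "weekend"}, "traffic"),
    ({"traffic"}, "late"),
    ({"rain"}, "wet"),
]
kb = {"rain", "weekend"}                    # formulas already in the knowledge base

changed = True
while changed:                              # repeat until no more additions can be made to KB
    changed = False
    for premises, conclusion in rules:
        if premises <= kb and conclusion not in kb:   # matching rule: apply modus ponens
            kb.add(conclusion)
            changed = True

print(kb)                                   # {'rain', 'weekend', 'traffic', 'wet', 'late'}: the symbols KB derives
```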
+ + +**26. Derivation ― We say that KB derives f (written KB⊢f) with rules Rules if f already is in KB or gets added during the forward inference algorithm using the set of rules Rules.** + +⟶ + +
+ + +**27. Properties of inference rules ― A set of inference rules Rules can have the following properties:** + +⟶ + +
+ + +**28. [Name, Mathematical formulation, Notes]** + +⟶ + +
+ + +**29. [Soundness, Completeness]** + +⟶ + +
+ + +**30. [Inferred formulas are entailed by KB, Can be checked one rule at a time, "Nothing but the truth", Formulas entailing KB are either already in the knowledge base or inferred from it, "The whole truth"]** + +⟶ + +
+ + +**31. Propositional logic** + +⟶ + +
+ + +**32. In this section, we will go through logic-based models that use logical formulas and inference rules. The idea here is to balance expressivity and computational efficiency.** + +⟶ + +
+ + +**33. Horn clause ― By noting p1,...,pk and q propositional symbols, a Horn clause has the form:** + +⟶ + +
+ + +**34. Remark: when q=false, it is called a "goal clause", otherwise we denote it as a "definite clause".** + +⟶ + +
+ + +**35. Modus ponens ― For propositional symbols f1,...,fk and p, the modus ponens rule is written:** + +⟶ + +
+ + +**36. Remark: it takes linear time to apply this rule, as each application generates a clause that contains a single propositional symbol.** + +⟶ + +<br>
+ + +**37. Completeness ― Modus ponens is complete with respect to Horn clauses if we suppose that KB contains only Horn clauses and p is an entailed propositional symbol. Applying modus ponens will then derive p.** + +⟶ + +
+ + +**38. Conjunctive normal form ― A conjunctive normal form (CNF) formula is a conjunction of clauses, where each clause is a disjunction of atomic formulas.** + +⟶ + +
+ + +**39. Remark: in other words, CNFs are ∧ of ∨.** + +⟶ + +
+ + +**40. Equivalent representation ― Every formula in propositional logic can be written into an equivalent CNF formula. The table below presents general conversion properties:** + +⟶ + +
+ + +**41. [Rule name, Initial, Converted, Eliminate, Distribute, over]** + +⟶ + +
+ + +**42. Resolution rule ― For propositional symbols f1,...,fn, and g1,...,gm as well as p, the resolution rule is written:** + +⟶ + +
+ + +**43. Remark: it can take exponential time to apply this rule, as each application generates a clause that has a subset of the propositional symbols.** + +⟶ + +
+ + +**44. [Resolution-based inference ― The resolution-based inference algorithm follows the following steps:, Step 1: Convert all formulas into CNF, Step 2: Repeatedly apply resolution rule, Step 3: Return unsatisfiable if and only if False, is derived]** + +⟶ + +
+ + +**45. First-order logic** + +⟶ + +
+ + +**46. The idea here is to use variables to yield more compact knowledge representations.** + +⟶ + +
+ + +**47. [Model ― A model w in first-order logic maps:, constant symbols to objects, predicate symbols to tuple of objects]** + +⟶ + +
+ + +**48. Horn clause ― By noting x1,...,xn variables and a1,...,ak,b atomic formulas, the first-order logic version of a horn clause has the form:** + +⟶ + +
+ + +**49. Substitution ― A substitution θ maps variables to terms and Subst[θ,f] denotes the result of substitution θ on f.** + +⟶ + +
+ + +**50. Unification ― Unification takes two formulas f and g and returns the most general substitution θ that makes them equal:** + +⟶ + +
+ + +**51. such that** + +⟶ + +
+ + +**52. Note: Unify[f,g] returns Fail if no such θ exists.** + +⟶ + +
+ + +**53. Modus ponens ― By noting x1,...,xn variables, a1,...,ak and a′1,...,a′k atomic formulas and by calling θ=Unify(a′1∧...∧a′k,a1∧...∧ak) the first-order logic version of modus ponens can be written:** + +⟶ + +
+ + +**54. Completeness ― Modus ponens is complete for first-order logic with only Horn clauses.** + +⟶ + +
+ + +**55. Resolution rule ― By noting f1,...,fn, g1,...,gm, p, q formulas and by calling θ=Unify(p,q), the first-order logic version of the resolution rule can be written:** + +⟶ + +
+ + +**56. [Semi-decidability ― First-order logic, even restricted to only Horn clauses, is semi-decidable., if KB⊨f, forward inference on complete inference rules will prove f in finite time, if KB⊭f, no algorithm can show this in finite time]** + +⟶ + +
+ + +**57. [Basics, Notations, Model, Interpretation function, Set of models]** + +⟶ + +
+ + +**58. [Knowledge base, Definition, Probabilistic interpretation, Satisfiability, Relationship with formulas, Forward inference, Rule properties]** + +⟶ + +
+ + +**59. [Propositional logic, Clauses, Modus ponens, Conjunctive normal form, Representation equivalence, Resolution]** + +⟶ + +
+ + +**60. [First-order logic, Substitution, Unification, Resolution rule, Modus ponens, Resolution, Semi-decidability]** + +⟶ + +
+ + +**61. View PDF version on GitHub** + +⟶ + +
+ + +**62. Original authors** + +⟶ + +
+ + +**63. Translated by X, Y and Z** + +⟶ + +
+ + +**64. Reviewed by X, Y and Z** + +⟶ + +
+ + +**65. By X and Y** + +⟶ + +
+ + +**66. The Artificial Intelligence cheatsheets are now available in [target language].** + +⟶ From cc755c8ed0c166452eca110fac1362cb016a3b87 Mon Sep 17 00:00:00 2001 From: afshinea Date: Sun, 30 Jun 2019 19:19:51 -0700 Subject: [PATCH 282/531] Update template for AI --- template/cs-221-logic-models.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/template/cs-221-logic-models.md b/template/cs-221-logic-models.md index 844191727..8be03acc4 100644 --- a/template/cs-221-logic-models.md +++ b/template/cs-221-logic-models.md @@ -2,7 +2,7 @@
-**1. Logic-based models with propositional and first-order logic ** +**1. Logic-based models with propositional and first-order logic** ⟶ From 430851cceb478c475d138a895e0d1717bd2133cf Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?=E4=B8=AD=E4=BA=95=E5=96=9C=E4=B9=8B?= Date: Thu, 4 Jul 2019 07:47:38 +0900 Subject: [PATCH 283/531] Reviewed and edited 27. to 65. --- ja/cs-230-deep-learning-tips-and-tricks.md | 46 +++++++++++----------- 1 file changed, 23 insertions(+), 23 deletions(-) diff --git a/ja/cs-230-deep-learning-tips-and-tricks.md b/ja/cs-230-deep-learning-tips-and-tricks.md index e465698e6..a7de15349 100644 --- a/ja/cs-230-deep-learning-tips-and-tricks.md +++ b/ja/cs-230-deep-learning-tips-and-tricks.md @@ -200,14 +200,14 @@ **29. [Step 1: Take a batch of training data and perform forward propagation to compute the loss, Step 2: Backpropagate the loss to get the gradient of the loss with respect to each weight, Step 3: Use the gradients to update the weights of the network.]** -⟶ステップ1:訓練データのバッチでフォワードプロパゲーションで損失を求めます。ステップ2:逆伝播法を用いてそれぞれの重みに対する損失の勾配を求めます。ステップ3:求めた勾配を用いてネットワークの重みを更新します。 +⟶ステップ1:訓練データのバッチを用いて順伝播で損失を計算します。ステップ2:損失を逆伝播させて各重みに関する損失の勾配を求めます。ステップ3:求めた勾配を用いてネットワークの重みを更新します。
**30. [Forward propagation, Backpropagation, Weights update]** -⟶伝播法、逆伝播法、重みの更新 +⟶順伝播、逆伝播、重みの更新
@@ -242,21 +242,21 @@ **35. [Training size, Illustration, Explanation]** -⟶トレーニングサイズ、イラストレーション、解説 +⟶学習サイズ、図、解説
**36. [Small, Medium, Large]** -⟶スモール、ミディアム、ラージ +⟶小、中、大
**37. [Freezes all layers, trains weights on softmax, Freezes most layers, trains weights on last layers and softmax, Trains weights on layers and softmax by initializing weights on pre-trained ones]** -⟶全層を凍結、softmaxで重みを学習、ほぼ全部の層を凍結、最終層とsoftmaxで学習、学習済みの重みで初期化することで層とsoftmaxで学習 +⟶全層を凍結し、softmaxの重みを学習させる、大半の層を凍結し、最終層とsoftmaxの重みを学習させる、学習済みの重みで初期化して各層とsoftmaxの重みを学習させる
@@ -278,7 +278,7 @@ **40. Adaptive learning rates ― Letting the learning rate vary when training a model can reduce the training time and improve the numerical optimal solution. While Adam optimizer is the most commonly used technique, others can also be useful. They are summed up in the table below:** -⟶適応学習率法 - モデルを学習させる際に学習率を変動させることで、学習時間の短縮や精度の向上につながります。Adamがもっとも一般的に使用されている手法ではあるが、他の手法も役立つことがあります。それらの手法を下記の表にまとめました。 +⟶適応学習率法 - モデルを学習させる際に学習率を変動させると、学習時間の短縮や精度の向上につながります。Adamがもっとも一般的に使用されている手法ですが、他の手法も役立つことがあります。それらの手法を下記の表にまとめました。
@@ -292,21 +292,21 @@ **42. [Momentum, Dampens oscillations, Improvement to SGD, 2 parameters to tune]** -⟶運動量、振動の減少、SGDの改良、チューニングするパラメータが2つある +⟶Momentum(運動量)、振動を抑制する、SGDの改良、チューニングするパラメータは2つ
**43. [RMSprop, Root Mean Square propagation, Speeds up learning algorithm by controlling oscillations]** -⟶RMSprop, 二条平均平方根のプロパゲーション、振動をコントロールすることで学習アルゴリズムを高速化する +⟶RMSprop, 二乗平均平方根のプロパゲーション、振動をコントロールすることで学習アルゴリズムを高速化する
**44. [Adam, Adaptive Moment estimation, Most popular method, 4 parameters to tune]** -⟶Adam, Adaptive Moment estimation, もっとも人気のある手法、チューニングするパラメータが4つある +⟶Adam, Adaptive Moment estimation, もっとも人気のある手法、チューニングするパラメータは4つ
@@ -320,14 +320,14 @@ **46. Regularization** -⟶正規化 +⟶正則化
**47. Dropout ― Dropout is a technique used in neural networks to prevent overfitting the training data by dropping out neurons with probability p>0. It forces the model to avoid relying too much on particular sets of features.** -⟶ドロップアウト - ドロップアウトとは、ニューラルネットワークで過学習を避けるために p>0の確率でノードをドロップアウト(無効化に)します。モデルを特定の特徴量に依存しすぎることを強制的に避けさせます。 +⟶ドロップアウト - ドロップアウトとは、ニューラルネットワークで過学習を避けるためにp>0の確率でノードをドロップアウト(無効化)する手法です。モデルが特定の特徴量に依存しすぎることを避けるよう強制します。
@@ -341,7 +341,7 @@ **49. Weight regularization ― In order to make sure that the weights are not too large and that the model is not overfitting the training set, regularization techniques are usually performed on the model weights. The main ones are summed up in the table below:** -⟶重みの最適化 - 重みが大きくなりすぎず、モデルが過学習しないために、モデルの重みに対して正規化を行います。主な正規化手法は以下でまとまっています。 +⟶重みの正則化 - 重みが大きくなりすぎず、モデルが過学習しないようにするため、モデルの重みに対して正則化を行います。主な正則化手法は以下の表にまとめられています。
@@ -354,13 +354,13 @@ **50 bis. Shrinks coefficients to 0, Good for variable selection, Makes coefficients smaller, Tradeoff between variable selection and small coefficients]** -⟶bis. 係数を0へ小さくする、変数選択に良い、係数を小さくする、変数選択と小さい係数のトレードオフ +⟶bis. 係数を0へ小さくする、変数選択に適している、係数を小さくする、変数選択と小さい係数のトレードオフ
**51. Early stopping ― This regularization technique stops the training process as soon as the validation loss reaches a plateau or starts to increase.** -⟶Early stopping - バリデーションの損失が収束するか、あるいは増加し始めたときに学習を早々に止める正規方法 +⟶Early stopping - バリデーションの損失が変化しなくなるか、あるいは増加し始めたときに学習を早々に止める正則化方法
@@ -381,42 +381,42 @@ **54. Overfitting small batch ― When debugging a model, it is often useful to make quick tests to see if there is any major issue with the architecture of the model itself. In particular, in order to make sure that the model can be properly trained, a mini-batch is passed inside the network to see if it can overfit on it. If it cannot, it means that the model is either too complex or not complex enough to even overfit on a small batch, let alone a normal-sized training set.** -⟶小さいバッチの過学習 - モデルをデバッグするときに、モデルのアーキテクチャを検証するために小さいテストを作ることが役立つことが多いです。特に、モデルを正しく学習できるのを確認するために、ミニバッチでネットワークを学習し、過学習が発生するかどうかチェックすることがあります。モデルが複雑すぎるか、単純すぎると、普通のトレーニングセットどころか、小さいバッチでさえ過学習できないのです。 +⟶小さいバッチの過学習 - モデルをデバッグするとき、モデル自体の構造に大きな問題がないか確認するため簡易的なテストが役に立つことが多いです。特に、モデルを正しく学習できることを確認するため、ミニバッチをネットワークに渡してそれを過学習できるかを見ます。もしできなければ、モデルは複雑すぎるか単純すぎるかのいずれかであることを意味し、普通サイズの学習データセットはもちろん、小さいバッチですら過学習できないのです。
**55. Gradient checking ― Gradient checking is a method used during the implementation of the backward pass of a neural network. It compares the value of the analytical gradient to the numerical gradient at given points and plays the role of a sanity-check for correctness.** -⟶Gradient checking (勾配チェック) - Gradient checking とは、ニューラルネットワークで逆伝播法時に用いられる手法です。特定の点で数値計算で計算した勾配と逆伝播法時に計算した勾配を比較する手法で、逆伝播法の実装が正しいことなど確認できます。 +⟶Gradient checking (勾配チェック) - Gradient checking とは、ニューラルネットワークの逆伝播を実装する際に用いられる手法です。特定の点で解析的勾配と数値的勾配とを比較する手法で、逆伝播の実装が正しいことを確認できます。
**56. [Type, Numerical gradient, Analytical gradient]** -⟶種類、数値勾配、勾配の理論値 +⟶種類、数値的勾配、解析的勾配
**57. [Formula, Comments]** -⟶数式、コメント +⟶公式、コメント
**58. [Expensive; loss has to be computed two times per dimension, Used to verify correctness of analytical implementation, Trade-off in choosing h not too small (numerical instability) nor too large (poor gradient approximation)]** -⟶計算量が多い;損失を次元ごとに2回計算する必要がある、勾配の実装のチェックに用いられる、hが小さすぎると数値的不安定だが、大きすぎると近似が正確でなくなるというトレードオフががある +⟶計算コストが高い;損失を次元ごとに2回計算する必要がある、解析的実装が正しいかのチェックに用いられる、hを選ぶ時に小さすぎると数値不安定になり、大きすぎると勾配近似が不正確になるというトレードオフがある
**59. ['Exact' result, Direct computation, Used in the final implementation]** -⟶エグザクトの勾配、直接計算する、最終的な実装で使われる +⟶「正しい」結果、直接的な計算、最終的な実装で使われる
@@ -434,13 +434,13 @@ **62.Translated by X, Y and Z** -⟶X,Y,そしてZにより翻訳されました。 +⟶X・Y・Z 訳
**63.Reviewed by X, Y and Z** -⟶X,Y,そしてZにより校正されました。 +⟶X・Y・Z 校正
@@ -452,6 +452,6 @@ **65.By X and Y** -⟶XそしてYによる。 +⟶X・Y 著
From 3cfde5a1d6bced3dc993c6655580efd478132c8a Mon Sep 17 00:00:00 2001 From: Yuta Kanzawa Date: Mon, 15 Jul 2019 11:44:11 +0900 Subject: [PATCH 284/531] Update cheatsheet-supervised-learning.md WIP. Fourth commit No 69-95; Page 4 Corrected translation of 'training examples' --- ja/cheatsheet-supervised-learning.md | 58 ++++++++++++++-------------- 1 file changed, 29 insertions(+), 29 deletions(-) diff --git a/ja/cheatsheet-supervised-learning.md b/ja/cheatsheet-supervised-learning.md index 09734db5e..ce9f85804 100644 --- a/ja/cheatsheet-supervised-learning.md +++ b/ja/cheatsheet-supervised-learning.md @@ -96,7 +96,7 @@ **17. Remark: Stochastic gradient descent (SGD) is updating the parameter based on each training example, and batch gradient descent is on a batch of training examples.** -⟶備考:確率的勾配降下法(SGD)は学習データ全体を用いてパラメータを更新し、バッチ勾配降下法は学習データの各バッチ毎に更新する。 +⟶備考:確率的勾配降下法(SGD)は学習標本全体を用いてパラメータを更新し、バッチ勾配降下法は学習標本の各バッチ毎に更新する。
@@ -156,7 +156,7 @@ **27. LWR ― Locally Weighted Regression, also known as LWR, is a variant of linear regression that weights each training example in its cost function by w(i)(x), which is defined with parameter τ∈R as:** -⟶局所重み付き回帰 ― 局所重み付き回帰は、LWRとも呼ばれ、線形回帰の派生形である。パラメータをτ∈Rとして次のように定義されるw(i)(x)により、個々の学習サンプルをそのコスト関数において重み付けする: +⟶局所重み付き回帰 ― 局所重み付き回帰は、LWRとも呼ばれ、線形回帰の派生形である。パラメータをτ∈Rとして次のように定義されるw(i)(x)により、個々の学習標本をそのコスト関数において重み付けする:
@@ -408,127 +408,127 @@ **69. [Adaptive boosting, Gradient boosting]** -⟶ +⟶[適応的ブースティング, 勾配ブースティング]
**70. High weights are put on errors to improve at the next boosting step** -⟶ +⟶次のブースティングステップにて改善すべき誤分類に大きい重みが課される。
**71. Weak learners trained on remaining errors** -⟶ +⟶残っている誤分類を弱い学習器が学習する。
**72. Other non-parametric approaches** -⟶ +⟶他のノン・パラメトリックな手法
**73. k-nearest neighbors ― The k-nearest neighbors algorithm, commonly known as k-NN, is a non-parametric approach where the response of a data point is determined by the nature of its k neighbors from the training set. It can be used in both classification and regression settings.** -⟶ +⟶k近傍法 ― k近傍法は、一般的にk-NNとして知られ、あるデータ点の応答はそのk個の最近傍点の性質によって決まるノン・パラメトリックな手法である。分類と回帰の両方に用いることができる。
**74. Remark: The higher the parameter k, the higher the bias, and the lower the parameter k, the higher the variance.** -⟶ +⟶備考:パラメータkが大きくなるほど、バイアスが大きくなる。パラメータkが小さくなるほど、分散が大きくなる。
**75. Learning Theory** -⟶ +⟶学習理論
**76. Union bound ― Let A1,...,Ak be k events. We have:** -⟶ +⟶和集合上界 ― A1,...,Akというk個の事象があるとき、次が成り立つ:
**77. Hoeffding inequality ― Let Z1,..,Zm be m iid variables drawn from a Bernoulli distribution of parameter ϕ. Let ˆϕ be their sample mean and γ>0 fixed. We have:** -⟶ +⟶ヘフディング不等式 ― パラメータϕのベルヌーイ分布から得られるm個の独立同分布変数をZ1,..,Zmとする。その標本平均をˆϕとし、γは正の定数であるとすると、次が成り立つ:
**78. Remark: this inequality is also known as the Chernoff bound.** -⟶ +⟶備考:この不等式はチェルノフ上界としても知られる。
**79. Training error ― For a given classifier h, we define the training error ˆϵ(h), also known as the empirical risk or empirical error, to be as follows:** -⟶ +⟶学習誤差 ― ある分類器hに対して、学習誤差、あるいは経験損失か経験誤差としても知られる、ˆϵ(h)を次のように定義する:
**80. Probably Approximately Correct (PAC) ― PAC is a framework under which numerous results on learning theory were proved, and has the following set of assumptions: ** -⟶ +⟶確率的に近似的に正しい (PAC) ― PACとは、その下で学習理論に関する様々な業績が証明されてきたフレームワークであり、次の前提がある:
**81: the training and testing sets follow the same distribution ** -⟶ +⟶学習データと検証データは同じ分布に従う。
**82. the training examples are drawn independently** -⟶ +⟶学習標本は独立に取得される。
**83. Shattering ― Given a set S={x(1),...,x(d)}, and a set of classifiers H, we say that H shatters S if for any set of labels {y(1),...,y(d)}, we have:** -⟶ +⟶細分化 ― 集合S={x(1),...,x(d)}と分類器の集合Hがあるとき、もし任意のラベル{y(1),...,y(d)}の集合に対して次が成り立つとき、HはSを細分化する:
**84. Upper bound theorem ― Let H be a finite hypothesis class such that |H|=k and let δ and the sample size m be fixed. Then, with probability of at least 1−δ, we have:** -⟶ +⟶上界定理 ― Hを|H|=kで有限の仮説集合とし、δとサンプルサイズmは定数とする。そのとき、少なくとも1-δ の確率で次が成り立つ:
**85. VC dimension ― The Vapnik-Chervonenkis (VC) dimension of a given infinite hypothesis class H, noted VC(H) is the size of the largest set that is shattered by H.** -⟶ +⟶VC次元 ― ある仮説集合Hのヴァプニク・チェルヴォーネンキス次元 (VC)は、VC(H)と表記され、それはHによって細分化される最大の集合のサイズである。
**86. Remark: the VC dimension of H={set of linear classifiers in 2 dimensions} is 3.** -⟶ +⟶備考:2次元の線形分類器の集合であるHのVC次元は3である。
**87. Theorem (Vapnik) ― Let H be given, with VC(H)=d and m the number of training examples. With probability at least 1−δ, we have:** -⟶ +⟶定理(ヴァプニク) ― あるHについてVC(H)=dであり、mを学習標本の数とする。少なくとも1−δの確率で次が成り立つ:
**88. [Introduction, Type of prediction, Type of model]** -⟶ +⟶[導入, 予測の種類, モデルの種類]
**89. [Notations and general concepts, loss function, gradient descent, likelihood]** -⟶ +⟶[記法と全般的な概念, 損失関数, 勾配降下, 尤度]
@@ -536,32 +536,32 @@ ⟶ -
+
[線形モデル, 線形回帰, ロジスティック回帰, 一般化線形モデル] **91. [Support vector machines, Optimal margin classifier, Hinge loss, Kernel]** ⟶ -
+
[サポートベクターマシン, 最適マージン分類器, ヒンジ損失, カーネル] **92. [Generative learning, Gaussian Discriminant Analysis, Naive Bayes]** ⟶ -
+
[生成学習, ガウシアン判別分析, ナイーブベイズ] **93. [Trees and ensemble methods, CART, Random forest, Boosting]** -⟶ +⟶[ツリーとアンサンブル学習, CART, ランダムフォレスト, ブースティング]
**94. [Other methods, k-NN]** -⟶ +⟶[他の手法, k近傍法]
**95. [Learning theory, Hoeffding inequality, PAC, VC dimension]** -⟶ +⟶[学習理論, ヘフディング不等式, PAC, VC次元] From 9a4075b240815b33b70133ed9ffeaff8cf254cef Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Yavuz=20K=C3=B6me=C3=A7o=C4=9Flu?= Date: Sat, 27 Jul 2019 21:17:34 +0300 Subject: [PATCH 285/531] [tr] Reflex-based models It will continue to be translated into Turkish. --- tr/cs-221-reflex-models.md | 539 +++++++++++++++++++++++++++++++++++++ 1 file changed, 539 insertions(+) create mode 100644 tr/cs-221-reflex-models.md diff --git a/tr/cs-221-reflex-models.md b/tr/cs-221-reflex-models.md new file mode 100644 index 000000000..bb32af8b7 --- /dev/null +++ b/tr/cs-221-reflex-models.md @@ -0,0 +1,539 @@ +**Reflex-based models translation** [[webpage]](https://stanford.edu/~shervine/teaching/cs-221/cheatsheet-reflex-models) + +
+ +**1. Reflex-based models with Machine Learning** + +⟶ Makine Öğrenmesi ile Refleks tabanlı modeller + +
+ + +**2. Linear predictors** + +⟶ Doğrusal öngörücüler + +
+ + +**3. In this section, we will go through reflex-based models that can improve with experience, by going through samples that have input-output pairs.** + +⟶ + +
+ + +**4. Feature vector ― The feature vector of an input x is noted ϕ(x) and is such that:** + +⟶ + +
+ + +**5. Score ― The score s(x,w) of an example (ϕ(x),y)∈Rd×R associated to a linear model of weights w∈Rd is given by the inner product:** + +⟶ + +
+ + +**6. Classification** + +⟶ + +
+ + +**7. Linear classifier ― Given a weight vector w∈Rd and a feature vector ϕ(x)∈Rd, the binary linear classifier fw is given by:** + +⟶ + +
+ + +**8. if** + +⟶ + +
+ + +**9. Margin ― The margin m(x,y,w)∈R of an example (ϕ(x),y)∈Rd×{−1,+1} associated to a linear model of weights w∈Rd quantifies the confidence of the prediction: larger values are better. It is given by:** + +⟶ + +
+ + +**10. Regression** + +⟶ + +
+ + +**11. Linear regression ― Given a weight vector w∈Rd and a feature vector ϕ(x)∈Rd, the output of a linear regression of weights w denoted as fw is given by:** + +⟶ + +
+ + +**12. Residual ― The residual res(x,y,w)∈R is defined as being the amount by which the prediction fw(x) overshoots the target y:** + +⟶ + +
+ + +**13. Loss minimization** + +⟶ + +
+ + +**14. Loss function ― A loss function Loss(x,y,w) quantifies how unhappy we are with the weights w of the model in the prediction task of output y from input x. It is a quantity we want to minimize during the training process.** + +⟶ + +
+ + +**15. Classification case - The classification of a sample x of true label y∈{−1,+1} with a linear model of weights w can be done with the predictor fw(x)≜sign(s(x,w)). In this situation, a metric of interest quantifying the quality of the classification is given by the margin m(x,y,w), and can be used with the following loss functions:** + +⟶ + +
+ + +**16. [Name, Illustration, Zero-one loss, Hinge loss, Logistic loss]** + +⟶ + +
+ + +**17. Regression case - The prediction of a sample x of true label y∈R with a linear model of weights w can be done with the predictor fw(x)≜s(x,w). In this situation, a metric of interest quantifying the quality of the regression is given by the margin res(x,y,w) and can be used with the following loss functions:** + +⟶ + +
+ + +**18. [Name, Squared loss, Absolute deviation loss, Illustration]** + +⟶ + +
+ + +**19. Loss minimization framework ― In order to train a model, we want to minimize the training loss is defined as follows:** + +⟶ + +
+ + +**20. Non-linear predictors** + +⟶ + +
+ + +**21. k-nearest neighbors ― The k-nearest neighbors algorithm, commonly known as k-NN, is a non-parametric approach where the response of a data point is determined by the nature of its k neighbors from the training set. It can be used in both classification and regression settings.** + +⟶ + +
+ + +**22. Remark: the higher the parameter k, the higher the bias, and the lower the parameter k, the higher the variance.** + +⟶ + +
+ + +**23. Neural networks ― Neural networks are a class of models that are built with layers. Commonly used types of neural networks include convolutional and recurrent neural networks. The vocabulary around neural networks architectures is described in the figure below:** + +⟶ + +
+ + +**24. [Input layer, Hidden layer, Output layer]** + +⟶ + +
+ + +**25. By noting i the ith layer of the network and j the jth hidden unit of the layer, we have:** + +⟶ + +
+ + +**26. where we note w, b, x, z the weight, bias, input and non-activated output of the neuron respectively.** + +⟶ + +
+ + +**27. For a more detailed overview of the concepts above, check out the Supervised Learning cheatsheets!** + +⟶ + +
+ + +**28. Stochastic gradient descent** + +⟶ + +
+ + +**29. Gradient descent ― By noting η∈R the learning rate (also called step size), the update rule for gradient descent is expressed with the learning rate and the loss function Loss(x,y,w) as follows:** + +⟶ + +
+ + +**30. Stochastic updates ― Stochastic gradient descent (SGD) updates the parameters of the model one training example (ϕ(x),y)∈Dtrain at a time. This method leads to sometimes noisy, but fast updates.** + +⟶ + +
+ + +**31. Batch updates ― Batch gradient descent (BGD) updates the parameters of the model one batch of examples (e.g. the entire training set) at a time. This method computes stable update directions, at a greater computational cost.** + +⟶ + +
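To make the two update rules concrete, here is a small synthetic linear-regression example with the squared loss; the data, learning rate and number of epochs are arbitrary choices made for this sketch:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 1))
y = 2.0 * X[:, 0] + 1.0 + 0.1 * rng.normal(size=100)   # made-up targets, roughly y = 2x + 1
Phi = np.hstack([X, np.ones((100, 1))])                 # feature vector phi(x) = [x, 1]

def grad(w, phi, target):                               # gradient of the squared loss
    return 2.0 * (phi @ w - target) * phi

eta = 0.05                                              # step size

w = np.zeros(2)                                         # stochastic updates: one example at a time
for epoch in range(20):
    for i in rng.permutation(len(y)):
        w -= eta * grad(w, Phi[i], y[i])
print(w)                                                # close to [2, 1], after noisy but fast updates

w = np.zeros(2)                                         # batch updates: average gradient over the whole set
for step in range(200):
    w -= eta * np.mean([grad(w, Phi[i], y[i]) for i in range(len(y))], axis=0)
print(w)                                                # also close to [2, 1], with smoother steps
```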
+ + +**32. Fine-tuning models** + +⟶ + +
+ + +**33. Hypothesis class ― A hypothesis class F is the set of possible predictors with a fixed ϕ(x) and varying w:** + +⟶ + +
+ + +**34. Logistic function ― The logistic function σ, also called the sigmoid function, is defined as:** + +⟶ + +
+ + +**35. Remark: we have σ′(z)=σ(z)(1−σ(z)).** + +⟶ + +
+ + +**36. Backpropagation ― The forward pass is done through fi, which is the value for the subexpression rooted at i, while the backward pass is done through gi=∂out∂fi and represents how fi influences the output.** + +⟶ + +
+ + +**37. Approximation and estimation error ― The approximation error ϵapprox represents how far the entire hypothesis class F is from the target predictor g∗, while the estimation error ϵest quantifies how good the predictor ^f is with respect to the best predictor f∗ of the hypothesis class F.** + +⟶ + +
+ + +**38. Regularization ― The regularization procedure aims at avoiding the model to overfit the data and thus deals with high variance issues. The following table sums up the different types of commonly used regularization techniques:** + +⟶ + +
+ + +**39. [Shrinks coefficients to 0, Good for variable selection, Makes coefficients smaller, Tradeoff between variable selection and small coefficients]** + +⟶ + +
+ + +**40. Hyperparameters ― Hyperparameters are the properties of the learning algorithm, and include features, regularization parameter λ, number of iterations T, step size η, etc.** + +⟶ + +
+ + +**41. Sets vocabulary ― When selecting a model, we distinguish 3 different parts of the data that we have as follows:** + +⟶ + +
+ + +**42. [Training set, Validation set, Testing set]** + +⟶ + +
+ + +**43. [Model is trained, Usually 80% of the dataset, Model is assessed, Usually 20% of the dataset, Also called hold-out or development set, Model gives predictions, Unseen data]** + +⟶ + +
+ + +**44. Once the model has been chosen, it is trained on the entire dataset and tested on the unseen test set. These are represented in the figure below:** + +⟶ + +
+ + +**45. [Dataset, Unseen data, train, validation, test]** + +⟶ + +
+ + +**46. For a more detailed overview of the concepts above, check out the Machine Learning tips and tricks cheatsheets!** + +⟶ + +
+ + +**47. Unsupervised Learning** + +⟶ + +
+ + +**48. The class of unsupervised learning methods aims at discovering the structure of the data, which may have of rich latent structures.** + +⟶ + +
+ + +**49. k-means** + +⟶ + +
+ + +**50. Clustering ― Given a training set of input points Dtrain, the goal of a clustering algorithm is to assign each point ϕ(xi) to a cluster zi∈{1,...,k}** + +⟶ + +
+ + +**51. Objective function ― The loss function for one of the main clustering algorithms, k-means, is given by:** + +⟶ + +
+ + +**52. Algorithm ― After randomly initializing the cluster centroids μ1,μ2,...,μk∈Rn, the k-means algorithm repeats the following step until convergence:** + +⟶ + +
+ + +**53. and** + +⟶ + +
+ + +**54. [Means initialization, Cluster assignment, Means update, Convergence]** + +⟶ + +
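A minimal NumPy sketch of the k-means loop described in entry 52, alternating cluster assignment and means update until convergence; the toy points, the random initialization scheme and the variable names are illustrative assumptions.

```python
import numpy as np

def kmeans(points, k, iters=100, seed=0):
    rng = np.random.default_rng(seed)
    # means initialization: pick k distinct points as the initial centroids μ1..μk
    mu = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(iters):
        # cluster assignment: zi = argmin_j ||ϕ(xi) − μj||²
        z = np.argmin(((points[:, None, :] - mu[None, :, :]) ** 2).sum(-1), axis=1)
        # means update: μj = mean of the points currently assigned to cluster j
        new_mu = np.array([points[z == j].mean(axis=0) if np.any(z == j) else mu[j]
                           for j in range(k)])
        if np.allclose(new_mu, mu):   # convergence
            break
        mu = new_mu
    return z, mu

pts = np.array([[0.0, 0.0], [0.1, 0.2], [5.0, 5.0], [5.2, 4.9]])
print(kmeans(pts, k=2))   # the two tight groups end up in separate clusters
```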
+ + +**55. Principal Component Analysis** + +⟶ + +
+ + +**56. Eigenvalue, eigenvector ― Given a matrix A∈Rn×n, λ is said to be an eigenvalue of A if there exists a vector z∈Rn∖{0}, called eigenvector, such that we have:** + +⟶ + +
+ + +**57. Spectral theorem ― Let A∈Rn×n. If A is symmetric, then A is diagonalizable by a real orthogonal matrix U∈Rn×n. By noting Λ=diag(λ1,...,λn), we have:** + +⟶ + +
+ + +**58. Remark: the eigenvector associated with the largest eigenvalue is called principal eigenvector of matrix A.** + +⟶ + +
+ + +**59. Algorithm ― The Principal Component Analysis (PCA) procedure is a dimension reduction technique that projects the data on k dimensions by maximizing the variance of the data as follows:** + +⟶ + +
+ + +**60. Step 1: Normalize the data to have a mean of 0 and standard deviation of 1.** + +⟶ + +
+ + +**61. [where, and]** + +⟶ + +
+ + +**62. [Step 2: Compute Σ=1mm∑i=1ϕ(xi)ϕ(xi)T∈Rn×n, which is symmetric with real eigenvalues., Step 3: Compute u1,...,uk∈Rn the k orthogonal principal eigenvectors of Σ, i.e. the orthogonal eigenvectors of the k largest eigenvalues., Step 4: Project the data on spanR(u1,...,uk).]** + +⟶ + +
+ + +**63. This procedure maximizes the variance among all k-dimensional spaces.** + +⟶ + +
+ + +**64. [Data in feature space, Find principal components, Data in principal components space]** + +⟶ + +
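The four PCA steps listed in entries 60–62 can be sketched in a few lines of NumPy; this is an illustrative example only (the toy matrix X and the helper name are assumptions).

```python
import numpy as np

def pca(X, k):
    # Step 1: normalize each feature to mean 0 and standard deviation 1
    X = (X - X.mean(axis=0)) / X.std(axis=0)
    # Step 2: Σ = (1/m) Σ_i ϕ(x_i) ϕ(x_i)ᵀ, symmetric with real eigenvalues
    sigma = X.T @ X / X.shape[0]
    # Step 3: orthogonal eigenvectors of the k largest eigenvalues (eigh sorts ascending)
    eigvals, eigvecs = np.linalg.eigh(sigma)
    U = eigvecs[:, np.argsort(eigvals)[::-1][:k]]
    # Step 4: project the data on span(u1, ..., uk)
    return X @ U

X = np.array([[2.0, 0.1], [1.0, -0.2], [-1.5, 0.05], [-1.5, 0.05]])
print(pca(X, k=1))   # one coordinate per point, along the direction of maximal variance
```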
+ + +**65. For a more detailed overview of the concepts above, check out the Unsupervised Learning cheatsheets!** + +⟶ + +
+ + +**66. [Linear predictors, Feature vector, Linear classifier/regression, Margin]** + +⟶ + +
+ + +**67. [Loss minimization, Loss function, Framework]** + +⟶ + +
+ + +**68. [Non-linear predictors, k-nearest neighbors, Neural networks]** + +⟶ + +
+ + +**69. [Stochastic gradient descent, Gradient, Stochastic updates, Batch updates]** + +⟶ + +
+ + +**70. [Fine-tuning models, Hypothesis class, Backpropagation, Regularization, Sets vocabulary]** + +⟶ + +
+ + +**71. [Unsupervised Learning, k-means, Principal components analysis]** + +⟶ + +
+ + +**72. View PDF version on GitHub** + +⟶ + +
+ + +**73. Original authors** + +⟶ + +
+ + +**74. Translated by X, Y and Z** + +⟶ + +
+ + +**75. Reviewed by X, Y and Z** + +⟶ + +
+ + +**76. By X and Y** + +⟶ + +
+ + +**77. The Artificial Intelligence cheatsheets are now available in [target language].** + +⟶ From 25956717c5292a71f28f1ef6920ae106f69c1cdb Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?ayy=C3=BCce=20k=C4=B1zrak?= <32331090+ayyucekizrak@users.noreply.github.com> Date: Sat, 27 Jul 2019 23:50:47 +0300 Subject: [PATCH 286/531] [tr] Logic-based models Going to continuing... --- tr/cs-221-logic-models.md | 462 ++++++++++++++++++++++++++++++++++++++ 1 file changed, 462 insertions(+) create mode 100644 tr/cs-221-logic-models.md diff --git a/tr/cs-221-logic-models.md b/tr/cs-221-logic-models.md new file mode 100644 index 000000000..9f173aa05 --- /dev/null +++ b/tr/cs-221-logic-models.md @@ -0,0 +1,462 @@ +**Logic-based models translation** [[webpage]](https://stanford.edu/~shervine/teaching/cs-221/cheatsheet-logic-models) + +
+ +**1. Logic-based models with propositional and first-order logic** + +⟶ Önermeli ve birinci dereceden mantık (Lojik) temelli modeller + +
+ + +**2. Basics** + +⟶ Temeller + +
+ + +**3. Syntax of propositional logic ― By noting f,g formulas, and ¬,∧,∨,→,↔ connectives, we can write the following logical expressions:** + +⟶ Önerme mantığının sözdizimi ― f, g formülleri ve ¬,∧,∨,→,↔ bağlayıcılarını belirterek, aşağıdaki mantıksal ifadeleri yazabiliriz: + +
+ + +**4. [Name, Symbol, Meaning, Illustration]** + +⟶ [Ad, Sembol, Anlamı, Gösterimi] + +
+
+
+**5. [Affirmation, Negation, Conjunction, Disjunction, Implication, Biconditional]**
+
+⟶ [Doğrulama, Değilleme, Kesişim, Birleşim, Gerektirme, İki koşullu]
+
+<br>
+ + +**6. [not f, f and g, f or g, if f then g, f, that is to say g]** + +⟶ [f değil, f ve g, f veya g, eğer f'den g çıkarsa, f, f ve g'nin ortak olduğu bölge] + +
+ + +**7. Remark: formulas can be built up recursively out of these connectives.** + +⟶ Not: Bu bağlantılar dışında tekrarlayan formüller oluşturulabilir. + +
+ + +**8. Model ― A model w denotes an assignment of binary weights to propositional symbols.** + +⟶ Model - w modeli, ikili sembollerin önermeli sembollere atanmasını belirtir. + +
+ + +**9. Example: the set of truth values w={A:0,B:1,C:0} is one possible model to the propositional symbols A, B and C.** + +⟶ Örnek: w = {A: 0, B: 1, C: 0} doğruluk değerleri kümesi, A, B ve C önermeli semboller için olası bir modeldir. + +
+ + +**10. Interpretation function ― The interpretation function I(f,w) outputs whether model w satisfies formula f:** + +⟶ Yorumlama fonksiyonu - Yorumlama fonksiyonu I(f,w), w modelinin f formülüne uygun olup olmadığını gösterir: + +
+ + +**11. Set of models ― M(f) denotes the set of models w that satisfy formula f. Mathematically speaking, we define it as follows:** + +⟶ Modellerin seti ― M(f), f formülünü sağlayan model setini belirtir. Matematiksel konuşursak, şöyle tanımlarız: + +
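Entries 8–11 (model, interpretation function, set of models) can be illustrated with a small self-contained sketch; the tuple encoding of formulas and the example formula are assumptions made for the demonstration, the example model is the one from entry 9.

```python
from itertools import product

def I(f, w):
    """Interpretation function: does model w (dict of 0/1 values) satisfy formula f?"""
    if isinstance(f, str):                       # propositional symbol
        return w[f]
    op, *args = f
    if op == "not": return 1 - I(args[0], w)
    if op == "and": return I(args[0], w) & I(args[1], w)
    if op == "or":  return I(args[0], w) | I(args[1], w)
    if op == "->":  return max(1 - I(args[0], w), I(args[1], w))
    raise ValueError(op)

def models(f, symbols):
    """M(f): the assignments over the given symbols that satisfy f."""
    return [dict(zip(symbols, bits))
            for bits in product((0, 1), repeat=len(symbols))
            if I(f, dict(zip(symbols, bits)))]

w = {"A": 0, "B": 1, "C": 0}                     # the example model of entry 9
f = ("or", "A", ("not", "C"))                    # the formula A ∨ ¬C
print(I(f, w))                                   # 1, so w ∈ M(f)
print(len(models(f, ["A", "B", "C"])))           # 6 of the 8 possible models satisfy f
```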
+ + +**12. Knowledge base** + +⟶ Bilgi temelli + +
+ + +**13. Definition ― The knowledge base KB is the conjunction of all formulas that have been considered so far. The set of models of the knowledge base is the intersection of the set of models that satisfy each formula. In other words:** + +⟶ Tanım ― Bilgi temeli (KB-Knowledgde Base), şu ana kadar düşünülen tüm formüllerin birleşimidir. Bilgi temelinin model kümesi, her formülü karşılayan model dizisinin kesişimidir. Diğer bir deyişle: + +
+ + +**14. Probabilistic interpretation ― The probability that query f is evaluated to 1 can be seen as the proportion of models w of the knowledge base KB that satisfy f, i.e.:** + +⟶ Olasılıksal yorumlama ― f sorgusunun 1 olarak değerlendirilmesi olasılığı, f'yi sağlayan bilgi temeli KB'nin w modellerinin oranı olarak görülebilir, yani: + +
+ + +**15. Satisfiability ― The knowledge base KB is said to be satisfiable if at least one model w satisfies all its constraints. In other words:** + +⟶ Gerçeklenebilirlik ― En az bir modelin tüm kısıtlamaları yerine getirmesi durumunda KB'nin bilgi temelinin gerçeklenebilir olduğu söylenir. Diğer bir deyişle: + +
+ + +**16. satisfiable** + +⟶ Karşılanabilirlik + +
+ + +**17. Remark: M(KB) denotes the set of models compatible with all the constraints of the knowledge base.** + +⟶ Not: M(KB), bilgi temelinin tüm kısıtları ile uyumlu model kümesini belirtir. + +
+ + +**18. Relation between formulas and knowledge base - We define the following properties between the knowledge base KB and a new formula f:** + +⟶ Formüller ve bilgi temeli arasındaki ilişki - Bilgi temeli KB ile yeni bir formül f arasında aşağıdaki özellikleri tanımlarız: + +
+ + +**19. [Name, Mathematical formulation, Illustration, Notes]** + +⟶ [Adı, Matematiksel formülü, Gösterim, Notlar] + +
+ + +**20. [KB entails f, KB contradicts f, f contingent to KB]** + +⟶ [KB f içerir, KB f içermez, f koşullu KB] + +
+ + +**21. [f does not bring any new information, Also written KB⊨f, No model satisfies the constraints after adding f, Equivalent to KB⊨¬f, f does not contradict KB, f adds a non-trivial amount of information to KB]** + +⟶ [f yeni bir bilgi getirmiyor, Ayrıca KB⊨f yazıyor, Hiçbir model f ekledikten sonra kısıtlamaları yerine getirmiyor, f KB'ye eşdeğer, f KB'ye aykırı değil, f KB'ye önemsiz miktarda bilgi ekliyor] + +
+ + +**22. Model checking ― A model checking algorithm takes as input a knowledge base KB and outputs whether it is satisfiable or not.** + +⟶ Model denetimi - Bir model denetimi algoritması, KB'nin bilgi temelini girdi olarak alır ve bunun gerçeklenebilir/karşılanabilir olup olmadığını çıkarır. + +
+ + +**23. Remark: popular model checking algorithms include DPLL and WalkSat.** + +⟶ Not: popüler model kontrol algoritmaları DPLL ve WalkSat'ı içerir. + +
+ + +**24. Inference rule ― An inference rule of premises f1,...,fk and conclusion g is written:** + +⟶ Çıkarım kuralı - f1, ..., fk ve sonuç g yapısının çıkarım kuralı şöyle yazılmıştır: + +
+ + +**25. Forward inference algorithm ― From a set of inference rules Rules, this algorithm goes through all possible f1,...,fk and adds g to the knowledge base KB if a matching rule exists. This process is repeated until no more additions can be made to KB.** + +⟶ İleri çıkarım algoritması - Çıkarım kurallarından Kurallar, bu algoritma mümkün olan tüm f1, ..., fk'den geçer ve eşleşen bir kural varsa, KB bilgi tabanına g ekler. Bu işlem KB'ye daha fazla ekleme yapılamayana kadar tekrar edilir. + +
+ + +**26. Derivation ― We say that KB derives f (written KB⊢f) with rules Rules if f already is in KB or gets added during the forward inference algorithm using the set of rules Rules.** + +⟶ Türetme - f'nin KB içerisindeyse veya kurallar kurallarını kullanarak ileri çıkarım algoritması sırasında eklenmişse, KB'nin kurallar ile f (KB⊢f yazılır) türettiğini söylüyoruz. + +
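A small illustrative version of the forward inference loop of entries 24–26, with rules written as (premises, conclusion) pairs; the fact and rule names are made up for the example.

```python
def forward_inference(kb, rules):
    """Repeatedly apply matching rules until no more conclusions can be added to KB."""
    kb = set(kb)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if set(premises) <= kb and conclusion not in kb:
                kb.add(conclusion)        # a rule with satisfied premises fires: add g to KB
                changed = True
    return kb

rules = [({"rain"}, "wet"), ({"wet", "cold"}, "ice")]
print(forward_inference({"rain", "cold"}, rules))   # {'rain', 'cold', 'wet', 'ice'}
```

Every returned fact is derived (KB⊢f in the sense of entry 26) from the starting knowledge base under the given rule set.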
+ + +**27. Properties of inference rules ― A set of inference rules Rules can have the following properties:** + +⟶ Çıkarım kurallarının özellikleri - Çıkarım kurallarının kümesi Kurallar aşağıdaki özelliklere sahip olabilir: + +
+ + +**28. [Name, Mathematical formulation, Notes]** + +⟶ [Adı, Matematiksel formülü, Notlar] + +
+ + +**29. [Soundness, Completeness]** + +⟶ [Sağlamlık, Tamlık] + +
+ + +**30. [Inferred formulas are entailed by KB, Can be checked one rule at a time, "Nothing but the truth", Formulas entailing KB are either already in the knowledge base or inferred from it, "The whole truth"]** + +⟶ [Çıkarılan formüller KB tarafından sağlanmıştır, Her defasında bir kural kontrol edilebilir, ya KB'yi içeren Formüller ya bilgi tabanında zaten vardır "Gerçeğinden başka bir şey yok", ya da ondan çıkarılan "Tüm gerçek" değerlerdir] + +
+ + +**31. Propositional logic** + +⟶ Önerme mantığı + +
+ + +**32. In this section, we will go through logic-based models that use logical formulas and inference rules. The idea here is to balance expressivity and computational efficiency.** + +⟶ + +
+ + +**33. Horn clause ― By noting p1,...,pk and q propositional symbols, a Horn clause has the form:** + +⟶ + +
+ + +**34. Remark: when q=false, it is called a "goal clause", otherwise we denote it as a "definite clause".** + +⟶ + +
+ + +**35. Modus ponens ― For propositional symbols f1,...,fk and p, the modus ponens rule is written:** + +⟶ + +
+ + +**36. Remark: it takes linear time to apply this rule, as each application generate a clause that contains a single propositional symbol.** + +⟶ + +
+ + +**37. Completeness ― Modus ponens is complete with respect to Horn clauses if we suppose that KB contains only Horn clauses and p is an entailed propositional symbol. Applying modus ponens will then derive p.** + +⟶ + +
+ + +**38. Conjunctive normal form ― A conjunctive normal form (CNF) formula is a conjunction of clauses, where each clause is a disjunction of atomic formulas.** + +⟶ + +
+ + +**39. Remark: in other words, CNFs are ∧ of ∨.** + +⟶ + +
+ + +**40. Equivalent representation ― Every formula in propositional logic can be written into an equivalent CNF formula. The table below presents general conversion properties:** + +⟶ + +
+ + +**41. [Rule name, Initial, Converted, Eliminate, Distribute, over]** + +⟶ + +
+ + +**42. Resolution rule ― For propositional symbols f1,...,fn, and g1,...,gm as well as p, the resolution rule is written:** + +⟶ + +
+ + +**43. Remark: it can take exponential time to apply this rule, as each application generates a clause that has a subset of the propositional symbols.** + +⟶ + +
+ + +**44. [Resolution-based inference ― The resolution-based inference algorithm follows the following steps:, Step 1: Convert all formulas into CNF, Step 2: Repeatedly apply resolution rule, Step 3: Return unsatisfiable if and only if False, is derived]** + +⟶ + +
+ + +**45. First-order logic** + +⟶ + +
+ + +**46. The idea here is to use variables to yield more compact knowledge representations.** + +⟶ + +
+ + +**47. [Model ― A model w in first-order logic maps:, constant symbols to objects, predicate symbols to tuple of objects]** + +⟶ + +
+ + +**48. Horn clause ― By noting x1,...,xn variables and a1,...,ak,b atomic formulas, the first-order logic version of a horn clause has the form:** + +⟶ + +
+ + +**49. Substitution ― A substitution θ maps variables to terms and Subst[θ,f] denotes the result of substitution θ on f.** + +⟶ + +
+ + +**50. Unification ― Unification takes two formulas f and g and returns the most general substitution θ that makes them equal:** + +⟶ + +
+ + +**51. such that** + +⟶ + +
+ + +**52. Note: Unify[f,g] returns Fail if no such θ exists.** + +⟶ + +
+ + +**53. Modus ponens ― By noting x1,...,xn variables, a1,...,ak and a′1,...,a′k atomic formulas and by calling θ=Unify(a′1∧...∧a′k,a1∧...∧ak) the first-order logic version of modus ponens can be written:** + +⟶ + +
+ + +**54. Completeness ― Modus ponens is complete for first-order logic with only Horn clauses.** + +⟶ + +
+ + +**55. Resolution rule ― By noting f1,...,fn, g1,...,gm, p, q formulas and by calling θ=Unify(p,q), the first-order logic version of the resolution rule can be written:** + +⟶ + +
+ + +**56. [Semi-decidability ― First-order logic, even restricted to only Horn clauses, is semi-decidable., if KB⊨f, forward inference on complete inference rules will prove f in finite time, if KB⊭f, no algorithm can show this in finite time]** + +⟶ + +
+ + +**57. [Basics, Notations, Model, Interpretation function, Set of models]** + +⟶ + +
+ + +**58. [Knowledge base, Definition, Probabilistic interpretation, Satisfiability, Relationship with formulas, Forward inference, Rule properties]** + +⟶ + +
+ + +**59. [Propositional logic, Clauses, Modus ponens, Conjunctive normal form, Representation equivalence, Resolution]** + +⟶ + +
+ + +**60. [First-order logic, Substitution, Unification, Resolution rule, Modus ponens, Resolution, Semi-decidability]** + +⟶ + +
+ + +**61. View PDF version on GitHub** + +⟶ + +
+ + +**62. Original authors** + +⟶ + +
+ + +**63. Translated by X, Y and Z** + +⟶ + +
+ + +**64. Reviewed by X, Y and Z** + +⟶ + +
+ + +**65. By X and Y** + +⟶ + +
+ + +**66. The Artificial Intelligence cheatsheets are now available in [target language].** + +⟶ From 70d1d7514a0ca62bee00cb8f7a58f00d588b3662 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?ayy=C3=BCce=20k=C4=B1zrak?= <32331090+ayyucekizrak@users.noreply.github.com> Date: Sun, 28 Jul 2019 15:00:36 +0300 Subject: [PATCH 287/531] Update cs-221-logic-models.md The translation is finished for this logic models cheatsheet. Thanks. --- tr/cs-221-logic-models.md | 70 +++++++++++++++++++-------------------- 1 file changed, 35 insertions(+), 35 deletions(-) diff --git a/tr/cs-221-logic-models.md b/tr/cs-221-logic-models.md index 9f173aa05..b7b8a1582 100644 --- a/tr/cs-221-logic-models.md +++ b/tr/cs-221-logic-models.md @@ -221,242 +221,242 @@ **32. In this section, we will go through logic-based models that use logical formulas and inference rules. The idea here is to balance expressivity and computational efficiency.** -⟶ +⟶ Bu bölümde, mantıksal formülleri ve çıkarım kurallarını kullanan mantık tabanlı modelleri inceleyeceğiz. Buradaki fikir ifade ve hesaplamanın verimliliğini dengelemektir.
**33. Horn clause ― By noting p1,...,pk and q propositional symbols, a Horn clause has the form:** -⟶ +⟶ Horn cümlesi ― p1, ..., pk ve q önerme sembollerini not ederek, bir Horn cümlesi şu şekildedir (Matematiksel mantık ve mantık programlamada, kural gibi özel bir biçime sahip mantıksal formüllere Horn cümlesi denir.):
**34. Remark: when q=false, it is called a "goal clause", otherwise we denote it as a "definite clause".** -⟶ +⟶ Not: q = false olduğunda, "hedeflenen bir cümle" olarak adlandırılır, aksi takdirde "kesin bir cümle" olarak adlandırırız
**35. Modus ponens ― For propositional symbols f1,...,fk and p, the modus ponens rule is written:** -⟶ +⟶ Modus ponens - f1, ..., fk ve p önermeli semboller için modus ponens kuralı yazılır (Modus ponens: Önerme mantığında, modus ponens bir çıkarım kuralıdır. "P, Q anlamına gelir ve P'nin doğru olduğu iddia edilir, bu yüzden Q doğru olmalı" şeklinde özetlenebilir. Modus ponens, başka bir geçerli argüman biçimi olan modus tollens ile yakından ilgilidir.):
**36. Remark: it takes linear time to apply this rule, as each application generate a clause that contains a single propositional symbol.** -⟶ +⟶ Not: Her uygulama tek bir önermeli sembol içeren bir cümle oluşturduğundan, bu kuralın uygulanması doğrusal bir zaman alır.
**37. Completeness ― Modus ponens is complete with respect to Horn clauses if we suppose that KB contains only Horn clauses and p is an entailed propositional symbol. Applying modus ponens will then derive p.** -⟶ +⟶ Tamlık ― KB'nin sadece Horn cümleleri içerdiğini ve p'nin zorunlu bir teklif sembolü olduğunu varsayalım, Hornus cümlelerine göre Modus ponenleri tamamlanmıştır. Modus ponens uygulanması daha sonra p'yi türetir.
**38. Conjunctive normal form ― A conjunctive normal form (CNF) formula is a conjunction of clauses, where each clause is a disjunction of atomic formulas.** -⟶ +⟶ Konjunktif (Birleştirici) normal form - Bir konjonktif normal form (CNF) formülü, her bir cümlenin atomik formüllerin bir ayrıntısı olduğu cümle birleşimidir.
**39. Remark: in other words, CNFs are ∧ of ∨.** -⟶ +⟶ Açıklama: başka bir deyişle, CNF'ler ∨ ait ∧ bulunmaktadır.
**40. Equivalent representation ― Every formula in propositional logic can be written into an equivalent CNF formula. The table below presents general conversion properties:** -⟶ +⟶ Eşdeğer temsil - Önerme mantığındaki her formül eşdeğer bir CNF formülüne yazılabilir. Aşağıdaki tabloda genel dönüşüm özellikleri gösterilmektedir:
**41. [Rule name, Initial, Converted, Eliminate, Distribute, over]** -⟶ +⟶ [Kural adı, Başlangıç, Dönüştürülmüş, Eleme, Dağıtma, üzerine]
**42. Resolution rule ― For propositional symbols f1,...,fn, and g1,...,gm as well as p, the resolution rule is written:** -⟶ +⟶ Çözünürlük kuralı - f1, ..., fn ve g1, ..., gm önerme sembolleri için, p, çözümleme kuralı yazılır:
**43. Remark: it can take exponential time to apply this rule, as each application generates a clause that has a subset of the propositional symbols.** -⟶ +⟶ Not: Her uygulama, teklif sembollerinin alt kümesine sahip bir cümle oluşturduğundan, bu kuralı uygulamak için üssel olarak zaman alabilir.
**44. [Resolution-based inference ― The resolution-based inference algorithm follows the following steps:, Step 1: Convert all formulas into CNF, Step 2: Repeatedly apply resolution rule, Step 3: Return unsatisfiable if and only if False, is derived]** -⟶ +⟶ [Çözünürlük tabanlı çıkarım - Çözünürlük tabanlı çıkarım algoritması, aşağıdaki adımları izler :, Adım 1: Tüm formülleri CNF'ye dönüştürün, Adım 2: Tekrar tekrar, çözünürlük kuralını uygulayın, Adım 3: Yanlışsa türetilmişse tatmin edici olmayan dönüş yapın]
**45. First-order logic** -⟶ +⟶ Birinci dereceden mantık
**46. The idea here is to use variables to yield more compact knowledge representations.** -⟶ +⟶ Buradaki fikir, daha kompakt bilgi sunumları sağlamak için değişkenleri kullanmaktır.
**47. [Model ― A model w in first-order logic maps:, constant symbols to objects, predicate symbols to tuple of objects]** -⟶ +⟶ [Model ― Birinci mertebeden mantık haritalarında bir w modeli :, nesnelere sabit semboller, nesnelerin dizisini sembolize etmek için tahmin]
**48. Horn clause ― By noting x1,...,xn variables and a1,...,ak,b atomic formulas, the first-order logic version of a horn clause has the form:** -⟶ +⟶ Horn cümlesi - x1, ..., xn değişkenleri ve a1, ..., ak, b atomik formüllerine dikkat çekerek, bir boynuz maddesinin birinci derece mantık versiyonu aşağıdaki şekildedir:
**49. Substitution ― A substitution θ maps variables to terms and Subst[θ,f] denotes the result of substitution θ on f.** -⟶ +⟶ Yer değiştirme - Bir yerdeğiştirme değişkenleri terimlerle eşler ve Subst[θ,f] yerdeğiştirme sonucunu f olarak belirtir.
**50. Unification ― Unification takes two formulas f and g and returns the most general substitution θ that makes them equal:** -⟶ +⟶ Birleştirme ― Birleştirme f ve g'nin iki formülünü alır ve onları eşit yapan en genel ikameyi θ verir:
**51. such that** -⟶ +⟶ öyle ki
**52. Note: Unify[f,g] returns Fail if no such θ exists.** -⟶ +⟶ Not: Unify[f,g], eğer böyle bir θ yoksa Fail döndürür.
**53. Modus ponens ― By noting x1,...,xn variables, a1,...,ak and a′1,...,a′k atomic formulas and by calling θ=Unify(a′1∧...∧a′k,a1∧...∧ak) the first-order logic version of modus ponens can be written:** -⟶ +⟶ Modus ponens ― x1, ..., xn değişkenleri, a1, ..., ak ve a′1, ..., a′k atomik formüllerine dikkat ederek ve θ=Unify(a′1∧...∧a′k,a1∧...∧ak) modus ponenlerin birinci dereceden mantık versiyonu yazılabilir:
**54. Completeness ― Modus ponens is complete for first-order logic with only Horn clauses.** -⟶ +⟶ Tamlık - Modus ponens sadece Horn cümleleriyle birinci dereceden mantık için tamamlanmıştır.
**55. Resolution rule ― By noting f1,...,fn, g1,...,gm, p, q formulas and by calling θ=Unify(p,q), the first-order logic version of the resolution rule can be written:** -⟶ +⟶ Çözünürlük kuralı ― f1,...,fn,g1,...,gm, p, q formüllerini not ederek ve θ=Unify(p,q) ifadesini kullanarak, çözümleme kuralının birinci dereceden mantık sürümü yazılabilir. :
**56. [Semi-decidability ― First-order logic, even restricted to only Horn clauses, is semi-decidable., if KB⊨f, forward inference on complete inference rules will prove f in finite time, if KB⊭f, no algorithm can show this in finite time]** -⟶ +⟶ Yarı karar verilebilirlik ― Birinci dereceden mantık, sadece Horn cümleleriyle sınırlı olsa bile, yarı kararsıdır eğer KB⊨f ise f sonsuz zamanlıdır. KB⊭f ise sonsuz zamanlı olabilirliği gösteren algoritma yoktur.
**57. [Basics, Notations, Model, Interpretation function, Set of models]** -⟶ +⟶ [Temeller, Notasyon, Model, Yorumlama fonksiyonu, Modellerin kümesi]
**58. [Knowledge base, Definition, Probabilistic interpretation, Satisfiability, Relationship with formulas, Forward inference, Rule properties]** -⟶ +⟶ [Bilgi temeli, Tanım, Olasılıksal yorumlama, Gerçeklenebilirlik, Formüllerle İlişki, İleri çıkarım, Kural özellikleri]
**59. [Propositional logic, Clauses, Modus ponens, Conjunctive normal form, Representation equivalence, Resolution]** -⟶ +⟶ [Önerme mantığı, Cümleler, Modus ponens, Eşlenik (Conjunctive) normal form, Temsil eşdeğeri, Çözüm]
**60. [First-order logic, Substitution, Unification, Resolution rule, Modus ponens, Resolution, Semi-decidability]** -⟶ +⟶ [Birinci derece mantık, Değiştirme, Birleştirme, Çözünürlük kuralı, Modus ponens, Çözünürlük, Yarı karar verilebilirlik]
**61. View PDF version on GitHub** -⟶ +⟶ GitHub'da PDF sürümünü görüntüleyin
**62. Original authors** -⟶ +⟶ Orijinal yazarlar
**63. Translated by X, Y and Z** -⟶ +⟶ X, Y ve Z tarafından çevrilmiştir
**64. Reviewed by X, Y and Z** -⟶ +⟶ X, Y ve Z tarafından gözden geçirilmiştir
**65. By X and Y** -⟶ +⟶ X ve Y ile
**66. The Artificial Intelligence cheatsheets are now available in [target language].** -⟶ +⟶ Yapay Zeka el kitabı şimdi [Türkçe] mevcuttur. From 21738ededad4e22edff7056b876f067cdac0b60f Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?ayy=C3=BCce=20k=C4=B1zrak?= <32331090+ayyucekizrak@users.noreply.github.com> Date: Sun, 28 Jul 2019 18:27:12 +0300 Subject: [PATCH 288/531] Update cs-221-logic-models.md I translated the CS 221 - Logic-based models to Turkish. @shervinea can you please review it? --- tr/cs-221-logic-models.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tr/cs-221-logic-models.md b/tr/cs-221-logic-models.md index b7b8a1582..d0eb75c23 100644 --- a/tr/cs-221-logic-models.md +++ b/tr/cs-221-logic-models.md @@ -81,7 +81,7 @@ **12. Knowledge base** -⟶ Bilgi temelli +⟶ Bilgi temeli
From a62a44f895f298c3b3aea0add16ae398fdb9bcbb Mon Sep 17 00:00:00 2001 From: Shervine Amidi Date: Mon, 29 Jul 2019 00:21:04 -0700 Subject: [PATCH 289/531] Update README.md --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index ccc114c7a..23e1e9dd2 100644 --- a/README.md +++ b/README.md @@ -46,7 +46,7 @@ Please make sure to propose the translation of **only one** cheatsheet per pull |**日本語**|not started|not started|not started|not started| |**한국어**|not started|not started|not started|not started| |**Português**|not started|not started|not started|not started| -|**Türkçe**|not started|not started|not started|not started| +|**Türkçe**|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/166)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/168)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/169)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/170)| |**Tiếng Việt**|not started|not started|not started|not started| |**中文**|not started|not started|not started|not started| From d244897b054edc5e10fe0e1032ee5f86c15a9e5c Mon Sep 17 00:00:00 2001 From: shervinea Date: Tue, 30 Jul 2019 23:11:04 -0700 Subject: [PATCH 290/531] Add logic in [fr] --- fr/cs-221-logic-models.md | 462 ++++++++++++++++++++++++++++++++++++++ 1 file changed, 462 insertions(+) create mode 100644 fr/cs-221-logic-models.md diff --git a/fr/cs-221-logic-models.md b/fr/cs-221-logic-models.md new file mode 100644 index 000000000..2ecdbc81e --- /dev/null +++ b/fr/cs-221-logic-models.md @@ -0,0 +1,462 @@ +**Logic-based models translation** [[webpage]](https://stanford.edu/~shervine/teaching/cs-221/cheatsheet-logic-models) + +
+ +**1. Logic-based models with propositional and first-order logic** + +⟶ Modèles logiques propositionnels et calcul des prédicats du premier ordre + +
+ + +**2. Basics** + +⟶ Bases + +
+ + +**3. Syntax of propositional logic ― By noting f,g formulas, and ¬,∧,∨,→,↔ connectives, we can write the following logical expressions:** + +⟶ Syntaxe de la logique propositionnelle - En notant f et g formules et ¬,∧,∨,→,↔ opérateurs, on peut écrire les expressions logiques suivantes : + +
+ + +**4. [Name, Symbol, Meaning, Illustration]** + +⟶ [Nom, Symbole, Signification, Illustration] + +
+ + +**5. [Affirmation, Negation, Conjunction, Disjunction, Implication, Biconditional]** + +⟶ [Affirmation, Négation, Conjonction, Disjonction, Implication, Biconditionnel] + +
+ + +**6. [not f, f and g, f or g, if f then g, f, that is to say g]** + +⟶ [non f, f et g, f ou g, si f alors g, f, c'est à dire g] + +
+ + +**7. Remark: formulas can be built up recursively out of these connectives.** + +⟶ Remarque : n'importe quelle formule peut être construite de manière récursive à partir de ces opérateurs. + +
+
+
+**8. Model ― A model w denotes an assignment of binary weights to propositional symbols.**
+
+⟶ Modèle ― Un modèle w dénote une affectation de valeurs binaires aux symboles propositionnels.
+
+<br>
+ + +**9. Example: the set of truth values w={A:0,B:1,C:0} is one possible model to the propositional symbols A, B and C.** + +⟶ Exemple : l'ensemble de valeurs de vérité w={A:0,B:1,C:0} est un modèle possible pour les symboles propositionnels A, B et C. + +
+ + +**10. Interpretation function ― The interpretation function I(f,w) outputs whether model w satisfies formula f:** + +⟶ Interprétation - L'interprétation I(f,w) nous renseigne si le modèle w satisfait la formule f : + +
+ + +**11. Set of models ― M(f) denotes the set of models w that satisfy formula f. Mathematically speaking, we define it as follows:** + +⟶ Ensemble de modèles - M(f) dénote l'ensemble des modèles w qui satisfont la formule f. Sa définition mathématique est donnée par : + +
+ + +**12. Knowledge base** + +⟶ Base de connaissance + +
+ + +**13. Definition ― The knowledge base KB is the conjunction of all formulas that have been considered so far. The set of models of the knowledge base is the intersection of the set of models that satisfy each formula. In other words:** + +⟶ Définition - La base de connaissance KB est la conjonction de toutes les formules considérées jusqu'à présent. L'ensemble des modèles de la base de connaissance est l'intersection de l'ensemble des modèles satisfaisant chaque formule. En d'autres termes : + +
+
+
+**14. Probabilistic interpretation ― The probability that query f is evaluated to 1 can be seen as the proportion of models w of the knowledge base KB that satisfy f, i.e.:**
+
+⟶ Interprétation en termes de probabilités - La probabilité que la requête f soit évaluée à 1 peut être vue comme la proportion des modèles w de la base de connaissance KB qui satisfont f, i.e. :
+
+<br>
+ + +**15. Satisfiability ― The knowledge base KB is said to be satisfiable if at least one model w satisfies all its constraints. In other words:** + +⟶ Satisfaisabilité - La base de connaissance KB est dite satisfaisable si au moins un modèle w satisfait toutes ses contraintes. En d'autres termes : + +
+ + +**16. satisfiable** + +⟶ satisfaisable + +
+ + +**17. Remark: M(KB) denotes the set of models compatible with all the constraints of the knowledge base.** + +⟶ Remarque : M(KB) dénote l'ensemble des modèles compatibles avec toutes les contraintes de la base de connaissance. + +
+ + +**18. Relation between formulas and knowledge base - We define the following properties between the knowledge base KB and a new formula f:** + +⟶ Relation entre formules et base de connaissance - On définit les propriétés suivantes entre la base de connaissance KB et une nouvelle formule f : + +
+ + +**19. [Name, Mathematical formulation, Illustration, Notes]** + +⟶ [Nom, Formulation mathématique, Illustration, Notes] + +
+ + +**20. [KB entails f, KB contradicts f, f contingent to KB]** + +⟶ [KB déduit f, KB contredit f, f est contingent à KB] + +
+ + +**21. [f does not bring any new information, Also written KB⊨f, No model satisfies the constraints after adding f, Equivalent to KB⊨¬f, f does not contradict KB, f adds a non-trivial amount of information to KB]** + +⟶ [f n'apporte aucune nouvelle information, Aussi écrit KB⊨f, Aucun modèle ne satisfait les contraintes après l'ajout de f, Équivalent à KB⊨¬f, f ne contredit pas KB, f ajoute une quantité non-triviale d'information à KB] + +
+ + +**22. Model checking ― A model checking algorithm takes as input a knowledge base KB and outputs whether it is satisfiable or not.** + +⟶ Vérification de modèles - Un algorithme de vérification de modèles (model checking en anglais) prend comme argument une base de connaissance KB et nous renseigne si celle-ci est satisfaisable ou pas. + +
+ + +**23. Remark: popular model checking algorithms include DPLL and WalkSat.** + +⟶ Remarque : DPLL et WalkSat sont des exemples populaires d'algorithmes de vérification de modèles. + +
+ + +**24. Inference rule ― An inference rule of premises f1,...,fk and conclusion g is written:** + +⟶ Règle d'inférence - Une règle d'inférence de prémisses f1,...,fk et de conclusion g s'écrit : + +
+ + +**25. Forward inference algorithm ― From a set of inference rules Rules, this algorithm goes through all possible f1,...,fk and adds g to the knowledge base KB if a matching rule exists. This process is repeated until no more additions can be made to KB.** + +⟶ Algorithme de chaînage avant (forward inference algorithm) - Partant d'un ensemble de règles d'inférence Rules, cet algorithme parcourt tous les f1,...,fk et ajoute g à la base de connaissance KB si une règle parvient à une telle conclusion. Cette démarche est répétée jusqu'à ce qu'aucun autre ajout ne puisse être fait à KB. + +
+
+
+**26. Derivation ― We say that KB derives f (written KB⊢f) with rules Rules if f already is in KB or gets added during the forward inference algorithm using the set of rules Rules.**
+
+⟶ Dérivation - On dit que KB dérive f (noté KB⊢f) avec les règles Rules soit si f est déjà dans KB, soit si elle est ajoutée pendant l'exécution de l'algorithme de chaînage avant utilisant l'ensemble de règles Rules.
+
+<br>
+ + +**27. Properties of inference rules ― A set of inference rules Rules can have the following properties:** + +⟶ Propriétés des règles d'inférence - Un ensemble de règles d'inférence Rules peut avoir les propriétés suivantes : + +
+ + +**28. [Name, Mathematical formulation, Notes]** + +⟶ [Nom, Formulation mathématique, Notes] + +
+ + +**29. [Soundness, Completeness]** + +⟶ [Correction, Complétude] + +
+
+
+**30. [Inferred formulas are entailed by KB, Can be checked one rule at a time, "Nothing but the truth", Formulas entailing KB are either already in the knowledge base or inferred from it, "The whole truth"]**
+
+⟶ [Les formules inférées sont déduites par KB, Peut être vérifiée une règle à la fois, "Rien que la vérité", Les formules déduites par KB sont soit déjà dans la base de connaissance, soit inférées de celle-ci, "La vérité dans sa totalité"]
+
+<br>
+ + +**31. Propositional logic** + +⟶ Logique propositionnelle + +
+ + +**32. In this section, we will go through logic-based models that use logical formulas and inference rules. The idea here is to balance expressivity and computational efficiency.** + +⟶ Dans cette section, nous allons parcourir les modèles logiques utilisant des formules logiques et des règles d'inférence. L'idée est de trouver le juste milieu entre expressivité et efficacité en termes de calculs. + +
+ + +**33. Horn clause ― By noting p1,...,pk and q propositional symbols, a Horn clause has the form:** + +⟶ Clause de Horn - En notant p1,...,pk et q des symboles propositionnels, une clause de Horn s'écrit : + +
+ + +**34. Remark: when q=false, it is called a "goal clause", otherwise we denote it as a "definite clause".** + +⟶ Remarque : quand q=false, cette clause de Horn est "négative", autrement elle est appelée "stricte". + +
+ + +**35. Modus ponens ― For propositional symbols f1,...,fk and p, the modus ponens rule is written:** + +⟶ Modus ponens - Sur les symboles propositionnels f1,...,fk et p, la règle de modus ponens est écrite : + +
+ + +**36. Remark: it takes linear time to apply this rule, as each application generate a clause that contains a single propositional symbol.** + +⟶ Remarque : l'application de cette règle se fait en temps linéaire, puisque chaque exécution génère une clause contenant un symbole propositionnel. + +
+ + +**37. Completeness ― Modus ponens is complete with respect to Horn clauses if we suppose that KB contains only Horn clauses and p is an entailed propositional symbol. Applying modus ponens will then derive p.** + +⟶ Complétude - Modus ponens est complet lorsqu'on le munit des clauses de Horn si l'on suppose que KB contient uniquement des clauses de Horn et que p est un symbole propositionnel qui est déduit. L'application de modus ponens dérivera alors p. + +
+ + +**38. Conjunctive normal form ― A conjunctive normal form (CNF) formula is a conjunction of clauses, where each clause is a disjunction of atomic formulas.** + +⟶ Forme normale conjonctive - La forme normale conjonctive (en anglais conjunctive normal form ou CNF) d'une formule est une conjonction de clauses, chacune d'entre elles étant une dijonction de formules atomiques. + +
+ + +**39. Remark: in other words, CNFs are ∧ of ∨.** + +⟶ Remarque : en d'autres termes, les CNFs sont des ∧ de ∨. + +
+ + +**40. Equivalent representation ― Every formula in propositional logic can be written into an equivalent CNF formula. The table below presents general conversion properties:** + +⟶ Représentation équivalente - Chaque formule en logique propositionnelle peut être écrite de manière équivalente sous la forme d'une formule CNF. Le tableau ci-dessous présente les propriétés principales permettant une telle conversion : + +
+ + +**41. [Rule name, Initial, Converted, Eliminate, Distribute, over]** + +⟶ [Nom de la règle, Début, Résultat, Élimine, Distribue, sur] + +
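The equivalent-representation claim of entries 40–41 can be checked mechanically. Below is a small illustrative example: the formula f = ¬(A ∧ (B → C)) and the CNF (¬A ∨ B) ∧ (¬A ∨ ¬C) obtained by hand with the rules above (eliminate →, apply De Morgan, distribute ∨ over ∧); both the formula and its CNF are assumptions chosen for the demonstration.

```python
from itertools import product

f   = lambda A, B, C: not (A and ((not B) or C))               # ¬(A ∧ (B → C))
cnf = lambda A, B, C: ((not A) or B) and ((not A) or (not C))  # (¬A ∨ B) ∧ (¬A ∨ ¬C)

# the two formulas agree on every one of the 2³ models, so they are equivalent
assert all(f(*m) == cnf(*m) for m in product((False, True), repeat=3))
print("equivalent on all 8 models")
```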
+ + +**42. Resolution rule ― For propositional symbols f1,...,fn, and g1,...,gm as well as p, the resolution rule is written:** + +⟶ Règle de résolution - Pour des symboles propositionnels f1,...,fn, et g1,...,gm ainsi que p, la règle de résolution s'écrit : + +
+ + +**43. Remark: it can take exponential time to apply this rule, as each application generates a clause that has a subset of the propositional symbols.** + +⟶ Remarque : l'application de cette règle peut prendre un temps exponentiel, vu que chaque itération génère une clause constituée d'une partie des symboles propositionnels. + +
+ + +**44. [Resolution-based inference ― The resolution-based inference algorithm follows the following steps:, Step 1: Convert all formulas into CNF, Step 2: Repeatedly apply resolution rule, Step 3: Return unsatisfiable if and only if False, is derived]** + +⟶ [Inférence basée sur la règle de résolution - L'algorithme d'inférence basée sur la règle de résolution se déroule en plusieurs étapes :, Étape 1 : Conversion de toutes les formules vers leur forme CNF, Étape 2 : Application répétée de la règle de résolution, Étape 3 : Renvoyer "non satisfaisable" si et seulement si False est dérivé] + +
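A single propositional resolution step (entries 42–44) can be sketched on clauses represented as frozensets of literals; the literal encoding and the example clauses are assumptions made for the illustration.

```python
def negate(lit):
    # a literal is either a symbol "p" or a tuple ("not", "p")
    return lit[1] if isinstance(lit, tuple) else ("not", lit)

def resolve(c1, c2):
    """All resolvents of the two clauses (an empty frozenset would signal False)."""
    out = []
    for lit in c1:
        if negate(lit) in c2:
            out.append((c1 - {lit}) | (c2 - {negate(lit)}))
    return out

c1 = frozenset({("not", "p"), "q"})      # ¬p ∨ q
c2 = frozenset({"p", "r"})               # p ∨ r
print(resolve(c1, c2))                   # [frozenset({'q', 'r'})], i.e. q ∨ r
```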
+ + +**45. First-order logic** + +⟶ Calcul des prédicats du premier ordre + +
+ + +**46. The idea here is to use variables to yield more compact knowledge representations.** + +⟶ L'idée ici est d'utiliser des variables et ainsi permettre une représentation des connaissances plus compacte. + +
+ + +**47. [Model ― A model w in first-order logic maps:, constant symbols to objects, predicate symbols to tuple of objects]** + +⟶ [Modèle - Un modèle w en calcul des prédicats du premier ordre lie :, des symboles constants à des objets, des prédicats à n-uplets d'objets] + +
+ + +**48. Horn clause ― By noting x1,...,xn variables and a1,...,ak,b atomic formulas, the first-order logic version of a horn clause has the form:** + +⟶ Clause de Horn - En notant x1,...,xn variables et a1,...,ak,b formules atomiques, une clause de Horn pour le calcul des prédicats du premier ordre a la forme : + +
+ + +**49. Substitution ― A substitution θ maps variables to terms and Subst[θ,f] denotes the result of substitution θ on f.** + +⟶ Substitution - Une substitution θ lie les variables aux termes et Subst[θ,f] désigne le résultat de la substitution θ sur f. + +
+ + +**50. Unification ― Unification takes two formulas f and g and returns the most general substitution θ that makes them equal:** + +⟶ Unification - Une unification prend deux formules f et g et renvoie la substitution θ la plus générale les rendant égales : + +
+ + +**51. such that** + +⟶ tel que + +
+ + +**52. Note: Unify[f,g] returns Fail if no such θ exists.** + +⟶ Note : Unify[f,g] renvoie Fail si un tel θ n'existe pas. + +
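A minimal unification sketch for entries 49–52, where variables are strings starting with "?", constants are other strings, and compound terms are tuples; the predicate names are made up, and the occurs check is deliberately omitted to keep the example short.

```python
def unify(f, g, theta=None):
    """Return the most general substitution making f and g equal, or None for Fail."""
    theta = dict(theta or {})
    def walk(t):                      # follow existing bindings
        while isinstance(t, str) and t.startswith("?") and t in theta:
            t = theta[t]
        return t
    f, g = walk(f), walk(g)
    if f == g:
        return theta
    if isinstance(f, str) and f.startswith("?"):
        theta[f] = g; return theta
    if isinstance(g, str) and g.startswith("?"):
        theta[g] = f; return theta
    if isinstance(f, tuple) and isinstance(g, tuple) and len(f) == len(g):
        for a, b in zip(f, g):
            theta = unify(a, b, theta)
            if theta is None:
                return None           # Fail
        return theta
    return None                       # Fail

# Unify[Knows(alice, ?x), Knows(?y, bob)] = {?y: alice, ?x: bob}
print(unify(("Knows", "alice", "?x"), ("Knows", "?y", "bob")))
print(unify(("P", "a"), ("P", "b")))  # None, i.e. Fail: no substitution exists
```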
+ + +**53. Modus ponens ― By noting x1,...,xn variables, a1,...,ak and a′1,...,a′k atomic formulas and by calling θ=Unify(a′1∧...∧a′k,a1∧...∧ak) the first-order logic version of modus ponens can be written:** + +⟶ Modus ponens - En notant x1,...,xn variables, a1,...,ak et a′1,...,a′k formules atomiques et en notant θ=Unify(a′1∧...∧a′k,a1∧...∧ak), modus ponens pour le calcul des prédicats du premier ordre s'écrit : + +
+ + +**54. Completeness ― Modus ponens is complete for first-order logic with only Horn clauses.** + +⟶ Complétude - Modus ponens est complet pour le calcul des prédicats du premier ordre lorsqu'il agit uniquement sur les clauses de Horn. + +
+
+
+**55. Resolution rule ― By noting f1,...,fn, g1,...,gm, p, q formulas and by calling θ=Unify(p,q), the first-order logic version of the resolution rule can be written:**
+
+⟶ Règle de résolution ― En notant f1,...,fn, g1,...,gm, p, q formules et en posant θ=Unify(p,q), la règle de résolution pour le calcul des prédicats du premier ordre s'écrit :
+
+<br>
+ + +**56. [Semi-decidability ― First-order logic, even restricted to only Horn clauses, is semi-decidable., if KB⊨f, forward inference on complete inference rules will prove f in finite time, if KB⊭f, no algorithm can show this in finite time]** + +⟶ [Semi-décidabilité - Le calcul des prédicats du premier ordre, même restreint aux clauses de Horn, n'est que semi-décidable., si KB⊨f, l'algorithme de chaînage avant sur des règles d'inférence complètes prouvera f en temps fini, si KB⊭f, aucun algorithme ne peut le prouver en temps fini] + +
+ + +**57. [Basics, Notations, Model, Interpretation function, Set of models]** + +⟶ [Bases, Notations, Modèle, Interprétation, Ensemble de modèles] + +
+ + +**58. [Knowledge base, Definition, Probabilistic interpretation, Satisfiability, Relationship with formulas, Forward inference, Rule properties]** + +⟶ [Base de connaissance, Définition, Interprétation en termes de probabilité, Satisfaisabilité, Lien avec les formules, Chaînage en avant, Propriétés des règles] + +
+ + +**59. [Propositional logic, Clauses, Modus ponens, Conjunctive normal form, Representation equivalence, Resolution]** + +⟶ [Logique propositionnelle, Clauses, Modus ponens, Forme normale conjonctive, Représentation équivalente, Résolution] + +
+ + +**60. [First-order logic, Substitution, Unification, Resolution rule, Modus ponens, Resolution, Semi-decidability]** + +⟶ [Calcul des prédicats du premier ordre, Substitution, Unification, Règle de résolution, Modus ponens, Résolution, Semi-décidabilité] + +
+ + +**61. View PDF version on GitHub** + +⟶ Voir la version PDF sur GitHub + +
+ + +**62. Original authors** + +⟶ Auteurs originaux. + +
+ + +**63. Translated by X, Y and Z** + +⟶ Traduit par X, Y et Z. + +
+ + +**64. Reviewed by X, Y and Z** + +⟶ Revu par X, Y et Z. + +
+ + +**65. By X and Y** + +⟶ Par X et Y. + +
+ + +**66. The Artificial Intelligence cheatsheets are now available in [target language].** + +⟶ Les pense-bêtes d'intelligence artificielle sont maintenant disponibles en français. From aff731c18ae6adb58794e8802fa0bcfe811269ad Mon Sep 17 00:00:00 2001 From: shervinea Date: Tue, 30 Jul 2019 23:44:05 -0700 Subject: [PATCH 291/531] Fix typo --- fr/cs-229-deep-learning.md | 4 +-- fr/cs-229-linear-algebra.md | 12 ++++----- fr/cs-229-machine-learning-tips-and-tricks.md | 2 +- fr/cs-229-probability.md | 4 +-- fr/cs-229-supervised-learning.md | 12 ++++----- fr/cs-229-unsupervised-learning.md | 14 +++++----- fr/cs-230-convolutional-neural-networks.md | 26 +++++++++---------- fr/cs-230-deep-learning-tips-and-tricks.md | 20 +++++++------- fr/cs-230-recurrent-neural-networks.md | 18 ++++++------- 9 files changed, 56 insertions(+), 56 deletions(-) diff --git a/fr/cs-229-deep-learning.md b/fr/cs-229-deep-learning.md index 4045d723c..56073a5e8 100644 --- a/fr/cs-229-deep-learning.md +++ b/fr/cs-229-deep-learning.md @@ -120,7 +120,7 @@ **21. Convolutional layer requirement ― By noting W the input volume size, F the size of the convolutional layer neurons, P the amount of zero padding, then the number of neurons N that fit in a given volume is such that:** -⟶ Pré-requis de la couche convolutionelle ― Si l'on note W la taille du volume d'entrée, F la taille de la couche de neurones convolutionelle, P la quantité de zero padding, alors le nombre de neurones N qui tient dans un volume donné est tel que : +⟶ Pré-requis de la couche convolutionnelle ― Si l'on note W la taille du volume d'entrée, F la taille de la couche de neurones convolutionnelle, P la quantité de zero padding, alors le nombre de neurones N qui tient dans un volume donné est tel que :
@@ -132,7 +132,7 @@ **23. It is usually done after a fully connected/convolutional layer and before a non-linearity layer and aims at allowing higher learning rates and reducing the strong dependence on initialization.** -⟶ Cela est normalement effectué après une couche fully-connected/couche convolutionelle et avant une couche de non-linéarité et a pour but de permettre un taux d'apprentissage plus grand et de réduire une dépendance trop forte à l'initialisation. +⟶ Cela est normalement effectué après une couche fully-connected/couche convolutionnelle et avant une couche de non-linéarité et a pour but de permettre un taux d'apprentissage plus grand et de réduire une dépendance trop forte à l'initialisation.
diff --git a/fr/cs-229-linear-algebra.md b/fr/cs-229-linear-algebra.md index 37329faa3..f1aea7efd 100644 --- a/fr/cs-229-linear-algebra.md +++ b/fr/cs-229-linear-algebra.md @@ -42,7 +42,7 @@ **8. Identity matrix ― The identity matrix I∈Rn×n is a square matrix with ones in its diagonal and zero everywhere else:** -⟶ Matrice identitée ― La matrice identitée I∈Rn×n est une matrice carrée avec des 1 sur sa diagonale et des 0 partout ailleurs : +⟶ Matrice identité ― La matrice identité I∈Rn×n est une matrice carrée avec des 1 sur sa diagonale et des 0 partout ailleurs :
@@ -150,7 +150,7 @@ **26. Trace ― The trace of a square matrix A, noted tr(A), is the sum of its diagonal entries:** -⟶ Trace ― La trace d'une matrice carée A, notée tr(A), est définie comme la somme de ses coefficients diagonaux: +⟶ Trace ― La trace d'une matrice carrée A, notée tr(A), est définie comme la somme de ses coefficients diagonaux:
@@ -186,7 +186,7 @@ **32. Symmetric decomposition ― A given matrix A can be expressed in terms of its symmetric and antisymmetric parts as follows:** -⟶ Décomposition symmétrique ― Une matrice donnée A peut être exprimée en termes de ses parties symétrique et antisymétrique de la manière suivante : +⟶ Décomposition symétrique ― Une matrice donnée A peut être exprimée en termes de ses parties symétrique et antisymétrique de la manière suivante :
@@ -252,7 +252,7 @@ **43. Remark: similarly, a matrix A is said to be positive definite, and is noted A≻0, if it is a PSD matrix which satisfies for all non-zero vector x, xTAx>0.** -⟶ Remarque : de manière similaire, une matrice A est dite définie positive et est notée A≻0 si elle est semi-définie positive et que pour tout vector x non-nul, on a xTAx>0. +⟶ Remarque : de manière similaire, une matrice A est dite définie positive et est notée A≻0 si elle est semi-définie positive et que pour tout vecteur x non-nul, on a xTAx>0.
@@ -264,7 +264,7 @@ **45. Spectral theorem ― Let A∈Rn×n. If A is symmetric, then A is diagonalizable by a real orthogonal matrix U∈Rn×n. By noting Λ=diag(λ1,...,λn), we have:** -⟶ Théorème spectral ― Soit A∈Rn×n. Si A est symmétrique, alors A est diagonalisable par une matrice orthogonale réelle U∈Rn×n. En notant Λ=diag(λ1,...,λn), on a : +⟶ Théorème spectral ― Soit A∈Rn×n. Si A est symétrique, alors A est diagonalisable par une matrice orthogonale réelle U∈Rn×n. En notant Λ=diag(λ1,...,λn), on a :
@@ -300,7 +300,7 @@ **51. Hessian ― Let f:Rn→R be a function and x∈Rn be a vector. The hessian of f with respect to x is a n×n symmetric matrix, noted ∇2xf(x), such that:** -⟶ Hessienne ― Soit f:Rn→R une fonction et x∈Rn un vecteur. La hessienne de f par rapport à x est une matrice symmetrique n×n, notée ∇2xf(x), telle que : +⟶ Hessienne ― Soit f:Rn→R une fonction et x∈Rn un vecteur. La hessienne de f par rapport à x est une matrice symétrique n×n, notée ∇2xf(x), telle que :
diff --git a/fr/cs-229-machine-learning-tips-and-tricks.md b/fr/cs-229-machine-learning-tips-and-tricks.md index d74182df0..2adf1db50 100644 --- a/fr/cs-229-machine-learning-tips-and-tricks.md +++ b/fr/cs-229-machine-learning-tips-and-tricks.md @@ -198,7 +198,7 @@ **34. [Shrinks coefficients to 0, Good for variable selection, Makes coefficients smaller, Tradeoff between variable selection and small coefficients]** -⟶ [Réduit les coefficients à 0, Bon pour la sélection de variables, Rend les coefficients plus petits, Compromis entre la selection de variables et la réduction de coefficients] +⟶ [Réduit les coefficients à 0, Bon pour la sélection de variables, Rend les coefficients plus petits, Compromis entre la sélection de variables et la réduction de coefficients]
diff --git a/fr/cs-229-probability.md b/fr/cs-229-probability.md index fe4562f80..8e407b9b2 100644 --- a/fr/cs-229-probability.md +++ b/fr/cs-229-probability.md @@ -36,7 +36,7 @@ **7. Axiom 2 ― The probability that at least one of the elementary events in the entire sample space will occur is 1, i.e:** -⟶ Axiome 2 ― La probabilité qu'au moins un des évènements élementaires de tout l'univers se produise est 1, i.e. +⟶ Axiome 2 ― La probabilité qu'au moins un des évènements élémentaires de tout l'univers se produise est 1, i.e.
@@ -120,7 +120,7 @@ **21. Random variable ― A random variable, often noted X, is a function that maps every element in a sample space to a real line.** -⟶ Variable aléatoire ― Une variable aléatoire, souvent notée X, est une fonction qui associe chaque élement de l'univers de probabilité à la droite des réels. +⟶ Variable aléatoire ― Une variable aléatoire, souvent notée X, est une fonction qui associe chaque élément de l'univers de probabilité à la droite des réels.
diff --git a/fr/cs-229-supervised-learning.md b/fr/cs-229-supervised-learning.md index 2f4850d1f..b79583323 100644 --- a/fr/cs-229-supervised-learning.md +++ b/fr/cs-229-supervised-learning.md @@ -42,7 +42,7 @@ **8. [Discriminative model, Generative model, Goal, What's learned, Illustration, Examples]** -⟶ [Modèle discriminatif, Modèle génératif, But, Ce qui est appris, Illustration, Exemples] +⟶ [Modèle discriminant, Modèle génératif, But, Ce qui est appris, Illustration, Exemples]
@@ -66,7 +66,7 @@ **12. Loss function ― A loss function is a function L:(z,y)∈R×Y⟼L(z,y)∈R that takes as inputs the predicted value z corresponding to the real data value y and outputs how different they are. The common loss functions are summed up in the table below:** -⟶ Fonction de loss ― Une fonction de loss est une fonction L:(z,y)∈R×Y⟼L(z,y)∈R prennant comme entrée une valeur prédite z correspondant à une valeur réelle y, et nous renseigne sur la ressemblance de ces deux valeurs. Les fonctions de loss courantes sont récapitulées dans le tableau ci-dessous : +⟶ Fonction de loss ― Une fonction de loss est une fonction L:(z,y)∈R×Y⟼L(z,y)∈R prenant comme entrée une valeur prédite z correspondant à une valeur réelle y, et nous renseigne sur la ressemblance de ces deux valeurs. Les fonctions de loss courantes sont récapitulées dans le tableau ci-dessous :
@@ -138,7 +138,7 @@ **24. Normal equations ― By noting X the matrix design, the value of θ that minimizes the cost function is a closed-form solution such that:** -⟶ Équations normales ― En notant X la matrice de design, la valeur de θ qui minimize la fonction de cost a une solution de forme fermée tel que : +⟶ Équations normales ― En notant X la matrice de design, la valeur de θ qui minimise la fonction de cost a une solution de forme fermée tel que :
@@ -186,7 +186,7 @@ **32. Softmax regression ― A softmax regression, also called a multiclass logistic regression, is used to generalize logistic regression when there are more than 2 outcome classes. By convention, we set θK=0, which makes the Bernoulli parameter ϕi of each class i equal to:** -⟶ Régression softmax ― Une régression softmax, aussi appelée un régression logistique multiclasse, est utilisée pour généraliser la régression logistique lorsqu'il y a plus de 2 classes à prédire. Par convention, on fixe θK=0, ce qui oblige le paramètre de Bernoulli ϕi de chaque classe i à être égal à : +⟶ Régression softmax ― Une régression softmax, aussi appelée un régression logistique multi-classe, est utilisée pour généraliser la régression logistique lorsqu'il y a plus de 2 classes à prédire. Par convention, on fixe θK=0, ce qui oblige le paramètre de Bernoulli ϕi de chaque classe i à être égal à :
@@ -210,7 +210,7 @@ **36. Here are the most common exponential distributions summed up in the following table:** -⟶ Les distributions exponentielles les plus communémment rencontrées sont récapitulées dans le tableau ci-dessous : +⟶ Les distributions exponentielles les plus communément rencontrées sont récapitulées dans le tableau ci-dessous :
@@ -324,7 +324,7 @@ **55. A generative model first tries to learn how the data is generated by estimating P(x|y), which we can then use to estimate P(y|x) by using Bayes' rule.** -⟶ Un modèle génératif essaie d'abord d'apprendre comment les données sont générées en estimant P(x|y), nous permettant ensuite d'estimer P(y|x) par le biais du théorème de Bayes. +⟶ Un modèle génératif essaie d'abord d'apprendre comment les données sont générées en estimant P(x|y), nous permettant ensuite d'estimer P(y|x) par le biais du théorème de Bayes.
diff --git a/fr/cs-229-unsupervised-learning.md b/fr/cs-229-unsupervised-learning.md index f64268a4b..7757f9539 100644 --- a/fr/cs-229-unsupervised-learning.md +++ b/fr/cs-229-unsupervised-learning.md @@ -12,7 +12,7 @@ **3. Motivation ― The goal of unsupervised learning is to find hidden patterns in unlabeled data {x(1),...,x(m)}.** -⟶ Motivation ― Le but de l'apprentissage non-supervisé est de trouver des formes cachées dans un jeu de données non-labelées {x(1),...,x(m)}. +⟶ Motivation ― Le but de l'apprentissage non-supervisé est de trouver des formes cachées dans un jeu de données non annotées {x(1),...,x(m)}.
@@ -66,7 +66,7 @@ **12. M-step: Use the posterior probabilities Qi(z(i)) as cluster specific weights on data points x(i) to separately re-estimate each cluster model as follows:** -⟶ M-step : Utiliser les probabilités postérieures Qi(z(i)) en tant que coefficients propres aux partitions sur les points x(i) pour ré-estimer séparemment chaque modèle de partition de la manière suivante : +⟶ M-step : Utiliser les probabilités postérieures Qi(z(i)) en tant que coefficients propres aux partitions sur les points x(i) pour ré-estimer séparément chaque modèle de partition de la manière suivante :
@@ -102,7 +102,7 @@ **18. Distortion function ― In order to see if the algorithm converges, we look at the distortion function defined as follows:** -⟶ Fonction de distortion ― Pour voir si l'algorithme converge, on regarde la fonction de distortion définie de la manière suivante : +⟶ Fonction de distorsion ― Pour voir si l'algorithme converge, on regarde la fonction de distorsion définie de la manière suivante :
@@ -192,7 +192,7 @@ **33. Spectral theorem ― Let A∈Rn×n. If A is symmetric, then A is diagonalizable by a real orthogonal matrix U∈Rn×n. By noting Λ=diag(λ1,...,λn), we have:** -⟶ Théorème spectral ― Soit A∈Rn×n. Si A est symmétrique, alors A est diagonalisable par une matrice réelle orthogonale U∈Rn×n. En notant Λ=diag(λ1,...,λn), on a : +⟶ Théorème spectral ― Soit A∈Rn×n. Si A est symétrique, alors A est diagonalisable par une matrice réelle orthogonale U∈Rn×n. En notant Λ=diag(λ1,...,λn), on a :
@@ -222,7 +222,7 @@ **38. Step 2: Compute Σ=1mm∑i=1x(i)x(i)T∈Rn×n, which is symmetric with real eigenvalues.** -⟶ Étape 2 : Calculer Σ=1mm∑i=1x(i)x(i)T∈Rn×n, qui est symmétrique et aux valeurs propres réelles. +⟶ Étape 2 : Calculer Σ=1mm∑i=1x(i)x(i)T∈Rn×n, qui est symétrique et aux valeurs propres réelles.
@@ -264,7 +264,7 @@ **45. Assumptions ― We assume that our data x has been generated by the n-dimensional source vector s=(s1,...,sn), where si are independent random variables, via a mixing and non-singular matrix A as follows:** -⟶ Hypothèses ― On suppose que nos données x ont été générées par un vecteur source à n dimensions s=(s1,...,sn), où les si sont des variables aléatoires indépendantes, par le biais d'une matrice de mélange et inversible A de la manière suivante : +⟶ Hypothèses ― On suppose que nos données x ont été générées par un vecteur source à n dimensions s=(s1,...,sn), où les si sont des variables aléatoires indépendantes, par le biais d'une matrice de mélange et inversible A de la manière suivante :
@@ -294,4 +294,4 @@ **50. Therefore, the stochastic gradient ascent learning rule is such that for each training example x(i), we update W as follows:** -⟶ Par conséquent, l'algorithme du gradient stochastique est tel que pour chaque example de ensemble d'apprentissage x(i), on met à jour W de la manière suivante : +⟶ Par conséquent, l'algorithme du gradient stochastique est tel que pour chaque exemple de ensemble d'apprentissage x(i), on met à jour W de la manière suivante : diff --git a/fr/cs-230-convolutional-neural-networks.md b/fr/cs-230-convolutional-neural-networks.md index 3cdace39e..29cca030e 100644 --- a/fr/cs-230-convolutional-neural-networks.md +++ b/fr/cs-230-convolutional-neural-networks.md @@ -158,7 +158,7 @@ **23. Filter hyperparameters** -⟶ Paramètres du filtre +⟶ Paramètres du filtre
@@ -200,7 +200,7 @@ **29. Zero-padding ― Zero-padding denotes the process of adding P zeroes to each side of the boundaries of the input. This value can either be manually specified or automatically set through one of the three modes detailed below:** -⟶ Zero-padding ― Le zero-padding est une technique consistant à ajouter P zeros à chaque côté des frontières de l'entrée. Cette valeur peut être spécifiée soit manuellement, soit automatiquement par le bias d'une des configurations détaillées ci-dessous : +⟶ Zero-padding ― Le zero-padding est une technique consistant à ajouter P zeros à chaque côté des frontières de l'entrée. Cette valeur peut être spécifiée soit manuellement, soit automatiquement par le biais d'une des configurations détaillées ci-dessous :
@@ -277,7 +277,7 @@ **40. [Input is flattened, One bias parameter per neuron, The number of FC neurons is free of structural constraints]** -⟶ [L'entrée est aplatie, Un paramètre de bias par neurone, Le choix du nombre de neurones de FC est libre] +⟶ [L'entrée est aplatie, Un paramètre de biais par neurone, Le choix du nombre de neurones de FC est libre]
@@ -305,7 +305,7 @@ **44. Rectified Linear Unit ― The rectified linear unit layer (ReLU) is an activation function g that is used on all elements of the volume. It aims at introducing non-linearities to the network. Its variants are summarized in the table below:** -⟶ Unité linéaire rectifiée ― La couche d'unité linéaire rectifiée (en anglais rectified linear unit layer) (ReLU) est une fonction d'activiation g qui est utilisée sur tous les éléments du volume. Elle a pour but d'introduire des complexités non-linéaires au réseau. Ses variantes sont récapitulées dans le tableau suivant : +⟶ Unité linéaire rectifiée ― La couche d'unité linéaire rectifiée (en anglais rectified linear unit layer) (ReLU) est une fonction d'activation g qui est utilisée sur tous les éléments du volume. Elle a pour but d'introduire des complexités non-linéaires au réseau. Ses variantes sont récapitulées dans le tableau suivant :
@@ -319,7 +319,7 @@ **46. [Non-linearity complexities biologically interpretable, Addresses dying ReLU issue for negative values, Differentiable everywhere]** -⟶ [Complexités non-linéaires intereprétables d'un point de vue biologique, Repond au problème de dying ReLU, Dérivable partout] +⟶ [Complexités non-linéaires interprétables d'un point de vue biologique, Répond au problème de dying ReLU, Dérivable partout]
@@ -368,7 +368,7 @@ **53. [Classifies a picture, Predicts probability of object, Detects an object in a picture, Predicts probability of object and where it is located, Detects up to several objects in a picture, Predicts probabilities of objects and where they are located]** -⟶ [Classifie une image, Predit la probabilité d'un objet, Détecte un objet dans une image, Prédit la probabilité de présence d'un objet et où il est situé, Peut détecter plusieurs objets dans une image, Prédit les probabilités de présence des objets et où ils sont situés] +⟶ [Classifie une image, Prédit la probabilité d'un objet, Détecte un objet dans une image, Prédit la probabilité de présence d'un objet et où il est situé, Peut détecter plusieurs objets dans une image, Prédit les probabilités de présence des objets et où ils sont situés]
@@ -382,14 +382,14 @@ **55. Detection ― In the context of object detection, different methods are used depending on whether we just want to locate the object or detect a more complex shape in the image. The two main ones are summed up in the table below:** -⟶ Détection ― Dans le contexte de la détection d'objet, des methodes différentes sont utilisées selon si l'on veut juste localiser l'objet ou alors détecter une forme plus complexe dans l'image. Les deux méthodes principales sont résumées dans le tableau ci-dessous : +⟶ Détection ― Dans le contexte de la détection d'objet, des méthodes différentes sont utilisées selon si l'on veut juste localiser l'objet ou alors détecter une forme plus complexe dans l'image. Les deux méthodes principales sont résumées dans le tableau ci-dessous :
**56. [Bounding box detection, Landmark detection]** -⟶ [Détection de zone délimitante, Detection de forme complexe] +⟶ [Détection de zone délimitante, Détection de forme complexe]
@@ -424,14 +424,14 @@ **61. Anchor boxes ― Anchor boxing is a technique used to predict overlapping bounding boxes. In practice, the network is allowed to predict more than one box simultaneously, where each box prediction is constrained to have a given set of geometrical properties. For instance, the first prediction can potentially be a rectangular box of a given form, while the second will be another rectangular box of a different geometrical form.** -⟶ Zone d'accroche ― La technique des zones d'accroche (en anglais anchor boxing) sert à prédire des zones délimitantes qui se chevauchent. En pratique, on permet au réseau de prédire plus d'une zone délimitante simultanément, où chaque zone prédite doit respecter une forme géométrique particulière. Par example, la première prédiction peut potentiellement être une zone rectangulaire d'une forme donnée, tandis qu'une seconde prédiction doit être une zone rectangulaire d'une autre forme. +⟶ Zone d'accroche ― La technique des zones d'accroche (en anglais anchor boxing) sert à prédire des zones délimitantes qui se chevauchent. En pratique, on permet au réseau de prédire plus d'une zone délimitante simultanément, où chaque zone prédite doit respecter une forme géométrique particulière. Par exemple, la première prédiction peut potentiellement être une zone rectangulaire d'une forme donnée, tandis qu'une seconde prédiction doit être une zone rectangulaire d'une autre forme.
**62. Non-max suppression ― The non-max suppression technique aims at removing duplicate overlapping bounding boxes of a same object by selecting the most representative ones. After having removed all boxes having a probability prediction lower than 0.6, the following steps are repeated while there are boxes remaining:** -⟶ Suppression non-max ― La technique de suppression non-max (en anglais non-max suppression) a pour but d'enlever des zones délimitantes qui se chevauchent et qui prédisent un seul et même objet, en sélectionnant les zones les plus representatives. Après avoir enlevé toutes les zones ayant une probabilité prédite de moins de 0.6, les étapes suivantes sont répétées pour éliminer les zones redondantes : +⟶ Suppression non-max ― La technique de suppression non-max (en anglais non-max suppression) a pour but d'enlever des zones délimitantes qui se chevauchent et qui prédisent un seul et même objet, en sélectionnant les zones les plus représentatives. Après avoir enlevé toutes les zones ayant une probabilité prédite de moins de 0.6, les étapes suivantes sont répétées pour éliminer les zones redondantes :
@@ -466,7 +466,7 @@ **67. where pc is the probability of detecting an object, bx,by,bh,bw are the properties of the detected bouding box, c1,...,cp is a one-hot representation of which of the p classes were detected, and k is the number of anchor boxes.** -⟶ où pc est la probabilité de détecter un objet, bx,by,bh,bw sont les propriétés de la zone délimitante détectée, c1,...,cp est une répresentation binaire (en anglais one-hot representation) de l'une des p classes détectée, et k est le nombre de zones d'accroche. +⟶ où pc est la probabilité de détecter un objet, bx,by,bh,bw sont les propriétés de la zone délimitante détectée, c1,...,cp est une représentation binaire (en anglais one-hot representation) de l'une des p classes détectée, et k est le nombre de zones d'accroche.
@@ -480,7 +480,7 @@ **69. [Original image, Division in GxG grid, Bounding box prediction, Non-max suppression]** -⟶ [Image originale, Division en une grille de taille GxG, Prediction de zone délimitante, Suppression non-max] +⟶ [Image originale, Division en une grille de taille GxG, Prédiction de zone délimitante, Suppression non-max]
@@ -550,7 +550,7 @@ **79. Siamese Network ― Siamese Networks aim at learning how to encode images to then quantify how different two images are. For a given input image x(i), the encoded output is often noted as f(x(i)).** -⟶ Réseaux siamois ― Les réseaux siamois (en anglais Siamese Networks) ont pour but d'apprendre comment encoder des images pour quantifier le degré de difference de deux images données. Pour une image d'entrée donnée x(i), l'encodage de sortie est souvent notée f(x(i)). +⟶ Réseaux siamois ― Les réseaux siamois (en anglais Siamese Networks) ont pour but d'apprendre comment encoder des images pour quantifier le degré de différence de deux images données. Pour une image d'entrée donnée x(i), l'encodage de sortie est souvent notée f(x(i)).
diff --git a/fr/cs-230-deep-learning-tips-and-tricks.md b/fr/cs-230-deep-learning-tips-and-tricks.md index de05f1f40..4c84b51f4 100644 --- a/fr/cs-230-deep-learning-tips-and-tricks.md +++ b/fr/cs-230-deep-learning-tips-and-tricks.md @@ -25,7 +25,7 @@ **4. [Data processing, Data augmentation, Batch normalization]** -⟶ [Traitement des données, Augmentation des données, Normalization de lot] +⟶ [Traitement des données, Augmentation des données, Normalisation de lot]
@@ -39,7 +39,7 @@ **6. [Parameter tuning, Xavier initialization, Transfer learning, Learning rate, Adaptive learning rates]** -⟶ [Ajustement de paramètres, Initialisation de Xavier, Apprentissage par transfert, Taux d'apprentissage, Taux d'apprentissage adaptifs] +⟶ [Ajustement de paramètres, Initialisation de Xavier, Apprentissage par transfert, Taux d'apprentissage, Taux d'apprentissage adaptatifs]
@@ -81,14 +81,14 @@ **12. [Original, Flip, Rotation, Random crop]** -⟶ [Original, Symmétrie axiale, Rotation, Recadrage aléatoire] +⟶ [Original, Symétrie axiale, Rotation, Recadrage aléatoire]
**13. [Image without any modification, Flipped with respect to an axis for which the meaning of the image is preserved, Rotation with a slight angle, Simulates incorrect horizon calibration, Random focus on one part of the image, Several random crops can be done in a row]** -⟶ [Image sans aucune modification, Symmetrie par rapport à un axe pour lequel le sens de l'image est conservé, Rotation avec un petit angle, Reproduit une calibration imparfaite de l'horizon, Concentration aléatoire sur une partie de l'image, Plusieurs rognements aléatoires peuvent être faits à la suite] +⟶ [Image sans aucune modification, Symétrie par rapport à un axe pour lequel le sens de l'image est conservé, Rotation avec un petit angle, Reproduit une calibration imparfaite de l'horizon, Concentration aléatoire sur une partie de l'image, Plusieurs rognements aléatoires peuvent être faits à la suite]
@@ -144,7 +144,7 @@ **21. Epoch ― In the context of training a model, epoch is a term used to refer to one iteration where the model sees the whole training set to update its weights.** -⟶ Epoch ― Dans le contexte de l'entraînement d'un modèle, l'epoch est un terme utilisé pour réferer à une itération où le modèle voit tout le training set pour mettre à jour ses coefficients. +⟶ Epoch ― Dans le contexte de l'entraînement d'un modèle, l'epoch est un terme utilisé pour référer à une itération où le modèle voit tout le training set pour mettre à jour ses coefficients.
@@ -235,7 +235,7 @@ **34. Transfer learning ― Training a deep learning model requires a lot of data and more importantly a lot of time. It is often useful to take advantage of pre-trained weights on huge datasets that took days/weeks to train, and leverage it towards our use case. Depending on how much data we have at hand, here are the different ways to leverage this:** -⟶ Apprentissage de transfert ― Entraîner un modèle d'apprentissage profond requière beaucoup de données et beaucoup de temps. Il est souvent utile de profiter de coefficients pre-entraînés sur des données énormes qui ont pris des jours/semaines pour être entraînés, et profiter de cela pour notre cas. Selon la quantité de données que l'on a sous la main, voici différentes manières d'utiliser cette methode : +⟶ Apprentissage de transfert ― Entraîner un modèle d'apprentissage profond requière beaucoup de données et beaucoup de temps. Il est souvent utile de profiter de coefficients pre-entraînés sur des données énormes qui ont pris des jours/semaines pour être entraînés, et profiter de cela pour notre cas. Selon la quantité de données que l'on a sous la main, voici différentes manières d'utiliser cette méthode :
@@ -333,7 +333,7 @@ **48. Remark: most deep learning frameworks parametrize dropout through the 'keep' parameter 1−p.** -⟶ Remarque : la plupart des frameworks d'apprentissage profond paramètrent le dropout à travers le paramètre 'garder' 1-p. +⟶ Remarque : la plupart des frameworks d'apprentissage profond paramétrisent le dropout à travers le paramètre 'garder' 1-p.
@@ -387,7 +387,7 @@ **55. Gradient checking ― Gradient checking is a method used during the implementation of the backward pass of a neural network. It compares the value of the analytical gradient to the numerical gradient at given points and plays the role of a sanity-check for correctness.** -⟶ Gradient checking ― La méthode de gradient checking est utilisée durant l'implémentation d'un backward pass d'un réseau de neurones. Elle compare la valeur du gradient analytique par rapport au gradient numérique au niveau de certains points et joue un rôle de vérification élementaire. +⟶ Gradient checking ― La méthode de gradient checking est utilisée durant l'implémentation d'un backward pass d'un réseau de neurones. Elle compare la valeur du gradient analytique par rapport au gradient numérique au niveau de certains points et joue un rôle de vérification élémentaire.
@@ -415,14 +415,14 @@ **59. ['Exact' result, Direct computation, Used in the final implementation]** -⟶ [Resultat 'exact', Calcul direct, Utilisé dans l'implémentation finale] +⟶ [Résultat 'exact', Calcul direct, Utilisé dans l'implémentation finale]
**60. The Deep Learning cheatsheets are now available in [target language].** -⟶ Les pense-bêtes d'appentissage profond sont maintenant disponibles en français. +⟶ Les pense-bêtes d'apprentissage profond sont maintenant disponibles en français.
diff --git a/fr/cs-230-recurrent-neural-networks.md b/fr/cs-230-recurrent-neural-networks.md index 88f1ccec3..e7d8f5343 100644 --- a/fr/cs-230-recurrent-neural-networks.md +++ b/fr/cs-230-recurrent-neural-networks.md @@ -74,7 +74,7 @@ **11. Architecture of a traditional RNN ― Recurrent neural networks, also known as RNNs, are a class of neural networks that allow previous outputs to be used as inputs while having hidden states. They are typically as follows:** -⟶ Architecture d'un RNN traditionnel ― Les réseaux de neurones récurrents (en anglais recurrent neural networks), aussi appelés RNNs, sont une classe de réseaux de neurones qui permettent aux prédictions antérieures d'être utilisées comme entrées, par le bias d'états cachés (en anglais hidden states). Ils sont de la forme suivante : +⟶ Architecture d'un RNN traditionnel ― Les réseaux de neurones récurrents (en anglais recurrent neural networks), aussi appelés RNNs, sont une classe de réseaux de neurones qui permettent aux prédictions antérieures d'être utilisées comme entrées, par le biais d'états cachés (en anglais hidden states). Ils sont de la forme suivante :
@@ -95,7 +95,7 @@ **14. where Wax,Waa,Wya,ba,by are coefficients that are shared temporally and g1,g2 activation functions.** -⟶ où Wax,Waa,Wya,ba,by sont des coefficients indépendents du temps et où g1,g2 sont des fonctions d'activation. +⟶ où Wax,Waa,Wya,ba,by sont des coefficients indépendants du temps et où g1,g2 sont des fonctions d'activation.
@@ -109,7 +109,7 @@ **16. [Advantages, Possibility of processing input of any length, Model size not increasing with size of input, Computation takes into account historical information, Weights are shared across time]** -⟶ [Avantages, Possibilité de prendre en compte des entrées de toute taille, La taille du modèle n'augmente pas avec la taille de l'entrée, Les calculs prennent en compte les informations antérieures, Les coefficients sont indépendents du temps] +⟶ [Avantages, Possibilité de prendre en compte des entrées de toute taille, La taille du modèle n'augmente pas avec la taille de l'entrée, Les calculs prennent en compte les informations antérieures, Les coefficients sont indépendants du temps]
@@ -144,7 +144,7 @@ **21. [Traditional neural network, Music generation, Sentiment classification, Name entity recognition, Machine translation]** -⟶ [Réseau de neurones traditionnel, Géneration de musique, Classification de sentiment, Reconnaissance d'entité, Traduction machine] +⟶ [Réseau de neurones traditionnel, Génération de musique, Classification de sentiment, Reconnaissance d'entité, Traduction machine]
@@ -249,7 +249,7 @@ **36. GRU/LSTM ― Gated Recurrent Unit (GRU) and Long Short-Term Memory units (LSTM) deal with the vanishing gradient problem encountered by traditional RNNs, with LSTM being a generalization of GRU. Below is a table summing up the characterizing equations of each architecture:** -⟶ GRU/LSTM ― Les unités de porte récurrente (en anglais Gated Recurrent Unit) (GRU) et les unités de mémoire à long/court terme (en anglais Long Short-Term Memory units) (LSTM) appaisent le problème du gradient qui disparait rencontré par les RNNs traditionnels, où le LSTM peut être vu comme étant une généralisation du GRU. Le tableau ci-dessous résume les équations caractéristiques de chacune de ces architectures : +⟶ GRU/LSTM ― Les unités de porte récurrente (en anglais Gated Recurrent Unit) (GRU) et les unités de mémoire à long/court terme (en anglais Long Short-Term Memory units) (LSTM) apaisent le problème du gradient qui disparait rencontré par les RNNs traditionnels, où le LSTM peut être vu comme étant une généralisation du GRU. Le tableau ci-dessous résume les équations caractéristiques de chacune de ces architectures :
@@ -270,7 +270,7 @@ **39. Variants of RNNs ― The table below sums up the other commonly used RNN architectures:** -⟶ Variantes des RNNs ― Le tableau ci-dessous récapitule les autres architectures RNN commumément utilisées : +⟶ Variantes des RNNs ― Le tableau ci-dessous récapitule les autres architectures RNN communément utilisées :
@@ -412,7 +412,7 @@ Given the symmetry that e and θ play in this model, the final word embedding e(final)w is given by:** ⟶ où f est une fonction à coefficients telle que Xi,j=0⟹f(Xi,j)=0. -Étant donné la symmétrie que e et θ ont dans un modèle, la représentation du mot final e(final)w est donnée par : +Étant donné la symétrie que e et θ ont dans un modèle, la représentation du mot final e(final)w est donnée par :
@@ -531,7 +531,7 @@ Given the symmetry that e and θ play in this model, the final word embedding e( **75. Beam width ― The beam width B is a parameter for beam search. Large values of B yield to better result but with slower performance and increased memory. Small values of B lead to worse results but is less computationally intensive. A standard value for B is around 10.** -⟶ Largeur du faisceau ― La largeur du faisceau (en anglais beam width) B est un paramètre de la recherche en faisceau. De grandes valeurs de B conduisent à avoir de meilleurs résultats mais avec un coût de mémoire plus lourd et à un temps de calcul plus long. De faibles valeurs de B conduisent à de moins bons résultats mais avec un coût de calcul plus faible. Une valeur de B égale à 10 est standarde et est souvent utilisée. +⟶ Largeur du faisceau ― La largeur du faisceau (en anglais beam width) B est un paramètre de la recherche en faisceau. De grandes valeurs de B conduisent à avoir de meilleurs résultats mais avec un coût de mémoire plus lourd et à un temps de calcul plus long. De faibles valeurs de B conduisent à de moins bons résultats mais avec un coût de calcul plus faible. Une valeur de B égale à 10 est standard et est souvent utilisée.
@@ -580,7 +580,7 @@ Given the symmetry that e and θ play in this model, the final word embedding e( **82. where pn is the bleu score on n-gram only defined as follows:** -⟶ où pn est le score bleu uniqué basé sur les n-gram, défini par : +⟶ où pn est le score bleu uniquement basé sur les n-gram, défini par :
From 0f6f18e323b0636d76884589198ec479b7b2b14d Mon Sep 17 00:00:00 2001 From: shervinea Date: Wed, 31 Jul 2019 16:40:39 -0700 Subject: [PATCH 292/531] Add reflex translation in [fr] --- fr/cs-221-reflex-models.md | 539 +++++++++++++++++++++++++++++++++++++ 1 file changed, 539 insertions(+) create mode 100644 fr/cs-221-reflex-models.md diff --git a/fr/cs-221-reflex-models.md b/fr/cs-221-reflex-models.md new file mode 100644 index 000000000..0ec1fd159 --- /dev/null +++ b/fr/cs-221-reflex-models.md @@ -0,0 +1,539 @@ +**Reflex-based models translation** [[webpage]](https://stanford.edu/~shervine/teaching/cs-221/cheatsheet-reflex-models) + +

**1. Reflex-based models with Machine Learning**

⟶ Modèles basés sur le réflexe à l'aide de l'apprentissage automatique

<br>
+ + +**2. Linear predictors** + +⟶ Prédicteurs linéaires + +

**3. In this section, we will go through reflex-based models that can improve with experience, by going through samples that have input-output pairs.**

⟶ Dans cette section, nous allons explorer les modèles basés sur le réflexe qui peuvent s'améliorer avec l'expérience, en s'appuyant sur des données ayant une correspondance entrée-sortie.

<br>
+ + +**4. Feature vector ― The feature vector of an input x is noted ϕ(x) and is such that:** + +⟶ Vecteur caractéristique - Le vecteur caractéristique (en anglais feature vector) d'une entrée x est noté ϕ(x) et se décompose en : + +
+ + +**5. Score ― The score s(x,w) of an example (ϕ(x),y)∈Rd×R associated to a linear model of weights w∈Rd is given by the inner product:** + +⟶ Score - Le score s(x,w) d'un exemple (ϕ(x),y)∈Rd×R associé à un modèle linéaire de paramètres w∈Rd est donné par le produit scalaire : + +
+ + +**6. Classification** + +⟶ Classification + +
+ + +**7. Linear classifier ― Given a weight vector w∈Rd and a feature vector ϕ(x)∈Rd, the binary linear classifier fw is given by:** + +⟶ Classifieur linéaire - Étant donnés un vecteur de paramètres w∈Rd et un vecteur caractéristique ϕ(x)∈Rd, le classifieur linéaire binaire est donné par : + +
+ + +**8. if** + +⟶ si + +
+ + +**9. Margin ― The margin m(x,y,w)∈R of an example (ϕ(x),y)∈Rd×{−1,+1} associated to a linear model of weights w∈Rd quantifies the confidence of the prediction: larger values are better. It is given by:** + +⟶ Marge - La marge (en anglais margin) m(x,y,w)∈R d'un exemple (ϕ(x),y)∈Rd×{−1,+1} associée à un modèle linéaire de paramètre w∈Rd quantifie la confiance associée à une prédiction : plus cette valeur est grande, mieux c'est. Cette quantité est donnée par : + +
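
To make the score, classifier and margin definitions above concrete, here is a minimal Python sketch; the feature vector, weights and label below are made-up toy values, not taken from the cheatsheet.

```python
import numpy as np

def score(phi_x, w):
    # s(x, w) = inner product between the feature vector and the weights
    return float(np.dot(w, phi_x))

def classify(phi_x, w):
    # binary linear classifier f_w(x) = sign(s(x, w))
    return 1 if score(phi_x, w) >= 0 else -1

def margin(phi_x, y, w):
    # m(x, y, w) = s(x, w) * y ; larger values mean a more confident, correct prediction
    return score(phi_x, w) * y

# toy example (made-up numbers)
w = np.array([0.5, -1.0, 2.0])
phi_x = np.array([1.0, 0.0, 1.5])
y = 1
print(classify(phi_x, w), margin(phi_x, y, w))
```

<br>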
+ + +**10. Regression** + +⟶ Régression + +
+ + +**11. Linear regression ― Given a weight vector w∈Rd and a feature vector ϕ(x)∈Rd, the output of a linear regression of weights w denoted as fw is given by:** + +⟶ Régression linéaire - Étant donnés un vecteur de paramètres w∈Rd et un vecteur caractéristique ϕ(x)∈Rd, le résultat d'une régression linéaire de paramètre w, notée fw, est donné par : + +
+ + +**12. Residual ― The residual res(x,y,w)∈R is defined as being the amount by which the prediction fw(x) overshoots the target y:** + +⟶ Résidu - Le résidu res(x,y,w)∈R est défini comme étant la différence entre la prédiction fw(x) et la vraie valeur y. + +
+ + +**13. Loss minimization** + +⟶ Minimisation de la fonction objectif + +
+ + +**14. Loss function ― A loss function Loss(x,y,w) quantifies how unhappy we are with the weights w of the model in the prediction task of output y from input x. It is a quantity we want to minimize during the training process.** + +⟶ Fonction objectif - Une fonction objectif (en anglais loss function) Loss(x,y,w) traduit notre niveau d'insatisfaction avec les paramètres w du modèle dans la tâche de prédiction de la sortie y à partir de l'entrée x. C'est une quantité que l'on souhaite minimiser pendant la phase d'entraînement. + +

**15. Classification case - The classification of a sample x of true label y∈{−1,+1} with a linear model of weights w can be done with the predictor fw(x)≜sign(s(x,w)). In this situation, a metric of interest quantifying the quality of the classification is given by the margin m(x,y,w), and can be used with the following loss functions:**

⟶ Cas de la classification - Trouver la classe d'un exemple x de vraie étiquette y∈{−1,+1} peut être fait par le biais d'un modèle linéaire de paramètre w à l'aide du prédicteur fw(x)≜sign(s(x,w)). La qualité de cette prédiction peut alors être évaluée au travers de la marge m(x,y,w) intervenant dans les fonctions objectif suivantes :

<br>
+ + +**16. [Name, Illustration, Zero-one loss, Hinge loss, Logistic loss]** + +⟶ [Nom, Illustration, Fonction objectif zéro-un, Fonction objectif de Hinge, Fonction objectif logistique] + +

**17. Regression case - The prediction of a sample x of true label y∈R with a linear model of weights w can be done with the predictor fw(x)≜s(x,w). In this situation, a metric of interest quantifying the quality of the regression is given by the margin res(x,y,w) and can be used with the following loss functions:**

⟶ Cas de la régression - Prédire la valeur y∈R associée à l'exemple x peut être fait par le biais d'un modèle linéaire de paramètre w à l'aide du prédicteur fw(x)≜s(x,w). La qualité de cette prédiction peut alors être évaluée au travers du résidu res(x,y,w) intervenant dans les fonctions objectif suivantes :

<br>
+ + +**18. [Name, Squared loss, Absolute deviation loss, Illustration]** + +⟶ [Nom, Erreur quadratique, Erreur absolue, Illustration] + +
+ + +**19. Loss minimization framework ― In order to train a model, we want to minimize the training loss is defined as follows:** + +⟶ Processus de minimisation de la fonction objectif - Lors de l'entraînement d'un modèle, on souhaite minimiser la valeur de la fonction objectif évaluée sur l'ensemble d'entraînement : + +
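
The loss functions and the training-loss framework above can be sketched in a few lines; in the following illustrative Python snippet, the tiny training set and weight vector are made-up examples.

```python
import numpy as np

# Classification losses written as functions of the margin m = (w . phi(x)) * y
def zero_one_loss(m): return float(m <= 0)
def hinge_loss(m):    return max(1.0 - m, 0.0)
def logistic_loss(m): return float(np.log(1.0 + np.exp(-m)))

# Regression losses written as functions of the residual r = w . phi(x) - y
def squared_loss(r):  return r ** 2
def absolute_loss(r): return abs(r)

def train_loss(D, w, loss_on_margin):
    # TrainLoss(w): average of the per-example losses over the training set D
    return sum(loss_on_margin(np.dot(w, phi_x) * y) for phi_x, y in D) / len(D)

# toy training set (made-up): two 2-dimensional examples with labels in {-1, +1}
D = [(np.array([1.0, 2.0]), +1), (np.array([-1.0, 0.5]), -1)]
print(train_loss(D, np.array([0.3, -0.2]), hinge_loss))
```

<br>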
+ + +**20. Non-linear predictors** + +⟶ Prédicteurs non linéaires + +
+ + +**21. k-nearest neighbors ― The k-nearest neighbors algorithm, commonly known as k-NN, is a non-parametric approach where the response of a data point is determined by the nature of its k neighbors from the training set. It can be used in both classification and regression settings.** + +⟶ k plus proches voisins - L'algorithme des k plus proches voisins (en anglais k-nearest neighbors ou k-NN) est une approche non paramétrique où la réponse associée à un exemple est déterminée par la nature de ses k plus proches voisins de l'ensemble d'entraînement. Cette démarche peut être utilisée pour la classification et la régression. + +
+ + +**22. Remark: the higher the parameter k, the higher the bias, and the lower the parameter k, the higher the variance.** + +⟶ Remarque : plus le paramètre k est grand, plus le biais est élevé. À l'inverse, la variance devient plus élevée lorsque l'on réduit la valeur k. + +
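
As an illustration of the k-NN rule described above, here is a minimal sketch; the two toy clusters and the choice of Euclidean distance with a majority vote are assumptions made for the example.

```python
import numpy as np

def knn_predict(x, train_X, train_y, k=3):
    # distance of x to every training point
    dists = np.linalg.norm(train_X - x, axis=1)
    # indices of the k nearest neighbours
    nearest = np.argsort(dists)[:k]
    # classification: majority vote among the k neighbouring labels
    values, counts = np.unique(train_y[nearest], return_counts=True)
    return values[np.argmax(counts)]

# toy data (made-up): two clusters around (0, 0) and (3, 3)
train_X = np.array([[0, 0], [0, 1], [1, 0], [3, 3], [3, 4], [4, 3]], dtype=float)
train_y = np.array([-1, -1, -1, 1, 1, 1])
print(knn_predict(np.array([0.5, 0.5]), train_X, train_y, k=3))  # expected: -1
```

<br>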

**23. Neural networks ― Neural networks are a class of models that are built with layers. Commonly used types of neural networks include convolutional and recurrent neural networks. The vocabulary around neural networks architectures is described in the figure below:**

⟶ Réseaux de neurones - Les réseaux de neurones (en anglais neural networks) constituent une classe de modèles basés sur des couches (en anglais layers). Parmi les types de réseaux populaires, on peut compter les réseaux de neurones convolutionnels et récurrents (abrégés respectivement en CNN et RNN en anglais). Une partie du vocabulaire associé aux réseaux de neurones est détaillée dans la figure ci-dessous :

<br>
+ + +**24. [Input layer, Hidden layer, Output layer]** + +⟶ [Couche d'entrée, Couche cachée, Couche de sortie] + +
+ + +**25. By noting i the ith layer of the network and j the jth hidden unit of the layer, we have:** + +⟶ En notant i la i-ème couche du réseau et j son j-ième neurone, on a : + +

**26. where we note w, b, x, z the weight, bias, input and non-activated output of the neuron respectively.**

⟶ où l'on note respectivement w, b, x, z le coefficient, le biais, l'entrée et la sortie non activée du neurone.

<br>
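
A small forward pass makes the notation above concrete; the ReLU activation, layer sizes and random weights below are arbitrary choices for illustration.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0)

def forward(x, W1, b1, W2, b2):
    # hidden layer: each unit computes z = W1 @ x + b1, then applies the activation
    z1 = W1 @ x + b1      # non-activated outputs z of the hidden units
    a1 = relu(z1)         # activated outputs
    # output layer (here a single linear unit)
    return W2 @ a1 + b2

rng = np.random.default_rng(0)
x = np.array([1.0, 2.0])
W1, b1 = rng.normal(size=(3, 2)), np.zeros(3)
W2, b2 = rng.normal(size=(1, 3)), np.zeros(1)
print(forward(x, W1, b1, W2, b2))
```

<br>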
+ + +**27. For a more detailed overview of the concepts above, check out the Supervised Learning cheatsheets!** + +⟶ Pour un aperçu plus détaillé des concepts ci-dessus, rendez-vous sur le pense-bête d'apprentissage supervisé ! + +
+ + +**28. Stochastic gradient descent** + +⟶ Algorithme du gradient stochastique + +
+ + +**29. Gradient descent ― By noting η∈R the learning rate (also called step size), the update rule for gradient descent is expressed with the learning rate and the loss function Loss(x,y,w) as follows:** + +⟶ Descente de gradient - En notant η∈R le taux d'apprentissage (en anglais learning rate ou step size), la règle de mise à jour des coefficients pour cet algorithme utilise la fonction objectif Loss(x,y,w) de la manière suivante : + +
+ + +**30. Stochastic updates ― Stochastic gradient descent (SGD) updates the parameters of the model one training example (ϕ(x),y)∈Dtrain at a time. This method leads to sometimes noisy, but fast updates.** + +⟶ Mises à jour stochastiques - L'algorithme du gradient stochastique (en anglais stochastic gradient descent ou SGD) met à jour les paramètres du modèle en parcourant les exemples (ϕ(x),y)∈Dtrain de l'ensemble d'entraînement un à un. Cette méthode engendre des mises à jour rapides à calculer mais qui manquent parfois de robustesse. + +

**31. Batch updates ― Batch gradient descent (BGD) updates the parameters of the model one batch of examples (e.g. the entire training set) at a time. This method computes stable update directions, at a greater computational cost.**

⟶ Mises à jour par lot - L'algorithme du gradient par lot (en anglais batch gradient descent ou BGD) met à jour les paramètres du modèle en utilisant des lots entiers d'exemples (e.g. la totalité de l'ensemble d'entraînement) à la fois. Cette méthode calcule des directions de mise à jour des coefficients plus stables au prix d'un plus grand nombre de calculs.

<br>
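
The two update schemes can be contrasted on a toy least-squares problem; the learning rate, epoch counts and synthetic data below are made-up values for the sketch.

```python
import numpy as np

def grad_squared_loss(phi_x, y, w):
    # gradient of (w . phi(x) - y)^2 with respect to w
    return 2 * (np.dot(w, phi_x) - y) * phi_x

def sgd(D, w, eta=0.1, epochs=100):
    # stochastic updates: one training example at a time
    for _ in range(epochs):
        for phi_x, y in D:
            w = w - eta * grad_squared_loss(phi_x, y, w)
    return w

def batch_gd(D, w, eta=0.1, epochs=1000):
    # batch updates: average gradient over the whole training set
    for _ in range(epochs):
        g = sum(grad_squared_loss(phi_x, y, w) for phi_x, y in D) / len(D)
        w = w - eta * g
    return w

# made-up data generated from y = 2x + 1, with features phi(x) = [1, x]
D = [(np.array([1.0, x]), 2 * x + 1) for x in np.linspace(0, 1, 20)]
# both runs should move towards the true weights [1, 2]
print(sgd(D, np.zeros(2)), batch_gd(D, np.zeros(2)))
```

<br>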
+ + +**32. Fine-tuning models** + +⟶ Peaufinage de modèle + +
+ + +**33. Hypothesis class ― A hypothesis class F is the set of possible predictors with a fixed ϕ(x) and varying w:** + +⟶ Classe d'hypothèses - Une classe d'hypothèses F est l'ensemble des prédicteurs candidats ayant un ϕ(x) fixé et dont le paramètre w peut varier. + +
+ + +**34. Logistic function ― The logistic function σ, also called the sigmoid function, is defined as:** + +⟶ Fonction logistique - La fonction logistique σ, aussi appelée en anglais sigmoid function, est définie par : + +
+ + +**35. Remark: we have σ′(z)=σ(z)(1−σ(z)).** + +⟶ Remarque : la dérivée de cette fonction s'écrit σ′(z)=σ(z)(1−σ(z)). + +
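
A quick numerical check of this identity, comparing the closed form against a central finite difference; the evaluation point z and the step size are arbitrary.

```python
import numpy as np

def sigma(z):
    return 1.0 / (1.0 + np.exp(-z))

z = 0.7
analytic = sigma(z) * (1 - sigma(z))                    # sigma'(z) = sigma(z)(1 - sigma(z))
numeric = (sigma(z + 1e-6) - sigma(z - 1e-6)) / 2e-6    # central finite difference
print(analytic, numeric)   # the two values agree up to numerical error
```

<br>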
+ + +**36. Backpropagation ― The forward pass is done through fi, which is the value for the subexpression rooted at i, while the backward pass is done through gi=∂out∂fi and represents how fi influences the output.** + +⟶ Rétropropagation du gradient (en anglais backpropagation) - La propagation avant (en anglais forward pass) est effectuée via fi, valeur correspondant à l'expression appliquée à l'étape i. La propagation de l'erreur vers l'arrière (en anglais backward pass) se fait via gi=∂out∂fi et décrit la manière dont fi agit sur la sortie du réseau. + +
+ + +**37. Approximation and estimation error ― The approximation error ϵapprox represents how far the entire hypothesis class F is from the target predictor g∗, while the estimation error ϵest quantifies how good the predictor ^f is with respect to the best predictor f∗ of the hypothesis class F.** + +⟶ Erreur d'approximation et d'estimation - L'erreur d'approximation ϵapprox représente la distance entre la classe d'hypothèses F et le prédicteur optimal g∗. De son côté, l'erreur d'estimation quantifie la qualité du prédicteur ^f par rapport au meilleur prédicteur f∗ de la classe d'hypothèses F. + +
+ + +**38. Regularization ― The regularization procedure aims at avoiding the model to overfit the data and thus deals with high variance issues. The following table sums up the different types of commonly used regularization techniques:** + +⟶ Régularisation - Le but de la régularisation est d'empêcher le modèle de surapprendre (en anglais overfit) les données en s'occupant ainsi des problèmes de variance élevée. La table suivante résume les différents types de régularisation couramment utilisés : + +

**39. [Shrinks coefficients to 0, Good for variable selection, Makes coefficients smaller, Tradeoff between variable selection and small coefficients]**

⟶ [Réduit les coefficients à 0, Bénéfique pour la sélection de variables, Rapetisse les coefficients, Compromis entre sélection de variables et coefficients de faible magnitude]

<br>
+ + +**40. Hyperparameters ― Hyperparameters are the properties of the learning algorithm, and include features, regularization parameter λ, number of iterations T, step size η, etc.** + +⟶ Hyperparamètres - Les hyperparamètres sont les paramètres de l'algorithme d'apprentissage et incluent parmi d'autres le type de caractéristiques utilisé ainsi que le paramètre de régularisation λ, le nombre d'itérations T, le taux d'apprentissage η. + +
+ + +**41. Sets vocabulary ― When selecting a model, we distinguish 3 different parts of the data that we have as follows:** + +⟶ Vocabulaire ― Lors de la sélection d'un modèle, on divise les données en 3 différentes parties : + +
+ + +**42. [Training set, Validation set, Testing set]** + +⟶ [Données d'entraînement, Données de validation, Données de test] + +
+ + +**43. [Model is trained, Usually 80% of the dataset, Model is assessed, Usually 20% of the dataset, Also called hold-out or development set, Model gives predictions, Unseen data]** + +⟶ [Le modèle est entrainé, Constitue normalement 80% du jeu de données, Le modèle est évalué, Constitue normalement 20% du jeu de données, Aussi appelé données de développement (en anglais hold-out ou development set), Le modèle donne ses prédictions, Données jamais observées] + +
+ + +**44. Once the model has been chosen, it is trained on the entire dataset and tested on the unseen test set. These are represented in the figure below:** + +⟶ Une fois que le modèle a été choisi, il est entrainé sur le jeu de données entier et testé sur l'ensemble de test (qui n'a jamais été vu). Ces derniers sont représentés dans la figure ci-dessous : + +
+ + +**45. [Dataset, Unseen data, train, validation, test]** + +⟶ [Jeu de données, Données inconnues, entrainement, validation, test] + +
+ + +**46. For a more detailed overview of the concepts above, check out the Machine Learning tips and tricks cheatsheets!** + +⟶ Pour un aperçu plus détaillé des concepts ci-dessus, rendez-vous sur le pense-bête de petites astuces d'apprentissage automatique ! + +
+ + +**47. Unsupervised Learning** + +⟶ Apprentissage non supervisé + +
+ + +**48. The class of unsupervised learning methods aims at discovering the structure of the data, which may have of rich latent structures.** + +⟶ Les méthodes d'apprentissage non supervisé visent à découvrir la structure (parfois riche) des données. + +
+ + +**49. k-means** + +⟶ k-moyennes (en anglais k-means) + +
+ + +**50. Clustering ― Given a training set of input points Dtrain, the goal of a clustering algorithm is to assign each point ϕ(xi) to a cluster zi∈{1,...,k}** + +⟶ Partitionnement - Étant donné un ensemble d'entraînement Dtrain, le but d'un algorithme de partitionnement (en anglais clustering) est d'assigner chaque point ϕ(xi) à une partition zi∈{1,...,k}. + +
+ + +**51. Objective function ― The loss function for one of the main clustering algorithms, k-means, is given by:** + +⟶ Fonction objectif - La fonction objectif d'un des principaux algorithmes de partitionnement, k-moyennes, est donné par : + +

**52. Algorithm ― After randomly initializing the cluster centroids μ1,μ2,...,μk∈Rn, the k-means algorithm repeats the following step until convergence:**

⟶ Algorithme ― Après avoir aléatoirement initialisé les centroïdes de partitions μ1,μ2,...,μk∈Rn, l'algorithme k-moyennes répète l'étape suivante jusqu'à convergence :

<br>
+ + +**53. and** + +⟶ et + +
+ + +**54. [Means initialization, Cluster assignment, Means update, Convergence]** + +⟶ [Initialisation des moyennes, Assignation de partition, Mise à jour des moyennes, Convergence] + +
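
A minimal k-means sketch following the steps above (random centroid initialization, cluster assignment, means update, convergence test); the toy 2-D data set is made up for illustration.

```python
import numpy as np

def k_means(X, k, n_iters=100, seed=0):
    rng = np.random.default_rng(seed)
    # random initialization of the k centroids, drawn from the data points
    mu = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(n_iters):
        # cluster assignment: each point goes to its closest centroid
        z = np.argmin(np.linalg.norm(X[:, None, :] - mu[None, :, :], axis=2), axis=1)
        # means update: each centroid becomes the mean of its assigned points
        new_mu = np.array([X[z == j].mean(axis=0) if np.any(z == j) else mu[j]
                           for j in range(k)])
        if np.allclose(new_mu, mu):   # convergence: centroids stopped moving
            break
        mu = new_mu
    return z, mu

# made-up data: two well-separated groups
X = np.array([[0, 0], [0, 1], [1, 0], [8, 8], [8, 9], [9, 8]], dtype=float)
print(k_means(X, k=2))
```

<br>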
+ + +**55. Principal Component Analysis** + +⟶ Analyse des composantes principales + +

**56. Eigenvalue, eigenvector ― Given a matrix A∈Rn×n, λ is said to be an eigenvalue of A if there exists a vector z∈Rn∖{0}, called eigenvector, such that we have:**

⟶ Valeur propre, vecteur propre ― Étant donnée une matrice A∈Rn×n, λ est dite être une valeur propre de A s'il existe un vecteur z∈Rn∖{0}, appelé vecteur propre, tel que :

<br>
+ + +**57. Spectral theorem ― Let A∈Rn×n. If A is symmetric, then A is diagonalizable by a real orthogonal matrix U∈Rn×n. By noting Λ=diag(λ1,...,λn), we have:** + +⟶ Théorème spectral ― Soit A∈Rn×n. Si A est symétrique, alors A est diagonalisable par une matrice réelle orthogonale U∈Rn×n. En notant Λ=diag(λ1,...,λn), on a : + +
+ + +**58. Remark: the eigenvector associated with the largest eigenvalue is called principal eigenvector of matrix A.** + +⟶ Remarque : le vecteur propre associé à la plus grande valeur propre est appelé le vecteur propre principal de la matrice A. + +
+ + +**59. Algorithm ― The Principal Component Analysis (PCA) procedure is a dimension reduction technique that projects the data on k dimensions by maximizing the variance of the data as follows:** + +⟶ Algorithme ― La procédure d'analyse des composantes principales (en anglais PCA - Principal Component Analysis) est une technique de réduction de dimension qui projette les données sur k dimensions en maximisant la variance des données de la manière suivante : + +
+ + +**60. Step 1: Normalize the data to have a mean of 0 and standard deviation of 1.** + +⟶ Étape 1: Normaliser les données pour avoir une moyenne de 0 et un écart-type de 1. + +
+ + +**61. [where, and]** + +⟶ [où, et] + +

**62. [Step 2: Compute Σ=1mm∑i=1ϕ(xi)ϕ(xi)T∈Rn×n, which is symmetric with real eigenvalues., Step 3: Compute u1,...,uk∈Rn the k orthogonal principal eigenvectors of Σ, i.e. the orthogonal eigenvectors of the k largest eigenvalues., Step 4: Project the data on spanR(u1,...,uk).]**

⟶ [Étape 2: Calculer Σ=1mm∑i=1ϕ(xi)ϕ(xi)T∈Rn×n, qui est symétrique avec des valeurs propres réelles., Étape 3: Calculer u1,...,uk∈Rn les k vecteurs propres principaux orthogonaux de Σ, i.e. les vecteurs propres orthogonaux des k valeurs propres les plus grandes., Étape 4: Projeter les données sur spanR(u1,...,uk).]

<br>
+ + +**63. This procedure maximizes the variance among all k-dimensional spaces.** + +⟶ Cette procédure maximise la variance sur tous les espaces à k dimensions. + +
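
A compact sketch of the four PCA steps above; the synthetic data and the use of numpy.linalg.eigh (valid here because Σ is symmetric) are choices made for the example.

```python
import numpy as np

def pca(X, k):
    # Step 1: normalize each feature to mean 0 and standard deviation 1
    X = (X - X.mean(axis=0)) / X.std(axis=0)
    # Step 2: Sigma = (1/m) * sum of phi(x_i) phi(x_i)^T, symmetric with real eigenvalues
    sigma = X.T @ X / len(X)
    # Step 3: eigenvectors associated with the k largest eigenvalues
    eigvals, eigvecs = np.linalg.eigh(sigma)
    U = eigvecs[:, np.argsort(eigvals)[::-1][:k]]
    # Step 4: project the data on span(u_1, ..., u_k)
    return X @ U

# made-up data with unequal variances along three directions
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3)) @ np.array([[2, 0, 0], [0, 1, 0], [0, 0, 0.1]])
print(pca(X, k=2).shape)   # (100, 2)
```

<br>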
+ + +**64. [Data in feature space, Find principal components, Data in principal components space]** + +⟶ [Données dans l'espace initial, Trouve les composantes principales, Données dans l'espace des composantes principales] + +
+ + +**65. For a more detailed overview of the concepts above, check out the Unsupervised Learning cheatsheets!** + +⟶ Pour un aperçu plus détaillé des concepts ci-dessus, rendez-vous sur le pense-bête d'apprentissage non supervisé ! + +
+ + +**66. [Linear predictors, Feature vector, Linear classifier/regression, Margin]** + +⟶ [Prédicteurs linéaires, Vecteur caractéristique, Classification/régression linéaire, Marge] + +
+ + +**67. [Loss minimization, Loss function, Framework]** + +⟶ [Minimisation de la fonction objectif, Fonction objectif, Cadre] + +
+ + +**68. [Non-linear predictors, k-nearest neighbors, Neural networks]** + +⟶ [Prédicteurs non linéaires, k plus proches voisins, Réseaux de neurones] + +
+ + +**69. [Stochastic gradient descent, Gradient, Stochastic updates, Batch updates]** + +⟶ [Algorithme du gradient stochastique, Gradient, Mises à jour stochastiques, Mises à jour par lots] + +
+ + +**70. [Fine-tuning models, Hypothesis class, Backpropagation, Regularization, Sets vocabulary]** + +⟶ [Peaufiner les modèles, Classe d'hypothèses, Rétropropagation du gradient, Régularisation, Vocabulaire] + +
+ + +**71. [Unsupervised Learning, k-means, Principal components analysis]** + +⟶ [Apprentissage non supervisé, k-means, Analyse des composantes principales] + +
+ + +**72. View PDF version on GitHub** + +⟶ Voir la version PDF sur GitHub + +
+ + +**73. Original authors** + +⟶ Auteurs d'origine + +
+ + +**74. Translated by X, Y and Z** + +⟶ Traduit par X, Y et Z + +
+ + +**75. Reviewed by X, Y and Z** + +⟶ Revu par X, Y et Z + +
+ + +**76. By X and Y** + +⟶ De X et Y + +
+ + +**77. The Artificial Intelligence cheatsheets are now available in [target language].** + +⟶ Les pense-bêtes d'intelligence artificielle sont maintenant disponibles en français. From 7a7e1d94db4960e39ef36010420e1900efde7775 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Ba=C5=9Fak=20Buluz?= <41359672+basakbuluz@users.noreply.github.com> Date: Thu, 1 Aug 2019 12:28:12 +0300 Subject: [PATCH 293/531] [tr] Variables-based models Turkish translation has been completed. Ready for the review! --- tr/cs-221-variables-models.md | 617 ++++++++++++++++++++++++++++++++++ 1 file changed, 617 insertions(+) create mode 100644 tr/cs-221-variables-models.md diff --git a/tr/cs-221-variables-models.md b/tr/cs-221-variables-models.md new file mode 100644 index 000000000..4e108c95e --- /dev/null +++ b/tr/cs-221-variables-models.md @@ -0,0 +1,617 @@ +**Variables-based models translation** [[webpage]](https://stanford.edu/~shervine/teaching/cs-221/cheatsheet-variables-models) + +
+ +**1. Variables-based models with CSP and Bayesian networks** + +⟶ 1. CSP ile değişken-temelli modeller ve Bayesçi ağlar + +
+ + +**2. Constraint satisfaction problems** + +⟶ 2. Kısıt memnuniyet problemleri + +
+ + +**3. In this section, our objective is to find maximum weight assignments of variable-based models. One advantage compared to states-based models is that these algorithms are more convenient to encode problem-specific constraints.** + +⟶ 3. Bu bölümde hedefimiz değişken-temelli modellerin maksimum ağırlık seçimlerini bulmaktır. Durum temelli modellerle kıyaslandığında, bu algoritmaların probleme özgü kısıtları kodlamak için daha uygun olmaları bir avantajdır. + +
+ + +**4. Factor graphs** + +⟶ 4. Faktör grafikleri + +
+ + +**5. Definition ― A factor graph, also referred to as a Markov random field, is a set of variables X=(X1,...,Xn) where Xi∈Domaini and m factors f1,...,fm with each fj(X)⩾0.** + +⟶5. Tanımlama - Markov rasgele alanı olarak da adlandırılan faktör grafiği, Xi∈Domaini ve herbir fj(X)⩾0 olan f1,...,fm m faktör olmak üzere X=(X1,...,Xn) değişkenler kümesidir. + +
+ + +**6. Domain** + +⟶ 6. Etki Alanı (Domain) + +
+ + +**7. Scope and arity ― The scope of a factor fj is the set of variables it depends on. The size of this set is called the arity.** + +⟶ 7. Kapsam ve ilişki derecesi - Fj faktörünün kapsamı, dayandığı değişken kümesidir. Bu kümenin boyutuna ilişki derecesi (arity) denir. + +
+ + +**8. Remark: factors of arity 1 and 2 are called unary and binary respectively.** + +⟶ 8. Not: Faktörlerin ilişki derecesi 1 ve 2 olanlarına sırasıyla tek ve ikili denir. + +
+ + +**9. Assignment weight ― Each assignment x=(x1,...,xn) yields a weight Weight(x) defined as being the product of all factors fj applied to that assignment. Its expression is given by:** + +⟶9. Atama ağırlığı - Her atama x = (x1, ..., xn), o atamaya uygulanan tüm faktörlerin çarpımı olarak tanımlanan bir Ağırlık (x) ağırlığı verir.Şöyle ifade edilir: + +
+ + +**10. Constraint satisfaction problem ― A constraint satisfaction problem (CSP) is a factor graph where all factors are binary; we call them to be constraints:** + +⟶ 10. Kısıt memnuniyet problemi - Kısıtlama memnuniyet problemi (constraint satisfaction problem-CSP), tüm faktörlerin ikili olduğu bir faktör grafiğidir; bunları kısıt olarak adlandırıyoruz: + +
+ + +**11. Here, the constraint j with assignment x is said to be satisfied if and only if fj(x)=1.** + +⟶11.Burada, j kısıtlı x ataması ancak ve ancak fj(x)=1 olduğunda memnundur denir. + +
+ + +**12. Consistent assignment ― An assignment x of a CSP is said to be consistent if and only if Weight(x)=1, i.e. all constraints are satisfied.** + +⟶ 12.Tutarlı atama - Bir CSP'nin bir x atamasının, yalnızca Ağırlık (x) = 1 olduğunda, yani tüm kısıtların yerine getirilmesi durumunda tutarlı olduğu söylenir. + +
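
To illustrate assignment weights and consistency on a CSP, here is a minimal sketch; the three-variable "neighbouring values must differ" problem below is a made-up example.

```python
from itertools import product

# A tiny made-up CSP: variables X1, X2, X3 with domain {'R', 'G'} and binary
# constraints "adjacent variables must take different values".
variables = ['X1', 'X2', 'X3']
domain = ['R', 'G']
constraints = [
    lambda a: a['X1'] != a['X2'],
    lambda a: a['X2'] != a['X3'],
]

def weight(assignment):
    # Weight(x) = product of all factors; for a CSP every factor is 0 or 1
    w = 1
    for f in constraints:
        w *= int(f(assignment))
    return w

# enumerate all assignments and keep the consistent ones (Weight(x) = 1)
for values in product(domain, repeat=len(variables)):
    a = dict(zip(variables, values))
    if weight(a) == 1:
        print(a)
```

<br>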
+ + +**13. Dynamic ordering** + +⟶ 13. Dinamik düzenleşim + +
+ + +**14. Dependent factors ― The set of dependent factors of variable Xi with partial assignment x is called D(x,Xi), and denotes the set of factors that link Xi to already assigned variables.** + +⟶14.Bağımlı faktörler - X değişkeninin kısmi atamaya sahip bağımlı X değişken faktörlerinin kümesi D (x, Xi) ile gösterilir ve Xi'yi önceden atanmış değişkenlere bağlayan faktörler kümesini belirtir. + +

**15. Backtracking search ― Backtracking search is an algorithm used to find maximum weight assignments of a factor graph. At each step, it chooses an unassigned variable and explores its values by recursion. Dynamic ordering (i.e. choice of variables and values) and lookahead (i.e. early elimination of inconsistent options) can be used to explore the graph more efficiently, although the worst-case runtime stays exponential: O(|Domain|n).**

⟶ 15. Geri izleme araması - Geri izleme araması, bir faktör grafiğinin maksimum ağırlık atamalarını bulmak için kullanılan bir algoritmadır. Her adımda, atanmamış bir değişken seçer ve değerlerini özyineleme ile arar. Dinamik düzenleşim (yani değişkenlerin ve değerlerin seçimi) ve ileriye bakış (yani tutarsız seçeneklerin erken elenmesi), en kötü durum çalışma süresi üssel (O(|Domain|n)) kalsa da, grafiği daha verimli aramak için kullanılabilir.

<br>

**16. [Forward checking ― It is a one-step lookahead heuristic that preemptively removes inconsistent values from the domains of neighboring variables. It has the following characteristics:, After assigning a variable Xi, it eliminates inconsistent values from the domains of all its neighbors., If any of these domains becomes empty, we stop the local backtracking search., If we un-assign a variable Xi, we have to restore the domain of its neighbors.]**

⟶ 16. [İleri kontrol - Tutarsız değerleri komşu değişkenlerin etki alanlarından öncelikli olarak kaldıran tek adımlı bir ileriye bakış sezgiselidir. Aşağıdaki özelliklere sahiptir:, Bir Xi değişkenini atadıktan sonra, tüm komşularının etki alanlarından tutarsız değerleri eler., Bu etki alanlarından herhangi biri boş olursa, yerel geri izleme araması durdurulur., Bir Xi değişkeninin ataması geri alınırsa, komşularının etki alanları eski haline getirilmelidir.]

<br>

**17. Most constrained variable ― It is a variable-level ordering heuristic that selects the next unassigned variable that has the fewest consistent values. This has the effect of making inconsistent assignments to fail earlier in the search, which enables more efficient pruning.**

⟶ 17. En kısıtlı değişken - En az tutarlı değere sahip bir sonraki atanmamış değişkeni seçen, değişken seviyesinde bir sıralama sezgiselidir. Bu, tutarsız atamaların aramada daha erken başarısız olmasını sağlar ve böylece daha verimli budamaya olanak tanır.

<br>

**18. Least constrained value ― It is a value-level ordering heuristic that assigns the next value that yields the highest number of consistent values of neighboring variables. Intuitively, this procedure chooses first the values that are most likely to work.**

⟶ 18. En düşük kısıtlı değer - Komşu değişkenlerin tutarlı değer sayısını en yüksek yapan bir sonraki değeri atayan, değer seviyesinde bir sıralama sezgiselidir. Sezgisel olarak, bu prosedür önce çalışması en muhtemel olan değerleri seçer.

<br>
+ + +**19. Remark: in practice, this heuristic is useful when all factors are constraints.** + +⟶ 19. Not: Uygulamada, bu sezgisel yaklaşım tüm faktörler kısıtlı olduğunda kullanışlıdır. + +
+ + +**20. The example above is an illustration of the 3-color problem with backtracking search coupled with most constrained variable exploration and least constrained value heuristic, as well as forward checking at each step.** + +⟶ 20. Yukarıdaki örnek, en kısıtlı değişken keşfi ve sezgisel en düşük kısıtlı değerin yanı sıra, her adımda ileri kontrol ile birleştirilmiş geri izleme arama ile 3 renkli problemin bir gösterimidir. + +
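
A minimal sketch of backtracking search combining the most constrained variable heuristic with forward checking, on a made-up 3-colouring instance; the graph and colour set below are illustrative assumptions.

```python
# made-up graph to be 3-coloured so that adjacent nodes differ
neighbours = {'A': ['B', 'C'], 'B': ['A', 'C'], 'C': ['A', 'B', 'D'], 'D': ['C']}
colours = ['R', 'G', 'B']

def backtrack(assignment, domains):
    if len(assignment) == len(neighbours):
        return assignment
    # most constrained variable: unassigned variable with the fewest remaining values
    var = min((v for v in neighbours if v not in assignment),
              key=lambda v: len(domains[v]))
    for value in domains[var]:
        # forward checking: remove 'value' from the domains of unassigned neighbours
        new_domains = {v: [c for c in d
                           if not (v in neighbours[var] and v not in assignment and c == value)]
                       for v, d in domains.items()}
        new_domains[var] = [value]
        # only recurse if no domain became empty (otherwise try the next value)
        if all(new_domains[v] for v in neighbours):
            result = backtrack({**assignment, var: value}, new_domains)
            if result is not None:
                return result
    return None

print(backtrack({}, {v: list(colours) for v in neighbours}))
```

<br>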

**21. [Arc consistency ― We say that arc consistency of variable Xl with respect to Xk is enforced when for each xl∈Domainl:, unary factors of Xl are non-zero, there exists at least one xk∈Domaink such that any factor between Xl and Xk is non-zero.]**

⟶ 21. [Ark tutarlılığı - Xl değişkeninin Xk'ye göre ark tutarlılığının, her bir xl∈Domainl için şu koşullar sağlandığında geçerli olduğu söylenir:, Xl'in tekli faktörleri sıfırdan farklıdır, Xl ve Xk arasındaki herhangi bir faktörün sıfırdan farklı olduğu en az bir xk∈Domaink vardır.]

<br>
+ + +**22. AC-3 ― The AC-3 algorithm is a multi-step lookahead heuristic that applies forward checking to all relevant variables. After a given assignment, it performs forward checking and then successively enforces arc consistency with respect to the neighbors of variables for which the domain change during the process.** + +⟶ 22. AC-3 - AC-3 algoritması, tüm ilgili değişkenlere ileri kontrol uygulayan çok adımlı sezgisel bir bakış açısıdır. Belirli bir görevden sonra ileriye doğru kontrol yapar ve ardından işlem sırasında etki alanının değiştiği değişkenlerin komşularına göre ark tutarlılığını ardı ardına uygular. + +
+ + +**23. Remark: AC-3 can be implemented both iteratively and recursively.** + +⟶ 23. Not: AC-3, tekrarlı ve özyinelemeli olarak uygulanabilir. + +
+ + +**24. Approximate methods** + +⟶24. Yaklaşık yöntemler + +

**25. Beam search ― Beam search is an approximate algorithm that extends partial assignments of n variables of branching factor b=|Domain| by exploring the K top paths at each step. The beam size K∈{1,...,bn} controls the tradeoff between efficiency and accuracy. This algorithm has a time complexity of O(n⋅Kblog(Kb)).**

⟶ 25. Işın araması - Işın araması, her adımda en iyi K yolu keşfederek, b=|Domain| dallanma faktörlü n değişkenin kısmi atamalarını genişleten yaklaşık bir algoritmadır. Işın boyutu K∈{1,...,bn}, verimlilik ve doğruluk arasındaki ödünleşimi kontrol eder. Bu algoritmanın zaman karmaşıklığı O(n⋅Kblog(Kb))'dir.

<br>

**26. The example below illustrates a possible beam search of parameters K=2, b=3 and n=5.**

⟶ 26. Aşağıdaki örnek, K=2, b=3 ve n=5 parametreleri ile olası bir ışın aramasını (beam search) göstermektedir.

<br>
+ + +**27. Remark: K=1 corresponds to greedy search whereas K→+∞ is equivalent to BFS tree search.** + +⟶ 27. Not: K = 1 açgözlü aramaya (greedy search) karşılık gelirken K → + ∞, BFS ağaç aramasına eşdeğerdir. + +
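
A small beam-search sketch over partial assignments; the domain, the per-step factor and the beam size below are made up so the example stays self-contained.

```python
# made-up problem: assign x1..xn values from 'domain', weight of a partial assignment
# is the product of per-step factors factor(i, value)
domain = [0, 1, 2]            # branching factor b = 3
n = 5

def factor(i, value):         # made-up local factor, always >= 0
    return 1.0 + ((i * value) % 3)

def beam_search(K):
    beam = [((), 1.0)]        # list of (partial assignment, weight)
    for i in range(n):
        # expand every partial assignment in the beam with every possible value
        candidates = [(xs + (v,), w * factor(i, v)) for xs, w in beam for v in domain]
        # keep only the K highest-weight partial assignments
        beam = sorted(candidates, key=lambda c: c[1], reverse=True)[:K]
    return beam[0]

print(beam_search(K=2))
```

<br>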
+ + +**28. Iterated conditional modes ― Iterated conditional modes (ICM) is an iterative approximate algorithm that modifies the assignment of a factor graph one variable at a time until convergence. At step i, we assign to Xi the value v that maximizes the product of all factors connected to that variable.** + +⟶28. Tekrarlanmış koşullu modlar - Tekrarlanmış koşullu modlar (Iterated conditional modes-ICM), yakınsamaya kadar bir seferde bir değişkenli bir faktör grafiğinin atanmasını değiştiren yinelemeli bir yaklaşık algoritmadır. İ adımında, Xi'ye, bu değişkene bağlı tüm faktörlerin çarpımını maksimize eden v değeri atanır. + +
+ + +**29. Remark: ICM may get stuck in local minima.** + +⟶ 29. Not: ICM yerel minimumda takılıp kalabilir. + +

**30. [Gibbs sampling ― Gibbs sampling is an iterative approximate method that modifies the assignment of a factor graph one variable at a time until convergence. At step i:, we assign to each element u∈Domaini a weight w(u) that is the product of all factors connected to that variable, we sample v from the probability distribution induced by w and assign it to Xi.]**

⟶ 30. [Gibbs örneklemesi - Gibbs örneklemesi, yakınsamaya kadar bir seferde bir değişkenli bir faktör grafiğinin atanmasını değiştiren yinelemeli bir yaklaşık yöntemdir. İ adımında:, her bir u∈Domaini öğesine, bu değişkene bağlı tüm faktörlerin çarpımı olan bir w(u) ağırlığı atanır, v, w'nin indüklediği olasılık dağılımından örneklenir ve Xi'ye atanır.]

<br>
+ + +**31. Remark: Gibbs sampling can be seen as the probabilistic counterpart of ICM. It has the advantage to be able to escape local minima in most cases.** + +⟶ 31. Not: Gibbs örneklemesi, ICM'nin olasılıksal karşılığı olarak görülebilir. Çoğu durumda yerel minimumlardan kaçabilme avantajına sahiptir. + +
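
To illustrate Gibbs sampling on a factor graph, here is a minimal sketch on a chain of binary variables; the pairwise factor favouring equal neighbours is a made-up choice for the example.

```python
import random

# made-up factor graph: a chain of 4 binary variables with pairwise factors
# that favour neighbouring variables taking the same value
n = 4

def pair_factor(a, b):
    return 2.0 if a == b else 1.0

def weight_of(i, value, x):
    # product of the factors connected to variable i, with x[i] set to 'value'
    w = 1.0
    if i > 0:
        w *= pair_factor(x[i - 1], value)
    if i < n - 1:
        w *= pair_factor(value, x[i + 1])
    return w

def gibbs(n_steps=1000, seed=0):
    random.seed(seed)
    x = [random.choice([0, 1]) for _ in range(n)]
    for _ in range(n_steps):
        for i in range(n):
            w = [weight_of(i, v, x) for v in (0, 1)]
            # sample x[i] from the distribution induced by the connected factors
            x[i] = random.choices([0, 1], weights=w)[0]
    return x

print(gibbs())
```

<br>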
+ + +**32. Factor graph transformations** + +⟶ 32. Faktör grafiği dönüşümleri + +
+ + +**33. Independence ― Let A,B be a partitioning of the variables X. We say that A and B are independent if there are no edges between A and B and we write:** + +⟶ 33. Bağımsızlık - A, B, X değişkenlerinin bir bölümü olsun. A ve B arasında kenar yoksa A ve B'nin bağımsız olduğu söylenir ve şöyle ifade edilir: + +
+ + +**34. Remark: independence is the key property that allows us to solve subproblems in parallel.** + +⟶ 34. Not: bağımsızlık, alt sorunları paralel olarak çözmemize olanak sağlayan bir kilit özelliktir. + +
+ + +**35. Conditional independence ― We say that A and B are conditionally independent given C if conditioning on C produces a graph in which A and B are independent. In this case, it is written:** + +⟶ 35. Koşullu bağımsızlık - Eğer C'nin şartlandırılması, A ve B'nin bağımsız olduğu bir grafik üretiyorsa A ve B verilen C koşulundan bağımsızdır. Bu durumda şöyle yazılır: + +
+ + +**36. [Conditioning ― Conditioning is a transformation aiming at making variables independent that breaks up a factor graph into smaller pieces that can be solved in parallel and can use backtracking. In order to condition on a variable Xi=v, we do as follows:, Consider all factors f1,...,fk that depend on Xi, Remove Xi and f1,...,fk, Add gj(x) for j∈{1,...,k} defined as:]** + +⟶ 36. [Koşullandırma - Koşullandırma, bir faktör grafiğini paralel olarak çözülebilen ve geriye doğru izlemeyi kullanabilen daha küçük parçalara bölen değişkenleri bağımsız kılmayı amaçlayan bir dönüşümdür. Xi = v değişkeninde koşullandırmak için aşağıdakileri yaparız: Xi'ye bağlı tüm f1, ..., fk faktörlerini göz önünde bulundurun, Xi ve f1, ..., fk öğelerini kaldırın, j∈ {1, ..., k} için gj (x) ekleyin:] + +
+ + +**37. Markov blanket ― Let A⊆X be a subset of variables. We define MarkovBlanket(A) to be the neighbors of A that are not in A.** + +⟶ 37. Markov blanket - A⊆X değişkenlerin bir alt kümesi olsun. MarkovBlanket'i (A), A'da olmayan A'nın komşuları olarak tanımlıyoruz. + +

**38. Proposition ― Let C=MarkovBlanket(A) and B=X∖(A∪C). Then we have:**

⟶ 38. Önerme - C=MarkovBlanket(A) ve B=X∖(A∪C) olsun. Bu durumda:

<br>

**39. [Elimination ― Elimination is a factor graph transformation that removes Xi from the graph and solves a small subproblem conditioned on its Markov blanket as follows:, Consider all factors fi,1,...,fi,k that depend on Xi, Remove Xi
and fi,1,...,fi,k, Add fnew,i(x) defined as:]**

⟶ 39. [Eliminasyon - Eliminasyon, Xi'yi grafikten çıkaran ve Markov blanket'i üzerinde koşullandırılmış küçük bir alt problemi aşağıdaki gibi çözen bir faktör grafiği dönüşümüdür:, Xi'ye bağlı tüm fi,1,...,fi,k faktörlerini göz önünde bulundurun, Xi ve fi,1,...,fi,k öğelerini kaldırın, şöyle tanımlanan fnew,i(x) öğesini ekleyin:]

<br>
+ + +**40. Treewidth ― The treewidth of a factor graph is the maximum arity of any factor created by variable elimination with the best variable ordering. In other words,** + +⟶ 40. Ağaç genişliği - Bir faktör grafiğinin ağaç genişliği, değişken elemeli en iyi değişken sıralamasıyla oluşturulan herhangi bir faktörün maksimum ilişki derecesidir. Diğer bir deyişle, + +
+ + +**41. The example below illustrates the case of a factor graph of treewidth 3.** + +⟶ 41. Aşağıdaki örnek, ağaç genişliği 3 olan faktör grafiğini gösterir. + +
+ + +**42. Remark: finding the best variable ordering is a NP-hard problem.** + +⟶ 42. Not: en iyi değişken sıralamasını bulmak NP-zor (NP-hard) bir problemdir. + +
+ + +**43. Bayesian networks** + +⟶ 43. Bayesçi ağlar + +
+ + +**44. In this section, our goal will be to compute conditional probabilities. What is the probability of a query given evidence?** + +⟶44. Bu bölümün amacı koşullu olasılıkları hesaplamak olacaktır. Bir sorgunun kanıt verilmiş olma olasılığı nedir? + +
+ + +**45. Introduction** + +⟶ 45. Giriş + +

**46. Explaining away ― Suppose causes C1 and C2 influence an effect E. Conditioning on the effect E and on one of the causes (say C1) changes the probability of the other cause (say C2). In this case, we say that C1 has explained away C2.**

⟶ 46. Açıklamalar - C1 ve C2 sebeplerinin bir E etkisini etkilediğini varsayalım. E etkisi ve sebeplerden biri (diyelim ki C1) üzerinde koşullandırmak, diğer sebep olan C2'nin olasılığını değiştirir. Bu durumda, C1'in C2'yi açıkladığı söylenir.

<br>
+ + +**47. Directed acyclic graph ― A directed acyclic graph (DAG) is a finite directed graph with no directed cycles.** + +⟶47. Yönlü çevrimsiz çizge - Yönlü çevrimsiz bir çizge (Directed acyclic graph-DAG), yönlendirilmiş çevrimleri olmayan sonlu bir yönlü çizgedir. + +
+ + +**48. Bayesian network ― A Bayesian network is a directed acyclic graph (DAG) that specifies a joint distribution over random variables X=(X1,...,Xn) as a product of local conditional distributions, one for each node:** + +⟶48. Bayesçi ağ - Her düğüm için bir tane olmak üzere, yerel koşullu dağılımların bir çarpımı olarak, X = (X1, ..., Xn) rasgele değişkenleri üzerindeki bir ortak dağılımı belirten yönlü bir çevrimsiz çizgedir: + +
+ + +**49. Remark: Bayesian networks are factor graphs imbued with the language of probability.** + +⟶ 49. Not: Bayesçi ağlar olasılık diliyle bütünleşik faktör grafikleridir. + +
**50. Locally normalized ― For each xParents(i), all factors are local conditional distributions. Hence they have to satisfy:**

⟶ 50. Yerel olarak normalleştirilmiş - Her xParents(i) için tüm faktörler yerel koşullu dağılımlardır. Bu nedenle şu koşulu sağlamak zorundadırlar:

<br>
+ + +**51. As a result, sub-Bayesian networks and conditional distributions are consistent.** + +⟶51. Sonuç olarak, alt-Bayesçi ağlar ve koşullu dağılımlar tutarlıdır. + +
+ + +**52. Remark: local conditional distributions are the true conditional distributions.** + +⟶ 52. Not: Yerel koşullu dağılımlar gerçek koşullu dağılımlardır. + +
+ + +**53. Marginalization ― The marginalization of a leaf node yields a Bayesian network without that node.** + +⟶ 53. Marjinalleşme - Bir yaprak düğümünün marjinalleşmesi, o düğüm olmaksızın bir Bayesçi ağı sağlar. + +
+ + +**54. Probabilistic programs** + +⟶ 54. Olasılık programları + +
+ + +**55. Concept ― A probabilistic program randomizes variables assignment. That way, we can write down complex Bayesian networks that generate assignments without us having to explicitly specify associated probabilities.** + +⟶ 55. Konsept - Olasılıklı bir program değişkenlerin atanmasını randomize eder. Bu şekilde, ilişkili olasılıkları açıkça belirtmek zorunda kalmadan atamalar üreten karmaşık Bayesçi ağlar yazılabilir. + +
+ + +**56. Remark: examples of probabilistic programs include Hidden Markov model (HMM), factorial HMM, naive Bayes, latent Dirichlet allocation, diseases and symptoms and stochastic block models.** + +⟶ 56. Not: Olasılık programlarına örnekler arasında Gizli Markov modeli (Hidden Markov model-HMM), faktöriyel HMM, naif Bayes (naive Bayes), gizli Dirichlet tahsisi (latent Dirichlet allocation), hastalıklar ve semptomlar ve stokastik blok modelleri bulunmaktadır. + +
+ + +**57. Summary ― The table below summarizes the common probabilistic programs as well as their applications:** + +⟶ 57. Özet - Aşağıdaki tablo, ortak olasılıklı programları ve bunların uygulamalarını özetlemektedir: + +
+ + +**58. [Program, Algorithm, Illustration, Example]** + +⟶ 58. [Program, Algoritma, İllüstrasyon, Örnek] + +
+ + +**59. [Markov Model, Hidden Markov Model (HMM), Factorial HMM, Naive Bayes, Latent Dirichlet Allocation (LDA)]** + +⟶ 59. [Markov Modeli, Gizli Markov Modeli (HMM), Faktöriyel HMM, Naif Bayes, Gizli Dirichlet Tahsisi (Latent Dirichlet Allocation-LDA)] + +
+ + +**60. [Generate, distribution]** + +⟶ 60. [Üretim, Dağılım] + +
+ + +**61. [Language modeling, Object tracking, Multiple object tracking, Document classification, Topic modeling]** + +⟶ 61. [Dil modelleme, Nesne izleme, Çoklu nesne izleme, Belge sınıflandırma, Konu modelleme] + +
+ + +**62. Inference** + +⟶ 62. Çıkarım + +
**63. [General probabilistic inference strategy ― The strategy to compute the probability P(Q|E=e) of query Q given evidence E=e is as follows:, Step 1: Remove variables that are not ancestors of the query Q or the evidence E by marginalization, Step 2: Convert Bayesian network to factor graph, Step 3: Condition on the evidence E=e, Step 4: Remove nodes disconnected from the query Q by marginalization, Step 5: Run a probabilistic inference algorithm (manual, variable elimination, Gibbs sampling, particle filtering)]**

⟶ 63. [Genel olasılıksal çıkarım stratejisi - E=e kanıtı verilen Q sorgusunun P(Q|E=e) olasılığını hesaplama stratejisi aşağıdaki gibidir: Adım 1: Q sorgusunun veya E kanıtının atası olmayan değişkenleri marjinalleştirme yoluyla kaldırın, Adım 2: Bayesçi ağı faktör grafiğine dönüştürün, Adım 3: E=e kanıtı üzerinde koşullandırın, Adım 4: Q sorgusuyla bağlantısı kesilen düğümleri marjinalleştirme yoluyla kaldırın, Adım 5: Olasılıksal bir çıkarım algoritması çalıştırın (elle, değişken eleme, Gibbs örneklemesi, parçacık filtreleme)]

<br>
+ + +**64. Forward-backward algorithm ― This algorithm computes the exact value of P(H=hk|E=e) (smoothing query) for any k∈{1,...,L} in the case of an HMM of size L. To do so, we proceed in 3 steps:** + +⟶ 64. İleri-geri algoritma - Bu algoritma, L boyutunda bir HMM durumunda herhangi bir k∈ {1, ..., L} için P (H = hk | E = e) (düzeltme sorgusu) değerini hesaplar. Bunu yapmak için 3 adımda ilerlenir: + +
+ + +**65. Step 1: for ..., compute ...** + +⟶ 65. Adım 1: ... için (for), hesapla ... + +
+ + +**66. with the convention F0=BL+1=1. From this procedure and these notations, we get that** + +⟶ 66. F0 = BL + 1 = 1 kuralı ile. Bu prosedürden ve bu notasyonlardan anlıyoruz ki + +
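As an illustration of entries 64-66, here is a minimal NumPy sketch of the forward-backward smoothing computation on a toy HMM; the transition matrix, emission matrix, initial distribution and observation sequence are made-up values for demonstration, not part of the original cheatsheet.

```python
import numpy as np

# Toy HMM: 2 hidden states, 3 possible observations (illustrative values).
T = np.array([[0.7, 0.3],
              [0.4, 0.6]])          # T[i, j] = p(h_{k+1}=j | h_k=i)
E = np.array([[0.5, 0.4, 0.1],
              [0.1, 0.3, 0.6]])     # E[i, o] = p(e=o | h=i)
pi = np.array([0.5, 0.5])           # initial distribution over h_1
obs = [0, 2, 1]                     # observed evidence e_1..e_L

L = len(obs)
F = np.zeros((L, 2))                # forward messages F_k(h_k)
B = np.zeros((L, 2))                # backward messages B_k(h_k)

F[0] = pi * E[:, obs[0]]
for k in range(1, L):
    F[k] = (F[k-1] @ T) * E[:, obs[k]]

B[L-1] = 1.0
for k in range(L-2, -1, -1):
    B[k] = T @ (E[:, obs[k+1]] * B[k+1])

# Smoothing query: P(H=h_k | E=e) is proportional to F_k(h_k) * B_k(h_k)
S = F * B
S /= S.sum(axis=1, keepdims=True)
print(S)   # each row sums to 1
```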
+ + +**67. Remark: this algorithm interprets each assignment to be a path where each edge hi−1→hi is of weight p(hi|hi−1)p(ei|hi).** + +⟶ 67. Not: bu algoritma, her bir atamada her bir kenarın hi − 1 → hi'nin p (hi | hi − 1) p (ei | hi) olduğu bir yol olduğunu yorumlar. + +
**68. [Gibbs sampling ― This algorithm is an iterative approximate method that uses a small set of assignments (particles) to represent a large probability distribution. From a random assignment x, Gibbs sampling performs the following steps for i∈{1,...,n} until convergence:, For all u∈Domaini, compute the weight w(u) of assignment x where Xi=u, Sample v from the probability distribution induced by w: v∼P(Xi=v|X−i=x−i), Set Xi=v]**

⟶ 68. [Gibbs örneklemesi - Bu algoritma, büyük bir olasılık dağılımını temsil etmek için küçük bir atama (parçacık) kümesi kullanan tekrarlı bir yaklaşık yöntemdir. Rastgele bir x atamasından başlayarak, Gibbs örneklemesi yakınsayana kadar i∈{1,...,n} için aşağıdaki adımları uygular:, Tüm u∈Domaini için, Xi=u olacak şekilde x atamasının w(u) ağırlığını hesaplayın, w'nin belirlediği olasılık dağılımından v örnekleyin: v∼P(Xi=v|X−i=x−i), Xi=v olarak ayarlayın]

<br>
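To make the sampling step concrete, here is a minimal Gibbs sampling sketch on two binary variables linked by a single pairwise factor; the factor weights and iteration count are illustrative assumptions, not from the cheatsheet.

```python
import random

# Pairwise factor that prefers equal values (illustrative weights).
def factor(x1, x2):
    return 2.0 if x1 == x2 else 1.0

domain = [0, 1]
x = {1: random.choice(domain), 2: random.choice(domain)}   # random initial assignment

counts = {(a, b): 0 for a in domain for b in domain}
for it in range(10000):
    for i in (1, 2):
        other = x[2] if i == 1 else x[1]
        # weight w(u) of the assignment where Xi = u, everything else fixed
        w = [factor(u, other) for u in domain]
        z = sum(w)
        # sample v ~ P(Xi = v | X_{-i} = x_{-i})
        v = 0 if random.random() < w[0] / z else 1
        x[i] = v
    counts[(x[1], x[2])] += 1

print(counts)   # equal assignments (0,0),(1,1) appear roughly twice as often
```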
+ + +**69. Remark: X−i denotes X∖{Xi} and x−i represents the corresponding assignment.** + +⟶ 69. Not: X − i, X ∖ {Xi} ve x − i, karşılık gelen atamayı temsil eder. + +
**70. [Particle filtering ― This algorithm approximates the posterior density of state variables given the evidence of observation variables by keeping track of K particles at a time. Starting from a set of particles C of size K, we run the following 3 steps iteratively:, Step 1: proposal - For each old particle xt−1∈C, sample x from the transition probability distribution p(x|xt−1) and add x to a set C′., Step 2: weighting - Weigh each x of the set C′ by w(x)=p(et|x), where et is the evidence observed at time t., Step 3: resampling - Sample K elements from the set C′ using the probability distribution induced by w and store them in C: these are the current particles xt.]**

⟶ 70. [Parçacık filtreleme - Bu algoritma, her seferinde K parçacığı takip ederek, gözlem değişkenlerinin kanıtı verildiğinde durum değişkenlerinin sonsal (posterior) yoğunluğuna yaklaşır. K boyutunda bir C parçacık kümesinden başlayarak, aşağıdaki 3 adım tekrarlı olarak çalıştırılır:, Adım 1: öneri - Her eski parçacık xt−1∈C için, p(x|xt−1) geçiş olasılığı dağılımından x örnekleyin ve x'i bir C′ kümesine ekleyin., Adım 2: ağırlıklandırma - C′ kümesindeki her x'i w(x)=p(et|x) ile ağırlıklandırın; burada et, t anında gözlemlenen kanıttır., Adım 3: yeniden örnekleme - w'nin belirlediği olasılık dağılımını kullanarak C′ kümesinden K eleman örnekleyin ve bunları C'de saklayın: bunlar mevcut xt parçacıklarıdır.]

<br>
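A minimal sketch of the three steps above, for an assumed 1D random-walk state with noisy integer observations; the transition and emission choices are illustrative, not part of the original text.

```python
import random

def transition_sample(x_prev):
    return x_prev + random.choice([-1, 0, 1])      # sample x ~ p(x | x_{t-1})

def emission_weight(e, x):
    # p(e_t | x): higher when the observation matches the state (assumed model)
    return 1.0 if e == x else (0.5 if abs(e - x) == 1 else 0.1)

K = 100
C = [0] * K                                        # K particles for the initial state
for e_t in [1, 2, 2, 3]:                           # observed evidence over time
    # Step 1: proposal
    C_prime = [transition_sample(x) for x in C]
    # Step 2: weighting
    w = [emission_weight(e_t, x) for x in C_prime]
    # Step 3: resampling K particles proportionally to w
    C = random.choices(C_prime, weights=w, k=K)

print(sum(C) / K)   # rough posterior mean of the current state
```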
**71. Remark: a more expensive version of this algorithm also keeps track of past particles in the proposal step.**

⟶ 71. Not: Bu algoritmanın daha pahalı bir versiyonu, öneri adımında geçmiş parçacıkların da kaydını tutar.

<br>
+ + +**72. Maximum likelihood ― If we don't know the local conditional distributions, we can learn them using maximum likelihood.** + +⟶ 72. Maksimum olabilirlik - Yerel koşullu dağılımları bilmiyorsak, maksimum olasılık kullanarak bunları öğrenebiliriz. + +
+ + +**73. Laplace smoothing ― For each distribution d and partial assignment (xParents(i),xi), add λ to countd(xParents(i),xi), then normalize to get probability estimates.** + +⟶ 73. Laplace yumuşatma - Her d dağılımı ve (xParents (i), xi) kısmi ataması için, countd(xParents (i), xi)'a λ ekleyin, ardından olasılık tahminlerini almak için normalleştirin. + +
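For concreteness, a minimal Laplace smoothing sketch for one conditional distribution; the raw counts and λ value below are illustrative assumptions.

```python
# count_d(x_Parents(i), x_i) for each value x_i (made-up counts)
counts = {"a": 3, "b": 0, "c": 1}
lam = 1.0                             # smoothing parameter λ

smoothed = {v: c + lam for v, c in counts.items()}
total = sum(smoothed.values())
probs = {v: c / total for v, c in smoothed.items()}
print(probs)   # {'a': 0.571..., 'b': 0.142..., 'c': 0.285...}
```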
**74. Algorithm ― The Expectation-Maximization (EM) algorithm gives an efficient method at estimating the parameter θ through maximum likelihood estimation by repeatedly constructing a lower-bound on the likelihood (E-step) and optimizing that lower bound (M-step) as follows:**

⟶ 74. Algoritma - Beklenti-Maksimizasyon (EM) algoritması, olabilirlik üzerinde tekrar tekrar bir alt sınır oluşturarak (E-adımı) ve bu alt sınırı eniyileyerek (M-adımı), θ parametresini maksimum olabilirlik kestirimi yoluyla tahmin etmek için aşağıdaki gibi verimli bir yöntem sunar:

<br>
**75. [E-step: Evaluate the posterior probability q(h) that each data point e came from a particular cluster h as follows:, M-step: Use the posterior probabilities q(h) as cluster specific weights on data points e to determine θ through maximum likelihood.]**

⟶ 75. [E-adımı: Her bir e veri noktasının belirli bir h kümesinden gelmiş olmasının sonsal olasılığı q(h)'yi aşağıdaki gibi değerlendirin:, M-adımı: θ'yı maksimum olabilirlik yoluyla belirlemek için sonsal olasılıklar q(h)'yi e veri noktaları üzerinde kümeye özgü ağırlıklar olarak kullanın.]

<br>
+ + +**76. [Factor graphs, Arity, Assignment weight, Constraint satisfaction problem, Consistent assignment]** + +⟶ 76. [Faktör grafikleri, İlişki Derecesi, Atama ağırlığı, Kısıt memnuniyet sorunu, Tutarlı atama] + +
+ + +**77. [Dynamic ordering, Dependent factors, Backtracking search, Forward checking, Most constrained variable, Least constrained value]** + +⟶ 77. [Dinamik düzenleşim, Bağımlı faktörler, Geri izleme araması, İleriye dönük kontrol, En kısıtlı değişken, En düşük kısıtlanmış değer] + +
+ + +**78. [Approximate methods, Beam search, Iterated conditional modes, Gibbs sampling]** + +⟶ 78. [Yaklaşık yöntemler, Işın arama , Tekrarlı koşullu modlar, Gibbs örneklemesi] + +
+ + +**79. [Factor graph transformations, Conditioning, Elimination]** + +⟶ 79. [Faktör grafiği dönüşümleri, Koşullandırma, Eleme] + +
+ + +**80. [Bayesian networks, Definition, Locally normalized, Marginalization]** + +⟶ 80. [Bayesçi ağlar, Tanım, Yerel normalleştirme, Marjinalleşme] + +
+ + +**81. [Probabilistic program, Concept, Summary]** + +⟶ 81. [Olasılık programı, Kavram, Özet] + +
+ + +**82. [Inference, Forward-backward algorithm, Gibbs sampling, Laplace smoothing]** + +⟶ 82. [Çıkarım, İleri-geri algoritması, Gibbs örneklemesi, Laplace yumuşatması] + +
**83. View PDF version on GitHub**

⟶ 83. GitHub'da PDF versiyonunu görüntüleyin

<br>
+ + +**84. Original authors** + +⟶ 84. Orijinal yazarlar + +
+ + +**85. Translated by X, Y and Z** + +⟶ 85. X, Y ve Z tarafından çevrilmiştir. + +
+ + +**86. Reviewed by X, Y and Z** + +⟶ 86. X,Y,Z tarafından kontrol edilmiştir. + +
+ + +**87. By X and Y** + +⟶ 87. X ve Y ile + +
+ + +**88. The Artificial Intelligence cheatsheets are now available in [target language].** + +⟶88. Yapay Zeka el kitapları artık [hedef dilde] mevcuttur. From df6158cac07aabd12cc63033a1f7f8de6248d5c9 Mon Sep 17 00:00:00 2001 From: Shervine Amidi Date: Thu, 1 Aug 2019 13:46:23 -0700 Subject: [PATCH 294/531] Update [tr] link --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index 23e1e9dd2..0ca9795a8 100644 --- a/README.md +++ b/README.md @@ -46,7 +46,7 @@ Please make sure to propose the translation of **only one** cheatsheet per pull |**日本語**|not started|not started|not started|not started| |**한국어**|not started|not started|not started|not started| |**Português**|not started|not started|not started|not started| -|**Türkçe**|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/166)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/168)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/169)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/170)| +|**Türkçe**|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/166)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/168)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/171)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/170)| |**Tiếng Việt**|not started|not started|not started|not started| |**中文**|not started|not started|not started|not started| From b0680eb3d2ec2bbf8c18f57692314567cb38f974 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Yavuz=20K=C3=B6me=C3=A7o=C4=9Flu?= Date: Sat, 3 Aug 2019 23:35:19 +0300 Subject: [PATCH 295/531] update [tr] Reflex-based models translation in progress. --- tr/cs-221-reflex-models.md | 109 +++++++++++++++++++------------------ 1 file changed, 55 insertions(+), 54 deletions(-) diff --git a/tr/cs-221-reflex-models.md b/tr/cs-221-reflex-models.md index bb32af8b7..16e343c50 100644 --- a/tr/cs-221-reflex-models.md +++ b/tr/cs-221-reflex-models.md @@ -18,35 +18,35 @@ **3. In this section, we will go through reflex-based models that can improve with experience, by going through samples that have input-output pairs.** -⟶ +⟶ Bu bölümde, girdi-çıktı çiftleri olan örneklerden geçerek, deneyim ile gelişebilecek refleks bazlı modelleri göreceğiz.
**4. Feature vector ― The feature vector of an input x is noted ϕ(x) and is such that:** -⟶ +⟶ Öznitelik vektörü - x girişinin öznitelik vektörü ϕ (x) olarak not edilir ve şöyledir:
**5. Score ― The score s(x,w) of an example (ϕ(x),y)∈Rd×R associated to a linear model of weights w∈Rd is given by the inner product:**

⟶ Puan - w∈Rd doğrusal ağırlık modeliyle ilişkili bir (ϕ(x),y)∈Rd×R örneğinin s(x,w) puanı, iç çarpım ile verilir:

<br>
**6. Classification** -⟶ +⟶ Sınıflandırma
**7. Linear classifier ― Given a weight vector w∈Rd and a feature vector ϕ(x)∈Rd, the binary linear classifier fw is given by:** -⟶ +⟶ Doğrusal sınıflandırıcı - Bir ağırlık vektörü w∈Rd ve bir öznitelik vektörü ϕ(x)∈Rd verildiğinde, ikili doğrusal sınıflandırıcı fw şöyle verilir:
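A minimal sketch of the score s(x,w)=w⋅ϕ(x) and the binary linear classifier fw(x)=sign(s(x,w)); the feature map and weight values below are made-up illustrations.

```python
import numpy as np

def phi(x):
    return np.array([1.0, x, x ** 2])     # assumed feature vector ϕ(x) in R^3

w = np.array([-1.0, 0.5, 0.25])            # assumed weight vector

def score(x, w):
    return w @ phi(x)                      # inner product s(x, w)

def f_w(x):
    return 1 if score(x, w) >= 0 else -1   # sign of the score

print(score(2.0, w), f_w(2.0))             # 1.0, +1
```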
@@ -55,61 +55,61 @@ ⟶ -
+
Eğer **9. Margin ― The margin m(x,y,w)∈R of an example (ϕ(x),y)∈Rd×{−1,+1} associated to a linear model of weights w∈Rd quantifies the confidence of the prediction: larger values are better. It is given by:** -⟶ +⟶
**10. Regression** -⟶ +⟶ Bağlanım (Regression)
**11. Linear regression ― Given a weight vector w∈Rd and a feature vector ϕ(x)∈Rd, the output of a linear regression of weights w denoted as fw is given by:**

⟶ Doğrusal bağlanım (Linear regression) - w∈Rd bir ağırlık vektörü ve bir öznitelik vektörü ϕ(x)∈Rd verildiğinde, fw olarak belirtilen w ağırlıklarının doğrusal bağlanım çıktısı şöyle verilir:

<br>
**12. Residual ― The residual res(x,y,w)∈R is defined as being the amount by which the prediction fw(x) overshoots the target y:** -⟶ +⟶ Artık (Residual) - Artık res(x,y,w)∈R, fw(x) tahmininin y hedefini aştığı miktar olarak tanımlanır:
**13. Loss minimization** -⟶ +⟶ Kayıp minimizasyonu
**14. Loss function ― A loss function Loss(x,y,w) quantifies how unhappy we are with the weights w of the model in the prediction task of output y from input x. It is a quantity we want to minimize during the training process.** -⟶ +⟶ Kayıp fonksiyonu - Kayıp fonksiyonu Loss(x,y,w), x girişinden y çıktısının öngörme görevindeki model ağırlıkları ile ne kadar mutsuz olduğumuzu belirler. Bu değer eğitim sürecinde en aza indirmek istediğimiz bir miktar.
**15. Classification case - The classification of a sample x of true label y∈{−1,+1} with a linear model of weights w can be done with the predictor fw(x)≜sign(s(x,w)). In this situation, a metric of interest quantifying the quality of the classification is given by the margin m(x,y,w), and can be used with the following loss functions:** -⟶ +⟶
**16. [Name, Illustration, Zero-one loss, Hinge loss, Logistic loss]** - -⟶ + +⟶ [Ad, Örnekleme, Sıfır-bir kayıp, Menteşe kaybı, Lojistik kaybı]
@@ -123,63 +123,63 @@ **18. [Name, Squared loss, Absolute deviation loss, Illustration]** -⟶ +⟶ [Ad, Kareler kaybı, Mutlak sapma kaybı, Örnekleme]
**19. Loss minimization framework ― In order to train a model, we want to minimize the training loss is defined as follows:** -⟶ +⟶ Kayıp minimize etme çerçevesi (framework) - Bir modeli eğitmek için, eğitim kaybını en aza indirmek istiyoruz;
**20. Non-linear predictors** -⟶ +⟶ Doğrusal olmayan öngörücüler
**21. k-nearest neighbors ― The k-nearest neighbors algorithm, commonly known as k-NN, is a non-parametric approach where the response of a data point is determined by the nature of its k neighbors from the training set. It can be used in both classification and regression settings.** -⟶ +⟶ k-en yakın komşu - Yaygın olarak k-NN olarak bilinen k-en yakın komşu algoritması, bir veri noktasının tepkisinin eğitim kümesinden k komşularının yapısı tarafından belirlendiği parametrik olmayan bir yaklaşımdır. Hem sınıflandırma hem de regresyon ayarlarında kullanılabilir.
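A minimal k-nearest neighbors sketch for the classification setting; the toy training points and the choice k=3 are illustrative assumptions.

```python
import numpy as np
from collections import Counter

# Made-up training set: two well-separated groups of 2D points.
X_train = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [5.0, 5.0], [5.0, 6.0]])
y_train = np.array([-1, -1, -1, 1, 1])

def knn_predict(x, k=3):
    dists = np.linalg.norm(X_train - x, axis=1)
    nearest = np.argsort(dists)[:k]                  # indices of the k closest points
    return Counter(y_train[nearest]).most_common(1)[0][0]

print(knn_predict(np.array([0.5, 0.5])))             # -1
print(knn_predict(np.array([4.5, 5.5])))             # 1 (majority of its 3 neighbors)
```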
**22. Remark: the higher the parameter k, the higher the bias, and the lower the parameter k, the higher the variance.** -⟶ +⟶ Not: k parametresi ne kadar yüksekse, önyargı (bias) o kadar yüksek ve k parametresi ne kadar düşükse, varyans o kadar yüksek olur.
**23. Neural networks ― Neural networks are a class of models that are built with layers. Commonly used types of neural networks include convolutional and recurrent neural networks. The vocabulary around neural networks architectures is described in the figure below:** -⟶ +⟶ Yapay sinir ağları - Yapay sinir ağları katmanlarla oluşturulmuş bir model sınıfıdır. Yaygın olarak kullanılan sinir ağları, evrişimli ve tekrarlayan sinir ağlarını içerir. Yapay sinir ağları mimarisi etrafındaki kelime bilgisi aşağıdaki şekilde tanımlanmıştır:
**24. [Input layer, Hidden layer, Output layer]** -⟶ +⟶ [Giriş katmanı, Gizli katman, Çıkış katmanı]
**25. By noting i the ith layer of the network and j the jth hidden unit of the layer, we have:** -⟶ +⟶ i, ağın i. katmanı ve j, katmanın j. gizli birimi olacak şekilde aşağıdaki gibi ifade edilir:
**26. where we note w, b, x, z the weight, bias, input and non-activated output of the neuron respectively.** -⟶ +⟶ w, b, x, z değerlerinin sırasıyla nöronun ağırlık, önyargı (bias), girdi ve aktive edilmemiş çıkışını olarak ifade eder.
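To make the notation concrete, a minimal sketch of one hidden layer computing the non-activated output z = Wx + b followed by a sigmoid activation; the layer sizes and values are illustrative assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([1.0, 2.0])                  # input
W = np.array([[0.1, -0.2],                # weights of a 3-unit hidden layer
              [0.4,  0.3],
              [-0.5, 0.2]])
b = np.array([0.0, 0.1, -0.1])            # biases

z = W @ x + b                             # non-activated output of each hidden unit
a = sigmoid(z)                            # activated output of the layer
print(z, a)
```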
@@ -340,200 +340,201 @@ **49. k-means** -⟶ +⟶ k-ortalama
**50. Clustering ― Given a training set of input points Dtrain, the goal of a clustering algorithm is to assign each point ϕ(xi) to a cluster zi∈{1,...,k}** -⟶ +⟶ Kümeleme - Dtrain giriş noktalarından oluşan bir eğitim kümesi göz önüne alındığında, kümeleme algoritmasının amacı, her bir ϕ(xi) noktasını zi∈{1,...,k} kümesine atamaktır.
**51. Objective function ― The loss function for one of the main clustering algorithms, k-means, is given by:** -⟶ - +⟶ Amaç fonksiyonu - Ana kümeleme algoritmalarından biri olan k-ortalama için kayıp fonksiyonu şöyle ifade edilir: +
**52. Algorithm ― After randomly initializing the cluster centroids μ1,μ2,...,μk∈Rn, the k-means algorithm repeats the following step until convergence:** -⟶ +⟶ Algoritma - Küme merkezlerini μ1,μ2,...,μk∈Rn kümesini rasgele başlattıktan sonra, k-ortalama algoritması yakınsayana kadar aşağıdaki adımı tekrarlar:
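A minimal k-means sketch (k=2) following the two alternating steps above; the toy 2D points, the random seed and the fixed number of iterations are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
              [5.0, 5.0], [5.1, 4.9], [4.9, 5.2]])
k = 2
mu = X[rng.choice(len(X), size=k, replace=False)]   # means initialization

for _ in range(10):
    # cluster assignment: z_i = argmin_j ||ϕ(x_i) - μ_j||^2
    d = np.linalg.norm(X[:, None, :] - mu[None, :, :], axis=2)
    z = d.argmin(axis=1)
    # means update (keep the old centroid if a cluster happens to be empty)
    mu = np.array([X[z == j].mean(axis=0) if np.any(z == j) else mu[j]
                   for j in range(k)])

print(z, mu)
```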
**53. and** -⟶ +⟶ ve
**54. [Means initialization, Cluster assignment, Means update, Convergence]**

⟶ [Ortalamaların ilklendirilmesi, Küme ataması, Ortalamaların güncellenmesi, Yakınsama]

<br>
**55. Principal Component Analysis** -⟶ +⟶ Temel Bileşenler Analizi
**56. Eigenvalue, eigenvector ― Given a matrix A∈Rn×n, λ is said to be an eigenvalue of A if there exists a vector z∈Rn∖{0}, called eigenvector, such that we have:** -⟶ +⟶ Özdeğer, özvektör - Bir A∈Rn×n matrisi verildiğinde, z∈Rn∖{0} olacak şekilde bir vektör varsa λ, A'nın bir öz değeri olduğu söylenir, aşağıdaki gibi ifade edilir:
**57. Spectral theorem ― Let A∈Rn×n. If A is symmetric, then A is diagonalizable by a real orthogonal matrix U∈Rn×n. By noting Λ=diag(λ1,...,λn), we have:** -⟶ +⟶ Spektral teoremi - A∈Rn×n olsun. A simetrik ise, o zaman A gerçek ortogonal matris U∈Rn×n olacak şekilde köşegenleştirilebilir. Λ=diag(λ1,...,λn) formülü dikkate alınarak aşağıdaki gibi ifade edilir:
**58. Remark: the eigenvector associated with the largest eigenvalue is called principal eigenvector of matrix A.** -⟶ +⟶ Not: En büyük özdeğerle ilişkilendirilen özvektör, A matrisinin temel özvektörüdür.
**59. Algorithm ― The Principal Component Analysis (PCA) procedure is a dimension reduction technique that projects the data on k dimensions by maximizing the variance of the data as follows:** -⟶ +⟶ Algoritma - Temel Bileşenler Analizi (PCA) prosedürü, verilerin varyansını en üst düzeye çıkararak k boyutlarına indirgeyen bir boyut küçültme tekniğidir:
**60. Step 1: Normalize the data to have a mean of 0 and standard deviation of 1.** -⟶ +⟶ Adım 1: Verileri ortalama 0 ve 1 standart sapma olacak şekilde normalize edin.
**61. [where, and]** -⟶ +⟶ [koşul, ve]
**62. [Step 2: Compute Σ=1mm∑i=1ϕ(xi)ϕ(xi)T∈Rn×n, which is symmetric with real eigenvalues., Step 3: Compute u1,...,uk∈Rn the k orthogonal principal eigenvectors of Σ, i.e. the orthogonal eigenvectors of the k largest eigenvalues., Step 4: Project the data on spanR(u1,...,uk).]**

⟶ [Adım 2: Hesaplama Σ=1mm∑i=1ϕ(xi)ϕ(xi)T∈Rn×n, ki bu, gerçek özdeğerlerle simetriktir., Adım 3: Hesaplama u1,...,uk∈Rn k'nin ortogonal ana özvektörleri, yani k en büyük özdeğerlerin ortogonal özvektörleri., Adım 4: spanR(u1,...,uk)'daki verilerin izdüşümünü al.]

<br>
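A minimal NumPy sketch of the four PCA steps above; the toy data matrix (m=5 examples, n=2 features) and the choice k=1 are illustrative assumptions.

```python
import numpy as np

X = np.array([[2.5, 2.4], [0.5, 0.7], [2.2, 2.9], [1.9, 2.2], [3.1, 3.0]])

# Step 1: normalize each feature to mean 0 and standard deviation 1
Xn = (X - X.mean(axis=0)) / X.std(axis=0)

# Step 2: Σ = (1/m) Σ_i ϕ(x_i) ϕ(x_i)^T, symmetric with real eigenvalues
m = Xn.shape[0]
Sigma = (Xn.T @ Xn) / m

# Step 3: the k orthogonal principal eigenvectors (largest eigenvalues)
eigvals, eigvecs = np.linalg.eigh(Sigma)      # eigh is appropriate for symmetric matrices
k = 1
U = eigvecs[:, np.argsort(eigvals)[::-1][:k]]

# Step 4: project the data on span(u_1, ..., u_k)
Z = Xn @ U
print(Z.ravel())
```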
**63. This procedure maximizes the variance among all k-dimensional spaces.** -⟶ +⟶ Bu prosedür, tüm k boyutlu uzaylar arasındaki farkı en üst düzeye çıkarır.
**64. [Data in feature space, Find principal components, Data in principal components space]** -⟶ +⟶ [Öznitelik uzayındaki veriler, Asıl bileşenleri bulma, Asıl bileşenler uzayındaki veriler]
**65. For a more detailed overview of the concepts above, check out the Unsupervised Learning cheatsheets!** -⟶ +⟶ Yukarıdaki kavramlara daha ayrıntılı bir genel bakış için, Gözetimsiz Öğrenme el kitaplarına göz atın!
**66. [Linear predictors, Feature vector, Linear classifier/regression, Margin]** -⟶ +⟶ [Doğrusal öngörücüler, Öznitelik vektörü, Doğrusal sınıflandırıcı/regresyon, Margin]
**67. [Loss minimization, Loss function, Framework]** -⟶ +⟶ [Kayıp minimizasyonu, Kayıp fonksiyonu, Çerçeve (Framework)]
**68. [Non-linear predictors, k-nearest neighbors, Neural networks]** -⟶ +⟶ [Doğrusal olmayan öngörücüler, k-en yakın komşular, Yapay sinir ağları]
**69. [Stochastic gradient descent, Gradient, Stochastic updates, Batch updates]** -⟶ +⟶ [Stokastik Dereceli Azalma/Bayır İnişi, Gradyan, Stokastik güncellemeler, Yığın (Batch) güncellemeler]
**70. [Fine-tuning models, Hypothesis class, Backpropagation, Regularization, Sets vocabulary]** -⟶ +⟶ [Hassas ayar modeller, Hipotez sınıfı, Geri yayılım, Düzenlileştirme (Regularization), Kelime dizisi]
**71. [Unsupervised Learning, k-means, Principal components analysis]** -⟶ +⟶ [Gözetimsiz Öğrenme, k-ortalama, Temel bileşenler analizi]
**72. View PDF version on GitHub** -⟶ +⟶ GitHub'da PDF sürümünü görüntüleyin
**73. Original authors** -⟶ +⟶ Orijinal yazarlar
**74. Translated by X, Y and Z** -⟶ +⟶ X, Y ve Z tarafından çevrilmiştir
**75. Reviewed by X, Y and Z** -⟶ +⟶ X, Y ve Z tarafından gözden geçirilmiştir
**76. By X and Y** -⟶ +⟶ X ve Y ile
**77. The Artificial Intelligence cheatsheets are now available in [target language].** -⟶ +⟶ Yapay Zeka el kitabı şimdi [hedef dilde] mevcuttur. + From e0e0578afa39e7a478bd50b279f76a02e99ef659 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Yavuz=20K=C3=B6me=C3=A7o=C4=9Flu?= Date: Sun, 4 Aug 2019 11:58:17 +0300 Subject: [PATCH 296/531] update [tr] Reflex-based models Turkish translation completed. --- tr/cs-221-reflex-models.md | 51 +++++++++++++++++++------------------- 1 file changed, 26 insertions(+), 25 deletions(-) diff --git a/tr/cs-221-reflex-models.md b/tr/cs-221-reflex-models.md index 16e343c50..c87bdbe91 100644 --- a/tr/cs-221-reflex-models.md +++ b/tr/cs-221-reflex-models.md @@ -102,7 +102,9 @@ **15. Classification case - The classification of a sample x of true label y∈{−1,+1} with a linear model of weights w can be done with the predictor fw(x)≜sign(s(x,w)). In this situation, a metric of interest quantifying the quality of the classification is given by the margin m(x,y,w), and can be used with the following loss functions:** -⟶ +⟶ Sınıflandırma durumu - Doğru etiket y∈{−1,+1} değerinin x örneğinin doğrusal ağırlık w modeliyle sınıflandırılması fw(x)≜sign(s(x,w)) belirleyicisi ile yapılabilir. Bu durumda, sınıflandırma kalitesini ölçen bir fayda ölçütü m(x,y,w) marjı ile verilir ve aşağıdaki kayıp fonksiyonlarıyla birlikte kullanılabilir: + +Doğru etiket y örneğinin x değerinin doğrusal ağırlık w modeli ile sınıflandırılması f öngörüsü ile yapılabilir.
@@ -116,7 +118,7 @@ **17. Regression case - The prediction of a sample x of true label y∈R with a linear model of weights w can be done with the predictor fw(x)≜s(x,w). In this situation, a metric of interest quantifying the quality of the regression is given by the margin res(x,y,w) and can be used with the following loss functions:** -⟶ +⟶ Regresyon durumu - Doğru etiket y∈R değerinin x örneğinin bir doğrusal ağırlık modeli w ile öngörülmesi fw(x)≜s(x,w) öngörüsü ile yapılabilir. Bu durumda, regresyonun kalitesini ölçen bir fayda ölçütü res(x,y,w) marjı ile verilir ve aşağıdaki kayıp fonksiyonlarıyla birlikte kullanılabilir:
@@ -186,154 +188,153 @@ **27. For a more detailed overview of the concepts above, check out the Supervised Learning cheatsheets!** -⟶ +⟶ Yukarıdaki kavramlara daha ayrıntılı bir bakış için, Gözetimli Öğrenme el kitabına göz atın!
**28. Stochastic gradient descent** -⟶ +⟶ Stokastik gradyan inişi (Bayır inişi)
**29. Gradient descent ― By noting η∈R the learning rate (also called step size), the update rule for gradient descent is expressed with the learning rate and the loss function Loss(x,y,w) as follows:** -⟶ +⟶ Gradyan inişi (Bayır inişi) - η∈R öğrenme oranını (aynı zamanda adım boyutu olarak da bilinir) dikkate alınarak, gradyan inişine ilişkin güncelleme kuralı, öğrenme oranı ve Loss(x,y,w) kayıp fonksiyonu ile aşağıdaki şekilde ifade edilir:
**30. Stochastic updates ― Stochastic gradient descent (SGD) updates the parameters of the model one training example (ϕ(x),y)∈Dtrain at a time. This method leads to sometimes noisy, but fast updates.** -⟶ +⟶ Stokastik güncellemeler - Stokastik gradyan inişi (SGİ / SGD), bir seferde bir eğitim örneğinin (ϕ(x),y)∈Değitim parametrelerini günceller. Bu yöntem bazen gürültülü, ancak hızlı güncellemeler yol açar.
**31. Batch updates ― Batch gradient descent (BGD) updates the parameters of the model one batch of examples (e.g. the entire training set) at a time. This method computes stable update directions, at a greater computational cost.** -⟶ +⟶ Yığın güncellemeler - Yığın gradyan inişi (YGİ / BGD), bir seferde bir grup örnek (örneğin, tüm eğitim kümesi) parametrelerini günceller. Bu yöntem daha yüksek bir hesaplama maliyetiyle kararlı güncelleme talimatlarını hesaplar.
**32. Fine-tuning models** -⟶ +⟶ İnce ayar modelleri
**33. Hypothesis class ― A hypothesis class F is the set of possible predictors with a fixed ϕ(x) and varying w:** -⟶ +⟶ Hipotez sınıfı - Bir hipotez sınıfı F, sabit bir ϕ (x) ve değişken w ile olası öngörücü kümesidir:
**34. Logistic function ― The logistic function σ, also called the sigmoid function, is defined as:** -⟶ +⟶ Lojistik fonksiyon - Ayrıca sigmoid fonksiyon olarak da adlandırılan lojistik fonksiyon σ, şöyle tanımlanır:
**35. Remark: we have σ′(z)=σ(z)(1−σ(z)).** -⟶ +⟶ Not: σ′(z)=σ(z)(1−σ(z)) şeklinde ifade edilir.
**36. Backpropagation ― The forward pass is done through fi, which is the value for the subexpression rooted at i, while the backward pass is done through gi=∂out∂fi and represents how fi influences the output.** -⟶ +⟶ Geri yayılım - İleriye geçiş, i'de yer alan alt ifadenin değeri olan fi ile yapılırken, geriye doğru geçiş gi=∂out∂fi aracılığıyla yapılır ve fi'nin çıkışı nasıl etkilediğini gösterir.
**37. Approximation and estimation error ― The approximation error ϵapprox represents how far the entire hypothesis class F is from the target predictor g∗, while the estimation error ϵest quantifies how good the predictor ^f is with respect to the best predictor f∗ of the hypothesis class F.** -⟶ - +⟶ Yaklaşım ve kestirim hatası - Yaklaşım hatası ϵapprox, F tüm hipotez sınıfının hedef öngörücü g∗ ne kadar uzak olduğunu gösterirken, kestirim hatası ϵest öngörücüsü ^f, F hipotez sınıfının en iyi yordayıcısı f∗'ya göre ne kadar iyi olduğunu gösterir.
**38. Regularization ― The regularization procedure aims at avoiding the model to overfit the data and thus deals with high variance issues. The following table sums up the different types of commonly used regularization techniques:** -⟶ +⟶ Düzenlileştirme (Regularization) - Düzenlileştirme prosedürü, modelin verilerin aşırı öğrenmesinden kaçınmayı amaçlar ve böylece yüksek değişkenlik sorunlarıyla ilgilenir. Aşağıdaki tablo, yaygın olarak kullanılan düzenlileştirme tekniklerinin farklı türlerini özetlemektedir:
**39. [Shrinks coefficients to 0, Good for variable selection, Makes coefficients smaller, Tradeoff between variable selection and small coefficients]** -⟶ +⟶ [Katsayıları 0'a düşürür, Değişken seçimi için iyi, Katsayıları daha küçük yapar, Değişken seçimi ile küçük katsayılar arasında ödünleşim]
**40. Hyperparameters ― Hyperparameters are the properties of the learning algorithm, and include features, regularization parameter λ, number of iterations T, step size η, etc.** -⟶ +⟶ Hiperparametreler - Hiperparametreler öğrenme algoritmasının özellikleridir ve öznitelikler dahildir, λ normalizasyon parametresi, yineleme sayısı T, adım büyüklüğü η, vb.
**41. Sets vocabulary ― When selecting a model, we distinguish 3 different parts of the data that we have as follows:** -⟶ +⟶ Kümeler - Bir model seçerken, veriyi aşağıdaki gibi 3 farklı parçaya ayırırız:
**42. [Training set, Validation set, Testing set]** -⟶ +⟶ [Eğitim kümesi, Doğrulama kümesi, Test kümesi]
**43. [Model is trained, Usually 80% of the dataset, Model is assessed, Usually 20% of the dataset, Also called hold-out or development set, Model gives predictions, Unseen data]** -⟶ +⟶ [Model eğitilir, Veri kümesinin genellikle %80'i, Model değerlendirilir, Veri kümesinin genellikle %20'si, Ayrıca tutma veya geliştirme kümesi olarak da adlandırılır, Model tahminlerini verir, Görünmeyen veriler]
**44. Once the model has been chosen, it is trained on the entire dataset and tested on the unseen test set. These are represented in the figure below:** -⟶ +⟶ Model seçildikten sonra, tüm veri kümesi üzerinde eğitilir ve görünmeyen test kümesinde test edilir. Bunlar aşağıdaki şekilde gösterilmektedir:
**45. [Dataset, Unseen data, train, validation, test]** -⟶ +⟶ [Veri kümesi, Görünmeyen veriler, eğitim, doğrulama, test]
**46. For a more detailed overview of the concepts above, check out the Machine Learning tips and tricks cheatsheets!** -⟶ +⟶ Yukarıdaki kavramlara daha ayrıntılı bir bakış için, Makine Öğrenmesi ipuçları ve püf noktaları el kitabını göz atın!
**47. Unsupervised Learning** -⟶ +⟶ Gözetimsiz Öğrenme
**48. The class of unsupervised learning methods aims at discovering the structure of the data, which may have of rich latent structures.** -⟶ +⟶ Gözetimsiz öğrenme yöntemlerinin sınıfı, zengin gizli yapılara sahip olabilecek verilerin yapısını keşfetmeyi amaçlamaktadır.
From df71a34a4904548fb74f20d7416a5075453490d2 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Yavuz=20K=C3=B6me=C3=A7o=C4=9Flu?= Date: Tue, 6 Aug 2019 01:10:55 +0300 Subject: [PATCH 297/531] update [tr] Reflex-based models All suggestions have been updated. --- tr/cs-221-reflex-models.md | 23 ++++++++++------------- 1 file changed, 10 insertions(+), 13 deletions(-) diff --git a/tr/cs-221-reflex-models.md b/tr/cs-221-reflex-models.md index c87bdbe91..e1aea4a79 100644 --- a/tr/cs-221-reflex-models.md +++ b/tr/cs-221-reflex-models.md @@ -4,7 +4,7 @@ **1. Reflex-based models with Machine Learning** -⟶ Makine Öğrenmesi ile Refleks tabanlı modeller +⟶ Makine Öğrenmesi ile Refleks-temelli modeller
@@ -18,14 +18,14 @@ **3. In this section, we will go through reflex-based models that can improve with experience, by going through samples that have input-output pairs.** -⟶ Bu bölümde, girdi-çıktı çiftleri olan örneklerden geçerek, deneyim ile gelişebilecek refleks bazlı modelleri göreceğiz. +⟶ Bu bölümde, girdi-çıktı çiftleri olan örneklerden geçerek, deneyim ile gelişebilecek refleks-temelli modelleri göreceğiz.
**4. Feature vector ― The feature vector of an input x is noted ϕ(x) and is such that:** -⟶ Öznitelik vektörü - x girişinin öznitelik vektörü ϕ (x) olarak not edilir ve şöyledir: +⟶ Öznitelik vektörü ― x girişinin öznitelik vektörü ϕ (x) olarak not edilir ve şöyledir:
@@ -60,7 +60,7 @@ **9. Margin ― The margin m(x,y,w)∈R of an example (ϕ(x),y)∈Rd×{−1,+1} associated to a linear model of weights w∈Rd quantifies the confidence of the prediction: larger values are better. It is given by:** -⟶ +⟶ Marj ― (ϕ(x),y)∈Rd×{−1,+1} örneğinin m(x,y,w)∈R marjları w∈Rd doğrusal ağırlık modeliyle ilişkili olarak, tahminin güvenirliği ölçülür: daha büyük değerler daha iyidir. Şöyle ifade edilir:
@@ -88,7 +88,7 @@ **13. Loss minimization** -⟶ Kayıp minimizasyonu +⟶ Kayıp/Yitim minimizasyonu
@@ -104,8 +104,6 @@ ⟶ Sınıflandırma durumu - Doğru etiket y∈{−1,+1} değerinin x örneğinin doğrusal ağırlık w modeliyle sınıflandırılması fw(x)≜sign(s(x,w)) belirleyicisi ile yapılabilir. Bu durumda, sınıflandırma kalitesini ölçen bir fayda ölçütü m(x,y,w) marjı ile verilir ve aşağıdaki kayıp fonksiyonlarıyla birlikte kullanılabilir: -Doğru etiket y örneğinin x değerinin doğrusal ağırlık w modeli ile sınıflandırılması f öngörüsü ile yapılabilir. -
@@ -125,7 +123,7 @@ Doğru etiket y örneğinin x değerinin doğrusal ağırlık w modeli ile sın **18. [Name, Squared loss, Absolute deviation loss, Illustration]** -⟶ [Ad, Kareler kaybı, Mutlak sapma kaybı, Örnekleme] +⟶ [Ad, Kareler kaybı, Mutlak sapma kaybı, Görselleştirme]
@@ -216,14 +214,14 @@ Doğru etiket y örneğinin x değerinin doğrusal ağırlık w modeli ile sın **31. Batch updates ― Batch gradient descent (BGD) updates the parameters of the model one batch of examples (e.g. the entire training set) at a time. This method computes stable update directions, at a greater computational cost.** -⟶ Yığın güncellemeler - Yığın gradyan inişi (YGİ / BGD), bir seferde bir grup örnek (örneğin, tüm eğitim kümesi) parametrelerini günceller. Bu yöntem daha yüksek bir hesaplama maliyetiyle kararlı güncelleme talimatlarını hesaplar. +⟶ Yığın/küme güncellemeler - Yığın gradyan inişi (YGİ / BGD), bir seferde bir grup örnek (örneğin, tüm eğitim kümesi) parametrelerini günceller. Bu yöntem daha yüksek bir hesaplama maliyetiyle kararlı güncelleme talimatlarını hesaplar.
**32. Fine-tuning models** -⟶ İnce ayar modelleri +⟶ İnce ayar (Fine-tuning) modelleri
@@ -460,7 +458,7 @@ Doğru etiket y örneğinin x değerinin doğrusal ağırlık w modeli ile sın **66. [Linear predictors, Feature vector, Linear classifier/regression, Margin]** -⟶ [Doğrusal öngörücüler, Öznitelik vektörü, Doğrusal sınıflandırıcı/regresyon, Margin] +⟶ [Doğrusal öngörücüler, Öznitelik vektörü, Doğrusal sınıflandırıcı/regresyon, Marj]
@@ -481,7 +479,7 @@ Doğru etiket y örneğinin x değerinin doğrusal ağırlık w modeli ile sın **69. [Stochastic gradient descent, Gradient, Stochastic updates, Batch updates]** -⟶ [Stokastik Dereceli Azalma/Bayır İnişi, Gradyan, Stokastik güncellemeler, Yığın (Batch) güncellemeler] +⟶ [Stokastik Dereceli Azalma/Bayır İnişi, Gradyan, Stokastik güncellemeler, Yığın/Küme (Batch) güncellemeler]
@@ -538,4 +536,3 @@ Doğru etiket y örneğinin x değerinin doğrusal ağırlık w modeli ile sın **77. The Artificial Intelligence cheatsheets are now available in [target language].** ⟶ Yapay Zeka el kitabı şimdi [hedef dilde] mevcuttur. - From c0ac5d825cbdda827f7d2952b44ffa866df4b50b Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Ba=C5=9Fak=20Buluz?= <41359672+basakbuluz@users.noreply.github.com> Date: Tue, 6 Aug 2019 01:42:30 +0300 Subject: [PATCH 298/531] Update cs-221-variables-models.md --- tr/cs-221-variables-models.md | 24 ++++++++++++------------ 1 file changed, 12 insertions(+), 12 deletions(-) diff --git a/tr/cs-221-variables-models.md b/tr/cs-221-variables-models.md index 4e108c95e..06acc6a3f 100644 --- a/tr/cs-221-variables-models.md +++ b/tr/cs-221-variables-models.md @@ -74,7 +74,7 @@ **11. Here, the constraint j with assignment x is said to be satisfied if and only if fj(x)=1.** -⟶11.Burada, j kısıtlı x ataması ancak ve ancak fj(x)=1 olduğunda memnundur denir. +⟶11.Burada, j kısıtlı x ataması ancak ve ancak fj(x)=1 olduğunda uygundur (satisfied) denir.
@@ -88,7 +88,7 @@ **13. Dynamic ordering** -⟶ 13. Dinamik düzenleşim +⟶ 13. Dinamik düzenleşim (Dynamic ordering)
@@ -109,7 +109,7 @@ **16. [Forward checking ― It is a one-step lookahead heuristic that preemptively removes inconsistent values from the domains of neighboring variables. It has the following characteristics:, After assigning a variable Xi, it eliminates inconsistent values from the domains of all its neighbors., If any of these domains becomes empty, we stop the local backtracking search., If we un-assign a variable Xi, we have to restore the domain of its neighbors.]** -⟶ 16. [İleri kontrol - Tutarsız değerleri komşu değişkenlerin etki alanlarından öncelikli bir şekilde ortadan kaldıran sezgisel bakış açısıdır. Aşağıdaki özelliklere sahiptir :, Bir Xi değişkenini atadıktan sonra, tüm komşularının etki alanlarından tutarsız değerleri eler., Bu etki alanlardan herhangi biri boş olursa, yerel geri arama araması durdurulur. , komşularının etki alanını eski haline getirilmek zorundadır.] +⟶ 16. [İleri kontrol - Tutarsız değerleri komşu değişkenlerin etki alanlarından öncelikli bir şekilde ortadan kaldıran sezgisel bakış açısıdır. Aşağıdaki özelliklere sahiptir : Bir Xi değişkenini atadıktan sonra, tüm komşularının etki alanlarından tutarsız değerleri eler. Bu etki alanlardan herhangi biri boş olursa, yerel geri arama araması durdurulur.Komşularının etki alanını eski haline getirilmek zorundadır.]
@@ -123,7 +123,7 @@ **18. Least constrained value ― It is a value-level ordering heuristic that assigns the next value that yields the highest number of consistent values of neighboring variables. Intuitively, this procedure chooses first the values that are most likely to work.** -⟶ 18. En düşük kısıtlı değer - Komşu değişkenlerin en yüksek tutarlı değerlerini elde ederek bir sonraki değeri veren değer seviyesi düzenleyici sezgisel bir değerdir. Sezgisel olarak, bu prosedür önce çalışması en muhtemel olan değerleri seçer. +⟶ 18. En düşük kısıtlı değer - Komşu değişkenlerin en yüksek tutarlı değerlerini elde ederek bir sonrakini veren seviye düzenleyici sezgisel bir değerdir. Sezgisel olarak, bu prosedür önce çalışması en muhtemel olan değerleri seçer.
@@ -144,7 +144,7 @@ **21. [Arc consistency ― We say that arc consistency of variable Xl with respect to Xk is enforced when for each xl∈Domainl:, unary factors of Xl are non-zero, there exists at least one xk∈Domaink such that any factor between Xl and Xk is non-zero.]** -⟶ 21. [Ark tutarlılığı - Xl değişkeninin ark tutarlılığının Xk'ye göre her bir xl∈Domainl için geçerli olduğu söylenir :, Xl'in birleşik faktörleri sıfır olmadığında, en az bir xk∈Domaink vardır, öyle ki Xl ve Xk arasında sıfır olmayan herhangi bir faktör vardır. +⟶ 21. [Ark tutarlılığı (Arc consistency) - Xl değişkeninin ark tutarlılığının Xk'ye göre her bir xl∈Domainl için geçerli olduğu söylenir : Xl'in birleşik faktörleri sıfır olmadığında, en az bir xk∈Domaink vardır, öyle ki Xl ve Xk arasında sıfır olmayan herhangi bir faktör vardır.
@@ -172,14 +172,14 @@ **25. Beam search ― Beam search is an approximate algorithm that extends partial assignments of n variables of branching factor b=|Domain| by exploring the K top paths at each step. The beam size K∈{1,...,bn} controls the tradeoff between efficiency and accuracy. This algorithm has a time complexity of O(n⋅Kblog(Kb)).** -⟶ 25. Işın araması - Işın araması, her adımda K en üst yollarını keşfederek, b=|Domain| dallanma faktörünün n değişkeninin kısmi atamalarını genişleten yaklaşık bir algoritmadır. +⟶ 25. Işın araması (Beam search) - Işın araması, her adımda K en üst yollarını keşfederek, b=|Domain| dallanma faktörünün n değişkeninin kısmi atamalarını genişleten yaklaşık bir algoritmadır.
**26. The example below illustrates a possible beam search of parameters K=2, b=3 and n=5.** -⟶ 26. Aşağıdaki örnek, K = 2, b = 3 ve n = 5 parametreleri ile muhtemel kiriş aramasını (beam search) göstermektedir. +⟶ 26. Aşağıdaki örnek, K = 2, b = 3 ve n = 5 parametreleri ile muhtemel ışın aramasını (beam search) göstermektedir.
@@ -278,7 +278,7 @@ and fi,1,...,fi,k, Add fnew,i(x) defined as:]** **40. Treewidth ― The treewidth of a factor graph is the maximum arity of any factor created by variable elimination with the best variable ordering. In other words,** -⟶ 40. Ağaç genişliği - Bir faktör grafiğinin ağaç genişliği, değişken elemeli en iyi değişken sıralamasıyla oluşturulan herhangi bir faktörün maksimum ilişki derecesidir. Diğer bir deyişle, +⟶ 40. Ağaç genişliği (Treewidth) - Bir faktör grafiğinin ağaç genişliği, değişken elemeli en iyi değişken sıralamasıyla oluşturulan herhangi bir faktörün maksimum ilişki derecesidir. Diğer bir deyişle,
@@ -390,7 +390,7 @@ and fi,1,...,fi,k, Add fnew,i(x) defined as:]** **56. Remark: examples of probabilistic programs include Hidden Markov model (HMM), factorial HMM, naive Bayes, latent Dirichlet allocation, diseases and symptoms and stochastic block models.** -⟶ 56. Not: Olasılık programlarına örnekler arasında Gizli Markov modeli (Hidden Markov model-HMM), faktöriyel HMM, naif Bayes (naive Bayes), gizli Dirichlet tahsisi (latent Dirichlet allocation), hastalıklar ve semptomlar ve stokastik blok modelleri bulunmaktadır. +⟶ 56. Not: Olasılık programlarına örnekler arasında Gizli Markov modeli (Hidden Markov model-HMM), faktöriyel HMM, naif Bayes (naive Bayes), gizli Dirichlet tahsisi (latent Dirichlet allocation), hastalıklar ve semptomları belirtirler ve stokastik blok modelleri bulunmaktadır.
@@ -404,7 +404,7 @@ and fi,1,...,fi,k, Add fnew,i(x) defined as:]** **58. [Program, Algorithm, Illustration, Example]** -⟶ 58. [Program, Algoritma, İllüstrasyon, Örnek] +⟶ 58. [Program, Algoritma, Gösterim, Örnek]
@@ -439,7 +439,7 @@ and fi,1,...,fi,k, Add fnew,i(x) defined as:]** **63. [General probabilistic inference strategy ― The strategy to compute the probability P(Q|E=e) of query Q given evidence E=e is as follows:, Step 1: Remove variables that are not ancestors of the query Q or the evidence E by marginalization, Step 2: Convert Bayesian network to factor graph, Step 3: Condition on the evidence E=e, Step 4: Remove nodes disconnected from the query Q by marginalization, Step 5: Run a probabilistic inference algorithm (manual, variable elimination, Gibbs sampling, particle filtering)]** -⟶ 63. [Genel olasılıksal çıkarım stratejisi - E = e kanıtı verilen Q sorgusunun P (Q | E = e) olasılığını hesaplama stratejisi aşağıdaki gibidir :, Adım 1: Q sorgusunun ataları olmayan değişkenlerini ya da marjinalleştirme yoluyla E kanıtını silin, Adım 2: Bayesçi ağı faktör grafiğine dönüştürün, Adım 3: Kanıtın koşulu E = e, Adım 4: Q sorgusu ile bağlantısı kesilen düğümleri marjinalleştirme yoluyla silin, Adım 5: Olasılıklı bir çıkarım algoritması çalıştırın (kılavuz, değişken eleme, Gibbs örneklemesi, parçacık filtreleme)] +⟶ 63. [Genel olasılıksal çıkarım stratejisi - E = e kanıtı verilen Q sorgusunun P (Q | E = e) olasılığını hesaplama stratejisi aşağıdaki gibidir : Adım 1: Q sorgusunun ataları olmayan değişkenlerini ya da marjinalleştirme yoluyla E kanıtını silin, Adım 2: Bayesçi ağı faktör grafiğine dönüştürün, Adım 3: Kanıtın koşulu E = e, Adım 4: Q sorgusu ile bağlantısı kesilen düğümleri marjinalleştirme yoluyla silin, Adım 5: Olasılıklı bir çıkarım algoritması çalıştırın (kılavuz, değişken eleme, Gibbs örneklemesi, parçacık filtreleme)]
@@ -474,7 +474,7 @@ and fi,1,...,fi,k, Add fnew,i(x) defined as:]** **68. [Gibbs sampling ― This algorithm is an iterative approximate method that uses a small set of assignments (particles) to represent a large probability distribution. From a random assignment x, Gibbs sampling performs the following steps for i∈{1,...,n} until convergence:, For all u∈Domaini, compute the weight w(u) of assignment x where Xi=u, Sample v from the probability distribution induced by w: v∼P(Xi=v|X−i=x−i), Set Xi=v]** -⟶ 68. [Gibbs örneklemesi - Bu algoritma, büyük olasılık dağılımını temsil etmek için küçük bir dizi atama (parçacık) kullanan tekrarlı bir yaklaşık yöntemdir. Rasgele bir x atamasından Gibbs örneklemesi, i∈ {1, ..., n} için yakınsamaya kadar aşağıdaki adımları uygular :, Tüm u∈Domaini için, x atamasının x (u) ağırlığını hesaplayın, burada Xi = u, Sample w: v∼P (Xi = v | X − i = x − i), Set Xi = v] ile indüklenen olasılık dağılımından +⟶ 68. [Gibbs örneklemesi - Bu algoritma, büyük olasılık dağılımını temsil etmek için küçük bir dizi atama (parçacık) kullanan tekrarlı bir yaklaşık yöntemdir. Rasgele bir x atamasından Gibbs örneklemesi, i∈ {1, ..., n} için yakınsamaya kadar aşağıdaki adımları uygular :, Tüm u∈Domaini için, x atamasının x (u) ağırlığını hesaplayın, burada Xi = u, Sample w: v∼P (Xi = v | X − i = x − i), Set Xi = v] ile uyarılmış olasılık dağılımından
From 5c53b3295ebfd7dc02efd5bb16bc6752ed713a91 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Ba=C5=9Fak=20Buluz?= <41359672+basakbuluz@users.noreply.github.com> Date: Tue, 6 Aug 2019 01:49:18 +0300 Subject: [PATCH 299/531] Update cs-221-variables-models.md MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Thank you so much for review @ayyucekizrak ✔ All proposed corrections have been made ✌ @shervinea --- tr/cs-221-variables-models.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/tr/cs-221-variables-models.md b/tr/cs-221-variables-models.md index 06acc6a3f..aac242e96 100644 --- a/tr/cs-221-variables-models.md +++ b/tr/cs-221-variables-models.md @@ -137,7 +137,7 @@ **20. The example above is an illustration of the 3-color problem with backtracking search coupled with most constrained variable exploration and least constrained value heuristic, as well as forward checking at each step.** -⟶ 20. Yukarıdaki örnek, en kısıtlı değişken keşfi ve sezgisel en düşük kısıtlı değerin yanı sıra, her adımda ileri kontrol ile birleştirilmiş geri izleme arama ile 3 renkli problemin bir gösterimidir. +⟶ 20. Yukarıdaki örnek, en kısıtlı değişken keşfi ve sezgisel en düşük kısıtlı değerin yanı sıra, her adımda ileri kontrol ile birleştirilmiş geri izleme arama ile 3 renk probleminin bir gösterimidir.
@@ -165,7 +165,7 @@ **24. Approximate methods** -⟶24. Yaklaşık yöntemler +⟶24. Yaklaşık yöntemler (Approximate methods)
From 2e675b5cda0932b617c007731bbd894c72fd1e8b Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?ayy=C3=BCce=20k=C4=B1zrak?= <32331090+ayyucekizrak@users.noreply.github.com> Date: Thu, 8 Aug 2019 10:21:32 +0300 Subject: [PATCH 300/531] Update cs-221-logic-models.md MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit I made a correction taking into account your feedback. Semi-decidability is Yarı-karar verebilirlik. Thank you! --- tr/cs-221-logic-models.md | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/tr/cs-221-logic-models.md b/tr/cs-221-logic-models.md index d0eb75c23..23476dd86 100644 --- a/tr/cs-221-logic-models.md +++ b/tr/cs-221-logic-models.md @@ -25,7 +25,7 @@ **4. [Name, Symbol, Meaning, Illustration]** -⟶ [Ad, Sembol, Anlamı, Gösterimi] +⟶ [Ad, Sembol, Anlamı, Gösterim]
@@ -67,7 +67,7 @@ **10. Interpretation function ― The interpretation function I(f,w) outputs whether model w satisfies formula f:** -⟶ Yorumlama fonksiyonu - Yorumlama fonksiyonu I(f,w), w modelinin f formülüne uygun olup olmadığını gösterir: +⟶ Yorumlama fonksiyonu ― Yorumlama fonksiyonu I(f,w), w modelinin f formülüne uygun olup olmadığını gösterir:
@@ -81,7 +81,7 @@ **12. Knowledge base** -⟶ Bilgi temeli +⟶ Bilgi temelli
@@ -389,7 +389,7 @@ **56. [Semi-decidability ― First-order logic, even restricted to only Horn clauses, is semi-decidable., if KB⊨f, forward inference on complete inference rules will prove f in finite time, if KB⊭f, no algorithm can show this in finite time]** -⟶ Yarı karar verilebilirlik ― Birinci dereceden mantık, sadece Horn cümleleriyle sınırlı olsa bile, yarı kararsıdır eğer KB⊨f ise f sonsuz zamanlıdır. KB⊭f ise sonsuz zamanlı olabilirliği gösteren algoritma yoktur. +⟶ Yarı-karar verilebilirlik ― Birinci dereceden mantık, sadece Horn cümleleriyle sınırlı olsa bile, yarı karar verilebilir eğer KB⊨f ise f sonsuz zamanlıdır. KB⊭f ise sonsuz zamanlı olabilirliği gösteren algoritma yoktur.
@@ -417,7 +417,7 @@ **60. [First-order logic, Substitution, Unification, Resolution rule, Modus ponens, Resolution, Semi-decidability]** -⟶ [Birinci derece mantık, Değiştirme, Birleştirme, Çözünürlük kuralı, Modus ponens, Çözünürlük, Yarı karar verilebilirlik] +⟶ [Birinci derece mantık, Değiştirme, Birleştirme, Çözünürlük kuralı, Modus ponens, Çözünürlük, Yarı-karar verilebilirlik]
From 90d032abd277d24e2294a60e4e89e7c0f55a1128 Mon Sep 17 00:00:00 2001 From: Cemal GURPINAR <36713268+cemalgurpinar@users.noreply.github.com> Date: Sat, 10 Aug 2019 23:51:31 +0300 Subject: [PATCH 301/531] [tr] States-based models I translated the CS 221 - State-based models into Turkish. @shervinea can you please review it? Best regards. --- tr/cs-221-states-models.md | 980 +++++++++++++++++++++++++++++++++++++ 1 file changed, 980 insertions(+) create mode 100644 tr/cs-221-states-models.md diff --git a/tr/cs-221-states-models.md b/tr/cs-221-states-models.md new file mode 100644 index 000000000..3d93b3f8c --- /dev/null +++ b/tr/cs-221-states-models.md @@ -0,0 +1,980 @@ +**States-based models translation** [[webpage]](https://stanford.edu/~shervine/teaching/cs-221/cheatsheet-states-models) + +
+ +**1. States-based models with search optimization and MDP** + +⟶ Arama optimizasyonu ve Markov karar sürecine (MDP) sahip durum-temelli modeller + +
+ + +**2. Search optimization** + +⟶ Arama optimizasyonu + +
+ + +**3. In this section, we assume that by accomplishing action a from state s, we deterministically arrive in state Succ(s,a). The goal here is to determine a sequence of actions (a1,a2,a3,a4,...) that starts from an initial state and leads to an end state. In order to solve this kind of problem, our objective will be to find the minimum cost path by using states-based models.** + +⟶ Bu bölümde, s durumunda a eylemini gerçekleştirdiğimizde, Succ(s,a) durumuna varacağımızı varsayıyoruz. Burada amaç, başlangıç durumundan başlayıp bitiş durumuna götüren bir eylem dizisi (a1,a2,a3,a4,...) belirlenmesidir. Bu tür bir problemi çözmek için, amacımız durum-temelli modelleri kullanarak asgari (minimum) maliyet yolunu bulmak olacaktır. + +
+ + +**4. Tree search** + +⟶ Ağaç arama + +
+ + +**5. This category of states-based algorithms explores all possible states and actions. It is quite memory efficient, and is suitable for huge state spaces but the runtime can become exponential in the worst cases.** + +⟶ Bu durum-temelli algoritmalar, olası bütün durum ve eylemleri araştırırlar. Oldukça bellek verimli ve büyük durum uzayları için uygundurlar ancak çalışma zamanı en kötü durumlarda üstel olabilir. + +
+ + +**6. [Self-loop, More than a parent, Cycle, More than a root, Valid tree]** + +⟶ [Kendinden-Döngü, Bir ebeveynden daha fazlası, Çevrim, Bir kökten daha fazlası, Geçerli ağaç] + +
+ + +**7. [Search problem ― A search problem is defined with:, a starting state sstart, possible actions Actions(s) from state s, action cost Cost(s,a) from state s with action a, successor Succ(s,a) of state s after action a, whether an end state was reached IsEnd(s)]** + +⟶ [Arama problemi ― Bir arama problemi aşağıdaki şekilde tanımlanmaktadır:, bir başlangıç durumu sstart, s durumunda gerçekleşebilecek olası eylemler Actions(s), s durumunda gerçekleşen a eyleminin eylem maliyeti Cost(s,a), a eyleminden sonraki varılacak durum Succ(s,a), son duruma ulaşılıp ulaşılamadığı IsEnd(s)] + +
+ + +**8. The objective is to find a path that minimizes the cost.** + +⟶ Amaç, maliyeti en aza indiren bir yol bulmaktır. + +
+ + +**9. Backtracking search ― Backtracking search is a naive recursive algorithm that tries all possibilities to find the minimum cost path. Here, action costs can be either positive or negative.** + +⟶ Geri izleme araması ― Geri izleme araması, asgari (minimum) maliyet yolunu bulmak için tüm olasılıkları deneyen saf bir özyinelemeli algoritmadır. Burada, eylem maliyetleri pozitif ya da negatif olabilir. + +
**10. Breadth-first search (BFS) ― Breadth-first search is a graph search algorithm that does a level-by-level traversal. We can implement it iteratively with the help of a queue that stores at each step future nodes to be visited. For this algorithm, we can assume action costs to be equal to a constant c⩾0.**

⟶ Genişlik-ilk arama (BFS) ― Genişlik-ilk arama, seviye seviye arama yapan bir çizge arama algoritmasıdır. Her adımda gelecekte ziyaret edilecek düğümleri saklayan bir kuyruk yardımıyla tekrarlı (iteratively) olarak gerçekleyebiliriz. Bu algoritma için, eylem maliyetlerinin sabit bir c⩾0 değerine eşit olduğunu kabul edebiliriz.

<br>
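A minimal breadth-first search sketch using a queue, on a toy search problem where every action has the same constant cost; the successor map below is an illustrative assumption.

```python
from collections import deque

successors = {
    "start": ["a", "b"],
    "a": ["c"],
    "b": ["c", "end"],
    "c": ["end"],
    "end": [],
}

def bfs(start, goal):
    queue = deque([start])
    parent = {start: None}
    while queue:
        s = queue.popleft()
        if s == goal:                       # first pop of the goal = fewest actions
            path = []
            while s is not None:
                path.append(s)
                s = parent[s]
            return path[::-1]
        for nxt in successors[s]:
            if nxt not in parent:           # not seen yet
                parent[nxt] = s
                queue.append(nxt)
    return None

print(bfs("start", "end"))                  # ['start', 'b', 'end']
```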
+ + +**11. Depth-first search (DFS) ― Depth-first search is a search algorithm that traverses a graph by following each path as deep as it can. We can implement it recursively, or iteratively with the help of a stack that stores at each step future nodes to be visited. For this algorithm, action costs are assumed to be equal to 0.** + +⟶ Derinlik-ilk arama (DFS) ― Derinlik-ilk arama, her bir yolu olabildiğince derin bir şekilde takip ederek çizgeyi dolaşan bir arama algoritmasıdır. Bu algoritmayı, ziyaret edilecek gelecek düğümleri her adımda bir yığın yardımıyla saklayarak, yinelemeli (recursively) ya da tekrarlı (iteratively) olarak uygulayabiliriz. Bu algoritma için eylem maliyetlerinin 0 olduğu varsayılmaktadır. + +
+ + +**12. Iterative deepening ― The iterative deepening trick is a modification of the depth-first search algorithm so that it stops after reaching a certain depth, which guarantees optimality when all action costs are equal. Here, we assume that action costs are equal to a constant c⩾0.** + +⟶ Tekrarlı derinleşme ― Tekrarlı derinleşme hilesi, derinlik-ilk arama algoritmasının tadil edilmiş bir halidir, böylece belirli bir derinliğe ulaştıktan sonra durur, bu da tüm işlem maliyetleri eşit olduğunda en iyiliği (optimal) garanti eder. Burada, işlem maliyetlerinin sabit bir değere eşit olduğunu varsayıyoruz c⩾0. + +
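A minimal iterative deepening sketch: depth-limited DFS is rerun with increasing limits, so the shallowest solution is returned first; the toy successor function is an illustrative assumption.

```python
successors = {
    "start": ["a", "b"],
    "a": ["c"],
    "b": ["end"],
    "c": ["end"],
    "end": [],
}

def depth_limited_dfs(s, goal, limit, path):
    if s == goal:
        return path
    if limit == 0:
        return None
    for nxt in successors[s]:
        found = depth_limited_dfs(nxt, goal, limit - 1, path + [nxt])
        if found is not None:
            return found
    return None

def iterative_deepening(start, goal, max_depth=10):
    for limit in range(max_depth + 1):       # try depth 0, 1, 2, ...
        result = depth_limited_dfs(start, goal, limit, [start])
        if result is not None:
            return result                    # shallowest solution found first
    return None

print(iterative_deepening("start", "end"))   # ['start', 'b', 'end']
```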
**13. Tree search algorithms summary ― By noting b the number of actions per state, d the solution depth, and D the maximum depth, we have:**

⟶ Ağaç arama algoritmaları özeti ― b durum başına eylem sayısını, d çözüm derinliğini ve D en yüksek (maksimum) derinliği ifade ederse, o zaman:

<br>
+ + +**14. [Algorithm, Action costs, Space, Time]** + +⟶ [Algoritma, Eylem maliyetleri, Arama uzayı, Zaman] + +
+ + +**15. [Backtracking search, any, Breadth-first search, Depth-first search, DFS-Iterative deepening]** + +⟶ [Geri izleme araması, herhangi bir şey, Genişlik-ilk arama, Derinlik-ilk arama, Derinlik-ilk arama - Tekrarlı derinleşme] + +
+ + +**16. Graph search** + +⟶ Çizge arama + +
+ + +**17. This category of states-based algorithms aims at constructing optimal paths, enabling exponential savings. In this section, we will focus on dynamic programming and uniform cost search.** + +⟶ Bu durum-temelli algoritmalar kategorisi, üssel tasarruf sağlayan en iyi (optimal) yolları oluşturmayı amaçlar. Bu bölümde, dinamik programlama ve tek tip maliyet araştırması üzerinde duracağız. + +
+ + +**18. Graph ― A graph is comprised of a set of vertices V (also called nodes) as well as a set of edges E (also called links).** + +⟶ Çizge ― Bir çizge, V köşeler (düğüm olarak da adlandırılır) kümesi ile E kenarlar (bağlantı olarak da adlandırılır) kümesinden oluşur. + +
+ + +**19. Remark: a graph is said to be acylic when there is no cycle.** + +⟶ Not: çevrim olmadığında, bir çizgenin asiklik (çevrimsiz) olduğu söylenir. + +
+ + +**20. State ― A state is a summary of all past actions sufficient to choose future actions optimally.** + +⟶ Durum ― Bir durum gelecekteki eylemleri en iyi (optimal) şekilde seçmek için, yeterli tüm geçmiş eylemlerin özetidir. + +
+ + +**21. Dynamic programming ― Dynamic programming (DP) is a backtracking search algorithm with memoization (i.e. partial results are saved) whose goal is to find a minimum cost path from state s to an end state send. It can potentially have exponential savings compared to traditional graph search algorithms, and has the property to only work for acyclic graphs. For any given state s, the future cost is computed as follows:** + +⟶ Dinamik programlama ― Dinamik programlama (DP), amacı s durumundan bitiş durumuna,send, kadar asgari(minimum) maliyet yolunu bulmak olan hatırlamalı (memoization) (başka bir deyişle kısmi sonuçlar kaydedilir) bir geri izleme (backtracking) arama algoritmasıdır. Geleneksel çizge arama algoritmalarına kıyasla üstel olarak tasarruf sağlayabilir ve yalnızca asiklik (çevrimsiz) çizgeler ile çalışma özelliğine sahiptir. Herhangi bir durum için gelecekteki maliyet aşağıdaki gibi hesaplanır: + +
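A minimal dynamic-programming sketch of the FutureCost(s) recurrence with memoization on a toy acyclic search problem; the states, actions and costs below are illustrative assumptions.

```python
from functools import lru_cache

actions = {                       # state -> list of (action cost, successor)
    "s0": [(3, "s1"), (1, "s2")],
    "s1": [(2, "end")],
    "s2": [(4, "s1"), (6, "end")],
    "end": [],
}

@lru_cache(maxsize=None)          # memoization: partial results are saved
def future_cost(s):
    if s == "end":                # IsEnd(s)
        return 0
    return min(cost + future_cost(nxt) for cost, nxt in actions[s])

print(future_cost("s0"))          # min(3+2, 1+min(4+2, 6)) = 5
```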
+ + +**22. [if, otherwise]** + +⟶ [eğer, aksi taktirde] + +
+ + +**23. Remark: the figure above illustrates a bottom-to-top approach whereas the formula provides the intuition of a top-to-bottom problem resolution.** + +⟶ Not: Yukarıdaki şekil, aşağıdan yukarıya bir yaklaşımı sergilerken, formül ise yukarıdan aşağıya bir önsezi ile problem çözümü sağlar. + +
+ + +**24. Types of states ― The table below presents the terminology when it comes to states in the context of uniform cost search:** + +⟶ Durum türleri ― Tek tip maliyet araştırması bağlamındaki durumlara ilişkin terminoloji aşağıdaki tabloda sunulmaktadır: + +
+ + +**25. [State, Explanation]** + +⟶ [Durum, Açıklama] + +
+ + +**26. [Explored, Frontier, Unexplored]** + +⟶ [Keşfedilmiş, Sırada (Frontier), Keşfedilmemiş] + +
+ + +**27. [States for which the optimal path has already been found, States seen for which we are still figuring out how to get there with the cheapest cost, States not seen yet]** + +⟶ [En iyi (optimal) yolun daha önce bulunduğu durumlar, Görülen ancak hala en ucuza nasıl gidileceği hesaplanmaya çalışılan durumlar, Daha önce görülmeyen durumlar] + +
+ + +**28. Uniform cost search ― Uniform cost search (UCS) is a search algorithm that aims at finding the shortest path from a state sstart to an end state send. It explores states s in increasing order of PastCost(s) and relies on the fact that all action costs are non-negative.** + +⟶ Tek tip maliyet araması ― Tek tip maliyet araması (Uniform cost search - UCS) bir başlangıç durumu,Sstart, ile bir bitiş durumu,Send, arasındaki en kısa yolu bulmayı amaçlayan bir arama algoritmasıdır. Bu algoritma s durumlarını artan geçmiş maliyetlerine,PastCost(s), göre araştırır ve eylem maliyetlerinin negatif olmayacağı kuralına dayanır. + +
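A compact sketch of uniform cost search with a binary heap as the frontier, assuming non-negative action costs; the explored set plays the role described in the table above. The toy problem reuses the made-up costs of the dynamic programming sketch.

```python
import heapq

def uniform_cost_search(s_start, actions, succ, cost, is_end):
    """Explore states in increasing order of PastCost(s); requires non-negative costs."""
    frontier = [(0, s_start)]          # priority queue keyed by past cost
    explored = set()                   # states whose minimum past cost is settled
    while frontier:
        past_cost, s = heapq.heappop(frontier)
        if s in explored:
            continue                   # a cheaper copy of s was popped earlier
        explored.add(s)
        if is_end(s):
            return past_cost
        for a in actions(s):
            heapq.heappush(frontier, (past_cost + cost(s, a), succ(s, a)))
    return float("inf")                # the end state is unreachable

print(uniform_cost_search(0,
                          lambda s: [a for a in (1, 2) if s + a <= 4],
                          lambda s, a: s + a,
                          lambda s, a: 1 if a == 1 else 3,
                          lambda s: s == 4))
```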
+ + +**29. Remark 1: the UCS algorithm is logically equivalent to Dijkstra's algorithm.** + +⟶ Not 1: UCS algoritması mantıksal olarak Dijkstra algoritması ile aynıdır. + +<br>
+ + +**30. Remark 2: the algorithm would not work for a problem with negative action costs, and adding a positive constant to make them non-negative would not solve the problem since this would end up being a different problem.** + +⟶ Not 2: Algoritma, negatif eylem maliyetleriyle ilgili bir problem için çalışmaz ve negatif olmayan bir hale getirmek için pozitif bir sabit eklemek problemi çözmez, çünkü problem farklı bir problem haline gelmiş olur. + +
+ + +**31. Correctness theorem ― When a state s is popped from the frontier F and moved to explored set E, its priority is equal to PastCost(s) which is the minimum cost path from sstart to s.** + +⟶ Doğruluk teoremi ― S durumu sıradaki (frontier) F'den çıkarılır ve daha önceden keşfedilmiş olan E kümesine taşınırsa, önceliği başlangıç durumundan,Sstart, s durumuna kadar asgari (minimum) maliyet yolu olan PastCost(s)'e eşittir. + +
+ + +**32. Graph search algorithms summary ― By noting N the number of total states, n of which are explored before the end state send, we have:** + +⟶ Çizge arama algoritmaları özeti ― N toplam durumların sayısı, n-bitiş durumu(Send)'ndan önce keşfedilen durum sayısı ise: + +
+ + +**33. [Algorithm, Acyclicity, Costs, Time/space]** + +⟶ [Algoritma, Asiklik (Çevrimsizlik), Maliyetler, Zaman/arama uzayı] + +
+ + +**34. [Dynamic programming, Uniform cost search]** + +⟶ [Dinamik programlama, Tek tip maliyet araması] + +
+ + +**35. Remark: the complexity countdown supposes the number of possible actions per state to be constant.** + +⟶ Not: Karmaşıklık geri sayımı, her durum için olası eylemlerin sayısını sabit olarak kabul eder. + +
+ + +**36. Learning costs** + +⟶ Öğrenme maliyetleri + +
+ + +**37. Suppose we are not given the values of Cost(s,a); we want to estimate these quantities from a training set of minimizing-cost-path sequences of actions (a1,a2,...,ak).** + +⟶ Diyelim ki, Cost(s,a) değerleri verilmedi ve biz bu değerleri maliyet yolu eylem dizisini,(a1,a2,...,ak), en aza indiren bir eğitim kümesinden tahmin etmek istiyoruz. + +<br>
+ + +**38. [Structured perceptron ― The structured perceptron is an algorithm aiming at iteratively learning the cost of each state-action pair. At each step, it:, decreases the estimated cost of each state-action of the true minimizing path y given by the training data, increases the estimated cost of each state-action of the current predicted path y' inferred from the learned weights.]** + +⟶ [Yapılandırılmış algılayıcı ― Yapılandırılmış algılayıcı, her bir durum-eylem çiftinin maliyetini tekrarlı (iteratively) olarak öğrenmeyi amaçlayan bir algoritmadır. Her bir adımda, algılayıcı:, eğitim verilerinden elde edilen gerçek asgari (minimum) y yolunun her bir durum-eylem çiftinin tahmini (estimated) maliyetini azaltır, öğrenilen ağırlıklardan elde edilen şimdiki tahmini(predicted) y' yolununun durum-eylem çiftlerinin tahmini maliyetini artırır.] + +
+ + +**39. Remark: there are several versions of the algorithm, one of which simplifies the problem to only learning the cost of each action a, and the other parametrizes Cost(s,a) to a feature vector of learnable weights.** + +⟶ Not: Algoritmanın birkaç sürümü vardır, bunlardan biri problemi sadece her bir a eyleminin maliyetini öğrenmeye indirger, bir diğeri ise öğrenilebilir ağırlık öznitelik vektörünü, Cost(s,a)'nın parametresi haline getirir. + +
+ + +**40. A* search** + +⟶ A* arama + +
+ + +**41. Heuristic function ― A heuristic is a function h over states s, where each h(s) aims at estimating FutureCost(s), the cost of the path from s to send.** + +⟶ Sezgisel işlev ― Sezgisel, s durumu üzerinde işlem yapan bir h fonksiyonudur, burada her bir h(s), s ile send arasındaki yol maliyeti olan FutureCost(s)'yi tahmin etmeyi amaçlar. + +
+ + +**42. Algorithm ― A∗ is a search algorithm that aims at finding the shortest path from a state s to an end state send. It explores states s in increasing order of PastCost(s)+h(s). It is equivalent to a uniform cost search with edge costs Cost′(s,a) given by:** + +⟶ Algoritma ― A∗, s durumu ile send bitiş durumu arasındaki en kısa yolu bulmayı amaçlayan bir arama algoritmasıdır. Bahse konu algoritma PastCost(s)+h(s)'yi artan sıra ile araştırır. Aşağıda verilenler ışığında kenar maliyetlerini de içeren tek tip maliyet aramasına eşittir: + +
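The modified edge costs in question are Cost′(s,a) = Cost(s,a) + h(Succ(s,a)) − h(s), so A∗ can be obtained by feeding these costs to a uniform cost search. The sketch below, on the same made-up problem, only checks numerically that a candidate heuristic keeps every modified cost non-negative and vanishes at the end state (the consistency conditions given below); the heuristic itself is an assumption chosen for illustration.

```python
# Hypothetical problem: states 0..4, actions +1/+2, made-up costs, end state 4.
cost = lambda s, a: 1 if a == 1 else 3
succ = lambda s, a: s + a
h    = lambda s: (5 - s) // 2      # at least ceil((4-s)/2) actions remain, each costing >= 1

# Modified edge costs that turn A* into a uniform cost search:
cost_prime = lambda s, a: cost(s, a) + h(succ(s, a)) - h(s)

print(all(cost_prime(s, a) >= 0 for s in range(4) for a in (1, 2) if s + a <= 4))  # condition 1
print(h(4) == 0)                                                                   # condition 2
```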
+ + +**43. Remark: this algorithm can be seen as a biased version of UCS exploring states estimated to be closer to the end state.** + +⟶ Not: Bu algoritma, son duruma yakın olduğu tahmin edilen durumları araştıran tek tip maliyet aramasının taraflı bir sürümü olarak görülebilir. + +
+ + +**44. [Consistency ― A heuristic h is said to be consistent if it satisfies the two following properties:, For all states s and actions a, The end state verifies the following:]** + +⟶ [Tutarlılık ― Bir sezgisel h, aşağıdaki iki özelliği sağlaması durumunda tutarlıdır denilebilir:, Bütün s durumları ve a eylemleri için, bitiş durumu aşağıdakileri doğrular:] + +
+ + +**45. Correctness ― If h is consistent, then A∗ returns the minimum cost path.** + +⟶ Doğruluk ― Eğer h tutarlı ise o zaman A∗ algoritması asgari (minimum) maliyet yolunu döndürür. + +
+ + +**46. Admissibility ― A heuristic h is said to be admissible if we have:** + +⟶ Kabul edilebilirlik ― Bir sezgisel h kabul edilebilirdir eğer: + +
+ + +**47. Theorem ― Let h(s) be a given heuristic. We have:** + +⟶ Teorem ― h(s) verilen sezgisel olsun ve: + +
+ + +**48. [consistent, admissible]** + +⟶ [tutarlı, kabul edilebilir] + +
+ + +**49. Efficiency ― A* explores all states s satisfying the following equation:** + +⟶ Verimlilik ― A* algoritması aşağıdaki eşitliği sağlayan bütün s durumlarını araştırır: + +
+ + +**50. Remark: larger values of h(s) are better as this equation shows it will restrict the set of states s going to be explored.** + +⟶ Not: h(s)'nin yüksek değerleri, bu eşitliğin araştırılacak olan s durum kümesini kısıtlayacak olması nedeniyle daha iyidir. + +<br>
+ + +**51. Relaxation** + +⟶ Rahatlama + +
+ + +**52. It is a framework for producing consistent heuristics. The idea is to find closed-form reduced costs by removing constraints and use them as heuristics.** + +⟶ Bu tutarlı sezgisel için bir altyapıdır (framework). Buradaki fikir, kısıtlamaları kaldırarak kapalı şekilli (closed-form) düşük maliyetler bulmak ve bunları sezgisel olarak kullanmaktır. + +
+ + +**53. Relaxed search problem ― The relaxation of search problem P with costs Cost is noted Prel with costs Costrel, and satisfies the identity:** + +⟶ Rahat arama problemi ― Cost maliyetli bir arama probleminin rahatlaması, Costrel maliyetli Prel ile ifade edilir ve kimliği (satisfy) karşılar: + +
+ + +**54. Relaxed heuristic ― Given a relaxed search problem Prel, we define the relaxed heuristic h(s)=FutureCostrel(s) as the minimum cost path from s to an end state in the graph of costs Costrel(s,a).** + +⟶ Rahat sezgisel ― Bir Prel rahat arama problemi verildiğinde, h(s)=FutureCostrel(s) rahat sezgisel eşitliğini Costrel(s,a) maliyet çizgesindeki s durumu ile bir bitiş durumu arasındaki asgari(minimum) maliyet yolu olarak tanımlarız. + +
+ + +**55. Consistency of relaxed heuristics ― Let Prel be a given relaxed problem. By theorem, we have:** + +⟶ Rahat sezgisel tutarlılığı ― Prel bir rahat problem olarak verilmiş olsun. Teoreme göre: + +
+ + +**56. consistent** + +⟶ tutarlı + +
+ + +**57. [Tradeoff when choosing heuristic ― We have to balance two aspects in choosing a heuristic:, Computational efficiency: h(s)=FutureCostrel(s) must be easy to compute. It has to produce a closed form, easier search and independent subproblems., Good enough approximation: the heuristic h(s) should be close to FutureCost(s) and we have thus to not remove too many constraints.]** + +⟶ [Sezgisel seçiminde ödünleşim (tradeoff) ― Sezgisel seçiminde iki yönü dengelemeliyiz:, Hesaplamalı verimlilik: h(s)=FutureCostrel(s) eşitliği kolay hesaplanabilir olmalıdır. Kapalı bir şekil, daha kolay arama ve bağımsız alt problemler üretmesi gerekir., Yeterince iyi yaklaşım: sezgisel h(s), FutureCost(s) işlevine yakın olmalı ve bu nedenle çok fazla kısıtlamayı ortadan kaldırmamalıyız.] + +
+ + +**58. Max heuristic ― Let h1(s), h2(s) be two heuristics. We have the following property:** + +⟶ En yüksek sezgisel ― h1(s) ve h2(s) aşağıdaki özelliklere sahip iki adet sezgisel olsun: + +
+ + +**59. Markov decision processes** + +⟶ Markov karar süreçleri + +
+ + +**60. In this section, we assume that performing action a from state s can lead to several states s′1,s′2,... in a probabilistic manner. In order to find our way between an initial state and an end state, our objective will be to find the maximum value policy by using Markov decision processes that help us cope with randomness and uncertainty.** + +⟶ Bu bölümde, s durumunda a eyleminin gerçekleştirilmesinin olasılıksal olarak birden fazla durum,(s′1,s′2,...), ile sonuçlanacağını kabul ediyoruz. Başlangıç durumu ile bitiş durumu arasındaki yolu bulmak için amacımız, rastgelelilik ve belirsizlik ile başa çıkabilmek için yardımcı olan Markov karar süreçlerini kullanarak en yüksek değer politikasını bulmak olacaktır. + +
+ + +**61. Notations** + +⟶ Gösterimler + +
+ + +**62. [Definition ― The objective of a Markov decision process is to maximize rewards. It is defined with:, a starting state sstart, possible actions Actions(s) from state s, transition probabilities T(s,a,s′) from s to s′ with action a, rewards Reward(s,a,s′) from s to s′ with action a, whether an end state was reached IsEnd(s), a discount factor 0⩽γ⩽1]** + +⟶ [Tanım ― Markov karar sürecinin amacı ödülleri en yüksek seviyeye çıkarmaktır. Markov karar süreci aşağıdaki bileşenlerden oluşmaktadır:, başlangıç durumu sstart, s durumunda gerçekleştirilebilecek olası eylemler Actions(s), s durumunda a eyleminin gerçekleştirilmesi ile s′ durumuna geçiş olasılıkları T(s,a,s′), s durumunda a eyleminin gerçekleştirilmesi ile elde edilen ödüller Reward(s,a,s′), bitiş durumuna ulaşılıp ulaşılamadığı IsEnd(s), indirim faktörü 0⩽γ⩽1] + +
+ + +**63. Transition probabilities ― The transition probability T(s,a,s′) specifies the probability of going to state s′ after action a is taken in state s. Each s′↦T(s,a,s′) is a probability distribution, which means that:** + +⟶ Geçiş olasılıkları ― Geçiş olasılığı T(s,a,s′) s durumundayken gerçekleştirilen a eylemi neticesinde s′ durumuna gitme olasılığını belirtir. Her bir s′↦T(s,a,s′) aşağıda belirtildiği gibi bir olasılık dağılımıdır: + +
+ + +**64. states** + +⟶ durumlar + +
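One possible way to encode such an MDP in code is with plain dictionaries for T and Reward, as in the hypothetical sketch below; the states, probabilities and rewards are made up, and the only check performed is that each s′↦T(s,a,s′) is a probability distribution.

```python
# A tiny, made-up MDP: one non-terminal state "in", one terminal state "end".
T = {                                   # T[(s, a)] = {s_next: probability}
    ("in", "stay"): {"in": 0.7, "end": 0.3},
    ("in", "quit"): {"end": 1.0},
}
Reward = {                              # Reward[(s, a, s_next)]
    ("in", "stay", "in"): 4, ("in", "stay", "end"): 4, ("in", "quit", "end"): 10,
}
gamma = 1.0                             # discount factor, 0 <= gamma <= 1
is_end = lambda s: s == "end"           # IsEnd(s)

# Every s' -> T(s,a,s') must sum to 1 over the reachable states:
assert all(abs(sum(dist.values()) - 1.0) < 1e-9 for dist in T.values())
print("every T(s,a,.) sums to 1")
```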
+ + +**65. Policy ― A policy π is a function that maps each state s to an action a, i.e.** + +⟶ Politika ― Bir π politikası her s durumunu bir a eylemi ile ilişkilendiren bir işlevdir. + +
+ + +**66. Utility ― The utility of a path (s0,...,sk) is the discounted sum of the rewards on that path. In other words,** + +⟶ Fayda ― Bir (s0,...,sk) yolunun faydası, o yol üzerindeki ödüllerin indirimli toplamıdır. Diğer bir deyişle, + +
+ + +**67. The figure above is an illustration of the case k=4.** + +⟶ Yukarıdaki şekil k=4 durumunun bir gösterimidir. + +
+ + +**68. Q-value ― The Q-value of a policy π at state s with action a, also noted Qπ(s,a), is the expected utility from state s after taking action a and then following policy π. It is defined as follows:** + +⟶ Q-değeri ― S durumunda gerçekleştirilen bir a eylemi için π politikasının Q-değeri, Qπ(s,a) olarak da gösterilir, a eylemini gerçekleştirip ve sonrasında π politikasını takiben s durumundan beklenen faydadır. Q-değeri aşağıdaki şekilde tanımlanmaktadır: + +
+ + +**69. Value of a policy ― The value of a policy π from state s, also noted Vπ(s), is the expected utility by following policy π from state s over random paths. It is defined as follows:** + +⟶ Bir politikanın değeri ― S durumundaki π politikasının değeri,Vπ(s) olarak da gösterilir, rastgele yollar üzerinde s durumundaki π politikasını izleyerek elde edilen beklenen faydadır. S durumundaki π politikasının değeri aşağıdaki gibi tanımlanır: + +
+ + +**70. Remark: Vπ(s) is equal to 0 if s is an end state.** + +⟶ Not: Eğer s bitiş durumu ise Vπ(s) sıfıra eşittir. + +
+ + +**71. Applications** + +⟶ Uygulamalar + +
+ + +**72. [Policy evaluation ― Given a policy π, policy evaluation is an iterative algorithm that aims at estimating Vπ. It is done as follows:, Initialization: for all states s, we have:, Iteration: for t from 1 to TPE, we have, with]** + +⟶ [Politika değerlendirme ― bir π politikası verildiğinde, politika değerlendirmesini,Vπ, tahmin etmeyi amaçlayan bir tekrarlı (iterative) algoritmadır. Politika değerlendirme aşağıdaki gibi yapılmaktadır:, İlklendirme: bütün s durumları için:, Tekrar: 1'den TPE'ye kadar her t için, ile] + +
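A minimal sketch of policy evaluation on that toy MDP, assuming the per-iteration update Vπ(s) ← ∑s′ T(s,π(s),s′)[Reward(s,π(s),s′) + γVπ(s′)], with Vπ fixed to 0 at end states; the MDP numbers and the policy are illustrative.

```python
# Same toy MDP as in the sketch above (made-up numbers), with a fixed policy pi.
T = {("in", "stay"): {"in": 0.7, "end": 0.3}, ("in", "quit"): {"end": 1.0}}
R = {("in", "stay", "in"): 4, ("in", "stay", "end"): 4, ("in", "quit", "end"): 10}
gamma, states, is_end = 1.0, ["in", "end"], lambda s: s == "end"
pi = {"in": "stay"}                       # the policy being evaluated

V = {s: 0.0 for s in states}              # initialization: V(s) = 0 for all s
for t in range(100):                      # T_PE iterations
    V = {s: 0.0 if is_end(s) else
            sum(p * (R[(s, pi[s], s2)] + gamma * V[s2])
                for s2, p in T[(s, pi[s])].items())
         for s in states}

print(round(V["in"], 2))                  # expected utility of following pi from "in"
```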
+ + +**73. Remark: by noting S the number of states, A the number of actions per state, S′ the number of successors and T the number of iterations, then the time complexity is of O(TPESS′).** + +⟶ Not: S durum sayısını, A her bir durum için eylem sayısını, S′ ardılların (successors) sayısını ve T yineleme sayısını gösterdiğinde, zaman karmaşıklığı O(TPESS′) olur. + +
+ + +**74. Optimal Q-value ― The optimal Q-value Qopt(s,a) of state s with action a is defined to be the maximum Q-value attained by any policy starting. It is computed as follows:** + +⟶ En iyi Q-değeri ― S durumunda a eylemi gerçekleştirildiğinde bu durumun en iyi Q-değeri,Qopt(s,a), herhangi bir politika başlangıcında elde edilen en yüksek Q-değeri olarak tanımlanmaktadır. En iyi Q-değeri aşağıdaki gibi hesaplanmaktadır: + +
+ + +**75. Optimal value ― The optimal value Vopt(s) of state s is defined as being the maximum value attained by any policy. It is computed as follows:** + +⟶ En iyi değer ― S durumunun en iyi değeri,Vopt(s), herhangi bir politika ile elde edilen en yüksek değer olarak tanımlanmaktadır. En iyi değer aşağıdaki gibi hesaplanmaktadır: + +
+ + +**76. actions** + +⟶ eylemler + +
+ + +**77. Optimal policy ― The optimal policy πopt is defined as being the policy that leads to the optimal values. It is defined by:** + +⟶ En iyi politika ― En iyi politika,πopt, en iyi değerlere götüren politika olarak tanımlanmaktadır. En iyi politika aşağıdaki gibi tanımlanmaktadır: + +
+ + +**78. [Value iteration ― Value iteration is an algorithm that finds the optimal value Vopt as well as the optimal policy πopt. It is done as follows:, Initialization: for all states s, we have:, Iteration: for t from 1 to TVI, we have:, with]** + +⟶ [Değer tekrarı(iteration) ― Değer tekrarı(iteration) en iyi politikanın,πopt, yanında en iyi değeri,Vopt, bulan bir algoritmadır. Değer tekrarı(iteration) aşağıdaki gibi yapılmaktadır:, İlklendirme: bütün s durumları için:, Tekrar: 1'den TVI'ya kadar her bir t için:, ile] + +
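A corresponding sketch of value iteration on the same made-up MDP, iterating Vopt(s) ← max over a of Qopt(s,a) and reading off a greedy policy at the end; all names and numbers are illustrative.

```python
T = {("in", "stay"): {"in": 0.7, "end": 0.3}, ("in", "quit"): {"end": 1.0}}
R = {("in", "stay", "in"): 4, ("in", "stay", "end"): 4, ("in", "quit", "end"): 10}
gamma, states, actions = 1.0, ["in", "end"], {"in": ["stay", "quit"], "end": []}

def q(V, s, a):
    """Qopt estimate: take a in s, then continue optimally according to V."""
    return sum(p * (R[(s, a, s2)] + gamma * V[s2]) for s2, p in T[(s, a)].items())

V = {s: 0.0 for s in states}                           # initialization
for t in range(100):                                   # T_VI iterations
    V = {s: max((q(V, s, a) for a in actions[s]), default=0.0) for s in states}

pi_opt = {s: max(actions[s], key=lambda a: q(V, s, a)) for s in states if actions[s]}
print(round(V["in"], 2), pi_opt["in"])
```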
+ + +**79. Remark: if we have either γ<1 or the MDP graph being acyclic, then the value iteration algorithm is guaranteed to converge to the correct answer.** + +⟶ Not: Eğer γ<1 ya da Markov karar süreci (Markov Decision Process - MDP) asiklik (çevrimsiz) olursa, o zaman değer tekrarı algoritmasının doğru cevaba yakınsayacağı garanti edilir. + +
+ + +**80. When unknown transitions and rewards** + +⟶ Bilinmeyen geçişler ve ödüller + +
+ + +**81. Now, let's assume that the transition probabilities and the rewards are unknown.** + +⟶ Şimdi, geçiş olasılıklarının ve ödüllerin bilinmediğini varsayalım. + +
+ + +**82. Model-based Monte Carlo ― The model-based Monte Carlo method aims at estimating T(s,a,s′) and Reward(s,a,s′) using Monte Carlo simulation with: ** + +⟶ Model-temelli Monte Carlo ― Model-temelli Monte Carlo yöntemi, T(s,a,s′) ve Reward(s,a,s′) işlevlerini Monte Carlo benzetimi kullanarak aşağıdaki formüllere uygun bir şekilde tahmin etmeyi amaçlar: + +
+ + +**83. [# times (s,a,s′) occurs, and]** + +⟶ [# (s,a,s′) gerçekleşme sayısı, ve] + +
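For instance, the counts can be accumulated from logged transitions as in the hypothetical sketch below, where T is estimated as the number of times (s,a,s′) occurs divided by the number of times (s,a) occurs, and the reward estimate is the observed average; the data is invented.

```python
from collections import Counter, defaultdict

# Hypothetical logged transitions (s, a, r, s_next) gathered with some exploration policy.
data = [("in", "stay", 4, "in"), ("in", "stay", 4, "end"),
        ("in", "stay", 4, "in"), ("in", "quit", 10, "end")]

counts = Counter((s, a, s2) for s, a, _, s2 in data)      # #times (s, a, s') occurs
totals = Counter((s, a) for s, a, _, _ in data)           # #times (s, a) occurs
rewards = defaultdict(list)
for s, a, r, s2 in data:
    rewards[(s, a, s2)].append(r)

T_hat = {k: counts[k] / totals[k[:2]] for k in counts}    # estimated transition probabilities
R_hat = {k: sum(v) / len(v) for k, v in rewards.items()}  # estimated rewards
print(T_hat[("in", "stay", "in")], R_hat[("in", "quit", "end")])
```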
+ + +**84. These estimations will be then used to deduce Q-values, including Qπ and Qopt.** + +⟶ Bu tahminler daha sonra Qπ ve Qopt'yi içeren Q-değerleri çıkarımı için kullanılacaktır. + +
+ + +**85. Remark: model-based Monte Carlo is said to be off-policy, because the estimation does not depend on the exact policy.** + +⟶ Not: model-tabanlı Monte Carlo'nun politika dışı olduğu söyleniyor, çünkü tahmin kesin politikaya bağlı değildir. + +
+ + +**86. Model-free Monte Carlo ― The model-free Monte Carlo method aims at directly estimating Qπ, as follows:** + +⟶ Model içermeyen Monte Carlo ― Model içermeyen Monte Carlo yöntemi aşağıdaki şekilde doğrudan Qπ'yi tahmin etmeyi amaçlar: + +
+ + +**87. Qπ(s,a)=average of ut where st−1=s,at=a** + +⟶ Qπ(s,a)= ortalama ut , st−1=s ve at=a olduğunda + +
+ + +**88. where ut denotes the utility starting at step t of a given episode.** + +⟶ ut belirli bir bölümün t anında başlayan faydayı ifade etmektedir. + +
+ + +**89. Remark: model-free Monte Carlo is said to be on-policy, because the estimated value is dependent on the policy π used to generate the data.** + +⟶ Not: model içermeyen Monte Carlo'nun politikaya dahil olduğu söyleniyor, çünkü tahmini değer veriyi üretmek için kullanılan π politikasına bağlıdır. + +
+ + +**90. Equivalent formulation - By introducing the constant η=1/(1+#updates to (s,a)) and for each (s,a,u) of the training set, the update rule of model-free Monte Carlo has a convex combination formulation:** + +⟶ Eşdeğer formülasyon - Sabit tanımı η=1/(1+#güncelleme sayısı (s,a)) ve eğitim kümesinin her bir (s,a,u) üçlemesi için, model içermeyen Monte Carlo'nun güncelleme kuralı dışbükey bir kombinasyon formülasyonuna sahiptir: + +<br>
+ + +**91. as well as a stochastic gradient formulation:** + +⟶ olasılıksal bayır formülasyonu yanında: + +
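Assuming the convex combination form Q(s,a) ← (1−η)Q(s,a) + ηu with η = 1/(1 + number of updates to (s,a)), and its equivalent stochastic gradient form Q(s,a) ← Q(s,a) − η(Q(s,a) − u), a minimal sketch could look like this; the observed utilities are made up.

```python
from collections import defaultdict

Q = defaultdict(float)            # estimates of Qpi(s, a)
num_updates = defaultdict(int)    # number of updates applied to each (s, a)

def mc_update(s, a, u):
    """Convex combination update; the commented line is the equivalent gradient form."""
    num_updates[(s, a)] += 1
    eta = 1.0 / (1 + num_updates[(s, a)])
    Q[(s, a)] = (1 - eta) * Q[(s, a)] + eta * u
    # Q[(s, a)] -= eta * (Q[(s, a)] - u)   # same update, written as a gradient step

for u in [8, 12, 10]:             # utilities observed from ("in", "stay") in three episodes
    mc_update("in", "stay", u)
print(Q[("in", "stay")])
```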
+ + +**92. SARSA ― State-action-reward-state-action (SARSA) is a bootstrapping method estimating Qπ by using both raw data and estimates as part of the update rule. For each (s,a,r,s′,a′), we have:** + +⟶ SARSA ― Durum-eylem-ödül-durum-eylem (State-Action-Reward-State-Action - SARSA), hem ham verileri hem de güncelleme kuralının bir parçası olarak tahminleri kullanarak Qπ'yi tahmin eden bir destekleme yöntemidir. Her bir (s,a,r,s′,a′) için: + +<br>
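A sketch of a single SARSA update, assuming the bootstrapped target r + γQ(s′,a′) mixed into the current estimate with a step size η (a fixed, made-up value here); names are illustrative.

```python
def sarsa_update(Q, s, a, r, s2, a2, eta=0.1, gamma=1.0):
    """On-policy bootstrapped update: the target uses the estimate Q(s', a')."""
    target = r + gamma * Q.get((s2, a2), 0.0)
    Q[(s, a)] = (1 - eta) * Q.get((s, a), 0.0) + eta * target
    return Q

print(sarsa_update({}, "in", "stay", 4, "in", "stay"))   # one made-up transition
```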
+ + +**93. Remark: the SARSA estimate is updated on the fly as opposed to the model-free Monte Carlo one where the estimate can only be updated at the end of the episode.** + +⟶ Not: SARSA tahmini, tahminin yalnızca bölüm sonunda güncellenebildiği model içermeyen Monte Carlo yönteminin aksine anında güncellenir. + +<br>
+ + +**94. Q-learning ― Q-learning is an off-policy algorithm that produces an estimate for Qopt. On each (s,a,r,s′,a′), we have:** + +⟶ Q-öğrenme ― Q-öğrenme, Qopt için tahmin üreten politikaya dahil olmayan bir algoritmadır. Her bir (s,a,r,s′,a′) için: + +
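A matching sketch of one Q-learning update, assuming the target r + γ max over a′ of Q(s′,a′); the action set and the transition are illustrative.

```python
def q_learning_update(Q, actions, s, a, r, s2, eta=0.1, gamma=1.0):
    """Off-policy update: the target uses the best estimated action in s'."""
    v_opt = max((Q.get((s2, a2), 0.0) for a2 in actions(s2)), default=0.0)
    Q[(s, a)] = (1 - eta) * Q.get((s, a), 0.0) + eta * (r + gamma * v_opt)
    return Q

acts = lambda s: [] if s == "end" else ["stay", "quit"]   # hypothetical action set
print(q_learning_update({}, acts, "in", "quit", 10, "end"))
```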
+ + +**95. Epsilon-greedy ― The epsilon-greedy policy is an algorithm that balances exploration with probability ϵ and exploitation with probability 1−ϵ. For a given state s, the policy πact is computed as follows:** + +⟶ Epsilon-açgözlü ― Epsilon-açgözlü politika, ϵ olasılıkla araştırmayı ve 1−ϵ olasılıkla sömürüyü dengeleyen bir algoritmadır. Her bir s durumu için, politika, πact, aşağıdaki şekilde hesaplanır: + +
+ + +**96. [with probability, random from Actions(s)]** + +⟶ [olasılıkla, Actions(s) eylem kümesi içinden rastgele] + +
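A small sketch of epsilon-greedy action selection on top of a dictionary of Q-value estimates; names and numbers are illustrative.

```python
import random

def epsilon_greedy(Q, actions, s, epsilon=0.1):
    """Explore a random action with probability epsilon, otherwise exploit argmax of Q(s, .)."""
    if random.random() < epsilon:
        return random.choice(actions(s))                        # exploration
    return max(actions(s), key=lambda a: Q.get((s, a), 0.0))    # exploitation

Q = {("in", "quit"): 1.0}
acts = lambda s: ["stay", "quit"]
print(epsilon_greedy(Q, acts, "in"))     # usually "quit", occasionally a random action
```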
+ + +**97. Game playing** + +⟶ Oyun oynama + +
+ + +**98. In games (e.g. chess, backgammon, Go), other agents are present and need to be taken into account when constructing our policy.** + +⟶ Oyunlarda (örneğin satranç, tavla, Go), başka oyuncular vardır ve politikamızı oluştururken göz önünde bulundurulması gerekir. + +
+ + +**99. Game tree ― A game tree is a tree that describes the possibilities of a game. In particular, each node is a decision point for a player and each root-to-leaf path is a possible outcome of the game.** + +⟶ Oyun ağacı ― Oyun ağacı, bir oyunun olasılıklarını tarif eden bir ağaçtır. Özellikle, her bir düğüm, oyuncu için bir karar noktasıdır ve her bir kökten (root) yaprağa (leaf) giden yol oyunun olası bir sonucudur. + +
+ + +**100. [Two-player zero-sum game ― It is a game where each state is fully observed and such that players take turns. It is defined with:, a starting state sstart, possible actions Actions(s) from state s, successors Succ(s,a) from states s with actions a, whether an end state was reached IsEnd(s), the agent's utility Utility(s) at end state s, the player Player(s) who controls state s]** + +⟶ [İki oyunculu sıfır toplamlı oyun ― Her durumun tamamen gözlendiği ve oyuncuların sırayla oynadığı bir oyundur. Aşağıdaki gibi tanımlanır:, bir başlangıç durumu sstart, s durumunda gerçekleştirilebilecek olası eylemler Actions(s), s durumunda a eylemi gerçekleştirildiğindeki ardıllar Succ(s,a), bir bitiş durumuna ulaşılıp ulaşılmadığı IsEnd(s), s bitiş durumunda etmenin elde ettiği fayda Utility(s), s durumunu kontrol eden oyuncu Player(s)] + +
+ + +**101. Remark: we will assume that the utility of the agent has the opposite sign of the one of the opponent.** + +⟶ Not: Oyuncu faydasının işaretinin, rakibinin faydasının tersi olacağını varsayacağız. + +
+ + +**102. [Types of policies ― There are two types of policies:, Deterministic policies, noted πp(s), which are actions that player p takes in state s., Stochastic policies, noted πp(s,a)∈[0,1], which are probabilities that player p takes action a in state s.]** + +⟶ [Politika türleri ― İki tane politika türü vardır:, Belirlenimci politikalar, πp(s) olarak gösterilir, p oyuncusunun s durumunda gerçekleştirdiği eylemler., Olasılıksal politikalar, πp(s,a)∈[0,1] olarak gösterilir, p oyuncusunun s durumunda a eylemini gerçekleştirme olasılıkları.] + +
+ + +**103. Expectimax ― For a given state s, the expectimax value Vexptmax(s) is the maximum expected utility of any agent policy when playing with respect to a fixed and known opponent policy πopp. It is computed as follows:** + +⟶ En yüksek beklenen değer(Expectimax) ― Belirli bir s durumu için, en yüksek beklenen değer, Vexptmax(s), sabit ve bilinen bir rakip politikasına,πopp, göre oynarken, bir oyuncu politikasının en yüksek beklenen faydasıdır. En yüksek beklenen değer(Expectimax) aşağıdaki gibi hesaplanmaktadır: + +
+ + +**104. Remark: expectimax is the analog of value iteration for MDPs.** + +⟶ Not: En yüksek beklenen değer(Expectimax), MDP'ler için değer yinelemenin analog halidir. + +
+ + +**105. Minimax ― The goal of minimax policies is to find an optimal policy against an adversary by assuming the worst case, i.e. that the opponent is doing everything to minimize the agent's utility. It is done as follows:** + +⟶ En küçük-en büyük (minimax) ― En küçük-enbüyük (minimax) politikaların amacı en kötü durumu kabul ederek, diğer bir deyişle, rakip, oyuncunun faydasını en aza indirmek için her şeyi yaparken, rakibe karşı en iyi politikayı bulmaktır. En küçük-en büyük(minimax) aşağıdaki şekilde yapılır: + +
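A minimal recursive minimax sketch on an invented two-ply game tree, with the agent as the maximizing player and the opponent as the minimizing player; every name is illustrative.

```python
def minimax(s, succ, is_end, utility, player):
    """Agent (+1) maximizes, opponent (-1) minimizes; leaves are scored by Utility(s)."""
    if is_end(s):
        return utility(s)
    values = [minimax(s2, succ, is_end, utility, -player) for s2 in succ(s)]
    return max(values) if player == +1 else min(values)


# Toy game: the agent picks a branch, then the opponent picks a leaf payoff.
tree = {"root": ["L", "R"], "L": [3, 5], "R": [2, 9]}
print(minimax("root", lambda s: tree.get(s, []),
              lambda s: isinstance(s, int),      # leaves are the numeric payoffs
              lambda s: s, +1))                  # the agent can guarantee a value of 3
```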
+ + +**106. Remark: we can extract πmax and πmin from the minimax value Vminimax.** + +⟶ Not: πmax ve πmin değerleri, en küçük-en büyükten,Vminimax, elde edilebilir. + +
+ + +**107. Minimax properties ― By noting V the value function, there are 3 properties around minimax to have in mind:** + +⟶ En küçük-en büyük (minimax) özellikleri ― V değer fonksiyonunu ifade ederse, En küçük-en büyük (minimax) ile ilgili aklımızda bulundurmamız gereken 3 özellik vardır: + +
+ + +**108. Property 1: if the agent were to change its policy to any πagent, then the agent would be no better off.** + +⟶ Özellik 1: Oyuncu politikasını herhangi bir πagent ile değiştirecek olsaydı, o zaman oyuncu daha iyi olmazdı. + +
+ + +**109. Property 2: if the opponent changes its policy from πmin to πopp, then he will be no better off.** + +⟶ Özellik 2: Eğer rakip oyuncu politikasını πmin'den πopp'a değiştirecek olsaydı, o zaman rakip oyuncu daha iyi olamazdı. + +
+ + +**110. Property 3: if the opponent is known to be not playing the adversarial policy, then the minimax policy might not be optimal for the agent.** + +⟶ Özellik 3: Eğer rakip oyuncunun muhalif (adversarial) politikayı oynamadığı biliniyorsa, o zaman en küçük-en büyük(minimax) politika oyuncu için en iyi (optimal) olmayabilir. + +<br>
+ + +**111. In the end, we have the following relationship:** + +⟶ Sonunda, aşağıda belirtildiği gibi bir ilişkiye sahip oluruz: + +
+ + +**112. Speeding up minimax** + +⟶ En küçük-en büyük (minimax) hızlandırma + +
+ + +**113. Evaluation function ― An evaluation function is a domain-specific and approximate estimate of the value Vminimax(s). It is noted Eval(s).** + +⟶ Değerlendirme işlevi ― Değerlendirme işlevi, alana özgü (domain-specific) ve Vminimax(s) değerinin yaklaşık bir tahminidir. Eval(s) olarak ifade edilmektedir. + +
+ + +**114. Remark: FutureCost(s) is an analogy for search problems.** + +⟶ Not: FutureCost(s) arama problemleri için bir benzetmedir(analogy). + +
+ + +**115. Alpha-beta pruning ― Alpha-beta pruning is a domain-general exact method optimizing the minimax algorithm by avoiding the unnecessary exploration of parts of the game tree. To do so, each player keeps track of the best value they can hope for (stored in α for the maximizing player and in β for the minimizing player). At a given step, the condition β<α means that the optimal path is not going to be in the current branch as the earlier player had a better option at their disposal.** + +⟶ Alpha-beta budama ― Alfa-beta budama, oyun ağacının parçalarının gereksiz yere keşfedilmesini önleyerek en küçük-en büyük(minimax) algoritmasını en iyileyen (optimize eden) alana-özgü olmayan genel bir yöntemdir. Bunu yapmak için, her oyuncu ümit edebileceği en iyi değeri takip eder (maksimize eden oyuncu için α'da ve minimize eden oyuncu için β'de saklanır). Belirli bir adımda, β <α koşulu, önceki oyuncunun emrinde daha iyi bir seçeneğe sahip olması nedeniyle en iyi (optimal) yolun mevcut dalda olamayacağı anlamına gelir. + +
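A sketch of minimax with alpha-beta pruning on the same invented tree; once β⩽α the rest of the branch is skipped, so the leaf with value 9 is never evaluated. Names are illustrative.

```python
def alphabeta(s, succ, is_end, utility, player, alpha=float("-inf"), beta=float("inf")):
    """Minimax that stops exploring a branch as soon as beta <= alpha."""
    if is_end(s):
        return utility(s)
    if player == +1:                   # maximizing player
        value = float("-inf")
        for s2 in succ(s):
            value = max(value, alphabeta(s2, succ, is_end, utility, -1, alpha, beta))
            alpha = max(alpha, value)
            if beta <= alpha:
                break                  # the minimizing player already has a better option
        return value
    value = float("inf")               # minimizing player
    for s2 in succ(s):
        value = min(value, alphabeta(s2, succ, is_end, utility, +1, alpha, beta))
        beta = min(beta, value)
        if beta <= alpha:
            break                      # the maximizing player already has a better option
    return value


tree = {"root": ["L", "R"], "L": [3, 5], "R": [2, 9]}
print(alphabeta("root", lambda s: tree.get(s, []), lambda s: isinstance(s, int), lambda s: s, +1))
```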
+ + +**116. TD learning ― Temporal difference (TD) learning is used when we don't know the transitions/rewards. The value is based on exploration policy. To be able to use it, we need to know rules of the game Succ(s,a). For each (s,a,r,s′), the update is done as follows:** + +⟶ TD öğrenme ― Geçici fark (Temporal difference - TD) öğrenmesi, geçiş/ödülleri bilmediğimiz zaman kullanılır. Değer, keşif politikasına dayanır. Bunu kullanabilmek için, oyununun kurallarını,Succ (s, a), bilmemiz gerekir. Her bir (s,a,r,s′) için, güncelleme aşağıdaki şekilde yapılır: + +
+ + +**117. Simultaneous games** + +⟶ Eşzamanlı oyunlar + +
+ + +**118. This is the contrary of turn-based games, where there is no ordering on the player's moves.** + +⟶ Bu, oyuncunun hamlelerinin sıralı olmadığı sıra temelli oyunların tam tersidir. + +
+ + +**119. Single-move simultaneous game ― Let there be two players A and B, with given possible actions. We note V(a,b) to be A's utility if A chooses action a, B chooses action b. V is called the payoff matrix.** + +⟶ Tek-hamleli eşzamanlı oyun ― Olası hareketlere sahip A ve B iki oyuncu olsun. V(a,b), A'nın a eylemini ve B'nin de b eylemini seçtiği A'nın faydasını ifade eder. V, getiri dizeyi olarak adlandırılır. + +
+ + +**120. [Strategies ― There are two main types of strategies:, A pure strategy is a single action:, A mixed strategy is a probability distribution over actions:]** + +⟶ [Stratejiler ― İki tane ana strateji türü vardır:, Saf strateji, tek bir eylemdir:, Karışık strateji, eylemler üzerindeki bir olasılık dağılımıdır:] + +
+ + +**121. Game evaluation ― The value of the game V(πA,πB) when player A follows πA and player B follows πB is such that:** + +⟶ Oyun değerlendirme ― oyuncu A πA'yı ve oyuncu B de πB'yi izlediğinde, Oyun değeri V(πA,πB): + +
+ + +**122. Minimax theorem ― By noting πA,πB ranging over mixed strategies, for every simultaneous two-player zero-sum game with a finite number of actions, we have:** + +⟶ En küçük-en büyük (minimax) teoremi ― ΠA, πB’nin karma stratejilere göre değiştiğini belirterek, sonlu sayıda eylem ile eşzamanlı her iki oyunculu sıfır toplamlı oyun için: + +
+ + +**123. Non-zero-sum games** + +⟶ Sıfır toplamı olmayan oyunlar + +
+ + +**124. Payoff matrix ― We define Vp(πA,πB) to be the utility for player p.** + +⟶ Getiri dizeyi ― Vp(πA,πB)'yi oyuncu p'nin faydası olarak tanımlıyoruz. + +
+ + +**125. Nash equilibrium ― A Nash equilibrium is (π∗A,π∗B) such that no player has an incentive to change its strategy. We have:** + +⟶ Nash dengesi ― Nash dengesi (π ∗ A, π ∗ B) öyle birşey ki hiçbir oyuncuyu, stratejisini değiştirmeye teşvik etmiyor: + +
+ + +**126. and** + +⟶ ve + +
+ + +**127. Remark: in any finite-player game with finite number of actions, there exists at least one Nash equilibrium.** + +⟶ Not: sonlu sayıda eylem olan herhangi bir sonlu oyunculu oyunda, en azından bir tane Nash dengesi mevcuttur. + +<br>
+ + +**128. [Tree search, Backtracking search, Breadth-first search, Depth-first search, Iterative deepening]** + +⟶ [Ağaç arama, Geri izleme araması, Genişlik-ilk arama, Derinlik-ilk arama, Tekrarlı (Iterative) derinleşme] + +
+ + +**129. [Graph search, Dynamic programming, Uniform cost search]** + +⟶ [Çizge arama, Dinamik programlama, Tek tip maliyet araması] + +
+ + +**130. [Learning costs, Structured perceptron]** + +⟶ [Öğrenme maliyetleri, Yapısal algılayıcı] + +
+ + +**131. [A star search, Heuristic function, Algorithm, Consistency, correctness, Admissibility, efficiency]** + +⟶ [A yıldız arama, Sezgisel işlev, Algoritma, Tutarlılık, doğruluk, kabul edilebilirlik, verimlilik] + +
+ + +**132. [Relaxation, Relaxed search problem, Relaxed heuristic, Max heuristic]** + +⟶ [Rahatlama, Rahat arama problemi, Rahat sezgisel, En yüksek sezgisel] + +
+ + +**133. [Markov decision processes, Overview, Policy evaluation, Value iteration, Transitions, rewards]** + +⟶ [Markov karar süreçleri, Genel bakış, Politika değerlendirme, Değer yineleme, Geçişler, ödüller] + +
+ + +**134. [Game playing, Expectimax, Minimax, Speeding up minimax, Simultaneous games, Non-zero-sum games]** + +⟶ [Oyun oynama, En yüksek beklenti, En küçük-en büyük, En küçük-en büyük hızlandırma, Eşzamanlı oyunlar, Sıfır toplamı olmayan oyunlar] + +
+ + +**135. View PDF version on GitHub** + +⟶ GitHub'da PDF sürümünü görüntüleyin + +
+ + +**136. Original authors** + +⟶ Asıl yazarlar + +
+ + +**137. Translated by X, Y and Z** + +⟶ X, Y ve Z tarafından tercüme edilmiştir. + +
+ + +**138. Reviewed by X, Y and Z** + +⟶ X,Y,Z tarafından gözden geçirilmiştir. + +
+ + +**139. By X and Y** + +⟶ X ve Y ile + +
+ + +**140. The Artificial Intelligence cheatsheets are now available in [target language].** + +⟶ Yapay Zeka el kitapları artık [hedef dilde] mevcuttur. From 46c7b6a6857a5db253360730bf78791ad9044097 Mon Sep 17 00:00:00 2001 From: Cemal GURPINAR <36713268+cemalgurpinar@users.noreply.github.com> Date: Sun, 11 Aug 2019 13:09:43 +0300 Subject: [PATCH 302/531] Update [tr] States-based models I have updated the file according to your comment @basakbuluz and this time is your turn @shervinea :) Best regards. Cemal GURPINAR --- tr/cs-221-states-models.md | 48 +++++++++++++++++++------------------- 1 file changed, 24 insertions(+), 24 deletions(-) diff --git a/tr/cs-221-states-models.md b/tr/cs-221-states-models.md index 3d93b3f8c..dccab7884 100644 --- a/tr/cs-221-states-models.md +++ b/tr/cs-221-states-models.md @@ -39,7 +39,7 @@ **6. [Self-loop, More than a parent, Cycle, More than a root, Valid tree]** -⟶ [Kendinden-Döngü, Bir ebeveynden daha fazlası, Çevrim, Bir kökten daha fazlası, Geçerli ağaç] +⟶ [Kendinden-Döngü(Self-loop), Bir ebeveynden (parent) daha fazlası, Çevrim, Bir kökten daha fazlası, Geçerli ağaç]
@@ -60,28 +60,28 @@ **9. Backtracking search ― Backtracking search is a naive recursive algorithm that tries all possibilities to find the minimum cost path. Here, action costs can be either positive or negative.** -⟶ Geri izleme araması ― Geri izleme araması, asgari (minimum) maliyet yolunu bulmak için tüm olasılıkları deneyen saf bir özyinelemeli algoritmadır. Burada, eylem maliyetleri pozitif ya da negatif olabilir. +⟶ Geri izleme araması ― Geri izleme araması, asgari (minimum) maliyet yolunu bulmak için tüm olasılıkları deneyen saf (naive) bir özyinelemeli algoritmadır. Burada, eylem maliyetleri pozitif ya da negatif olabilir.
**10. Breadth-first search (BFS) ― Breadth-first search is a graph search algorithm that does a level-by-level traversal. We can implement it iteratively with the help of a queue that stores at each step future nodes to be visited. For this algorithm, we can assume action costs to be equal to a constant c⩾0.** -⟶ Genişlik-ilk arama (BFS) ― Genişlik-ilk arama, seviye seviye arama yapan bir çizge arama algoritmasıdır. Gelecekte her adımda ziyaret edilecek düğümleri tutan bir kuyruk yardımıyla yinelemeli olarak gerçekleyebiliriz. Bu algoritma için, eylem maliyetlerinin belirli bir sabite c⩾0 eşit olduğunu kabul edebiliriz. +⟶ Genişlik öncelikli arama (Breadth-first search-BFS) ― Genişlik öncelikli arama, seviye seviye arama yapan bir çizge arama algoritmasıdır. Gelecekte her adımda ziyaret edilecek düğümleri tutan bir kuyruk yardımıyla yinelemeli olarak gerçekleyebiliriz. Bu algoritma için, eylem maliyetlerinin belirli bir sabite c⩾0 eşit olduğunu kabul edebiliriz.
**11. Depth-first search (DFS) ― Depth-first search is a search algorithm that traverses a graph by following each path as deep as it can. We can implement it recursively, or iteratively with the help of a stack that stores at each step future nodes to be visited. For this algorithm, action costs are assumed to be equal to 0.** -⟶ Derinlik-ilk arama (DFS) ― Derinlik-ilk arama, her bir yolu olabildiğince derin bir şekilde takip ederek çizgeyi dolaşan bir arama algoritmasıdır. Bu algoritmayı, ziyaret edilecek gelecek düğümleri her adımda bir yığın yardımıyla saklayarak, yinelemeli (recursively) ya da tekrarlı (iteratively) olarak uygulayabiliriz. Bu algoritma için eylem maliyetlerinin 0 olduğu varsayılmaktadır. +⟶ Derinlik öncelikli arama (Depth-first search-DFS) ― Derinlik öncelikli arama, her bir yolu olabildiğince derin bir şekilde takip ederek çizgeyi dolaşan bir arama algoritmasıdır. Bu algoritmayı, ziyaret edilecek gelecek düğümleri her adımda bir yığın yardımıyla saklayarak, yinelemeli (recursively) ya da tekrarlı (iteratively) olarak uygulayabiliriz. Bu algoritma için eylem maliyetlerinin 0 olduğu varsayılmaktadır.
**12. Iterative deepening ― The iterative deepening trick is a modification of the depth-first search algorithm so that it stops after reaching a certain depth, which guarantees optimality when all action costs are equal. Here, we assume that action costs are equal to a constant c⩾0.** -⟶ Tekrarlı derinleşme ― Tekrarlı derinleşme hilesi, derinlik-ilk arama algoritmasının tadil edilmiş bir halidir, böylece belirli bir derinliğe ulaştıktan sonra durur, bu da tüm işlem maliyetleri eşit olduğunda en iyiliği (optimal) garanti eder. Burada, işlem maliyetlerinin sabit bir değere eşit olduğunu varsayıyoruz c⩾0. +⟶ Tekrarlı derinleşme ― Tekrarlı derinleşme hilesi, derinlik-ilk arama algoritmasının değiştirilmiş bir halidir, böylece belirli bir derinliğe ulaştıktan sonra durur, bu da tüm işlem maliyetleri eşit olduğunda en iyiliği (optimal) garanti eder. Burada, işlem maliyetlerinin c⩾0 gibi sabit bir değere eşit olduğunu varsayıyoruz.
@@ -102,7 +102,7 @@ **15. [Backtracking search, any, Breadth-first search, Depth-first search, DFS-Iterative deepening]** -⟶ [Geri izleme araması, herhangi bir şey, Genişlik-ilk arama, Derinlik-ilk arama, Derinlik-ilk arama - Tekrarlı derinleşme] +⟶ [Geri izleme araması, herhangi bir şey, Genişlik öncelikli arama, Derinlik öncelikli arama, DFS - Tekrarlı derinleşme]
@@ -144,7 +144,7 @@ **21. Dynamic programming ― Dynamic programming (DP) is a backtracking search algorithm with memoization (i.e. partial results are saved) whose goal is to find a minimum cost path from state s to an end state send. It can potentially have exponential savings compared to traditional graph search algorithms, and has the property to only work for acyclic graphs. For any given state s, the future cost is computed as follows:** -⟶ Dinamik programlama ― Dinamik programlama (DP), amacı s durumundan bitiş durumuna,send, kadar asgari(minimum) maliyet yolunu bulmak olan hatırlamalı (memoization) (başka bir deyişle kısmi sonuçlar kaydedilir) bir geri izleme (backtracking) arama algoritmasıdır. Geleneksel çizge arama algoritmalarına kıyasla üstel olarak tasarruf sağlayabilir ve yalnızca asiklik (çevrimsiz) çizgeler ile çalışma özelliğine sahiptir. Herhangi bir durum için gelecekteki maliyet aşağıdaki gibi hesaplanır: +⟶ Dinamik programlama ― Dinamik programlama (DP), amacı s durumundan bitiş durumu olan send'e kadar asgari(minimum) maliyet yolunu bulmak olan hatırlamalı (memoization) (başka bir deyişle kısmi sonuçlar kaydedilir) bir geri izleme (backtracking) arama algoritmasıdır. Geleneksel çizge arama algoritmalarına kıyasla üstel olarak tasarruf sağlayabilir ve yalnızca asiklik (çevrimsiz) çizgeler ile çalışma özelliğine sahiptir. Herhangi bir durum için gelecekteki maliyet aşağıdaki gibi hesaplanır:
@@ -193,7 +193,7 @@ **28. Uniform cost search ― Uniform cost search (UCS) is a search algorithm that aims at finding the shortest path from a state sstart to an end state send. It explores states s in increasing order of PastCost(s) and relies on the fact that all action costs are non-negative.** -⟶ Tek tip maliyet araması ― Tek tip maliyet araması (Uniform cost search - UCS) bir başlangıç durumu,Sstart, ile bir bitiş durumu,Send, arasındaki en kısa yolu bulmayı amaçlayan bir arama algoritmasıdır. Bu algoritma s durumlarını artan geçmiş maliyetlerine,PastCost(s), göre araştırır ve eylem maliyetlerinin negatif olmayacağı kuralına dayanır. +⟶ Tek tip maliyet araması ― Tek tip maliyet araması (Uniform cost search - UCS) bir başlangıç durumu olan Sstart, ile bir bitiş durumu olan Send arasındaki en kısa yolu bulmayı amaçlayan bir arama algoritmasıdır. Bu algoritma s durumlarını artan geçmiş maliyetleri olan PastCost(s)'a göre araştırır ve eylem maliyetlerinin negatif olmayacağı kuralına dayanır.
@@ -214,7 +214,7 @@ **31. Correctness theorem ― When a state s is popped from the frontier F and moved to explored set E, its priority is equal to PastCost(s) which is the minimum cost path from sstart to s.** -⟶ Doğruluk teoremi ― S durumu sıradaki (frontier) F'den çıkarılır ve daha önceden keşfedilmiş olan E kümesine taşınırsa, önceliği başlangıç durumundan,Sstart, s durumuna kadar asgari (minimum) maliyet yolu olan PastCost(s)'e eşittir. +⟶ Doğruluk teoremi ― S durumu sıradaki (frontier) F'den çıkarılır ve daha önceden keşfedilmiş olan E kümesine taşınırsa, önceliği başlangıç durumu olan Sstart'dan, s durumuna kadar asgari (minimum) maliyet yolu olan PastCost(s)'e eşittir.
@@ -284,7 +284,7 @@ **41. Heuristic function ― A heuristic is a function h over states s, where each h(s) aims at estimating FutureCost(s), the cost of the path from s to send.** -⟶ Sezgisel işlev ― Sezgisel, s durumu üzerinde işlem yapan bir h fonksiyonudur, burada her bir h(s), s ile send arasındaki yol maliyeti olan FutureCost(s)'yi tahmin etmeyi amaçlar. +⟶ Sezgisel işlev(Heuristic function) ― Sezgisel, s durumu üzerinde işlem yapan bir h fonksiyonudur, burada her bir h(s), s ile send arasındaki yol maliyeti olan FutureCost(s)'yi tahmin etmeyi amaçlar.
@@ -326,7 +326,7 @@ **47. Theorem ― Let h(s) be a given heuristic. We have:** -⟶ Teorem ― h(s) verilen sezgisel olsun ve: +⟶ Teorem ― h(s) sezgisel olsun ve:
@@ -368,14 +368,14 @@ **53. Relaxed search problem ― The relaxation of search problem P with costs Cost is noted Prel with costs Costrel, and satisfies the identity:** -⟶ Rahat arama problemi ― Cost maliyetli bir arama probleminin rahatlaması, Costrel maliyetli Prel ile ifade edilir ve kimliği (satisfy) karşılar: +⟶ Rahat arama problemi (Relaxed search problem) ― Cost maliyetli bir arama probleminin rahatlaması, Costrel maliyetli Prel ile ifade edilir ve kimliği karşılar (satisfies the identity) :
**54. Relaxed heuristic ― Given a relaxed search problem Prel, we define the relaxed heuristic h(s)=FutureCostrel(s) as the minimum cost path from s to an end state in the graph of costs Costrel(s,a).** -⟶ Rahat sezgisel ― Bir Prel rahat arama problemi verildiğinde, h(s)=FutureCostrel(s) rahat sezgisel eşitliğini Costrel(s,a) maliyet çizgesindeki s durumu ile bir bitiş durumu arasındaki asgari(minimum) maliyet yolu olarak tanımlarız. +⟶ Rahat sezgisel (Relaxed heuristic) ― Bir Prel rahat arama problemi verildiğinde, h(s)=FutureCostrel(s) rahat sezgisel eşitliğini Costrel(s,a) maliyet çizgesindeki s durumu ile bir bitiş durumu arasındaki asgari(minimum) maliyet yolu olarak tanımlarız.
@@ -522,7 +522,7 @@ **75. Optimal value ― The optimal value Vopt(s) of state s is defined as being the maximum value attained by any policy. It is computed as follows:** -⟶ En iyi değer ― S durumunun en iyi değeri,Vopt(s), herhangi bir politika ile elde edilen en yüksek değer olarak tanımlanmaktadır. En iyi değer aşağıdaki gibi hesaplanmaktadır: +⟶ En iyi değer ― S durumunun en iyi değeri olan Vopt(s), herhangi bir politika ile elde edilen en yüksek değer olarak tanımlanmaktadır. En iyi değer aşağıdaki gibi hesaplanmaktadır:
@@ -536,14 +536,14 @@ **77. Optimal policy ― The optimal policy πopt is defined as being the policy that leads to the optimal values. It is defined by:** -⟶ En iyi politika ― En iyi politika,πopt, en iyi değerlere götüren politika olarak tanımlanmaktadır. En iyi politika aşağıdaki gibi tanımlanmaktadır: +⟶ En iyi politika ― En iyi politika olan πopt, en iyi değerlere götüren politika olarak tanımlanmaktadır. En iyi politika aşağıdaki gibi tanımlanmaktadır:
**78. [Value iteration ― Value iteration is an algorithm that finds the optimal value Vopt as well as the optimal policy πopt. It is done as follows:, Initialization: for all states s, we have:, Iteration: for t from 1 to TVI, we have:, with]** -⟶ [Değer tekrarı(iteration) ― Değer tekrarı(iteration) en iyi politikanın,πopt, yanında en iyi değeri,Vopt, bulan bir algoritmadır. Değer tekrarı(iteration) aşağıdaki gibi yapılmaktadır:, İlklendirme: bütün s durumları için:, Tekrar: 1'den TVI'ya kadar her bir t için:, ile] +⟶ [Değer tekrarı(iteration) ― Değer tekrarı(iteration) en iyi politika olan πopt, yanında en iyi değeri Vopt'ı, bulan bir algoritmadır. Değer tekrarı(iteration) aşağıdaki gibi yapılmaktadır:, İlklendirme: bütün s durumları için:, Tekrar: 1'den TVI'ya kadar her bir t için:, ile]
@@ -578,7 +578,7 @@ **83. [# times (s,a,s′) occurs, and]** -⟶ [# (s,a,s′) gerçekleşme sayısı, ve] +⟶ [# kere (s,a,s′) gerçekleşme sayısı, ve]
@@ -662,7 +662,7 @@ **95. Epsilon-greedy ― The epsilon-greedy policy is an algorithm that balances exploration with probability ϵ and exploitation with probability 1−ϵ. For a given state s, the policy πact is computed as follows:** -⟶ Epsilon-açgözlü ― Epsilon-açgözlü politika, ϵ olasılıkla araştırmayı ve 1−ϵ olasılıkla sömürüyü dengeleyen bir algoritmadır. Her bir s durumu için, politika, πact, aşağıdaki şekilde hesaplanır: +⟶ Epsilon-açgözlü ― Epsilon-açgözlü politika, ϵ olasılıkla araştırmayı ve 1−ϵ olasılıkla sömürüyü dengeleyen bir algoritmadır. Her bir s durumu için, πact politikası aşağıdaki şekilde hesaplanır:
@@ -711,14 +711,14 @@ **102. [Types of policies ― There are two types of policies:, Deterministic policies, noted πp(s), which are actions that player p takes in state s., Stochastic policies, noted πp(s,a)∈[0,1], which are probabilities that player p takes action a in state s.]** -⟶ [Politika türleri ― İki tane politika türü vardır:, Belirlenimci politikalar, πp(s) olarak gösterilir, p oyuncusunun s durumunda gerçekleştirdiği eylemler., Olasılıksal politikalar, πp(s,a)∈[0,1] olarak gösterilir, p oyuncusunun s durumunda a eylemini gerçekleştirme olasılıkları.] +⟶ [Politika türleri ― İki tane politika türü vardır:, πp(s) olarak gösterilen belirlenimci politikalar , p oyuncusunun s durumunda gerçekleştirdiği eylemler., πp(s,a)∈[0,1] olarak gösterilen olasılıksal politikalar, p oyuncusunun s durumunda a eylemini gerçekleştirme olasılıkları.]
**103. Expectimax ― For a given state s, the expectimax value Vexptmax(s) is the maximum expected utility of any agent policy when playing with respect to a fixed and known opponent policy πopp. It is computed as follows:** -⟶ En yüksek beklenen değer(Expectimax) ― Belirli bir s durumu için, en yüksek beklenen değer, Vexptmax(s), sabit ve bilinen bir rakip politikasına,πopp, göre oynarken, bir oyuncu politikasının en yüksek beklenen faydasıdır. En yüksek beklenen değer(Expectimax) aşağıdaki gibi hesaplanmaktadır: +⟶ En yüksek beklenen değer(Expectimax) ― Belirli bir s durumu için, en yüksek beklenen değer olan Vexptmax(s), sabit ve bilinen bir rakip politikası olan πopp'a göre oynarken, bir oyuncu politikasının en yüksek beklenen faydasıdır. En yüksek beklenen değer(Expectimax) aşağıdaki gibi hesaplanmaktadır:
@@ -732,14 +732,14 @@ **105. Minimax ― The goal of minimax policies is to find an optimal policy against an adversary by assuming the worst case, i.e. that the opponent is doing everything to minimize the agent's utility. It is done as follows:** -⟶ En küçük-en büyük (minimax) ― En küçük-enbüyük (minimax) politikaların amacı en kötü durumu kabul ederek, diğer bir deyişle, rakip, oyuncunun faydasını en aza indirmek için her şeyi yaparken, rakibe karşı en iyi politikayı bulmaktır. En küçük-en büyük(minimax) aşağıdaki şekilde yapılır: +⟶ En küçük-en büyük (minimax) ― En küçük-enbüyük (minimax) politikaların amacı en kötü durumu kabul ederek, diğer bir deyişle; rakip, oyuncunun faydasını en aza indirmek için her şeyi yaparken, rakibe karşı en iyi politikayı bulmaktır. En küçük-en büyük(minimax) aşağıdaki şekilde yapılır:
**106. Remark: we can extract πmax and πmin from the minimax value Vminimax.** -⟶ Not: πmax ve πmin değerleri, en küçük-en büyükten,Vminimax, elde edilebilir. +⟶ Not: πmax ve πmin değerleri, en küçük-en büyük olan Vminimax'dan elde edilebilir.
@@ -865,7 +865,7 @@ **124. Payoff matrix ― We define Vp(πA,πB) to be the utility for player p.** -⟶ Getiri dizeyi ― Vp(πA,πB)'yi oyuncu p'nin faydası olarak tanımlıyoruz. +⟶ Getiri matrisi ― Vp(πA,πB)'yi oyuncu p'nin faydası olarak tanımlıyoruz.
@@ -893,7 +893,7 @@ **128. [Tree search, Backtracking search, Breadth-first search, Depth-first search, Iterative deepening]** -⟶ [Ağaç arama, Geri izleme araması, Genişlik-ilk arama, Derinlik-ilk arama, Tekrarlı (Iterative) derinleşme] +⟶ [Ağaç arama, Geri izleme araması, Genişlik öncelikli arama, Derinlik öncelikli arama, Tekrarlı (Iterative) derinleşme]
From c660e695ed4fa9c7c4e38f1ecf138d2c10ba72d2 Mon Sep 17 00:00:00 2001 From: Shervine Amidi Date: Tue, 13 Aug 2019 22:26:48 -0700 Subject: [PATCH 303/531] Update progress status --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index 0ca9795a8..318e74849 100644 --- a/README.md +++ b/README.md @@ -46,7 +46,7 @@ Please make sure to propose the translation of **only one** cheatsheet per pull |**日本語**|not started|not started|not started|not started| |**한국어**|not started|not started|not started|not started| |**Português**|not started|not started|not started|not started| -|**Türkçe**|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/166)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/168)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/171)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/170)| +|**Türkçe**|done|done|done|done| |**Tiếng Việt**|not started|not started|not started|not started| |**中文**|not started|not started|not started|not started| From a4eedc121b0f7f0db42ef288d9b24e8faf0cba4c Mon Sep 17 00:00:00 2001 From: Shervine Amidi Date: Tue, 13 Aug 2019 22:44:16 -0700 Subject: [PATCH 304/531] Add [tr] contributors --- CONTRIBUTORS | 12 ++++++++++++ 1 file changed, 12 insertions(+) diff --git a/CONTRIBUTORS b/CONTRIBUTORS index b544a257c..023786eae 100644 --- a/CONTRIBUTORS +++ b/CONTRIBUTORS @@ -130,6 +130,9 @@ Kadir Tekeli (translation of linear algebra) Ekrem Çetinkaya (review of linear algebra) + Ayyüce Kızrak (translation of logic-based models) + Başak Buluz (review of logic-based models) + Seray Beşer (translation of machine learning tips and tricks) Ayyüce Kızrak (review of machine learning tips and tricks) Yavuz Kömeçoğlu (review of machine learning tips and tricks) @@ -140,12 +143,21 @@ Başak Buluz (translation of recurrent neural networks) Yavuz Kömeçoğlu (review of recurrent neural networks) + Yavuz Kömeçoğlu (translation of reflex-based models) + Ayyüce Kızrak (review of reflex-based models) + + Cemal Gurpinar (translation of states-based models) + Başak Buluz (review of states-based models) + Başak Buluz (translation of supervised learning) Ayyüce Kızrak (review of supervised learning) Yavuz Kömeçoğlu (translation of unsupervised learning) Başak Buluz (review of unsupervised learning) + Başak Buluz (translation of variables-based models) + Ayyüce Kızrak (review of variables-based models) + --uk Gregory Reshetniak (translation of probabilities and statistics) Denys (review of probabilities and statistics) From 7078bf2710a4d9206af27f007d5923c015f6547c Mon Sep 17 00:00:00 2001 From: shervinea Date: Wed, 14 Aug 2019 23:05:07 -0700 Subject: [PATCH 305/531] Add [fr] translation for states --- fr/cs-221-states-models.md | 980 +++++++++++++++++++++++++++++++++++++ 1 file changed, 980 insertions(+) create mode 100644 fr/cs-221-states-models.md diff --git a/fr/cs-221-states-models.md b/fr/cs-221-states-models.md new file mode 100644 index 000000000..4019c6f92 --- /dev/null +++ b/fr/cs-221-states-models.md @@ -0,0 +1,980 @@ +**States-based models translation** [[webpage]](https://stanford.edu/~shervine/teaching/cs-221/cheatsheet-states-models) + +
+ +**1. States-based models with search optimization and MDP** + +⟶ Modèles basés sur les états, utilisés pour optimiser le parcours et les MDPs + +
+ + +**2. Search optimization** + +⟶ Optimisation de parcours + +
+ + +**3. In this section, we assume that by accomplishing action a from state s, we deterministically arrive in state Succ(s,a). The goal here is to determine a sequence of actions (a1,a2,a3,a4,...) that starts from an initial state and leads to an end state. In order to solve this kind of problem, our objective will be to find the minimum cost path by using states-based models.** + +⟶ Dans cette section, nous supposons qu'en effectuant une action a à partir d'un état s, on arrive de manière déterministe à l'état Succ(s,a). Le but de cette étude est de déterminer une séquence d'actions (a1,a2,a3,a4,...) démarrant d'un état initial et aboutissant à un état final. Pour y parvenir, notre objectif est de minimiser le coût associés à ces actions à l'aide de modèles basés sur les états (state-based model en anglais). + +
+ + +**4. Tree search** + +⟶ Parcours d'arbre + +
+ + +**5. This category of states-based algorithms explores all possible states and actions. It is quite memory efficient, and is suitable for huge state spaces but the runtime can become exponential in the worst cases.** + +⟶ Cette catégorie d'algorithmes explore tous les états et actions possibles. Même si leur consommation en mémoire est raisonnable et peut supporter des espaces d'états de taille très grande, ce type d'algorithmes est néanmoins susceptible d'engendrer des complexités en temps exponentielles dans le pire des cas. + +
+ + +**6. [Self-loop, More than a parent, Cycle, More than a root, Valid tree]** + +⟶ [Boucle, Plus d'un parent, Cycle, Plus d'une racine, Arbre valide] + +
+ + +**7. [Search problem ― A search problem is defined with:, a starting state sstart, possible actions Actions(s) from state s, action cost Cost(s,a) from state s with action a, successor Succ(s,a) of state s after action a, whether an end state was reached IsEnd(s)]** + +⟶ [Problème de recherche - Un problème de recherche est défini par :, un état de départ sstart, des actions Actions(s) pouvant être effectuées depuis l'état s, le coût de l'action Cost(s,a) depuis l'état s pour effectuer l'action a, le successeur Succ(s,a) de l'état s après avoir effectué l'action a, la connaissance d'avoir atteint ou non un état final IsEnd(s)] + +
+ + +**8. The objective is to find a path that minimizes the cost.** + +⟶ L'objectif est de trouver un chemin minimisant le coût total des actions utilisées. + +
+ + +**9. Backtracking search ― Backtracking search is a naive recursive algorithm that tries all possibilities to find the minimum cost path. Here, action costs can be either positive or negative.** + +⟶ Retour sur trace - L'algorithme de retour sur trace (en anglais backtracking search) est un algorithme récursif explorant naïvement toutes les possibilités jusqu'à trouver le chemin de coût minimal. + +
+ + +**10. Breadth-first search (BFS) ― Breadth-first search is a graph search algorithm that does a level-by-level traversal. We can implement it iteratively with the help of a queue that stores at each step future nodes to be visited. For this algorithm, we can assume action costs to be equal to a constant c⩾0.** + +⟶ Parcours en largeur (BFS) - L'algorithme de parcours en largeur (en anglais breadth-first search ou BFS) est un algorithme de parcours de graphe traversant chaque niveau de manière successive. On peut le coder de manière itérative à l'aide d'une queue stockant à chaque étape les prochains nœuds à visiter. Cet algorithme suppose que le coût de toutes les actions est égal à une constante c⩾0. + +
+ + +**11. Depth-first search (DFS) ― Depth-first search is a search algorithm that traverses a graph by following each path as deep as it can. We can implement it recursively, or iteratively with the help of a stack that stores at each step future nodes to be visited. For this algorithm, action costs are assumed to be equal to 0.** + +⟶ Parcours en profondeur (DFS) - L'algorithme de parcours en profondeur (en anglais depth-first search ou DFS) est un algorithme de parcours de graphe traversant chaque chemin qu'il emprunte aussi loin que possible. On peut le coder de manière récursive, ou itérative à l'aide d'une pile qui stocke à chaque étape les prochains nœuds à visiter. Cet algorithme suppose que le coût de toutes les actions est égal à 0. + +
+ + +**12. Iterative deepening ― The iterative deepening trick is a modification of the depth-first search algorithm so that it stops after reaching a certain depth, which guarantees optimality when all action costs are equal. Here, we assume that action costs are equal to a constant c⩾0.** + +⟶ Approfondissement itératif - L'astuce de l'approfondissement itératif (en anglais iterative deepening) est une modification de l'algorithme de DFS qui l'arrête après avoir atteint une certaine profondeur, garantissant l'optimalité de la solution trouvée quand toutes les actions ont un même coût constant c⩾0. + +
+ + +**13. Tree search algorithms summary ― By noting b the number of actions per state, d the solution depth, and D the maximum depth, we have:** + +⟶ Récapitulatif des algorithmes de parcours d'arbre - En notant b le nombre d'actions par état, d la profondeur de la solution et D la profondeur maximale, on a : + +
+ + +**14. [Algorithm, Action costs, Space, Time]** + +⟶ [Algorithme, Coût des actions, Espace, Temps] + +
+ + +**15. [Backtracking search, any, Breadth-first search, Depth-first search, DFS-Iterative deepening]** + +⟶ [Retour sur trace, peu importe, Parcours en largeur, Parcours en profondeur, DFS-approfondissement itératif] + +
+ + +**16. Graph search** + +⟶ Parcours de graphe + +
+ + +**17. This category of states-based algorithms aims at constructing optimal paths, enabling exponential savings. In this section, we will focus on dynamic programming and uniform cost search.** + +⟶ Cette catégorie d'algorithmes basés sur les états vise à trouver des chemins optimaux avec une complexité moins grande qu'exponentielle. Dans cette section, nous allons nous concentrer sur la programmation dynamique et la recherche à coût uniforme. + +
+ + +**18. Graph ― A graph is comprised of a set of vertices V (also called nodes) as well as a set of edges E (also called links).** + +⟶ Graphe - Un graphe se compose d'un ensemble de sommets V (aussi appelés noeuds) et d'arêtes E (appelées arcs lorsque le graphe est orienté). + +
+ + +**19. Remark: a graph is said to be acyclic when there is no cycle.** + +⟶ Remarque : un graphe est dit acyclique lorsqu'il ne contient pas de cycle. + +
+ + +**20. State ― A state is a summary of all past actions sufficient to choose future actions optimally.** + +⟶ État - Un état contient le résumé des actions passées suffisant pour choisir les actions futures de manière optimale. + +
+ + +**21. Dynamic programming ― Dynamic programming (DP) is a backtracking search algorithm with memoization (i.e. partial results are saved) whose goal is to find a minimum cost path from state s to an end state send. It can potentially have exponential savings compared to traditional graph search algorithms, and has the property to only work for acyclic graphs. For any given state s, the future cost is computed as follows:** + +⟶ Programmation dynamique - La programmation dynamique (en anglais dynamic programming ou DP) est un algorithme de recherche de type retour sur trace qui utilise le principe de mémoïsation (i.e. les résultats intermédiaires sont enregistrés) et ayant pour but de trouver le chemin à coût minimal allant de l'état s à l'état final send. Cette procédure peut potentiellement engendrer des économies exponentielles si on la compare aux algorithmes de parcours de graphe traditionnels, et a la propriété de ne marcher que dans le cas de graphes acycliques. Pour un état s donné, le coût futur est calculé de la manière suivante : + +
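
A possible memoized sketch of this recurrence, assuming hypothetical callables `is_end`, `actions`, `cost` and `succ`, hashable states, and an acyclic graph so that the recursion terminates:

```python
from functools import lru_cache

def make_future_cost(is_end, actions, cost, succ):
    """Dynamic programming: FutureCost(s) = 0 if IsEnd(s),
    else min over a of Cost(s, a) + FutureCost(Succ(s, a))."""
    @lru_cache(maxsize=None)        # memoization: each state is solved only once
    def future_cost(state):
        if is_end(state):
            return 0.0
        return min(cost(state, a) + future_cost(succ(state, a))
                   for a in actions(state))
    return future_cost
```
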
+ + +**22. [if, otherwise]** + +⟶ [si, sinon] + +
+ + +**23. Remark: the figure above illustrates a bottom-to-top approach whereas the formula provides the intuition of a top-to-bottom problem resolution.** + +⟶ Remarque : la figure ci-dessus illustre une approche ascendante alors que la formule nous donne l'intuition d'une résolution avec une approche descendante. + +
+ + +**24. Types of states ― The table below presents the terminology when it comes to states in the context of uniform cost search:** + +⟶ Types d'états - La table ci-dessous présente la terminologie relative aux états dans le contexte de la recherche à coût uniforme : + +
+ + +**25. [State, Explanation]** + +⟶ [État, Explication] + +
+ + +**26. [Explored, Frontier, Unexplored]** + +⟶ [Exploré, Frontière, Inexploré] + +
+ + +**27. [States for which the optimal path has already been found, States seen for which we are still figuring out how to get there with the cheapest cost, States not seen yet]** + +⟶ [États pour lesquels le chemin optimal a déjà été trouvé, États rencontrés mais pour lesquels on se demande toujours comment s'y rendre avec un coût minimal, États non rencontrés jusqu'à présent] + +
+ + +**28. Uniform cost search ― Uniform cost search (UCS) is a search algorithm that aims at finding the shortest path from a state sstart to an end state send. It explores states s in increasing order of PastCost(s) and relies on the fact that all action costs are non-negative.** + +⟶ Recherche à coût uniforme - La recherche à coût uniforme (uniform cost search ou UCS en anglais) est un algorithme de recherche qui a pour but de trouver le chemin le plus court entre les états sstart et send. Celui-ci explore les états s en les triant par coût croissant de PastCost(s) et repose sur le fait que toutes les actions ont un coût non négatif. + +
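
A Dijkstra-like sketch of this procedure with a priority queue, assuming a hypothetical `succ_and_cost(s)` helper that yields (action, next_state, cost) triples with non-negative costs:

```python
import heapq
import itertools

def uniform_cost_search(s_start, is_end, succ_and_cost):
    """Explore states in increasing order of PastCost(s); all action costs must be non-negative."""
    counter = itertools.count()                    # tie-breaker so states are never compared directly
    frontier = [(0.0, next(counter), s_start)]
    explored = set()
    while frontier:
        past_cost, _, state = heapq.heappop(frontier)
        if state in explored:
            continue
        explored.add(state)                        # priority at pop time equals PastCost(state)
        if is_end(state):
            return past_cost
        for _, next_state, cost in succ_and_cost(state):
            if next_state not in explored:
                heapq.heappush(frontier, (past_cost + cost, next(counter), next_state))
    return None
```
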
+ + +**29. Remark 1: the UCS algorithm is logically equivalent to Dijkstra's algorithm.** + +⟶ Remarque 1 : l'algorithme UCS est logiquement équivalent à l'algorithme de Dijkstra. + +
+ + +**30. Remark 2: the algorithm would not work for a problem with negative action costs, and adding a positive constant to make them non-negative would not solve the problem since this would end up being a different problem.** + +⟶ Remarque 2 : cet algorithme ne marche pas sur une configuration contenant des actions à coût négatif. Quelqu'un pourrait penser à ajouter une constante positive à tous les coûts, mais cela ne résoudrait rien puisque le problème résultant serait différent. + +
+ + +**31. Correctness theorem ― When a state s is popped from the frontier F and moved to explored set E, its priority is equal to PastCost(s) which is the minimum cost path from sstart to s.** + +⟶ Théorème de correction - Lorsqu'un état s passe de la frontière F à l'ensemble exploré E, sa priorité est égale à PastCost(s), représentant le chemin de coût minimal allant de sstart à s. + +
+ + +**32. Graph search algorithms summary ― By noting N the number of total states, n of which are explored before the end state send, we have:** + +⟶ Récapitulatif des algorithmes de parcours de graphe - En notant N le nombre total d'états dont n sont explorés avant l'état final send, on a : + +
+ + +**33. [Algorithm, Acyclicity, Costs, Time/space]** + +⟶ [Algorithme, Acyclicité, Coûts, Temps/Espace] + +
+ + +**34. [Dynamic programming, Uniform cost search]** + +⟶ [Programmation dynamique, Recherche à coût uniforme] + +
+ + +**35. Remark: the complexity countdown supposes the number of possible actions per state to be constant.** + +⟶ Remarque : ce décompte de la complexité suppose que le nombre d'actions possibles à partir de chaque état est constant. + +
+ + +**36. Learning costs** + +⟶ Apprendre les coûts + +
+ + +**37. Suppose we are not given the values of Cost(s,a), we want to estimate these quantities from a training set of minimizing-cost-path sequence of actions (a1,a2,...,ak).** + +⟶ Supposons que les valeurs de Cost(s,a) ne nous soient pas données. Nous souhaitons estimer ces quantités à partir d'un ensemble d'apprentissage de séquences d'actions (a1,a2,...,ak) formant des chemins à coût minimal. + +
+ + +**38. [Structured perceptron ― The structured perceptron is an algorithm aiming at iteratively learning the cost of each state-action pair. At each step, it:, decreases the estimated cost of each state-action of the true minimizing path y given by the training data, increases the estimated cost of each state-action of the current predicted path y' inferred from the learned weights.]** + +⟶ [Perceptron structuré - L'algorithme du perceptron structuré vise à apprendre de manière itérative les coûts des paires état-action. À chaque étape, il :, fait décroître le coût estimé de chaque état-action du vrai chemin minimisant y donné par la base d'apprentissage, fait croître le coût estimé de chaque état-action du chemin y' prédit comme étant minimisant par les paramètres appris par l'algorithme.] + +
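
A small sketch of one such update, assuming the estimated costs live in a hypothetical dictionary keyed by (state, action) pairs and that both paths are given as lists of such pairs:

```python
def structured_perceptron_step(costs, true_path, predicted_path, step=1.0):
    """One structured perceptron update on a dictionary of estimated state-action costs."""
    for s, a in true_path:          # true minimizing path: make it cheaper
        costs[(s, a)] = costs.get((s, a), 0.0) - step
    for s, a in predicted_path:     # currently predicted path: make it more expensive
        costs[(s, a)] = costs.get((s, a), 0.0) + step
    return costs
```
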
+ + +**39. Remark: there are several versions of the algorithm, one of which simplifies the problem to only learning the cost of each action a, and the other parametrizes Cost(s,a) to a feature vector of learnable weights.** + +⟶ Remarque : plusieurs versions de cet algorithme existent, l'une d'elles réduisant ce problème à l'apprentissage du coût de chaque action a et l'autre paramétrisant chaque Cost(s,a) par un vecteur de paramètres pouvant être appris. + +
+ + +**40. A* search** + +⟶ Algorithme A* + +
+ + +**41. Heuristic function ― A heuristic is a function h over states s, where each h(s) aims at estimating FutureCost(s), the cost of the path from s to send.** + +⟶ Fonction heuristique - Une heuristique est une fonction h opérant sur les états s, où chaque h(s) vise à estimer FutureCost(s), le coût du chemin optimal allant de s à send. + +
+ + +**42. Algorithm ― A∗ is a search algorithm that aims at finding the shortest path from a state s to an end state send. It explores states s in increasing order of PastCost(s)+h(s). It is equivalent to a uniform cost search with edge costs Cost′(s,a) given by:** + +⟶ Algorithme - A* est un algorithme de recherche visant à trouver le chemin le plus court entre un état s et un état final send. Il le fait en explorant les états s triés par ordre croissant de PastCost(s)+h(s). Cela revient à utiliser l'algorithme UCS où chaque arête est associée au coût Cost′(s,a) donné par : + +
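
A possible sketch, reusing the `uniform_cost_search` helper from the earlier sketch and assuming h is consistent (so the modified costs are non-negative) with h(send)=0:

```python
def a_star_search(s_start, is_end, succ_and_cost, h):
    """A* seen as UCS run on modified costs Cost'(s,a) = Cost(s,a) + h(Succ(s,a)) - h(s)."""
    def modified_succ_and_cost(state):
        for action, next_state, cost in succ_and_cost(state):
            yield action, next_state, cost + h(next_state) - h(state)
    modified_cost = uniform_cost_search(s_start, is_end, modified_succ_and_cost)
    # the accumulated modified cost equals PastCost(s_end) - h(s_start), so add h(s_start) back
    return None if modified_cost is None else modified_cost + h(s_start)
```
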
+ + +**43. Remark: this algorithm can be seen as a biased version of UCS exploring states estimated to be closer to the end state.** + +⟶ Remarque : cet algorithme peut être vu comme une version biaisée de UCS explorant les états estimés comme étant plus proches de l'état final. + +
+ + +**44. [Consistency ― A heuristic h is said to be consistent if it satisfies the two following properties:, For all states s and actions a, The end state verifies the following:]** + +⟶ [Consistance - Une heuristique h est dite consistante si elle satisfait les deux propriétés suivantes :, Pour tous états s et actions a, L'état final vérifie la propriété :] + +
+ + +**45. Correctness ― If h is consistent, then A∗ returns the minimum cost path.** + +⟶ Correction - Si h est consistante, alors A* renvoie le chemin de coût minimal. + +
+ + +**46. Admissibility ― A heuristic h is said to be admissible if we have:** + +⟶ Admissibilité - Une heuristique est dite admissible si l'on a : + +
+ + +**47. Theorem ― Let h(s) be a given heuristic. We have:** + +⟶ Théorème - Soit h(s) une heuristique. On a : + +
+ + +**48. [consistent, admissible]** + +⟶ [consistante, admissible] + +
+ + +**49. Efficiency ― A* explores all states s satisfying the following equation:** + +⟶ Efficacité - A* explore les états s satisfaisant l'équation : + +
+ + +**50. Remark: larger values of h(s) is better as this equation shows it will restrict the set of states s going to be explored.** + +⟶ Remarque : avoir h(s) élevé est préférable puisque cette équation montre que le nombre d'états s à explorer est alors réduit. + +
+ + +**51. Relaxation** + +⟶ Relaxation + +
+ + +**52. It is a framework for producing consistent heuristics. The idea is to find closed-form reduced costs by removing constraints and use them as heuristics.** + +⟶ C'est un type de procédure permettant de produire des heuristiques consistantes. L'idée est de trouver une fonction de coût facile à exprimer en enlevant des contraintes au problème, et ensuite l'utiliser en tant qu'heuristique. + +
+ + +**53. Relaxed search problem ― The relaxation of search problem P with costs Cost is noted Prel with costs Costrel, and satisfies the identity:** + +⟶ Relaxation d'un problème de recherche - La relaxation d'un problème de recherche P de coûts Cost est notée Prel, de coûts Costrel, et vérifie la relation : + +
+ + +**54. Relaxed heuristic ― Given a relaxed search problem Prel, we define the relaxed heuristic h(s)=FutureCostrel(s) as the minimum cost path from s to an end state in the graph of costs Costrel(s,a).** + +⟶ Relaxation d'une heuristique - Étant donné la relaxation d'un problème de recherche Prel, on définit l'heuristique relaxée h(s)=FutureCostrel(s) comme étant le chemin de coût minimal allant de s à un état final dans le graphe de fonction de coût Costrel(s,a). + +
+ + +**55. Consistency of relaxed heuristics ― Let Prel be a given relaxed problem. By theorem, we have:** + +⟶ Consistance de la relaxation d'heuristiques - Soit Prel une relaxation d'un problème de recherche. Par théorème, on a : + +
+ + +**56. consistent** + +⟶ consistante + +
+ + +**57. [Tradeoff when choosing heuristic ― We have to balance two aspects in choosing a heuristic:, Computational efficiency: h(s)=FutureCostrel(s) must be easy to compute. It has to produce a closed form, easier search and independent subproblems., Good enough approximation: the heuristic h(s) should be close to FutureCost(s) and we have thus to not remove too many constraints.]** + +⟶ [Compromis lors du choix d'heuristique - Le choix d'heuristique repose sur un compromis entre :, Complexité de calcul : h(s)=FutureCostrel(s) doit être facile à calculer. De manière préférable, cette fonction peut s'exprimer de manière explicite et elle permet de diviser le problème en sous-parties indépendantes., Approximation suffisamment bonne : l'heuristique h(s) doit être proche de FutureCost(s), il ne faut donc pas enlever trop de contraintes au problème.] + +
+ + +**58. Max heuristic ― Let h1(s), h2(s) be two heuristics. We have the following property:** + +⟶ Heuristique max - Soient h1(s) et h2(s) deux heuristiques. On a la propriété suivante : + +
+ + +**59. Markov decision processes** + +⟶ Processus de décision markovien + +
+ + +**60. In this section, we assume that performing action a from state s can lead to several states s′1,s′2,... in a probabilistic manner. In order to find our way between an initial state and an end state, our objective will be to find the maximum value policy by using Markov decision processes that help us cope with randomness and uncertainty.** + +⟶ Dans cette section, on suppose qu'effectuer l'action a à partir de l'état s peut mener de manière probabiliste à plusieurs états s′1,s′2,... Dans le but de trouver ce qu'il faudrait faire entre un état initial et un état final, on souhaite trouver une stratégie maximisant la quantité des récompenses en utilisant un outil adapté à l'imprévisibilité et l'incertitude : les processus de décision markoviens. + +
+ + +**61. Notations** + +⟶ Notations + +
+ + +**62. [Definition ― The objective of a Markov decision process is to maximize rewards. It is defined with:, a starting state sstart, possible actions Actions(s) from state s, transition probabilities T(s,a,s′) from s to s′ with action a, rewards Reward(s,a,s′) from s to s′ with action a, whether an end state was reached IsEnd(s), a discount factor 0⩽γ⩽1]** + +⟶ [Définition - L'objectif d'un processus de décision markovien (en anglais Markov decision process ou MDP) est de maximiser la quantité de récompenses. Un tel problème est défini par :, un état de départ sstart, l'ensemble des actions Actions(s) pouvant être effectuées à partir de l'état s, la probabilité de transition T(s,a,s′) de l'état s vers l'état s' après avoir pris l'action a, la récompense Reward(s,a,s′) pour être passé de l'état s à l'état s' après avoir pris l'action a, la connaissance d'avoir atteint ou non un état final IsEnd(s), un facteur de dévaluation 0⩽γ⩽1] + +
+ + +**63. Transition probabilities ― The transition probability T(s,a,s′) specifies the probability of going to state s′ after action a is taken in state s. Each s′↦T(s,a,s′) is a probability distribution, which means that:** + +⟶ Probabilités de transitions - La probabilité de transition T(s,a,s′) représente la probabilité de transitionner vers l'état s' après avoir effectué l'action a en étant dans l'état s. Chaque s′↦T(s,a,s′) est une loi de probabilité : + +
+ + +**64. states** + +⟶ états + +
+ + +**65. Policy ― A policy π is a function that maps each state s to an action a, i.e.** + +⟶ Politique - Une politique π est une fonction liant chaque état s à une action a, i.e. : + +
+ + +**66. Utility ― The utility of a path (s0,...,sk) is the discounted sum of the rewards on that path. In other words,** + +⟶ Utilité - L'utilité d'un chemin (s0,...,sk) est la somme des récompenses dévaluées récoltées sur ce chemin. En d'autres termes, + +
+ + +**67. The figure above is an illustration of the case k=4.** + +⟶ La figure ci-dessus illustre le cas k=4. + +
+ + +**68. Q-value ― The Q-value of a policy π at state s with action a, also noted Qπ(s,a), is the expected utility from state s after taking action a and then following policy π. It is defined as follows:** + +⟶ Q-value - La fonction de valeur des états-actions (Q-value en anglais) d'une politique π évaluée à l'état s avec l'action a, aussi notée Qπ(s,a), est l'espérance de l'utilité partant de l'état s avec l'action a et adoptant ensuite la politique π. Cette fonction est définie par : + +
+ + +**69. Value of a policy ― The value of a policy π from state s, also noted Vπ(s), is the expected utility by following policy π from state s over random paths. It is defined as follows:** + +⟶ Fonction de valeur des états d'une politique - La fonction de valeur des états d'une politique π évaluée à l'état s, aussi notée Vπ(s), est l'espérance de l'utilité partant de l'état s et adoptant ensuite la politique π. Cette fonction est définie par : + +
+ + +**70. Remark: Vπ(s) is equal to 0 if s is an end state.** + +⟶ Remarque : Vπ(s) vaut 0 si s est un état final. + +
+ + +**71. Applications** + +⟶ Applications + +
+ + +**72. [Policy evaluation ― Given a policy π, policy evaluation is an iterative algorithm that aims at estimating Vπ. It is done as follows:, Initialization: for all states s, we have:, Iteration: for t from 1 to TPE, we have, with]** + +⟶ [Évaluation d'une politique - Étant donnée une politique π, on peut utiliser l'algorithme itératif d'évaluation de politiques (en anglais policy evaluation) pour estimer Vπ :, Initialisation : pour tous les états s, on a, Itération : pour t allant de 1 à TPE, on a, avec] + +
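
An illustrative sketch of this iteration, assuming hypothetical `transitions(s, a)` returning (next_state, probability) pairs and `reward(s, a, s')` helpers:

```python
def policy_evaluation(states, policy, transitions, reward, is_end, gamma=1.0, t_pe=100):
    """Iterative policy evaluation of V_pi over t_pe sweeps of the state space."""
    V = {s: 0.0 for s in states}
    for _ in range(t_pe):
        V_new = {}
        for s in states:
            if is_end(s):
                V_new[s] = 0.0
                continue
            a = policy(s)
            V_new[s] = sum(p * (reward(s, a, s2) + gamma * V[s2])
                           for s2, p in transitions(s, a))
        V = V_new
    return V
```
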
+ + +**73. Remark: by noting S the number of states, A the number of actions per state, S′ the number of successors and T the number of iterations, then the time complexity is of O(TPESS′).** + +⟶ Remarque : en notant S le nombre d'états, A le nombre d'actions par état, S' le nombre de successeurs et T le nombre d'itérations, la complexité en temps est alors de O(TPESS′). + +
+ + +**74. Optimal Q-value ― The optimal Q-value Qopt(s,a) of state s with action a is defined to be the maximum Q-value attained by any policy starting. It is computed as follows:** + +⟶ Q-value optimale - La Q-value optimale Qopt(s,a) d'un état s avec l'action a est définie comme étant la Q-value maximale atteinte avec n'importe quelle politique. Elle est calculée avec la formule : + +
+ + +**75. Optimal value ― The optimal value Vopt(s) of state s is defined as being the maximum value attained by any policy. It is computed as follows:** + +⟶ Valeur optimale - La valeur optimale Vopt(s) d'un état s est définie comme étant la valeur maximum atteinte par n'importe quelle politique. Elle est calculée avec la formule : + +
+ + +**76. actions** + +⟶ actions + +
+ + +**77. Optimal policy ― The optimal policy πopt is defined as being the policy that leads to the optimal values. It is defined by:** + +⟶ Politique optimale - La politique optimale πopt est définie comme étant la politique liée aux valeurs optimales. Elle est définie par : + +
+ + +**78. [Value iteration ― Value iteration is an algorithm that finds the optimal value Vopt as well as the optimal policy πopt. It is done as follows:, Initialization: for all states s, we have:, Iteration: for t from 1 to TVI, we have:, with]** + +⟶ [Itération sur la valeur - L'algorithme d'itération sur la valeur (en anglais value iteration) vise à trouver la valeur optimale Vopt ainsi que la politique optimale πopt en deux temps :, Initialisation : pour tout état s, on a, Itération : pour t allant de 1 à TVI, on a, avec] + +
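
A compact sketch of the two steps, under the same hypothetical `transitions` / `reward` interface as the policy evaluation sketch above:

```python
def value_iteration(states, actions, transitions, reward, is_end, gamma=1.0, t_vi=100):
    """Value iteration: returns the optimal values V_opt and a greedy policy pi_opt."""
    def q(V, s, a):
        return sum(p * (reward(s, a, s2) + gamma * V[s2]) for s2, p in transitions(s, a))

    V = {s: 0.0 for s in states}
    for _ in range(t_vi):
        V = {s: 0.0 if is_end(s) else max(q(V, s, a) for a in actions(s)) for s in states}
    pi = {s: None if is_end(s) else max(actions(s), key=lambda a: q(V, s, a)) for s in states}
    return V, pi
```
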
+ + +**79. Remark: if we have either γ<1 or the MDP graph being acyclic, then the value iteration algorithm is guaranteed to converge to the correct answer.** + +⟶ Remarque : si γ<1 ou si le graphe associé au processus de décision markovien est acyclique, alors l'algorithme d'itération sur la valeur est garanti de converger vers la bonne solution. + +
+ + +**80. When unknown transitions and rewards** + +⟶ Cas des transitions et récompenses inconnues + +
+ + +**81. Now, let's assume that the transition probabilities and the rewards are unknown.** + +⟶ On suppose maintenant que les probabilités de transition et les récompenses sont inconnues. + +
+ + +**82. Model-based Monte Carlo ― The model-based Monte Carlo method aims at estimating T(s,a,s′) and Reward(s,a,s′) using Monte Carlo simulation with:** + +⟶ Monte-Carlo basé sur modèle - La méthode de Monte-Carlo basée sur modèle (en anglais model-based Monte Carlo) vise à estimer T(s,a,s′) et Reward(s,a,s′) en utilisant des simulations de Monte-Carlo avec : + +
+ + +**83. [# times (s,a,s′) occurs, and]** + +⟶ [# de fois où (s,a,s') se produit, et] + +
+ + +**84. These estimations will be then used to deduce Q-values, including Qπ and Qopt.** + +⟶ Ces estimations sont ensuite utilisées pour déduire les Q-values, notamment Qπ et Qopt. + +
+ + +**85. Remark: model-based Monte Carlo is said to be off-policy, because the estimation does not depend on the exact policy.** + +⟶ Remarque : la méthode de Monte-Carlo basée sur modèle est dite "hors politique" (en anglais "off-policy") car l'estimation produite ne dépend pas de la politique utilisée. + +
+ + +**86. Model-free Monte Carlo ― The model-free Monte Carlo method aims at directly estimating Qπ, as follows:** + +⟶ Monte-Carlo sans modèle - La méthode de Monte-Carlo sans modèle (en anglais model-free Monte Carlo) vise à directement estimer Qπ de la manière suivante : + +
+ + +**87. Qπ(s,a)=average of ut where st−1=s,at=a** + +⟶ Qπ(s,a)=moyenne de ut où st−1=s,at=a + +
+ + +**88. where ut denotes the utility starting at step t of a given episode.** + +⟶ où ut désigne l'utilité à partir de l'étape t d'un épisode donné. + +
+ + +**89. Remark: model-free Monte Carlo is said to be on-policy, because the estimated value is dependent on the policy π used to generate the data.** + +⟶ Remarque : la méthode de Monte-Carlo sans modèle est dite "sur politique" (en anglais "on-policy") car l'estimation produite dépend de la politique π utilisée pour générer les données. + +
+ + +**90. Equivalent formulation - By introducing the constant η=1/(1+#updates to (s,a)) and for each (s,a,u) of the training set, the update rule of model-free Monte Carlo has a convex combination formulation:** + +⟶ Formulation équivalente - En introduisant la constante η=1/(1+#mises à jour de (s,a)) et pour chaque triplet (s,a,u) de la base d'apprentissage, la formule de récurrence de la méthode de Monte-Carlo sans modèle s'écrit à l'aide de la combinaison convexe : + +
+ + +**91. as well as a stochastic gradient formulation:** + +⟶ ainsi qu'une formulation de type gradient stochastique : + +
+ + +**92. SARSA ― State-action-reward-state-action (SARSA) is a boostrapping method estimating Qπ by using both raw data and estimates as part of the update rule. For each (s,a,r,s′,a′), we have:** + +⟶ SARSA - État-action-récompense-état-action (en anglais state-action-reward-state-action ou SARSA) est une méthode de bootstrap qui estime Qπ en utilisant à la fois des données réelles et estimées dans sa formule de mise à jour. Pour chaque (s,a,r,s′,a′), on a : + +
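
A one-line sketch of that update rule, assuming Q is a hypothetical dictionary keyed by (state, action) pairs:

```python
def sarsa_update(Q, s, a, r, s_next, a_next, eta=0.1, gamma=1.0):
    """One SARSA update: bootstrap on the estimate Q(s', a') of the action actually taken."""
    target = r + gamma * Q.get((s_next, a_next), 0.0)
    Q[(s, a)] = (1 - eta) * Q.get((s, a), 0.0) + eta * target
```
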
+ + +**93. Remark: the SARSA estimate is updated on the fly as opposed to the model-free Monte Carlo one where the estimate can only be updated at the end of the episode.** + +⟶ Remarque : l'estimation donnée par SARSA est mise à jour à la volée contrairement à celle donnée par la méthode de Monte-Carlo sans modèle où la mise à jour est uniquement effectuée à la fin de l'épisode. + +
+ + +**94. Q-learning ― Q-learning is an off-policy algorithm that produces an estimate for Qopt. On each (s,a,r,s′,a′), we have:** + +⟶ Q-learning - Le Q-apprentissage (en anglais Q-learning) est un algorithme hors politique (en anglais off-policy) donnant une estimation de Qopt. Pour chaque (s,a,r,s′,a′), on a : + +
+ + +**95. Epsilon-greedy ― The epsilon-greedy policy is an algorithm that balances exploration with probability ϵ and exploitation with probability 1−ϵ. For a given state s, the policy πact is computed as follows:** + +⟶ Epsilon-glouton - La politique epsilon-gloutonne (en anglais epsilon-greedy) est un algorithme essayant de trouver un compromis entre l'exploration avec probabilité ϵ et l'exploitation avec probabilité 1-ϵ. Pour un état s, la politique πact est calculée par : + +
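
A sketch combining the two previous definitions (Q-learning update plus epsilon-greedy action choice), again with a hypothetical dictionary Q keyed by (state, action) pairs:

```python
import random

def q_learning_update(Q, s, a, r, s_next, next_actions, eta=0.1, gamma=1.0):
    """Off-policy Q-learning update towards r + gamma * max_a' Q(s', a')."""
    v_next = max((Q.get((s_next, a2), 0.0) for a2 in next_actions), default=0.0)
    Q[(s, a)] = (1 - eta) * Q.get((s, a), 0.0) + eta * (r + gamma * v_next)

def epsilon_greedy(Q, s, actions, epsilon=0.1):
    """Explore a random action with probability epsilon, otherwise exploit argmax_a Q(s, a)."""
    actions = list(actions)
    if random.random() < epsilon:
        return random.choice(actions)
    return max(actions, key=lambda a: Q.get((s, a), 0.0))
```
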
+ + +**96. [with probability, random from Actions(s)]** + +⟶ [avec probabilité, aléatoire venant d'Actions(s)] + +
+ + +**97. Game playing** + +⟶ Jeux + +
+ + +**98. In games (e.g. chess, backgammon, Go), other agents are present and need to be taken into account when constructing our policy.** + +⟶ Dans les jeux (e.g. échecs, backgammon, Go), d'autres agents sont présents et doivent être pris en compte au moment d'élaborer une politique. + +
+ + +**99. Game tree ― A game tree is a tree that describes the possibilities of a game. In particular, each node is a decision point for a player and each root-to-leaf path is a possible outcome of the game.** + +⟶ Arbre de jeu - Un arbre de jeu est un arbre détaillant toutes les issues possibles d'un jeu. En particulier, chaque noeud représente un point de décision pour un joueur et chaque chemin liant la racine à une des feuilles traduit une possible instance du jeu. + +
+ + +**100. [Two-player zero-sum game ― It is a game where each state is fully observed and such that players take turns. It is defined with:, a starting state sstart, possible actions Actions(s) from state s, successors Succ(s,a) from states s with actions a, whether an end state was reached IsEnd(s), the agent's utility Utility(s) at end state s, the player Player(s) who controls state s]** + +⟶ [Jeu à somme nulle à deux joueurs - C'est un type de jeu où chaque état est entièrement observé et où les joueurs jouent de manière successive. On le définit par :, un état de départ sstart, de possibles actions Actions(s) partant de l'état s, du successeur Succ(s,a) de l'état s après avoir effectué l'action a, la connaissance d'avoir atteint ou non un état final IsEnd(s), l'utilité de l'agent Utility(s) à l'état final s, le joueur Player(s) qui contrôle l'état s] + +
+ + +**101. Remark: we will assume that the utility of the agent has the opposite sign of the one of the opponent.** + +⟶ Remarque : nous supposerons que l'utilité de l'agent est de signe opposé à celle de son adversaire. + +
+ + +**102. [Types of policies ― There are two types of policies:, Deterministic policies, noted πp(s), which are actions that player p takes in state s., Stochastic policies, noted πp(s,a)∈[0,1], which are probabilities that player p takes action a in state s.]** + +⟶ [Types de politiques - Il y a deux types de politiques :, Les politiques déterministes, notées πp(s), qui représentent pour tout s l'action que le joueur p prend dans l'état s., Les politiques stochastiques, notées πp(s,a)∈[0,1], qui sont décrites pour tout s et a par la probabilité que le joueur p prenne l'action a dans l'état s.] + +
+ + +**103. Expectimax ― For a given state s, the expectimax value Vexptmax(s) is the maximum expected utility of any agent policy when playing with respect to a fixed and known opponent policy πopp. It is computed as follows:** + +⟶ Expectimax - Pour un état donné s, la valeur d'expectimax Vexptmax(s) est l'espérance maximale de l'utilité sur l'ensemble des politiques de l'agent lorsque celui-ci joue face à un adversaire de politique πopp fixée et connue. Cette valeur est calculée de la manière suivante : + +
+ + +**104. Remark: expectimax is the analog of value iteration for MDPs.** + +⟶ Remarque : expectimax est l'analogue de l'algorithme d'itération sur la valeur pour les MDPs. + +
+ + +**105. Minimax ― The goal of minimax policies is to find an optimal policy against an adversary by assuming the worst case, i.e. that the opponent is doing everything to minimize the agent's utility. It is done as follows:** + +⟶ Minimax - Le but des politiques minimax est de trouver une politique optimale contre un adversaire dont on suppose le pire, i.e. qu'il fait tout pour minimiser l'utilité de l'agent. La valeur correspondante est calculée par : + +
+ + +**106. Remark: we can extract πmax and πmin from the minimax value Vminimax.** + +⟶ Remarque : on peut déduire πmax et πmin à partir de la valeur minimax Vminimax. + +
+ + +**107. Minimax properties ― By noting V the value function, there are 3 properties around minimax to have in mind:** + +⟶ Propriétés de minimax - En notant V la fonction de valeur, il y a 3 propriétés sur minimax qu'il faut avoir à l'esprit : + +
+ + +**108. Property 1: if the agent were to change its policy to any πagent, then the agent would be no better off.** + +⟶ Propriété 1 : si l'agent changeait sa politique en un quelconque πagent, alors il ne s'en sortirait pas mieux. + +
+ + +**109. Property 2: if the opponent changes its policy from πmin to πopp, then he will be no better off.** + +⟶ Propriété 2 : si son adversaire change sa politique de πmin à πopp, alors il ne s'en sortira pas mieux. + +
+ + +**110. Property 3: if the opponent is known to be not playing the adversarial policy, then the minimax policy might not be optimal for the agent.** + +⟶ Propriété 3 : si l'on sait que son adversaire ne joue pas les pires actions possibles, alors la politique minimax peut ne pas être optimale pour l'agent. + +
+ + +**111. In the end, we have the following relationship:** + +⟶ À la fin, on a la relation suivante : + +
+ + +**112. Speeding up minimax** + +⟶ Accélération de minimax + +
+ + +**113. Evaluation function ― An evaluation function is a domain-specific and approximate estimate of the value Vminimax(s). It is noted Eval(s).** + +⟶ Fonction d'évaluation - Une fonction d'évaluation estime de manière approximative la valeur Vminimax(s) selon les paramètres du problème. Elle est notée Eval(s). + +
+ + +**114. Remark: FutureCost(s) is an analogy for search problems.** + +⟶ Remarque : l'analogue de cette fonction utilisé dans les problèmes de recherche est FutureCost(s). + +
+ + +**115. Alpha-beta pruning ― Alpha-beta pruning is a domain-general exact method optimizing the minimax algorithm by avoiding the unnecessary exploration of parts of the game tree. To do so, each player keeps track of the best value they can hope for (stored in α for the maximizing player and in β for the minimizing player). At a given step, the condition β<α means that the optimal path is not going to be in the current branch as the earlier player had a better option at their disposal.** + +⟶ Élagage alpha-bêta - L'élagage alpha-bêta (en anglais alpha-beta pruning) est une méthode exacte d'optimisation employée sur l'algorithme de minimax et a pour but d'éviter l'exploration de parties inutiles de l'arbre de jeu. Pour ce faire, chaque joueur garde en mémoire la meilleure valeur qu'il puisse espérer (appelée α chez le joueur maximisant et β chez le joueur minimisant). À une étape donnée, la condition β<α signifie que le chemin optimal ne peut pas passer par la branche actuelle puisque le joueur qui précédait avait une meilleure option à sa disposition. + +
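
A recursive sketch of minimax with alpha-beta pruning, assuming a hypothetical `game` object exposing `is_end`, `utility`, `player` (returning "max" for the agent and "min" for the opponent), `actions` and `succ`:

```python
def minimax_value(state, game, alpha=float("-inf"), beta=float("inf")):
    """Minimax value of `state` with alpha-beta pruning."""
    if game.is_end(state):
        return game.utility(state)
    if game.player(state) == "max":
        value = float("-inf")
        for a in game.actions(state):
            value = max(value, minimax_value(game.succ(state, a), game, alpha, beta))
            alpha = max(alpha, value)
            if beta <= alpha:   # the minimizing player already has a better option elsewhere
                break
        return value
    else:
        value = float("inf")
        for a in game.actions(state):
            value = min(value, minimax_value(game.succ(state, a), game, alpha, beta))
            beta = min(beta, value)
            if beta <= alpha:   # the maximizing player already has a better option elsewhere
                break
        return value
```
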
+ + +**116. TD learning ― Temporal difference (TD) learning is used when we don't know the transitions/rewards. The value is based on exploration policy. To be able to use it, we need to know rules of the game Succ(s,a). For each (s,a,r,s′), the update is done as follows:** + +⟶ TD learning - L'apprentissage par différence temporelle (en anglais temporal difference learning ou TD learning) est une méthode utilisée lorsque l'on ne connaît pas les transitions/récompenses. La valeur est alors basée sur la politique d'exploration. Pour pouvoir l'utiliser, on a besoin de connaître les règles du jeu Succ(s,a). Pour chaque (s,a,r,s′), la mise à jour des coefficients est faite de la manière suivante : + +
+ + +**117. Simultaneous games** + +⟶ Jeux simultanés + +
+ + +**118. This is the contrary of turn-based games, where there is no ordering on the player's moves.** + +⟶ Ce cas est le contraire des jeux au tour par tour : il n'y a pas d'ordre prédéterminé sur les mouvements des joueurs. + +
+ + +**119. Single-move simultaneous game ― Let there be two players A and B, with given possible actions. We note V(a,b) to be A's utility if A chooses action a, B chooses action b. V is called the payoff matrix.** + +⟶ Jeu simultané à un mouvement - Soient deux joueurs A et B, munis de possibles actions. On note V(a,b) l'utilité de A si A choisit l'action a et B l'action b. V est appelée la matrice de profit (en anglais payoff matrix). + +
+ + +**120. [Strategies ― There are two main types of strategies:, A pure strategy is a single action:, A mixed strategy is a probability distribution over actions:]** + +⟶ [Stratégies - Il y a principalement deux types de stratégies :, Une stratégie pure est une seule action, Une stratégie mixte est une loi de probabilité sur les actions :] + +
+ + +**121. Game evaluation ― The value of the game V(πA,πB) when player A follows πA and player B follows πB is such that:** + +⟶ Évaluation de jeu - La valeur d'un jeu V(πA,πB) quand le joueur A suit πA et le joueur B suit πB est telle que : + +
+ + +**122. Minimax theorem ― By noting πA,πB ranging over mixed strategies, for every simultaneous two-player zero-sum game with a finite number of actions, we have:** + +⟶ Théorème Minimax - Soient πA et πB des stratégies mixtes. Pour chaque jeu à somme nulle à deux joueurs ayant un nombre fini d'actions, on a : + +
+ + +**123. Non-zero-sum games** + +⟶ Jeux à somme non nulle + +
+ + +**124. Payoff matrix ― We define Vp(πA,πB) to be the utility for player p.** + +⟶ Matrice de profit - On définit Vp(πA,πB) l'utilité du joueur p. + +
+ + +**125. Nash equilibrium ― A Nash equilibrium is (π∗A,π∗B) such that no player has an incentive to change its strategy. We have:** + +⟶ Équilibre de Nash - Un équilibre de Nash est défini par (π∗A,π∗B) tel qu'aucun joueur n'a d'intérêt de changer sa stratégie. On a : + +
+ + +**126. and** + +⟶ et + +
+ + +**127. Remark: in any finite-player game with finite number of actions, there exists at least one Nash equilibrium.** + +⟶ Remarque : dans un jeu à nombre de joueurs et d'actions finis, il existe au moins un équilibre de Nash. + +
+ + +**128. [Tree search, Backtracking search, Breadth-first search, Depth-first search, Iterative deepening]** + +⟶ [Parcours d'arbre, Retour sur trace, Parcours en largeur, Parcours en profondeur, Approfondissement itératif] + +
+ + +**129. [Graph search, Dynamic programming, Uniform cost search]** + +⟶ [Parcours de graphe, Programmation dynamique, Recherche à coût uniforme] + +
+ + +**130. [Learning costs, Structured perceptron]** + +⟶ [Apprendre les coûts, Perceptron structuré] + +
+ + +**131. [A star search, Heuristic function, Algorithm, Consistency, correctness, Admissibility, efficiency]** + +⟶ [A étoile, Fonction heuristique, Algorithme, Consistance, Correction, Admissibilité, Efficacité] + +
+ + +**132. [Relaxation, Relaxed search problem, Relaxed heuristic, Max heuristic]** + +⟶ [Relaxation, Relaxation d'un problème de recherche, Relaxation d'une heuristique, Heuristique max] + +
+ + +**133. [Markov decision processes, Overview, Policy evaluation, Value iteration, Transitions, rewards]** + +⟶ [Processus de décision markovien, Aperçu, Évaluation d'une politique, Itération sur la valeur, Transitions, Récompenses] + +
+ + +**134. [Game playing, Expectimax, Minimax, Speeding up minimax, Simultaneous games, Non-zero-sum games]** + +⟶ [Jeux, Expectimax, Minimax, Accélération de minimax, Jeux simultanés, Jeux à somme non nulle] + +
+ + +**135. View PDF version on GitHub** + +⟶ Voir la version PDF sur GitHub. + +
+ + +**136. Original authors** + +⟶ Auteurs d'origine. + +
+ + +**137. Translated by X, Y and Z** + +⟶ Traduit de l'anglais par X, Y et Z. + +
+ + +**138. Reviewed by X, Y and Z** + +⟶ Revu par X, Y et Z. + +
+ + +**139. By X and Y** + +⟶ De X et Y. + +
+ + +**140. The Artificial Intelligence cheatsheets are now available in [target language].** + +⟶ Les pense-bête d'intelligence artificielle sont maintenant disponibles en français ! From c09f48ded1058ec7a9407583817ffbdcb86e3922 Mon Sep 17 00:00:00 2001 From: shervinea Date: Wed, 14 Aug 2019 23:05:47 -0700 Subject: [PATCH 306/531] Fix typo --- template/cs-221-states-models.md | 2 +- tr/cs-221-states-models.md | 4 ++-- 2 files changed, 3 insertions(+), 3 deletions(-) diff --git a/template/cs-221-states-models.md b/template/cs-221-states-models.md index a945c8632..e21270f89 100644 --- a/template/cs-221-states-models.md +++ b/template/cs-221-states-models.md @@ -198,7 +198,7 @@
-**29. Remark 1: the UCS algorithm is logically equivalent to Djikstra's algorithm.** +**29. Remark 1: the UCS algorithm is logically equivalent to Dijkstra's algorithm.** ⟶ diff --git a/tr/cs-221-states-models.md b/tr/cs-221-states-models.md index dccab7884..bceddce2b 100644 --- a/tr/cs-221-states-models.md +++ b/tr/cs-221-states-models.md @@ -198,9 +198,9 @@
-**29. Remark 1: the UCS algorithm is logically equivalent to Djikstra's algorithm.** +**29. Remark 1: the UCS algorithm is logically equivalent to Dijkstra's algorithm.** -⟶ Not 1: UCS algoritması mantıksal olarak Djikstra algoritması ile aynıdır. +⟶ Not 1: UCS algoritması mantıksal olarak Dijkstra algoritması ile aynıdır.
From 1a653e5fb94980719d4e4ce71be01de69146d626 Mon Sep 17 00:00:00 2001 From: shervinea Date: Sun, 18 Aug 2019 19:05:02 -0700 Subject: [PATCH 307/531] Add [fr] translation --- fr/cs-221-variables-models.md | 617 ++++++++++++++++++++++++++++++++++ 1 file changed, 617 insertions(+) create mode 100644 fr/cs-221-variables-models.md diff --git a/fr/cs-221-variables-models.md b/fr/cs-221-variables-models.md new file mode 100644 index 000000000..9c802583b --- /dev/null +++ b/fr/cs-221-variables-models.md @@ -0,0 +1,617 @@ +**Variables-based models translation** [[webpage]](https://stanford.edu/~shervine/teaching/cs-221/cheatsheet-variables-models) + +
+ +**1. Variables-based models with CSP and Bayesian networks** + +⟶ Modèles basés sur les variables : CSP et réseaux bayésiens + +
+ + +**2. Constraint satisfaction problems** + +⟶ Problèmes de satisfaction de contraintes + +
+ + +**3. In this section, our objective is to find maximum weight assignments of variable-based models. One advantage compared to states-based models is that these algorithms are more convenient to encode problem-specific constraints.** + +⟶ Dans cette section, notre but est de trouver des affectations de poids maximal dans des problèmes impliquant des modèles basés sur les variables. Un avantage comparé aux modèles basés sur les états est que ces algorithmes sont plus commodes lorsqu'il s'agit de transcrire des contraintes spécifiques à certains problèmes. + +
+ + +**4. Factor graphs** + +⟶ Graphes de facteurs + +
+ + +**5. Definition ― A factor graph, also referred to as a Markov random field, is a set of variables X=(X1,...,Xn) where Xi∈Domaini and m factors f1,...,fm with each fj(X)⩾0.** + +⟶ Définition - Un graphe de facteurs, aussi appelé champ aléatoire de Markov, est un ensemble de variables X=(X1,...,Xn) où Xi∈Domaini muni de m facteurs f1,...,fm où chaque fj(X)⩾0. + +
+ + +**6. Domain** + +⟶ Domaine + +
+ + +**7. Scope and arity ― The scope of a factor fj is the set of variables it depends on. The size of this set is called the arity.** + +⟶ Portée et arité - La portée d'un facteur fj est l'ensemble des variables dont il dépend. La taille de cet ensemble est appelée son arité. + +
+ + +**8. Remark: factors of arity 1 and 2 are called unary and binary respectively.** + +⟶ Remarque : les facteurs d'arité 1 et 2 sont respectivement appelés unaire et binaire. + +
+ + +**9. Assignment weight ― Each assignment x=(x1,...,xn) yields a weight Weight(x) defined as being the product of all factors fj applied to that assignment. Its expression is given by:** + +⟶ Affectation de poids - Chaque affectation x=(x1,...,xn) donne un poids Weight(x) défini comme étant le produit de tous les facteurs fj appliqués à cette affectation. Son expression est donnée par : + +
+ + +**10. Constraint satisfaction problem ― A constraint satisfaction problem (CSP) is a factor graph where all factors are binary; we call them to be constraints:** + +⟶ Problème de satisfaction de contraintes - Un problème de satisfaction de contraintes (en anglais constraint satisfaction problem ou CSP) est un graphe de facteurs où tous les facteurs sont binaires ; on les appelle "contraintes". + +
+ + +**11. Here, the constraint j with assignment x is said to be satisfied if and only if fj(x)=1.** + +⟶ Ici, on dit que l'affectation x satisfait la contrainte j si et seulement si fj(x)=1. + +
+ + +**12. Consistent assignment ― An assignment x of a CSP is said to be consistent if and only if Weight(x)=1, i.e. all constraints are satisfied.** + +⟶ Affectation consistante - Une affectation x d'un CSP est dite consistante si et seulement si Weight(x)=1, i.e. toutes les contraintes sont satisfaites. + +
+ + +**13. Dynamic ordering** + +⟶ Mise en ordre dynamique + +
+ + +**14. Dependent factors ― The set of dependent factors of variable Xi with partial assignment x is called D(x,Xi), and denotes the set of factors that link Xi to already assigned variables.** + +⟶ Facteurs dépendants - L'ensemble des facteurs dépendants de la variable Xi dont l'affectation partielle est x est appelé D(x,Xi) et désigne l'ensemble des facteurs liant Xi à des variables déjà affectées. + +
+ + +**15. Backtracking search ― Backtracking search is an algorithm used to find maximum weight assignments of a factor graph. At each step, it chooses an unassigned variable and explores its values by recursion. Dynamic ordering (i.e. choice of variables and values) and lookahead (i.e. early elimination of inconsistent options) can be used to explore the graph more efficiently, although the worst-case runtime stays exponential: O(|Domain|n).** + +⟶ Recherche avec retour sur trace - L'algorithme de recherche avec retour sur trace (en anglais backtracking search) est utilisé pour trouver l'affectation de poids maximum d'un graphe de facteurs. À chaque étape, une variable non assignée est choisie et ses valeurs sont explorées par récursivité. On peut utiliser un processus de mise en ordre dynamique sur le choix des variables et valeurs et/ou d'anticipation (i.e. élimination précoce d'options non consistantes) pour explorer le graphe de manière plus efficace. La complexité temporelle dans tous les cas reste néanmoins exponentielle : O(|Domaine|n). + +
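
An illustrative sketch of such a search (static ordering, no lookahead), assuming `factors` is a hypothetical list of (scope, f) pairs where scope is a tuple of variable names and f returns a non-negative weight:

```python
def backtracking_search(variables, domains, factors):
    """Exhaustive search for a maximum-weight complete assignment of a factor graph."""
    best = {"weight": 0.0, "assignment": None}

    def new_weight(assignment, var):
        # product of the factors whose scope becomes fully assigned once `var` is set
        w = 1.0
        for scope, f in factors:
            if var in scope and all(v in assignment for v in scope):
                w *= f(*(assignment[v] for v in scope))
        return w

    def recurse(assignment, weight):
        if weight == 0.0:                          # dead end: no consistent extension exists
            return
        if len(assignment) == len(variables):
            if weight > best["weight"]:
                best["weight"], best["assignment"] = weight, dict(assignment)
            return
        var = variables[len(assignment)]           # static variable ordering for simplicity
        for value in domains[var]:
            assignment[var] = value
            recurse(assignment, weight * new_weight(assignment, var))
            del assignment[var]

    recurse({}, 1.0)
    return best["assignment"], best["weight"]
```
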
+ + +**16. [Forward checking ― It is a one-step lookahead heuristic that preemptively removes inconsistent values from the domains of neighboring variables. It has the following characteristics:, After assigning a variable Xi, it eliminates inconsistent values from the domains of all its neighbors., If any of these domains becomes empty, we stop the local backtracking search., If we un-assign a variable Xi, we have to restore the domain of its neighbors.]** + +⟶ [Vérification en avant - La vérification en avant (forward checking en anglais) est une heuristique d'anticipation à une étape qui enlève des variables voisines les valeurs impossibles de manière préemptive. Cette méthode a les caractéristiques suivantes :, Après l'affectation d'une variable Xi, les valeurs non consistantes sont éliminées du domaine de tous ses voisins., Si l'un de ces domaines devient vide, la recherche locale s'arrête., Si l'on enlève l'affectation d'une variable Xi, on doit restaurer le domaine de ses voisins.] + +
+ + +**17. Most constrained variable ― It is a variable-level ordering heuristic that selects the next unassigned variable that has the fewest consistent values. This has the effect of making inconsistent assignments to fail earlier in the search, which enables more efficient pruning.** + +⟶ Variable la plus contrainte - L'heuristique de la variable la plus contrainte (en anglais most constrained variable ou MCV) sélectionne la prochaine variable sans affectation ayant le moins de valeurs consistantes. Cette procédure a pour effet de faire échouer les affectations impossibles plus tôt dans la recherche, permettant un élagage plus efficace. + +
+ + +**18. Least constrained value ― It is a value-level ordering heuristic that assigns the next value that yields the highest number of consistent values of neighboring variables. Intuitively, this procedure chooses first the values that are most likely to work.** + +⟶ Valeur la moins contraignante - L'heuristique de la valeur la moins contraignante (en anglais least constrained value ou LCV) sélectionne pour une variable donnée la prochaine valeur maximisant le nombre de valeurs consistantes chez les variables voisines. De manière intuitive, on peut dire que cette procédure choisit en premier les valeurs qui sont le plus susceptible de marcher. + +
+ + +**19. Remark: in practice, this heuristic is useful when all factors are constraints.** + +⟶ Remarque : en pratique, cette heuristique est utile quand tous les facteurs sont des contraintes. + +
+ + +**20. The example above is an illustration of the 3-color problem with backtracking search coupled with most constrained variable exploration and least constrained value heuristic, as well as forward checking at each step.** + +⟶ L'exemple ci-dessus est une illustration du problème de coloration de graphe à 3 couleurs en utilisant l'algorithme de recherche avec retour sur trace couplé avec les heuristiques de MCV, de LCV ainsi que de vérification en avant à chaque étape. + +
+ + +**21. [Arc consistency ― We say that arc consistency of variable Xl with respect to Xk is enforced when for each xl∈Domainl:, unary factors of Xl are non-zero, there exists at least one xk∈Domaink such that any factor between Xl and Xk is non-zero.]** + +⟶ [Arc-consistance - On dit que l'arc-consistance de la variable Xl par rapport à Xk est vérifiée lorsque pour tout xl∈Domainl :, les facteurs unaires de Xl sont non-nuls, il existe au moins un xk∈Domaink tel que n'importe quel facteur entre Xl et Xk est non nul.] + +
+ + +**22. AC-3 ― The AC-3 algorithm is a multi-step lookahead heuristic that applies forward checking to all relevant variables. After a given assignment, it performs forward checking and then successively enforces arc consistency with respect to the neighbors of variables for which the domain change during the process.** + +⟶ AC-3 - L'algorithme d'AC-3 est une heuristique qui applique le principe de vérification en avant à toutes les variables susceptibles d'être concernées. Après l'affectation d'une variable, cet algorithme effectue une vérification en avant et applique successivement l'arc-consistance avec tous les voisins de variables pour lesquels le domaine change. + +
+ + +**23. Remark: AC-3 can be implemented both iteratively and recursively.** + +⟶ Remarque : AC-3 peut être codé de manière itérative ou récursive. + +
+ + +**24. Approximate methods** + +⟶ Méthodes approximatives + +
+ + +**25. Beam search ― Beam search is an approximate algorithm that extends partial assignments of n variables of branching factor b=|Domain| by exploring the K top paths at each step. The beam size K∈{1,...,bn} controls the tradeoff between efficiency and accuracy. This algorithm has a time complexity of O(n⋅Kblog(Kb)).** + +⟶ Recherche en faisceau - L'algorithme de recherche en faisceau (en anglais beam search) est une technique approximative qui étend les affectations partielles de n variables de facteur de branchement b=|Domain| en explorant les K meilleurs chemins qui s'offrent à chaque étape. La largeur du faisceau K∈{1,...,bn} détermine la balance entre efficacité et précision de l'algorithme. Sa complexité en temps est de O(n⋅Kblog(Kb)). + +
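
A possible sketch of beam search, following the same hypothetical (scope, f) convention for factors as the backtracking sketch above:

```python
def beam_search(variables, domains, factors, K=2):
    """Approximate maximum-weight assignment: after each variable, keep only the K best
    partial assignments."""
    def new_weight(assignment, var):
        w = 1.0
        for scope, f in factors:
            if var in scope and all(v in assignment for v in scope):
                w *= f(*(assignment[v] for v in scope))
        return w

    beam = [({}, 1.0)]
    for var in variables:
        candidates = []
        for assignment, weight in beam:
            for value in domains[var]:
                extended = dict(assignment)
                extended[var] = value
                candidates.append((extended, weight * new_weight(extended, var)))
        beam = sorted(candidates, key=lambda c: c[1], reverse=True)[:K]   # prune to K best
    return max(beam, key=lambda c: c[1])
```
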
+ + +**26. The example below illustrates a possible beam search of parameters K=2, b=3 and n=5.** + +⟶ L'exemple ci-dessous illustre une recherche en faisceau de paramètres K=2, b=3 et n=5. + +
+ + +**27. Remark: K=1 corresponds to greedy search whereas K→+∞ is equivalent to BFS tree search.** + +⟶ Remarque : K=1 correspond à la recherche gloutonne alors que K→+∞ est équivalent à effectuer un parcours en largeur. + +
+ + +**28. Iterated conditional modes ― Iterated conditional modes (ICM) is an iterative approximate algorithm that modifies the assignment of a factor graph one variable at a time until convergence. At step i, we assign to Xi the value v that maximizes the product of all factors connected to that variable.** + +⟶ Modes conditionnels itérés - L'algorithme des modes conditionnels itérés (en anglais iterated conditional modes ou ICM) est une technique itérative et approximative qui modifie l'affectation d'un graphe de facteurs une variable à la fois jusqu'à convergence. À l'étape i, Xi prend la valeur v qui maximise le produit de tous les facteurs connectés à cette variable. + +
+ + +**29. Remark: ICM may get stuck in local minima.** + +⟶ Remarque : il est possible qu'ICM reste bloqué dans un minimum local. + +
+ + +**30. [Gibbs sampling ― Gibbs sampling is an iterative approximate method that modifies the assignment of a factor graph one variable at a time until convergence. At step i:, we assign to each element u∈Domaini a weight w(u) that is the product of all factors connected to that variable, we sample v from the probability distribution induced by w and assign it to Xi.]** + +⟶ [Échantillonnage de Gibbs - La méthode d'échantillonnage de Gibbs (en anglais Gibbs sampling) est une technique itérative et approximative qui modifie les affectations d'un graphe de facteurs une variable à la fois jusqu'à convergence. À l'étape i :, on assigne à chaque élément u∈Domaini un poids w(u) qui est le produit de tous les facteurs connectés à cette variable, on échantillonne v de la loi de probabilité engendrée par w et on l'associe à Xi.] + +
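
A sketch of one possible Gibbs sampler for a factor graph, again under the hypothetical (scope, f) convention for factors and assuming list-valued domains:

```python
import random

def gibbs_sampling(variables, domains, factors, num_sweeps=100):
    """Start from a random complete assignment and resample one variable at a time
    from the weights induced by its connected factors."""
    assignment = {v: random.choice(domains[v]) for v in variables}
    for _ in range(num_sweeps):
        for var in variables:
            weights = []
            for value in domains[var]:
                assignment[var] = value
                w = 1.0
                for scope, f in factors:          # only factors connected to `var` matter
                    if var in scope:
                        w *= f(*(assignment[v] for v in scope))
                weights.append(w)
            if sum(weights) == 0:                 # every value is inconsistent: keep a random one
                assignment[var] = random.choice(domains[var])
            else:
                assignment[var] = random.choices(domains[var], weights=weights)[0]
    return assignment
```
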
+ + +**31. Remark: Gibbs sampling can be seen as the probabilistic counterpart of ICM. It has the advantage to be able to escape local minima in most cases.** + +⟶ Remarque : la méthode d'échantillonnage de Gibbs peut être vue comme étant la version probabiliste de ICM. Cette méthode a l'avantage de pouvoir échapper aux potentiels minimum locaux dans la plupart des situations. + +
+ + +**32. Factor graph transformations** + +⟶ Transformations sur les graphes de facteurs + +
+ + +**33. Independence ― Let A,B be a partitioning of the variables X. We say that A and B are independent if there are no edges between A and B and we write:** + +⟶ Indépendance - Soit A, B une partition des variables X. On dit que A et B sont indépendants s'il n'y a pas d'arête connectant A et B et on écrit : + +
+ + +**34. Remark: independence is the key property that allows us to solve subproblems in parallel.** + +⟶ Remarque : l'indépendance est une propriété importante car elle nous permet de décomposer la situation en sous-problèmes que l'on peut résoudre en parallèle. + +
+ + +**35. Conditional independence ― We say that A and B are conditionally independent given C if conditioning on C produces a graph in which A and B are independent. In this case, it is written:** + +⟶ Indépendance conditionnelle - On dit que A et B sont conditionnellement indépendants par rapport à C si le fait de conditionner sur C produit un graphe dans lequel A et B sont indépendants. Dans ce cas, on écrit : + +
+ + +**36. [Conditioning ― Conditioning is a transformation aiming at making variables independent that breaks up a factor graph into smaller pieces that can be solved in parallel and can use backtracking. In order to condition on a variable Xi=v, we do as follows:, Consider all factors f1,...,fk that depend on Xi, Remove Xi and f1,...,fk, Add gj(x) for j∈{1,...,k} defined as:]** + +⟶ [Conditionnement - Le conditionnement est une transformation visant à rendre des variables indépendantes et ainsi diviser un graphe de facteurs en pièces plus petites qui peuvent être traitées en parallèle et utiliser le retour sur trace. Pour conditionner par rapport à une variable Xi=v, on :, considère tous les facteurs f1,...,fk qui dépendent de Xi, enlève Xi et f1,...,fk, ajoute gj(x) pour j∈{1,...,k} défini par :] + +
+ + +**37. Markov blanket ― Let A⊆X be a subset of variables. We define MarkovBlanket(A) to be the neighbors of A that are not in A.** + +⟶ Couverture de Markov - Soit A⊆X une partie des variables. On définit MarkovBlanket(A) comme étant les voisins de A qui ne sont pas dans A. + +
+ + +**38. Proposition ― Let C=MarkovBlanket(A) and B=X∖(A∪C). Then we have:** + +⟶ Proposition - Soit C=MarkovBlanket(A) et B=X∖(A∪C). On a alors : + +
+ + +**39. [Elimination ― Elimination is a factor graph transformation that removes Xi from the graph and solves a small subproblem conditioned on its Markov blanket as follows:, Consider all factors fi,1,...,fi,k that depend on Xi, Remove Xi +and fi,1,...,fi,k, Add fnew,i(x) defined as:]** + +⟶ [Élimination - L'élimination est une transformation consistant à enlever Xi d'un graphe de facteurs pour ensuite résoudre un sous-problème conditionné sur sa couverture de Markov où l'on :, considère tous les facteurs fi,1,...,fi,k qui dépendent de Xi, enlève Xi et fi,1,...,fi,k, ajoute fnew,i(x) défini par :] + +
+ + +**40. Treewidth ― The treewidth of a factor graph is the maximum arity of any factor created by variable elimination with the best variable ordering. In other words,** + +⟶ Largeur arborescente - La largeur arborescente (en anglais treewidth) d'un graphe de facteurs est l'arité maximum de n'importe quel facteur créé par élimination avec le meilleur ordre de variable. En d'autres termes, + +
+ + +**41. The example below illustrates the case of a factor graph of treewidth 3.** + +⟶ L'exemple ci-dessous illustre le cas d'un graphe de facteurs ayant une largeur arborescente égale à 3. + +
+ + +**42. Remark: finding the best variable ordering is a NP-hard problem.** + +⟶ Remarque : trouver le meilleur ordre de variable est un problème NP-difficile. + +
+ + +**43. Bayesian networks** + +⟶ Réseaux bayésiens + +
+ + +**44. In this section, our goal will be to compute conditional probabilities. What is the probability of a query given evidence?** + +⟶ Dans cette section, notre but est de calculer des probabilités conditionnelles. Quelle est la probabilité d'un événement étant donné des observations ? + +
+ + +**45. Introduction** + +⟶ Introduction + +
+ + +**46. Explaining away ― Suppose causes C1 and C2 influence an effect E. Conditioning on the effect E and on one of the causes (say C1) changes the probability of the other cause (say C2). In this case, we say that C1 has explained away C2.** + +⟶ Explication - Supposons que les causes C1 et C2 influencent un effet E. Le conditionnement sur l'effet E et une des causes (disons C1) change la probabilité de l'autre cause (disons C2). Dans ce cas, on dit que C1 a expliqué C2. + +
+ + +**47. Directed acyclic graph ― A directed acyclic graph (DAG) is a finite directed graph with no directed cycles.** + +⟶ Graphe orienté acyclique - Un graphe orienté acyclique (en anglais directed acyclic graph ou DAG) est un graphe orienté fini sans cycle orienté. + +
+ + +**48. Bayesian network ― A Bayesian network is a directed acyclic graph (DAG) that specifies a joint distribution over random variables X=(X1,...,Xn) as a product of local conditional distributions, one for each node:** + +⟶ Réseau bayésien - Un réseau bayésien (en anglais Bayesian network) est un DAG qui définit une loi de probabilité jointe sur les variables aléatoires X=(X1,...,Xn) comme étant le produit des lois de probabilités conditionnelles locales (une pour chaque noeud) : + +
+ + +**49. Remark: Bayesian networks are factor graphs imbued with the language of probability.** + +⟶ Remarque : les réseaux bayésiens sont des graphes de facteurs imprégnés de concepts de probabilité. + +
+ + +**50. Locally normalized ― For each xParents(i), all factors are local conditional distributions. Hence they have to satisfy:** + +⟶ Normalisation locale - Pour chaque xParents(i), tous les facteurs sont localement des lois de probabilité conditionnelles. Elles doivent donc vérifier : + +
+ + +**51. As a result, sub-Bayesian networks and conditional distributions are consistent.** + +⟶ De ce fait, les sous-réseaux bayésiens et les distributions conditionnelles sont consistants. + +
+ + +**52. Remark: local conditional distributions are the true conditional distributions.** + +⟶ Remarque : les lois locales de probabilité conditionnelles sont de vraies lois de probabilité conditionnelles. + +
+ + +**53. Marginalization ― The marginalization of a leaf node yields a Bayesian network without that node.** + +⟶ Marginalisation - La marginalisation d'un noeud sans enfant donne un réseau bayésien sans ce noeud. + +
+ + +**54. Probabilistic programs** + +⟶ Programmes probabilistes + +
+ + +**55. Concept ― A probabilistic program randomizes variables assignment. That way, we can write down complex Bayesian networks that generate assignments without us having to explicitly specify associated probabilities.** + +⟶ Concept - Un programme probabiliste rend aléatoire l'affectation de variables. De ce fait, on peut imaginer des réseaux bayésiens compliqués pour la génération d'affectations sans avoir à écrire de manière explicite les probabilités associées. + +
+ + +**56. Remark: examples of probabilistic programs include Hidden Markov model (HMM), factorial HMM, naive Bayes, latent Dirichlet allocation, diseases and symptoms and stochastic block models.** + +⟶ Remarque : quelques exemples de programmes probabilistes incluent le modèle de Markov caché (en anglais hidden Markov model ou HMM), le HMM factoriel, le modèle bayésien naïf (en anglais naive Bayes), l'allocation de Dirichlet latente (en anglais latent Dirichlet allocation ou LDA), le modèle maladies-symptômes et le modèle à blocs stochastiques (en anglais stochastic block model). + +<br>
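As an illustration of such a probabilistic program, the sketch below (not from the cheatsheet; all probabilities are made up) generates assignments from a small hidden Markov model without ever writing the joint probability explicitly.

```python
import random

# Illustrative HMM "probabilistic program": randomized variable assignment.
# States and observations are binary; the probabilities below are invented.
p_init = [0.6, 0.4]                      # p(h1)
p_trans = [[0.7, 0.3], [0.2, 0.8]]       # p(h_t | h_{t-1})
p_emit = [[0.9, 0.1], [0.3, 0.7]]        # p(e_t | h_t)

def sample(dist):
    """Draw an index according to a discrete distribution."""
    r, acc = random.random(), 0.0
    for i, p in enumerate(dist):
        acc += p
        if r < acc:
            return i
    return len(dist) - 1

def generate(L):
    """Generate one assignment (h_1..h_L, e_1..e_L) from the HMM."""
    h = [sample(p_init)]
    for _ in range(L - 1):
        h.append(sample(p_trans[h[-1]]))
    e = [sample(p_emit[ht]) for ht in h]
    return h, e

print(generate(5))  # e.g. ([0, 0, 1, 1, 1], [0, 0, 1, 1, 0])
```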
+ + +**57. Summary ― The table below summarizes the common probabilistic programs as well as their applications:** + +⟶ Récapitulatif - La table ci-dessous résume les programmes probabilistes les plus fréquents ainsi que leur champ d'application associé : + +
+ + +**58. [Program, Algorithm, Illustration, Example]** + +⟶ [Programme, Algorithme, Illustration, Exemple] + +
+ + +**59. [Markov Model, Hidden Markov Model (HMM), Factorial HMM, Naive Bayes, Latent Dirichlet Allocation (LDA)]** + +⟶ [Modèle de Markov, Modèle de Markov caché (HMM), HMM factoriel, Bayésien naïf, Allocation de Dirichlet latente (LDA)] + +
+ + +**60. [Generate, distribution]** + +⟶ [Génère, distribution] + +
+ + +**61. [Language modeling, Object tracking, Multiple object tracking, Document classification, Topic modeling]** + +⟶ [Modélisation du langage, Suivi d'objet, Suivi de plusieurs objets, Classification de document, Modélisation de sujet] + +
+ + +**62. Inference** + +⟶ Inférence + +
+ + +**63. [General probabilistic inference strategy ― The strategy to compute the probability P(Q|E=e) of query Q given evidence E=e is as follows:, Step 1: Remove variables that are not ancestors of the query Q or the evidence E by marginalization, Step 2: Convert Bayesian network to factor graph, Step 3: Condition on the evidence E=e, Step 4: Remove nodes disconnected from the query Q by marginalization, Step 5: Run a probabilistic inference algorithm (manual, variable elimination, Gibbs sampling, particle filtering)]** + +⟶ [Stratégie générale pour l'inférence probabiliste - La stratégie que l'on utilise pour calculer la probabilité P(Q|E=e) d'une requête Q étant donnée l'observation E=e est la suivante :, Étape 1 : on enlève les variables qui ne sont pas les ancêtres de la requête Q ou de l'observation E par marginalisation, Étape 2 : on convertit le réseau bayésien en un graphe de facteurs, Étape 3 : on conditionne sur l'observation E=e, Étape 4 : on enlève les noeuds déconnectés de la requête Q par marginalisation, Étape 5 : on lance un algorithme d'inférence probabiliste (manuel, élimination de variables, échantillonnage de Gibbs, filtrage particulaire)] + +
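On very small networks, the strategy above can be sanity-checked by brute-force enumeration; the following sketch (illustrative only, reusing the toy network C→A, C→B introduced earlier) conditions on the evidence B=1 and marginalizes out A to answer the query P(C|B=1).

```python
# Brute-force P(Q | E=e) on the toy network C -> A, C -> B (binary variables).
p_c = {0: 0.7, 1: 0.3}
p_a_given_c = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.4, 1: 0.6}}
p_b_given_c = {0: {0: 0.8, 1: 0.2}, 1: {0: 0.5, 1: 0.5}}

def joint(c, a, b):
    return p_c[c] * p_a_given_c[c][a] * p_b_given_c[c][b]

# Query Q = C, evidence E: B = 1. Condition on the evidence, marginalize out A,
# then normalize the remaining weights.
weights = {c: sum(joint(c, a, 1) for a in (0, 1)) for c in (0, 1)}
z = sum(weights.values())
posterior = {c: w / z for c, w in weights.items()}
print(posterior)  # roughly {0: 0.48, 1: 0.52}
```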
+ + +**64. Forward-backward algorithm ― This algorithm computes the exact value of P(H=hk|E=e) (smoothing query) for any k∈{1,...,L} in the case of an HMM of size L. To do so, we proceed in 3 steps:** + +⟶ Algorithme progressif-rétrogressif - L'algorithme progressif-rétrogressif (en anglais forward-backward) calcule la valeur exacte de P(H=hk|E=e) (requête de lissage) pour chaque k∈{1,...,L} dans le cas d'un HMM de taille L. Pour ce faire, on procède en 3 étapes : + +<br>
+ + +**65. Step 1: for ..., compute ...** + +⟶ Étape 1 : pour ..., calculer ... + +
+ + +**66. with the convention F0=BL+1=1. From this procedure and these notations, we get that** + +⟶ avec la convention F0=BL+1=1. À partir de cette procédure et avec ces notations, on obtient + +
+ + +**67. Remark: this algorithm interprets each assignment to be a path where each edge hi−1→hi is of weight p(hi|hi−1)p(ei|hi).** + +⟶ Remarque : cet algorithme interprète une affectation comme étant un chemin où chaque arête hi−1→hi a un poids p(hi|hi−1)p(ei|hi). + +
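A possible implementation of the forward-backward recursion is sketched below; it follows the standard formulation (indices may be shifted with respect to the cheatsheet's F0=BL+1=1 convention) and uses invented HMM parameters.

```python
import numpy as np

# Forward-backward sketch for a binary-state HMM (illustrative parameters).
p_init = np.array([0.6, 0.4])            # p(h1)
T = np.array([[0.7, 0.3], [0.2, 0.8]])   # T[i, j] = p(h_t=j | h_{t-1}=i)
E = np.array([[0.9, 0.1], [0.3, 0.7]])   # E[i, o] = p(e_t=o | h_t=i)
obs = [0, 0, 1, 1]                        # observed evidence e_1..e_L

L, n = len(obs), 2
F = np.zeros((L, n))   # forward messages:  F_k(h) ∝ p(h_k, e_1..k)
B = np.zeros((L, n))   # backward messages: B_k(h) = p(e_k+1..L | h_k)

F[0] = p_init * E[:, obs[0]]
for k in range(1, L):
    F[k] = (F[k - 1] @ T) * E[:, obs[k]]

B[L - 1] = 1.0
for k in range(L - 2, -1, -1):
    B[k] = T @ (E[:, obs[k + 1]] * B[k + 1])

# Smoothing query: P(H_k = h | E = e) ∝ F_k(h) * B_k(h), normalized per timestep.
S = F * B
S /= S.sum(axis=1, keepdims=True)
print(S)   # one row of posterior state probabilities per timestep
```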
+ + +**68. [Gibbs sampling ― This algorithm is an iterative approximate method that uses a small set of assignments (particles) to represent a large probability distribution. From a random assignment x, Gibbs sampling performs the following steps for i∈{1,...,n} until convergence:, For all u∈Domaini, compute the weight w(u) of assignment x where Xi=u, Sample v from the probability distribution induced by w: v∼P(Xi=v|X−i=x−i), Set Xi=v]** + +⟶ [Échantillonnage de Gibbs - L'algorithme d'échantillonnage de Gibbs (en anglais Gibbs sampling) est une méthode itérative et approximative qui utilise un petit ensemble d'affectations (particules) pour représenter une grande loi de probabilité. À partir d'une affectation aléatoire x, l'échantillonnage de Gibbs effectue les étapes suivantes pour i∈{1,...,n} jusqu'à convergence :, Pour tout u∈Domaini, on calcule le poids w(u) de l'affectation x où Xi=u, On échantillonne v selon la loi de probabilité engendrée par w : v∼P(Xi=v|X−i=x−i), On pose Xi=v] + +<br>
+ + +**69. Remark: X−i denotes X∖{Xi} and x−i represents the corresponding assignment.** + +⟶ Remarque : X−i désigne X∖{Xi} et x−i représente l'affectation correspondante. + +<br>
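A minimal Gibbs sampling sketch, assuming a made-up unnormalized weight function over three binary variables, could look as follows:

```python
import random

# Gibbs sampling on three binary variables X1, X2, X3 with an invented
# unnormalized weight that favours neighbouring variables agreeing.
def weight(x):
    w = 1.0
    for a, b in [(0, 1), (1, 2)]:
        w *= 2.0 if x[a] == x[b] else 1.0
    return w

random.seed(0)
x = [random.randint(0, 1) for _ in range(3)]   # random initial assignment
counts = {}

for sweep in range(5000):
    for i in range(3):
        # Weight of the assignment x with Xi = u, for each value u in the domain.
        w = [weight(x[:i] + [u] + x[i + 1:]) for u in (0, 1)]
        z = sum(w)
        x[i] = 0 if random.random() < w[0] / z else 1   # v ~ P(Xi | X_-i = x_-i)
    counts[tuple(x)] = counts.get(tuple(x), 0) + 1

# Empirical frequencies approximate the distribution induced by weight().
print(sorted(counts.items(), key=lambda kv: -kv[1])[:3])
```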
+ + +**70. [Particle filtering ― This algorithm approximates the posterior density of state variables given the evidence of observation variables by keeping track of K particles at a time. Starting from a set of particles C of size K, we run the following 3 steps iteratively:, Step 1: proposal - For each old particle xt−1∈C, sample x from the transition probability distribution p(x|xt−1) and add x to a set C′., Step 2: weighting - Weigh each x of the set C′ by w(x)=p(et|x), where et is the evidence observed at time t., Step 3: resampling - Sample K elements from the set C′ using the probability distribution induced by w and store them in C: these are the current particles xt.]** + +⟶ [Filtrage particulaire - L'algorithme de filtrage particulaire (en anglais particle filtering) approxime la densité postérieure de variables d'états à partir des variables observées en suivant K particules à la fois. En commençant avec un ensemble de particules C de taille K, on répète les 3 étapes suivantes :, Étape 1 : proposition - Pour chaque ancienne particule xt−1∈C, on échantillonne x selon la loi de probabilité p(x|xt−1) et on ajoute x à un ensemble C′., Étape 2 : pondération - On associe chaque x de l'ensemble C′ au poids w(x)=p(et|x), où et est l'observation vue à l'instant t., Étape 3 : ré-échantillonnage - On échantillonne K éléments de l'ensemble C′ en utilisant la loi de probabilité engendrée par w et on les met dans C : ce sont les particules actuelles xt.] + +<br>
+ + +**71. Remark: a more expensive version of this algorithm also keeps track of past particles in the proposal step.** + +⟶ Remarque : une version plus coûteuse de cet algorithme tient aussi compte des particules passées à l'étape de proposition. + +<br>
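Below is an illustrative particle filtering sketch for a binary-state HMM (parameters invented for the example); it follows the proposal / weighting / resampling steps described above.

```python
import random

# Particle filtering sketch for a binary-state HMM (illustrative parameters).
p_init = [0.6, 0.4]
p_trans = [[0.7, 0.3], [0.2, 0.8]]
p_emit = [[0.9, 0.1], [0.3, 0.7]]
obs = [0, 0, 1, 1]
K = 1000
random.seed(0)

def sample(dist):
    """Draw 0 or 1 according to a binary distribution."""
    return 0 if random.random() < dist[0] else 1

# Initialization: K particles drawn from p(h1).
particles = [sample(p_init) for _ in range(K)]
for t, e_t in enumerate(obs):
    if t > 0:
        # Step 1 (proposal): extend each old particle using the transition model.
        particles = [sample(p_trans[x]) for x in particles]
    # Step 2 (weighting): weigh each proposed particle by p(e_t | x).
    w = [p_emit[x][e_t] for x in particles]
    # Step 3 (resampling): draw K particles proportionally to their weights.
    particles = random.choices(particles, weights=w, k=K)
    # Fraction of particles in state 1 approximates P(H_t = 1 | e_1..t).
    print(t, sum(particles) / K)
```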
+ + +**72. Maximum likelihood ― If we don't know the local conditional distributions, we can learn them using maximum likelihood.** + +⟶ Maximum de vraisemblance - Si l'on ne connaît pas les lois de probabilité locales, on peut les trouver en utilisant le maximum de vraisemblance. + +
+ + +**73. Laplace smoothing ― For each distribution d and partial assignment (xParents(i),xi), add λ to countd(xParents(i),xi), then normalize to get probability estimates.** + +⟶ Lissage de Laplace - Pour chaque loi de probabilité d et affectation partielle (xParents(i),xi), on ajoute λ à countd(xParents(i),xi) et on normalise ensuite pour obtenir des probabilités. + +
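For instance, a Laplace-smoothed estimate of a single local conditional distribution could be computed as in the sketch below (the data and the value of λ are made up):

```python
from collections import Counter

# Laplace-smoothed maximum likelihood estimate of a local conditional
# distribution p(x_i | x_Parent) from toy data (parent and child binary).
data = [(0, 0), (0, 0), (0, 1), (1, 1), (1, 1)]   # (x_parent, x_i) observations
lam = 1.0                                          # smoothing parameter λ
domain = (0, 1)

counts = Counter(data)
cpt = {}
for parent in domain:
    # Add λ to every count, then normalize within each parent configuration.
    smoothed = {x: counts[(parent, x)] + lam for x in domain}
    total = sum(smoothed.values())
    cpt[parent] = {x: c / total for x, c in smoothed.items()}

print(cpt)
# {0: {0: 0.6, 1: 0.4}, 1: {0: 0.25, 1: 0.75}}
# (raw maximum likelihood without smoothing would give 2/3, 1/3 and 0, 1)
```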
+ + +**74. Algorithm ― The Expectation-Maximization (EM) algorithm gives an efficient method at estimating the parameter θ through maximum likelihood estimation by repeatedly constructing a lower-bound on the likelihood (E-step) and optimizing that lower bound (M-step) as follows:** + +⟶ Espérance-maximisation - L'algorithme d'espérance-maximisation (en anglais expectation-maximization ou EM) est une méthode efficace utilisée pour estimer le paramètre θ via l'estimation du maximum de vraisemblance en construisant de manière répétée une borne inférieure de la vraisemblance (étape E) et en optimisant cette borne inférieure (étape M) : + +
+ + +**75. [E-step: Evaluate the posterior probability q(h) that each data point e came from a particular cluster h as follows:, M-step: Use the posterior probabilities q(h) as cluster specific weights on data points e to determine θ through maximum likelihood.]** + +⟶ [Étape E : on évalue la probabilité postérieure q(h) que chaque point e vienne d'une partition particulière h avec :, Étape M : on utilise la probabilité postérieure q(h) en tant que poids de la partition h sur les points e pour déterminer θ via maximum de vraisemblance] + +
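The sketch below illustrates these E and M steps on a toy mixture of two 1-D Gaussians with unit variance; it is only an example, not the cheatsheet's exact formulation.

```python
import math, random

# EM sketch: mixture of two 1-D Gaussians with unit variance (toy data).
random.seed(0)
data = [random.gauss(-2, 1) for _ in range(100)] + [random.gauss(3, 1) for _ in range(100)]

def normal_pdf(x, mu):
    return math.exp(-0.5 * (x - mu) ** 2) / math.sqrt(2 * math.pi)

pi, mu = [0.5, 0.5], [-1.0, 1.0]          # initial parameters θ
for _ in range(50):
    # E-step: posterior probability q(h) that each point came from cluster h.
    q = []
    for x in data:
        w = [pi[h] * normal_pdf(x, mu[h]) for h in (0, 1)]
        z = sum(w)
        q.append([wh / z for wh in w])
    # M-step: maximum likelihood update of θ using q(h) as per-point weights.
    for h in (0, 1):
        nh = sum(qx[h] for qx in q)
        pi[h] = nh / len(data)
        mu[h] = sum(qx[h] * x for qx, x in zip(q, data)) / nh

print(pi, mu)   # mixing weights near [0.5, 0.5], means near [-2, 3]
```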
+ + +**76. [Factor graphs, Arity, Assignment weight, Constraint satisfaction problem, Consistent assignment]** + +⟶ [Graphe de facteurs, Arité, Poids, Satisfaction de contraintes, Affectation consistante] + +
+ + +**77. [Dynamic ordering, Dependent factors, Backtracking search, Forward checking, Most constrained variable, Least constrained value]** + +⟶ [Mise en ordre dynamique, Facteurs dépendants, Retour sur trace, Vérification en avant, Variable la plus contrainte, Valeur la moins contraignante] + +
+ + +**78. [Approximate methods, Beam search, Iterated conditional modes, Gibbs sampling]** + +⟶ [Méthodes approximatives, Recherche en faisceau, Modes conditionnels itérés, Échantillonnage de Gibbs] + +
+ + +**79. [Factor graph transformations, Conditioning, Elimination]** + +⟶ [Transformations de graphes de facteurs, Conditionnement, Élimination] + +
+ + +**80. [Bayesian networks, Definition, Locally normalized, Marginalization]** + +⟶ [Réseaux bayésiens, Définition, Normalisé localement, Marginalisation] + +
+ + +**81. [Probabilistic program, Concept, Summary]** + +⟶ [Programme probabiliste, Concept, Récapitulatif] + +
+ + +**82. [Inference, Forward-backward algorithm, Gibbs sampling, Laplace smoothing]** + +⟶ [Inférence, Algorithme progressif-rétrogressif, Échantillonnage de Gibbs, Lissage de Laplace] + +
+ + +**83. View PDF version on GitHub** + +⟶ Voir la version PDF sur GitHub. + +
+ + +**84. Original authors** + +⟶ Auteurs d'origine. + +
+ + +**85. Translated by X, Y and Z** + +⟶ Traduit de l'anglais par X, Y et Z. + +
+ + +**86. Reviewed by X, Y and Z** + +⟶ Revu par X, Y et Z. + +
+ + +**87. By X and Y** + +⟶ De X et Y. + +
+ + +**88. The Artificial Intelligence cheatsheets are now available in [target language].** + +⟶ Les pense-bêtes d'intelligence artificielle sont maintenant disponibles en français ! From 52dfcd08a8bb58d39a728439aba4efbc7a980f7e Mon Sep 17 00:00:00 2001 From: shervinea Date: Sun, 18 Aug 2019 19:15:06 -0700 Subject: [PATCH 308/531] Minor [fr] fixes --- fr/cs-221-logic-models.md | 14 +++++++------- fr/cs-221-reflex-models.md | 4 ++-- fr/cs-221-states-models.md | 4 ++-- 3 files changed, 11 insertions(+), 11 deletions(-) diff --git a/fr/cs-221-logic-models.md b/fr/cs-221-logic-models.md index 2ecdbc81e..aa03a9b9a 100644 --- a/fr/cs-221-logic-models.md +++ b/fr/cs-221-logic-models.md @@ -4,7 +4,7 @@ **1. Logic-based models with propositional and first-order logic** -⟶ Modèles logiques propositionnels et calcul des prédicats du premier ordre +⟶ Modèles basés sur la logique : logique propositionnelle et calcul des prédicats du premier ordre
@@ -95,7 +95,7 @@ **14. Probabilistic interpretation ― The probability that query f is evaluated to 1 can be seen as the proportion of models w of the knowledge base KB that satisfy f, i.e.:** -⟶ Interprétation en termes de probabilités - La probabilté que la requête f soit évaluée à 1 peut être vue comme la proportion des modèles w de la base de connaissance KB qui satisfait f, i.e. : +⟶ Interprétation en termes de probabilités - La probabilité que la requête f soit évaluée à 1 peut être vue comme la proportion des modèles w de la base de connaissance KB qui satisfait f, i.e. :
@@ -172,7 +172,7 @@ **25. Forward inference algorithm ― From a set of inference rules Rules, this algorithm goes through all possible f1,...,fk and adds g to the knowledge base KB if a matching rule exists. This process is repeated until no more additions can be made to KB.** -⟶ Algorithme de chaînage avant (forward inference algorithm) - Partant d'un ensemble de règles d'inférence Rules, cet algorithme parcourt tous les f1,...,fk et ajoute g à la base de connaissance KB si une règle parvient à une telle conclusion. Cette démarche est répétée jusqu'à ce qu'aucun autre ajout ne puisse être fait à KB. +⟶ Algorithme de chaînage avant - Partant d'un ensemble de règles d'inférence Rules, l'algorithme de chaînage avant (en anglais forward inference algorithm) parcourt tous les f1,...,fk et ajoute g à la base de connaissance KB si une règle parvient à une telle conclusion. Cette démarche est répétée jusqu'à ce qu'aucun autre ajout ne puisse être fait à KB.
@@ -200,14 +200,14 @@ **29. [Soundness, Completeness]** -⟶ [Correction, Complétude] +⟶ [Validité, Complétude]
**30. [Inferred formulas are entailed by KB, Can be checked one rule at a time, "Nothing but the truth", Formulas entailing KB are either already in the knowledge base or inferred from it, "The whole truth"]** -⟶ [Les formules inférées sont déduites par KB, Peut être vérifiée un règle à la fois, "Rien que la vérité", Les formules déduites par KB sont soit déjà dans la base de connaissance, soit inférées de celle-ci, "La vérité dans sa totalité"] +⟶ [Les formules inférées sont déduites par KB, Peut être vérifiée une règle à la fois, "Rien que la vérité", Les formules déduites par KB sont soit déjà dans la base de connaissance, soit inférées de celle-ci, "La vérité dans sa totalité"]
@@ -221,7 +221,7 @@ **32. In this section, we will go through logic-based models that use logical formulas and inference rules. The idea here is to balance expressivity and computational efficiency.** -⟶ Dans cette section, nous allons parcourir les modèles logiques utilisant des formules logiques et des règles d'inférence. L'idée est de trouver le juste milieu entre expressivité et efficacité en termes de calculs. +⟶ Dans cette section, nous allons parcourir les modèles logiques utilisant des formules logiques et des règles d'inférence. L'idée est de trouver le juste milieu entre expressivité et efficacité.
@@ -263,7 +263,7 @@ **38. Conjunctive normal form ― A conjunctive normal form (CNF) formula is a conjunction of clauses, where each clause is a disjunction of atomic formulas.** -⟶ Forme normale conjonctive - La forme normale conjonctive (en anglais conjunctive normal form ou CNF) d'une formule est une conjonction de clauses, chacune d'entre elles étant une dijonction de formules atomiques. +⟶ Forme normale conjonctive - La forme normale conjonctive (en anglais conjunctive normal form ou CNF) d'une formule est une conjonction de clauses, chacune d'entre elles étant une disjonction de formules atomiques.
diff --git a/fr/cs-221-reflex-models.md b/fr/cs-221-reflex-models.md index 0ec1fd159..7a7a489e1 100644 --- a/fr/cs-221-reflex-models.md +++ b/fr/cs-221-reflex-models.md @@ -4,7 +4,7 @@ **1. Reflex-based models with Machine Learning** -⟶ Modèles basés sur le réflex à l'aide de l'apprentissage automatique +⟶ Modèles basés sur le réflex : apprentissage automatique
@@ -431,7 +431,7 @@ **62. [Step 2: Compute Σ=1mm∑i=1ϕ(xi)ϕ(xi)T∈Rn×n, which is symmetric with real eigenvalues., Step 3: Compute u1,...,uk∈Rn the k orthogonal principal eigenvectors of Σ, i.e. the orthogonal eigenvectors of the k largest eigenvalues., Step 4: Project the data on spanR(u1,...,uk).]** -⟶ [Étape 2: Calculer Σ=1mm∑i=1ϕ(xi)ϕ(xi)T∈Rn×n, qui est symmétrique avec des valeurs propres réelles., Étape 3: Calculer u1,...,uk∈Rn les k valeurs propres principales orthogonales de Σ, i.e. les vecteurs propres orthogonaux des k valeurs propres les plus grandes., Étape 4: Projeter les données sur spanR(u1,...,uk).] +⟶ [Étape 2: Calculer Σ=1mm∑i=1ϕ(xi)ϕ(xi)T∈Rn×n, qui est symétrique avec des valeurs propres réelles., Étape 3: Calculer u1,...,uk∈Rn les k valeurs propres principales orthogonales de Σ, i.e. les vecteurs propres orthogonaux des k valeurs propres les plus grandes., Étape 4: Projeter les données sur spanR(u1,...,uk).]
diff --git a/fr/cs-221-states-models.md b/fr/cs-221-states-models.md index 4019c6f92..20be6ebb7 100644 --- a/fr/cs-221-states-models.md +++ b/fr/cs-221-states-models.md @@ -4,7 +4,7 @@ **1. States-based models with search optimization and MDP** -⟶ Modèles basés sur les états, utilisés pour optimiser le parcours et les MDPs +⟶ Modèles basés sur les états : optimisation de parcours et MDPs
@@ -977,4 +977,4 @@ **140. The Artificial Intelligence cheatsheets are now available in [target language].** -⟶ Les pense-bête d'intelligence artificielle sont maintenant disponibles en français ! +⟶ Les pense-bêtes d'intelligence artificielle sont maintenant disponibles en français ! From db204a51d8041231368f1a39f53775a118fea707 Mon Sep 17 00:00:00 2001 From: tuananhhedspibk Date: Sat, 24 Aug 2019 12:57:47 +0900 Subject: [PATCH 309/531] [ja] Convolutional Neural Networks --- ja/convolutional-neural-networks.md | 50 ++++++++++++++--------------- 1 file changed, 25 insertions(+), 25 deletions(-) diff --git a/ja/convolutional-neural-networks.md b/ja/convolutional-neural-networks.md index 095b94dd3..f12b21465 100644 --- a/ja/convolutional-neural-networks.md +++ b/ja/convolutional-neural-networks.md @@ -4,7 +4,7 @@ **1. Convolutional Neural Networks cheatsheet** -⟶ 畳み込み神経の網チートシート +⟶ 畳み込みニューラルネットワーク チートシート
@@ -25,7 +25,7 @@ **4. [Types of layer, Convolution, Pooling, Fully connected]** -⟶ [層のタイプ, 畳み込み, プーリング, 完全接続] +⟶ [層のタイプ, 畳み込み, プーリング, 全結合]
@@ -34,47 +34,47 @@ ⟶ -
[フィルタハイパーパラメータ, 寸法, ストライド, 詰め物] +
[フィルタハイパーパラメータ, 大きさ, ストライド, パディング] **6. [Tuning hyperparameters, Parameter compatibility, Model complexity, Receptive field]** -⟶ [調律ハイパーパラメータ, パラメータの互換性, モデルの複雑, 受容的なフィールド] +⟶ [ハイパーパラメータの調整, パラメータの互換性, モデルの複雑さ, 受容野]
**7. [Activation functions, Rectified Linear Unit, Softmax]** -⟶ [活性化関数, 修正済み線形単位, ソフトマックス] +⟶ [活性化関数, 正規化線形ユニット, ソフトマックス]
**8. [Object detection, Types of models, Detection, Intersection over Union, Non-max suppression, YOLO, R-CNN]** -⟶ [オブジェクト検出, モデルのタイプ, 検出, 組合の上の交差点, 非最大抑制, YOLO, R-CNN] +⟶ [オブジェクト検出, モデルのタイプ, 検出, 積集合の和集合, 非最大抑制, YOLO, R-CNN]
**9. [Face verification/recognition, One shot learning, Siamese network, Triplet loss]** -⟶ [顔認証/認識, 一発学習, シャムネットワーク, トリプレット損失] +⟶ [顔認証/認識, 一発学習, シャムネットワーク, 三重項損失]
**10. [Neural style transfer, Activation, Style matrix, Style/content cost function]** -⟶ [神経スタイル転送, 活性化, スタイル行列, スタイル/コンテンツコスト関数] +⟶ [ニューラルスタイル変換, 活性化, スタイル行列, スタイル/コンテンツコスト関数]
**11. [Computational trick architectures, Generative Adversarial Net, ResNet, Inception Network]** -⟶ [計算詭計アーキテクチャ, 生成型敵対的ネットワーク, ResNet, インセプションネットワーク] +⟶ [計算トリックアーキテクチャ, 敵対的生成ネットワーク, ResNet, インセプションネットワーク]
@@ -88,14 +88,14 @@ **13. Architecture of a traditional CNN ― Convolutional neural networks, also known as CNNs, are a specific type of neural networks that are generally composed of the following layers:** -⟶ 伝統的な畳み込み神経の網のアーキテクチャ - CNNとも知られる畳み込み神経の網は一般的に次の層で構成されている特定タイプの神経の網です。 +⟶ 伝統的な畳み込みニューラルネットワークのアーキテクチャ - CNNとしても知られる畳み込みニューラルネットワークは一般的に次の層で構成される特定タイプのニューラルネットワークです。
**14. The convolution layer and the pooling layer can be fine-tuned with respect to hyperparameters that are described in the next sections.** -⟶ 畳み込み層とプール層は次のセクションで説明されるハイパーパラメータに関して微調整されられる。 +⟶ 畳み込み層とプーリング層は次のセクションで説明されるハイパーパラメータに関して微調整できます。
@@ -109,21 +109,21 @@ **16. Convolution layer (CONV) ― The convolution layer (CONV) uses filters that perform convolution operations as it is scanning the input I with respect to its dimensions. Its hyperparameters include the filter size F and stride S. The resulting output O is called feature map or activation map.** -⟶ 畳み込み層 (CONV) - 畳み込み層 (CONV)は入力Iを寸法に関して走査している時畳み込みオペレーションズを行うフィルタを使用する。畳み込み層のハイパーパラメータにはフィルタサイズFとストライドSが含まれる。結果出力0は特徴図及び活性化図で呼ばれる。 +⟶ 畳み込み層 (CONV) - 畳み込み層 (CONV)は入力Iを各次元に関して走査する時に、畳み込み演算を行うフィルタを使用します。畳み込み層のハイパーパラメータにはフィルタサイズFとストライドSが含まれます。結果出力Oは特徴マップまたは活性化マップと呼ばれます。
**17. Remark: the convolution step can be generalized to the 1D and 3D cases as well.** -⟶ 注意: 畳み込みステップは1D及び3Dの場合にも一般化されられる。 +⟶ 注: 畳み込みステップは1次元や3次元の場合にも一般化できます。
**18. Pooling (POOL) ― The pooling layer (POOL) is a downsampling operation, typically applied after a convolution layer, which does some spatial invariance. In particular, max and average pooling are special kinds of pooling where the maximum and average value is taken, respectively.** -⟶ プーリング (POOL) - プール層 (POOL)はダウンサンプリング操作で、通常は空間的に不変な畳み込み層の後に適用される。特に、最大及び平均プーリングはそれぞれ最大と平均値が取られる特別な種類のプールです。 +⟶ プーリング (POOL) - プーリング層 (POOL)はダウンサンプリング操作で、通常は位置不変性をもつ畳み込み層の後に適用されます。特に、最大及び平均プーリングはそれぞれ最大と平均値が取られる特別な種類のプーリングです。
@@ -137,21 +137,21 @@ **20. [Max pooling, Average pooling, Each pooling operation selects the maximum value of the current view, Each pooling operation averages the values of the current view]** -⟶ [最大プール, 平均プール, 各プール操作は現在ビューの最大値を選ぶ, 各プール操作は現在ビューの値を平均する] +⟶ [最大プーリング, 平均プーリング, 各プーリング操作は現在のビューの中から最大値を選ぶ, 各プーリング操作は現在のビューに含まれる値を平均する]
**21. [Preserves detected features, Most commonly used, Downsamples feature map, Used in LeNet]** -⟶ [検出された特徴保持, 最も一般的に利用される, ダウンサンプル特徴図, LeNetで利用される] +⟶ [検出された特徴を保持する, 最も一般的に利用される, 特徴マップをダウンサンプリングする, LeNetで利用される]
**22. Fully Connected (FC) ― The fully connected layer (FC) operates on a flattened input where each input is connected to all neurons. If present, FC layers are usually found towards the end of CNN architectures and can be used to optimize objectives such as class scores.** -⟶ 完全接続 (FC) - 完全接続層は各入力は全ての神経に接続されているフラット化入力で動く。存在する場合、FC層は通常CNNアーキテクチャの終わりに向かって見られ、クラススコアなどの目的を最適化するため利用される。 +⟶ 全結合 (FC) - 全結合 (FC) 層は平坦化された入力に対して演算を行います。各入力は全てのニューロンに接続されています。FC層が存在する場合、通常CNNアーキテクチャの末尾に向かって見られ、クラススコアなどの目的を最適化するため利用できます。
@@ -165,14 +165,14 @@ **24. The convolution layer contains filters for which it is important to know the meaning behind its hyperparameters.** -⟶ 畳み込み層にはハイパーパラメータの背後にある意味を知ることが重要なフィルタが含まれる。 +⟶ 畳み込み層にはハイパーパラメータの背後にある意味を知ることが重要なフィルタが含まれています。
**25. Dimensions of a filter ― A filter of size F×F applied to an input containing C channels is a F×F×C volume that performs convolutions on an input of size I×I×C and produces an output feature map (also called activation map) of size O×O×1.** -⟶ フィルタの寸法 - C個別のチャネルを含む入力に適用されるFxFサイズのフィルタは0x0x1サイズの出力特徴図(活性化マップとも呼ばれている)を作り出し、IxIxCサイズの入力に対して畳み込みを実施するFxFxCボリュームです。 +⟶ フィルタの大きさ - C個のチャネルを含む入力に適用されるF×Fサイズのフィルタの体積はF×F×Cで、それはI×I×Cサイズの入力に対して畳み込みを実行してO×O×1サイズの特徴マップ(活性化マップとも呼ばれる)出力を生成します。
@@ -187,35 +187,35 @@ **27. Remark: the application of K filters of size F×F results in an output feature map of size O×O×K.** -⟶ 注意: FxFサイズのK個別のフィルタを適用すると、0x0xKサイズの出力特徴図を得られる。 +⟶ 注: F×FサイズのK個のフィルタを適用すると、O×O×Kサイズの特徴マップの出力を得られます。
**28. Stride ― For a convolutional or a pooling operation, the stride S denotes the number of pixels by which the window moves after each operation.** -⟶ ストライド - 畳み込みまたはプール操作に対して、ストライドSはそれぞれの操作の後にウィンドウに移動されるピクセル数を表示する。 +⟶ ストライド - 畳み込みまたはプーリング操作において、ストライドSは各操作の後にウィンドウを移動させるピクセル数を表します。
**29. Zero-padding ― Zero-padding denotes the process of adding P zeroes to each side of the boundaries of the input. This value can either be manually specified or automatically set through one of the three modes detailed below:** -⟶ ゼロパディング - ゼロパディングは入力の境界線の各側にP個別のゼロ追加プロセスを表す。この値は手動で指定されることも、以下に詳述する3つのモードのいずれを通じて自動的に設定されることもできる。 +⟶ ゼロパディング - ゼロパディングとは入力の各境界に対してP個のゼロを追加するプロセスを意味します。この値は手動で指定することも、以下に詳述する3つのモードのいずれかを使用して自動的に設定することもできます。
**30. [Mode, Value, Illustration, Purpose, Valid, Same, Full]** -⟶ [モード, 値, 図, 目的, 有効, 同様, フル] +⟶ [モード, 値, 図, 目的, Valid, Same, Full]
**31. [No padding, Drops last convolution if dimensions do not match, Padding such that feature map size has size ⌈IS⌉, Output size is mathematically convenient, Also called 'half' padding, Maximum padding such that end convolutions are applied on the limits of the input, Filter 'sees' the input end-to-end]** -⟶ [パディングなし, もし寸法が一致しなかったら最後の畳み込みを落とす, 特徴図のサイズが[IS]サイズになるようなパディング, 出力サイズは数学的に便利です, ハーフパディングとも呼ばれる, 入力の限界に端部畳み込みが適用されるような最大パディング, フィルタはエンドツーエンド入力を観察する] +⟶ [パディングなし, もし大きさが合わなかったら最後の畳み込みをやめる, 特徴マップのサイズが[IS]になるようなパディング, 出力サイズは数学的に扱いやすい, 「ハーフ」パディングとも呼ばれる, 入力の一番端まで畳み込みが適用されるような最大パディング, フィルタは入力を端から端まで「見る」]
@@ -588,7 +588,7 @@ ⟶ -
活性化 - 与えられた層Lで、活性化はa[l]と表示されて、nH×nw×ncの寸法。 +
活性化 - 与えられた層lで、活性化はa[l]と表示されて、nH×nw×ncの寸法。 **85. Content cost function ― The content cost function Jcontent(C,G) is used to determine how the generated image G differs from the original content image C. It is defined as follows:** From b27a459f87687076a03d3ee2bf086fbaea9abaf9 Mon Sep 17 00:00:00 2001 From: H Hamano Date: Wed, 28 Aug 2019 09:07:06 +0900 Subject: [PATCH 310/531] Update ja/recurrent-neural-networks.md MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit agreed updates. Co-Authored-By: Yoshiyuki Nakai 中井喜之 --- ja/recurrent-neural-networks.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/ja/recurrent-neural-networks.md b/ja/recurrent-neural-networks.md index 917fb2378..b3a35dd86 100644 --- a/ja/recurrent-neural-networks.md +++ b/ja/recurrent-neural-networks.md @@ -74,7 +74,7 @@ **11. Architecture of a traditional RNN ― Recurrent neural networks, also known as RNNs, are a class of neural networks that allow previous outputs to be used as inputs while having hidden states. They are typically as follows:** -⟶一般的なRNNのアーキテクチャ - RNNとして知られるリカレントニューラルネットワークは、直前隠れ層の状態を利用しながら、過去の(一時点前の)情報を入力情報と取り扱うことを可能にするニューラルネットワークです。一般的なモデルは下記のようになります。 +⟶一般的なRNNのアーキテクチャ - RNNとして知られるリカレントニューラルネットワークは、隠れ層の状態を利用して、前の出力を次の入力として取り扱うことを可能にするニューラルネットワークの一種です。一般的なモデルは下記のようになります。
From f269d6b46d8c4d71358f20785fa4d57d5b1fb651 Mon Sep 17 00:00:00 2001 From: H Hamano Date: Wed, 28 Aug 2019 09:07:20 +0900 Subject: [PATCH 311/531] Update ja/recurrent-neural-networks.md MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Co-Authored-By: Yoshiyuki Nakai 中井喜之 --- ja/recurrent-neural-networks.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/ja/recurrent-neural-networks.md b/ja/recurrent-neural-networks.md index b3a35dd86..08fe8a105 100644 --- a/ja/recurrent-neural-networks.md +++ b/ja/recurrent-neural-networks.md @@ -81,7 +81,7 @@ **12. For each timestep t, the activation a and the output y are expressed as follows:** -⟶それぞれの時点t において活性化関数の状態 a と出力 y は下記のように表現されます。  +⟶それぞれの時点 t において活性化関数の状態 a と出力 y は下記のように表現されます。 
From fe40c0dd6784bb2654e708d4eceb58c56f11144a Mon Sep 17 00:00:00 2001 From: H Hamano Date: Wed, 28 Aug 2019 09:07:34 +0900 Subject: [PATCH 312/531] Update ja/recurrent-neural-networks.md MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Co-Authored-By: Yoshiyuki Nakai 中井喜之 --- ja/recurrent-neural-networks.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/ja/recurrent-neural-networks.md b/ja/recurrent-neural-networks.md index 08fe8a105..8bbcb2e75 100644 --- a/ja/recurrent-neural-networks.md +++ b/ja/recurrent-neural-networks.md @@ -95,7 +95,7 @@ **14. where Wax,Waa,Wya,ba,by are coefficients that are shared temporally and g1,g2 activation functions.** -⟶Wax,Waa,Wya,baは一時的に共有係数であり、g1,g2は活性化関数です。 +⟶Wax,Waa,Wya,baは全ての時点で共有される係数であり、g1,g2は活性化関数です。
From eb8d0ea041243c9af070532204161a3819414a17 Mon Sep 17 00:00:00 2001 From: H Hamano Date: Wed, 28 Aug 2019 09:07:49 +0900 Subject: [PATCH 313/531] Update ja/recurrent-neural-networks.md MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Co-Authored-By: Yoshiyuki Nakai 中井喜之 --- ja/recurrent-neural-networks.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/ja/recurrent-neural-networks.md b/ja/recurrent-neural-networks.md index 8bbcb2e75..304a38202 100644 --- a/ja/recurrent-neural-networks.md +++ b/ja/recurrent-neural-networks.md @@ -109,7 +109,7 @@ **16. [Advantages, Possibility of processing input of any length, Model size not increasing with size of input, Computation takes into account historical information, Weights are shared across time]** -⟶長所、任意の長さの入力を処理できる、入力サイズに比べてモデルサイズは大きくならない、時間軸を考慮した計算パワー、時間軸での重みは共有される +⟶長所、任意の長さの入力を処理できる、入力サイズに応じてモデルサイズが大きくならない、計算は時系列情報を考慮している、重みは全ての時点で共有される
From 47492a55b2ef8bceea89c86987d77e9fa608277f Mon Sep 17 00:00:00 2001 From: H Hamano Date: Wed, 28 Aug 2019 09:08:01 +0900 Subject: [PATCH 314/531] Update ja/recurrent-neural-networks.md MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Co-Authored-By: Yoshiyuki Nakai 中井喜之 --- ja/recurrent-neural-networks.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/ja/recurrent-neural-networks.md b/ja/recurrent-neural-networks.md index 304a38202..9fc936b90 100644 --- a/ja/recurrent-neural-networks.md +++ b/ja/recurrent-neural-networks.md @@ -116,7 +116,7 @@ **17. [Drawbacks, Computation being slow, Difficulty of accessing information from a long time ago, Cannot consider any future input for the current state]** -⟶短所、遅い計算時間、長期の時間軸にわたるデータ探索が困難、現在の状態から将来の入力を予測不可能 +⟶短所、遅い計算、長い時間軸での情報の利用が困難、現在の状態から将来の入力を予測不可能
From 7c5ed172a4d246081ddc989b379c2a7c578e0041 Mon Sep 17 00:00:00 2001 From: H Hamano Date: Wed, 28 Aug 2019 09:08:11 +0900 Subject: [PATCH 315/531] Update ja/recurrent-neural-networks.md MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Co-Authored-By: Yoshiyuki Nakai 中井喜之 --- ja/recurrent-neural-networks.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/ja/recurrent-neural-networks.md b/ja/recurrent-neural-networks.md index 9fc936b90..9e73e741c 100644 --- a/ja/recurrent-neural-networks.md +++ b/ja/recurrent-neural-networks.md @@ -123,7 +123,7 @@ **18. Applications of RNNs ― RNN models are mostly used in the fields of natural language processing and speech recognition. The different applications are summed up in the table below:** -⟶RNNの応用 - RNNモデルは主に自然言語処理と音声認識の分野で使用されます。以下の表に、さまざまなアプリケーションの概要が下記のテーブルに示されます。 +⟶RNNの応用 - RNNモデルは主に自然言語処理と音声認識の分野で使用されます。以下の表に、さまざまな応用例がまとめられています。
From 24cc64c6e7fca349b48c941b7d955ea28aa415c4 Mon Sep 17 00:00:00 2001 From: H Hamano Date: Wed, 28 Aug 2019 09:08:22 +0900 Subject: [PATCH 316/531] Update ja/recurrent-neural-networks.md MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Co-Authored-By: Yoshiyuki Nakai 中井喜之 --- ja/recurrent-neural-networks.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/ja/recurrent-neural-networks.md b/ja/recurrent-neural-networks.md index 9e73e741c..2e10daa45 100644 --- a/ja/recurrent-neural-networks.md +++ b/ja/recurrent-neural-networks.md @@ -130,7 +130,7 @@ **19. [Type of RNN, Illustration, Example]** -⟶RNNの種類、イラスト、例 +⟶RNNの種類、図、例
From a61cf251757645663149ced5468f4b923be81a32 Mon Sep 17 00:00:00 2001 From: H Hamano Date: Wed, 28 Aug 2019 09:08:35 +0900 Subject: [PATCH 317/531] Update ja/recurrent-neural-networks.md MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Co-Authored-By: Yoshiyuki Nakai 中井喜之 --- ja/recurrent-neural-networks.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/ja/recurrent-neural-networks.md b/ja/recurrent-neural-networks.md index 2e10daa45..00cc93cd3 100644 --- a/ja/recurrent-neural-networks.md +++ b/ja/recurrent-neural-networks.md @@ -144,7 +144,7 @@ **21. [Traditional neural network, Music generation, Sentiment classification, Name entity recognition, Machine translation]** -⟶伝統的なニューラルネットワーク、音楽生成、感情分類、固有名詞認識、機械翻訳 +⟶伝統的なニューラルネットワーク、音楽生成、感情分類、固有表現認識、機械翻訳
From 2059452293f33ad6a6c43d1ebd49b27cf191058e Mon Sep 17 00:00:00 2001 From: H Hamano Date: Wed, 28 Aug 2019 09:08:46 +0900 Subject: [PATCH 318/531] Update ja/recurrent-neural-networks.md MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Co-Authored-By: Yoshiyuki Nakai 中井喜之 --- ja/recurrent-neural-networks.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/ja/recurrent-neural-networks.md b/ja/recurrent-neural-networks.md index 00cc93cd3..dceadfe00 100644 --- a/ja/recurrent-neural-networks.md +++ b/ja/recurrent-neural-networks.md @@ -151,7 +151,7 @@ **22. Loss function ― In the case of a recurrent neural network, the loss function L of all time steps is defined based on the loss at every time step as follows:** -⟶損失関数 - リカレントニューラルネットワークの場合、すべての時間軸での損失関数Lは、それぞれの時点での損失に基づき、次のように定義されます +⟶損失関数 - リカレントニューラルネットワークの場合、時間軸全体での損失関数Lは、各時点での損失に基づき、次のように定義されます。
From 9788eb7a23204d7bb27946518540561e0d998df0 Mon Sep 17 00:00:00 2001 From: H Hamano Date: Wed, 28 Aug 2019 09:08:57 +0900 Subject: [PATCH 319/531] Update ja/recurrent-neural-networks.md MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Co-Authored-By: Yoshiyuki Nakai 中井喜之 --- ja/recurrent-neural-networks.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/ja/recurrent-neural-networks.md b/ja/recurrent-neural-networks.md index dceadfe00..41c9fcef2 100644 --- a/ja/recurrent-neural-networks.md +++ b/ja/recurrent-neural-networks.md @@ -158,7 +158,7 @@ **23. Backpropagation through time ― Backpropagation is done at each point in time. At timestep T, the derivative of the loss L with respect to weight matrix W is expressed as follows:** -⟶時間軸での誤差逆伝播法 - 誤差逆伝播法(バックプロパゲーション)は各時点で行われます。時間ステップTにおいて、重み行列Wに関する損失Lの導関数は以下のように表されます。 +⟶時間軸での誤差逆伝播法 - 誤差逆伝播(バックプロパゲーション)が各時点で行われます。時刻 T における、重み行列 W に関する損失 L の導関数は以下のように表されます。
From c9ab3e7f407a3b58c5a566c55cca88ec3e29ae39 Mon Sep 17 00:00:00 2001 From: H Hamano Date: Wed, 28 Aug 2019 09:09:10 +0900 Subject: [PATCH 320/531] Update ja/recurrent-neural-networks.md MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Co-Authored-By: Yoshiyuki Nakai 中井喜之 --- ja/recurrent-neural-networks.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/ja/recurrent-neural-networks.md b/ja/recurrent-neural-networks.md index 41c9fcef2..6c01b2bf8 100644 --- a/ja/recurrent-neural-networks.md +++ b/ja/recurrent-neural-networks.md @@ -172,7 +172,7 @@ **25. Commonly used activation functions ― The most common activation functions used in RNN modules are described below:** -⟶一般的に使用される活性化関数 - RNNモジュールで使用される最も一般的な活性化関数を以下に説明します。 +⟶一般的に使用される活性化関数 - RNNモジュールで使用される最も一般的な活性化関数を以下に説明します。
From b810b826b593df59eee746007a919a28c8e34fc3 Mon Sep 17 00:00:00 2001 From: H Hamano Date: Wed, 28 Aug 2019 09:09:26 +0900 Subject: [PATCH 321/531] Update ja/recurrent-neural-networks.md MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Co-Authored-By: Yoshiyuki Nakai 中井喜之 --- ja/recurrent-neural-networks.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/ja/recurrent-neural-networks.md b/ja/recurrent-neural-networks.md index 6c01b2bf8..ac9c94c0d 100644 --- a/ja/recurrent-neural-networks.md +++ b/ja/recurrent-neural-networks.md @@ -179,7 +179,7 @@ **26. [Sigmoid, Tanh, RELU]** -⟶[ジグモイド、Tanh、RELU] +⟶[シグモイド、Tanh、RELU]
From 4268af3d94fc9ad5f9310cb3e0392e184969c2e8 Mon Sep 17 00:00:00 2001 From: H Hamano Date: Wed, 28 Aug 2019 09:09:42 +0900 Subject: [PATCH 322/531] Update ja/recurrent-neural-networks.md MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Co-Authored-By: Yoshiyuki Nakai 中井喜之 --- ja/recurrent-neural-networks.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/ja/recurrent-neural-networks.md b/ja/recurrent-neural-networks.md index ac9c94c0d..31e57abfe 100644 --- a/ja/recurrent-neural-networks.md +++ b/ja/recurrent-neural-networks.md @@ -186,7 +186,7 @@ **27. Vanishing/exploding gradient ― The vanishing and exploding gradient phenomena are often encountered in the context of RNNs. The reason why they happen is that it is difficult to capture long term dependencies because of multiplicative gradient that can be exponentially decreasing/increasing with respect to the number of layers.** -⟶勾配消失と勾配爆発について - 勾配消失と勾配爆発の現象は、RNNでよく見られます。これらの現象が起こる理由は、多層にわたり勾配が指数関数的に減少/増加する可能性があるため、長期の依存関係を計算するのには向いていないからです。 +⟶勾配消失と勾配爆発について - 勾配消失と勾配爆発の現象は、RNNでよく見られます。これらの現象が起こる理由は、掛け算の勾配が層の数に対して指数関数的に減少/増加する可能性があるため、長期の依存関係を捉えるのが難しいからです。
From 5ffb1019abf25193cdd37e0c84592b708efee7c6 Mon Sep 17 00:00:00 2001 From: H Hamano Date: Wed, 28 Aug 2019 09:11:48 +0900 Subject: [PATCH 323/531] Update ja/recurrent-neural-networks.md MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Co-Authored-By: Yoshiyuki Nakai 中井喜之 --- ja/recurrent-neural-networks.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/ja/recurrent-neural-networks.md b/ja/recurrent-neural-networks.md index 31e57abfe..25240e9f8 100644 --- a/ja/recurrent-neural-networks.md +++ b/ja/recurrent-neural-networks.md @@ -193,7 +193,7 @@ **28. Gradient clipping ― It is a technique used to cope with the exploding gradient problem sometimes encountered when performing backpropagation. By capping the maximum value for the gradient, this phenomenon is controlled in practice.** -⟶勾配クリッピング - 誤差逆伝播法を実行するときに時折発生する勾配爆発問題に対処するために使用される手法です。勾配の最大値(閾値)を定義することで、この現象が抑制されます。 +⟶勾配クリッピング - 誤差逆伝播法を実行するときに時折発生する勾配爆発問題に対処するために使用される手法です。勾配の上限値を定義することで、実際にこの現象が抑制されます。
From 9957de3bfe804413716564e0f84837551953d7f0 Mon Sep 17 00:00:00 2001 From: H Hamano Date: Wed, 28 Aug 2019 09:12:03 +0900 Subject: [PATCH 324/531] Update ja/recurrent-neural-networks.md MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Co-Authored-By: Yoshiyuki Nakai 中井喜之 --- ja/recurrent-neural-networks.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/ja/recurrent-neural-networks.md b/ja/recurrent-neural-networks.md index 25240e9f8..2c69c0f66 100644 --- a/ja/recurrent-neural-networks.md +++ b/ja/recurrent-neural-networks.md @@ -207,7 +207,7 @@ **30. Types of gates ― In order to remedy the vanishing gradient problem, specific gates are used in some types of RNNs and usually have a well-defined purpose. They are usually noted Γ and are equal to:** -⟶ゲートの種類 - 勾配消失問題を解決するために、特定のゲートがいくつかのRNNで使用され、通常明確に定義された目的を持っています。それらは通常Γと記され、以下と同じです。 +⟶ゲートの種類 - 勾配消失問題を解決するために、特定のゲートがいくつかのRNNで使用され、通常明確に定義された目的を持っています。それらは通常Γと記され、以下のように定義されます。
From 5f12f41b3ddfbf00e462db774238ca3eeaa67618 Mon Sep 17 00:00:00 2001 From: H Hamano Date: Wed, 28 Aug 2019 09:12:39 +0900 Subject: [PATCH 325/531] Update ja/recurrent-neural-networks.md MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Co-Authored-By: Yoshiyuki Nakai 中井喜之 --- ja/recurrent-neural-networks.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/ja/recurrent-neural-networks.md b/ja/recurrent-neural-networks.md index 2c69c0f66..8c73357e6 100644 --- a/ja/recurrent-neural-networks.md +++ b/ja/recurrent-neural-networks.md @@ -221,7 +221,7 @@ **32. [Type of gate, Role, Used in]** -⟶[ゲートの種類、役割、で使用] +⟶[ゲートの種類、役割、下記で使用される]
From 68269c4164a4095f0b0f93ec10dc0816844b194f Mon Sep 17 00:00:00 2001 From: H Hamano Date: Wed, 28 Aug 2019 09:12:50 +0900 Subject: [PATCH 326/531] Update ja/recurrent-neural-networks.md MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Co-Authored-By: Yoshiyuki Nakai 中井喜之 --- ja/recurrent-neural-networks.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/ja/recurrent-neural-networks.md b/ja/recurrent-neural-networks.md index 8c73357e6..49a2e6cc7 100644 --- a/ja/recurrent-neural-networks.md +++ b/ja/recurrent-neural-networks.md @@ -235,7 +235,7 @@ **34. [How much past should matter now?, Drop previous information?, Erase a cell or not?, How much to reveal of a cell?]** -⟶[過去情報はどのくらい重要ですか? 前の情報を削除しますか?、セルを消去しますか? しませんか? セルを表示するコストはどのくらいですか?] +⟶[過去情報はどのくらい重要ですか?、前の情報を削除しますか?、セルを消去しますか?しませんか?、セルをどのくらい見せますか?]
From 40a0590b7278c750a9f9ffe3100afdd2dcd21417 Mon Sep 17 00:00:00 2001 From: H Hamano Date: Wed, 28 Aug 2019 09:13:05 +0900 Subject: [PATCH 327/531] Update ja/recurrent-neural-networks.md MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Co-Authored-By: Yoshiyuki Nakai 中井喜之 --- ja/recurrent-neural-networks.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/ja/recurrent-neural-networks.md b/ja/recurrent-neural-networks.md index 49a2e6cc7..b484aaf55 100644 --- a/ja/recurrent-neural-networks.md +++ b/ja/recurrent-neural-networks.md @@ -249,7 +249,7 @@ **36. GRU/LSTM ― Gated Recurrent Unit (GRU) and Long Short-Term Memory units (LSTM) deal with the vanishing gradient problem encountered by traditional RNNs, with LSTM being a generalization of GRU. Below is a table summing up the characterizing equations of each architecture:** -⟶GRU/LSTM - ゲートリカレントユニット(GRU)およびロングショートタームメモリユニット(LSTM)は、従来のRNNで問題になった勾配消失問題を解決します。LSTMはGRUの一般化名称です。以下は、各アーキテクチャの特性式をまとめた表です。 +⟶GRU/LSTM - ゲート付きリカレントユニット(GRU)およびロングショートタームメモリユニット(LSTM)は、従来のRNNが直面した勾配消失問題を解決しようとします。LSTMはGRUを一般化したものです。以下は、各アーキテクチャを特徴づける式をまとめた表です。
From 0f7f5c0034c13fcab3f5af7cf33a2e6c7a620af1 Mon Sep 17 00:00:00 2001 From: H Hamano Date: Wed, 28 Aug 2019 09:13:16 +0900 Subject: [PATCH 328/531] Update ja/recurrent-neural-networks.md MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Co-Authored-By: Yoshiyuki Nakai 中井喜之 --- ja/recurrent-neural-networks.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/ja/recurrent-neural-networks.md b/ja/recurrent-neural-networks.md index b484aaf55..d7971577e 100644 --- a/ja/recurrent-neural-networks.md +++ b/ja/recurrent-neural-networks.md @@ -256,7 +256,7 @@ **37. [Characterization, Gated Recurrent Unit (GRU), Long Short-Term Memory (LSTM), Dependencies]** -⟶特性評価、ゲートリカレントユニット(GRU)、ロングショートタームメモリ(LSTM)、依存関係 +⟶特徴づけ、ゲート付きリカレントユニット(GRU)、ロングショートタームメモリ(LSTM)、依存関係
From 92f0a2ec16b0ccbbb2181e20def31bc3c93ce8d9 Mon Sep 17 00:00:00 2001 From: H Hamano Date: Wed, 28 Aug 2019 09:13:28 +0900 Subject: [PATCH 329/531] Update ja/recurrent-neural-networks.md MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Co-Authored-By: Yoshiyuki Nakai 中井喜之 --- ja/recurrent-neural-networks.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/ja/recurrent-neural-networks.md b/ja/recurrent-neural-networks.md index d7971577e..ffa27ef52 100644 --- a/ja/recurrent-neural-networks.md +++ b/ja/recurrent-neural-networks.md @@ -270,7 +270,7 @@ **39. Variants of RNNs ― The table below sums up the other commonly used RNN architectures:** -⟶RNNの変化版 - 以下の表は、一般的に使用されている他のRNNアーキテクチャをまとめたものです。 +⟶RNNの変種 - 以下の表は、一般的に使用されている他のRNNアーキテクチャをまとめたものです。
From a43f695ad8dde2f44e71d76e4612d979fb583ad3 Mon Sep 17 00:00:00 2001 From: H Hamano Date: Wed, 28 Aug 2019 09:13:38 +0900 Subject: [PATCH 330/531] Update ja/recurrent-neural-networks.md MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Co-Authored-By: Yoshiyuki Nakai 中井喜之 --- ja/recurrent-neural-networks.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/ja/recurrent-neural-networks.md b/ja/recurrent-neural-networks.md index ffa27ef52..917b73ad6 100644 --- a/ja/recurrent-neural-networks.md +++ b/ja/recurrent-neural-networks.md @@ -312,7 +312,7 @@ **45. [1-hot representation, Word embedding]** -⟶[1-ホット表現、Wordの埋め込み] +⟶[1-hot表現、単語埋め込み]
From a3786058bcd73e393f9f5e2b083ad0a747dc8a2d Mon Sep 17 00:00:00 2001 From: H Hamano Date: Wed, 28 Aug 2019 09:13:47 +0900 Subject: [PATCH 331/531] Update ja/recurrent-neural-networks.md MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Co-Authored-By: Yoshiyuki Nakai 中井喜之 --- ja/recurrent-neural-networks.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/ja/recurrent-neural-networks.md b/ja/recurrent-neural-networks.md index 917b73ad6..c1179cbf0 100644 --- a/ja/recurrent-neural-networks.md +++ b/ja/recurrent-neural-networks.md @@ -319,7 +319,7 @@ **46. [teddy bear, book, soft]** -⟶テディベア、本、ソフト +⟶テディベア、本、柔らかい
From f8b46487e4ef7d9d10c80bcc97793fdd21a2475d Mon Sep 17 00:00:00 2001 From: H Hamano Date: Wed, 28 Aug 2019 09:13:58 +0900 Subject: [PATCH 332/531] Update ja/recurrent-neural-networks.md MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Co-Authored-By: Yoshiyuki Nakai 中井喜之 --- ja/recurrent-neural-networks.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/ja/recurrent-neural-networks.md b/ja/recurrent-neural-networks.md index c1179cbf0..fb60e5208 100644 --- a/ja/recurrent-neural-networks.md +++ b/ja/recurrent-neural-networks.md @@ -326,7 +326,7 @@ **47. [Noted ow, Naive approach, no similarity information, Noted ew, Takes into account words similarity]** -⟶[表記 ow、ナイーブベイズアプローチ、類似性情報なし、表記 ew、単語の類似性を考慮に入れる] +⟶[owと表記される、素朴なアプローチ、類似性情報なし、ewと表記される、単語の類似性を考慮に入れる]
From 04d2d4b51e9e9630f247c8dc8800ac5bbd8a684e Mon Sep 17 00:00:00 2001 From: H Hamano Date: Wed, 28 Aug 2019 09:14:10 +0900 Subject: [PATCH 333/531] Update ja/recurrent-neural-networks.md MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Co-Authored-By: Yoshiyuki Nakai 中井喜之 --- ja/recurrent-neural-networks.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/ja/recurrent-neural-networks.md b/ja/recurrent-neural-networks.md index fb60e5208..2b1a62af6 100644 --- a/ja/recurrent-neural-networks.md +++ b/ja/recurrent-neural-networks.md @@ -333,7 +333,7 @@ **48. Embedding matrix ― For a given word w, the embedding matrix E is a matrix that maps its 1-hot representation ow to its embedding ew as follows:** -⟶埋め込み行列 - 与えられた単語wに対して、埋め込み行列Eは、1-hot表現owを埋め込み行列ewに写像します。式は以下のようになります。 +⟶埋め込み行列 - 与えられた単語wに対して、埋め込み行列Eは、以下のように1-hot表現owを埋め込み行列ewに写像します。
From 13610e743b2beec169a6d5f955264fe1afdedf1b Mon Sep 17 00:00:00 2001 From: H Hamano Date: Wed, 28 Aug 2019 09:14:21 +0900 Subject: [PATCH 334/531] Update ja/recurrent-neural-networks.md MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Co-Authored-By: Yoshiyuki Nakai 中井喜之 --- ja/recurrent-neural-networks.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/ja/recurrent-neural-networks.md b/ja/recurrent-neural-networks.md index 2b1a62af6..4d5954a8e 100644 --- a/ja/recurrent-neural-networks.md +++ b/ja/recurrent-neural-networks.md @@ -361,7 +361,7 @@ **52. [A cute teddy bear is reading, teddy bear, soft, Persian poetry, art]** -⟶[かわいいテディベアが読んでいる、テディベア、ソフト、ペルシャ詩、芸術] +⟶[かわいいテディベアが読んでいる、テディベア、柔らかい、ペルシャ詩、芸術]
From ca181bc4ff4e81279ea252c134953c4b03421328 Mon Sep 17 00:00:00 2001 From: H Hamano Date: Wed, 28 Aug 2019 09:14:30 +0900 Subject: [PATCH 335/531] Update ja/recurrent-neural-networks.md MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Co-Authored-By: Yoshiyuki Nakai 中井喜之 --- ja/recurrent-neural-networks.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/ja/recurrent-neural-networks.md b/ja/recurrent-neural-networks.md index 4d5954a8e..29d1a795e 100644 --- a/ja/recurrent-neural-networks.md +++ b/ja/recurrent-neural-networks.md @@ -368,7 +368,7 @@ **53. [Train network on proxy task, Extract high-level representation, Compute word embeddings]** -⟶[代理タスク上のネットワークの訓練、高水準表現の抽出、単語埋め込み重みの計算] +⟶[代理タスクでのネットワークの訓練、高水準表現の抽出、単語埋め込み重みの計算]
From 15794b157a74c613cd59ca6aea953a80d2b3949a Mon Sep 17 00:00:00 2001 From: H Hamano Date: Wed, 28 Aug 2019 09:14:40 +0900 Subject: [PATCH 336/531] Update ja/recurrent-neural-networks.md MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Co-Authored-By: Yoshiyuki Nakai 中井喜之 --- ja/recurrent-neural-networks.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/ja/recurrent-neural-networks.md b/ja/recurrent-neural-networks.md index 29d1a795e..f97ab696c 100644 --- a/ja/recurrent-neural-networks.md +++ b/ja/recurrent-neural-networks.md @@ -354,7 +354,7 @@ **51. Word2vec ― Word2vec is a framework aimed at learning word embeddings by estimating the likelihood that a given word is surrounded by other words. Popular models include skip-gram, negative sampling and CBOW.** -⟶Word2vec - Word2vecは、ある単語が他の周辺単語から導きだされる可能性を推定することで、単語の埋め込みの重みを学習することを目的としたフレームワークです。この一般的なモデルは、スキップグラム、ネガティブサンプリング、およびCBOWがあります。 +⟶Word2vec - Word2vecは、ある単語が他の単語の周辺にある可能性を推定することで、単語の埋め込みの重みを学習することを目的としたフレームワークです。人気のあるモデルは、スキップグラム、ネガティブサンプリング、およびCBOWです。
From 1a6cdfdae912044fc0e94156c47e9dfa847ee3a4 Mon Sep 17 00:00:00 2001 From: H Hamano Date: Wed, 28 Aug 2019 09:14:50 +0900 Subject: [PATCH 337/531] Update ja/recurrent-neural-networks.md MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Co-Authored-By: Yoshiyuki Nakai 中井喜之 --- ja/recurrent-neural-networks.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/ja/recurrent-neural-networks.md b/ja/recurrent-neural-networks.md index f97ab696c..d17cb4154 100644 --- a/ja/recurrent-neural-networks.md +++ b/ja/recurrent-neural-networks.md @@ -375,7 +375,7 @@ **54. Skip-gram ― The skip-gram word2vec model is a supervised learning task that learns word embeddings by assessing the likelihood of any given target word t happening with a context word c. By noting θt a parameter associated with t, the probability P(t|c) is given by:** -⟶スキップグラム - スキップグラムword2vecモデルは、あるコンテキスト単語を与え、ターゲット単語t の出現確率を計算することで単語の埋め込みを学習する教師付き学習タスクです。時点tと関係するパラメーターθtと表記すると、確率P(t|c) は下記のように表現されます。 +⟶スキップグラム - スキップグラムword2vecモデルは、あるターゲット単語tがコンテキスト単語cと一緒に出現する確率を評価することで単語の埋め込みを学習する教師付き学習タスクです。tに関するパラメータをθtと表記すると、その確率P(t|c) は下記の式で与えられます。
From c86a870c86b7b6dc340c99bd504812b1fe4fc28e Mon Sep 17 00:00:00 2001 From: H Hamano Date: Wed, 28 Aug 2019 09:15:00 +0900 Subject: [PATCH 338/531] Update ja/recurrent-neural-networks.md MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Co-Authored-By: Yoshiyuki Nakai 中井喜之 --- ja/recurrent-neural-networks.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/ja/recurrent-neural-networks.md b/ja/recurrent-neural-networks.md index d17cb4154..65670dce3 100644 --- a/ja/recurrent-neural-networks.md +++ b/ja/recurrent-neural-networks.md @@ -382,7 +382,7 @@ **55. Remark: summing over the whole vocabulary in the denominator of the softmax part makes this model computationally expensive. CBOW is another word2vec model using the surrounding words to predict a given word.** -⟶注意:softmax部分の分母全体の語彙全体を合計すると、モデルの計算コストは高くなります。 CBOWは、ある単語を予測するため周辺単語を使用する別のタイプのword2vecモデルです。 +⟶注:softmax部分の分母の語彙全体を合計するため、このモデルの計算コストは高くなります。 CBOWは、ある単語を予測するため周辺単語を使用する別のタイプのword2vecモデルです。
From f97610763214f3761cc8e9052ea62b831059bd79 Mon Sep 17 00:00:00 2001 From: H Hamano Date: Wed, 28 Aug 2019 09:15:12 +0900 Subject: [PATCH 339/531] Update ja/recurrent-neural-networks.md MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Co-Authored-By: Yoshiyuki Nakai 中井喜之 --- ja/recurrent-neural-networks.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/ja/recurrent-neural-networks.md b/ja/recurrent-neural-networks.md index 65670dce3..89e9ab9a7 100644 --- a/ja/recurrent-neural-networks.md +++ b/ja/recurrent-neural-networks.md @@ -389,7 +389,7 @@ **56. Negative sampling ― It is a set of binary classifiers using logistic regressions that aim at assessing how a given context and a given target words are likely to appear simultaneously, with the models being trained on sets of k negative examples and 1 positive example. Given a context word c and a target word t, the prediction is expressed by:** -⟶ネガティブサンプリング - k個のネガティブな例と1つのポジティブな例で訓練されたモデルで、ある与えられた文脈とターゲット単語の出現確率を評価するロジスティック回帰を使用するバイナリ分類器です。単語cとターゲット語tが与えられると、予測は次のように表現されます。 +⟶ネガティブサンプリング - ロジスティック回帰を使用したバイナリ分類器のセットで、特定の文脈とあるターゲット単語が同時に出現する確率を評価することを目的としています。モデルはk個のネガティブな例と1つのポジティブな例のセットで訓練されます。コンテキスト単語cとターゲット単語tが与えられると、予測は次のように表現されます。
From 575dc6b0344393ce7ef2bb7e6571d00e01ccfb2c Mon Sep 17 00:00:00 2001 From: H Hamano Date: Wed, 28 Aug 2019 09:20:00 +0900 Subject: [PATCH 340/531] Update ja/recurrent-neural-networks.md MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Co-Authored-By: Yoshiyuki Nakai 中井喜之 --- ja/recurrent-neural-networks.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/ja/recurrent-neural-networks.md b/ja/recurrent-neural-networks.md index 89e9ab9a7..3d13a2934 100644 --- a/ja/recurrent-neural-networks.md +++ b/ja/recurrent-neural-networks.md @@ -291,7 +291,7 @@ **42. In this section, we note V the vocabulary and |V| its size.** -⟶この節では、Vを語彙、そして|V|を語彙のサイズとして定義します。 +⟶この節では、Vは語彙、そして|V|は語彙のサイズを表します。
From 3346457e1ca381c97f0e97c939ce2cb9f0161682 Mon Sep 17 00:00:00 2001 From: H Hamano Date: Wed, 28 Aug 2019 09:20:11 +0900 Subject: [PATCH 341/531] Update ja/recurrent-neural-networks.md MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Co-Authored-By: Yoshiyuki Nakai 中井喜之 --- ja/recurrent-neural-networks.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/ja/recurrent-neural-networks.md b/ja/recurrent-neural-networks.md index 3d13a2934..321d1b9fd 100644 --- a/ja/recurrent-neural-networks.md +++ b/ja/recurrent-neural-networks.md @@ -396,7 +396,7 @@ **57. Remark: this method is less computationally expensive than the skip-gram model.** -⟶注意:この計算コストは、スキップグラムモデルよりも少ないです。 +⟶注:この方法の計算コストは、スキップグラムモデルよりも少ないです。
From c1621ab9d8dc833ad0ecac66e9fdfe38b35a20a7 Mon Sep 17 00:00:00 2001 From: H Hamano Date: Wed, 28 Aug 2019 09:20:22 +0900 Subject: [PATCH 342/531] Update ja/recurrent-neural-networks.md MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Co-Authored-By: Yoshiyuki Nakai 中井喜之 --- ja/recurrent-neural-networks.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/ja/recurrent-neural-networks.md b/ja/recurrent-neural-networks.md index 321d1b9fd..0b5ddd96c 100644 --- a/ja/recurrent-neural-networks.md +++ b/ja/recurrent-neural-networks.md @@ -446,7 +446,7 @@ Given the symmetry that e and θ play in this model, the final word embedding e( **63. t-SNE ― t-SNE (t-distributed Stochastic Neighbor Embedding) is a technique aimed at reducing high-dimensional embeddings into a lower dimensional space. In practice, it is commonly used to visualize word vectors in the 2D space.** -⟶ t-SNE − t-SNE(t−分布確率的近傍埋め込み)は、高次元埋め込みから低次元埋め込み空間への次元削減を目的とした技法です。実際には、2次元空間で単語ベクトルを視覚化するために使用されます。 +⟶ t-SNE − t-SNE(t−分布型確率的近傍埋め込み)は、高次元埋め込みから低次元埋め込み空間への次元削減を目的とした手法です。実際には、2次元空間で単語ベクトルを視覚化するために使用されます。
From 8e9cca00cf4b726234da1fffea51ae4d3aea9e96 Mon Sep 17 00:00:00 2001 From: H Hamano Date: Wed, 28 Aug 2019 09:20:32 +0900 Subject: [PATCH 343/531] Update ja/recurrent-neural-networks.md MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Co-Authored-By: Yoshiyuki Nakai 中井喜之 --- ja/recurrent-neural-networks.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/ja/recurrent-neural-networks.md b/ja/recurrent-neural-networks.md index 0b5ddd96c..751dd0fa4 100644 --- a/ja/recurrent-neural-networks.md +++ b/ja/recurrent-neural-networks.md @@ -453,7 +453,7 @@ Given the symmetry that e and θ play in this model, the final word embedding e( **64. [literature, art, book, culture, poem, reading, knowledge, entertaining, loveable, childhood, kind, teddy bear, soft, hug, cute, adorable]** -⟶[文学、芸術、本、文化、詩、読書、知識、娯楽、愛らしい、幼年期、親切、テディベア、ソフト、抱擁、かわいい、愛らしい] +⟶[文学、芸術、本、文化、詩、読書、知識、面白い、愛らしい、幼年期、親切、テディベア、柔らかい、抱擁、かわいい、愛らしい]
From 5b5f763d665ee3ef56ecf44e2673a66c146df711 Mon Sep 17 00:00:00 2001 From: H Hamano Date: Wed, 28 Aug 2019 09:20:45 +0900 Subject: [PATCH 344/531] Update ja/recurrent-neural-networks.md MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Co-Authored-By: Yoshiyuki Nakai 中井喜之 --- ja/recurrent-neural-networks.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/ja/recurrent-neural-networks.md b/ja/recurrent-neural-networks.md index 751dd0fa4..fc3bbf5fc 100644 --- a/ja/recurrent-neural-networks.md +++ b/ja/recurrent-neural-networks.md @@ -474,7 +474,7 @@ Given the symmetry that e and θ play in this model, the final word embedding e( **67. n-gram model ― This model is a naive approach aiming at quantifying the probability that an expression appears in a corpus by counting its number of appearance in the training data.** -⟶n-gramモデル - このモデルは、トレーニングデータでの出現数を数えることによって、コーパス表現の出現確率を定量化することを目的とした単純なアプローチです。 +⟶n-gramモデル - このモデルは、トレーニングデータでの出現数を数えることによって、ある表現がコーパスに出現する確率を定量化することを目的とした単純なアプローチです。
From 86d42d03b81abbb3471f6e2666b1b26a0e6a0b53 Mon Sep 17 00:00:00 2001 From: H Hamano Date: Wed, 28 Aug 2019 09:20:54 +0900 Subject: [PATCH 345/531] Update ja/recurrent-neural-networks.md MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Co-Authored-By: Yoshiyuki Nakai 中井喜之 --- ja/recurrent-neural-networks.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/ja/recurrent-neural-networks.md b/ja/recurrent-neural-networks.md index fc3bbf5fc..08f761a72 100644 --- a/ja/recurrent-neural-networks.md +++ b/ja/recurrent-neural-networks.md @@ -488,7 +488,7 @@ Given the symmetry that e and θ play in this model, the final word embedding e( **69. Remark: PP is commonly used in t-SNE.** -⟶備考:PPはt-SNEで一般的に使用されています。 +⟶注:PPはt-SNEで一般的に使用されています。
From fa9ff54cd283a84fdfddfb3dd9c3994582fb9926 Mon Sep 17 00:00:00 2001 From: H Hamano Date: Wed, 28 Aug 2019 09:21:06 +0900 Subject: [PATCH 346/531] Update ja/recurrent-neural-networks.md MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Co-Authored-By: Yoshiyuki Nakai 中井喜之 --- ja/recurrent-neural-networks.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/ja/recurrent-neural-networks.md b/ja/recurrent-neural-networks.md index 08f761a72..02184e1f4 100644 --- a/ja/recurrent-neural-networks.md +++ b/ja/recurrent-neural-networks.md @@ -403,7 +403,7 @@ **57bis. GloVe ― The GloVe model, short for global vectors for word representation, is a word embedding technique that uses a co-occurence matrix X where each Xi,j denotes the number of times that a target i occurred with a context j. Its cost function J is as follows:** -⟶GloVe - GloVeモデルは、単語表現のためのグローバルベクトルの略で、共起行列Xを使用する単語の埋め込み手法です。ここで、各Xi、jは、ターゲットiがコンテキストjで発生した回数を表します。そのコスト関数Jは以下の通りです。 +⟶GloVe - GloVeモデルは、単語表現のためのグローバルベクトルの略で、共起行列Xを使用する単語の埋め込み手法です。ここで、各Xi,jは、ターゲットiがコンテキストjで発生した回数を表します。そのコスト関数Jは以下の通りです。
From bbcd85e8a1eb1daf20db2a6a262c915422fd4a92 Mon Sep 17 00:00:00 2001 From: H Hamano Date: Wed, 28 Aug 2019 09:21:20 +0900 Subject: [PATCH 347/531] Update ja/recurrent-neural-networks.md MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Co-Authored-By: Yoshiyuki Nakai 中井喜之 --- ja/recurrent-neural-networks.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/ja/recurrent-neural-networks.md b/ja/recurrent-neural-networks.md index 02184e1f4..66f0ae43d 100644 --- a/ja/recurrent-neural-networks.md +++ b/ja/recurrent-neural-networks.md @@ -411,7 +411,7 @@ **58. where f is a weighting function such that Xi,j=0⟹f(Xi,j)=0. Given the symmetry that e and θ play in this model, the final word embedding e(final)w is given by:** -⟶ここで、fはXi、j =0⟹f(Xi、j)= 0となるような重み関数です。このモデルでeとθが果たす対称性を考えると、e(final)wが最後の単語の埋め込みになります。 +⟶ここで、fはXi,j =0⟹f(Xi,j)= 0となるような重み関数です。このモデルでeとθが果たす対称性を考えると、最後の単語の埋め込みe(final)wは下記ののようになります。
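Editor's aside, not part of the commit above: one concrete reading of entry 58. The particular weighting-function shape (with xmax=100 and alpha=3/4) comes from the original GloVe paper and is an assumption here; the final embedding is commonly taken as the average of e_w and theta_w given their symmetric roles.

```python
# Editorial illustration only; parameter values are assumptions, not from the patch.
import numpy as np

def glove_weight(x, x_max=100.0, alpha=0.75):
    # Satisfies f(0) = 0 and is capped at 1 for very frequent co-occurrences.
    return np.where(x < x_max, (x / x_max) ** alpha, 1.0)

print(glove_weight(np.array([0.0, 10.0, 1000.0])))   # approximately [0, 0.18, 1]

e_w, theta_w = np.ones(5), np.zeros(5)
e_final = (e_w + theta_w) / 2   # symmetric roles of e and theta
```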
From e7247e5d6fd1f9a1fe19104ac210301f0ae6a4ce Mon Sep 17 00:00:00 2001 From: H Hamano Date: Wed, 28 Aug 2019 09:21:33 +0900 Subject: [PATCH 348/531] Update ja/recurrent-neural-networks.md MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Co-Authored-By: Yoshiyuki Nakai 中井喜之 --- ja/recurrent-neural-networks.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/ja/recurrent-neural-networks.md b/ja/recurrent-neural-networks.md index 66f0ae43d..6b27c5bdd 100644 --- a/ja/recurrent-neural-networks.md +++ b/ja/recurrent-neural-networks.md @@ -418,7 +418,7 @@ Given the symmetry that e and θ play in this model, the final word embedding e( **59. Remark: the individual components of the learned word embeddings are not necessarily interpretable.** -⟶注意:学習された単語の埋め込みの個々の要素は、必ずしも関係性がある必要はないです。 +⟶注:学習された単語の埋め込みの個々の要素は、必ずしも解釈可能ではありません。
From 2b2a86755ec2deb4d3744731d2a4c53be5fd727f Mon Sep 17 00:00:00 2001 From: H Hamano Date: Wed, 28 Aug 2019 09:21:44 +0900 Subject: [PATCH 349/531] Update ja/recurrent-neural-networks.md MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Co-Authored-By: Yoshiyuki Nakai 中井喜之 --- ja/recurrent-neural-networks.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/ja/recurrent-neural-networks.md b/ja/recurrent-neural-networks.md index 6b27c5bdd..4ad7fd73e 100644 --- a/ja/recurrent-neural-networks.md +++ b/ja/recurrent-neural-networks.md @@ -439,7 +439,7 @@ Given the symmetry that e and θ play in this model, the final word embedding e( **62. Remark: θ is the angle between words w1 and w2.** -⟶注意:θはワードw1とw2の間の角度です。 +⟶注:θは単語w1とw2の間の角度です。
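Editor's aside, outside the patch above: cos(theta) for the angle theta of entry 62, written out with NumPy.

```python
# Editorial illustration only.
import numpy as np

def cosine_similarity(w1, w2):
    return np.dot(w1, w2) / (np.linalg.norm(w1) * np.linalg.norm(w2))

w1, w2 = np.array([1.0, 2.0, 3.0]), np.array([2.0, 4.0, 6.0])
print(cosine_similarity(w1, w2))   # 1.0, i.e. theta = 0 for parallel vectors
```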
From 0bbcdc087c63f6489928b1e9439746e383d245ef Mon Sep 17 00:00:00 2001 From: H Hamano Date: Wed, 28 Aug 2019 09:22:04 +0900 Subject: [PATCH 350/531] Update ja/recurrent-neural-networks.md MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Co-Authored-By: Yoshiyuki Nakai 中井喜之 --- ja/recurrent-neural-networks.md | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/ja/recurrent-neural-networks.md b/ja/recurrent-neural-networks.md index 4ad7fd73e..d23cd22a3 100644 --- a/ja/recurrent-neural-networks.md +++ b/ja/recurrent-neural-networks.md @@ -481,7 +481,8 @@ Given the symmetry that e and θ play in this model, the final word embedding e( **68. Perplexity ― Language models are commonly assessed using the perplexity metric, also known as PP, which can be interpreted as the inverse probability of the dataset normalized by the number of words T. The perplexity is such that the lower, the better and is defined as follows:** -⟶パープレキシティ - 言語モデルは、一般的にPPとも呼ばれるパープレキシティメトリックを使用して評価されます。これは、ワード数Tにより正規化されたデータセットの確率の逆数と解釈できます。パープレキシティの数値はより低いものがより選択しやすい単語として評価されます(訳注:10であれば10個の中から1つ選択される、10000であれば10000個の中から1つ)、評価式は下記のようになります。 +⟶パープレキシティ - 言語モデルは一般的に、PPとも呼ばれるパープレキシティメトリックを使用して評価されます。これは、単語数Tにより正規化されたデータセットの逆確率と解釈できます。パープレキシティは低いほど良く、次のように定義されます。 +(訳注:パープレキシティの数値はより低いものがより選択しやすい単語として評価されます。10であれば10個の中から1つ、10000であれば10000個の中から1つ選択されます。)
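Editor's aside, not part of the commit above: a small numeric check of the perplexity definition in entry 68, using made-up per-word probabilities.

```python
# Editorial illustration only; probabilities are toy values.
import math

word_probs = [0.2, 0.1, 0.25, 0.05]   # p(word) assigned by the model to each of T words
T = len(word_probs)

# PP = P(dataset)^(-1/T), computed in log space for numerical stability.
perplexity = math.exp(-sum(math.log(p) for p in word_probs) / T)
print(perplexity)   # about 7.95; lower would indicate a better language model
```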
From 95c88e6b668a6b86947910304f7034bf72f97fd5 Mon Sep 17 00:00:00 2001 From: H Hamano Date: Wed, 28 Aug 2019 09:22:29 +0900 Subject: [PATCH 351/531] Update ja/recurrent-neural-networks.md MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Co-Authored-By: Yoshiyuki Nakai 中井喜之 --- ja/recurrent-neural-networks.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/ja/recurrent-neural-networks.md b/ja/recurrent-neural-networks.md index d23cd22a3..560fc8be1 100644 --- a/ja/recurrent-neural-networks.md +++ b/ja/recurrent-neural-networks.md @@ -510,7 +510,7 @@ Given the symmetry that e and θ play in this model, the final word embedding e( **72. Beam search ― It is a heuristic search algorithm used in machine translation and speech recognition to find the likeliest sentence y given an input x.** -⟶ビーム検索 - 入力xが与えられたとき最も可能性の高い文yを見つける、機械翻訳と音声認識で使用されるヒューリスティック探索アルゴリズムです。 +⟶ビーム検索 - 入力xが与えられたとき最も可能性の高い文yを見つけるために、機械翻訳と音声認識で使用されるヒューリスティック探索アルゴリズムです。
From cd21f3fdb7b3dc2b1b0e7720e7e2e81b1b238705 Mon Sep 17 00:00:00 2001 From: H Hamano Date: Wed, 28 Aug 2019 09:22:51 +0900 Subject: [PATCH 352/531] Update ja/recurrent-neural-networks.md MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Co-Authored-By: Yoshiyuki Nakai 中井喜之 --- ja/recurrent-neural-networks.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/ja/recurrent-neural-networks.md b/ja/recurrent-neural-networks.md index 560fc8be1..6f79c5d8d 100644 --- a/ja/recurrent-neural-networks.md +++ b/ja/recurrent-neural-networks.md @@ -517,7 +517,7 @@ Given the symmetry that e and θ play in this model, the final word embedding e( **73. [Step 1: Find top B likely words y<1>, Step 2: Compute conditional probabilities y|x,y<1>,...,y, Step 3: Keep top B combinations x,y<1>,...,y, End process at a stop word]** -⟶[ステップ1:単語y<1>と高い確率を持つ上位B個の組み合わせを見つける。ステップ2:条件付き確率y|x,y<1>,...,yを計算する。ステップ3:上位B個の組み合わせx,y<1>,...,yを保持しながら、あるストップワードでプロセスを終了する] +⟶[ステップ1:上位B個の高い確率を持つ単語y<1>を見つける。ステップ2:条件付き確率y|x,y<1>,...,yを計算する。ステップ3:上位B個の組み合わせx,y<1>,...,yを保持する。あるストップワードでプロセスを終了する]
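Editor's aside, outside the patch above: a compact beam-search sketch following the three steps of entry 73. The toy conditional model and the "</s>" stop word are invented for illustration.

```python
# Editorial illustration only; the conditional probabilities are made up.
import math

cond = {
    "<s>": {"the": 0.6, "a": 0.4},
    "the": {"cat": 0.5, "dog": 0.3, "</s>": 0.2},
    "a":   {"cat": 0.4, "dog": 0.4, "</s>": 0.2},
    "cat": {"sat": 0.7, "</s>": 0.3},
    "dog": {"sat": 0.6, "</s>": 0.4},
    "sat": {"</s>": 1.0},
}

def beam_search(B=2, max_len=5):
    beams = [(["<s>"], 0.0)]                      # (partial sentence, log-probability)
    for _ in range(max_len):
        candidates = []
        for words, score in beams:
            if words[-1] == "</s>":               # already ended at the stop word
                candidates.append((words, score))
                continue
            for w, p in cond[words[-1]].items():  # step 2: conditional probabilities
                candidates.append((words + [w], score + math.log(p)))
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:B]   # step 3: keep top B
        if all(words[-1] == "</s>" for words, _ in beams):
            break
    return beams

for words, score in beam_search():
    print(" ".join(words), round(math.exp(score), 3))
```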
From 1fe9a64defd09473b2ed8bf9d24fa727cef9b96c Mon Sep 17 00:00:00 2001 From: H Hamano Date: Wed, 28 Aug 2019 09:22:59 +0900 Subject: [PATCH 353/531] Update ja/recurrent-neural-networks.md MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Co-Authored-By: Yoshiyuki Nakai 中井喜之 --- ja/recurrent-neural-networks.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/ja/recurrent-neural-networks.md b/ja/recurrent-neural-networks.md index 6f79c5d8d..b41e241b0 100644 --- a/ja/recurrent-neural-networks.md +++ b/ja/recurrent-neural-networks.md @@ -524,7 +524,7 @@ Given the symmetry that e and θ play in this model, the final word embedding e( **74. Remark: if the beam width is set to 1, then this is equivalent to a naive greedy search.** -⟶注意:ビーム幅が1に設定されている場合、これは単純な貪欲法と同等の結果を導きます。 +⟶注意:ビーム幅が1に設定されている場合、これは単純な貪欲法と同等です。
From dc9776f3c70dedf31713c3c54147aa567432db63 Mon Sep 17 00:00:00 2001 From: H Hamano Date: Wed, 28 Aug 2019 09:23:10 +0900 Subject: [PATCH 354/531] Update ja/recurrent-neural-networks.md MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Co-Authored-By: Yoshiyuki Nakai 中井喜之 --- ja/recurrent-neural-networks.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/ja/recurrent-neural-networks.md b/ja/recurrent-neural-networks.md index b41e241b0..e2f911f7a 100644 --- a/ja/recurrent-neural-networks.md +++ b/ja/recurrent-neural-networks.md @@ -531,7 +531,7 @@ Given the symmetry that e and θ play in this model, the final word embedding e( **75. Beam width ― The beam width B is a parameter for beam search. Large values of B yield to better result but with slower performance and increased memory. Small values of B lead to worse results but is less computationally intensive. A standard value for B is around 10.** -⟶ビーム幅 - ビーム幅Bはビームサーチのパラメータです。 Bの値を大きくするとより良い結果が得られますが、探索パフォーマンスは低下し、メモリ使用量が増加します。 Bの値が小さいと結果が悪くなりますが、計算量は少なくなります。 Bの標準推奨値は10前後です。 +⟶ビーム幅 - ビーム幅Bはビーム検索のパラメータです。 Bの値を大きくするとより良い結果が得られますが、探索パフォーマンスは低下し、メモリ使用量が増加します。 Bの値が小さいと結果が悪くなりますが、計算量は少なくなります。 Bの標準値は10前後です。
From 62e7cfe39b189039441fdee161e99b0f851e664b Mon Sep 17 00:00:00 2001 From: H Hamano Date: Wed, 28 Aug 2019 09:23:19 +0900 Subject: [PATCH 355/531] Update ja/recurrent-neural-networks.md MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Co-Authored-By: Yoshiyuki Nakai 中井喜之 --- ja/recurrent-neural-networks.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/ja/recurrent-neural-networks.md b/ja/recurrent-neural-networks.md index e2f911f7a..98504ebdb 100644 --- a/ja/recurrent-neural-networks.md +++ b/ja/recurrent-neural-networks.md @@ -538,7 +538,7 @@ Given the symmetry that e and θ play in this model, the final word embedding e( **76. Length normalization ― In order to improve numerical stability, beam search is usually applied on the following normalized objective, often called the normalized log-likelihood objective, defined as:** -⟶文章の長さの正規化 - 数値の安定性を向上させるために、ビームサーチは通常次のような正規化、特に対数尤度正規化された探索対象物に対して適用されます。 +⟶文章の長さの正規化 - 数値の安定性を向上させるために、ビーム検索は通常次のように正規化(対数尤度正規化)された目的関数に対して適用されます。
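Editor's aside, not part of the commit above: the normalized log-likelihood objective of entry 76 as a small scoring function, with alpha=0.7 picked inside the 0.5 to 1 range mentioned in the following entry.

```python
# Editorial illustration only; alpha = 0.7 is an arbitrary choice in the usual range.
import math

def normalized_log_likelihood(token_probs, alpha=0.7):
    # token_probs[t] = p(y<t> | x, y<1>, ..., y<t-1>) for each generated word
    Ty = len(token_probs)
    return sum(math.log(p) for p in token_probs) / (Ty ** alpha)

print(normalized_log_likelihood([0.5, 0.4, 0.9]))             # short candidate
print(normalized_log_likelihood([0.5, 0.4, 0.9, 0.8, 0.7]))   # longer candidate
```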
From 9ec1fefcc3cb0b028bae7bb6568e16b35bd405dc Mon Sep 17 00:00:00 2001 From: H Hamano Date: Wed, 28 Aug 2019 09:23:31 +0900 Subject: [PATCH 356/531] Update ja/recurrent-neural-networks.md MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Co-Authored-By: Yoshiyuki Nakai 中井喜之 --- ja/recurrent-neural-networks.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/ja/recurrent-neural-networks.md b/ja/recurrent-neural-networks.md index 98504ebdb..0b7863e88 100644 --- a/ja/recurrent-neural-networks.md +++ b/ja/recurrent-neural-networks.md @@ -643,7 +643,7 @@ Given the symmetry that e and θ play in this model, the final word embedding e( **91. The Deep Learning cheatsheets are now available in [target language].** -⟶ディープラーニングのチートシートが[対象言語]で利用可能になりました。 +⟶ディープラーニングのチートシートが日本語で利用可能になりました。
From 26ecac1e850ccb8e5fdb61284c90d6a9006fc67e Mon Sep 17 00:00:00 2001 From: H Hamano Date: Wed, 28 Aug 2019 09:23:38 +0900 Subject: [PATCH 357/531] Update ja/recurrent-neural-networks.md MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Co-Authored-By: Yoshiyuki Nakai 中井喜之 --- ja/recurrent-neural-networks.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/ja/recurrent-neural-networks.md b/ja/recurrent-neural-networks.md index 0b7863e88..07bca74ef 100644 --- a/ja/recurrent-neural-networks.md +++ b/ja/recurrent-neural-networks.md @@ -649,7 +649,7 @@ Given the symmetry that e and θ play in this model, the final word embedding e( **92. Original authors** -⟶原作者 +⟶原著者
From c81a91341b138e1c0aa5b82730183f70bd8c717d Mon Sep 17 00:00:00 2001 From: H Hamano Date: Wed, 28 Aug 2019 09:23:49 +0900 Subject: [PATCH 358/531] Update ja/recurrent-neural-networks.md MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Co-Authored-By: Yoshiyuki Nakai 中井喜之 --- ja/recurrent-neural-networks.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/ja/recurrent-neural-networks.md b/ja/recurrent-neural-networks.md index 07bca74ef..30627ff60 100644 --- a/ja/recurrent-neural-networks.md +++ b/ja/recurrent-neural-networks.md @@ -661,7 +661,7 @@ Given the symmetry that e and θ play in this model, the final word embedding e( **94. Reviewed by X, Y and Z** -⟶X,YそしてZにより校正されました。 +⟶X・Y・Z 校正
From bf4700a405f58c8da20d8a38d617d4c42aa138fa Mon Sep 17 00:00:00 2001 From: H Hamano Date: Wed, 28 Aug 2019 09:23:57 +0900 Subject: [PATCH 359/531] Update ja/recurrent-neural-networks.md MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Co-Authored-By: Yoshiyuki Nakai 中井喜之 --- ja/recurrent-neural-networks.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/ja/recurrent-neural-networks.md b/ja/recurrent-neural-networks.md index 30627ff60..6c0304504 100644 --- a/ja/recurrent-neural-networks.md +++ b/ja/recurrent-neural-networks.md @@ -655,7 +655,7 @@ Given the symmetry that e and θ play in this model, the final word embedding e( **93. Translated by X, Y and Z** -⟶X,YそしてZにより翻訳されました。 +⟶X・Y・Z 訳
From 142aaa9e3d7e654543852e4ce88a63aae2da50a6 Mon Sep 17 00:00:00 2001 From: H Hamano Date: Wed, 28 Aug 2019 09:24:08 +0900 Subject: [PATCH 360/531] Update ja/recurrent-neural-networks.md MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Co-Authored-By: Yoshiyuki Nakai 中井喜之 <8402782+yoshiyukinakai@users.noreply.github.com> --- ja/recurrent-neural-networks.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/ja/recurrent-neural-networks.md b/ja/recurrent-neural-networks.md index 6c0304504..dbf64718f 100644 --- a/ja/recurrent-neural-networks.md +++ b/ja/recurrent-neural-networks.md @@ -673,6 +673,6 @@ Given the symmetry that e and θ play in this model, the final word embedding e( **96. By X and Y** -⟶XそしてYによる。 +⟶X・Y 著
From f59bf0144f894328b9ad910c7d6b2a0af1478f81 Mon Sep 17 00:00:00 2001 From: H Hamano Date: Wed, 28 Aug 2019 12:13:31 +0900 Subject: [PATCH 361/531] Apply suggestions from code review MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Co-Authored-By: Yoshiyuki Nakai 中井喜之 <8402782+yoshiyukinakai@users.noreply.github.com> --- ja/recurrent-neural-networks.md | 24 ++++++++++++------------ 1 file changed, 12 insertions(+), 12 deletions(-) diff --git a/ja/recurrent-neural-networks.md b/ja/recurrent-neural-networks.md index dbf64718f..b236d8727 100644 --- a/ja/recurrent-neural-networks.md +++ b/ja/recurrent-neural-networks.md @@ -200,7 +200,7 @@ **29. clipped** -⟶クリップド +⟶clipped
@@ -545,14 +545,14 @@ Given the symmetry that e and θ play in this model, the final word embedding e( **77. Remark: the parameter α can be seen as a softener, and its value is usually between 0.5 and 1.** -⟶注意:パラメーターαは緩衝パラメーターと見なされ、その値は通常0.5から1の間です。 +⟶注:パラメータαは緩衝パラメータと見なされ、その値は通常0.5から1の間です。
**78. Error analysis ― When obtaining a predicted translation ˆy that is bad, one can wonder why we did not get a good translation y∗ by performing the following error analysis:** -⟶エラー分析 - 予測ˆyの翻訳が誤りである場合、その文の後に続く誤り分析を実行することで訳文y*がなぜ不正解であるかを理解することが可能です。 +⟶エラー分析 - 予測されたˆyの翻訳が良くない場合、以下のようなエラー分析を実行することで、なぜy∗のような良い翻訳を得られなかったのか考えることが可能です。
@@ -566,28 +566,28 @@ Given the symmetry that e and θ play in this model, the final word embedding e( **80. [Beam search faulty, RNN faulty, Increase beam width, Try different architecture, Regularize, Get more data]** -⟶[ビーム検索の誤り、RNNの誤り、ビーム幅の拡大、さまざまなアーキテクチャを試す、正規化、データをさらに取得] +⟶[ビーム検索の誤り、RNNの誤り、ビーム幅の拡大、さまざまなアーキテクチャを試す、正則化、データをさらに取得]
**81. Bleu score ― The bilingual evaluation understudy (bleu) score quantifies how good a machine translation is by computing a similarity score based on n-gram precision. It is defined as follows:** -⟶Bleuスコア - バイリンガル正確性の代替評価(bleu)スコアは、n-gramの精度に基づき類似性スコアを計算することで、機械翻訳がどれほど優れているかを定量化します。以下のように定義されています。 +⟶Bleuスコア - Bleu(Bilingual evaluation understudy)スコアは、n-gramの精度に基づき類似性スコアを計算することで、機械翻訳がどれほど優れているかを定量化します。以下のように定義されています。
**82. where pn is the bleu score on n-gram only defined as follows:** -⟶ここで、pnは、唯一定義されたn-gramでのbleuスコアです。定義は下記のようになります。 +⟶ここで、pnはn-gramでのbleuスコアで下記のようにだけ定義されています。
**83. Remark: a brevity penalty may be applied to short predicted translations to prevent an artificially inflated bleu score.** -⟶注:人為的に水増しされたブルースコアを防ぐために、短い翻訳評価には簡潔なペナルティが適用される場合があります。 +⟶注:人為的に水増しされたブルースコアを防ぐために、短い翻訳評価には簡潔さへのペナルティが適用される場合があります。
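Editor's aside, outside the diff above: a simplified sentence-level bleu with clipped n-gram precision and a brevity penalty, in the spirit of entries 81-83; real evaluations normally rely on an established implementation rather than a sketch like this.

```python
# Editorial illustration only; simplified single-reference BLEU up to bigrams.
from collections import Counter
import math

def ngram_counts(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def modified_precision(candidate, reference, n):
    cand, ref = ngram_counts(candidate, n), ngram_counts(reference, n)
    overlap = sum(min(c, ref[g]) for g, c in cand.items())   # clipped counts
    return overlap / max(sum(cand.values()), 1)

def bleu(candidate, reference, N=2):
    precisions = [modified_precision(candidate, reference, n) for n in range(1, N + 1)]
    if min(precisions) == 0:
        return 0.0
    bp = min(1.0, math.exp(1 - len(reference) / len(candidate)))   # brevity penalty
    return bp * math.exp(sum(math.log(p) for p in precisions) / N)

print(bleu("the cat sat on the mat".split(), "the cat is on the mat".split()))   # ~0.71
```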
@@ -601,21 +601,21 @@ Given the symmetry that e and θ play in this model, the final word embedding e( **85. Attention model ― This model allows an RNN to pay attention to specific parts of the input that is considered as being important, which improves the performance of the resulting model in practice. By noting α the amount of attention that the output y should pay to the activation a and c the context at time t, we have:** -⟶アテンションモデル - このモデルはRNNが重要であると考えられる特定の入力部分に注目することで、モデルの実際の性能結果を向上させます。時点tにおける出力yが、活性化関数aおよびコンテキストc に注目するとき、αはアテンション量と定義されます。式は次のようになります。 +⟶アテンションモデル - このモデルを使用するとRNNは重要であると考えられる入力の特定部分に注目することができ、得られるモデルの性能が実際に向上します。時刻tにおいて、出力yが活性化関数aとコンテキストcとに払うべき注意量をαと表記すると次のようになります。
**86. with** -⟶ウェイト +⟶および
**87. Remark: the attention scores are commonly used in image captioning and machine translation.** -⟶注:アテンションスコアは、一般的に画像のキャプション作成および機械翻訳で使用されています。* +⟶注:アテンションスコアは、一般的に画像のキャプション作成および機械翻訳で使用されています。
@@ -629,14 +629,14 @@ Given the symmetry that e and θ play in this model, the final word embedding e( **89. Attention weight ― The amount of attention that the output y should pay to the activation a is given by α computed as follows:** -⟶アテンションの重み - 出力yが活性化関数aで表現されるアテンションのウェイト量αは、次のように計算されます。 +⟶アテンションの重み - 出力yが活性化関数aに払うべき注意量αは次のように計算されます。
**90. Remark: computation complexity is quadratic with respect to Tx.** -⟶注意:この計算の複雑さはTxの2次関数です。 +⟶注:この計算の複雑さはTxに関して2次です。
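Editor's aside, not part of the diff above: the attention weights alpha and the context vector of entries 85-90, written as a softmax over toy energies with NumPy.

```python
# Editorial illustration only; the activations and energies are random stand-ins.
import numpy as np

np.random.seed(0)
Tx, n_a = 4, 3
a = np.random.randn(Tx, n_a)   # activations a<t'> over the input positions
e = np.random.randn(Tx)        # attention energies e<t,t'> for one output step t

alpha = np.exp(e) / np.sum(np.exp(e))   # alpha<t,t'> = exp(e) / sum exp(e)
c = alpha @ a                           # context: attention-weighted sum of activations
print(alpha.sum(), c.shape)             # ~1.0 and (3,)
```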
From 94d7297f8eda7507f58be95fa7a8d1f45cd249c5 Mon Sep 17 00:00:00 2001 From: Robert Altena Date: Fri, 30 Aug 2019 17:10:13 +0900 Subject: [PATCH 362/531] Apply suggestions from code review Co-Authored-By: Kamuela Lau <33002774+Kamulau@users.noreply.github.com> --- ja/refresher-linear-algebra.md | 82 +++++++++++++++++----------------- 1 file changed, 41 insertions(+), 41 deletions(-) diff --git a/ja/refresher-linear-algebra.md b/ja/refresher-linear-algebra.md index e72a56fb4..4dcf78e4b 100644 --- a/ja/refresher-linear-algebra.md +++ b/ja/refresher-linear-algebra.md @@ -1,13 +1,13 @@ **1. Linear Algebra and Calculus refresher** ⟶ -線形代数と微積分回顧 +線形代数と微積分の復習
**2. General notations** ⟶ -一般的表記 +一般表記
**3. Definitions** @@ -19,49 +19,49 @@ **4. Vector ― We note x∈Rn a vector with n entries, where xi∈R is the ith entry:** ⟶ -ベクター - x∈Rnがn個のエントリを持つベクトルです。ここで、xi∈Rはi番目のエントリです。 +ベクトル - x∈Rn は n個の要素を持つベクトルを表し、xi∈Rはi番目の要素を表します。
**5. Matrix ― We note A∈Rm×n a matrix with m rows and n columns, where Ai,j∈R is the entry located in the ith row and jth column:** ⟶ -行列 - A∈Rm×nがm行n列の行列です。 Ai、j∈Rは、i行j列目にあるエントリです。 +行列 - m行n列の行列をA∈Rm×nと表記し、Ai、j∈Rは i行目のj列目の要素を指します。
**6. Remark: the vector x defined above can be viewed as a n×1 matrix and is more particularly called a column-vector.** ⟶ -備考:上で定義されたベクトルxはn×1行列と見なすことができます。 それは列ベクトルと呼ばれます。 +備考:上記で定義されたベクトルxはn×1の行列と見なすことができ、列ベクトルと呼ばれます。
**7. Main matrices** ⟶ -主行列 +主な行列の種類
**8. Identity matrix ― The identity matrix I∈Rn×n is a square matrix with ones in its diagonal and zero everywhere else:** ⟶ -単位行列 - 単位行列I∈Rn×nは、対角に1、それ以外ではゼロの正方行列です。 +単位行列 - 単位行列I∈Rn×nは、対角成分に 1 が並び、他は全て 0 となる正方行列です。
**9. Remark: for all matrices A∈Rn×n, we have A×I=I×A=A.** ⟶ -備考:すべての行列A∈Rn×nに対して、A×I = I×A = Aとなる。 +備考:すべての行列A∈Rn×nに対して、A×I = I×A = Aとなります。
**10. Diagonal matrix ― A diagonal matrix D∈Rn×n is a square matrix with nonzero values in its diagonal and zero everywhere else:** ⟶ -対角行列 - 対角行列D∈Rn×nは、対角にゼロ以外の値があり、それ以外はゼロである正方行列です。 +対角行列 - 対角行列D∈Rn×nは、対角成分の値がゼロ以外で、それ以外はゼロである正方行列です。
**11. Remark: we also note D as diag(d1,...,dn).** ⟶ -備考:Dをdiag(d 1、...、d n)と呼ばれます。 +備考:Dをdiag(d 1、...、d n)とも表記します。
**12. Matrix operations** @@ -79,43 +79,43 @@ **14. Vector-vector ― There are two types of vector-vector products:** ⟶ -ベクトル-ベクトル - ベクトル-ベクトル積には2つのタイプがあります。 +ベクトル-ベクトル - ベクトル-ベクトル積には2種類あります。
**15. inner product: for x,y∈Rn, we have:** ⟶ -内積: x、y∈Rnについては、 +内積: x、y∈Rnに対して、内積の定義は下記の通りです:
**16. outer product: for x∈Rm,y∈Rn, we have:** ⟶ -外積: x∈Rm,y∈Rnについては、 +外積: x∈Rm,y∈Rnに対して、外積の定義は下記の通りです:
**17. Matrix-vector ― The product of matrix A∈Rm×n and vector x∈Rn is a vector of size Rn, such that:** ⟶ -行列-ベクトル - 行列A∈Rm×nとベクトルx∈Rnの積はサイズRnのベクトルで、次のようになります。 +行列-ベクトル - 行列A∈Rm×nとベクトルx∈Rnの積は以下の条件を満たすようなサイズRnのベクトルです。
**18. where aTr,i are the vector rows and ac,j are the vector columns of A, and xi are the entries of x.** ⟶ -ここで、aTr、iはAのベクトル行、ac、jはAのベクトル列です。 xiはxのエントリです。 +上記 aTr、iはAの行ベクトルで、ac、jはAの列ベクトルです。 xiはxの要素です。
**19. Matrix-matrix ― The product of matrices A∈Rm×n and B∈Rn×p is a matrix of size Rn×p, such that:** ⟶ -行列-行列 - 行列A∈Rm×nとB∈Rn×pの積は次のようにサイズRm×pの行列です。 (There is a typo in the original: Rn×p) +行列-行列 - 行列A∈Rm×nとB∈Rn×pの積は以下の条件を満たすようなサイズRm×pの行列です。 (There is a typo in the original: Rn×p)
**20. where aTr,i,bTr,i are the vector rows and ac,j,bc,j are the vector columns of A and B respectively** ⟶ -aTr、i、bTr、iはベクトル行。 ac、j、bc、jはそれぞれAとBのベクトル列です。 +aTr,i、bTr,iはAとBの行ベクトルで ac,j、bc,jはAとBの列ベクトルです。
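Editor's aside, outside the diff above: the products of entries 15-20 checked numerically with NumPy.

```python
# Editorial illustration only.
import numpy as np

x, y = np.array([1.0, 2.0, 3.0]), np.array([4.0, 5.0, 6.0])
A = np.arange(6.0).reshape(2, 3)    # A in R^{2x3}
B = np.arange(12.0).reshape(3, 4)   # B in R^{3x4}

print(np.inner(x, y))    # inner product x^T y -> scalar (32.0)
print(np.outer(x, y))    # outer product x y^T -> 3x3 matrix
print(A @ x)             # matrix-vector product -> vector in R^2
print((A @ B).shape)     # matrix-matrix product -> (2, 4)
```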
**21. Other operations** @@ -127,7 +127,7 @@ aTr、i、bTr、iはベクトル行。 ac、j、bc、jはそれぞれAとBの **22. Transpose ― The transpose of a matrix A∈Rm×n, noted AT, is such that its entries are flipped:** ⟶ -転置 ― A∈Rm×nの転置行列はATと示される。 Aの行列要素が交換されます。 +転置 ― A∈Rm×nの転置行列はATと表記し、Aの行列要素が交換した行列です。
**23. Remark: for matrices A,B, we have (AB)T=BTAT** @@ -139,19 +139,19 @@ aTr、i、bTr、iはベクトル行。 ac、j、bc、jはそれぞれAとBの **24. Inverse ― The inverse of an invertible square matrix A is noted A−1 and is the only matrix such that:** ⟶ -逆行列 ― 可逆正方行列Aの逆行列はA − 1と表される。 以下を満たす唯一の行列です。 +逆行列 ― 可逆正方行列Aの逆行列はA − 1と表記し、 以下の条件を満たす唯一の行列です。
**25. Remark: not all square matrices are invertible. Also, for matrices A,B, we have (AB)−1=B−1A−1** ⟶ -備考: すべての正方行列が可逆的なわけではありません。 行列A、Bについては、(AB)−1=B−1A−1 +備考: すべての正方行列が可逆とは限りません。 行列A、Bについては、(AB)−1=B−1A−1
**26. Trace ― The trace of a square matrix A, noted tr(A), is the sum of its diagonal entries:** ⟶ -跡 ― 正方行列Aの跡は、その対角要素の合計です。 tr(A)と表される。 +跡 - 正方行列Aの跡は、tr(A)と表記し、その対角成分の要素の和です。
**27. Remark: for matrices A,B, we have tr(AT)=tr(A) and tr(AB)=tr(BA)** @@ -163,7 +163,7 @@ aTr、i、bTr、iはベクトル行。 ac、j、bc、jはそれぞれAとBの **28. Determinant ― The determinant of a square matrix A∈Rn×n, noted |A| or det(A) is expressed recursively in terms of A∖i,∖j, which is the matrix A without its ith row and jth column, as follows:** ⟶ -行列式 ― 行列式は|A| または det(A) と表される。 正方行列 A∈Rn×n の行列式はAijによって再帰的に表現されます。 +行列式 ― 正方行列A∈Rn×nの行列式は|A| または det(A) と表記し、以下のように i番目の行とj番目の列を抜いたA, Aijによって再帰的に表現されます。 それはi番目の行とj番目の列のない行列Aです。 次のように:
@@ -188,7 +188,7 @@ aTr、i、bTr、iはベクトル行。 ac、j、bc、jはそれぞれAとBの **32. Symmetric decomposition ― A given matrix A can be expressed in terms of its symmetric and antisymmetric parts as follows:** ⟶ -対称分解 ― 行列Aは次のように対称および反対称部分で表現できます。 +対称分解 ― 行列Aは次のように対称および反対称的な部分で表現できます。
**33. [Symmetric, Antisymmetric]** @@ -200,26 +200,26 @@ aTr、i、bTr、iはベクトル行。 ac、j、bc、jはそれぞれAとBの **34. Norm ― A norm is a function N:V⟶[0,+∞[ where V is a vector space, and such that for all x,y∈V, we have:** ⟶ -ノルムは関数N:V⟶[0,+∞[ Vはベクトル空間、 すべてのx、y∈Vについて: +ノルムは関数N:V⟶[0,+∞[ Vはすべてのx、y∈Vに対して、以下の条件を満たすようなベクトル空間です。 ]]
**35. N(ax)=|a|N(x) for a scalar** ⟶ -N(ax)=|a|N(x) スカラー用 +スカラー a に対して N(ax)=|a|N(x)
**36. if N(x)=0, then x=0** ⟶ -N(x)= 0の場合、x = 0 +N(x)= 0ならば x = 0
**37. For x∈V, the most commonly used norms are summed up in the table below:** ⟶ -x∈V、一般的に使用されるノルムは、以下の表にまとめられています。 +x∈Vに対して、最も多用されているノルムは、以下の表にまとめられています。
**38. [Norm, Notation, Definition, Use case]** @@ -231,7 +231,7 @@ x∈V、一般的に使用されるノルムは、以下の表にまとめられ **39. Linearly dependence ― A set of vectors is said to be linearly dependent if one of the vectors in the set can be defined as a linear combination of the others.** ⟶ -線形依存 ― ベクトルの集合は、その集合内のベクトルのうちの1つが他のベクトルの線形結合として定義できる場合、線形従属であると言われます。 +線形従属 ― ベクトルの集合に対して、少なくともどれか一つのベクトルを他のベクトルの線形結合として定義できる場合、その集合が線形従属であるといいます。
**40. Remark: if no vector can be written this way, then the vectors are said to be linearly independent** @@ -243,31 +243,31 @@ x∈V、一般的に使用されるノルムは、以下の表にまとめられ **41. Matrix rank ― The rank of a given matrix A is noted rank(A) and is the dimension of the vector space generated by its columns. This is equivalent to the maximum number of linearly independent columns of A.** ⟶ -行列の階数 ― 行列Aの階数をrank(A)と表記します。 それはその列によって生成されたベクトル空間の次元です。これは、Aの線形独立列の最大数に相当します。 +行列の階数 ― 行列Aの階数は rank(A)と表記し、列空間の次元を表します。これは、Aの線形独立の列の最大数に相当します。
**42. Positive semi-definite matrix ― A matrix A∈Rn×n is positive semi-definite (PSD) and is noted A⪰0 if we have:** ⟶ -半正定値行列 ― 以下の式が成り立つとき、行列 A∈Rn×n、 A⪰0 は半正定値(PSD) +半正定値行列 ― 行列 A, A∈Rn×nに対して、以下の式が成り立つならば、 Aを半正定値(PSD)といい、A⪰0と表記します。
**43. Remark: similarly, a matrix A is said to be positive definite, and is noted A≻0, if it is a PSD matrix which satisfies for all non-zero vector x, xTAx>0.** ⟶ -備考: 同様に、行列Aは、正定値行列であると言われ、A≻0、それが全ての非ゼロベクトルを満足するPSD行列である場合と表記される。 +備考: 同様に、全ての非ゼロベクトルx, xTAx>0に対して条件を満たすような行列Aは正定値行列といい、A≻0と表記します。
**44. Eigenvalue, eigenvector ― Given a matrix A∈Rn×n, λ is said to be an eigenvalue of A if there exists a vector z∈Rn∖{0}, called eigenvector, such that we have:** ⟶ -固有値、固有ベクトル ― 与えられた行列A∈Rn×n。以下の式が成り立つとき、もしベクトルz∈Rn∖{0}、固有ベクトルと呼ばれる、が存在する場合ならばλはAの固有値であると言われる: +固有値、固有ベクトル ― 行列 A, A∈Rn×nに対して、以下の条件を満たすようなベクトルz, z∈Rn∖{0}が存在するならば、λは固有値といい、z は固有ベクトルといいます。
**45. Spectral theorem ― Let A∈Rn×n. If A is symmetric, then A is diagonalizable by a real orthogonal matrix U∈Rn×n. By noting Λ=diag(λ1,...,λn), we have:** ⟶ -スペクトル定理 ― A∈Rn×nとする。 Aが対称ならば、Aは実直交行列U∈Rn×nによって対角化可能です。Λ=diag(λ1,...,λn)と書くと、次のようになります。 +スペクトル定理 ― A∈Rn×nとします。 Aが対称ならば、Aは実直交行列U∈Rn×nによって対角化可能です。Λ=diag(λ1,...,λn)と表記すると、次のように表現できます。
**46. diagonal** @@ -279,7 +279,7 @@ x∈V、一般的に使用されるノルムは、以下の表にまとめられ **47. Singular-value decomposition ― For a given matrix A of dimensions m×n, the singular-value decomposition (SVD) is a factorization technique that guarantees the existence of U m×m unitary, Σ m×n diagonal and V n×n unitary matrices, such that:** ⟶ -特異値分解 ― Aをm×nの行列とする。特異値分解(SVD)は、U m×mのユニタリ行列、m m×nの対角行列、およびV n×nのユニタリ行列の存在を保証する因数分解手法である、次のようになります。 +特異値分解 ― Aをm×nの行列とします。特異値分解(SVD)は、ユニタリ行列U m×m、Σ m×nの対角行列、およびユニタリ行列V n×nの存在を保証する因数分解手法で、以下の条件を満たします。
**48. Matrix calculus** @@ -291,37 +291,37 @@ x∈V、一般的に使用されるノルムは、以下の表にまとめられ **49. Gradient ― Let f:Rm×n→R be a function and A∈Rm×n be a matrix. The gradient of f with respect to A is a m×n matrix, noted ∇Af(A), such that:** ⟶ -勾配 ― f:Rm×n→Rを関数とし、A∈Rm×nを行列とする。 Aに対するfの勾配はm×n行列で、∇Af(A)と表記され。次のように: +勾配 ― f:Rm×n→Rを関数とし、A∈Rm×nを行列とします。 Aに対するfの勾配はm×n行列で、∇Af(A)と表記し、次の条件を満たします。
**50. Remark: the gradient of f is only defined when f is a function that returns a scalar.** ⟶ -備考: fの勾配は、fがスカラーを返す関数である場合にのみ定義されます。 +備考: fの勾配は、fがスカラーを返す関数であるときに限り存在します。
**51. Hessian ― Let f:Rn→R be a function and x∈Rn be a vector. The hessian of f with respect to x is a n×n symmetric matrix, noted ∇2xf(x), such that:** ⟶ -ヘッセ行列 ― f:Rn→Rを関数とし、x∈Rnをベクトルとする。 xに関するfのヘッセ行列は、次のように∇2xf(x)と表記されるn×n対称行列です。 +ヘッセ行列 ― f:Rn→Rを関数とし、x∈Rnをベクトルとします。 xに対するfのヘッセ行列は、n×n対称行列で∇2xf(x)と表記し、以下の条件を満たします。
**52. Remark: the hessian of f is only defined when f is a function that returns a scalar** ⟶ -備考: fのヘッセ行列は、fがスカラーを返す関数である場合にのみ定義されます。 +備考: fのヘッセ行列は、fがスカラーを返す関数である場合に限り存在します。
**53. Gradient operations ― For matrices A,B,C, the following gradient properties are worth having in mind:** ⟶ -勾配演算 ― 行列A、B、Cの場合、次の勾配特性があります。 +勾配演算 ― 行列A、B、Cの場合、特に以下の勾配の性質を意識する甲斐があります。
**54. [General notations, Definitions, Main matrices]** ⟶ -[表記, 定義, 主行列] +[表記, 定義, 主な行列の種類]
**55. [Matrix operations, Multiplication, Other operations]** @@ -339,4 +339,4 @@ x∈V、一般的に使用されるノルムは、以下の表にまとめられ **57. [Matrix calculus, Gradient, Hessian, Operations]** ⟶ -[行列計算, 勾配, ヘッセ行列, 演算] \ No newline at end of file +[行列微積分, 勾配, ヘッセ行列, 演算] From 15563a53569994f366fe259e8b611f5477e5b49e Mon Sep 17 00:00:00 2001 From: Yuta Kanzawa Date: Sat, 31 Aug 2019 16:00:35 +0900 Subject: [PATCH 363/531] Update translation based on review by @tuananhhedspibk --- ja/cheatsheet-supervised-learning.md | 22 +++++++++++----------- 1 file changed, 11 insertions(+), 11 deletions(-) diff --git a/ja/cheatsheet-supervised-learning.md b/ja/cheatsheet-supervised-learning.md index ce9f85804..b5d6896fb 100644 --- a/ja/cheatsheet-supervised-learning.md +++ b/ja/cheatsheet-supervised-learning.md @@ -54,7 +54,7 @@ **10. Notations and general concepts** -⟶記法と概念 +⟶記法と全般的な概念
@@ -90,7 +90,7 @@ **16. Gradient descent ― By noting α∈R the learning rate, the update rule for gradient descent is expressed with the learning rate and the cost function J as follows:** -⟶勾配降下法 ― 学習率をα∈Rとし、勾配降下法におけるパラメータの更新は学習率とコスト関数Jを用いて次のように行われる: +⟶勾配降下法 ― 学習率をα∈Rとし、勾配降下法における更新ルールは学習率とコスト関数Jを用いて次のように表される:
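Editor's aside, not part of the diff above: the gradient-descent update rule of entry 16 applied to a least-squares cost on synthetic data; the learning rate and iteration count are arbitrary choices.

```python
# Editorial illustration only; data, learning rate and iterations are toy choices.
import numpy as np

rng = np.random.default_rng(0)
X = np.c_[np.ones(100), rng.normal(size=(100, 1))]   # intercept column + one feature
y = 4.0 + 3.0 * X[:, 1] + 0.1 * rng.normal(size=100)

theta, alpha = np.zeros(2), 0.1
for _ in range(500):
    grad = X.T @ (X @ theta - y) / len(y)   # gradient of J(theta) = ||X theta - y||^2 / (2m)
    theta = theta - alpha * grad            # update rule: theta := theta - alpha * grad J
print(theta)   # close to [4, 3]
```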
@@ -108,13 +108,13 @@ **19. Newton's algorithm ― The Newton's algorithm is a numerical method that finds θ such that ℓ′(θ)=0. Its update rule is as follows:** -⟶ニュートン法 ― ニュートン法とはℓ′(θ)=0となるθを求める数値計算アルゴリズムである。そのパラメータ更新は次のように行われる: +⟶ニュートン法 ― ニュートン法とはℓ′(θ)=0となるθを求める数値法である。その更新ルールは次の通りである:
**20. Remark: the multidimensional generalization, also known as the Newton-Raphson method, has the following update rule:** -⟶備考:高次元正則化またはニュートン-ラフソン法ではパラメータ更新は次のように行われる: +⟶備考:多次元一般化またはニュートン-ラフソン法の更新ルールは次の通りである:
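Editor's aside, outside the diff above: the one-dimensional Newton update of entries 19-20 on a toy concave objective.

```python
# Editorial illustration only; l(theta) = -(theta - 3)^4 is an arbitrary toy objective.
def dl(theta):
    return -4 * (theta - 3) ** 3    # l'(theta)

def d2l(theta):
    return -12 * (theta - 3) ** 2   # l''(theta)

theta = 0.0
for _ in range(30):
    theta = theta - dl(theta) / d2l(theta)   # theta := theta - l'(theta) / l''(theta)
print(theta)   # approaches 3, where l'(theta) = 0
```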
@@ -144,13 +144,13 @@ **25. LMS algorithm ― By noting α the learning rate, the update rule of the Least Mean Squares (LMS) algorithm for a training set of m data points, which is also known as the Widrow-Hoff learning rule, is as follows:** -⟶最小2乗法 ― 学習率をαとすると、m個のデータ点からなる学習データに対する最小2乗法(LMSアルゴリズム)によるパラメータ更新は次のように行われ、これはウィドロウ-ホフの学習規則としても知られている: +⟶最小2乗法 ― 学習率をαとすると、m個のデータ点からなる学習データに対する最小2乗法(LMSアルゴリズム)による更新ルールは、ウィドロウ-ホフの学習規則としても知られており、次の通りである:
**26. Remark: the update rule is a particular case of the gradient ascent.** -⟶備考:この更新は勾配上昇法の特殊な例である。 +⟶備考:この更新ルールは勾配上昇法の特殊な例である。
@@ -324,7 +324,7 @@ **55. A generative model first tries to learn how the data is generated by estimating P(x|y), which we can then use to estimate P(y|x) by using Bayes' rule.** -⟶生成モデルは、P(x|y)を推定することによりデータがどのように生成されるのかを学習しようとする。それはその後ベイズの定理を用いてP(y|x)を推定することに使える。 +⟶生成モデルは、P(x|y)を推定することによりデータがどのように生成されるのかを学習しようとする。それはベイズの定理を用いてP(y|x)を推定するために使える。
@@ -336,7 +336,7 @@ **57. Setting ― The Gaussian Discriminant Analysis assumes that y and x|y=0 and x|y=1 are such that:** -⟶前提 ― ガウシアン判別分析はyとx|y=0とx|y=1は次のようであることを前提とする: +⟶前提条件 ― ガウシアン判別分析はyとx|y=0とx|y=1は次のようであることを前提とする:
@@ -372,7 +372,7 @@ **63. Tree-based and ensemble methods** -⟶ツリーとアンサンブル学習 +⟶決定木とアンサンブル学習
@@ -384,13 +384,13 @@ **65. CART ― Classification and Regression Trees (CART), commonly known as decision trees, can be represented as binary trees. They have the advantage to be very interpretable.** -⟶CART ― 分類・回帰ツリー (CART)は、一般には決定木として知られ、二分木として表される。非常に解釈しやすいという利点がある。 +⟶CART ― 分類・回帰木 (CART)は、一般には決定木として知られ、二分木として表される。非常に解釈しやすいという利点がある。
**66. Random forest ― It is a tree-based technique that uses a high number of decision trees built out of randomly selected sets of features. Contrary to the simple decision tree, it is highly uninterpretable but its generally good performance makes it a popular algorithm.** -⟶ランダムフォレスト ― これはツリーをベースにしたもので、ランダムに選択された特徴量の集合から構築された多数の決定木を用いる。単純な決定木と異なり、非常に解釈しにくいが、一般的に良い性能が出るのでよく使われるアルゴリズムである。 +⟶ランダムフォレスト ― これは決定木をベースにしたもので、ランダムに選択された特徴量の集合から構築された多数の決定木を用いる。単純な決定木と異なり、非常に解釈しにくいが、一般的に良い性能が出るのでよく使われるアルゴリズムである。
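Editor's aside, not part of the diff above: a random forest fit with scikit-learn, assuming the library and its bundled iris data are available.

```python
# Editorial illustration only; iris is just a convenient stand-in dataset.
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Many trees, each grown on a bootstrap sample with a random subset of features.
clf = RandomForestClassifier(n_estimators=100, max_features="sqrt", random_state=0)
clf.fit(X_train, y_train)
print(clf.score(X_test, y_test))
```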
From a1f4089c0f696175ccc846feb23463c17598a98d Mon Sep 17 00:00:00 2001 From: qunaieer Date: Mon, 2 Sep 2019 09:00:01 +0300 Subject: [PATCH 364/531] Update cheatsheet-supervised-learning.md The translation of this file is complete. I have finished from it according to the agreed terminologies on this webpage: https://www.nmthgiat.com/terminology/ --- ar/cheatsheet-supervised-learning.md | 142 +++++++++++++-------------- 1 file changed, 71 insertions(+), 71 deletions(-) diff --git a/ar/cheatsheet-supervised-learning.md b/ar/cheatsheet-supervised-learning.md index 2967cb7ca..1d387b05e 100644 --- a/ar/cheatsheet-supervised-learning.md +++ b/ar/cheatsheet-supervised-learning.md @@ -1,18 +1,18 @@ **1. Supervised Learning cheatsheet** -مرجع سريع للتعلّم تحت الإشراف +مرجع سريع للتعلّم المُوَجَّه
**2. Introduction to Supervised Learning** -مقدمة للتعلّم تحت الإشراف +مقدمة للتعلّم المُوَجَّه
**3. Given a set of data points {x(1),...,x(m)} associated to a set of outcomes {y(1),...,y(m)}, we want to build a classifier that learns how to predict y from x.** -إذا كان لدينا مجموعة من نقاط البيانات {x(1),...,x(m)} مرتبطة بمجموعة مخرجات {y(1),...,y(m)}، نريد أن نبني نموذج تصنيف يتعلم كيف يتوقع y من x. +إذا كان لدينا مجموعة من نقاط البيانات {x(1),...,x(m)} مرتبطة بمجموعة مخرجات {y(1),...,y(m)}، نريد أن نبني مُصَنِّف يتعلم كيف يتوقع y من x.
@@ -25,13 +25,13 @@ **5. [Regression, Classifier, Outcome, Examples]** -[الارتباط (Regression)، التصنيف (Classification)، المُخرَج، أمثلة] +[الانحدار (Regression)، التصنيف (Classification)، المُخرَج، أمثلة]
**6. [Continuous, Class, Linear regression, Logistic regression, SVM, Naive Bayes]** -[مستمر، فئة، ارتباط خطّي (Linear regression)، ارتباط لوجستي (Logistic regression)، SVM، بايز البسيط (Naive Bayes)] +[مستمر، صنف، انحدار خطّي (Linear regression)، انحدار لوجستي (Logistic regression)، آلة المتجهات الداعمة (SVM)، بايز البسيط (Naive Bayes)]
@@ -43,19 +43,19 @@ **8. [Discriminative model, Generative model, Goal, What's learned, Illustration, Examples]** -[النماذج التمييزية (Discriminative)، النماذج التوليدية (Generative)، الهدف، ماذا تتعلم، توضيح، أمثلة] +[نموذج تمييزي (discriminative)، نموذج توليدي (Generative)، الهدف، ماذا يتعلم، توضيح، أمثلة]
-**9. [Directly estimate P(y|x), Estimate P(x|y) to then deduce P(y|x), Decision boundary, Probability distributions of the data, Regressions, SVMs, GDA, Naive Bayes]** +**9. [Directly estimate P(y|x), Estimate P(x|y) to then deduce P(y|x), Decision boundary, Probability distributions of the data, Regressions, آلة المتجهات الداعمة (SVM), GDA, Naive Bayes]** -[التقدير المباشر لـ P(y|x)، تقدير P(x|y) ثم استنتاج P(y|x)، حدود القرار، التوزيع الاحتمالي للبيانات، الارتباط (Regression)، SVM، GDA، بايز البسيط (Naive Bayes)] +[التقدير المباشر لـ P(y|x)، تقدير P(x|y) ثم استنتاج P(y|x)، حدود القرار، التوزيع الاحتمالي للبيانات، الانحدار (Regression)، آلة المتجهات الداعمة (SVM)، GDA، بايز البسيط (Naive Bayes)]
**10. Notations and general concepts** -تعريفات ومفاهيم أساسية +الرموز ومفاهيم أساسية
@@ -67,43 +67,43 @@ **12. Loss function ― A loss function is a function L:(z,y)∈R×Y⟼L(z,y)∈R that takes as inputs the predicted value z corresponding to the real data value y and outputs how different they are. The common loss functions are summed up in the table below:** -دالة الفرق (Loss function) - دالة الفرق هي الدالة L:(z,y)∈R×Y⟼L(z,y)∈R التي تأخذ كمدخلات القيمة المتوقعة z والقيمة الحقيقية y وتعطينا الفرق بينهما. الجدول التالي يحتوي على بعض دوال الفرق المستخدمة بكثرة: +دالة الخسارة (Loss function) - دالة الخسارة هي الدالة L:(z,y)∈R×Y⟼L(z,y)∈R التي تأخذ كمدخلات القيمة المتوقعة z والقيمة الحقيقية y وتعطينا الفرق بينهما. الجدول التالي يحتوي على بعض دوال الخسارة الشائعة:
**13. [Least squared error, Logistic loss, Hinge loss, Cross-entropy]** -[مربع الخطأ الأصغر (Least squared error)، الفرق اللوجستي (Logistic loss)، الفرق المفصلي (Hinge loss)، Cross-entropy] +[خطأ أصغر تربيع (Least squared error)، خسارة لوجستية (Logistic loss)، خسارة مفصلية (Hinge loss)، الانتروبيا التقاطعية (Cross-entropy)]
**14. [Linear regression, Logistic regression, SVM, Neural Network]** -[الارتباط الخطّي (Linear regression)، الارتباط اللوجستي (Logistic regression)، SVM، الشبكات العصبية (Neural Network)] +[الانحدار الخطّي (Linear regression)، الانحدار اللوجستي (Logistic regression)، آلة المتجهات الداعمة (SVM)، الشبكات العصبية (Neural Network)]
**15. Cost function ― The cost function J is commonly used to assess the performance of a model, and is defined with the loss function L as follows:** -دالة التكلفة (Cost function) - دالة التكلفة J تستخدم عادة لتقييم أداء نموذج ما، ويتم تعريفها مع دالة الفرق L كالتالي: +دالة التكلفة (Cost function) - دالة التكلفة J تستخدم عادة لتقييم أداء نموذج ما، ويتم تعريفها مع دالة الخسارة L كالتالي:
**16. Gradient descent ― By noting α∈R the learning rate, the update rule for gradient descent is expressed with the learning rate and the cost function J as follows:** -الهبوط التفاضلي (Gradient descent) - لنعرّف معدل التعلّم α∈R، يمكن تعريف القانون الذي يتم تحديث خوارزمية الهبوط التفاضلي من خلاله باستخدام معدل التعلّم ودالة التكلفة J كالتالي: +النزول الاشتقاقي (Gradient descent) - لنعرّف معدل التعلّم α∈R، يمكن تعريف القانون الذي يتم تحديث خوارزمية النزول الاشتقاقي من خلاله باستخدام معدل التعلّم ودالة التكلفة J كالتالي:
**17. Remark: Stochastic gradient descent (SGD) is updating the parameter based on each training example, and batch gradient descent is on a batch of training examples.** -ملاحظة: في الهبوط التفاضلي العشوائي (Stochastic gradient descent (SGD)) يتم تحديث المُعاملات (parameters) بناءاً على كل نقطة تدريب على حدة، بينما في الهبوط التفاضلي الحُزَمي (batch gradient descent) يتم تحديثها باستخدام حُزَم من نقاط التدريب. +ملاحظة: في النزول الاشتقاقي العشوائي (Stochastic gradient descent (SGD)) يتم تحديث المُدخلات (parameters) بناءاً على كل عينة تدريب على حدة، بينما في النزول الاشتقاقي الحُزَمي (batch gradient descent) يتم تحديثها باستخدام حُزَم من عينات التدريب.
**18. Likelihood ― The likelihood of a model L(θ) given parameters θ is used to find the optimal parameters θ through maximizing the likelihood. In practice, we use the log-likelihood ℓ(θ)=log(L(θ)) which is easier to optimize. We have:** -الأرجحية (Likelihood) - تستخدم أرجحية النموذج L(θ)، حيث أن θ هي المعاملات، للبحث عن أفضل المُعاملات θ عن طريق تعظيم (maximizing) الأرجحية. عملياً يتم استخدام الأرجحية اللوغاريثمية (log-likelihood) ℓ(θ)=log(L(θ)) حيث أنها أسهل في التحسين (optimize). فيكون لدينا: +الأرجحية (Likelihood) - تستخدم أرجحية النموذج L(θ)، حيث أن θ هي المُدخلات، للبحث عن المُدخلات θ الأحسن عن طريق تعظيم (maximizing) الأرجحية. عملياً يتم استخدام الأرجحية اللوغاريثمية (log-likelihood) ℓ(θ)=log(L(θ)) حيث أنها أسهل في التحسين (optimize). فيكون لدينا:
@@ -127,7 +127,7 @@ **22. Linear regression** -الارتباط الخطّي (Linear regression) +الانحدار الخطّي (Linear regression)
@@ -139,31 +139,31 @@ **24. Normal equations ― By noting X the matrix design, the value of θ that minimizes the cost function is a closed-form solution such that:** -معادلة Normal - إذا كان لدينا المصفوفة X، القيمة θ التي تقلل من دالة التكلفة يمكن حلها رياضياً بشكل مغلق (closed-form) عن طريق: +المعادلة الطبيعية/الناظمية (Normal) - إذا كان لدينا المصفوفة X، القيمة θ التي تقلل من دالة التكلفة يمكن حلها رياضياً بشكل مغلق (closed-form) عن طريق:
**25. LMS algorithm ― By noting α the learning rate, the update rule of the Least Mean Squares (LMS) algorithm for a training set of m data points, which is also known as the Widrow-Hoff learning rule, is as follows:** -خوارزمية LMS - إذا كان لدينا معدل التعلّم α، فإن قانون التحديث لخوارزمية معدل المربعات الأصغر (Least Mean Squares (LMS)) لمجموعة بيانات من m عينة، ويطلق عليه قانون تعلم ويدرو-هوف (Widrow-Hoff)، كالتالي: +خوارزمية أصغر معدل تربيع LMS - إذا كان لدينا معدل التعلّم α، فإن قانون التحديث لخوارزمية أصغر معدل تربيع (Least Mean Squares (LMS)) لمجموعة بيانات من m عينة، ويطلق عليه قانون تعلم ويدرو-هوف (Widrow-Hoff)، كالتالي:
**26. Remark: the update rule is a particular case of the gradient ascent.** -ملاحظة: قانون التحديث هذا يعتبر حالة خاصة من الهبوط التفاضلي (Gradient descent). +ملاحظة: قانون التحديث هذا يعتبر حالة خاصة من النزول الاشتقاقي (Gradient descent).
**27. LWR ― Locally Weighted Regression, also known as LWR, is a variant of linear regression that weights each training example in its cost function by w(i)(x), which is defined with parameter τ∈R as:** -LWR - الارتباط الموزون محلّياً (Locally Weighted Regression)، ويعرف بـ LWR، هو نوع من الارتباط الخطي يَزِن كل عينة تدريب أثناء حساب دالة التكلفة باستخدام w(i)(x)، التي يمكن تعريفها باستخدام المعامل τ∈R كالتالي: +الانحدار الموزون محليّاً (LWR) - الانحدار الموزون محليّاً (Locally Weighted Regression)، ويعرف بـ LWR، هو نوع من الانحدار الخطي يَزِن كل عينة تدريب أثناء حساب دالة التكلفة باستخدام w(i)(x)، التي يمكن تعريفها باستخدام المُدخل (parameter) τ∈R كالتالي:
**28. Classification and logistic regression** -التصنيف والارتباط اللوجستي +التصنيف والانحدار اللوجستي
@@ -175,79 +175,79 @@ LWR - الارتباط الموزون محلّياً (Locally Weighted Regressio **30. Logistic regression ― We assume here that y|x;θ∼Bernoulli(ϕ). We have the following form:** -الارتباط اللوجستي (Logistic regression) - نفترض هنا أن y|x;θ∼Bernoulli(ϕ). فيكون لدينا: +الانحدار اللوجستي (Logistic regression) - نفترض هنا أن y|x;θ∼Bernoulli(ϕ). فيكون لدينا:
**31. Remark: there is no closed form solution for the case of logistic regressions.** -ملاحظة: ليس هناك حل رياضي مغلق للارتباط اللوجستي. +ملاحظة: ليس هناك حل رياضي مغلق للانحدار اللوجستي.
**32. Softmax regression ― A softmax regression, also called a multiclass logistic regression, is used to generalize logistic regression when there are more than 2 outcome classes. By convention, we set θK=0, which makes the Bernoulli parameter ϕi of each class i equal to:** -Softmax regression - ويطلق عليه الارتباط اللوجستي متعدد الفئات (multiclass logistic regression)، يستخدم لتعميم الارتباط اللوجستي إذا كان لدينا أكثر من فئتين. في العرف يتم تعيين θK=0، بحيث تجعل معامل بيرنوللي (Bernoulli) ϕi لكل فئة i يساوي: +انحدار سوفت ماكس (Softmax) - ويطلق عليه الانحدار اللوجستي متعدد الأصناف (multiclass logistic regression)، يستخدم لتعميم الانحدار اللوجستي إذا كان لدينا أكثر من صنفين. في العرف يتم تعيين θK=0، بحيث تجعل مُدخل بيرنوللي (Bernoulli) ϕi لكل فئة i يساوي:
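Editor's aside, outside the diff above: the softmax mapping behind entry 32; the max-shift is a standard numerical-stability detail, not something stated in the cheatsheet.

```python
# Editorial illustration only; scores are toy values of theta_i^T x (theta_K fixed to 0).
import numpy as np

def softmax(scores):
    z = scores - scores.max()   # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

print(softmax(np.array([2.0, 1.0, 0.0])))   # class probabilities phi_i, summing to 1
```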
**33. Generalized Linear Models** -النماذج الخطية العامة (Generalized Linear Models) +النماذج الخطية العامة (Generalized Linear Models - GLM)
**34. Exponential family ― A class of distributions is said to be in the exponential family if it can be written in terms of a natural parameter, also called the canonical parameter or link function, η, a sufficient statistic T(y) and a log-partition function a(η) as follows:** -العائلة الأسيّة (Exponential family) - يطلق على صنف من التوزيعات (distributions) بأنها تنتمي إلى العائلة الأسيّة إذا كان يمكن كتابتها ########### +العائلة الأُسيّة (Exponential family) - يطلق على صنف من التوزيعات (distributions) بأنها تنتمي إلى العائلة الأسيّة إذا كان يمكن كتابتها بواسطة مُدخل قانوني (canonical parameter) η، إحصاء كافٍ (sufficient statistic) T(y)، ودالة تجزئة لوغاريثمية a(η)، كالتالي:
**35. Remark: we will often have T(y)=y. Also, exp(−a(η)) can be seen as a normalization parameter that will make sure that the probabilities sum to one.** -ملاحظة: كثيراً ما سيكون T(y)=y. كذلك فإن exp(−a(η)) يمكن أن تفسر كمُعامل تسوية (normalization) للتأكد من أن الاحتمالات يكون حاصل جمعها واحد. +ملاحظة: كثيراً ما سيكون T(y)=y. كذلك فإن exp(−a(η)) يمكن أن تفسر كمُدخل تسوية (normalization) للتأكد من أن الاحتمالات يكون حاصل جمعها يساوي واحد.
**36. Here are the most common exponential distributions summed up in the following table:** -أكثر التوزيعات الأسيّة استخداماً تم تلخيصها في الجدول التالي: +تم تلخيص أكثر التوزيعات الأسيّة استخداماً في الجدول التالي:
**37. [Distribution, Bernoulli, Gaussian, Poisson, Geometric]** -[التوزيع، بيرنوللي (Bernoulli)، Gaussian، Poisson، Geometric] +[التوزيع، بِرنوللي (Bernoulli)، جاوسي (Gaussian)، بواسون (Poisson)، هندسي (Geometric)]
**38. Assumptions of GLMs ― Generalized Linear Models (GLM) aim at predicting a random variable y as a function fo x∈Rn+1 and rely on the following 3 assumptions:** -افتراضات GLMs - تهدف النماذج الخطيّة العامة (GLM) إلى توقع القيمة العشوائية y كدالة لـ x∈Rn+1، وتستند إلى ثلاثة افتراضات: +افتراضات GLMs - تهدف النماذج الخطيّة العامة (GLM) إلى توقع المتغير العشوائي y كدالة لـ x∈Rn+1، وتستند إلى ثلاثة افتراضات:
**39. Remark: ordinary least squares and logistic regression are special cases of generalized linear models.** -ملاحظة: المربعات الصغرى (least squares) الاعتيادية و الارتباط اللوجستي يعتبران من الحالات الخاصة للنماذج الخطيّة العامة. +ملاحظة: أصغر تربيع (least squares) الاعتيادي و الانحدار اللوجستي يعتبران من الحالات الخاصة للنماذج الخطيّة العامة.
**40. Support Vector Machines** -Support Vector Machines +آلة المتجهات الداعمة (Support Vector Machines)
**41: The goal of support vector machines is to find the line that maximizes the minimum distance to the line.** -تهدف Support Vector Machines إلى العثور على الخط الذي يعظم أصغر مسافة إلى الخط: +تهدف آلة المتجهات الداعمة (SVM) إلى العثور على الخط الذي يعظم أصغر مسافة إليه:
**42: Optimal margin classifier ― The optimal margin classifier h is such that:** -خوارزمية تصنيف الهامش الأمثل (Optimal margin classifier) - تعرَّف خوارزمية تصنيف الهامش الأمثل h كالتالي: +مُصنِّف الهامش الأحسن (Optimal margin classifier) - يعرَّف مُصنِّف الهامش الأحسن h كالتالي:
@@ -259,7 +259,7 @@ Support Vector Machines **44. such that** -بحيث +بحيث أن
@@ -277,31 +277,31 @@ Support Vector Machines **47. Hinge loss ― The hinge loss is used in the setting of SVMs and is defined as follows:** -الفرق المفصلي (Hinge loss) - يستخدم الفرق المفصلي في حل SVM ويعرف على النحو التالي: +الخسارة المفصلية (Hinge loss) - تستخدم الخسارة المفصلية في حل SVM ويعرف على النحو التالي:
**48. Kernel ― Given a feature mapping ϕ, we define the kernel K to be defined as:** -النواة (Kernel) - إذا كان لدينا دالة تحويل الخصائص (features) ϕ، يمكننا تعريف النواة K كالتالي: +النواة (Kernel) - إذا كان لدينا دالة ربط الخصائص (features) ϕ، يمكننا تعريف النواة K كالتالي:
**49. In practice, the kernel K defined by K(x,z)=exp(−||x−z||22σ2) is called the Gaussian kernel and is commonly used.** -عملياً تُعَرَّف الدالة K عن طريق المعادلة K(x,z)=exp(−||x−z||22σ2)، ويطلق عليها النواة الجاوسية (Gaussian kernel)، وهي من الأكثر استخداماً. +في التطبيق، يمكن أن تُعَرَّف الدالة K عن طريق المعادلة K(x,z)=exp(−||x−z||22σ2)، ويطلق عليها النواة الجاوسية (Gaussian kernel)، وهي تستخدم بكثرة.
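Editor's aside, not part of the diff above: the Gaussian kernel of entry 49 as a small NumPy function.

```python
# Editorial illustration only; sigma = 1 is an arbitrary choice.
import numpy as np

def gaussian_kernel(x, z, sigma=1.0):
    return np.exp(-np.linalg.norm(x - z) ** 2 / (2 * sigma ** 2))

x, z = np.array([1.0, 2.0]), np.array([2.0, 0.0])
print(gaussian_kernel(x, z))   # only K(x, z) is ever needed, never phi(x) itself
```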
**50. [Non-linear separability, Use of a kernel mapping, Decision boundary in the original space]** -[فصل غير خطي، استخدام النواة للتحويل، خط القرار في الفضاء الأصلي] +[قابلية الفصل غير الخطي، استخدام ربط النواة، حد القرار في الفضاء الأصلي]
**51. Remark: we say that we use the "kernel trick" to compute the cost function using the kernel because we actually don't need to know the explicit mapping ϕ, which is often very complicated. Instead, only the values K(x,z) are needed.** -ملاحظة: نقول أننا نستخدم "حيلة النواة" لحساب دالة التكلفة عند استخدام النواة لأننا في الحقيقة لا نحتاج أن نعرف التحويل الصريح ϕ، الذي يكون في الغالب شديد التعقيد. ولكن، نحتاج أن فقط أن نحسب القيم K(x,z). +ملاحظة: نقول أننا نستخدم "حيلة النواة" (kernel trick) لحساب دالة التكلفة عند استخدام النواة لأننا في الحقيقة لا نحتاج أن نعرف التحويل الصريح ϕ، الذي يكون في الغالب شديد التعقيد. ولكن، نحتاج أن فقط أن نحسب القيم K(x,z).
@@ -313,7 +313,7 @@ Support Vector Machines **53. Remark: the coefficients βi are called the Lagrange multipliers.** -ملاحظة: المعاملات (coefficients) βi يطلق عليها مضروبات لاغرانج (Lagrange multipliers). +ملاحظة: المعامِلات (coefficients) βi يطلق عليها مضروبات لاغرانج (Lagrange multipliers).
@@ -343,7 +343,7 @@ Support Vector Machines **58. Estimation ― The following table sums up the estimates that we find when maximizing the likelihood:** -التقدير - الجدول التالي يلخص أهم التي يمكننا التوصل لها عند تعظيم الأرجحية (likelihood): +التقدير - الجدول التالي يلخص التقديرات التي يمكننا التوصل لها عند تعظيم الأرجحية (likelihood):
@@ -355,7 +355,7 @@ Support Vector Machines **60. Assumption ― The Naive Bayes model supposes that the features of each data point are all independent:** -الافتراض - يفترض نموذج بايز البسيط أن جميع الخصائص لكل نقطة بيانات مستقلة (independent): +الافتراض - يفترض نموذج بايز البسيط أن جميع الخصائص لكل عينة بيانات مستقلة (independent):
@@ -367,37 +367,37 @@ Support Vector Machines **62. Remark: Naive Bayes is widely used for text classification and spam detection.** -ملاحظة: بايز البسيط يستخدم بشكل واسع لتصنيف النصوص واكتشاف البريد الاكتروني المزعج. +ملاحظة: بايز البسيط يستخدم بشكل واسع لتصنيف النصوص واكتشاف البريد الإلكتروني المزعج.
**63. Tree-based and ensemble methods** -الطرق الشجرية (tree-based) والمجموعية (ensemble) +الطرق الشجرية (tree-based) والتجميعية (ensemble)
**64. These methods can be used for both regression and classification problems.** -هذه الطرق يمكن استخدامها لكلٍ من مشاكل الارتباط (regression) والتصنيف (classification). +هذه الطرق يمكن استخدامها لكلٍ من مشاكل الانحدار (regression) والتصنيف (classification).
**65. CART ― Classification and Regression Trees (CART), commonly known as decision trees, can be represented as binary trees. They have the advantage to be very interpretable.** -CART - التصنيف والارتباط الشجري (CART)، والاسم الشائع له أشجار القرار (decision trees)، يمكن أن يمثل كأشجار ثنائية (binary trees). من المزايا لهذه الطريقة إمكانية تفسيرها بسهولة. +التصنيف والانحدار الشجري (CART) - والاسم الشائع له أشجار القرار (decision trees)، يمكن أن يمثل كأشجار ثنائية (binary trees). من المزايا لهذه الطريقة إمكانية تفسيرها بسهولة.
**66. Random forest ― It is a tree-based technique that uses a high number of decision trees built out of randomly selected sets of features. Contrary to the simple decision tree, it is highly uninterpretable but its generally good performance makes it a popular algorithm.** -الغابة العشوائية (Random forest) - هي أحد الطرق الشجرية التي تستخدم عدداً كبيراً من أشجار القرار مبنية باستخدام مجموعة عشوائية من الخصائص. بخلاف شجرة القرار البسيطة، لا يمكن تفسير النموذج بسهولة، ولكن أدائها العالي جعلها أحد الخوارزمية المشهورة. +الغابة العشوائية (Random forest) - هي أحد الطرق الشجرية التي تستخدم عدداً كبيراً من أشجار القرار مبنية باستخدام مجموعة عشوائية من الخصائص. بخلاف شجرة القرار البسيطة لا يمكن تفسير النموذج بسهولة، ولكن أدائها العالي جعلها أحد الخوارزمية المشهورة.
**67. Remark: random forests are a type of ensemble methods.** -ملاحظة: أشجار القرار نوع من الخوارزميات المجموعية (ensemble). +ملاحظة: أشجار القرار نوع من الخوارزميات التجميعية (ensemble).
@@ -409,7 +409,7 @@ CART - التصنيف والارتباط الشجري (CART)، والاسم ال **69. [Adaptive boosting, Gradient boosting]** -[الدعم المتكيف (Adaptive boosting)، الدعم التفاضلي (Gradient boosting)] +[التعزيز التَكَيُّفي (Adaptive boosting)، التعزيز الاشتقاقي (Gradient boosting)]
@@ -427,19 +427,19 @@ CART - التصنيف والارتباط الشجري (CART)، والاسم ال **72. Other non-parametric approaches** -⟶ +طرق أخرى غير بارامترية (non-parametric) -طرق أخرى غير حدودية (non-parametric) +
**73. k-nearest neighbors ― The k-nearest neighbors algorithm, commonly known as k-NN, is a non-parametric approach where the response of a data point is determined by the nature of its k neighbors from the training set. It can be used in both classification and regression settings.** -خوارزمية أقرب الجيران (k-nearest neighbors) - تعتبر خوارزمية أقرب الجيران، وتعرف بـ k-NN، طريقة غير حدودية حيث يتم تحديد نتيجة نقطة من البيانات من خلال عدد k من البيانات المجاورة في مجموعة التدريب. ويمكن استخدامها للتصنيف والارتباط. +خوارزمية أقرب الجيران (k-nearest neighbors) - تعتبر خوارزمية أقرب الجيران، وتعرف بـ k-NN، طريقة غير بارامترية، حيث يتم تحديد نتيجة عينة من البيانات من خلال عدد k من البيانات المجاورة في مجموعة التدريب. ويمكن استخدامها للتصنيف والانحدار.
**74. Remark: The higher the parameter k, the higher the bias, and the lower the parameter k, the higher the variance.** -ملاحظة: كلما زاد المُعامل k، كلما زاد الانحياز (bias)، وكلما نقص k، زاد التباين (variance). +ملاحظة: كلما زاد المُدخل k، كلما زاد الانحياز (bias)، وكلما نقص k، زاد التباين (variance).
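Editor's aside, outside the diff above: a k-NN fit with scikit-learn for a few values of k, echoing the bias/variance remark in entry 74; the iris data is only a stand-in.

```python
# Editorial illustration only.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Larger k: smoother boundary, more bias; smaller k: more variance.
for k in (1, 5, 25):
    knn = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train)
    print(k, knn.score(X_test, y_test))
```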
@@ -451,37 +451,37 @@ CART - التصنيف والارتباط الشجري (CART)، والاسم ال **76. Union bound ― Let A1,...,Ak be k events. We have:** -حدود الاتّحاد (Union bound) - لنجعل A1,...,Ak تمثل k حدث. فيكون لدينا: +حد الاتحاد (Union bound) - لنجعل A1,...,Ak تمثل k حدث. فيكون لدينا:
**77. Hoeffding inequality ― Let Z1,..,Zm be m iid variables drawn from a Bernoulli distribution of parameter ϕ. Let ˆϕ be their sample mean and γ>0 fixed. We have:** -لا مساواة هوفدينج (Hoeffding) - لنجعل Z1,..,Zm تمثل m متغير مستقلة وموزعة بشكل مماثل (iid) مأخوذة من توزيع برنولي (Bernoulli distribution) ذا معامل ϕ. لنجعل ˆϕ متوسط العينة (sample mean) و γ>0 ثابت. فيكون لدينا: +متراجحة هوفدينج (Hoeffding) - لنجعل Z1,..,Zm تمثل m متغير مستقلة وموزعة بشكل مماثل (iid) مأخوذة من توزيع بِرنوللي (Bernoulli distribution) ذا مُدخل ϕ. لنجعل ˆϕ متوسط العينة (sample mean) و γ>0 ثابت. فيكون لدينا:
**78. Remark: this inequality is also known as the Chernoff bound.** -ملاحظة: هذه اللا مساواة تعرف كذلك بحد كيرنوف (Chernoff bound). +ملاحظة: هذه المتراجحة تعرف كذلك بحد تشرنوف (Chernoff bound).
**79. Training error ― For a given classifier h, we define the training error ˆϵ(h), also known as the empirical risk or empirical error, to be as follows:** -خطأ التدريب - ليكن لدينا خوارزمية التصنيف h، يمكن تعريف خطأ التدريب ˆϵ(h)، ويعرف كذلك بالخطر التجريبي أو الخطأ التجريبي، كالتالي: +خطأ التدريب - ليكن لدينا المُصنِّف h، يمكن تعريف خطأ التدريب ˆϵ(h)، ويعرف كذلك بالخطر التجريبي أو الخطأ التجريبي، كالتالي:
**80. Probably Approximately Correct (PAC) ― PAC is a framework under which numerous results on learning theory were proved, and has the following set of assumptions: ** -Probably Approximately Correct (PAC) - PAC هو إطار يتم من خلاله إثبات العديد من نظريات التعلم، ويحتوي على الافتراضات التالية: +تقريباً صحيح احتمالياً (Probably Approximately Correct (PAC)) - هو إطار يتم من خلاله إثبات العديد من نظريات التعلم، ويحتوي على الافتراضات التالية:
**81: the training and testing sets follow the same distribution ** -مجموعتي التدريب والاختبار تتبعان نفس التوزيع. +مجموعتي التدريب والاختبار يتبعان نفس التوزيع.
@@ -493,31 +493,31 @@ Probably Approximately Correct (PAC) - PAC هو إطار يتم من خلاله **83. Shattering ― Given a set S={x(1),...,x(d)}, and a set of classifiers H, we say that H shatters S if for any set of labels {y(1),...,y(d)}, we have:** -Shattering - إذا كان لدينا المجموعة S={x(1),...,x(d)}، ومجموعة نماذج H، نقول أن H shatters S إذا كان لكل مجموعة أهداف (labels) {y(1),...,y(d)} لدينا: +مجموعة تكسيرية (Shattering Set) - إذا كان لدينا المجموعة S={x(1),...,x(d)}، ومجموعة مُصنٍّفات H، نقول أن H shatters S إذا كان لكل مجموعة علامات (labels) {y(1),...,y(d)} لدينا:
**84. Upper bound theorem ― Let H be a finite hypothesis class such that |H|=k and let δ and the sample size m be fixed. Then, with probability of at least 1−δ, we have:** -نظرية الحد الأعلى (Upper bound theorem) - لنجعل H فئة فرضية محدودة (finite hypothesis class) بحيث |H|=k، و δ وحجم العينة m ثابتين. حينها سيكون لدينا، مع احتمال على الأقل 1−δ، التالي: +مبرهنة الحد الأعلى (Upper bound theorem) - لنجعل H فئة فرضية محدودة (finite hypothesis class) بحيث |H|=k، و δ وحجم العينة m ثابتين. حينها سيكون لدينا، مع احتمال على الأقل 1−δ، التالي:
**85. VC dimension ― The Vapnik-Chervonenkis (VC) dimension of a given infinite hypothesis class H, noted VC(H) is the size of the largest set that is shattered by H.** -بُعد VC - بُعد فابنك-شيرفونينكز (Vapnik-Chervonenkis) لفئة فرضية محدودة (finite hypothesis class) H، ويرمز له بـ VC(H)، هو حجم أكبر مجموعة (set) التي shattered by H. +بُعْد فابنيك-تشرفونيكس (Vapnik-Chervonenkis - VC) لفئة فرضية غير محدودة (infinite hypothesis class) H، ويرمز له بـ VC(H)، هو حجم أكبر مجموعة (set) التي shattered by H.
**86. Remark: the VC dimension of H={set of linear classifiers in 2 dimensions} is 3.** -ملاحظة: بُعد VC لـ H = {مجموعة التصنيفات الخطية في بُعدين} يساوي 3. +ملاحظة: بُعْد فابنيك-تشرفونيكس VC لـ H = {مجموعة التصنيفات الخطية في بُعدين} يساوي 3.
**87. Theorem (Vapnik) ― Let H be given, with VC(H)=d and m the number of training examples. With probability at least 1−δ, we have:** -نظرية فابنك (Vapnik) - ليكن لدينا H، مع VC(H)=d وعدد عيّنات التدريب m. سيكون لدينا، مع احتمال على الأقل 1−δ، التالي: +مبرهنة فابنيك (Vapnik theorem) - ليكن لدينا H، مع VC(H)=d وعدد عيّنات التدريب m. سيكون لدينا، مع احتمال على الأقل 1−δ، التالي:
@@ -529,19 +529,19 @@ Shattering - إذا كان لدينا المجموعة S={x(1),...,x(d)}، وم **89. [Notations and general concepts, loss function, gradient descent, likelihood]** -[تعريفات ومفاهيم أساسية، دالة الفرق، الهبوط التفاضلي، الأرجحية] +[الرموز ومفاهيم أساسية، دالة الخسارة، النزول الاشتقاقي، الأرجحية]
**90. [Linear models, linear regression, logistic regression, generalized linear models]** -[النماذج الخطيّة، الارتباط الخطّي، الارتباط اللوجستي، النماذج الخطية العامة] +[النماذج الخطيّة، الانحدار الخطّي، الانحدار اللوجستي، النماذج الخطية العامة]
**91. [Support vector machines, Optimal margin classifier, Hinge loss, Kernel]** -[Support vector machines، خوارزمية تصنيف الهامش الأمثل، الفرق المفصلي، النواة] +[آلة المتجهات الداعمة (SVM)، مُصنِّف الهامش الأحسن، الفرق المفصلي، النواة]
@@ -553,7 +553,7 @@ Shattering - إذا كان لدينا المجموعة S={x(1),...,x(d)}، وم **93. [Trees and ensemble methods, CART, Random forest, Boosting]** -[الطرق الشجرية والمجموعية، التصنيف والارتباط الشجري (CART)، الغابة العشوائية (Random forest)، التعزيز (Boosting)] +[الطرق الشجرية والتجميعية، التصنيف والانحدار الشجري (CART)، الغابة العشوائية (Random forest)، التعزيز (Boosting)]
@@ -565,4 +565,4 @@ Shattering - إذا كان لدينا المجموعة S={x(1),...,x(d)}، وم **95. [Learning theory, Hoeffding inequality, PAC, VC dimension]** -[نظرية التعلُّم، لا مساواة هوفدينج (Hoeffding)، PAC، بُعد VC] +[نظرية التعلُّم، متراجحة هوفدنك، تقريباً صحيح احتمالياً (PAC)، بُعْد فابنيك-تشرفونيكس (VC dimension)] From fa9c0f539e8ce79ffdf811a969bd8fced47fbf17 Mon Sep 17 00:00:00 2001 From: Shervine Amidi Date: Mon, 2 Sep 2019 22:01:35 -0700 Subject: [PATCH 365/531] Add contributors [ja] --- CONTRIBUTORS | 12 ++++++++++++ 1 file changed, 12 insertions(+) diff --git a/CONTRIBUTORS b/CONTRIBUTORS index 023786eae..addb9870f 100644 --- a/CONTRIBUTORS +++ b/CONTRIBUTORS @@ -92,6 +92,18 @@ Kamuela Lau (translation of deep learning tips and tricks) Yoshiyuki Nakai (review of deep learning tips and tricks) Hiroki Mori (review of deep learning tips and tricks) + + Robert Altena (translation of linear algebra) + Kamuela Lau (review of linear algebra) + + Takatoshi Nao (translation of probabilities and statistics) + Yuta Kanzawa (review of probabilities and statistics) + + H. Hamano (translation of recurrent neural networks) + Yoshiyuki Nakai (review of recurrent neural networks) + + Yuta Kanzawa (translation of supervised learning) + Tran Tuan Anh (review of supervised learning) --pt Leticia Portella (translation of convolutional neural networks) From 8bb6198799d40dfcccaa440f3eefab09f928785a Mon Sep 17 00:00:00 2001 From: Shervine Amidi Date: Mon, 2 Sep 2019 22:03:32 -0700 Subject: [PATCH 366/531] Update progress [ja] --- README.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/README.md b/README.md index 318e74849..1ce4fa514 100644 --- a/README.md +++ b/README.md @@ -65,7 +65,7 @@ Please make sure to propose the translation of **only one** cheatsheet per pull |**Magyar**|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/124)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/124)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/124)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/124)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/124)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/124)| |**Bahasa Indonesia**|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/154)|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/139)|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/151)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/150)| |**Italiano**|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/78)|not started|not started|not started|not started|not started| -|**日本語**|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/96)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/144)|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/99)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/142)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/140)| +|**日本語**|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/96)|done|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/99)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/142)|done| |**한국어**|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/80)|[in 
progress](https://github.com/shervinea/cheatsheet-translation/pull/90)|done|done|done|done| |**Polski**|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/8)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/8)|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/8)|not started|not started| |**Português**|done|done|done|done|done|done| @@ -90,7 +90,7 @@ Please make sure to propose the translation of **only one** cheatsheet per pull |**Magyar**|not started|not started|not started| |**Bahasa Indonesia**|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/155)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/152)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/153)| |**Italiano**|not started|not started|not started| -|**日本語**|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/145)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/146)|done| +|**日本語**|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/145)|done|done| |**한국어**|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/109)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/107)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/108)| |**Polski**|not started|not started|not started| |**Português**|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/128)|not started|not started| From 7b934326ae648eb18c4ac7a6286f3435da320d56 Mon Sep 17 00:00:00 2001 From: tuananhhedspibk Date: Sat, 7 Sep 2019 12:26:33 +0900 Subject: [PATCH 367/531] [ja] Cheatsheet Unsupervised learning --- ja/cheatsheet-unsupervised-learning.md | 340 +++++++++++++++++++++++++ 1 file changed, 340 insertions(+) create mode 100644 ja/cheatsheet-unsupervised-learning.md diff --git a/ja/cheatsheet-unsupervised-learning.md b/ja/cheatsheet-unsupervised-learning.md new file mode 100644 index 000000000..08e3f593a --- /dev/null +++ b/ja/cheatsheet-unsupervised-learning.md @@ -0,0 +1,340 @@ +**1. Unsupervised Learning cheatsheet** + +⟶教師なし学習チートシート + +
+ +**2. Introduction to Unsupervised Learning** + +⟶教師なし学習入門 + +<br>
+ +**3. Motivation ― The goal of unsupervised learning is to find hidden patterns in unlabeled data {x(1),...,x(m)}.** + +⟶モチベーション - 教師なし学習の目的は、ラベルなしデータ{x(1),...,x(m)}の中に隠れたパターンを見つけることです。 + +<br>
+ +**4. Jensen's inequality ― Let f be a convex function and X a random variable. We have the following inequality:** + +⟶ジェンセンの不等式 - fを凸関数、Xを確率変数とすると、次の不等式が成り立つ: + +<br>
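A quick numerical check of Jensen's inequality above, using the convex function f(x)=x² (a sketch; the distribution of X is an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(loc=1.0, scale=2.0, size=100_000)   # a random variable X

f = lambda x: x ** 2                                # a convex function
print(f(X.mean()))   # f(E[X]) ~ 1.0
print(f(X).mean())   # E[f(X)] ~ 5.0, which is >= f(E[X]) as the inequality states
```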
+ +**5. Clustering** + +⟶クラスタリング + +
+ +**6. Expectation-Maximization** + +⟶EM + +
+ +**7. Latent variables ― Latent variables are hidden/unobserved variables that make estimation problems difficult, and are often denoted z. Here are the most common settings where there are latent variables:** + +⟶潜在変数 - 潜在変数は推定問題を困難にする隠れた(観測されない)変数であり、多くの場合zで表される。潜在変数が現れる最も一般的な設定は次の通りである: + +<br>
+ +**8. [Setting, Latent variable z, Comments]** + +⟶[設定, 潜在変数z, コメント] + +
+ +**9. [Mixture of k Gaussians, Factor analysis]** + +⟶[kガウス分布の混合, 因子分析] + +
+ +**10. Algorithm ― The Expectation-Maximization (EM) algorithm gives an efficient method at estimating the parameter θ through maximum likelihood estimation by repeatedly constructing a lower-bound on the likelihood (E-step) and optimizing that lower bound (M-step) as follows:** + +⟶アルゴリズム - EMアルゴリズムは、尤度の下限を構築するステップ(Eステップ)とその下限を最適化するステップ(Mステップ)を次のように繰り返すことで、最尤推定を通じてパラメータθを効率的に推定する方法を与える: + +<br>
+ +**11. E-step: Evaluate the posterior probability Qi(z(i)) that each data point x(i) came from a particular cluster z(i) as follows:** + +⟶E-ステップ: 次のように各データポイントx(i)が特定クラスターz(i)に由来する事後確率Qi(z(i))を評価する: + +
+ +**12. M-step: Use the posterior probabilities Qi(z(i)) as cluster specific weights on data points x(i) to separately re-estimate each cluster model as follows:** + +⟶M-ステップ: 次のように、事後確率Qi(z(i))をデータポイントx(i)に対するクラスター固有の重みとして用い、各クラスターのモデルを個別に再推定する: + +<br>
+ +**13. [Gaussians initialization, Expectation step, Maximization step, Convergence]** + +⟶[ガウス分布初期化, 期待ステップ, 最大化ステップ, 収束] + +
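A minimal sketch of the E-step and M-step above for a one-dimensional mixture of k Gaussians (illustrative only: the initialization, stopping criterion and variable names are our own simplifications):

```python
import numpy as np
from scipy.stats import norm

def em_gmm_1d(x, k, n_iter=50, seed=0):
    rng = np.random.default_rng(seed)
    mu = rng.choice(x, k, replace=False)       # rough Gaussian initialization
    sigma = np.full(k, x.std())
    phi = np.full(k, 1.0 / k)                  # mixture weights
    for _ in range(n_iter):
        # E-step: posterior probability Q_i(z_i = j) of each point belonging to cluster j
        q = phi * norm.pdf(x[:, None], mu, sigma)
        q /= q.sum(axis=1, keepdims=True)
        # M-step: re-estimate each cluster using q as point-specific weights
        nk = q.sum(axis=0)
        phi = nk / len(x)
        mu = (q * x[:, None]).sum(axis=0) / nk
        sigma = np.sqrt((q * (x[:, None] - mu) ** 2).sum(axis=0) / nk)
    return phi, mu, sigma
```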
+ +**14. k-means clustering** + +⟶k-meansクラスタリング + +
+ +**15. We note c(i) the cluster of data point i and μj the center of cluster j.** + +⟶データポイントiが属するクラスタをc(i)、クラスタjの中心をμjと表記する。 + +<br>
+ +**16. Algorithm ― After randomly initializing the cluster centroids μ1,μ2,...,μk∈Rn, the k-means algorithm repeats the following step until convergence:** + +⟶アルゴリズム - クラスターの重心μ1,μ2,...,μk∈Rnをランダムに初期化した後、k-meansアルゴリズムは収束するまで次のステップを繰り返す: + +<br>
+ +**17. [Means initialization, Cluster assignment, Means update, Convergence]** + +⟶ [Means初期化, クラスター割り当て, Means更新, 収束] + +<br>
+ +**18. Distortion function ― In order to see if the algorithm converges, we look at the distortion function defined as follows:** + +⟶ディストーション関数 - アルゴリズムが収束するかどうかを確認するため、次のように定義されたディストーション関数を参照する: + +
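A compact sketch of the k-means loop and of the distortion function above (it assumes Euclidean distance, a fixed number of iterations, and that no cluster ever becomes empty):

```python
import numpy as np

def kmeans(x, k, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    mu = x[rng.choice(len(x), k, replace=False)]                    # means initialization
    for _ in range(n_iter):
        d = np.linalg.norm(x[:, None, :] - mu[None, :, :], axis=2)
        c = d.argmin(axis=1)                                        # cluster assignment
        mu = np.array([x[c == j].mean(axis=0) for j in range(k)])   # means update
    distortion = np.sum((x - mu[c]) ** 2)                           # J(c, mu)
    return c, mu, distortion
```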
+ +**19. Hierarchical clustering** + +⟶ 階層的クラスタリング + +
+ +**20. Algorithm ― It is a clustering algorithm with an agglomerative hierarchical approach that build nested clusters in a successive manner.** + +⟶アルゴリズム - これは入れ子状のクラスタを逐次的に構築する凝集型階層的アプローチによるクラスタリングアルゴリズムである。 + +<br>
+ +**21. Types ― There are different sorts of hierarchical clustering algorithms that aims at optimizing different objective functions, which is summed up in the table below:** + +⟶ + +
+ +**22. [Ward linkage, Average linkage, Complete linkage]** + +⟶ + +
+ +**23. [Minimize within cluster distance, Minimize average distance between cluster pairs, Minimize maximum distance of between cluster pairs]** + +⟶ + +
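The linkage criteria listed above are available directly in SciPy; a small sketch on random, purely illustrative data:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

x = np.random.default_rng(0).normal(size=(20, 2))

# 'ward', 'average' and 'complete' match the linkage types described above
z = linkage(x, method='ward')
labels = fcluster(z, t=3, criterion='maxclust')   # cut the dendrogram into 3 clusters
print(labels)
```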
+ +**24. Clustering assessment metrics** + +⟶ + +
+ +**25. In an unsupervised learning setting, it is often hard to assess the performance of a model since we don't have the ground truth labels as was the case in the supervised learning setting.** + +⟶ + +
+ +**26. Silhouette coefficient ― By noting a and b the mean distance between a sample and all other points in the same class, and between a sample and all other points in the next nearest cluster, the silhouette coefficient s for a single sample is defined as follows:** + +⟶ + +
+ +**27. Calinski-Harabaz index ― By noting k the number of clusters, Bk and Wk the between and within-clustering dispersion matrices respectively defined as** + +⟶ + +
+ +**28. the Calinski-Harabaz index s(k) indicates how well a clustering model defines its clusters, such that the higher the score, the more dense and well separated the clusters are. It is defined as follows:** + +⟶ + +
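Both assessment metrics above are implemented in scikit-learn; a small sketch (note that recent scikit-learn versions spell the second one `calinski_harabasz_score`; the data and the choice of k-means are arbitrary):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score, calinski_harabasz_score

x = np.random.default_rng(0).normal(size=(200, 2))
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(x)

print(silhouette_score(x, labels))          # in [-1, 1], higher is better
print(calinski_harabasz_score(x, labels))   # higher = denser, better-separated clusters
```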
+ +**29. Dimension reduction** + +⟶ + +
+ +**30. Principal component analysis** + +⟶ + +
+ +**31. It is a dimension reduction technique that finds the variance maximizing directions onto which to project the data.** + +⟶ + +
+ +**32. Eigenvalue, eigenvector ― Given a matrix A∈Rn×n, λ is said to be an eigenvalue of A if there exists a vector z∈Rn∖{0}, called eigenvector, such that we have:** + +⟶ + +
+ +**33. Spectral theorem ― Let A∈Rn×n. If A is symmetric, then A is diagonalizable by a real orthogonal matrix U∈Rn×n. By noting Λ=diag(λ1,...,λn), we have:** + +⟶ + +
+ +**34. diagonal** + +⟶ + +
+ +**35. Remark: the eigenvector associated with the largest eigenvalue is called principal eigenvector of matrix A.** + +⟶ + +
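A quick numerical illustration of the eigenvalue definition and of the spectral theorem above, on an arbitrary symmetric matrix:

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])     # symmetric, hence diagonalizable by an orthogonal U

lam, U = np.linalg.eigh(A)     # eigenvalues (ascending) and orthonormal eigenvectors
print(np.allclose(A @ U[:, 0], lam[0] * U[:, 0]))    # Az = λz for an eigenpair
print(np.allclose(U @ np.diag(lam) @ U.T, A))        # A = U Λ U^T
```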
+ +**36. Algorithm ― The Principal Component Analysis (PCA) procedure is a dimension reduction technique that projects the data on k +dimensions by maximizing the variance of the data as follows:** + +⟶ + +
+ +**37. Step 1: Normalize the data to have a mean of 0 and standard deviation of 1.** + +⟶ + +
+ +**38. Step 2: Compute Σ=1mm∑i=1x(i)x(i)T∈Rn×n, which is symmetric with real eigenvalues.** + +⟶ + +
+ +**39. Step 3: Compute u1,...,uk∈Rn the k orthogonal principal eigenvectors of Σ, i.e. the orthogonal eigenvectors of the k largest eigenvalues.** + +⟶ + +
+ +**40. Step 4: Project the data on spanR(u1,...,uk).** + +⟶ + +
+ +**41. This procedure maximizes the variance among all k-dimensional spaces.** + +⟶ + +
+ +**42. [Data in feature space, Find principal components, Data in principal components space]** + +⟶ + +
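The four PCA steps above fit in a few lines of NumPy (a sketch; the rows of `x` are assumed to be the examples x(i)):

```python
import numpy as np

def pca(x, k):
    # Step 1: normalize to zero mean and unit standard deviation
    x = (x - x.mean(axis=0)) / x.std(axis=0)
    # Step 2: empirical covariance matrix (symmetric, real eigenvalues)
    sigma = x.T @ x / len(x)
    # Step 3: the k orthogonal eigenvectors with the largest eigenvalues
    lam, u = np.linalg.eigh(sigma)
    u_k = u[:, np.argsort(lam)[::-1][:k]]
    # Step 4: project the data on span(u1, ..., uk)
    return x @ u_k
```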
+ +**43. Independent component analysis** + +⟶ + +
+ +**44. It is a technique meant to find the underlying generating sources.** + +⟶ + +
+ +**45. Assumptions ― We assume that our data x has been generated by the n-dimensional source vector s=(s1,...,sn), where si are independent random variables, via a mixing and non-singular matrix A as follows:** + +⟶ + +
+ +**46. The goal is to find the unmixing matrix W=A−1.** + +⟶ + +
+ +**47. Bell and Sejnowski ICA algorithm ― This algorithm finds the unmixing matrix W by following the steps below:** + +⟶ + +
+ +**48. Write the probability of x=As=W−1s as:** + +⟶ + +
+ +**49. Write the log likelihood given our training data {x(i),i∈[[1,m]]} and by noting g the sigmoid function as:** + +⟶ + +
+ +**50. Therefore, the stochastic gradient ascent learning rule is such that for each training example x(i), we update W as follows:** + +⟶ + +
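A sketch of the Bell and Sejnowski stochastic gradient ascent update above, with g the sigmoid (the learning rate and the single-example update form are our own illustrative choices):

```python
import numpy as np

def ica_step(W, x_i, alpha=0.01):
    """One update W := W + alpha * [ (1 - 2 g(W x_i)) x_i^T + (W^T)^(-1) ]."""
    g = lambda s: 1.0 / (1.0 + np.exp(-s))                        # sigmoid
    grad = np.outer(1.0 - 2.0 * g(W @ x_i), x_i) + np.linalg.inv(W.T)
    return W + alpha * grad
```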
+ +**51. The Machine Learning cheatsheets are now available in [target language].** + +⟶ + +
+ +**52. Original authors** + +⟶ 原著者 + +
+ +**53. Translated by X, Y and Z** + +⟶ X, Y, Zによって翻訳されました + +<br>
+ +**54. Reviewed by X, Y and Z** + +⟶ X, Y, Zによってレビューされました + +<br>
+ +**55. [Introduction, Motivation, Jensen's inequality]** + +⟶ + +
+ +**56. [Clustering, Expectation-Maximization, k-means, Hierarchical clustering, Metrics]** + +⟶[クラスタリング, EM, k-means, 階層クラスタリング, 指標] + +
+ +**57. [Dimension reduction, PCA, ICA]** + +⟶ From fc81c157463e9631fe198ff8eba070879458fc50 Mon Sep 17 00:00:00 2001 From: tuananhhedspibk Date: Sat, 7 Sep 2019 12:39:58 +0900 Subject: [PATCH 368/531] [ja] Convolutional Neural Networks --- ja/convolutional-neural-networks.md | 28 ++++++++++++++-------------- 1 file changed, 14 insertions(+), 14 deletions(-) diff --git a/ja/convolutional-neural-networks.md b/ja/convolutional-neural-networks.md index f12b21465..12a7a8ebd 100644 --- a/ja/convolutional-neural-networks.md +++ b/ja/convolutional-neural-networks.md @@ -32,9 +32,9 @@ **5. [Filter hyperparameters, Dimensions, Stride, Padding]** -⟶ +⟶ [フィルタハイパーパラメータ, 大きさ, ストライド, パディング] -
[フィルタハイパーパラメータ, 大きさ, ストライド, パディング] +
**6. [Tuning hyperparameters, Parameter compatibility, Model complexity, Receptive field]** @@ -348,9 +348,9 @@ **50. Types of models ― There are 3 main types of object recognition algorithms, for which the nature of what is predicted is different. They are described in the table below:** -⟶ +⟶ モデルの種類 - 物体認識アルゴリズムは主に三つのタイプがあり、予測されるものの性質は異なります。次の表で説明される。 -
モデルの種類 - 物体認識アルゴリズムは主に三つのタイプがあり、予測されるものの性質は異なります。次の表で説明される。 +
**51. [Image classification, Classification w. localization, Detection]** @@ -523,9 +523,9 @@ **75. Types of models ― Two main types of model are summed up in table below:** -⟶ +⟶ モデルのタイプ - 主な二つのモデルは次の表で要約される: -
モデルのタイプ - 主な二つのモデルは次の表で要約される: +
**76. [Face verification, Face recognition, Query, Reference, Database]** @@ -572,9 +572,9 @@ **82. Motivation ― The goal of neural style transfer is to generate an image G based on a given content C and a given style S.** -⟶ +⟶ モチベーション - 神経のスタイル転送の目的は与えられたコンテンツCとスタイルSに基づく画像Gを生成する。 -
モチベーション - 神経のスタイル転送の目的は与えられたコンテンツCとスタイルSに基づく画像Gを生成する。 +
**83. [Content C, Style S, Generated image G]** @@ -586,9 +586,9 @@ **84. Activation ― In a given layer l, the activation is noted a[l] and is of dimensions nH×nw×nc** -⟶ +⟶ 活性化 - 与えられた層lで、活性化はa[l]と表示されて、nH×nw×ncの寸法。 -
活性化 - 与えられた層lで、活性化はa[l]と表示されて、nH×nw×ncの寸法。 +
**85. Content cost function ― The content cost function Jcontent(C,G) is used to determine how the generated image G differs from the original content image C. It is defined as follows:** @@ -677,9 +677,9 @@ **97. The Deep Learning cheatsheets are now available in [target language].** -⟶ +⟶ 深層学習チートシートは今[ターゲット言語]で利用可能です。 -
深層学習チートシートは今[ターゲット言語]で利用可能です。 +
**98. Original authors** @@ -705,9 +705,9 @@ **101. View PDF version on GitHub** -⟶ +⟶ GithubでPDFバージョン見る -
GithubでPDFバージョン見る +
**102. By X and Y** From 78b3b88cc2f5350bd2ea5ad73869e7a33641912d Mon Sep 17 00:00:00 2001 From: Shervine Amidi Date: Fri, 6 Sep 2019 20:48:12 -0700 Subject: [PATCH 369/531] Update progress [ja] --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index 1ce4fa514..be07af1f2 100644 --- a/README.md +++ b/README.md @@ -65,7 +65,7 @@ Please make sure to propose the translation of **only one** cheatsheet per pull |**Magyar**|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/124)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/124)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/124)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/124)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/124)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/124)| |**Bahasa Indonesia**|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/154)|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/139)|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/151)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/150)| |**Italiano**|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/78)|not started|not started|not started|not started|not started| -|**日本語**|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/96)|done|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/99)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/142)|done| +|**日本語**|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/96)|done|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/173)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/99)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/142)|done| |**한국어**|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/80)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/90)|done|done|done|done| |**Polski**|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/8)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/8)|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/8)|not started|not started| |**Português**|done|done|done|done|done|done| From 48b5a95ed604f05744075340e5f4423adbefd20b Mon Sep 17 00:00:00 2001 From: tuananhhedspibk Date: Sat, 7 Sep 2019 13:09:28 +0900 Subject: [PATCH 370/531] [ja] Convolutional Neural Networks --- ja/convolutional-neural-networks.md | 90 ++++++++++++++--------------- 1 file changed, 45 insertions(+), 45 deletions(-) diff --git a/ja/convolutional-neural-networks.md b/ja/convolutional-neural-networks.md index 12a7a8ebd..d8e5017b1 100644 --- a/ja/convolutional-neural-networks.md +++ b/ja/convolutional-neural-networks.md @@ -32,7 +32,7 @@ **5. [Filter hyperparameters, Dimensions, Stride, Padding]** -⟶ [フィルタハイパーパラメータ, 大きさ, ストライド, パディング] +⟶ [フィルタハイパーパラメータ, 次元, ストライド, パディング]
@@ -53,14 +53,14 @@ **8. [Object detection, Types of models, Detection, Intersection over Union, Non-max suppression, YOLO, R-CNN]** -⟶ [オブジェクト検出, モデルのタイプ, 検出, 積集合の和集合, 非最大抑制, YOLO, R-CNN] +⟶ [物体検出, モデルのタイプ, 検出, IoU, 非極大抑制, YOLO, R-CNN]
**9. [Face verification/recognition, One shot learning, Siamese network, Triplet loss]** -⟶ [顔認証/認識, 一発学習, シャムネットワーク, 三重項損失] +⟶ [顔認証/認識, One shot学習, シャムネットワーク, 三重項損失]
@@ -95,7 +95,7 @@ **14. The convolution layer and the pooling layer can be fine-tuned with respect to hyperparameters that are described in the next sections.** -⟶ 畳み込み層とプーリング層は次のセクションで説明されるハイパーパラメータに関して微調整できます。 +⟶ 畳み込み層とプーリング層は次のセクションで説明されるハイパーパラメータに関してファインチューニングできます。
@@ -123,7 +123,7 @@ **18. Pooling (POOL) ― The pooling layer (POOL) is a downsampling operation, typically applied after a convolution layer, which does some spatial invariance. In particular, max and average pooling are special kinds of pooling where the maximum and average value is taken, respectively.** -⟶ プーリング (POOL) - プーリング層 (POOL)はダウンサンプリング操作で、通常は位置不変性をもつ畳み込み層の後に適用されます。特に、最大及び平均プーリングはそれぞれ最大と平均値が取られる特別な種類のプーリングです。 +⟶ プーリング (POOL) - プーリング層 (POOL)は位置不変性をもつ縮小操作で、通常は畳み込み層の後に適用されます。特に、最大及び平均プーリングはそれぞれ最大と平均値が取られる特別な種類のプーリングです。
@@ -172,7 +172,7 @@ **25. Dimensions of a filter ― A filter of size F×F applied to an input containing C channels is a F×F×C volume that performs convolutions on an input of size I×I×C and produces an output feature map (also called activation map) of size O×O×1.** -⟶ フィルタの大きさ - C個のチャネルを含む入力に適用されるF×Fサイズのフィルタの体積はF×F×Cで、それはI×I×Cサイズの入力に対して畳み込みを実行してO×O×1サイズの特徴マップ(活性化マップとも呼ばれる)出力を生成します。 +⟶ フィルタの次元 - C個のチャネルを含む入力に適用されるF×Fサイズのフィルタの体積はF×F×Cで、それはI×I×Cサイズの入力に対して畳み込みを実行してO×O×1サイズの特徴マップ(活性化マップとも呼ばれる)出力を生成します。
@@ -215,21 +215,21 @@ **31. [No padding, Drops last convolution if dimensions do not match, Padding such that feature map size has size ⌈IS⌉, Output size is mathematically convenient, Also called 'half' padding, Maximum padding such that end convolutions are applied on the limits of the input, Filter 'sees' the input end-to-end]** -⟶ [パディングなし, もし大きさが合わなかったら最後の畳み込みをやめる, 特徴マップのサイズが[IS]になるようなパディング, 出力サイズは数学的に扱いやすい, 「ハーフ」パディングとも呼ばれる, 入力の一番端まで畳み込みが適用されるような最大パディング, フィルタは入力を端から端まで「見る」] +⟶ [パディングなし, もし次元が合わなかったら最後の畳み込みをやめる, 特徴マップのサイズが[IS]になるようなパディング, 出力サイズは数学的に扱いやすい, 「ハーフ」パディングとも呼ばれる, 入力の一番端まで畳み込みが適用されるような最大パディング, フィルタは入力を端から端まで「見る」]
**32. Tuning hyperparameters** -⟶ 調律ハイパーパラメータ +⟶ ハイパーパラメータの調整
**33. Parameter compatibility in convolution layer ― By noting I the length of the input volume size, F the length of the filter, P the amount of zero padding, S the stride, then the output size O of the feature map along that dimension is given by:** -⟶ 畳み込み層内のパラメータ互換性 - Iを入力ボリュームサイズの長さ、Fをフィルタの長さ、Pをゼロパディングの量, Sをストライドとすると、その寸法に沿った特徴図の出力サイズOは次式で与えられる: +⟶ 畳み込み層内のパラメータ互換性 - Iを入力ボリュームサイズの長さ、Fをフィルタの長さ、Pをゼロパディングの量, Sをストライドとすると、その次元に沿った特徴マップの出力サイズOは次式で与えられます:
@@ -243,28 +243,28 @@ **35. Remark: often times, Pstart=Pend≜P, in which case we can replace Pstart+Pend by 2P in the formula above.** -⟶ 注意: しばしば、Pstart=Pend≜P、その場合、上記の式のようにPstart+Pendを2Pに置き換える事ができる。 +⟶ 注: 多くの場合Pstart=Pend≜Pであり、上記の式のPstart+Pendを2Pに置き換える事ができます。
**36. Understanding the complexity of the model ― In order to assess the complexity of a model, it is often useful to determine the number of parameters that its architecture will have. In a given layer of a convolutional neural network, it is done as follows:** -⟶ モデルの複雑さを理解する - モデルの複雑さを評価する為モデルのアーキテクチャが持つことになるパラメータの数を決定することはしばしば有用です。畳み込みニューラルネットワーク内で、以下のように行なわれる。 +⟶ モデルの複雑さを理解する - モデルの複雑さを評価するために、モデルのアーキテクチャが持つパラメータの数を測定することがしばしば有用です。畳み込みニューラルネットワークの各レイヤでは、以下のように行なわれます。
**37. [Illustration, Input size, Output size, Number of parameters, Remarks]** -⟶ [図, 入力サイズ, 出力サイズ, 引数の数, 備考] +⟶ [図, 入力サイズ, 出力サイズ, パラメータの数, 備考]
**38. [One bias parameter per filter, In most cases, S @@ -278,21 +278,21 @@ **40. [Input is flattened, One bias parameter per neuron, The number of FC neurons is free of structural constraints]** -⟶ [入力は平坦化される, ニューラルごとにひとつのバイアスパラメータ, FCニューラルの数は構造制約がない] +⟶ [入力は平坦化される, ニューロンごとにひとつのバイアスパラメータ, FCのニューロンの数には構造的制約がない]
**41. Receptive field ― The receptive field at layer k is the area denoted Rk×Rk of the input that each pixel of the k-th activation map can 'see'. By calling Fj the filter size of layer j and Si the stride value of layer i and with the convention S0=1, the receptive field at layer k can be computed with the formula:** -⟶ 受容的なフィルド - 層kの受容的なフィルドはk番目の活性化図の各ピックセルが見られる入力のRkxRkを表示されるエリアです。j層のフィルタサイズをFj、i層のストライド値をSi、規約S0=1とすると、k層での受容的なフィルドは式で計算される: +⟶ 受容野 - レイヤkにおける受容野は、k番目の活性化マップの各ピクセルが「見る」ことができる入力のRk×Rkの領域です。レイヤjのフィルタサイズをFj、レイヤiのストライド値をSiとし、慣例に従ってS0=1とすると、レイヤkでの受容野は次の式で計算されます:
**42. In the example below, we have F1=F2=3 and S1=S2=1, which gives R2=1+2⋅1+2⋅1=5.** -⟶ 下記の例で、F1=F2=3、S1=S2=1となるのでR2=1+2⋅1+2⋅1=5となる。 +⟶ 下記の例のようにF1=F2=3、S1=S2=1とすると、R2=1+2⋅1+2⋅1=5となります。
@@ -306,70 +306,70 @@ **44. Rectified Linear Unit ― The rectified linear unit layer (ReLU) is an activation function g that is used on all elements of the volume. It aims at introducing non-linearities to the network. Its variants are summarized in the table below:** -⟶ 整流線形ユニット - 整流線形ユニット層(ReLU)はボリュームの全ての要素に利用される活性化関数gです。ReLUの目的は非線型性をネットワークに紹介する。ReLUの変種は以下の表でまとめられる: +⟶ 正規化線形ユニット - 正規化線形ユニットレイヤ(ReLU)はボリュームの全ての要素に利用される活性化関数gです。ReLUの目的は非線型性をネットワークに導入することです。変種は以下の表でまとめられています:
**45. [ReLU, Leaky ReLU, ELU, with]** -⟶[ReLU, Leaky ReLU, ELU, with] +⟶[ReLU, Leaky ReLU, ELU, ただし]
**46. [Non-linearity complexities biologically interpretable, Addresses dying ReLU issue for negative values, Differentiable everywhere]** -⟶ [生物学的に解釈可能な非線形複雑性, 負の値の為dyingReLUの問題を示す,どこても差別化可能] +⟶ [生物学的に解釈可能な非線形複雑性, 負の値に対してReLUが死んでいる問題に対処する,どこても微分可能]
**47. Softmax ― The softmax step can be seen as a generalized logistic function that takes as input a vector of scores x∈Rn and outputs a vector of output probability p∈Rn through a softmax function at the end of the architecture. It is defined as follows:** -⟶ ソフトマックス - ソフトマックスステップは入力としてx∈Rnスコアのベクターを取り、アーキテクチャの最後にソフトマックス関数を通じてp∈Rn出力確率のベクターを出して、一般化ロジスティック関数として見る事ができる。 +⟶ ソフトマックス - ソフトマックスのステップは入力としてスコアx∈Rnのベクトルを取り、アーキテクチャの最後にあるソフトマックス関数を通じて確率p∈Rnのベクトルを出力する一般化されたロジスティック関数として見ることができます。次のように定義されます。
**48. where** -⟶ どこ +⟶ ここで
**49. Object detection** -⟶ オブジェクト検出 +⟶ 物体検出
**50. Types of models ― There are 3 main types of object recognition algorithms, for which the nature of what is predicted is different. They are described in the table below:** -⟶ モデルの種類 - 物体認識アルゴリズムは主に三つのタイプがあり、予測されるものの性質は異なります。次の表で説明される。 +⟶ モデルの種類 - 物体認識アルゴリズムは主に3つの種類があり、予測されるものの性質は異なります。次の表で説明されています。
**51. [Image classification, Classification w. localization, Detection]** -⟶ [画像分類, 分類 w. 局地化, 検出] +⟶ [画像分類, 位置特定を伴う分類, 検出]
**52. [Teddy bear, Book]** -⟶ [テディ熊, 本] +⟶ [テディベア, 本]
**53. [Classifies a picture, Predicts probability of object, Detects an object in a picture, Predicts probability of object and where it is located, Detects up to several objects in a picture, Predicts probabilities of objects and where they are located]** -⟶ [画像分類, オブジェクトの確率予測, 画像内のオブジェクト検出, オブジェクトの確率と所在地予測, 画像内の複数オブジェクト検出, 複数オブジェクトの確率と所在地予測] +⟶ [画像を分類する, 物体の確率を予測する, 画像内の物体を検出する, 物体の確率とその位置を予測する, 画像内の複数の物体を検出する, 複数の物体の確率と位置を予測する]
@@ -383,133 +383,133 @@ **55. Detection ― In the context of object detection, different methods are used depending on whether we just want to locate the object or detect a more complex shape in the image. The two main ones are summed up in the table below:** -⟶ 検出 - 物体検出の文脈では、画像内で物体を特定するのかそれとも複雑な形状を検出するのかによって、様々な方法は使用される。二つの主なものは次の表でまとめられる: +⟶ 検出 - 物体検出の文脈では、画像内の物体の位置を特定したいだけなのかあるいは複雑な形状を検出したいのかによって、異なる方法が使用されます。二つの主なものは次の表でまとめられています:
**56. [Bounding box detection, Landmark detection]** -⟶ [物体検出, ランドマーク検出] +⟶ [境界ボックス検出, ランドマーク検出]
**57. [Detects the part of the image where the object is located, Detects a shape or characteristics of an object (e.g. eyes), More granular]** -⟶ [物体が配置されている画像の部分検出, (例: 目)物体の特徴または形状検出, より粒状] +⟶ [物体が配置されている画像の部分を検出する, 物体(たとえば目)の形状または特徴を検出する, よりきめ細かい]
**58. [Box of center (bx,by), height bh and width bw, Reference points (l1x,l1y), ..., (lnx,lny)]** -⟶ [センターのボックス(bx, by), 縦bhと幅bw, 各参照ポイント (l1x,l1y), ..., (lnx,lny)] +⟶ [中心(bx, by)、高さbh、幅bwのボックス, 参照点(l1x,l1y), ..., (lnx,lny)]
**59. Intersection over Union ― Intersection over Union, also known as IoU, is a function that quantifies how correctly positioned a predicted bounding box Bp is over the actual bounding box Ba. It is defined as:** -⟶ 労働組合の交差点 - 労働組合の交差点(IoUとも呼ばれる)は予測バウンディングボックスBpが実際のバウンディングボックスBaに対してどれだけ正しくかを定量化する関数です。次のように定義される: +⟶ Intersection over Union - Intersection over Union (IoUとしても知られる)は予測された境界ボックスBpが実際の境界ボックスBaに対してどれだけ正しく配置されているかを定量化する関数です。次のように定義されます:
**60. Remark: we always have IoU∈[0,1]. By convention, a predicted bounding box Bp is considered as being reasonably good if IoU(Bp,Ba)⩾0.5.** -⟶ 注意: 常にIoU∈[0,1]を持ってます。規約により、予測されたバウンディングボックスBpはIoU(Bp,Ba)⩾0.5の場合適度に良いと見なされる。 +⟶ 注:常にIoU∈[0,1]となります。慣例では、IoU(Bp,Ba)⩾0.5の場合、予測された境界ボックスBpはそこそこ良いと見なされます。
**61. Anchor boxes ― Anchor boxing is a technique used to predict overlapping bounding boxes. In practice, the network is allowed to predict more than one box simultaneously, where each box prediction is constrained to have a given set of geometrical properties. For instance, the first prediction can potentially be a rectangular box of a given form, while the second will be another rectangular box of a different geometrical form.** -⟶ アンカーボックス - アンカーボクシングは重複バウンディングボックスを予測する為利用される技術です。実際には、同時に複雑のボックスを予測すろことを許可されており、各ボックス予測は与えられた幾何学的なプロパーティのセットを持つように制約される。例えば、最初の予測は与えられたフォームの長方形のボックスになる可能性があり、二番目のボックスは異なる幾何学的なフォームの別の長方形になります。 +⟶ アンカーボックス - アンカーボクシングは重なり合う境界ボックスを予測するために使用される手法です。 実際には、ネットワークは同時に複数のボックスを予測することを許可されており、各ボックスの予測は特定の幾何学的属性の組み合わせを持つように制約されます。例えば、最初の予測は特定の形式の長方形のボックスになる可能性があり、2番目の予測は異なる幾何学的形式の別の長方形のボックスになります。
**62. Non-max suppression ― The non-max suppression technique aims at removing duplicate overlapping bounding boxes of a same object by selecting the most representative ones. After having removed all boxes having a probability prediction lower than 0.6, the following steps are repeated while there are boxes remaining:** -⟶ 非最大抑制 - 非最大抑制技術のねらいは最も代表的なもの選択によって同物体の重複する重なり合う境界ボックスを除去することです。0.6未満予測確率があるボックスを全て除去した後、残りのボックスがある間に以下のステップが繰り返される: +⟶ 非極大抑制 - 非極大抑制技術のねらいは、最も代表的なものを選択することによって、同じ物体の重複した重なり合う境界ボックスを除去することです。0.6未満の予測確率を持つボックスを全て除去した後、残りのボックスがある間、以下の手順が繰り返されます:
**63. [For a given class, Step 1: Pick the box with the largest prediction probability., Step 2: Discard any box having an IoU⩾0.5 with the previous box.]** -⟶ [与えられたクラス, ステップ1: 最大予測確率があるボックスを取り。, ステップ2: 前のボックスと一緒にIoU⩾0.5のボックスを切り捨てる。] +⟶ [特定のクラスに対して, ステップ1: 最大の予測確率を持つボックスを選ぶ。, ステップ2: そのボックスに対してIoU⩾0.5となる全てのボックスを破棄する。]
**64. [Box predictions, Box selection of maximum probability, Overlap removal of same class, Final bounding boxes]** -⟶ [ボックス予測, 最大確率のボックス選択, 同じクラスの重なり合う除去, 最後のバウンディングボックス] +⟶ [ボックス予測, 最大確率のボックス選択, 同じクラスの重複除去, 最終的な境界ボックス]
**65. YOLO ― You Only Look Once (YOLO) is an object detection algorithm that performs the following steps:** -⟶ YOLO - 貴方は一度だけ見る (YOLO)は次のステップを実行するオブジェクト検出アルゴリズムです。 +⟶ YOLO - You Only Look Once (YOLO)は次の手順を実行する物体検出アルゴリズムです。
**66. [Step 1: Divide the input image into a G×G grid., Step 2: For each grid cell, run a CNN that predicts y of the following form:, repeated k times]** -⟶ [ステップ1: 入力画像をGxGグリッドに分ける。, ステップ2: 各グリッドセルに対して次の形式のyを予測するCNNを実行する:,k回繰り返す] +⟶ [ステップ1: 入力画像をGxGグリッドに分割する。, ステップ2: 各グリッドセルに対して次の形式のyを予測するCNNを実行する:,k回繰り返す]
**67. where pc is the probability of detecting an object, bx,by,bh,bw are the properties of the detected bouding box, c1,...,cp is a one-hot representation of which of the p classes were detected, and k is the number of anchor boxes.** -⟶ ここで、pcは物体認識の確率、bx,by,bh,bwはバウンディングボックスのプロパーティ、c1, ..., cpはpクラスのうちどれが検出されたかのワンホット表現で、kはアンカーボックスの数です。 +⟶ ここで、pcは物体を検出する確率、bx,by,bh,bwは検出された境界ボックスの属性、c1, ..., cpはp個のクラスのうちどれが検出されたかのOne-hot表現、kはアンカーボックスの数です。
**68. Step 3: Run the non-max suppression algorithm to remove any potential duplicate overlapping bounding boxes.** -⟶ 潜在的な重複バウンディングボックスを除去する為に非最大抑制アルゴリズムを実行する。 +⟶ ステップ3: 重複する可能性のある重なり合う境界ボックスを全て除去するため、非極大抑制アルゴリズムを実行する。
**69. [Original image, Division in GxG grid, Bounding box prediction, Non-max suppression]** -⟶ [元の画像, GxGグリッドでの分割, 物体検出, 非最大抑制] +⟶ [元の画像, GxGグリッドでの分割, 境界ボックス予測, 非極大抑制]
**70. Remark: when pc=0, then the network does not detect any object. In that case, the corresponding predictions bx,...,cp have to be ignored.** -⟶ 注意: pc=0時、ネットワークは物体を検出しません。その場合には適当な予測 bx, ..., cpそれぞれは無視する必要があります。 +⟶ 注: pc=0のとき、ネットワークは物体を検出しません。その場合には、対応する予測 bx, ..., cpは無視する必要があります。
**71. R-CNN ― Region with Convolutional Neural Networks (R-CNN) is an object detection algorithm that first segments the image to find potential relevant bounding boxes and then run the detection algorithm to find most probable objects in those bounding boxes.** -⟶ R-CNN - 畳み込みニューラルネットワークを利用した領域は最初に潜在的な関連する境界ボックスを見つけるため画像を分割し、次にそれらの境界ボックス内の最も可能性の高いオブジェクトを見つけるため検出アルゴリズムを実行する物体検出アルゴリズムです。 +⟶ R-CNN - Region with Convolutional Neural Networks (R-CNN)は物体検出アルゴリズムで、最初に画像をセグメント化して潜在的に関連する境界ボックスを見つけ、次に検出アルゴリズムを実行してそれらの境界ボックス内で最も可能性の高い物体を見つけます。
**72. [Original image, Segmentation, Bounding box prediction, Non-max suppression]** -⟶ [元の画像, セグメンテーション, 物体予測, 非最大抑制] +⟶ [元の画像, セグメンテーション, 境界ボックス予測, 非極大抑制]
**73. Remark: although the original algorithm is computationally expensive and slow, newer architectures enabled the algorithm to run faster, such as Fast R-CNN and Faster R-CNN.** -⟶ 注意: 元のアルゴリズムは計算コストが高くて遅くても、より新たなアーキテクチャでは、Fast R-CNNやFaster R-CNNなど、アルゴリズムをより高い速度に実行できる。 +⟶ 注: 元のアルゴリズムは計算コストが高くて遅いですが、Fast R-CNNやFaster R-CNNなどの、より新しいアーキテクチャではアルゴリズムをより速く実行できます。
From f2bc0bd543c03386a606f53370f7506a2c2c4c77 Mon Sep 17 00:00:00 2001 From: Takatoshi nao Date: Tue, 10 Sep 2019 23:05:03 +0900 Subject: [PATCH 371/531] fixed review comment --- ja/refresher-probability.md | 102 ++++++++++++++++++------------------ 1 file changed, 51 insertions(+), 51 deletions(-) diff --git a/ja/refresher-probability.md b/ja/refresher-probability.md index b30513fbf..0d575ec38 100644 --- a/ja/refresher-probability.md +++ b/ja/refresher-probability.md @@ -1,24 +1,24 @@ **1. Probabilities and Statistics refresher** -⟶確率と統計 +⟶確率と統計の復習
**2. Introduction to Probability and Combinatorics** -⟶確率と組合せの紹介 +⟶確率と組合せの導入
**3. Sample space ― The set of all possible outcomes of an experiment is known as the sample space of the experiment and is denoted by S.** -⟶標本空間 - 試行可能なすべての結果の集合は標本空間として知られ、Sと表します。 +⟶標本空間 - ある試行のすべての起こりうる結果の集合はその試行の標本空間として知られ、Sと表します。
**4. Event ― Any subset E of the sample space is known as an event. That is, an event is a set consisting of possible outcomes of the experiment. If the outcome of the experiment is contained in E, then we say that E has occurred.** -⟶事象 - 標本空間のすべての部分集合のEを事象と言います。つまり事象は試行可能な結果で構成された集合です。試行結果がEに含まれるなら、Eが発生した言います。 +⟶事象 - 標本空間の任意の部分集合Eを事象と言います。つまり、ある事象はある試行の起こりうる結果により構成された集合です。ある試行結果がEに含まれるなら、Eが起きたと言います。
@@ -30,25 +30,25 @@ **6. Axiom 1 ― Every probability is between 0 and 1 included, i.e:** -⟶公理1 - すべての確立は0と1の間に含まれ次のようになります: +⟶公理1 - すべての確率は0と1を含んでその間にあります。すなわち:
**7. Axiom 2 ― The probability that at least one of the elementary events in the entire sample space will occur is 1, i.e:** -⟶公理2 - 全体の標本空間で少なくとも一つの根元事象が起こる確率は1で次のようになります: +⟶公理2 - 標本空間全体において少なくとも一つの根元事象が起こる確率は1です。すなわち:
**8. Axiom 3 ― For any sequence of mutually exclusive events E1,...,En, we have:** -⟶公理3 - 相互に排他的なとある連続した事象E1,...Enに対し、次のようになります: +⟶公理3 - 互いに排反な事象の任意の数列E1,...,Enに対し、次が成り立ちます:
**9. Permutation ― A permutation is an arrangement of r objects from a pool of n objects, in a given order. The number of such arrangements is given by P(n,r), defined as:** -⟶順列(Permutation) - 順列はn個の中からr個を順番を考慮して並べられた配列です。このような配列の数はP(n, r)と表し、次のように定義します: +⟶順列(Permutation) - 順列とはn個のものの中からr個をある順序で並べた配列です。このような配列の数はP(n,r)と表し、次のように定義します:
@@ -60,7 +60,7 @@ **11. Remark: we note that for 0⩽r⩽n, we have P(n,r)⩾C(n,r)** -⟶注釈: 0⩽r⩽nに対し、P(n,r)⩾C(n,r)となります。 +⟶注釈: 0⩽r⩽nのとき、P(n,r)⩾C(n,r)となります。
@@ -72,7 +72,7 @@ **13. Bayes' rule ― For events A and B such that P(B)>0, we have:** -⟶ベイズの定理 - P(B)>0のような事象A, Bに対して次となります: +⟶ベイズの定理 - P(B)>0であるような事象A, Bに対して、次が成り立ちます:
@@ -84,25 +84,25 @@ **15. Partition ― Let {Ai,i∈[[1,n]]} be such that for all i, Ai≠∅. We say that {Ai} is a partition if we have:** -⟶分割(Partition) - {Ai,i∈[[1,n]]}はすべてのiに対してAi≠∅としましょう。{Ai}が次のような場合、分割と言います: +⟶分割(Partition) - {Ai,i∈[[1,n]]}はすべてのiに対してAi≠∅としましょう。次が成り立つとき、{Ai}は分割であると言います:
**16. Remark: for any event B in the sample space, we have P(B)=n∑i=1P(B|Ai)P(Ai).** -⟶注釈: 標本空間で任意の事象Bに対して、P(B)=n∑i=1P(B|Ai)P(Ai)となります。 +⟶注釈: 標本空間において任意の事象Bに対して、P(B)=n∑i=1P(B|Ai)P(Ai)が成り立ちます。
**17. Extended form of Bayes' rule ― Let {Ai,i∈[[1,n]]} be a partition of the sample space. We have:** -⟶ベイズの定理の応用 - {Ai,i∈[[1,n]]}を標本空間の分割としましょう。次のようになります: +⟶ベイズの定理の応用 - {Ai,i∈[[1,n]]}を標本空間の分割とすると、次が成り立ちます:
**18. Independence ― Two events A and B are independent if and only if we have:** -⟶独立性 - 次の場合のみ事象AとBは独立であるといいます: +⟶独立性 - 次が成り立つ場合かつその場合に限り、2つの事象AとBは独立であるといいます:
@@ -120,67 +120,67 @@ **21. Random variable ― A random variable, often noted X, is a function that maps every element in a sample space to a real line.** -⟶確率変数 - 確率変数は主にXと表記し標本空間のすべての要素に実線で対応する関数です。 +⟶確率変数 - 確率変数は、よくXと表記され、ある標本空間のすべての要素を実数直線に対応させる関数です。
**22. Cumulative distribution function (CDF) ― The cumulative distribution function F, which is monotonically non-decreasing and is such that limx→−∞F(x)=0 and limx→+∞F(x)=1, is defined as:** -⟶累積分布関数(CDF) - 単調非減少の累積分布関数Fはlimx→−∞F(x)=0 and limx→+∞F(x)=1となり次のように定義します: +⟶累積分布関数(CDF) - 累積分布関数Fは、単調非減少かつlimx→−∞F(x)=0 and limx→+∞F(x)=1であり、次のように定義されます:
**23. Remark: we have P(a **24. Probability density function (PDF) ― The probability density function f is the probability that X takes on values between two adjacent realizations of the random variable.** -⟶確率密度関数(PDF) - 確率密度関数Fは隣接する二つの確率変数の間に置かれる確率です。 +⟶確率密度関数(PDF) - 確率密度関数fは確率変数Xが2つの隣接する実現値の間の値をとる確率です。
**25. Relationships involving the PDF and CDF ― Here are the important properties to know in the discrete (D) and the continuous (C) cases.** -⟶PDFとCDFとの関係性 - 離散(D)と連続(C)の例から知るべき重要な特性があります。 +⟶PDFとCDFについての関係性 - 離散値(D)と連続値(C)のそれぞれの場合について知っておくべき重要な特性をここに挙げます。
**26. [Case, CDF F, PDF f, Properties of PDF]** -⟶[例、CDF F、PDF f、PDFの特性] +⟶[種類、CDF F、PDF f、PDFの特性]
**27. Expectation and Moments of the Distribution ― Here are the expressions of the expected value E[X], generalized expected value E[g(X)], kth moment E[Xk] and characteristic function ψ(ω) for the discrete and continuous cases:** -⟶分布の期待値と積率 - 離散または連続の場合、期待値E[X]、一般化した期待値E[g(X)]、k次の積率E[Xk]と特性関数ψ(ω): +⟶分布の期待値と積率 - 離散値と連続値のそれぞれの場合における期待値E[X]、一般化した期待値E[g(X)]、k次の積率E[Xk]と特性関数ψ(ω)をここに挙げます:
**28. Variance ― The variance of a random variable, often noted Var(X) or σ2, is a measure of the spread of its distribution function. It is determined as follows:** -⟶分散(Variance) - 確率変数の分散は主にVar(X)またはσ2と表記し、分布関数の散布度を測定したものです。次のように決まります。 +⟶分散(Variance) - 確率変数の分散は、よくVar(X)またはσ2と表記され、その確率変数の分布関数のばらつきの尺度です。次のように計算されます。
**29. Standard deviation ― The standard deviation of a random variable, often noted σ, is a measure of the spread of its distribution function which is compatible with the units of the actual random variable. It is determined as follows:** -⟶標準偏差(Standard deviation) - 確率変数の標準偏差は主にσと表記し実確率変数の単位をしようする分布関数の散布度を測定したものです。次のように決まります。 +⟶標準偏差(Standard deviation) - 確率変数の標準偏差は、よくσと表記され、その確率変数の分布関数のばらつきの尺度であり、その確率変数の単位に則ったものです。次のように計算されます。
**30. Transformation of random variables ― Let the variables X and Y be linked by some function. By noting fX and fY the distribution function of X and Y respectively, we have:** -⟶確率変数の変換 - 変数XとYは任意の関数に繋がってるとします。fXとfYに各々XとYの分布関数を表記すると次のようになります: +⟶確率変数の変換 - 変数XとYはなんらかの関数により関連づけられているとします。fXとfYをそれぞれXとYの分布関数として表記すると次が成り立ちます:
**31. Leibniz integral rule ― Let g be a function of x and potentially c, and a,b boundaries that may depend on c. We have:** -⟶ライプニッツ積分法 - gをxの関数とし、暫定的にcとしましょう。そしてcに従属的な境界a,bに対して次のようになります。 +⟶ライプニッツの積分則 - gをxと潜在的にcの関数とし、a,bをcに従属的な境界とすると、次が成り立ちます。
@@ -192,73 +192,73 @@ **33. Chebyshev's inequality ― Let X be a random variable with expected value μ. For k,σ>0, we have the following inequality:** -⟶チェビシェフの不等式 - Xを期待値μをの確率変数とします。kに対して、σ>0なら次のような不等式を持ちます。 +⟶チェビシェフの不等式 - Xを期待値μの確率変数とします。k,σ>0のとき次の不等式が成り立ちます:
**34. Main distributions ― Here are the main distributions to have in mind:** -⟶主な分布 - 覚えておくべき主な分布があります: +⟶主な分布 - 覚えておくべき主な分布をここに挙げます:
**35. [Type, Distribution]** -⟶[タイプ、分布] +⟶[種類、分布]
**36. Jointly Distributed Random Variables** -⟶結合確率変数 +⟶同時分布の確率変数
**37. Marginal density and cumulative distribution ― From the joint density probability function fXY , we have** -⟶周辺密度と累積分布 - 結合密度確率関数fXYから次のようになります。 +⟶周辺密度と累積分布 - 同時確率密度関数fXYから次が成り立ちます。
**38. [Case, Marginal density, Cumulative function]** -⟶[例,、周辺密度、累積関数] +⟶[種類,、周辺密度、累積関数]
**39. Conditional density ― The conditional density of X with respect to Y, often noted fX|Y, is defined as follows:** -⟶条件部密度(Conditional density) - Yに対するXの条件部密度は主にfx|Yと表記され、次のように定義されます: +⟶条件付き密度(Conditional density) - Yに対するXの条件付き密度はよくfX|Yと表記され、次のように定義されます:
**40. Independence ― Two random variables X and Y are said to be independent if we have:** -⟶独立性(Independence) - 二つの確率変数XとYは次の場合、独立的と言います。 +⟶独立性(Independence) - 2つの確率変数XとYは次が成り立つとき、独立であると言います:
**41. Covariance ― We define the covariance of two random variables X and Y, that we note σ2XY or more commonly Cov(X,Y), as follows:** -⟶共分散(Covariance) - 次のようにふたつの確率変数X,Yの共分散をσ2XYまたはさらに一般的にはCov(X,Y)で定義します。 +⟶共分散(Covariance) - 2つの確率変数XとYの共分散を、σ2XYまたはより一般的にはCov(X,Y)と表記し、次のように定義します:
**42. Correlation ― By noting σX,σY the standard deviations of X and Y, we define the correlation between the random variables X and Y, noted ρXY, as follows:** -⟶相関関係(Correlation) - X, Yの標準変数をσX,σYで表記し、確率変数X,Yの相関関係をρXYで表記し、次のように定義します。 +⟶相関係数(Correlation) - X, Yの標準偏差をσX,σYと表記し、確率変数X,Yの相関関係をρXYと表記し、次のように定義します:
**43. Remark 1: we note that for any random variables X,Y, we have ρXY∈[−1,1].** -⟶注釈 1: 任意の確率変数X,Yに対してρXY∈[−1,1]となります。 +⟶注釈 1: 任意の確率変数X,Yに対してρXY∈[−1,1]が成り立ちます。
**44. Remark 2: If X and Y are independent, then ρXY=0.** -⟶注釈 2: XとYが独立ならρXY=0です。 +⟶注釈 2: XとYが独立ならば、ρXY=0です。
@@ -276,25 +276,25 @@ **47. Random sample ― A random sample is a collection of n random variables X1,...,Xn that are independent and identically distributed with X.** -⟶確率標本(Random sample) - 確率標本はXと独立で同一に分布するn個の確率変数X1,...,Xnの集まりです。 +⟶確率標本(Random sample) - 確率標本とはXに従う独立同分布のn個の確率変数X1,...,Xnの集合です。
**48. Estimator ― An estimator is a function of the data that is used to infer the value of an unknown parameter in a statistical model.** -⟶推定量(Estimator) - 推定量は統計モデルで未知のパラメータの値を推定するために使用されるデータの関数です。 +⟶推定量(Estimator) - 推定量とは統計モデルにおける未知のパラメータの値を推定するのに用いられるデータの関数です。
**49. Bias ― The bias of an estimator ^θ is defined as being the difference between the expected value of the distribution of ^θ and the true value, i.e.:** -⟶偏り(Bias) - 推定量^θの偏りは^θの期待値と実際の値との差で定義されます。 +⟶偏り(Bias) - 推定量^θの偏りは^θのの分布の期待値と真の値との差として定義されます。すなわち:
**50. Remark: an estimator is said to be unbiased when we have E[^θ]=θ.** -⟶注釈: 推定量はE[^θ]=θの場合、不偏といいます。 +⟶注釈: E[^θ]=θが成り立つとき、推定量は不偏であるといいます。
@@ -306,55 +306,55 @@ **52. Sample mean ― The sample mean of a random sample is used to estimate the true mean μ of a distribution, is often noted ¯¯¯¯¯X and is defined as follows:** -⟶標本平均(Sample mean) - 確率標本の標本平均は実の平均μを推定するのに用いられ、主に¯¯¯¯¯Xと表記され次のように定義されます。 +⟶標本平均(Sample mean) - 確率標本の標本平均は、ある分布の真の平均μを推定するのに用いられ、よく¯¯¯¯¯Xと表記され、次のように定義されます:
**53. Remark: the sample mean is unbiased, i.e E[¯¯¯¯¯X]=μ.** -⟶注釈: 標本平均は不偏です。すなわちE[¯¯¯¯¯X]=μとなります。 +⟶注釈: 標本平均は不偏です。すなわちE[¯¯¯¯¯X]=μが成り立ちます。
**54. Central Limit Theorem ― Let us have a random sample X1,...,Xn following a given distribution with mean μ and variance σ2, then we have:** -⟶中心極限定理 - 平均μと分散σ2を持つ分布を従う確率標本X1,...,Xnがある。その場合、次のようになります。 +⟶中心極限定理 - 確率標本X1,...,Xnが平均μと分散σ2を持つある分布に従うとすると、次が成り立ちます:
**55. Estimating the variance** -⟶分散推定 +⟶分散の推定
**56. Sample variance ― The sample variance of a random sample is used to estimate the true variance σ2 of a distribution, is often noted s2 or ^σ2 and is defined as follows:** -⟶標本分散 - 確率標本の標本分散は実の分散σ2を推定するのに用いられ、主にs2または^σ2と表記し次のように定義されます。 +⟶標本分散 - 確率標本の標本分散は、ある分布の真の分散σ2を推定するのに用いられ、よくs2または^σ2と表記され、次のように定義されます:
**57. Remark: the sample variance is unbiased, i.e E[s2]=σ2.** -⟶注釈: 標本分散は不偏です。つまりE[s2]=σ2になります。 +⟶注釈: 標本分散は不偏です。すなわちE[s2]=σ2が成り立ちます。
**58. Chi-Squared relation with sample variance ― Let s2 be the sample variance of a random sample. We have:** -⟶標本分散とカイ二乗の関係 - 確率標本の標本分散をs2としよう。次のようになります。 +⟶標本分散とカイ二乗分布との関係 - 確率標本の標本分散をs2とすると、次が成り立ちます:
**59. [Introduction, Sample space, Event, Permutation]** -⟶[紹介、標本空間、事象、順列] +⟶[導入、標本空間、事象、順列]
**60. [Conditional probability, Bayes' rule, Independence]** -⟶[条件部確率、ベイズの定理、独立] +⟶[条件付き確率、ベイズの定理、独立]
@@ -372,7 +372,7 @@ **63. [Jointly distributed random variables, Density, Covariance, Correlation]** -⟶[結合分布の確率変数、密度、共分散、相関関係] +⟶[同時分布の確率変数、密度、共分散、相関係数]
From b4931982d4541fee47509e89aafcc4ef6c97e080 Mon Sep 17 00:00:00 2001 From: Shervine Amidi Date: Tue, 10 Sep 2019 22:45:06 -0700 Subject: [PATCH 372/531] Update README.md --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index be07af1f2..df7d8bcc4 100644 --- a/README.md +++ b/README.md @@ -65,7 +65,7 @@ Please make sure to propose the translation of **only one** cheatsheet per pull |**Magyar**|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/124)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/124)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/124)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/124)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/124)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/124)| |**Bahasa Indonesia**|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/154)|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/139)|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/151)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/150)| |**Italiano**|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/78)|not started|not started|not started|not started|not started| -|**日本語**|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/96)|done|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/173)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/99)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/142)|done| +|**日本語**|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/96)|done|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/173)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/99)|done|done| |**한국어**|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/80)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/90)|done|done|done|done| |**Polski**|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/8)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/8)|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/8)|not started|not started| |**Português**|done|done|done|done|done|done| From 5fc63dbb8ac7ebf311915bc712f32e5f91f0ff5b Mon Sep 17 00:00:00 2001 From: tuananhhedspibk Date: Thu, 12 Sep 2019 22:41:16 +0900 Subject: [PATCH 373/531] [ja] Convolutional Neural Networks --- ja/convolutional-neural-networks.md | 72 ++++++++++++++--------------- 1 file changed, 36 insertions(+), 36 deletions(-) diff --git a/ja/convolutional-neural-networks.md b/ja/convolutional-neural-networks.md index d8e5017b1..b1aeb644f 100644 --- a/ja/convolutional-neural-networks.md +++ b/ja/convolutional-neural-networks.md @@ -11,7 +11,7 @@ **2. CS 230 - Deep Learning** -⟶ CS 230 - 深層学習 +⟶ CS 230 - ディープラーニング
@@ -25,7 +25,7 @@ **4. [Types of layer, Convolution, Pooling, Fully connected]** -⟶ [層のタイプ, 畳み込み, プーリング, 全結合] +⟶ [層の種類, 畳み込み, プーリング, 全結合]
@@ -53,14 +53,14 @@ **8. [Object detection, Types of models, Detection, Intersection over Union, Non-max suppression, YOLO, R-CNN]** -⟶ [物体検出, モデルのタイプ, 検出, IoU, 非極大抑制, YOLO, R-CNN] +⟶ [物体検出, モデルの種類, 検出, IoU, 非極大抑制, YOLO, R-CNN]
**9. [Face verification/recognition, One shot learning, Siamese network, Triplet loss]** -⟶ [顔認証/認識, One shot学習, シャムネットワーク, 三重項損失] +⟶ [顔認証/認識, One shot学習, シャムネットワーク, トリプレット損失]
@@ -88,7 +88,7 @@ **13. Architecture of a traditional CNN ― Convolutional neural networks, also known as CNNs, are a specific type of neural networks that are generally composed of the following layers:** -⟶ 伝統的な畳み込みニューラルネットワークのアーキテクチャ - CNNとしても知られる畳み込みニューラルネットワークは一般的に次の層で構成される特定タイプのニューラルネットワークです。 +⟶ 伝統的な畳み込みニューラルネットワークのアーキテクチャ - CNNとしても知られる畳み込みニューラルネットワークは一般的に次の層で構成される特定種類のニューラルネットワークです。
@@ -102,7 +102,7 @@ **15. Types of layer** -⟶ 層のタイプ +⟶ 層の種類
@@ -130,7 +130,7 @@ **19. [Type, Purpose, Illustration, Comments]** -⟶ [タイプ, 目的, 図, コメント] +⟶ [種類, 目的, 図, コメント]
@@ -250,7 +250,7 @@ **36. Understanding the complexity of the model ― In order to assess the complexity of a model, it is often useful to determine the number of parameters that its architecture will have. In a given layer of a convolutional neural network, it is done as follows:** -⟶ モデルの複雑さを理解する - モデルの複雑さを評価するために、モデルのアーキテクチャが持つパラメータの数を測定することがしばしば有用です。畳み込みニューラルネットワークの各レイヤでは、以下のように行なわれます。 +⟶ モデルの複雑さを理解する - モデルの複雑さを評価するために、モデルのアーキテクチャが持つパラメータの数を測定することがしばしば有用です。畳み込みニューラルネットワークの各層では、以下のように行なわれます。
@@ -285,7 +285,7 @@ **41. Receptive field ― The receptive field at layer k is the area denoted Rk×Rk of the input that each pixel of the k-th activation map can 'see'. By calling Fj the filter size of layer j and Si the stride value of layer i and with the convention S0=1, the receptive field at layer k can be computed with the formula:** -⟶ 受容野 - レイヤkにおける受容野は、k番目の活性化マップの各ピクセルが「見る」ことができる入力のRk×Rkの領域です。レイヤjのフィルタサイズをFj、レイヤiのストライド値をSiとし、慣例に従ってS0=1とすると、レイヤkでの受容野は次の式で計算されます: +⟶ 受容野 - 層kにおける受容野は、k番目の活性化マップの各ピクセルが「見る」ことができる入力のRk×Rkの領域です。層jのフィルタサイズをFj、層iのストライド値をSiとし、慣例に従ってS0=1とすると、層kでの受容野は次の式で計算されます:
@@ -306,7 +306,7 @@ **44. Rectified Linear Unit ― The rectified linear unit layer (ReLU) is an activation function g that is used on all elements of the volume. It aims at introducing non-linearities to the network. Its variants are summarized in the table below:** -⟶ 正規化線形ユニット - 正規化線形ユニットレイヤ(ReLU)はボリュームの全ての要素に利用される活性化関数gです。ReLUの目的は非線型性をネットワークに導入することです。変種は以下の表でまとめられています: +⟶ 正規化線形ユニット - 正規化線形ユニット層(ReLU)はボリュームの全ての要素に利用される活性化関数gです。ReLUの目的は非線型性をネットワークに導入することです。変種は以下の表でまとめられています:
@@ -390,7 +390,7 @@ **56. [Bounding box detection, Landmark detection]** -⟶ [境界ボックス検出, ランドマーク検出] +⟶ [バウンディングボックス検出, ランドマーク検出]
@@ -523,7 +523,7 @@ **75. Types of models ― Two main types of model are summed up in table below:** -⟶ モデルのタイプ - 主な二つのモデルは次の表で要約される: +⟶ モデルの種類 - 2種類の主要なモデルが次の表にまとめられています:
@@ -537,42 +537,42 @@ **77. [Is this the correct person?, One-to-one lookup, Is this one of the K persons in the database?, One-to-many lookup]** -⟶ [これは正しい人ですか?, 一対一見上げる, これはデータベース内のk人のうちの一人ですか, 一対多見上げる] +⟶ [これは正しい人ですか?, 1対1検索, これはデータベース内のK人のうちの1人ですか, 1対多検索]
**78. One Shot Learning ― One Shot Learning is a face verification algorithm that uses a limited training set to learn a similarity function that quantifies how different two given images are. The similarity function applied to two images is often noted d(image 1,image 2).** -⟶ ワンショット学習 - ワンショット学習は二つの与えられた画像の違いを定量かする類似性関数を学ぶ為有限トレーニングセットを利用する顔認証アルゴリズムです。二つの画像に適用される類似性関数はしばしばd(画像1、画像2)と記される。 +⟶ ワンショット学習 - ワンショット学習は限られた学習セットを利用して、2つの与えられた画像の違いを定量化する類似度関数を学習する顔認証アルゴリズムです。2つの画像に適用される類似度関数はしばしばd(画像1, 画像2)と記されます。
**79. Siamese Network ― Siamese Networks aim at learning how to encode images to then quantify how different two images are. For a given input image x(i), the encoded output is often noted as f(x(i)).** -⟶ シャムネットワー - シャムネットワーは2つの画像の違いを定量化して、画像暗号化方法を学ぶことを目的としている。与えられたインプット画像x(i)に対して暗号化された出力はしばしばf(x(i))と表示される。 +⟶ シャムネットワーク - シャムネットワークは画像のエンコード方法を学習して2つの画像の違いを定量化することを目的としています。与えられた入力画像x(i)に対してエンコードされた出力はしばしばf(x(i))と記されます。
**80. Triplet loss ― The triplet loss ℓ is a loss function computed on the embedding representation of a triplet of images A (anchor), P (positive) and N (negative). The anchor and the positive example belong to a same class, while the negative example to another one. By calling α∈R+ the margin parameter, this loss is defined as follows:** -⟶ トリプレット損失 - トリプレット損失ℓはトリプレットの画像A(アンカー)、P(ポジティブ)、N(負)の埋め込み表現で計算する損失関数です。アンカーとポジティブ例は同じクラスに属し、ネガティブ例は別のものに属する。マージンパラメータはα∈R+と呼ぶことによってこの損失は次のように定義される: +⟶ トリプレット損失 - トリプレット損失ℓは3つ組の画像A(アンカー)、P(ポジティブ)、N(ネガティブ)の埋め込み表現で計算される損失関数です。アンカーとポジティブ例は同じクラスに属し、ネガティブ例は別のクラスに属します。マージンパラメータをα∈R+と呼ぶことによってこの損失は次のように定義されます:
**81. Neural style transfer** -⟶ 神経のスタイル転送 +⟶ ニューラルスタイル変換
**82. Motivation ― The goal of neural style transfer is to generate an image G based on a given content C and a given style S.** -⟶ モチベーション - 神経のスタイル転送の目的は与えられたコンテンツCとスタイルSに基づく画像Gを生成する。 +⟶ モチベーション - ニューラルスタイル変換の目的は与えられたコンテンツCとスタイルSに基づく画像Gを生成することです。
@@ -586,98 +586,98 @@ **84. Activation ― In a given layer l, the activation is noted a[l] and is of dimensions nH×nw×nc** -⟶ 活性化 - 与えられた層lで、活性化はa[l]と表示されて、nH×nw×ncの寸法。 +⟶ 活性化 - 層lにおける活性化はa[l]と表記され、次元はnH×nw×ncです。
**85. Content cost function ― The content cost function Jcontent(C,G) is used to determine how the generated image G differs from the original content image C. It is defined as follows:** -⟶ コンテンツコスト関数 - Jcontent(C, G)というコンテンツコスト関数は元のコンテンツ画像Cと生成された画像Gとの違いを決定するため利用される。以下のように定義される: +⟶ コンテンツコスト関数 - Jcontent(C, G)というコンテンツコスト関数は生成された画像Gと元のコンテンツ画像Cとの違いを測定するため利用されます。以下のように定義されます:
**86. Style matrix ― The style matrix G[l] of a given layer l is a Gram matrix where each of its elements G[l]kk′ quantifies how correlated the channels k and k′ are. It is defined with respect to activations a[l] as follows:** -⟶ スタイル行列 - 与えられた層lのスタイル行列 G[l]はグラム配列で、各要素G[l]kk′がチャネルkとk′の相関関係を定量化する。 +⟶ スタイル行列 - 与えられた層lのスタイル行列G[l]はグラム行列で、各要素G[l]kk′がチャネルkとk′の相関関係を定量化します。活性化a[l]に関して次のように定義されます。
**87. Remark: the style matrix for the style image and the generated image are noted G[l] (S) and G[l] (G) respectively.** -⟶ 注意: スタイル画像及び生成された画像に対するスタイル行列はそれぞれG[l] (S)、G[l] (G)と表示される。 +⟶ 注: スタイル画像及び生成された画像に対するスタイル行列はそれぞれG[l] (S)、G[l] (G)と表記されます。
**88. Style cost function ― The style cost function Jstyle(S,G) is used to determine how the generated image G differs from the style S. It is defined as follows:** -⟶ スタイルコスト関数 - スタイルコスト関数Jstyle(S,G)はスタイルSと生成された画像Gどう違うかを決定する為利用される。次のように定義される: +⟶ スタイルコスト関数 - スタイルコスト関数Jstyle(S,G)は生成された画像GとスタイルSとの違いを測定するため利用されます。以下のように定義されます:
**89. Overall cost function ― The overall cost function is defined as being a combination of the content and style cost functions, weighted by parameters α,β, as follows:** -⟶ 全体コスト関数 - 全体コスト関数は以下のようにパラメータα,βによって重み付けされ、スタイルコスト関数とコンテンツの組み合わせた物として定義される: +⟶ 全体のコスト関数 - 全体のコスト関数は以下のようにパラメータα,βによって重み付けされたコンテンツ及びスタイルコスト関数の組み合わせとして定義されます:
**90. Remark: a higher value of α will make the model care more about the content while a higher value of β will make it care more about the style.** -⟶ 注意: αのより高い値はモデルが内容をより気にするようにさせ、βのより高い値はスタイルをより気にするようになる。 +⟶ 注: αの値を大きくするとモデルはコンテンツを重視し、βの値を大きくするとスタイルを重視します。
**91. Architectures using computational tricks** -⟶ アーキテクチャは計算の詭計を利用している。 +⟶ 計算トリックを使うアーキテクチャ
**92. Generative Adversarial Network ― Generative adversarial networks, also known as GANs, are composed of a generative and a discriminative model, where the generative model aims at generating the most truthful output that will be fed into the discriminative which aims at differentiating the generated and true image.** -⟶ 生成型敵対的ネットワーク - 生成型敵対的ネットワーク、GANsとも呼ばれるは生成モデルと識別モデルで構成される、生成モデルの目的は生成された画像と実像を区別する目的とする識別にフィードされる最も真実の出力を生成する。 +⟶ 敵対的生成ネットワーク - 敵対的生成ネットワーク(GANsとも呼ばれる)は生成モデルと識別モデルで構成されます。生成モデルの目的は、生成された画像と本物の画像を区別することを目的とする識別モデルに与えられる、最も本物らしい出力を生成することです。
-**93. [Training, Noise, Real-world image, Generator, Discriminator, Real Fake]** +**93. [Training set, Noise, Real-world image, Generator, Discriminator, Real Fake]** -⟶ [トレーニング, 騒音, 現実世界の画像, ジェネレータ, 弁別器, 偽のリアル] +⟶ [学習セット, ノイズ, 現実世界の画像, 生成器, 識別器, 真 偽]
**94. Remark: use cases using variants of GANs include text to image, music generation and synthesis.** -⟶ 注意: GANsの変種を使用するユースケースには画像へのテキスト, 音楽生成及び合成があります。 +⟶ 注: GANsの変種を使用するユースケースにはテキストからの画像生成, 音楽生成及び合成があります。
**95. ResNet ― The Residual Network architecture (also called ResNet) uses residual blocks with a high number of layers meant to decrease the training error. The residual block has the following characterizing equation:** -⟶ ResNet - 残渣ネットワークアーキテクチャ(ResNetとも呼ばれる)はトレーニングエラーを減らすため多数の層がある残差ブロックを使用する。残差ブロックは次の特定方程式を有する。 +⟶ ResNet - Residual Networkアーキテクチャ(ResNetとも呼ばれる)は学習エラーを減らすため多数の層がある残差ブロックを使用します。残差ブロックは次の特性方程式を有します。
**96. Inception Network ― This architecture uses inception modules and aims at giving a try at different convolutions in order to increase its performance through features diversification. In particular, it uses the 1×1 convolution trick to limit the computational burden.** -⟶ インセプションネットワーク - このアーキテクチャはインセプションモジュールを利用し、特徴多様化を通じてパーフォーマンス改善の為別の畳み込みを試してみる目的とする。特に、計算負荷を限定する為1×1畳み込みトリックを使う。 +⟶ インセプションネットワーク - このアーキテクチャはインセプションモジュールを利用し、特徴量の多様化を通じてパーフォーマンスを向上させるため、様々な畳み込みを試すことを目的としています。特に、計算負荷を限定するため1×1畳み込みトリックを使います。
**97. The Deep Learning cheatsheets are now available in [target language].** -⟶ 深層学習チートシートは今[ターゲット言語]で利用可能です。 +⟶ ディープラーニングのチートシートが日本語で利用可能になりました。
@@ -691,7 +691,7 @@ **99. Translated by X, Y and Z** -⟶ X, Y, Zによる翻訳された +⟶ X・Y・Z 訳
@@ -705,13 +705,13 @@ **101. View PDF version on GitHub** -⟶ GithubでPDFバージョン見る +⟶ GitHubでPDF版を見る
**102. By X and Y** -⟶ X, Yによる +⟶ X・Y 著
From bb287185faae355ea88df7cd70c2484427f24d12 Mon Sep 17 00:00:00 2001 From: Shervine Amidi Date: Wed, 18 Sep 2019 22:15:30 -0700 Subject: [PATCH 374/531] Update progress pt --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index df7d8bcc4..677c233aa 100644 --- a/README.md +++ b/README.md @@ -93,7 +93,7 @@ Please make sure to propose the translation of **only one** cheatsheet per pull |**日本語**|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/145)|done|done| |**한국어**|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/109)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/107)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/108)| |**Polski**|not started|not started|not started| -|**Português**|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/128)|not started|not started| +|**Português**|done|not started|not started| |**Русский**|not started|not started|not started| |**Türkçe**|done|done|done| |**Українська**|not started|not started|not started| From 49c436d9ebf8589629fb4845718fe64846e33b91 Mon Sep 17 00:00:00 2001 From: Redouane Lguensat Date: Thu, 26 Sep 2019 11:15:29 +0200 Subject: [PATCH 375/531] Update cheatsheet-unsupervised-learning.md --- ar/cheatsheet-unsupervised-learning.md | 13 ++++++------- 1 file changed, 6 insertions(+), 7 deletions(-) diff --git a/ar/cheatsheet-unsupervised-learning.md b/ar/cheatsheet-unsupervised-learning.md index 300fd8dfd..b3471ab4e 100644 --- a/ar/cheatsheet-unsupervised-learning.md +++ b/ar/cheatsheet-unsupervised-learning.md @@ -1,7 +1,7 @@ **1. Unsupervised Learning cheatsheet**
-ورقة مراجعة للتعلم بدون إشراف + مرجع سريع للتعلّم غير المُوَجَّه

@@ -9,7 +9,7 @@ **2. Introduction to Unsupervised Learning**
- مقدمة للتعلم بدون إشراف + مقدمة للتعلّم غير المُوَجَّه

@@ -17,7 +17,7 @@ **3. Motivation ― The goal of unsupervised learning is to find hidden patterns in unlabeled data {x(1),...,x(m)}.**
- {x(1),...,x(m)} الحافز ― الهدف من التعلم بدون إشراف هو إيجاد الأنماط الخفية في البيانات غير الموسومة + {x(1),...,x(m)} الحافز ― الهدف من التعلّم غير المُوَجَّه هو إيجاد الأنماط الخفية في البيانات غير المٌعلمّة

@@ -25,8 +25,7 @@ **4. Jensen's inequality ― Let f be a convex function and X a random variable. We have the following inequality:**
-متباينة جينسن ― لتكن f دالة محدبة و X متغير عشوائي. لدينا المتفاوتة التالية -: +متباينة جينسن ― لتكن f دالة محدبة و X متغير عشوائي. لدينا المتباينة التالية:

@@ -34,14 +33,14 @@ **5. Clustering**
- تجميع + التجميع

**6. Expectation-Maximization**
-تحقيق أقصى قدر للتوقع +

From cdb5aba0cc3f052312d2fb078fb515b977eac462 Mon Sep 17 00:00:00 2001 From: Redouane Lguensat Date: Thu, 26 Sep 2019 11:35:50 +0200 Subject: [PATCH 376/531] Approved reviews by @qunaieer --- ar/cheatsheet-unsupervised-learning.md | 123 ++++++++++++------------- 1 file changed, 61 insertions(+), 62 deletions(-) diff --git a/ar/cheatsheet-unsupervised-learning.md b/ar/cheatsheet-unsupervised-learning.md index b3471ab4e..d98e37ea2 100644 --- a/ar/cheatsheet-unsupervised-learning.md +++ b/ar/cheatsheet-unsupervised-learning.md @@ -40,87 +40,91 @@ **6. Expectation-Maximization**
- + تعظيم القيمة المتوقعة (Expectation-Maximization)

**7. Latent variables ― Latent variables are hidden/unobserved variables that make estimation problems difficult, and are often denoted z. Here are the most common settings where there are latent variables:**
-المتغيرات الكامنة ― المتغيرات الكامنة هي متغيرات باطنية/غير معاينة تزيد من صعوبة مشاكل التقدير، غالبا ما ترمز بالحرف z. في مايلي الإعدادات الشائعة التي تحتوي على متغيرات كامنة.
+ المتغيرات الكامنة ― المتغيرات الكامنة هي متغيرات مخفية/غير معاينة تزيد من صعوبة مشاكل التقدير، غالباً ما ترمز بالحرف z. في مايلي الإعدادات الشائعة التي تحتوي على متغيرات كامنة: +

**8. [Setting, Latent variable z, Comments]**
-إعداد، متغير كامن z، تعاليق
+ [الإعداد، المتغير الكامن z، ملاحظات] +

**9. [Mixture of k Gaussians, Factor analysis]**
-مزيج من k غاوسيات، تحليل العوامل
+ [خليط من k توزيع جاوسي، تحليل عاملي] +
**10. Algorithm ― The Expectation-Maximization (EM) algorithm gives an efficient method at estimating the parameter θ through maximum likelihood estimation by repeatedly constructing a lower-bound on the likelihood (E-step) and optimizing that lower bound (M-step) as follows:**
-خوارزمية ― خوارزمية تحقيق أقصى قدر للتوقع هي عبارة عن طريقة فعالة لتقدير المعامل θ عبر تقدير الاحتمال الأرجح، و يتم ذلك بشكل تكراري حيث يتم إيجاد حد أدنى لدالة الإمكان (الخطوة M) ثم يتم استمثال ذلك الحد الأدنى (الخطوة E) كما يلي:
+خوارزمية ― تعظيم القيمة المتوقعة (Expectation-Maximization) هي عبارة عن طريقة فعالة لتقدير المُدخل θ عبر تقدير الأرجحية الأعلى (maximum likelihood estimation)، ويتم ذلك بشكل تكراري حيث يتم إيجاد حد أدنى للأرجحية (الخطوة E)، ثم يتم تحسين (optimizing) ذلك الحد الأدنى (الخطوة M)، كما يلي:
<br>

**11. E-step: Evaluate the posterior probability Qi(z(i)) that each data point x(i) came from a particular cluster z(i) as follows:**
-الخطوة E : حساب الاحتمال البعدي Qi(z(i)) بأن تصدر كل نقطة x(i) من التجمع z(i) كما يلي: +الخطوة E : حساب الاحتمال البعدي Qi(z(i)) بأن تصدر كل نقطة x(i) من مجموعة (cluster) z(i) كما يلي:

**12. M-step: Use the posterior probabilities Qi(z(i)) as cluster specific weights on data points x(i) to separately re-estimate each cluster model as follows:**
-الخطوة M : يتم استعمال الاحتمالات البعدية Qi(z(i)) كأثقال خاصة لكل تجمع على النقط x(i) ، لكي يتم تقدير نموذج لكل تجمع بشكل منفصل، و ذلك كما يلي: + الخطوة M : يتم استعمال الاحتمالات البعدية Qi(z(i)) كأوزان خاصة لكل مجموعة (cluster) على النقط x(i)، لكي يتم تقدير نموذج لكل مجموعة بشكل منفصل، و ذلك كما يلي:

**13. [Gaussians initialization, Expectation step, Maximization step, Convergence]**
-[ استهلالات غاوسية، خطوة التوقع، خطوة التعظيم، تقارب] +[استهلالات جاوسية، خطوة القيمة المتوقعة، خطوة التعظيم، التقارب]

**14. k-means clustering**
-تجميع k-متوسطات +التجميع بالمتوسطات k (k-mean clustering)

**15. We note c(i) the cluster of data point i and μj the center of cluster j.**
-نرمز تجمع النقط i ب c(i) ، و نرمز ب μj j مركز التجمع +نرمز لمجموعة النقط i بـ c(i)، ونرمز بـ μj مركز المجموعات j.

**16. Algorithm ― After randomly initializing the cluster centroids μ1,μ2,...,μk∈Rn, the k-means algorithm repeats the following step until convergence:**
-بعد الاستهلال العشوائي لمتوسطات التجمعات μ1,μ2,...,μk∈Rn، خوارزمية تجميع k-متوسطات تكرر الخطوة التالية حتى التقارب
+خوارزمية - بعد الاستهلال العشوائي للنقاط المركزية (centroids) للمجموعات μ1,μ2,...,μk∈Rn، التجميع بالمتوسطات k تكرر الخطوة التالية حتى التقارب:
<br>

**17. [Means initialization, Cluster assignment, Means update, Convergence]**
-[استهلال المتوسطات، تعيين تجمع، تحديث المتوسطات، التقارب]
+[استهلال المتوسطات، تعيين المجموعات، تحديث المتوسطات، التقارب] +
**18. Distortion function ― In order to see if the algorithm converges, we look at the distortion function defined as follows:**
- دالة التشويه - لكي نتأكد من أن الخوارزمية تقاربت، ننظر إلى دالة التشويه المعرفة كما يلي: +دالة التحريف (distortion function) - لكي نتأكد من أن الخوارزمية تقاربت، ننظر إلى دالة التحريف المعرفة كما يلي:

@@ -134,93 +138,95 @@ **20. Algorithm ― It is a clustering algorithm with an agglomerative hierarchical approach that build nested clusters in a successive manner.**
- خوارزمية - هي عبارة عن خوارزمية تجميع تعتمد على طريقة تجميعية هرمية تبني مجموعات متداخلة بشكل متتال +خوارزمية - هي عبارة عن خوارزمية تجميع تعتمد على طريقة تجميع هرمية تبني مجموعات متداخلة بشكل متتال.

**21. Types ― There are different sorts of hierarchical clustering algorithms that aims at optimizing different objective functions, which is summed up in the table below:**
-أنواع هنالك عدة أنواع من خوارزميات التجميع الهرمي التي ترمي إلى تحسين دوال هدف مختلفة، هاته الأنواع ملخصة في الجدول أسفله +الأنواع - هنالك عدة أنواع من خوارزميات التجميع الهرمي التي ترمي إلى تحسين دوال هدف (objective functions) مختلفة، هذه الأنواع ملخصة في الجدول التالي:

**22. [Ward linkage, Average linkage, Complete linkage]**
-[الربط البَينِي، الربط المتوسط، الربط الكامل]
+[ربط وارْد (ward linkage)، الربط المتوسط، الربط الكامل] +
**23. [Minimize within cluster distance, Minimize average distance between cluster pairs, Minimize maximum distance of between cluster pairs]**
-[تقليل داخل مسافة التجمع، تقليل متوسط المسافات بين أزواج التجمعات، تقليل المسافة القصوى بين أزواج التجمعات]
+[تصغير المسافة داخل المجموعة، تصغير متوسط المسافة بين أزواج المجموعات، تصغير المسافة العظمى بين أزواج المجموعات]
**24. Clustering assessment metrics**
-مقاييس تقدير التجميع +مقاييس تقدير المجموعات

**25. In an unsupervised learning setting, it is often hard to assess the performance of a model since we don't have the ground truth labels as was the case in the supervised learning setting.**
-في إعداد للتعلم بدون إشراف، من الصعب غالبا تقدير أداء نموذج ما لأننا لا نتوفر على القيم الحقيقية كما كان الحال في إعداد التعلم تحت إشراف -
+في التعلّم غير المُوَجَّه من الصعب غالباً تقدير أداء نموذج ما، لأن القيم الحقيقية تكون غير متوفرة كما هو الحال في التعلًم المُوَجَّه.
**26. Silhouette coefficient ― By noting a and b the mean distance between a sample and all other points in the same class, and between a sample and all other points in the next nearest cluster, the silhouette coefficient s for a single sample is defined as follows:**
-المعامل الظِلِّي - إذا رمزنا aو b متوسط المسافة بين عينة و كل النقط المنتمية لنفس الصنف، و بين عينة و كل النقط المنتمية لأقرب صنف، المعامل الظِلِّي s لعينة وحيدة معرف كالتالي: +معامل الظّل (silhouette coefficient) - إذا رمزنا a و b لمتوسط المسافة بين عينة وكل النقط المنتمية لنفس الصنف، و بين عينة وكل النقط المنتمية لأقرب مجموعة، المعامل الظِلِّي s لعينة واحدة معرف كالتالي:

**27. Calinski-Harabaz index ― By noting k the number of clusters, Bk and Wk the between and within-clustering dispersion matrices respectively defined as**
-مؤشر كالينسكي هاراباز - إذا رمزنا بk لعدد التجمعات، Bk و Wk مصفوفات التشتت بين التجمعات و داخلها معرفة كالتالي:
+مؤشر كالينسكي-هارباز (Calinski-Harabaz index) - إذا رمزنا بـ k لعدد المجموعات، فإن Bk و Wk مصفوفتي التشتت بين المجموعات وداخلها تعرف كالتالي: +
**28. the Calinski-Harabaz index s(k) indicates how well a clustering model defines its clusters, such that the higher the score, the more dense and well separated the clusters are. It is defined as follows:**
-مؤشر كالينسكي هاراباز s(k) يعطي تقييما للتجمعات الناتجة عن نموذج تجميعي، بحيث كلما كان التقييم أعلى كلما دل ذلك على أن التجمعات أكثر كثافة و أكثر انفصالا. هذا المؤشر معرّف كالتالي
+مؤشر كالينسكي-هارباز s(k) يشير إلى جودة نموذج تجميعي في تعريف مجموعاته، بحيث كلما كانت النتيجة أعلى كلما دل ذلك على أن المجموعات أكثر كثافة وأكثر انفصالاً فيما بينها. هذا المؤشر معرّف كالتالي: +
**29. Dimension reduction**
-تخفيض الأبعاد
+تقليص الأبعاد
**30. Principal component analysis**
-تحليل المكون الرئيسي +تحليل المكون الرئيس

**31. It is a dimension reduction technique that finds the variance maximizing directions onto which to project the data.**
-إنها تقنية لخفض الأبعاد ترمي إلى إيجاد الاتجاهات المكبرة للتباين و التي تسقط عليها البيانات +إنها طريقة لتقليص الأبعاد ترمي إلى إيجاد الاتجاهات المعظمة للتباين من أجل إسقاط البيانات عليها.

**32. Eigenvalue, eigenvector ― Given a matrix A∈Rn×n, λ is said to be an eigenvalue of A if there exists a vector z∈Rn∖{0}, called eigenvector, such that we have:**
- قيمة ذاتية، متجه ذاتي - لتكن A∈Rn×n مصفوفة ، نقول أن λ قيمة ذاتية للمصفوفة A إذا وُجِد متجه z∈Rn∖{0} يسمى متجها ذاتيا، بحيث: +قيمة ذاتية (eigenvalue)، متجه ذاتي (eigenvector) - لتكن A∈Rn×n مصفوفة، نقول أن λ قيمة ذاتية للمصفوفة A إذا وُجِد متجه z∈Rn∖{0} يسمى متجهاً ذاتياً، بحيث:

**33. Spectral theorem ― Let A∈Rn×n. If A is symmetric, then A is diagonalizable by a real orthogonal matrix U∈Rn×n. By noting Λ=diag(λ1,...,λn), we have:**
- نظرية الطّيف لتكن A∈Rn×n. إذا كانت A متماثلة فإنها شبه قطرية بمصفوفة متعامدة U∈Rn×n. إذا رمزنا Λ=diag(λ1,...,λn) ، لدينا: +مبرهنة الطّيف (Spectral theorem) - لتكن A∈Rn×n. إذا كانت A متناظرة فإنها يمكن أن تكون شبه قطرية عن طريق مصفوفة متعامدة حقيقية U∈Rn×n. إذا رمزنا Λ=diag(λ1,...,λn) ، لدينا:

@@ -234,7 +240,7 @@ **35. Remark: the eigenvector associated with the largest eigenvalue is called principal eigenvector of matrix A.**
-ملحوظة: المتجه الذاتي المرتبط بأكبر قيمة ذاتية يسمى بالمتجه الذاتي الرئيسي للمصفوفة A +ملحوظة: المتجه الذاتي المرتبط بأكبر قيمة ذاتية يسمى بالمتجه الذاتي الرئيسي (principal eigenvector) للمصفوفة A.

@@ -242,41 +248,42 @@ dimensions by maximizing the variance of the data as follows:**
-خوارزمية - تحليل المكون الرئيسي تقنية لخفض الأبعاد تهدف إلى إسقاط البيانات على k بعد بحيث يتم تكبير التباين، خطواتها كالتالي:
+خوارزمية - تحليل المكون الرئيس (Principal Component Analysis (PCA)) طريقة لخفض الأبعاد تهدف إلى إسقاط البيانات على k بُعد بحيث يتم تعظيم التباين (variance)، خطواتها كالتالي:
+
<br>
**37. Step 1: Normalize the data to have a mean of 0 and standard deviation of 1.**
-الخطوة 1: تسوية البيانات بحيث تصبح ذات متوسط يساوي صفر و انحراف معياري يساوي واحد -
+الخطوة 1: تسوية البيانات بحيث تصبح ذات متوسط يساوي صفر وانحراف معياري يساوي واحد. +
**38. Step 2: Compute Σ=1mm∑i=1x(i)x(i)T∈Rn×n, which is symmetric with real eigenvalues.**
-الخطوة 2: حساب Σ=1mm∑i=1x(i)x(i)T∈Rn×n ، و هي متماثلة و ذات قيم ذاتية حقيقية +الخطوة 2: حساب Σ=1mm∑i=1x(i)x(i)T∈Rn×n، وهي متناظرة وذات قيم ذاتية حقيقية.

**39. Step 3: Compute u1,...,uk∈Rn the k orthogonal principal eigenvectors of Σ, i.e. the orthogonal eigenvectors of the k largest eigenvalues.**
-الخطوة 3: حساب u1,...,uk∈Rn المتجهات الذاتية الرئيسية المتعامدة لΣ و عددها k ، يعني k من المتجهات الذاتية المتعامدة ذات القيم الذاتية الأكبر -
-
+الخطوة 3: حساب u1,...,uk∈Rn المتجهات الذاتية الرئيسية المتعامدة لـ Σ وعددها k ، بعبارة أخرى، k من المتجهات الذاتية المتعامدة ذات القيم الذاتية الأكبر. + +
**40. Step 4: Project the data on spanR(u1,...,uk).**
-الخطوة 4: إسقاط البيانات على spanR(u1,...,uk) +الخطوة 4: إسقاط البيانات على spanR(u1,...,uk).

**41. This procedure maximizes the variance among all k-dimensional spaces.**
-هذا الإجراء يضخم التباين بين كل الفضاءات البعدية +هذا الإجراء يعظم التباين بين كل الفضاءات البُعدية.

@@ -297,59 +304,55 @@ dimensions by maximizing the variance of the data as follows:** **44. It is a technique meant to find the underlying generating sources.**
-هي تقنية تهدف إلى إيجاد المصادر التوليدية الكامنة +هي طريقة تهدف إلى إيجاد المصادر التوليدية الكامنة.

**45. Assumptions ― We assume that our data x has been generated by the n-dimensional source vector s=(s1,...,sn), where si are independent random variables, via a mixing and non-singular matrix A as follows:**
-افتراضات - لنفترض أن بياناتنا x تم توليدها من طرف s=(s1,...,sn) المصدر المتجهي ال n بعدي، بحيث متغيرات عشوائية مستقلة، و ذلك عبر مصفوفة خلط غير منفردة A -كالتالي +افتراضات - لنفترض أن بياناتنا x تم توليدها عن طريق المتجه المصدر s=(s1,...,sn) ذا n بُعد، حيث si متغيرات عشوائية مستقلة، وذلك عبر مصفوفة خلط غير منفردة (mixing and non-singular) A كالتالي:

**46. The goal is to find the unmixing matrix W=A−1.**
-الهدف هو العثور على مصفوفة الفصل W=A−1
+الهدف هو العثور على مصفوفة الفصل W=A−1. +
**47. Bell and Sejnowski ICA algorithm ― This algorithm finds the unmixing matrix W by following the steps below:**
-خوارزمية ICA -Bell و Sejnowski ل -هاته الخوارزمية تجد مصفوفة الفصل W عن طريق الخطوات التالية +خوارزمية تحليل المكونات المستقلة (ICA) لبيل وسجنوسكي (Bell and Sejnowski) - هذه الخوارزمية تجد مصفوفة الفصل W عن طريق الخطوات التالية:

**48. Write the probability of x=As=W−1s as:**
-اكتب احتمال x=As=W−1s كالتالي
+اكتب الاحتمال لـ x=As=W−1s كالتالي: +
**49. Write the log likelihood given our training data {x(i),i∈[[1,m]]} and by noting g the sigmoid function as:**
- لتكن {x(i),i∈[[1,m]]} -بيانات التمرن -و g دالة سيجمويد -اكتب الأرجحية اللوغاريتمية كالتالي +لتكن {x(i),i∈[[1,m]]} بيانات التمرن و g دالة سيجمويد، اكتب الأرجحية اللوغاريتمية (log likelihood) كالتالي:

**50. Therefore, the stochastic gradient ascent learning rule is such that for each training example x(i), we update W as follows:**
-و منه، قاعدة التعلم للصعود التفاضلي العشوائي تقتضي أن لكل مثال تمرين x(i) ، نقوم بتحديث W كما يلي +هكذا، باستخدام الصعود الاشتقاقي العشوائي (stochastic gradient ascent)، لكل عينة تدريب x(i) نقوم بتحديث W كما يلي:

**51. The Machine Learning cheatsheets are now available in Arabic.**
-ورقات المراجعة للتعلم الآلي متوفرة حاليا باللغة العربية +المرجع السريع لتعلم الآلة متوفر الآن باللغة العربية.

@@ -363,38 +366,34 @@ Bell و Sejnowski ل **53. Translated by X, Y and Z**
-تم ترجمته بواسطة X, Y و Z
+تمت الترجمة بواسطة X,Y و Z
**54. Reviewed by X, Y and Z**
-تم مراجعته بواسطة X, Y و Z
+تمت المراجعة بواسطة X,Y و Z +
**55. [Introduction, Motivation, Jensen's inequality]**
-[تقديم، تحفيز، متفاوتة جنسن] +[مقدمة، الحافز، متباينة جينسن]

**56. [Clustering, Expectation-Maximization, k-means, Hierarchical clustering, Metrics]**
-[تجميع, - التوقع-التعظيم - , k-متوسطات - , التجميع الهرمي - , مقاييس] - +[التجميع، تعظيم القيمة المتوقعة، تجميع k-متوسطات، التجميع الهرمي، مقاييس]

**57. [Dimension reduction, PCA, ICA]**
-[خفض الأبعاد, PCA, ICA] +[تقليص الأبعاد، تحليل المكون الرئيس (PCA)، تحليل المكونات المستقلة (ICA)]

From 810373587ef4c96e52fc0efb1fdd498988098c5f Mon Sep 17 00:00:00 2001 From: Minh Tuan Date: Fri, 27 Sep 2019 00:32:00 +0700 Subject: [PATCH 377/531] update vi/cs-229-probability --- vi/cs-229-probability.md | 385 +++++++++++++++++++++++++++++++++++++++ 1 file changed, 385 insertions(+) create mode 100644 vi/cs-229-probability.md diff --git a/vi/cs-229-probability.md b/vi/cs-229-probability.md new file mode 100644 index 000000000..d0784543e --- /dev/null +++ b/vi/cs-229-probability.md @@ -0,0 +1,385 @@ +**Probabilities and Statistics translation** [[webpage]](https://stanford.edu/~shervine/teaching/cs-229/refresher-probabilities-statistics) + +
+ +**1. Probabilities and Statistics refresher** + +⟶ Xác suất và Thống kê cơ bản + +
+ +**2. Introduction to Probability and Combinatorics** + +⟶ Giới thiệu về Xác suất và Tổ hợp + +
+ +**3. Sample space ― The set of all possible outcomes of an experiment is known as the sample space of the experiment and is denoted by S.** + +⟶ Không gian mẫu - Một tập hợp các kết cục có thể xảy ra của một phép thử được gọi là không gian mẫu của phép thử và được kí hiệu là S. + +
+ +**4. Event ― Any subset E of the sample space is known as an event. That is, an event is a set consisting of possible outcomes of the experiment. If the outcome of the experiment is contained in E, then we say that E has occurred.** + +⟶ Sự kiện (hay còn gọi là biến cố) - Bất kỳ một tập hợp con E nào của không gian mẫu đều được gọi là một sự kiện. Một sự kiện là một tập các kết cục có thể xảy ra của phép thử. Nếu kết quả của phép thử chứa trong E, chúng ta nói sự kiện E đã xảy ra. + +
+ +**5. Axioms of probability For each event E, we denote P(E) as the probability of event E occuring.** + +⟶ Tiên đề của xác suất Với mỗi sự kiện E, chúng ta kí hiệu P(E) là xác suất sự kiện E xảy ra. + +
+ +**6. Axiom 1 ― Every probability is between 0 and 1 included, i.e:** + +⟶ Tiên đề 1 - Mọi xác suất bất kì đều nằm trong khoảng 0 đến 1. + +
+ +**7. Axiom 2 ― The probability that at least one of the elementary events in the entire sample space will occur is 1, i.e:** + +⟶ Tiên đề 2 - Xác suất xảy ra của ít nhất một phần tử trong toàn bộ không gian mẫu là 1. + +
+ +**8. Axiom 3 ― For any sequence of mutually exclusive events E1,...,En, we have:** + +⟶ Tiên đề 3 - Với một chuỗi các biến cố xung khắc E1,...,En, ta có: + +
+ +**9. Permutation ― A permutation is an arrangement of r objects from a pool of n objects, in a given order. The number of such arrangements is given by P(n,r), defined as:** + +⟶ Hoán vị - Hoán vị là một cách sắp xếp r phần tử từ một nhóm n phần tử, theo một thứ tự nhất định. Số lượng cách sắp xếp như vậy là P(n,r), được định nghĩa như sau: + +
+ +**10. Combination ― A combination is an arrangement of r objects from a pool of n objects, where the order does not matter. The number of such arrangements is given by C(n,r), defined as:** + +⟶ Tổ hợp - Một tổ hợp là một cách sắp xếp r phần tử từ n phần tử, không quan trọng thứ tự. Số lượng cách sắp xếp như vậy là C(n,r), được định nghĩa như sau: + +
+ +**11. Remark: we note that for 0⩽r⩽n, we have P(n,r)⩾C(n,r)** + +⟶ Ghi chú: Chúng ta lưu ý rằng với 0⩽r⩽n, ta có P(n,r)⩾C(n,r) + +
+ +**12. Conditional Probability** + +⟶ Xác suất có điều kiện + +
+ +**13. Bayes' rule ― For events A and B such that P(B)>0, we have:** + +⟶ Định lí Bayes - Với các sự kiện A và B sao cho P(B)>0, ta có: + +
+ +**14. Remark: we have P(A∩B)=P(A)P(B|A)=P(A|B)P(B)** + +⟶ Ghi chú: ta có P(A∩B)=P(A)P(B|A)=P(A|B)P(B) + +
+ +**15. Partition ― Let {Ai,i∈[[1,n]]} be such that for all i, Ai≠∅. We say that {Ai} is a partition if we have:** + +⟶ Phân vùng ― Cho {Ai,i∈[[1,n]]} sao cho với mỗi i, Ai≠∅. Chúng ta nói rằng {Ai} là một phân vùng nếu có: + +
+ +**16. Remark: for any event B in the sample space, we have P(B)=n∑i=1P(B|Ai)P(Ai).** + +⟶ Ghi chú: với bất cứ sự kiện B nào trong không gian mẫu, ta có P(B)=n∑i=1P(B|Ai)P(Ai). + +
+ +**17. Extended form of Bayes' rule ― Let {Ai,i∈[[1,n]]} be a partition of the sample space. We have:** + +⟶ Định lý Bayes mở rộng - Cho {Ai,i∈[[1,n]]} là một phân vùng của không gian mẫu. Ta có: + +
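A small numeric illustration (added by the editor) of Bayes' rule and its extended form over the partition {D, not D}; the probabilities below (1% prevalence, 95% sensitivity, 90% specificity) are made up for the example.

```python
# Hypothetical numbers: P(D)=0.01, P(+|D)=0.95, P(+|not D)=0.10
p_d, p_pos_d, p_pos_nd = 0.01, 0.95, 0.10

# Partition {D, not D}: P(+) = P(+|D)P(D) + P(+|not D)P(not D)
p_pos = p_pos_d * p_d + p_pos_nd * (1 - p_d)

# Bayes' rule: P(D|+) = P(+|D)P(D) / P(+)
print(p_pos_d * p_d / p_pos)   # about 0.088
```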
+ +**18. Independence ― Two events A and B are independent if and only if we have:** + +⟶ Sự kiện độc lập - Hai sự kiện A và B được coi là độc lập khi và chỉ khi ta có: + +
+ +**19. Random Variables** + +⟶ Biến ngẫu nhiên + +
+ +**20. Definitions** + +⟶ Định nghĩa + +
+ +**21. Random variable ― A random variable, often noted X, is a function that maps every element in a sample space to a real line.** + +⟶ Biến ngẫu nhiên - Một biến ngẫu nhiên, thường được kí hiệu là X, là một hàm nối mỗi phần tử trong một không gian mẫu thành một số thực + +
**22. Cumulative distribution function (CDF) ― The cumulative distribution function F, which is monotonically non-decreasing and is such that limx→−∞F(x)=0 and limx→+∞F(x)=1, is defined as:**

⟶ Hàm phân phối tích lũy (CDF) ― Hàm phân phối tích lũy F, là một hàm đơn điệu không giảm, sao cho limx→−∞F(x)=0 và limx→+∞F(x)=1, được định nghĩa là:

<br>

**23. Remark: we have P(a<X⩽b)=F(b)−F(a)**

⟶ Ghi chú: ta có P(a<X⩽b)=F(b)−F(a)

<br>
+ +**23. Remark: we have P(a + +**24. Probability density function (PDF) ― The probability density function f is the probability that X takes on values between two adjacent realizations of the random variable.** + +⟶ Hàm mật độ xác suất (PDF) - Hàm mật độ xác suất f là xác suất mà X nhận các giá trị giữa hai giá trị thực liền kề của biến ngẫu nhiên. + +
+ +**25. Relationships involving the PDF and CDF ― Here are the important properties to know in the discrete (D) and the continuous (C) cases.** + +⟶ Mối quan hệ liên quan giữa PDF và CDF - Dưới đây là các thuộc tính quan trọng cần biết trong trường hợp rời rạc (D) và liên tục (C). + +
+ +**26. [Case, CDF F, PDF f, Properties of PDF]** + +⟶ [Trường hợp, CDF F, PDF f, Thuộc tính của PDF] + +
+ +**27. Expectation and Moments of the Distribution ― Here are the expressions of the expected value E[X], generalized expected value E[g(X)], kth moment E[Xk] and characteristic function ψ(ω) for the discrete and continuous cases:** + +⟶ Kỳ vọng và moment của phân phối - Dưới đây là các biểu thức của giá trị kì vọng E[X], giá trị kì vọng ​​tổng quát E[g(X)], moment bậc k E[Xk] và hàm đặc trưng ψ(ω) cho các trường hợp rời rạc và liên tục: + +
+ +**28. Variance ― The variance of a random variable, often noted Var(X) or σ2, is a measure of the spread of its distribution function. It is determined as follows:** + +⟶ Phương sai - Phương sai của một biến ngẫu nhiên, thường được kí hiệu là Var (X) hoặc σ2, là một độ đo mức độ phân tán của hàm phân phối. Nó được xác định như sau: + +
+ +**29. Standard deviation ― The standard deviation of a random variable, often noted σ, is a measure of the spread of its distribution function which is compatible with the units of the actual random variable. It is determined as follows:** + +⟶ Độ lệch chuẩn - Độ lệch chuẩn của một biến ngẫu nhiên, thường được kí hiệu σ, là thước đo mức độ phân tán của hàm phân phối của nó so với các đơn vị của biến ngẫu nhiên thực tế. Nó được xác định như sau: + +
+ +**30. Transformation of random variables ― Let the variables X and Y be linked by some function. By noting fX and fY the distribution function of X and Y respectively, we have:** + +⟶ Biến đổi các biến ngẫu nhiên - Đặt các biến X và Y được liên kết với nhau bởi một hàm. Kí hiệu fX và fY lần lượt là các phân phối của X và Y, ta có: + +
**31. Leibniz integral rule ― Let g be a function of x and potentially c, and a,b boundaries that may depend on c. We have:**

⟶ Quy tắc tích phân Leibniz - Gọi g là hàm của x và có thể của c, và a, b là các ranh giới có thể phụ thuộc vào c. Chúng ta có:

<br>
+ +**32. Probability Distributions** + +⟶ Phân bố xác suất + +
+ +**33. Chebyshev's inequality ― Let X be a random variable with expected value μ. For k,σ>0, we have the following inequality:** + +⟶ Bất đẳng thức Chebyshev - Gọi X là biến ngẫu nhiên có giá trị kỳ vọng μ. Với k,σ>0, chúng ta có bất đẳng thức sau: + +
+ +**34. Main distributions ― Here are the main distributions to have in mind:** + +⟶ Các phân phối chính - Dưới là các phân phối chính cần ghi nhớ: + +
+ +**35. [Type, Distribution]** + +⟶ [Loại, Phân phối] + +
+ +**36. Jointly Distributed Random Variables** + +⟶ Phân phối đồng thời biến ngẫu nhiên + +
+ +**37. Marginal density and cumulative distribution ― From the joint density probability function fXY , we have** + +⟶ Mật độ biên và phân phối tích lũy - Từ hàm phân phối mật độ đồng thời fXY, ta có + +
+ +**38. [Case, Marginal density, Cumulative function]** + +⟶ [Trường hợp, Mật độ biên, Hàm tích lũy] + +
+ +**39. Conditional density ― The conditional density of X with respect to Y, often noted fX|Y, is defined as follows:** + +⟶ Mật độ có điều kiện - Mật độ có điều kiện của X với Y, thường được kí hiệu là fX|Y, được định nghĩa như sau: + +
+ +**40. Independence ― Two random variables X and Y are said to be independent if we have:** + +⟶ Tính chất độc lập - Hai biến ngẫu nhiên X và Y độc lập nếu ta có: + +
+ +**41. Covariance ― We define the covariance of two random variables X and Y, that we note σ2XY or more commonly Cov(X,Y), as follows:** + +⟶ Hiệp phương sai - Chúng ta xác định hiệp phương sai của hai biến ngẫu nhiên X và Y, thường được kí hiệu σ2XY hay Cov(X,Y), như sau: + +
+ +**42. Correlation ― By noting σX,σY the standard deviations of X and Y, we define the correlation between the random variables X and Y, noted ρXY, as follows:** + +⟶ Mối tương quan ― Kí hiệu σX,σY là độ lệch chuẩn của X và Y, chúng ta xác định mối tương quan giữa X và Y, kí hiệu ρXY, như sau: + +
+ +**43. Remark 1: we note that for any random variables X,Y, we have ρXY∈[−1,1].** + +⟶ Ghi chú 1: chúng ta lưu ý rằng với bất cứ biến ngẫu nhiên X,Y nào, ta luôn có ρXY∈[−1,1]. + +
+ +**44. Remark 2: If X and Y are independent, then ρXY=0.** + +⟶ Ghi chú 2: Nếu X và Y độc lập với nhau thì ρXY=0. + +
+ +**45. Parameter estimation** + +⟶ Ước lượng tham số + +
+ +**46. Definitions** + +⟶ Định nghĩa + +
+ +**47. Random sample ― A random sample is a collection of n random variables X1,...,Xn that are independent and identically distributed with X.** + +⟶ Mẫu ngẫu nhiên - Mẫu ngẫu nhiên là tập hợp của n biến ngẫu nhiên X1,...,Xn độc lập và được phân phối giống hệt với X. + +
+ +**48. Estimator ― An estimator is a function of the data that is used to infer the value of an unknown parameter in a statistical model.** + +⟶ Công cụ ước tính (Estimator) - Công cụ ước tính (Estimator) là một hàm của dữ liệu được sử dụng để suy ra giá trị của một tham số chưa biết trong mô hình thống kê. + +
+ +**49. Bias ― The bias of an estimator ^θ is defined as being the difference between the expected value of the distribution of ^θ and the true value, i.e.:** + +⟶ Thiên vị (Bias) - Thiên vị (Bias) của Estimator ^θ được định nghĩa là chênh lệch giữa giá trị kì vọng ​​của phân phối ^θ và giá trị thực, tức là + +
+ +**50. Remark: an estimator is said to be unbiased when we have E[^θ]=θ.** + +⟶ Ghi chú: một công cụ ước tính (estimator) được cho là không thiên vị (unbias) khi chúng ta có E[^θ]=θ. + +
+ +**51. Estimating the mean** + +⟶ Ước lượng trung bình + +
+ +**52. Sample mean ― The sample mean of a random sample is used to estimate the true mean μ of a distribution, is often noted ¯¯¯¯¯X and is defined as follows:** + +⟶ Giá trị trung bình mẫu - Giá trị trung bình mẫu của mẫu ngẫu nhiên được sử dụng để ước tính giá trị trung bình thực μ của phân phối, thường được kí hiệu ¯¯¯¯¯X và được định nghĩa như sau: + +
+ +**53. Remark: the sample mean is unbiased, i.e E[¯¯¯¯¯X]=μ.** + +⟶ Ghi chú: Trung bình mẫu là không thiên vị (unbias), nghĩa là E[¯¯¯¯¯X]=μ. + +
+ +**54. Central Limit Theorem ― Let us have a random sample X1,...,Xn following a given distribution with mean μ and variance σ2, then we have:** + +⟶ Định lý giới hạn trung tâm - Giả sử chúng ta có một mẫu ngẫu nhiên X1,...,Xn theo một phân phối nhất định với trung bình μ và phương sai σ2, sau đó chúng ta có: + +
+ +**55. Estimating the variance** + +⟶ Ước lượng phương sai + +
+ +**56. Sample variance ― The sample variance of a random sample is used to estimate the true variance σ2 of a distribution, is often noted s2 or ^σ2 and is defined as follows:** + +⟶ Phương sai mẫu - Phương sai mẫu của mẫu ngẫu nhiên được sử dụng để ước lượng phương sai thực sự σ2 của phân phối, thường được kí hiệu là s2 hoặc ^σ2 và được định nghĩa như sau: + +
+ +**57. Remark: the sample variance is unbiased, i.e E[s2]=σ2.** + +⟶ Ghi chú: phương sai mẫu không thiên vị (unbias), nghĩa là E[s2]=σ2. + +
+ +**58. Chi-Squared relation with sample variance ― Let s2 be the sample variance of a random sample. We have:** + +⟶ Quan hệ Chi-Squared với phương sai mẫu - Với s2 là phương sai mẫu của một mẫu ngẫu nhiên, ta có: + +
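A quick NumPy check (added for illustration) of the estimators above on simulated data; the distribution and sample size are arbitrary choices. Note that the unbiased sample variance uses the 1/(n−1) factor (ddof=1 in NumPy).

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma = 5.0, 2.0
X = rng.normal(mu, sigma, size=1000)     # random sample X1, ..., Xn

x_bar = X.mean()                         # sample mean, estimates the true mean mu
s2 = X.var(ddof=1)                       # unbiased sample variance, estimates sigma^2

print(x_bar, s2)                         # close to 5 and 4
```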
+ +**59. [Introduction, Sample space, Event, Permutation]** + +⟶ [Giới thiệu, Không gian mẫu, Sự kiện, Hoán vị] + +
+ +**60. [Conditional probability, Bayes' rule, Independence]** + +⟶ [Xác suất có điều kiện, Định lý Bayes, Sự độc lập] + +
+ +**61. [Random variables, Definitions, Expectation, Variance]** + +⟶ [Biến ngẫu nhiên, Định nghĩa, Kì vọng, Phương sai] + +
**62. [Probability distributions, Chebyshev's inequality, Main distributions]**

⟶ [Phân bố xác suất, Bất đẳng thức Chebyshev, Các phân phối chính]

<br>
+ +**63. [Jointly distributed random variables, Density, Covariance, Correlation]** + +⟶ [Các biến ngẫu nhiên đồng thời, Mật độ, Hiệp phương sai, Mối tương quan] + +
+ +**64. [Parameter estimation, Mean, Variance]** + +⟶ [Ước lượng tham số, Trung bình, Phương sai] From aac5fa73ed32c57d017ca8e74c9286b06adabe7b Mon Sep 17 00:00:00 2001 From: Minh Tuan Date: Fri, 27 Sep 2019 01:50:34 +0700 Subject: [PATCH 378/531] update vi/cs-229-linear-algebra --- vi/cs-229-linear-algebra.md | 345 ++++++++++++++++++++++++++++++++++++ 1 file changed, 345 insertions(+) create mode 100644 vi/cs-229-linear-algebra.md diff --git a/vi/cs-229-linear-algebra.md b/vi/cs-229-linear-algebra.md new file mode 100644 index 000000000..8d12bc89a --- /dev/null +++ b/vi/cs-229-linear-algebra.md @@ -0,0 +1,345 @@ +**Linear Algebra and Calculus translation** [[webpage]](https://stanford.edu/~shervine/teaching/cs-229/refresher-algebra-calculus) + +
+ +**1. Linear Algebra and Calculus refresher** + +⟶ Đại số tuyến tính và Giải tích cơ bản + +
+ +**2. General notations** + +⟶ Kí hiệu chung + +
+ +**3. Definitions** + +⟶ Định nghĩa + +
+ +**4. Vector ― We note x∈Rn a vector with n entries, where xi∈R is the ith entry:** + +⟶ Vector - Chúng ta kí hiệu x∈Rn là một vector với n phần tử, với xi∈R là phần tử thứ i: + +
+ +**5. Matrix ― We note A∈Rm×n a matrix with m rows and n columns, where Ai,j∈R is the entry located in the ith row and jth column:** + +⟶ Ma trận - Kí hiệu A∈Rm×n là một ma trận với m hàng và n cột, Ai,j∈R là phần tử nằm ở hàng thứ i, cột j: + +
+ +**6. Remark: the vector x defined above can be viewed as a n×1 matrix and is more particularly called a column-vector.** + +⟶ Ghi chú: Vector x được xác định ở trên có thể coi như một ma trận nx1 và được gọi là vector cột. + +
+ +**7. Main matrices** + +⟶ Ma trận chính + +
+ +**8. Identity matrix ― The identity matrix I∈Rn×n is a square matrix with ones in its diagonal and zero everywhere else:** + +⟶ Ma trận đơn vị - Ma trận đơn vị I∈Rn×n là một ma trận vuông với các phần tử trên đường chéo chính bằng 1 và các phần tử còn lại bằng 0: + +
+ +**9. Remark: for all matrices A∈Rn×n, we have A×I=I×A=A.** + +⟶ Ghi chú: với mọi ma trận vuông A∈Rn×n, ta có A×I=I×A=A. + +
+ +**10. Diagonal matrix ― A diagonal matrix D∈Rn×n is a square matrix with nonzero values in its diagonal and zero everywhere else:** + +⟶ Ma trận đường chéo - Ma trận đường chéo D∈Rn×n là một ma trận vuông với các phần tử trên đường chéo chính khác 0 và các phần tử còn lại bằng 0: + +
+ +**11. Remark: we also note D as diag(d1,...,dn).** + +⟶ Ghi chú: Chúng ta kí hiệu D là diag(d1,...,dn). + +
+ +**12. Matrix operations** + +⟶ Các phép toán ma trận + +
+ +**13. Multiplication** + +⟶ Phép nhân + +
+ +**14. Vector-vector ― There are two types of vector-vector products:** + +⟶ Vector-vector ― Có hai loại phép nhân vector-vector: + +
+ +**15. inner product: for x,y∈Rn, we have:** + +⟶ Phép nhân inner: với x,y∈Rn, ta có: + +
+ +**16. outer product: for x∈Rm,y∈Rn, we have:** + +⟶ Phép nhân outer: với x∈Rm,y∈Rn, ta có: + +
+ +**17. Matrix-vector ― The product of matrix A∈Rm×n and vector x∈Rn is a vector of size Rn, such that:** + +⟶ Ma trận - Vector ― Phép nhân giữa ma trận A∈Rm×n và vector x∈Rn là một vector có kích thước Rn: + +
+ +**18. where aTr,i are the vector rows and ac,j are the vector columns of A, and xi are the entries of x.** + +⟶ với aTr,i là các vector hàng và ac,j là các vector cột của A, và xi là các phần tử của x. + +
+ +**19. Matrix-matrix ― The product of matrices A∈Rm×n and B∈Rn×p is a matrix of size Rn×p, such that:** + +⟶ Ma trận - ma trận ― Phép nhân giữa ma trận A∈Rm×n và B∈Rn×p là một ma trận kích thước Rn×p: + +
+ +**20. where aTr,i,bTr,i are the vector rows and ac,j,bc,j are the vector columns of A and B respectively** + +⟶ với aTr,i,bTr,i là các vector hàng và ac,j,bc,j lần lượt là các vector cột của A and B. + +
+ +**21. Other operations** + +⟶ Một số phép toán khác + +
+ +**22. Transpose ― The transpose of a matrix A∈Rm×n, noted AT, is such that its entries are flipped:** + +⟶ Chuyển vị ― Chuyển vị của một ma trận A∈Rm×n, kí hiệu AT, khi các phần tử hàng cột hoán vị trí cho nhau: + +
+ +**23. Remark: for matrices A,B, we have (AB)T=BTAT** + +⟶ Ghi chú: với ma trận A,B, ta có (AB)T=BTAT + +
+ +**24. Inverse ― The inverse of an invertible square matrix A is noted A−1 and is the only matrix such that:** + +⟶ Nghịch đảo ― Nghịch đảo của ma trận vuông khả đảo A được kí hiệu là A-1 và chỉ tồn tại duy nhất: + +
+ +**25. Remark: not all square matrices are invertible. Also, for matrices A,B, we have (AB)−1=B−1A−1** + +⟶ Ghi chú: không phải tất cả các ma trận vuông đều khả đảo. Ngoài ra, với ma trận A,B, ta có (AB)−1=B−1A−1 + +
+ +**26. Trace ― The trace of a square matrix A, noted tr(A), is the sum of its diagonal entries:** + +⟶ Truy vết ― Truy vết của ma trận vuông A, kí hiệu tr(A), là tổng của các phần tử trên đường chéo chính của nó: + +
+ +**27. Remark: for matrices A,B, we have tr(AT)=tr(A) and tr(AB)=tr(BA)** + +⟶ Ghi chú: với ma trận A,B, chúng ta có tr(AT)=tr(A) và tr(AB)=tr(BA) + +
+ +**28. Determinant ― The determinant of a square matrix A∈Rn×n, noted |A| or det(A) is expressed recursively in terms of A∖i,∖j, which is the matrix A without its ith row and jth column, as follows:** + +⟶ Định thức ― Định thức của một ma trận vuông A∈Rn×n, kí hiệu |A| hay det(A) được tính hồi quy với A∖i,∖j, ma trận A xóa đi hàng thứ i và cột thứ j: + +
+ +**29. Remark: A is invertible if and only if |A|≠0. Also, |AB|=|A||B| and |AT|=|A|.** + +⟶ Ghi chú: A khả đảo nếu và chỉ nếu |A|≠0. Ngoài ra, |AB|=|A||B| và |AT|=|A|. + +
+ +**30. Matrix properties** + +⟶ Những tính chất của ma trận + +
+ +**31. Definitions** + +⟶ Định nghĩa + +
+ +**32. Symmetric decomposition ― A given matrix A can be expressed in terms of its symmetric and antisymmetric parts as follows:** + +⟶ Phân rã đối xứng - Một ma trận A đã cho có thể được biểu diễn dưới dạng các phần đối xứng và phản đối xứng của nó như sau: + +
+ +**33. [Symmetric, Antisymmetric]** + +⟶ [Đối xứng, Phản đối xứng] + +
+ +**34. Norm ― A norm is a function N:V⟶[0,+∞[ where V is a vector space, and such that for all x,y∈V, we have:** + +⟶ Chuẩn (norm) ― Một chuẩn (norm) là một hàm N:V⟶[0,+∞[ mà V là một không gian vector, và với mọi x,y∈V, ta có: + +
+ +**35. N(ax)=|a|N(x) for a scalar** + +⟶ N(ax)=|a|N(x) với a là một số + +
+ +**36. if N(x)=0, then x=0** + +⟶ nếu N(x)=0, thì x=0 + +
+ +**37. For x∈V, the most commonly used norms are summed up in the table below:** + +⟶ Với x∈V, các chuẩn thường dùng được tổng hợp ở bảng dưới đây: + +
+ +**38. [Norm, Notation, Definition, Use case]** + +⟶ [Chuẩn, Kí hiệu, Định nghĩa, Trường hợp dùng] + +
**39. Linearly dependence ― A set of vectors is said to be linearly dependent if one of the vectors in the set can be defined as a linear combination of the others.**

⟶ Sự phụ thuộc tuyến tính ― Một tập hợp các vectơ được cho là phụ thuộc tuyến tính nếu một trong các vectơ trong tập hợp có thể được biểu diễn bởi một tổ hợp tuyến tính của các vectơ khác.

<br>
+ +**40. Remark: if no vector can be written this way, then the vectors are said to be linearly independent** + +⟶ Ghi chú: nếu không có vectơ nào có thể được viết theo cách này, thì các vectơ được cho là độc lập tuyến tính + +
+ +**41. Matrix rank ― The rank of a given matrix A is noted rank(A) and is the dimension of the vector space generated by its columns. This is equivalent to the maximum number of linearly independent columns of A.** + +⟶ Hạng ma trận (rank) ― Hạng của một ma trận A kí hiệu rank(A) và là số chiều của không gian vectơ được tạo bởi các cột của nó. Điều này tương đương với số cột độc lập tuyến tính tối đa của A. + +
+ +**42. Positive semi-definite matrix ― A matrix A∈Rn×n is positive semi-definite (PSD) and is noted A⪰0 if we have:** + +⟶ Ma trận bán xác định dương - Ma trận A∈Rn×n là bán xác định dương (PSD) kí hiệu A⪰0 nếu chúng ta có: + +
+ +**43. Remark: similarly, a matrix A is said to be positive definite, and is noted A≻0, if it is a PSD matrix which satisfies for all non-zero vector x, xTAx>0.** + +⟶ Ghi chú: tương tự, một ma trận A được cho là xác định dương và được kí hiệu A≻0, nếu đó là ma trận PSD thỏa mãn cho tất cả các vectơ khác không x, xTAx>0. + +
+ +**44. Eigenvalue, eigenvector ― Given a matrix A∈Rn×n, λ is said to be an eigenvalue of A if there exists a vector z∈Rn∖{0}, called eigenvector, such that we have:** + +⟶ Giá trị riêng, vector riêng - Cho ma trận A∈Rn×n, λ được gọi là giá trị riêng của A nếu tồn tại một vectơ z∈Rn∖{0}, được gọi là vector riêng, sao cho: + +
**45. Spectral theorem ― Let A∈Rn×n. If A is symmetric, then A is diagonalizable by a real orthogonal matrix U∈Rn×n. By noting Λ=diag(λ1,...,λn), we have:**

⟶ Định lý phổ - Cho A∈Rn×n. Nếu A đối xứng, thì A có thể chéo hóa bởi một ma trận trực giao thực U∈Rn×n. Bằng cách kí hiệu Λ=diag(λ1,...,λn), chúng ta có:

<br>
+ +**46. diagonal** + +⟶ đường chéo + +
+ +**47. Singular-value decomposition ― For a given matrix A of dimensions m×n, the singular-value decomposition (SVD) is a factorization technique that guarantees the existence of U m×m unitary, Σ m×n diagonal and V n×n unitary matrices, such that:** + +⟶ Phân tích giá trị suy biến - Đối với một ma trận A có kích thước m×n, Phân tích giá trị suy biến (SVD) là một kỹ thuật phân tích nhân tố nhằm đảm bảo sự tồn tại của đơn vị U m×m, đường chéo Σm×n và đơn vị V n×n ma trận, sao cho: + +
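A NumPy sketch (added for illustration) of the spectral theorem on a symmetric matrix and of the SVD; the matrix below is an arbitrary example.

```python
import numpy as np

rng = np.random.default_rng(0)
M = rng.normal(size=(4, 3))

# Spectral theorem on the symmetric matrix A = M^T M: A = U diag(lambda) U^T
A = M.T @ M
eigvals, U = np.linalg.eigh(A)
print(np.allclose(U @ np.diag(eigvals) @ U.T, A))   # True

# Singular-value decomposition of M: M = U Sigma V^T
U2, s, Vt = np.linalg.svd(M, full_matrices=True)
Sigma = np.zeros((4, 3))
np.fill_diagonal(Sigma, s)
print(np.allclose(U2 @ Sigma @ Vt, M))              # True
```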
+ +**48. Matrix calculus** + +⟶ Giải tích ma trận + +
+ +**49. Gradient ― Let f:Rm×n→R be a function and A∈Rm×n be a matrix. The gradient of f with respect to A is a m×n matrix, noted ∇Af(A), such that:** + +⟶ Gradient ― Cho f:Rm×n→R là một hàm và A∈Rm×n là một ma trận. Gradient của f đối với A là ma trận m×n, được kí hiệu là ∇Af(A), sao cho: + + + +
+ +**50. Remark: the gradient of f is only defined when f is a function that returns a scalar.** + +⟶ Ghi chú: gradient của f chỉ được xác định khi f là hàm trả về một số. + +
+ +**51. Hessian ― Let f:Rn→R be a function and x∈Rn be a vector. The hessian of f with respect to x is a n×n symmetric matrix, noted ∇2xf(x), such that:** + +⟶ Hessian - Cho f:Rn→R là một hàm và x∈Rn là một vector. Hessian của f đối với x là một ma trận đối xứng n×n, ghi chú ∇2xf(x), sao cho: + +
+ +**52. Remark: the hessian of f is only defined when f is a function that returns a scalar** + +⟶ Ghi chú: hessian của f chỉ được xác định khi f là hàm trả về một số. + +
+ +**53. Gradient operations ― For matrices A,B,C, the following gradient properties are worth having in mind:** + +⟶ Các phép toán của gradient ― Đối với ma trận A,B,C, các thuộc tính gradient sau cần để lưu ý: + +
+ +**54. [General notations, Definitions, Main matrices]** + +⟶ [Kí hiệu chung, Định nghĩa, Ma trận chính] + +
+ +**55. [Matrix operations, Multiplication, Other operations]** + +⟶ [Phép toán ma trận, Phép nhân, Các phép toán khác] + +
+ +**56. [Matrix properties, Norm, Eigenvalue/Eigenvector, Singular-value decomposition]** + +⟶ [Các thuộc tính ma trận, Chuẩn, Giá trị riêng/Vector riêng, Phân tích giá trị suy biến] + +
+ +**57. [Matrix calculus, Gradient, Hessian, Operations]** + +⟶ [Giải tích ma trận, Gradient, Hessian, Phép tính] From b61342f35d3bcfb3d4b9dec993bf8de0fb0ccf24 Mon Sep 17 00:00:00 2001 From: tuananhhedspibk Date: Fri, 27 Sep 2019 11:11:18 +0900 Subject: [PATCH 379/531] [ja] Cheatsheet Unsupervised learning --- ja/cheatsheet-unsupervised-learning.md | 105 ++++++++++++------------- 1 file changed, 52 insertions(+), 53 deletions(-) diff --git a/ja/cheatsheet-unsupervised-learning.md b/ja/cheatsheet-unsupervised-learning.md index 08e3f593a..77cf53bf9 100644 --- a/ja/cheatsheet-unsupervised-learning.md +++ b/ja/cheatsheet-unsupervised-learning.md @@ -6,19 +6,19 @@ **2. Introduction to Unsupervised Learning** -⟶教師なし学習のはじめに +⟶教師なし学習の概要
**3. Motivation ― The goal of unsupervised learning is to find hidden patterns in unlabeled data {x(1),...,x(m)}.** -⟶モチベーション - 教師なし学習の目的はラベルんなしデータ{x(1),...,x(m)}の中の隠されたパターンを探す。 +⟶モチベーション - 教師なし学習の目的はラベルのないデータ{x(1),...,x(m)}に隠されたパターンを探すことです。
**4. Jensen's inequality ― Let f be a convex function and X a random variable. We have the following inequality:** -⟶ジェンセンの不平等 - fを凸関数とし、Xをランダム変数。次の不平等がある: +⟶イェンセンの不等式 - fを凸関数、Xを確率変数とすると、次の不等式が成り立ちます:
@@ -30,13 +30,13 @@ **6. Expectation-Maximization** -⟶EM +⟶期待値最大化
**7. Latent variables ― Latent variables are hidden/unobserved variables that make estimation problems difficult, and are often denoted z. Here are the most common settings where there are latent variables:** -⟶潜在変数 - 潜在変数は推定問題を困難にする隠される変数であり、zで示される。潜在変数がある最も一般的な設定はこれ: +⟶潜在変数 - 潜在変数は推定問題を困難にする隠れた/観測されていない変数であり、多くの場合zで示されます。潜在変数がある最も一般的な設定は次のとおりです:
@@ -48,61 +48,61 @@ **9. [Mixture of k Gaussians, Factor analysis]** -⟶[kガウス分布の混合, 因子分析] +⟶[k個のガウス分布の混, 因子分析]
**10. Algorithm ― The Expectation-Maximization (EM) algorithm gives an efficient method at estimating the parameter θ through maximum likelihood estimation by repeatedly constructing a lower-bound on the likelihood (E-step) and optimizing that lower bound (M-step) as follows:** -⟶アルゴリズム - EMアルゴリズムは尤度の下限(E-ステップ)を繰り返し構築し、その下限(M-ステップ)次の通りに最適することにより、最尤推定を通じてパラメーターθを推定する効率な方法を共有する: +⟶アルゴリズム - EMアルゴリズムは次のように尤度の下限の構築(E-ステップ)と、その下限の最適化(M-ステップ)を繰り返し行うことによる最尤推定によりパラメーターθを推定する効率的な方法を提供します:
**11. E-step: Evaluate the posterior probability Qi(z(i)) that each data point x(i) came from a particular cluster z(i) as follows:** -⟶E-ステップ: 次のように各データポイントx(i)が特定クラスターz(i)に由来する事後確率Qi(z(i))を評価する: +⟶E-ステップ: 各データポイントx(i)が特定クラスターz(i)に由来する事後確率Qi(z(i))を次のように評価します:
**12. M-step: Use the posterior probabilities Qi(z(i)) as cluster specific weights on data points x(i) to separately re-estimate each cluster model as follows:** -⟶M-ステップ: 次のように各クラスターモデル別途再見積もりのためデータポイントx(i)のクラスター固有の重みとして事後確率Qi(z(i))を使う: +⟶M-ステップ: 事後確率Qi(z(i))をデータポイントx(i)のクラスター固有の重みとして使い、次のように各クラスターモデルを個別に再推定します:
**13. [Gaussians initialization, Expectation step, Maximization step, Convergence]** -⟶[ガウス分布初期化, 期待ステップ, 最大化ステップ, 収束] +⟶[ガウス分布初期化, 期待値ステップ, 最大化ステップ, 収束]
**14. k-means clustering** -⟶k-meansクラスタリング +⟶k平均法
**15. We note c(i) the cluster of data point i and μj the center of cluster j.** -⟶クラスタのデータポイントiをc(i)、クラスタjのセンターをμjで表示する。 +⟶データポイントiのクラスタをc(i)、クラスタjの中心をμjと表記します。
**16. Algorithm ― After randomly initializing the cluster centroids μ1,μ2,...,μk∈Rn, the k-means algorithm repeats the following step until convergence:** -⟶アルゴリズム - クラスターのセンターポイントμ1,μ2,...,μk∈Rnを偶然初期化後、k-meansアルゴリズムが次のようなステップを収束まで繰り返す: +⟶クラスターの重心μ1,μ2,...,μk∈Rnをランダムに初期化後、k-meansアルゴリズムが収束するまで次のようなステップを繰り返します:
**17. [Means initialization, Cluster assignment, Means update, Convergence]** -⟶ [Means初期化, クラスター割り立て, Means更新, 収束] +⟶ [平均の初期化, クラスター割り当て,平均の更新, 収束]
**18. Distortion function ― In order to see if the algorithm converges, we look at the distortion function defined as follows:** -⟶ディストーション関数 - アルゴリズムが収束するかどうかを確認するため、次のように定義されたディストーション関数を参照する: +⟶ひずみ関数 - アルゴリズムが収束するかどうかを確認するため、次のように定義されたひずみ関数を参照します:
@@ -114,194 +114,193 @@ **20. Algorithm ― It is a clustering algorithm with an agglomerative hierarchical approach that build nested clusters in a successive manner.** -⟶アルゴリズム - これは入れ子クラスタを連続で構築する凝集階層アプローチによるクラスタリングアルゴリズムだ。 +⟶アルゴリズム - これは入れ子になったクラスタを逐次的に構築する凝集階層アプローチによるクラスタリングアルゴリズムです。
**21. Types ― There are different sorts of hierarchical clustering algorithms that aims at optimizing different objective functions, which is summed up in the table below:** -⟶ +⟶ 種類 ― 様々な目的関数を最適化するための様々な種類の階層クラスタリングアルゴリズムが以下の表にまとめられています。
**22. [Ward linkage, Average linkage, Complete linkage]** -⟶ +⟶ [Ward linkage, Average linkage, Complete linkage]
**23. [Minimize within cluster distance, Minimize average distance between cluster pairs, Minimize maximum distance of between cluster pairs]** -⟶ +⟶ [クラスター内の距離最小化、クラスターペア間の平均距離の最小化、クラスターペア間の最大距離の最小化]
**24. Clustering assessment metrics** -⟶ +⟶ クラスタリング評価指標
**25. In an unsupervised learning setting, it is often hard to assess the performance of a model since we don't have the ground truth labels as was the case in the supervised learning setting.** -⟶ +⟶ 教師なし学習では、教師あり学習の場合のような正解ラベルがないため、モデルの性能を評価することが難しい場合が多いです。
**26. Silhouette coefficient ― By noting a and b the mean distance between a sample and all other points in the same class, and between a sample and all other points in the next nearest cluster, the silhouette coefficient s for a single sample is defined as follows:** -⟶ +⟶ シルエット係数 ― サンプルと同じクラスタ内のその他全ての点との平均距離をa、最も近いクラスタ内の全ての点との平均距離をbと表記すると、サンプルのシルエット係数sは次のように定義されます:
**27. Calinski-Harabaz index ― By noting k the number of clusters, Bk and Wk the between and within-clustering dispersion matrices respectively defined as** -⟶ +⟶ Calinski-Harabazインデックス ― クラスタの数をkと表記すると、クラスタ間およびクラスタ内の分散行列であるBkおよびWkはそれぞれ以下のように定義されます。
**28. the Calinski-Harabaz index s(k) indicates how well a clustering model defines its clusters, such that the higher the score, the more dense and well separated the clusters are. It is defined as follows:** -⟶ +⟶ Calinski-Harabazインデックスs(k)はクラスタリングモデルが各クラスタをどの程度適切に定義しているかを示します。スコアが高いほど、各クラスタはより密で、十分に分離されています。 それは次のように定義されます:
**29. Dimension reduction** -⟶ +⟶ 次元削減
**30. Principal component analysis** -⟶ +⟶ 主成分分析
**31. It is a dimension reduction technique that finds the variance maximizing directions onto which to project the data.** -⟶ +⟶ これはデータを投影する方向で、分散を最大にする方向を見つける次元削減手法です。
**32. Eigenvalue, eigenvector ― Given a matrix A∈Rn×n, λ is said to be an eigenvalue of A if there exists a vector z∈Rn∖{0}, called eigenvector, such that we have:** -⟶ +⟶ 固有値、固有ベクトル - 行列 A∈Rn×nが与えられたとき、次の式で固有ベクトルと呼ばれるベクトルz∈Rn∖{0}が存在した場合に、λはAの固有値と呼ばれる。
**33. Spectral theorem ― Let A∈Rn×n. If A is symmetric, then A is diagonalizable by a real orthogonal matrix U∈Rn×n. By noting Λ=diag(λ1,...,λn), we have:** -⟶ +⟶ スペクトル定理 - A∈Rn×nとする。Aが対称のとき、Aは実直交行列U∈Rn×nを用いて対角化可能である。Λ=diag(λ1,...,λn)と表記することで、次の式を得る。
**34. diagonal** -⟶ +⟶ 対角
**35. Remark: the eigenvector associated with the largest eigenvalue is called principal eigenvector of matrix A.** -⟶ +⟶ 注釈: 最大固有値に対応する固有ベクトルは行列Aの第1固有ベクトルと呼ばれる。
-**36. Algorithm ― The Principal Component Analysis (PCA) procedure is a dimension reduction technique that projects the data on k -dimensions by maximizing the variance of the data as follows:** +**36. Algorithm ― The Principal Component Analysis (PCA) procedure is a dimension reduction technique that projects the data on k dimensions by maximizing the variance of the data as follows:** -⟶ +⟶ アルゴリズム ― 主成分分析 (PCA)の過程は、次のようにデータの分散を最大化することによりデータをk次元に射影する次元削減の技術である。
**37. Step 1: Normalize the data to have a mean of 0 and standard deviation of 1.** -⟶ +⟶ ステップ1:平均が0で標準偏差が1となるようにデータを正規化します。
**38. Step 2: Compute Σ=1mm∑i=1x(i)x(i)T∈Rn×n, which is symmetric with real eigenvalues.** -⟶ +⟶ ステップ2:実固有値に関して対称であるΣ=1mm∑i=1x(i)x(i)T∈Rn×nを計算します。
**39. Step 3: Compute u1,...,uk∈Rn the k orthogonal principal eigenvectors of Σ, i.e. the orthogonal eigenvectors of the k largest eigenvalues.** -⟶ +⟶ ステップ3:k個のΣの対角主値固有ベクトルu1,...,uk∈Rn、すなわちk個の最大の固有値の対角固有ベクトルを計算します。
**40. Step 4: Project the data on spanR(u1,...,uk).** -⟶ +⟶ ステップ4:データをspanR(u1,...,uk)に射影します。
**41. This procedure maximizes the variance among all k-dimensional spaces.** -⟶ +⟶ この過程は全てのk次元空間の間の分散を最大化します。
**42. [Data in feature space, Find principal components, Data in principal components space]** -⟶ +⟶ [特徴空間内のデータ, 主成分を見つける, 主成分空間内のデータ]
**43. Independent component analysis** -⟶ +⟶ 独立成分分析
**44. It is a technique meant to find the underlying generating sources.** -⟶ +⟶ 隠れた生成源を見つけることを意図した技術です。
**45. Assumptions ― We assume that our data x has been generated by the n-dimensional source vector s=(s1,...,sn), where si are independent random variables, via a mixing and non-singular matrix A as follows:** -⟶ +⟶ 仮定 ― 混合かつ非特異行列Aを通じて、データxはn次元の元となるベクトルs=(s1,...,sn)から次のように生成されると仮定します。ただしsiは独立でランダムな変数です:
**46. The goal is to find the unmixing matrix W=A−1.** -⟶ +⟶ 非混合行列W=A−1を見つけることが目的です。
**47. Bell and Sejnowski ICA algorithm ― This algorithm finds the unmixing matrix W by following the steps below:** -⟶ +⟶ ベルとシノスキーのICAアルゴリズム ― このアルゴリズムは非混合行列Wを次のステップによって見つけます:
**48. Write the probability of x=As=W−1s as:** -⟶ +⟶ x=As=W−1sの確率を次のように表します:
**49. Write the log likelihood given our training data {x(i),i∈[[1,m]]} and by noting g the sigmoid function as:** -⟶ +⟶ 学習データを{x(i),i∈[[1,m]]}、シグモイド関数をgとし、対数尤度を次のように表します:
**50. Therefore, the stochastic gradient ascent learning rule is such that for each training example x(i), we update W as follows:** -⟶ +⟶ そのため、確率的勾配上昇法の学習規則は、学習サンプルx(i)に対して次のようにwを更新するものです:
**51. The Machine Learning cheatsheets are now available in [target language].** -⟶ +⟶ 機械学習チートシートは日本語で読めます。
@@ -313,19 +312,19 @@ dimensions by maximizing the variance of the data as follows:** **53. Translated by X, Y and Z** -⟶ X, Y, Zによる翻訳された +⟶ X, Y, Zによる翻訳
**54. Reviewed by X, Y and Z** -⟶ X, Y, Zによるレビューされた +⟶ X, Y, Zによるレビュー
**55. [Introduction, Motivation, Jensen's inequality]** -⟶ +⟶ [導入, 動機, イェンセンの不等式]
@@ -337,4 +336,4 @@ dimensions by maximizing the variance of the data as follows:** **57. [Dimension reduction, PCA, ICA]** -⟶ +⟶ [次元削減, PCA, ICA] From 0c7d275ef387f0dbd3d7ed423c9f49439fff3036 Mon Sep 17 00:00:00 2001 From: tuananhhedspibk Date: Fri, 27 Sep 2019 11:15:09 +0900 Subject: [PATCH 380/531] [ja] Convolutional Neural Networks --- ja/convolutional-neural-networks.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/ja/convolutional-neural-networks.md b/ja/convolutional-neural-networks.md index b1aeb644f..bff314dce 100644 --- a/ja/convolutional-neural-networks.md +++ b/ja/convolutional-neural-networks.md @@ -698,7 +698,7 @@ **100. Reviewed by X, Y and Z** -⟶ X, Y, Zによるレビューされた +⟶ X, Y, Z 校正
From 309cdf701456aa925eb4f3128ee19c8fb196472b Mon Sep 17 00:00:00 2001 From: Shervine Amidi Date: Fri, 27 Sep 2019 07:19:03 -0700 Subject: [PATCH 381/531] Add [vi] progress --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index 677c233aa..c5d5c1c9f 100644 --- a/README.md +++ b/README.md @@ -72,7 +72,7 @@ Please make sure to propose the translation of **only one** cheatsheet per pull |**Русский**|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/21)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/21)|not started|not started|not started|not started| |**Türkçe**|done|done|done|done|done|done| |**Українська**|not started|not started|not started|not started|done|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/95)| -|**Tiếng Việt**|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/159)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/162)|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/160)|not started|not started| +|**Tiếng Việt**|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/159)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/162)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/177)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/160)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/175)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/176)| |**中文**|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/12)|done|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/48)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/7)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/73)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/72)| ### CS 230 (Deep Learning) From f551f179f4ba27278413eb87edef61e536c77046 Mon Sep 17 00:00:00 2001 From: Minh Tuan Date: Fri, 27 Sep 2019 22:15:51 +0700 Subject: [PATCH 382/531] update vi/deep-learning-tips-and-tricks --- vi/cs-230-deep-learning-tips-and-tricks.md | 456 +++++++++++++++++++++ 1 file changed, 456 insertions(+) create mode 100644 vi/cs-230-deep-learning-tips-and-tricks.md diff --git a/vi/cs-230-deep-learning-tips-and-tricks.md b/vi/cs-230-deep-learning-tips-and-tricks.md new file mode 100644 index 000000000..6edf85a81 --- /dev/null +++ b/vi/cs-230-deep-learning-tips-and-tricks.md @@ -0,0 +1,456 @@ +**Deep Learning Tips and Tricks translation** [[webpage]](https://stanford.edu/~shervine/teaching/cs-230/cheatsheet-deep-learning-tips-and-tricks) + +
+ +**1. Deep Learning Tips and Tricks cheatsheet** + +⟶ Một số mẹo trong học sâu cheatsheet + +
+ + +**2. CS 230 - Deep Learning** + +⟶ CS 230 - Học sâu + +
+ + +**3. Tips and tricks** + +⟶ Mẹo và thủ thuật + +
+ + +**4. [Data processing, Data augmentation, Batch normalization]** + +⟶ [Xử lí dữ liệu, Thêm dữ liệu, Chuẩn hóa batch] + +
**5. [Training a neural network, Epoch, Mini-batch, Cross-entropy loss, Backpropagation, Gradient descent, Updating weights, Gradient checking]**

⟶ [Huấn luyện một mạng nơ-ron, Epoch, Mini-batch, Cross-entropy loss, Lan truyền ngược, Gradient descent, Cập nhật trọng số, Kiểm tra gradient]

<br>
+ + +**6. [Parameter tuning, Xavier initialization, Transfer learning, Learning rate, Adaptive learning rates]** + +⟶ [Tinh chỉnh tham số, Khởi tạo Xavier, Học chuyển tiếp, Tốc độ học, Tốc độ học đáp ứng] + +
**7. [Regularization, Dropout, Weight regularization, Early stopping]**

⟶ [Điều chuẩn (regularization), Dropout, Điều chuẩn trọng số, Dừng sớm]

<br>
+ + +**8. [Good practices, Overfitting small batch, Gradient checking]** + +⟶ [Thói quen tốt, Quá khớp tập nhỏ, Kiểm tra đạo hàm] + +
+ + +**9. View PDF version on GitHub** + +⟶ [Xem bản PDF trên GitHub] + +
+ + +**10. Data processing** + +⟶ Xử lí dữ liệu + +
+ + +**11. Data augmentation ― Deep learning models usually need a lot of data to be properly trained. It is often useful to get more data from the existing ones using data augmentation techniques. The main ones are summed up in the table below. More precisely, given the following input image, here are the techniques that we can apply:** + +⟶ Tăng cường dữ liệu - Các mô hình học sâu thường cần rất nhiều dữ liệu để có thể được huấn luyện đúng cách. Nó thường hữu ích để có được nhiều dữ liệu hơn từ những cái hiện có bằng cách sử dụng các kỹ thuật tăng dữ liệu. Những cái chính được tóm tắt trong bảng dưới đây. Chính xác hơn, với hình ảnh đầu vào sau đây, đây là những kỹ thuật mà chúng ta có thể áp dụng: + +
+ + +**12. [Original, Flip, Rotation, Random crop]** + +⟶ [Hình gốc, Lật, Xoay, Cắt ngẫu nhiên] + +
+ + +**13. [Image without any modification, Flipped with respect to an axis for which the meaning of the image is preserved, Rotation with a slight angle, Simulates incorrect horizon calibration, Random focus on one part of the image, Several random crops can be done in a row]** + +⟶ [Hình ảnh không có bất kỳ sửa đổi nào, Lật đối với một trục mà ý nghĩa của hình ảnh được giữ nguyên, Xoay với một góc nhỏ, Mô phỏng hiệu chỉnh đường chân trời không chính xác, Lấy nét ngẫu nhiên trên một phần của hình ảnh, Một số cách cắt ngẫu nhiên có thể được thực hiện trên một hàng] + +
+ + +**14. [Color shift, Noise addition, Information loss, Contrast change]** + +⟶ [Dịch chuyển màu, Thêm nhiễu, Mất mát thông tin, Thay đổi độ tương phản] + +
+ + +**15. [Nuances of RGB is slightly changed, Captures noise that can occur with light exposure, Addition of noise, More tolerance to quality variation of inputs, Parts of image ignored, Mimics potential loss of parts of image, Luminosity changes, Controls difference in exposition due to time of day]** + +⟶ [Các sắc thái của RGB bị thay đổi một chút, Nhiễu nhiễu có thể xảy ra khi tiếp xúc với ánh sáng nhẹ, Bổ sung nhiễu, Chịu được sự thay đổi chất lượng của các yếu tố đầu vào, Các phần của hình ảnh bị bỏ qua, Bắt chước mất khả năng của các phần của hình ảnh, Thay đổi độ sáng, Kiểm soát sự khác biệt do phơi sáng do thời gian trong ngày] + +
+ + +**16. Remark: data is usually augmented on the fly during training.** + +⟶ Ghi chú: dữ liệu thường được tăng cường khi huấn luyện + +
+ + +**17. Batch normalization ― It is a step of hyperparameter γ,β that normalizes the batch {xi}. By noting μB,σ2B the mean and variance of that we want to correct to the batch, it is done as follows:** + +⟶ Chuẩn hóa batch ― Đây là một bước của siêu tham số γ,β chuẩn hóa tập dữ liệu {xi}. Kí hiệu μB,σ2B là trung bình và phương sai của tập dữ liệu ta muốn chuẩn hóa, tuân theo công thức sau: + +


**18. It is usually done after a fully connected/convolutional layer and before a non-linearity layer and aims at allowing higher learning rates and reducing the strong dependence on initialization.**

⟶ Thường được thực hiện sau một lớp fully connected/tích chập và trước lớp phi tuyến tính, nhằm cho phép tốc độ học cao hơn và giảm bớt sự phụ thuộc mạnh vào việc khởi tạo.

<br>
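
To make the normalization step concrete, here is a minimal NumPy sketch of the batch normalization forward pass described above; the function name and the `(batch, features)` layout are our own assumptions:

```python
import numpy as np

def batch_norm_forward(x, gamma, beta, eps=1e-5):
    """Normalize a mini-batch x of shape (batch, features) with learnable gamma, beta."""
    mu = x.mean(axis=0)                  # mu_B: per-feature mean of the batch
    var = x.var(axis=0)                  # sigma^2_B: per-feature variance of the batch
    x_hat = (x - mu) / np.sqrt(var + eps)
    return gamma * x_hat + beta          # scale and shift with the learnable parameters
```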
+ + +**19. Training a neural network** + +⟶ Huấn luyện một mô hình nhân tạo + +
+ + +**20. Definitions** + +⟶ Định nghĩa + +
+ + +**21. Epoch ― In the context of training a model, epoch is a term used to refer to one iteration where the model sees the whole training set to update its weights.** + +⟶ Vòng lặp ― Trong ngữ cảnh huấn luyện mô hình, vòng lặp là một từ chỉ một lần lặp qua toàn bộ dữ liệu huấn luyện để cập nhật tham số. + +
+ + +**22. Mini-batch gradient descent ― During the training phase, updating weights is usually not based on the whole training set at once due to computation complexities or one data point due to noise issues. Instead, the update step is done on mini-batches, where the number of data points in a batch is a hyperparameter that we can tune.** + +⟶ Giảm độ dốc theo lô nhỏ - Trong giai đoạn đào tạo, việc cập nhật trọng số thường không dựa trên toàn bộ tập huấn cùng một lúc do độ phức tạp tính toán hoặc một điểm dữ liệu do vấn đề nhiễu. Thay vào đó, bước cập nhật được thực hiện trên các lô nhỏ, trong đó số lượng điểm dữ liệu trong một lô là một siêu tham số mà chúng ta có thể điều chỉnh. + +
+ + +**23. Loss function ― In order to quantify how a given model performs, the loss function L is usually used to evaluate to what extent the actual outputs y are correctly predicted by the model outputs z.** + +⟶ Hàm mất mát - Để định lượng cách thức một mô hình nhất định thực hiện, hàm mất L thường được sử dụng để đánh giá mức độ đầu ra thực tế y được dự đoán chính xác bởi mô hình đầu ra z. + +
+ + +**24. Cross-entropy loss ― In the context of binary classification in neural networks, the cross-entropy loss L(z,y) is commonly used and is defined as follows:** + +⟶ Mất entropy chéo - Trong bối cảnh phân loại nhị phân trong các mạng thần kinh, tổn thất entropy chéo L(z,y) thường được sử dụng và được định nghĩa như sau: + +
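
A minimal NumPy sketch of the binary cross-entropy loss L(z,y) = −[y log(z) + (1−y) log(1−z)], averaged over a mini-batch; the clipping constant is ours, added only to avoid log(0):

```python
import numpy as np

def cross_entropy(z, y, eps=1e-12):
    """Binary cross-entropy between model outputs z and actual outputs y."""
    z = np.clip(z, eps, 1.0 - eps)       # avoid log(0)
    return float(np.mean(-(y * np.log(z) + (1 - y) * np.log(1 - z))))
```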
+ + +**25. Finding optimal weights** + +⟶ Tìm trọng số tối ưu + +
+ + +**26. Backpropagation ― Backpropagation is a method to update the weights in the neural network by taking into account the actual output and the desired output. The derivative with respect to each weight w is computed using the chain rule.** + +⟶ Lan truyền ngược - Lan truyền ngược là một phương pháp để cập nhật các trọng số trong mạng nhân tạo bằng cách tính đến đầu ra thực tế và đầu ra mong muốn. Đạo hàm tương ứng với từng trọng số w được tính bằng quy tắc chuỗi. + +
+ + +**27. Using this method, each weight is updated with the rule:** + +⟶ Sử dụng mô hình này, mỗi trọng số được cập nhật theo quy luật: + +
+ + +**28. Updating weights ― In a neural network, weights are updated as follows:** + +⟶ Cập nhật trọng số ― Trong một mô hình nhân tạo, trọng số được cập nhật như sau: + +


**29. [Step 1: Take a batch of training data and perform forward propagation to compute the loss, Step 2: Backpropagate the loss to get the gradient of the loss with respect to each weight, Step 3: Use the gradients to update the weights of the network.]**

⟶ [Bước 1: Lấy một loạt dữ liệu huấn luyện và thực hiện lan truyền thẳng để tính toán mất mát, Bước 2: Lan truyền ngược mất mát để có được độ dốc của mất mát theo từng trọng số, Bước 3: Sử dụng độ dốc để cập nhật trọng số của mạng.]

<br>
+ + +**30. [Forward propagation, Backpropagation, Weights update]** + +⟶ [Lan truyền thẳng, Lan truyền ngược, Cập nhật trọng số] + +
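
The three steps above can be illustrated on a single-neuron network with a sigmoid output and cross-entropy loss; this NumPy sketch is ours and is only meant to show the forward pass, the chain-rule gradients and the weight update in one place:

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def train_step(X, y, w, b, lr=0.1):
    """One mini-batch step: forward propagation, backpropagation, weight update."""
    m = X.shape[0]
    # Step 1: forward propagation and loss (binary cross-entropy)
    z = sigmoid(X @ w + b)
    loss = -np.mean(y * np.log(z) + (1 - y) * np.log(1 - z))
    # Step 2: backpropagation (chain rule); for sigmoid + cross-entropy, dL/d(pre-activation) = z - y
    dz = (z - y) / m
    dw = X.T @ dz
    db = dz.sum()
    # Step 3: update each weight in the opposite direction of its gradient
    w = w - lr * dw
    b = b - lr * db
    return w, b, loss
```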
+ + +**31. Parameter tuning** + +⟶ Tinh chỉnh tham số + +
+ + +**32. Weights initialization** + +⟶ Khởi tạo trọng số + +
+ + +**33. Xavier initialization ― Instead of initializing the weights in a purely random manner, Xavier initialization enables to have initial weights that take into account characteristics that are unique to the architecture.** + +⟶ Khởi tạo Xavier - Thay vì khởi tạo trọng số một cách ngẫu nhiên, khởi tạo Xavier cho chúng ta một cách khởi tạo tham số dựa trên một đặc tính độc nhất của mô hình. + +
+ + +**34. Transfer learning ― Training a deep learning model requires a lot of data and more importantly a lot of time. It is often useful to take advantage of pre-trained weights on huge datasets that took days/weeks to train, and leverage it towards our use case. Depending on how much data we have at hand, here are the different ways to leverage this:** + +⟶ Học chuyển tiếp - Huấn luyện một mô hình học tập sâu đòi hỏi nhiều dữ liệu và quan trọng hơn là rất nhiều thời gian. Sẽ rất hữu ích để tận dụng các trọng số được đào tạo trước trên các bộ dữ liệu khổng lồ mất vài ngày / tuần để đào tạo và tận dụng nó cho trường hợp sử dụng của chúng ta. Tùy thuộc vào lượng dữ liệu chúng ta có trong tay, đây là các cách khác nhau để tận dụng điều này: + +
+ + +**35. [Training size, Illustration, Explanation]** + +⟶ [Kích thước tập huấn luyện, Mô phỏng, Giải thích] + +
+ + +**36. [Small, Medium, Large]** + +⟶ [Nhỏ, Trung bình, Lớn] + +


**37. [Freezes all layers, trains weights on softmax, Freezes most layers, trains weights on last layers and softmax, Trains weights on layers and softmax by initializing weights on pre-trained ones]**

⟶ [Đóng băng tất cả các lớp, huấn luyện trọng số trên softmax, Đóng băng hầu hết các lớp, huấn luyện trọng số trên các lớp cuối và softmax, Huấn luyện trọng số trên các lớp và softmax bằng cách khởi tạo trọng số từ mô hình đã được huấn luyện sẵn]

<br>


**38. Optimizing convergence**

⟶ Tối ưu hội tụ

<br>
+ + +**39. Learning rate ― The learning rate, often noted α or sometimes η, indicates at which pace the weights get updated. It can be fixed or adaptively changed. The current most popular method is called Adam, which is a method that adapts the learning rate.** + +⟶ Tốc độ học - Tốc độ học, thường được kí hiệu là α hoặc đôi khi là η, cho biết mức độ thay đổi của các trọng số sau mỗi lần cập nhật. Nó có thể được cố định hoặc thay đổi thích ứng. Phương pháp phổ biến nhất hiện nay được gọi là Adam, đây là phương pháp thích nghi với tốc độ học. + +
+ + +**40. Adaptive learning rates ― Letting the learning rate vary when training a model can reduce the training time and improve the numerical optimal solution. While Adam optimizer is the most commonly used technique, others can also be useful. They are summed up in the table below:** + +⟶ Tốc độ học thích nghi - Để tốc độ học thay đổi khi huấn luyện một mô hình có thể giảm thời gian huấn luyện và cải thiện giải pháp tối ưu số. Trong khi Adam tối ưu hóa là kỹ thuật được sử dụng phổ biến nhất, những phương pháp khác cũng có thể hữu ích. Chúng được tóm tắt trong bảng dưới đây: + +
+ + +**41. [Method, Explanation, Update of w, Update of b]** + +⟶ [Phương pháp, Giải thích, Cập nhật của w, Cập nhật của b] + +
+ + +**42. [Momentum, Dampens oscillations, Improvement to SGD, 2 parameters to tune]** + +⟶ [Momentum, Làm giảm dao động, Cải thiện SGD, 2 tham số để tinh chỉnh] + +


**43. [RMSprop, Root Mean Square propagation, Speeds up learning algorithm by controlling oscillations]**

⟶ [RMSprop, Lan truyền Root Mean Square, Tăng tốc thuật toán học bằng cách kiểm soát dao động]

<br>


**44. [Adam, Adaptive Moment estimation, Most popular method, 4 parameters to tune]**

⟶ [Adam, Ước lượng Moment thích nghi, Phương pháp phổ biến nhất, 4 tham số để tinh chỉnh]

<br>
+ + +**45. Remark: other methods include Adadelta, Adagrad and SGD.** + +⟶ Chú ý: những phương pháp khác bao gồm Adadelta, Adagrad và SGD. + +
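
As an illustration of the table, here is a minimal sketch of one Adam step, which combines a momentum-like first moment with an RMSprop-like second moment; the variable names follow the usual m, v, t notation and are our own choice:

```python
import numpy as np

def adam_update(w, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam step: adapts the effective learning rate per parameter."""
    m = beta1 * m + (1 - beta1) * grad           # first-moment (momentum-like) estimate
    v = beta2 * v + (1 - beta2) * grad ** 2      # second-moment (RMSprop-like) estimate
    m_hat = m / (1 - beta1 ** t)                 # bias correction
    v_hat = v / (1 - beta2 ** t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v
```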
+ + +**46. Regularization** + +⟶ Phạt mô hình + +
+ + +**47. Dropout ― Dropout is a technique used in neural networks to prevent overfitting the training data by dropping out neurons with probability p>0. It forces the model to avoid relying too much on particular sets of features.** + +⟶ Dropout - Dropout là một kỹ thuật được sử dụng trong các mạng nhân tạo để ngăn chặn hiện tượng quá khớp bằng cách loại bỏ các nơ-ron với xác suất p>0. Nó buộc mô hình tránh phụ thuộc quá nhiều vào một tập thuộc tính nào đó. + +
+ + +**48. Remark: most deep learning frameworks parametrize dropout through the 'keep' parameter 1−p.** + +⟶ Ghi chú: hầu hết các framework học máy có cài đặt dropout thông qua biến 'keep' với tham số 1-p. + +
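
A minimal sketch of inverted dropout using the 'keep' parameterization mentioned in the remark; the function name is illustrative:

```python
import numpy as np

def dropout_forward(a, keep_prob):
    """Inverted dropout: zero out units with probability p = 1 - keep_prob and rescale at train time."""
    mask = (np.random.rand(*a.shape) < keep_prob) / keep_prob
    return a * mask   # at test time the layer is simply the identity
```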
+ + +**49. Weight regularization ― In order to make sure that the weights are not too large and that the model is not overfitting the training set, regularization techniques are usually performed on the model weights. The main ones are summed up in the table below:** + +⟶ Phạt trọng số - Để đảm bảo rằng các trọng số không quá lớn và mô hình không vượt quá tập huấn luyện, các kỹ thuật chính quy thường được thực hiện trên các trọng số mô hình. Những cái chính được tóm tắt trong bảng dưới đây: + +
+ + +**50. [LASSO, Ridge, Elastic Net]** + +⟶ [LASSO, Ridge, Elastic Net] + +

**50 bis. Shrinks coefficients to 0, Good for variable selection, Makes coefficients smaller, Tradeoff between variable selection and small coefficients]**

⟶ [Thu hẹp hệ số về 0, Tốt cho việc lựa chọn biến, Làm cho hệ số nhỏ hơn, Đánh đổi giữa lựa chọn biến và hệ số nhỏ]

<br>
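
To illustrate the table, a small sketch computing the penalty that each scheme adds to the loss; the exact weighting between the L1 and L2 terms of Elastic Net varies across formulations, so the `alpha` mix below is only indicative:

```python
import numpy as np

def regularization_penalty(w, lam=0.01, alpha=0.5, kind="ridge"):
    """Penalty added to the loss: LASSO (L1), Ridge (L2) or Elastic Net (a mix weighted by alpha)."""
    if kind == "lasso":
        return lam * np.sum(np.abs(w))        # shrinks coefficients to 0
    if kind == "ridge":
        return lam * np.sum(w ** 2)           # makes coefficients smaller
    # elastic net: trade-off between variable selection (L1) and small coefficients (L2)
    return lam * (alpha * np.sum(np.abs(w)) + (1 - alpha) * np.sum(w ** 2))
```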

**51. Early stopping ― This regularization technique stops the training process as soon as the validation loss reaches a plateau or starts to increase.**

⟶ Dừng sớm - Kĩ thuật regularization này sẽ dừng quá trình huấn luyện ngay khi mất mát trên tập thẩm định không còn giảm (chững lại) hoặc bắt đầu tăng.

<br>
+ + +**52. [Error, Validation, Training, early stopping, Epochs]** + +⟶ [Lỗi, Thẩm định, Huấn luyện, dừng sớm, Vòng] + +
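
A minimal sketch of early stopping with a patience counter; `train_epoch` and `eval_loss` are hypothetical callables supplied by the caller, not part of the cheatsheet:

```python
def train_with_early_stopping(train_epoch, eval_loss, max_epochs=100, patience=5):
    """Stop once the validation loss has not improved for `patience` consecutive epochs."""
    best, waited = float("inf"), 0
    for epoch in range(max_epochs):
        train_epoch()                  # one pass over the training set
        val = eval_loss()              # current validation loss
        if val < best:
            best, waited = val, 0      # improvement: reset the counter
        else:
            waited += 1                # plateau or increase
            if waited >= patience:
                break                  # early stopping
    return best
```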
+ + +**53. Good practices** + +⟶ Thói quen tốt + +


**54. Overfitting small batch ― When debugging a model, it is often useful to make quick tests to see if there is any major issue with the architecture of the model itself. In particular, in order to make sure that the model can be properly trained, a mini-batch is passed inside the network to see if it can overfit on it. If it cannot, it means that the model is either too complex or not complex enough to even overfit on a small batch, let alone a normal-sized training set.**

⟶ Quá khớp batch nhỏ - Khi gỡ lỗi một mô hình, thường rất hữu ích khi thực hiện các thử nghiệm nhanh để xem liệu có bất kỳ vấn đề lớn nào với kiến trúc của chính mô hình đó không. Đặc biệt, để đảm bảo rằng mô hình có thể được huấn luyện đúng cách, một mini-batch được truyền vào bên trong mạng để xem liệu nó có thể quá khớp trên batch đó không. Nếu không thể, điều đó có nghĩa là mô hình hoặc quá phức tạp hoặc chưa đủ phức tạp để có thể quá khớp ngay cả trên một batch nhỏ, chứ chưa nói đến một tập huấn luyện có kích thước bình thường.

<br>
+ + +**55. Gradient checking ― Gradient checking is a method used during the implementation of the backward pass of a neural network. It compares the value of the analytical gradient to the numerical gradient at given points and plays the role of a sanity-check for correctness.** + +⟶ Kiểm tra gradient - Kiểm tra gradient là một phương pháp được sử dụng trong quá trình thực hiện đường truyền ngược của mạng thần kinh. Nó so sánh giá trị của gradient phân tích với gradient số tại các điểm đã cho và đóng vai trò kiểm tra độ chính xác. + +
+ + +**56. [Type, Numerical gradient, Analytical gradient]** + +⟶ [Loại, Gradient số, Gradient phân tích] + +
+ + +**57. [Formula, Comments]** + +⟶ [Công thức, Bình luận] + +
+ + +**58. [Expensive; loss has to be computed two times per dimension, Used to verify correctness of analytical implementation, Trade-off in choosing h not too small (numerical instability) nor too large (poor gradient approximation)]** + +⟶ [Đắt; Mất mát phải được tính hai lần cho mỗi chiều, Được sử dụng để xác minh tính chính xác của việc triển khai phân tích, Đánh đổi trong việc chọn h không quá nhỏ (mất ổn định số) cũng không quá lớn (xấp xỉ độ dốc kém)] + +
+ + +**59. ['Exact' result, Direct computation, Used in the final implementation]** + +⟶ [Kết quả 'Chính xác', Tính toán trực tiếp, Được sử dụng trong quá trình thực hiện cuối cùng] + +
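
A minimal sketch of gradient checking with the centered difference formula from the table; `f` is the loss as a function of the parameters and `analytical_grad` is the gradient produced by the backward pass being verified:

```python
import numpy as np

def gradient_check(f, x, analytical_grad, h=1e-5):
    """Compare the centered numerical gradient of f at x with an analytical gradient."""
    num = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e.flat[i] = h
        num.flat[i] = (f(x + e) - f(x - e)) / (2 * h)   # loss computed twice per dimension
    # relative error as a sanity check of the analytical implementation
    return np.max(np.abs(num - analytical_grad) / (np.abs(num) + np.abs(analytical_grad) + 1e-12))
```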


**60. The Deep Learning cheatsheets are now available in [target language].**

⟶ Các cheatsheet Học sâu hiện đã có bằng [Tiếng Việt].

<br>


**61. Original authors**

⟶ Các tác giả

<br>
+ +**62.Translated by X, Y and Z** + +⟶ Dịch bởi X, Y và Z + +
+ +**63.Reviewed by X, Y and Z** + +⟶ Đánh giá bởi X, Y và Z + +
+ +**64.View PDF version on GitHub** + +⟶ Xem bản PDF trên GitHub + +
+ +**65.By X and Y** + +⟶ Bởi X và Y + +
From af61575cb95a8a49514493c1b140872570afdbcc Mon Sep 17 00:00:00 2001 From: Minh Tuan Date: Fri, 27 Sep 2019 23:32:20 +0700 Subject: [PATCH 383/531] update vi/cs-221-logic-models --- vi/cs-221-logic-models.md | 462 ++++++++++++++++++++++++++++++++++++++ 1 file changed, 462 insertions(+) create mode 100644 vi/cs-221-logic-models.md diff --git a/vi/cs-221-logic-models.md b/vi/cs-221-logic-models.md new file mode 100644 index 000000000..045dc5851 --- /dev/null +++ b/vi/cs-221-logic-models.md @@ -0,0 +1,462 @@ +**Logic-based models translation** [[webpage]](https://stanford.edu/~shervine/teaching/cs-221/cheatsheet-logic-models) + +
+ +**1. Logic-based models with propositional and first-order logic** + +⟶ Các mô hình dựa trên logic với logic mệnh đề và logic bậc nhất + +
+ + +**2. Basics** + +⟶ Cơ bản + +
+ + +**3. Syntax of propositional logic ― By noting f,g formulas, and ¬,∧,∨,→,↔ connectives, we can write the following logical expressions:** + +⟶ Cú pháp của logic mệnh đề ― Kí hiệu f,g là các công thức, và ¬,∧,∨,→,↔ các kết nối, chúng ta có thể viết các biểu thức logic sau: + +
+ + +**4. [Name, Symbol, Meaning, Illustration]** + +⟶ [Tên, Kí hiệu, Ý nghĩa, Miêu tả] + +
+ + +**5. [Affirmation, Negation, Conjunction, Disjunction, Implication, Biconditional]** + +⟶ [Khẳng định, phủ định, kết hợp, phân ly, hàm ý, nhị phân] + +


**6. [not f, f and g, f or g, if f then g, f, that is to say g]**

⟶ [phủ định f, f và g, f hoặc g, nếu f thì g, f, tức là g]

<br>
+ + +**7. Remark: formulas can be built up recursively out of these connectives.** + +⟶ Ghi chú: công thức có thể được xây dựng đệ quy từ các kết nối này. + +
+ + +**8. Model ― A model w denotes an assignment of binary weights to propositional symbols.** + +⟶ Mô hình - Một mô hình w biểu thị việc gán trọng số nhị phân cho các ký hiệu mệnh đề. + +
+ + +**9. Example: the set of truth values w={A:0,B:1,C:0} is one possible model to the propositional symbols A, B and C.** + +⟶ Ví dụ: tập hợp các giá trị chân lý w ={A:0,B:1,C:0} là một mô hình có thể có cho các ký hiệu mệnh đề A, B và C. + +
+ + +**10. Interpretation function ― The interpretation function I(f,w) outputs whether model w satisfies formula f:** + +⟶ Hàm giải thích - Hàm giải thích I (f, w) đưa ra liệu mô hình w có thỏa mãn công thức f: + +
+ + +**11. Set of models ― M(f) denotes the set of models w that satisfy formula f. Mathematically speaking, we define it as follows:** + +⟶ Tập hợp các mô hình - M(f) biểu thị tập hợp các mô hình w thỏa mãn công thức f. Về mặt toán học, chúng ta định nghĩa nó như sau: + +
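
As an illustration, here is a small Python sketch of the interpretation function I(f,w) and of the set of models M(f) obtained by brute-force enumeration; the tuple-based formula encoding is our own convention, not part of the cheatsheet:

```python
from itertools import product

def interp(f, w):
    """Interpretation function I(f, w): does model w (a dict of 0/1 values) satisfy formula f?
    Formulas are nested tuples, e.g. ('and', 'A', ('not', 'B'))."""
    if isinstance(f, str):
        return bool(w[f])
    op, *args = f
    if op == "not":     return not interp(args[0], w)
    if op == "and":     return interp(args[0], w) and interp(args[1], w)
    if op == "or":      return interp(args[0], w) or interp(args[1], w)
    if op == "implies": return (not interp(args[0], w)) or interp(args[1], w)
    if op == "iff":     return interp(args[0], w) == interp(args[1], w)
    raise ValueError(op)

def models(f, symbols):
    """M(f): all assignments over `symbols` that satisfy f (brute-force model checking)."""
    return [dict(zip(symbols, bits)) for bits in product([0, 1], repeat=len(symbols))
            if interp(f, dict(zip(symbols, bits)))]

# Example from the text: w = {'A': 0, 'B': 1, 'C': 0} is one possible model over A, B, C.
```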
+ + +**12. Knowledge base** + +⟶ Cơ sở tri thức + +
+ + +**13. Definition ― The knowledge base KB is the conjunction of all formulas that have been considered so far. The set of models of the knowledge base is the intersection of the set of models that satisfy each formula. In other words:** + +⟶ Định nghĩa - Cơ sở tri thức KB là sự kết hợp của tất cả các công thức đã được xem xét cho đến nay. Tập hợp các mô hình của cơ sở tri thức là giao điểm của tập hợp các mô hình thỏa mãn từng công thức. Nói cách khác: + +
+ + +**14. Probabilistic interpretation ― The probability that query f is evaluated to 1 can be seen as the proportion of models w of the knowledge base KB that satisfy f, i.e.:** + +⟶ Giải thích xác suất - Xác suất mà truy vấn f được ước tính là 1 có thể được xem là tỷ lệ của các mô hình w của cơ sở tri thức KB thỏa mãn f, tức là: + +
+ + +**15. Satisfiability ― The knowledge base KB is said to be satisfiable if at least one model w satisfies all its constraints. In other words:** + +⟶ Mức độ thỏa mãn - Cơ sở tri thức KB được cho là thỏa đáng nếu có ít nhất một mô hình w thỏa mãn tất cả các ràng buộc của nó. Nói cách khác: + +
+ + +**16. satisfiable** + +⟶ thỏa đáng + +
+ + +**17. Remark: M(KB) denotes the set of models compatible with all the constraints of the knowledge base.** + +⟶ Ghi chú: M(KB) biểu thị tập hợp các mô hình tương thích với tất cả các ràng buộc của cơ sở tri thức. + +
+ + +**18. Relation between formulas and knowledge base - We define the following properties between the knowledge base KB and a new formula f:** + +⟶ Mối liên hệ giữa công thức và cơ sở tri thức - Chúng tôi xác định các thuộc tính sau giữa KB cơ sở tri thức và công thức mới f: + +
+ + +**19. [Name, Mathematical formulation, Illustration, Notes]** + +⟶ [Tên, Công thức toán học, Minh họa, Ghi chú] + +
+ + +**20. [KB entails f, KB contradicts f, f contingent to KB]** + +⟶ [KB đòi hỏi f, KB mâu thuẫn với f, f phụ thuộc vào KB] + +


**21. [f does not bring any new information, Also written KB⊨f, No model satisfies the constraints after adding f, Equivalent to KB⊨¬f, f does not contradict KB, f adds a non-trivial amount of information to KB]**

⟶ [f không mang lại bất kỳ thông tin mới nào, Cũng được viết là KB⊨f, Không có mô hình nào thỏa mãn các ràng buộc sau khi thêm f, Tương đương với KB⊨¬f, f không mâu thuẫn với KB, f thêm một lượng thông tin không tầm thường vào KB]

<br>
+ + +**22. Model checking ― A model checking algorithm takes as input a knowledge base KB and outputs whether it is satisfiable or not.** + +⟶ Kiểm tra mô hình - Thuật toán kiểm tra mô hình lấy đầu vào là KB cơ sở tri thức và đưa ra liệu nó có thỏa đáng hay không. + +
+ + +**23. Remark: popular model checking algorithms include DPLL and WalkSat.** + +⟶ Ghi chú: các thuật toán kiểm tra mô hình phổ biến bao gồm DPLL và WalkSat. + +
+ + +**24. Inference rule ― An inference rule of premises f1,...,fk and conclusion g is written:** + +⟶ Quy tắc suy luận - Một quy tắc suy luận của các cơ sở f1,...,fk và kết luận g được viết: + +


**25. Forward inference algorithm ― From a set of inference rules Rules, this algorithm goes through all possible f1,...,fk and adds g to the knowledge base KB if a matching rule exists. This process is repeated until no more additions can be made to KB.**

⟶ Thuật toán suy luận chuyển tiếp - Từ một tập các quy tắc suy luận Rules, thuật toán này duyệt qua tất cả các f1,...,fk có thể và thêm g vào cơ sở tri thức KB nếu tồn tại một quy tắc phù hợp. Quá trình này được lặp lại cho đến khi không thể bổ sung thêm gì vào KB.

<br>
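
A minimal sketch of this forward inference loop for rules given as (premises, conclusion) pairs over propositional symbols; the representation and the example symbols are ours and purely illustrative:

```python
def forward_inference(kb_facts, rules):
    """Repeatedly apply rules and add their conclusions to the knowledge base
    until nothing new can be added (forward chaining for definite clauses)."""
    kb = set(kb_facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if conclusion not in kb and all(p in kb for p in premises):
                kb.add(conclusion)     # a matching rule exists: derive the conclusion
                changed = True
    return kb

# Hypothetical example: rain -> wet, wet -> slippery
# forward_inference({'rain'}, [({'rain'}, 'wet'), ({'wet'}, 'slippery')])
# == {'rain', 'wet', 'slippery'}
```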


**26. Derivation ― We say that KB derives f (written KB⊢f) with rules Rules if f already is in KB or gets added during the forward inference algorithm using the set of rules Rules.**

⟶ Dẫn xuất - Chúng ta nói rằng KB dẫn xuất ra f (viết là KB⊢f) với tập quy tắc Rules nếu f đã có trong KB hoặc được thêm vào trong quá trình chạy thuật toán suy luận chuyển tiếp sử dụng tập quy tắc Rules.

<br>
+ + +**27. Properties of inference rules ― A set of inference rules Rules can have the following properties:** + +⟶ Thuộc tính của quy tắc suy luận - Một tập hợp các quy tắc suy luận Quy tắc có thể có các thuộc tính sau: + +
+ + +**28. [Name, Mathematical formulation, Notes]** + +⟶ [Tên, Công thức toán học, Ghi chú] + +


**29. [Soundness, Completeness]**

⟶ [Tính đúng đắn (soundness), Tính đầy đủ (completeness)]

<br>
+ + +**30. [Inferred formulas are entailed by KB, Can be checked one rule at a time, "Nothing but the truth", Formulas entailing KB are either already in the knowledge base or inferred from it, "The whole truth"]** + +⟶ [Các công thức được suy luận được KB yêu cầu, Có thể kiểm tra một quy tắc tại một thời điểm, "Không có gì ngoài sự thật", Các công thức đòi hỏi KB đã có trong cơ sở tri thức hoặc được suy ra từ đó, "Toàn bộ sự thật"] + +
+ + +**31. Propositional logic** + +⟶ Logic mệnh đề + +
+ + +**32. In this section, we will go through logic-based models that use logical formulas and inference rules. The idea here is to balance expressivity and computational efficiency.** + +⟶ Trong phần này, chúng ta sẽ đi qua các mô hình dựa trên logic sử dụng các công thức logic và quy tắc suy luận. Ý tưởng ở đây là để cân bằng giữa tính biểu thức và hiệu quả tính toán. + +
+ + +**33. Horn clause ― By noting p1,...,pk and q propositional symbols, a Horn clause has the form:** + +⟶ Mệnh đề sừng - Bằng cách lưu ý các ký hiệu mệnh đề p1,...,pk và q, mệnh đề Sừng có dạng: + +
+ + +**34. Remark: when q=false, it is called a "goal clause", otherwise we denote it as a "definite clause".** + +⟶ Ghi chú: khi q = false, nó được gọi là "mệnh đề mục tiêu", nếu không, chúng ta biểu thị nó là "mệnh đề xác định". + +
+ + +**35. Modus ponens ― For propositional symbols f1,...,fk and p, the modus ponens rule is written:** + +⟶ Modus ponens - Đối với các ký hiệu mệnh đề F1, ..., fk và p, quy tắc modus ponens được viết: + +
+ + +**36. Remark: it takes linear time to apply this rule, as each application generate a clause that contains a single propositional symbol.** + +⟶ Lưu ý: phải mất thời gian tuyến tính để áp dụng quy tắc này, vì mỗi ứng dụng tạo ra một mệnh đề có chứa một ký hiệu mệnh đề duy nhất. + +
+ + +**37. Completeness ― Modus ponens is complete with respect to Horn clauses if we suppose that KB contains only Horn clauses and p is an entailed propositional symbol. Applying modus ponens will then derive p.** + +⟶ Tính đầy đủ - Modus ponens hoàn thành đối với các mệnh đề Sừng nếu chúng ta cho rằng KB chỉ chứa các mệnh đề Sừng và p là một biểu tượng mệnh đề bắt buộc. Áp dụng modus ponens sau đó sẽ lấy được p. + +
+ + +**38. Conjunctive normal form ― A conjunctive normal form (CNF) formula is a conjunction of clauses, where each clause is a disjunction of atomic formulas.** + +⟶ Dạng bình thường kết hợp - Một công thức dạng thường kết hợp (CNF) là một sự kết hợp của các mệnh đề, trong đó mỗi mệnh đề là một sự tách rời của các công thức nguyên tử. + +
+ + +**39. Remark: in other words, CNFs are ∧ of ∨.** + +⟶ Ghi chú: nói cách khác, CNF là ∧ của ∨. + +
+ + +**40. Equivalent representation ― Every formula in propositional logic can be written into an equivalent CNF formula. The table below presents general conversion properties:** + +⟶ Biểu diễn tương đương - Mọi công thức trong logic mệnh đề có thể được viết thành một công thức CNF tương đương. Bảng dưới đây trình bày các thuộc tính chuyển đổi chung: + +


**41. [Rule name, Initial, Converted, Eliminate, Distribute, over]**

⟶ [Tên quy tắc, Ban đầu, Chuyển đổi, Loại bỏ, Phân phối, trên]

<br>
+ + +**42. Resolution rule ― For propositional symbols f1,...,fn, and g1,...,gm as well as p, the resolution rule is written:** + +⟶ Quy tắc phân giải - Đối với các ký hiệu mệnh đề F1, ..., fn và g1, ..., gm cũng như p, quy tắc phân giải được viết: + +
+ + +**43. Remark: it can take exponential time to apply this rule, as each application generates a clause that has a subset of the propositional symbols.** + +⟶ Lưu ý: có thể mất thời gian theo cấp số nhân để áp dụng quy tắc này, vì mỗi ứng dụng tạo ra một mệnh đề có tập hợp con của các ký hiệu mệnh đề. + +


**44. [Resolution-based inference ― The resolution-based inference algorithm follows the following steps:, Step 1: Convert all formulas into CNF, Step 2: Repeatedly apply resolution rule, Step 3: Return unsatisfiable if and only if False, is derived]**

⟶ [Suy luận dựa trên phân giải - Thuật toán suy luận dựa trên phép phân giải tuân theo các bước sau:, Bước 1: Chuyển đổi tất cả các công thức thành CNF, Bước 2: Áp dụng lặp đi lặp lại quy tắc phân giải, Bước 3: Trả về không thỏa mãn được khi và chỉ khi False được dẫn xuất]

<br>
+ + +**45. First-order logic** + +⟶ Logic bậc nhất + +
+ + +**46. The idea here is to use variables to yield more compact knowledge representations.** + +⟶ Ý tưởng ở đây là sử dụng các biến để mang lại các biểu diễn tri thức nhỏ gọn hơn. + +
+ + +**47. [Model ― A model w in first-order logic maps:, constant symbols to objects, predicate symbols to tuple of objects]** + +⟶ [Mô hình - Một mô hình w trong các ánh xạ logic bậc nhất:, các ký hiệu không đổi cho các đối tượng, các ký hiệu vị ngữ cho đến các đối tượng] + +
+ + +**48. Horn clause ― By noting x1,...,xn variables and a1,...,ak,b atomic formulas, the first-order logic version of a horn clause has the form:** + +⟶ Mệnh đề sừng - Bằng cách lưu ý các biến x1,...,xn và a1,...,ak,b công thức nguyên tử, phiên bản logic thứ nhất của mệnh đề sừng có dạng: + +
+ + +**49. Substitution ― A substitution θ maps variables to terms and Subst[θ,f] denotes the result of substitution θ on f.** + +⟶ Thay thế - Một thay thế θ ánh xạ các biến thành các thuật ngữ và Subst [θ, f] biểu thị kết quả của sự thay thế θ trên f. + +
+ + +**50. Unification ― Unification takes two formulas f and g and returns the most general substitution θ that makes them equal:** + +⟶ Hợp nhất - Hợp nhất có hai công thức f và g và trả về sự thay thế chung nhất làm cho chúng bằng nhau: + +
+ + +**51. such that** + +⟶ sao cho + +
+ + +**52. Note: Unify[f,g] returns Fail if no such θ exists.** + +⟶ Lưu ý: Thống nhất [f, g] trả về Fail nếu không tồn tại θ. + +
+ + +**53. Modus ponens ― By noting x1,...,xn variables, a1,...,ak and a′1,...,a′k atomic formulas and by calling θ=Unify(a′1∧...∧a′k,a1∧...∧ak) the first-order logic version of modus ponens can be written:** + +⟶ Modus ponens - Bằng cách lưu ý các biến x1, ..., xn, a1, ..., ak và a′1, ..., a′k công thức nguyên tử và bằng cách gọi θ=Unify(a′1∧... ∧a′k,a1∧...ak) phiên bản logic bậc nhất của modus ponens có thể được viết: + +
+ + +**54. Completeness ― Modus ponens is complete for first-order logic with only Horn clauses.** + +⟶ Tính đầy đủ - Modus ponens hoàn thành cho logic thứ nhất chỉ với các mệnh đề Horn. + +
+ + +**55. Resolution rule ― By noting f1,...,fn, g1,...,gm, p, q formulas and by calling θ=Unify(p,q), the first-order logic version of the resolution rule can be written:** + +⟶ Quy tắc phân giải - Bằng cách lưu ý các công thức f1,...,fn,g1,...,gm,p,q và bằng cách gọi θ=Unify(p,q), có thể viết phiên bản logic bậc nhất của quy tắc phân giải : + +


**56. [Semi-decidability ― First-order logic, even restricted to only Horn clauses, is semi-decidable., if KB⊨f, forward inference on complete inference rules will prove f in finite time, if KB⊭f, no algorithm can show this in finite time]**

⟶ [Tính bán quyết định - Logic bậc nhất, ngay cả khi chỉ giới hạn ở các mệnh đề Horn, là bán quyết định được., nếu KB⊨f, suy luận chuyển tiếp trên các quy tắc suy luận đầy đủ sẽ chứng minh được f trong thời gian hữu hạn, nếu KB⊭f, không có thuật toán nào có thể chỉ ra điều này trong thời gian hữu hạn]

<br>
+ + +**57. [Basics, Notations, Model, Interpretation function, Set of models]** + +⟶ [Khái niệm cơ bản, Ký hiệu, Mô hình, Hàm diễn giải, Bộ mô hình] + +
+ + +**58. [Knowledge base, Definition, Probabilistic interpretation, Satisfiability, Relationship with formulas, Forward inference, Rule properties]** + +⟶ [Cơ sở tri thức, Định nghĩa, Giải thích xác suất, Hài lòng, Mối quan hệ với các công thức, Suy luận chuyển tiếp, Thuộc tính quy tắc] + +


**59. [Propositional logic, Clauses, Modus ponens, Conjunctive normal form, Representation equivalence, Resolution]**

⟶ [Logic mệnh đề, Mệnh đề, Modus ponens, Dạng bình thường kết hợp, Tương đương biểu diễn, Phân giải]

<br>


**60. [First-order logic, Substitution, Unification, Resolution rule, Modus ponens, Resolution, Semi-decidability]**

⟶ [Logic bậc nhất, Thay thế, Hợp nhất, Quy tắc phân giải, Modus ponens, Phân giải, Tính bán quyết định]

<br>
+ + +**61. View PDF version on GitHub** + +⟶ Xem bản PDF trên GitHub + +
+ + +**62. Original authors** + +⟶ Các tác giả + +
+ + +**63. Translated by X, Y and Z** + +⟶ Dịch bởi X, Y và Z + +


**64. Reviewed by X, Y and Z**

⟶ Đánh giá bởi X, Y và Z

<br>
+ + +**65. By X and Y** + +⟶ Bởi X và Y + +
+ + +**66. The Artificial Intelligence cheatsheets are now available in [target language].** + +⟶ Trí tuệ nhân tạo cheatsheats hiện đã có vơi ngôn ngữ [Tiếng Việt] From 540b51065b9a9d61fb83c66215cdb8c17ba26f37 Mon Sep 17 00:00:00 2001 From: Shervine Amidi Date: Fri, 27 Sep 2019 22:58:27 -0700 Subject: [PATCH 384/531] Update progress [vi] --- README.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/README.md b/README.md index c5d5c1c9f..f4839cfbf 100644 --- a/README.md +++ b/README.md @@ -47,7 +47,7 @@ Please make sure to propose the translation of **only one** cheatsheet per pull |**한국어**|not started|not started|not started|not started| |**Português**|not started|not started|not started|not started| |**Türkçe**|done|done|done|done| -|**Tiếng Việt**|not started|not started|not started|not started| +|**Tiếng Việt**|not started|not started|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/179)| |**中文**|not started|not started|not started|not started| ### CS 229 (Machine Learning) @@ -97,7 +97,7 @@ Please make sure to propose the translation of **only one** cheatsheet per pull |**Русский**|not started|not started|not started| |**Türkçe**|done|done|done| |**Українська**|not started|not started|not started| -|**Tiếng Việt**|not started|not started|not started| +|**Tiếng Việt**|not started|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/178)| |**中文**|not started|not started|not started| ## Acknowledgements From 0377fd5fd1ff1b04787a1c7dcc866f3c108bf586 Mon Sep 17 00:00:00 2001 From: tuananhhedspibk Date: Sun, 29 Sep 2019 22:27:28 +0900 Subject: [PATCH 385/531] vi translating for cheatsheet-deep-learning --- vi/cheatsheet-deep-learning.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/vi/cheatsheet-deep-learning.md b/vi/cheatsheet-deep-learning.md index 45dad28fa..e34e5eb70 100644 --- a/vi/cheatsheet-deep-learning.md +++ b/vi/cheatsheet-deep-learning.md @@ -66,7 +66,7 @@ **12. Backpropagation ― Backpropagation is a method to update the weights in the neural network by taking into account the actual output and the desired output. The derivative with respect to weight w is computed using chain rule and is of the following form:** -⟶ Backpropagation (Lan truyền ngược) - Backpropagation là phương thức dùng để cập nhật trọng số trong mạng neural bằng cách tính toán đầu ra thực sự và đầu ra mong muốn. Đạo hàm liên quan tới trọng số w được tính bằng cách sử dụng quy tắc chuỗi (chain rule) theo như cách dưới đây: +⟶ Backpropagation (Lan truyền ngược) - Backpropagation là phương thức dùng để cập nhật trọng số trong mạng neural bằng cách tính toán đầu ra thực sự và đầu ra mong muốn. Đạo hàm theo trọng số w được tính bằng cách sử dụng quy tắc chuỗi (chain rule) theo như cách dưới đây:
@@ -162,7 +162,7 @@ **28. LSTM ― A long short-term memory (LSTM) network is a type of RNN model that avoids the vanishing gradient problem by adding 'forget' gates.** -⟶ LSTM - Mạng bộ nhớ ngắn dài (LSTM) là 1 loại RNN model tránh vấn đề vanishing gradient (gradient biến mất đột ngột) bằng cách thêm vào cổng 'quên' ('forget' gates). +⟶ LSTM - Mạng bộ nhớ dài-ngắn (LSTM) là 1 loại RNN model tránh vấn đề vanishing gradient (gradient biến mất đột ngột) bằng cách thêm vào cổng 'quên' ('forget' gates).
@@ -258,7 +258,7 @@ **44. 1) We initialize the value:** -⟶ 1) Ta khởi tạo gái trị (value): +⟶ 1) Ta khởi tạo giá trị (value):
From 988275a25665cc9adc482491081c09556997bb85 Mon Sep 17 00:00:00 2001 From: Shervine Amidi Date: Sun, 29 Sep 2019 10:42:54 -0700 Subject: [PATCH 386/531] Add [ja] contributors --- CONTRIBUTORS | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/CONTRIBUTORS b/CONTRIBUTORS index addb9870f..c2c4e3ae1 100644 --- a/CONTRIBUTORS +++ b/CONTRIBUTORS @@ -89,6 +89,10 @@ Kwang Hyeok Ahn (translation of Unsupervised Learning) --ja + Tran Tuan Anh (translation of convolutional neural networks) + Yoshiyuki Nakai (review of convolutional neural networks) + Linh Dang (review of convolutional neural networks) + Kamuela Lau (translation of deep learning tips and tricks) Yoshiyuki Nakai (review of deep learning tips and tricks) Hiroki Mori (review of deep learning tips and tricks) From ecbfb12b6e6698b2d96e5e2661e6b5c56f69c532 Mon Sep 17 00:00:00 2001 From: Shervine Amidi Date: Sun, 29 Sep 2019 10:45:06 -0700 Subject: [PATCH 387/531] Update README.md --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index f4839cfbf..48c60d76d 100644 --- a/README.md +++ b/README.md @@ -90,7 +90,7 @@ Please make sure to propose the translation of **only one** cheatsheet per pull |**Magyar**|not started|not started|not started| |**Bahasa Indonesia**|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/155)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/152)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/153)| |**Italiano**|not started|not started|not started| -|**日本語**|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/145)|done|done| +|**日本語**|done|done|done| |**한국어**|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/109)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/107)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/108)| |**Polski**|not started|not started|not started| |**Português**|done|not started|not started| From 5eda787663efb9e98d855b884ad9a5496f67b3fe Mon Sep 17 00:00:00 2001 From: Pham Hong Vinh Date: Mon, 30 Sep 2019 15:46:50 +0700 Subject: [PATCH 388/531] Translating the CNN cheatsheet --- vi/cs-230-convolutional-neural-networks.md | 716 +++++++++++++++++++++ 1 file changed, 716 insertions(+) create mode 100644 vi/cs-230-convolutional-neural-networks.md diff --git a/vi/cs-230-convolutional-neural-networks.md b/vi/cs-230-convolutional-neural-networks.md new file mode 100644 index 000000000..d6e13e2d2 --- /dev/null +++ b/vi/cs-230-convolutional-neural-networks.md @@ -0,0 +1,716 @@ +**Convolutional Neural Networks translation** [[webpage]](https://stanford.edu/~shervine/teaching/cs-230/cheatsheet-convolutional-neural-networks) + +
+ +**1. Convolutional Neural Networks cheatsheet** + +⟶ Convolutional Neural Networks cheatsheet + +
+ + +**2. CS 230 - Deep Learning** + +⟶ CS 230 - Deep Learning + +


**3. [Overview, Architecture structure]**

⟶ [Tổng quan, Kết cấu kiến trúc]

<br>


**4. [Types of layer, Convolution, Pooling, Fully connected]**

⟶ [Các kiểu tầng(layer), Tích chập, Pooling, Kết nối đầy đủ]

<br>
+ + +**5. [Filter hyperparameters, Dimensions, Stride, Padding]** + +⟶ [Filter hyperparameters, Dimensions, Stride, Padding] + +
+ + +**6. [Tuning hyperparameters, Parameter compatibility, Model complexity, Receptive field]** + +⟶ [Tuning hyperparameters, Parameter compatibility, Model complexity, Receptive field]** + +
+ + +**7. [Activation functions, Rectified Linear Unit, Softmax]** + +⟶ [Activation functions, Rectified Linear Unit, Softmax] + +
+ + +**8. [Object detection, Types of models, Detection, Intersection over Union, Non-max suppression, YOLO, R-CNN]** + +⟶ [Object detection, Types of models, Detection, Intersection over Union, Non-max suppression, YOLO, R-CNN] + +
+ + +**9. [Face verification/recognition, One shot learning, Siamese network, Triplet loss]** + +⟶ [Face verification/recognition, One shot learning, Siamese network, Triplet loss] + +
+ + +**10. [Neural style transfer, Activation, Style matrix, Style/content cost function]** + +⟶ [Neural style transfer, Activation, Style matrix, Style/content cost function] + +
+ + +**11. [Computational trick architectures, Generative Adversarial Net, ResNet, Inception Network]** + +⟶ [Computational trick architectures, Generative Adversarial Net, ResNet, Inception Network] + +


**12. Overview**

⟶ Tổng quan

<br>


**13. Architecture of a traditional CNN ― Convolutional neural networks, also known as CNNs, are a specific type of neural networks that are generally composed of the following layers:**

⟶ Kiến trúc truyền thống của một mạng CNN ― Mạng neural tích chập (Convolutional neural networks), còn được biết đến với tên CNNs, là một dạng mạng neural được cấu thành bởi các tầng sau:

<br>


**14. The convolution layer and the pooling layer can be fine-tuned with respect to hyperparameters that are described in the next sections.**

⟶ Tầng tích chập và tầng pooling có thể được hiệu chỉnh theo các tham số cấu hình (hyperparameters) được mô tả ở những phần tiếp theo.

<br>
+ + +**15. Types of layer** + +⟶ Types of layer + +


**16. Convolution layer (CONV) ― The convolution layer (CONV) uses filters that perform convolution operations as it is scanning the input I with respect to its dimensions. Its hyperparameters include the filter size F and stride S. The resulting output O is called feature map or activation map.**

⟶ Tầng tích chập (CONV) ― Tầng tích chập (CONV) sử dụng các bộ lọc để thực hiện phép tích chập khi đưa chúng đi qua đầu vào I theo các chiều của nó. Các tham số cấu hình của các bộ lọc này bao gồm kích thước bộ lọc F và độ trượt (stride) S. Kết quả đầu ra O được gọi là feature map hay activation map.

<br>


**17. Remark: the convolution step can be generalized to the 1D and 3D cases as well.**

⟶ Lưu ý: Bước tích chập cũng có thể được khái quát hóa cả với trường hợp một chiều (1D) và ba chiều (3D).

<br>
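
For illustration, here is a naive NumPy sketch of the 2D case for a single channel; like most deep learning frameworks it actually computes a cross-correlation, and the function name is our own:

```python
import numpy as np

def conv2d(x, filt, stride=1):
    """Slide an (F, F) filter over an (I, I) input and return the (O, O) activation map."""
    I, F = x.shape[0], filt.shape[0]
    O = (I - F) // stride + 1
    out = np.zeros((O, O))
    for i in range(O):
        for j in range(O):
            patch = x[i * stride:i * stride + F, j * stride:j * stride + F]
            out[i, j] = np.sum(patch * filt)   # elementwise product summed over the window
    return out
```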


**18. Pooling (POOL) ― The pooling layer (POOL) is a downsampling operation, typically applied after a convolution layer, which does some spatial invariance. In particular, max and average pooling are special kinds of pooling where the maximum and average value is taken, respectively.**

⟶ Pooling (POOL) ― Tầng pooling (POOL) là một phép downsampling, thường được sử dụng sau tầng tích chập, giúp tăng tính bất biến không gian. Cụ thể, max pooling và average pooling là những dạng pooling đặc biệt, mà tương ứng là trong đó giá trị lớn nhất và giá trị trung bình được lấy ra.

<br>
+ + +**19. [Type, Purpose, Illustration, Comments]** + +⟶ [Type, Purpose, Illustration, Comments] + +


**20. [Max pooling, Average pooling, Each pooling operation selects the maximum value of the current view, Each pooling operation averages the values of the current view]**

⟶ [Max pooling, Average pooling, Từng phép pooling chọn giá trị lớn nhất trong khu vực mà nó đang được áp dụng, Từng phép pooling tính trung bình các giá trị trong khu vực mà nó đang được áp dụng]

<br>
+ + +**21. [Preserves detected features, Most commonly used, Downsamples feature map, Used in LeNet]** + +⟶ [Preserves detected features, Most commonly used, Downsamples feature map, Used in LeNet] + +
+ + +**22. Fully Connected (FC) ― The fully connected layer (FC) operates on a flattened input where each input is connected to all neurons. If present, FC layers are usually found towards the end of CNN architectures and can be used to optimize objectives such as class scores.** + +⟶ + +
+ + +**23. Filter hyperparameters** + +⟶ + +
+ + +**24. The convolution layer contains filters for which it is important to know the meaning behind its hyperparameters.** + +⟶ + +
+ + +**25. Dimensions of a filter ― A filter of size F×F applied to an input containing C channels is a F×F×C volume that performs convolutions on an input of size I×I×C and produces an output feature map (also called activation map) of size O×O×1.** + +⟶ + +
+ + +**26. Filter** + +⟶ + +
+ + +**27. Remark: the application of K filters of size F×F results in an output feature map of size O×O×K.** + +⟶ + +
+ + +**28. Stride ― For a convolutional or a pooling operation, the stride S denotes the number of pixels by which the window moves after each operation.** + +⟶ + +
+ + +**29. Zero-padding ― Zero-padding denotes the process of adding P zeroes to each side of the boundaries of the input. This value can either be manually specified or automatically set through one of the three modes detailed below:** + +⟶ + +
+ + +**30. [Mode, Value, Illustration, Purpose, Valid, Same, Full]** + +⟶ + +
+ + +**31. [No padding, Drops last convolution if dimensions do not match, Padding such that feature map size has size ⌈IS⌉, Output size is mathematically convenient, Also called 'half' padding, Maximum padding such that end convolutions are applied on the limits of the input, Filter 'sees' the input end-to-end]** + +⟶ + +
+ + +**32. Tuning hyperparameters** + +⟶ + +
+ + +**33. Parameter compatibility in convolution layer ― By noting I the length of the input volume size, F the length of the filter, P the amount of zero padding, S the stride, then the output size O of the feature map along that dimension is given by:** + +⟶ + +
+ + +**34. [Input, Filter, Output]** + +⟶ + +
+ + +**35. Remark: often times, Pstart=Pend≜P, in which case we can replace Pstart+Pend by 2P in the formula above.** + +⟶ + +
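
A one-line sketch of the output size formula O = (I − F + Pstart + Pend)/S + 1, assuming integer-compatible sizes (names are illustrative):

```python
def conv_output_size(I, F, S, P_start=0, P_end=0):
    """Feature map size along one dimension: O = (I - F + P_start + P_end) / S + 1."""
    return (I - F + P_start + P_end) // S + 1

# e.g. a 32-wide input, 5-wide filter, stride 1, padding 2 on each side:
# conv_output_size(32, 5, 1, 2, 2) == 32
```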
+ + +**36. Understanding the complexity of the model ― In order to assess the complexity of a model, it is often useful to determine the number of parameters that its architecture will have. In a given layer of a convolutional neural network, it is done as follows:** + +⟶ + +
+ + +**37. [Illustration, Input size, Output size, Number of parameters, Remarks]** + +⟶ + +


**38. [One bias parameter per filter, In most cases, S<F, A common choice for K is 2C]**

⟶ 

<br>


**39. [Pooling operation done channel-wise, In most cases, S=F]**

⟶ 

<br>
+ + +**40. [Input is flattened, One bias parameter per neuron, The number of FC neurons is free of structural constraints]** + +⟶ + +
+ + +**41. Receptive field ― The receptive field at layer k is the area denoted Rk×Rk of the input that each pixel of the k-th activation map can 'see'. By calling Fj the filter size of layer j and Si the stride value of layer i and with the convention S0=1, the receptive field at layer k can be computed with the formula:** + +⟶ + +
+ + +**42. In the example below, we have F1=F2=3 and S1=S2=1, which gives R2=1+2⋅1+2⋅1=5.** + +⟶ + +
+ + +**43. Commonly used activation functions** + +⟶ + +
+ + +**44. Rectified Linear Unit ― The rectified linear unit layer (ReLU) is an activation function g that is used on all elements of the volume. It aims at introducing non-linearities to the network. Its variants are summarized in the table below:** + +⟶ + +
+ + +**45. [ReLU, Leaky ReLU, ELU, with]** + +⟶ + +
+ + +**46. [Non-linearity complexities biologically interpretable, Addresses dying ReLU issue for negative values, Differentiable everywhere]** + +⟶ + +
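
A minimal NumPy sketch of the three variants in the table; the default slopes are our own choice:

```python
import numpy as np

def relu(z):
    return np.maximum(0, z)

def leaky_relu(z, alpha=0.01):
    # small slope for negative values, addressing the dying ReLU issue
    return np.where(z > 0, z, alpha * z)

def elu(z, alpha=1.0):
    # smooth variant, differentiable everywhere
    return np.where(z > 0, z, alpha * (np.exp(z) - 1))
```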
+ + +**47. Softmax ― The softmax step can be seen as a generalized logistic function that takes as input a vector of scores x∈Rn and outputs a vector of output probability p∈Rn through a softmax function at the end of the architecture. It is defined as follows:** + +⟶ + +
+ + +**48. where** + +⟶ + +
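
A minimal NumPy sketch of the softmax step; subtracting the maximum score is a standard numerical-stability trick, not part of the definition:

```python
import numpy as np

def softmax(x):
    """Turn a vector of scores x into a vector of probabilities p."""
    e = np.exp(x - np.max(x))
    return e / e.sum()
```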
+ + +**49. Object detection** + +⟶ + +
+ + +**50. Types of models ― There are 3 main types of object recognition algorithms, for which the nature of what is predicted is different. They are described in the table below:** + +⟶ + +
+ + +**51. [Image classification, Classification w. localization, Detection]** + +⟶ + +
+ + +**52. [Teddy bear, Book]** + +⟶ + +
+ + +**53. [Classifies a picture, Predicts probability of object, Detects an object in a picture, Predicts probability of object and where it is located, Detects up to several objects in a picture, Predicts probabilities of objects and where they are located]** + +⟶ + +
+ + +**54. [Traditional CNN, Simplified YOLO, R-CNN, YOLO, R-CNN]** + +⟶ + +
+ + +**55. Detection ― In the context of object detection, different methods are used depending on whether we just want to locate the object or detect a more complex shape in the image. The two main ones are summed up in the table below:** + +⟶ + +
+ + +**56. [Bounding box detection, Landmark detection]** + +⟶ + +
+ + +**57. [Detects the part of the image where the object is located, Detects a shape or characteristics of an object (e.g. eyes), More granular]** + +⟶ + +
+ + +**58. [Box of center (bx,by), height bh and width bw, Reference points (l1x,l1y), ..., (lnx,lny)]** + +⟶ + +
+ + +**59. Intersection over Union ― Intersection over Union, also known as IoU, is a function that quantifies how correctly positioned a predicted bounding box Bp is over the actual bounding box Ba. It is defined as:** + +⟶ + +
+ + +**60. Remark: we always have IoU∈[0,1]. By convention, a predicted bounding box Bp is considered as being reasonably good if IoU(Bp,Ba)⩾0.5.** + +⟶ + +
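
A minimal sketch of IoU for boxes given as (x1, y1, x2, y2) corner coordinates, a different convention from the (bx,by,bh,bw) parameterization above, chosen here only for brevity:

```python
def iou(box_p, box_a):
    """Intersection over Union of a predicted box and an actual box."""
    x1 = max(box_p[0], box_a[0]); y1 = max(box_p[1], box_a[1])
    x2 = min(box_p[2], box_a[2]); y2 = min(box_p[3], box_a[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_p = (box_p[2] - box_p[0]) * (box_p[3] - box_p[1])
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    return inter / (area_p + area_a - inter)
```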
+ + +**61. Anchor boxes ― Anchor boxing is a technique used to predict overlapping bounding boxes. In practice, the network is allowed to predict more than one box simultaneously, where each box prediction is constrained to have a given set of geometrical properties. For instance, the first prediction can potentially be a rectangular box of a given form, while the second will be another rectangular box of a different geometrical form.** + +⟶ + +
+ + +**62. Non-max suppression ― The non-max suppression technique aims at removing duplicate overlapping bounding boxes of a same object by selecting the most representative ones. After having removed all boxes having a probability prediction lower than 0.6, the following steps are repeated while there are boxes remaining:** + +⟶ + +
+ + +**63. [For a given class, Step 1: Pick the box with the largest prediction probability., Step 2: Discard any box having an IoU⩾0.5 with the previous box.]** + +⟶ + +
+ + +**64. [Box predictions, Box selection of maximum probability, Overlap removal of same class, Final bounding boxes]** + +⟶ + +
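
A minimal sketch of the procedure, reusing an `iou` helper such as the one sketched earlier; the thresholds follow the 0.6 and 0.5 values quoted above:

```python
def non_max_suppression(boxes, scores, iou_threshold=0.5, score_threshold=0.6):
    """Keep the most representative boxes of a given class."""
    # discard boxes whose prediction probability is below the score threshold
    candidates = [i for i, s in enumerate(scores) if s >= score_threshold]
    candidates.sort(key=lambda i: scores[i], reverse=True)
    kept = []
    while candidates:
        best = candidates.pop(0)              # Step 1: box with the largest probability
        kept.append(best)
        candidates = [i for i in candidates   # Step 2: drop boxes overlapping it too much
                      if iou(boxes[best], boxes[i]) < iou_threshold]
    return kept
```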
+ + +**65. YOLO ― You Only Look Once (YOLO) is an object detection algorithm that performs the following steps:** + +⟶ + +
+ + +**66. [Step 1: Divide the input image into a G×G grid., Step 2: For each grid cell, run a CNN that predicts y of the following form:, repeated k times]** + +⟶ + +
+ + +**67. where pc is the probability of detecting an object, bx,by,bh,bw are the properties of the detected bouding box, c1,...,cp is a one-hot representation of which of the p classes were detected, and k is the number of anchor boxes.** + +⟶ + +
+ + +**68. Step 3: Run the non-max suppression algorithm to remove any potential duplicate overlapping bounding boxes.** + +⟶ + +
+ + +**69. [Original image, Division in GxG grid, Bounding box prediction, Non-max suppression]** + +⟶ + +
+ + +**70. Remark: when pc=0, then the network does not detect any object. In that case, the corresponding predictions bx,...,cp have to be ignored.** + +⟶ + +
+ + +**71. R-CNN ― Region with Convolutional Neural Networks (R-CNN) is an object detection algorithm that first segments the image to find potential relevant bounding boxes and then run the detection algorithm to find most probable objects in those bounding boxes.** + +⟶ + +
+ + +**72. [Original image, Segmentation, Bounding box prediction, Non-max suppression]** + +⟶ + +
+ + +**73. Remark: although the original algorithm is computationally expensive and slow, newer architectures enabled the algorithm to run faster, such as Fast R-CNN and Faster R-CNN.** + +⟶ + +
+ + +**74. Face verification and recognition** + +⟶ + +
+ + +**75. Types of models ― Two main types of model are summed up in table below:** + +⟶ + +
+ + +**76. [Face verification, Face recognition, Query, Reference, Database]** + +⟶ + +
+ + +**77. [Is this the correct person?, One-to-one lookup, Is this one of the K persons in the database?, One-to-many lookup]** + +⟶ + +
+ + +**78. One Shot Learning ― One Shot Learning is a face verification algorithm that uses a limited training set to learn a similarity function that quantifies how different two given images are. The similarity function applied to two images is often noted d(image 1,image 2).** + +⟶ + +
+ + +**79. Siamese Network ― Siamese Networks aim at learning how to encode images to then quantify how different two images are. For a given input image x(i), the encoded output is often noted as f(x(i)).** + +⟶ + +
+ + +**80. Triplet loss ― The triplet loss ℓ is a loss function computed on the embedding representation of a triplet of images A (anchor), P (positive) and N (negative). The anchor and the positive example belong to a same class, while the negative example to another one. By calling α∈R+ the margin parameter, this loss is defined as follows:** + +⟶ + +
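
A minimal NumPy sketch of the triplet loss on three embedding vectors; the squared Euclidean distance and the default margin value are our own assumptions:

```python
import numpy as np

def triplet_loss(f_a, f_p, f_n, alpha=0.2):
    """max(d(A,P) - d(A,N) + alpha, 0) on anchor, positive and negative embeddings."""
    d_pos = np.sum((f_a - f_p) ** 2)   # anchor vs positive (same class)
    d_neg = np.sum((f_a - f_n) ** 2)   # anchor vs negative (different class)
    return max(d_pos - d_neg + alpha, 0.0)
```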
+ + +**81. Neural style transfer** + +⟶ + +
+ + +**82. Motivation ― The goal of neural style transfer is to generate an image G based on a given content C and a given style S.** + +⟶ + +
+ + +**83. [Content C, Style S, Generated image G]** + +⟶ + +
+ + +**84. Activation ― In a given layer l, the activation is noted a[l] and is of dimensions nH×nw×nc** + +⟶ + +
+ + +**85. Content cost function ― The content cost function Jcontent(C,G) is used to determine how the generated image G differs from the original content image C. It is defined as follows:** + +⟶ + +
+ + +**86. Style matrix ― The style matrix G[l] of a given layer l is a Gram matrix where each of its elements G[l]kk′ quantifies how correlated the channels k and k′ are. It is defined with respect to activations a[l] as follows:** + +⟶ + +
+ + +**87. Remark: the style matrix for the style image and the generated image are noted G[l] (S) and G[l] (G) respectively.** + +⟶ + +
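
A minimal NumPy sketch of the style (Gram) matrix of an activation volume of shape (nH, nW, nC); the function name is illustrative:

```python
import numpy as np

def style_matrix(a):
    """G[k, k'] = sum over all spatial positions of a[..., k] * a[..., k']."""
    n_H, n_W, n_C = a.shape
    flat = a.reshape(n_H * n_W, n_C)   # each column is one channel, unrolled spatially
    return flat.T @ flat               # (n_C, n_C) matrix of channel correlations
```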
+ + +**88. Style cost function ― The style cost function Jstyle(S,G) is used to determine how the generated image G differs from the style S. It is defined as follows:** + +⟶ + +
+ + +**89. Overall cost function ― The overall cost function is defined as being a combination of the content and style cost functions, weighted by parameters α,β, as follows:** + +⟶ + +
+ + +**90. Remark: a higher value of α will make the model care more about the content while a higher value of β will make it care more about the style.** + +⟶ + +
+ + +**91. Architectures using computational tricks** + +⟶ + +
+ + +**92. Generative Adversarial Network ― Generative adversarial networks, also known as GANs, are composed of a generative and a discriminative model, where the generative model aims at generating the most truthful output that will be fed into the discriminative which aims at differentiating the generated and true image.** + +⟶ + +
+ + +**93. [Training, Noise, Real-world image, Generator, Discriminator, Real Fake]** + +⟶ + +
+ + +**94. Remark: use cases using variants of GANs include text to image, music generation and synthesis.** + +⟶ + +
+ + +**95. ResNet ― The Residual Network architecture (also called ResNet) uses residual blocks with a high number of layers meant to decrease the training error. The residual block has the following characterizing equation:** + +⟶ + +
+ + +**96. Inception Network ― This architecture uses inception modules and aims at giving a try at different convolutions in order to increase its performance through features diversification. In particular, it uses the 1×1 convolution trick to limit the computational burden.** + +⟶ + +
+ + +**97. The Deep Learning cheatsheets are now available in [target language].** + +⟶ + +
+ + +**98. Original authors** + +⟶ + +
+ + +**99. Translated by X, Y and Z** + +⟶ + +
+ + +**100. Reviewed by X, Y and Z** + +⟶ + +
+ + +**101. View PDF version on GitHub** + +⟶ + +
+ + +**102. By X and Y** + +⟶ + +
From 05ee0294e553a5b4c6b9ceee617a8b0570026a62 Mon Sep 17 00:00:00 2001 From: Pham Hong Vinh Date: Mon, 30 Sep 2019 16:11:19 +0700 Subject: [PATCH 389/531] Fix encoding issues --- vi/cs-230-convolutional-neural-networks.md | 98 +++++++++++----------- 1 file changed, 49 insertions(+), 49 deletions(-) diff --git a/vi/cs-230-convolutional-neural-networks.md b/vi/cs-230-convolutional-neural-networks.md index d6e13e2d2..7f7ff5620 100644 --- a/vi/cs-230-convolutional-neural-networks.md +++ b/vi/cs-230-convolutional-neural-networks.md @@ -18,14 +18,14 @@ **3. [Overview, Architecture structure]** -⟶ [T?ng quan, K?t c?u ki?n trc] +⟶ [Tổng quan, Kết cấu kiến trúc]
**4. [Types of layer, Convolution, Pooling, Fully connected]** -⟶ [Cc ki?u t?ng(layer), Tch ch?p, Pooling, K?t n?i ??y ??] +⟶ [Các kiểu tầng(layer), Tích chập, Pooling, Kết nối đầy đủ]
@@ -81,21 +81,21 @@ **12. Overview** -⟶ T?ng quan +⟶ Tổng quan
-**13. Architecture of a traditional CNN ― Convolutional neural networks, also known as CNNs, are a specific type of neural networks that are generally composed of the following layers:** +**13. Architecture of a traditional CNN ― Convolutional neural networks, also known as CNNs, are a specific type of neural networks that are generally composed of the following layers:** -⟶ Ki?n trc truy?n th?ng c?a m?t m?ng CNN ― M?ng neural tch ch?p (Convolutional neural networks), cn ???c bi?t ??n v?i tn CNNs, l m?t d?ng m?ng neural ???c c?u thnh b?i cc t?ng sau: +⟶ Kiến trúc truyền thống của một mạng CNN ― Mạng neural tích chập (Convolutional neural networks), còn được biết đến với tên CNNs, là một dạng mạng neural được cấu thành bởi các tầng sau:
**14. The convolution layer and the pooling layer can be fine-tuned with respect to hyperparameters that are described in the next sections.** -⟶ T?ng tch ch?p v t?ng pooling c th? ???c hi?u ch?nh theo cc tham s? c?u hnh (hyperparameters) ???c m t? ? nh?ng ph?n ti?p theo. +⟶ Tầng tích chập và tầng pooling có thể được hiệu chỉnh theo các tham số cấu hình (hyperparameters) được mô tả ở những phần tiếp theo.
@@ -107,23 +107,23 @@
-**16. Convolution layer (CONV) ― The convolution layer (CONV) uses filters that perform convolution operations as it is scanning the input I with respect to its dimensions. Its hyperparameters include the filter size F and stride S. The resulting output O is called feature map or activation map.** +**16. Convolution layer (CONV) ― The convolution layer (CONV) uses filters that perform convolution operations as it is scanning the input I with respect to its dimensions. Its hyperparameters include the filter size F and stride S. The resulting output O is called feature map or activation map.** -⟶ T?ng tch ch?p (CONV) ― T?ng tch ch?p (CONV) s? d?ng cc b? l?c ?? th?c hi?n php tch ch?p khi ??a chng ?i qua ??u vo I theo cc chi?u c?a n. Cc tham s? c?u hnh c?a b? cc b? l?c ny bao g?m kch th??c b? l?c F v ?? tr??t (stride) S. K?t qu? ??u ra O ???c g?i l feature map hay activation map. +⟶ Tầng tích chập (CONV) ― Tầng tích chập (CONV) sử dụng các bộ lọc để thực hiện phép tích chập khi đưa chúng đi qua đầu vào I theo các chiều của nó. Các tham số cấu hình của các bộ lọc này bao gồm kích thước bộ lọc F và độ trượt (stride) S. Kết quả đầu ra O được gọi là feature map hay activation map.
**17. Remark: the convolution step can be generalized to the 1D and 3D cases as well.** -⟶ L?u : B??c tch ch?p c?ng c th? ???c t?ng quan ha c? v?i tr??ng h?p m?t chi?u (1D) v ba chi?u (3D). +⟶ Lưu ý: Bước tích chập cũng có thể được khái quát hóa cả với trường hợp một chiều (1D) và ba chiều (3D).
-**18. Pooling (POOL) ― The pooling layer (POOL) is a downsampling operation, typically applied after a convolution layer, which does some spatial invariance. In particular, max and average pooling are special kinds of pooling where the maximum and average value is taken, respectively.** +**18. Pooling (POOL) ― The pooling layer (POOL) is a downsampling operation, typically applied after a convolution layer, which does some spatial invariance. In particular, max and average pooling are special kinds of pooling where the maximum and average value is taken, respectively.** -⟶ Pooling (POOL) ― T?ng pooling (POOL) l m?t php downsampling, th??ng ???c s? d?ng sau t?ng tch ch?p, gip t?ng tnh b?t bi?n khng gian. C? th?, max pooling v average pooling l nh?ng d?ng pooling ??c bi?t, m t??ng ?ng l trong ? gi tr? l?n nh?t v gi tr? trung bnh ???c l?y ra. +⟶ Pooling (POOL) ― Tầng pooling (POOL) là một phép downsampling, thường được sử dụng sau tầng tích chập, giúp tăng tính bất biến không gian. Cụ thể, max pooling và average pooling là những dạng pooling đặc biệt, mà tương ứng là trong đó giá trị lớn nhất và giá trị trung bình được lấy ra.
@@ -137,7 +137,7 @@ **20. [Max pooling, Average pooling, Each pooling operation selects the maximum value of the current view, Each pooling operation averages the values of the current view]** -⟶ [Max pooling, Average pooling, T?ng php pooling ch?n gi tr? l?n nh?t trong khu v?c m n ?ang ???c p d?ng, T?ng php pooling tnh trung bnh cc gi tr? trong khu v?c m n ?ang ???c s? d?ng] +⟶ [Max pooling, Average pooling, Từng phép pooling chọn giá trị lớn nhất trong khu vực mà nó đang được áp dụng, Từng phép pooling tính trung bình các giá trị trong khu vực mà nó đang được áp dụng]
@@ -149,7 +149,7 @@
-**22. Fully Connected (FC) ― The fully connected layer (FC) operates on a flattened input where each input is connected to all neurons. If present, FC layers are usually found towards the end of CNN architectures and can be used to optimize objectives such as class scores.** +**22. Fully Connected (FC) ― The fully connected layer (FC) operates on a flattened input where each input is connected to all neurons. If present, FC layers are usually found towards the end of CNN architectures and can be used to optimize objectives such as class scores.** ⟶ @@ -170,7 +170,7 @@
-**25. Dimensions of a filter ― A filter of size F×F applied to an input containing C channels is a F×F×C volume that performs convolutions on an input of size I×I×C and produces an output feature map (also called activation map) of size O×O×1.** +**25. Dimensions of a filter ― A filter of size F×F applied to an input containing C channels is a F×F×C volume that performs convolutions on an input of size I×I×C and produces an output feature map (also called activation map) of size O×O×1.** ⟶ @@ -184,21 +184,21 @@
-**27. Remark: the application of K filters of size F×F results in an output feature map of size O×O×K.** +**27. Remark: the application of K filters of size F×F results in an output feature map of size O×O×K.** ⟶
-**28. Stride ― For a convolutional or a pooling operation, the stride S denotes the number of pixels by which the window moves after each operation.** +**28. Stride ― For a convolutional or a pooling operation, the stride S denotes the number of pixels by which the window moves after each operation.** ⟶
-**29. Zero-padding ― Zero-padding denotes the process of adding P zeroes to each side of the boundaries of the input. This value can either be manually specified or automatically set through one of the three modes detailed below:** +**29. Zero-padding ― Zero-padding denotes the process of adding P zeroes to each side of the boundaries of the input. This value can either be manually specified or automatically set through one of the three modes detailed below:** ⟶ @@ -212,7 +212,7 @@
-**31. [No padding, Drops last convolution if dimensions do not match, Padding such that feature map size has size ⌈IS⌉, Output size is mathematically convenient, Also called 'half' padding, Maximum padding such that end convolutions are applied on the limits of the input, Filter 'sees' the input end-to-end]** +**31. [No padding, Drops last convolution if dimensions do not match, Padding such that feature map size has size ⌈IS⌉, Output size is mathematically convenient, Also called 'half' padding, Maximum padding such that end convolutions are applied on the limits of the input, Filter 'sees' the input end-to-end]** ⟶ @@ -226,7 +226,7 @@
-**33. Parameter compatibility in convolution layer ― By noting I the length of the input volume size, F the length of the filter, P the amount of zero padding, S the stride, then the output size O of the feature map along that dimension is given by:** +**33. Parameter compatibility in convolution layer ― By noting I the length of the input volume size, F the length of the filter, P the amount of zero padding, S the stride, then the output size O of the feature map along that dimension is given by:** ⟶ @@ -240,14 +240,14 @@
-**35. Remark: often times, Pstart=Pend≜P, in which case we can replace Pstart+Pend by 2P in the formula above.** +**35. Remark: often times, Pstart=Pend≜P, in which case we can replace Pstart+Pend by 2P in the formula above.** ⟶
-**36. Understanding the complexity of the model ― In order to assess the complexity of a model, it is often useful to determine the number of parameters that its architecture will have. In a given layer of a convolutional neural network, it is done as follows:** +**36. Understanding the complexity of the model ― In order to assess the complexity of a model, it is often useful to determine the number of parameters that its architecture will have. In a given layer of a convolutional neural network, it is done as follows:** ⟶ @@ -282,14 +282,14 @@
-**41. Receptive field ― The receptive field at layer k is the area denoted Rk×Rk of the input that each pixel of the k-th activation map can 'see'. By calling Fj the filter size of layer j and Si the stride value of layer i and with the convention S0=1, the receptive field at layer k can be computed with the formula:** +**41. Receptive field ― The receptive field at layer k is the area denoted Rk×Rk of the input that each pixel of the k-th activation map can 'see'. By calling Fj the filter size of layer j and Si the stride value of layer i and with the convention S0=1, the receptive field at layer k can be computed with the formula:** ⟶
-**42. In the example below, we have F1=F2=3 and S1=S2=1, which gives R2=1+2⋅1+2⋅1=5.** +**42. In the example below, we have F1=F2=3 and S1=S2=1, which gives R2=1+2â‹…1+2â‹…1=5.** ⟶ @@ -303,7 +303,7 @@
-**44. Rectified Linear Unit ― The rectified linear unit layer (ReLU) is an activation function g that is used on all elements of the volume. It aims at introducing non-linearities to the network. Its variants are summarized in the table below:** +**44. Rectified Linear Unit ― The rectified linear unit layer (ReLU) is an activation function g that is used on all elements of the volume. It aims at introducing non-linearities to the network. Its variants are summarized in the table below:** ⟶ @@ -324,7 +324,7 @@
-**47. Softmax ― The softmax step can be seen as a generalized logistic function that takes as input a vector of scores x∈Rn and outputs a vector of output probability p∈Rn through a softmax function at the end of the architecture. It is defined as follows:** +**47. Softmax ― The softmax step can be seen as a generalized logistic function that takes as input a vector of scores x∈Rn and outputs a vector of output probability p∈Rn through a softmax function at the end of the architecture. It is defined as follows:** ⟶ @@ -345,7 +345,7 @@
-**50. Types of models ― There are 3 main types of object recognition algorithms, for which the nature of what is predicted is different. They are described in the table below:** +**50. Types of models ― There are 3 main types of object recognition algorithms, for which the nature of what is predicted is different. They are described in the table below:** ⟶ @@ -380,7 +380,7 @@
-**55. Detection ― In the context of object detection, different methods are used depending on whether we just want to locate the object or detect a more complex shape in the image. The two main ones are summed up in the table below:** +**55. Detection ― In the context of object detection, different methods are used depending on whether we just want to locate the object or detect a more complex shape in the image. The two main ones are summed up in the table below:** ⟶ @@ -408,35 +408,35 @@
-**59. Intersection over Union ― Intersection over Union, also known as IoU, is a function that quantifies how correctly positioned a predicted bounding box Bp is over the actual bounding box Ba. It is defined as:** +**59. Intersection over Union ― Intersection over Union, also known as IoU, is a function that quantifies how correctly positioned a predicted bounding box Bp is over the actual bounding box Ba. It is defined as:** ⟶
-**60. Remark: we always have IoU∈[0,1]. By convention, a predicted bounding box Bp is considered as being reasonably good if IoU(Bp,Ba)⩾0.5.** +**60. Remark: we always have IoU∈[0,1]. By convention, a predicted bounding box Bp is considered as being reasonably good if IoU(Bp,Ba)⩾0.5.** ⟶
-**61. Anchor boxes ― Anchor boxing is a technique used to predict overlapping bounding boxes. In practice, the network is allowed to predict more than one box simultaneously, where each box prediction is constrained to have a given set of geometrical properties. For instance, the first prediction can potentially be a rectangular box of a given form, while the second will be another rectangular box of a different geometrical form.** +**61. Anchor boxes ― Anchor boxing is a technique used to predict overlapping bounding boxes. In practice, the network is allowed to predict more than one box simultaneously, where each box prediction is constrained to have a given set of geometrical properties. For instance, the first prediction can potentially be a rectangular box of a given form, while the second will be another rectangular box of a different geometrical form.** ⟶
-**62. Non-max suppression ― The non-max suppression technique aims at removing duplicate overlapping bounding boxes of a same object by selecting the most representative ones. After having removed all boxes having a probability prediction lower than 0.6, the following steps are repeated while there are boxes remaining:** +**62. Non-max suppression ― The non-max suppression technique aims at removing duplicate overlapping bounding boxes of a same object by selecting the most representative ones. After having removed all boxes having a probability prediction lower than 0.6, the following steps are repeated while there are boxes remaining:** ⟶
-**63. [For a given class, Step 1: Pick the box with the largest prediction probability., Step 2: Discard any box having an IoU⩾0.5 with the previous box.]** +**63. [For a given class, Step 1: Pick the box with the largest prediction probability., Step 2: Discard any box having an IoU⩾0.5 with the previous box.]** ⟶ @@ -450,14 +450,14 @@
-**65. YOLO ― You Only Look Once (YOLO) is an object detection algorithm that performs the following steps:** +**65. YOLO ― You Only Look Once (YOLO) is an object detection algorithm that performs the following steps:** ⟶
-**66. [Step 1: Divide the input image into a G×G grid., Step 2: For each grid cell, run a CNN that predicts y of the following form:, repeated k times]** +**66. [Step 1: Divide the input image into a G×G grid., Step 2: For each grid cell, run a CNN that predicts y of the following form:, repeated k times]** ⟶ @@ -492,7 +492,7 @@
-**71. R-CNN ― Region with Convolutional Neural Networks (R-CNN) is an object detection algorithm that first segments the image to find potential relevant bounding boxes and then run the detection algorithm to find most probable objects in those bounding boxes.** +**71. R-CNN ― Region with Convolutional Neural Networks (R-CNN) is an object detection algorithm that first segments the image to find potential relevant bounding boxes and then run the detection algorithm to find most probable objects in those bounding boxes.** ⟶ @@ -520,7 +520,7 @@
-**75. Types of models ― Two main types of model are summed up in table below:** +**75. Types of models ― Two main types of model are summed up in table below:** ⟶ @@ -541,21 +541,21 @@
-**78. One Shot Learning ― One Shot Learning is a face verification algorithm that uses a limited training set to learn a similarity function that quantifies how different two given images are. The similarity function applied to two images is often noted d(image 1,image 2).** +**78. One Shot Learning ― One Shot Learning is a face verification algorithm that uses a limited training set to learn a similarity function that quantifies how different two given images are. The similarity function applied to two images is often noted d(image 1,image 2).** ⟶
-**79. Siamese Network ― Siamese Networks aim at learning how to encode images to then quantify how different two images are. For a given input image x(i), the encoded output is often noted as f(x(i)).** +**79. Siamese Network ― Siamese Networks aim at learning how to encode images to then quantify how different two images are. For a given input image x(i), the encoded output is often noted as f(x(i)).** ⟶
-**80. Triplet loss ― The triplet loss ℓ is a loss function computed on the embedding representation of a triplet of images A (anchor), P (positive) and N (negative). The anchor and the positive example belong to a same class, while the negative example to another one. By calling α∈R+ the margin parameter, this loss is defined as follows:** +**80. Triplet loss ― The triplet loss â„“ is a loss function computed on the embedding representation of a triplet of images A (anchor), P (positive) and N (negative). The anchor and the positive example belong to a same class, while the negative example to another one. By calling α∈R+ the margin parameter, this loss is defined as follows:** ⟶ @@ -569,7 +569,7 @@
-**82. Motivation ― The goal of neural style transfer is to generate an image G based on a given content C and a given style S.** +**82. Motivation ― The goal of neural style transfer is to generate an image G based on a given content C and a given style S.** ⟶ @@ -583,21 +583,21 @@
-**84. Activation ― In a given layer l, the activation is noted a[l] and is of dimensions nH×nw×nc** +**84. Activation ― In a given layer l, the activation is noted a[l] and is of dimensions nH×nw×nc** ⟶
-**85. Content cost function ― The content cost function Jcontent(C,G) is used to determine how the generated image G differs from the original content image C. It is defined as follows:** +**85. Content cost function ― The content cost function Jcontent(C,G) is used to determine how the generated image G differs from the original content image C. It is defined as follows:** ⟶
-**86. Style matrix ― The style matrix G[l] of a given layer l is a Gram matrix where each of its elements G[l]kk′ quantifies how correlated the channels k and k′ are. It is defined with respect to activations a[l] as follows:** +**86. Style matrix ― The style matrix G[l] of a given layer l is a Gram matrix where each of its elements G[l]kk′ quantifies how correlated the channels k and k′ are. It is defined with respect to activations a[l] as follows:** ⟶ @@ -611,21 +611,21 @@
-**88. Style cost function ― The style cost function Jstyle(S,G) is used to determine how the generated image G differs from the style S. It is defined as follows:** +**88. Style cost function ― The style cost function Jstyle(S,G) is used to determine how the generated image G differs from the style S. It is defined as follows:** ⟶
-**89. Overall cost function ― The overall cost function is defined as being a combination of the content and style cost functions, weighted by parameters α,β, as follows:** +**89. Overall cost function ― The overall cost function is defined as being a combination of the content and style cost functions, weighted by parameters α,β, as follows:** ⟶
-**90. Remark: a higher value of α will make the model care more about the content while a higher value of β will make it care more about the style.** +**90. Remark: a higher value of α will make the model care more about the content while a higher value of β will make it care more about the style.** ⟶ @@ -639,7 +639,7 @@
-**92. Generative Adversarial Network ― Generative adversarial networks, also known as GANs, are composed of a generative and a discriminative model, where the generative model aims at generating the most truthful output that will be fed into the discriminative which aims at differentiating the generated and true image.** +**92. Generative Adversarial Network ― Generative adversarial networks, also known as GANs, are composed of a generative and a discriminative model, where the generative model aims at generating the most truthful output that will be fed into the discriminative which aims at differentiating the generated and true image.** ⟶ @@ -660,14 +660,14 @@
-**95. ResNet ― The Residual Network architecture (also called ResNet) uses residual blocks with a high number of layers meant to decrease the training error. The residual block has the following characterizing equation:** +**95. ResNet ― The Residual Network architecture (also called ResNet) uses residual blocks with a high number of layers meant to decrease the training error. The residual block has the following characterizing equation:** ⟶
-**96. Inception Network ― This architecture uses inception modules and aims at giving a try at different convolutions in order to increase its performance through features diversification. In particular, it uses the 1×1 convolution trick to limit the computational burden.** +**96. Inception Network ― This architecture uses inception modules and aims at giving a try at different convolutions in order to increase its performance through features diversification. In particular, it uses the 1×1 convolution trick to limit the computational burden.** ⟶ From ae686c480d8b23e8ecf0f0c242cfdd0556fc99d2 Mon Sep 17 00:00:00 2001 From: Pham Hong Vinh Date: Mon, 30 Sep 2019 16:20:36 +0700 Subject: [PATCH 390/531] Fix encoding issues --- vi/cs-230-convolutional-neural-networks.md | 94 +++++++++++----------- 1 file changed, 47 insertions(+), 47 deletions(-) diff --git a/vi/cs-230-convolutional-neural-networks.md b/vi/cs-230-convolutional-neural-networks.md index 7f7ff5620..52a148ef4 100644 --- a/vi/cs-230-convolutional-neural-networks.md +++ b/vi/cs-230-convolutional-neural-networks.md @@ -4,7 +4,7 @@ **1. Convolutional Neural Networks cheatsheet** -⟶ Convolutional Neural Networks cheatsheet +⟶Convolutional Neural Networks cheatsheet
@@ -25,21 +25,21 @@ **4. [Types of layer, Convolution, Pooling, Fully connected]** -⟶ [Các kiểu tầng(layer), Tích chập, Pooling, Kết nối đầy đủ] +⟶ [Các kiểu tầng(layer), Tích chập, Pooling, Kết nối đầy đủ]
**5. [Filter hyperparameters, Dimensions, Stride, Padding]** -⟶ [Filter hyperparameters, Dimensions, Stride, Padding] +⟶ [Filter hyperparameters, Dimensions, Stride, Padding]
**6. [Tuning hyperparameters, Parameter compatibility, Model complexity, Receptive field]** -⟶ [Tuning hyperparameters, Parameter compatibility, Model complexity, Receptive field]** +⟶ [Tuning hyperparameters, Parameter compatibility, Model complexity, Receptive field]
@@ -86,9 +86,9 @@
-**13. Architecture of a traditional CNN ― Convolutional neural networks, also known as CNNs, are a specific type of neural networks that are generally composed of the following layers:** +**13. Architecture of a traditional CNN ― Convolutional neural networks, also known as CNNs, are a specific type of neural networks that are generally composed of the following layers:** -⟶ Kiến trúc truyền thống của một mạng CNN ― Mạng neural tích chập (Convolutional neural networks), còn được biết đến với tên CNNs, là một dạng mạng neural được cấu thành bởi các tầng sau: +⟶ Kiến trúc truyền thống của một mạng CNN ― Mạng neural tích chập (Convolutional neural networks), còn được biết đến với tên CNNs, là một dạng mạng neural được cấu thành bởi các tầng sau:
@@ -107,9 +107,9 @@
-**16. Convolution layer (CONV) ― The convolution layer (CONV) uses filters that perform convolution operations as it is scanning the input I with respect to its dimensions. Its hyperparameters include the filter size F and stride S. The resulting output O is called feature map or activation map.** +**16. Convolution layer (CONV) ― The convolution layer (CONV) uses filters that perform convolution operations as it is scanning the input I with respect to its dimensions. Its hyperparameters include the filter size F and stride S. The resulting output O is called feature map or activation map.** -⟶ Tầng tích chập (CONV) ― Tầng tích chập (CONV) sử dụng các bộ lọc để thực hiện phép tích chập khi đưa chúng đi qua đầu vào I theo các chiều của nó. Các tham số cấu hình của các bộ lọc này bao gồm kích thước bộ lọc F và độ trượt (stride) S. Kết quả đầu ra O được gọi là feature map hay activation map. +⟶ Tầng tích chập (CONV) ― Tầng tích chập (CONV) sử dụng các bộ lọc để thực hiện phép tích chập khi đưa chúng đi qua đầu vào I theo các chiều của nó. Các tham số cấu hình của các bộ lọc này bao gồm kích thước bộ lọc F và độ trượt (stride) S. Kết quả đầu ra O được gọi là feature map hay activation map.
@@ -121,9 +121,9 @@
-**18. Pooling (POOL) ― The pooling layer (POOL) is a downsampling operation, typically applied after a convolution layer, which does some spatial invariance. In particular, max and average pooling are special kinds of pooling where the maximum and average value is taken, respectively.** +**18. Pooling (POOL) ― The pooling layer (POOL) is a downsampling operation, typically applied after a convolution layer, which does some spatial invariance. In particular, max and average pooling are special kinds of pooling where the maximum and average value is taken, respectively.** -⟶ Pooling (POOL) ― Tầng pooling (POOL) là một phép downsampling, thường được sử dụng sau tầng tích chập, giúp tăng tính bất biến không gian. Cụ thể, max pooling và average pooling là những dạng pooling đặc biệt, mà tương ứng là trong đó giá trị lớn nhất và giá trị trung bình được lấy ra. +⟶ Pooling (POOL) ― Tầng pooling (POOL) là một phép downsampling, thường được sử dụng sau tầng tích chập, giúp tăng tính bất biến không gian. Cụ thể, max pooling và average pooling là những dạng pooling đặc biệt, mà tương ứng là trong đó giá trị lớn nhất và giá trị trung bình được lấy ra.
@@ -149,7 +149,7 @@
-**22. Fully Connected (FC) ― The fully connected layer (FC) operates on a flattened input where each input is connected to all neurons. If present, FC layers are usually found towards the end of CNN architectures and can be used to optimize objectives such as class scores.** +**22. Fully Connected (FC) ― The fully connected layer (FC) operates on a flattened input where each input is connected to all neurons. If present, FC layers are usually found towards the end of CNN architectures and can be used to optimize objectives such as class scores.** ⟶ @@ -170,7 +170,7 @@
-**25. Dimensions of a filter ― A filter of size F×F applied to an input containing C channels is a F×F×C volume that performs convolutions on an input of size I×I×C and produces an output feature map (also called activation map) of size O×O×1.** +**25. Dimensions of a filter ― A filter of size F×F applied to an input containing C channels is a F×F×C volume that performs convolutions on an input of size I×I×C and produces an output feature map (also called activation map) of size O×O×1.** ⟶ @@ -184,21 +184,21 @@
-**27. Remark: the application of K filters of size F×F results in an output feature map of size O×O×K.** +**27. Remark: the application of K filters of size F×F results in an output feature map of size O×O×K.** ⟶
-**28. Stride ― For a convolutional or a pooling operation, the stride S denotes the number of pixels by which the window moves after each operation.** +**28. Stride ― For a convolutional or a pooling operation, the stride S denotes the number of pixels by which the window moves after each operation.** ⟶
-**29. Zero-padding ― Zero-padding denotes the process of adding P zeroes to each side of the boundaries of the input. This value can either be manually specified or automatically set through one of the three modes detailed below:** +**29. Zero-padding ― Zero-padding denotes the process of adding P zeroes to each side of the boundaries of the input. This value can either be manually specified or automatically set through one of the three modes detailed below:** ⟶ @@ -212,7 +212,7 @@
-**31. [No padding, Drops last convolution if dimensions do not match, Padding such that feature map size has size ⌈IS⌉, Output size is mathematically convenient, Also called 'half' padding, Maximum padding such that end convolutions are applied on the limits of the input, Filter 'sees' the input end-to-end]** +**31. [No padding, Drops last convolution if dimensions do not match, Padding such that feature map size has size ⌈IS⌉, Output size is mathematically convenient, Also called 'half' padding, Maximum padding such that end convolutions are applied on the limits of the input, Filter 'sees' the input end-to-end]** ⟶ @@ -226,7 +226,7 @@
-**33. Parameter compatibility in convolution layer ― By noting I the length of the input volume size, F the length of the filter, P the amount of zero padding, S the stride, then the output size O of the feature map along that dimension is given by:** +**33. Parameter compatibility in convolution layer ― By noting I the length of the input volume size, F the length of the filter, P the amount of zero padding, S the stride, then the output size O of the feature map along that dimension is given by:** ⟶ @@ -240,14 +240,14 @@
-**35. Remark: often times, Pstart=Pend≜P, in which case we can replace Pstart+Pend by 2P in the formula above.** +**35. Remark: often times, Pstart=Pend≜P, in which case we can replace Pstart+Pend by 2P in the formula above.** ⟶
-**36. Understanding the complexity of the model ― In order to assess the complexity of a model, it is often useful to determine the number of parameters that its architecture will have. In a given layer of a convolutional neural network, it is done as follows:** +**36. Understanding the complexity of the model ― In order to assess the complexity of a model, it is often useful to determine the number of parameters that its architecture will have. In a given layer of a convolutional neural network, it is done as follows:** ⟶ @@ -282,14 +282,14 @@
-**41. Receptive field ― The receptive field at layer k is the area denoted Rk×Rk of the input that each pixel of the k-th activation map can 'see'. By calling Fj the filter size of layer j and Si the stride value of layer i and with the convention S0=1, the receptive field at layer k can be computed with the formula:** +**41. Receptive field ― The receptive field at layer k is the area denoted Rk×Rk of the input that each pixel of the k-th activation map can 'see'. By calling Fj the filter size of layer j and Si the stride value of layer i and with the convention S0=1, the receptive field at layer k can be computed with the formula:** ⟶
-**42. In the example below, we have F1=F2=3 and S1=S2=1, which gives R2=1+2â‹…1+2â‹…1=5.** +**42. In the example below, we have F1=F2=3 and S1=S2=1, which gives R2=1+2⋅1+2⋅1=5.** ⟶ @@ -303,7 +303,7 @@
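Entry 41's receptive-field formula is easy to misread, so the following plain-Python sketch spells it out under the convention S0=1; the helper name `receptive_field` is an assumption, and the final assert simply reproduces the worked example of entry 42 (R2=5).

```python
def receptive_field(filter_sizes, strides):
    """R_k = 1 + sum_{j=1..k} (F_j - 1) * prod_{i=0..j-1} S_i, with S_0 = 1."""
    r, jump = 1, 1                      # jump = S_0 * ... * S_{j-1}
    for f, s in zip(filter_sizes, strides):
        r += (f - 1) * jump
        jump *= s
    return r

# Entry 42: F1 = F2 = 3 and S1 = S2 = 1 give R2 = 1 + 2*1 + 2*1 = 5
assert receptive_field([3, 3], [1, 1]) == 5
```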
-**44. Rectified Linear Unit ― The rectified linear unit layer (ReLU) is an activation function g that is used on all elements of the volume. It aims at introducing non-linearities to the network. Its variants are summarized in the table below:** +**44. Rectified Linear Unit ― The rectified linear unit layer (ReLU) is an activation function g that is used on all elements of the volume. It aims at introducing non-linearities to the network. Its variants are summarized in the table below:** ⟶ @@ -324,7 +324,7 @@
-**47. Softmax ― The softmax step can be seen as a generalized logistic function that takes as input a vector of scores x∈Rn and outputs a vector of output probability p∈Rn through a softmax function at the end of the architecture. It is defined as follows:** +**47. Softmax ― The softmax step can be seen as a generalized logistic function that takes as input a vector of scores x∈Rn and outputs a vector of output probability p∈Rn through a softmax function at the end of the architecture. It is defined as follows:** ⟶ @@ -345,7 +345,7 @@
-**50. Types of models ― There are 3 main types of object recognition algorithms, for which the nature of what is predicted is different. They are described in the table below:** +**50. Types of models ― There are 3 main types of object recognition algorithms, for which the nature of what is predicted is different. They are described in the table below:** ⟶ @@ -380,7 +380,7 @@
-**55. Detection ― In the context of object detection, different methods are used depending on whether we just want to locate the object or detect a more complex shape in the image. The two main ones are summed up in the table below:** +**55. Detection ― In the context of object detection, different methods are used depending on whether we just want to locate the object or detect a more complex shape in the image. The two main ones are summed up in the table below:** ⟶ @@ -408,35 +408,35 @@
-**59. Intersection over Union ― Intersection over Union, also known as IoU, is a function that quantifies how correctly positioned a predicted bounding box Bp is over the actual bounding box Ba. It is defined as:** +**59. Intersection over Union ― Intersection over Union, also known as IoU, is a function that quantifies how correctly positioned a predicted bounding box Bp is over the actual bounding box Ba. It is defined as:** ⟶
-**60. Remark: we always have IoU∈[0,1]. By convention, a predicted bounding box Bp is considered as being reasonably good if IoU(Bp,Ba)⩾0.5.** +**60. Remark: we always have IoU∈[0,1]. By convention, a predicted bounding box Bp is considered as being reasonably good if IoU(Bp,Ba)⩾0.5.** ⟶
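A minimal Python sketch of the IoU definition of entry 59, assuming boxes are given as (x1, y1, x2, y2) corner coordinates; the function name and the sample boxes are illustrative only.

```python
def iou(box_p, box_a):
    """Intersection over Union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix = max(0.0, min(box_p[2], box_a[2]) - max(box_p[0], box_a[0]))
    iy = max(0.0, min(box_p[3], box_a[3]) - max(box_p[1], box_a[1]))
    inter = ix * iy
    area_p = (box_p[2] - box_p[0]) * (box_p[3] - box_p[1])
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    union = area_p + area_a - inter
    return inter / union if union > 0 else 0.0

# Entry 60's convention: a prediction is reasonably good when IoU >= 0.5.
print(iou((0, 0, 2, 2), (1, 1, 3, 3)))  # 1/7, about 0.143
```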
-**61. Anchor boxes ― Anchor boxing is a technique used to predict overlapping bounding boxes. In practice, the network is allowed to predict more than one box simultaneously, where each box prediction is constrained to have a given set of geometrical properties. For instance, the first prediction can potentially be a rectangular box of a given form, while the second will be another rectangular box of a different geometrical form.** +**61. Anchor boxes ― Anchor boxing is a technique used to predict overlapping bounding boxes. In practice, the network is allowed to predict more than one box simultaneously, where each box prediction is constrained to have a given set of geometrical properties. For instance, the first prediction can potentially be a rectangular box of a given form, while the second will be another rectangular box of a different geometrical form.** ⟶
-**62. Non-max suppression ― The non-max suppression technique aims at removing duplicate overlapping bounding boxes of a same object by selecting the most representative ones. After having removed all boxes having a probability prediction lower than 0.6, the following steps are repeated while there are boxes remaining:** +**62. Non-max suppression ― The non-max suppression technique aims at removing duplicate overlapping bounding boxes of a same object by selecting the most representative ones. After having removed all boxes having a probability prediction lower than 0.6, the following steps are repeated while there are boxes remaining:** ⟶
-**63. [For a given class, Step 1: Pick the box with the largest prediction probability., Step 2: Discard any box having an IoU⩾0.5 with the previous box.]** +**63. [For a given class, Step 1: Pick the box with the largest prediction probability., Step 2: Discard any box having an IoU⩾0.5 with the previous box.]** ⟶ @@ -450,14 +450,14 @@
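The two steps of entry 63 translate into a short greedy loop; the sketch below is one plausible plain-Python reading of them, with the 0.6 probability cut-off of entry 62 and the 0.5 IoU threshold as default arguments, and an internal `_iou` helper assumed to take (x1, y1, x2, y2) boxes.

```python
def _iou(b1, b2):
    # Intersection over Union of two (x1, y1, x2, y2) boxes.
    ix = max(0.0, min(b1[2], b2[2]) - max(b1[0], b2[0]))
    iy = max(0.0, min(b1[3], b2[3]) - max(b1[1], b2[1]))
    inter = ix * iy
    union = ((b1[2] - b1[0]) * (b1[3] - b1[1])
             + (b2[2] - b2[0]) * (b2[3] - b2[1]) - inter)
    return inter / union if union > 0 else 0.0

def non_max_suppression(boxes, scores, score_thresh=0.6, iou_thresh=0.5):
    """Greedy per-class NMS following the two steps listed in entry 63."""
    candidates = sorted(
        ((s, b) for s, b in zip(scores, boxes) if s >= score_thresh),
        key=lambda sb: sb[0], reverse=True)
    kept = []
    while candidates:
        _, best = candidates.pop(0)                   # Step 1: highest probability
        kept.append(best)
        candidates = [(s, b) for s, b in candidates
                      if _iou(b, best) < iou_thresh]  # Step 2: discard IoU >= 0.5
    return kept
```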
-**65. YOLO ― You Only Look Once (YOLO) is an object detection algorithm that performs the following steps:** +**65. YOLO ― You Only Look Once (YOLO) is an object detection algorithm that performs the following steps:** ⟶
-**66. [Step 1: Divide the input image into a G×G grid., Step 2: For each grid cell, run a CNN that predicts y of the following form:, repeated k times]** +**66. [Step 1: Divide the input image into a G×G grid., Step 2: For each grid cell, run a CNN that predicts y of the following form:, repeated k times]** ⟶ @@ -492,7 +492,7 @@
-**71. R-CNN ― Region with Convolutional Neural Networks (R-CNN) is an object detection algorithm that first segments the image to find potential relevant bounding boxes and then run the detection algorithm to find most probable objects in those bounding boxes.** +**71. R-CNN ― Region with Convolutional Neural Networks (R-CNN) is an object detection algorithm that first segments the image to find potential relevant bounding boxes and then run the detection algorithm to find most probable objects in those bounding boxes.** ⟶ @@ -520,7 +520,7 @@
-**75. Types of models ― Two main types of model are summed up in table below:** +**75. Types of models ― Two main types of model are summed up in table below:** ⟶ @@ -541,21 +541,21 @@
-**78. One Shot Learning ― One Shot Learning is a face verification algorithm that uses a limited training set to learn a similarity function that quantifies how different two given images are. The similarity function applied to two images is often noted d(image 1,image 2).** +**78. One Shot Learning ― One Shot Learning is a face verification algorithm that uses a limited training set to learn a similarity function that quantifies how different two given images are. The similarity function applied to two images is often noted d(image 1,image 2).** ⟶
-**79. Siamese Network ― Siamese Networks aim at learning how to encode images to then quantify how different two images are. For a given input image x(i), the encoded output is often noted as f(x(i)).** +**79. Siamese Network ― Siamese Networks aim at learning how to encode images to then quantify how different two images are. For a given input image x(i), the encoded output is often noted as f(x(i)).** ⟶
-**80. Triplet loss ― The triplet loss â„“ is a loss function computed on the embedding representation of a triplet of images A (anchor), P (positive) and N (negative). The anchor and the positive example belong to a same class, while the negative example to another one. By calling α∈R+ the margin parameter, this loss is defined as follows:** +**80. Triplet loss ― The triplet loss ℓ is a loss function computed on the embedding representation of a triplet of images A (anchor), P (positive) and N (negative). The anchor and the positive example belong to a same class, while the negative example to another one. By calling α∈R+ the margin parameter, this loss is defined as follows:** ⟶ @@ -569,7 +569,7 @@
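A small NumPy sketch of the triplet loss of entry 80, assuming d is the squared Euclidean distance between the embeddings f(A), f(P) and f(N) and α is the margin; the distance choice, the default margin and the toy embeddings are assumptions made for illustration.

```python
import numpy as np

def triplet_loss(f_a, f_p, f_n, alpha=0.2):
    """max(d(A, P) - d(A, N) + alpha, 0) with d the squared Euclidean distance."""
    d_ap = np.sum((f_a - f_p) ** 2)
    d_an = np.sum((f_a - f_n) ** 2)
    return max(d_ap - d_an + alpha, 0.0)

# Toy embeddings: the positive sits close to the anchor, the negative far away.
a, p, n = np.array([1.0, 0.0]), np.array([0.9, 0.1]), np.array([-1.0, 0.0])
print(triplet_loss(a, p, n))  # 0.0, since d(A, P) + alpha < d(A, N)
```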
-**82. Motivation ― The goal of neural style transfer is to generate an image G based on a given content C and a given style S.** +**82. Motivation ― The goal of neural style transfer is to generate an image G based on a given content C and a given style S.** ⟶ @@ -583,21 +583,21 @@
-**84. Activation ― In a given layer l, the activation is noted a[l] and is of dimensions nH×nw×nc** +**84. Activation ― In a given layer l, the activation is noted a[l] and is of dimensions nH×nw×nc** ⟶
-**85. Content cost function ― The content cost function Jcontent(C,G) is used to determine how the generated image G differs from the original content image C. It is defined as follows:** +**85. Content cost function ― The content cost function Jcontent(C,G) is used to determine how the generated image G differs from the original content image C. It is defined as follows:** ⟶
-**86. Style matrix ― The style matrix G[l] of a given layer l is a Gram matrix where each of its elements G[l]kk′ quantifies how correlated the channels k and k′ are. It is defined with respect to activations a[l] as follows:** +**86. Style matrix ― The style matrix G[l] of a given layer l is a Gram matrix where each of its elements G[l]kk′ quantifies how correlated the channels k and k′ are. It is defined with respect to activations a[l] as follows:** ⟶ @@ -611,21 +611,21 @@
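Entry 86's style (Gram) matrix reduces to one reshape and one matrix product; the sketch below assumes the activation volume is stored as an (nH, nW, nC) NumPy array and uses an illustrative function name.

```python
import numpy as np

def style_matrix(a):
    """G[k, k'] = sum over spatial positions (i, j) of a[i, j, k] * a[i, j, k']."""
    n_h, n_w, n_c = a.shape
    flat = a.reshape(n_h * n_w, n_c)   # rows: spatial positions, columns: channels
    return flat.T @ flat               # (n_C, n_C) channel-correlation matrix

g = style_matrix(np.random.rand(4, 4, 3))
print(g.shape)  # (3, 3)
```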
-**88. Style cost function ― The style cost function Jstyle(S,G) is used to determine how the generated image G differs from the style S. It is defined as follows:** +**88. Style cost function ― The style cost function Jstyle(S,G) is used to determine how the generated image G differs from the style S. It is defined as follows:** ⟶
-**89. Overall cost function ― The overall cost function is defined as being a combination of the content and style cost functions, weighted by parameters α,β, as follows:** +**89. Overall cost function ― The overall cost function is defined as being a combination of the content and style cost functions, weighted by parameters α,β, as follows:** ⟶
-**90. Remark: a higher value of α will make the model care more about the content while a higher value of β will make it care more about the style.** +**90. Remark: a higher value of α will make the model care more about the content while a higher value of β will make it care more about the style.** ⟶ @@ -639,7 +639,7 @@
-**92. Generative Adversarial Network ― Generative adversarial networks, also known as GANs, are composed of a generative and a discriminative model, where the generative model aims at generating the most truthful output that will be fed into the discriminative which aims at differentiating the generated and true image.** +**92. Generative Adversarial Network ― Generative adversarial networks, also known as GANs, are composed of a generative and a discriminative model, where the generative model aims at generating the most truthful output that will be fed into the discriminative which aims at differentiating the generated and true image.** ⟶ @@ -660,14 +660,14 @@
-**95. ResNet ― The Residual Network architecture (also called ResNet) uses residual blocks with a high number of layers meant to decrease the training error. The residual block has the following characterizing equation:** +**95. ResNet ― The Residual Network architecture (also called ResNet) uses residual blocks with a high number of layers meant to decrease the training error. The residual block has the following characterizing equation:** ⟶
-**96. Inception Network ― This architecture uses inception modules and aims at giving a try at different convolutions in order to increase its performance through features diversification. In particular, it uses the 1×1 convolution trick to limit the computational burden.** +**96. Inception Network ― This architecture uses inception modules and aims at giving a try at different convolutions in order to increase its performance through features diversification. In particular, it uses the 1×1 convolution trick to limit the computational burden.** ⟶ From fa8b38a1fc26f82f7e7e455035e77e6d66928d08 Mon Sep 17 00:00:00 2001 From: Pham Hong Vinh Date: Mon, 30 Sep 2019 16:25:39 +0700 Subject: [PATCH 391/531] Fix encoding issues --- vi/cs-230-convolutional-neural-networks.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/vi/cs-230-convolutional-neural-networks.md b/vi/cs-230-convolutional-neural-networks.md index 52a148ef4..f4c2b023f 100644 --- a/vi/cs-230-convolutional-neural-networks.md +++ b/vi/cs-230-convolutional-neural-networks.md @@ -88,7 +88,7 @@ **13. Architecture of a traditional CNN ― Convolutional neural networks, also known as CNNs, are a specific type of neural networks that are generally composed of the following layers:** -⟶ Kiến trúc truyền thống của một mạng CNN ― Mạng neural tích chập (Convolutional neural networks), còn được biết đến với tên CNNs, là một dạng mạng neural được cấu thành bởi các tầng sau: +⟶ Kiến trúc truyền thống của một mạng CNN ― Mạng neural tích chập (Convolutional neural networks), còn được biết đến với tên CNNs, là một dạng mạng neural được cấu thành bởi các tầng sau:
From a6411bc912c6c02c3538636fce9a993867c82616 Mon Sep 17 00:00:00 2001 From: tuananhhedspibk Date: Mon, 30 Sep 2019 22:24:33 +0900 Subject: [PATCH 392/531] [ja] Cheatsheet Unsupervised learning --- ja/cheatsheet-unsupervised-learning.md | 16 ++++++++-------- 1 file changed, 8 insertions(+), 8 deletions(-) diff --git a/ja/cheatsheet-unsupervised-learning.md b/ja/cheatsheet-unsupervised-learning.md index 77cf53bf9..69e6ba152 100644 --- a/ja/cheatsheet-unsupervised-learning.md +++ b/ja/cheatsheet-unsupervised-learning.md @@ -180,25 +180,25 @@ **31. It is a dimension reduction technique that finds the variance maximizing directions onto which to project the data.** -⟶ これはデータを投影する方向で、分散を最大にする方向を見つける次元削減手法です。 +⟶ これは分散を最大にするデータの射影方向を見つける次元削減手法です。
**32. Eigenvalue, eigenvector ― Given a matrix A∈Rn×n, λ is said to be an eigenvalue of A if there exists a vector z∈Rn∖{0}, called eigenvector, such that we have:** -⟶ 固有値、固有ベクトル - 行列 A∈Rn×nが与えられたとき、次の式で固有ベクトルと呼ばれるベクトルz∈Rn∖{0}が存在した場合に、λはAの固有値と呼ばれる。 +⟶ 固有値、固有ベクトル - 行列 A∈Rn×nが与えられたとき、次の式で固有ベクトルと呼ばれるベクトルz∈Rn∖{0}が存在した場合に、λはAの固有値と呼ばれます。
**33. Spectral theorem ― Let A∈Rn×n. If A is symmetric, then A is diagonalizable by a real orthogonal matrix U∈Rn×n. By noting Λ=diag(λ1,...,λn), we have:** -⟶ スペクトル定理 - A∈Rn×nとする。Aが対称のとき、Aは実直交行列U∈Rn×nを用いて対角化可能である。Λ=diag(λ1,...,λn)と表記することで、次の式を得る。 +⟶ スペクトル定理 - A∈Rn×nとする。Aが対称のとき、Aは実直交行列U∈Rn×nを用いて対角化可能です。Λ=diag(λ1,...,λn)と表記することで、次の式を得ます。
**34. diagonal** -⟶ 対角 +⟶ diagonal
@@ -210,7 +210,7 @@ **36. Algorithm ― The Principal Component Analysis (PCA) procedure is a dimension reduction technique that projects the data on k dimensions by maximizing the variance of the data as follows:** -⟶ アルゴリズム ― 主成分分析 (PCA)の過程は、次のようにデータの分散を最大化することによりデータをk次元に射影する次元削減の技術である。 +⟶ アルゴリズム ― 主成分分析 (PCA)の過程は、次のようにデータの分散を最大化することによりデータをk次元に射影する次元削減の技術です。
@@ -294,7 +294,7 @@ **50. Therefore, the stochastic gradient ascent learning rule is such that for each training example x(i), we update W as follows:** -⟶ そのため、確率的勾配上昇法の学習規則は、学習サンプルx(i)に対して次のようにwを更新するものです: +⟶ そのため、確率的勾配上昇法の学習規則は、学習サンプルx(i)に対して次のようにWを更新するものです:
@@ -312,13 +312,13 @@ **53. Translated by X, Y and Z** -⟶ X, Y, Zによる翻訳 +⟶ X・Y・Z 訳
**54. Reviewed by X, Y and Z** -⟶ X, Y, Zによるレビュー +⟶ X・Y・Z 校正
From 44bf281b15f6c468bc69f74cc13239ab53087c3b Mon Sep 17 00:00:00 2001 From: Pham Hong Vinh Date: Tue, 1 Oct 2019 13:28:31 +0700 Subject: [PATCH 393/531] Translated to line 40 --- vi/cs-230-convolutional-neural-networks.md | 56 +++++++++++----------- 1 file changed, 28 insertions(+), 28 deletions(-) diff --git a/vi/cs-230-convolutional-neural-networks.md b/vi/cs-230-convolutional-neural-networks.md index f4c2b023f..302aeb58e 100644 --- a/vi/cs-230-convolutional-neural-networks.md +++ b/vi/cs-230-convolutional-neural-networks.md @@ -25,42 +25,42 @@ **4. [Types of layer, Convolution, Pooling, Fully connected]** -⟶ [Các kiểu tầng(layer), Tích chập, Pooling, Kết nối đầy đủ] +⟶ [Các kiểu tầng (layer), Tích chập, Pooling, Kết nối đầy đủ]
**5. [Filter hyperparameters, Dimensions, Stride, Padding]** -⟶ [Filter hyperparameters, Dimensions, Stride, Padding] +⟶ [Các tham số cấu hình của bộ lọc, Các chiều, Stride, Padding]
**6. [Tuning hyperparameters, Parameter compatibility, Model complexity, Receptive field]** -⟶ [Tuning hyperparameters, Parameter compatibility, Model complexity, Receptive field] +⟶ [Điều chỉnh các tham số cấu hình, Độ tương thích tham số, Độ phức tạp mô hình, Receptive field]
**7. [Activation functions, Rectified Linear Unit, Softmax]** -⟶ [Activation functions, Rectified Linear Unit, Softmax] +⟶ [Các hàm kích hoạt, Rectified Linear Unit, Softmax]
**8. [Object detection, Types of models, Detection, Intersection over Union, Non-max suppression, YOLO, R-CNN]** -⟶ [Object detection, Types of models, Detection, Intersection over Union, Non-max suppression, YOLO, R-CNN] +⟶ [Nhận diện vật thể, Các kiểu mô hình, Detection, Intersection over Union, Non-max suppression, YOLO, R-CNN]
**9. [Face verification/recognition, One shot learning, Siamese network, Triplet loss]** -⟶ [Face verification/recognition, One shot learning, Siamese network, Triplet loss] +⟶ [Nhận diện/ xác nhận gương mặt, One shot learning, Siamese network, Triplet loss]
@@ -102,7 +102,7 @@ **15. Types of layer** -⟶ Types of layer +⟶ Các kiểu tầng
@@ -130,7 +130,7 @@ **19. [Type, Purpose, Illustration, Comments]** -⟶ [Type, Purpose, Illustration, Comments] +⟶ [Kiểu, Chức năng, Minh họa, Nhận xét]
@@ -144,140 +144,140 @@ **21. [Preserves detected features, Most commonly used, Downsamples feature map, Used in LeNet]** -⟶ [Preserves detected features, Most commonly used, Downsamples feature map, Used in LeNet] +⟶ [Bảo toàn các đặc trưng đã phát hiện, Được sử dụng thường xuyên, Giảm kích thước feature map, Được sử dụng trong mạng LeNet]
**22. Fully Connected (FC) ― The fully connected layer (FC) operates on a flattened input where each input is connected to all neurons. If present, FC layers are usually found towards the end of CNN architectures and can be used to optimize objectives such as class scores.** -⟶ +⟶ Fully Connected (FC) ― Tầng kết nối đầy đủ (FC) nhận đầu vào là các dữ liệu đã được làm phẳng, mà mỗi đầu vào đó được kết nối đến tất cả neuron. Trong mô hình mạng CNNs, các tầng kết nối đầy đủ thường được tìm thấy ở cuối mạng và được dùng để tối ưu hóa mục tiêu của mạng ví dụ như độ chính xác của lớp (class).
**23. Filter hyperparameters** -⟶ +⟶ Các tham số cấu hình của bộ lọc
**24. The convolution layer contains filters for which it is important to know the meaning behind its hyperparameters.** -⟶ +⟶ Tầng tích chập chứa các bộ lọc mà rất quan trọng cho ta khi biết ý nghĩa đằng sau các tham số cấu hình của chúng.
**25. Dimensions of a filter ― A filter of size F×F applied to an input containing C channels is a F×F×C volume that performs convolutions on an input of size I×I×C and produces an output feature map (also called activation map) of size O×O×1.** -⟶ +⟶ Các chiều của một bộ lọc ― Một bộ lọc kích thước F×F áp dụng lên đầu vào chứa C kênh (channels) thì có kích thước tổng kể là F×F×C thực hiện phép tích chập trên đầu vào kích thước I×I×C và cho ra một feature map (hay còn gọi là activation map) có kích thước O×O×1.
**26. Filter** -⟶ +⟶ Bộ lọc
**27. Remark: the application of K filters of size F×F results in an output feature map of size O×O×K.** -⟶ +⟶ Lưu ý: Việc áp dụng K bộ lọc có kích thước F×F cho ra một feature map có kích thước O×O×K.
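To make the shape bookkeeping of entries 25 and 27 tangible, here is a deliberately naive 'valid' convolution loop in NumPy; the layout conventions (input as I×I×C, filters as K×F×F×C) and the function name are assumptions chosen for readability, not an implementation meant for real use.

```python
import numpy as np

def conv2d_valid(x, w, stride=1):
    """Naive convolution: x is I x I x C, w is K x F x F x C, output is O x O x K."""
    i_size, _, c = x.shape
    k, f, _, _ = w.shape
    assert c == w.shape[3], "input and filters must share the channel depth C"
    o = (i_size - f) // stride + 1
    out = np.zeros((o, o, k))
    for n in range(k):                                # one feature map per filter
        for r in range(o):
            for col in range(o):
                patch = x[r * stride:r * stride + f,
                          col * stride:col * stride + f, :]
                out[r, col, n] = np.sum(patch * w[n])
    return out

x = np.random.rand(5, 5, 3)       # I = 5, C = 3
w = np.random.rand(4, 3, 3, 3)    # K = 4 filters of size F = 3
print(conv2d_valid(x, w).shape)   # (3, 3, 4), i.e. O x O x K with O = 3
```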
**28. Stride ― For a convolutional or a pooling operation, the stride S denotes the number of pixels by which the window moves after each operation.** -⟶ +⟶ Stride ― Đối với phép tích chập hoặc phép pooling, độ trượt S ký hiệu số pixel mà cửa sổ sẽ di chuyển sau mỗi lần thực hiện phép tính.
**29. Zero-padding ― Zero-padding denotes the process of adding P zeroes to each side of the boundaries of the input. This value can either be manually specified or automatically set through one of the three modes detailed below:** -⟶ +⟶ Zero-padding ― Zero-padding là tên gọi của quá trình thêm P số không vào các biên của đầu vào. Giá trị này có thể được lựa chọn thủ công hoặc một cách tự động bằng một trong ba những phương pháp mô tả bên dưới:
**30. [Mode, Value, Illustration, Purpose, Valid, Same, Full]** -⟶ +⟶ [Phương pháp, Giá trị, Mục đích, Valid, Same, Full]
**31. [No padding, Drops last convolution if dimensions do not match, Padding such that feature map size has size ⌈IS⌉, Output size is mathematically convenient, Also called 'half' padding, Maximum padding such that end convolutions are applied on the limits of the input, Filter 'sees' the input end-to-end]** -⟶ +⟶ [Không sử dụng padding, Bỏ phép tích chập cuối nếu số chiều không khớp, Sử dụng padding để làm cho feature map có kích thước ⌈IS⌉, Kích thước đầu ra thuận lợi về mặt toán học, Còn được gọi là 'half' padding, Padding tối đa sao cho các phép tích chập có thể được sử dụng tại các rìa của đầu vào, Bộ lọc 'thấy' được đầu vào từ đầu đến cuối]
**32. Tuning hyperparameters** -⟶ +⟶ Điều chỉnh tham số cấu hình
**33. Parameter compatibility in convolution layer ― By noting I the length of the input volume size, F the length of the filter, P the amount of zero padding, S the stride, then the output size O of the feature map along that dimension is given by:** -⟶ +⟶ Tính tương thích của tham số trong tầng tích chập ― Bằng cách ký hiệu I là độ dài kích thước đầu vào, F là độ dài của bộ lọc, P là số lượng zero padding, S là độ trượt, ta có thể tính được độ dài O của feature map theo một chiều bằng công thức:
**34. [Input, Filter, Output]** -⟶ +⟶ [Đầu vào, Bộ lọc, Đầu ra]
**35. Remark: often times, Pstart=Pend≜P, in which case we can replace Pstart+Pend by 2P in the formula above.** -⟶ +⟶ Lưu ý: Trong một số trường hợp, Pstart=Pend≜P, ta có thể thay thế Pstart+Pend bằng 2P trong công thức trên.
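Entries 33 and 35 reduce to one line of arithmetic; below is a hedged Python sketch, using floor division as the usual convention when the division is not exact, with illustrative argument names.

```python
def conv_output_size(i, f, s, p_start=0, p_end=None):
    """O = (I - F + P_start + P_end) / S + 1 along one dimension (entry 33).

    With P_start = P_end = P this is the familiar (I - F + 2P) / S + 1 of entry 35.
    """
    if p_end is None:
        p_end = p_start
    return (i - f + p_start + p_end) // s + 1

print(conv_output_size(i=32, f=5, s=1, p_start=2))  # 32: 'same'-style padding
print(conv_output_size(i=32, f=5, s=1))             # 28: no padding ('valid')
```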
**36. Understanding the complexity of the model ― In order to assess the complexity of a model, it is often useful to determine the number of parameters that its architecture will have. In a given layer of a convolutional neural network, it is done as follows:** -⟶ +⟶ Hiểu về độ phức tạp của mô hình ― Để đánh giá độ phức tạp của một mô hình, cách hữu hiệu là xác định số tham số mà mô hình đó sẽ có. Trong một tầng của mạng neural tích chập, nó sẽ được tính toán như sau:
**37. [Illustration, Input size, Output size, Number of parameters, Remarks]** -⟶ +⟶ [Minh họa, Kích thước đầu vào, Kích thước đầu ra, Số lượng tham số, Lưu ý]
**38. [One bias parameter per filter, In most cases, S **39. [Pooling operation done channel-wise, In most cases, S=F]** -⟶ +⟶ [Phép pooling được áp dụng lên từng kênh (channel-wise), Trong đa số trường hợp, S=F]
**40. [Input is flattened, One bias parameter per neuron, The number of FC neurons is free of structural constraints]** -⟶ +⟶ [Đầu vào được làm phẳng, Mỗi neuron có một tham số bias, Số neuron trong một tầng FC phụ thuộc vào ràng buộc kết cấu]
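Putting entries 36 to 40 into code gives the usual parameter counts, assuming (F·F·C+1)·K weights and biases for a CONV layer, none for POOL, and (Nin+1)·Nout for an FC layer; the function names and example sizes are illustrative assumptions.

```python
def conv_params(f, c_in, k):
    """CONV layer: each of the K filters has F*F*C weights plus one bias."""
    return (f * f * c_in + 1) * k

def pool_params():
    """POOL layer: the pooling operation itself has no parameters to learn."""
    return 0

def fc_params(n_in, n_out):
    """FC layer on a flattened input: one weight per connection plus one bias per neuron."""
    return (n_in + 1) * n_out

# Example: a 3x3 CONV with 3 input channels and 64 filters, then an FC layer 1024 -> 10.
print(conv_params(3, 3, 64))   # 1792
print(fc_params(1024, 10))     # 10250
```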
From 36cc2c2f835197019aa92281892462ef45ed724f Mon Sep 17 00:00:00 2001 From: Shervine Amidi Date: Wed, 2 Oct 2019 00:04:19 -0700 Subject: [PATCH 394/531] Add [vi] progress --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index 48c60d76d..8d14e6a12 100644 --- a/README.md +++ b/README.md @@ -97,7 +97,7 @@ Please make sure to propose the translation of **only one** cheatsheet per pull |**Русский**|not started|not started|not started| |**Türkçe**|done|done|done| |**Українська**|not started|not started|not started| -|**Tiếng Việt**|not started|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/178)| +|**Tiếng Việt**|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/180)|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/178)| |**中文**|not started|not started|not started| ## Acknowledgements From 544fe99166933d262359ae8fb770b43feb07d159 Mon Sep 17 00:00:00 2001 From: Pham Hong Vinh Date: Fri, 4 Oct 2019 21:21:55 +0700 Subject: [PATCH 395/531] Translated to line 55 --- vi/cs-230-convolutional-neural-networks.md | 26 +++++++++++----------- 1 file changed, 13 insertions(+), 13 deletions(-) diff --git a/vi/cs-230-convolutional-neural-networks.md b/vi/cs-230-convolutional-neural-networks.md index 302aeb58e..3c286c268 100644 --- a/vi/cs-230-convolutional-neural-networks.md +++ b/vi/cs-230-convolutional-neural-networks.md @@ -298,91 +298,91 @@ **43. Commonly used activation functions** -⟶ +⟶ Các hàm kích hoạt thường gặp
**44. Rectified Linear Unit ― The rectified linear unit layer (ReLU) is an activation function g that is used on all elements of the volume. It aims at introducing non-linearities to the network. Its variants are summarized in the table below:** -⟶ +⟶ Rectified Linear Unit ― Tầng rectified linear unit (ReLU) là một hàm kích hoạt g được sử dụng trên tất cả các thành phần. Mục đích của nó là tăng tính phi tuyến tính cho mạng. Những biến thể khác của ReLU được tổng hợp ở bảng dưới:
**45. [ReLU, Leaky ReLU, ELU, with]** -⟶ +⟶ [ReLU, Leaky ReLU, ELU, with]
**46. [Non-linearity complexities biologically interpretable, Addresses dying ReLU issue for negative values, Differentiable everywhere]** -⟶ +⟶ [Độ phức tạp phi tuyến tính có thể thông dịch được về mặt sinh học, Gán vấn đề ReLU chết cho những giá trị âm, Khả vi tại mọi nơi]
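The activation table of entries 44 to 46 is an image in the original cheatsheet, so the sketch below uses the commonly cited forms (ReLU, Leaky ReLU with a small ε, and ELU written piecewise); the exact definitions and default constants should be read as editorial assumptions.

```python
import math

def relu(z):
    return max(0.0, z)

def leaky_relu(z, eps=0.01):
    return max(eps * z, z)

def elu(z, alpha=1.0):
    # Standard piecewise ELU: z for z > 0, alpha * (exp(z) - 1) otherwise.
    return z if z > 0 else alpha * (math.exp(z) - 1)

for z in (-2.0, 0.0, 3.0):
    print(z, relu(z), leaky_relu(z), elu(z))
```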
**47. Softmax ― The softmax step can be seen as a generalized logistic function that takes as input a vector of scores x∈Rn and outputs a vector of output probability p∈Rn through a softmax function at the end of the architecture. It is defined as follows:** -⟶ +⟶ Softmax ― Bước softmax có thể được coi là một hàm logistic tổng quát lấy đầu vào là một vector chứa các giá trị x∈Rn và cho ra là một vector gồm các xác suất p∈Rn thông qua một hàm softmax ở cuối kiến trúc. Nó được định nghĩa như sau:
**48. where** -⟶ +⟶ với
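A minimal NumPy version of the softmax of entry 47; subtracting the maximum score before exponentiating is a standard numerical-stability trick and an editorial choice, not something the cheatsheet prescribes.

```python
import numpy as np

def softmax(x):
    """Map a score vector x in R^n to a probability vector p in R^n."""
    e = np.exp(x - np.max(x))   # shifting by max(x) leaves the result unchanged
    return e / e.sum()

p = softmax(np.array([2.0, 1.0, 0.1]))
print(p, p.sum())  # three probabilities summing to 1
```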
**49. Object detection** -⟶ +⟶ Nhận diện vật thể (Object detection)
**50. Types of models ― There are 3 main types of object recognition algorithms, for which the nature of what is predicted is different. They are described in the table below:** -⟶ +⟶ Các kiểu mô hình ― Có 3 kiểu thuật toán nhận diện vật thể chính, vì thế mà bản chất của thứ được dự đoán sẽ khác nhau. Chúng được miêu tả ở bảng dưới:
**51. [Image classification, Classification w. localization, Detection]** -⟶ +⟶ [Phân loại hình ảnh, Phân loại cùng với định vị, Nhận diện]
**52. [Teddy bear, Book]** -⟶ +⟶ [Gấu bông, Sách]
**53. [Classifies a picture, Predicts probability of object, Detects an object in a picture, Predicts probability of object and where it is located, Detects up to several objects in a picture, Predicts probabilities of objects and where they are located]** -⟶ +⟶ [Phân loại một tấm ảnh, Dự đoán xác suất của một vật thể, Nhận diện một vật thể trong ảnh, Dự đoán xác suất của vật thể và định vị nó, Nhận diện nhiều vật thể trong cùng một tấm ảnh, Dự đoán xác suất của các vật thể và định vị chúng]
**54. [Traditional CNN, Simplified YOLO, R-CNN, YOLO, R-CNN]** -⟶ +⟶ [CNN cổ điển, YOLO đơn giản hóa, R-CNN, YOLO, R-CNN]
**55. Detection ― In the context of object detection, different methods are used depending on whether we just want to locate the object or detect a more complex shape in the image. The two main ones are summed up in the table below:** -⟶ +⟶ Detection ― Trong bối cảnh nhận diện vật thể, những phương pháp khác nhau được áp dụng tùy thuộc vào liệu chúng ta chỉ muốn định vị vật thể hay nhận diện được những hình dạng phức tạp hơn trong tấm ảnh. Hai phương pháp chính được tổng hợp ở bảng dưới:
From d3eda7818cd5f0058d72b79476965439cc1ea57b Mon Sep 17 00:00:00 2001 From: tuananhhedspibk Date: Sun, 6 Oct 2019 12:03:22 +0900 Subject: [PATCH 396/531] [ja] Cheatsheet Unsupervised learning --- ja/cheatsheet-unsupervised-learning.md | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/ja/cheatsheet-unsupervised-learning.md b/ja/cheatsheet-unsupervised-learning.md index 69e6ba152..917433d45 100644 --- a/ja/cheatsheet-unsupervised-learning.md +++ b/ja/cheatsheet-unsupervised-learning.md @@ -126,7 +126,7 @@ **22. [Ward linkage, Average linkage, Complete linkage]** -⟶ [Ward linkage, Average linkage, Complete linkage] +⟶ [ウォードリンケージ, 平均リンケージ, 完全リンケージ]
@@ -144,13 +144,13 @@ **25. In an unsupervised learning setting, it is often hard to assess the performance of a model since we don't have the ground truth labels as was the case in the supervised learning setting.** -⟶ 教師なし学習では、教師あり学習の場合のような正解ラベルがないため、モデルの性能を評価することが難しい場合が多いです。 +⟶ 教師なし学習では、教師あり学習の場合のような正解ラベルがないため、モデルの性能を評価することが困難な場合が多いです。
**26. Silhouette coefficient ― By noting a and b the mean distance between a sample and all other points in the same class, and between a sample and all other points in the next nearest cluster, the silhouette coefficient s for a single sample is defined as follows:** -⟶ シルエット係数 ― サンプルと同じクラスタ内のその他全ての点との平均距離をa、最も近いクラスタ内の全ての点との平均距離をbと表記すると、サンプルのシルエット係数sは次のように定義されます: +⟶ シルエット係数 ― ある1つのサンプルと同じクラス内のその他全ての点との平均距離をa、そのサンプルから最も近いクラスタ内の全ての点との平均距離をbと表記すると、そのサンプルのシルエット係数sは次のように定義されます:
@@ -162,7 +162,7 @@ **28. the Calinski-Harabaz index s(k) indicates how well a clustering model defines its clusters, such that the higher the score, the more dense and well separated the clusters are. It is defined as follows:** -⟶ Calinski-Harabazインデックスs(k)はクラスタリングモデルが各クラスタをどの程度適切に定義しているかを示します。スコアが高いほど、各クラスタはより密で、十分に分離されています。 それは次のように定義されます: +⟶ Calinski-Harabazインデックスs(k)はクラスタリングモデルが各クラスタをどの程度適切に定義しているかを示します。つまり、スコアが高いほど、各クラスタはより密で、十分に分離されています。 それは次のように定義されます:
@@ -330,7 +330,7 @@ **56. [Clustering, Expectation-Maximization, k-means, Hierarchical clustering, Metrics]** -⟶[クラスタリング, EM, k-means, 階層クラスタリング, 指標] +⟶[クラスタリング, 期待値最大化法, k-means, 階層クラスタリング, 指標]
From a702b15ade11a7ccb1ccf345fd0b12c34668643b Mon Sep 17 00:00:00 2001 From: Shervine Amidi Date: Sun, 6 Oct 2019 10:24:41 -0700 Subject: [PATCH 397/531] Add [zh] progress --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index 8d14e6a12..746c7ff83 100644 --- a/README.md +++ b/README.md @@ -98,7 +98,7 @@ Please make sure to propose the translation of **only one** cheatsheet per pull |**Türkçe**|done|done|done| |**Українська**|not started|not started|not started| |**Tiếng Việt**|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/180)|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/178)| -|**中文**|not started|not started|not started| +|**中文**|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/181)|not started| ## Acknowledgements Thank you everyone for your help! Please do not forget to add your name to the `CONTRIBUTORS` file so that we can give you proper credit in the cheatsheets' [official website](https://stanford.edu/~shervine/teaching). From 0d44ee443a0df49d11fd0b2c242e93665714b51e Mon Sep 17 00:00:00 2001 From: shervinea Date: Sun, 6 Oct 2019 11:42:49 -0700 Subject: [PATCH 398/531] Rename files to new convention --- ja/{refresher-linear-algebra.md => cs-229-linear-algebra.md} | 0 ja/{refresher-probability.md => cs-229-probability.md} | 0 ...sheet-supervised-learning.md => cs-229-supervised-learning.md} | 0 ...neural-networks.md => cs-230-convolutional-neural-networks.md} | 0 ...ent-neural-networks.md => cs-230-recurrent-neural-networks.md} | 0 5 files changed, 0 insertions(+), 0 deletions(-) rename ja/{refresher-linear-algebra.md => cs-229-linear-algebra.md} (100%) rename ja/{refresher-probability.md => cs-229-probability.md} (100%) rename ja/{cheatsheet-supervised-learning.md => cs-229-supervised-learning.md} (100%) rename ja/{convolutional-neural-networks.md => cs-230-convolutional-neural-networks.md} (100%) rename ja/{recurrent-neural-networks.md => cs-230-recurrent-neural-networks.md} (100%) diff --git a/ja/refresher-linear-algebra.md b/ja/cs-229-linear-algebra.md similarity index 100% rename from ja/refresher-linear-algebra.md rename to ja/cs-229-linear-algebra.md diff --git a/ja/refresher-probability.md b/ja/cs-229-probability.md similarity index 100% rename from ja/refresher-probability.md rename to ja/cs-229-probability.md diff --git a/ja/cheatsheet-supervised-learning.md b/ja/cs-229-supervised-learning.md similarity index 100% rename from ja/cheatsheet-supervised-learning.md rename to ja/cs-229-supervised-learning.md diff --git a/ja/convolutional-neural-networks.md b/ja/cs-230-convolutional-neural-networks.md similarity index 100% rename from ja/convolutional-neural-networks.md rename to ja/cs-230-convolutional-neural-networks.md diff --git a/ja/recurrent-neural-networks.md b/ja/cs-230-recurrent-neural-networks.md similarity index 100% rename from ja/recurrent-neural-networks.md rename to ja/cs-230-recurrent-neural-networks.md From 97f10e85dd85c5fa92005cece2b9dccb472e4192 Mon Sep 17 00:00:00 2001 From: shervinea Date: Sun, 6 Oct 2019 12:00:35 -0700 Subject: [PATCH 399/531] Add [zh-tw] progress --- README.md | 9 ++++++--- 1 file changed, 6 insertions(+), 3 deletions(-) diff --git a/README.md b/README.md index 746c7ff83..b68d440c1 100644 --- a/README.md +++ b/README.md @@ -48,7 +48,8 @@ Please make sure to propose the translation of **only one** cheatsheet per pull |**Português**|not started|not started|not started|not started| 
|**Türkçe**|done|done|done|done| |**Tiếng Việt**|not started|not started|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/179)| -|**中文**|not started|not started|not started|not started| +|**简体中文**|not started|not started|not started|not started| +|**繁體中文**|not started|not started|not started|not started| ### CS 229 (Machine Learning) | |[Deep learning](https://github.com/shervinea/cheatsheet-translation/blob/master/template/cs-229-deep-learning.md)|[Supervised](https://github.com/shervinea/cheatsheet-translation/blob/master/template/cs-229-supervised-learning.md)|[Unsupervised](https://github.com/shervinea/cheatsheet-translation/blob/master/template/cs-229-unsupervised-learning.md)|[ML tips](https://github.com/shervinea/cheatsheet-translation/blob/master/template/cs-229-machine-learning-tips-and-tricks.md)|[Probabilities](https://github.com/shervinea/cheatsheet-translation/blob/master/template/cs-229-probability.md)|[Algebra](https://github.com/shervinea/cheatsheet-translation/blob/master/template/cs-229-linear-algebra.md)| @@ -73,7 +74,8 @@ Please make sure to propose the translation of **only one** cheatsheet per pull |**Türkçe**|done|done|done|done|done|done| |**Українська**|not started|not started|not started|not started|done|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/95)| |**Tiếng Việt**|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/159)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/162)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/177)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/160)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/175)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/176)| -|**中文**|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/12)|done|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/48)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/7)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/73)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/72)| +|**简体中文**|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/12)|done|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/48)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/7)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/73)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/72)| +|**繁體中文**|done|done|done|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/137)|done|done| ### CS 230 (Deep Learning) | |[Convolutional Neural Networks](https://github.com/shervinea/cheatsheet-translation/blob/master/template/cs-230-convolutional-neural-networks.md)|[Recurrent Neural Networks](https://github.com/shervinea/cheatsheet-translation/blob/master/template/cs-230-recurrent-neural-networks.md)|[Deep Learning tips](https://github.com/shervinea/cheatsheet-translation/blob/master/template/cs-230-deep-learning-tips-and-tricks.md)| @@ -98,7 +100,8 @@ Please make sure to propose the translation of **only one** cheatsheet per pull |**Türkçe**|done|done|done| |**Українська**|not started|not started|not started| |**Tiếng Việt**|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/180)|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/178)| -|**中文**|not 
started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/181)|not started| +|**简体中文**|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/181)|not started| +|**繁體中文**|not started|not started|not started| ## Acknowledgements Thank you everyone for your help! Please do not forget to add your name to the `CONTRIBUTORS` file so that we can give you proper credit in the cheatsheets' [official website](https://stanford.edu/~shervine/teaching). From fa3eafb9301a0be86d9a451ea95858851a6e1658 Mon Sep 17 00:00:00 2001 From: Prasetia Utama Putra Date: Sat, 12 Oct 2019 09:58:49 +0900 Subject: [PATCH 400/531] Update convolutional-neural-networks.md Revise the translation according to the comments from the reviewer. --- id/convolutional-neural-networks.md | 67 ++++++++++++++--------------- 1 file changed, 33 insertions(+), 34 deletions(-) diff --git a/id/convolutional-neural-networks.md b/id/convolutional-neural-networks.md index 366580e93..4f22dfc35 100644 --- a/id/convolutional-neural-networks.md +++ b/id/convolutional-neural-networks.md @@ -16,7 +16,7 @@
-**3. [Overview, Architecture structure]** +**3. [Intisari, Struktur arsitektur]** ⟶[Overview, Struktur Arsitektur] @@ -25,21 +25,21 @@ **4. [Types of layer, Convolution, Pooling, Fully connected]** -⟶[Jenis-jenis layer, Covolution, Pooling, Fully connected] +⟶[Jenis-jenis layer, Konvolusi, Pooling, Fully connected]
**5. [Filter hyperparameters, Dimensions, Stride, Padding]** -⟶[Hyperparameters filter, Dimensi, Stride, Padding] +⟶[Hiperparameter filter, Dimensi, Stride, Padding]
**6. [Tuning hyperparameters, Parameter compatibility, Model complexity, Receptive field]** -⟶[Hyperparameters tuning, Kompability parameter, Kompleksitas model, Receptive field] +⟶[Penyetelan hiperparameter, Kesesuaian parameter, Kompleksitas model, Receptive field]
@@ -60,7 +60,7 @@ **9. [Face verification/recognition, One shot learning, Siamese network, Triplet loss]** -⟶[Verifikasi/rekognisi wajah, One shot learning, Siamese network, Loss triplet] +⟶[Verifikasi/pengenal wajah, One shot learning, Siamese network, Loss triplet]
@@ -74,28 +74,28 @@ **11. [Computational trick architectures, Generative Adversarial Net, ResNet, Inception Network]** -⟶[Arkitektur trik komputasi, Generative Adversarial Net, ResNet, Inception Network] +⟶[Arkitektur trik komputasional, Generative Adversarial Net, ResNet, Inception Network]
**12. Overview** -⟶Overview +⟶Ringkasan
**13. Architecture of a traditional CNN ― Convolutional neural networks, also known as CNNs, are a specific type of neural networks that are generally composed of the following layers:** -⟶Arkitektur dari sebuah tradisional CNN - Convolutional neural network, juga dikenal sebagai CNN, adalah sebuah tipe khusus dari neural network yang secara general terdiri dari layer-layer berikut: +⟶Arkitektur dari sebuah tradisional CNN - Convolutional neural network, juga dikenal sebagai CNN, adalah sebuah tipe khusus dari neural network yang secara umum terdiri dari layer-layer berikut:
**14. The convolution layer and the pooling layer can be fine-tuned with respect to hyperparameters that are described in the next sections.** -⟶Layer convolution and layer pooling dapat disesuaikan terhadap hyperparameters yang dijelaskan pada sesi selanjutnya. +⟶Layer konvolusi and layer pooling dapat disesuaikan terhadap hiperparameter yang dijelaskan pada bagian selanjutnya.
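Items 13 and 14 above describe the usual CONV, POOL and FC stacking and note that the first two are governed by hyperparameters. As a rough illustration only (the layer sizes and the 32×32 RGB input are assumptions, not taken from the cheatsheet), a minimal PyTorch sketch of such a network could look like this:

```python
import torch
import torch.nn as nn

# Illustrative CONV -> POOL -> CONV -> POOL -> FC stack for assumed 32x32 RGB inputs.
model = nn.Sequential(
    nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3, padding=1),  # CONV
    nn.ReLU(),
    nn.MaxPool2d(kernel_size=2),                                          # POOL
    nn.Conv2d(16, 32, kernel_size=3, padding=1),                          # CONV
    nn.ReLU(),
    nn.MaxPool2d(kernel_size=2),                                          # POOL
    nn.Flatten(),
    nn.Linear(32 * 8 * 8, 10),                                            # FC -> 10 class scores
)

x = torch.randn(1, 3, 32, 32)   # one fake image
print(model(x).shape)           # torch.Size([1, 10])
```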
@@ -109,21 +109,21 @@ **16. Convolution layer (CONV) ― The convolution layer (CONV) uses filters that perform convolution operations as it is scanning the input I with respect to its dimensions. Its hyperparameters include the filter size F and stride S. The resulting output O is called feature map or activation map.** -⟶Layer convolution - Layer convolution (CONV) menggunakan filter yang melakukan operasi konvolusion seakan CONV net menscan masukan I berdasarkan dimensinya. Hyperparameter dari CONV meliputi ukuran filter F dan ukuran stride S. Keluaran hasil O disebut feature map atau activation map. +⟶Layer convolution - Layer convolution (CONV) menggunakan banyak filter yang dapat melakukan operasi konvolusi karena CONV memindai input I dengan memperhatikan dimensinya. Hiperparameter dari CONV meliputi ukuran filter F dan stride S. Keluaran hasil O disebut feature map atau activation map.
**17. Remark: the convolution step can be generalized to the 1D and 3D cases as well.** -⟶Perlu diingat: tahap konvolusion dapat digeneralisasi terhadap masukan 1D dan 3D. +⟶Catatan: tahap konvolusi dapat digeneralisasi juga dalam kasus 1D dan 3D.
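To make the filter size F, stride S and resulting feature map of item 16 concrete, here is a small sketch with assumed numbers (K=8 filters, F=5, S=2, no padding); per the remark in item 17, the same call exists as nn.Conv1d and nn.Conv3d for the 1D and 3D cases:

```python
import torch
import torch.nn as nn

conv = nn.Conv2d(in_channels=3, out_channels=8, kernel_size=5, stride=2, padding=0)

x = torch.randn(1, 3, 32, 32)   # batch of one 32x32 image with C=3 channels
feature_map = conv(x)           # the feature map / activation map
print(feature_map.shape)        # torch.Size([1, 8, 14, 14]) since (32-5)//2 + 1 = 14
```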
**18. Pooling (POOL) ― The pooling layer (POOL) is a downsampling operation, typically applied after a convolution layer, which does some spatial invariance. In particular, max and average pooling are special kinds of pooling where the maximum and average value is taken, respectively.** -⟶Pooling (POOL) - Layer pooling adalah sebuah operasi downsampling, biasanya diaplikasikan setelah sebauh layer convolution, yang mengnyebabkan ianvariansi spasial. Pada khususnya, pooling max dan average adalah jenis khusus dari pooling layer masing-masing mengambil nilai maksimum dan rata-rata. +⟶Pooling (POOL) - Layer pooling adalah sebuah operasi downsampling, biasanya diaplikasikan setelah lapisan konvolusi, yang menyebabkan invarian spasial. Pada khususnya, pooling max dan average merupakan jenis-jenis pooling spesial di mana masing-masing nilai maksimal dan rata-rata diambil.
@@ -137,36 +137,35 @@ **20. [Max pooling, Average pooling, Each pooling operation selects the maximum value of the current view, Each pooling operation averages the values of the current view]** -⟶[Pooling max, Pooling average, Setiap operasi pooling mengambil nilai tertinggi dari tinjauan sekarang, Setiap operasi poling menghitung rata-rata dari tinjauan sekarang] +⟶[Max pooling, Average pooling, Setiap operasi pooling mewakili nilai maksimal dari tampilan terbaru, setiap operasi pooling meratakan nilai-nilai dari tampilan terbaru]
**21. [Preserves detected features, Most commonly used, Downsamples feature map, Used in LeNet]** -⟶[Mempertahankan fitur yang terdeteksi, Yang biasa dipakai, Downsample feature map, Digunakan di LeNet] +⟶[Mempertahankan fitur yang terdeteksi, yang paling sering digunakan, Downsamples feature map, dipakai di LeNet]
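A matching toy example for the pooling table above, contrasting max and average pooling on made-up values:

```python
import torch
import torch.nn as nn

x = torch.tensor([[[[1., 2., 5., 6.],
                    [3., 4., 7., 8.],
                    [0., 1., 2., 3.],
                    [1., 0., 4., 2.]]]])   # shape (1, 1, 4, 4)

max_pool = nn.MaxPool2d(kernel_size=2)     # keeps the maximum of each 2x2 view
avg_pool = nn.AvgPool2d(kernel_size=2)     # averages each 2x2 view

print(max_pool(x))   # [[4., 8.], [1., 4.]]
print(avg_pool(x))   # [[2.5, 6.5], [0.5, 2.75]]
```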
**22. Fully Connected (FC) ― The fully connected layer (FC) operates on a flattened input where each input is connected to all neurons. If present, FC layers are usually found towards the end of CNN architectures and can be used to optimize objectives such as class scores.** -⟶Fully Connected (FC) - Fully connected layer (FC) menangani sebuah masukan dijadikan 1D dimana setiap elemen masukan terkoneksi keseluruh neuron. Layer FC biasanya ditemukan pada akhir dari arsitektur CNN dan dapat digunakan untuk mengoptimisasi objektif seperti skor kelas (pada kasus klasifikasi). +⟶Fully Connected (FC) - Fully connected layer (FC) menangani sebuah masukan dijadikan 1D ddi mana setiap masukan terhubung ke seluruh neuron. Bila ada, lapisan-lapisan FC biasanya ditemukan pada akhir arsitektur CNN dan dapat digunakan untuk mengoptimalkan hasil seperti skor-skor kelas (pada kasus klasifikasi).
**23. Filter hyperparameters** -⟶Hyperparameters filter +⟶Hiperparameter filter
**24. The convolution layer contains filters for which it is important to know the meaning behind its hyperparameters.** -⟶Layer convolutional memuat filter yang mana adalah penting untuk mengerti tentang maksud dari hyperparameter filter tersebut. - +⟶Layer konvolusi mengandung penyaring yang penting untuk dimengerti tentang maksud dari penyaring hiperparameter tersebut.
@@ -186,14 +185,14 @@ **27. Remark: the application of K filters of size F×F results in an output feature map of size O×O×K.** -⟶Perlu diperhatikan: aplikasi dari K filter dengan ukuran FxF menhasilkan sebuah keluaran feature map dengan ukuran O×O×K. +⟶Catatan: pengaplikasian dari penyaring F dengan ukuran FxF menghasilkan sebuah keluaran fitur peta dengan ukuran O×O×K.
**28. Stride ― For a convolutional or a pooling operation, the stride S denotes the number of pixels by which the window moves after each operation.** -⟶Stride - Untuk sebuah konvolution atau sebauh operasi pooling, stide S melambangkan jumlah pixel yang dilewati window setelah setiap operasi. +⟶Stride - Untuk sebuah konvolusi atau sebauh operasi pooling, stide S melambangkan jumlah pixel yang dilewati window setelah setiap operasi.
@@ -221,14 +220,14 @@ **32. Tuning hyperparameters** -⟶Menyetel hyperparameters +⟶Menyetel hiperparameter
**33. Parameter compatibility in convolution layer ― By noting I the length of the input volume size, F the length of the filter, P the amount of zero padding, S the stride, then the output size O of the feature map along that dimension is given by:** -⟶Kompabilitas hyperparameter pada layer konvolusion - Dengan menuliskan I sebagai panjang dari ukuran volume masukan, F sebagai panjang dari filter, P sebagai jumlah zero padding, S sebagai stride, maka ukuran keluaran O dari feature map pada dimensi tersebut dituliskan sebagai: +⟶Kompabilitas parameter pada lapisan konvolusi - Dengan menuliskan I sebagai panjang dari ukuran volume masukan, F sebagai panjang dari filter, P sebagai jumlah dari zero padding, S sebagai stride, maka ukuran keluaran 0 dari feature map pada dimensi tersebut ditandai dengan:
@@ -242,7 +241,7 @@ **35. Remark: often times, Pstart=Pend≜P, in which case we can replace Pstart+Pend by 2P in the formula above.** -⟶Perlu diperhatikan: sering, Pstart=Pend≜P, yang mana pada kasus tersebut kita dapat mengganti Pstart+Pend dengan 2P pada formula diatas. +⟶Catatan: sering, Pstart=Pend≜P, pada kasus tersebut kita dapat mengganti Pstart+Pend dengan 2P pada formula di atas.
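The output-size formula of item 33, together with the Pstart=Pend=P shortcut of item 35, is easy to sanity-check with a throwaway helper (floor division mirrors what most frameworks do when the division is not exact):

```python
def conv_output_size(I, F, S, P_start=0, P_end=0):
    """O = (I - F + P_start + P_end) / S + 1 along one dimension."""
    return (I - F + P_start + P_end) // S + 1

# 32-wide input, 5-wide filter, stride 2, no padding: O = (32 - 5)//2 + 1 = 14
print(conv_output_size(I=32, F=5, S=2))
# with P_start = P_end = P = 2 the numerator becomes I - F + 2P, as in item 35
print(conv_output_size(I=32, F=5, S=1, P_start=2, P_end=2))   # 32
```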
@@ -263,35 +262,35 @@

**38. [One bias parameter per filter, In most cases, S<F, A common choice for K is 2C]**

⟶[…, sebuah pilihan umum untuk K adalah 2C]
**39. [Pooling operation done channel-wise, In most cases, S=F]** -⟶[Operasi pooling dan dilakukan channel-wise, Pada banyak kasus, S=F] +⟶[Operasi pooling yang dilakukan dengan channel-wise, Pada banyak kasus, S=F]
**40. [Input is flattened, One bias parameter per neuron, The number of FC neurons is free of structural constraints]** -⟶[Input diratakan(menjadi 1D), Satu parameter bias untuk setiap neuron, Jumlah dari neuron FC adalah bebas dari batasan struktural.] +⟶[Masukan diratakan, satu parameter bias untuk setiap neuron, Jumlah dari neuron FC adalah terbebas dari batasan struktural]
**41. Receptive field ― The receptive field at layer k is the area denoted Rk×Rk of the input that each pixel of the k-th activation map can 'see'. By calling Fj the filter size of layer j and Si the stride value of layer i and with the convention S0=1, the receptive field at layer k can be computed with the formula:** -⟶Receptive field - Receptive field pada layer k adalah area yang dinotasikan RkxRk dari input yang setiap pixel dari k-th activation map dapat 'melihat'. Dengan menulasikan Fj sebagai ukuran filter dari layer j dan Si sebagai nilai stride pada layer i dan dengan konvensi S0=1, receptive field pada layer K dapat dihitung dengan formula berikut: +⟶Receptive field - Receptive field pada layer k adalah area yang dinotasikan RkxRk dari masukan yang setiap pixel dari k-th activation map dapat "melihat". Dengan menyebut Fj (sebagai) ukuran penyaring dari lapisan j dan Si (sebagai) nilai stride dari lapisan i dan dengan konvensi 50=1, receptive field pada lapisan k dapat dihitung dengan formula:
**42. In the example below, we have F1=F2=3 and S1=S2=1, which gives R2=1+2⋅1+2⋅1=5.** -⟶Pada contoh dibawah ini, kita memiliki F1=f2=3 dan S1=S2=1, yang menghasilkan R2=1+2⋅1+2⋅1=5. +⟶Pada contoh dibawah ini, kita memiliki F1=F2=3 dan S1=S2=1, yang menghasilkan R2=1+2⋅1+2⋅1=5.
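Item 41's receptive-field formula, checked against the numbers of item 42 with a small helper written for this purpose (not part of the cheatsheet):

```python
def receptive_field(filter_sizes, strides):
    """Rk = 1 + sum_{j=1..k} (Fj - 1) * prod_{i=0..j-1} Si, with S0 = 1."""
    R, stride_prod = 1, 1          # stride_prod starts at S0 = 1
    for F, S in zip(filter_sizes, strides):
        R += (F - 1) * stride_prod
        stride_prod *= S
    return R

# F1 = F2 = 3 and S1 = S2 = 1 gives R2 = 1 + 2*1 + 2*1 = 5, as in item 42
print(receptive_field([3, 3], [1, 1]))   # 5
```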
@@ -305,7 +304,7 @@ **44. Rectified Linear Unit ― The rectified linear unit layer (ReLU) is an activation function g that is used on all elements of the volume. It aims at introducing non-linearities to the network. Its variants are summarized in the table below:** -⟶Rectified Linear Unit - Layer rectified linear unit (ReLU) adalah sebuah fungsi aktifasi g yang digunakan pada seluruh elemen. Penggunaan ReLU adalah untuk memasukan non-linearitas ke network. Variasi-variasi dari ReLU dirangkum pada tebel dibawah ini: +⟶Rectified Linear Unit - Layer rectified linear unit (ReLU) adalah sebuat fungsi aktivasi g yang digunakan pada seluruh elemen volume. Unit ini bertujuan untuk menempatkan non-linearitas pada jaringan. Variasi-variasi ReLU ini dirangkum pada tabel di bawah ini:
@@ -319,21 +318,21 @@ **46. [Non-linearity complexities biologically interpretable, Addresses dying ReLU issue for negative values, Differentiable everywhere]** -⟶[Kompleksitas non-linearitas yang dapat diinterpretasikan secara biologi, Menangani permasalahan dying ReLU yang terjadi untuk nilai negatif, Dapat diturunkan] +⟶[Kompleksitas non-linearitas yang dapat ditafsirkan secara biologi, Menangani permasalahan dying ReLU yang bernilai negatif, Yang dapat dibedakan di mana pun]
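For reference, the activations of items 44 to 46 in their standard piecewise forms; eps and alpha are assumed hyperparameter values:

```python
import numpy as np

def relu(z):
    return np.maximum(0, z)

def leaky_relu(z, eps=0.01):           # eps: assumed small slope for z < 0
    return np.where(z > 0, z, eps * z)

def elu(z, alpha=1.0):                 # alpha: assumed ELU hyperparameter
    return np.where(z > 0, z, alpha * (np.exp(z) - 1))

z = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(relu(z), leaky_relu(z), elu(z), sep="\n")
```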
**47. Softmax ― The softmax step can be seen as a generalized logistic function that takes as input a vector of scores x∈Rn and outputs a vector of output probability p∈Rn through a softmax function at the end of the architecture. It is defined as follows:** -⟶Softmax - Langkah softmax dapat dilihat sebagai fungsi logistik yang digeneralisasi yang mengambil masukan sebuah vektor x∈Rn dan mengeluarkan sebuah probabilitas vektor p∈Rn melalui sebuah fungsi softmax pada akhir arsitektur network. Softmax didefinisikan sebagai berikut: +⟶Softmax - Langkah softmax dapat dilihat sebagai sebuah fungsi logistik umum yang berperan sebagai masukan dari nilai skor vektor x∈Rn dan mengualarkan probabilitas produk vektor p∈Rn melalui sebuah fungsi softmax pada akhir dari jaringan arsitektur. Softmax didefinisikan sebagai berikut:
**48. where** -⟶Dimana +⟶Di mana
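A numerically stable sketch of the softmax of item 47 (subtracting the maximum score does not change the result but avoids overflow):

```python
import numpy as np

def softmax(x):
    """p_i = exp(x_i) / sum_j exp(x_j)"""
    e = np.exp(x - np.max(x))
    return e / e.sum()

scores = np.array([2.0, 1.0, 0.1])
p = softmax(scores)
print(p, p.sum())   # a probability vector summing to 1
```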
@@ -347,7 +346,7 @@ **50. Types of models ― There are 3 main types of object recognition algorithms, for which the nature of what is predicted is different. They are described in the table below:** -⟶Tipe-tipe model - Ada tiga tipe utama dari algoritma rekognisi objek, yang mana berbeda pada hal yang diprediksi. Tipe-tipe tersebut dijelaskan pada tabel dibawah ini: +⟶Tipe-tipe model - Ada tiga tipe utama dari algoritma rekognisi objek, yang mana hakikat yang diprediksi tersebut berbeda. Tipe-tipe tersebut dijelaskan pada tabel di bawah ini:
@@ -683,7 +682,7 @@ **98. Original authors** -⟶Penulis orisinil +⟶Penulis asli
From dc3d7499976bab9d5991ab1df9ff2cbd9383c267 Mon Sep 17 00:00:00 2001 From: Pham Hong Vinh Date: Sat, 12 Oct 2019 12:55:50 +0700 Subject: [PATCH 401/531] Fix translation suggested by @damminhtien --- vi/cs-230-convolutional-neural-networks.md | 24 +- ...30-convolutional-neural-networks.md.backup | 716 ++++++++++++++++++ 2 files changed, 728 insertions(+), 12 deletions(-) create mode 100644 vi/cs-230-convolutional-neural-networks.md.backup diff --git a/vi/cs-230-convolutional-neural-networks.md b/vi/cs-230-convolutional-neural-networks.md index 3c286c268..356534f3a 100644 --- a/vi/cs-230-convolutional-neural-networks.md +++ b/vi/cs-230-convolutional-neural-networks.md @@ -32,14 +32,14 @@ **5. [Filter hyperparameters, Dimensions, Stride, Padding]** -⟶ [Các tham số cấu hình của bộ lọc, Các chiều, Stride, Padding] +⟶ [Các siêu tham số của bộ lọc, Các chiều, Stride, Padding]
**6. [Tuning hyperparameters, Parameter compatibility, Model complexity, Receptive field]** -⟶ [Điều chỉnh các tham số cấu hình, Độ tương thích tham số, Độ phức tạp mô hình, Receptive field] +⟶ [Điều chỉnh các siêu tham số, Độ tương thích tham số, Độ phức tạp mô hình, Receptive field]
@@ -53,7 +53,7 @@ **8. [Object detection, Types of models, Detection, Intersection over Union, Non-max suppression, YOLO, R-CNN]** -⟶ [Nhận diện vật thể, Các kiểu mô hình, Detection, Intersection over Union, Non-max suppression, YOLO, R-CNN] +⟶ [Phát hiện vật thể, Các kiểu mô hình, Detection, Intersection over Union, Non-max suppression, YOLO, R-CNN]
@@ -95,7 +95,7 @@ **14. The convolution layer and the pooling layer can be fine-tuned with respect to hyperparameters that are described in the next sections.** -⟶ Tầng tích chập và tầng pooling có thể được hiệu chỉnh theo các tham số cấu hình (hyperparameters) được mô tả ở những phần tiếp theo. +⟶ Tầng tích chập và tầng pooling có thể được hiệu chỉnh theo các siêu tham số (hyperparameters) được mô tả ở những phần tiếp theo.
@@ -109,7 +109,7 @@ **16. Convolution layer (CONV) ― The convolution layer (CONV) uses filters that perform convolution operations as it is scanning the input I with respect to its dimensions. Its hyperparameters include the filter size F and stride S. The resulting output O is called feature map or activation map.** -⟶ Tầng tích chập (CONV) ― Tầng tích chập (CONV) sử dụng các bộ lọc để thực hiện phép tích chập khi đưa chúng đi qua đầu vào I theo các chiều của nó. Các tham số cấu hình của các bộ lọc này bao gồm kích thước bộ lọc F và độ trượt (stride) S. Kết quả đầu ra O được gọi là feature map hay activation map. +⟶ Tầng tích chập (CONV) ― Tầng tích chập (CONV) sử dụng các bộ lọc để thực hiện phép tích chập khi đưa chúng đi qua đầu vào I theo các chiều của nó. Các siêu tham số của các bộ lọc này bao gồm kích thước bộ lọc F và độ trượt (stride) S. Kết quả đầu ra O được gọi là feature map hay activation map.
@@ -158,14 +158,14 @@ **23. Filter hyperparameters** -⟶ Các tham số cấu hình của bộ lọc +⟶ Các siêu tham số của bộ lọc
**24. The convolution layer contains filters for which it is important to know the meaning behind its hyperparameters.** -⟶ Tầng tích chập chứa các bộ lọc mà rất quan trọng cho ta khi biết ý nghĩa đằng sau các tham số cấu hình của chúng. +⟶ Tầng tích chập chứa các bộ lọc mà rất quan trọng cho ta khi biết ý nghĩa đằng sau các siêu tham số của chúng.
@@ -221,7 +221,7 @@ **32. Tuning hyperparameters** -⟶ Điều chỉnh tham số cấu hình +⟶ Điều chỉnh siêu tham số
@@ -340,7 +340,7 @@ **49. Object detection** -⟶ Nhận diện vật thể (Object detection) +⟶ Phát hiện vật thể (Object detection)
@@ -354,7 +354,7 @@ **51. [Image classification, Classification w. localization, Detection]** -⟶ [Phân loại hình ảnh, Phân loại cùng với định vị, Nhận diện] +⟶ [Phân loại hình ảnh, Phân loại cùng với khoanh vùng, Phát hiện]
@@ -368,7 +368,7 @@ **53. [Classifies a picture, Predicts probability of object, Detects an object in a picture, Predicts probability of object and where it is located, Detects up to several objects in a picture, Predicts probabilities of objects and where they are located]** -⟶ [Phân loại một tấm ảnh, Dự đoán xác suất của một vật thể, Nhận diện một vật thể trong ảnh, Dự đoán xác suất của vật thể và định vị nó, Nhận diện nhiều vật thể trong cùng một tấm ảnh, Dự đoán xác suất của các vật thể và định vị chúng] +⟶ [Phân loại một tấm ảnh, Dự đoán xác suất của một vật thể, Phát hiện một vật thể trong ảnh, Dự đoán xác suất của vật thể và định vị nó, Phát hiện nhiều vật thể trong cùng một tấm ảnh, Dự đoán xác suất của các vật thể và định vị chúng]
@@ -382,7 +382,7 @@ **55. Detection ― In the context of object detection, different methods are used depending on whether we just want to locate the object or detect a more complex shape in the image. The two main ones are summed up in the table below:** -⟶ Detection ― Trong bối cảnh nhận diện vật thể, những phương pháp khác nhau được áp dụng tùy thuộc vào liệu chúng ta chỉ muốn định vị vật thể hay nhận diện được những hình dạng phức tạp hơn trong tấm ảnh. Hai phương pháp chính được tổng hợp ở bảng dưới: +⟶ Detection ― Trong bối cảnh phát hiện vật thể, những phương pháp khác nhau được áp dụng tùy thuộc vào liệu chúng ta chỉ muốn định vị vật thể hay phát hiện được những hình dạng phức tạp hơn trong tấm ảnh. Hai phương pháp chính được tổng hợp ở bảng dưới:
diff --git a/vi/cs-230-convolutional-neural-networks.md.backup b/vi/cs-230-convolutional-neural-networks.md.backup new file mode 100644 index 000000000..356534f3a --- /dev/null +++ b/vi/cs-230-convolutional-neural-networks.md.backup @@ -0,0 +1,716 @@ +**Convolutional Neural Networks translation** [[webpage]](https://stanford.edu/~shervine/teaching/cs-230/cheatsheet-convolutional-neural-networks) + +
+ +**1. Convolutional Neural Networks cheatsheet** + +⟶Convolutional Neural Networks cheatsheet + +
+ + +**2. CS 230 - Deep Learning** + +⟶ CS 230 - Deep Learning + +
+ + +**3. [Overview, Architecture structure]** + +⟶ [Tổng quan, Kết cấu kiến trúc] + +
+ + +**4. [Types of layer, Convolution, Pooling, Fully connected]** + +⟶ [Các kiểu tầng (layer), Tích chập, Pooling, Kết nối đầy đủ] + +
+ + +**5. [Filter hyperparameters, Dimensions, Stride, Padding]** + +⟶ [Các siêu tham số của bộ lọc, Các chiều, Stride, Padding] + +
+ + +**6. [Tuning hyperparameters, Parameter compatibility, Model complexity, Receptive field]** + +⟶ [Điều chỉnh các siêu tham số, Độ tương thích tham số, Độ phức tạp mô hình, Receptive field] + +
+ + +**7. [Activation functions, Rectified Linear Unit, Softmax]** + +⟶ [Các hàm kích hoạt, Rectified Linear Unit, Softmax] + +
+ + +**8. [Object detection, Types of models, Detection, Intersection over Union, Non-max suppression, YOLO, R-CNN]** + +⟶ [Phát hiện vật thể, Các kiểu mô hình, Detection, Intersection over Union, Non-max suppression, YOLO, R-CNN] + +
+ + +**9. [Face verification/recognition, One shot learning, Siamese network, Triplet loss]** + +⟶ [Nhận diện/ xác nhận gương mặt, One shot learning, Siamese network, Triplet loss] + +
+ + +**10. [Neural style transfer, Activation, Style matrix, Style/content cost function]** + +⟶ [Neural style transfer, Activation, Style matrix, Style/content cost function] + +
+ + +**11. [Computational trick architectures, Generative Adversarial Net, ResNet, Inception Network]** + +⟶ [Computational trick architectures, Generative Adversarial Net, ResNet, Inception Network] + +
+ + +**12. Overview** + +⟶ Tổng quan + +
+ + +**13. Architecture of a traditional CNN ― Convolutional neural networks, also known as CNNs, are a specific type of neural networks that are generally composed of the following layers:** + +⟶ Kiến trúc truyền thống của một mạng CNN ― Mạng neural tích chập (Convolutional neural networks), còn được biết đến với tên CNNs, là một dạng mạng neural được cấu thành bởi các tầng sau: + +
+ + +**14. The convolution layer and the pooling layer can be fine-tuned with respect to hyperparameters that are described in the next sections.** + +⟶ Tầng tích chập và tầng pooling có thể được hiệu chỉnh theo các siêu tham số (hyperparameters) được mô tả ở những phần tiếp theo. + +
+ + +**15. Types of layer** + +⟶ Các kiểu tầng + +
+ + +**16. Convolution layer (CONV) ― The convolution layer (CONV) uses filters that perform convolution operations as it is scanning the input I with respect to its dimensions. Its hyperparameters include the filter size F and stride S. The resulting output O is called feature map or activation map.** + +⟶ Tầng tích chập (CONV) ― Tầng tích chập (CONV) sử dụng các bộ lọc để thực hiện phép tích chập khi đưa chúng đi qua đầu vào I theo các chiều của nó. Các siêu tham số của các bộ lọc này bao gồm kích thước bộ lọc F và độ trượt (stride) S. Kết quả đầu ra O được gọi là feature map hay activation map. + +
+ + +**17. Remark: the convolution step can be generalized to the 1D and 3D cases as well.** + +⟶ Lưu ý: Bước tích chập cũng có thể được khái quát hóa cả với trường hợp một chiều (1D) và ba chiều (3D). + +
+ + +**18. Pooling (POOL) ― The pooling layer (POOL) is a downsampling operation, typically applied after a convolution layer, which does some spatial invariance. In particular, max and average pooling are special kinds of pooling where the maximum and average value is taken, respectively.** + +⟶ Pooling (POOL) ― Tầng pooling (POOL) là một phép downsampling, thường được sử dụng sau tầng tích chập, giúp tăng tính bất biến không gian. Cụ thể, max pooling và average pooling là những dạng pooling đặc biệt, mà tương ứng là trong đó giá trị lớn nhất và giá trị trung bình được lấy ra. + +
+ + +**19. [Type, Purpose, Illustration, Comments]** + +⟶ [Kiểu, Chức năng, Minh họa, Nhận xét] + +
+ + +**20. [Max pooling, Average pooling, Each pooling operation selects the maximum value of the current view, Each pooling operation averages the values of the current view]** + +⟶ [Max pooling, Average pooling, Từng phép pooling chọn giá trị lớn nhất trong khu vực mà nó đang được áp dụng, Từng phép pooling tính trung bình các giá trị trong khu vực mà nó đang được áp dụng] + +
+ + +**21. [Preserves detected features, Most commonly used, Downsamples feature map, Used in LeNet]** + +⟶ [Bảo toàn các đặc trưng đã phát hiện, Được sử dụng thường xuyên, Giảm kích thước feature map, Được sử dụng trong mạng LeNet] + +
+ + +**22. Fully Connected (FC) ― The fully connected layer (FC) operates on a flattened input where each input is connected to all neurons. If present, FC layers are usually found towards the end of CNN architectures and can be used to optimize objectives such as class scores.** + +⟶ Fully Connected (FC) ― Tầng kết nối đầy đủ (FC) nhận đầu vào là các dữ liệu đã được làm phẳng, mà mỗi đầu vào đó được kết nối đến tất cả neuron. Trong mô hình mạng CNNs, các tầng kết nối đầy đủ thường được tìm thấy ở cuối mạng và được dùng để tối ưu hóa mục tiêu của mạng ví dụ như độ chính xác của lớp (class). + +
+ + +**23. Filter hyperparameters** + +⟶ Các siêu tham số của bộ lọc + +
+ + +**24. The convolution layer contains filters for which it is important to know the meaning behind its hyperparameters.** + +⟶ Tầng tích chập chứa các bộ lọc mà rất quan trọng cho ta khi biết ý nghĩa đằng sau các siêu tham số của chúng. + +
+ + +**25. Dimensions of a filter ― A filter of size F×F applied to an input containing C channels is a F×F×C volume that performs convolutions on an input of size I×I×C and produces an output feature map (also called activation map) of size O×O×1.** + +⟶ Các chiều của một bộ lọc ― Một bộ lọc kích thước F×F áp dụng lên đầu vào chứa C kênh (channels) thì có kích thước tổng kể là F×F×C thực hiện phép tích chập trên đầu vào kích thước I×I×C và cho ra một feature map (hay còn gọi là activation map) có kích thước O×O×1. + +
+ + +**26. Filter** + +⟶ Bộ lọc + +
+ + +**27. Remark: the application of K filters of size F×F results in an output feature map of size O×O×K.** + +⟶ Lưu ý: Việc áp dụng K bộ lọc có kích thước F×F cho ra một feature map có kích thước O×O×K. + +
+ + +**28. Stride ― For a convolutional or a pooling operation, the stride S denotes the number of pixels by which the window moves after each operation.** + +⟶ Stride ― Đối với phép tích chập hoặc phép pooling, độ trượt S ký hiệu số pixel mà cửa sổ sẽ di chuyển sau mỗi lần thực hiện phép tính. + +
+ + +**29. Zero-padding ― Zero-padding denotes the process of adding P zeroes to each side of the boundaries of the input. This value can either be manually specified or automatically set through one of the three modes detailed below:** + +⟶ Zero-padding ― Zero-padding là tên gọi của quá trình thêm P số không vào các biên của đầu vào. Giá trị này có thể được lựa chọn thủ công hoặc một cách tự động bằng một trong ba những phương pháp mô tả bên dưới: + +
+ + +**30. [Mode, Value, Illustration, Purpose, Valid, Same, Full]** + +⟶ [Phương pháp, Giá trị, Mục đích, Valid, Same, Full] + +
+ + +**31. [No padding, Drops last convolution if dimensions do not match, Padding such that feature map size has size ⌈IS⌉, Output size is mathematically convenient, Also called 'half' padding, Maximum padding such that end convolutions are applied on the limits of the input, Filter 'sees' the input end-to-end]** + +⟶ [Không sử dụng padding, Bỏ phép tích chập cuối nếu số chiều không khớp, Sử dụng padding để làm cho feature map có kích thước ⌈IS⌉, Kích thước đầu ra thuận lợi về mặt toán học, Còn được gọi là 'half' padding, Padding tối đa sao cho các phép tích chập có thể được sử dụng tại các rìa của đầu vào, Bộ lọc 'thấy' được đầu vào từ đầu đến cuối] + +
+ + +**32. Tuning hyperparameters** + +⟶ Điều chỉnh siêu tham số + +
+ + +**33. Parameter compatibility in convolution layer ― By noting I the length of the input volume size, F the length of the filter, P the amount of zero padding, S the stride, then the output size O of the feature map along that dimension is given by:** + +⟶ Tính tương thích của tham số trong tầng tích chập ― Bằng cách ký hiệu I là độ dài kích thước đầu vào, F là độ dài của bộ lọc, P là số lượng zero padding, S là độ trượt, ta có thể tính được độ dài O của feature map theo một chiều bằng công thức: + +
+ + +**34. [Input, Filter, Output]** + +⟶ [Đầu vào, Bộ lọc, Đầu ra] + +
+ + +**35. Remark: often times, Pstart=Pend≜P, in which case we can replace Pstart+Pend by 2P in the formula above.** + +⟶ Lưu ý: Trong một số trường hợp, Pstart=Pend≜P, ta có thể thay thế Pstart+Pend bằng 2P trong công thức trên. + +
+ + +**36. Understanding the complexity of the model ― In order to assess the complexity of a model, it is often useful to determine the number of parameters that its architecture will have. In a given layer of a convolutional neural network, it is done as follows:** + +⟶ Hiểu về độ phức tạp của mô hình ― Để đánh giá độ phức tạp của một mô hình, cách hữu hiệu là xác định số tham số mà mô hình đó sẽ có. Trong một tầng của mạng neural tích chập, nó sẽ được tính toán như sau: + +
+ + +**37. [Illustration, Input size, Output size, Number of parameters, Remarks]** + +⟶ [Minh họa, Kích thước đầu vào, Kích thước đầu ra, Số lượng tham số, Lưu ý] + +
**38. [One bias parameter per filter, In most cases, S<F, A common choice for K is 2C]**

⟶

**39. [Pooling operation done channel-wise, In most cases, S=F]**

⟶ [Phép pooling được áp dụng lên từng kênh (channel-wise), Trong đa số trường hợp, S=F]

+ + +**40. [Input is flattened, One bias parameter per neuron, The number of FC neurons is free of structural constraints]** + +⟶ [Đầu vào được làm phẳng, Mỗi neuron có một tham số bias, Số neuron trong một tầng FC phụ thuộc vào ràng buộc kết cấu] + +
+ + +**41. Receptive field ― The receptive field at layer k is the area denoted Rk×Rk of the input that each pixel of the k-th activation map can 'see'. By calling Fj the filter size of layer j and Si the stride value of layer i and with the convention S0=1, the receptive field at layer k can be computed with the formula:** + +⟶ + +
+ + +**42. In the example below, we have F1=F2=3 and S1=S2=1, which gives R2=1+2⋅1+2⋅1=5.** + +⟶ + +
+ + +**43. Commonly used activation functions** + +⟶ Các hàm kích hoạt thường gặp + +
+ + +**44. Rectified Linear Unit ― The rectified linear unit layer (ReLU) is an activation function g that is used on all elements of the volume. It aims at introducing non-linearities to the network. Its variants are summarized in the table below:** + +⟶ Rectified Linear Unit ― Tầng rectified linear unit (ReLU) là một hàm kích hoạt g được sử dụng trên tất cả các thành phần. Mục đích của nó là tăng tính phi tuyến tính cho mạng. Những biến thể khác của ReLU được tổng hợp ở bảng dưới: + +
+ + +**45. [ReLU, Leaky ReLU, ELU, with]** + +⟶ [ReLU, Leaky ReLU, ELU, with] + +
+ + +**46. [Non-linearity complexities biologically interpretable, Addresses dying ReLU issue for negative values, Differentiable everywhere]** + +⟶ [Độ phức tạp phi tuyến tính có thể thông dịch được về mặt sinh học, Gán vấn đề ReLU chết cho những giá trị âm, Khả vi tại mọi nơi] + +
+ + +**47. Softmax ― The softmax step can be seen as a generalized logistic function that takes as input a vector of scores x∈Rn and outputs a vector of output probability p∈Rn through a softmax function at the end of the architecture. It is defined as follows:** + +⟶ Softmax ― Bước softmax có thể được coi là một hàm logistic tổng quát lấy đầu vào là một vector chứa các giá trị x∈Rn và cho ra là một vector gồm các xác suất p∈Rn thông qua một hàm softmax ở cuối kiến trúc. Nó được định nghĩa như sau: + +
+ + +**48. where** + +⟶ với + +
+ + +**49. Object detection** + +⟶ Phát hiện vật thể (Object detection) + +
+ + +**50. Types of models ― There are 3 main types of object recognition algorithms, for which the nature of what is predicted is different. They are described in the table below:** + +⟶ Các kiểu mô hình ― Có 3 kiểu thuật toán nhận diện vật thể chính, vì thế mà bản chất của thứ được dự đoán sẽ khác nhau. Chúng được miêu tả ở bảng dưới: + +
+ + +**51. [Image classification, Classification w. localization, Detection]** + +⟶ [Phân loại hình ảnh, Phân loại cùng với khoanh vùng, Phát hiện] + +
+ + +**52. [Teddy bear, Book]** + +⟶ [Gấu bông, Sách] + +
+ + +**53. [Classifies a picture, Predicts probability of object, Detects an object in a picture, Predicts probability of object and where it is located, Detects up to several objects in a picture, Predicts probabilities of objects and where they are located]** + +⟶ [Phân loại một tấm ảnh, Dự đoán xác suất của một vật thể, Phát hiện một vật thể trong ảnh, Dự đoán xác suất của vật thể và định vị nó, Phát hiện nhiều vật thể trong cùng một tấm ảnh, Dự đoán xác suất của các vật thể và định vị chúng] + +
+ + +**54. [Traditional CNN, Simplified YOLO, R-CNN, YOLO, R-CNN]** + +⟶ [CNN cổ điển, YOLO đơn giản hóa, R-CNN, YOLO, R-CNN] + +
+ + +**55. Detection ― In the context of object detection, different methods are used depending on whether we just want to locate the object or detect a more complex shape in the image. The two main ones are summed up in the table below:** + +⟶ Detection ― Trong bối cảnh phát hiện vật thể, những phương pháp khác nhau được áp dụng tùy thuộc vào liệu chúng ta chỉ muốn định vị vật thể hay phát hiện được những hình dạng phức tạp hơn trong tấm ảnh. Hai phương pháp chính được tổng hợp ở bảng dưới: + +
+ + +**56. [Bounding box detection, Landmark detection]** + +⟶ + +
+ + +**57. [Detects the part of the image where the object is located, Detects a shape or characteristics of an object (e.g. eyes), More granular]** + +⟶ + +
+ + +**58. [Box of center (bx,by), height bh and width bw, Reference points (l1x,l1y), ..., (lnx,lny)]** + +⟶ + +
+ + +**59. Intersection over Union ― Intersection over Union, also known as IoU, is a function that quantifies how correctly positioned a predicted bounding box Bp is over the actual bounding box Ba. It is defined as:** + +⟶ + +
+ + +**60. Remark: we always have IoU∈[0,1]. By convention, a predicted bounding box Bp is considered as being reasonably good if IoU(Bp,Ba)⩾0.5.** + +⟶ + +
+ + +**61. Anchor boxes ― Anchor boxing is a technique used to predict overlapping bounding boxes. In practice, the network is allowed to predict more than one box simultaneously, where each box prediction is constrained to have a given set of geometrical properties. For instance, the first prediction can potentially be a rectangular box of a given form, while the second will be another rectangular box of a different geometrical form.** + +⟶ + +
+ + +**62. Non-max suppression ― The non-max suppression technique aims at removing duplicate overlapping bounding boxes of a same object by selecting the most representative ones. After having removed all boxes having a probability prediction lower than 0.6, the following steps are repeated while there are boxes remaining:** + +⟶ + +
+ + +**63. [For a given class, Step 1: Pick the box with the largest prediction probability., Step 2: Discard any box having an IoU⩾0.5 with the previous box.]** + +⟶ + +
+ + +**64. [Box predictions, Box selection of maximum probability, Overlap removal of same class, Final bounding boxes]** + +⟶ + +
+ + +**65. YOLO ― You Only Look Once (YOLO) is an object detection algorithm that performs the following steps:** + +⟶ + +
+ + +**66. [Step 1: Divide the input image into a G×G grid., Step 2: For each grid cell, run a CNN that predicts y of the following form:, repeated k times]** + +⟶ + +
+ + +**67. where pc is the probability of detecting an object, bx,by,bh,bw are the properties of the detected bouding box, c1,...,cp is a one-hot representation of which of the p classes were detected, and k is the number of anchor boxes.** + +⟶ + +
+ + +**68. Step 3: Run the non-max suppression algorithm to remove any potential duplicate overlapping bounding boxes.** + +⟶ + +
+ + +**69. [Original image, Division in GxG grid, Bounding box prediction, Non-max suppression]** + +⟶ + +
+ + +**70. Remark: when pc=0, then the network does not detect any object. In that case, the corresponding predictions bx,...,cp have to be ignored.** + +⟶ + +
+ + +**71. R-CNN ― Region with Convolutional Neural Networks (R-CNN) is an object detection algorithm that first segments the image to find potential relevant bounding boxes and then run the detection algorithm to find most probable objects in those bounding boxes.** + +⟶ + +
+ + +**72. [Original image, Segmentation, Bounding box prediction, Non-max suppression]** + +⟶ + +
+ + +**73. Remark: although the original algorithm is computationally expensive and slow, newer architectures enabled the algorithm to run faster, such as Fast R-CNN and Faster R-CNN.** + +⟶ + +
+ + +**74. Face verification and recognition** + +⟶ + +
+ + +**75. Types of models ― Two main types of model are summed up in table below:** + +⟶ + +
+ + +**76. [Face verification, Face recognition, Query, Reference, Database]** + +⟶ + +
+ + +**77. [Is this the correct person?, One-to-one lookup, Is this one of the K persons in the database?, One-to-many lookup]** + +⟶ + +
+ + +**78. One Shot Learning ― One Shot Learning is a face verification algorithm that uses a limited training set to learn a similarity function that quantifies how different two given images are. The similarity function applied to two images is often noted d(image 1,image 2).** + +⟶ + +
+ + +**79. Siamese Network ― Siamese Networks aim at learning how to encode images to then quantify how different two images are. For a given input image x(i), the encoded output is often noted as f(x(i)).** + +⟶ + +
+ + +**80. Triplet loss ― The triplet loss ℓ is a loss function computed on the embedding representation of a triplet of images A (anchor), P (positive) and N (negative). The anchor and the positive example belong to a same class, while the negative example to another one. By calling α∈R+ the margin parameter, this loss is defined as follows:** + +⟶ + +
+ + +**81. Neural style transfer** + +⟶ + +
+ + +**82. Motivation ― The goal of neural style transfer is to generate an image G based on a given content C and a given style S.** + +⟶ + +
+ + +**83. [Content C, Style S, Generated image G]** + +⟶ + +
+ + +**84. Activation ― In a given layer l, the activation is noted a[l] and is of dimensions nH×nw×nc** + +⟶ + +
+ + +**85. Content cost function ― The content cost function Jcontent(C,G) is used to determine how the generated image G differs from the original content image C. It is defined as follows:** + +⟶ + +
+ + +**86. Style matrix ― The style matrix G[l] of a given layer l is a Gram matrix where each of its elements G[l]kk′ quantifies how correlated the channels k and k′ are. It is defined with respect to activations a[l] as follows:** + +⟶ + +
+ + +**87. Remark: the style matrix for the style image and the generated image are noted G[l] (S) and G[l] (G) respectively.** + +⟶ + +
+ + +**88. Style cost function ― The style cost function Jstyle(S,G) is used to determine how the generated image G differs from the style S. It is defined as follows:** + +⟶ + +
+ + +**89. Overall cost function ― The overall cost function is defined as being a combination of the content and style cost functions, weighted by parameters α,β, as follows:** + +⟶ + +
+ + +**90. Remark: a higher value of α will make the model care more about the content while a higher value of β will make it care more about the style.** + +⟶ + +
+ + +**91. Architectures using computational tricks** + +⟶ + +
+ + +**92. Generative Adversarial Network ― Generative adversarial networks, also known as GANs, are composed of a generative and a discriminative model, where the generative model aims at generating the most truthful output that will be fed into the discriminative which aims at differentiating the generated and true image.** + +⟶ + +
+ + +**93. [Training, Noise, Real-world image, Generator, Discriminator, Real Fake]** + +⟶ + +
+ + +**94. Remark: use cases using variants of GANs include text to image, music generation and synthesis.** + +⟶ + +
+ + +**95. ResNet ― The Residual Network architecture (also called ResNet) uses residual blocks with a high number of layers meant to decrease the training error. The residual block has the following characterizing equation:** + +⟶ + +
+ + +**96. Inception Network ― This architecture uses inception modules and aims at giving a try at different convolutions in order to increase its performance through features diversification. In particular, it uses the 1×1 convolution trick to limit the computational burden.** + +⟶ + +
+ + +**97. The Deep Learning cheatsheets are now available in [target language].** + +⟶ + +
+ + +**98. Original authors** + +⟶ + +
+ + +**99. Translated by X, Y and Z** + +⟶ + +
+ + +**100. Reviewed by X, Y and Z** + +⟶ + +
+ + +**101. View PDF version on GitHub** + +⟶ + +
+ + +**102. By X and Y** + +⟶ + +
From 1d2e33f050d4c7b8041d9b5fa1665b7effb709af Mon Sep 17 00:00:00 2001 From: Pham Hong Vinh Date: Sat, 12 Oct 2019 15:23:44 +0700 Subject: [PATCH 402/531] Translated to line 75 --- vi/cs-230-convolutional-neural-networks.md | 40 +++++++++++----------- 1 file changed, 20 insertions(+), 20 deletions(-) diff --git a/vi/cs-230-convolutional-neural-networks.md b/vi/cs-230-convolutional-neural-networks.md index 356534f3a..1a79296c2 100644 --- a/vi/cs-230-convolutional-neural-networks.md +++ b/vi/cs-230-convolutional-neural-networks.md @@ -389,140 +389,140 @@ **56. [Bounding box detection, Landmark detection]** -⟶ +⟶ [Phát hiện hộp giới hạn (bounding box), Phát hiện landmark]
**57. [Detects the part of the image where the object is located, Detects a shape or characteristics of an object (e.g. eyes), More granular]** -⟶ +⟶ [Phát hiện phần trong ảnh mà có sự xuất hiện của vật thể, Phát hiện hình dạng và đặc điểm của một đối tượng (vd: mắt), Nhiều hạt]
**58. [Box of center (bx,by), height bh and width bw, Reference points (l1x,l1y), ..., (lnx,lny)]** -⟶ +⟶ [Hộp có tọa độ trung tâm (bx, by), chiều cao bh và chiều rộng bw, Các điểm tương quan (l1x,l1y), ..., (lnx,lny)]
**59. Intersection over Union ― Intersection over Union, also known as IoU, is a function that quantifies how correctly positioned a predicted bounding box Bp is over the actual bounding box Ba. It is defined as:** -⟶ +⟶ Intersection over Union ― Tỉ lệ vùng giao trên vùng hợp, còn được biết đến là IoU, là một hàm định lượng vị trí Bp của hộp giới hạn dự đoán được định vị đúng như thế nào so với hộp giới hạn thực tế Ba. Nó được định nghĩa:
**60. Remark: we always have IoU∈[0,1]. By convention, a predicted bounding box Bp is considered as being reasonably good if IoU(Bp,Ba)⩾0.5.** -⟶ +⟶ Lưu ý: ta luôn có IoU∈[0,1]. Để thuận tiện, một hộp giới hạn Bp được cho là khá tốt nếu IoU(Bp,Ba)⩾0.5.
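Item 59's IoU is straightforward once a box format is fixed; the sketch below assumes corner coordinates (x1, y1, x2, y2) rather than the center/size parameterization used elsewhere in the cheatsheet:

```python
def iou(box_p, box_a):
    """Intersection over Union of a predicted box Bp and an actual box Ba."""
    x1, y1 = max(box_p[0], box_a[0]), max(box_p[1], box_a[1])
    x2, y2 = min(box_p[2], box_a[2]), min(box_p[3], box_a[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_p = (box_p[2] - box_p[0]) * (box_p[3] - box_p[1])
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    union = area_p + area_a - inter
    return inter / union if union > 0 else 0.0

print(iou((0, 0, 2, 2), (1, 1, 3, 3)))   # 1/7 ~ 0.14, well under the 0.5 bar of item 60
```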
**61. Anchor boxes ― Anchor boxing is a technique used to predict overlapping bounding boxes. In practice, the network is allowed to predict more than one box simultaneously, where each box prediction is constrained to have a given set of geometrical properties. For instance, the first prediction can potentially be a rectangular box of a given form, while the second will be another rectangular box of a different geometrical form.** -⟶ +⟶ Anchor boxes ― Hộp mỏ neo là một kỹ thuật được dùng để dự đoán những hộp giới hạn nằm chồng lên nhau. Trong thực nghiệm, mạng được phép dự đoán nhiều hơn một hộp cùng một lúc, trong đó mỗi dự đoán được giới hạn theo một tập những tính chất hình học cho trước. Ví dụ, dự đoán đầu tiên có khả năng là một hộp hình chữ nhật có hình dạng cho trước, trong khi dự đoán thứ hai sẽ là một hộp hình chữ nhật nữa với hình dạng hình học khác.
**62. Non-max suppression ― The non-max suppression technique aims at removing duplicate overlapping bounding boxes of a same object by selecting the most representative ones. After having removed all boxes having a probability prediction lower than 0.6, the following steps are repeated while there are boxes remaining:** -⟶ +⟶ Non-max suppression ― Kỹ thuật non-max suppression hướng tới việc loại bỏ những hộp giới hạn bị trùng chồng lên nhau của cùng một đối tượng bằng cách chọn chiếc hộp có tính đặc trưng nhất. Sau khi loại bỏ tất cả các hộp có xác suất dự đoán nhỏ hơn 0.6, những bước tiếp theo được lặp lại khi vẫn còn tồn tại những hộp khác.
**63. [For a given class, Step 1: Pick the box with the largest prediction probability., Step 2: Discard any box having an IoU⩾0.5 with the previous box.]** -⟶ +⟶ [Với một lớp cho trước, Bước 1: Chọn chiếc hộp có xác suất dự đoán lớn nhất., Bước 2: Loại bỏ những hộp có IoU⩾0.5 với hộp đã chọn.]
**64. [Box predictions, Box selection of maximum probability, Overlap removal of same class, Final bounding boxes]** -⟶ +⟶ [Các dự đoán hộp, Chọn hộp với xác suất cao nhất, Loại bỏ trùng lặp trong cùng một lớp, Các hộp giới hạn cuối cùng]
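The two steps of item 63 translate almost directly into code; this greedy sketch assumes corner-format boxes and its thresholds follow items 62 and 63:

```python
def iou(b1, b2):
    """IoU of two (x1, y1, x2, y2) boxes; corner format is an assumption."""
    ix1, iy1 = max(b1[0], b2[0]), max(b1[1], b2[1])
    ix2, iy2 = min(b1[2], b2[2]), min(b1[3], b2[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area1 = (b1[2] - b1[0]) * (b1[3] - b1[1])
    area2 = (b2[2] - b2[0]) * (b2[3] - b2[1])
    union = area1 + area2 - inter
    return inter / union if union > 0 else 0.0

def non_max_suppression(boxes, scores, prob_min=0.6, iou_max=0.5):
    """Drop boxes below prob_min, then repeatedly keep the most probable box
    (step 1) and discard boxes overlapping it with IoU >= iou_max (step 2)."""
    remaining = sorted(((s, b) for s, b in zip(scores, boxes) if s >= prob_min),
                       key=lambda t: t[0], reverse=True)
    kept = []
    while remaining:
        best = remaining.pop(0)
        kept.append(best)
        remaining = [(s, b) for s, b in remaining if iou(b, best[1]) < iou_max]
    return kept

boxes  = [(0, 0, 2, 2), (0.1, 0.1, 2.1, 2.1), (5, 5, 7, 7)]
scores = [0.9, 0.8, 0.7]
print(non_max_suppression(boxes, scores))   # the 0.8 box overlaps the 0.9 box and is dropped
```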
**65. YOLO ― You Only Look Once (YOLO) is an object detection algorithm that performs the following steps:** -⟶ +⟶ YOLO ― You Only Look Once (YOLO) là một thuật toán phát hiện vật thể thực hiện những bước sau:
**66. [Step 1: Divide the input image into a G×G grid., Step 2: For each grid cell, run a CNN that predicts y of the following form:, repeated k times]** -⟶ +⟶ [Bước 1: Phân chia tấm ảnh đầu vào thành một lưới G×G., Bước 2: Với mỗi lưới, chạy một mạng CNN dự đoán y có dạng sau:, lặp lại k lần]
**67. where pc is the probability of detecting an object, bx,by,bh,bw are the properties of the detected bouding box, c1,...,cp is a one-hot representation of which of the p classes were detected, and k is the number of anchor boxes.** -⟶ +⟶ với pc là xác suất dự đoán được một vật thể, bx,by,bh,bw là những thuộc tính của hộp giới hạn được dự đoán, c1,...,cp là biểu diễn one-hot của việc lớp nào trong p các lớp được dự đoán, và k là số lượng các hộp mỏ neo.
**68. Step 3: Run the non-max suppression algorithm to remove any potential duplicate overlapping bounding boxes.** -⟶ +⟶ Bước 3: Chạy thuật toán non-max suppression để loại bỏ bất kỳ hộp giới hạn có khả năng bị trùng lặp.
**69. [Original image, Division in GxG grid, Bounding box prediction, Non-max suppression]** -⟶ +⟶ [Ảnh gốc, Phân chia thành lưới GxG, Dự đoán hộp giới hạn, Non-max suppression]
**70. Remark: when pc=0, then the network does not detect any object. In that case, the corresponding predictions bx,...,cp have to be ignored.** -⟶ +⟶ Lưu ý: khi pc=0, thì mạng không phát hiện bất kỳ vật thể nào. Trong trường hợp đó, Các dự đoán liên quan bx,...,cp sẽ bị lờ đi.
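One way to picture the prediction volume of items 66 and 67 is as a G×G×k×(5+p) array; the sizes and the 0.5 cut-off below are illustrative assumptions only:

```python
import numpy as np

np.random.seed(0)
G, k, p = 3, 2, 4                     # assumed grid size, anchor count and class count
y = np.random.rand(G, G, k, 5 + p)    # per cell and anchor: [pc, bx, by, bh, bw, c1..cp]

cell = y[1, 2]                        # the k predictions of grid cell (row 1, column 2)
for anchor_pred in cell:
    pc, bx, by, bh, bw = anchor_pred[:5]
    class_scores = anchor_pred[5:]
    if pc < 0.5:                      # assumed cut-off; pc=0 means no object (item 70)
        continue
    print(f"pc={pc:.2f}, box=({bx:.2f}, {by:.2f}, {bh:.2f}, {bw:.2f}), "
          f"class={int(np.argmax(class_scores))}")
```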
**71. R-CNN ― Region with Convolutional Neural Networks (R-CNN) is an object detection algorithm that first segments the image to find potential relevant bounding boxes and then run the detection algorithm to find most probable objects in those bounding boxes.** -⟶ +⟶ R-CNN ― Region with Convolutional Neural Networks (R-CNN) là một thuật toán phát hiện vật thể mà đầu tiên phân chia ảnh thành các vùng để tìm các hộp giới hạn có khả năng liên quan cao rồi chạy một thuật toán phát hiện để tìm những thứ có khả năng cao là vật thể trong những hộp giới hạn đó.
**72. [Original image, Segmentation, Bounding box prediction, Non-max suppression]** -⟶ +⟶ [Ảnh gốc, Phân vùng, Dự đoán hộp giới hạn, Non-max suppression]
**73. Remark: although the original algorithm is computationally expensive and slow, newer architectures enabled the algorithm to run faster, such as Fast R-CNN and Faster R-CNN.** -⟶ +⟶ Lưu ý: mặc dù thuật toán gốc có chi phí tính toán cao và chậm, những kiến trúc mới đã có thể cho phép thuật toán này chạy nhanh hơn, như là Fast R-CNN và Faster R-CNN.
**74. Face verification and recognition** -⟶ +⟶ Xác nhận khuôn mặt và nhận diện khuôn mặt
**75. Types of models ― Two main types of model are summed up in table below:** -⟶ +⟶ Các kiểu mô hình ― Hai kiểu mô hình chính được tổng hợp trong bảng dưới:
From 3d97b1f64cb5aadb81487deb4e295a11c03cede9 Mon Sep 17 00:00:00 2001 From: kevingo Date: Sat, 20 Apr 2019 23:21:04 +0800 Subject: [PATCH 403/531] [zh-tw] Add machine learning tips and tricks zh-tw translation --- CONTRIBUTORS | 4 + ...tsheet-machine-learning-tips-and-tricks.md | 257 ++++++++++++++++++ 2 files changed, 261 insertions(+) create mode 100644 zh-tw/cheatsheet-machine-learning-tips-and-tricks.md diff --git a/CONTRIBUTORS b/CONTRIBUTORS index 8071830aa..63fd92e19 100644 --- a/CONTRIBUTORS +++ b/CONTRIBUTORS @@ -164,3 +164,7 @@ kevingo (translation of unsupervised learning) imironhead (review of unsupervised learning) johnnychhsu (review of unsupervised learning) + + kevingo (translation of machine learning tips and tricks) + kentropy (review of machine learning tips and tricks) + diff --git a/zh-tw/cheatsheet-machine-learning-tips-and-tricks.md b/zh-tw/cheatsheet-machine-learning-tips-and-tricks.md new file mode 100644 index 000000000..b7a5db1c0 --- /dev/null +++ b/zh-tw/cheatsheet-machine-learning-tips-and-tricks.md @@ -0,0 +1,257 @@ +1. **Machine Learning tips and tricks cheatsheet** + +⟶ +機器學習秘訣和技巧參考手冊 +
+ +2. **Classification metrics** + +⟶ +分類器的評估指標 +
+ +3. **In a context of a binary classification, here are the main metrics that are important to track in order to assess the performance of the model.** + +⟶ +在二元分類的問題上,底下是主要用來衡量模型表現的指標 +
+ +4. **Confusion matrix ― The confusion matrix is used to have a more complete picture when assessing the performance of a model. It is defined as follows:** + +⟶ +混淆矩陣 - 混淆矩陣是用來衡量模型整體表現的指標 +
+ +5. **[Predicted class, Actual class]** + +⟶ +[預測類別, 真實類別] +
+ +6. **Main metrics ― The following metrics are commonly used to assess the performance of classification models:** + +⟶ +主要的衡量指標 - 底下的指標經常用在評估分類模型的表現 +
+ +7. **[Metric, Formula, Interpretation]** + +⟶ +[指標, 公式, 解釋] +
+ +8. **Overall performance of model** + +⟶ +模型的整體表現 +
+ +9. **How accurate the positive predictions are** + +⟶ +預測的類別有多精準的比例 +
+ +10. **Coverage of actual positive sample** + +⟶ +實際正的樣本的覆蓋率有多少 +
+ +11. **Coverage of actual negative sample** + +⟶ +實際負的樣本的覆蓋率 +
+ +12. **Hybrid metric useful for unbalanced classes** + +⟶ +對於非平衡類別相當有用的混合指標 +
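Items 6 to 12 tabulate the usual classification metrics; for quick reference, the same formulas computed from raw confusion-matrix counts (the toy counts are made up):

```python
def classification_metrics(tp, fp, tn, fn):
    """Accuracy, precision, recall and F1 score from confusion-matrix counts."""
    accuracy  = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp)
    recall    = tp / (tp + fn)        # also called sensitivity or TPR
    f1        = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1

print(classification_metrics(tp=40, fp=10, tn=45, fn=5))
# (0.85, 0.8, 0.888..., 0.842...)
```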
+ +13. **ROC ― The receiver operating curve, also noted ROC, is the plot of TPR versus FPR by varying the threshold. These metrics are are summed up in the table below:** + +⟶ +ROC - 接收者操作特徵曲線 (ROC Curve),又被稱為 ROC,是透過改變閥值來表示 TPR 和 FPR 之間關係的圖形。這些指標總結如下: +
+ +14. **[Metric, Formula, Equivalent]** + +⟶ +[衡量指標, 公式, 等同於] +
+ +15. **AUC ― The area under the receiving operating curve, also noted AUC or AUROC, is the area below the ROC as shown in the following figure:** + +⟶ +AUC - 在接收者操作特徵曲線 (ROC) 底下的面積,也稱為 AUC 或 AUROC: +
+ +16. **[Actual, Predicted]** + +⟶ +[實際值, 預測值] +
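Items 13 to 16 cover ROC and AUC; assuming scikit-learn is available, both can be obtained from labels and scores in a few lines:

```python
import numpy as np
from sklearn.metrics import roc_curve, roc_auc_score

y_true  = np.array([0, 0, 1, 1, 0, 1])                  # made-up labels
y_score = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.9])     # made-up model scores

fpr, tpr, thresholds = roc_curve(y_true, y_score)       # TPR vs FPR as the threshold varies
print(list(zip(np.round(fpr, 2), np.round(tpr, 2))))
print("AUC:", roc_auc_score(y_true, y_score))           # area under that curve
```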
+ +17. **Basic metrics ― Given a regression model f, the following metrics are commonly used to assess the performance of the model:** + +⟶ +基本的指標 - 給定一個迴歸模型 f,底下是經常用來評估此模型的指標: +
+ +18. **[Total sum of squares, Explained sum of squares, Residual sum of squares]** + +⟶ +[總平方和, 被解釋平方和, 殘差平方和] +
+ +19. **Coefficient of determination ― The coefficient of determination, often noted R2 or r2, provides a measure of how well the observed outcomes are replicated by the model and is defined as follows:** + +⟶ +決定係數 - 決定係數又被稱為 R2 or r2,它提供了模型是否具備復現觀測結果的能力。定義如下: +
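Item 19's definition, R² = 1 − SSres/SStot, as a short NumPy helper with made-up numbers:

```python
import numpy as np

def r_squared(y_true, y_pred):
    """Coefficient of determination R^2 = 1 - SS_res / SS_tot."""
    ss_res = np.sum((y_true - y_pred) ** 2)              # residual sum of squares
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)     # total sum of squares
    return 1 - ss_res / ss_tot

y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.8, 5.3, 6.9, 9.2])
print(r_squared(y_true, y_pred))   # close to 1 for a good fit
```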
+ +20. **Main metrics ― The following metrics are commonly used to assess the performance of regression models, by taking into account the number of variables n that they take into consideration:** + +⟶ +主要的衡量指標 - 藉由考量變數 n 的數量,我們經常用使用底下的指標來衡量迴歸模型的表現: +
+ +21. **where L is the likelihood and ˆσ2 is an estimate of the variance associated with each response.** + +⟶ +當中,L 代表的是概似估計,ˆσ2 則是變異數的估計 +
+ +22. **Model selection** + +⟶ +模型選擇 +
+ +23. **Vocabulary ― When selecting a model, we distinguish 3 different parts of the data that we have as follows:** + +⟶ +詞彙 - 當進行模型選擇時,我們會針對資料進行以下區分: +
+ +24. **[Training set, Validation set, Testing set]** + +⟶ +[訓練資料集, 驗證資料集, 測試資料集] +
+ +25. **[Model is trained, Model is assessed, Model gives predictions]** + +⟶ +[用來訓練模型, 用來評估模型, 模型用來預測用的資料集] +
+ +26. **[Usually 80% of the dataset, Usually 20% of the dataset]** + +⟶ +[通常是 80% 的資料集, 通常是 20% 的資料集] +
+ +27. **[Also called hold-out or development set, Unseen data]** + +⟶ +[又被稱為 hold-out 資料集或開發資料集, 模型沒看過的資料集] +
+ +28. **Once the model has been chosen, it is trained on the entire dataset and tested on the unseen test set. These are represented in the figure below:** + +⟶ +當模型被選擇後,就會使用整個資料集來做訓練,並且在沒看過的資料集上做測試。你可以參考以下的圖表: +
+ +29. **Cross-validation ― Cross-validation, also noted CV, is a method that is used to select a model that does not rely too much on the initial training set. The different types are summed up in the table below:** + +⟶ +交叉驗證 - 交叉驗證,又稱之為 CV,它是一種不特別依賴初始訓練集來挑選模型的方法。幾種不同的方法如下: +
+ +30. **[Training on k−1 folds and assessment on the remaining one, Training on n−p observations and assessment on the p remaining ones]** + +⟶ +[把資料分成 k 份,利用 k-1 份資料來訓練,剩下的一份用來評估模型效能, 在 n-p 份資料上進行訓練,剩下的 p 份資料用來評估模型效能] +
+ +31. **[Generally k=5 or 10, Case p=1 is called leave-one-out]** + +⟶ +[一般來說 k=5 或 10, 當 p=1 時,又稱為 leave-one-out] +
+ +32. **The most commonly used method is called k-fold cross-validation and splits the training data into k folds to validate the model on one fold while training the model on the k−1 other folds, all of this k times. The error is then averaged over the k folds and is named cross-validation error.** + +⟶ +最常用到的方法叫做 k-fold 交叉驗證。它將訓練資料切成 k 份,在 k-1 份資料上進行訓練,而剩下的一份用來評估模型的效能,這樣的流程會重複 k 次次。最後計算出來的模型損失是 k 次結果的平均,又稱為交叉驗證損失值。 +
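The k-fold procedure of items 29 to 32, sketched as plain index splitting; the model fitting itself is left as a placeholder comment:

```python
import numpy as np

def k_fold_splits(n_samples, k=5, seed=0):
    """Yield (train_idx, val_idx) pairs: each fold is held out once for
    validation while the other k-1 folds are used for training."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n_samples), k)
    for i in range(k):
        val_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train_idx, val_idx

for train_idx, val_idx in k_fold_splits(n_samples=10, k=5):
    # a model would be fit on train_idx and scored on val_idx here; the
    # cross-validation error is the average of the k validation errors
    print(len(train_idx), "train /", len(val_idx), "validation")
```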
+ +33. **Regularization ― The regularization procedure aims at avoiding the model to overfit the data and thus deals with high variance issues. The following table sums up the different types of commonly used regularization techniques:** + +⟶ +正規化 - 正規化的目的是避免模型對訓練資料過擬合,藉此處理高變異的問題。底下的表格整理了常見的正規化技巧: +
<br>
+ +34. **[Shrinks coefficients to 0, Good for variable selection, Makes coefficients smaller, Tradeoff between variable selection and small coefficients]** + +⟶ +[將係數縮減為 0, 有利變數的選擇, 將係數變得更小, 在變數的選擇和小係數之間作權衡] +
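A rough sketch of how the penalties compared in items 33-34 (LASSO, ridge, elastic net) would be added to a least-squares cost; the function name and the 0.5/0.5 elastic-net mix are illustrative assumptions:

```python
import numpy as np

def regularized_cost(w, X, y, lam, kind="ridge"):
    data_term = 0.5 * np.sum((X @ w - y) ** 2)
    if kind == "lasso":          # L1: can shrink coefficients exactly to 0
        penalty = lam * np.sum(np.abs(w))
    elif kind == "ridge":        # L2: makes coefficients smaller
        penalty = lam * np.sum(w ** 2)
    else:                        # elastic net: trade-off between the two
        penalty = lam * (0.5 * np.sum(np.abs(w)) + 0.5 * np.sum(w ** 2))
    return data_term + penalty
```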
+ +35. **Diagnostics** + +⟶ +診斷 +
+ +36. **Bias ― The bias of a model is the difference between the expected prediction and the correct model that we try to predict for given data points.** + +⟶ +偏差 - 模型的偏差指的是在給定資料點上,模型預測的期望值與我們試圖預測的正確模型之間的差異 +
<br>
+ +37. **Variance ― The variance of a model is the variability of the model prediction for given data points.** + +⟶ +變異 - 變異指的是模型在預測資料時的變異程度 +
+ +38. **Bias/variance tradeoff ― The simpler the model, the higher the bias, and the more complex the model, the higher the variance.** + +⟶ +偏差/變異的權衡 - 越簡單的模型,偏差就越大。而越複雜的模型,變異就越大 +
+ +39. **[Symptoms, Regression illustration, classification illustration, deep learning illustration, possible remedies]** + +⟶ +[現象, 迴歸圖示, 分類圖示, 深度學習圖示, 可能的解法] +
+ +40. **[High training error, Training error close to test error, High bias, Training error slightly lower than test error, Very low training error, Training error much lower than test error, High variance]** + +⟶ +[訓練錯誤較高, 訓練錯誤和測試錯誤接近, 高偏差, 訓練誤差會稍微比測試誤差低, 訓練誤差很低, 訓練誤差比測試誤差低很多, 高變異] +
+ +41. **[Complexify model, Add more features, Train longer, Perform regularization, Get more data]** + +⟶ +[使用較複雜的模型, 增加更多特徵, 訓練更久, 採用正規化的方法, 取得更多資料] +
<br>
+ +42. **Error analysis ― Error analysis is analyzing the root cause of the difference in performance between the current and the perfect models.** + +⟶ +誤差分析 - 誤差分析指的是分析目前使用的模型和最佳模型之間差距的根本原因 +
+ +43. **Ablative analysis ― Ablative analysis is analyzing the root cause of the difference in performance between the current and the baseline models.** + +⟶ +銷蝕分析 (Ablative analysis) - 銷蝕分析指的是分析目前模型和基準模型之間差異的根本原因 +
From 890bb8c7f73a0ce89a711f8ce0be9dc3fa546922 Mon Sep 17 00:00:00 2001 From: Shervine Amidi Date: Sat, 12 Oct 2019 11:33:47 -0700 Subject: [PATCH 404/531] Rename cheatsheet-machine-learning-tips-and-tricks.md to cs-229-machine-learning-tips-and-tricks.md --- ...s-and-tricks.md => cs-229-machine-learning-tips-and-tricks.md} | 0 1 file changed, 0 insertions(+), 0 deletions(-) rename zh-tw/{cheatsheet-machine-learning-tips-and-tricks.md => cs-229-machine-learning-tips-and-tricks.md} (100%) diff --git a/zh-tw/cheatsheet-machine-learning-tips-and-tricks.md b/zh-tw/cs-229-machine-learning-tips-and-tricks.md similarity index 100% rename from zh-tw/cheatsheet-machine-learning-tips-and-tricks.md rename to zh-tw/cs-229-machine-learning-tips-and-tricks.md From 21422186906d9ebd011f4460b7b254f955486fec Mon Sep 17 00:00:00 2001 From: Shervine Amidi Date: Sat, 12 Oct 2019 11:37:42 -0700 Subject: [PATCH 405/531] Update zh-tw progress --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index b68d440c1..ebd1f4cec 100644 --- a/README.md +++ b/README.md @@ -75,7 +75,7 @@ Please make sure to propose the translation of **only one** cheatsheet per pull |**Українська**|not started|not started|not started|not started|done|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/95)| |**Tiếng Việt**|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/159)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/162)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/177)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/160)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/175)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/176)| |**简体中文**|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/12)|done|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/48)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/7)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/73)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/72)| -|**繁體中文**|done|done|done|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/137)|done|done| +|**繁體中文**|done|done|done|done|done|done| ### CS 230 (Deep Learning) | |[Convolutional Neural Networks](https://github.com/shervinea/cheatsheet-translation/blob/master/template/cs-230-convolutional-neural-networks.md)|[Recurrent Neural Networks](https://github.com/shervinea/cheatsheet-translation/blob/master/template/cs-230-recurrent-neural-networks.md)|[Deep Learning tips](https://github.com/shervinea/cheatsheet-translation/blob/master/template/cs-230-deep-learning-tips-and-tricks.md)| From 77aba952c133fcd76e59813d3341dace7eebec81 Mon Sep 17 00:00:00 2001 From: Shervine Amidi Date: Sat, 12 Oct 2019 11:50:38 -0700 Subject: [PATCH 406/531] Rename convolutional-neural-networks.md to cs-230-convolutional-neural-networks.md --- ...neural-networks.md => cs-230-convolutional-neural-networks.md} | 0 1 file changed, 0 insertions(+), 0 deletions(-) rename id/{convolutional-neural-networks.md => cs-230-convolutional-neural-networks.md} (100%) diff --git a/id/convolutional-neural-networks.md b/id/cs-230-convolutional-neural-networks.md similarity index 100% rename from id/convolutional-neural-networks.md rename to id/cs-230-convolutional-neural-networks.md From 140fb0e985b9d886eaaa88d47f9c82d252590a25 Mon Sep 17 00:00:00 2001 
From: Shervine Amidi Date: Sat, 12 Oct 2019 11:54:23 -0700 Subject: [PATCH 407/531] Add [id] contributors --- CONTRIBUTORS | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/CONTRIBUTORS b/CONTRIBUTORS index aacc57a73..db83fa3d0 100644 --- a/CONTRIBUTORS +++ b/CONTRIBUTORS @@ -81,6 +81,10 @@ --hi +--id + Prasetia Utama Putra (translation of convolutional neural networks) + Gunawan Tri (review of convolutional neural networks) + --ko Wooil Jeong (translation of machine learning tips and tricks) From f4b9617ea32a711c8ddf7ce6d63b3b99348f3443 Mon Sep 17 00:00:00 2001 From: Shervine Amidi Date: Sat, 12 Oct 2019 11:55:51 -0700 Subject: [PATCH 408/531] Update [id] progress --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index ebd1f4cec..379eb314e 100644 --- a/README.md +++ b/README.md @@ -90,7 +90,7 @@ Please make sure to propose the translation of **only one** cheatsheet per pull |**עִבְרִית**|not started|not started|not started| |**हिन्दी**|not started|not started|not started| |**Magyar**|not started|not started|not started| -|**Bahasa Indonesia**|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/155)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/152)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/153)| +|**Bahasa Indonesia**|done|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/152)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/153)| |**Italiano**|not started|not started|not started| |**日本語**|done|done|done| |**한국어**|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/109)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/107)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/108)| From f0948d708018811e1504f573b5a28a6dd7323f12 Mon Sep 17 00:00:00 2001 From: qunaieer Date: Tue, 15 Oct 2019 21:43:47 +0300 Subject: [PATCH 409/531] Update cheatsheet-machine-learning-tips-and-tricks.md Finished translation of the file: cheatsheet-machine-learning-tips-and-tricks.md --- ...tsheet-machine-learning-tips-and-tricks.md | 97 +++++++++---------- 1 file changed, 48 insertions(+), 49 deletions(-) diff --git a/ar/cheatsheet-machine-learning-tips-and-tricks.md b/ar/cheatsheet-machine-learning-tips-and-tricks.md index 9712297b8..b78162b6e 100644 --- a/ar/cheatsheet-machine-learning-tips-and-tricks.md +++ b/ar/cheatsheet-machine-learning-tips-and-tricks.md @@ -1,285 +1,284 @@ **1. Machine Learning tips and tricks cheatsheet** -⟶ +مرجع سريع لنصائح وحيل تعلّم الآلة
**2. Classification metrics** -⟶ +مقاييس التصنيف
**3. In a context of a binary classification, here are the main metrics that are important to track in order to assess the performance of the model.** -⟶ +في سياق التصنيف الثنائي، هذه هي المقاييس (metrics) الأساسية التي من المهم مراقبتها من أجل تقييم أداء النموذج.
<br>
**4. Confusion matrix ― The confusion matrix is used to have a more complete picture when assessing the performance of a model. It is defined as follows:** -⟶ +مصفوفة الدقّة (confusion matrix) - تستخدم مصفوفة الدقّة لأخذ تصور شامل عند تقييم أداء النموذج. وهي تعرّف كالتالي:
**5. [Predicted class, Actual class]** -⟶ +[التصنيف المتوقع، التصنيف الفعلي]
**6. Main metrics ― The following metrics are commonly used to assess the performance of classification models:** -⟶ +المقاييس الأساسية - المقاييس التالية تستخدم في العادة لتقييم أداء نماذج التصنيف:
**7. [Metric, Formula, Interpretation]** -⟶ +[المقياس، المعادلة، التفسير]
**8. Overall performance of model** -⟶ +الأداء العام للنموذج
**9. How accurate the positive predictions are** -⟶ +دقّة التوقعات الإيجابية (positive)
**10. Coverage of actual positive sample** -⟶ +تغطية عينات التوقعات الإيجابية الفعلية
**11. Coverage of actual negative sample** -⟶ +تغطية عينات التوقعات السلبية الفعلية
**12. Hybrid metric useful for unbalanced classes** -⟶ +مقياس هجين مفيد للأصناف غير المتوازنة (unbalanced)
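A minimal Python sketch of the confusion-matrix counts and the accuracy/precision/recall/F1 metrics of items 4-12; the toy label arrays are invented for the example:

```python
import numpy as np

def binary_metrics(y_true, y_pred):
    tp = np.sum((y_pred == 1) & (y_true == 1))
    tn = np.sum((y_pred == 0) & (y_true == 0))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    accuracy  = (tp + tn) / (tp + tn + fp + fn)          # overall performance
    precision = tp / (tp + fp)                           # how accurate positive predictions are
    recall    = tp / (tp + fn)                           # coverage of actual positive samples
    f1 = 2 * precision * recall / (precision + recall)   # hybrid metric for unbalanced classes
    return accuracy, precision, recall, f1

y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])
y_pred = np.array([1, 0, 0, 1, 0, 1, 1, 0])
print(binary_metrics(y_true, y_pred))
```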
**13. ROC ― The receiver operating curve, also noted ROC, is the plot of TPR versus FPR by varying the threshold. These metrics are are summed up in the table below:** -⟶ - +منحنى دقّة الأداء (ROC) - منحنى دقّة الآداء، ويطلق عليه ROC، هو رسمة لمعدل التصنيفات الإيجابية الصحيحة (TPR) مقابل معدل التصنيفات الإيجابية الخاطئة (FPR) باستخدام قيم حد (threshold) متغيرة. هذه المقاييس ملخصة في الجدول التالي:
**14. [Metric, Formula, Equivalent]** -⟶ +[المقياس، المعادلة، مرادف]
**15. AUC ― The area under the receiving operating curve, also noted AUC or AUROC, is the area below the ROC as shown in the following figure:** -⟶ +المساحة تحت منحنى دقة الأداء (المساحة تحت المنحنى) (AUC) - المساحة تحت منحنى دقة الأداء (المساحة تحت المنحنى)، ويطلق عليها AUC أو AUROC، هي المساحة تحت ROC كما هو موضح في الرسمة التالية:
**16. [Actual, Predicted]** -⟶ +[الفعلي، المتوقع]
**17. Basic metrics ― Given a regression model f, the following metrics are commonly used to assess the performance of the model:** -⟶ +المقاييس الأساسية - إذا كان لدينا نموذج الانحدار f، فإن المقاييس التالية غالباً ما تستخدم لتقييم أداء النموذج:
**18. [Total sum of squares, Explained sum of squares, Residual sum of squares]** -⟶ +[المجموع الكلي للمربعات، مجموع المربعات المُفسَّر، مجموع المربعات المتبقي]
**19. Coefficient of determination ― The coefficient of determination, often noted R2 or r2, provides a measure of how well the observed outcomes are replicated by the model and is defined as follows:** -⟶ +مُعامل التحديد (Coefficient of determination) - مُعامل التحديد، وغالباً يرمز له بـ R2 أو r2، يعطي قياس لمدى مطابقة النموذج للنتائج الملحوظة، ويعرف كما يلي:
**20. Main metrics ― The following metrics are commonly used to assess the performance of regression models, by taking into account the number of variables n that they take into consideration:** -⟶ +المقاييس الرئيسية - المقاييس التالية تستخدم غالباً لتقييم أداء نماذج الانحدار، وذلك بأن يتم الأخذ في الحسبان عدد المتغيرات n المستخدمة فيها:
**21. where L is the likelihood and ˆσ2 is an estimate of the variance associated with each response.** -⟶ +حيث L هو الأرجحية، و ˆσ2 تقدير التباين الخاص بكل نتيجة.
**22. Model selection** -⟶ +اختيار النموذج
**23. Vocabulary ― When selecting a model, we distinguish 3 different parts of the data that we have as follows:** -⟶ +مفردات - عند اختيار النموذج، نفرق بين 3 أجزاء من البيانات التي لدينا كالتالي:
**24. [Training set, Validation set, Testing set]** -⟶ +[مجموعة تدريب، مجموعة تحقق، مجموعة اختبار]
**25. [Model is trained, Model is assessed, Model gives predictions]** -⟶ +[يتم تدريب النموذج، يتم تقييم النموذج، النموذج يعطي التوقعات]
**26. [Usually 80% of the dataset, Usually 20% of the dataset]** -⟶ +[غالباً 80% من مجموعة البيانات، غالباً 20% من مجموعة البيانات]
**27. [Also called hold-out or development set, Unseen data]** -⟶ +[يطلق عليها كذلك المجموعة المُجنّبة أو مجموعة التطوير، بيانات لم يسبق رؤيتها من قبل]
**28. Once the model has been chosen, it is trained on the entire dataset and tested on the unseen test set. These are represented in the figure below:** -⟶ +بمجرد اختيار النموذج، يتم تدريبه على مجموعة البيانات بالكامل ثم يتم اختباره على مجموعة اختبار لم يسبق رؤيتها من قبل. كما هو موضح في الشكل التالي:
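A small sketch of the hold-out split described in items 24-28, assuming a plain 80%/20% index shuffle (the function name and fraction are illustrative, not a specific library API):

```python
import numpy as np

def holdout_split_idx(n, held_out_frac=0.2, seed=0):
    # Shuffle the indices and keep the last held_out_frac of them aside
    # (validation or unseen test portion, depending on the stage).
    idx = np.random.default_rng(seed).permutation(n)
    n_out = int(held_out_frac * n)
    return idx[n_out:], idx[:n_out]   # training indices, held-out indices

train_idx, held_idx = holdout_split_idx(10)
print(train_idx, held_idx)
```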
**29. Cross-validation ― Cross-validation, also noted CV, is a method that is used to select a model that does not rely too much on the initial training set. The different types are summed up in the table below:** -⟶ +التحقق المتقاطع (Cross-validation) - التحقق المتقاطع، وكذلك يختصر بـ CV، هو طريقة تستخدم لاختيار نموذج بحيث لا يعتمد بشكل كبير على مجموعة بيانات التدريب المبدأية. أنواع التحقق المتقاطع المختلفة ملخصة في الجدول التالي:
**30. [Training on k−1 folds and assessment on the remaining one, Training on n−p observations and assessment on the p remaining ones]** -⟶ +[التدريب على k-1 جزء والتقييم باستخدام الجزء الباقي، التدريب على n−p عينة والتقييم باستخدام الـ p عينات المتبقية]
**31. [Generally k=5 or 10, Case p=1 is called leave-one-out]** -⟶ +[بشكل غالب k=5 أو 10، الحالة p=1 يطلق عليها الإبقاء على واحد (leave-one-out)]
**32. The most commonly used method is called k-fold cross-validation and splits the training data into k folds to validate the model on one fold while training the model on the k−1 other folds, all of this k times. The error is then averaged over the k folds and is named cross-validation error.** -⟶ +الطريقة الأكثر استخداماً يطلق عليها التحقق المتقاطع س جزء/أجزاء (k-fold)، ويتم فيها تقسيم البيانات إلى k جزء، بحيث يتم تدريب النموذج باستخدام k−1 والتحقق باستخدام الجزء المتبقي، ويتم تكرار ذلك k مرة. يتم بعد ذلك حساب معدل الأخطاء في الأجزاء k ويسمى خطأ التحقق المتقاطع.
**33. Regularization ― The regularization procedure aims at avoiding the model to overfit the data and thus deals with high variance issues. The following table sums up the different types of commonly used regularization techniques:** -⟶ +ضبط (Regularization) - عمليه الضبط تهدف إلى تفادي فرط التخصيص (overfit) للنموذج، وهو بذلك يتعامل مع مشاكل التباين العالي. الجدول التالي يلخص أنواع وطرق الضبط الأكثر استخداماً:
**34. [Shrinks coefficients to 0, Good for variable selection, Makes coefficients smaller, Tradeoff between variable selection and small coefficients]** -⟶ +[يقلص المُعاملات إلى 0، جيد لاختيار المتغيرات، يجعل المُعاملات أصغر، المفاضلة بين اختيار المتغيرات والمُعاملات الصغيرة]
**35. Diagnostics** -⟶ +التشخيصات
**36. Bias ― The bias of a model is the difference between the expected prediction and the correct model that we try to predict for given data points.** -⟶ +الانحياز (Bias) - الانحياز للنموذج هو الفرق بين التنبؤ المتوقع والنموذج الحقيقي الذي نحاول تنبؤه للبيانات المعطاة.
**37. Variance ― The variance of a model is the variability of the model prediction for given data points.** -⟶ +التباين (Variance) - تباين النموذج هو مقدار التغير في تنبؤ النموذج لنقاط البيانات المعطاة.
**38. Bias/variance tradeoff ― The simpler the model, the higher the bias, and the more complex the model, the higher the variance.** -⟶ +موازنة الانحياز/التباين (Bias/variance tradeoff) - كلما زادت بساطة النموذج، زاد الانحياز، وكلما زاد تعقيد النموذج، زاد التباين.
**39. [Symptoms, Regression illustration, classification illustration, deep learning illustration, possible remedies]** -⟶ +[الأعراض، توضيح الانحدار، توضيح التصنيف، توضيح التعلم العميق، العلاجات الممكنة]
**40. [High training error, Training error close to test error, High bias, Training error slightly lower than test error, Very low training error, Training error much lower than test error, High variance]** -⟶ +[خطأ التدريب عالي، خطأ التدريب قريب من خطأ الاختبار، انحياز عالي، خطأ التدريب أقل بقليل من خطأ الاختبار، خطأ التدريب منخفض جداً، خطأ التدريب أقل بكثير من خطأ الاختبار، تباين عالي]
**41. [Complexify model, Add more features, Train longer, Perform regularization, Get more data]** -⟶ +[زيادة تعقيد النموذج، إضافة المزيد من الخصائص، تدريب لمدة أطول، إجراء الضبط (regularization)، الحصول على المزيد من البيانات]
**42. Error analysis ― Error analysis is analyzing the root cause of the difference in performance between the current and the perfect models.** -⟶ +تحليل الخطأ - تحليل الخطأ هو تحليل السبب الرئيسي للفرق في الأداء بين النماذج الحالية والنماذج المثالية.
**43. Ablative analysis ― Ablative analysis is analyzing the root cause of the difference in performance between the current and the baseline models.** -⟶ +تحليل استئصالي (Ablative analysis) - التحليل الاستئصالي هو تحليل السبب الرئيسي للفرق في الأداء بين النماذج الحالية والنماذج المبدئية (baseline).
**44. Regression metrics** -⟶ +مقاييس الانحدار
**45. [Classification metrics, confusion matrix, accuracy, precision, recall, F1 score, ROC]** -⟶ +[مقاييس التصنيف، مصفوفة الدقّة، الضبط (accuracy)، الدقة (precision)، الاستدعاء (recall)، درجة F1]
**46. [Regression metrics, R squared, Mallow's CP, AIC, BIC]** -⟶ +[مقاييس الانحدار، مربع R، معيار معامل مالوس (Mallow's)، معيار آكياك المعلوماتي (AIC)، معيار المعلومات البايزي (BIC)]
**47. [Model selection, cross-validation, regularization]** -⟶ +[اختيار النموذج، التحقق المتقاطع، الضبط]
**48. [Diagnostics, Bias/variance tradeoff, error/ablative analysis]** -⟶ +[التشخيصات، موازنة الانحياز/التباين، تحليل الخطأ/التحليل الاستئصالي] From e37040b72d332e160e7c7a2e2509df7362039530 Mon Sep 17 00:00:00 2001 From: qunaieer Date: Tue, 15 Oct 2019 22:00:32 +0300 Subject: [PATCH 410/531] Update cheatsheet-supervised-learning.md modifications of cheatsheet-supervised-learning.md file after @zaidalyafeai comments. --- ar/cheatsheet-supervised-learning.md | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/ar/cheatsheet-supervised-learning.md b/ar/cheatsheet-supervised-learning.md index 1d387b05e..58bc6aeaa 100644 --- a/ar/cheatsheet-supervised-learning.md +++ b/ar/cheatsheet-supervised-learning.md @@ -67,7 +67,7 @@ **12. Loss function ― A loss function is a function L:(z,y)∈R×Y⟼L(z,y)∈R that takes as inputs the predicted value z corresponding to the real data value y and outputs how different they are. The common loss functions are summed up in the table below:** -دالة الخسارة (Loss function) - دالة الخسارة هي الدالة L:(z,y)∈R×Y⟼L(z,y)∈R التي تأخذ كمدخلات القيمة المتوقعة z والقيمة الحقيقية y وتعطينا الفرق بينهما. الجدول التالي يحتوي على بعض دوال الخسارة الشائعة: +دالة الخسارة (Loss function) - دالة الخسارة هي الدالة L:(z,y)∈R×Y⟼L(z,y)∈R التي تأخذ كمدخلات القيمة المتوقعة z والقيمة الحقيقية y وتعطينا الاختلاف بينهما. الجدول التالي يحتوي على بعض دوال الخسارة الشائعة:
@@ -97,7 +97,7 @@ **17. Remark: Stochastic gradient descent (SGD) is updating the parameter based on each training example, and batch gradient descent is on a batch of training examples.** -ملاحظة: في النزول الاشتقاقي العشوائي (Stochastic gradient descent (SGD)) يتم تحديث المُدخلات (parameters) بناءاً على كل عينة تدريب على حدة، بينما في النزول الاشتقاقي الحُزَمي (batch gradient descent) يتم تحديثها باستخدام حُزَم من عينات التدريب. +ملاحظة: في النزول الاشتقاقي العشوائي (Stochastic gradient descent (SGD)) يتم تحديث المُعاملات (parameters) بناءاً على كل عينة تدريب على حدة، بينما في النزول الاشتقاقي الحُزَمي (batch gradient descent) يتم تحديثها باستخدام حُزَم من عينات التدريب.
@@ -289,7 +289,7 @@ **49. In practice, the kernel K defined by K(x,z)=exp(−||x−z||22σ2) is called the Gaussian kernel and is commonly used.** -في التطبيق، يمكن أن تُعَرَّف الدالة K عن طريق المعادلة K(x,z)=exp(−||x−z||22σ2)، ويطلق عليها النواة الجاوسية (Gaussian kernel)، وهي تستخدم بكثرة. +عملياً، يمكن أن تُعَرَّف الدالة K عن طريق المعادلة K(x,z)=exp(−||x−z||22σ2)، ويطلق عليها النواة الجاوسية (Gaussian kernel)، وهي تستخدم بكثرة.
@@ -493,7 +493,7 @@ **83. Shattering ― Given a set S={x(1),...,x(d)}, and a set of classifiers H, we say that H shatters S if for any set of labels {y(1),...,y(d)}, we have:** -مجموعة تكسيرية (Shattering Set) - إذا كان لدينا المجموعة S={x(1),...,x(d)}، ومجموعة مُصنٍّفات H، نقول أن H shatters S إذا كان لكل مجموعة علامات (labels) {y(1),...,y(d)} لدينا: +مجموعة تكسيرية (Shattering Set) - إذا كان لدينا المجموعة S={x(1),...,x(d)}، ومجموعة مُصنٍّفات H، نقول أن H تكسر S (H shatters S) إذا كان لكل مجموعة علامات (labels) {y(1),...,y(d)} لدينا:
@@ -505,7 +505,7 @@ **85. VC dimension ― The Vapnik-Chervonenkis (VC) dimension of a given infinite hypothesis class H, noted VC(H) is the size of the largest set that is shattered by H.** -بُعْد فابنيك-تشرفونيكس (Vapnik-Chervonenkis - VC) لفئة فرضية غير محدودة (infinite hypothesis class) H، ويرمز له بـ VC(H)، هو حجم أكبر مجموعة (set) التي shattered by H. +بُعْد فابنيك-تشرفونيكس (Vapnik-Chervonenkis - VC) لفئة فرضية غير محدودة (infinite hypothesis class) H، ويرمز له بـ VC(H)، هو حجم أكبر مجموعة (set) التي تم تكسيرها بواسطة H (shattered by H).
From f800ea91efbf754607fd8d73b8312f6313ef0dd5 Mon Sep 17 00:00:00 2001 From: Mahmoud Aslan Date: Wed, 16 Oct 2019 08:27:36 +0300 Subject: [PATCH 411/531] Create cs-229-probability.md --- ar/cs-229-probability.md | 384 +++++++++++++++++++++++++++++++++++++++ 1 file changed, 384 insertions(+) create mode 100644 ar/cs-229-probability.md diff --git a/ar/cs-229-probability.md b/ar/cs-229-probability.md new file mode 100644 index 000000000..e5b96b3d4 --- /dev/null +++ b/ar/cs-229-probability.md @@ -0,0 +1,384 @@ +**Probabilities and Statistics translation** [[webpage]](https://stanford.edu/~shervine/teaching/cs-229/refresher-probabilities-statistics) + +
+ +**1. Probabilities and Statistics refresher** +
+مراجعة للاحتمالات والإحصاء +
+
+ +**2. Introduction to Probability and Combinatorics** +
+مقدمة في الاحتمالات والتوافيق +
+
+ +**3. Sample space ― The set of all possible outcomes of an experiment is known as the sample space of the experiment and is denoted by S.** +
+فضاء العينة ― يعرَّف فضاء العينة لتجربة ما بمجموعة كل النتائج الممكنة لهذه التجربة ويرمز لها بـ S. +
+
+ +**4. Event ― Any subset E of the sample space is known as an event. That is, an event is a set consisting of possible outcomes of the experiment. If the outcome of the experiment is contained in E, then we say that E has occurred.** +
+الحدث ― أي مجموعة جزئية E من فضاء العينة تعتبر حدثاً. أي، الحدث هو مجموعة من النتائج الممكنة للتجربة. إذا كانت نتيجة التجربة محتواة في E، عندها نقول أن الحدث E وقع. +
+
+ +**5. Axioms of probability For each event E, we denote P(E) as the probability of event E occuring.** +
+مسلَّمات الاحتمالات. من أجل كل حدث E، نرمز لإحتمال وقوعه بـ P(E). +
+
+ +**6. Axiom 1 ― Every probability is between 0 and 1 included, i.e:** +
+المسلَّمة 1 ― كل احتمال يأخد قيماً بين الـ 0 والـ 1 مضمَّنة، على سبيل المثال: +
+
+ +**7. Axiom 2 ― The probability that at least one of the elementary events in the entire sample space will occur is 1, i.e:** +
+المسلَّمة 2 ― احتمال وقوع حدث ابتدائي واحد على الأقل من الأحداث الابتدائية في فضاء العينة يساوي الـ 1، على سبيل المثال: +
+
+ +**8. Axiom 3 ― For any sequence of mutually exclusive events E1,...,En, we have:** +
+المسلَّمة 3 ― من أجل أي سلسلة من الأحداث الغير متداخلة E1,...,En، لدينا: +
+
+ +**9. Permutation ― A permutation is an arrangement of r objects from a pool of n objects, in a given order. The number of such arrangements is given by P(n,r), defined as:** +
+التباديل ― التبديل هو عبارة عن ترتيب معين لـ r غرض مختارة من مجموعة من n غرض. عدد هكذا تراتيب يرمز له بـ P(n, r)، المعرف كالتالي:
+
+ +**10. Combination ― A combination is an arrangement of r objects from a pool of n objects, where the order does not matter. The number of such arrangements is given by C(n,r), defined as:** +
+التوافيق ― التوفيق هو اختيار لـ r غرض من مجموعة مكونة من n غرض بدون إعطاء الترتيب أية أهمية. عدد هكذا توافيق يرمز له بـ C(n, r)، المعرف كالتالي: +
+
+ +**11. Remark: we note that for 0⩽r⩽n, we have P(n,r)⩾C(n,r)** +
+ملاحظة: من أجل 0⩽r⩽n، يكون لدينا P(n,r)⩾C(n,r) +
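A quick numeric check of P(n,r) and C(n,r) from items 9-11 using Python's standard library (Python 3.8+); n=5 and r=3 are arbitrary example values:

```python
from math import comb, perm

n, r = 5, 3
print(perm(n, r))                 # P(n, r) = n! / (n - r)!        -> 60
print(comb(n, r))                 # C(n, r) = n! / (r! (n - r)!)   -> 10
print(perm(n, r) >= comb(n, r))   # P(n, r) >= C(n, r)             -> True
```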
+
+ +**12. Conditional Probability** +
+الاحتمال الشرطي +
+
+ +**13. Bayes' rule ― For events A and B such that P(B)>0, we have:** +
+قاعدة بايز ― من أجل الأحداث A و B بحيث P(B)>0، يكون لدينا: +
+
+ +**14. Remark: we have P(A∩B)=P(A)P(B|A)=P(A|B)P(B)** +
+ملاحظة: لدينا P(A∩B)=P(A)P(B|A)=P(A|B)P(B) +
+
+ +**15. Partition ― Let {Ai,i∈[[1,n]]} be such that for all i, Ai≠∅. We say that {Ai} is a partition if we have:** +
+القسم ― ليكن {Ai,i∈[[1,n]]} بحيث من أجل كل i، لديناAi≠∅ . نقول أن {Ai} قسم إذا كان لدينا: +
+
+ +**16. Remark: for any event B in the sample space, we have P(B)=n∑i=1P(B|Ai)P(Ai).** +
+ملاحظة: من أجل أي حدث B من فضاء العينة، لدينا P(B)=n∑i=1P(B|Ai)P(Ai). +
+
+ +**17. Extended form of Bayes' rule ― Let {Ai,i∈[[1,n]]} be a partition of the sample space. We have:** +
+النسخة الموسعة من قاعدة بايز ― ليكن {Ai,i∈[[1,n]]} قسم من فضاء العينة. لدينا: +
+
+ +**18. Independence ― Two events A and B are independent if and only if we have:** +
+الاستقلال ― يكون حدثين A و B مستقلين إذا وفقط إذا كان لدينا: +
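A small numeric illustration of Bayes' rule and its extended form over the partition {A, Ac} (items 13-17); the probabilities below are invented for the example:

```python
p_a = 0.01                 # P(A)
p_b_given_a = 0.95         # P(B|A)
p_b_given_not_a = 0.05     # P(B|Ac)

# Extended form: P(B) = P(B|A)P(A) + P(B|Ac)P(Ac)
p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)

# Bayes' rule: P(A|B) = P(B|A)P(A) / P(B)
p_a_given_b = p_b_given_a * p_a / p_b
print(round(p_a_given_b, 3))   # 0.161
```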
+
+ +**19. Random Variables** +
+المتحولات العشوائية +
+
+ +**20. Definitions** +
+تعاريف +
+
+ +**21. Random variable ― A random variable, often noted X, is a function that maps every element in a sample space to a real line.** +
+المتحول العشوائي ― المتحول العشوائي، المرمز له عادة ب X، هو دالة تربط كل عنصر من فضاء العينة إلى خط الأعداد الحقيقية. +
+
+ +**22. Cumulative distribution function (CDF) ― The cumulative distribution function F, which is monotonically non-decreasing and is such that limx→−∞F(x)=0 and limx→+∞F(x)=1, is defined as:** +
+دالة التوزيع التراكمي (CDF) ― تعرف دالة التوزيع التراكمي F، والتي تكون غير متناقصة بشكل دائم وتحقق limx→−∞F(x)=0 و limx→+∞F(x)=1، كالتالي: +
+
+ +**23. Remark: we have P(a<X⩽B)=F(b)−F(a).** +ملاحظة: لدينا P(a<X⩽B)=F(b)−F(a). + +
<br>
+ +**24. Probability density function (PDF) ― The probability density function f is the probability that X takes on values between two adjacent realizations of the random variable.** +
+دالة الكثافة الإحتمالية (PDF) ― دالة الكثافة الاحتمالية f هي احتمال أن يأخذ X قيماً بين قيمتين متجاورتين من قيم المتحول العشوائي. +
+
+ +**25. Relationships involving the PDF and CDF ― Here are the important properties to know in the discrete (D) and the continuous (C) cases.** +
+علاقات تتضمن دالة الكثافة الاحتمالية ودالة التوزع التراكمي ― هذه بعض الخصائص التي من المهم معرفتها في الحالتين المتقطعة (D) والمستمرة (C). +
+
+ +**26. [Case, CDF F, PDF f, Properties of PDF]** +
+[الحالة، دالة التوزع التراكمي F، دالة الكثافة الاحتمالية f، خصائص دالة الكثافة الاحتمالية] +
+
+ +**27. Expectation and Moments of the Distribution ― Here are the expressions of the expected value E[X], generalized expected value E[g(X)], kth moment E[Xk] and characteristic function ψ(ω) for the discrete and continuous cases:** +
+التوقع وعزوم التوزيع ― فيما يلي المصطلحات المستخدمة للتعبير عن القيمة المتوقعة E[X]، الصيغة العامة للقيمة المتوقعة E[g(X)]، العزم رقم K E[XK] ودالة السمة ψ(ω) من أجل الحالات المتقطعة والمستمرة: +
+
+ +**28. Variance ― The variance of a random variable, often noted Var(X) or σ2, is a measure of the spread of its distribution function. It is determined as follows:** +
+التباين ― تباين متحول عشوائي، والذي يرمز له عادةً ب Var(X) أو σ2، هو مقياس لانتشار دالة توزيع هذا المتحول. يحسب بالشكل التالي: +
+
+ +**29. Standard deviation ― The standard deviation of a random variable, often noted σ, is a measure of the spread of its distribution function which is compatible with the units of the actual random variable. It is determined as follows:** +
+الانحراف المعياري ― الانحراف المعياري لمتحول عشوائي، والذي يرمز له عادةً ب σ، هو مقياس لانتشار دالة توزيع هذا المتحول بما يتوافق مع وحدات قياس المتحول العشوائي. يحسب بالشكل التالي: +
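A minimal sketch of E[X], Var(X) and the standard deviation of items 27-29 for a discrete random variable; the mass function below is an arbitrary example:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])    # values taken by X
p = np.array([0.1, 0.2, 0.3, 0.4])    # P(X = x), sums to 1

mean = np.sum(x * p)                  # E[X]
var = np.sum((x - mean) ** 2 * p)     # Var(X) = E[(X - E[X])^2]
std = np.sqrt(var)                    # standard deviation
print(mean, var, std)                 # 3.0 1.0 1.0
```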
+
+ +**30. Transformation of random variables ― Let the variables X and Y be linked by some function. By noting fX and fY the distribution function of X and Y respectively, we have:** +
+تحويل المتحولات العشوائية ― لتكن المتحولات العشوائية X وY مرتبطة من خلال دالة ما. باعتبار fX وfY دالتا التوزيع لX وY على التوالي، يكون لدينا:
+
+ +**31. Leibniz integral rule ― Let g be a function of x and potentially c, and a,b boundaries that may depend on c. We have:** +
+قاعدة لايبنتز للتكامل ― لتكن g دالة لـ x وربما لـ c، ولتكن a وb حدود قد تعتمد على c. يكون لدينا: +
+
+ +**32. Probability Distributions** +
+التوزعات الاحتمالية +
+
+ +**33. Chebyshev's inequality ― Let X be a random variable with expected value μ. For k,σ>0, we have the following inequality:** +
+متراجحة تشيبشيف ― ليكن X متحولاً عشوائياً قيمته المتوقعة تساوي μ. من أجل k ،σ>0، لدينا المتراجحة التالية: +
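A quick simulation that checks Chebyshev's inequality (item 33) against an exponential distribution with mean and standard deviation 1; the sample size and k are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.exponential(scale=1.0, size=100_000)   # mean 1, standard deviation 1
k = 2.0
lhs = np.mean(np.abs(x - 1.0) >= k * 1.0)      # empirical P(|X - mu| >= k*sigma)
print(lhs, "<=", 1 / k ** 2)                   # about 0.05 <= 0.25
```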
+
+ +**34. Main distributions ― Here are the main distributions to have in mind:** +
+التوزعات الأساسية ― فيما يلي التوزعات الأساسية لأخذها بالاعتبار: +
+
+ +**35. [Type, Distribution]** +
+[الحالة، التوزع] +
+
+ +**36. Jointly Distributed Random Variables** +
+المتحولات العشوائية الموزعة بشكل مشترك +
+
+ +**37. Marginal density and cumulative distribution ― From the joint density probability function fXY , we have** +
+الكثافة الهامشية والتوزع التراكمي ― من دالة الكثافة الاحتمالية المشتركة fXY، لدينا: +
+
+ +**38. [Case, Marginal density, Cumulative function]** +
+[الحالة، الكثافة الهامشية، الدالة التراكمية] +
+
+ +**39. Conditional density ― The conditional density of X with respect to Y, often noted fX|Y, is defined as follows:** +
+الكثافة الشرطية ― الكثافة الشرطية لـ X بالنسبة لـ Y، والتي يرمز لها عادةً بـ fX|Y، تعرف بالشكل التالي: +
+
+ +**40. Independence ― Two random variables X and Y are said to be independent if we have:** +
+الاستقلال ― يقال عن متحولين عشوائيين X و Y أنهما مستقلين إذا كان لدينا: +
+
+ +**41. Covariance ― We define the covariance of two random variables X and Y, that we note σ2XY or more commonly Cov(X,Y), as follows:** +
+التغاير ― نعرف تغاير متحولين عشوائيين X و Y، والذي نرمز له بـ σ2XY أو بالرمز الأكثر شيوعاً Cov(X,Y)، كالتالي: +
+
+ +**42. Correlation ― By noting σX,σY the standard deviations of X and Y, we define the correlation between the random variables X and Y, noted ρXY, as follows:** +
+الارتباط ― بأخذ σX، σY كانحراف معياري لـ X و Y، نعرف الارتباط بين المتحولات العشوائية X و Y، و المرمز بـ ρXY، كالتالي: +
+
+ +**43. Remark 1: we note that for any random variables X,Y, we have ρXY∈[−1,1].** +
+ملاحظة 1: من أجل أية متحولات عشوائية X، Y، لدينا ρXY∈[−1,1]. +
+
+ +**44. Remark 2: If X and Y are independent, then ρXY=0.** +
+ملاحظة 2: إذا كان X و Y مستقلين، فإن ρXY=0. +
+
+ +**45. Parameter estimation** +
+تقدير المُدخَل +
+
+ +**46. Definitions** +
+تعاريف +
+
+ +**47. Random sample ― A random sample is a collection of n random variables X1,...,Xn that are independent and identically distributed with X.** +
+العينة العشوائية ― العينة العشوائية هي مجموعة من n متحول عشوائي X1,...,Xn والتي تكون مستقلة وموزعة بشكل متطابق مع X. +
+
+ +**48. Estimator ― An estimator is a function of the data that is used to infer the value of an unknown parameter in a statistical model.** +
+المُقَدِّر ― المُقَدِّر هو تابع للبيانات المستخدمة لاستنباط قيمة متحول غير معلوم ضمن نموذج إحصائي. +
+
+ +**49. Bias ― The bias of an estimator ^θ is defined as being the difference between the expected value of the distribution of ^θ and the true value, i.e.:** +
+الانحياز ― انحياز مُقَدِّر ^θ هو الفرق بين القيمة المتوقعة لتوزع ^θ والقيمة الحقيقية، كمثال: +
+
+ +**50. Remark: an estimator is said to be unbiased when we have E[^θ]=θ.** +
+ملاحظة: يقال عن مُقَدِّر أنه غير منحاز عندما يكون لدينا E[^θ]=θ. +
+
+ +**51. Estimating the mean** +
+تقدير المتوسط +
+
+ +**52. Sample mean ― The sample mean of a random sample is used to estimate the true mean μ of a distribution, is often noted ¯¯¯¯¯X and is defined as follows:** +
+متوسط العينة ― يستخدم متوسط عينة عشوائية لتقدير المتوسط الحقيقي μ لتوزع ما، عادةً ما يرمز له بـ ¯¯¯¯¯X ويعرف كالتالي: +
+
+ +**53. Remark: the sample mean is unbiased, i.e E[¯¯¯¯¯X]=μ.** +
+ملاحظة: متوسط العينة غير منحاز، أي E[¯¯¯¯¯X]=μ. +
+
+ +**54. Central Limit Theorem ― Let us have a random sample X1,...,Xn following a given distribution with mean μ and variance σ2, then we have:** +
+مبرهنة النهاية المركزية ― ليكن لدينا عينة عشوائية X1,...,Xn والتي تتبع لتوزع معطى له متوسط μ وتباين σ2، فيكون: +
+
+ +**55. Estimating the variance** +
+تقدير التباين +
+
+ +**56. Sample variance ― The sample variance of a random sample is used to estimate the true variance σ2 of a distribution, is often noted s2 or ^σ2 and is defined as follows:** +
+تباين العينة ― يستخدم تباين عينة عشوائية لتقدير التباين الحقيقي σ2 لتوزع ما، والذي يرمز له عادةً بـ s2 أو ^σ2 ويعرّف بالشكل التالي: +
+
+ +**57. Remark: the sample variance is unbiased, i.e E[s2]=σ2.** +
+ملاحظة: تباين العينة غير منحاز، أي E[s2]=σ2. +
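A minimal sketch of the sample mean and the unbiased sample variance of items 52-57; the data points are made up:

```python
import numpy as np

x = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
x_bar = x.mean()                                # sample mean, estimates mu
s2 = np.sum((x - x_bar) ** 2) / (len(x) - 1)    # unbiased sample variance (n - 1 in the denominator)
print(x_bar, s2)                                # 5.0 4.57...
```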
+
+ +**58. Chi-Squared relation with sample variance ― Let s2 be the sample variance of a random sample. We have:** +
+علاقة مربع كاي مع تباين العينة ― ليكن s2 تباين العينة لعينة عشوائية. لدينا: +
+
+ +**59. [Introduction, Sample space, Event, Permutation]** +
+[مقدمة، فضاء العينة، الحدث، التبديل] +
+
+ +**60. [Conditional probability, Bayes' rule, Independence]** +
+[الاحتمال الشرطي، قاعدة بايز، الاستقلال] +
+
+ +**61. [Random variables, Definitions, Expectation, Variance]** +
+[المتحولات العشوائية، تعاريف، القيمة المتوقعة، التباين] +
+
+ +**62. [Probability distributions, Chebyshev's inequality, Main distributions]** +
+[التوزعات الاحتمالية، متراجحة تشيبشيف، توزعات رئيسية] +
+
+ +**63. [Jointly distributed random variables, Density, Covariance, Correlation]** +
+[المتحولات العشوائية الموزعة بشكل مشترك، الكثافة، التغاير، الارتباط] +
+
+ +**64. [Parameter estimation, Mean, Variance]** +
+[تقدير المُدخَل، المتوسط، التباين] +
From 66d19a5e0cf01af768dba62112fb2b07e11631a9 Mon Sep 17 00:00:00 2001 From: Shervine Amidi Date: Tue, 15 Oct 2019 22:55:08 -0700 Subject: [PATCH 412/531] Update [ar] progress files --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index 379eb314e..ba299d551 100644 --- a/README.md +++ b/README.md @@ -54,7 +54,7 @@ Please make sure to propose the translation of **only one** cheatsheet per pull ### CS 229 (Machine Learning) | |[Deep learning](https://github.com/shervinea/cheatsheet-translation/blob/master/template/cs-229-deep-learning.md)|[Supervised](https://github.com/shervinea/cheatsheet-translation/blob/master/template/cs-229-supervised-learning.md)|[Unsupervised](https://github.com/shervinea/cheatsheet-translation/blob/master/template/cs-229-unsupervised-learning.md)|[ML tips](https://github.com/shervinea/cheatsheet-translation/blob/master/template/cs-229-machine-learning-tips-and-tricks.md)|[Probabilities](https://github.com/shervinea/cheatsheet-translation/blob/master/template/cs-229-probability.md)|[Algebra](https://github.com/shervinea/cheatsheet-translation/blob/master/template/cs-229-linear-algebra.md)| |:---|:---:|:---:|:---:|:---:|:---:|:---:| -|**العَرَبِيَّة**|done|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/87)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/88)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/83)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/89)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/85)| +|**العَرَبِيَّة**|done|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/87)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/88)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/83)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/182)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/85)| |**Català**|not started|not started|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/47)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/47)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/47)| |**Deutsch**|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/106)|not started|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/135)|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/136)| |**Español**|done|done|done|done|done|done| From f8345c1dcac3835ccb7135eba6cc0e172eed28dc Mon Sep 17 00:00:00 2001 From: tuananhhedspibk Date: Wed, 16 Oct 2019 22:47:43 +0900 Subject: [PATCH 413/531] fix line 309 of cheatsheet-deep-learning --- vi/cheatsheet-deep-learning.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/vi/cheatsheet-deep-learning.md b/vi/cheatsheet-deep-learning.md index e34e5eb70..e03a3f0ca 100644 --- a/vi/cheatsheet-deep-learning.md +++ b/vi/cheatsheet-deep-learning.md @@ -306,7 +306,7 @@ **52. [Convolutional Neural Networks, Convolutional layer, Batch normalization]** -⟶ [Mạng neural tích chập, Tầng chập, Chuẩn hoá batch] +⟶ [Mạng neural tích chập, Tầng tích chập, Chuẩn hoá batch]
From 7e9b06bd649e3ab018f11100a2ec1cf4a6b2b598 Mon Sep 17 00:00:00 2001 From: tuananhhedspibk Date: Wed, 16 Oct 2019 23:47:54 +0900 Subject: [PATCH 414/531] vi cheatsheet supervised learning --- vi/cheatsheet-supervised-learning.md | 60 ++++++++++++++-------------- 1 file changed, 30 insertions(+), 30 deletions(-) diff --git a/vi/cheatsheet-supervised-learning.md b/vi/cheatsheet-supervised-learning.md index cfcee9526..f069b6779 100644 --- a/vi/cheatsheet-supervised-learning.md +++ b/vi/cheatsheet-supervised-learning.md @@ -12,7 +12,7 @@ **3. Given a set of data points {x(1),...,x(m)} associated to a set of outcomes {y(1),...,y(m)}, we want to build a classifier that learns how to predict y from x.** -⟶ Cho một tập hợp các điểm dữ liệu {x(1),...,x(m)} tương ứng với đó là tập các kết quả {y(1),...,y(m)}, chúng ta muốn xây dựng một bộ phân loại học được các dự đoán y từ x. +⟶ Cho một tập hợp các điểm dữ liệu {x(1),...,x(m)} tương ứng với đó là tập các đầu ra {y(1),...,y(m)}, chúng ta muốn xây dựng một bộ phân loại học được cách dự đoán y từ x.
@@ -42,13 +42,13 @@ **8. [Discriminative model, Generative model, Goal, What's learned, Illustration, Examples]** -⟶ [Mô hình phân biệt, Mô hình sáng tạo, Mục tiêu, Những gì học được, Hình minh hoạ, Các ví dụ] +⟶ [Mô hình phân biệt, Mô hình sinh, Mục tiêu, Những gì học được, Hình minh hoạ, Các ví dụ]
**9. [Directly estimate P(y|x), Estimate P(x|y) to then deduce P(y|x), Decision boundary, Probability distributions of the data, Regressions, SVMs, GDA, Naive Bayes]**

⟶ [Ước lượng trực tiếp P(y|x), Ước lượng P(x|y) để tiếp tục suy luận P(y|x), Biên quyết định, Phân bố xác suất của dữ liệu, Hồi quy, SVMs, GDA, Naive Bayes]

<br>
@@ -60,7 +60,7 @@ **11. Hypothesis ― The hypothesis is noted hθ and is the model that we choose. For a given input data x(i) the model prediction output is hθ(x(i)).** -⟶ Hypothesis - Hypothesis được kí hiệu là h0, là một mô hình mà chúng ta chọn. Với dữ liệu đầu vào cho trước x(i), mô hình dự đoaans đầu ra là h0(x(i)). +⟶ Hypothesis - Hypothesis được kí hiệu là h0, là một mô hình mà chúng ta chọn. Với dữ liệu đầu vào cho trước x(i), mô hình dự đoán đầu ra là h0(x(i)).
@@ -72,7 +72,7 @@ **13. [Least squared error, Logistic loss, Hinge loss, Cross-entropy]** -⟶ +⟶ [Least squared error, Mất mát Logistic, Mất mát Hinge, Cross-entropy]
@@ -102,7 +102,7 @@ **18. Likelihood ― The likelihood of a model L(θ) given parameters θ is used to find the optimal parameters θ through maximizing the likelihood. In practice, we use the log-likelihood ℓ(θ)=log(L(θ)) which is easier to optimize. We have:** -⟶ +⟶ Likelihood - Likelihood của một mô hình L(θ) với tham số θ được sử dụng để tìm tham số tối ưu θ thông qua việc cực đại hoá likelihood. Trong thực tế, chúng ta sử dụng log-likelihood ℓ(θ)=log(L(θ)) đễ dễ dàng hơn trong việc tôi ưu hoá. Ta có:
@@ -138,7 +138,7 @@ **24. Normal equations ― By noting X the matrix design, the value of θ that minimizes the cost function is a closed-form solution such that:** -⟶ Phương trình normal - Bằng việc kí hiệu X là ma trận thiết kế, giá trị của θ mà cực tiểu hoá cost function là một phương pháp dạng đóng như là: +⟶ Phương trình chuẩn - Bằng việc kí hiệu X là ma trận thiết kế, giá trị của θ mà cực tiểu hoá cost function là một phương pháp dạng đóng như là:
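A tiny numeric check of the closed-form least-squares solution of item 24; the toy design matrix and targets are assumptions for illustration:

```python
import numpy as np

X = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])       # design matrix with an intercept column
y = np.array([1.0, 3.0, 5.0])

theta = np.linalg.solve(X.T @ X, X.T @ y)   # solves the normal equations (X'X) theta = X'y
print(theta)                                # [1. 2.]  (y = 1 + 2x fits exactly)
```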
@@ -198,13 +198,13 @@ **34. Exponential family ― A class of distributions is said to be in the exponential family if it can be written in terms of a natural parameter, also called the canonical parameter or link function, η, a sufficient statistic T(y) and a log-partition function a(η) as follows:** -⟶ +⟶ Họ số mũ - Một lớp của phân phối được cho rằng thuộc về họ số mũ nếu nó có thể được viết dưới dạng một thuật ngữ của tham số tự nhiên, cũng được gọi là tham số kinh điển (canonical parameter) hoặc hàm kết nối, η, một số liệu thống kê đầy đủ T(y) và hàm phân vùng log (log-partition function) a(η) sẽ có dạng như sau:
**35. Remark: we will often have T(y)=y. Also, exp(−a(η)) can be seen as a normalization parameter that will make sure that the probabilities sum to one.** -⟶ +⟶ Chú ý: chúng ta thường có T(y)=y. Đồng thời, exp(−a(η)) có thể được xem như là tham số chuẩn hoá sẽ đảm bảo rằng tổng các xác suất là một.
@@ -222,13 +222,13 @@ **38. Assumptions of GLMs ― Generalized Linear Models (GLM) aim at predicting a random variable y as a function fo x∈Rn+1 and rely on the following 3 assumptions:** -⟶ +⟶ Giả thuyết GLMs - Mô hình tuyến tính tổng quát (GLM) với mục đích là dự đoán một biến ngẫu nhiên y như là hàm cho biến x∈Rn+1 và dựa trên 3 giả thuyết sau:
**39. Remark: ordinary least squares and logistic regression are special cases of generalized linear models.** -⟶ +⟶ Chú ý: Bình phương nhỏ nhất thông thường và logistirc regression đều là các trường hợp đặc biệt của các mô hình tuyến tính tổng quát.
@@ -246,7 +246,7 @@ **42: Optimal margin classifier ― The optimal margin classifier h is such that:** -⟶ +⟶ Optimal margin classifier - Optimal margin classifier h là như sau:
@@ -270,73 +270,73 @@ **46. Remark: the line is defined as wTx−b=0.** -⟶ +⟶ Chú ý: đường thẳng có phương trình là wTx−b=0.
**47. Hinge loss ― The hinge loss is used in the setting of SVMs and is defined as follows:** -⟶ +⟶ Mất mát Hinge - Mất mát Hinge được sử dụng trong thiết lập của SVMs và nó được định nghĩa như sau:
**48. Kernel ― Given a feature mapping ϕ, we define the kernel K to be defined as:** -⟶ +⟶ Kernel (nhân) - Cho trước feature mapping ϕ, chúng ta định nghĩa kernel K như sau:
**49. In practice, the kernel K defined by K(x,z)=exp(−||x−z||22σ2) is called the Gaussian kernel and is commonly used.** -⟶ +⟶ Trong thực tế, kernel K được định nghĩa bởi K(x,z)=exp(−||x−z||22σ2) được gọi là Gaussian kernal và thường được sử dụng.
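A minimal sketch of the Gaussian kernel K(x,z)=exp(−||x−z||2/(2σ2)) mentioned in item 49; σ=1 and the input vectors are arbitrary examples:

```python
import numpy as np

def gaussian_kernel(x, z, sigma=1.0):
    # K(x, z) = exp(-||x - z||^2 / (2 sigma^2))
    return np.exp(-np.sum((x - z) ** 2) / (2 * sigma ** 2))

print(gaussian_kernel(np.array([1.0, 2.0]), np.array([2.0, 0.0])))   # exp(-2.5), about 0.082
```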
**50. [Non-linear separability, Use of a kernel mapping, Decision boundary in the original space]** -⟶ +⟶ [Phân tách phi tuyến, Việc sử dụng một kernel mapping, Biến quyết định trong không gian gốc]
**51. Remark: we say that we use the "kernel trick" to compute the cost function using the kernel because we actually don't need to know the explicit mapping ϕ, which is often very complicated. Instead, only the values K(x,z) are needed.** -⟶ +⟶ Chú ý: chúng ta nói rằng chúng ta sử dụng "kernel trick" để tính toán cost function sử dụng kernel bởi vì chúng ta thực sự không cần biết đến mapping tường minh ϕ, nó thường khá phức tạp. Thay vào đó, chỉ cần biết giá trị K(x,z).
**52. Lagrangian ― We define the Lagrangian L(w,b) as follows:** -⟶ +⟶ Lagrangian - Chúng ta định nghĩa Lagrangian L(w,b) như sau:
**53. Remark: the coefficients βi are called the Lagrange multipliers.** -⟶ +⟶ Chú ý: hệ số βi được gọi là bội số Lagrange.
**54. Generative Learning** -⟶ +⟶ Generative Learning
**55. A generative model first tries to learn how the data is generated by estimating P(x|y), which we can then use to estimate P(y|x) by using Bayes' rule.** -⟶ +⟶ Một generative model đầu tiên cố gắng học cách dữ liệu được sinh ra thông qua việc ước lượng P(x|y), chúng ta có thể sau đó sử dụng P(x|y) để ước lượng P(y|x) bằng cách sử dụng luật Bayes.
**56. Gaussian Discriminant Analysis** -⟶ +⟶ Gaussian Discriminant Analysis
**57. Setting ― The Gaussian Discriminant Analysis assumes that y and x|y=0 and x|y=1 are such that:** -⟶ +⟶ Thiết lập - Gaussian Discriminant Analysis giả sử rằng y và x|y=0 và x|y=1 là như sau:
@@ -348,7 +348,7 @@ **59. Naive Bayes** -⟶ +⟶ Naive Bayes
@@ -372,7 +372,7 @@ **63. Tree-based and ensemble methods** -⟶ +⟶ Phương thức Tree-based và toàn thể
@@ -450,7 +450,7 @@ **76. Union bound ― Let A1,...,Ak be k events. We have:** -⟶ +⟶ Union bound - Cho k sự kiện là A1,...,Ak. Ta có:
@@ -540,7 +540,7 @@ **91. [Support vector machines, Optimal margin classifier, Hinge loss, Kernel]** -⟶ +⟶ [Máy vector hỗ trợ, Optimal margin classifier, Mất mát Hinge, Kernel]
@@ -558,10 +558,10 @@ **94. [Other methods, k-NN]** -⟶ +⟶ [Các phương thức khác, k-NN]
**95. [Learning theory, Hoeffding inequality, PAC, VC dimension]** -⟶ +⟶ [Lí thuyết học, Bất đẳng thức Hoeffding, PAC, VC dimension] From 5fae643dfa8b2020bfbd1c44bc85a51920ff922a Mon Sep 17 00:00:00 2001 From: qunaieer Date: Sat, 19 Oct 2019 01:32:17 +0300 Subject: [PATCH 415/531] Update cheatsheet-machine-learning-tips-and-tricks.md Modification on cheatsheet-machine-learning-tips-and-tricks.md file after @zaidalyafeai review. --- ar/cheatsheet-machine-learning-tips-and-tricks.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/ar/cheatsheet-machine-learning-tips-and-tricks.md b/ar/cheatsheet-machine-learning-tips-and-tricks.md index b78162b6e..5a5092f01 100644 --- a/ar/cheatsheet-machine-learning-tips-and-tricks.md +++ b/ar/cheatsheet-machine-learning-tips-and-tricks.md @@ -179,7 +179,7 @@ **31. [Generally k=5 or 10, Case p=1 is called leave-one-out]** -[بشكل غالب k=5 أو 10، الحالة p=1 يطلق عليها الإبقاء على واحد (leave-one-out)] +[بشكل عام k=5 أو 10، الحالة p=1 يطلق عليها الإبقاء على واحد (leave-one-out)]
From 2cc5b311b42dc9dd545e35e3c5535f228f99fcb7 Mon Sep 17 00:00:00 2001 From: tuananhhedspibk Date: Sat, 19 Oct 2019 12:38:44 +0900 Subject: [PATCH 416/531] vi cheatsheet supervised learning --- vi/cheatsheet-supervised-learning.md | 60 ++++++++++++++-------------- 1 file changed, 30 insertions(+), 30 deletions(-) diff --git a/vi/cheatsheet-supervised-learning.md b/vi/cheatsheet-supervised-learning.md index f069b6779..f86e01c00 100644 --- a/vi/cheatsheet-supervised-learning.md +++ b/vi/cheatsheet-supervised-learning.md @@ -228,7 +228,7 @@ **39. Remark: ordinary least squares and logistic regression are special cases of generalized linear models.** -⟶ Chú ý: Bình phương nhỏ nhất thông thường và logistirc regression đều là các trường hợp đặc biệt của các mô hình tuyến tính tổng quát. +⟶ Chú ý: Bình phương nhỏ nhất thông thường và logistic regression đều là các trường hợp đặc biệt của các mô hình tuyến tính tổng quát.
@@ -300,7 +300,7 @@ **51. Remark: we say that we use the "kernel trick" to compute the cost function using the kernel because we actually don't need to know the explicit mapping ϕ, which is often very complicated. Instead, only the values K(x,z) are needed.** -⟶ Chú ý: chúng ta nói rằng chúng ta sử dụng "kernel trick" để tính toán cost function sử dụng kernel bởi vì chúng ta thực sự không cần biết đến mapping tường minh ϕ, nó thường khá phức tạp. Thay vào đó, chỉ cần biết giá trị K(x,z). +⟶ Chú ý: chúng ta nói rằng chúng ta sử dụng "kernel trick" để tính toán cost function sử dụng kernel bởi vì chúng ta thực sự không cần biết đến ánh xạ tường minh ϕ, nó thường khá phức tạp. Thay vào đó, chỉ cần biết giá trị K(x,z).
@@ -324,7 +324,7 @@ **55. A generative model first tries to learn how the data is generated by estimating P(x|y), which we can then use to estimate P(y|x) by using Bayes' rule.** -⟶ Một generative model đầu tiên cố gắng học cách dữ liệu được sinh ra thông qua việc ước lượng P(x|y), chúng ta có thể sau đó sử dụng P(x|y) để ước lượng P(y|x) bằng cách sử dụng luật Bayes. +⟶ Một mô hình sinh đầu tiên cố gắng học cách dữ liệu được sinh ra thông qua việc ước lượng P(x|y), chúng ta có thể sau đó sử dụng P(x|y) để ước lượng P(y|x) bằng cách sử dụng luật Bayes.
@@ -342,7 +342,7 @@ **58. Estimation ― The following table sums up the estimates that we find when maximizing the likelihood:** -⟶ +⟶ Sự ước lượng - Bảng sau đây tổng kết các ước lượng mà chúng ta tìm thấy khi tối đa hoá likelihood:
@@ -354,19 +354,19 @@ **60. Assumption ― The Naive Bayes model supposes that the features of each data point are all independent:** -⟶ +⟶ Giả thiết - Mô hình Naive Bayes giả sử rằng các features của các điểm dữ liệu đều độc lập với nhau:
**61. Solutions ― Maximizing the log-likelihood gives the following solutions, with k∈{0,1},l∈[[1,L]]** -⟶ +⟶ Giải pháp - Tối đa hoá log-likelihood đưa ra những lời giải sau đây, với k∈{0,1},l∈[[1,L]]
**62. Remark: Naive Bayes is widely used for text classification and spam detection.** -⟶ +⟶ Chú ý: Naive Bayes được sử dụng rộng rãi cho bài toán phân loại văn bản và phát hiện spam.
@@ -378,61 +378,61 @@ **64. These methods can be used for both regression and classification problems.** -⟶ +⟶ Các phương thức này có thể được sử dụng cho cả bài toán hồi quy lẫn bài toán phân loại.
**65. CART ― Classification and Regression Trees (CART), commonly known as decision trees, can be represented as binary trees. They have the advantage to be very interpretable.** -⟶ +⟶ CART - Cây phân loại và hồi quy (CART), thường được biết đến là cây quyết định, có thể được biểu diễn dưới dạng cây nhị phân. Chúng có các ưu điểm có thể được diễn giải một cách dễ dàng.
**66. Random forest ― It is a tree-based technique that uses a high number of decision trees built out of randomly selected sets of features. Contrary to the simple decision tree, it is highly uninterpretable but its generally good performance makes it a popular algorithm.** -⟶ +⟶ Rừng ngẫu nhiên 0 Nó là một kĩ thuật dựa trên cây, sử dụng số lượng lớn các cây quyết định để lựa chọn ngẫu nhiện các tập thuộc tính. Ngược lại với một cây quyết định đơn, kĩ thuật này khá khó diễn giải nhưng do có hiệu năng tốt nên đã trở thành một giải thuật khá phổ biến hiện nay.
**67. Remark: random forests are a type of ensemble methods.** -⟶ +⟶ Chú ý: rững ngẫu nhiên là một loại giải thuật ensemble.
**68. Boosting ― The idea of boosting methods is to combine several weak learners to form a stronger one. The main ones are summed up in the table below:** -⟶ +⟶ Boosting - Ý tưởng của các phương thức boosting là kết hợp các phương pháp học yếu hơn để tạo nên phương pháp học mạnh hơn. Những phương thức chính được tổng kết ở bảng dưới đây:
**69. [Adaptive boosting, Gradient boosting]** -⟶ +⟶ [Adaptive boosting, Gradient boosting]
**70. High weights are put on errors to improve at the next boosting step** -⟶ +⟶ Các trọng số có giá trị lớn được đặt vào các phần lỗi để cải thiện ở bước boosting tiếp theo
**71. Weak learners trained on remaining errors** -⟶ +⟶ Các phương pháp học yếu huấn luyện trên các phần lỗi còn lại
**72. Other non-parametric approaches** -⟶ +⟶ Các cách tiếp cận phi-tham số khác
**73. k-nearest neighbors ― The k-nearest neighbors algorithm, commonly known as k-NN, is a non-parametric approach where the response of a data point is determined by the nature of its k neighbors from the training set. It can be used in both classification and regression settings.** -⟶ +⟶ k-nearest neighbors - Giải thuật k-nearest neighbors, thường được biết đến là k-NN, là cách tiếp cận phi-tham số, ở phương pháp này phân lớp của một điểm dữ liệu được định nghĩa bởi k điểm dữ liệu gần nó nhất trong tập huấn luyện. Phương pháp này có thể được sử dụng trong quá trình thiết lập cho bài toán phân loại cũng như bài toán hồi quy.
@@ -456,25 +456,25 @@ **77. Hoeffding inequality ― Let Z1,..,Zm be m iid variables drawn from a Bernoulli distribution of parameter ϕ. Let ˆϕ be their sample mean and γ>0 fixed. We have:** -⟶ +⟶ Bất đẳng thức Hoeffding
**78. Remark: this inequality is also known as the Chernoff bound.** -⟶ +⟶ Chú ý: bất đẳng thức này còn được biết đến như là ràng buộc Chernoff.
**79. Training error ― For a given classifier h, we define the training error ˆϵ(h), also known as the empirical risk or empirical error, to be as follows:** -⟶ +⟶ Lỗi huấn luyện (Training error) - Cho trước classifier h, ta định nghĩa training error ˆϵ(h), còn được biết đến là empirical risk hoặc empirical error, như sau:
**80. Probably Approximately Correct (PAC) ― PAC is a framework under which numerous results on learning theory were proved, and has the following set of assumptions:** -⟶ +⟶ Probably Approximately Correct (PAC) - PAC là một framework với nhiều kết quả về lí thuyết học đã được chứng minh, và có tập hợp các giả thiết như sau:
@@ -492,31 +492,31 @@ **83. Shattering ― Given a set S={x(1),...,x(d)}, and a set of classifiers H, we say that H shatters S if for any set of labels {y(1),...,y(d)}, we have:** -⟶ +⟶ Shattering (Chia nhỏ) - Cho một tập hợp S={x(1),...,x(d)}, và một tập hợp các classifiers H, ta nó rằng H chia nhỏ S nếu với bất kì tập các nhãn {y(1),...,y(d)} nào, ta có:
**84. Upper bound theorem ― Let H be a finite hypothesis class such that |H|=k and let δ and the sample size m be fixed. Then, with probability of at least 1−δ, we have:** -⟶ +⟶ Định lí giới hạn trên - Cho H là một finite hypothesis class mà |H|=k với δ, kích cỡ m là cố định. Khi đó, với xác suất nhỏ nhất là 1−δ, ta có:
**85. VC dimension ― The Vapnik-Chervonenkis (VC) dimension of a given infinite hypothesis class H, noted VC(H) is the size of the largest set that is shattered by H.** -⟶ +⟶ VC dimension - Vapnik-Chervonenkis (VC) dimension của class infinite hypothesis H cho trước, kí hiệu là VC(H) là kích thước của tập lớn nhất được chia nhỏ bởi H.
**86. Remark: the VC dimension of H={set of linear classifiers in 2 dimensions} is 3.** -⟶ +⟶ Chú ý: VC dimension của H={tập hợp các linear classifiers trong 2 chiều} là 3.
**87. Theorem (Vapnik) ― Let H be given, with VC(H)=d and m the number of training examples. With probability at least 1−δ, we have:** -⟶ +⟶ Định lí (Vapnik) - Cho H với VC(H)=d và m là số lượng các ví dụ huấn luyện. Với xác suất nhỏ nhất là 1−δ, ta có:
@@ -528,13 +528,13 @@ **89. [Notations and general concepts, loss function, gradient descent, likelihood]** -⟶ +⟶ [Các kí hiệu và các khái niệm tổng quát, hàm mất mát, gradient descent, likelihood]
**90. [Linear models, linear regression, logistic regression, generalized linear models]** -⟶ +⟶ [Các mô hình tuyến tính, hồi quy tuyến tính, hồi quy logistic, mô hình tuyến tính tổng quát]
@@ -546,13 +546,13 @@ **92. [Generative learning, Gaussian Discriminant Analysis, Naive Bayes]** -⟶ +⟶ [Generative learning, Gaussian Discriminant Analysis, Naive Bayes]
**93. [Trees and ensemble methods, CART, Random forest, Boosting]** -⟶ +⟶ [Cây và các phương pháp ensemble, CART, Rừng ngẫu nhiên, Boosting]
From 612fd929d0c6f50abd22030651a66a4ee10797b6 Mon Sep 17 00:00:00 2001 From: Mahmoud Aslan Date: Sun, 20 Oct 2019 10:02:51 +0300 Subject: [PATCH 417/531] Added review edits --- ar/cs-229-probability.md | 67 ++++++++++++++++++++-------------------- 1 file changed, 34 insertions(+), 33 deletions(-) diff --git a/ar/cs-229-probability.md b/ar/cs-229-probability.md index e5b96b3d4..9b837455b 100644 --- a/ar/cs-229-probability.md +++ b/ar/cs-229-probability.md @@ -28,42 +28,43 @@ **5. Axioms of probability For each event E, we denote P(E) as the probability of event E occuring.**
-مسلَّمات الاحتمالات. من أجل كل حدث E، نرمز لإحتمال وقوعه بـ P(E). +مسلَّمات الاحتمالات. لكل حدث E، نرمز لإحتمال وقوعه بـ P(E).

**6. Axiom 1 ― Every probability is between 0 and 1 included, i.e:**
-المسلَّمة 1 ― كل احتمال يأخد قيماً بين الـ 0 والـ 1 مضمَّنة، على سبيل المثال: +المسلَّمة 1 ― كل احتمال يأخد قيماً بين الـ 0 والـ 1 مضمَّنة:

**7. Axiom 2 ― The probability that at least one of the elementary events in the entire sample space will occur is 1, i.e:**
-المسلَّمة 2 ― احتمال وقوع حدث ابتدائي واحد على الأقل من الأحداث الابتدائية في فضاء العينة يساوي الـ 1، على سبيل المثال: +المسلَّمة 2 ― احتمال وقوع حدث ابتدائي واحد على الأقل من الأحداث الابتدائية في فضاء العينة يساوي الـ 1:

**8. Axiom 3 ― For any sequence of mutually exclusive events E1,...,En, we have:**
-المسلَّمة 3 ― من أجل أي سلسلة من الأحداث الغير متداخلة E1,...,En، لدينا: +المسلَّمة 3 ― لأي سلسلة من الأحداث الغير متداخلة E1,...,En، لدينا:

**9. Permutation ― A permutation is an arrangement of r objects from a pool of n objects, in a given order. The number of such arrangements is given by P(n,r), defined as:**
-التباديل ― التبديل هو عبارة عن ترتيب معين لـ r غرض مختارة من مجموعة من n غرض. عدد هكذا تراتيب يرمز له بـ P(n, r)، المعرف كالتالي:
-
+التباديل ― التبديل هو عبارة عن عدد الاختيارات لـ r غرض من مجموعة مكونة من n غرض بترتيب محدد. عدد هكذا تراتيب يرمز له بـ P(n, r)، المعرف كالتالي: + +
**10. Combination ― A combination is an arrangement of r objects from a pool of n objects, where the order does not matter. The number of such arrangements is given by C(n,r), defined as:**
-التوافيق ― التوفيق هو اختيار لـ r غرض من مجموعة مكونة من n غرض بدون إعطاء الترتيب أية أهمية. عدد هكذا توافيق يرمز له بـ C(n, r)، المعرف كالتالي: +التوافيق ― التوفيق هو عدد الاختيارات لـ r غرض من مجموعة مكونة من n غرض بدون إعطاء الترتيب أية أهمية. عدد هكذا توافيق يرمز له بـ C(n, r)، المعرف كالتالي:

**11. Remark: we note that for 0⩽r⩽n, we have P(n,r)⩾C(n,r)**
-ملاحظة: من أجل 0⩽r⩽n، يكون لدينا P(n,r)⩾C(n,r) +ملاحظة: لكل 0⩽r⩽n، يكون لدينا P(n,r)⩾C(n,r)

@@ -75,7 +76,7 @@ **13. Bayes' rule ― For events A and B such that P(B)>0, we have:**
-قاعدة بايز ― من أجل الأحداث A و B بحيث P(B)>0، يكون لدينا: +قاعدة بايز ― إذا كانت لدينا الأحداث A و B بحيث P(B)>0، يكون لدينا:

@@ -87,13 +88,13 @@ **15. Partition ― Let {Ai,i∈[[1,n]]} be such that for all i, Ai≠∅. We say that {Ai} is a partition if we have:**
-القسم ― ليكن {Ai,i∈[[1,n]]} بحيث من أجل كل i، لديناAi≠∅ . نقول أن {Ai} قسم إذا كان لدينا: +القسم ― ليكن {Ai,i∈[[1,n]]} بحيث لكل i لديناAi≠∅ . نقول أن {Ai} قسم إذا كان لدينا:

**16. Remark: for any event B in the sample space, we have P(B)=n∑i=1P(B|Ai)P(Ai).**
-ملاحظة: من أجل أي حدث B من فضاء العينة، لدينا P(B)=n∑i=1P(B|Ai)P(Ai). +ملاحظة: لأي حدث B في فضاء العينة، لدينا P(B)=n∑i=1P(B|Ai)P(Ai).

@@ -123,13 +124,13 @@ **21. Random variable ― A random variable, often noted X, is a function that maps every element in a sample space to a real line.**
-المتحول العشوائي ― المتحول العشوائي، المرمز له عادة ب X، هو دالة تربط كل عنصر من فضاء العينة إلى خط الأعداد الحقيقية. +المتحول العشوائي ― المتحول العشوائي، ويرمز له عادة بـ X، هو دالة تربط كل عنصر في فضاء العينة إلى خط الأعداد الحقيقية.

**22. Cumulative distribution function (CDF) ― The cumulative distribution function F, which is monotonically non-decreasing and is such that limx→−∞F(x)=0 and limx→+∞F(x)=1, is defined as:**
-دالة التوزيع التراكمي (CDF) ― تعرف دالة التوزيع التراكمي F، والتي تكون غير متناقصة بشكل دائم وتحقق limx→−∞F(x)=0 و limx→+∞F(x)=1، كالتالي: +دالة التوزيع التراكمي (CDF) ― تعرف دالة التوزيع التراكمي F، والتي تكون غير متناقصة بشكل رتيب وتحقق limx→−∞F(x)=0 و limx→+∞F(x)=1، كالتالي:

@@ -159,7 +160,7 @@ **27. Expectation and Moments of the Distribution ― Here are the expressions of the expected value E[X], generalized expected value E[g(X)], kth moment E[Xk] and characteristic function ψ(ω) for the discrete and continuous cases:**
-التوقع وعزوم التوزيع ― فيما يلي المصطلحات المستخدمة للتعبير عن القيمة المتوقعة E[X]، الصيغة العامة للقيمة المتوقعة E[g(X)]، العزم رقم K E[XK] ودالة السمة ψ(ω) من أجل الحالات المتقطعة والمستمرة: +التوقع وعزوم التوزيع ― فيما يلي المصطلحات المستخدمة للتعبير عن القيمة المتوقعة E[X]، الصيغة العامة للقيمة المتوقعة E[g(X)]، العزم رقم K E[XK] ودالة السمة ψ(ω) للحالات المتقطعة والمستمرة:

@@ -182,43 +183,43 @@ **31. Leibniz integral rule ― Let g be a function of x and potentially c, and a,b boundaries that may depend on c. We have:**
-قاعدة لايبنتز للتكامل ― لتكن g دالة لـ x وربما لـ c، ولتكن a وb حدود قد تعتمد على c. يكون لدينا: + قاعدة لايبنتز (Leibniz) للتكامل ― لتكن g دالة لـ x وربما لـ c، ولتكن a وb حدود قد تعتمد على c. يكون لدينا:

**32. Probability Distributions**
-التوزعات الاحتمالية +التوزيعات الاحتمالية

**33. Chebyshev's inequality ― Let X be a random variable with expected value μ. For k,σ>0, we have the following inequality:**
-متراجحة تشيبشيف ― ليكن X متحولاً عشوائياً قيمته المتوقعة تساوي μ. من أجل k ،σ>0، لدينا المتراجحة التالية: +متراجحة تشيبشيف (Chebyshev) ― ليكن X متحولاً عشوائياً قيمته المتوقعة تساوي μ. إذا كان لدينا k ،σ>0، سنحصل على المتراجحة التالية:

**34. Main distributions ― Here are the main distributions to have in mind:**
-التوزعات الأساسية ― فيما يلي التوزعات الأساسية لأخذها بالاعتبار: +التوزيعات الأساسية ― فيما يلي التوزيعات الأساسية لأخذها بالاعتبار:

**35. [Type, Distribution]**
-[الحالة، التوزع] +[الحالة، التوزيع]

**36. Jointly Distributed Random Variables**
-المتحولات العشوائية الموزعة بشكل مشترك +المتغيرات العشوائية الموزعة اشتراكياً

**37. Marginal density and cumulative distribution ― From the joint density probability function fXY , we have**
-الكثافة الهامشية والتوزع التراكمي ― من دالة الكثافة الاحتمالية المشتركة fXY، لدينا: +الكثافة الهامشية والتوزيع التراكمي ― من دالة الكثافة الاحتمالية المشتركة fXY، لدينا:

@@ -248,13 +249,13 @@ **42. Correlation ― By noting σX,σY the standard deviations of X and Y, we define the correlation between the random variables X and Y, noted ρXY, as follows:**
-الارتباط ― بأخذ σX، σY كانحراف معياري لـ X و Y، نعرف الارتباط بين المتحولات العشوائية X و Y، و المرمز بـ ρXY، كالتالي: +الارتباط ― بأخذ σX، σY كانحراف معياري لـ X و Y، نعرف الارتباط بين المتحولات العشوائية X و Y، والمرمز بـ ρXY، كالتالي:

**43. Remark 1: we note that for any random variables X,Y, we have ρXY∈[−1,1].**
-ملاحظة 1: من أجل أية متحولات عشوائية X، Y، لدينا ρXY∈[−1,1]. +ملاحظة 1: لأي متحولات عشوائية X، Y، لدينا ρXY∈[−1,1].

@@ -266,7 +267,7 @@ **45. Parameter estimation**
-تقدير المُدخَل +تقدير المُدخَل (Parameter)

@@ -278,19 +279,19 @@ **47. Random sample ― A random sample is a collection of n random variables X1,...,Xn that are independent and identically distributed with X.**
-العينة العشوائية ― العينة العشوائية هي مجموعة من n متحول عشوائي X1,...,Xn والتي تكون مستقلة وموزعة بشكل متطابق مع X. +العينة العشوائية ― العينة العشوائية هي مجموعة من n متحول عشوائي X1,...,Xn والتي تكون مستقلة وموزعة تطابقياً مع X.

**48. Estimator ― An estimator is a function of the data that is used to infer the value of an unknown parameter in a statistical model.**

-المُقَدِّر ― المُقَدِّر هو تابع للبيانات المستخدمة لاستنباط قيمة متحول غير معلوم ضمن نموذج إحصائي.
+المُقَدِّر ― المُقَدِّر هو دالة للبيانات تُستخدم لاستنباط قيمة مُدخل غير معلوم ضمن نموذج إحصائي.

<br>

**49. Bias ― The bias of an estimator ^θ is defined as being the difference between the expected value of the distribution of ^θ and the true value, i.e.:**
-الانحياز ― انحياز مُقَدِّر ^θ هو الفرق بين القيمة المتوقعة لتوزع ^θ والقيمة الحقيقية، كمثال: +الانحياز ― انحياز مُقَدِّر ^θ هو الفرق بين القيمة المتوقعة لتوزيع ^θ والقيمة الحقيقية، كالتالي:

@@ -308,7 +309,7 @@ **52. Sample mean ― The sample mean of a random sample is used to estimate the true mean μ of a distribution, is often noted ¯¯¯¯¯X and is defined as follows:**
-متوسط العينة ― يستخدم متوسط عينة عشوائية لتقدير المتوسط الحقيقي μ لتوزع ما، عادةً ما يرمز له بـ ¯¯¯¯¯X ويعرف كالتالي: +متوسط العينة ― يستخدم متوسط عينة عشوائية لتقدير المتوسط الحقيقي μ لتوزيع ما، عادةً ما يرمز له بـ ¯¯¯¯¯X ويعرف كالتالي:

@@ -320,7 +321,7 @@ **54. Central Limit Theorem ― Let us have a random sample X1,...,Xn following a given distribution with mean μ and variance σ2, then we have:**
-مبرهنة النهاية المركزية ― ليكن لدينا عينة عشوائية X1,...,Xn والتي تتبع لتوزع معطى له متوسط μ وتباين σ2، فيكون: +مبرهنة النهاية المركزية ― ليكن لدينا عينة عشوائية X1,...,Xn والتي تتبع لتوزيع معطى له متوسط μ وتباين σ2، فيكون:

@@ -332,7 +333,7 @@ **56. Sample variance ― The sample variance of a random sample is used to estimate the true variance σ2 of a distribution, is often noted s2 or ^σ2 and is defined as follows:**
-تباين العينة ― يستخدم تباين عينة عشوائية لتقدير التباين الحقيقي σ2 لتوزع ما، والذي يرمز له عادةً بـ s2 أو ^σ2 ويعرّف بالشكل التالي: +تباين العينة ― يستخدم تباين عينة عشوائية لتقدير التباين الحقيقي σ2 لتوزيع ما، والذي يرمز له عادةً بـ s2 أو ^σ2 ويعرّف بالشكل التالي:

@@ -344,7 +345,7 @@ **58. Chi-Squared relation with sample variance ― Let s2 be the sample variance of a random sample. We have:**
-علاقة مربع كاي مع تباين العينة ― ليكن s2 تباين العينة لعينة عشوائية. لدينا: +علاقة مربع كاي (Chi-Squared) مع تباين العينة ― ليكن s2 تباين العينة لعينة عشوائية. لدينا:

@@ -368,13 +369,13 @@ **62. [Probability distributions, Chebyshev's inequality, Main distributions]**
-[التوزعات الاحتمالية، متراجحة تشيبشيف، توزعات رئيسية] +[التوزيعات الاحتمالية، متراجحة تشيبشيف، توزيعات رئيسية]

**63. [Jointly distributed random variables, Density, Covariance, Correlation]**
-[المتحولات العشوائية الموزعة بشكل مشترك، الكثافة، التغاير، الارتباط] +[المتحولات العشوائية الموزعة اشتراكياً، الكثافة، التغاير، الارتباط]

From e1b452a343a13da9304868f29ebf1495070018da Mon Sep 17 00:00:00 2001 From: Mahmoud Aslan Date: Sun, 20 Oct 2019 10:12:03 +0300 Subject: [PATCH 418/531] Added review edits --- ar/cs-229-probability.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/ar/cs-229-probability.md b/ar/cs-229-probability.md index 9b837455b..d57cadf9f 100644 --- a/ar/cs-229-probability.md +++ b/ar/cs-229-probability.md @@ -207,7 +207,7 @@ **35. [Type, Distribution]**
-[الحالة، التوزيع] +[النوع، التوزيع]

From 77acf9d1d4c12f45f48bb0b03d65424646e5c5c7 Mon Sep 17 00:00:00 2001 From: Shervine Amidi Date: Mon, 21 Oct 2019 00:13:14 -0700 Subject: [PATCH 419/531] Rename cheatsheet-supervised-learning.md to cs-229-supervised-learning.md --- ...sheet-supervised-learning.md => cs-229-supervised-learning.md} | 0 1 file changed, 0 insertions(+), 0 deletions(-) rename ar/{cheatsheet-supervised-learning.md => cs-229-supervised-learning.md} (100%) diff --git a/ar/cheatsheet-supervised-learning.md b/ar/cs-229-supervised-learning.md similarity index 100% rename from ar/cheatsheet-supervised-learning.md rename to ar/cs-229-supervised-learning.md From bf70c0c859f1a5aa058eb2ad8cb60a98dd761821 Mon Sep 17 00:00:00 2001 From: Shervine Amidi Date: Mon, 21 Oct 2019 00:14:31 -0700 Subject: [PATCH 420/531] Create cs-229-machine-learning-tips-and-tricks.md --- ar/cs-229-machine-learning-tips-and-tricks.md | 288 ++++++++++++++++++ 1 file changed, 288 insertions(+) create mode 100644 ar/cs-229-machine-learning-tips-and-tricks.md diff --git a/ar/cs-229-machine-learning-tips-and-tricks.md b/ar/cs-229-machine-learning-tips-and-tricks.md new file mode 100644 index 000000000..1739e8ffb --- /dev/null +++ b/ar/cs-229-machine-learning-tips-and-tricks.md @@ -0,0 +1,288 @@ +**Machine Learning tips and tricks translation** [[webpage]](https://stanford.edu/~shervine/teaching/cs-229/cheatsheet-machine-learning-tips-and-tricks) + +
+ +**1. Machine Learning tips and tricks cheatsheet** + +مرجع سريع لنصائح وحيل تعلّم الآلة + +
+ +**2. Classification metrics** + +مقاييس التصنيف + +

**3. In a context of a binary classification, here are the main metrics that are important to track in order to assess the performance of the model.**

في سياق التصنيف الثنائي، فيما يلي أهم المقاييس (metrics) التي يجدر مراقبتها من أجل تقييم أداء النموذج.

<br>
+ +**4. Confusion matrix ― The confusion matrix is used to have a more complete picture when assessing the performance of a model. It is defined as follows:** + +مصفوفة الدقّة (confusion matrix) - تستخدم مصفوفة الدقّة لأخذ تصور شامل عند تقييم أداء النموذج. وهي تعرّف كالتالي: + +
+ +**5. [Predicted class, Actual class]** + +[التصنيف المتوقع، التصنيف الفعلي] + +
+ +**6. Main metrics ― The following metrics are commonly used to assess the performance of classification models:** + +المقاييس الأساسية - المقاييس التالية تستخدم في العادة لتقييم أداء نماذج التصنيف: + +
+ +**7. [Metric, Formula, Interpretation]** + +[المقياس، المعادلة، التفسير] + +
+ +**8. Overall performance of model** + +الأداء العام للنموذج + +
+ +**9. How accurate the positive predictions are** + +دقّة التوقعات الإيجابية (positive) + +
+ +**10. Coverage of actual positive sample** + +تغطية عينات التوقعات الإيجابية الفعلية + +
+ +**11. Coverage of actual negative sample** + +تغطية عينات التوقعات السلبية الفعلية + +
+ +**12. Hybrid metric useful for unbalanced classes** + +مقياس هجين مفيد للأصناف غير المتوازنة (unbalanced) + +
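Editor's aside (not part of the original cheatsheet): a minimal NumPy sketch of how the metrics above can be computed from raw binary predictions; `y_true` and `y_pred` are made-up toy arrays.

```python
import numpy as np

y_true = np.array([1, 0, 1, 1, 0, 1, 0, 0])   # toy ground-truth labels
y_pred = np.array([1, 0, 0, 1, 0, 1, 1, 0])   # toy binary predictions

tp = np.sum((y_pred == 1) & (y_true == 1))    # true positives
fp = np.sum((y_pred == 1) & (y_true == 0))    # false positives
fn = np.sum((y_pred == 0) & (y_true == 1))    # false negatives
tn = np.sum((y_pred == 0) & (y_true == 0))    # true negatives

accuracy  = (tp + tn) / len(y_true)                        # overall performance
precision = tp / (tp + fp)                                 # how accurate the positive predictions are
recall    = tp / (tp + fn)                                 # coverage of actual positive samples
f1        = 2 * precision * recall / (precision + recall)  # hybrid metric for unbalanced classes
print(accuracy, precision, recall, f1)
```
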
+ +**13. ROC ― The receiver operating curve, also noted ROC, is the plot of TPR versus FPR by varying the threshold. These metrics are are summed up in the table below:** + +منحنى دقّة الأداء (ROC) - منحنى دقّة الآداء، ويطلق عليه ROC، هو رسمة لمعدل التصنيفات الإيجابية الصحيحة (TPR) مقابل معدل التصنيفات الإيجابية الخاطئة (FPR) باستخدام قيم حد (threshold) متغيرة. هذه المقاييس ملخصة في الجدول التالي: +
+ +**14. [Metric, Formula, Equivalent]** + +[المقياس، المعادلة، مرادف] + +
+ +**15. AUC ― The area under the receiving operating curve, also noted AUC or AUROC, is the area below the ROC as shown in the following figure:** + +المساحة تحت منحنى دقة الأداء (المساحة تحت المنحنى) (AUC) - المساحة تحت منحنى دقة الأداء (المساحة تحت المنحنى)، ويطلق عليها AUC أو AUROC، هي المساحة تحت ROC كما هو موضح في الرسمة التالية: + +
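Editor's aside: a small sketch of how the ROC points (FPR, TPR) and the AUC could be obtained by sweeping the threshold over made-up scores; the trapezoidal rule stands in for the exact area computation.

```python
import numpy as np

scores = np.array([0.9, 0.8, 0.7, 0.55, 0.4, 0.3, 0.2, 0.1])  # toy model scores
labels = np.array([1,   1,   0,   1,    0,   1,   0,   0  ])  # toy labels

tpr, fpr = [], []
for t in np.unique(scores)[::-1]:          # sweep the threshold from high to low
    pred = (scores >= t).astype(int)
    tp = np.sum((pred == 1) & (labels == 1))
    fp = np.sum((pred == 1) & (labels == 0))
    tpr.append(tp / np.sum(labels == 1))   # true positive rate
    fpr.append(fp / np.sum(labels == 0))   # false positive rate

# AUC approximated as the area under the (FPR, TPR) curve
auc = np.trapz([0.0] + tpr + [1.0], [0.0] + fpr + [1.0])
print(auc)
```
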
+ +**16. [Actual, Predicted]** + +[الفعلي، المتوقع] + +
+ +**17. Basic metrics ― Given a regression model f, the following metrics are commonly used to assess the performance of the model:** + +المقاييس الأساسية - إذا كان لدينا نموذج الانحدار f، فإن المقاييس التالية غالباً ما تستخدم لتقييم أداء النموذج: + +
+ +**18. [Total sum of squares, Explained sum of squares, Residual sum of squares]** + +[المجموع الكلي للمربعات، مجموع المربعات المُفسَّر، مجموع المربعات المتبقي] + +
+ +**19. Coefficient of determination ― The coefficient of determination, often noted R2 or r2, provides a measure of how well the observed outcomes are replicated by the model and is defined as follows:** + +مُعامل التحديد (Coefficient of determination) - مُعامل التحديد، وغالباً يرمز له بـ R2 أو r2، يعطي قياس لمدى مطابقة النموذج للنتائج الملحوظة، ويعرف كما يلي: + +
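Editor's aside: the coefficient of determination written out in NumPy on made-up targets and predictions.

```python
import numpy as np

y      = np.array([3.0, 5.0, 7.5, 10.0])   # toy targets
y_pred = np.array([2.8, 5.3, 7.0, 10.4])   # toy regression predictions

ss_tot = np.sum((y - y.mean()) ** 2)       # total sum of squares
ss_res = np.sum((y - y_pred) ** 2)         # residual sum of squares
r2 = 1 - ss_res / ss_tot                   # coefficient of determination
print(r2)
```
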
+ +**20. Main metrics ― The following metrics are commonly used to assess the performance of regression models, by taking into account the number of variables n that they take into consideration:** + +المقاييس الرئيسية - المقاييس التالية تستخدم غالباً لتقييم أداء نماذج الانحدار، وذلك بأن يتم الأخذ في الحسبان عدد المتغيرات n المستخدمة فيها: + +
+ +**21. where L is the likelihood and ˆσ2 is an estimate of the variance associated with each response.** + +حيث L هو الأرجحية، و ˆσ2 تقدير التباين الخاص بكل نتيجة. + +
+ +**22. Model selection** + +اختيار النموذج + +
+ +**23. Vocabulary ― When selecting a model, we distinguish 3 different parts of the data that we have as follows:** + +مفردات - عند اختيار النموذج، نفرق بين 3 أجزاء من البيانات التي لدينا كالتالي: + +
+ +**24. [Training set, Validation set, Testing set]** + +[مجموعة تدريب، مجموعة تحقق، مجموعة اختبار] + +
+ +**25. [Model is trained, Model is assessed, Model gives predictions]** + +[يتم تدريب النموذج، يتم تقييم النموذج، النموذج يعطي التوقعات] + +
+ +**26. [Usually 80% of the dataset, Usually 20% of the dataset]** + +[غالباً 80% من مجموعة البيانات، غالباً 20% من مجموعة البيانات] + +
+ +**27. [Also called hold-out or development set, Unseen data]** + +[يطلق عليها كذلك المجموعة المُجنّبة أو مجموعة التطوير، بيانات لم يسبق رؤيتها من قبل] + +
+ +**28. Once the model has been chosen, it is trained on the entire dataset and tested on the unseen test set. These are represented in the figure below:** + +بمجرد اختيار النموذج، يتم تدريبه على مجموعة البيانات بالكامل ثم يتم اختباره على مجموعة اختبار لم يسبق رؤيتها من قبل. كما هو موضح في الشكل التالي: + +
+ +**29. Cross-validation ― Cross-validation, also noted CV, is a method that is used to select a model that does not rely too much on the initial training set. The different types are summed up in the table below:** + +التحقق المتقاطع (Cross-validation) - التحقق المتقاطع، وكذلك يختصر بـ CV، هو طريقة تستخدم لاختيار نموذج بحيث لا يعتمد بشكل كبير على مجموعة بيانات التدريب المبدأية. أنواع التحقق المتقاطع المختلفة ملخصة في الجدول التالي: + +
+ +**30. [Training on k−1 folds and assessment on the remaining one, Training on n−p observations and assessment on the p remaining ones]** + +[التدريب على k-1 جزء والتقييم باستخدام الجزء الباقي، التدريب على n−p عينة والتقييم باستخدام الـ p عينات المتبقية] + +
+ +**31. [Generally k=5 or 10, Case p=1 is called leave-one-out]** + +[بشكل عام k=5 أو 10، الحالة p=1 يطلق عليها الإبقاء على واحد (leave-one-out)] + +
+ +**32. The most commonly used method is called k-fold cross-validation and splits the training data into k folds to validate the model on one fold while training the model on the k−1 other folds, all of this k times. The error is then averaged over the k folds and is named cross-validation error.** + +الطريقة الأكثر استخداماً يطلق عليها التحقق المتقاطع س جزء/أجزاء (k-fold)، ويتم فيها تقسيم البيانات إلى k جزء، بحيث يتم تدريب النموذج باستخدام k−1 والتحقق باستخدام الجزء المتبقي، ويتم تكرار ذلك k مرة. يتم بعد ذلك حساب معدل الأخطاء في الأجزاء k ويسمى خطأ التحقق المتقاطع. + +
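Editor's aside: a bare-bones sketch of the k-fold split described above, using only NumPy; the model training and evaluation calls are left as comments since they depend on the model at hand.

```python
import numpy as np

def k_fold_indices(n_samples, k, seed=0):
    """Shuffle the sample indices and split them into k folds."""
    rng = np.random.default_rng(seed)
    return np.array_split(rng.permutation(n_samples), k)

folds = k_fold_indices(n_samples=20, k=5)
for i, val_idx in enumerate(folds):
    train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
    # train on train_idx, evaluate on val_idx; averaging the k validation
    # errors gives the cross-validation error
    print(f"fold {i}: {len(train_idx)} train / {len(val_idx)} validation samples")
```
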
+ +**33. Regularization ― The regularization procedure aims at avoiding the model to overfit the data and thus deals with high variance issues. The following table sums up the different types of commonly used regularization techniques:** + +ضبط (Regularization) - عمليه الضبط تهدف إلى تفادي فرط التخصيص (overfit) للنموذج، وهو بذلك يتعامل مع مشاكل التباين العالي. الجدول التالي يلخص أنواع وطرق الضبط الأكثر استخداماً: + +
+ +**34. [Shrinks coefficients to 0, Good for variable selection, Makes coefficients smaller, Tradeoff between variable selection and small coefficients]** + +[يقلص المُعاملات إلى 0، جيد لاختيار المتغيرات، يجعل المُعاملات أصغر، المفاضلة بين اختيار المتغيرات والمُعاملات الصغيرة] + +
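Editor's aside: of the techniques listed above, ridge (L2) regularization has a convenient closed form, sketched below on made-up data; larger values of the illustrative parameter `lam` shrink the coefficients.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form ridge regression: w = (X^T X + lam*I)^(-1) X^T y."""
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=50)
for lam in [0.0, 1.0, 100.0]:
    print(lam, ridge_fit(X, y, lam))      # coefficients shrink as lam increases
```
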
+ +**35. Diagnostics** + +التشخيصات + +
+ +**36. Bias ― The bias of a model is the difference between the expected prediction and the correct model that we try to predict for given data points.** + +الانحياز (Bias) - الانحياز للنموذج هو الفرق بين التنبؤ المتوقع والنموذج الحقيقي الذي نحاول تنبؤه للبيانات المعطاة. + +
+ +**37. Variance ― The variance of a model is the variability of the model prediction for given data points.** + +التباين (Variance) - تباين النموذج هو مقدار التغير في تنبؤ النموذج لنقاط البيانات المعطاة. + +
+ +**38. Bias/variance tradeoff ― The simpler the model, the higher the bias, and the more complex the model, the higher the variance.** + +موازنة الانحياز/التباين (Bias/variance tradeoff) - كلما زادت بساطة النموذج، زاد الانحياز، وكلما زاد تعقيد النموذج، زاد التباين. + +
+ +**39. [Symptoms, Regression illustration, classification illustration, deep learning illustration, possible remedies]** + +[الأعراض، توضيح الانحدار، توضيح التصنيف، توضيح التعلم العميق، العلاجات الممكنة] + +
+ +**40. [High training error, Training error close to test error, High bias, Training error slightly lower than test error, Very low training error, Training error much lower than test error, High variance]** + +[خطأ التدريب عالي، خطأ التدريب قريب من خطأ الاختبار، انحياز عالي، خطأ التدريب أقل بقليل من خطأ الاختبار، خطأ التدريب منخفض جداً، خطأ التدريب أقل بكثير من خطأ الاختبار، تباين عالي] + +
+ +**41. [Complexify model, Add more features, Train longer, Perform regularization, Get more data]** + +[زيادة تعقيد النموذج، إضافة المزيد من الخصائص، تدريب لمدة أطول، إجراء الضبط (regularization)، الحصول على المزيد من البيانات] + +
+ +**42. Error analysis ― Error analysis is analyzing the root cause of the difference in performance between the current and the perfect models.** + +تحليل الخطأ - تحليل الخطأ هو تحليل السبب الرئيسي للفرق في الأداء بين النماذج الحالية والنماذج المثالية. + +
+ +**43. Ablative analysis ― Ablative analysis is analyzing the root cause of the difference in performance between the current and the baseline models.** + +تحليل استئصالي (Ablative analysis) - التحليل الاستئصالي هو تحليل السبب الرئيسي للفرق في الأداء بين النماذج الحالية والنماذج المبدئية (baseline). + +
+ +**44. Regression metrics** + +مقاييس الانحدار + +
+ +**45. [Classification metrics, confusion matrix, accuracy, precision, recall, F1 score, ROC]** + +[مقاييس التصنيف، مصفوفة الدقّة، الضبط (accuracy)، الدقة (precision)، الاستدعاء (recall)، درجة F1] + +
+ +**46. [Regression metrics, R squared, Mallow's CP, AIC, BIC]** + +[مقاييس الانحدار، مربع R، معيار معامل مالوس (Mallow's)، معيار آكياك المعلوماتي (AIC)، معيار المعلومات البايزي (BIC)] + +
+ +**47. [Model selection, cross-validation, regularization]** + +[اختيار النموذج، التحقق المتقاطع، الضبط] + +
+ +**48. [Diagnostics, Bias/variance tradeoff, error/ablative analysis]** + +[التشخيصات، موازنة الانحياز/التباين، تحليل الخطأ/التحليل الاستئصالي] From 3afcde4ed31931a92170f333a00715279c1624a8 Mon Sep 17 00:00:00 2001 From: Shervine Amidi Date: Mon, 21 Oct 2019 00:15:15 -0700 Subject: [PATCH 421/531] Restore template --- ...cs-229-machine-learning-tips-and-tricks.md | 97 ++++++++++--------- 1 file changed, 49 insertions(+), 48 deletions(-) diff --git a/template/cs-229-machine-learning-tips-and-tricks.md b/template/cs-229-machine-learning-tips-and-tricks.md index 1739e8ffb..edba03259 100644 --- a/template/cs-229-machine-learning-tips-and-tricks.md +++ b/template/cs-229-machine-learning-tips-and-tricks.md @@ -4,285 +4,286 @@ **1. Machine Learning tips and tricks cheatsheet** -مرجع سريع لنصائح وحيل تعلّم الآلة +⟶
**2. Classification metrics** -مقاييس التصنيف +⟶
**3. In a context of a binary classification, here are the main metrics that are important to track in order to assess the performance of the model.** -في سياق التصنيف الثنائي، هذه المقاييس (metrics) المهمة التي يجدر مراقبتها من أجل تقييم آداء النموذج. +⟶
**4. Confusion matrix ― The confusion matrix is used to have a more complete picture when assessing the performance of a model. It is defined as follows:** -مصفوفة الدقّة (confusion matrix) - تستخدم مصفوفة الدقّة لأخذ تصور شامل عند تقييم أداء النموذج. وهي تعرّف كالتالي: +⟶
**5. [Predicted class, Actual class]** -[التصنيف المتوقع، التصنيف الفعلي] +⟶
**6. Main metrics ― The following metrics are commonly used to assess the performance of classification models:** -المقاييس الأساسية - المقاييس التالية تستخدم في العادة لتقييم أداء نماذج التصنيف: +⟶
**7. [Metric, Formula, Interpretation]** -[المقياس، المعادلة، التفسير] +⟶
**8. Overall performance of model** -الأداء العام للنموذج +⟶
**9. How accurate the positive predictions are** -دقّة التوقعات الإيجابية (positive) +⟶
**10. Coverage of actual positive sample** -تغطية عينات التوقعات الإيجابية الفعلية +⟶
**11. Coverage of actual negative sample** -تغطية عينات التوقعات السلبية الفعلية +⟶
**12. Hybrid metric useful for unbalanced classes** -مقياس هجين مفيد للأصناف غير المتوازنة (unbalanced) +⟶
**13. ROC ― The receiver operating curve, also noted ROC, is the plot of TPR versus FPR by varying the threshold. These metrics are are summed up in the table below:** -منحنى دقّة الأداء (ROC) - منحنى دقّة الآداء، ويطلق عليه ROC، هو رسمة لمعدل التصنيفات الإيجابية الصحيحة (TPR) مقابل معدل التصنيفات الإيجابية الخاطئة (FPR) باستخدام قيم حد (threshold) متغيرة. هذه المقاييس ملخصة في الجدول التالي: +⟶ +
**14. [Metric, Formula, Equivalent]** -[المقياس، المعادلة، مرادف] +⟶
**15. AUC ― The area under the receiving operating curve, also noted AUC or AUROC, is the area below the ROC as shown in the following figure:** -المساحة تحت منحنى دقة الأداء (المساحة تحت المنحنى) (AUC) - المساحة تحت منحنى دقة الأداء (المساحة تحت المنحنى)، ويطلق عليها AUC أو AUROC، هي المساحة تحت ROC كما هو موضح في الرسمة التالية: +⟶
**16. [Actual, Predicted]** -[الفعلي، المتوقع] +⟶
**17. Basic metrics ― Given a regression model f, the following metrics are commonly used to assess the performance of the model:** -المقاييس الأساسية - إذا كان لدينا نموذج الانحدار f، فإن المقاييس التالية غالباً ما تستخدم لتقييم أداء النموذج: +⟶
**18. [Total sum of squares, Explained sum of squares, Residual sum of squares]** -[المجموع الكلي للمربعات، مجموع المربعات المُفسَّر، مجموع المربعات المتبقي] +⟶
**19. Coefficient of determination ― The coefficient of determination, often noted R2 or r2, provides a measure of how well the observed outcomes are replicated by the model and is defined as follows:** -مُعامل التحديد (Coefficient of determination) - مُعامل التحديد، وغالباً يرمز له بـ R2 أو r2، يعطي قياس لمدى مطابقة النموذج للنتائج الملحوظة، ويعرف كما يلي: +⟶
**20. Main metrics ― The following metrics are commonly used to assess the performance of regression models, by taking into account the number of variables n that they take into consideration:** -المقاييس الرئيسية - المقاييس التالية تستخدم غالباً لتقييم أداء نماذج الانحدار، وذلك بأن يتم الأخذ في الحسبان عدد المتغيرات n المستخدمة فيها: +⟶
**21. where L is the likelihood and ˆσ2 is an estimate of the variance associated with each response.** -حيث L هو الأرجحية، و ˆσ2 تقدير التباين الخاص بكل نتيجة. +⟶
**22. Model selection** -اختيار النموذج +⟶
**23. Vocabulary ― When selecting a model, we distinguish 3 different parts of the data that we have as follows:** -مفردات - عند اختيار النموذج، نفرق بين 3 أجزاء من البيانات التي لدينا كالتالي: +⟶
**24. [Training set, Validation set, Testing set]** -[مجموعة تدريب، مجموعة تحقق، مجموعة اختبار] +⟶
**25. [Model is trained, Model is assessed, Model gives predictions]** -[يتم تدريب النموذج، يتم تقييم النموذج، النموذج يعطي التوقعات] +⟶
**26. [Usually 80% of the dataset, Usually 20% of the dataset]** -[غالباً 80% من مجموعة البيانات، غالباً 20% من مجموعة البيانات] +⟶
**27. [Also called hold-out or development set, Unseen data]** -[يطلق عليها كذلك المجموعة المُجنّبة أو مجموعة التطوير، بيانات لم يسبق رؤيتها من قبل] +⟶
**28. Once the model has been chosen, it is trained on the entire dataset and tested on the unseen test set. These are represented in the figure below:** -بمجرد اختيار النموذج، يتم تدريبه على مجموعة البيانات بالكامل ثم يتم اختباره على مجموعة اختبار لم يسبق رؤيتها من قبل. كما هو موضح في الشكل التالي: +⟶
**29. Cross-validation ― Cross-validation, also noted CV, is a method that is used to select a model that does not rely too much on the initial training set. The different types are summed up in the table below:** -التحقق المتقاطع (Cross-validation) - التحقق المتقاطع، وكذلك يختصر بـ CV، هو طريقة تستخدم لاختيار نموذج بحيث لا يعتمد بشكل كبير على مجموعة بيانات التدريب المبدأية. أنواع التحقق المتقاطع المختلفة ملخصة في الجدول التالي: +⟶
**30. [Training on k−1 folds and assessment on the remaining one, Training on n−p observations and assessment on the p remaining ones]** -[التدريب على k-1 جزء والتقييم باستخدام الجزء الباقي، التدريب على n−p عينة والتقييم باستخدام الـ p عينات المتبقية] +⟶
**31. [Generally k=5 or 10, Case p=1 is called leave-one-out]** -[بشكل عام k=5 أو 10، الحالة p=1 يطلق عليها الإبقاء على واحد (leave-one-out)] +⟶
**32. The most commonly used method is called k-fold cross-validation and splits the training data into k folds to validate the model on one fold while training the model on the k−1 other folds, all of this k times. The error is then averaged over the k folds and is named cross-validation error.** -الطريقة الأكثر استخداماً يطلق عليها التحقق المتقاطع س جزء/أجزاء (k-fold)، ويتم فيها تقسيم البيانات إلى k جزء، بحيث يتم تدريب النموذج باستخدام k−1 والتحقق باستخدام الجزء المتبقي، ويتم تكرار ذلك k مرة. يتم بعد ذلك حساب معدل الأخطاء في الأجزاء k ويسمى خطأ التحقق المتقاطع. +⟶
**33. Regularization ― The regularization procedure aims at avoiding the model to overfit the data and thus deals with high variance issues. The following table sums up the different types of commonly used regularization techniques:** -ضبط (Regularization) - عمليه الضبط تهدف إلى تفادي فرط التخصيص (overfit) للنموذج، وهو بذلك يتعامل مع مشاكل التباين العالي. الجدول التالي يلخص أنواع وطرق الضبط الأكثر استخداماً: +⟶
**34. [Shrinks coefficients to 0, Good for variable selection, Makes coefficients smaller, Tradeoff between variable selection and small coefficients]** -[يقلص المُعاملات إلى 0، جيد لاختيار المتغيرات، يجعل المُعاملات أصغر، المفاضلة بين اختيار المتغيرات والمُعاملات الصغيرة] +⟶
**35. Diagnostics** -التشخيصات +⟶
**36. Bias ― The bias of a model is the difference between the expected prediction and the correct model that we try to predict for given data points.** -الانحياز (Bias) - الانحياز للنموذج هو الفرق بين التنبؤ المتوقع والنموذج الحقيقي الذي نحاول تنبؤه للبيانات المعطاة. +⟶
**37. Variance ― The variance of a model is the variability of the model prediction for given data points.** -التباين (Variance) - تباين النموذج هو مقدار التغير في تنبؤ النموذج لنقاط البيانات المعطاة. +⟶
**38. Bias/variance tradeoff ― The simpler the model, the higher the bias, and the more complex the model, the higher the variance.** -موازنة الانحياز/التباين (Bias/variance tradeoff) - كلما زادت بساطة النموذج، زاد الانحياز، وكلما زاد تعقيد النموذج، زاد التباين. +⟶
**39. [Symptoms, Regression illustration, classification illustration, deep learning illustration, possible remedies]** -[الأعراض، توضيح الانحدار، توضيح التصنيف، توضيح التعلم العميق، العلاجات الممكنة] +⟶
**40. [High training error, Training error close to test error, High bias, Training error slightly lower than test error, Very low training error, Training error much lower than test error, High variance]** -[خطأ التدريب عالي، خطأ التدريب قريب من خطأ الاختبار، انحياز عالي، خطأ التدريب أقل بقليل من خطأ الاختبار، خطأ التدريب منخفض جداً، خطأ التدريب أقل بكثير من خطأ الاختبار، تباين عالي] +⟶
**41. [Complexify model, Add more features, Train longer, Perform regularization, Get more data]** -[زيادة تعقيد النموذج، إضافة المزيد من الخصائص، تدريب لمدة أطول، إجراء الضبط (regularization)، الحصول على المزيد من البيانات] +⟶
**42. Error analysis ― Error analysis is analyzing the root cause of the difference in performance between the current and the perfect models.** -تحليل الخطأ - تحليل الخطأ هو تحليل السبب الرئيسي للفرق في الأداء بين النماذج الحالية والنماذج المثالية. +⟶
**43. Ablative analysis ― Ablative analysis is analyzing the root cause of the difference in performance between the current and the baseline models.** -تحليل استئصالي (Ablative analysis) - التحليل الاستئصالي هو تحليل السبب الرئيسي للفرق في الأداء بين النماذج الحالية والنماذج المبدئية (baseline). +⟶
**44. Regression metrics** -مقاييس الانحدار +⟶
**45. [Classification metrics, confusion matrix, accuracy, precision, recall, F1 score, ROC]** -[مقاييس التصنيف، مصفوفة الدقّة، الضبط (accuracy)، الدقة (precision)، الاستدعاء (recall)، درجة F1] +⟶
**46. [Regression metrics, R squared, Mallow's CP, AIC, BIC]** -[مقاييس الانحدار، مربع R، معيار معامل مالوس (Mallow's)، معيار آكياك المعلوماتي (AIC)، معيار المعلومات البايزي (BIC)] +⟶
**47. [Model selection, cross-validation, regularization]** -[اختيار النموذج، التحقق المتقاطع، الضبط] +⟶
**48. [Diagnostics, Bias/variance tradeoff, error/ablative analysis]** -[التشخيصات، موازنة الانحياز/التباين، تحليل الخطأ/التحليل الاستئصالي] +⟶ From 0c6dc5c15cd0fa93f30fcdfaa4e2e9c77cc44587 Mon Sep 17 00:00:00 2001 From: Shervine Amidi Date: Mon, 21 Oct 2019 00:18:23 -0700 Subject: [PATCH 422/531] Add contributors --- CONTRIBUTORS | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/CONTRIBUTORS b/CONTRIBUTORS index db83fa3d0..d269e4112 100644 --- a/CONTRIBUTORS +++ b/CONTRIBUTORS @@ -6,6 +6,12 @@ Amjad Khatabi (review of linear algebra) Mazen Melibari (review of linear algebra) + Fares Al-Quaneier (translation of machine learning tips and tricks) + Zaid Alyafeai (review of machine learning tips and tricks) + + Fares Al-Quaneier (translation of supervised learning) + Zaid Alyafeai (review of supervised learning) + --de --es From 760d9bdc15ab21e5b06e7c80c6c56a077b813358 Mon Sep 17 00:00:00 2001 From: tuananhhedspibk Date: Tue, 22 Oct 2019 23:40:45 +0900 Subject: [PATCH 423/531] vi translate for rnn --- vi/cs-230-recurrent-neural-networks.md | 677 +++++++++++++++++++++++++ 1 file changed, 677 insertions(+) create mode 100644 vi/cs-230-recurrent-neural-networks.md diff --git a/vi/cs-230-recurrent-neural-networks.md b/vi/cs-230-recurrent-neural-networks.md new file mode 100644 index 000000000..f576848a4 --- /dev/null +++ b/vi/cs-230-recurrent-neural-networks.md @@ -0,0 +1,677 @@ +**Recurrent Neural Networks translation** [[webpage]](https://stanford.edu/~shervine/teaching/cs-230/cheatsheet-recurrent-neural-networks) + +
+ +**1. Recurrent Neural Networks cheatsheet** + +⟶ Cheatsheet về mạng neural hồi quy + +
+ + +**2. CS 230 - Deep Learning** + +⟶ CS 230 - Deep Learning + +
+ + +**3. [Overview, Architecture structure, Applications of RNNs, Loss function, Backpropagation]** + +⟶ [Tổng quan, Kết cấu kiến trúc, Ứng dụng của RNNs, Hàm mất mát, Lan truyền ngược] + +

**4. [Handling long term dependencies, Common activation functions, Vanishing/exploding gradient, Gradient clipping, GRU/LSTM, Types of gates, Bidirectional RNN, Deep RNN]**

⟶ [Xử lí các phụ thuộc dài hạn, Các hàm kích hoạt phổ biến, Vanishing/exploding gradient, Gradient clipping, GRU/LSTM, Các loại cổng, RNN hai chiều, RNN sâu]

<br>

**5. [Learning word representation, Notations, Embedding matrix, Word2vec, Skip-gram, Negative sampling, GloVe]**

⟶ [Học biểu diễn từ, Ký hiệu, Ma trận nhúng, Word2vec, Skip-gram, Lấy mẫu âm, GloVe]

<br>
+ + +**6. [Comparing words, Cosine similarity, t-SNE]** + +⟶ [So sánh các từ, Độ tương đồng Cosine, t-SNE] + +

**7. [Language model, n-gram, Perplexity]**

⟶ [Mô hình ngôn ngữ, n-gram, Perplexity]

<br>
+ + +**8. [Machine translation, Beam search, Length normalization, Error analysis, Bleu score]** + +⟶ [Dịch máy, Tìm kiếm Beam, Chuẩn hoá độ dài, Phân tích lỗi, Bleu score] + +
+ + +**9. [Attention, Attention model, Attention weights]** + +⟶ [Attention, Mô hình Attention, Trọng số Attention] + +
+ + +**10. Overview** + +⟶ Tổng quan + +

**11. Architecture of a traditional RNN ― Recurrent neural networks, also known as RNNs, are a class of neural networks that allow previous outputs to be used as inputs while having hidden states. They are typically as follows:**

⟶ Kiến trúc của một mạng RNN truyền thống - Các mạng neural hồi quy, còn được biết đến như là RNNs, là một lớp của mạng neural cho phép các đầu ra ở những bước trước được sử dụng làm đầu vào cho bước sau, trong khi vẫn duy trì các trạng thái ẩn. Thông thường chúng có dạng như sau:

<br>
+ + +**12. For each timestep t, the activation a and the output y are expressed as follows:** + +⟶ Tại mỗi bước t, hàm activation a và đầu ra y được biểu diễn như sau: + +
+ + +**13. and** + +⟶ và + +

**14. where Wax,Waa,Wya,ba,by are coefficients that are shared temporally and g1,g2 activation functions.**

⟶ với Wax,Waa,Wya,ba,by là các hệ số được chia sẻ theo thời gian (dùng chung cho mọi bước thời gian) và g1,g2 là các hàm kích hoạt.

<br>
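Editor's aside (not part of the translation itself): a single RNN timestep written out in NumPy with the weight names used above; the choice g1=tanh, g2=softmax and all dimensions and values are illustrative assumptions.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def rnn_step(x_t, a_prev, Wax, Waa, Wya, ba, by):
    """a<t> = g1(Waa a<t-1> + Wax x<t> + ba),  y<t> = g2(Wya a<t> + by)."""
    a_t = np.tanh(Waa @ a_prev + Wax @ x_t + ba)   # g1 = tanh (one common choice)
    y_t = softmax(Wya @ a_t + by)                  # g2 = softmax (one common choice)
    return a_t, y_t

rng = np.random.default_rng(0)                     # toy sizes: input 4, hidden 3, output 2
Wax, Waa, Wya = rng.normal(size=(3, 4)), rng.normal(size=(3, 3)), rng.normal(size=(2, 3))
ba, by, a = np.zeros(3), np.zeros(2), np.zeros(3)
for x_t in rng.normal(size=(5, 4)):                # the same weights are reused at every timestep
    a, y = rnn_step(x_t, a, Wax, Waa, Wya, ba, by)
print(y)
```
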

**15. The pros and cons of a typical RNN architecture are summed up in the table below:**

⟶ Ưu và nhược điểm của một kiến trúc RNN thông thường được tổng kết ở bảng dưới đây:

<br>
+ + +**16. [Advantages, Possibility of processing input of any length, Model size not increasing with size of input, Computation takes into account historical information, Weights are shared across time]** + +⟶ [Ưu điểm, Khả năng xử lí đầu vào với bất kì độ dài nào, Kích cỡ mô hình không tăng theo kích cỡ đầu vào, Quá trình tính toán sử dụng các thông tin cũ, Trọng số được chia sẻ trong suốt thời gian] + +
+ + +**17. [Drawbacks, Computation being slow, Difficulty of accessing information from a long time ago, Cannot consider any future input for the current state]** + +⟶ + +
+ + +**18. Applications of RNNs ― RNN models are mostly used in the fields of natural language processing and speech recognition. The different applications are summed up in the table below:** + +⟶ + +
+ + +**19. [Type of RNN, Illustration, Example]** + +⟶ + +
+ + +**20. [One-to-one, One-to-many, Many-to-one, Many-to-many]** + +⟶ + +
+ + +**21. [Traditional neural network, Music generation, Sentiment classification, Name entity recognition, Machine translation]** + +⟶ + +
+ + +**22. Loss function ― In the case of a recurrent neural network, the loss function L of all time steps is defined based on the loss at every time step as follows:** + +⟶ + +
+ + +**23. Backpropagation through time ― Backpropagation is done at each point in time. At timestep T, the derivative of the loss L with respect to weight matrix W is expressed as follows:** + +⟶ + +
+ + +**24. Handling long term dependencies** + +⟶ + +
+ + +**25. Commonly used activation functions ― The most common activation functions used in RNN modules are described below:** + +⟶ + +
+ + +**26. [Sigmoid, Tanh, RELU]** + +⟶ + +
+ + +**27. Vanishing/exploding gradient ― The vanishing and exploding gradient phenomena are often encountered in the context of RNNs. The reason why they happen is that it is difficult to capture long term dependencies because of multiplicative gradient that can be exponentially decreasing/increasing with respect to the number of layers.** + +⟶ + +
+ + +**28. Gradient clipping ― It is a technique used to cope with the exploding gradient problem sometimes encountered when performing backpropagation. By capping the maximum value for the gradient, this phenomenon is controlled in practice.** + +⟶ + +
+ + +**29. clipped** + +⟶ + +
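Editor's aside: element-wise gradient clipping in one line of NumPy; the cap value is arbitrary.

```python
import numpy as np

def clip_gradient(grad, max_value):
    """Cap every gradient component to [-max_value, max_value]."""
    return np.clip(grad, -max_value, max_value)

print(clip_gradient(np.array([0.3, -7.2, 15.0, -0.1]), max_value=5.0))
```
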
+ + +**30. Types of gates ― In order to remedy the vanishing gradient problem, specific gates are used in some types of RNNs and usually have a well-defined purpose. They are usually noted Γ and are equal to:** + +⟶ + +
+ + +**31. where W,U,b are coefficients specific to the gate and σ is the sigmoid function. The main ones are summed up in the table below:** + +⟶ + +
+ + +**32. [Type of gate, Role, Used in]** + +⟶ + +
+ + +**33. [Update gate, Relevance gate, Forget gate, Output gate]** + +⟶ + +
+ + +**34. [How much past should matter now?, Drop previous information?, Erase a cell or not?, How much to reveal of a cell?]** + +⟶ + +
+ + +**35. [LSTM, GRU]** + +⟶ + +
+ + +**36. GRU/LSTM ― Gated Recurrent Unit (GRU) and Long Short-Term Memory units (LSTM) deal with the vanishing gradient problem encountered by traditional RNNs, with LSTM being a generalization of GRU. Below is a table summing up the characterizing equations of each architecture:** + +⟶ + +
+ + +**37. [Characterization, Gated Recurrent Unit (GRU), Long Short-Term Memory (LSTM), Dependencies]** + +⟶ + +
+ + +**38. Remark: the sign ⋆ denotes the element-wise multiplication between two vectors.** + +⟶ + +
+ + +**39. Variants of RNNs ― The table below sums up the other commonly used RNN architectures:** + +⟶ + +
+ + +**40. [Bidirectional (BRNN), Deep (DRNN)]** + +⟶ + +
+ + +**41. Learning word representation** + +⟶ + +
+ + +**42. In this section, we note V the vocabulary and |V| its size.** + +⟶ + +
+ + +**43. Motivation and notations** + +⟶ + +
+ + +**44. Representation techniques ― The two main ways of representing words are summed up in the table below:** + +⟶ + +
+ + +**45. [1-hot representation, Word embedding]** + +⟶ + +
+ + +**46. [teddy bear, book, soft]** + +⟶ + +
+ + +**47. [Noted ow, Naive approach, no similarity information, Noted ew, Takes into account words similarity]** + +⟶ + +
+ + +**48. Embedding matrix ― For a given word w, the embedding matrix E is a matrix that maps its 1-hot representation ow to its embedding ew as follows:** + +⟶ + +
+ + +**49. Remark: learning the embedding matrix can be done using target/context likelihood models.** + +⟶ + +
+ + +**50. Word embeddings** + +⟶ + +
+ + +**51. Word2vec ― Word2vec is a framework aimed at learning word embeddings by estimating the likelihood that a given word is surrounded by other words. Popular models include skip-gram, negative sampling and CBOW.** + +⟶ + +
+ + +**52. [A cute teddy bear is reading, teddy bear, soft, Persian poetry, art]** + +⟶ + +
+ + +**53. [Train network on proxy task, Extract high-level representation, Compute word embeddings]** + +⟶ + +
+ + +**54. Skip-gram ― The skip-gram word2vec model is a supervised learning task that learns word embeddings by assessing the likelihood of any given target word t happening with a context word c. By noting θt a parameter associated with t, the probability P(t|c) is given by:** + +⟶ + +
+ + +**55. Remark: summing over the whole vocabulary in the denominator of the softmax part makes this model computationally expensive. CBOW is another word2vec model using the surrounding words to predict a given word.** + +⟶ + +
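Editor's aside: the skip-gram softmax P(t|c) sketched in NumPy; the vocabulary size, embedding dimension and all values are made up, and the sum over |V| in the denominator is what makes the model expensive.

```python
import numpy as np

def skipgram_probs(theta, e_c):
    """P(t|c) for every target word t: softmax of theta_t . e_c over the vocabulary."""
    logits = theta @ e_c
    logits -= logits.max()            # numerical stability
    p = np.exp(logits)
    return p / p.sum()                # denominator sums over the whole vocabulary

rng = np.random.default_rng(0)
theta = rng.normal(size=(1000, 50))   # one parameter vector per target word
e_c = rng.normal(size=50)             # embedding of the context word
print(skipgram_probs(theta, e_c)[:5])
```
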
+ + +**56. Negative sampling ― It is a set of binary classifiers using logistic regressions that aim at assessing how a given context and a given target words are likely to appear simultaneously, with the models being trained on sets of k negative examples and 1 positive example. Given a context word c and a target word t, the prediction is expressed by:** + +⟶ + +
+ + +**57. Remark: this method is less computationally expensive than the skip-gram model.** + +⟶ + +
+ + +**57bis. GloVe ― The GloVe model, short for global vectors for word representation, is a word embedding technique that uses a co-occurence matrix X where each Xi,j denotes the number of times that a target i occurred with a context j. Its cost function J is as follows:** + +⟶ + +
+ + +**58. where f is a weighting function such that Xi,j=0⟹f(Xi,j)=0. +Given the symmetry that e and θ play in this model, the final word embedding e(final)w is given by:** + +⟶ + +
+ + +**59. Remark: the individual components of the learned word embeddings are not necessarily interpretable.** + +⟶ + +
+ + +**60. Comparing words** + +⟶ + +
+ + +**61. Cosine similarity ― The cosine similarity between words w1 and w2 is expressed as follows:** + +⟶ + +
+ + +**62. Remark: θ is the angle between words w1 and w2.** + +⟶ + +
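Editor's aside: cosine similarity between two made-up embedding vectors.

```python
import numpy as np

def cosine_similarity(w1, w2):
    """cos(theta) between two word embeddings."""
    return w1 @ w2 / (np.linalg.norm(w1) * np.linalg.norm(w2))

print(cosine_similarity(np.array([0.8, 0.1, 0.3]), np.array([0.7, 0.2, 0.4])))
```
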
+ + +**63. t-SNE ― t-SNE (t-distributed Stochastic Neighbor Embedding) is a technique aimed at reducing high-dimensional embeddings into a lower dimensional space. In practice, it is commonly used to visualize word vectors in the 2D space.** + +⟶ + +
+ + +**64. [literature, art, book, culture, poem, reading, knowledge, entertaining, loveable, childhood, kind, teddy bear, soft, hug, cute, adorable]** + +⟶ + +
+ + +**65. Language model** + +⟶ + +
+ + +**66. Overview ― A language model aims at estimating the probability of a sentence P(y).** + +⟶ + +
+ + +**67. n-gram model ― This model is a naive approach aiming at quantifying the probability that an expression appears in a corpus by counting its number of appearance in the training data.** + +⟶ + +
+ + +**68. Perplexity ― Language models are commonly assessed using the perplexity metric, also known as PP, which can be interpreted as the inverse probability of the dataset normalized by the number of words T. The perplexity is such that the lower, the better and is defined as follows:** + +⟶ + +
+ + +**69. Remark: PP is commonly used in t-SNE.** + +⟶ + +
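Editor's aside: perplexity as the inverse probability of the sequence normalized by its length T, evaluated on made-up per-word probabilities; lower is better.

```python
import numpy as np

def perplexity(word_probs):
    """PP = (prod_t 1/p_t)^(1/T) = exp(-mean(log p_t))."""
    return float(np.exp(-np.mean(np.log(word_probs))))

print(perplexity([0.2, 0.1, 0.25, 0.05]))
```
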
+ + +**70. Machine translation** + +⟶ + +
+ + +**71. Overview ― A machine translation model is similar to a language model except it has an encoder network placed before. For this reason, it is sometimes referred as a conditional language model. The goal is to find a sentence y such that:** + +⟶ + +
+ + +**72. Beam search ― It is a heuristic search algorithm used in machine translation and speech recognition to find the likeliest sentence y given an input x.** + +⟶ + +
+ + +**73. [Step 1: Find top B likely words y<1>, Step 2: Compute conditional probabilities y|x,y<1>,...,y, Step 3: Keep top B combinations x,y<1>,...,y, End process at a stop word]** + +⟶ + +
+ + +**74. Remark: if the beam width is set to 1, then this is equivalent to a naive greedy search.** + +⟶ + +
+ + +**75. Beam width ― The beam width B is a parameter for beam search. Large values of B yield to better result but with slower performance and increased memory. Small values of B lead to worse results but is less computationally intensive. A standard value for B is around 10.** + +⟶ + +
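Editor's aside: a toy beam search following the three steps listed above; `next_probs` is a made-up stand-in for the decoder's conditional distribution, with index 0 acting as the stop word.

```python
import numpy as np

def beam_search(next_probs, B, max_len):
    """Keep the B most likely partial sequences (by log-probability) at every step."""
    beams = [([], 0.0)]                            # (sequence, log-probability)
    for _ in range(max_len):
        candidates = []
        for seq, logp in beams:
            if seq and seq[-1] == 0:               # sequence already ended at the stop word
                candidates.append((seq, logp))
                continue
            p = next_probs(seq)
            for w in np.argsort(p)[::-1][:B]:      # top B continuations of this beam
                candidates.append((seq + [int(w)], logp + float(np.log(p[w]))))
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:B]
    return beams

rng = np.random.default_rng(0)
def next_probs(seq):                               # placeholder conditional distribution
    p = rng.random(5)
    return p / p.sum()

print(beam_search(next_probs, B=3, max_len=4))
```
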
+ + +**76. Length normalization ― In order to improve numerical stability, beam search is usually applied on the following normalized objective, often called the normalized log-likelihood objective, defined as:** + +⟶ + +
+ + +**77. Remark: the parameter α can be seen as a softener, and its value is usually between 0.5 and 1.** + +⟶ + +
+ + +**78. Error analysis ― When obtaining a predicted translation ˆy that is bad, one can wonder why we did not get a good translation y∗ by performing the following error analysis:** + +⟶ + +
+ + +**79. [Case, Root cause, Remedies]** + +⟶ + +
+ + +**80. [Beam search faulty, RNN faulty, Increase beam width, Try different architecture, Regularize, Get more data]** + +⟶ + +
+ + +**81. Bleu score ― The bilingual evaluation understudy (bleu) score quantifies how good a machine translation is by computing a similarity score based on n-gram precision. It is defined as follows:** + +⟶ + +
+ + +**82. where pn is the bleu score on n-gram only defined as follows:** + +⟶ + +
+ + +**83. Remark: a brevity penalty may be applied to short predicted translations to prevent an artificially inflated bleu score.** + +⟶ + +
+ + +**84. Attention** + +⟶ + +
+ + +**85. Attention model ― This model allows an RNN to pay attention to specific parts of the input that is considered as being important, which improves the performance of the resulting model in practice. By noting α the amount of attention that the output y should pay to the activation a and c the context at time t, we have:** + +⟶ + +
+ + +**86. with** + +⟶ + +
+ + +**87. Remark: the attention scores are commonly used in image captioning and machine translation.** + +⟶ + +
+ + +**88. A cute teddy bear is reading Persian literature.** + +⟶ + +
+ + +**89. Attention weight ― The amount of attention that the output y should pay to the activation a is given by α computed as follows:** + +⟶ + +
+ + +**90. Remark: computation complexity is quadratic with respect to Tx.** + +⟶ + +
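Editor's aside: attention weights and the resulting context vector for one output position, in NumPy; the scores e<t,t′> and activations a<t′> are made up.

```python
import numpy as np

def attention_context(scores, a):
    """alpha<t,t'> = softmax of the scores e<t,t'>;  c<t> = sum_t' alpha<t,t'> a<t'>."""
    e = np.exp(scores - scores.max())
    alpha = e / e.sum()               # attention weights, non-negative and summing to 1
    return alpha, alpha @ a           # context vector c<t>

rng = np.random.default_rng(0)
a = rng.normal(size=(6, 4))           # encoder activations a<t'> for Tx=6 positions
scores = rng.normal(size=6)           # scores e<t,t'> for one output position t
alpha, c = attention_context(scores, a)
print(alpha, c)
```
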
+ + +**91. The Deep Learning cheatsheets are now available in [target language].** + +⟶ + +
+ +**92. Original authors** + +⟶ + +
+ +**93. Translated by X, Y and Z** + +⟶ + +
+ +**94. Reviewed by X, Y and Z** + +⟶ + +
+ +**95. View PDF version on GitHub** + +⟶ + +
+ +**96. By X and Y** + +⟶ + +
From 48ad2f9cc2fddb27fe3aeebdf45b454ee4ed7d54 Mon Sep 17 00:00:00 2001 From: Pham Hong Vinh Date: Wed, 23 Oct 2019 21:20:20 +0700 Subject: [PATCH 424/531] Finished translating. Draft version --- vi/cs-230-convolutional-neural-networks.md | 59 +++++++++++----------- 1 file changed, 30 insertions(+), 29 deletions(-) diff --git a/vi/cs-230-convolutional-neural-networks.md b/vi/cs-230-convolutional-neural-networks.md index 1a79296c2..fae937a63 100644 --- a/vi/cs-230-convolutional-neural-networks.md +++ b/vi/cs-230-convolutional-neural-networks.md @@ -284,14 +284,14 @@ **41. Receptive field ― The receptive field at layer k is the area denoted Rk×Rk of the input that each pixel of the k-th activation map can 'see'. By calling Fj the filter size of layer j and Si the stride value of layer i and with the convention S0=1, the receptive field at layer k can be computed with the formula:** -⟶ +⟶ Trường thụ cảm (Receptive field) ― Trường thụ cảm tại tầng k là vùng được ký hiệu Rk×Rk của đầu vào mà những pixel của activation map thứ k có thể "nhìn thấy". Bằng cách gọi Fj là kích thước bộ lọc của tầng j và Si là giá trị độ trượt của tầng i và để thuận tiện, ta mặc định S0=1, trường thụ cảm của tầng k được tính toán bằng công thức:
**42. In the example below, we have F1=F2=3 and S1=S2=1, which gives R2=1+2⋅1+2⋅1=5.** -⟶ +⟶ Trong ví dụ bên dưới, ta có F1=F2=3 và S1=S2=1, nên cho ra được R2=1+2⋅1+2⋅1=5.
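Editor's aside: the receptive field recurrence written as a tiny helper; it reproduces the R2=5 value of the example above.

```python
def receptive_field(filter_sizes, strides):
    """R_k = 1 + sum_j (F_j - 1) * prod_{i<j} S_i, with the convention S_0 = 1."""
    r, jump = 1, 1                    # jump = product of the strides of previous layers
    for f, s in zip(filter_sizes, strides):
        r += (f - 1) * jump
        jump *= s
    return r

print(receptive_field([3, 3], [1, 1]))   # F1=F2=3, S1=S2=1  ->  5
```
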

@@ -529,188 +529,189 @@

**76. [Face verification, Face recognition, Query, Reference, Database]**

⟶ [Xác nhận khuôn mặt, Nhận diện khuôn mặt, Truy vấn, Tham chiếu, Cơ sở dữ liệu]

<br>

**77. [Is this the correct person?, One-to-one lookup, Is this one of the K persons in the database?, One-to-many lookup]**

⟶ [Có đúng người không?, Tra cứu một-một, Đây có phải là 1 trong K người trong cơ sở dữ liệu không?, Tra cứu một-nhiều]

<br>
**78. One Shot Learning ― One Shot Learning is a face verification algorithm that uses a limited training set to learn a similarity function that quantifies how different two given images are. The similarity function applied to two images is often noted d(image 1,image 2).** -⟶ +⟶ One Shot Learning ― One Shot Learning là một thuật toán xác minh khuôn mặt sử dụng một tập huấn luyện hạn chế để học một hàm similarity nhằm ước lượng sự khác nhau giữa hai tấm hình. Hàm này được áp dụng cho hai tấm ảnh thường được ký hiệu d(image 1,image 2).
**79. Siamese Network ― Siamese Networks aim at learning how to encode images to then quantify how different two images are. For a given input image x(i), the encoded output is often noted as f(x(i)).** -⟶ +⟶ Siamese Network ― Siamese Networks hướng tới việc học cách mã hóa tấm ảnh để rồi định lượng sự khác nhau giữa hai tấm ảnh. Với một tấm ảnh đầu vào x(i), đầu ra được mã hóa thường được ký hiệu là f(x(i)).
**80. Triplet loss ― The triplet loss ℓ is a loss function computed on the embedding representation of a triplet of images A (anchor), P (positive) and N (negative). The anchor and the positive example belong to a same class, while the negative example to another one. By calling α∈R+ the margin parameter, this loss is defined as follows:** -⟶ +⟶ Triplet loss ― Triplet loss ℓ là một hàm mất mát được tính toán dựa trên biểu diễn nhúng của bộ ba hình ảnh A (mỏ neo), P (dương tính) và N(âm tính). Ảnh mỏ neo và ảnh dương tính đều thuộc một lớp, trong khi đó ảnh âm tính thuộc về một lớp khác. Bằng các gọi α∈R+ là tham số margin, hàm mất mát này được định nghĩa như sau:
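Editor's aside: the triplet loss on made-up encodings, taking d as the squared L2 distance between f(x) vectors (one common choice).

```python
import numpy as np

def triplet_loss(f_a, f_p, f_n, alpha=0.2):
    """max(d(A,P) - d(A,N) + alpha, 0) with d the squared L2 distance between encodings."""
    d_ap = np.sum((f_a - f_p) ** 2)
    d_an = np.sum((f_a - f_n) ** 2)
    return max(d_ap - d_an + alpha, 0.0)

f_a, f_p, f_n = np.array([0.1, 0.9]), np.array([0.2, 0.8]), np.array([0.9, 0.1])
print(triplet_loss(f_a, f_p, f_n))    # placeholder anchor / positive / negative encodings
```
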
**81. Neural style transfer** -⟶ +⟶ Neural style transfer
**82. Motivation ― The goal of neural style transfer is to generate an image G based on a given content C and a given style S.** -⟶ +⟶ Ý tưởng ― Mục tiêu của neural style transfer là tạo ra một ảnh G dựa trên một nội dung C và một phong cách S.
**83. [Content C, Style S, Generated image G]** -⟶ +⟶ [Nội dung C, Phong cách S, Ảnh tạo được G]
**84. Activation ― In a given layer l, the activation is noted a[l] and is of dimensions nH×nw×nc** -⟶ +⟶ Tầng kích hoạt ― Trong một tầng l cho trước, tầng kích hoạt được ký hiệu a[l] và có các chiều là nH×nw×nc
**85. Content cost function ― The content cost function Jcontent(C,G) is used to determine how the generated image G differs from the original content image C. It is defined as follows:** -⟶ +⟶ Hàm mất mát nội dung ― Hàm mất mát nội dung Jcontent(C,G) được sử dụng để xác định nội dung của ảnh được tạo G khác biệt với nội dung gốc trong ảnh C. Nó được định nghĩa như dưới đây:
**86. Style matrix ― The style matrix G[l] of a given layer l is a Gram matrix where each of its elements G[l]kk′ quantifies how correlated the channels k and k′ are. It is defined with respect to activations a[l] as follows:** -⟶ +⟶ Ma trận phong cách ― Ma trận phong cách G[l] của một tầng cho trước l là một ma trận Gram mà mỗi thành phần G[l]kk′ của ma trận xác định sự tương quan giữa kênh k và kênh k'. Nó được định nghĩa theo tầng kích hoạt a[l] như sau:
**87. Remark: the style matrix for the style image and the generated image are noted G[l] (S) and G[l] (G) respectively.** -⟶ +⟶ Lưu ý: ma trận phong cách cho ảnh phong cách và ảnh được tạo được ký hiệu tương ứng là G[l] (S) và G[l] (G).
**88. Style cost function ― The style cost function Jstyle(S,G) is used to determine how the generated image G differs from the style S. It is defined as follows:** -⟶ +⟶ Hàm mất mát phong cách ― Hàm mất mát phong cách Jstyle(S,G) được sử dụng để xác định sự khác biệt về phong cách giữa ảnh được tạo G và ảnh phong cách S. Nó được định nghĩa như sau:
**89. Overall cost function ― The overall cost function is defined as being a combination of the content and style cost functions, weighted by parameters α,β, as follows:** -⟶ +⟶ Hàm mất mát tổng quát ― Hàm mất mát tổng quát được định nghĩa là sự kết hợp của hàm mất mát nội dung và hàm mất mát phong cách, độ quan trọng của chúng được xác định bởi hai tham số α,β, như dưới đây:
**90. Remark: a higher value of α will make the model care more about the content while a higher value of β will make it care more about the style.** -⟶ +⟶ Lưu ý: giá trị của α càng lớn dẫn tới việc mô hình sẽ quan tâm hơn cho nội dung, trong khi đó, giá trị của β càng lớn sẽ khiến nó quan tâm hơn đến phong cách.
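Editor's aside: the Gram (style) matrix and the combined cost J(G)=αJcontent+βJstyle sketched in NumPy on made-up activations; the normalization constants are simplified relative to the exact definitions.

```python
import numpy as np

def gram_matrix(a):
    """Style matrix G[l]: channel-by-channel correlations of an activation (nH, nW, nC)."""
    nH, nW, nC = a.shape
    flat = a.reshape(nH * nW, nC)
    return flat.T @ flat                              # shape (nC, nC)

def content_cost(a_C, a_G):
    return 0.5 * np.sum((a_C - a_G) ** 2)

def style_cost(a_S, a_G):
    return np.mean((gram_matrix(a_S) - gram_matrix(a_G)) ** 2)

def total_cost(a_C, a_S, a_G, alpha=10.0, beta=40.0):
    """J(G) = alpha * Jcontent(C,G) + beta * Jstyle(S,G)."""
    return alpha * content_cost(a_C, a_G) + beta * style_cost(a_S, a_G)

rng = np.random.default_rng(0)
a_C, a_S, a_G = [rng.normal(size=(4, 4, 3)) for _ in range(3)]   # toy activations
print(total_cost(a_C, a_S, a_G))
```
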
**91. Architectures using computational tricks** -⟶ +⟶ Những kiến trúc sử dụng computational tricks

**92. Generative Adversarial Network ― Generative adversarial networks, also known as GANs, are composed of a generative and a discriminative model, where the generative model aims at generating the most truthful output that will be fed into the discriminative which aims at differentiating the generated and true image.**

⟶ Generative Adversarial Network ― Generative adversarial networks, hay còn được gọi là GAN, là sự kết hợp giữa một mô hình sinh (generative) và một mô hình phân biệt (discriminative), trong đó mô hình sinh cố gắng tạo ra hình ảnh đầu ra chân thực nhất, đầu ra này sau đó được đưa vào mô hình phân biệt, mà mục tiêu của nó là phân biệt giữa ảnh được tạo và ảnh thật.

<br>


**93. [Training, Noise, Real-world image, Generator, Discriminator, Real Fake]**

⟶ [Huấn luyện, Nhiễu, Ảnh thật, Mô hình sinh, Mô hình phân biệt, Thật Giả]

<br>


**94. Remark: use cases using variants of GANs include text to image, music generation and synthesis.**

⟶ Lưu ý: các trường hợp sử dụng những biến thể của GAN bao gồm chuyển văn bản thành hình ảnh, sinh nhạc và tổng hợp.

<br>

**95. ResNet ― The Residual Network architecture (also called ResNet) uses residual blocks with a high number of layers meant to decrease the training error. The residual block has the following characterizing equation:**

⟶ ResNet ― Kiến trúc Residual Network (hay còn gọi là ResNet) sử dụng những khối residual (residual blocks) cùng với một lượng lớn các tầng nhằm giảm lỗi huấn luyện. Khối residual có phương trình đặc trưng như sau:

<br>
**96. Inception Network ― This architecture uses inception modules and aims at giving a try at different convolutions in order to increase its performance through features diversification. In particular, it uses the 1×1 convolution trick to limit the computational burden.** -⟶ +⟶ Inception Network ― Kiến trúc này sử dụng những inception module và hướng tới việc thử các tầng tích chập khác nhau để tăng hiệu suất thông qua sự đa dạng của các feature. Cụ thể, kiến trúc này sử dụng thủ thuật tầng tích chập 1×1 để hạn chế gánh nặng tính toán.
**97. The Deep Learning cheatsheets are now available in [target language].** -⟶ +⟶ Những cheatsheet về Deep Learning nay đã được dịch sang [target language].
**98. Original authors** -⟶ +⟶ Các tác giả
**99. Translated by X, Y and Z** -⟶ +⟶ Được dịch bởi X, Y và Z

**100. Reviewed by X, Y and Z**

⟶ Được kiểm duyệt bởi X, Y và Z

<br>
**101. View PDF version on GitHub** -⟶ +⟶ Xem bản PDF trên Github
**102. By X and Y** -⟶ +⟶ Bởi X và Y
From 51026413526dbfaf78855e444db3762ce3d9298b Mon Sep 17 00:00:00 2001 From: Pham Hong Vinh Date: Wed, 23 Oct 2019 21:26:33 +0700 Subject: [PATCH 425/531] Delete cs-230-convolutional-neural-networks.md.backup Accidentally added backup file. --- ...30-convolutional-neural-networks.md.backup | 716 ------------------ 1 file changed, 716 deletions(-) delete mode 100644 vi/cs-230-convolutional-neural-networks.md.backup diff --git a/vi/cs-230-convolutional-neural-networks.md.backup b/vi/cs-230-convolutional-neural-networks.md.backup deleted file mode 100644 index 356534f3a..000000000 --- a/vi/cs-230-convolutional-neural-networks.md.backup +++ /dev/null @@ -1,716 +0,0 @@ -**Convolutional Neural Networks translation** [[webpage]](https://stanford.edu/~shervine/teaching/cs-230/cheatsheet-convolutional-neural-networks) - -
- -**1. Convolutional Neural Networks cheatsheet** - -⟶Convolutional Neural Networks cheatsheet - -
- - -**2. CS 230 - Deep Learning** - -⟶ CS 230 - Deep Learning - -
- - -**3. [Overview, Architecture structure]** - -⟶ [Tổng quan, Kết cấu kiến trúc] - -
- - -**4. [Types of layer, Convolution, Pooling, Fully connected]** - -⟶ [Các kiểu tầng (layer), Tích chập, Pooling, Kết nối đầy đủ] - -
- - -**5. [Filter hyperparameters, Dimensions, Stride, Padding]** - -⟶ [Các siêu tham số của bộ lọc, Các chiều, Stride, Padding] - -
- - -**6. [Tuning hyperparameters, Parameter compatibility, Model complexity, Receptive field]** - -⟶ [Điều chỉnh các siêu tham số, Độ tương thích tham số, Độ phức tạp mô hình, Receptive field] - -
- - -**7. [Activation functions, Rectified Linear Unit, Softmax]** - -⟶ [Các hàm kích hoạt, Rectified Linear Unit, Softmax] - -
- - -**8. [Object detection, Types of models, Detection, Intersection over Union, Non-max suppression, YOLO, R-CNN]** - -⟶ [Phát hiện vật thể, Các kiểu mô hình, Detection, Intersection over Union, Non-max suppression, YOLO, R-CNN] - -
- - -**9. [Face verification/recognition, One shot learning, Siamese network, Triplet loss]** - -⟶ [Nhận diện/ xác nhận gương mặt, One shot learning, Siamese network, Triplet loss] - -
- - -**10. [Neural style transfer, Activation, Style matrix, Style/content cost function]** - -⟶ [Neural style transfer, Activation, Style matrix, Style/content cost function] - -
- - -**11. [Computational trick architectures, Generative Adversarial Net, ResNet, Inception Network]** - -⟶ [Computational trick architectures, Generative Adversarial Net, ResNet, Inception Network] - -
- - -**12. Overview** - -⟶ Tổng quan - -
- - -**13. Architecture of a traditional CNN ― Convolutional neural networks, also known as CNNs, are a specific type of neural networks that are generally composed of the following layers:** - -⟶ Kiến trúc truyền thống của một mạng CNN ― Mạng neural tích chập (Convolutional neural networks), còn được biết đến với tên CNNs, là một dạng mạng neural được cấu thành bởi các tầng sau: - -
- - -**14. The convolution layer and the pooling layer can be fine-tuned with respect to hyperparameters that are described in the next sections.** - -⟶ Tầng tích chập và tầng pooling có thể được hiệu chỉnh theo các siêu tham số (hyperparameters) được mô tả ở những phần tiếp theo. - -
- - -**15. Types of layer** - -⟶ Các kiểu tầng - -
- - -**16. Convolution layer (CONV) ― The convolution layer (CONV) uses filters that perform convolution operations as it is scanning the input I with respect to its dimensions. Its hyperparameters include the filter size F and stride S. The resulting output O is called feature map or activation map.** - -⟶ Tầng tích chập (CONV) ― Tầng tích chập (CONV) sử dụng các bộ lọc để thực hiện phép tích chập khi đưa chúng đi qua đầu vào I theo các chiều của nó. Các siêu tham số của các bộ lọc này bao gồm kích thước bộ lọc F và độ trượt (stride) S. Kết quả đầu ra O được gọi là feature map hay activation map. - -
- - -**17. Remark: the convolution step can be generalized to the 1D and 3D cases as well.** - -⟶ Lưu ý: Bước tích chập cũng có thể được khái quát hóa cả với trường hợp một chiều (1D) và ba chiều (3D). - -
- - -**18. Pooling (POOL) ― The pooling layer (POOL) is a downsampling operation, typically applied after a convolution layer, which does some spatial invariance. In particular, max and average pooling are special kinds of pooling where the maximum and average value is taken, respectively.** - -⟶ Pooling (POOL) ― Tầng pooling (POOL) là một phép downsampling, thường được sử dụng sau tầng tích chập, giúp tăng tính bất biến không gian. Cụ thể, max pooling và average pooling là những dạng pooling đặc biệt, mà tương ứng là trong đó giá trị lớn nhất và giá trị trung bình được lấy ra. - -
- - -**19. [Type, Purpose, Illustration, Comments]** - -⟶ [Kiểu, Chức năng, Minh họa, Nhận xét] - -
- - -**20. [Max pooling, Average pooling, Each pooling operation selects the maximum value of the current view, Each pooling operation averages the values of the current view]** - -⟶ [Max pooling, Average pooling, Từng phép pooling chọn giá trị lớn nhất trong khu vực mà nó đang được áp dụng, Từng phép pooling tính trung bình các giá trị trong khu vực mà nó đang được áp dụng] - -
- - -**21. [Preserves detected features, Most commonly used, Downsamples feature map, Used in LeNet]** - -⟶ [Bảo toàn các đặc trưng đã phát hiện, Được sử dụng thường xuyên, Giảm kích thước feature map, Được sử dụng trong mạng LeNet] - -
- - -**22. Fully Connected (FC) ― The fully connected layer (FC) operates on a flattened input where each input is connected to all neurons. If present, FC layers are usually found towards the end of CNN architectures and can be used to optimize objectives such as class scores.** - -⟶ Fully Connected (FC) ― Tầng kết nối đầy đủ (FC) nhận đầu vào là các dữ liệu đã được làm phẳng, mà mỗi đầu vào đó được kết nối đến tất cả neuron. Trong mô hình mạng CNNs, các tầng kết nối đầy đủ thường được tìm thấy ở cuối mạng và được dùng để tối ưu hóa mục tiêu của mạng ví dụ như độ chính xác của lớp (class). - -
- - -**23. Filter hyperparameters** - -⟶ Các siêu tham số của bộ lọc - -
- - -**24. The convolution layer contains filters for which it is important to know the meaning behind its hyperparameters.** - -⟶ Tầng tích chập chứa các bộ lọc mà rất quan trọng cho ta khi biết ý nghĩa đằng sau các siêu tham số của chúng. - -
- - -**25. Dimensions of a filter ― A filter of size F×F applied to an input containing C channels is a F×F×C volume that performs convolutions on an input of size I×I×C and produces an output feature map (also called activation map) of size O×O×1.** - -⟶ Các chiều của một bộ lọc ― Một bộ lọc kích thước F×F áp dụng lên đầu vào chứa C kênh (channels) thì có kích thước tổng kể là F×F×C thực hiện phép tích chập trên đầu vào kích thước I×I×C và cho ra một feature map (hay còn gọi là activation map) có kích thước O×O×1. - -
- - -**26. Filter** - -⟶ Bộ lọc - -
- - -**27. Remark: the application of K filters of size F×F results in an output feature map of size O×O×K.** - -⟶ Lưu ý: Việc áp dụng K bộ lọc có kích thước F×F cho ra một feature map có kích thước O×O×K. - -
- - -**28. Stride ― For a convolutional or a pooling operation, the stride S denotes the number of pixels by which the window moves after each operation.** - -⟶ Stride ― Đối với phép tích chập hoặc phép pooling, độ trượt S ký hiệu số pixel mà cửa sổ sẽ di chuyển sau mỗi lần thực hiện phép tính. - -
- - -**29. Zero-padding ― Zero-padding denotes the process of adding P zeroes to each side of the boundaries of the input. This value can either be manually specified or automatically set through one of the three modes detailed below:** - -⟶ Zero-padding ― Zero-padding là tên gọi của quá trình thêm P số không vào các biên của đầu vào. Giá trị này có thể được lựa chọn thủ công hoặc một cách tự động bằng một trong ba những phương pháp mô tả bên dưới: - -
- - -**30. [Mode, Value, Illustration, Purpose, Valid, Same, Full]** - -⟶ [Phương pháp, Giá trị, Mục đích, Valid, Same, Full] - -
- - -**31. [No padding, Drops last convolution if dimensions do not match, Padding such that feature map size has size ⌈IS⌉, Output size is mathematically convenient, Also called 'half' padding, Maximum padding such that end convolutions are applied on the limits of the input, Filter 'sees' the input end-to-end]** - -⟶ [Không sử dụng padding, Bỏ phép tích chập cuối nếu số chiều không khớp, Sử dụng padding để làm cho feature map có kích thước ⌈IS⌉, Kích thước đầu ra thuận lợi về mặt toán học, Còn được gọi là 'half' padding, Padding tối đa sao cho các phép tích chập có thể được sử dụng tại các rìa của đầu vào, Bộ lọc 'thấy' được đầu vào từ đầu đến cuối] - -
- - -**32. Tuning hyperparameters** - -⟶ Điều chỉnh siêu tham số - -
- - -**33. Parameter compatibility in convolution layer ― By noting I the length of the input volume size, F the length of the filter, P the amount of zero padding, S the stride, then the output size O of the feature map along that dimension is given by:** - -⟶ Tính tương thích của tham số trong tầng tích chập ― Bằng cách ký hiệu I là độ dài kích thước đầu vào, F là độ dài của bộ lọc, P là số lượng zero padding, S là độ trượt, ta có thể tính được độ dài O của feature map theo một chiều bằng công thức: - -
- - -**34. [Input, Filter, Output]** - -⟶ [Đầu vào, Bộ lọc, Đầu ra] - -
- - -**35. Remark: often times, Pstart=Pend≜P, in which case we can replace Pstart+Pend by 2P in the formula above.** - -⟶ Lưu ý: Trong một số trường hợp, Pstart=Pend≜P, ta có thể thay thế Pstart+Pend bằng 2P trong công thức trên. - -
- - -**36. Understanding the complexity of the model ― In order to assess the complexity of a model, it is often useful to determine the number of parameters that its architecture will have. In a given layer of a convolutional neural network, it is done as follows:** - -⟶ Hiểu về độ phức tạp của mô hình ― Để đánh giá độ phức tạp của một mô hình, cách hữu hiệu là xác định số tham số mà mô hình đó sẽ có. Trong một tầng của mạng neural tích chập, nó sẽ được tính toán như sau: - -
- - -**37. [Illustration, Input size, Output size, Number of parameters, Remarks]** - -⟶ [Minh họa, Kích thước đầu vào, Kích thước đầu ra, Số lượng tham số, Lưu ý] - -
- - -**38. [One bias parameter per filter, In most cases, S - - -**39. [Pooling operation done channel-wise, In most cases, S=F]** - -⟶ [Phép pooling được áp dụng lên từng kênh (channel-wise), Trong đa số trường hợp, S=F] - -
- - -**40. [Input is flattened, One bias parameter per neuron, The number of FC neurons is free of structural constraints]** - -⟶ [Đầu vào được làm phẳng, Mỗi neuron có một tham số bias, Số neuron trong một tầng FC phụ thuộc vào ràng buộc kết cấu] - -
- - -**41. Receptive field ― The receptive field at layer k is the area denoted Rk×Rk of the input that each pixel of the k-th activation map can 'see'. By calling Fj the filter size of layer j and Si the stride value of layer i and with the convention S0=1, the receptive field at layer k can be computed with the formula:** - -⟶ - -
- - -**42. In the example below, we have F1=F2=3 and S1=S2=1, which gives R2=1+2⋅1+2⋅1=5.** - -⟶ - -
- - -**43. Commonly used activation functions** - -⟶ Các hàm kích hoạt thường gặp - -
- - -**44. Rectified Linear Unit ― The rectified linear unit layer (ReLU) is an activation function g that is used on all elements of the volume. It aims at introducing non-linearities to the network. Its variants are summarized in the table below:** - -⟶ Rectified Linear Unit ― Tầng rectified linear unit (ReLU) là một hàm kích hoạt g được sử dụng trên tất cả các thành phần. Mục đích của nó là tăng tính phi tuyến tính cho mạng. Những biến thể khác của ReLU được tổng hợp ở bảng dưới: - -
- - -**45. [ReLU, Leaky ReLU, ELU, with]** - -⟶ [ReLU, Leaky ReLU, ELU, with] - -
- - -**46. [Non-linearity complexities biologically interpretable, Addresses dying ReLU issue for negative values, Differentiable everywhere]** - -⟶ [Độ phức tạp phi tuyến tính có thể thông dịch được về mặt sinh học, Gán vấn đề ReLU chết cho những giá trị âm, Khả vi tại mọi nơi] - -
- - -**47. Softmax ― The softmax step can be seen as a generalized logistic function that takes as input a vector of scores x∈Rn and outputs a vector of output probability p∈Rn through a softmax function at the end of the architecture. It is defined as follows:** - -⟶ Softmax ― Bước softmax có thể được coi là một hàm logistic tổng quát lấy đầu vào là một vector chứa các giá trị x∈Rn và cho ra là một vector gồm các xác suất p∈Rn thông qua một hàm softmax ở cuối kiến trúc. Nó được định nghĩa như sau: - -
- - -**48. where** - -⟶ với - -
- - -**49. Object detection** - -⟶ Phát hiện vật thể (Object detection) - -
- - -**50. Types of models ― There are 3 main types of object recognition algorithms, for which the nature of what is predicted is different. They are described in the table below:** - -⟶ Các kiểu mô hình ― Có 3 kiểu thuật toán nhận diện vật thể chính, vì thế mà bản chất của thứ được dự đoán sẽ khác nhau. Chúng được miêu tả ở bảng dưới: - -
- - -**51. [Image classification, Classification w. localization, Detection]** - -⟶ [Phân loại hình ảnh, Phân loại cùng với khoanh vùng, Phát hiện] - -
- - -**52. [Teddy bear, Book]** - -⟶ [Gấu bông, Sách] - -
- - -**53. [Classifies a picture, Predicts probability of object, Detects an object in a picture, Predicts probability of object and where it is located, Detects up to several objects in a picture, Predicts probabilities of objects and where they are located]** - -⟶ [Phân loại một tấm ảnh, Dự đoán xác suất của một vật thể, Phát hiện một vật thể trong ảnh, Dự đoán xác suất của vật thể và định vị nó, Phát hiện nhiều vật thể trong cùng một tấm ảnh, Dự đoán xác suất của các vật thể và định vị chúng] - -
- - -**54. [Traditional CNN, Simplified YOLO, R-CNN, YOLO, R-CNN]** - -⟶ [CNN cổ điển, YOLO đơn giản hóa, R-CNN, YOLO, R-CNN] - -
- - -**55. Detection ― In the context of object detection, different methods are used depending on whether we just want to locate the object or detect a more complex shape in the image. The two main ones are summed up in the table below:** - -⟶ Detection ― Trong bối cảnh phát hiện vật thể, những phương pháp khác nhau được áp dụng tùy thuộc vào liệu chúng ta chỉ muốn định vị vật thể hay phát hiện được những hình dạng phức tạp hơn trong tấm ảnh. Hai phương pháp chính được tổng hợp ở bảng dưới: - -
- - -**56. [Bounding box detection, Landmark detection]** - -⟶ - -
- - -**57. [Detects the part of the image where the object is located, Detects a shape or characteristics of an object (e.g. eyes), More granular]** - -⟶ - -
- - -**58. [Box of center (bx,by), height bh and width bw, Reference points (l1x,l1y), ..., (lnx,lny)]** - -⟶ - -
- - -**59. Intersection over Union ― Intersection over Union, also known as IoU, is a function that quantifies how correctly positioned a predicted bounding box Bp is over the actual bounding box Ba. It is defined as:** - -⟶ - -
- - -**60. Remark: we always have IoU∈[0,1]. By convention, a predicted bounding box Bp is considered as being reasonably good if IoU(Bp,Ba)⩾0.5.** - -⟶ - -
- - -**61. Anchor boxes ― Anchor boxing is a technique used to predict overlapping bounding boxes. In practice, the network is allowed to predict more than one box simultaneously, where each box prediction is constrained to have a given set of geometrical properties. For instance, the first prediction can potentially be a rectangular box of a given form, while the second will be another rectangular box of a different geometrical form.** - -⟶ - -
- - -**62. Non-max suppression ― The non-max suppression technique aims at removing duplicate overlapping bounding boxes of a same object by selecting the most representative ones. After having removed all boxes having a probability prediction lower than 0.6, the following steps are repeated while there are boxes remaining:** - -⟶ - -
- - -**63. [For a given class, Step 1: Pick the box with the largest prediction probability., Step 2: Discard any box having an IoU⩾0.5 with the previous box.]** - -⟶ - -
- - -**64. [Box predictions, Box selection of maximum probability, Overlap removal of same class, Final bounding boxes]** - -⟶ - -
- - -**65. YOLO ― You Only Look Once (YOLO) is an object detection algorithm that performs the following steps:** - -⟶ - -
- - -**66. [Step 1: Divide the input image into a G×G grid., Step 2: For each grid cell, run a CNN that predicts y of the following form:, repeated k times]** - -⟶ - -
- - -**67. where pc is the probability of detecting an object, bx,by,bh,bw are the properties of the detected bouding box, c1,...,cp is a one-hot representation of which of the p classes were detected, and k is the number of anchor boxes.** - -⟶ - -
- - -**68. Step 3: Run the non-max suppression algorithm to remove any potential duplicate overlapping bounding boxes.** - -⟶ - -
- - -**69. [Original image, Division in GxG grid, Bounding box prediction, Non-max suppression]** - -⟶ - -
- - -**70. Remark: when pc=0, then the network does not detect any object. In that case, the corresponding predictions bx,...,cp have to be ignored.** - -⟶ - -
- - -**71. R-CNN ― Region with Convolutional Neural Networks (R-CNN) is an object detection algorithm that first segments the image to find potential relevant bounding boxes and then run the detection algorithm to find most probable objects in those bounding boxes.** - -⟶ - -
- - -**72. [Original image, Segmentation, Bounding box prediction, Non-max suppression]** - -⟶ - -
- - -**73. Remark: although the original algorithm is computationally expensive and slow, newer architectures enabled the algorithm to run faster, such as Fast R-CNN and Faster R-CNN.** - -⟶ - -
- - -**74. Face verification and recognition** - -⟶ - -
- - -**75. Types of models ― Two main types of model are summed up in table below:** - -⟶ - -
- - -**76. [Face verification, Face recognition, Query, Reference, Database]** - -⟶ - -
- - -**77. [Is this the correct person?, One-to-one lookup, Is this one of the K persons in the database?, One-to-many lookup]** - -⟶ - -
- - -**78. One Shot Learning ― One Shot Learning is a face verification algorithm that uses a limited training set to learn a similarity function that quantifies how different two given images are. The similarity function applied to two images is often noted d(image 1,image 2).** - -⟶ - -
- - -**79. Siamese Network ― Siamese Networks aim at learning how to encode images to then quantify how different two images are. For a given input image x(i), the encoded output is often noted as f(x(i)).** - -⟶ - -
- - -**80. Triplet loss ― The triplet loss ℓ is a loss function computed on the embedding representation of a triplet of images A (anchor), P (positive) and N (negative). The anchor and the positive example belong to a same class, while the negative example to another one. By calling α∈R+ the margin parameter, this loss is defined as follows:** - -⟶ - -
- - -**81. Neural style transfer** - -⟶ - -
- - -**82. Motivation ― The goal of neural style transfer is to generate an image G based on a given content C and a given style S.** - -⟶ - -
- - -**83. [Content C, Style S, Generated image G]** - -⟶ - -
- - -**84. Activation ― In a given layer l, the activation is noted a[l] and is of dimensions nH×nw×nc** - -⟶ - -
- - -**85. Content cost function ― The content cost function Jcontent(C,G) is used to determine how the generated image G differs from the original content image C. It is defined as follows:** - -⟶ - -
- - -**86. Style matrix ― The style matrix G[l] of a given layer l is a Gram matrix where each of its elements G[l]kk′ quantifies how correlated the channels k and k′ are. It is defined with respect to activations a[l] as follows:** - -⟶ - -
- - -**87. Remark: the style matrix for the style image and the generated image are noted G[l] (S) and G[l] (G) respectively.** - -⟶ - -
- - -**88. Style cost function ― The style cost function Jstyle(S,G) is used to determine how the generated image G differs from the style S. It is defined as follows:** - -⟶ - -
- - -**89. Overall cost function ― The overall cost function is defined as being a combination of the content and style cost functions, weighted by parameters α,β, as follows:** - -⟶ - -
- - -**90. Remark: a higher value of α will make the model care more about the content while a higher value of β will make it care more about the style.** - -⟶ - -
- - -**91. Architectures using computational tricks** - -⟶ - -
- - -**92. Generative Adversarial Network ― Generative adversarial networks, also known as GANs, are composed of a generative and a discriminative model, where the generative model aims at generating the most truthful output that will be fed into the discriminative which aims at differentiating the generated and true image.** - -⟶ - -
- - -**93. [Training, Noise, Real-world image, Generator, Discriminator, Real Fake]** - -⟶ - -
- - -**94. Remark: use cases using variants of GANs include text to image, music generation and synthesis.** - -⟶ - -
- - -**95. ResNet ― The Residual Network architecture (also called ResNet) uses residual blocks with a high number of layers meant to decrease the training error. The residual block has the following characterizing equation:** - -⟶ - -
- - -**96. Inception Network ― This architecture uses inception modules and aims at giving a try at different convolutions in order to increase its performance through features diversification. In particular, it uses the 1×1 convolution trick to limit the computational burden.** - -⟶ - -
- - -**97. The Deep Learning cheatsheets are now available in [target language].** - -⟶ - -
- - -**98. Original authors** - -⟶ - -
- - -**99. Translated by X, Y and Z** - -⟶ - -
- - -**100. Reviewed by X, Y and Z** - -⟶ - -
- - -**101. View PDF version on GitHub** - -⟶ - -
- - -**102. By X and Y** - -⟶ - -
From 612234d7bbb17c2dd1f57431a6dab70db336deda Mon Sep 17 00:00:00 2001 From: tuananhhedspibk Date: Wed, 23 Oct 2019 23:33:45 +0900 Subject: [PATCH 426/531] vi translate for rnn --- vi/cs-230-recurrent-neural-networks.md | 32 +++++++++++++------------- 1 file changed, 16 insertions(+), 16 deletions(-) diff --git a/vi/cs-230-recurrent-neural-networks.md b/vi/cs-230-recurrent-neural-networks.md index f576848a4..be073d058 100644 --- a/vi/cs-230-recurrent-neural-networks.md +++ b/vi/cs-230-recurrent-neural-networks.md @@ -116,112 +116,112 @@ **17. [Drawbacks, Computation being slow, Difficulty of accessing information from a long time ago, Cannot consider any future input for the current state]** -⟶ +⟶ [Hạn chế, Tính toán chậm, Khó để truy cập các thông tin từ một khoảng thời gian dài trước đây, Không thể xem xét bất kì đầu vào sau này nào cho trạng thái hiện tại]
**18. Applications of RNNs ― RNN models are mostly used in the fields of natural language processing and speech recognition. The different applications are summed up in the table below:**

-⟶
+⟶ Ứng dụng của RNNs - Các mô hình RNN hầu như được sử dụng trong lĩnh vực xử lí ngôn ngữ tự nhiên và nhận dạng tiếng nói. Các ứng dụng khác nhau được tổng kết trong bảng dưới đây:

<br>
**19. [Type of RNN, Illustration, Example]** -⟶ +⟶ [Các loại RNN, Hình minh hoạ, Ví dụ]
**20. [One-to-one, One-to-many, Many-to-one, Many-to-many]** -⟶ +⟶ [Một-Một, Một-nhiều, Nhiều-một, Nhiều-nhiều]
**21. [Traditional neural network, Music generation, Sentiment classification, Name entity recognition, Machine translation]**

-⟶
+⟶ [Mạng neural truyền thống, Sinh nhạc, Phân loại cảm xúc, Nhận dạng thực thể có tên, Dịch máy]

<br>
**22. Loss function ― In the case of a recurrent neural network, the loss function L of all time steps is defined based on the loss at every time step as follows:** -⟶ +⟶ Hàm mất mát - Trong trường hợp của mạng neural hồi quy, hàm mất mát L của tất cả các bước thời gian được định nghĩa dựa theo mất mát ở mọi thời điểm như sau:
**23. Backpropagation through time ― Backpropagation is done at each point in time. At timestep T, the derivative of the loss L with respect to weight matrix W is expressed as follows:** -⟶ +⟶ Lan truyền ngược theo thời gian - Lan truyền ngược được hoàn thành ở mỗi một thời điểm cụ thể. Ở bước T, đạo hàm của hàm mất mát L với ma trận trọng số W được biểu diễn như sau:
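For illustration, the total loss over all time steps from entry 22 can be sketched as a plain sum of per-step losses; the choice of cross-entropy per step and the array shapes are assumptions of this sketch, not part of the cheatsheet.

```python
import numpy as np

def sequence_loss(y_hat, y):
    """Total loss over time steps: sum of per-step cross-entropies.

    y_hat: array of shape (T, vocab) with predicted probabilities at each step,
    y:     array of shape (T,) with the index of the true token at each step.
    """
    T = y.shape[0]
    step_losses = -np.log(y_hat[np.arange(T), y])  # loss at every time step
    return step_losses.sum()                        # summed over all steps
```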
**24. Handling long term dependencies** -⟶ +⟶ Xử lí phụ thuộc dài hạn
**25. Commonly used activation functions ― The most common activation functions used in RNN modules are described below:** -⟶ +⟶ Các hàm kích hoạt thường dùng - Các hàm kích hoạt thường dùng trong các modules RNN được miêu tả như sau:
**26. [Sigmoid, Tanh, RELU]** -⟶ +⟶ [Sigmoid, Tanh, RELU]
**27. Vanishing/exploding gradient ― The vanishing and exploding gradient phenomena are often encountered in the context of RNNs. The reason why they happen is that it is difficult to capture long term dependencies because of multiplicative gradient that can be exponentially decreasing/increasing with respect to the number of layers.** -⟶ +⟶ Vanishing/exploding gradient - Hiện tượng vanishing và exploding gradient thường gặp trong ngữ cảnh của RNNs. Lí do tại sao chúng thường xảy ra đó là khó để có được sự phụ thuộc dài hạn vì multiplicative gradient có thể tăng/giảm theo hàm mũ tương ứng với số lượng các tầng.
**28. Gradient clipping ― It is a technique used to cope with the exploding gradient problem sometimes encountered when performing backpropagation. By capping the maximum value for the gradient, this phenomenon is controlled in practice.** -⟶ +⟶ Gradient clipping - Là một kĩ thuật được sử dụng để giải quyết vấn đề exploding gradient xảy ra khi thực hiện lan truyền ngược. Bằng việc giới hạn giá trị lớn nhất cho gradient, hiện tượng này sẽ được kiểm soát trong thực tế.
**29. clipped** -⟶ +⟶ clipped
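A minimal sketch of the capping described in entry 28; the threshold of 5.0 is an arbitrary example value, not a recommendation from the cheatsheet.

```python
import numpy as np

def clip_gradient(grad, max_norm=5.0):
    """Rescale grad so that its L2 norm never exceeds max_norm."""
    norm = np.linalg.norm(grad)
    if norm > max_norm:
        grad = grad * (max_norm / norm)
    return grad
```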
**30. Types of gates ― In order to remedy the vanishing gradient problem, specific gates are used in some types of RNNs and usually have a well-defined purpose. They are usually noted Γ and are equal to:** -⟶ +⟶ Các loại cổng - Để giải quyết vấn đề vanishing gradient, các cổng cụ thể được sử dụng trong một vài loại RNNs và thường có mục đích rõ ràng. Chúng thường được kí hiệu là Γ và bằng với:
**31. where W,U,b are coefficients specific to the gate and σ is the sigmoid function. The main ones are summed up in the table below:** -⟶ +⟶ Với W, U, b là các hệ số của một cổng và σ là hàm sigmoid. Các loại chính được tổng kết ở bảng dưới đây:
**32. [Type of gate, Role, Used in]** -⟶ +⟶ [Loại cổng, Vai trò, Được sử dụng trong]
From 99ee93a9de99ed6745b29ad3cd61d4542703bb71 Mon Sep 17 00:00:00 2001 From: tuananhhedspibk Date: Mon, 28 Oct 2019 22:26:49 +0900 Subject: [PATCH 427/531] vi translate for rnn --- vi/cs-230-recurrent-neural-networks.md | 62 +++++++++++++------------- 1 file changed, 31 insertions(+), 31 deletions(-) diff --git a/vi/cs-230-recurrent-neural-networks.md b/vi/cs-230-recurrent-neural-networks.md index be073d058..ba0014794 100644 --- a/vi/cs-230-recurrent-neural-networks.md +++ b/vi/cs-230-recurrent-neural-networks.md @@ -228,182 +228,182 @@ **33. [Update gate, Relevance gate, Forget gate, Output gate]** -⟶ +⟶ [Cổng cập nhật, Cổng relevance, Cổng quên, Cổng ra]
**34. [How much past should matter now?, Drop previous information?, Erase a cell or not?, How much to reveal of a cell?]** -⟶ +⟶ [Dữ liệu cũ nên có tầm quan trọng như thế nào ở hiện tại?, Bỏ qua thông tin phía trước?, Xoá ô hay không xoá?, Biểu thị một ô ở mức độ bao nhiêu?]
**35. [LSTM, GRU]** -⟶ +⟶ [LSTM, GRU]
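The generic gate Γ=σ(Wx⟨t⟩+Ua⟨t−1⟩+b) of entries 30-31 can be sketched as below; matrix and vector shapes are left implicit and the names are illustrative only.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gate(x, a_prev, W, U, b):
    """Generic RNN gate: Gamma = sigmoid(W x + U a<t-1> + b), values in (0, 1)."""
    return sigmoid(W @ x + U @ a_prev + b)
```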
**36. GRU/LSTM ― Gated Recurrent Unit (GRU) and Long Short-Term Memory units (LSTM) deal with the vanishing gradient problem encountered by traditional RNNs, with LSTM being a generalization of GRU. Below is a table summing up the characterizing equations of each architecture:** -⟶ +⟶ GRU/LSTM ― Gated Recurrent Unit (GRU) và Các đơn vị bộ nhớ dài-ngắn hạn (LSTM) đối phó với vấn đề vanishing gradient khi gặp phải bằng mạng RNNs truyền thống, với LSTM là sự tổng quát của GRU. Phía dưới là bảng tổng kết các phương trình đặc trưng của mỗi kiến trúc:
**37. [Characterization, Gated Recurrent Unit (GRU), Long Short-Term Memory (LSTM), Dependencies]** -⟶ +⟶ [Đặc tính, Gated Recurrent Unit (GRU), Bộ nhớ dài-ngắn hạn (LSTM), Các phụ thuộc]
**38. Remark: the sign ⋆ denotes the element-wise multiplication between two vectors.** -⟶ +⟶ Chú ý: kí hiệu ⋆ chỉ phép nhân nguyên tố giữa hai vectors.
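For reference, a sketch of one step of a standard GRU built from the update and relevance gates above; the layout of the parameter dictionary is an assumption of this sketch, not the cheatsheet's notation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_cell(x, c_prev, p):
    """One step of a standard GRU; p holds the weight matrices and biases.

    Gamma_u: update gate, Gamma_r: relevance gate, * is element-wise multiplication.
    """
    u = sigmoid(p["Wu"] @ x + p["Uu"] @ c_prev + p["bu"])              # update gate
    r = sigmoid(p["Wr"] @ x + p["Ur"] @ c_prev + p["br"])              # relevance gate
    c_tilde = np.tanh(p["Wc"] @ x + p["Uc"] @ (r * c_prev) + p["bc"])  # candidate state
    c = u * c_tilde + (1 - u) * c_prev                                  # new hidden state
    return c
```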
**39. Variants of RNNs ― The table below sums up the other commonly used RNN architectures:** -⟶ +⟶ Các biến thể của RNNs - Bảng dưới đây tổng kết các kiến trúc thường được sử dụng khác của RNN:
**40. [Bidirectional (BRNN), Deep (DRNN)]** -⟶ +⟶ [RNN hai chiều (Bidirectional - BRNN), RNN sâu (Deep - DRNN)]
**41. Learning word representation** -⟶ +⟶ Học thể hiện từ
**42. In this section, we note V the vocabulary and |V| its size.** -⟶ +⟶ Trong phần này, chúng ta kí hiệu V là tập từ vựng và |V| là kích cỡ của nó.
**43. Motivation and notations** -⟶ +⟶ Giải thích và các kí hiệu
**44. Representation techniques ― The two main ways of representing words are summed up in the table below:** -⟶ +⟶ Các kĩ thuật biểu diễn - Có hai cách chính của biểu diễn các từ được tổng kết ở bảng bên dưới:
**45. [1-hot representation, Word embedding]** -⟶ +⟶ [Biểu diễn 1-hot, Word embedding]
**46. [teddy bear, book, soft]** -⟶ +⟶ [gấu bông, sách, mềm]
**47. [Noted ow, Naive approach, no similarity information, Noted ew, Takes into account words similarity]**

-⟶
+⟶ [Kí hiệu là ow, Cách tiếp cận naive, không có thông tin về độ tương đồng, Kí hiệu là ew, Xem xét đến độ tương đồng giữa các từ]

<br>
**48. Embedding matrix ― For a given word w, the embedding matrix E is a matrix that maps its 1-hot representation ow to its embedding ew as follows:** -⟶ +⟶ Embedding matrix - Cho một từ w, embedding matrix E là một ma trận tham chiếu thể hiện 1-hot ow của nó với embedding ew của nó như sau:
**49. Remark: learning the embedding matrix can be done using target/context likelihood models.** -⟶ +⟶ Chú ý: học embedding matrix có thể hoàn thành bằng cách sử dụng các mô hình target/context likelihood.
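A small sketch of the lookup of ew for a word w via its 1-hot vector ow; the layout of E (one column per vocabulary word) is an assumption made for the example.

```python
import numpy as np

vocab_size, emb_dim = 5, 3
E = np.random.randn(emb_dim, vocab_size)   # embedding matrix, one column per word

w = 2                                      # index of the word in the vocabulary
o_w = np.zeros(vocab_size)
o_w[w] = 1.0                               # 1-hot representation o_w

e_w = E @ o_w                              # embedding e_w of word w
assert np.allclose(e_w, E[:, w])           # the product simply selects column w
```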
**50. Word embeddings** -⟶ +⟶ Word embeddings
**51. Word2vec ― Word2vec is a framework aimed at learning word embeddings by estimating the likelihood that a given word is surrounded by other words. Popular models include skip-gram, negative sampling and CBOW.** -⟶ +⟶ Word2vec - Word2vec là một framework tập trung vào việc học word embeddings bằng cách ước lượng khả năng mà một từ cho trước được bao quanh bởi các từ khác. Các mô hình phổ biến bao gồm skip-gram, negative sampling và CBOW.
**52. [A cute teddy bear is reading, teddy bear, soft, Persian poetry, art]**

-⟶
+⟶ [Một chú gấu bông dễ thương đang đọc, gấu bông teddy, mềm, thơ Ba Tư, nghệ thuật]

<br>
**53. [Train network on proxy task, Extract high-level representation, Compute word embeddings]** -⟶ +⟶ [Huấn luyện mạng trên proxy task, Bóc tách các thể hiện cấp cao, Tính toán word embeddings]
**54. Skip-gram ― The skip-gram word2vec model is a supervised learning task that learns word embeddings by assessing the likelihood of any given target word t happening with a context word c. By noting θt a parameter associated with t, the probability P(t|c) is given by:** -⟶ +⟶ Skip-gram - Mô hình skip-gram word2vec là một task supervised learning, nó học các word embeddings bằng cách đánh giá khả năng của bất kì target word t cho trước nào xảy ra với context word c. Bằng việc kí hiệu θt là tham số đi kèm với t, xác suất P(t|c) được tính như sau:
**55. Remark: summing over the whole vocabulary in the denominator of the softmax part makes this model computationally expensive. CBOW is another word2vec model using the surrounding words to predict a given word.** -⟶ +⟶ Chú ý: Cộng tổng tất cả các từ vựng trong mẫu số của phần softmax khiến mô hình này tốn nhiều chi phí tính toán. CBOW là một mô hình word2vec khác sử dụng các từ xung quanh để dự đoán một từ cho trước.
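The softmax probability P(t|c) of entry 54 can be sketched as follows; the parameter shapes are assumptions of the sketch.

```python
import numpy as np

def skipgram_prob(theta, e_c):
    """P(t|c) for every target word t: softmax of theta_t . e_c over the vocabulary.

    theta: (vocab, dim) target-word parameters, e_c: (dim,) context-word embedding.
    """
    scores = theta @ e_c
    scores -= scores.max()          # numerical stability
    p = np.exp(scores)
    return p / p.sum()              # normalizing over the whole vocabulary is the costly part
```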
**56. Negative sampling ― It is a set of binary classifiers using logistic regressions that aim at assessing how a given context and a given target words are likely to appear simultaneously, with the models being trained on sets of k negative examples and 1 positive example. Given a context word c and a target word t, the prediction is expressed by:** -⟶ +⟶ Negative sampling ― Nó là một tập của các bộ phân loại nhị phân sử dụng logistic regressions với mục tiêu là đánh giá khả năng mà một ngữ cảnh cho trước và các target words cho trước có thể xuất hiện đồng thời, với các mô hình đang được huấn luyện trên các tập của k negative examples và 1 positive example. Cho trước context word c và target word t, dự đoán được thể hiện bởi:
**57. Remark: this method is less computationally expensive than the skip-gram model.** -⟶ +⟶ Chú ý: phương thức này tốn ít chi phí tính toán hơn mô hình skip-gram.
**57bis. GloVe ― The GloVe model, short for global vectors for word representation, is a word embedding technique that uses a co-occurence matrix X where each Xi,j denotes the number of times that a target i occurred with a context j. Its cost function J is as follows:** -⟶ +⟶ GloVe - Mô hình GloVe
@@ -648,30 +648,30 @@ Given the symmetry that e and θ play in this model, the final word embedding e( **92. Original authors** -⟶ +⟶ Tác giả
**93. Translated by X, Y and Z** -⟶ +⟶ Dịch bởi X, Y và Z
**94. Reviewed by X, Y and Z** -⟶ +⟶ Reviewed bởi X, Y và Z
**95. View PDF version on GitHub**

-⟶
+⟶ Xem bản PDF trên GitHub

<br>
**96. By X and Y** -⟶ +⟶ Bởi X và Y
From a3c95b4963326b5397a8aa9561740308e2f20a4a Mon Sep 17 00:00:00 2001 From: tuananhhedspibk Date: Mon, 28 Oct 2019 22:29:12 +0900 Subject: [PATCH 428/531] vi cheatsheet supervised learning --- vi/cheatsheet-supervised-learning.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/vi/cheatsheet-supervised-learning.md b/vi/cheatsheet-supervised-learning.md index f86e01c00..33ce30d81 100644 --- a/vi/cheatsheet-supervised-learning.md +++ b/vi/cheatsheet-supervised-learning.md @@ -30,7 +30,7 @@ **6. [Continuous, Class, Linear regression, Logistic regression, SVM, Naive Bayes]** -⟶ [Tiếp diễn, Lớp, Hồi quy tuyến tính, Hồi quy Logistic, SVM, Naive Bayes] +⟶ [Liên tục, Lớp, Hồi quy tuyến tính, Hồi quy Logistic, SVM, Naive Bayes]
@@ -48,7 +48,7 @@ **9. [Directly estimate P(y|x), Estimate P(x|y) to then deduce P(y|x), Decision boundary, Probability distributions of the data, Regressions, SVMs, GDA, Naive Bayes]** -⟶ [Ước lượng trực tiếp P(y|x), Ước lượng P(x|y) để tiếp tục suy luận P(y|x), Biên quyết địnhđịnh, Phân bố xác suất của dữ liệu, Hồi quy, SVMs, GDA, Naive Bayes] +⟶ [Ước lượng trực tiếp P(y|x), Ước lượng P(x|y) để tiếp tục suy luận P(y|x), Biên quyết định, Phân bố xác suất của dữ liệu, Hồi quy, SVMs, GDA, Naive Bayes]
@@ -372,7 +372,7 @@ **63. Tree-based and ensemble methods** -⟶ Phương thức Tree-based và toàn thể +⟶ Phương thức Tree-based và ensemble
From f3b416aea2fd5e58284425f6d691ae2cc85d58e5 Mon Sep 17 00:00:00 2001 From: tuananhhedspibk Date: Mon, 28 Oct 2019 23:05:50 +0900 Subject: [PATCH 429/531] vi translate for rnn --- vi/cs-230-recurrent-neural-networks.md | 28 +++++++++++++------------- 1 file changed, 14 insertions(+), 14 deletions(-) diff --git a/vi/cs-230-recurrent-neural-networks.md b/vi/cs-230-recurrent-neural-networks.md index ba0014794..75bfba294 100644 --- a/vi/cs-230-recurrent-neural-networks.md +++ b/vi/cs-230-recurrent-neural-networks.md @@ -403,7 +403,7 @@ **57bis. GloVe ― The GloVe model, short for global vectors for word representation, is a word embedding technique that uses a co-occurence matrix X where each Xi,j denotes the number of times that a target i occurred with a context j. Its cost function J is as follows:** -⟶ GloVe - Mô hình GloVe +⟶ GloVe - Mô hình GloVe, viết tắt của global vectors for word representation, nó là một kĩ thuật word embedding sử dụng ma trận đồng thời X với mỗi Xi,j là số lần mà target i xảy ra tại ngữ cảnh j. Cost function J của nó như sau:
@@ -411,63 +411,63 @@

**58. where f is a weighting function such that Xi,j=0⟹f(Xi,j)=0. Given the symmetry that e and θ play in this model, the final word embedding e(final)w is given by:**

-⟶
+⟶ f là hàm trọng số với Xi,j=0⟹f(Xi,j)=0. Với tính đối xứng mà e và θ có được trong mô hình này, word embedding cuối cùng e(final)w được định nghĩa như sau:

<br>
**59. Remark: the individual components of the learned word embeddings are not necessarily interpretable.** -⟶ +⟶ Chú ý: Các phần tử riêng của các word embedding học được không nhất thiết là phải thông dịch được.
**60. Comparing words** -⟶ +⟶ So sánh các từ
**61. Cosine similarity ― The cosine similarity between words w1 and w2 is expressed as follows:** -⟶ +⟶ Độ tương đồng Cosine - Độ tương đồng cosine giữa các từ w1 và w2 được trình bày như sau:
**62. Remark: θ is the angle between words w1 and w2.** -⟶ +⟶ Chú ý: θ là góc giữa các từ w1 và w2.
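A direct sketch of the cosine similarity defined above.

```python
import numpy as np

def cosine_similarity(w1, w2):
    """similarity = (w1 . w2) / (||w1|| ||w2||) = cos(theta)."""
    return np.dot(w1, w2) / (np.linalg.norm(w1) * np.linalg.norm(w2))
```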
**63. t-SNE ― t-SNE (t-distributed Stochastic Neighbor Embedding) is a technique aimed at reducing high-dimensional embeddings into a lower dimensional space. In practice, it is commonly used to visualize word vectors in the 2D space.** -⟶ +⟶ t-SNE ― t-SNE (t-distributed Stochastic Neighbor Embedding) là một kĩ thuật nhằm giảm đi số chiều của không gian embedding. Trong thực tế, nó thường được sử dụng để trực quan hoá các word vectors trong không gian 2 chiều (2D).
**64. [literature, art, book, culture, poem, reading, knowledge, entertaining, loveable, childhood, kind, teddy bear, soft, hug, cute, adorable]**

-⟶
+⟶ [văn học, nghệ thuật, sách, văn hoá, thơ, đọc, hiểu biết, giải trí, đáng yêu, tuổi thơ, tốt bụng, gấu teddy, mềm, ôm, dễ thương, đáng mến]

<br>
**65. Language model** -⟶ +⟶ Mô hình ngôn ngữ
**66. Overview ― A language model aims at estimating the probability of a sentence P(y).** -⟶ +⟶ Tổng quan - Một mô hình ngôn ngữ sẽ dự đoán xác suất của một câu P(y).
@@ -495,7 +495,7 @@ Given the symmetry that e and θ play in this model, the final word embedding e( **70. Machine translation** -⟶ +⟶ Dịch máy
@@ -593,7 +593,7 @@ Given the symmetry that e and θ play in this model, the final word embedding e( **84. Attention** -⟶ +⟶ Chú ý
@@ -607,7 +607,7 @@ Given the symmetry that e and θ play in this model, the final word embedding e( **86. with** -⟶ +⟶ với
@@ -642,7 +642,7 @@ Given the symmetry that e and θ play in this model, the final word embedding e( **91. The Deep Learning cheatsheets are now available in [target language].** -⟶ +⟶ Deep Learning cheatsheets hiện đã có bản dịch [tiếng việt].
From 7de7c1b42bf396908d85f48db03d18f3dc906467 Mon Sep 17 00:00:00 2001 From: Hiroki Date: Tue, 29 Oct 2019 13:56:00 +0900 Subject: [PATCH 430/531] Fixed some Japanese style letters around math area Some areas don't need to be used Japanese letters. The other areas need to be used Japanese letters and spaces. --- ja/cs-229-linear-algebra.md | 66 ++++++++++++++++++------------------- 1 file changed, 33 insertions(+), 33 deletions(-) diff --git a/ja/cs-229-linear-algebra.md b/ja/cs-229-linear-algebra.md index 4dcf78e4b..c806cb4ca 100644 --- a/ja/cs-229-linear-algebra.md +++ b/ja/cs-229-linear-algebra.md @@ -19,19 +19,19 @@ **4. Vector ― We note x∈Rn a vector with n entries, where xi∈R is the ith entry:** ⟶ -ベクトル - x∈Rn は n個の要素を持つベクトルを表し、xi∈Rはi番目の要素を表します。 +ベクトル - x∈Rn はn個の要素を持つベクトルを表し、xi∈R はi番目の要素を表します。
**5. Matrix ― We note A∈Rm×n a matrix with m rows and n columns, where Ai,j∈R is the entry located in the ith row and jth column:** ⟶ -行列 - m行n列の行列をA∈Rm×nと表記し、Ai、j∈Rは i行目のj列目の要素を指します。 +行列 - m行n列の行列を A∈Rm×n と表記し、Ai,j∈R はi行目のj列目の要素を指します。
**6. Remark: the vector x defined above can be viewed as a n×1 matrix and is more particularly called a column-vector.** ⟶ -備考:上記で定義されたベクトルxはn×1の行列と見なすことができ、列ベクトルと呼ばれます。 +備考:上記で定義されたベクトル x は n×1 の行列と見なすことができ、列ベクトルと呼ばれます。
**7. Main matrices** @@ -43,25 +43,25 @@ **8. Identity matrix ― The identity matrix I∈Rn×n is a square matrix with ones in its diagonal and zero everywhere else:** ⟶ -単位行列 - 単位行列I∈Rn×nは、対角成分に 1 が並び、他は全て 0 となる正方行列です。 +単位行列 - 単位行列 I∈Rn×n は、対角成分に 1 が並び、他は全て 0 となる正方行列です。
**9. Remark: for all matrices A∈Rn×n, we have A×I=I×A=A.** ⟶ -備考:すべての行列A∈Rn×nに対して、A×I = I×A = Aとなります。 +備考:すべての行列 A∈Rn×n に対して、A×I=I×A=A となります。
**10. Diagonal matrix ― A diagonal matrix D∈Rn×n is a square matrix with nonzero values in its diagonal and zero everywhere else:** ⟶ -対角行列 - 対角行列D∈Rn×nは、対角成分の値がゼロ以外で、それ以外はゼロである正方行列です。 +対角行列 - 対角行列 D∈Rn×n は、対角成分の値が 0 以外で、それ以外は 0 である正方行列です。
**11. Remark: we also note D as diag(d1,...,dn).** ⟶ -備考:Dをdiag(d 1、...、d n)とも表記します。 +備考:Dをdiag(d1,...,dn) とも表記します。
**12. Matrix operations** @@ -85,37 +85,37 @@ **15. inner product: for x,y∈Rn, we have:** ⟶ -内積: x、y∈Rnに対して、内積の定義は下記の通りです: +内積: x,y∈Rn に対して、内積の定義は下記の通りです:
**16. outer product: for x∈Rm,y∈Rn, we have:** ⟶ -外積: x∈Rm,y∈Rnに対して、外積の定義は下記の通りです: +外積: x∈Rm,y∈Rn に対して、外積の定義は下記の通りです:
**17. Matrix-vector ― The product of matrix A∈Rm×n and vector x∈Rn is a vector of size Rn, such that:** ⟶ -行列-ベクトル - 行列A∈Rm×nとベクトルx∈Rnの積は以下の条件を満たすようなサイズRnのベクトルです。 +行列-ベクトル - 行列 A∈Rm×n とベクトル x∈Rn の積は以下の条件を満たすようなサイズ Rn のベクトルです。
**18. where aTr,i are the vector rows and ac,j are the vector columns of A, and xi are the entries of x.** ⟶ -上記 aTr、iはAの行ベクトルで、ac、jはAの列ベクトルです。 xiはxの要素です。 +上記 aTr,i は A の行ベクトルで、ac,j は A の列ベクトルです。 xi は x の要素です。
**19. Matrix-matrix ― The product of matrices A∈Rm×n and B∈Rn×p is a matrix of size Rn×p, such that:** ⟶ -行列-行列 - 行列A∈Rm×nとB∈Rn×pの積は以下の条件を満たすようなサイズRm×pの行列です。 (There is a typo in the original: Rn×p) +行列-行列 - 行列 A∈Rm×n と B∈Rn×p の積は以下の条件を満たすようなサイズ Rm×p の行列です。 (There is a typo in the original: Rn×p)
**20. where aTr,i,bTr,i are the vector rows and ac,j,bc,j are the vector columns of A and B respectively** ⟶ -aTr,i、bTr,iはAとBの行ベクトルで ac,j、bc,jはAとBの列ベクトルです。 +aTr,i,bTr,i は A と B の行ベクトルで ac,j,bc,j は A と B の列ベクトルです。
**21. Other operations** @@ -127,50 +127,50 @@ aTr,i、bTr,iはAとBの行ベクトルで ac,j、bc,jはAとBの列ベクト **22. Transpose ― The transpose of a matrix A∈Rm×n, noted AT, is such that its entries are flipped:** ⟶ -転置 ― A∈Rm×nの転置行列はATと表記し、Aの行列要素が交換した行列です。 +転置 ― A∈Rm×n の転置行列は AT と表記し、A の行列要素が交換した行列です。
**23. Remark: for matrices A,B, we have (AB)T=BTAT** ⟶ -備考: 行列AとBの場合、(AB)T = BTAT** となります。 +備考: 行列AとBの場合、(AB)T=BTAT** となります。
**24. Inverse ― The inverse of an invertible square matrix A is noted A−1 and is the only matrix such that:** ⟶ -逆行列 ― 可逆正方行列Aの逆行列はA − 1と表記し、 以下の条件を満たす唯一の行列です。 +逆行列 ― 可逆正方行列 A の逆行列は A-1 と表記し、 以下の条件を満たす唯一の行列です。
**25. Remark: not all square matrices are invertible. Also, for matrices A,B, we have (AB)−1=B−1A−1** ⟶ -備考: すべての正方行列が可逆とは限りません。 行列A、Bについては、(AB)−1=B−1A−1 +備考: すべての正方行列が可逆とは限りません。 行列 A,B については、(AB)−1=B−1A−1
**26. Trace ― The trace of a square matrix A, noted tr(A), is the sum of its diagonal entries:** ⟶ -跡 - 正方行列Aの跡は、tr(A)と表記し、その対角成分の要素の和です。 +跡 - 正方行列 A の跡は、tr(A) と表記し、その対角成分の要素の和です。
**27. Remark: for matrices A,B, we have tr(AT)=tr(A) and tr(AB)=tr(BA)** ⟶ -備考: 行列A、Bの場合: tr(AT)=tr(A)とtr(AB)=tr(BA)となります。 +備考: 行列 A,B の場合: tr(AT)=tr(A) と tr(AB)=tr(BA) となります。
**28. Determinant ― The determinant of a square matrix A∈Rn×n, noted |A| or det(A) is expressed recursively in terms of A∖i,∖j, which is the matrix A without its ith row and jth column, as follows:** ⟶ -行列式 ― 正方行列A∈Rn×nの行列式は|A| または det(A) と表記し、以下のように i番目の行とj番目の列を抜いたA, Aijによって再帰的に表現されます。 +行列式 ― 正方行列 A∈Rn×n の行列式は |A| または det(A) と表記し、以下のように i番目の行とj番目の列を抜いた行列A、Aij によって再帰的に表現されます。 それはi番目の行とj番目の列のない行列Aです。 次のように:
**29. Remark: A is invertible if and only if |A|≠0. Also, |AB|=|A||B| and |AT|=|A|.** ⟶ -備考: |A|≠0の場合に限り、行列は可逆行列です。また|AB|=|A||B| と |AT|=|A|。 +備考: |A|≠0の場合に限り、行列は可逆行列です。また |AB|=|A||B| と |AT|=|A|。
**30. Matrix properties** @@ -200,7 +200,7 @@ aTr,i、bTr,iはAとBの行ベクトルで ac,j、bc,jはAとBの列ベクト **34. Norm ― A norm is a function N:V⟶[0,+∞[ where V is a vector space, and such that for all x,y∈V, we have:** ⟶ -ノルムは関数N:V⟶[0,+∞[ Vはすべてのx、y∈Vに対して、以下の条件を満たすようなベクトル空間です。 +ノルムは関数N:V⟶[0,+∞[ Vはすべての x,y∈V に対して、以下の条件を満たすようなベクトル空間です。 ]]
@@ -243,31 +243,31 @@ x∈Vに対して、最も多用されているノルムは、以下の表にま **41. Matrix rank ― The rank of a given matrix A is noted rank(A) and is the dimension of the vector space generated by its columns. This is equivalent to the maximum number of linearly independent columns of A.** ⟶ -行列の階数 ― 行列Aの階数は rank(A)と表記し、列空間の次元を表します。これは、Aの線形独立の列の最大数に相当します。 +行列の階数 ― 行列Aの階数は rank(A) と表記し、列空間の次元を表します。これは、Aの線形独立の列の最大数に相当します。
**42. Positive semi-definite matrix ― A matrix A∈Rn×n is positive semi-definite (PSD) and is noted A⪰0 if we have:** ⟶ -半正定値行列 ― 行列 A, A∈Rn×nに対して、以下の式が成り立つならば、 Aを半正定値(PSD)といい、A⪰0と表記します。 +半正定値行列 ― 行列A、A∈Rn×nに対して、以下の式が成り立つならば、 Aを半正定値(PSD)といい、A⪰0 と表記します。
**43. Remark: similarly, a matrix A is said to be positive definite, and is noted A≻0, if it is a PSD matrix which satisfies for all non-zero vector x, xTAx>0.** ⟶ -備考: 同様に、全ての非ゼロベクトルx, xTAx>0に対して条件を満たすような行列Aは正定値行列といい、A≻0と表記します。 +備考: 同様に、全ての非ゼロベクトルx、xTAx>0 に対して条件を満たすような行列Aは正定値行列といい、A≻0 と表記します。
**44. Eigenvalue, eigenvector ― Given a matrix A∈Rn×n, λ is said to be an eigenvalue of A if there exists a vector z∈Rn∖{0}, called eigenvector, such that we have:** ⟶ -固有値、固有ベクトル ― 行列 A, A∈Rn×nに対して、以下の条件を満たすようなベクトルz, z∈Rn∖{0}が存在するならば、λは固有値といい、z は固有ベクトルといいます。 +固有値、固有ベクトル ― 行列A、A∈Rn×n に対して、以下の条件を満たすようなベクトルz、z∈Rn∖{0} が存在するならば、λ は固有値といい、z は固有ベクトルといいます。
**45. Spectral theorem ― Let A∈Rn×n. If A is symmetric, then A is diagonalizable by a real orthogonal matrix U∈Rn×n. By noting Λ=diag(λ1,...,λn), we have:** ⟶ -スペクトル定理 ― A∈Rn×nとします。 Aが対称ならば、Aは実直交行列U∈Rn×nによって対角化可能です。Λ=diag(λ1,...,λn)と表記すると、次のように表現できます。 +スペクトル定理 ― A∈Rn×n とします。A が対称ならば、A は実直交行列 U∈Rn×n によって対角化可能です。Λ=diag(λ1,...,λn) と表記すると、次のように表現できます。
**46. diagonal** @@ -279,7 +279,7 @@ x∈Vに対して、最も多用されているノルムは、以下の表にま **47. Singular-value decomposition ― For a given matrix A of dimensions m×n, the singular-value decomposition (SVD) is a factorization technique that guarantees the existence of U m×m unitary, Σ m×n diagonal and V n×n unitary matrices, such that:** ⟶ -特異値分解 ― Aをm×nの行列とします。特異値分解(SVD)は、ユニタリ行列U m×m、Σ m×nの対角行列、およびユニタリ行列V n×nの存在を保証する因数分解手法で、以下の条件を満たします。 +特異値分解 ― A を m×n の行列とします。特異値分解(SVD)は、ユニタリ行列 U m×m、Σ m×n の対角行列、およびユニタリ行列 V n×n の存在を保証する因数分解手法で、以下の条件を満たします。
**48. Matrix calculus** @@ -291,31 +291,31 @@ x∈Vに対して、最も多用されているノルムは、以下の表にま **49. Gradient ― Let f:Rm×n→R be a function and A∈Rm×n be a matrix. The gradient of f with respect to A is a m×n matrix, noted ∇Af(A), such that:** ⟶ -勾配 ― f:Rm×n→Rを関数とし、A∈Rm×nを行列とします。 Aに対するfの勾配はm×n行列で、∇Af(A)と表記し、次の条件を満たします。 +勾配 ― f:Rm×n→R を関数とし、A∈Rm×n を行列とします。 A に対する f の勾配は m×n 行列で、∇Af(A) と表記し、次の条件を満たします。
**50. Remark: the gradient of f is only defined when f is a function that returns a scalar.** ⟶ -備考: fの勾配は、fがスカラーを返す関数であるときに限り存在します。 +備考: f の勾配は、f がスカラーを返す関数であるときに限り存在します。
**51. Hessian ― Let f:Rn→R be a function and x∈Rn be a vector. The hessian of f with respect to x is a n×n symmetric matrix, noted ∇2xf(x), such that:** ⟶ -ヘッセ行列 ― f:Rn→Rを関数とし、x∈Rnをベクトルとします。 xに対するfのヘッセ行列は、n×n対称行列で∇2xf(x)と表記し、以下の条件を満たします。 +ヘッセ行列 ― f:Rn→R を関数とし、x∈Rn をベクトルとします。 x に対する f のヘッセ行列は、n×n 対称行列で ∇2xf(x) と表記し、以下の条件を満たします。
**52. Remark: the hessian of f is only defined when f is a function that returns a scalar** ⟶ -備考: fのヘッセ行列は、fがスカラーを返す関数である場合に限り存在します。 +備考: f のヘッセ行列は、f がスカラーを返す関数である場合に限り存在します。
**53. Gradient operations ― For matrices A,B,C, the following gradient properties are worth having in mind:** ⟶ -勾配演算 ― 行列A、B、Cの場合、特に以下の勾配の性質を意識する甲斐があります。 +勾配演算 ― 行列 A,B,C の場合、特に以下の勾配の性質を意識する甲斐があります。
**54. [General notations, Definitions, Main matrices]** From 4da977e6b6dd0dd645bf74e83bd699abbde18ced Mon Sep 17 00:00:00 2001 From: Shervine Amidi Date: Mon, 28 Oct 2019 23:11:11 -0700 Subject: [PATCH 431/531] Add [ja] contributors --- CONTRIBUTORS | 9 +++++++-- 1 file changed, 7 insertions(+), 2 deletions(-) diff --git a/CONTRIBUTORS b/CONTRIBUTORS index db83fa3d0..370f6d67a 100644 --- a/CONTRIBUTORS +++ b/CONTRIBUTORS @@ -90,7 +90,7 @@ Wooil Jeong (translation of probabilities and statistics) - Kwang Hyeok Ahn (translation of Unsupervised Learning) + Kwang Hyeok Ahn (translation of unsupervised learning) --ja Tran Tuan Anh (translation of convolutional neural networks) @@ -112,7 +112,12 @@ Yuta Kanzawa (translation of supervised learning) Tran Tuan Anh (review of supervised learning) - + + Tran Tuan Anh (translation of unsupervised learning) + Yoshiyuki Nakai (review of unsupervised learning) + Yuta Kanzawa (review of unsupervised learning) + Dan Lillrank (review of unsupervised learning) + --pt Leticia Portella (translation of convolutional neural networks) Gabriel Aparecido Fonseca (review of convolutional neural networks) From e32728cbf36875af7b99486a05ae742bd1832e37 Mon Sep 17 00:00:00 2001 From: Shervine Amidi Date: Mon, 28 Oct 2019 23:12:05 -0700 Subject: [PATCH 432/531] Update [ja] progress --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index ba299d551..0e6dfaf73 100644 --- a/README.md +++ b/README.md @@ -66,7 +66,7 @@ Please make sure to propose the translation of **only one** cheatsheet per pull |**Magyar**|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/124)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/124)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/124)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/124)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/124)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/124)| |**Bahasa Indonesia**|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/154)|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/139)|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/151)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/150)| |**Italiano**|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/78)|not started|not started|not started|not started|not started| -|**日本語**|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/96)|done|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/173)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/99)|done|done| +|**日本語**|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/96)|done|done|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/99)|done|done| |**한국어**|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/80)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/90)|done|done|done|done| |**Polski**|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/8)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/8)|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/8)|not started|not started| |**Português**|done|done|done|done|done|done| From a4f044be322cb4159de6ca519ce7442c747a2023 Mon Sep 17 00:00:00 
2001 From: Shervine Amidi Date: Mon, 28 Oct 2019 23:13:34 -0700 Subject: [PATCH 433/531] Rename cheatsheet-unsupervised-learning.md to cs-229-unsupervised-learning.md --- ...t-unsupervised-learning.md => cs-229-unsupervised-learning.md} | 0 1 file changed, 0 insertions(+), 0 deletions(-) rename ja/{cheatsheet-unsupervised-learning.md => cs-229-unsupervised-learning.md} (100%) diff --git a/ja/cheatsheet-unsupervised-learning.md b/ja/cs-229-unsupervised-learning.md similarity index 100% rename from ja/cheatsheet-unsupervised-learning.md rename to ja/cs-229-unsupervised-learning.md From 318c9b1ee744c0e99f70e97a88a35d4be72cacf5 Mon Sep 17 00:00:00 2001 From: Hiroki Date: Tue, 29 Oct 2019 19:04:53 +0900 Subject: [PATCH 434/531] Revised a Japanese sentence in iff. "if and only if" means iff but Japanese sentence is a little bit confusing. --- ja/cs-229-probability.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/ja/cs-229-probability.md b/ja/cs-229-probability.md index 0d575ec38..46a1b5552 100644 --- a/ja/cs-229-probability.md +++ b/ja/cs-229-probability.md @@ -102,7 +102,7 @@ **18. Independence ― Two events A and B are independent if and only if we have:** -⟶独立性 - 次が成り立つ場合かつその場合に限り、2つの事象AとBは独立であるといいます: +⟶独立性 - 次が成り立ちかつその場合に限り(必要十分)、2つの事象AとBは独立であるといいます:
From cec917edf9e54028af2ef56173c326b6575cc6b0 Mon Sep 17 00:00:00 2001 From: Hiroki Date: Tue, 29 Oct 2019 21:36:07 +0900 Subject: [PATCH 435/531] Omitted an unnecessary comma An unnecessary comma was remaining before Japanese punctuation. --- ja/cs-229-probability.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/ja/cs-229-probability.md b/ja/cs-229-probability.md index 0d575ec38..cc17c38d1 100644 --- a/ja/cs-229-probability.md +++ b/ja/cs-229-probability.md @@ -222,7 +222,7 @@ **38. [Case, Marginal density, Cumulative function]** -⟶[種類,、周辺密度、累積関数] +⟶[種類、周辺密度、累積関数]
From f58b98a66f3082f04d9daf9570d863d6316fb36f Mon Sep 17 00:00:00 2001 From: tuananhhedspibk Date: Tue, 29 Oct 2019 23:09:42 +0900 Subject: [PATCH 436/531] vi translate for rnn --- vi/cs-230-recurrent-neural-networks.md | 42 +++++++++++++------------- 1 file changed, 21 insertions(+), 21 deletions(-) diff --git a/vi/cs-230-recurrent-neural-networks.md b/vi/cs-230-recurrent-neural-networks.md index 75bfba294..1de42c45d 100644 --- a/vi/cs-230-recurrent-neural-networks.md +++ b/vi/cs-230-recurrent-neural-networks.md @@ -474,21 +474,21 @@ Given the symmetry that e and θ play in this model, the final word embedding e( **67. n-gram model ― This model is a naive approach aiming at quantifying the probability that an expression appears in a corpus by counting its number of appearance in the training data.** -⟶ +⟶ Mô hình n-gram - Mô hình này là cách tiếp cận naive với mục đích định lượng xác suất mà một biểu hiện xuất hiện trong văn bản bằng cách đếm số lần xuất hiện của nó trong tập dữ liệu huấn luyện.
**68. Perplexity ― Language models are commonly assessed using the perplexity metric, also known as PP, which can be interpreted as the inverse probability of the dataset normalized by the number of words T. The perplexity is such that the lower, the better and is defined as follows:**

-⟶
+⟶ Độ hỗn tạp - Các mô hình ngôn ngữ thường được đánh giá dựa theo độ đo hỗn tạp, cũng được biết đến là PP, có thể được hiểu như là nghịch đảo xác suất của tập dữ liệu được chuẩn hoá bởi số lượng các từ T. Độ hỗn tạp càng thấp thì càng tốt và được định nghĩa như sau:

<br>
**69. Remark: PP is commonly used in t-SNE.** -⟶ +⟶ Chú ý: PP thường được sử dụng trong t-SNE.
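A toy illustration of the two ideas above, counting-based probability estimates and perplexity, on an invented corpus; a unigram model is used only to keep the sketch short.

```python
import math
from collections import Counter

corpus = "the cat sat on the mat the cat sat".split()

# Counting-based probabilities (simplest n-gram model, n=1)
counts = Counter(corpus)
total = len(corpus)
prob = {w: c / total for w, c in counts.items()}

# Perplexity of the same text under this model: PP = exp(-(1/T) * sum(log P(w)))
T = len(corpus)
log_pp = -sum(math.log(prob[w]) for w in corpus) / T
print("perplexity:", math.exp(log_pp))   # lower is better
```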
@@ -502,91 +502,91 @@ Given the symmetry that e and θ play in this model, the final word embedding e( **71. Overview ― A machine translation model is similar to a language model except it has an encoder network placed before. For this reason, it is sometimes referred as a conditional language model. The goal is to find a sentence y such that:** -⟶ +⟶ Tổng quan - Một mô hình dịch máy tương tự với mô hình ngôn ngữ ngoại trừ nó có một mạng encoder được đặt phía trước. Vì lí do này, đôi khi nó còn được biết đến là mô hình ngôn ngữ có điều kiện. Mục tiêu là tìm một câu văn y như sau:
**72. Beam search ― It is a heuristic search algorithm used in machine translation and speech recognition to find the likeliest sentence y given an input x.** -⟶ +⟶ Tìm kiếm Beam - Nó là một giải thuật tìm kiếm heuristic được sử dụng trong dịch máy và ghi nhận tiếng nói để tìm câu văn y đúng nhất tương ứng với đầu vào x.
**73. [Step 1: Find top B likely words y<1>, Step 2: Compute conditional probabilities y|x,y<1>,...,y, Step 3: Keep top B combinations x,y<1>,...,y, End process at a stop word]** -⟶ +⟶ [Bước 1: Tìm top B các từ y<1>, Bước 2: Tính xác suất có điều kiện y|x,y<1>,...,y, Bước 3: Giữ top B các tổ hợp x,y<1>,...,y, Kết thúc quá trình xử lí bằng một từ dừng]
**74. Remark: if the beam width is set to 1, then this is equivalent to a naive greedy search.** -⟶ +⟶ Chú ý: nếu độ rộng của beam được thiết lập là 1, thì nó tương đương với tìm kiếm tham lam naive.
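A toy beam search following the three steps above; the stub next-word distribution, the vocabulary and the beam width are all invented for the example.

```python
import math
import random

VOCAB = ("a", "b", "</s>")

def next_word_logprobs(prefix):
    """Stub for the decoder: returns made-up log-probabilities for the next word."""
    random.seed(len(prefix))                 # deterministic toy scores
    scores = [random.random() + 0.1 for _ in VOCAB]
    z = sum(scores)
    return {w: math.log(s / z) for w, s in zip(VOCAB, scores)}

def beam_search(B=2, max_len=5):
    beams = [([], 0.0)]                      # (partial sentence, log-likelihood)
    finished = []
    for _ in range(max_len):
        candidates = []
        for sent, lp in beams:
            for w, wlp in next_word_logprobs(sent).items():
                candidates.append((sent + [w], lp + wlp))
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for sent, lp in candidates[:B]:      # keep only the top-B combinations
            (finished if sent[-1] == "</s>" else beams).append((sent, lp))
        if not beams:                        # every kept beam ended at the stop word
            break
    return max(finished + beams, key=lambda c: c[1])

print(beam_search())
```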
**75. Beam width ― The beam width B is a parameter for beam search. Large values of B yield to better result but with slower performance and increased memory. Small values of B lead to worse results but is less computationally intensive. A standard value for B is around 10.**

-⟶
+⟶ Độ rộng Beam - Độ rộng beam B là một tham số của giải thuật tìm kiếm beam. Các giá trị lớn của B tạo ra kết quả tốt hơn nhưng với hiệu năng thấp hơn và lượng bộ nhớ sử dụng sẽ tăng. Các giá trị nhỏ của B cho kết quả tồi hơn nhưng chi phí tính toán thấp hơn. Giá trị tiêu chuẩn của B là khoảng 10.

<br>
**76. Length normalization ― In order to improve numerical stability, beam search is usually applied on the following normalized objective, often called the normalized log-likelihood objective, defined as:**

-⟶
+⟶ Chuẩn hoá độ dài - Để cải thiện tính ổn định về mặt số học, beam search thường được áp dụng lên mục tiêu chuẩn hoá sau, thường được gọi là mục tiêu chuẩn hoá log-likelihood, được định nghĩa như sau:

<br>
**77. Remark: the parameter α can be seen as a softener, and its value is usually between 0.5 and 1.** -⟶ +⟶ Chú ý: tham số α có thể được xem như là softener, và giá trị của nó thường nằm trong đoạn 0.5 và 1.
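A one-function sketch of the normalized log-likelihood objective above, with α=0.7 taken as an assumed value inside the usual 0.5 to 1 range.

```python
def normalized_log_likelihood(step_logprobs, alpha=0.7):
    """Length-normalized objective: sum of per-step log-probs divided by Ty**alpha."""
    Ty = len(step_logprobs)
    return sum(step_logprobs) / (Ty ** alpha)
```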
**78. Error analysis ― When obtaining a predicted translation ˆy that is bad, one can wonder why we did not get a good translation y∗ by performing the following error analysis:** -⟶ +⟶ Phân tích lỗi - Khi có được một bản dịch tồi ˆy, chúng ta có thể tự hỏi rằng tại sao chúng ta không có được một kết quả dịch tốt y∗ bằng việc thực hiện việc phân tích lỗi như sau:
**79. [Case, Root cause, Remedies]**

-⟶
+⟶ [Trường hợp, Nguyên nhân sâu xa, Biện pháp khắc phục]

<br>
**80. [Beam search faulty, RNN faulty, Increase beam width, Try different architecture, Regularize, Get more data]**

-⟶
+⟶ [Lỗi Beam search, Lỗi RNN, Tăng beam width, Thử kiến trúc khác, Chính quy hoá, Thu thập thêm dữ liệu]

<br>
**81. Bleu score ― The bilingual evaluation understudy (bleu) score quantifies how good a machine translation is by computing a similarity score based on n-gram precision. It is defined as follows:** -⟶ +⟶ Điểm Bleu - Bilingual evaluation understudy (bleu) score định lượng mức độ tốt của dịch máy bằng cách tính một độ tương đồng dựa trên dự đoán n-gram. Nó được định nghĩa như sau:
**82. where pn is the bleu score on n-gram only defined as follows:** -⟶ +⟶ với pn là bleu score chỉ trên n-gram được định nghĩa như sau:
**83. Remark: a brevity penalty may be applied to short predicted translations to prevent an artificially inflated bleu score.** -⟶ +⟶ Chú ý: một mức phạt ngắn có thể được áp dụng với các dự đoán dịch ngắn để tránh việc làm thổi phồng giá trị bleu score.
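A simplified single-reference BLEU sketch combining the clipped n-gram precisions with the brevity penalty mentioned above; the smoothing of zero counts is crude and only there so the toy example runs.

```python
import math
from collections import Counter

def ngrams(words, n):
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]

def bleu(candidate, reference, N=4):
    """Simplified BLEU: geometric mean of clipped n-gram precisions times brevity penalty."""
    cand, ref = candidate.split(), reference.split()
    log_p = 0.0
    for n in range(1, N + 1):
        c_counts, r_counts = Counter(ngrams(cand, n)), Counter(ngrams(ref, n))
        clipped = sum(min(c, r_counts[g]) for g, c in c_counts.items())
        total = max(sum(c_counts.values()), 1)
        log_p += math.log(max(clipped, 1e-9) / total) / N
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / max(len(cand), 1))
    return bp * math.exp(log_p)

print(bleu("the cat sat on the mat", "the cat is on the mat"))
```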
@@ -600,7 +600,7 @@ Given the symmetry that e and θ play in this model, the final word embedding e( **85. Attention model ― This model allows an RNN to pay attention to specific parts of the input that is considered as being important, which improves the performance of the resulting model in practice. By noting α the amount of attention that the output y should pay to the activation a and c the context at time t, we have:** -⟶ +⟶ Attention model - Mô hình này cho phép một RNN chú ý lên các phần cụ thể của đầu vào được xem xét là quan trọng, nó giúp cải thiện hiệu năng của mô hình kết quả trong thực tế. Bằng việc kí hiệu α là mức độ chú ý mà đầu ra y nên có đối với hàm kích hoạt a và c là ngữ cảnh ở thời điểm t, chúng ta có:
@@ -614,28 +614,28 @@ Given the symmetry that e and θ play in this model, the final word embedding e( **87. Remark: the attention scores are commonly used in image captioning and machine translation.** -⟶ +⟶ Chú ý: Các attention scores thường được sử dụng trong chú thích ảnh và dịch máy.
**88. A cute teddy bear is reading Persian literature.**

-⟶
+⟶ Một chú gấu bông dễ thương đang đọc văn học Ba Tư.

<br>
**89. Attention weight ― The amount of attention that the output y should pay to the activation a is given by α computed as follows:**

-⟶
+⟶ Attention weight - Mức độ chú ý mà đầu ra y nên dành cho hàm kích hoạt a được cho bởi α, tính như sau:

<br>
**90. Remark: computation complexity is quadratic with respect to Tx.**

-⟶
+⟶ Chú ý: độ phức tạp tính toán là bậc hai theo Tx.

<br>
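The attention weights α and the resulting context vector c can be sketched as a softmax over alignment scores; the shapes below are assumptions of the sketch.

```python
import numpy as np

def attention_context(scores, a):
    """alpha = softmax(scores) over the Tx activations; context c = sum_t' alpha<t'> a<t'>.

    scores: (Tx,) alignment scores e<t,t'>, a: (Tx, n) encoder activations.
    """
    alpha = np.exp(scores - scores.max())
    alpha = alpha / alpha.sum()      # attention weights, sum to 1
    return alpha, alpha @ a          # weights and context vector c
```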
From f12799ea17d24ac2b6822331e4cd8a3310e3dda6 Mon Sep 17 00:00:00 2001 From: Hiroki Date: Wed, 30 Oct 2019 07:58:05 +0900 Subject: [PATCH 437/531] Some expressions revised in Japanese style Some expressions including comma, punctuation and wording are changed. --- ja/cs-229-supervised-learning.md | 50 ++++++++++++++++---------------- 1 file changed, 25 insertions(+), 25 deletions(-) diff --git a/ja/cs-229-supervised-learning.md b/ja/cs-229-supervised-learning.md index b5d6896fb..71f63afdd 100644 --- a/ja/cs-229-supervised-learning.md +++ b/ja/cs-229-supervised-learning.md @@ -12,7 +12,7 @@ **3. Given a set of data points {x(1),...,x(m)} associated to a set of outcomes {y(1),...,y(m)}, we want to build a classifier that learns how to predict y from x.** -⟶入力が{x(1),...,x(m)}, 出力が{y(1),...,y(m)}であるとき, xからyを予測する分類器を構築したい。 +⟶入力が{x(1),...,x(m)}、出力が{y(1),...,y(m)}であるとき、xからyを予測する分類器を構築したい。
@@ -24,13 +24,13 @@ **5. [Regression, Classifier, Outcome, Examples]** -⟶回帰, 分類, 出力, 例 +⟶回帰、分類、出力、例
**6. [Continuous, Class, Linear regression, Logistic regression, SVM, Naive Bayes]** -⟶連続値, クラス, 線形回帰, ロジスティック回帰, SVM, ナイーブベイズ +⟶連続値、クラス、線形回帰、ロジスティック回帰、SVM、ナイーブベイズ
@@ -42,13 +42,13 @@ **8. [Discriminative model, Generative model, Goal, What's learned, Illustration, Examples]** -⟶判別モデル, 生成モデル, 目的, 学習対象, イメージ図, 例 +⟶判別モデル、生成モデル、目的、学習対象、イメージ図、例
**9. [Directly estimate P(y|x), Estimate P(x|y) to then deduce P(y|x), Decision boundary, Probability distributions of the data, Regressions, SVMs, GDA, Naive Bayes]** -⟶P(y|x)を直接推定する, P(y|x)を推測するためにP(x|y)を推定する, 決定境界, データの確率分布, 回帰, SVM, GDA, ナイーブベイズ +⟶P(y|x)の直接推定、後にP(y|x)を推測するためのP(x|y)の推定、決定境界、データの確率分布、回帰、SVM、GDA、ナイーブベイズ
@@ -72,13 +72,13 @@ **13. [Least squared error, Logistic loss, Hinge loss, Cross-entropy]** -⟶最小2乗誤差, ロジスティック損失, ヒンジ損失, クロスエントロピー +⟶最小2乗誤差、ロジスティック損失、ヒンジ損失、交差エントロピー
**14. [Linear regression, Logistic regression, SVM, Neural Network]** -⟶線形回帰, ロジスティック回帰, SVM, ニューラルネットワーク +⟶線形回帰、ロジスティック回帰、SVM、ニューラルネットワーク
@@ -198,13 +198,13 @@ **34. Exponential family ― A class of distributions is said to be in the exponential family if it can be written in terms of a natural parameter, also called the canonical parameter or link function, η, a sufficient statistic T(y) and a log-partition function a(η) as follows:** -⟶指数分布族 ― 正準パラメータまたはリンク関数とも呼ばれる自然パラメータη、十分統計量T(y)及び対数分配関数a(η)を用いて、次のように表すことのできる一群の分布は指数分布族と呼ばれる: +⟶指数分布族 ― ある分布の集合は指数分布族と呼ばれ、正準パラメータまたはリンク関数とも呼ばれる自然パラメータη、十分統計量T(y)及び対数分配関数a(η)を用いて、次のように表される:
**35. Remark: we will often have T(y)=y. Also, exp(−a(η)) can be seen as a normalization parameter that will make sure that the probabilities sum to one.** -⟶備考:T(y)=yとすることが多い。また、exp(−a(η))は確率の合計が1になることを担保する正規化定数だと見なせる。 +⟶備考:T(y)=yとすることが多い。また、exp(−a(η))は確率の合計が1になることを保証する正規化定数と見なせる。
@@ -216,7 +216,7 @@ **37. [Distribution, Bernoulli, Gaussian, Poisson, Geometric]** -⟶分布, ベルヌーイ, ガウス, ポワソン, 幾何 +⟶分布、ベルヌーイ、ガウス、ポワソン、幾何
@@ -294,7 +294,7 @@ **50. [Non-linear separability, Use of a kernel mapping, Decision boundary in the original space]** -⟶非線形分離問題, カーネル写像の適用, 元の空間における決定境界 +⟶非線形分離問題、カーネル写像の適用、元の空間における決定境界
@@ -396,7 +396,7 @@ **67. Remark: random forests are a type of ensemble methods.** -⟶備考:ランダムフォレストはアンサンブル学習の1種である。 +⟶備考:ランダムフォレストはアンサンブル学習の一種である。
@@ -408,7 +408,7 @@ **69. [Adaptive boosting, Gradient boosting]** -⟶[適応的ブースティング, 勾配ブースティング] +⟶[適応的ブースティング、勾配ブースティング]
@@ -426,13 +426,13 @@ **72. Other non-parametric approaches** -⟶他のノン・パラメトリックな手法 +⟶他のノンパラメトリックな手法
**73. k-nearest neighbors ― The k-nearest neighbors algorithm, commonly known as k-NN, is a non-parametric approach where the response of a data point is determined by the nature of its k neighbors from the training set. It can be used in both classification and regression settings.** -⟶k近傍法 ― k近傍法は、一般的にk-NNとして知られ、あるデータ点の応答はそのk個の最近傍点の性質によって決まるノン・パラメトリックな手法である。分類と回帰の両方に用いることができる。 +⟶k近傍法 ― k近傍法は、一般的にk-NNとして知られ、あるデータ点の応答はそのk個の最近傍点の性質によって決まるノンパラメトリックな手法である。分類と回帰の両方に用いることができる。
@@ -468,7 +468,7 @@ **79. Training error ― For a given classifier h, we define the training error ˆϵ(h), also known as the empirical risk or empirical error, to be as follows:** -⟶学習誤差 ― ある分類器hに対して、学習誤差、あるいは経験損失か経験誤差としても知られる、ˆϵ(h)を次のように定義する: +⟶学習誤差 ― ある分類器hに対して、学習誤差、あるいは経験損失か経験誤差としても知られるˆϵ(h)を次のように定義する:
@@ -498,7 +498,7 @@ **84. Upper bound theorem ― Let H be a finite hypothesis class such that |H|=k and let δ and the sample size m be fixed. Then, with probability of at least 1−δ, we have:** -⟶上界定理 ― Hを|H|=kで有限の仮説集合とし、δとサンプルサイズmは定数とする。そのとき、少なくとも1-δ の確率で次が成り立つ: +⟶上界定理 ― Hを|H|=kで有限の仮説集合とし、δとサンプルサイズmは定数とする。そのとき、少なくとも1-δの確率で次が成り立つ:
@@ -522,13 +522,13 @@ **88. [Introduction, Type of prediction, Type of model]** -⟶[導入, 予測の種類, モデルの種類] +⟶[導入、予測の種類、モデルの種類]
**89. [Notations and general concepts, loss function, gradient descent, likelihood]** -⟶[記法と全般的な概念, 損失関数, 勾配降下, 尤度] +⟶[記法と全般的な概念、損失関数、勾配降下、尤度]
@@ -536,32 +536,32 @@ ⟶ -
[線形モデル, 線形回帰, ロジスティック回帰, 一般化線形モデル] +
[線形モデル、線形回帰、ロジスティック回帰、一般化線形モデル] **91. [Support vector machines, Optimal margin classifier, Hinge loss, Kernel]** ⟶ -
[サポートベクターマシン, 最適マージン分類器, ヒンジ損失, カーネル] +
[サポートベクターマシン、最適マージン分類器、ヒンジ損失、カーネル] **92. [Generative learning, Gaussian Discriminant Analysis, Naive Bayes]** ⟶ -
[生成学習, ガウシアン判別分析, ナイーブベイズ] +
[生成学習、ガウシアン判別分析、ナイーブベイズ] **93. [Trees and ensemble methods, CART, Random forest, Boosting]** -⟶[ツリーとアンサンブル学習, CART, ランダムフォレスト, ブースティング] +⟶[ツリーとアンサンブル学習、CART、ランダムフォレスト、ブースティング]
**94. [Other methods, k-NN]** -⟶[他の手法, k近傍法] +⟶[他の手法、k近傍法]
**95. [Learning theory, Hoeffding inequality, PAC, VC dimension]** -⟶[学習理論, ヘフディング不等式, PAC, VC次元] +⟶[学習理論、ヘフディング不等式、PAC、VC次元] From 9d75d753a127339911770460a543cf9cca9fc7f9 Mon Sep 17 00:00:00 2001 From: Hiroki Date: Wed, 30 Oct 2019 12:25:00 +0900 Subject: [PATCH 438/531] Some expressions revised in Japanese style Some expressions including comma, punctuation and wording are changed. --- ja/cs-229-unsupervised-learning.md | 24 ++++++++++++------------ 1 file changed, 12 insertions(+), 12 deletions(-) diff --git a/ja/cs-229-unsupervised-learning.md b/ja/cs-229-unsupervised-learning.md index 917433d45..cc8111e7c 100644 --- a/ja/cs-229-unsupervised-learning.md +++ b/ja/cs-229-unsupervised-learning.md @@ -42,13 +42,13 @@ **8. [Setting, Latent variable z, Comments]** -⟶[設定, 潜在変数z, コメント] +⟶[設定、潜在変数z、コメント]
**9. [Mixture of k Gaussians, Factor analysis]** -⟶[k個のガウス分布の混, 因子分析] +⟶[k個のガウス分布の混合、因子分析]
@@ -72,7 +72,7 @@ **13. [Gaussians initialization, Expectation step, Maximization step, Convergence]** -⟶[ガウス分布初期化, 期待値ステップ, 最大化ステップ, 収束] +⟶[ガウス分布初期化、期待値ステップ、最大化ステップ、収束]
@@ -96,7 +96,7 @@ **17. [Means initialization, Cluster assignment, Means update, Convergence]** -⟶ [平均の初期化, クラスター割り当て,平均の更新, 収束] +⟶ [平均の初期化、クラスター割り当て、平均の更新、収束]
@@ -126,7 +126,7 @@ **22. [Ward linkage, Average linkage, Complete linkage]** -⟶ [ウォードリンケージ, 平均リンケージ, 完全リンケージ] +⟶ [ウォードリンケージ、平均リンケージ、完全リンケージ]
@@ -144,7 +144,7 @@ **25. In an unsupervised learning setting, it is often hard to assess the performance of a model since we don't have the ground truth labels as was the case in the supervised learning setting.** -⟶ 教師なし学習では、教師あり学習の場合のような正解ラベルがないため、モデルの性能を評価することが困難な場合が多いです。 +⟶ 教師なし学習では、教師あり学習の場合のような正解ラベルがないため、モデルの性能を評価することが困難な場合が多くあります。
@@ -162,7 +162,7 @@ **28. the Calinski-Harabaz index s(k) indicates how well a clustering model defines its clusters, such that the higher the score, the more dense and well separated the clusters are. It is defined as follows:** -⟶ Calinski-Harabazインデックスs(k)はクラスタリングモデルが各クラスタをどの程度適切に定義しているかを示します。つまり、スコアが高いほど、各クラスタはより密で、十分に分離されています。 それは次のように定義されます: +⟶ Calinski-Harabazインデックスs(k)はクラスタリングモデルが各クラスタをどの程度適切に定義しているかを示します。つまり、スコアが高いほど、各クラスタはより密で、十分に分離されています。それは次のように定義されます:
@@ -210,7 +210,7 @@ **36. Algorithm ― The Principal Component Analysis (PCA) procedure is a dimension reduction technique that projects the data on k dimensions by maximizing the variance of the data as follows:** -⟶ アルゴリズム ― 主成分分析 (PCA)の過程は、次のようにデータの分散を最大化することによりデータをk次元に射影する次元削減の技術です。 +⟶ アルゴリズム ― 主成分分析(PCA)の過程は、次のようにデータの分散を最大化することによりデータをk次元に射影する次元削減の技術です。
@@ -246,7 +246,7 @@ **42. [Data in feature space, Find principal components, Data in principal components space]** -⟶ [特徴空間内のデータ, 主成分を見つける, 主成分空間内のデータ] +⟶ [特徴空間内のデータ、主成分の探索、主成分空間内のデータ]
@@ -324,16 +324,16 @@ **55. [Introduction, Motivation, Jensen's inequality]** -⟶ [導入, 動機, イェンセンの不等式] +⟶ [導入、動機、イェンセンの不等式]
**56. [Clustering, Expectation-Maximization, k-means, Hierarchical clustering, Metrics]** -⟶[クラスタリング, 期待値最大化法, k-means, 階層クラスタリング, 指標] +⟶[クラスタリング、期待値最大化法、k-means、階層クラスタリング、指標]
**57. [Dimension reduction, PCA, ICA]** -⟶ [次元削減, PCA, ICA] +⟶ [次元削減、PCA、ICA] From cff3a451ee8b204536c564ccd695c3156a0111c2 Mon Sep 17 00:00:00 2001 From: Hiroki Date: Wed, 30 Oct 2019 12:36:48 +0900 Subject: [PATCH 439/531] Revised an unnecessary space and separators. MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit 真偽 doesn't need a space between 真 and 偽. Other cases of names are separated with '・', but a case is ',' --- ja/cs-230-convolutional-neural-networks.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/ja/cs-230-convolutional-neural-networks.md b/ja/cs-230-convolutional-neural-networks.md index bff314dce..f5d545fc3 100644 --- a/ja/cs-230-convolutional-neural-networks.md +++ b/ja/cs-230-convolutional-neural-networks.md @@ -649,7 +649,7 @@ **93. [Training set, Noise, Real-world image, Generator, Discriminator, Real Fake]** -⟶ [学習セット, ノイズ, 現実世界の画像, 生成器, 識別器, 真 偽] +⟶ [学習セット, ノイズ, 現実世界の画像, 生成器, 識別器, 真偽]
@@ -698,7 +698,7 @@ **100. Reviewed by X, Y and Z** -⟶ X, Y, Z 校正 +⟶ X・Y・Z 校正
From dd096ff81879fa6779bbc64d51c6c8e6cc1157c3 Mon Sep 17 00:00:00 2001 From: Hiroki Date: Thu, 31 Oct 2019 12:48:36 +0900 Subject: [PATCH 440/531] Some expressions revised in Japanese style Some expressions including comma, punctuation and wording are changed. Also, square brackets [ ... ] are appended into some lines. --- ja/cs-230-recurrent-neural-networks.md | 92 +++++++++++++------------- 1 file changed, 46 insertions(+), 46 deletions(-) diff --git a/ja/cs-230-recurrent-neural-networks.md b/ja/cs-230-recurrent-neural-networks.md index b236d8727..e366a86de 100644 --- a/ja/cs-230-recurrent-neural-networks.md +++ b/ja/cs-230-recurrent-neural-networks.md @@ -18,49 +18,49 @@ **3. [Overview, Architecture structure, Applications of RNNs, Loss function, Backpropagation]** -⟶概要、アーキテクチャの構造、RNNの応用アプリケーション、損失関数、逆伝播 +⟶[概要、アーキテクチャの構造、RNNの応用アプリケーション、損失関数、逆伝播]
**4. [Handling long term dependencies, Common activation functions, Vanishing/exploding gradient, Gradient clipping, GRU/LSTM, Types of gates, Bidirectional RNN, Deep RNN]**

-⟶長期依存性関係の処理、活性化関数、勾配喪失と発散、勾配クリッピング、GRU/LTSM、ゲートの種類、双方向性RNN、ディープ(深層学習)RNN

+⟶[長期依存性関係の処理、活性化関数、勾配消失と発散、勾配クリッピング、GRU/LSTM、ゲートの種類、双方向性RNN、ディープ(深層学習)RNN]

<br>
**5. [Learning word representation, Notations, Embedding matrix, Word2vec, Skip-gram, Negative sampling, GloVe]** -⟶単語出現の学習、ノーテーション、埋め込み行列、Word2vec、スキップグラム、ネガティブサンプリング、グローブ +⟶[単語出現の学習、ノーテーション、埋め込み行列、Word2vec、スキップグラム、ネガティブサンプリング、グローブ]
**6. [Comparing words, Cosine similarity, t-SNE]** -⟶単語の比較、コサイン類似度、t-SNE +⟶[単語の比較、コサイン類似度、t-SNE]
**7. [Language model, n-gram, Perplexity]** -⟶言語モデル、n-gramモデル、パープレキシティ +⟶[言語モデル、n-gramモデル、パープレキシティ]
**8. [Machine translation, Beam search, Length normalization, Error analysis, Bleu score]** -⟶機械翻訳、ビームサーチ、言語長正規化、エラー分析、ブルースコア(機械翻訳比較スコア) +⟶[機械翻訳、ビームサーチ、単語長の正規化、エラー分析、BLEUスコア(機械翻訳比較スコア)]
**9. [Attention, Attention model, Attention weights]** -⟶アテンション、アテンションモデル、アテンションウェイト +⟶[アテンション、アテンションモデル、アテンションウェイト]
@@ -74,14 +74,14 @@ **11. Architecture of a traditional RNN ― Recurrent neural networks, also known as RNNs, are a class of neural networks that allow previous outputs to be used as inputs while having hidden states. They are typically as follows:** -⟶一般的なRNNのアーキテクチャ - RNNとして知られるリカレントニューラルネットワークは、隠れ層の状態を利用して、前の出力を次の入力として取り扱うことを可能にするニューラルネットワークの一種です。一般的なモデルは下記のようになります。 +⟶一般的なRNNのアーキテクチャ - RNNとして知られるリカレントニューラルネットワークは、隠れ層の状態を利用して、前の出力を次の入力として取り扱うことを可能にするニューラルネットワークの一種です。一般的なモデルは下記のようになります:
**12. For each timestep t, the activation a and the output y are expressed as follows:** -⟶それぞれの時点 t において活性化関数の状態 a と出力 y は下記のように表現されます。  +⟶それぞれの時点 t において活性化関数の状態 a と出力 y は下記のように表現されます:
@@ -95,7 +95,7 @@ **14. where Wax,Waa,Wya,ba,by are coefficients that are shared temporally and g1,g2 activation functions.** -⟶Wax,Waa,Wya,baは全ての時点で共有される係数であり、g1,g2は活性化関数です。 +⟶ここで、Wax,Waa,Wya,ba,by は全ての時点で共有される係数であり、g1,g2 は活性化関数です。
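Entries 11-14 above give the update equations of a vanilla RNN cell. The short numpy sketch below is an editorial illustration under the assumption g1 = tanh and g2 = softmax; the shapes and names are chosen here for the example and are not part of the original file.

```python
import numpy as np

def rnn_step(x_t, a_prev, Wax, Waa, Wya, ba, by):
    """One timestep of a vanilla RNN:
    a<t> = tanh(Waa a<t-1> + Wax x<t> + ba), y<t> = softmax(Wya a<t> + by)."""
    a_t = np.tanh(Waa @ a_prev + Wax @ x_t + ba)
    z = Wya @ a_t + by
    y_t = np.exp(z - z.max()) / np.exp(z - z.max()).sum()
    return a_t, y_t

n_x, n_a, n_y = 3, 5, 2
rng = np.random.default_rng(0)
a, y = rnn_step(rng.standard_normal(n_x), np.zeros(n_a),
                rng.standard_normal((n_a, n_x)), rng.standard_normal((n_a, n_a)),
                rng.standard_normal((n_y, n_a)), np.zeros(n_a), np.zeros(n_y))
print(a.shape, y.shape)
```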
@@ -109,56 +109,56 @@ **16. [Advantages, Possibility of processing input of any length, Model size not increasing with size of input, Computation takes into account historical information, Weights are shared across time]** -⟶長所、任意の長さの入力を処理できる、入力サイズに応じてモデルサイズが大きくならない、計算は時系列情報を考慮している、重みは全ての時点で共有される +⟶[長所、任意の長さの入力の処理可能性、入力サイズに応じて大きくならないモデルサイズ、時系列情報を考慮した計算、全ての時点で共有される重み]
**17. [Drawbacks, Computation being slow, Difficulty of accessing information from a long time ago, Cannot consider any future input for the current state]** -⟶短所、遅い計算、長い時間軸での情報の利用が困難、現在の状態から将来の入力を予測不可能 +⟶[短所、遅い計算、長い時間軸での情報の利用の困難性、現在の状態から将来の入力が予測不可能]
**18. Applications of RNNs ― RNN models are mostly used in the fields of natural language processing and speech recognition. The different applications are summed up in the table below:**

-⟶RNNの応用 - RNNモデルは主に自然言語処理と音声認識の分野で使用されます。以下の表に、さまざまな応用例がまとめられています。

+⟶RNNの応用 - RNNモデルは主に自然言語処理と音声認識の分野で使用されます。さまざまな応用例が以下の表にまとめられています:

<br>
**19. [Type of RNN, Illustration, Example]** -⟶RNNの種類、図、例 +⟶[RNNの種類、図、例]
**20. [One-to-one, One-to-many, Many-to-one, Many-to-many]** -⟶一対一、一対多、多対一、多対多 +⟶[一対一、一対多、多対一、多対多]
**21. [Traditional neural network, Music generation, Sentiment classification, Name entity recognition, Machine translation]** -⟶伝統的なニューラルネットワーク、音楽生成、感情分類、固有表現認識、機械翻訳 +⟶[伝統的なニューラルネットワーク、音楽生成、感情分類、固有表現認識、機械翻訳]
**22. Loss function ― In the case of a recurrent neural network, the loss function L of all time steps is defined based on the loss at every time step as follows:** -⟶損失関数 - リカレントニューラルネットワークの場合、時間軸全体での損失関数Lは、各時点での損失に基づき、次のように定義されます。 +⟶損失関数 - リカレントニューラルネットワークの場合、時間軸全体での損失関数Lは、各時点での損失に基づき、次のように定義されます:
**23. Backpropagation through time ― Backpropagation is done at each point in time. At timestep T, the derivative of the loss L with respect to weight matrix W is expressed as follows:** -⟶時間軸での誤差逆伝播法 - 誤差逆伝播(バックプロパゲーション)が各時点で行われます。時刻 T における、重み行列 W に関する損失 L の導関数は以下のように表されます。 +⟶時間軸での誤差逆伝播法 - 誤差逆伝播(バックプロパゲーション)が各時点で行われます。時刻 T における、重み行列 W に関する損失 L の導関数は以下のように表されます:
@@ -172,14 +172,14 @@ **25. Commonly used activation functions ― The most common activation functions used in RNN modules are described below:** -⟶一般的に使用される活性化関数 - RNNモジュールで使用される最も一般的な活性化関数を以下に説明します。 +⟶一般的に使用される活性化関数 - RNNモジュールで使用される最も一般的な活性化関数を以下に説明します:
**26. [Sigmoid, Tanh, RELU]** -⟶[シグモイド、Tanh、RELU] +⟶[シグモイド、ハイパボリックタンジェント、RELU]
@@ -207,21 +207,21 @@ **30. Types of gates ― In order to remedy the vanishing gradient problem, specific gates are used in some types of RNNs and usually have a well-defined purpose. They are usually noted Γ and are equal to:** -⟶ゲートの種類 - 勾配消失問題を解決するために、特定のゲートがいくつかのRNNで使用され、通常明確に定義された目的を持っています。それらは通常Γと記され、以下のように定義されます。 +⟶ゲートの種類 - 勾配消失問題を解決するために、特定のゲートがいくつかのRNNで使用され、通常明確に定義された目的を持っています。それらは通常Γと記され、以下のように定義されます:
**31. where W,U,b are coefficients specific to the gate and σ is the sigmoid function. The main ones are summed up in the table below:** -⟶ここで、W、U、bはゲート固有の係数、σはシグモイド関数です。主なものは以下の表にまとめられています。 +⟶ここで、W、U、bはゲート固有の係数、σはシグモイド関数です。主なものは以下の表にまとめられています:
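Entries 30-31 above define the generic gate Γ = σ(W x<t> + U a<t-1> + b) used in GRU/LSTM cells. A one-function numpy sketch of that formula follows; it is an editorial illustration with made-up shapes, not code from the cheatsheet.

```python
import numpy as np

def gate(x_t, a_prev, W, U, b):
    # Generic gate: sigmoid of an affine combination of the current input
    # and the previous activation, producing values in (0, 1).
    return 1.0 / (1.0 + np.exp(-(W @ x_t + U @ a_prev + b)))

print(gate(np.ones(3), np.zeros(4), np.zeros((4, 3)), np.zeros((4, 4)), np.zeros(4)))
```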
**32. [Type of gate, Role, Used in]** -⟶[ゲートの種類、役割、下記で使用される] +⟶[ゲートの種類、役割、下記で使用]
@@ -242,35 +242,35 @@ **35. [LSTM, GRU]** -⟶[LSTM GRU] +⟶[LSTM、GRU]
**36. GRU/LSTM ― Gated Recurrent Unit (GRU) and Long Short-Term Memory units (LSTM) deal with the vanishing gradient problem encountered by traditional RNNs, with LSTM being a generalization of GRU. Below is a table summing up the characterizing equations of each architecture:** -⟶GRU/LSTM - ゲート付きリカレントユニット(GRU)およびロングショートタームメモリユニット(LSTM)は、従来のRNNが直面した勾配消失問題を解決しようとします。LSTMはGRUを一般化したものです。以下は、各アーキテクチャを特徴づける式をまとめた表です。 +⟶GRU/LSTM - ゲート付きリカレントユニット(GRU)およびロングショートタームメモリユニット(LSTM)は、従来のRNNが直面した勾配消失問題を解決しようとします。LSTMはGRUを一般化したものです。各アーキテクチャを特徴づける式を以下の表にまとめます:
**37. [Characterization, Gated Recurrent Unit (GRU), Long Short-Term Memory (LSTM), Dependencies]** -⟶特徴づけ、ゲート付きリカレントユニット(GRU)、ロングショートタームメモリ(LSTM)、依存関係 +⟶[特徴づけ、ゲート付きリカレントユニット(GRU)、ロングショートタームメモリ(LSTM)、依存関係]
**38. Remark: the sign ⋆ denotes the element-wise multiplication between two vectors.** -⟶備考:記号*は2つのベクトル間の要素ごとの乗算を表します。 +⟶備考:記号 ⋆ は2つのベクトル間の要素ごとの乗算を表します。
**39. Variants of RNNs ― The table below sums up the other commonly used RNN architectures:** -⟶RNNの変種 - 以下の表は、一般的に使用されている他のRNNアーキテクチャをまとめたものです。 +⟶RNNの変種 - 一般的に使用されている他のRNNアーキテクチャを以下の表にまとめます:
@@ -312,28 +312,28 @@ **45. [1-hot representation, Word embedding]** -⟶[1-hot表現、単語埋め込み] +⟶[1-hot表現、単語埋め込み(単語分散表現)]
**46. [teddy bear, book, soft]** -⟶テディベア、本、柔らかい +⟶[テディベア、本、柔らかい]
**47. [Noted ow, Naive approach, no similarity information, Noted ew, Takes into account words similarity]**

-⟶[owと表記される、素朴なアプローチ、類似性情報なし、ewと表記される、単語の類似性を考慮に入れる]

+⟶[owの表記、素朴なアプローチ、類似性情報なし、ewの表記、単語の類似性の考慮]

<br>
**48. Embedding matrix ― For a given word w, the embedding matrix E is a matrix that maps its 1-hot representation ow to its embedding ew as follows:** -⟶埋め込み行列 - 与えられた単語wに対して、埋め込み行列Eは、以下のように1-hot表現owを埋め込み行列ewに写像します。 +⟶埋め込み行列(分散表現行列) - 与えられた単語wに対して、埋め込み行列Eは、1-hot表現owを以下のように埋め込み行列ewに写像します:
@@ -375,7 +375,7 @@ **54. Skip-gram ― The skip-gram word2vec model is a supervised learning task that learns word embeddings by assessing the likelihood of any given target word t happening with a context word c. By noting θt a parameter associated with t, the probability P(t|c) is given by:** -⟶スキップグラム - スキップグラムword2vecモデルは、あるターゲット単語tがコンテキスト単語cと一緒に出現する確率を評価することで単語の埋め込みを学習する教師付き学習タスクです。tに関するパラメータをθtと表記すると、その確率P(t|c) は下記の式で与えられます。 +⟶スキップグラム - スキップグラムword2vecモデルは、あるターゲット単語tがコンテキスト単語cと一緒に出現する確率を評価することで単語の埋め込みを学習する教師付き学習タスクです。tに関するパラメータをθtと表記すると、その確率P(t|c) は以下の式で与えられます:
@@ -403,7 +403,7 @@ **57bis. GloVe ― The GloVe model, short for global vectors for word representation, is a word embedding technique that uses a co-occurence matrix X where each Xi,j denotes the number of times that a target i occurred with a context j. Its cost function J is as follows:** -⟶GloVe - GloVeモデルは、単語表現のためのグローバルベクトルの略で、共起行列Xを使用する単語の埋め込み手法です。ここで、各Xi,jは、ターゲットiがコンテキストjで発生した回数を表します。そのコスト関数Jは以下の通りです。 +⟶GloVe - GloVeモデルは、単語表現のためのグローバルベクトルの略で、共起行列Xを使用する単語の埋め込み手法です。ここで、各Xi,jは、ターゲットiがコンテキストjで発生した回数を表します。そのコスト関数Jは以下の通りです:
@@ -411,7 +411,7 @@ **58. where f is a weighting function such that Xi,j=0⟹f(Xi,j)=0. Given the symmetry that e and θ play in this model, the final word embedding e(final)w is given by:** -⟶ここで、fはXi,j =0⟹f(Xi,j)= 0となるような重み関数です。このモデルでeとθが果たす対称性を考えると、最後の単語の埋め込みe(final)wは下記ののようになります。 +⟶ここで、fはXi,j =0⟹f(Xi,j)= 0となるような重み関数です。このモデルでeとθが果たす対称性を考えると、最後の単語の埋め込みe(final)wは以下のようになります:
@@ -432,7 +432,7 @@ Given the symmetry that e and θ play in this model, the final word embedding e(

**61. Cosine similarity ― The cosine similarity between words w1 and w2 is expressed as follows:**

-⟶コサイン類似度 - 単語w1とw2のコサイン類似度は次のように表されます。

+⟶コサイン類似度 - 単語w1とw2のコサイン類似度は次のように表されます:

<br>
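Entry 61 above defines the cosine similarity between two word vectors. A minimal numpy sketch of the formula, added editorially and assuming dense embedding vectors, is:

```python
import numpy as np

def cosine_similarity(w1, w2):
    # cos(theta) = (w1 . w2) / (||w1|| * ||w2||), a value in [-1, 1]
    return float(w1 @ w2 / (np.linalg.norm(w1) * np.linalg.norm(w2)))

print(cosine_similarity(np.array([1.0, 2.0, 3.0]), np.array([2.0, 4.0, 6.0])))  # 1.0
```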
@@ -481,7 +481,7 @@ Given the symmetry that e and θ play in this model, the final word embedding e( **68. Perplexity ― Language models are commonly assessed using the perplexity metric, also known as PP, which can be interpreted as the inverse probability of the dataset normalized by the number of words T. The perplexity is such that the lower, the better and is defined as follows:** -⟶パープレキシティ - 言語モデルは一般的に、PPとも呼ばれるパープレキシティメトリックを使用して評価されます。これは、単語数Tにより正規化されたデータセットの逆確率と解釈できます。パープレキシティは低いほど良く、次のように定義されます。 +⟶パープレキシティ - 言語モデルは一般的に、PPとも呼ばれるパープレキシティメトリックを使用して評価されます。これは、単語数Tにより正規化されたデータセットの逆確率と解釈できます。パープレキシティは低いほど良く、次のように定義されます: (訳注:パープレキシティの数値はより低いものがより選択しやすい単語として評価されます。10であれば10個の中から1つ、10000であれば10000個の中から1つ選択されます。)
@@ -503,7 +503,7 @@ Given the symmetry that e and θ play in this model, the final word embedding e( **71. Overview ― A machine translation model is similar to a language model except it has an encoder network placed before. For this reason, it is sometimes referred as a conditional language model. The goal is to find a sentence y such that:** -⟶概要 - 機械翻訳モデルは、エンコーダーネットワークのロジックが最初に付加されている以外は、言語モデルと似ています。このため、条件付き言語モデルと呼ばれることもあります。目的は次のような文yを見つけることです。 +⟶概要 - 機械翻訳モデルは、エンコーダーネットワークのロジックが最初に付加されている以外は、言語モデルと似ています。このため、条件付き言語モデルと呼ばれることもあります。目的は次のような文yを見つけることです:
@@ -517,7 +517,7 @@ Given the symmetry that e and θ play in this model, the final word embedding e( **73. [Step 1: Find top B likely words y<1>, Step 2: Compute conditional probabilities y|x,y<1>,...,y, Step 3: Keep top B combinations x,y<1>,...,y, End process at a stop word]** -⟶[ステップ1:上位B個の高い確率を持つ単語y<1>を見つける。ステップ2:条件付き確率y|x,y<1>,...,yを計算する。ステップ3:上位B個の組み合わせx,y<1>,...,yを保持する。あるストップワードでプロセスを終了する] +⟶[ステップ1:上位B個の高い確率を持つ単語y<1>を見つけ、ステップ2:条件付き確率y|x,y<1>,...,yを計算し、ステップ3:上位B個の組み合わせx,y<1>,...,yを保持し、あるストップワードでプロセスを終了します]
@@ -538,21 +538,21 @@ Given the symmetry that e and θ play in this model, the final word embedding e( **76. Length normalization ― In order to improve numerical stability, beam search is usually applied on the following normalized objective, often called the normalized log-likelihood objective, defined as:** -⟶文章の長さの正規化 - 数値の安定性を向上させるために、ビーム検索は通常次のように正規化(対数尤度正規化)された目的関数に対して適用されます。 +⟶文章の長さの正規化 - 数値の安定性を向上させるために、ビーム検索は通常、正規化(対数尤度正規化)された目的関数に対して適用され、次のように定義されます:
**77. Remark: the parameter α can be seen as a softener, and its value is usually between 0.5 and 1.** -⟶注:パラメータαは緩衝パラメータと見なされ、その値は通常0.5から1の間です。 +⟶注:パラメータαは緩衝パラメータと見なされ、その値は通常、0.5から1の間です。
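Entries 76-77 above describe the length-normalized log-likelihood objective used with beam search, with α acting as a softener. The sketch below is an editorial illustration of that objective; the token probabilities are invented for the example.

```python
import math

def normalized_log_likelihood(token_probs, alpha=0.7):
    """Normalized objective (1 / Ty**alpha) * sum_t log P(y<t> | x, y<1..t-1>),
    with alpha typically in [0.5, 1]."""
    Ty = len(token_probs)
    return sum(math.log(p) for p in token_probs) / (Ty ** alpha)

print(normalized_log_likelihood([0.4, 0.3, 0.5, 0.2]))
```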
**78. Error analysis ― When obtaining a predicted translation ˆy that is bad, one can wonder why we did not get a good translation y∗ by performing the following error analysis:** -⟶エラー分析 - 予測されたˆyの翻訳が良くない場合、以下のようなエラー分析を実行することで、なぜy∗のような良い翻訳を得られなかったのか考えることが可能です。 +⟶エラー分析 - 予測されたˆyの翻訳が良くない場合、以下のようなエラー分析を実行することで、なぜy∗のような良い翻訳を得られなかったのか考えることが可能です:
@@ -573,14 +573,14 @@ Given the symmetry that e and θ play in this model, the final word embedding e( **81. Bleu score ― The bilingual evaluation understudy (bleu) score quantifies how good a machine translation is by computing a similarity score based on n-gram precision. It is defined as follows:** -⟶Bleuスコア - Bleu(Bilingual evaluation understudy)スコアは、n-gramの精度に基づき類似性スコアを計算することで、機械翻訳がどれほど優れているかを定量化します。以下のように定義されています。 +⟶Bleuスコア - Bleu(Bilingual evaluation understudy)スコアは、n-gramの精度に基づき類似性スコアを計算することで、機械翻訳がどれほど優れているかを定量化します。以下のように定義されています:
**82. where pn is the bleu score on n-gram only defined as follows:**

-⟶ここで、pnはn-gramでのbleuスコアで下記のようにだけ定義されています。

+⟶ここで、pnはn-gramのみでのbleuスコアで、下記のように定義されています:

<br>
@@ -601,7 +601,7 @@ Given the symmetry that e and θ play in this model, the final word embedding e( **85. Attention model ― This model allows an RNN to pay attention to specific parts of the input that is considered as being important, which improves the performance of the resulting model in practice. By noting α the amount of attention that the output y should pay to the activation a and c the context at time t, we have:** -⟶アテンションモデル - このモデルを使用するとRNNは重要であると考えられる入力の特定部分に注目することができ、得られるモデルの性能が実際に向上します。時刻tにおいて、出力yが活性化関数aとコンテキストcとに払うべき注意量をαと表記すると次のようになります。 +⟶アテンションモデル - このモデルを使用するとRNNは重要であると考えられる入力の特定部分に注目することができ、得られるモデルの性能が実際に向上します。時刻tにおいて、出力yが活性化関数aとコンテキストcとに払うべき注意量をαと表記すると次のようになります:
@@ -643,7 +643,7 @@ Given the symmetry that e and θ play in this model, the final word embedding e( **91. The Deep Learning cheatsheets are now available in [target language].** -⟶ディープラーニングのチートシートが日本語で利用可能になりました。 +⟶ディープラーニングのチートシートが[日本語]で利用可能になりました。
From 2fffb0e176fd6b1d7fd945555650e090425be7ea Mon Sep 17 00:00:00 2001 From: Hiroki Date: Thu, 31 Oct 2019 13:06:21 +0900 Subject: [PATCH 441/531] Some expressions revised in Japanese style Some expressions including comma, punctuation and wording are changed. --- ja/cs-230-convolutional-neural-networks.md | 88 +++++++++++----------- 1 file changed, 44 insertions(+), 44 deletions(-) diff --git a/ja/cs-230-convolutional-neural-networks.md b/ja/cs-230-convolutional-neural-networks.md index f5d545fc3..178592414 100644 --- a/ja/cs-230-convolutional-neural-networks.md +++ b/ja/cs-230-convolutional-neural-networks.md @@ -18,63 +18,63 @@ **3. [Overview, Architecture structure]** -⟶ [概要, アーキテクチャ構造] +⟶ [概要、アーキテクチャ構造]
**4. [Types of layer, Convolution, Pooling, Fully connected]** -⟶ [層の種類, 畳み込み, プーリング, 全結合] +⟶ [層の種類、畳み込み、プーリング、全結合]
**5. [Filter hyperparameters, Dimensions, Stride, Padding]** -⟶ [フィルタハイパーパラメータ, 次元, ストライド, パディング] +⟶ [フィルタハイパーパラメータ、次元、ストライド、パディング]
**6. [Tuning hyperparameters, Parameter compatibility, Model complexity, Receptive field]** -⟶ [ハイパーパラメータの調整, パラメータの互換性, モデルの複雑さ, 受容野] +⟶ [ハイパーパラメータの調整、パラメータの互換性、モデルの複雑さ、受容野]
**7. [Activation functions, Rectified Linear Unit, Softmax]** -⟶ [活性化関数, 正規化線形ユニット, ソフトマックス] +⟶ [活性化関数、正規化線形ユニット、ソフトマックス]
**8. [Object detection, Types of models, Detection, Intersection over Union, Non-max suppression, YOLO, R-CNN]** -⟶ [物体検出, モデルの種類, 検出, IoU, 非極大抑制, YOLO, R-CNN] +⟶ [物体検出、モデルの種類、検出、IoU、非極大抑制、YOLO、R-CNN]
**9. [Face verification/recognition, One shot learning, Siamese network, Triplet loss]** -⟶ [顔認証/認識, One shot学習, シャムネットワーク, トリプレット損失] +⟶ [顔認証/認識、One shot学習、シャムネットワーク、トリプレット損失]
**10. [Neural style transfer, Activation, Style matrix, Style/content cost function]** -⟶ [ニューラルスタイル変換, 活性化, スタイル行列, スタイル/コンテンツコスト関数] +⟶ [ニューラルスタイル変換、活性化、スタイル行列、スタイル/コンテンツコスト関数]
**11. [Computational trick architectures, Generative Adversarial Net, ResNet, Inception Network]** -⟶ [計算トリックアーキテクチャ, 敵対的生成ネットワーク, ResNet, インセプションネットワーク] +⟶ [計算トリックアーキテクチャ、敵対的生成ネットワーク、ResNet、インセプションネットワーク]
@@ -130,21 +130,21 @@ **19. [Type, Purpose, Illustration, Comments]** -⟶ [種類, 目的, 図, コメント] +⟶ [種類、目的、図、コメント]
**20. [Max pooling, Average pooling, Each pooling operation selects the maximum value of the current view, Each pooling operation averages the values of the current view]** -⟶ [最大プーリング, 平均プーリング, 各プーリング操作は現在のビューの中から最大値を選ぶ, 各プーリング操作は現在のビューに含まれる値を平均する] +⟶ [最大プーリング、平均プーリング、各プーリング操作は現在のビューの中から最大値を選ぶ、各プーリング操作は現在のビューに含まれる値を平均する]
**21. [Preserves detected features, Most commonly used, Downsamples feature map, Used in LeNet]** -⟶ [検出された特徴を保持する, 最も一般的に利用される, 特徴マップをダウンサンプリングする, LeNetで利用される] +⟶ [検出された特徴の保持、最も一般的な利用、特徴マップをダウンサンプリング、LeNetでの利用]
@@ -208,14 +208,14 @@ **30. [Mode, Value, Illustration, Purpose, Valid, Same, Full]** -⟶ [モード, 値, 図, 目的, Valid, Same, Full] +⟶ [モード、値、図、目的、Valid、Same、Full]
**31. [No padding, Drops last convolution if dimensions do not match, Padding such that feature map size has size ⌈IS⌉, Output size is mathematically convenient, Also called 'half' padding, Maximum padding such that end convolutions are applied on the limits of the input, Filter 'sees' the input end-to-end]**

-⟶ [パディングなし, もし次元が合わなかったら最後の畳み込みをやめる, 特徴マップのサイズが[IS]になるようなパディング, 出力サイズは数学的に扱いやすい, 「ハーフ」パディングとも呼ばれる, 入力の一番端まで畳み込みが適用されるような最大パディング, フィルタは入力を端から端まで「見る」]

+⟶ [パディングなし、次元が合わなかった場合の最後の畳み込みの終了、特徴マップのサイズが⌈IS⌉になるようなパディング、出力サイズは数学的に扱いやすい、「ハーフ」パディングとも呼ばれる、入力の一番端まで畳み込みが適用されるような最大パディング、フィルタは入力を端から端まで「見る」]

<br>
@@ -236,7 +236,7 @@ **34. [Input, Filter, Output]** -⟶ [入力, フィルタ, 出力] +⟶ [入力、フィルタ、出力]
@@ -250,35 +250,35 @@ **36. Understanding the complexity of the model ― In order to assess the complexity of a model, it is often useful to determine the number of parameters that its architecture will have. In a given layer of a convolutional neural network, it is done as follows:** -⟶ モデルの複雑さを理解する - モデルの複雑さを評価するために、モデルのアーキテクチャが持つパラメータの数を測定することがしばしば有用です。畳み込みニューラルネットワークの各層では、以下のように行なわれます。 +⟶ モデルの複雑さを理解する - モデルの複雑さを評価するために、モデルのアーキテクチャが持つパラメータの数を測定することがしばしば有用です。畳み込みニューラルネットワークの各層では、以下のように行なわれます:
**37. [Illustration, Input size, Output size, Number of parameters, Remarks]** -⟶ [図, 入力サイズ, 出力サイズ, パラメータの数, 備考] +⟶ [図、入力サイズ、出力サイズ、パラメータの数、備考]
**38. [One bias parameter per filter, In most cases, S **39. [Pooling operation done channel-wise, In most cases, S=F]** -⟶ [プール操作はチャネルごとに行われる, ほとんどの場合, S=F] +⟶ [チャネルごとに行われるプーリング操作、ほとんどの場合、S=F]
**40. [Input is flattened, One bias parameter per neuron, The number of FC neurons is free of structural constraints]** -⟶ [入力は平坦化される, ニューロンごとにひとつのバイアスパラメータ, FCのニューロンの数には構造的制約がない] +⟶ [入力は平坦化される、ニューロンごとにひとつのバイアスパラメータ、FCのニューロンの数には構造的制約がない]
@@ -313,21 +313,21 @@ **45. [ReLU, Leaky ReLU, ELU, with]** -⟶[ReLU, Leaky ReLU, ELU, ただし] +⟶[ReLU、Leaky ReLU、ELU、ただし]
**46. [Non-linearity complexities biologically interpretable, Addresses dying ReLU issue for negative values, Differentiable everywhere]**

-⟶ [生物学的に解釈可能な非線形複雑性, 負の値に対してReLUが死んでいる問題に対処する,どこても微分可能]

+⟶ [生物学的に解釈可能な非線形複雑性、負の値に対してReLUが死んでいる問題への対処、どこでも微分可能]

<br>
**47. Softmax ― The softmax step can be seen as a generalized logistic function that takes as input a vector of scores x∈Rn and outputs a vector of output probability p∈Rn through a softmax function at the end of the architecture. It is defined as follows:** -⟶ ソフトマックス - ソフトマックスのステップは入力としてスコアx∈Rnのベクトルを取り、アーキテクチャの最後にあるソフトマックス関数を通じて確率p∈Rnのベクトルを出力する一般化されたロジスティック関数として見ることができます。次のように定義されます。 +⟶ ソフトマックス - ソフトマックスのステップは入力としてスコアx∈Rnのベクトルを取り、アーキテクチャの最後にあるソフトマックス関数を通じて確率p∈Rnのベクトルを出力する一般化されたロジスティック関数として見ることができます。次のように定義されます:
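Entry 47 above defines the softmax output layer. The following numerically stable sketch is an editorial illustration in numpy, not code taken from the cheatsheet.

```python
import numpy as np

def softmax(x):
    # Subtract the maximum score before exponentiating for numerical stability;
    # the result is a probability vector that sums to 1.
    e = np.exp(x - np.max(x))
    return e / e.sum()

print(softmax(np.array([1.0, 2.0, 3.0])))
```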
@@ -348,35 +348,35 @@ **50. Types of models ― There are 3 main types of object recognition algorithms, for which the nature of what is predicted is different. They are described in the table below:** -⟶ モデルの種類 - 物体認識アルゴリズムは主に3つの種類があり、予測されるものの性質は異なります。次の表で説明されています。 +⟶ モデルの種類 - 物体認識アルゴリズムは主に3つの種類があり、予測されるものの性質は異なります。次の表で説明されています:
**51. [Image classification, Classification w. localization, Detection]** -⟶ [画像分類, 位置特定を伴う分類, 検出] +⟶ [画像分類、位置特定を伴う分類、検出]
**52. [Teddy bear, Book]** -⟶ [テディベア, 本] +⟶ [テディベア、本]
**53. [Classifies a picture, Predicts probability of object, Detects an object in a picture, Predicts probability of object and where it is located, Detects up to several objects in a picture, Predicts probabilities of objects and where they are located]**

-⟶ [画像を分類する, 物体の確率を予測する, 画像内の物体を検出する, 物体の確率とその位置を予測する, 画像内の複数の物体を検出する, 複数の物体の確率と位置を予測する]

+⟶ [画像の分類、物体の確率の予測、画像内の物体の検出、物体の確率とその位置の予測、画像内の複数の物体の検出、複数の物体の確率と位置の予測]

<br>
**54. [Traditional CNN, Simplified YOLO, R-CNN, YOLO, R-CNN]**

-⟶ [伝統的なCNN, 単純されたYOLO, R-CNN, YOLO, R-CNN]

+⟶ [伝統的なCNN、単純化されたYOLO、R-CNN、YOLO、R-CNN]

<br>
@@ -390,21 +390,21 @@ **56. [Bounding box detection, Landmark detection]** -⟶ [バウンディングボックス検出, ランドマーク検出] +⟶ [バウンディングボックス検出、ランドマーク検出]
**57. [Detects the part of the image where the object is located, Detects a shape or characteristics of an object (e.g. eyes), More granular]** -⟶ [物体が配置されている画像の部分を検出する, 物体(たとえば目)の形状または特徴を検出する, よりきめ細かい] +⟶ [物体が配置されている画像の部分の検出、物体(たとえば目)の形状または特徴の検出、詳細]
**58. [Box of center (bx,by), height bh and width bw, Reference points (l1x,l1y), ..., (lnx,lny)]** -⟶ [中心(bx, by)、高さbh、幅bwのボックス, 参照点(l1x,l1y), ..., (lnx,lny)] +⟶ [中心(bx, by)、高さbh、幅bwのボックス、参照点(l1x,l1y), ..., (lnx,lny)]
@@ -439,28 +439,28 @@ **63. [For a given class, Step 1: Pick the box with the largest prediction probability., Step 2: Discard any box having an IoU⩾0.5 with the previous box.]** -⟶ [特定のクラスに対して, ステップ1: 最大の予測確率を持つボックスを選ぶ。, ステップ2: そのボックスに対してIoU⩾0.5となる全てのボックスを破棄する。] +⟶ [特定のクラスに対して、ステップ1: 最大の予測確率を持つボックスを選ぶ。ステップ2: そのボックスに対してIoU⩾0.5となる全てのボックスを破棄する。]
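Entries 62-64 above cover Intersection over Union and the two-step non-max suppression procedure. The sketch below is an editorial illustration that assumes boxes are given as (x1, y1, x2, y2) tuples with separate scores; it is not part of the original files.

```python
def iou(a, b):
    # a, b: boxes as (x1, y1, x2, y2)
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0

def non_max_suppression(boxes, scores, threshold=0.5):
    """Keep the highest-scoring box, discard boxes overlapping it with
    IoU >= threshold, then repeat on the remaining boxes."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    kept = []
    while order:
        best = order.pop(0)
        kept.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) < threshold]
    return kept

print(non_max_suppression([(0, 0, 10, 10), (1, 1, 10, 10), (20, 20, 30, 30)],
                          [0.9, 0.8, 0.7]))
```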
**64. [Box predictions, Box selection of maximum probability, Overlap removal of same class, Final bounding boxes]** -⟶ [ボックス予測, 最大確率のボックス選択, 同じクラスの重複除去, 最終的な境界ボックス] +⟶ [ボックス予測、最大確率のボックス選択、同じクラスの重複除去、最終的な境界ボックス]
**65. YOLO ― You Only Look Once (YOLO) is an object detection algorithm that performs the following steps:** -⟶ YOLO - You Only Look Once (YOLO)は次の手順を実行する物体検出アルゴリズムです。 +⟶ YOLO - You Only Look Once (YOLO)は次の手順を実行する物体検出アルゴリズムです:
**66. [Step 1: Divide the input image into a G×G grid., Step 2: For each grid cell, run a CNN that predicts y of the following form:, repeated k times]** -⟶ [ステップ1: 入力画像をGxGグリッドに分割する。, ステップ2: 各グリッドセルに対して次の形式のyを予測するCNNを実行する:,k回繰り返す] +⟶ [ステップ1: 入力画像をGxGグリッドに分割する。ステップ2: 各グリッドセルに対して次の形式のyを予測するCNNを実行する:,k回繰り返す。]
@@ -481,7 +481,7 @@ **69. [Original image, Division in GxG grid, Bounding box prediction, Non-max suppression]** -⟶ [元の画像, GxGグリッドでの分割, 境界ボックス予測, 非極大抑制] +⟶ [元の画像、GxGグリッドでの分割、境界ボックス予測、非極大抑制]
@@ -502,7 +502,7 @@ **72. [Original image, Segmentation, Bounding box prediction, Non-max suppression]** -⟶ [元の画像, セグメンテーション, 境界ボックス予測, 非極大抑制] +⟶ [元の画像、セグメンテーション、境界ボックス予測、非極大抑制]
@@ -530,14 +530,14 @@ **76. [Face verification, Face recognition, Query, Reference, Database]** -⟶ [顔認証, 顔認識, クエリ, 参照, データベース] +⟶ [顔認証、顔認識、クエリ、参照、データベース]
**77. [Is this the correct person?, One-to-one lookup, Is this one of the K persons in the database?, One-to-many lookup]** -⟶ [これは正しい人ですか?, 1対1検索, これはデータベース内のK人のうちの1人ですか, 1対多検索] +⟶ [これは正しい人ですか?、1対1検索、これはデータベース内のK人のうちの1人ですか?、1対多検索]
@@ -579,7 +579,7 @@ **83. [Content C, Style S, Generated image G]** -⟶ [コンテンツC, スタイルS, 生成された画像G] +⟶ [コンテンツC、スタイルS、生成された画像G]
@@ -600,7 +600,7 @@ **86. Style matrix ― The style matrix G[l] of a given layer l is a Gram matrix where each of its elements G[l]kk′ quantifies how correlated the channels k and k′ are. It is defined with respect to activations a[l] as follows:** -⟶ スタイル行列 - 与えられた層lのスタイル行列G[l]はグラム行列で、各要素G[l]kk′がチャネルkとk′の相関関係を定量化します。活性化a[l]に関して次のように定義されます。 +⟶ スタイル行列 - 与えられた層lのスタイル行列G[l]はグラム行列で、各要素G[l]kk′がチャネルkとk′の相関関係を定量化します。活性化a[l]に関して次のように定義されます:
@@ -649,7 +649,7 @@ **93. [Training set, Noise, Real-world image, Generator, Discriminator, Real Fake]** -⟶ [学習セット, ノイズ, 現実世界の画像, 生成器, 識別器, 真偽] +⟶ [学習セット、ノイズ、現実世界の画像、生成器、識別器、真偽]
@@ -663,7 +663,7 @@ **95. ResNet ― The Residual Network architecture (also called ResNet) uses residual blocks with a high number of layers meant to decrease the training error. The residual block has the following characterizing equation:** -⟶ ResNet - Residual Networkアーキテクチャ(ResNetとも呼ばれる)は学習エラーを減らすため多数の層がある残差ブロックを使用します。残差ブロックは次の特性方程式を有します。 +⟶ ResNet - Residual Networkアーキテクチャ(ResNetとも呼ばれる)は学習エラーを減らすため多数の層がある残差ブロックを使用します。残差ブロックは次の特性方程式を有します:
@@ -677,7 +677,7 @@ **97. The Deep Learning cheatsheets are now available in [target language].** -⟶ ディープラーニングのチートシートが日本語で利用可能になりました。 +⟶ ディープラーニングのチートシートが[日本語]で利用可能になりました。
From 0448391755bb61115d12a9b464f96c478dbc3373 Mon Sep 17 00:00:00 2001 From: tuananhhedspibk Date: Sun, 3 Nov 2019 12:40:17 +0900 Subject: [PATCH 442/531] vi translate for cheatsheet supervised learning --- vi/cheatsheet-supervised-learning.md | 30 ++++++++++++++-------------- 1 file changed, 15 insertions(+), 15 deletions(-) diff --git a/vi/cheatsheet-supervised-learning.md b/vi/cheatsheet-supervised-learning.md index 33ce30d81..dd9f017c6 100644 --- a/vi/cheatsheet-supervised-learning.md +++ b/vi/cheatsheet-supervised-learning.md @@ -18,7 +18,7 @@ **4. Type of prediction ― The different types of predictive models are summed up in the table below:** -⟶ Kiểu dự đoán - Các kiểu khác nhau của mô hình dự đoán được tổng kết trong bảng bên dưới: +⟶ Loại dự đoán - Các loại mô hình dự đoán được tổng kết trong bảng bên dưới:
@@ -36,7 +36,7 @@ **7. Type of model ― The different models are summed up in the table below:** -⟶ Kiểu của mô hình - Các mô hình khác nhau được tổng kết trong bảng bên dưới: +⟶ Loại mô hình - Các mô hình khác nhau được tổng kết trong bảng bên dưới:
@@ -54,13 +54,13 @@ **10. Notations and general concepts** -⟶ Kí hiệu và các khái niệm tổng quát +⟶ Các kí hiệu và khái niệm tổng quát
**11. Hypothesis ― The hypothesis is noted hθ and is the model that we choose. For a given input data x(i) the model prediction output is hθ(x(i)).** -⟶ Hypothesis - Hypothesis được kí hiệu là h0, là một mô hình mà chúng ta chọn. Với dữ liệu đầu vào cho trước x(i), mô hình dự đoán đầu ra là h0(x(i)). +⟶ Hypothesis - Hypothesis được kí hiệu là hθ, là một mô hình mà chúng ta chọn. Với dữ liệu đầu vào cho trước x(i), mô hình dự đoán đầu ra là hθ(x(i)).
@@ -120,7 +120,7 @@ **21. Linear models** -⟶ Mô hình tuyến tính +⟶ Các mô hình tuyến tính
@@ -138,7 +138,7 @@ **24. Normal equations ― By noting X the matrix design, the value of θ that minimizes the cost function is a closed-form solution such that:** -⟶ Phương trình chuẩn - Bằng việc kí hiệu X là ma trận thiết kế, giá trị của θ mà cực tiểu hoá cost function là một phương pháp dạng đóng như là: +⟶ Phương trình chuẩn - Bằng việc kí hiệu X là ma trận thiết kế, giá trị của θ làm cực tiểu hoá cost function là một phương pháp dạng đóng như sau:
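Entry 24 above gives the closed-form normal-equation solution θ=(X⊤X)−1X⊤y. A small numpy sketch on synthetic data follows, added editorially; the pseudo-inverse is used here only for numerical robustness.

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.hstack([np.ones((50, 1)), rng.standard_normal((50, 2))])  # design matrix with intercept
true_theta = np.array([1.0, 2.0, -3.0])
y = X @ true_theta + 0.01 * rng.standard_normal(50)

# theta = (X^T X)^{-1} X^T y
theta = np.linalg.pinv(X.T @ X) @ X.T @ y
print(theta)  # close to [1, 2, -3]
```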
@@ -156,7 +156,7 @@ **27. LWR ― Locally Weighted Regression, also known as LWR, is a variant of linear regression that weights each training example in its cost function by w(i)(x), which is defined with parameter τ∈R as:** -⟶ LWR - Hồi quy trọng số cục bộ, còn được biết như là LWR, là biến thể của hồi quy tuyến tính, nó sẽ đánh trọng số cho mỗi ví dụ huấn luyện trong cost function của nó bởi w(i)(x), đươc định nghĩa với tham số τ∈R như sau: +⟶ LWR - Hồi quy trọng số cục bộ, còn được biết với cái tên LWR, là biến thể của hồi quy tuyến tính, nó sẽ đánh trọng số cho mỗi ví dụ huấn luyện trong cost function của nó bởi w(i)(x), được định nghĩa với tham số τ∈R như sau:
@@ -324,7 +324,7 @@ **55. A generative model first tries to learn how the data is generated by estimating P(x|y), which we can then use to estimate P(y|x) by using Bayes' rule.** -⟶ Một mô hình sinh đầu tiên cố gắng học cách dữ liệu được sinh ra thông qua việc ước lượng P(x|y), chúng ta có thể sau đó sử dụng P(x|y) để ước lượng P(y|x) bằng cách sử dụng luật Bayes. +⟶ Một mô hình sinh đầu tiên cố gắng học cách dữ liệu được sinh ra thông qua việc ước lượng P(x|y), sau đó chúng ta có thể sử dụng P(x|y) để ước lượng P(y|x) bằng cách sử dụng luật Bayes.
@@ -372,7 +372,7 @@ **63. Tree-based and ensemble methods** -⟶ Phương thức Tree-based và ensemble +⟶ Các phương thức Tree-based và ensemble
@@ -390,13 +390,13 @@ **66. Random forest ― It is a tree-based technique that uses a high number of decision trees built out of randomly selected sets of features. Contrary to the simple decision tree, it is highly uninterpretable but its generally good performance makes it a popular algorithm.** -⟶ Rừng ngẫu nhiên 0 Nó là một kĩ thuật dựa trên cây, sử dụng số lượng lớn các cây quyết định để lựa chọn ngẫu nhiện các tập thuộc tính. Ngược lại với một cây quyết định đơn, kĩ thuật này khá khó diễn giải nhưng do có hiệu năng tốt nên đã trở thành một giải thuật khá phổ biến hiện nay. +⟶ Rừng ngẫu nhiên - Là một kĩ thuật dựa trên cây (tree-based), sử dụng số lượng lớn các cây quyết định để lựa chọn ngẫu nhiên các tập thuộc tính. Ngược lại với một cây quyết định đơn, kĩ thuật này khá khó diễn giải nhưng do có hiệu năng tốt nên đã trở thành một giải thuật khá phổ biến hiện nay.
**67. Remark: random forests are a type of ensemble methods.** -⟶ Chú ý: rững ngẫu nhiên là một loại giải thuật ensemble. +⟶ Chú ý: rừng ngẫu nhiên là một loại giải thuật ensemble.
@@ -456,7 +456,7 @@ **77. Hoeffding inequality ― Let Z1,..,Zm be m iid variables drawn from a Bernoulli distribution of parameter ϕ. Let ˆϕ be their sample mean and γ>0 fixed. We have:** -⟶ Bất đẳng thức Hoeffding +⟶ Bất đẳng thức Hoeffding - Cho Z1,..,Zm là m biến iid được đưa ra từ phân phối Bernoulli của tham số ϕ. Cho ˆϕ là trung bình mẫu của chúng và γ>0 cố định. Ta có:
@@ -486,13 +486,13 @@ **82. the training examples are drawn independently** -⟶ ví dụ huấn luyện được tạo ra độc lập +⟶ các ví dụ huấn luyện được tạo ra độc lập
**83. Shattering ― Given a set S={x(1),...,x(d)}, and a set of classifiers H, we say that H shatters S if for any set of labels {y(1),...,y(d)}, we have:** -⟶ Shattering (Chia nhỏ) - Cho một tập hợp S={x(1),...,x(d)}, và một tập hợp các classifiers H, ta nó rằng H chia nhỏ S nếu với bất kì tập các nhãn {y(1),...,y(d)} nào, ta có: +⟶ Shattering (Chia nhỏ) - Cho một tập hợp S={x(1),...,x(d)}, và một tập hợp các classifiers H, ta nói rằng H chia nhỏ S nếu với bất kì tập các nhãn {y(1),...,y(d)} nào, ta có:
@@ -534,7 +534,7 @@ **90. [Linear models, linear regression, logistic regression, generalized linear models]** -⟶ [Các mô hình tuyến tính, hồi quy tuyến tính, hồi quy logistic, mô hình tuyến tính tổng quát] +⟶ [Các mô hình tuyến tính, hồi quy tuyến tính, hồi quy logistic, các mô hình tuyến tính tổng quát]
From 85851dc16e2676231ae86af280e1f90964e459ad Mon Sep 17 00:00:00 2001 From: qunaieer Date: Sun, 3 Nov 2019 07:48:22 +0300 Subject: [PATCH 443/531] Update cs-229-machine-learning-tips-and-tricks.md with RTL tags --- ar/cs-229-machine-learning-tips-and-tricks.md | 142 ++++++++++++------ 1 file changed, 96 insertions(+), 46 deletions(-) diff --git a/ar/cs-229-machine-learning-tips-and-tricks.md b/ar/cs-229-machine-learning-tips-and-tricks.md index 1739e8ffb..d48445a75 100644 --- a/ar/cs-229-machine-learning-tips-and-tricks.md +++ b/ar/cs-229-machine-learning-tips-and-tricks.md @@ -4,285 +4,335 @@ **1. Machine Learning tips and tricks cheatsheet** +
مرجع سريع لنصائح وحيل تعلّم الآلة - +

**2. Classification metrics** +
مقاييس التصنيف - +

**3. In a context of a binary classification, here are the main metrics that are important to track in order to assess the performance of the model.** +
في سياق التصنيف الثنائي، هذه المقاييس (metrics) المهمة التي يجدر مراقبتها من أجل تقييم آداء النموذج. - +

**4. Confusion matrix ― The confusion matrix is used to have a more complete picture when assessing the performance of a model. It is defined as follows:** +
مصفوفة الدقّة (confusion matrix) - تستخدم مصفوفة الدقّة لأخذ تصور شامل عند تقييم أداء النموذج. وهي تعرّف كالتالي: - +

**5. [Predicted class, Actual class]** +
[التصنيف المتوقع، التصنيف الفعلي] - +

**6. Main metrics ― The following metrics are commonly used to assess the performance of classification models:** +
المقاييس الأساسية - المقاييس التالية تستخدم في العادة لتقييم أداء نماذج التصنيف: - +

**7. [Metric, Formula, Interpretation]** +
[المقياس، المعادلة، التفسير] - +

**8. Overall performance of model** +
الأداء العام للنموذج - +

**9. How accurate the positive predictions are** +
دقّة التوقعات الإيجابية (positive) - +

**10. Coverage of actual positive sample** +
تغطية عينات التوقعات الإيجابية الفعلية - +

**11. Coverage of actual negative sample** +
تغطية عينات التوقعات السلبية الفعلية - +

**12. Hybrid metric useful for unbalanced classes** +
مقياس هجين مفيد للأصناف غير المتوازنة (unbalanced) - +
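Entries 6-12 above list accuracy, precision, recall and the F1 score derived from the confusion matrix. The sketch below is an editorial illustration of those formulas with arbitrary counts, not code from the cheatsheet.

```python
def classification_metrics(tp, fp, fn, tn):
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)          # also called sensitivity / TPR
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1

print(classification_metrics(tp=40, fp=10, fn=5, tn=45))
```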

**13. ROC ― The receiver operating curve, also noted ROC, is the plot of TPR versus FPR by varying the threshold. These metrics are are summed up in the table below:** +
منحنى دقّة الأداء (ROC) - منحنى دقّة الآداء، ويطلق عليه ROC، هو رسمة لمعدل التصنيفات الإيجابية الصحيحة (TPR) مقابل معدل التصنيفات الإيجابية الخاطئة (FPR) باستخدام قيم حد (threshold) متغيرة. هذه المقاييس ملخصة في الجدول التالي: +

**14. [Metric, Formula, Equivalent]** +
[المقياس، المعادلة، مرادف] - +

**15. AUC ― The area under the receiving operating curve, also noted AUC or AUROC, is the area below the ROC as shown in the following figure:** +
المساحة تحت منحنى دقة الأداء (المساحة تحت المنحنى) (AUC) - المساحة تحت منحنى دقة الأداء (المساحة تحت المنحنى)، ويطلق عليها AUC أو AUROC، هي المساحة تحت ROC كما هو موضح في الرسمة التالية: - +

**16. [Actual, Predicted]** +
[الفعلي، المتوقع] - +

**17. Basic metrics ― Given a regression model f, the following metrics are commonly used to assess the performance of the model:** +
المقاييس الأساسية - إذا كان لدينا نموذج الانحدار f، فإن المقاييس التالية غالباً ما تستخدم لتقييم أداء النموذج: - +

**18. [Total sum of squares, Explained sum of squares, Residual sum of squares]** +
[المجموع الكلي للمربعات، مجموع المربعات المُفسَّر، مجموع المربعات المتبقي] - +

**19. Coefficient of determination ― The coefficient of determination, often noted R2 or r2, provides a measure of how well the observed outcomes are replicated by the model and is defined as follows:** +
مُعامل التحديد (Coefficient of determination) - مُعامل التحديد، وغالباً يرمز له بـ R2 أو r2، يعطي قياس لمدى مطابقة النموذج للنتائج الملحوظة، ويعرف كما يلي: - +

**20. Main metrics ― The following metrics are commonly used to assess the performance of regression models, by taking into account the number of variables n that they take into consideration:** +
المقاييس الرئيسية - المقاييس التالية تستخدم غالباً لتقييم أداء نماذج الانحدار، وذلك بأن يتم الأخذ في الحسبان عدد المتغيرات n المستخدمة فيها: - +

**21. where L is the likelihood and ˆσ2 is an estimate of the variance associated with each response.** +
حيث L هو الأرجحية، و ˆσ2 تقدير التباين الخاص بكل نتيجة. - +

**22. Model selection** +
اختيار النموذج - +

**23. Vocabulary ― When selecting a model, we distinguish 3 different parts of the data that we have as follows:** +
مفردات - عند اختيار النموذج، نفرق بين 3 أجزاء من البيانات التي لدينا كالتالي: - +

**24. [Training set, Validation set, Testing set]** +
[مجموعة تدريب، مجموعة تحقق، مجموعة اختبار] - +

**25. [Model is trained, Model is assessed, Model gives predictions]** +
[يتم تدريب النموذج، يتم تقييم النموذج، النموذج يعطي التوقعات] - +

**26. [Usually 80% of the dataset, Usually 20% of the dataset]** +
[غالباً 80% من مجموعة البيانات، غالباً 20% من مجموعة البيانات] - +

**27. [Also called hold-out or development set, Unseen data]** +
[يطلق عليها كذلك المجموعة المُجنّبة أو مجموعة التطوير، بيانات لم يسبق رؤيتها من قبل] - +

**28. Once the model has been chosen, it is trained on the entire dataset and tested on the unseen test set. These are represented in the figure below:** +
بمجرد اختيار النموذج، يتم تدريبه على مجموعة البيانات بالكامل ثم يتم اختباره على مجموعة اختبار لم يسبق رؤيتها من قبل. كما هو موضح في الشكل التالي: - +

**29. Cross-validation ― Cross-validation, also noted CV, is a method that is used to select a model that does not rely too much on the initial training set. The different types are summed up in the table below:** +
التحقق المتقاطع (Cross-validation) - التحقق المتقاطع، وكذلك يختصر بـ CV، هو طريقة تستخدم لاختيار نموذج بحيث لا يعتمد بشكل كبير على مجموعة بيانات التدريب المبدأية. أنواع التحقق المتقاطع المختلفة ملخصة في الجدول التالي: - +

**30. [Training on k−1 folds and assessment on the remaining one, Training on n−p observations and assessment on the p remaining ones]** +
[التدريب على k-1 جزء والتقييم باستخدام الجزء الباقي، التدريب على n−p عينة والتقييم باستخدام الـ p عينات المتبقية] - +

**31. [Generally k=5 or 10, Case p=1 is called leave-one-out]** +
[بشكل عام k=5 أو 10، الحالة p=1 يطلق عليها الإبقاء على واحد (leave-one-out)] - +

**32. The most commonly used method is called k-fold cross-validation and splits the training data into k folds to validate the model on one fold while training the model on the k−1 other folds, all of this k times. The error is then averaged over the k folds and is named cross-validation error.** +
الطريقة الأكثر استخداماً يطلق عليها التحقق المتقاطع س جزء/أجزاء (k-fold)، ويتم فيها تقسيم البيانات إلى k جزء، بحيث يتم تدريب النموذج باستخدام k−1 والتحقق باستخدام الجزء المتبقي، ويتم تكرار ذلك k مرة. يتم بعد ذلك حساب معدل الأخطاء في الأجزاء k ويسمى خطأ التحقق المتقاطع. - +
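Entry 32 above explains k-fold cross-validation and the cross-validation error. Below is a minimal editorial sketch of the index splitting and error averaging; train_and_score is a placeholder callback assumed here, not an API from the cheatsheet.

```python
import numpy as np

def k_fold_cv_error(n_samples, k, train_and_score):
    """Average validation error over k folds.
    train_and_score(train_idx, val_idx) must return the error measured on val_idx."""
    folds = np.array_split(np.arange(n_samples), k)
    errors = []
    for i in range(k):
        val_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        errors.append(train_and_score(train_idx, val_idx))
    return float(np.mean(errors))

# Dummy scorer just to show the call pattern.
print(k_fold_cv_error(100, 5, lambda tr, va: len(va) / 100))
```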

**33. Regularization ― The regularization procedure aims at avoiding the model to overfit the data and thus deals with high variance issues. The following table sums up the different types of commonly used regularization techniques:** +
ضبط (Regularization) - عمليه الضبط تهدف إلى تفادي فرط التخصيص (overfit) للنموذج، وهو بذلك يتعامل مع مشاكل التباين العالي. الجدول التالي يلخص أنواع وطرق الضبط الأكثر استخداماً: - +

**34. [Shrinks coefficients to 0, Good for variable selection, Makes coefficients smaller, Tradeoff between variable selection and small coefficients]** +
[يقلص المُعاملات إلى 0، جيد لاختيار المتغيرات، يجعل المُعاملات أصغر، المفاضلة بين اختيار المتغيرات والمُعاملات الصغيرة] - +

**35. Diagnostics** +
التشخيصات - +

**36. Bias ― The bias of a model is the difference between the expected prediction and the correct model that we try to predict for given data points.** +
الانحياز (Bias) - الانحياز للنموذج هو الفرق بين التنبؤ المتوقع والنموذج الحقيقي الذي نحاول تنبؤه للبيانات المعطاة. - +

**37. Variance ― The variance of a model is the variability of the model prediction for given data points.** +
التباين (Variance) - تباين النموذج هو مقدار التغير في تنبؤ النموذج لنقاط البيانات المعطاة. - +

**38. Bias/variance tradeoff ― The simpler the model, the higher the bias, and the more complex the model, the higher the variance.** +
موازنة الانحياز/التباين (Bias/variance tradeoff) - كلما زادت بساطة النموذج، زاد الانحياز، وكلما زاد تعقيد النموذج، زاد التباين. - +

**39. [Symptoms, Regression illustration, classification illustration, deep learning illustration, possible remedies]** +
[الأعراض، توضيح الانحدار، توضيح التصنيف، توضيح التعلم العميق، العلاجات الممكنة] - +

**40. [High training error, Training error close to test error, High bias, Training error slightly lower than test error, Very low training error, Training error much lower than test error, High variance]** +
[خطأ التدريب عالي، خطأ التدريب قريب من خطأ الاختبار، انحياز عالي، خطأ التدريب أقل بقليل من خطأ الاختبار، خطأ التدريب منخفض جداً، خطأ التدريب أقل بكثير من خطأ الاختبار، تباين عالي] - +

**41. [Complexify model, Add more features, Train longer, Perform regularization, Get more data]** +
[زيادة تعقيد النموذج، إضافة المزيد من الخصائص، تدريب لمدة أطول، إجراء الضبط (regularization)، الحصول على المزيد من البيانات] - +

**42. Error analysis ― Error analysis is analyzing the root cause of the difference in performance between the current and the perfect models.** +
تحليل الخطأ - تحليل الخطأ هو تحليل السبب الرئيسي للفرق في الأداء بين النماذج الحالية والنماذج المثالية. - +

**43. Ablative analysis ― Ablative analysis is analyzing the root cause of the difference in performance between the current and the baseline models.** +
تحليل استئصالي (Ablative analysis) - التحليل الاستئصالي هو تحليل السبب الرئيسي للفرق في الأداء بين النماذج الحالية والنماذج المبدئية (baseline). - +

**44. Regression metrics** +
مقاييس الانحدار - +

**45. [Classification metrics, confusion matrix, accuracy, precision, recall, F1 score, ROC]** +
[مقاييس التصنيف، مصفوفة الدقّة، الضبط (accuracy)، الدقة (precision)، الاستدعاء (recall)، درجة F1] - +

**46. [Regression metrics, R squared, Mallow's CP, AIC, BIC]** +
[مقاييس الانحدار، مربع R، معيار معامل مالوس (Mallow's)، معيار آكياك المعلوماتي (AIC)، معيار المعلومات البايزي (BIC)] - +

**47. [Model selection, cross-validation, regularization]** +
[اختيار النموذج، التحقق المتقاطع، الضبط] - +

**48. [Diagnostics, Bias/variance tradeoff, error/ablative analysis]** +
[التشخيصات، موازنة الانحياز/التباين، تحليل الخطأ/التحليل الاستئصالي] +
From 58e3043b47ed0414ba08684f57132ac3ff184a57 Mon Sep 17 00:00:00 2001 From: qunaieer Date: Sun, 3 Nov 2019 07:57:50 +0300 Subject: [PATCH 444/531] Update cs-229-supervised-learning.md to RTL --- ar/cs-229-supervised-learning.md | 287 ++++++++++++++++++++----------- 1 file changed, 191 insertions(+), 96 deletions(-) diff --git a/ar/cs-229-supervised-learning.md b/ar/cs-229-supervised-learning.md index 58bc6aeaa..9104d46a1 100644 --- a/ar/cs-229-supervised-learning.md +++ b/ar/cs-229-supervised-learning.md @@ -1,568 +1,663 @@ -**1. Supervised Learning cheatsheet** +**1. Supervised Learning cheatsheet** +
مرجع سريع للتعلّم المُوَجَّه - +

**2. Introduction to Supervised Learning** +
مقدمة للتعلّم المُوَجَّه - +

**3. Given a set of data points {x(1),...,x(m)} associated to a set of outcomes {y(1),...,y(m)}, we want to build a classifier that learns how to predict y from x.** +
إذا كان لدينا مجموعة من نقاط البيانات {x(1),...,x(m)} مرتبطة بمجموعة مخرجات {y(1),...,y(m)}، نريد أن نبني مُصَنِّف يتعلم كيف يتوقع y من x. - - +

**4. Type of prediction ― The different types of predictive models are summed up in the table below:** +
نوع التوقّع - أنواع نماذج التوقّع المختلفة موضحة في الجدول التالي: - +

**5. [Regression, Classifier, Outcome, Examples]** +
[الانحدار (Regression)، التصنيف (Classification)، المُخرَج، أمثلة] - +

**6. [Continuous, Class, Linear regression, Logistic regression, SVM, Naive Bayes]** +
[مستمر، صنف، انحدار خطّي (Linear regression)، انحدار لوجستي (Logistic regression)، آلة المتجهات الداعمة (SVM)، بايز البسيط (Naive Bayes)] - +

**7. Type of model ― The different models are summed up in the table below:** +
نوع النموذج - أنواع النماذج المختلفة موضحة في الجدول التالي: - +

**8. [Discriminative model, Generative model, Goal, What's learned, Illustration, Examples]** +
[نموذج تمييزي (discriminative)، نموذج توليدي (Generative)، الهدف، ماذا يتعلم، توضيح، أمثلة] - +

**9. [Directly estimate P(y|x), Estimate P(x|y) to then deduce P(y|x), Decision boundary, Probability distributions of the data, Regressions, SVMs, GDA, Naive Bayes]**

+
[التقدير المباشر لـ P(y|x)، تقدير P(x|y) ثم استنتاج P(y|x)، حدود القرار، التوزيع الاحتمالي للبيانات، الانحدار (Regression)، آلة المتجهات الداعمة (SVM)، GDA، بايز البسيط (Naive Bayes)] - +

**10. Notations and general concepts** +
الرموز ومفاهيم أساسية - +

**11. Hypothesis ― The hypothesis is noted hθ and is the model that we choose. For a given input data x(i) the model prediction output is hθ(x(i)).** +
الفرضية (Hypothesis) - الفرضية، ويرمز لها بـ hθ، هي النموذج الذي نختاره. إذا كان لدينا المدخل x(i)، فإن المخرج الذي سيتوقعه النموذج هو hθ(x(i)). - +

**12. Loss function ― A loss function is a function L:(z,y)∈R×Y⟼L(z,y)∈R that takes as inputs the predicted value z corresponding to the real data value y and outputs how different they are. The common loss functions are summed up in the table below:** +
دالة الخسارة (Loss function) - دالة الخسارة هي الدالة L:(z,y)∈R×Y⟼L(z,y)∈R التي تأخذ كمدخلات القيمة المتوقعة z والقيمة الحقيقية y وتعطينا الاختلاف بينهما. الجدول التالي يحتوي على بعض دوال الخسارة الشائعة: - +

**13. [Least squared error, Logistic loss, Hinge loss, Cross-entropy]** +
[خطأ أصغر تربيع (Least squared error)، خسارة لوجستية (Logistic loss)، خسارة مفصلية (Hinge loss)، الانتروبيا التقاطعية (Cross-entropy)] - +

**14. [Linear regression, Logistic regression, SVM, Neural Network]** +
[الانحدار الخطّي (Linear regression)، الانحدار اللوجستي (Logistic regression)، آلة المتجهات الداعمة (SVM)، الشبكات العصبية (Neural Network)] - +

**15. Cost function ― The cost function J is commonly used to assess the performance of a model, and is defined with the loss function L as follows:** +
دالة التكلفة (Cost function) - دالة التكلفة J تستخدم عادة لتقييم أداء نموذج ما، ويتم تعريفها مع دالة الخسارة L كالتالي: - +

**16. Gradient descent ― By noting α∈R the learning rate, the update rule for gradient descent is expressed with the learning rate and the cost function J as follows:** +
النزول الاشتقاقي (Gradient descent) - لنعرّف معدل التعلّم α∈R، يمكن تعريف القانون الذي يتم تحديث خوارزمية النزول الاشتقاقي من خلاله باستخدام معدل التعلّم ودالة التكلفة J كالتالي: - +

**17. Remark: Stochastic gradient descent (SGD) is updating the parameter based on each training example, and batch gradient descent is on a batch of training examples.** +
ملاحظة: في النزول الاشتقاقي العشوائي (Stochastic gradient descent (SGD)) يتم تحديث المُعاملات (parameters) بناءاً على كل عينة تدريب على حدة، بينما في النزول الاشتقاقي الحُزَمي (batch gradient descent) يتم تحديثها باستخدام حُزَم من عينات التدريب. - +
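Entries 16-17 above give the gradient-descent update θ ← θ − α∇θJ(θ) and contrast stochastic with batch updates. The following batch-gradient-descent sketch for least-squares regression is an editorial illustration on synthetic data; the learning rate and iteration count are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.hstack([np.ones((100, 1)), rng.standard_normal((100, 1))])
y = X @ np.array([0.5, -2.0]) + 0.1 * rng.standard_normal(100)

theta = np.zeros(2)
alpha = 0.1                                  # learning rate
for _ in range(500):                         # batch gradient descent
    grad = X.T @ (X @ theta - y) / len(y)    # gradient of the mean squared error
    theta -= alpha * grad
print(theta)  # approaches [0.5, -2.0]
```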

**18. Likelihood ― The likelihood of a model L(θ) given parameters θ is used to find the optimal parameters θ through maximizing the likelihood. In practice, we use the log-likelihood ℓ(θ)=log(L(θ)) which is easier to optimize. We have:** +
الأرجحية (Likelihood) - تستخدم أرجحية النموذج L(θ)، حيث أن θ هي المُدخلات، للبحث عن المُدخلات θ الأحسن عن طريق تعظيم (maximizing) الأرجحية. عملياً يتم استخدام الأرجحية اللوغاريثمية (log-likelihood) ℓ(θ)=log(L(θ)) حيث أنها أسهل في التحسين (optimize). فيكون لدينا: - +

**19. Newton's algorithm ― The Newton's algorithm is a numerical method that finds θ such that ℓ′(θ)=0. Its update rule is as follows:** +
خوارزمية نيوتن (Newton's algorithm) - خوارزمية نيوتن هي طريقة حسابية للعثور على θ بحيث يكون ℓ′(θ)=0. قاعدة التحديث للخوارزمية كالتالي: - +

**20. Remark: the multidimensional generalization, also known as the Newton-Raphson method, has the following update rule:** +
ملاحظة: هناك خوارزمية أعم وهي متعددة الأبعاد (multidimensional)، يطلق عليها خوارزمية نيوتن-رافسون (Newton-Raphson)، ويتم تحديثها عبر القانون التالي: - +

**21. Linear models** +
النماذج الخطيّة (Linear models) - +

**22. Linear regression** +
الانحدار الخطّي (Linear regression) - +

**23. We assume here that y|x;θ∼N(μ,σ2)** +
هنا نفترض أن y|x;θ∼N(μ,σ2) - +

**24. Normal equations ― By noting X the matrix design, the value of θ that minimizes the cost function is a closed-form solution such that:** +
المعادلة الطبيعية/الناظمية (Normal) - إذا كان لدينا المصفوفة X، القيمة θ التي تقلل من دالة التكلفة يمكن حلها رياضياً بشكل مغلق (closed-form) عن طريق: - +

**25. LMS algorithm ― By noting α the learning rate, the update rule of the Least Mean Squares (LMS) algorithm for a training set of m data points, which is also known as the Widrow-Hoff learning rule, is as follows:** +
خوارزمية أصغر معدل تربيع LMS - إذا كان لدينا معدل التعلّم α، فإن قانون التحديث لخوارزمية أصغر معدل تربيع (Least Mean Squares (LMS)) لمجموعة بيانات من m عينة، ويطلق عليه قانون تعلم ويدرو-هوف (Widrow-Hoff)، كالتالي: - +

**26. Remark: the update rule is a particular case of the gradient ascent.** +
ملاحظة: قانون التحديث هذا يعتبر حالة خاصة من الصعود الاشتقاقي (Gradient ascent).

-

**27. LWR ― Locally Weighted Regression, also known as LWR, is a variant of linear regression that weights each training example in its cost function by w(i)(x), which is defined with parameter τ∈R as:** +
الانحدار الموزون محليّاً (LWR) - الانحدار الموزون محليّاً (Locally Weighted Regression)، ويعرف بـ LWR، هو نوع من الانحدار الخطي يَزِن كل عينة تدريب أثناء حساب دالة التكلفة باستخدام w(i)(x)، التي يمكن تعريفها باستخدام المُدخل (parameter) τ∈R كالتالي: - +

**28. Classification and logistic regression** +
التصنيف والانحدار اللوجستي - +

**29. Sigmoid function ― The sigmoid function g, also known as the logistic function, is defined as follows:** +
دالة سيجمويد (Sigmoid) - دالة سيجمويد g، وتعرف كذلك بالدالة اللوجستية، تعرّف كالتالي: - +

**30. Logistic regression ― We assume here that y|x;θ∼Bernoulli(ϕ). We have the following form:** +
الانحدار اللوجستي (Logistic regression) - نفترض هنا أن y|x;θ∼Bernoulli(ϕ). فيكون لدينا: - +

**31. Remark: there is no closed form solution for the case of logistic regressions.** +
ملاحظة: ليس هناك حل رياضي مغلق للانحدار اللوجستي. - +

**32. Softmax regression ― A softmax regression, also called a multiclass logistic regression, is used to generalize logistic regression when there are more than 2 outcome classes. By convention, we set θK=0, which makes the Bernoulli parameter ϕi of each class i equal to:** +
انحدار سوفت ماكس (Softmax) - ويطلق عليه الانحدار اللوجستي متعدد الأصناف (multiclass logistic regression)، يستخدم لتعميم الانحدار اللوجستي إذا كان لدينا أكثر من صنفين. في العرف يتم تعيين θK=0، بحيث تجعل مُدخل بيرنوللي (Bernoulli) ϕi لكل فئة i يساوي: - +

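A small NumPy sketch of the softmax class probabilities above, assuming a parameter matrix `Theta` with one row θi per class (names are illustrative; subtracting the max is only for numerical stability):

```python
import numpy as np

def softmax_probabilities(Theta, x):
    """phi_i = exp(theta_i^T x) / sum_j exp(theta_j^T x) for each class i."""
    scores = Theta @ x               # one score per class
    scores = scores - scores.max()   # numerical stability; does not change the result
    exp_scores = np.exp(scores)
    return exp_scores / exp_scores.sum()

# Example with 3 classes and 2 features.
Theta = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]])
x = np.array([2.0, 1.0])
print(softmax_probabilities(Theta, x))  # three probabilities summing to 1
```

<br>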
**33. Generalized Linear Models** +
النماذج الخطية العامة (Generalized Linear Models - GLM) - +

**34. Exponential family ― A class of distributions is said to be in the exponential family if it can be written in terms of a natural parameter, also called the canonical parameter or link function, η, a sufficient statistic T(y) and a log-partition function a(η) as follows:** +
العائلة الأُسيّة (Exponential family) - يقال عن صنف من التوزيعات (distributions) إنه ينتمي إلى العائلة الأسيّة إذا كان يمكن كتابته بدلالة مُدخل طبيعي (natural parameter) η، ويطلق عليه كذلك المُدخل القانوني (canonical parameter) أو دالة الربط (link function)، وإحصاء كافٍ (sufficient statistic) T(y)، ودالة تجزئة لوغاريثمية (log-partition function) a(η)، كالتالي: - +

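The general form referred to in the two entries above is usually written as below; as a concrete check, Bernoulli(ϕ) fits it with natural parameter η=log(ϕ/(1−ϕ)):

```latex
p(y;\eta) = b(y)\, \exp\!\big( \eta\, T(y) - a(\eta) \big)
```

<br>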
**35. Remark: we will often have T(y)=y. Also, exp(−a(η)) can be seen as a normalization parameter that will make sure that the probabilities sum to one.** +
ملاحظة: كثيراً ما سيكون T(y)=y. كذلك فإن exp(−a(η)) يمكن أن تفسر كمُدخل تسوية (normalization) للتأكد من أن الاحتمالات يكون حاصل جمعها يساوي واحد. - +

**36. Here are the most common exponential distributions summed up in the following table:** +
تم تلخيص أكثر التوزيعات الأسيّة استخداماً في الجدول التالي: - +

**37. [Distribution, Bernoulli, Gaussian, Poisson, Geometric]** +
[التوزيع، بِرنوللي (Bernoulli)، جاوسي (Gaussian)، بواسون (Poisson)، هندسي (Geometric)] - +

**38. Assumptions of GLMs ― Generalized Linear Models (GLM) aim at predicting a random variable y as a function of x∈Rn+1 and rely on the following 3 assumptions:** +
افتراضات GLMs - تهدف النماذج الخطيّة العامة (GLM) إلى توقع المتغير العشوائي y كدالة لـ x∈Rn+1، وتستند إلى ثلاثة افتراضات: - +

**39. Remark: ordinary least squares and logistic regression are special cases of generalized linear models.** +
ملاحظة: أصغر تربيع (least squares) الاعتيادي و الانحدار اللوجستي يعتبران من الحالات الخاصة للنماذج الخطيّة العامة. - +

**40. Support Vector Machines** +
آلة المتجهات الداعمة (Support Vector Machines) - +

**41: The goal of support vector machines is to find the line that maximizes the minimum distance to the line.** +
تهدف آلة المتجهات الداعمة (SVM) إلى العثور على الخط الذي يعظم أصغر مسافة إليه: - +

**42: Optimal margin classifier ― The optimal margin classifier h is such that:** +
مُصنِّف الهامش الأحسن (Optimal margin classifier) - يعرَّف مُصنِّف الهامش الأحسن h كالتالي: - +

**43: where (w,b)∈Rn×R is the solution of the following optimization problem:** +
حيث (w,b)∈Rn×R هو الحل لمشكلة التحسين (optimization) التالية: - +

**44. such that** +
بحيث أن - +

**45. support vectors** +
المتجهات الداعمة (support vectors) - +

**46. Remark: the line is defined as wTx−b=0.** +
ملاحظة: يتم تعريف الخط بهذه المعادلة wTx−b=0. - +

**47. Hinge loss ― The hinge loss is used in the setting of SVMs and is defined as follows:** +
الخسارة المفصلية (Hinge loss) - تُستخدم الخسارة المفصلية في سياق SVM وتُعرَّف على النحو التالي: - +

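For reference, the hinge loss above is, for a score z and a label y∈{−1,1}:

```latex
L(z,y) = [\,1 - yz\,]_+ = \max(0,\; 1 - yz)
```

<br>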
**48. Kernel ― Given a feature mapping ϕ, we define the kernel K to be defined as:** +
النواة (Kernel) - إذا كان لدينا دالة ربط الخصائص (features) ϕ، يمكننا تعريف النواة K كالتالي: - +

**49. In practice, the kernel K defined by K(x,z)=exp(−||x−z||²/(2σ²)) is called the Gaussian kernel and is commonly used.** +
عملياً، يمكن أن تُعَرَّف الدالة K عن طريق المعادلة K(x,z)=exp(−||x−z||²/(2σ²))، ويطلق عليها النواة الجاوسية (Gaussian kernel)، وهي تستخدم بكثرة. - +

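A minimal sketch of the Gaussian (RBF) kernel above (the function name and the default σ are illustrative):

```python
import numpy as np

def gaussian_kernel(x, z, sigma=1.0):
    """K(x, z) = exp(-||x - z||^2 / (2 * sigma^2))."""
    diff = np.asarray(x, dtype=float) - np.asarray(z, dtype=float)
    return np.exp(-np.dot(diff, diff) / (2.0 * sigma ** 2))

print(gaussian_kernel([0.0, 0.0], [1.0, 1.0]))  # exp(-1) ≈ 0.3679
```

<br>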
**50. [Non-linear separability, Use of a kernel mapping, Decision boundary in the original space]** +
[قابلية الفصل غير الخطي، استخدام ربط النواة، حد القرار في الفضاء الأصلي] - +

**51. Remark: we say that we use the "kernel trick" to compute the cost function using the kernel because we actually don't need to know the explicit mapping ϕ, which is often very complicated. Instead, only the values K(x,z) are needed.** +
ملاحظة: نقول إننا نستخدم "حيلة النواة" (kernel trick) لحساب دالة التكلفة عند استخدام النواة لأننا في الحقيقة لا نحتاج أن نعرف التحويل الصريح ϕ، الذي يكون في الغالب شديد التعقيد. ولكن نحتاج فقط أن نحسب القيم K(x,z). - +

**52. Lagrangian ― We define the Lagrangian L(w,b) as follows:** +
اللّاغرانجي (Lagrangian) - يتم تعريف اللّاغرانجي L(w,b) على النحو التالي: - +

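The Lagrangian above takes the usual constrained-optimization form, with f the objective, hi the constraints and βi the multipliers of the following remark:

```latex
\mathcal{L}(w,b) = f(w) + \sum_{i=1}^{l} \beta_i\, h_i(w)
```

<br>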
**53. Remark: the coefficients βi are called the Lagrange multipliers.** +
ملاحظة: المعامِلات (coefficients) βi يطلق عليها مضروبات لاغرانج (Lagrange multipliers). - +

**54. Generative Learning** +
التعلم التوليدي (Generative Learning) - +

**55. A generative model first tries to learn how the data is generated by estimating P(x|y), which we can then use to estimate P(y|x) by using Bayes' rule.** +
النموذج التوليدي في البداية يحاول أن يتعلم كيف تم توليد البيانات عن طريق تقدير P(x|y)، التي يمكن حينها استخدامها لتقدير P(y|x) باستخدام قانون بايز (Bayes' rule). - +

**56. Gaussian Discriminant Analysis** +
تحليل التمايز الجاوسي (Gaussian Discriminant Analysis) - +

**57. Setting ― The Gaussian Discriminant Analysis assumes that y and x|y=0 and x|y=1 are such that:** +
الإطار - يفترض تحليل التمايز الجاوسي أن y و x|y=0 و x|y=1 تكون كالتالي: - +

**58. Estimation ― The following table sums up the estimates that we find when maximizing the likelihood:** +
التقدير - الجدول التالي يلخص التقديرات التي يمكننا التوصل لها عند تعظيم الأرجحية (likelihood): - +

**59. Naive Bayes** +
بايز البسيط (Naive Bayes) - +

**60. Assumption ― The Naive Bayes model supposes that the features of each data point are all independent:** +
الافتراض - يفترض نموذج بايز البسيط أن جميع الخصائص لكل عينة بيانات مستقلة (independent): - +

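The conditional-independence assumption above can be written, for features x1,…,xn:

```latex
P(x \mid y) = P(x_1, \ldots, x_n \mid y) = \prod_{i=1}^{n} P(x_i \mid y)
```

<br>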
**61. Solutions ― Maximizing the log-likelihood gives the following solutions, with k∈{0,1},l∈[[1,L]]** +
الحل - تعظيم الأرجحية اللوغاريثمية (log-likelihood) يعطينا الحلول التالية إذا كان k∈{0,1}، l∈[[1,L]]: - +

**62. Remark: Naive Bayes is widely used for text classification and spam detection.** +
ملاحظة: بايز البسيط يستخدم بشكل واسع لتصنيف النصوص واكتشاف البريد الإلكتروني المزعج. - +

**63. Tree-based and ensemble methods** +
الطرق الشجرية (tree-based) والتجميعية (ensemble) - +

**64. These methods can be used for both regression and classification problems.** +
هذه الطرق يمكن استخدامها لكلٍ من مشاكل الانحدار (regression) والتصنيف (classification). - +

**65. CART ― Classification and Regression Trees (CART), commonly known as decision trees, can be represented as binary trees. They have the advantage to be very interpretable.** +
التصنيف والانحدار الشجري (CART) - والاسم الشائع له أشجار القرار (decision trees)، يمكن أن يمثل كأشجار ثنائية (binary trees). من المزايا لهذه الطريقة إمكانية تفسيرها بسهولة. - +

**66. Random forest ― It is a tree-based technique that uses a high number of decision trees built out of randomly selected sets of features. Contrary to the simple decision tree, it is highly uninterpretable but its generally good performance makes it a popular algorithm.** +
الغابة العشوائية (Random forest) - هي إحدى الطرق الشجرية التي تستخدم عدداً كبيراً من أشجار القرار المبنية باستخدام مجموعات عشوائية من الخصائص. بخلاف شجرة القرار البسيطة لا يمكن تفسير النموذج بسهولة، ولكنّ أداءها الجيد عموماً جعلها من الخوارزميات الشائعة. - +

**67. Remark: random forests are a type of ensemble methods.** +
ملاحظة: الغابات العشوائية نوع من الخوارزميات التجميعية (ensemble). - +

**68. Boosting ― The idea of boosting methods is to combine several weak learners to form a stronger one. The main ones are summed up in the table below:** +
التعزيز (Boosting) - فكرة خوارزميات التعزيز هي دمج عدة خوارزميات تعلم ضعيفة لتكوين نموذج قوي. الطرق الأساسية ملخصة في الجدول التالي: - +

**69. [Adaptive boosting, Gradient boosting]** +
[التعزيز التَكَيُّفي (Adaptive boosting)، التعزيز الاشتقاقي (Gradient boosting)] - +

**70. High weights are put on errors to improve at the next boosting step** +
يتم وضع أوزان عالية على الأخطاء لتحسين النتيجة في خطوة التعزيز التالية. - +

**71. Weak learners trained on remaining errors** +
يتم تدريب خوارزميات التعلم الضعيفة على الأخطاء المتبقية. - +

**72. Other non-parametric approaches** +
طرق أخرى غير بارامترية (non-parametric) - +

**73. k-nearest neighbors ― The k-nearest neighbors algorithm, commonly known as k-NN, is a non-parametric approach where the response of a data point is determined by the nature of its k neighbors from the training set. It can be used in both classification and regression settings.** +
خوارزمية أقرب الجيران (k-nearest neighbors) - تعتبر خوارزمية أقرب الجيران، وتعرف بـ k-NN، طريقة غير بارامترية، حيث يتم تحديد نتيجة عينة من البيانات من خلال عدد k من البيانات المجاورة في مجموعة التدريب. ويمكن استخدامها للتصنيف والانحدار. - +

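A minimal classification sketch of the k-NN rule described above, using a majority vote over Euclidean distances (all names are illustrative):

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x, k=3):
    """Predict the label of x by majority vote among its k nearest training points."""
    distances = np.linalg.norm(X_train - x, axis=1)
    nearest = np.argsort(distances)[:k]
    return Counter(y_train[nearest]).most_common(1)[0][0]

# Toy 2-D example.
X_train = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]])
y_train = np.array([0, 0, 1, 1])
print(knn_predict(X_train, y_train, np.array([4.5, 5.0]), k=3))  # -> 1
```

<br>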
**74. Remark: The higher the parameter k, the higher the bias, and the lower the parameter k, the higher the variance.** +
ملاحظة: كلما زاد المُدخل k، كلما زاد الانحياز (bias)، وكلما نقص k، زاد التباين (variance). - +

**75. Learning Theory** +
نظرية التعلُّم - +

**76. Union bound ― Let A1,...,Ak be k events. We have:** +
حد الاتحاد (Union bound) - لنجعل A1,...,Ak تمثل k حدث. فيكون لدينا: - +

**77. Hoeffding inequality ― Let Z1,..,Zm be m iid variables drawn from a Bernoulli distribution of parameter ϕ. Let ˆϕ be their sample mean and γ>0 fixed. We have:** +
متراجحة هوفدينج (Hoeffding) - لنجعل Z1,..,Zm عبارة عن m من المتغيرات المستقلة والموزعة بشكل مماثل (iid) المأخوذة من توزيع بِرنوللي (Bernoulli distribution) ذي المُدخل ϕ. لنجعل ˆϕ متوسط العينة (sample mean) و γ>0 ثابتاً. فيكون لدينا: - +

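For reference, the bound stated above (with ˆϕ the sample mean of the m draws) is:

```latex
P\big( |\phi - \hat{\phi}| > \gamma \big) \leqslant 2 \exp\!\left( -2\gamma^2 m \right)
```

<br>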
**78. Remark: this inequality is also known as the Chernoff bound.** +
ملاحظة: هذه المتراجحة تعرف كذلك بحد تشرنوف (Chernoff bound). - +

**79. Training error ― For a given classifier h, we define the training error ˆϵ(h), also known as the empirical risk or empirical error, to be as follows:** +
خطأ التدريب - ليكن لدينا المُصنِّف h، يمكن تعريف خطأ التدريب ˆϵ(h)، ويعرف كذلك بالخطر التجريبي أو الخطأ التجريبي، كالتالي: - +

**80. Probably Approximately Correct (PAC) ― PAC is a framework under which numerous results on learning theory were proved, and has the following set of assumptions: ** +
تقريباً صحيح احتمالياً (Probably Approximately Correct (PAC)) - هو إطار يتم من خلاله إثبات العديد من نظريات التعلم، ويحتوي على الافتراضات التالية: - +

**81: the training and testing sets follow the same distribution ** +
مجموعتا التدريب والاختبار تتبعان نفس التوزيع. - +

**82. the training examples are drawn independently** +
عينات التدريب تؤخذ بشكل مستقل. - +

**83. Shattering ― Given a set S={x(1),...,x(d)}, and a set of classifiers H, we say that H shatters S if for any set of labels {y(1),...,y(d)}, we have:** +
مجموعة تكسيرية (Shattering Set) - إذا كان لدينا المجموعة S={x(1),...,x(d)}، ومجموعة مُصنٍّفات H، نقول أن H تكسر S (H shatters S) إذا كان لكل مجموعة علامات (labels) {y(1),...,y(d)} لدينا: - +

**84. Upper bound theorem ― Let H be a finite hypothesis class such that |H|=k and let δ and the sample size m be fixed. Then, with probability of at least 1−δ, we have:** +
مبرهنة الحد الأعلى (Upper bound theorem) - لنجعل H فئة فرضية محدودة (finite hypothesis class) بحيث |H|=k، و δ وحجم العينة m ثابتين. حينها سيكون لدينا، مع احتمال على الأقل 1−δ، التالي: - +

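The guarantee referred to above, with ˆh the hypothesis minimizing the training error over H, is commonly written:

```latex
\epsilon(\hat{h}) \leqslant \left( \min_{h \in \mathcal{H}} \epsilon(h) \right) + 2\sqrt{\frac{1}{2m}\, \log\!\left(\frac{2k}{\delta}\right)}
```

<br>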
**85. VC dimension ― The Vapnik-Chervonenkis (VC) dimension of a given infinite hypothesis class H, noted VC(H) is the size of the largest set that is shattered by H.** +
بُعْد فابنيك-تشرفونيكس (Vapnik-Chervonenkis - VC) لفئة فرضية غير محدودة (infinite hypothesis class) H، ويرمز له بـ VC(H)، هو حجم أكبر مجموعة (set) التي تم تكسيرها بواسطة H (shattered by H). - +

**86. Remark: the VC dimension of H={set of linear classifiers in 2 dimensions} is 3.** +
ملاحظة: بُعْد فابنيك-تشرفونيكس VC لـ H = {مجموعة التصنيفات الخطية في بُعدين} يساوي 3. - +

**87. Theorem (Vapnik) ― Let H be given, with VC(H)=d and m the number of training examples. With probability at least 1−δ, we have:** +
مبرهنة فابنيك (Vapnik theorem) - ليكن لدينا H، مع VC(H)=d وعدد عيّنات التدريب m. سيكون لدينا، مع احتمال على الأقل 1−δ، التالي: - +

**88. [Introduction, Type of prediction, Type of model]** +
[مقدمة، نوع التوقع، نوع النموذج] - +

**89. [Notations and general concepts, loss function, gradient descent, likelihood]** +
[الرموز ومفاهيم أساسية، دالة الخسارة، النزول الاشتقاقي، الأرجحية] - +

**90. [Linear models, linear regression, logistic regression, generalized linear models]** +
[النماذج الخطيّة، الانحدار الخطّي، الانحدار اللوجستي، النماذج الخطية العامة] - +

**91. [Support vector machines, Optimal margin classifier, Hinge loss, Kernel]** +
[آلة المتجهات الداعمة (SVM)، مُصنِّف الهامش الأحسن، الخسارة المفصلية، النواة] - +

**92. [Generative learning, Gaussian Discriminant Analysis, Naive Bayes]** +
[التعلم التوليدي، تحليل التمايز الجاوسي، بايز البسيط] - +

**93. [Trees and ensemble methods, CART, Random forest, Boosting]** +
[الطرق الشجرية والتجميعية، التصنيف والانحدار الشجري (CART)، الغابة العشوائية (Random forest)، التعزيز (Boosting)] - +

**94. [Other methods, k-NN]** +
[طرق أخرى، خوارزمية أقرب الجيران (k-NN)] - +

**95. [Learning theory, Hoeffding inequality, PAC, VC dimension]** +
[نظرية التعلُّم، متراجحة هوفدينج، تقريباً صحيح احتمالياً (PAC)، بُعْد فابنيك-تشرفونيكس (VC dimension)] +
From aedb7f291e3d00c4aee0a562ee4e35242974653e Mon Sep 17 00:00:00 2001 From: tuananhhedspibk Date: Sun, 3 Nov 2019 22:45:19 +0900 Subject: [PATCH 445/531] vi translate for cheatsheet supervised learning --- vi/cheatsheet-supervised-learning.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/vi/cheatsheet-supervised-learning.md b/vi/cheatsheet-supervised-learning.md index dd9f017c6..3bdac042a 100644 --- a/vi/cheatsheet-supervised-learning.md +++ b/vi/cheatsheet-supervised-learning.md @@ -228,7 +228,7 @@ **39. Remark: ordinary least squares and logistic regression are special cases of generalized linear models.** -⟶ Chú ý: Bình phương nhỏ nhất thông thường và logistic regression đều là các trường hợp đặc biệt của các mô hình tuyến tính tổng quát. +⟶ Chú ý: Bình phương nhỏ nhất thông thường và hồi quy logistic đều là các trường hợp đặc biệt của các mô hình tuyến tính tổng quát.
@@ -438,7 +438,7 @@ **74. Remark: The higher the parameter k, the higher the bias, and the lower the parameter k, the higher the variance.** -⟶ Chú ý: Tham số k cao hơn, bias cao hơn, tham số k thấp hơn, phương sai cao hơn +⟶ Chú ý: Tham số k cao hơn, độ chệch (bias) cao hơn, tham số k thấp hơn, phương sai cao hơn
From 70ba0e4969d7f9fd3b9493e737e1d97ccff9b0c0 Mon Sep 17 00:00:00 2001 From: Shervine Amidi Date: Mon, 4 Nov 2019 22:18:06 -0800 Subject: [PATCH 446/531] Add reviewer to contributors --- CONTRIBUTORS | 128 +++++++++++++++++++++++++++++++++++++++++++++++++-- 1 file changed, 123 insertions(+), 5 deletions(-) diff --git a/CONTRIBUTORS b/CONTRIBUTORS index 8dfa394f9..ccff563f5 100644 --- a/CONTRIBUTORS +++ b/CONTRIBUTORS @@ -1,6 +1,19 @@ ---ar + --ar + Amjad Khatabi (translation of deep learning) + Zaid Alyafeai (review of deep learning) + + Zaid Alyafeai (translation of linear algebra) + Amjad Khatabi (review of linear algebra) + Mazen Melibari (review of linear algebra) + + Fares Al-Quaneier (translation of machine learning tips and tricks) + Zaid Alyafeai (review of machine learning tips and tricks) + + Fares Al-Quaneier (translation of supervised learning) + Zaid Alyafeai (review of supervised learning) Redouane Lguensat (translation of unsupervised learning) + Fares Al-Quaneier (review of unsupervised learning) --de @@ -38,9 +51,16 @@ Fernando Diaz (review of unsupervised learning) --fa + AlisterTA (translation of convolutional neural networks) + Ehsan Kermani (translation of convolutional neural networks) + Erfan Noury (review of convolutional neural networks) + AlisterTA (translation of deep learning) Mohammad Karimi (review of deep learning) Erfan Noury (review of deep learning) + + AlisterTA (translation of deep learning tips and tricks) + Erfan Noury (review of deep learning tips and tricks) Erfan Noury (translation of linear algebra) Mohammad Karimi (review of linear algebra) @@ -52,7 +72,10 @@ Erfan Noury (translation of probabilities and statistics) Mohammad Karimi (review of probabilities and statistics) - + + AlisterTA (translation of recurrent neural networks) + Erfan Noury (review of recurrent neural networks) + Amirhosein Kazemnejad (translation of supervised learning) Erfan Noury (review of supervised learning) Mohammad Karimi (review of supervised learning) @@ -67,17 +90,59 @@ --hi ---ja +--id + Prasetia Utama Putra (translation of convolutional neural networks) + Gunawan Tri (review of convolutional neural networks) +--ko + Wooil Jeong (translation of machine learning tips and tricks) + + Wooil Jeong (translation of probabilities and statistics) + + Kwang Hyeok Ahn (translation of unsupervised learning) + +--ja + Tran Tuan Anh (translation of convolutional neural networks) + Yoshiyuki Nakai (review of convolutional neural networks) + Linh Dang (review of convolutional neural networks) + + Kamuela Lau (translation of deep learning tips and tricks) + Yoshiyuki Nakai (review of deep learning tips and tricks) + Hiroki Mori (review of deep learning tips and tricks) + + Robert Altena (translation of linear algebra) + Kamuela Lau (review of linear algebra) + + Takatoshi Nao (translation of probabilities and statistics) + Yuta Kanzawa (review of probabilities and statistics) + + H. 
Hamano (translation of recurrent neural networks) + Yoshiyuki Nakai (review of recurrent neural networks) + + Yuta Kanzawa (translation of supervised learning) + Tran Tuan Anh (review of supervised learning) + + Tran Tuan Anh (translation of unsupervised learning) + Yoshiyuki Nakai (review of unsupervised learning) + Yuta Kanzawa (review of unsupervised learning) + Dan Lillrank (review of unsupervised learning) + --pt + Leticia Portella (translation of convolutional neural networks) + Gabriel Aparecido Fonseca (review of convolutional neural networks) + Gabriel Fonseca (translation of deep learning) Leticia Portella (review of deep learning) Gabriel Fonseca (translation of linear algebra) Leticia Portella (review of linear algebra) + + Fernando Santos (translation of machine learning tips and tricks) + Leticia Portella (review of machine learning tips and tricks) + Gabriel Fonseca (review of machine learning tips and tricks) - Leticia Portella (translation of probability) - Flavio Clesio (review of probability) + Leticia Portella (translation of probabilities and statistics) + Flavio Clesio (review of probabilities and statistics) Leticia Portella (translation of supervised learning) Gabriel Fonseca (review of supervised learning) @@ -87,12 +152,50 @@ Tiago Danin (review of unsupervised learning) --tr + Ayyüce Kızrak (translation of convolutional neural networks) + Yavuz Kömeçoğlu (review of convolutional neural networks) + Ekrem Çetinkaya (translation of deep learning) Omer Bukte (review of deep learning) + Ayyüce Kızrak (translation of deep learning tips and tricks) + Yavuz Kömeçoğlu (review of deep learning tips and tricks) + Kadir Tekeli (translation of linear algebra) Ekrem Çetinkaya (review of linear algebra) + Ayyüce Kızrak (translation of logic-based models) + Başak Buluz (review of logic-based models) + + Seray Beşer (translation of machine learning tips and tricks) + Ayyüce Kızrak (review of machine learning tips and tricks) + Yavuz Kömeçoğlu (review of machine learning tips and tricks) + + Ayyüce Kızrak (translation of probabilities and statistics) + Başak Buluz (review of probabilities and statistics) + + Başak Buluz (translation of recurrent neural networks) + Yavuz Kömeçoğlu (review of recurrent neural networks) + + Yavuz Kömeçoğlu (translation of reflex-based models) + Ayyüce Kızrak (review of reflex-based models) + + Cemal Gurpinar (translation of states-based models) + Başak Buluz (review of states-based models) + + Başak Buluz (translation of supervised learning) + Ayyüce Kızrak (review of supervised learning) + + Yavuz Kömeçoğlu (translation of unsupervised learning) + Başak Buluz (review of unsupervised learning) + + Başak Buluz (translation of variables-based models) + Ayyüce Kızrak (review of variables-based models) + +--uk + Gregory Reshetniak (translation of probabilities and statistics) + Denys (review of probabilities and statistics) + --zh Wang Hongnian (translation of supervised learning) Xiaohu Zhu (朱小虎) (review of supervised learning) @@ -102,3 +205,18 @@ kevingo (translation of deep learning) TobyOoO (review of deep learning) + kevingo (translation of linear algebra) + Miyaya (review of linear algebra) + + kevingo (translation of probabilities and statistics) + johnnychhsu (review of probabilities and statistics) + + kevingo (translation of supervised learning) + accelsao (review of supervised learning) + + kevingo (translation of unsupervised learning) + imironhead (review of unsupervised learning) + johnnychhsu (review of unsupervised learning) + + kevingo 
(translation of machine learning tips and tricks) + kentropy (review of machine learning tips and tricks) From 9b0f8187cfa246b29d297b269a17570ee3558a7e Mon Sep 17 00:00:00 2001 From: shervinea Date: Mon, 4 Nov 2019 22:37:13 -0800 Subject: [PATCH 447/531] Synchronize branch --- .DS_Store | Bin 0 -> 6148 bytes CONTRIBUTORS | 74 +- README.md | 126 ++- ...tsheet-machine-learning-tips-and-tricks.md | 285 ----- ar/cheatsheet-supervised-learning.md | 567 ---------- ar/cs-229-deep-learning.md | 323 ++++++ ar/cs-229-linear-algebra.md | 413 ++++++++ ar/cs-229-machine-learning-tips-and-tricks.md | 338 ++++++ ar/cs-229-supervised-learning.md | 663 ++++++++++++ ...ing.md => cs-229-unsupervised-learning.md} | 12 +- ar/refresher-linear-algebra.md | 339 ------ ar/refresher-probability.md | 381 ------- de/cheatsheet-deep-learning.md | 321 ------ ...tsheet-machine-learning-tips-and-tricks.md | 285 ----- de/cheatsheet-unsupervised-learning.md | 340 ------ ...ep-learning.md => cs-229-deep-learning.md} | 0 ...ar-algebra.md => cs-229-linear-algebra.md} | 0 ...s-229-machine-learning-tips-and-tricks.md} | 0 ...r-probability.md => cs-229-probability.md} | 0 ...rning.md => cs-229-supervised-learning.md} | 0 ...ing.md => cs-229-unsupervised-learning.md} | 0 ...ep-learning.md => cs-229-deep-learning.md} | 0 ...ar-algebra.md => cs-229-linear-algebra.md} | 0 ...s-229-machine-learning-tips-and-tricks.md} | 0 ...r-probability.md => cs-229-probability.md} | 0 ...rning.md => cs-229-supervised-learning.md} | 0 ...ing.md => cs-229-unsupervised-learning.md} | 0 fa/cs-230-convolutional-neural-networks.md | 923 +++++++++++++++++ fa/cs-230-deep-learning-tips-and-tricks.md | 586 +++++++++++ fa/cs-230-recurrent-neural-networks.md | 868 ++++++++++++++++ fr/cs-221-logic-models.md | 462 +++++++++ fr/cs-221-reflex-models.md | 539 ++++++++++ fr/cs-221-states-models.md | 980 ++++++++++++++++++ fr/cs-221-variables-models.md | 617 +++++++++++ ...ep-learning.md => cs-229-deep-learning.md} | 4 +- ...ar-algebra.md => cs-229-linear-algebra.md} | 12 +- ...s-229-machine-learning-tips-and-tricks.md} | 2 +- ...r-probability.md => cs-229-probability.md} | 4 +- ...rning.md => cs-229-supervised-learning.md} | 12 +- ...ing.md => cs-229-unsupervised-learning.md} | 14 +- fr/cs-230-convolutional-neural-networks.md | 716 +++++++++++++ fr/cs-230-deep-learning-tips-and-tricks.md | 457 ++++++++ fr/cs-230-recurrent-neural-networks.md | 678 ++++++++++++ he/cheatsheet-deep-learning.md | 321 ------ ...tsheet-machine-learning-tips-and-tricks.md | 285 ----- he/cheatsheet-supervised-learning.md | 567 ---------- he/refresher-probability.md | 381 ------- hi/cheatsheet-deep-learning.md | 321 ------ hi/cheatsheet-supervised-learning.md | 567 ---------- hi/cheatsheet-unsupervised-learning.md | 340 ------ hi/refresher-linear-algebra.md | 339 ------ hi/refresher-probability.md | 381 ------- id/cs-230-convolutional-neural-networks.md | 715 +++++++++++++ .../cs-229-linear-algebra.md | 115 +- ja/cs-229-probability.md | 381 +++++++ ja/cs-229-supervised-learning.md | 567 ++++++++++ ja/cs-229-unsupervised-learning.md | 339 ++++++ ja/cs-230-convolutional-neural-networks.md | 717 +++++++++++++ ja/cs-230-deep-learning-tips-and-tricks.md | 457 ++++++++ ja/cs-230-recurrent-neural-networks.md | 678 ++++++++++++ ko/cs-229-linear-algebra.md | 340 ++++++ ko/cs-229-machine-learning-tips-and-tricks.md | 285 +++++ ko/cs-229-probability.md | 381 +++++++ ko/cs-229-unsupervised-learning.md | 340 ++++++ ...tsheet-machine-learning-tips-and-tricks.md | 285 ----- ...ep-learning.md => 
cs-229-deep-learning.md} | 0 ...ar-algebra.md => cs-229-linear-algebra.md} | 0 pt/cs-229-machine-learning-tips-and-tricks.md | 284 +++++ ...r-probability.md => cs-229-probability.md} | 0 ...rning.md => cs-229-supervised-learning.md} | 0 ...ing.md => cs-229-unsupervised-learning.md} | 0 pt/cs-230-convolutional-neural-networks.md | 718 +++++++++++++ ru/cheatsheet-deep-learning.md | 321 ------ ...tsheet-machine-learning-tips-and-tricks.md | 285 ----- ru/cheatsheet-supervised-learning.md | 567 ---------- ru/cheatsheet-unsupervised-learning.md | 340 ------ ru/refresher-linear-algebra.md | 339 ------ ru/refresher-probability.md | 381 ------- template/cheatsheet-deep-learning.md | 321 ------ ...tsheet-machine-learning-tips-and-tricks.md | 285 ----- template/cheatsheet-supervised-learning.md | 567 ---------- template/cheatsheet-unsupervised-learning.md | 340 ------ template/cs-221-logic-models.md | 462 +++++++++ template/cs-221-reflex-models.md | 539 ++++++++++ template/cs-221-states-models.md | 980 ++++++++++++++++++ template/cs-221-variables-models.md | 617 +++++++++++ .../cs-229-deep-learning.md | 4 + .../cs-229-linear-algebra.md | 4 + ...cs-229-machine-learning-tips-and-tricks.md | 4 + .../cs-229-probability.md | 4 + .../cs-229-supervised-learning.md | 4 + .../cs-229-unsupervised-learning.md | 6 +- .../cs-230-convolutional-neural-networks.md | 716 +++++++++++++ .../cs-230-deep-learning-tips-and-tricks.md | 457 ++++++++ template/cs-230-recurrent-neural-networks.md | 677 ++++++++++++ template/refresher-linear-algebra.md | 339 ------ template/refresher-probability.md | 381 ------- ...tsheet-machine-learning-tips-and-tricks.md | 285 ----- tr/cheatsheet-supervised-learning.md | 567 ---------- tr/cheatsheet-unsupervised-learning.md | 340 ------ tr/cs-221-logic-models.md | 462 +++++++++ tr/cs-221-reflex-models.md | 538 ++++++++++ tr/cs-221-states-models.md | 980 ++++++++++++++++++ tr/cs-221-variables-models.md | 617 +++++++++++ ...ep-learning.md => cs-229-deep-learning.md} | 16 +- ...ar-algebra.md => cs-229-linear-algebra.md} | 0 tr/cs-229-machine-learning-tips-and-tricks.md | 290 ++++++ tr/cs-229-probability.md | 381 +++++++ tr/cs-229-supervised-learning.md | 567 ++++++++++ tr/cs-229-unsupervised-learning.md | 340 ++++++ tr/cs-230-convolutional-neural-networks.md | 712 +++++++++++++ tr/cs-230-deep-learning-tips-and-tricks.md | 450 ++++++++ tr/cs-230-recurrent-neural-networks.md | 674 ++++++++++++ tr/refresher-probability.md | 381 ------- uk/cs-229-probability.md | 381 +++++++ ...ep-learning.md => cs-229-deep-learning.md} | 0 .../cs-229-linear-algebra.md | 115 +- ...cs-229-machine-learning-tips-and-tricks.md | 116 +-- .../cs-229-probability.md | 139 +-- zh-tw/cs-229-supervised-learning.md | 352 +++++++ .../cs-229-unsupervised-learning.md | 141 +-- zh/cheatsheet-deep-learning.md | 321 ------ ...rning.md => cs-229-supervised-learning.md} | 0 123 files changed, 26426 insertions(+), 13124 deletions(-) create mode 100644 .DS_Store delete mode 100644 ar/cheatsheet-machine-learning-tips-and-tricks.md delete mode 100644 ar/cheatsheet-supervised-learning.md create mode 100644 ar/cs-229-deep-learning.md create mode 100644 ar/cs-229-linear-algebra.md create mode 100644 ar/cs-229-machine-learning-tips-and-tricks.md create mode 100644 ar/cs-229-supervised-learning.md rename ar/{cheatsheet-unsupervised-learning.md => cs-229-unsupervised-learning.md} (99%) delete mode 100644 ar/refresher-linear-algebra.md delete mode 100644 ar/refresher-probability.md delete mode 100644 de/cheatsheet-deep-learning.md delete mode 
100644 de/cheatsheet-machine-learning-tips-and-tricks.md delete mode 100644 de/cheatsheet-unsupervised-learning.md rename es/{cheatsheet-deep-learning.md => cs-229-deep-learning.md} (100%) rename es/{refresher-linear-algebra.md => cs-229-linear-algebra.md} (100%) rename es/{cheatsheet-machine-learning-tips-and-tricks.md => cs-229-machine-learning-tips-and-tricks.md} (100%) rename es/{refresher-probability.md => cs-229-probability.md} (100%) rename es/{cheatsheet-supervised-learning.md => cs-229-supervised-learning.md} (100%) rename es/{cheatsheet-unsupervised-learning.md => cs-229-unsupervised-learning.md} (100%) rename fa/{cheatsheet-deep-learning.md => cs-229-deep-learning.md} (100%) rename fa/{refresher-linear-algebra.md => cs-229-linear-algebra.md} (100%) rename fa/{cheatsheet-machine-learning-tips-and-tricks.md => cs-229-machine-learning-tips-and-tricks.md} (100%) rename fa/{refresher-probability.md => cs-229-probability.md} (100%) rename fa/{cheatsheet-supervised-learning.md => cs-229-supervised-learning.md} (100%) rename fa/{cheatsheet-unsupervised-learning.md => cs-229-unsupervised-learning.md} (100%) create mode 100644 fa/cs-230-convolutional-neural-networks.md create mode 100644 fa/cs-230-deep-learning-tips-and-tricks.md create mode 100644 fa/cs-230-recurrent-neural-networks.md create mode 100644 fr/cs-221-logic-models.md create mode 100644 fr/cs-221-reflex-models.md create mode 100644 fr/cs-221-states-models.md create mode 100644 fr/cs-221-variables-models.md rename fr/{cheatsheet-deep-learning.md => cs-229-deep-learning.md} (95%) rename fr/{refresher-linear-algebra.md => cs-229-linear-algebra.md} (92%) rename fr/{cheatsheet-machine-learning-tips-and-tricks.md => cs-229-machine-learning-tips-and-tricks.md} (99%) rename fr/{refresher-probability.md => cs-229-probability.md} (98%) rename fr/{cheatsheet-supervised-learning.md => cs-229-supervised-learning.md} (96%) rename fr/{cheatsheet-unsupervised-learning.md => cs-229-unsupervised-learning.md} (95%) create mode 100644 fr/cs-230-convolutional-neural-networks.md create mode 100644 fr/cs-230-deep-learning-tips-and-tricks.md create mode 100644 fr/cs-230-recurrent-neural-networks.md delete mode 100644 he/cheatsheet-deep-learning.md delete mode 100644 he/cheatsheet-machine-learning-tips-and-tricks.md delete mode 100644 he/cheatsheet-supervised-learning.md delete mode 100644 he/refresher-probability.md delete mode 100644 hi/cheatsheet-deep-learning.md delete mode 100644 hi/cheatsheet-supervised-learning.md delete mode 100644 hi/cheatsheet-unsupervised-learning.md delete mode 100644 hi/refresher-linear-algebra.md delete mode 100644 hi/refresher-probability.md create mode 100644 id/cs-230-convolutional-neural-networks.md rename he/refresher-linear-algebra.md => ja/cs-229-linear-algebra.md (51%) create mode 100644 ja/cs-229-probability.md create mode 100644 ja/cs-229-supervised-learning.md create mode 100644 ja/cs-229-unsupervised-learning.md create mode 100644 ja/cs-230-convolutional-neural-networks.md create mode 100644 ja/cs-230-deep-learning-tips-and-tricks.md create mode 100644 ja/cs-230-recurrent-neural-networks.md create mode 100644 ko/cs-229-linear-algebra.md create mode 100644 ko/cs-229-machine-learning-tips-and-tricks.md create mode 100644 ko/cs-229-probability.md create mode 100644 ko/cs-229-unsupervised-learning.md delete mode 100644 pt/cheatsheet-machine-learning-tips-and-tricks.md rename pt/{cheatsheet-deep-learning.md => cs-229-deep-learning.md} (100%) rename pt/{refresher-linear-algebra.md => cs-229-linear-algebra.md} 
(100%) create mode 100644 pt/cs-229-machine-learning-tips-and-tricks.md rename pt/{refresher-probability.md => cs-229-probability.md} (100%) rename pt/{cheatsheet-supervised-learning.md => cs-229-supervised-learning.md} (100%) rename pt/{cheatsheet-unsupervised-learning.md => cs-229-unsupervised-learning.md} (100%) create mode 100644 pt/cs-230-convolutional-neural-networks.md delete mode 100644 ru/cheatsheet-deep-learning.md delete mode 100644 ru/cheatsheet-machine-learning-tips-and-tricks.md delete mode 100644 ru/cheatsheet-supervised-learning.md delete mode 100644 ru/cheatsheet-unsupervised-learning.md delete mode 100644 ru/refresher-linear-algebra.md delete mode 100644 ru/refresher-probability.md delete mode 100644 template/cheatsheet-deep-learning.md delete mode 100644 template/cheatsheet-machine-learning-tips-and-tricks.md delete mode 100644 template/cheatsheet-supervised-learning.md delete mode 100644 template/cheatsheet-unsupervised-learning.md create mode 100644 template/cs-221-logic-models.md create mode 100644 template/cs-221-reflex-models.md create mode 100644 template/cs-221-states-models.md create mode 100644 template/cs-221-variables-models.md rename ar/cheatsheet-deep-learning.md => template/cs-229-deep-learning.md (98%) rename de/refresher-linear-algebra.md => template/cs-229-linear-algebra.md (97%) rename hi/cheatsheet-machine-learning-tips-and-tricks.md => template/cs-229-machine-learning-tips-and-tricks.md (97%) rename de/refresher-probability.md => template/cs-229-probability.md (98%) rename de/cheatsheet-supervised-learning.md => template/cs-229-supervised-learning.md (98%) rename he/cheatsheet-unsupervised-learning.md => template/cs-229-unsupervised-learning.md (96%) create mode 100644 template/cs-230-convolutional-neural-networks.md create mode 100644 template/cs-230-deep-learning-tips-and-tricks.md create mode 100644 template/cs-230-recurrent-neural-networks.md delete mode 100644 template/refresher-linear-algebra.md delete mode 100644 template/refresher-probability.md delete mode 100644 tr/cheatsheet-machine-learning-tips-and-tricks.md delete mode 100644 tr/cheatsheet-supervised-learning.md delete mode 100644 tr/cheatsheet-unsupervised-learning.md create mode 100644 tr/cs-221-logic-models.md create mode 100644 tr/cs-221-reflex-models.md create mode 100644 tr/cs-221-states-models.md create mode 100644 tr/cs-221-variables-models.md rename tr/{cheatsheet-deep-learning.md => cs-229-deep-learning.md} (92%) rename tr/{refresher-linear-algebra.md => cs-229-linear-algebra.md} (100%) create mode 100644 tr/cs-229-machine-learning-tips-and-tricks.md create mode 100644 tr/cs-229-probability.md create mode 100644 tr/cs-229-supervised-learning.md create mode 100644 tr/cs-229-unsupervised-learning.md create mode 100644 tr/cs-230-convolutional-neural-networks.md create mode 100644 tr/cs-230-deep-learning-tips-and-tricks.md create mode 100644 tr/cs-230-recurrent-neural-networks.md delete mode 100644 tr/refresher-probability.md create mode 100644 uk/cs-229-probability.md rename zh-tw/{cheatsheet-deep-learning.md => cs-229-deep-learning.md} (100%) rename zh/refresher-linear-algebra.md => zh-tw/cs-229-linear-algebra.md (58%) rename zh/cheatsheet-machine-learning-tips-and-tricks.md => zh-tw/cs-229-machine-learning-tips-and-tricks.md (59%) rename zh/refresher-probability.md => zh-tw/cs-229-probability.md (56%) create mode 100644 zh-tw/cs-229-supervised-learning.md rename zh/cheatsheet-unsupervised-learning.md => zh-tw/cs-229-unsupervised-learning.md (59%) delete mode 100644 
zh/cheatsheet-deep-learning.md rename zh/{cheatsheet-supervised-learning.md => cs-229-supervised-learning.md} (100%) diff --git a/.DS_Store b/.DS_Store new file mode 100644 index 0000000000000000000000000000000000000000..5008ddfcf53c02e82d7eee2e57c38e5672ef89f6 GIT binary patch literal 6148 zcmeH~Jr2S!425mzP>H1@V-^m;4Wg<&0T*E43hX&L&p$$qDprKhvt+--jT7}7np#A3 zem<@ulZcFPQ@L2!n>{z**++&mCkOWA81W14cNZlEfg7;MkzE(HCqgga^y>{tEnwC%0;vJ&^%eQ zLs35+`xjp>T0 34. **English blabla** > > ⟶ Translated blabla @@ -49,7 +23,85 @@ Please first check for [existing pull requests](https://github.com/shervinea/che 5. Submit a [pull request](https://help.github.com/articles/creating-a-pull-request/) and call it `[code of language name] Topic name`. For example, a translation in Spanish of the deep learning cheatsheet will be called `[es] Deep learning`. -Submissions will have to be reviewed by a fellow native speaker before being accepted. +### Reviewers +1. Go to the [list of pull requests](https://github.com/shervinea/cheatsheet-translation/pulls) and filter them by your native language (e.g. `[es]` for Spanish, `[zh]` for Mandarin Chinese). + +2. Locate pull requests where help is needed. Those contain the tag `reviewer wanted`. + +3. Review the content line per line and add comments and suggestions when necessary. + +### Important note +Please make sure to propose the translation of **only one** cheatsheet per pull request -- it simplifies a lot the review process. + +## Progression +### CS 221 (Artificial Intelligence) +| |[Reflex models](https://github.com/shervinea/cheatsheet-translation/blob/master/template/cs-221-reflex-models.md)|[States models](https://github.com/shervinea/cheatsheet-translation/blob/master/template/cs-221-states-models.md)|[Variables models](https://github.com/shervinea/cheatsheet-translation/blob/master/template/cs-221-variables-models.md)|[Logic models](https://github.com/shervinea/cheatsheet-translation/blob/master/template/cs-221-logic-models.md)| +|:---|:---:|:---:|:---:|:---:| +|**Deutsch**|not started|not started|not started|not started| +|**Español**|not started|not started|not started|not started| +|**فارسی**|not started|not started|not started|not started| +|**Français**|done|done|done|done| +|**עִבְרִית**|not started|not started|not started|not started| +|**Italiano**|not started|not started|not started|not started| +|**日本語**|not started|not started|not started|not started| +|**한국어**|not started|not started|not started|not started| +|**Português**|not started|not started|not started|not started| +|**Türkçe**|done|done|done|done| +|**Tiếng Việt**|not started|not started|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/179)| +|**简体中文**|not started|not started|not started|not started| +|**繁體中文**|not started|not started|not started|not started| + +### CS 229 (Machine Learning) +| |[Deep learning](https://github.com/shervinea/cheatsheet-translation/blob/master/template/cs-229-deep-learning.md)|[Supervised](https://github.com/shervinea/cheatsheet-translation/blob/master/template/cs-229-supervised-learning.md)|[Unsupervised](https://github.com/shervinea/cheatsheet-translation/blob/master/template/cs-229-unsupervised-learning.md)|[ML 
tips](https://github.com/shervinea/cheatsheet-translation/blob/master/template/cs-229-machine-learning-tips-and-tricks.md)|[Probabilities](https://github.com/shervinea/cheatsheet-translation/blob/master/template/cs-229-probability.md)|[Algebra](https://github.com/shervinea/cheatsheet-translation/blob/master/template/cs-229-linear-algebra.md)| +|:---|:---:|:---:|:---:|:---:|:---:|:---:| +|**العَرَبِيَّة**|done|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/87)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/88)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/83)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/182)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/85)| +|**Català**|not started|not started|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/47)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/47)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/47)| +|**Deutsch**|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/106)|not started|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/135)|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/136)| +|**Español**|done|done|done|done|done|done| +|**فارسی**|done|done|done|done|done|done| +|**Suomi**|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/34)|not started|not started|not started|not started|not started| +|**Français**|done|done|done|done|done|done| +|**עִבְרִית**|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/156)|not started|not started|not started|not started|not started| +|**हिन्दी**|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/37)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/46)|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/40)|not started|not started| +|**Magyar**|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/124)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/124)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/124)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/124)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/124)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/124)| +|**Bahasa Indonesia**|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/154)|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/139)|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/151)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/150)| +|**Italiano**|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/78)|not started|not started|not started|not started|not started| +|**日本語**|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/96)|done|done|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/99)|done|done| +|**한국어**|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/80)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/90)|done|done|done|done| +|**Polski**|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/8)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/8)|not 
started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/8)|not started|not started| +|**Português**|done|done|done|done|done|done| +|**Русский**|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/21)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/21)|not started|not started|not started|not started| +|**Türkçe**|done|done|done|done|done|done| +|**Українська**|not started|not started|not started|not started|done|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/95)| +|**Tiếng Việt**|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/159)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/162)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/177)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/160)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/175)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/176)| +|**简体中文**|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/12)|done|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/48)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/7)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/73)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/72)| +|**繁體中文**|done|done|done|done|done|done| + +### CS 230 (Deep Learning) +| |[Convolutional Neural Networks](https://github.com/shervinea/cheatsheet-translation/blob/master/template/cs-230-convolutional-neural-networks.md)|[Recurrent Neural Networks](https://github.com/shervinea/cheatsheet-translation/blob/master/template/cs-230-recurrent-neural-networks.md)|[Deep Learning tips](https://github.com/shervinea/cheatsheet-translation/blob/master/template/cs-230-deep-learning-tips-and-tricks.md)| +|:---|:---:|:---:|:---:| +|**العَرَبِيَّة**|not started|not started|not started| +|**Català**|not started|not started|not started| +|**Deutsch**|not started|not started|not started| +|**Español**|not started|not started|not started| +|**فارسی**|done|done|done| +|**Suomi**|not started|not started|not started| +|**Français**|done|done|done| +|**עִבְרִית**|not started|not started|not started| +|**हिन्दी**|not started|not started|not started| +|**Magyar**|not started|not started|not started| +|**Bahasa Indonesia**|done|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/152)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/153)| +|**Italiano**|not started|not started|not started| +|**日本語**|done|done|done| +|**한국어**|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/109)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/107)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/108)| +|**Polski**|not started|not started|not started| +|**Português**|done|not started|not started| +|**Русский**|not started|not started|not started| +|**Türkçe**|done|done|done| +|**Українська**|not started|not started|not started| +|**Tiếng Việt**|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/180)|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/178)| +|**简体中文**|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/181)|not started| +|**繁體中文**|not started|not started|not started| ## Acknowledgements -Thank you everyone for your help! 
Please do not forget to add your name to the `CONTRIBUTORS` file so that we can give you proper credit in the cheatsheets' [official website](https://stanford.edu/~shervine/teaching/cs-229.html). +Thank you everyone for your help! Please do not forget to add your name to the `CONTRIBUTORS` file so that we can give you proper credit in the cheatsheets' [official website](https://stanford.edu/~shervine/teaching). diff --git a/ar/cheatsheet-machine-learning-tips-and-tricks.md b/ar/cheatsheet-machine-learning-tips-and-tricks.md deleted file mode 100644 index 9712297b8..000000000 --- a/ar/cheatsheet-machine-learning-tips-and-tricks.md +++ /dev/null @@ -1,285 +0,0 @@ -**1. Machine Learning tips and tricks cheatsheet** - -⟶ - -
- -**2. Classification metrics** - -⟶ - -
- -**3. In a context of a binary classification, here are the main metrics that are important to track in order to assess the performance of the model.** - -⟶ - -
- -**4. Confusion matrix ― The confusion matrix is used to have a more complete picture when assessing the performance of a model. It is defined as follows:** - -⟶ - -
- -**5. [Predicted class, Actual class]** - -⟶ - -
- -**6. Main metrics ― The following metrics are commonly used to assess the performance of classification models:** - -⟶ - -
- -**7. [Metric, Formula, Interpretation]** - -⟶ - -
- -**8. Overall performance of model** - -⟶ - -
- -**9. How accurate the positive predictions are** - -⟶ - -
- -**10. Coverage of actual positive sample** - -⟶ - -
- -**11. Coverage of actual negative sample** - -⟶ - -
- -**12. Hybrid metric useful for unbalanced classes** - -⟶ - -
- -**13. ROC ― The receiver operating curve, also noted ROC, is the plot of TPR versus FPR by varying the threshold. These metrics are are summed up in the table below:** - -⟶ - -
- -**14. [Metric, Formula, Equivalent]** - -⟶ - -
- -**15. AUC ― The area under the receiving operating curve, also noted AUC or AUROC, is the area below the ROC as shown in the following figure:** - -⟶ - -
- -**16. [Actual, Predicted]** - -⟶ - -
- -**17. Basic metrics ― Given a regression model f, the following metrics are commonly used to assess the performance of the model:** - -⟶ - -
- -**18. [Total sum of squares, Explained sum of squares, Residual sum of squares]** - -⟶ - -
- -**19. Coefficient of determination ― The coefficient of determination, often noted R2 or r2, provides a measure of how well the observed outcomes are replicated by the model and is defined as follows:** - -⟶ - -
- -**20. Main metrics ― The following metrics are commonly used to assess the performance of regression models, by taking into account the number of variables n that they take into consideration:** - -⟶ - -
- -**21. where L is the likelihood and ˆσ2 is an estimate of the variance associated with each response.** - -⟶ - -
- -**22. Model selection** - -⟶ - -
- -**23. Vocabulary ― When selecting a model, we distinguish 3 different parts of the data that we have as follows:** - -⟶ - -
- -**24. [Training set, Validation set, Testing set]** - -⟶ - -
- -**25. [Model is trained, Model is assessed, Model gives predictions]** - -⟶ - -
- -**26. [Usually 80% of the dataset, Usually 20% of the dataset]** - -⟶ - -
- -**27. [Also called hold-out or development set, Unseen data]** - -⟶ - -
- -**28. Once the model has been chosen, it is trained on the entire dataset and tested on the unseen test set. These are represented in the figure below:** - -⟶ - -
- -**29. Cross-validation ― Cross-validation, also noted CV, is a method that is used to select a model that does not rely too much on the initial training set. The different types are summed up in the table below:** - -⟶ - -
- -**30. [Training on k−1 folds and assessment on the remaining one, Training on n−p observations and assessment on the p remaining ones]** - -⟶ - -
- -**31. [Generally k=5 or 10, Case p=1 is called leave-one-out]** - -⟶ - -
- -**32. The most commonly used method is called k-fold cross-validation and splits the training data into k folds to validate the model on one fold while training the model on the k−1 other folds, all of this k times. The error is then averaged over the k folds and is named cross-validation error.** - -⟶ - -
- -**33. Regularization ― The regularization procedure aims at avoiding the model to overfit the data and thus deals with high variance issues. The following table sums up the different types of commonly used regularization techniques:** - -⟶ - -
- -**34. [Shrinks coefficients to 0, Good for variable selection, Makes coefficients smaller, Tradeoff between variable selection and small coefficients]** - -⟶ - -
- -**35. Diagnostics** - -⟶ - -
- -**36. Bias ― The bias of a model is the difference between the expected prediction and the correct model that we try to predict for given data points.** - -⟶ - -
- -**37. Variance ― The variance of a model is the variability of the model prediction for given data points.** - -⟶ - -
- -**38. Bias/variance tradeoff ― The simpler the model, the higher the bias, and the more complex the model, the higher the variance.** - -⟶ - -
- -**39. [Symptoms, Regression illustration, classification illustration, deep learning illustration, possible remedies]** - -⟶ - -
- -**40. [High training error, Training error close to test error, High bias, Training error slightly lower than test error, Very low training error, Training error much lower than test error, High variance]** - -⟶ - -
- -**41. [Complexify model, Add more features, Train longer, Perform regularization, Get more data]** - -⟶ - -
- -**42. Error analysis ― Error analysis is analyzing the root cause of the difference in performance between the current and the perfect models.** - -⟶ - -
- -**43. Ablative analysis ― Ablative analysis is analyzing the root cause of the difference in performance between the current and the baseline models.** - -⟶ - -
- -**44. Regression metrics** - -⟶ - -
- -**45. [Classification metrics, confusion matrix, accuracy, precision, recall, F1 score, ROC]** - -⟶ - -
- -**46. [Regression metrics, R squared, Mallow's CP, AIC, BIC]** - -⟶ - -
- -**47. [Model selection, cross-validation, regularization]** - -⟶ - -
- -**48. [Diagnostics, Bias/variance tradeoff, error/ablative analysis]** - -⟶ diff --git a/ar/cheatsheet-supervised-learning.md b/ar/cheatsheet-supervised-learning.md deleted file mode 100644 index a6b19ea1c..000000000 --- a/ar/cheatsheet-supervised-learning.md +++ /dev/null @@ -1,567 +0,0 @@ -**1. Supervised Learning cheatsheet** - -⟶ - -
- -**2. Introduction to Supervised Learning** - -⟶ - -
- -**3. Given a set of data points {x(1),...,x(m)} associated to a set of outcomes {y(1),...,y(m)}, we want to build a classifier that learns how to predict y from x.** - -⟶ - -
- -**4. Type of prediction ― The different types of predictive models are summed up in the table below:** - -⟶ - -
- -**5. [Regression, Classifier, Outcome, Examples]** - -⟶ - -
- -**6. [Continuous, Class, Linear regression, Logistic regression, SVM, Naive Bayes]** - -⟶ - -
- -**7. Type of model ― The different models are summed up in the table below:** - -⟶ - -
- -**8. [Discriminative model, Generative model, Goal, What's learned, Illustration, Examples]** - -⟶ - -
- -**9. [Directly estimate P(y|x), Estimate P(x|y) to then deduce P(y|x), Decision boundary, Probability distributions of the data, Regressions, SVMs, GDA, Naive Bayes]** - -⟶ - -
- -**10. Notations and general concepts** - -⟶ - -
- -**11. Hypothesis ― The hypothesis is noted hθ and is the model that we choose. For a given input data x(i) the model prediction output is hθ(x(i)).** - -⟶ - -
- -**12. Loss function ― A loss function is a function L:(z,y)∈R×Y⟼L(z,y)∈R that takes as inputs the predicted value z corresponding to the real data value y and outputs how different they are. The common loss functions are summed up in the table below:** - -⟶ - -
- -**13. [Least squared error, Logistic loss, Hinge loss, Cross-entropy]** - -⟶ - -
- -**14. [Linear regression, Logistic regression, SVM, Neural Network]** - -⟶ - -
- -**15. Cost function ― The cost function J is commonly used to assess the performance of a model, and is defined with the loss function L as follows:** - -⟶ - -
- -**16. Gradient descent ― By noting α∈R the learning rate, the update rule for gradient descent is expressed with the learning rate and the cost function J as follows:** - -⟶ - -
- -**17. Remark: Stochastic gradient descent (SGD) is updating the parameter based on each training example, and batch gradient descent is on a batch of training examples.** - -⟶ - -
- -**18. Likelihood ― The likelihood of a model L(θ) given parameters θ is used to find the optimal parameters θ through maximizing the likelihood. In practice, we use the log-likelihood ℓ(θ)=log(L(θ)) which is easier to optimize. We have:** - -⟶ - -
- -**19. Newton's algorithm ― The Newton's algorithm is a numerical method that finds θ such that ℓ′(θ)=0. Its update rule is as follows:** - -⟶ - -
- -**20. Remark: the multidimensional generalization, also known as the Newton-Raphson method, has the following update rule:** - -⟶ - -
- -**21. Linear models** - -⟶ - -
- -**22. Linear regression** - -⟶ - -
- -**23. We assume here that y|x;θ∼N(μ,σ2)** - -⟶ - -
- -**24. Normal equations ― By noting X the matrix design, the value of θ that minimizes the cost function is a closed-form solution such that:** - -⟶ - -
- -**25. LMS algorithm ― By noting α the learning rate, the update rule of the Least Mean Squares (LMS) algorithm for a training set of m data points, which is also known as the Widrow-Hoff learning rule, is as follows:** - -⟶ - -
- -**26. Remark: the update rule is a particular case of the gradient ascent.** - -⟶ - -
- -**27. LWR ― Locally Weighted Regression, also known as LWR, is a variant of linear regression that weights each training example in its cost function by w(i)(x), which is defined with parameter τ∈R as:** - -⟶ - -
- -**28. Classification and logistic regression** - -⟶ - -
- -**29. Sigmoid function ― The sigmoid function g, also known as the logistic function, is defined as follows:** - -⟶ - -
- -**30. Logistic regression ― We assume here that y|x;θ∼Bernoulli(ϕ). We have the following form:** - -⟶ - -
- -**31. Remark: there is no closed form solution for the case of logistic regressions.** - -⟶ - -
- -**32. Softmax regression ― A softmax regression, also called a multiclass logistic regression, is used to generalize logistic regression when there are more than 2 outcome classes. By convention, we set θK=0, which makes the Bernoulli parameter ϕi of each class i equal to:** - -⟶ - -
- -**33. Generalized Linear Models** - -⟶ - -
- -**34. Exponential family ― A class of distributions is said to be in the exponential family if it can be written in terms of a natural parameter, also called the canonical parameter or link function, η, a sufficient statistic T(y) and a log-partition function a(η) as follows:** - -⟶ - -
- -**35. Remark: we will often have T(y)=y. Also, exp(−a(η)) can be seen as a normalization parameter that will make sure that the probabilities sum to one.** - -⟶ - -
- -**36. Here are the most common exponential distributions summed up in the following table:** - -⟶ - -
- -**37. [Distribution, Bernoulli, Gaussian, Poisson, Geometric]** - -⟶ - -
- -**38. Assumptions of GLMs ― Generalized Linear Models (GLM) aim at predicting a random variable y as a function of x∈Rn+1 and rely on the following 3 assumptions:** - -⟶ - -
- -**39. Remark: ordinary least squares and logistic regression are special cases of generalized linear models.** - -⟶ - -
- -**40. Support Vector Machines** - -⟶ - -
- -**41: The goal of support vector machines is to find the line that maximizes the minimum distance to the line.** - -⟶ - -
- -**42: Optimal margin classifier ― The optimal margin classifier h is such that:** - -⟶ - -
- -**43: where (w,b)∈Rn×R is the solution of the following optimization problem:** - -⟶ - -
- -**44. such that** - -⟶ - -
- -**45. support vectors** - -⟶ - -
- -**46. Remark: the line is defined as wTx−b=0.** - -⟶ - -
- -**47. Hinge loss ― The hinge loss is used in the setting of SVMs and is defined as follows:** - -⟶ - -
- -**48. Kernel ― Given a feature mapping ϕ, we define the kernel K to be defined as:** - -⟶ - -
- -**49. In practice, the kernel K defined by K(x,z)=exp(−||x−z||22σ2) is called the Gaussian kernel and is commonly used.** - -⟶ - -
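A small sketch of the Gaussian (RBF) kernel above, assuming NumPy; the inputs and the value of σ are arbitrary.

```python
import numpy as np

def gaussian_kernel(x, z, sigma=1.0):
    """K(x, z) = exp(-||x - z||^2 / (2 * sigma^2))."""
    diff = np.asarray(x, dtype=float) - np.asarray(z, dtype=float)
    return np.exp(-(diff @ diff) / (2.0 * sigma ** 2))

print(gaussian_kernel([0.0, 0.0], [1.0, 1.0], sigma=1.0))   # exp(-1) ~ 0.37
```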
- -**50. [Non-linear separability, Use of a kernel mapping, Decision boundary in the original space]** - -⟶ - -
- -**51. Remark: we say that we use the "kernel trick" to compute the cost function using the kernel because we actually don't need to know the explicit mapping ϕ, which is often very complicated. Instead, only the values K(x,z) are needed.** - -⟶ - -
- -**52. Lagrangian ― We define the Lagrangian L(w,b) as follows:** - -⟶ - -
- -**53. Remark: the coefficients βi are called the Lagrange multipliers.** - -⟶ - -
- -**54. Generative Learning** - -⟶ - -
- -**55. A generative model first tries to learn how the data is generated by estimating P(x|y), which we can then use to estimate P(y|x) by using Bayes' rule.** - -⟶ - -
- -**56. Gaussian Discriminant Analysis** - -⟶ - -
- -**57. Setting ― The Gaussian Discriminant Analysis assumes that y and x|y=0 and x|y=1 are such that:** - -⟶ - -
- -**58. Estimation ― The following table sums up the estimates that we find when maximizing the likelihood:** - -⟶ - -
- -**59. Naive Bayes** - -⟶ - -
- -**60. Assumption ― The Naive Bayes model supposes that the features of each data point are all independent:** - -⟶ - -
- -**61. Solutions ― Maximizing the log-likelihood gives the following solutions, with k∈{0,1},l∈[[1,L]]** - -⟶ - -
- -**62. Remark: Naive Bayes is widely used for text classification and spam detection.** - -⟶ - -
- -**63. Tree-based and ensemble methods** - -⟶ - -
- -**64. These methods can be used for both regression and classification problems.** - -⟶ - -
- -**65. CART ― Classification and Regression Trees (CART), commonly known as decision trees, can be represented as binary trees. They have the advantage to be very interpretable.** - -⟶ - -
- -**66. Random forest ― It is a tree-based technique that uses a high number of decision trees built out of randomly selected sets of features. Contrary to the simple decision tree, it is highly uninterpretable but its generally good performance makes it a popular algorithm.** - -⟶ - -
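For a concrete feel, a short sketch assuming scikit-learn is available: a single shallow decision tree next to a random forest on synthetic data. The dataset, tree depth and number of trees are illustrative choices only, not recommendations from the cheatsheet.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

tree = DecisionTreeClassifier(max_depth=3).fit(X_tr, y_tr)               # one interpretable CART
forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)

print(tree.score(X_te, y_te), forest.score(X_te, y_te))                  # forest usually scores higher
```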
- -**67. Remark: random forests are a type of ensemble methods.** - -⟶ - -
- -**68. Boosting ― The idea of boosting methods is to combine several weak learners to form a stronger one. The main ones are summed up in the table below:** - -⟶ - -
- -**69. [Adaptive boosting, Gradient boosting]** - -⟶ - -
- -**70. High weights are put on errors to improve at the next boosting step** - -⟶ - -
- -**71. Weak learners trained on remaining errors** - -⟶ - -
- -**72. Other non-parametric approaches** - -⟶ - -
- -**73. k-nearest neighbors ― The k-nearest neighbors algorithm, commonly known as k-NN, is a non-parametric approach where the response of a data point is determined by the nature of its k neighbors from the training set. It can be used in both classification and regression settings.** - -⟶ - -
- -**74. Remark: The higher the parameter k, the higher the bias, and the lower the parameter k, the higher the variance.** - -⟶ - -
- -**75. Learning Theory** - -⟶ - -
- -**76. Union bound ― Let A1,...,Ak be k events. We have:** - -⟶ - -
- -**77. Hoeffding inequality ― Let Z1,..,Zm be m iid variables drawn from a Bernoulli distribution of parameter ϕ. Let ˆϕ be their sample mean and γ>0 fixed. We have:** - -⟶ - -
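A quick numerical check of the bound, assuming NumPy; φ, m, γ and the number of simulated samples are arbitrary choices for the example.

```python
import numpy as np

phi, m, gamma = 0.3, 500, 0.05
rng = np.random.default_rng(0)

# Empirical probability that the sample mean deviates from phi by more than gamma
sample_means = rng.binomial(1, phi, size=(20_000, m)).mean(axis=1)
empirical = np.mean(np.abs(sample_means - phi) > gamma)

hoeffding_bound = 2 * np.exp(-2 * gamma ** 2 * m)
print(empirical, hoeffding_bound)   # empirical (~0.015) stays below the bound (~0.16)
```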
- -**78. Remark: this inequality is also known as the Chernoff bound.** - -⟶ - -
- -**79. Training error ― For a given classifier h, we define the training error ˆϵ(h), also known as the empirical risk or empirical error, to be as follows:** - -⟶ - -
- -**80. Probably Approximately Correct (PAC) ― PAC is a framework under which numerous results on learning theory were proved, and has the following set of assumptions: ** - -⟶ - -
- -**81: the training and testing sets follow the same distribution ** - -⟶ - -
- -**82. the training examples are drawn independently** - -⟶ - -
- -**83. Shattering ― Given a set S={x(1),...,x(d)}, and a set of classifiers H, we say that H shatters S if for any set of labels {y(1),...,y(d)}, we have:** - -⟶ - -
- -**84. Upper bound theorem ― Let H be a finite hypothesis class such that |H|=k and let δ and the sample size m be fixed. Then, with probability of at least 1−δ, we have:** - -⟶ - -
- -**85. VC dimension ― The Vapnik-Chervonenkis (VC) dimension of a given infinite hypothesis class H, noted VC(H) is the size of the largest set that is shattered by H.** - -⟶ - -
- -**86. Remark: the VC dimension of H={set of linear classifiers in 2 dimensions} is 3.** - -⟶ - -
- -**87. Theorem (Vapnik) ― Let H be given, with VC(H)=d and m the number of training examples. With probability at least 1−δ, we have:** - -⟶ - -
- -**88. [Introduction, Type of prediction, Type of model]** - -⟶ - -
- -**89. [Notations and general concepts, loss function, gradient descent, likelihood]** - -⟶ - -
- -**90. [Linear models, linear regression, logistic regression, generalized linear models]** - -⟶ - -
- -**91. [Support vector machines, Optimal margin classifier, Hinge loss, Kernel]** - -⟶ - -
- -**92. [Generative learning, Gaussian Discriminant Analysis, Naive Bayes]** - -⟶ - -
- -**93. [Trees and ensemble methods, CART, Random forest, Boosting]** - -⟶ - -
- -**94. [Other methods, k-NN]** - -⟶ - -
- -**95. [Learning theory, Hoeffding inequality, PAC, VC dimension]** - -⟶ diff --git a/ar/cs-229-deep-learning.md b/ar/cs-229-deep-learning.md new file mode 100644 index 000000000..d4cf59da6 --- /dev/null +++ b/ar/cs-229-deep-learning.md @@ -0,0 +1,323 @@ + +**1. Deep Learning cheatsheet** + +⟶ +ملخص مختصر التعلم العميق +
+ +**2. Neural Networks** + +⟶ +الشبكة العصبونية الاصطناعية(Neural Networks) +
+**3. Neural networks are a class of models that are built with layers. Commonly used types of neural networks include convolutional and recurrent neural networks.** + +⟶ +الشبكة العصبونية الاصطناعية هي عبارة عن نوع من النماذج يبنى من عدة طبقات، وأكثر هذه الأنواع استخداماً هي الشبكات الالتفافية والشبكات العصبونية المتكررة. +
+ +**4. Architecture ― The vocabulary around neural networks architectures is described in the figure below:** + +⟶ +البنية - المصطلحات حول بنية الشبكة العصبونية موضح في الشكل ادناة +
+ +**5. [Input layer, hidden layer, output layer]** + +⟶ +[طبقة ادخال, طبقة مخفية, طبقة اخراج ] +
+ +**6. By noting i the ith layer of the network and j the jth hidden unit of the layer, we have:** + +⟶ +عبر تدوين i كالطبقة رقم i و j للدلالة على رقم الوحده الخفية في تلك الطبقة , نحصل على: +
+ +**7. where we note w, b, z the weight, bias and output respectively.** + +⟶ +حيث نعرف w, b, z كالوزن , و معامل التعديل , و الناتج حسب الترتيب. +
+ +**8. Activation function ― Activation functions are used at the end of a hidden unit to introduce non-linear complexities to the model. Here are the most common ones:** + +⟶ +دالة التفعيل(Activation function) - دالة التفعيل تستخدم في نهاية الوحده الخفية لتضمن المكونات الغير خطية للنموذج. هنا بعض دوال التفعيل الشائعة +
+ +**9. [Sigmoid, Tanh, ReLU, Leaky ReLU]** + +⟶ +[Sigmoid, Tanh, ReLU, Leaky ReLU] +
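Minimal NumPy definitions of the four activation functions listed above; the leaky-ReLU slope ε = 0.01 is a common but assumed default.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def tanh(z):
    return np.tanh(z)

def relu(z):
    return np.maximum(0.0, z)

def leaky_relu(z, eps=0.01):
    return np.where(z > 0, z, eps * z)

z = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
for f in (sigmoid, tanh, relu, leaky_relu):
    print(f.__name__, f(z))
```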
+ +**10. Cross-entropy loss ― In the context of neural networks, the cross-entropy loss L(z,y) is commonly used and is defined as follows:** + +⟶ +دالة الانتروبيا التقاطعية للخسارة(Cross-entropy loss) - في سياق الشبكات العصبونية, دالة الأنتروبيا L(z,y) تستخدم و تعرف كالاتي: +
+ +**11. Learning rate ― The learning rate, often noted α or sometimes η, indicates at which pace the weights get updated. This can be fixed or adaptively changed. The current most popular method is called Adam, which is a method that adapts the learning rate.** + +⟶ +معدل التعلم (Learning rate) - معدل التعلم، ويرمز له بـ α أو أحياناً η، يدل على وتيرة تحديث الأوزان. يمكن تثبيته أو تغييره بشكل تكيفي. أكثر الطرق شيوعاً حالياً تسمى Adam، وهي طريقة تجعل معدل التعلم يتكيف أثناء التدريب. +
+ +**12. Backpropagation ― Backpropagation is a method to update the weights in the neural network by taking into account the actual output and the desired output. The derivative with respect to weight w is computed using chain rule and is of the following form:** + +⟶ +التغذية الخلفية(Backpropagation) - التغذية الخلفية هي طريقة لتحديث الاوزان في الشبكة العصبونية عبر اعتبار القيم الحقيقة للناتج مع القيمة المطلوبة للخرج. المشتقة بالنسبة للوزن w يتم حسابها باستخدام قاعدة التسلسل و تكون عبر الشكل الاتي: +
+ +**13. As a result, the weight is updated as follows:** + +⟶ +كنتيجة , الوزن سيتم تحديثة كالتالي: +
+ +**14. Updating weights ― In a neural network, weights are updated as follows:** + +⟶ +تحديث الاوزان - في الشبكات العصبونية , يتم تحديث الاوزان كما يلي: +
+ +**15. Step 1: Take a batch of training data.** + +⟶ +الخطوة 1: خذ حزمة من بيانات التدريب +
+ +**16. Step 2: Perform forward propagation to obtain the corresponding loss.** + +⟶ +الخطوة 2: قم بعملية التغذيه الامامية لحساب الخسارة الناتجة +
+ +**17. Step 3: Backpropagate the loss to get the gradients.** + +⟶ +الخطوة 3: قم بتغذية خلفية للخساره للحصول على دالة الانحدار +
+ +**18. Step 4: Use the gradients to update the weights of the network.** + +⟶ +الخطوة 4: استخدم قيم الانحدار لتحديث اوزان الشبكة +
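A toy end-to-end illustration of steps 1-4 for a single-layer network with a sigmoid output and cross-entropy loss, assuming NumPy; the data, learning rate and epoch count are made up for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(32, 4))                    # step 1: take a batch of training data
y = (X[:, 0] + X[:, 1] > 0).astype(float)
W, b, alpha = np.zeros(4), 0.0, 0.5

for _ in range(100):
    z = X @ W + b                               # step 2: forward propagation
    p = 1.0 / (1.0 + np.exp(-z))
    p_c = np.clip(p, 1e-12, 1 - 1e-12)
    loss = -np.mean(y * np.log(p_c) + (1 - y) * np.log(1 - p_c))   # cross-entropy loss
    dz = (p - y) / len(y)                       # step 3: backpropagate the loss
    dW, db = X.T @ dz, dz.sum()
    W -= alpha * dW                             # step 4: update the weights
    b -= alpha * db

print(loss)   # should be close to 0 after training
```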
+ +**19. Dropout ― Dropout is a technique meant at preventing overfitting the training data by dropping out units in a neural network. In practice, neurons are either dropped with probability p or kept with probability 1−p** + +⟶ +الاسقاط(Dropout) - الاسقاط هي طريقة الغرض منها منع التكيف الزائد للنموذج في بيانات التدريب عبر اسقاط بعض الواحدات في الشبكة العصبونية, العصبونات يتم اما اسقاطها باحتمالية p او الحفاظ عليها باحتمالية 1-p. +
+ +**20. Convolutional Neural Networks** + +⟶ +الشبكات العصبونية الالتفافية(CNN) +
+ +**21. Convolutional layer requirement ― By noting W the input volume size, F the size of the convolutional layer neurons, P the amount of zero padding, then the number of neurons N that fit in a given volume is such that:** + +⟶ +احتياج الطبقة الالتفافية - عبر رمز w لحجم المدخل , F حجم العصبونات للطبقة الالتفافية , P عدد الحشوات الصفرية , فأن N عدد العصبونات لكل حجم معطى يحسب عبر الاتي: +
+ +**22. Batch normalization ― It is a step of hyperparameter γ,β that normalizes the batch {xi}. By noting μB,σ2B the mean and variance of that we want to correct to the batch, it is done as follows:** + +⟶ +تنظيم الحزمة(Batch normalization) - هي خطوه من قيم التحسين الخاصة γ,β والتي تعدل الحزمة {xi}. لنجعل μB,σ2B المتوسط و الانحراف للحزمة المعنية و نريد تصحيح هذه الحزمة, يتم ذلك كالتالي: +
+ +**23. It is usually done after a fully connected/convolutional layer and before a non-linearity layer and aims at allowing higher learning rates and reducing the strong dependence on initialization.** + +⟶ +في الغالب تتم بعد الطبقة الالتفافية أو المتصلة كليا و قبل طبقة التغيرات الغير خطية و تهدف للسماح للسرعات التعليم العالية للتقليل من الاعتمادية القوية للقيم الاولية. + + +
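A sketch of the normalization step itself, assuming NumPy; γ and β are shown here as fixed arrays, whereas in a real network they are learned parameters.

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    """Normalize the batch with its own mean/variance, then rescale with gamma and shift with beta."""
    mu = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mu) / np.sqrt(var + eps)
    return gamma * x_hat + beta

x = np.random.default_rng(0).normal(loc=5.0, scale=3.0, size=(64, 8))
out = batch_norm(x, gamma=np.ones(8), beta=np.zeros(8))
print(out.mean(axis=0).round(3), out.std(axis=0).round(3))   # ~0 mean and ~1 std per feature
```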
+ +**24. Recurrent Neural Networks** + +⟶ +(RNN)الشبكات العصبونية التكرارية +
+ +**25. Types of gates ― Here are the different types of gates that we encounter in a typical recurrent neural network:** + +⟶ +انواع البوابات - هنا الانواع المختلفة التي ممكن مواجهتها في الشبكة العصبونية الاعتيادية: +
+ +**26. [Input gate, forget gate, gate, output gate]** + +⟶ +[بوابة ادخال, بوابة نسيان, بوابة منفذ, بوابة اخراج ] +
+ +**27. [Write to cell or not?, Erase a cell or not?, How much to write to cell?, How much to reveal cell?]** + +⟶ +[كتابة ام عدم كتابة الى الخلية؟, مسح ام عدم مسح الخلية؟, كمية الكتابة الى الخلية ؟ , مدى الافصاح عن الخلية ؟ ] +
+ +**28. LSTM ― A long short-term memory (LSTM) network is a type of RNN model that avoids the vanishing gradient problem by adding 'forget' gates.** + +⟶ +LSTM - ذاكرة طويلة قصير الامد (long short-term memory) هي نوع من نموذج ال RNN تستخدم لتجنب مشكلة اختفاء الانحدار عبر اضافة بوابات النسيان. +
+ +**29. Reinforcement Learning and Control** + +⟶ +التعلم و التحكم المعزز(Reinforcement Learning) +
+ +**30. The goal of reinforcement learning is for an agent to learn how to evolve in an environment.** + +⟶ +الهدف من التعلم المعزز للعميل الذكي هو التعلم لكيفية التأقلم في اي بيئة. +
+ +**31. Definitions** + +⟶ +تعريفات +
+ +**32. Markov decision processes ― A Markov decision process (MDP) is a 5-tuple (S,A,{Psa},γ,R) where:** + +⟶ +عملية ماركوف لاتخاذ القرار - عملية ماركوف لاتخاذ القرار هي سلسلة خماسية (S,A,{Psa},γ,R) حيث + +
+**33. S is the set of states** + +⟶ + S هي مجموعة من حالات البيئة +
+ +**34. A is the set of actions** + +⟶ +A هي مجموعة من حالات الاجراءات +
+**35. {Psa} are the state transition probabilities for s∈S and a∈A** + +⟶ +{Psa} هو حالة احتمال الانتقال من الحالة s∈S و a∈A +
+ +**36. γ∈[0,1[ is the discount factor** + +⟶ +γ∈[0,1[ هي عامل الخصم +
+ +**37. R:S×A⟶R or R:S⟶R is the reward function that the algorithm wants to maximize** + +⟶ +R:S×A⟶R or R:S⟶R هي دالة المكافأة والتي تعمل الخوارزمية على جعلها اعلى قيمة +
+ +**38. Policy ― A policy π is a function π:S⟶A that maps states to actions.** + +⟶ +دالة القواعد - دالة القواعد π:S⟶A هي التي تقوم بترجمة الحالات الى اجراءات. +
+ +**39. Remark: we say that we execute a given policy π if given a state s we take the action a=π(s).** + +⟶ +ملاحظة: نقول ان النموذج ينفذ القاعدة المعينه π للحالة المعطاة s ان نتخذ الاجراءa=π(s). +
+ +**40. Value function ― For a given policy π and a given state s, we define the value function Vπ as follows:** + +⟶ +دالة القاعدة - لاي قاعدة معطاة π و حالة s, نقوم بتعريف دالة القيمة Vπ كما يلي: +
+ +**41. Bellman equation ― The optimal Bellman equations characterizes the value function Vπ∗ of the optimal policy π∗:** + +⟶ +معادلة بيلمان - معادلات بيلمان المثلى تشخص دالة القيمة دالة القيمة Vπ∗ π∗:للقاعدة المثلى +
+ +**42. Remark: we note that the optimal policy π∗ for a given state s is such that:** + +⟶ + π∗ للحالة المعطاه s تعطى كاالتالي: ملاحظة: نلاحظ ان القاعدة المثلى +
+ +**43. Value iteration algorithm ― The value iteration algorithm is in two steps:** + +⟶ +خوارزمية تكرار القيمة(Value iteration algorithm) - خوارزمية تكرار القيمة تكون في خطوتين: +
+ +**44. 1) We initialize the value:** + +⟶ + 1) نقوم بوضع قيمة اولية: +
+ +**45. 2) We iterate the value based on the values before:** + +⟶ +2) نقوم بتكرير القيمة حسب القيم السابقة: + +
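A compact sketch of the two steps above on a made-up 2-state, 2-action MDP, assuming NumPy; the transition probabilities, rewards and γ are invented for illustration.

```python
import numpy as np

P = np.array([[[0.8, 0.2], [0.1, 0.9]],     # P[s, a] = transition distribution over next states
              [[0.9, 0.1], [0.2, 0.8]]])
R = np.array([[1.0, 0.0], [0.0, 2.0]])      # R[s, a]
gamma = 0.9

V = np.zeros(2)                              # 1) initialize the value
for _ in range(200):                         # 2) iterate using the previous values
    V = np.max(R + gamma * P @ V, axis=1)    # V(s) <- max_a [R(s,a) + gamma * sum_s' P_sa(s') V(s')]

policy = np.argmax(R + gamma * P @ V, axis=1)
print(V, policy)
```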
+**46. Maximum likelihood estimate ― The maximum likelihood estimates for the state transition probabilities are as follows:** + +⟶ +تقدير الامكانية القصوى - تقديرات الامكانية القصوى (تقدير الاحتمال الأرجح) لحتماليات انتقال الحالة تكون كما يلي : +
+ +**47. times took action a in state s and got to s′** + +⟶ +اوقات تنفيذ الاجراء a في الحالة s و انتقلت الى s' + +
+**48. times took action a in state s** + +⟶ +اوقات تنفيذ الاجراء a في الحالة s +
+ +**49. Q-learning ― Q-learning is a model-free estimation of Q, which is done as follows:** + +⟶ +التعلم-Q (Q-learning) -هي طريقة غير منمذجة لتقدير Q , و تتم كالاتي: +
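A minimal tabular sketch of the Q-learning update, assuming NumPy; the state/action sizes and the sampled transition are placeholders.

```python
import numpy as np

def q_learning_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.9):
    """One step: Q(s,a) <- Q(s,a) + alpha * [r + gamma * max_a' Q(s',a') - Q(s,a)]."""
    td_target = r + gamma * np.max(Q[s_next])
    Q[s, a] += alpha * (td_target - Q[s, a])
    return Q

Q = np.zeros((2, 2))                          # states x actions
Q = q_learning_update(Q, s=0, a=1, r=1.0, s_next=1)
print(Q)
```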
+**50. View PDF version on GitHub** + +⟶ +قم باستعراض نسخة ال PDF على GitHub +
+ +**51. [Neural Networks, Architecture, Activation function, Backpropagation, Dropout]** + +⟶ + [شبكات عصبونية, البنية , دالة التفعيل , التغذية الخلفية , الاسقاط ] +
+ +**52. [Convolutional Neural Networks, Convolutional layer, Batch normalization]** + +⟶ +[ الشبكة العصبونية الالتفافية , طبقة التفافية , تنظيم الحزمة ] +
+ +**53. [Recurrent Neural Networks, Gates, LSTM]** + +⟶ +[الشبكة العصبونية التكرارية , البوابات , LSTM] +
+ +**54. [Reinforcement learning, Markov decision processes, Value/policy iteration, Approximate dynamic programming, Policy search]** + +⟶ +[التعلم المعزز , عملية ماركوف لاتخاذ القرار , تكرير القيمة / القاعدة , بحث القاعدة] diff --git a/ar/cs-229-linear-algebra.md b/ar/cs-229-linear-algebra.md new file mode 100644 index 000000000..d0e88a543 --- /dev/null +++ b/ar/cs-229-linear-algebra.md @@ -0,0 +1,413 @@ +**1. Linear Algebra and Calculus refresher** + +
+ملخص الجبر الخطي و التفاضل و التكامل +
+
+ +**2. General notations** +
+الرموز العامة +
+ +
+ +**3. Definitions** + +
+التعريفات +
+ +
+ +**4. Vector ― We note x∈Rn a vector with n entries, where xi∈R is the ith entry:** +
+ متجه (vector) - نرمز ل $x \in \mathbb{R^n}$ متجه يحتوي على $n$ مدخلات، حيث $x_i \in \mathbb{R}$ يعتبر المدخل رقم $i$ . +
+
+ +**5. Matrix ― We note A∈Rm×n a matrix with m rows and n columns, where Ai,j∈R is the entry located in the ith row and jth column:** + +
 مصفوفة (Matrix) - نرمز ل $A \in \mathbb{R}^{m\times n}$ مصفوفة تحتوي على $m$ صفوف و $n$ أعمدة، حيث $A_{i,j}$ يرمز للمدخل في الصف $i$ و العمود $j$ +
+ +
+ +**6. Remark: the vector x defined above can be viewed as a n×1 matrix and is more particularly called a column-vector.** +
+ملاحظة : المتجه $x$ المعرف مسبقا يمكن اعتباره مصفوفة من الشكل $n \times 1$ والذي يسمى ب مصفوفة من عمود واحد. +
+ +
+ +**7. Main matrices** + +
+المصفوفات الأساسية +
+
+ +**8. Identity matrix ― The identity matrix I∈Rn×n is a square matrix with ones in its diagonal and zero everywhere else:** +
 مصفوفة الوحدة (Identity) - مصفوفة الوحدة $I \in \mathbb{R}^{n\times n}$ تعتبر مصفوفة مربعة تحتوي على المدخل 1 في قطر المصفوفة و 0 في بقية المدخلات: +
+
+ +**9. Remark: for all matrices A∈Rn×n, we have A×I=I×A=A.** + +
+ملاحظة : جميع المصفوفات من الشكل $A \in \mathbb{R}^{n\times n}$ فإن $A \times I = I \times A = A$. +
+
+ +**10. Diagonal matrix ― A diagonal matrix D∈Rn×n is a square matrix with nonzero values in its diagonal and zero everywhere else:** +
+مصفوفة قطرية (diagonal) - المصفوفة القطرية هي مصفوفة من الشكل + $D \in \mathbb{R}^{n\times n}$ حيث أن جميع العناصر الواقعة خارج القطر الرئيسي تساوي الصفر والعناصر على القطر الرئيسي تحتوي أعداد لاتساوي الصفر. +
+
+ +**11. Remark: we also note D as diag(d1,...,dn).** + +
+ملاحظة: نرمز كذلك ل $D$ ب $\text{diag}(d_1, \dots, d_n)$. +
+
+ +**12. Matrix operations** + +
+ عمليات المصفوفات +
+ +
+ +**13. Multiplication** + +
+ الضرب +
+ +
+ +**14. Vector-vector ― There are two types of vector-vector products:** + +
+ ضرب المتجهات - توجد طريقتين لضرب متجه بمتجه : +
+ +
+ +**15. inner product: for x,y∈Rn, we have:** + +
+ ضرب داخلي (inner product): ل $x,y \in \mathbb{R}^n$ نستنتج : +
+ +
+ +**16. outer product: for x∈Rm,y∈Rn, we have:** + +
 ضرب خارجي (outer product): ل $x \in \mathbb{R}^m, y \in \mathbb{R}^n$ نستنتج : +
+ +
+ +**17. Matrix-vector ― The product of matrix A∈Rm×n and vector x∈Rn is a vector of size Rn, such that:** + +
 مصفوفة - متجه : ضرب المصفوفة $A \in \mathbb{R}^{m\times n}$ والمتجه $x \in \mathbb{R}^n$ ينتج عنه متجه من الشكل $\mathbb{R}^n$ حيث : +
+
+ +**18. where aTr,i are the vector rows and ac,j are the vector columns of A, and xi are the entries of x.** + +
+ حيث $a^{T}_{r,i}$ يعتبر متجه الصفوف و $a_{c,j}$ يعتبر متجه الأعمدة ل $A$ كذلك $x_i$ يرمز لعناصر $x$. +
+ +
+ +**19. Matrix-matrix ― The product of matrices A∈Rm×n and B∈Rn×p is a matrix of size Rn×p, such that:** + +
 ضرب مصفوفة ومصفوفة - ضرب المصفوفة $A \in \mathbb{R}^{m \times n}$ و $B \in \mathbb{R}^{n \times p}$ ينتج عنه مصفوفة من الشكل $\mathbb{R}^{n \times p}$ حيث أن : +
+ +
+ +**20. where aTr,i,bTr,i are the vector rows and ac,j,bc,j are the vector columns of A and B respectively** + +
+حيث $a^T_{r, i}$ و $b^T_{r, i}$ يعتبر متجه الصفوف، و $a_{c, j}$ و $b_{c, j}$ متجه الأعمدة ل $A$ و $B$ على التوالي. +
+ +
+ +**21. Other operations** + +
+ عمليات أخرى +
+ +
+ +**22. Transpose ― The transpose of a matrix A∈Rm×n, noted AT, is such that its entries are flipped:** + +
+ المنقول (Transpose) - منقول المصفوفة$A \in \mathbb{R}^{m \times n}$ يرمز له ب $A^T$ حيث الصفوف يتم تبديلها مع الأعمدة : +
+ +
+ +**23. Remark: for matrices A,B, we have (AB)T=BTAT** + +
+ ملاحظة: لأي مصفوفتين $A$ و $B$، نستنتج $(AB)^T = B^T A^T$. +
+
+ +**24. Inverse ― The inverse of an invertible square matrix A is noted A−1 and is the only matrix such that:** + +
+ المعكوس (Inverse)- معكوس أي مصفوفة $A$ قابلة للعكس (Invertible) يرمز له ب $A^{-1}$ ويعتبر المعكوس المصفوفة الوحيدة التي لديها الخاصية التالية : +
+
+ +**25. Remark: not all square matrices are invertible. Also, for matrices A,B, we have (AB)−1=B−1A−1** + +
+ملاحظة: ليس جميع المصفوفات يمكن إيجاد معكوس لها. كذلك لأي مصفوفتين $A$ و $B$ نستنتج $(AB)^{-1} = B^{-1} A^{-1}$. +
+ +
+ +**26. Trace ― The trace of a square matrix A, noted tr(A), is the sum of its diagonal entries:** + +
+أثر المصفوفة (Trace) - أثر أي مصفوفة مربعة $A$ يرمز له ب $tr(A)$ يعتبر مجموع العناصر التي في القطر: +
+
+ +**27. Remark: for matrices A,B, we have tr(AT)=tr(A) and tr(AB)=tr(BA)** + +
+ ملاحظة : لأي مصفوفتين $A$ و $B$ لدينا $tr(A^T) = tr(A)$ و $tr(AB) = tr(BA)$. +
+
+ +**28. Determinant ― The determinant of a square matrix A∈Rn×n, noted |A| or det(A) is expressed recursively in terms of A∖i,∖j, which is the matrix A without its ith row and jth column, as follows:** + +
+المحدد (Determinant) - المحدد لأي مصفوفة مربعة من الشكل $A \in \mathbb{R}^{n \times n}$ يرمز له ب $|A|$ او $\det(A)$ يتم تعريفه بإستخدام $A_{\setminus i, \setminus j}$ والذي يعتبر المصفوفة $A$ مع حذف الصف $i$ والعمود $j$ كالتالي : +
+
+ +**29. Remark: A is invertible if and only if |A|≠0. Also, |AB|=|A||B| and |AT|=|A|.** + +
 ملاحظة: $A$ يكون لديه معكوس إذا وفقط إذا $|A| \neq 0$. كذلك $|A B| = |A| |B|$ و $|A^T| = |A|$. +
+
+ +**30. Matrix properties** + +
+خواص المصفوفات +
+
+ +**31. Definitions** + +
+التعريفات +
+
+ +**32. Symmetric decomposition ― A given matrix A can be expressed in terms of its symmetric and antisymmetric parts as follows:** + +
+ التفكيك المتماثل (Symmetric Decomposition)- المصفوفة $A$ يمكن التعبير عنها بإستخدام جزئين مثماثل (Symmetric) وغير متماثل(Antisymmetric) كالتالي : +
+
+ +**33. [Symmetric, Antisymmetric]** + +
+[متماثل، غير متماثل] +
+ +
+ +**34. Norm ― A norm is a function N:V⟶[0,+∞[ where V is a vector space, and such that for all x,y∈V, we have:** + +
+المعيار (Norm) - المعيار يعتبر دالة $N: V \to [0, +\infty)$ حيث $V$ يعتبر فضاء متجه (Vector Space)، حيث أن لكل $x,y \in V$ لدينا : +
+
+ +**35. N(ax)=|a|N(x) for a scalar** + +
+لأي عدد $a$ فإن $N(ax) = |a| N(x)$ +
+
+ +**36. if N(x)=0, then x=0** + +
+$N(x) =0 \implies x = 0$ +
+
+ +**37. For x∈V, the most commonly used norms are summed up in the table below:** + +
+لأي $x \in V$ المعايير الأكثر إستخداماً ملخصة في الجدول التالي: +
+
+ +**38. [Norm, Notation, Definition, Use case]** + +
+[المعيار، الرمز، التعريف، مثال للإستخدام] +
+
+ +**39. Linearly dependence ― A set of vectors is said to be linearly dependent if one of the vectors in the set can be defined as a linear combination of the others.** + +
 الارتباط الخطي (Linear Dependence): مجموعة المتجهات تعتبر مرتبطة خطياً إذا أمكن كتابة أحد المتجهات فيها كتركيبة خطية من المتجهات الأخرى. +
+
+ +**40. Remark: if no vector can be written this way, then the vectors are said to be linearly independent** + +
+ملاحظة: إذا لم يتحقق هذا الشرط فإنها تسمى مستقلة خطياً . +
+
+ +**41. Matrix rank ― The rank of a given matrix A is noted rank(A) and is the dimension of the vector space generated by its columns. This is equivalent to the maximum number of linearly independent columns of A.** + +
 رتبة المصفوفة (Rank) - رتبة المصفوفة $A$ يرمز له ب $\text{rank}(A)$ وهو يصف حجم الفضاء المتجهي الذي نتج من أعمدة المصفوفة. يمكن وصفه كذلك بأقصى عدد من أعمدة المصفوفة $A$ التي تمتلك خاصية أنها مستقلة خطياً. +
+
+ +**42. Positive semi-definite matrix ― A matrix A∈Rn×n is positive semi-definite (PSD) and is noted A⪰0 if we have:** + +
 مصفوفة شبه معرفة موجبة (Positive semi-definite) - المصفوفة $A \in \mathbb{R}^{n \times n}$ تعتبر مصفوفة شبه معرفة موجبة (PSD) ويرمز لها بالرمز $A \succeq 0$ إذا : +
+
+ +**43. Remark: similarly, a matrix A is said to be positive definite, and is noted A≻0, if it is a PSD matrix which satisfies for all non-zero vector x, xTAx>0.** + +
+ ملاحظة: المصفوفة $A$ تعتبر مصفوفة معرفة موجبة إذا $A \succ 0 $ وهي تعتبر مصفوفة (PSD) والتي تستوفي الشرط : لكل متجه غير الصفر $x$ حيث $x^TAx>0 $. +
+
+ +**44. Eigenvalue, eigenvector ― Given a matrix A∈Rn×n, λ is said to be an eigenvalue of A if there exists a vector z∈Rn∖{0}, called eigenvector, such that we have:** + +
 القيم الذاتية (eigenvalue), المتجه الذاتي (eigenvector) - إذا كان لدينا مصفوفة $A \in \mathbb{R}^{n \times n}$، القيمة $\lambda$ تعتبر قيمة ذاتية للمصفوفة $A$ إذا وجد متجه $z \in \mathbb{R}^n \setminus \{0\}$ يسمى متجه ذاتي حيث أن : +
+
+ +**45. Spectral theorem ― Let A∈Rn×n. If A is symmetric, then A is diagonalizable by a real orthogonal matrix U∈Rn×n. By noting Λ=diag(λ1,...,λn), we have:** + +
 النظرية الطيفية (spectral theorem) - نفرض $A \in \mathbb{R}^{n \times n}$، إذا كانت المصفوفة $A$ متماثلة فإنه يمكن تحويلها إلى مصفوفة قطرية بإستخدام مصفوفة متعامدة حقيقية (orthogonal) $U \in \mathbb{R}^{n \times n}$. وبترميز $\Lambda = \text{diag}(\lambda_1, \dots, \lambda_n)$ يكون لدينا: +
+
+ +**46. diagonal** + +
+ قطرية +
+
+ +**47. Singular-value decomposition ― For a given matrix A of dimensions m×n, the singular-value decomposition (SVD) is a factorization technique that guarantees the existence of U m×m unitary, Σ m×n diagonal and V n×n unitary matrices, such that:** + +
 مجزئ القيمة المفردة (singular value decomposition) : لأي مصفوفة $A$ من الشكل $m\times n$ ، تفكيك القيمة المنفردة (SVD) يعتبر طريقة تحليل تضمن وجود $U \in \mathbb{R}^{m \times m}$ , مصفوفة قطرية $\Sigma \in \mathbb{R}^{m \times n}$ و $V \in \mathbb{R}^{n \times n}$ حيث أن : +
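A quick numerical check of both factorizations (spectral decomposition of a symmetric matrix, and SVD of a rectangular one) using NumPy, which is an assumption of this sketch; the random matrices are for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Spectral theorem: a symmetric A is diagonalized by an orthogonal U, A = U diag(lambda) U^T
S = rng.normal(size=(4, 4))
A_sym = (S + S.T) / 2
eigvals, U = np.linalg.eigh(A_sym)
print(np.allclose(U @ np.diag(eigvals) @ U.T, A_sym))   # True

# SVD: any m x n matrix factors as A = U Sigma V^T
A = rng.normal(size=(5, 3))
U2, s, Vt = np.linalg.svd(A, full_matrices=True)
Sigma = np.zeros((5, 3))
Sigma[:3, :3] = np.diag(s)
print(np.allclose(U2 @ Sigma @ Vt, A))                   # True
```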
+
+ +**48. Matrix calculus** + +
+ حساب المصفوفات +
+
+ +**49. Gradient ― Let f:Rm×n→R be a function and A∈Rm×n be a matrix. The gradient of f with respect to A is a m×n matrix, noted ∇Af(A), such that:** + +
 المشتقة في فضاءات عالية (gradient) - افترض $f: \mathbb{R}^{m \times n} \rightarrow \mathbb{R}$ تعتبر دالة و $A \in \mathbb{R}^{m \times n}$ تعتبر مصفوفة. المشتقة العليا ل $f$ بالنسبة ل $A$ تعتبر مصفوفة من الشكل $m \times n$ يرمز لها $\nabla_A f(A)$ حيث أن: +
+
+ +**50. Remark: the gradient of f is only defined when f is a function that returns a scalar.** + +
+ملاحظة : المشتقة العليا معرفة فقط إذا كانت الدالة $f$ لديها مدى ضمن الأعداد الحقيقية. +
+
+ +**51. Hessian ― Let f:Rn→R be a function and x∈Rn be a vector. The hessian of f with respect to x is a n×n symmetric matrix, noted ∇2xf(x), such that:** + +
+هيشيان (Hessian) - افترض $f: \mathbb{R}^n \rightarrow \mathbb{R}$ تعتبر دالة و $x \in \mathbb{R}^n$ يعتبر متجه. الهيشيان ل $f$ بالنسبة ل $x$ تعتبر مصفوفة متماثلة من الشكل $n \times n$ يرمز لها بالرمز $\nabla^2_x f(x)$ حيث أن : +
+
+ +**52. Remark: the hessian of f is only defined when f is a function that returns a scalar** + +
+ ملاحظة : الهيشيان معرفة فقط إذا كانت الدالة $f$ لديها مدى ضمن الأعداد الحقيقية. + +
+
+ +**53. Gradient operations ― For matrices A,B,C, the following gradient properties are worth having in mind:** + +
+ الحساب في مشتقة الفضاءات العالية- لأي مصفوفات $A,B,C$ فإن الخواص التالية مهمة : + +
+
+ +**54. [General notations, Definitions, Main matrices]** + +
+ [الرموز العامة، التعاريف، المصفوفات الرئيسية] +
+ +
+ +**55. [Matrix operations, Multiplication, Other operations]** + +
+ [عمليات المصفوفات، الضرب، عمليات أخرى] +
+
+ +**56. [Matrix properties, Norm, Eigenvalue/Eigenvector, Singular-value decomposition]** + +
+ [خواص المصفوفات، المعيار، قيمة ذاتية/متجه ذاتي، تفكيك القيمة المنفردة] +
+
+ +**57. [Matrix calculus, Gradient, Hessian, Operations]** + +
+ [حساب المصفوفات، مشتقة الفضاءات العالية، الهيشيان، العمليات] +
diff --git a/ar/cs-229-machine-learning-tips-and-tricks.md b/ar/cs-229-machine-learning-tips-and-tricks.md new file mode 100644 index 000000000..d48445a75 --- /dev/null +++ b/ar/cs-229-machine-learning-tips-and-tricks.md @@ -0,0 +1,338 @@ +**Machine Learning tips and tricks translation** [[webpage]](https://stanford.edu/~shervine/teaching/cs-229/cheatsheet-machine-learning-tips-and-tricks) + +
+ +**1. Machine Learning tips and tricks cheatsheet** + +
+مرجع سريع لنصائح وحيل تعلّم الآلة +
+
+ +**2. Classification metrics** + +
+مقاييس التصنيف +
+
+ +**3. In a context of a binary classification, here are the main metrics that are important to track in order to assess the performance of the model.** + +
+في سياق التصنيف الثنائي، هذه المقاييس (metrics) المهمة التي يجدر مراقبتها من أجل تقييم آداء النموذج. +
+
+ +**4. Confusion matrix ― The confusion matrix is used to have a more complete picture when assessing the performance of a model. It is defined as follows:** + +
+مصفوفة الدقّة (confusion matrix) - تستخدم مصفوفة الدقّة لأخذ تصور شامل عند تقييم أداء النموذج. وهي تعرّف كالتالي: +
+
+ +**5. [Predicted class, Actual class]** + +
+[التصنيف المتوقع، التصنيف الفعلي] +
+
+ +**6. Main metrics ― The following metrics are commonly used to assess the performance of classification models:** + +
+المقاييس الأساسية - المقاييس التالية تستخدم في العادة لتقييم أداء نماذج التصنيف: +
+
+ +**7. [Metric, Formula, Interpretation]** + +
+[المقياس، المعادلة، التفسير] +
+
+ +**8. Overall performance of model** + +
+الأداء العام للنموذج +
+
+ +**9. How accurate the positive predictions are** + +
+دقّة التوقعات الإيجابية (positive) +
+
+ +**10. Coverage of actual positive sample** + +
+تغطية عينات التوقعات الإيجابية الفعلية +
+
+ +**11. Coverage of actual negative sample** + +
+تغطية عينات التوقعات السلبية الفعلية +
+
+ +**12. Hybrid metric useful for unbalanced classes** + +
+مقياس هجين مفيد للأصناف غير المتوازنة (unbalanced) +
+
+ +**13. ROC ― The receiver operating curve, also noted ROC, is the plot of TPR versus FPR by varying the threshold. These metrics are are summed up in the table below:** + +
+منحنى دقّة الأداء (ROC) - منحنى دقّة الآداء، ويطلق عليه ROC، هو رسمة لمعدل التصنيفات الإيجابية الصحيحة (TPR) مقابل معدل التصنيفات الإيجابية الخاطئة (FPR) باستخدام قيم حد (threshold) متغيرة. هذه المقاييس ملخصة في الجدول التالي: +
+
+ +**14. [Metric, Formula, Equivalent]** + +
+[المقياس، المعادلة، مرادف] +
+
+ +**15. AUC ― The area under the receiving operating curve, also noted AUC or AUROC, is the area below the ROC as shown in the following figure:** + +
+المساحة تحت منحنى دقة الأداء (المساحة تحت المنحنى) (AUC) - المساحة تحت منحنى دقة الأداء (المساحة تحت المنحنى)، ويطلق عليها AUC أو AUROC، هي المساحة تحت ROC كما هو موضح في الرسمة التالية: +
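For reference, a short sketch assuming scikit-learn is available: the confusion matrix and threshold-based metrics use hard predictions, while AUC is computed from the raw scores. The labels, scores and the 0.5 threshold are made up for the example.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix, f1_score, roc_auc_score

y_true = np.array([0, 0, 1, 1, 1, 0, 1, 0])
y_score = np.array([0.1, 0.4, 0.35, 0.8, 0.7, 0.2, 0.6, 0.55])   # model scores
y_pred = (y_score >= 0.5).astype(int)                             # hard predictions at threshold 0.5

print(confusion_matrix(y_true, y_pred))        # [[TN, FP], [FN, TP]]
print(accuracy_score(y_true, y_pred))
print(f1_score(y_true, y_pred))
print(roc_auc_score(y_true, y_score))          # AUC is computed from the scores, not the 0/1 labels
```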
+
+ +**16. [Actual, Predicted]** + +
+[الفعلي، المتوقع] +
+
+ +**17. Basic metrics ― Given a regression model f, the following metrics are commonly used to assess the performance of the model:** + +
+المقاييس الأساسية - إذا كان لدينا نموذج الانحدار f، فإن المقاييس التالية غالباً ما تستخدم لتقييم أداء النموذج: +
+
+ +**18. [Total sum of squares, Explained sum of squares, Residual sum of squares]** + +
+[المجموع الكلي للمربعات، مجموع المربعات المُفسَّر، مجموع المربعات المتبقي] +
+
+ +**19. Coefficient of determination ― The coefficient of determination, often noted R2 or r2, provides a measure of how well the observed outcomes are replicated by the model and is defined as follows:** + +
+مُعامل التحديد (Coefficient of determination) - مُعامل التحديد، وغالباً يرمز له بـ R2 أو r2، يعطي قياس لمدى مطابقة النموذج للنتائج الملحوظة، ويعرف كما يلي: +
+
+ +**20. Main metrics ― The following metrics are commonly used to assess the performance of regression models, by taking into account the number of variables n that they take into consideration:** + +
+المقاييس الرئيسية - المقاييس التالية تستخدم غالباً لتقييم أداء نماذج الانحدار، وذلك بأن يتم الأخذ في الحسبان عدد المتغيرات n المستخدمة فيها: +
+
+ +**21. where L is the likelihood and ˆσ2 is an estimate of the variance associated with each response.** + +
+حيث L هو الأرجحية، و ˆσ2 تقدير التباين الخاص بكل نتيجة. +
+
+ +**22. Model selection** + +
+اختيار النموذج +
+
+ +**23. Vocabulary ― When selecting a model, we distinguish 3 different parts of the data that we have as follows:** + +
+مفردات - عند اختيار النموذج، نفرق بين 3 أجزاء من البيانات التي لدينا كالتالي: +
+
+ +**24. [Training set, Validation set, Testing set]** + +
+[مجموعة تدريب، مجموعة تحقق، مجموعة اختبار] +
+
+ +**25. [Model is trained, Model is assessed, Model gives predictions]** + +
+[يتم تدريب النموذج، يتم تقييم النموذج، النموذج يعطي التوقعات] +
+
+ +**26. [Usually 80% of the dataset, Usually 20% of the dataset]** + +
+[غالباً 80% من مجموعة البيانات، غالباً 20% من مجموعة البيانات] +
+
+ +**27. [Also called hold-out or development set, Unseen data]** + +
+[يطلق عليها كذلك المجموعة المُجنّبة أو مجموعة التطوير، بيانات لم يسبق رؤيتها من قبل] +
+
+ +**28. Once the model has been chosen, it is trained on the entire dataset and tested on the unseen test set. These are represented in the figure below:** + +
+بمجرد اختيار النموذج، يتم تدريبه على مجموعة البيانات بالكامل ثم يتم اختباره على مجموعة اختبار لم يسبق رؤيتها من قبل. كما هو موضح في الشكل التالي: +
+
+ +**29. Cross-validation ― Cross-validation, also noted CV, is a method that is used to select a model that does not rely too much on the initial training set. The different types are summed up in the table below:** + +
+التحقق المتقاطع (Cross-validation) - التحقق المتقاطع، وكذلك يختصر بـ CV، هو طريقة تستخدم لاختيار نموذج بحيث لا يعتمد بشكل كبير على مجموعة بيانات التدريب المبدأية. أنواع التحقق المتقاطع المختلفة ملخصة في الجدول التالي: +
+
+ +**30. [Training on k−1 folds and assessment on the remaining one, Training on n−p observations and assessment on the p remaining ones]** + +
+[التدريب على k-1 جزء والتقييم باستخدام الجزء الباقي، التدريب على n−p عينة والتقييم باستخدام الـ p عينات المتبقية] +
+
+ +**31. [Generally k=5 or 10, Case p=1 is called leave-one-out]** + +
+[بشكل عام k=5 أو 10، الحالة p=1 يطلق عليها الإبقاء على واحد (leave-one-out)] +
+
+ +**32. The most commonly used method is called k-fold cross-validation and splits the training data into k folds to validate the model on one fold while training the model on the k−1 other folds, all of this k times. The error is then averaged over the k folds and is named cross-validation error.** + +
+الطريقة الأكثر استخداماً يطلق عليها التحقق المتقاطع س جزء/أجزاء (k-fold)، ويتم فيها تقسيم البيانات إلى k جزء، بحيث يتم تدريب النموذج باستخدام k−1 والتحقق باستخدام الجزء المتبقي، ويتم تكرار ذلك k مرة. يتم بعد ذلك حساب معدل الأخطاء في الأجزاء k ويسمى خطأ التحقق المتقاطع. +
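A brief sketch of 5-fold cross-validation assuming scikit-learn; the estimator (ridge regression) and the synthetic dataset are arbitrary stand-ins for whatever model is being selected.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)

# 5-fold CV: train on 4 folds, validate on the held-out fold, repeat 5 times
scores = cross_val_score(Ridge(alpha=1.0), X, y, cv=5)
print(scores, scores.mean())   # the averaged score is the cross-validation estimate
```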
+
+ +**33. Regularization ― The regularization procedure aims at avoiding the model to overfit the data and thus deals with high variance issues. The following table sums up the different types of commonly used regularization techniques:** + +
+ضبط (Regularization) - عمليه الضبط تهدف إلى تفادي فرط التخصيص (overfit) للنموذج، وهو بذلك يتعامل مع مشاكل التباين العالي. الجدول التالي يلخص أنواع وطرق الضبط الأكثر استخداماً: +
+
+ +**34. [Shrinks coefficients to 0, Good for variable selection, Makes coefficients smaller, Tradeoff between variable selection and small coefficients]** + +
+[يقلص المُعاملات إلى 0، جيد لاختيار المتغيرات، يجعل المُعاملات أصغر، المفاضلة بين اختيار المتغيرات والمُعاملات الصغيرة] +
+
+ +**35. Diagnostics** + +
+التشخيصات +
+
+ +**36. Bias ― The bias of a model is the difference between the expected prediction and the correct model that we try to predict for given data points.** + +
+الانحياز (Bias) - الانحياز للنموذج هو الفرق بين التنبؤ المتوقع والنموذج الحقيقي الذي نحاول تنبؤه للبيانات المعطاة. +
+
+ +**37. Variance ― The variance of a model is the variability of the model prediction for given data points.** + +
+التباين (Variance) - تباين النموذج هو مقدار التغير في تنبؤ النموذج لنقاط البيانات المعطاة. +
+
+ +**38. Bias/variance tradeoff ― The simpler the model, the higher the bias, and the more complex the model, the higher the variance.** + +
+موازنة الانحياز/التباين (Bias/variance tradeoff) - كلما زادت بساطة النموذج، زاد الانحياز، وكلما زاد تعقيد النموذج، زاد التباين. +
+
+ +**39. [Symptoms, Regression illustration, classification illustration, deep learning illustration, possible remedies]** + +
+[الأعراض، توضيح الانحدار، توضيح التصنيف، توضيح التعلم العميق، العلاجات الممكنة] +
+
+ +**40. [High training error, Training error close to test error, High bias, Training error slightly lower than test error, Very low training error, Training error much lower than test error, High variance]** + +
+[خطأ التدريب عالي، خطأ التدريب قريب من خطأ الاختبار، انحياز عالي، خطأ التدريب أقل بقليل من خطأ الاختبار، خطأ التدريب منخفض جداً، خطأ التدريب أقل بكثير من خطأ الاختبار، تباين عالي] +
+
+ +**41. [Complexify model, Add more features, Train longer, Perform regularization, Get more data]** + +
+[زيادة تعقيد النموذج، إضافة المزيد من الخصائص، تدريب لمدة أطول، إجراء الضبط (regularization)، الحصول على المزيد من البيانات] +
+
+ +**42. Error analysis ― Error analysis is analyzing the root cause of the difference in performance between the current and the perfect models.** + +
+تحليل الخطأ - تحليل الخطأ هو تحليل السبب الرئيسي للفرق في الأداء بين النماذج الحالية والنماذج المثالية. +
+
+ +**43. Ablative analysis ― Ablative analysis is analyzing the root cause of the difference in performance between the current and the baseline models.** + +
+تحليل استئصالي (Ablative analysis) - التحليل الاستئصالي هو تحليل السبب الرئيسي للفرق في الأداء بين النماذج الحالية والنماذج المبدئية (baseline). +
+
+ +**44. Regression metrics** + +
+مقاييس الانحدار +
+
+ +**45. [Classification metrics, confusion matrix, accuracy, precision, recall, F1 score, ROC]** + +
+[مقاييس التصنيف، مصفوفة الدقّة، الضبط (accuracy)، الدقة (precision)، الاستدعاء (recall)، درجة F1] +
+
+ +**46. [Regression metrics, R squared, Mallow's CP, AIC, BIC]** + +
+[مقاييس الانحدار، مربع R، معيار معامل مالوس (Mallow's)، معيار آكياك المعلوماتي (AIC)، معيار المعلومات البايزي (BIC)] +
+
+ +**47. [Model selection, cross-validation, regularization]** + +
+[اختيار النموذج، التحقق المتقاطع، الضبط] +
+
+ +**48. [Diagnostics, Bias/variance tradeoff, error/ablative analysis]** + +
+[التشخيصات، موازنة الانحياز/التباين، تحليل الخطأ/التحليل الاستئصالي] +
diff --git a/ar/cs-229-supervised-learning.md b/ar/cs-229-supervised-learning.md new file mode 100644 index 000000000..9104d46a1 --- /dev/null +++ b/ar/cs-229-supervised-learning.md @@ -0,0 +1,663 @@ +**1. Supervised Learning cheatsheet** + +
+مرجع سريع للتعلّم المُوَجَّه +
+
+ +**2. Introduction to Supervised Learning** + +
+مقدمة للتعلّم المُوَجَّه +
+
+ +**3. Given a set of data points {x(1),...,x(m)} associated to a set of outcomes {y(1),...,y(m)}, we want to build a classifier that learns how to predict y from x.** + +
+إذا كان لدينا مجموعة من نقاط البيانات {x(1),...,x(m)} مرتبطة بمجموعة مخرجات {y(1),...,y(m)}، نريد أن نبني مُصَنِّف يتعلم كيف يتوقع y من x. +
+
+ +**4. Type of prediction ― The different types of predictive models are summed up in the table below:** + +
+نوع التوقّع - أنواع نماذج التوقّع المختلفة موضحة في الجدول التالي: +
+
+ +**5. [Regression, Classifier, Outcome, Examples]** + +
+[الانحدار (Regression)، التصنيف (Classification)، المُخرَج، أمثلة] +
+
+ +**6. [Continuous, Class, Linear regression, Logistic regression, SVM, Naive Bayes]** + +
+[مستمر، صنف، انحدار خطّي (Linear regression)، انحدار لوجستي (Logistic regression)، آلة المتجهات الداعمة (SVM)، بايز البسيط (Naive Bayes)] +
+
+ +**7. Type of model ― The different models are summed up in the table below:** + +
+نوع النموذج - أنواع النماذج المختلفة موضحة في الجدول التالي: +
+
+ +**8. [Discriminative model, Generative model, Goal, What's learned, Illustration, Examples]** + +
+[نموذج تمييزي (discriminative)، نموذج توليدي (Generative)، الهدف، ماذا يتعلم، توضيح، أمثلة] +
+
+ +**9. [Directly estimate P(y|x), Estimate P(x|y) to then deduce P(y|x), Decision boundary, Probability distributions of the data, Regressions, آلة المتجهات الداعمة (SVM), GDA, Naive Bayes]** + +
+[التقدير المباشر لـ P(y|x)، تقدير P(x|y) ثم استنتاج P(y|x)، حدود القرار، التوزيع الاحتمالي للبيانات، الانحدار (Regression)، آلة المتجهات الداعمة (SVM)، GDA، بايز البسيط (Naive Bayes)] +
+
+ +**10. Notations and general concepts** + +
+الرموز ومفاهيم أساسية +
+
+ +**11. Hypothesis ― The hypothesis is noted hθ and is the model that we choose. For a given input data x(i) the model prediction output is hθ(x(i)).** + +
+الفرضية (Hypothesis) - الفرضية، ويرمز لها بـ hθ، هي النموذج الذي نختاره. إذا كان لدينا المدخل x(i)، فإن المخرج الذي سيتوقعه النموذج هو hθ(x(i)). +
+
+ +**12. Loss function ― A loss function is a function L:(z,y)∈R×Y⟼L(z,y)∈R that takes as inputs the predicted value z corresponding to the real data value y and outputs how different they are. The common loss functions are summed up in the table below:** + +
+دالة الخسارة (Loss function) - دالة الخسارة هي الدالة L:(z,y)∈R×Y⟼L(z,y)∈R التي تأخذ كمدخلات القيمة المتوقعة z والقيمة الحقيقية y وتعطينا الاختلاف بينهما. الجدول التالي يحتوي على بعض دوال الخسارة الشائعة: +
+
+ +**13. [Least squared error, Logistic loss, Hinge loss, Cross-entropy]** + +
+[خطأ أصغر تربيع (Least squared error)، خسارة لوجستية (Logistic loss)، خسارة مفصلية (Hinge loss)، الانتروبيا التقاطعية (Cross-entropy)] +
+
+ +**14. [Linear regression, Logistic regression, SVM, Neural Network]** + +
+[الانحدار الخطّي (Linear regression)، الانحدار اللوجستي (Logistic regression)، آلة المتجهات الداعمة (SVM)، الشبكات العصبية (Neural Network)] +
+
+ +**15. Cost function ― The cost function J is commonly used to assess the performance of a model, and is defined with the loss function L as follows:** + +
+دالة التكلفة (Cost function) - دالة التكلفة J تستخدم عادة لتقييم أداء نموذج ما، ويتم تعريفها مع دالة الخسارة L كالتالي: +
+
+ +**16. Gradient descent ― By noting α∈R the learning rate, the update rule for gradient descent is expressed with the learning rate and the cost function J as follows:** + +
+النزول الاشتقاقي (Gradient descent) - لنعرّف معدل التعلّم α∈R، يمكن تعريف القانون الذي يتم تحديث خوارزمية النزول الاشتقاقي من خلاله باستخدام معدل التعلّم ودالة التكلفة J كالتالي: +
+
+ +**17. Remark: Stochastic gradient descent (SGD) is updating the parameter based on each training example, and batch gradient descent is on a batch of training examples.** + +
+ملاحظة: في النزول الاشتقاقي العشوائي (Stochastic gradient descent (SGD)) يتم تحديث المُعاملات (parameters) بناءاً على كل عينة تدريب على حدة، بينما في النزول الاشتقاقي الحُزَمي (batch gradient descent) يتم تحديثها باستخدام حُزَم من عينات التدريب. +
+
+ +**18. Likelihood ― The likelihood of a model L(θ) given parameters θ is used to find the optimal parameters θ through maximizing the likelihood. In practice, we use the log-likelihood ℓ(θ)=log(L(θ)) which is easier to optimize. We have:** + +
+الأرجحية (Likelihood) - تستخدم أرجحية النموذج L(θ)، حيث أن θ هي المُدخلات، للبحث عن المُدخلات θ الأحسن عن طريق تعظيم (maximizing) الأرجحية. عملياً يتم استخدام الأرجحية اللوغاريثمية (log-likelihood) ℓ(θ)=log(L(θ)) حيث أنها أسهل في التحسين (optimize). فيكون لدينا: +
+
+ +**19. Newton's algorithm ― The Newton's algorithm is a numerical method that finds θ such that ℓ′(θ)=0. Its update rule is as follows:** + +
+خوارزمية نيوتن (Newton's algorithm) - خوارزمية نيوتن هي طريقة حسابية للعثور على θ بحيث يكون ℓ′(θ)=0. قاعدة التحديث للخوارزمية كالتالي: +
+
+ +**20. Remark: the multidimensional generalization, also known as the Newton-Raphson method, has the following update rule:** + +
+ملاحظة: هناك خوارزمية أعم وهي متعددة الأبعاد (multidimensional)، يطلق عليها خوارزمية نيوتن-رافسون (Newton-Raphson)، ويتم تحديثها عبر القانون التالي: +
+
+ +**21. Linear models** + +
+النماذج الخطيّة (Linear models) +
+
+ +**22. Linear regression** + +
+الانحدار الخطّي (Linear regression) +
+
+ +**23. We assume here that y|x;θ∼N(μ,σ2)** + +
+هنا نفترض أن y|x;θ∼N(μ,σ2) +
+
+ +**24. Normal equations ― By noting X the matrix design, the value of θ that minimizes the cost function is a closed-form solution such that:** + +
+المعادلة الطبيعية/الناظمية (Normal) - إذا كان لدينا المصفوفة X، القيمة θ التي تقلل من دالة التكلفة يمكن حلها رياضياً بشكل مغلق (closed-form) عن طريق: +
+
+ +**25. LMS algorithm ― By noting α the learning rate, the update rule of the Least Mean Squares (LMS) algorithm for a training set of m data points, which is also known as the Widrow-Hoff learning rule, is as follows:** + +
+خوارزمية أصغر معدل تربيع LMS - إذا كان لدينا معدل التعلّم α، فإن قانون التحديث لخوارزمية أصغر معدل تربيع (Least Mean Squares (LMS)) لمجموعة بيانات من m عينة، ويطلق عليه قانون تعلم ويدرو-هوف (Widrow-Hoff)، كالتالي: +
+
+ +**26. Remark: the update rule is a particular case of the gradient ascent.** + +
+ملاحظة: قانون التحديث هذا يعتبر حالة خاصة من النزول الاشتقاقي (Gradient descent). +
+
+ +**27. LWR ― Locally Weighted Regression, also known as LWR, is a variant of linear regression that weights each training example in its cost function by w(i)(x), which is defined with parameter τ∈R as:** + +
+الانحدار الموزون محليّاً (LWR) - الانحدار الموزون محليّاً (Locally Weighted Regression)، ويعرف بـ LWR، هو نوع من الانحدار الخطي يَزِن كل عينة تدريب أثناء حساب دالة التكلفة باستخدام w(i)(x)، التي يمكن تعريفها باستخدام المُدخل (parameter) τ∈R كالتالي: +
+
+ +**28. Classification and logistic regression** + +
+التصنيف والانحدار اللوجستي +
+
+ +**29. Sigmoid function ― The sigmoid function g, also known as the logistic function, is defined as follows:** + +
+دالة سيجمويد (Sigmoid) - دالة سيجمويد g، وتعرف كذلك بالدالة اللوجستية، تعرّف كالتالي: +
+
+ +**30. Logistic regression ― We assume here that y|x;θ∼Bernoulli(ϕ). We have the following form:** + +
+الانحدار اللوجستي (Logistic regression) - نفترض هنا أن y|x;θ∼Bernoulli(ϕ). فيكون لدينا: +
+
+ +**31. Remark: there is no closed form solution for the case of logistic regressions.** + +
+ملاحظة: ليس هناك حل رياضي مغلق للانحدار اللوجستي. +
+
+ +**32. Softmax regression ― A softmax regression, also called a multiclass logistic regression, is used to generalize logistic regression when there are more than 2 outcome classes. By convention, we set θK=0, which makes the Bernoulli parameter ϕi of each class i equal to:** + +
+انحدار سوفت ماكس (Softmax) - ويطلق عليه الانحدار اللوجستي متعدد الأصناف (multiclass logistic regression)، يستخدم لتعميم الانحدار اللوجستي إذا كان لدينا أكثر من صنفين. في العرف يتم تعيين θK=0، بحيث تجعل مُدخل بيرنوللي (Bernoulli) ϕi لكل فئة i يساوي: +
+
+ +**33. Generalized Linear Models** + +
+النماذج الخطية العامة (Generalized Linear Models - GLM) +
+
+ +**34. Exponential family ― A class of distributions is said to be in the exponential family if it can be written in terms of a natural parameter, also called the canonical parameter or link function, η, a sufficient statistic T(y) and a log-partition function a(η) as follows:** + +
+العائلة الأُسيّة (Exponential family) - يطلق على صنف من التوزيعات (distributions) بأنها تنتمي إلى العائلة الأسيّة إذا كان يمكن كتابتها بواسطة مُدخل قانوني (canonical parameter) η، إحصاء كافٍ (sufficient statistic) T(y)، ودالة تجزئة لوغاريثمية a(η)، كالتالي: +
+
+ +**35. Remark: we will often have T(y)=y. Also, exp(−a(η)) can be seen as a normalization parameter that will make sure that the probabilities sum to one.** + +
+ملاحظة: كثيراً ما سيكون T(y)=y. كذلك فإن exp(−a(η)) يمكن أن تفسر كمُدخل تسوية (normalization) للتأكد من أن الاحتمالات يكون حاصل جمعها يساوي واحد. +
+
+ +**36. Here are the most common exponential distributions summed up in the following table:** + +
+تم تلخيص أكثر التوزيعات الأسيّة استخداماً في الجدول التالي: +
+
+ +**37. [Distribution, Bernoulli, Gaussian, Poisson, Geometric]** + +
+[التوزيع، بِرنوللي (Bernoulli)، جاوسي (Gaussian)، بواسون (Poisson)، هندسي (Geometric)] +
+
+ +**38. Assumptions of GLMs ― Generalized Linear Models (GLM) aim at predicting a random variable y as a function fo x∈Rn+1 and rely on the following 3 assumptions:** + +
+افتراضات GLMs - تهدف النماذج الخطيّة العامة (GLM) إلى توقع المتغير العشوائي y كدالة لـ x∈Rn+1، وتستند إلى ثلاثة افتراضات: +
+
+ +**39. Remark: ordinary least squares and logistic regression are special cases of generalized linear models.** + +
+ملاحظة: أصغر تربيع (least squares) الاعتيادي و الانحدار اللوجستي يعتبران من الحالات الخاصة للنماذج الخطيّة العامة. +
+
+ +**40. Support Vector Machines** + +
+آلة المتجهات الداعمة (Support Vector Machines) +
+
+ +**41: The goal of support vector machines is to find the line that maximizes the minimum distance to the line.** + +
+تهدف آلة المتجهات الداعمة (SVM) إلى العثور على الخط الذي يعظم أصغر مسافة إليه: +
+
+ +**42: Optimal margin classifier ― The optimal margin classifier h is such that:** + +
+مُصنِّف الهامش الأحسن (Optimal margin classifier) - يعرَّف مُصنِّف الهامش الأحسن h كالتالي: +
+
+ +**43: where (w,b)∈Rn×R is the solution of the following optimization problem:** + +
+حيث (w,b)∈Rn×R هو الحل لمشكلة التحسين (optimization) التالية: +
+
+ +**44. such that** + +
+بحيث أن +
+
+ +**45. support vectors** + +
+المتجهات الداعمة (support vectors) +
+
+ +**46. Remark: the line is defined as wTx−b=0.** + +
+ملاحظة: يتم تعريف الخط بهذه المعادلة wTx−b=0. +
+
+ +**47. Hinge loss ― The hinge loss is used in the setting of SVMs and is defined as follows:** + +
+الخسارة المفصلية (Hinge loss) - تستخدم الخسارة المفصلية في حل SVM ويعرف على النحو التالي: +
+
+ +**48. Kernel ― Given a feature mapping ϕ, we define the kernel K to be defined as:** + +
+النواة (Kernel) - إذا كان لدينا دالة ربط الخصائص (features) ϕ، يمكننا تعريف النواة K كالتالي: +
+
+ +**49. In practice, the kernel K defined by K(x,z)=exp(−||x−z||22σ2) is called the Gaussian kernel and is commonly used.** + +
+عملياً، يمكن أن تُعَرَّف الدالة K عن طريق المعادلة K(x,z)=exp(−||x−z||22σ2)، ويطلق عليها النواة الجاوسية (Gaussian kernel)، وهي تستخدم بكثرة. +
+
+ +**50. [Non-linear separability, Use of a kernel mapping, Decision boundary in the original space]** + +
+[قابلية الفصل غير الخطي، استخدام ربط النواة، حد القرار في الفضاء الأصلي] +
+
+ +**51. Remark: we say that we use the "kernel trick" to compute the cost function using the kernel because we actually don't need to know the explicit mapping ϕ, which is often very complicated. Instead, only the values K(x,z) are needed.** + +
+ملاحظة: نقول أننا نستخدم "حيلة النواة" (kernel trick) لحساب دالة التكلفة عند استخدام النواة لأننا في الحقيقة لا نحتاج أن نعرف التحويل الصريح ϕ، الذي يكون في الغالب شديد التعقيد. ولكن، نحتاج أن فقط أن نحسب القيم K(x,z). +
+
+ +**52. Lagrangian ― We define the Lagrangian L(w,b) as follows:** + +
+اللّاغرانجي (Lagrangian) - يتم تعريف اللّاغرانجي L(w,b) على النحو التالي: +
+
+ +**53. Remark: the coefficients βi are called the Lagrange multipliers.** + +
+ملاحظة: المعامِلات (coefficients) βi يطلق عليها مضروبات لاغرانج (Lagrange multipliers). +
+
+ +**54. Generative Learning** + +
+التعلم التوليدي (Generative Learning) +
+
+ +**55. A generative model first tries to learn how the data is generated by estimating P(x|y), which we can then use to estimate P(y|x) by using Bayes' rule.** + +
+النموذج التوليدي في البداية يحاول أن يتعلم كيف تم توليد البيانات عن طريق تقدير P(x|y)، التي يمكن حينها استخدامها لتقدير P(y|x) باستخدام قانون بايز (Bayes' rule). +
+
+ +**56. Gaussian Discriminant Analysis** + +
+تحليل التمايز الجاوسي (Gaussian Discriminant Analysis) +
+
+ +**57. Setting ― The Gaussian Discriminant Analysis assumes that y and x|y=0 and x|y=1 are such that:** + +
+الإطار - تحليل التمايز الجاوسي يفترض أن y و x|y=0 و x|y=1 بحيث يكونوا كالتالي: +
+
+ +**58. Estimation ― The following table sums up the estimates that we find when maximizing the likelihood:** + +
+التقدير - الجدول التالي يلخص التقديرات التي يمكننا التوصل لها عند تعظيم الأرجحية (likelihood): +
+
+ +**59. Naive Bayes** + +
+بايز البسيط (Naive Bayes) +
+
+ +**60. Assumption ― The Naive Bayes model supposes that the features of each data point are all independent:** + +
+الافتراض - يفترض نموذج بايز البسيط أن جميع الخصائص لكل عينة بيانات مستقلة (independent): +
+
+ +**61. Solutions ― Maximizing the log-likelihood gives the following solutions, with k∈{0,1},l∈[[1,L]]** + +
+الحل - تعظيم الأرجحية اللوغاريثمية (log-likelihood) يعطينا الحلول التالية إذا كان k∈{0,1}، l∈[[1,L]]: +
+
+ +**62. Remark: Naive Bayes is widely used for text classification and spam detection.** + +
+ملاحظة: بايز البسيط يستخدم بشكل واسع لتصنيف النصوص واكتشاف البريد الإلكتروني المزعج. +
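As an illustration of that use case, a tiny bag-of-words spam example assuming scikit-learn is available; the texts and labels are invented for the sketch.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

texts = ["win money now", "meeting at noon", "cheap money win", "lunch with team"]
labels = [1, 0, 1, 0]                         # 1 = spam, 0 = not spam

vec = CountVectorizer()
X = vec.fit_transform(texts)                  # word counts, treated as (conditionally) independent features
clf = MultinomialNB().fit(X, labels)

print(clf.predict(vec.transform(["free money offer", "team meeting tomorrow"])))   # [1, 0]
```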
+
+ +**63. Tree-based and ensemble methods** + +
+الطرق الشجرية (tree-based) والتجميعية (ensemble) +
+
+ +**64. These methods can be used for both regression and classification problems.** + +
+هذه الطرق يمكن استخدامها لكلٍ من مشاكل الانحدار (regression) والتصنيف (classification). +
+
+ +**65. CART ― Classification and Regression Trees (CART), commonly known as decision trees, can be represented as binary trees. They have the advantage to be very interpretable.** + +
+التصنيف والانحدار الشجري (CART) - والاسم الشائع له أشجار القرار (decision trees)، يمكن أن يمثل كأشجار ثنائية (binary trees). من المزايا لهذه الطريقة إمكانية تفسيرها بسهولة. +
+
+ +**66. Random forest ― It is a tree-based technique that uses a high number of decision trees built out of randomly selected sets of features. Contrary to the simple decision tree, it is highly uninterpretable but its generally good performance makes it a popular algorithm.** + +
+الغابة العشوائية (Random forest) - هي أحد الطرق الشجرية التي تستخدم عدداً كبيراً من أشجار القرار مبنية باستخدام مجموعة عشوائية من الخصائص. بخلاف شجرة القرار البسيطة لا يمكن تفسير النموذج بسهولة، ولكن أدائها العالي جعلها أحد الخوارزمية المشهورة. +
+
+ +**67. Remark: random forests are a type of ensemble methods.** + +
+ملاحظة: أشجار القرار نوع من الخوارزميات التجميعية (ensemble). +
+
+ +**68. Boosting ― The idea of boosting methods is to combine several weak learners to form a stronger one. The main ones are summed up in the table below:** + +
+التعزيز (Boosting) - فكرة خوارزميات التعزيز هي دمج عدة خوارزميات تعلم ضعيفة لتكوين نموذج قوي. الطرق الأساسية ملخصة في الجدول التالي: +
+
+ +**69. [Adaptive boosting, Gradient boosting]** + +
+[التعزيز التَكَيُّفي (Adaptive boosting)، التعزيز الاشتقاقي (Gradient boosting)] +
+
+ +**70. High weights are put on errors to improve at the next boosting step** + +
+يتم التركيز على مواطن الخطأ لتحسين النتيجة في الخطوة التالية. +
+
+ +**71. Weak learners trained on remaining errors** + +
+يتم تدريب خوارزميات التعلم الضعيفة على الأخطاء المتبقية. +
+
+ +**72. Other non-parametric approaches** + +
+طرق أخرى غير بارامترية (non-parametric) +
+
+ +**73. k-nearest neighbors ― The k-nearest neighbors algorithm, commonly known as k-NN, is a non-parametric approach where the response of a data point is determined by the nature of its k neighbors from the training set. It can be used in both classification and regression settings.** + +
+خوارزمية أقرب الجيران (k-nearest neighbors) - تعتبر خوارزمية أقرب الجيران، وتعرف بـ k-NN، طريقة غير بارامترية، حيث يتم تحديد نتيجة عينة من البيانات من خلال عدد k من البيانات المجاورة في مجموعة التدريب. ويمكن استخدامها للتصنيف والانحدار. +
+
+ +**74. Remark: The higher the parameter k, the higher the bias, and the lower the parameter k, the higher the variance.** + +
+ملاحظة: كلما زاد المُدخل k، كلما زاد الانحياز (bias)، وكلما نقص k، زاد التباين (variance). +
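A small sketch of that bias/variance effect, assuming scikit-learn; the dataset and the particular values of k are arbitrary, and only the qualitative trend (small k overfits, large k underfits) is the point.

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = make_moons(n_samples=300, noise=0.3, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for k in (1, 15, 60):          # small k: low bias / high variance; large k: the opposite
    knn = KNeighborsClassifier(n_neighbors=k).fit(X_tr, y_tr)
    print(k, knn.score(X_tr, y_tr), knn.score(X_te, y_te))
```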
+
+ +**75. Learning Theory** + +
+نظرية التعلُّم +
+
+ +**76. Union bound ― Let A1,...,Ak be k events. We have:** + +
+حد الاتحاد (Union bound) - لنجعل A1,...,Ak تمثل k حدث. فيكون لدينا: +
+
+ +**77. Hoeffding inequality ― Let Z1,..,Zm be m iid variables drawn from a Bernoulli distribution of parameter ϕ. Let ˆϕ be their sample mean and γ>0 fixed. We have:** + +
+متراجحة هوفدينج (Hoeffding) - لنجعل Z1,..,Zm تمثل m متغير مستقلة وموزعة بشكل مماثل (iid) مأخوذة من توزيع بِرنوللي (Bernoulli distribution) ذا مُدخل ϕ. لنجعل ˆϕ متوسط العينة (sample mean) و γ>0 ثابت. فيكون لدينا: +
+
+ +**78. Remark: this inequality is also known as the Chernoff bound.** + +
+ملاحظة: هذه المتراجحة تعرف كذلك بحد تشرنوف (Chernoff bound). +
+
+ +**79. Training error ― For a given classifier h, we define the training error ˆϵ(h), also known as the empirical risk or empirical error, to be as follows:** + +
+خطأ التدريب - ليكن لدينا المُصنِّف h، يمكن تعريف خطأ التدريب ˆϵ(h)، ويعرف كذلك بالخطر التجريبي أو الخطأ التجريبي، كالتالي: +
+
+ +**80. Probably Approximately Correct (PAC) ― PAC is a framework under which numerous results on learning theory were proved, and has the following set of assumptions: ** + +
+تقريباً صحيح احتمالياً (Probably Approximately Correct (PAC)) - هو إطار يتم من خلاله إثبات العديد من نظريات التعلم، ويحتوي على الافتراضات التالية: +
+
+ +**81: the training and testing sets follow the same distribution ** + +
+مجموعتي التدريب والاختبار يتبعان نفس التوزيع. +
+
+ +**82. the training examples are drawn independently** + +
+عينات التدريب تؤخذ بشكل مستقل. +
+
+ +**83. Shattering ― Given a set S={x(1),...,x(d)}, and a set of classifiers H, we say that H shatters S if for any set of labels {y(1),...,y(d)}, we have:** + +
+مجموعة تكسيرية (Shattering Set) - إذا كان لدينا المجموعة S={x(1),...,x(d)}، ومجموعة مُصنٍّفات H، نقول أن H تكسر S (H shatters S) إذا كان لكل مجموعة علامات (labels) {y(1),...,y(d)} لدينا: +
+
+ +**84. Upper bound theorem ― Let H be a finite hypothesis class such that |H|=k and let δ and the sample size m be fixed. Then, with probability of at least 1−δ, we have:** + +
+مبرهنة الحد الأعلى (Upper bound theorem) - لنجعل H فئة فرضية محدودة (finite hypothesis class) بحيث |H|=k، و δ وحجم العينة m ثابتين. حينها سيكون لدينا، مع احتمال على الأقل 1−δ، التالي: +
+
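The bound in the theorem above can be evaluated numerically. The sketch below assumes the standard finite-class form, |ε(h)−ˆϵ(h)| ⩽ √(1/(2m)·log(2k/δ)) uniformly over H; the values of k, δ and m are placeholders:

```python
import numpy as np

k, delta, m = 1000, 0.05, 5000   # |H| = k hypotheses, confidence 1 - delta, m training examples
eps = np.sqrt(1 / (2 * m) * np.log(2 * k / delta))
# With probability at least 1 - delta, |generalization error - training error| <= eps for all h in H
print(eps)
```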
+ +**85. VC dimension ― The Vapnik-Chervonenkis (VC) dimension of a given infinite hypothesis class H, noted VC(H) is the size of the largest set that is shattered by H.** + +
+بُعْد فابنيك-تشرفونيكس (Vapnik-Chervonenkis - VC) لفئة فرضية غير محدودة (infinite hypothesis class) H، ويرمز له بـ VC(H)، هو حجم أكبر مجموعة (set) التي تم تكسيرها بواسطة H (shattered by H). +
+
+ +**86. Remark: the VC dimension of H={set of linear classifiers in 2 dimensions} is 3.** + +
+ملاحظة: بُعْد فابنيك-تشرفونيكس VC لـ H = {مجموعة التصنيفات الخطية في بُعدين} يساوي 3. +
+
+ +**87. Theorem (Vapnik) ― Let H be given, with VC(H)=d and m the number of training examples. With probability at least 1−δ, we have:** + +
+مبرهنة فابنيك (Vapnik theorem) - ليكن لدينا H، مع VC(H)=d وعدد عيّنات التدريب m. سيكون لدينا، مع احتمال على الأقل 1−δ، التالي: +
+
+ +**88. [Introduction, Type of prediction, Type of model]** + +
+[مقدمة، نوع التوقع، نوع النموذج] +
+
+ +**89. [Notations and general concepts, loss function, gradient descent, likelihood]** + +
+[الرموز ومفاهيم أساسية، دالة الخسارة، النزول الاشتقاقي، الأرجحية] +
+
+ +**90. [Linear models, linear regression, logistic regression, generalized linear models]** + +
+[النماذج الخطيّة، الانحدار الخطّي، الانحدار اللوجستي، النماذج الخطية العامة] +
+
+ +**91. [Support vector machines, Optimal margin classifier, Hinge loss, Kernel]** + +
+[آلة المتجهات الداعمة (SVM)، مُصنِّف الهامش الأحسن، الفرق المفصلي، النواة] +
+
+ +**92. [Generative learning, Gaussian Discriminant Analysis, Naive Bayes]** + +
+[التعلم التوليدي، تحليل التمايز الجاوسي، بايز البسيط] +
+
+ +**93. [Trees and ensemble methods, CART, Random forest, Boosting]** + +
+[الطرق الشجرية والتجميعية، التصنيف والانحدار الشجري (CART)، الغابة العشوائية (Random forest)، التعزيز (Boosting)] +
+
+ +**94. [Other methods, k-NN]** + +
+[طرق أخرى، خوارزمية أقرب الجيران (k-NN)] +
+
+ +**95. [Learning theory, Hoeffding inequality, PAC, VC dimension]** + +
+[نظرية التعلُّم، متراجحة هوفدنك، تقريباً صحيح احتمالياً (PAC)، بُعْد فابنيك-تشرفونيكس (VC dimension)] +
diff --git a/ar/cheatsheet-unsupervised-learning.md b/ar/cs-229-unsupervised-learning.md similarity index 99% rename from ar/cheatsheet-unsupervised-learning.md rename to ar/cs-229-unsupervised-learning.md index d98e37ea2..6e309b36d 100644 --- a/ar/cheatsheet-unsupervised-learning.md +++ b/ar/cs-229-unsupervised-learning.md @@ -8,7 +8,7 @@ **2. Introduction to Unsupervised Learning** -
+
مقدمة للتعلّم غير المُوَجَّه
@@ -16,9 +16,9 @@ **3. Motivation ― The goal of unsupervised learning is to find hidden patterns in unlabeled data {x(1),...,x(m)}.** -
- {x(1),...,x(m)} الحافز ― الهدف من التعلّم غير المُوَجَّه هو إيجاد الأنماط الخفية في البيانات غير المٌعلمّة -
+
+ {x(1),...,x(m)} الحافز ― الهدف من التعلّم غير المُوَجَّه هو إيجاد الأنماط الخفية في البيانات غير المٌعلمّة +

@@ -269,7 +269,7 @@ dimensions by maximizing the variance of the data as follows:** **39. Step 3: Compute u1,...,uk∈Rn the k orthogonal principal eigenvectors of Σ, i.e. the orthogonal eigenvectors of the k largest eigenvalues.**
-الخطوة 3: حساب u1,...,uk∈Rn المتجهات الذاتية الرئيسية المتعامدة لـ Σ وعددها k ، بعبارة أخرى، k من المتجهات الذاتية المتعامدة ذات القيم الذاتية الأكبر. +الخطوة 3: حساب u1,...,uk∈Rn المتجهات الذاتية الرئيسية المتعامدة لـ Σ وعددها k ، بعبارة أخرى، k من المتجهات الذاتية المتعامدة ذات القيم الذاتية الأكبر.

@@ -387,7 +387,7 @@ dimensions by maximizing the variance of the data as follows:** **56. [Clustering, Expectation-Maximization, k-means, Hierarchical clustering, Metrics]**
-[التجميع، تعظيم القيمة المتوقعة، تجميع k-متوسطات، التجميع الهرمي، مقاييس] +[التجميع، تعظيم القيمة المتوقعة، تجميع k-متوسطات، التجميع الهرمي، مقاييس]

diff --git a/ar/refresher-linear-algebra.md b/ar/refresher-linear-algebra.md deleted file mode 100644 index a6b440d1e..000000000 --- a/ar/refresher-linear-algebra.md +++ /dev/null @@ -1,339 +0,0 @@ -**1. Linear Algebra and Calculus refresher** - -⟶ - -
- -**2. General notations** - -⟶ - -
- -**3. Definitions** - -⟶ - -
- -**4. Vector ― We note x∈Rn a vector with n entries, where xi∈R is the ith entry:** - -⟶ - -
- -**5. Matrix ― We note A∈Rm×n a matrix with m rows and n columns, where Ai,j∈R is the entry located in the ith row and jth column:** - -⟶ - -
- -**6. Remark: the vector x defined above can be viewed as a n×1 matrix and is more particularly called a column-vector.** - -⟶ - -
- -**7. Main matrices** - -⟶ - -
- -**8. Identity matrix ― The identity matrix I∈Rn×n is a square matrix with ones in its diagonal and zero everywhere else:** - -⟶ - -
- -**9. Remark: for all matrices A∈Rn×n, we have A×I=I×A=A.** - -⟶ - -
- -**10. Diagonal matrix ― A diagonal matrix D∈Rn×n is a square matrix with nonzero values in its diagonal and zero everywhere else:** - -⟶ - -
- -**11. Remark: we also note D as diag(d1,...,dn).** - -⟶ - -
- -**12. Matrix operations** - -⟶ - -
- -**13. Multiplication** - -⟶ - -
- -**14. Vector-vector ― There are two types of vector-vector products:** - -⟶ - -
- -**15. inner product: for x,y∈Rn, we have:** - -⟶ - -
- -**16. outer product: for x∈Rm,y∈Rn, we have:** - -⟶ - -
- -**17. Matrix-vector ― The product of matrix A∈Rm×n and vector x∈Rn is a vector of size Rn, such that:** - -⟶ - -
- -**18. where aTr,i are the vector rows and ac,j are the vector columns of A, and xi are the entries of x.** - -⟶ - -
- -**19. Matrix-matrix ― The product of matrices A∈Rm×n and B∈Rn×p is a matrix of size Rn×p, such that:** - -⟶ - -
- -**20. where aTr,i,bTr,i are the vector rows and ac,j,bc,j are the vector columns of A and B respectively** - -⟶ - -
- -**21. Other operations** - -⟶ - -
- -**22. Transpose ― The transpose of a matrix A∈Rm×n, noted AT, is such that its entries are flipped:** - -⟶ - -
- -**23. Remark: for matrices A,B, we have (AB)T=BTAT** - -⟶ - -
- -**24. Inverse ― The inverse of an invertible square matrix A is noted A−1 and is the only matrix such that:** - -⟶ - -
- -**25. Remark: not all square matrices are invertible. Also, for matrices A,B, we have (AB)−1=B−1A−1** - -⟶ - -
- -**26. Trace ― The trace of a square matrix A, noted tr(A), is the sum of its diagonal entries:** - -⟶ - -
- -**27. Remark: for matrices A,B, we have tr(AT)=tr(A) and tr(AB)=tr(BA)** - -⟶ - -
- -**28. Determinant ― The determinant of a square matrix A∈Rn×n, noted |A| or det(A) is expressed recursively in terms of A∖i,∖j, which is the matrix A without its ith row and jth column, as follows:** - -⟶ - -
- -**29. Remark: A is invertible if and only if |A|≠0. Also, |AB|=|A||B| and |AT|=|A|.** - -⟶ - -
- -**30. Matrix properties** - -⟶ - -
- -**31. Definitions** - -⟶ - -
- -**32. Symmetric decomposition ― A given matrix A can be expressed in terms of its symmetric and antisymmetric parts as follows:** - -⟶ - -
- -**33. [Symmetric, Antisymmetric]** - -⟶ - -
- -**34. Norm ― A norm is a function N:V⟶[0,+∞[ where V is a vector space, and such that for all x,y∈V, we have:** - -⟶ - -
- -**35. N(ax)=|a|N(x) for a scalar** - -⟶ - -
- -**36. if N(x)=0, then x=0** - -⟶ - -
- -**37. For x∈V, the most commonly used norms are summed up in the table below:** - -⟶ - -
- -**38. [Norm, Notation, Definition, Use case]** - -⟶ - -
- -**39. Linearly dependence ― A set of vectors is said to be linearly dependent if one of the vectors in the set can be defined as a linear combination of the others.** - -⟶ - -
- -**40. Remark: if no vector can be written this way, then the vectors are said to be linearly independent** - -⟶ - -
- -**41. Matrix rank ― The rank of a given matrix A is noted rank(A) and is the dimension of the vector space generated by its columns. This is equivalent to the maximum number of linearly independent columns of A.** - -⟶ - -
- -**42. Positive semi-definite matrix ― A matrix A∈Rn×n is positive semi-definite (PSD) and is noted A⪰0 if we have:** - -⟶ - -
- -**43. Remark: similarly, a matrix A is said to be positive definite, and is noted A≻0, if it is a PSD matrix which satisfies for all non-zero vector x, xTAx>0.** - -⟶ - -
- -**44. Eigenvalue, eigenvector ― Given a matrix A∈Rn×n, λ is said to be an eigenvalue of A if there exists a vector z∈Rn∖{0}, called eigenvector, such that we have:** - -⟶ - -
- -**45. Spectral theorem ― Let A∈Rn×n. If A is symmetric, then A is diagonalizable by a real orthogonal matrix U∈Rn×n. By noting Λ=diag(λ1,...,λn), we have:** - -⟶ - -
- -**46. diagonal** - -⟶ - -
- -**47. Singular-value decomposition ― For a given matrix A of dimensions m×n, the singular-value decomposition (SVD) is a factorization technique that guarantees the existence of U m×m unitary, Σ m×n diagonal and V n×n unitary matrices, such that:** - -⟶ - -
- -**48. Matrix calculus** - -⟶ - -
- -**49. Gradient ― Let f:Rm×n→R be a function and A∈Rm×n be a matrix. The gradient of f with respect to A is a m×n matrix, noted ∇Af(A), such that:** - -⟶ - -
- -**50. Remark: the gradient of f is only defined when f is a function that returns a scalar.** - -⟶ - -
- -**51. Hessian ― Let f:Rn→R be a function and x∈Rn be a vector. The hessian of f with respect to x is a n×n symmetric matrix, noted ∇2xf(x), such that:** - -⟶ - -
- -**52. Remark: the hessian of f is only defined when f is a function that returns a scalar** - -⟶ - -
- -**53. Gradient operations ― For matrices A,B,C, the following gradient properties are worth having in mind:** - -⟶ - -
- -**54. [General notations, Definitions, Main matrices]** - -⟶ - -
- -**55. [Matrix operations, Multiplication, Other operations]** - -⟶ - -
- -**56. [Matrix properties, Norm, Eigenvalue/Eigenvector, Singular-value decomposition]** - -⟶ - -
- -**57. [Matrix calculus, Gradient, Hessian, Operations]** - -⟶ diff --git a/ar/refresher-probability.md b/ar/refresher-probability.md deleted file mode 100644 index 5c9b34656..000000000 --- a/ar/refresher-probability.md +++ /dev/null @@ -1,381 +0,0 @@ -**1. Probabilities and Statistics refresher** - -⟶ - -
- -**2. Introduction to Probability and Combinatorics** - -⟶ - -
- -**3. Sample space ― The set of all possible outcomes of an experiment is known as the sample space of the experiment and is denoted by S.** - -⟶ - -
- -**4. Event ― Any subset E of the sample space is known as an event. That is, an event is a set consisting of possible outcomes of the experiment. If the outcome of the experiment is contained in E, then we say that E has occurred.** - -⟶ - -
- -**5. Axioms of probability For each event E, we denote P(E) as the probability of event E occuring.** - -⟶ - -
- -**6. Axiom 1 ― Every probability is between 0 and 1 included, i.e:** - -⟶ - -
- -**7. Axiom 2 ― The probability that at least one of the elementary events in the entire sample space will occur is 1, i.e:** - -⟶ - -
- -**8. Axiom 3 ― For any sequence of mutually exclusive events E1,...,En, we have:** - -⟶ - -
- -**9. Permutation ― A permutation is an arrangement of r objects from a pool of n objects, in a given order. The number of such arrangements is given by P(n,r), defined as:** - -⟶ - -
- -**10. Combination ― A combination is an arrangement of r objects from a pool of n objects, where the order does not matter. The number of such arrangements is given by C(n,r), defined as:** - -⟶ - -
- -**11. Remark: we note that for 0⩽r⩽n, we have P(n,r)⩾C(n,r)** - -⟶ - -
- -**12. Conditional Probability** - -⟶ - -
- -**13. Bayes' rule ― For events A and B such that P(B)>0, we have:** - -⟶ - -
- -**14. Remark: we have P(A∩B)=P(A)P(B|A)=P(A|B)P(B)** - -⟶ - -
- -**15. Partition ― Let {Ai,i∈[[1,n]]} be such that for all i, Ai≠∅. We say that {Ai} is a partition if we have:** - -⟶ - -
- -**16. Remark: for any event B in the sample space, we have P(B)=n∑i=1P(B|Ai)P(Ai).** - -⟶ - -
- -**17. Extended form of Bayes' rule ― Let {Ai,i∈[[1,n]]} be a partition of the sample space. We have:** - -⟶ - -
- -**18. Independence ― Two events A and B are independent if and only if we have:** - -⟶ - -
- -**19. Random Variables** - -⟶ - -
- -**20. Definitions** - -⟶ - -
- -**21. Random variable ― A random variable, often noted X, is a function that maps every element in a sample space to a real line.** - -⟶ - -
- -**22. Cumulative distribution function (CDF) ― The cumulative distribution function F, which is monotonically non-decreasing and is such that limx→−∞F(x)=0 and limx→+∞F(x)=1, is defined as:** - -⟶ - -
- -**23. Remark: we have P(a - -**24. Probability density function (PDF) ― The probability density function f is the probability that X takes on values between two adjacent realizations of the random variable.** - -⟶ - -
- -**25. Relationships involving the PDF and CDF ― Here are the important properties to know in the discrete (D) and the continuous (C) cases.** - -⟶ - -
- -**26. [Case, CDF F, PDF f, Properties of PDF]** - -⟶ - -
- -**27. Expectation and Moments of the Distribution ― Here are the expressions of the expected value E[X], generalized expected value E[g(X)], kth moment E[Xk] and characteristic function ψ(ω) for the discrete and continuous cases:** - -⟶ - -
- -**28. Variance ― The variance of a random variable, often noted Var(X) or σ2, is a measure of the spread of its distribution function. It is determined as follows:** - -⟶ - -
- -**29. Standard deviation ― The standard deviation of a random variable, often noted σ, is a measure of the spread of its distribution function which is compatible with the units of the actual random variable. It is determined as follows:** - -⟶ - -
- -**30. Transformation of random variables ― Let the variables X and Y be linked by some function. By noting fX and fY the distribution function of X and Y respectively, we have:** - -⟶ - -
- -**31. Leibniz integral rule ― Let g be a function of x and potentially c, and a,b boundaries that may depend on c. We have:** - -⟶ - -
- -**32. Probability Distributions** - -⟶ - -
- -**33. Chebyshev's inequality ― Let X be a random variable with expected value μ. For k,σ>0, we have the following inequality:** - -⟶ - -
- -**34. Main distributions ― Here are the main distributions to have in mind:** - -⟶ - -
- -**35. [Type, Distribution]** - -⟶ - -
- -**36. Jointly Distributed Random Variables** - -⟶ - -
- -**37. Marginal density and cumulative distribution ― From the joint density probability function fXY , we have** - -⟶ - -
- -**38. [Case, Marginal density, Cumulative function]** - -⟶ - -
- -**39. Conditional density ― The conditional density of X with respect to Y, often noted fX|Y, is defined as follows:** - -⟶ - -
- -**40. Independence ― Two random variables X and Y are said to be independent if we have:** - -⟶ - -
- -**41. Covariance ― We define the covariance of two random variables X and Y, that we note σ2XY or more commonly Cov(X,Y), as follows:** - -⟶ - -
- -**42. Correlation ― By noting σX,σY the standard deviations of X and Y, we define the correlation between the random variables X and Y, noted ρXY, as follows:** - -⟶ - -
- -**43. Remark 1: we note that for any random variables X,Y, we have ρXY∈[−1,1].** - -⟶ - -
- -**44. Remark 2: If X and Y are independent, then ρXY=0.** - -⟶ - -
- -**45. Parameter estimation** - -⟶ - -
- -**46. Definitions** - -⟶ - -
- -**47. Random sample ― A random sample is a collection of n random variables X1,...,Xn that are independent and identically distributed with X.** - -⟶ - -
- -**48. Estimator ― An estimator is a function of the data that is used to infer the value of an unknown parameter in a statistical model.** - -⟶ - -
- -**49. Bias ― The bias of an estimator ^θ is defined as being the difference between the expected value of the distribution of ^θ and the true value, i.e.:** - -⟶ - -
- -**50. Remark: an estimator is said to be unbiased when we have E[^θ]=θ.** - -⟶ - -
- -**51. Estimating the mean** - -⟶ - -
- -**52. Sample mean ― The sample mean of a random sample is used to estimate the true mean μ of a distribution, is often noted ¯¯¯¯¯X and is defined as follows:** - -⟶ - -
- -**53. Remark: the sample mean is unbiased, i.e E[¯¯¯¯¯X]=μ.** - -⟶ - -
- -**54. Central Limit Theorem ― Let us have a random sample X1,...,Xn following a given distribution with mean μ and variance σ2, then we have:** - -⟶ - -
- -**55. Estimating the variance** - -⟶ - -
- -**56. Sample variance ― The sample variance of a random sample is used to estimate the true variance σ2 of a distribution, is often noted s2 or ^σ2 and is defined as follows:** - -⟶ - -
- -**57. Remark: the sample variance is unbiased, i.e E[s2]=σ2.** - -⟶ - -
- -**58. Chi-Squared relation with sample variance ― Let s2 be the sample variance of a random sample. We have:** - -⟶ - -
- -**59. [Introduction, Sample space, Event, Permutation]** - -⟶ - -
- -**60. [Conditional probability, Bayes' rule, Independence]** - -⟶ - -
- -**61. [Random variables, Definitions, Expectation, Variance]** - -⟶ - -
- -**62. [Probability distributions, Chebyshev's inequality, Main distributions]** - -⟶ - -
- -**63. [Jointly distributed random variables, Density, Covariance, Correlation]** - -⟶ - -
- -**64. [Parameter estimation, Mean, Variance]** - -⟶ diff --git a/de/cheatsheet-deep-learning.md b/de/cheatsheet-deep-learning.md deleted file mode 100644 index a5aa3756c..000000000 --- a/de/cheatsheet-deep-learning.md +++ /dev/null @@ -1,321 +0,0 @@ -**1. Deep Learning cheatsheet** - -⟶ - -
- -**2. Neural Networks** - -⟶ - -
- -**3. Neural networks are a class of models that are built with layers. Commonly used types of neural networks include convolutional and recurrent neural networks.** - -⟶ - -
- -**4. Architecture ― The vocabulary around neural networks architectures is described in the figure below:** - -⟶ - -
- -**5. [Input layer, hidden layer, output layer]** - -⟶ - -
- -**6. By noting i the ith layer of the network and j the jth hidden unit of the layer, we have:** - -⟶ - -
- -**7. where we note w, b, z the weight, bias and output respectively.** - -⟶ - -
- -**8. Activation function ― Activation functions are used at the end of a hidden unit to introduce non-linear complexities to the model. Here are the most common ones:** - -⟶ - -
- -**9. [Sigmoid, Tanh, ReLU, Leaky ReLU]** - -⟶ - -
- -**10. Cross-entropy loss ― In the context of neural networks, the cross-entropy loss L(z,y) is commonly used and is defined as follows:** - -⟶ - -
- -**11. Learning rate ― The learning rate, often noted α or sometimes η, indicates at which pace the weights get updated. This can be fixed or adaptively changed. The current most popular method is called Adam, which is a method that adapts the learning rate.** - -⟶ - -
- -**12. Backpropagation ― Backpropagation is a method to update the weights in the neural network by taking into account the actual output and the desired output. The derivative with respect to weight w is computed using chain rule and is of the following form:** - -⟶ - -
- -**13. As a result, the weight is updated as follows:** - -⟶ - -
- -**14. Updating weights ― In a neural network, weights are updated as follows:** - -⟶ - -
- -**15. Step 1: Take a batch of training data.** - -⟶ - -
- -**16. Step 2: Perform forward propagation to obtain the corresponding loss.** - -⟶ - -
- -**17. Step 3: Backpropagate the loss to get the gradients.** - -⟶ - -
- -**18. Step 4: Use the gradients to update the weights of the network.** - -⟶ - -
- -**19. Dropout ― Dropout is a technique meant at preventing overfitting the training data by dropping out units in a neural network. In practice, neurons are either dropped with probability p or kept with probability 1−p** - -⟶ - -
- -**20. Convolutional Neural Networks** - -⟶ - -
- -**21. Convolutional layer requirement ― By noting W the input volume size, F the size of the convolutional layer neurons, P the amount of zero padding, then the number of neurons N that fit in a given volume is such that:** - -⟶ - -
- -**22. Batch normalization ― It is a step of hyperparameter γ,β that normalizes the batch {xi}. By noting μB,σ2B the mean and variance of that we want to correct to the batch, it is done as follows:** - -⟶ - -
- -**23. It is usually done after a fully connected/convolutional layer and before a non-linearity layer and aims at allowing higher learning rates and reducing the strong dependence on initialization.** - -⟶ - -
- -**24. Recurrent Neural Networks** - -⟶ - -
- -**25. Types of gates ― Here are the different types of gates that we encounter in a typical recurrent neural network:** - -⟶ - -
- -**26. [Input gate, forget gate, gate, output gate]** - -⟶ - -
- -**27. [Write to cell or not?, Erase a cell or not?, How much to write to cell?, How much to reveal cell?]** - -⟶ - -
- -**28. LSTM ― A long short-term memory (LSTM) network is a type of RNN model that avoids the vanishing gradient problem by adding 'forget' gates.** - -⟶ - -
- -**29. Reinforcement Learning and Control** - -⟶ - -
- -**30. The goal of reinforcement learning is for an agent to learn how to evolve in an environment.** - -⟶ - -
- -**31. Definitions** - -⟶ - -
- -**32. Markov decision processes ― A Markov decision process (MDP) is a 5-tuple (S,A,{Psa},γ,R) where:** - -⟶ - -
- -**33. S is the set of states** - -⟶ - -
- -**34. A is the set of actions** - -⟶ - -
- -**35. {Psa} are the state transition probabilities for s∈S and a∈A** - -⟶ - -
- -**36. γ∈[0,1[ is the discount factor** - -⟶ - -
- -**37. R:S×A⟶R or R:S⟶R is the reward function that the algorithm wants to maximize** - -⟶ - -
- -**38. Policy ― A policy π is a function π:S⟶A that maps states to actions.** - -⟶ - -
- -**39. Remark: we say that we execute a given policy π if given a state s we take the action a=π(s).** - -⟶ - -
- -**40. Value function ― For a given policy π and a given state s, we define the value function Vπ as follows:** - -⟶ - -
- -**41. Bellman equation ― The optimal Bellman equations characterizes the value function Vπ∗ of the optimal policy π∗:** - -⟶ - -
- -**42. Remark: we note that the optimal policy π∗ for a given state s is such that:** - -⟶ - -
- -**43. Value iteration algorithm ― The value iteration algorithm is in two steps:** - -⟶ - -
- -**44. 1) We initialize the value:** - -⟶ - -
- -**45. 2) We iterate the value based on the values before:** - -⟶ - -
- -**46. Maximum likelihood estimate ― The maximum likelihood estimates for the state transition probabilities are as follows:** - -⟶ - -
- -**47. times took action a in state s and got to s′** - -⟶ - -
- -**48. times took action a in state s** - -⟶ - -
- -**49. Q-learning ― Q-learning is a model-free estimation of Q, which is done as follows:** - -⟶ - -
- -**50. View PDF version on GitHub** - -⟶ - -
- -**51. [Neural Networks, Architecture, Activation function, Backpropagation, Dropout]** - -⟶ - -
- -**52. [Convolutional Neural Networks, Convolutional layer, Batch normalization]** - -⟶ - -
- -**53. [Recurrent Neural Networks, Gates, LSTM]** - -⟶ - -
- -**54. [Reinforcement learning, Markov decision processes, Value/policy iteration, Approximate dynamic programming, Policy search]** - -⟶ diff --git a/de/cheatsheet-machine-learning-tips-and-tricks.md b/de/cheatsheet-machine-learning-tips-and-tricks.md deleted file mode 100644 index 9712297b8..000000000 --- a/de/cheatsheet-machine-learning-tips-and-tricks.md +++ /dev/null @@ -1,285 +0,0 @@ -**1. Machine Learning tips and tricks cheatsheet** - -⟶ - -
- -**2. Classification metrics** - -⟶ - -
- -**3. In a context of a binary classification, here are the main metrics that are important to track in order to assess the performance of the model.** - -⟶ - -
- -**4. Confusion matrix ― The confusion matrix is used to have a more complete picture when assessing the performance of a model. It is defined as follows:** - -⟶ - -
- -**5. [Predicted class, Actual class]** - -⟶ - -
- -**6. Main metrics ― The following metrics are commonly used to assess the performance of classification models:** - -⟶ - -
- -**7. [Metric, Formula, Interpretation]** - -⟶ - -
- -**8. Overall performance of model** - -⟶ - -
- -**9. How accurate the positive predictions are** - -⟶ - -
- -**10. Coverage of actual positive sample** - -⟶ - -
- -**11. Coverage of actual negative sample** - -⟶ - -
- -**12. Hybrid metric useful for unbalanced classes** - -⟶ - -
- -**13. ROC ― The receiver operating curve, also noted ROC, is the plot of TPR versus FPR by varying the threshold. These metrics are are summed up in the table below:** - -⟶ - -
- -**14. [Metric, Formula, Equivalent]** - -⟶ - -
- -**15. AUC ― The area under the receiving operating curve, also noted AUC or AUROC, is the area below the ROC as shown in the following figure:** - -⟶ - -
- -**16. [Actual, Predicted]** - -⟶ - -
- -**17. Basic metrics ― Given a regression model f, the following metrics are commonly used to assess the performance of the model:** - -⟶ - -
- -**18. [Total sum of squares, Explained sum of squares, Residual sum of squares]** - -⟶ - -
- -**19. Coefficient of determination ― The coefficient of determination, often noted R2 or r2, provides a measure of how well the observed outcomes are replicated by the model and is defined as follows:** - -⟶ - -
- -**20. Main metrics ― The following metrics are commonly used to assess the performance of regression models, by taking into account the number of variables n that they take into consideration:** - -⟶ - -
- -**21. where L is the likelihood and ˆσ2 is an estimate of the variance associated with each response.** - -⟶ - -
- -**22. Model selection** - -⟶ - -
- -**23. Vocabulary ― When selecting a model, we distinguish 3 different parts of the data that we have as follows:** - -⟶ - -
- -**24. [Training set, Validation set, Testing set]** - -⟶ - -
- -**25. [Model is trained, Model is assessed, Model gives predictions]** - -⟶ - -
- -**26. [Usually 80% of the dataset, Usually 20% of the dataset]** - -⟶ - -
- -**27. [Also called hold-out or development set, Unseen data]** - -⟶ - -
- -**28. Once the model has been chosen, it is trained on the entire dataset and tested on the unseen test set. These are represented in the figure below:** - -⟶ - -
- -**29. Cross-validation ― Cross-validation, also noted CV, is a method that is used to select a model that does not rely too much on the initial training set. The different types are summed up in the table below:** - -⟶ - -
- -**30. [Training on k−1 folds and assessment on the remaining one, Training on n−p observations and assessment on the p remaining ones]** - -⟶ - -
- -**31. [Generally k=5 or 10, Case p=1 is called leave-one-out]** - -⟶ - -
- -**32. The most commonly used method is called k-fold cross-validation and splits the training data into k folds to validate the model on one fold while training the model on the k−1 other folds, all of this k times. The error is then averaged over the k folds and is named cross-validation error.** - -⟶ - -
- -**33. Regularization ― The regularization procedure aims at avoiding the model to overfit the data and thus deals with high variance issues. The following table sums up the different types of commonly used regularization techniques:** - -⟶ - -
- -**34. [Shrinks coefficients to 0, Good for variable selection, Makes coefficients smaller, Tradeoff between variable selection and small coefficients]** - -⟶ - -
- -**35. Diagnostics** - -⟶ - -
- -**36. Bias ― The bias of a model is the difference between the expected prediction and the correct model that we try to predict for given data points.** - -⟶ - -
- -**37. Variance ― The variance of a model is the variability of the model prediction for given data points.** - -⟶ - -
- -**38. Bias/variance tradeoff ― The simpler the model, the higher the bias, and the more complex the model, the higher the variance.** - -⟶ - -
- -**39. [Symptoms, Regression illustration, classification illustration, deep learning illustration, possible remedies]** - -⟶ - -
- -**40. [High training error, Training error close to test error, High bias, Training error slightly lower than test error, Very low training error, Training error much lower than test error, High variance]** - -⟶ - -
- -**41. [Complexify model, Add more features, Train longer, Perform regularization, Get more data]** - -⟶ - -
- -**42. Error analysis ― Error analysis is analyzing the root cause of the difference in performance between the current and the perfect models.** - -⟶ - -
- -**43. Ablative analysis ― Ablative analysis is analyzing the root cause of the difference in performance between the current and the baseline models.** - -⟶ - -
- -**44. Regression metrics** - -⟶ - -
- -**45. [Classification metrics, confusion matrix, accuracy, precision, recall, F1 score, ROC]** - -⟶ - -
- -**46. [Regression metrics, R squared, Mallow's CP, AIC, BIC]** - -⟶ - -
- -**47. [Model selection, cross-validation, regularization]** - -⟶ - -
- -**48. [Diagnostics, Bias/variance tradeoff, error/ablative analysis]** - -⟶ diff --git a/de/cheatsheet-unsupervised-learning.md b/de/cheatsheet-unsupervised-learning.md deleted file mode 100644 index 1bf117d72..000000000 --- a/de/cheatsheet-unsupervised-learning.md +++ /dev/null @@ -1,340 +0,0 @@ -**1. Unsupervised Learning cheatsheet** - -⟶ - -
- -**2. Introduction to Unsupervised Learning** - -⟶ - -
- -**3. Motivation ― The goal of unsupervised learning is to find hidden patterns in unlabeled data {x(1),...,x(m)}.** - -⟶ - -
- -**4. Jensen's inequality ― Let f be a convex function and X a random variable. We have the following inequality:** - -⟶ - -
- -**5. Clustering** - -⟶ - -
- -**6. Expectation-Maximization** - -⟶ - -
- -**7. Latent variables ― Latent variables are hidden/unobserved variables that make estimation problems difficult, and are often denoted z. Here are the most common settings where there are latent variables:** - -⟶ - -
- -**8. [Setting, Latent variable z, Comments]** - -⟶ - -
- -**9. [Mixture of k Gaussians, Factor analysis]** - -⟶ - -
- -**10. Algorithm ― The Expectation-Maximization (EM) algorithm gives an efficient method at estimating the parameter θ through maximum likelihood estimation by repeatedly constructing a lower-bound on the likelihood (E-step) and optimizing that lower bound (M-step) as follows:** - -⟶ - -
- -**11. E-step: Evaluate the posterior probability Qi(z(i)) that each data point x(i) came from a particular cluster z(i) as follows:** - -⟶ - -
- -**12. M-step: Use the posterior probabilities Qi(z(i)) as cluster specific weights on data points x(i) to separately re-estimate each cluster model as follows:** - -⟶ - -
- -**13. [Gaussians initialization, Expectation step, Maximization step, Convergence]** - -⟶ - -
- -**14. k-means clustering** - -⟶ - -
- -**15. We note c(i) the cluster of data point i and μj the center of cluster j.** - -⟶ - -
- -**16. Algorithm ― After randomly initializing the cluster centroids μ1,μ2,...,μk∈Rn, the k-means algorithm repeats the following step until convergence:** - -⟶ - -
- -**17. [Means initialization, Cluster assignment, Means update, Convergence]** - -⟶ - -
- -**18. Distortion function ― In order to see if the algorithm converges, we look at the distortion function defined as follows:** - -⟶ - -
- -**19. Hierarchical clustering** - -⟶ - -
- -**20. Algorithm ― It is a clustering algorithm with an agglomerative hierarchical approach that build nested clusters in a successive manner.** - -⟶ - -
- -**21. Types ― There are different sorts of hierarchical clustering algorithms that aims at optimizing different objective functions, which is summed up in the table below:** - -⟶ - -
- -**22. [Ward linkage, Average linkage, Complete linkage]** - -⟶ - -
- -**23. [Minimize within cluster distance, Minimize average distance between cluster pairs, Minimize maximum distance of between cluster pairs]** - -⟶ - -
- -**24. Clustering assessment metrics** - -⟶ - -
- -**25. In an unsupervised learning setting, it is often hard to assess the performance of a model since we don't have the ground truth labels as was the case in the supervised learning setting.** - -⟶ - -
- -**26. Silhouette coefficient ― By noting a and b the mean distance between a sample and all other points in the same class, and between a sample and all other points in the next nearest cluster, the silhouette coefficient s for a single sample is defined as follows:** - -⟶ - -
- -**27. Calinski-Harabaz index ― By noting k the number of clusters, Bk and Wk the between and within-clustering dispersion matrices respectively defined as** - -⟶ - -
- -**28. the Calinski-Harabaz index s(k) indicates how well a clustering model defines its clusters, such that the higher the score, the more dense and well separated the clusters are. It is defined as follows:** - -⟶ - -
- -**29. Dimension reduction** - -⟶ - -
- -**30. Principal component analysis** - -⟶ - -
- -**31. It is a dimension reduction technique that finds the variance maximizing directions onto which to project the data.** - -⟶ - -
- -**32. Eigenvalue, eigenvector ― Given a matrix A∈Rn×n, λ is said to be an eigenvalue of A if there exists a vector z∈Rn∖{0}, called eigenvector, such that we have:** - -⟶ - -
- -**33. Spectral theorem ― Let A∈Rn×n. If A is symmetric, then A is diagonalizable by a real orthogonal matrix U∈Rn×n. By noting Λ=diag(λ1,...,λn), we have:** - -⟶ - -
- -**34. diagonal** - -⟶ - -
- -**35. Remark: the eigenvector associated with the largest eigenvalue is called principal eigenvector of matrix A.** - -⟶ - -
- -**36. Algorithm ― The Principal Component Analysis (PCA) procedure is a dimension reduction technique that projects the data on k -dimensions by maximizing the variance of the data as follows:** - -⟶ - -
- -**37. Step 1: Normalize the data to have a mean of 0 and standard deviation of 1.** - -⟶ - -
- -**38. Step 2: Compute Σ=1mm∑i=1x(i)x(i)T∈Rn×n, which is symmetric with real eigenvalues.** - -⟶ - -
- -**39. Step 3: Compute u1,...,uk∈Rn the k orthogonal principal eigenvectors of Σ, i.e. the orthogonal eigenvectors of the k largest eigenvalues.** - -⟶ - -
- -**40. Step 4: Project the data on spanR(u1,...,uk).** - -⟶ - -
- -**41. This procedure maximizes the variance among all k-dimensional spaces.** - -⟶ - -
- -**42. [Data in feature space, Find principal components, Data in principal components space]** - -⟶ - -
- -**43. Independent component analysis** - -⟶ - -
- -**44. It is a technique meant to find the underlying generating sources.** - -⟶ - -
- -**45. Assumptions ― We assume that our data x has been generated by the n-dimensional source vector s=(s1,...,sn), where si are independent random variables, via a mixing and non-singular matrix A as follows:** - -⟶ - -
- -**46. The goal is to find the unmixing matrix W=A−1.** - -⟶ - -
- -**47. Bell and Sejnowski ICA algorithm ― This algorithm finds the unmixing matrix W by following the steps below:** - -⟶ - -
- -**48. Write the probability of x=As=W−1s as:** - -⟶ - -
- -**49. Write the log likelihood given our training data {x(i),i∈[[1,m]]} and by noting g the sigmoid function as:** - -⟶ - -
- -**50. Therefore, the stochastic gradient ascent learning rule is such that for each training example x(i), we update W as follows:** - -⟶ - -
- -**51. The Machine Learning cheatsheets are now available in German.** - -⟶ - -
- -**52. Original authors** - -⟶ - -
- -**53. Translated by X, Y and Z** - -⟶ - -
- -**54. Reviewed by X, Y and Z** - -⟶ - -
- -**55. [Introduction, Motivation, Jensen's inequality]** - -⟶ - -
- -**56. [Clustering, Expectation-Maximization, k-means, Hierarchical clustering, Metrics]** - -⟶ - -
- -**57. [Dimension reduction, PCA, ICA]** - -⟶ diff --git a/es/cheatsheet-deep-learning.md b/es/cs-229-deep-learning.md similarity index 100% rename from es/cheatsheet-deep-learning.md rename to es/cs-229-deep-learning.md diff --git a/es/refresher-linear-algebra.md b/es/cs-229-linear-algebra.md similarity index 100% rename from es/refresher-linear-algebra.md rename to es/cs-229-linear-algebra.md diff --git a/es/cheatsheet-machine-learning-tips-and-tricks.md b/es/cs-229-machine-learning-tips-and-tricks.md similarity index 100% rename from es/cheatsheet-machine-learning-tips-and-tricks.md rename to es/cs-229-machine-learning-tips-and-tricks.md diff --git a/es/refresher-probability.md b/es/cs-229-probability.md similarity index 100% rename from es/refresher-probability.md rename to es/cs-229-probability.md diff --git a/es/cheatsheet-supervised-learning.md b/es/cs-229-supervised-learning.md similarity index 100% rename from es/cheatsheet-supervised-learning.md rename to es/cs-229-supervised-learning.md diff --git a/es/cheatsheet-unsupervised-learning.md b/es/cs-229-unsupervised-learning.md similarity index 100% rename from es/cheatsheet-unsupervised-learning.md rename to es/cs-229-unsupervised-learning.md diff --git a/fa/cheatsheet-deep-learning.md b/fa/cs-229-deep-learning.md similarity index 100% rename from fa/cheatsheet-deep-learning.md rename to fa/cs-229-deep-learning.md diff --git a/fa/refresher-linear-algebra.md b/fa/cs-229-linear-algebra.md similarity index 100% rename from fa/refresher-linear-algebra.md rename to fa/cs-229-linear-algebra.md diff --git a/fa/cheatsheet-machine-learning-tips-and-tricks.md b/fa/cs-229-machine-learning-tips-and-tricks.md similarity index 100% rename from fa/cheatsheet-machine-learning-tips-and-tricks.md rename to fa/cs-229-machine-learning-tips-and-tricks.md diff --git a/fa/refresher-probability.md b/fa/cs-229-probability.md similarity index 100% rename from fa/refresher-probability.md rename to fa/cs-229-probability.md diff --git a/fa/cheatsheet-supervised-learning.md b/fa/cs-229-supervised-learning.md similarity index 100% rename from fa/cheatsheet-supervised-learning.md rename to fa/cs-229-supervised-learning.md diff --git a/fa/cheatsheet-unsupervised-learning.md b/fa/cs-229-unsupervised-learning.md similarity index 100% rename from fa/cheatsheet-unsupervised-learning.md rename to fa/cs-229-unsupervised-learning.md diff --git a/fa/cs-230-convolutional-neural-networks.md b/fa/cs-230-convolutional-neural-networks.md new file mode 100644 index 000000000..ee4201100 --- /dev/null +++ b/fa/cs-230-convolutional-neural-networks.md @@ -0,0 +1,923 @@ +**Convolutional Neural Networks translation** + +
+ +**1. Convolutional Neural Networks cheatsheet** + +
+راهنمای کوتاه شبکه‌های عصبی پیچشی (کانولوشنی) +
+ +
+ + +**2. CS 230 - Deep Learning** + +
+کلاس CS 230 - یادگیری عمیق +
+
+ +
+ + +**3. [Overview, Architecture structure]** + +
+[نمای کلی، ساختار معماری] +
+ +
+ + +**4. [Types of layer, Convolution, Pooling, Fully connected]** + +
+[انواع لایه، کانولوشنی، ادغام، تمام‌متصل] +
+ +
+ + +**5. [Filter hyperparameters, Dimensions, Stride, Padding]** + +
+[ابرفراسنج‌های فیلتر، ابعاد، گام، حاشیه] +
+
+ +
+ + +**6. [Tuning hyperparameters, Parameter compatibility, Model complexity, Receptive field]** + +
+[تنظیم ابرفراسنج‌ها، سازش‌پذیری فراسنج، پیچیدگی مدل، ناحیه‌ی تاثیر] +
+ +
+ + +**7. [Activation functions, Rectified Linear Unit, Softmax]** + +
+[توابع فعال‌سازی، تابع یکسوساز خطی، تابع بیشینه‌ی هموار] +
+ +
+ + +**8. [Object detection, Types of models, Detection, Intersection over Union, Non-max suppression, YOLO, R-CNN]** + +
+[شناسایی شیء، انواع مدل‌ها، شناسایی، نسبت هم‌پوشانی اشتراک به اجتماع، فروداشت غیربیشینه، YOLO، R-CNN] +
+ +
+ + +**9. [Face verification/recognition, One shot learning, Siamese network, Triplet loss]** + +
+[تایید/بازشناسایی چهره، یادگیری یک‌باره‌ای (One shot)، شبکه‌ی Siamese، خطای سه‌گانه] +
+ +
+ + +**10. [Neural style transfer, Activation, Style matrix, Style/content cost function]** + +
+[انتقالِ سبکِ عصبی، فعال سازی، ماتریسِ سبک، تابع هزینه‌ی محتوا/سبک] +
+ +
+ + +**11. [Computational trick architectures, Generative Adversarial Net, ResNet, Inception Network]** + +
+[معماری‌های با ترفندهای محاسباتی، شبکه‌ی هم‌آوردِ مولد، ResNet، شبکه‌ی Inception] +
+ +
+ + +**12. Overview** + +
+نمای کلی +
+ +
+ + +**13. Architecture of a traditional CNN ― Convolutional neural networks, also known as CNNs, are a specific type of neural networks that are generally composed of the following layers:** + +
+معماری یک CNN سنتی – شبکه‌های عصبی مصنوعی پیچشی، که همچنین با عنوان CNN شناخته می شوند، یک نوع خاص از شبکه های عصبی هستند که عموما از لایه‌های زیر تشکیل شده‌اند: +
+ +
+ + +**14. The convolution layer and the pooling layer can be fine-tuned with respect to hyperparameters that are described in the next sections.** + +
+لایه‌ی کانولوشنی و لایه‌ی ادغام می‌توانند به نسبت ابرفراسنج‌هایی که در بخش‌های بعدی بیان شده‌اند تنظیم و تعدیل شوند. +
+ +
+ + +**15. Types of layer** + +
+انواع لایه‌ها +
+ +
+ + +**16. Convolution layer (CONV) ― The convolution layer (CONV) uses filters that perform convolution operations as it is scanning the input I with respect to its dimensions. Its hyperparameters include the filter size F and stride S. The resulting output O is called feature map or activation map.** + +
+لایه کانولوشنی (CONV) - لایه کانولوشنی (CONV) از فیلترهایی استفاده می‌کند که عملیات کانولوشنی را در هنگام پویش ورودی I به نسبت ابعادش، اجرا می‌کند. ابرفراسنج‌های آن شامل اندازه فیلتر F و گام S هستند. خروجی حاصل شده O نگاشت ویژگی یا نگاشت فعال‌سازی نامیده می‌شود. +
+ +
+ + +**17. Remark: the convolution step can be generalized to the 1D and 3D cases as well.** + +
+نکته: مرحله کانولوشنی همچنین می‌تواند به موارد یک بُعدی و سه بُعدی تعمیم داده شود. +
+ +
+ + +**18. Pooling (POOL) ― The pooling layer (POOL) is a downsampling operation, typically applied after a convolution layer, which does some spatial invariance. In particular, max and average pooling are special kinds of pooling where the maximum and average value is taken, respectively.** + +
+لایه ادغام (POOL) - لایه ادغام (POOL) یک عمل نمونه‌کاهی است، که معمولا بعد از یک لایه کانولوشنی اعمال می‌شود، که تا حدی منجر به ناوردایی مکانی می‌شود. به طور خاص، ادغام بیشینه و میانگین انواع خاص ادغام هستند که به ترتیب مقدار بیشینه و میانگین گرفته می‌شود. +
+ +
+ + +**19. [Type, Purpose, Illustration, Comments]** + +
+[نوع، هدف، نگاره، توضیحات] +
+ +
+ + +**20. [Max pooling, Average pooling, Each pooling operation selects the maximum value of the current view, Each pooling operation averages the values of the current view]** + +
+[ادغام بیشینه، ادغام میانگین، هر عمل ادغام مقدار بیشینه‌ی نمای فعلی را انتخاب می‌کند، هر عمل ادغام مقدار میانگینِ نمای فعلی را انتخاب می‌کند] +
+ +
+ + +**21. [Preserves detected features, Most commonly used, Downsamples feature map, Used in LeNet]** + +
+[ویژگی‌های شناسایی شده را حفظ می‌کند، اغلب مورد استفاده قرار می‌گیرد، کاستن نگاشت ویژگی، در (معماری) LeNet استفاده شده است] +
+ +
+ + +**22. Fully Connected (FC) ― The fully connected layer (FC) operates on a flattened input where each input is connected to all neurons. If present, FC layers are usually found towards the end of CNN architectures and can be used to optimize objectives such as class scores.** + +
+تمام‌متصل (FC) - لایه‌ی تمام‌متصل (FC) بر روی یک ورودی مسطح به طوری ‌که هر ورودی به تمامی نورون‌ها متصل است، عمل می‌کند. در صورت وجود، لایه‌های FC معمولا در انتهای معماری‌های CNN یافت می‌شوند و می‌توان آن‌ها را برای بهینه‌سازی اهدافی مثل امتیازات کلاس به‌ کار برد. +
+
+ + +**23. Filter hyperparameters** + +
+ابرفراسنج‌های فیلتر +
+ +
+ + +**24. The convolution layer contains filters for which it is important to know the meaning behind its hyperparameters.** + +
+لایه کانولوشنی شامل فیلترهایی است که دانستن مفهوم نهفته در فراسنج‌های آن اهمیت دارد. +
+ +
+ + +**25. Dimensions of a filter ― A filter of size F×F applied to an input containing C channels is a F×F×C volume that performs convolutions on an input of size I×I×C and produces an output feature map (also called activation map) of size O×O×1.** + +
+ابعاد یک فیلتر - یک فیلتر به اندازه F×F اعمال شده بر روی یک ورودیِ حاوی C کانال، یک توده F×F×C است که (عملیات) پیچشی بر روی یک ورودی به اندازه I×I×C اعمال می‌کند و یک نگاشت ویژگی خروجی (که همچنین نگاشت فعال‌سازی نامیده می‌شود) به اندازه O×O×1 تولید می‌کند. +
+ +
+ + +**26. Filter** + +
+فیلتر +
+ +
+ + +**27. Remark: the application of K filters of size F×F results in an output feature map of size O×O×K.** + +
+نکته: اعمال K فیلتر به اندازه‌ی F×F، منتج به یک نگاشت ویژگی خروجی به اندازه O×O×K می‌شود. +
+ +
+ + +**28. Stride ― For a convolutional or a pooling operation, the stride S denotes the number of pixels by which the window moves after each operation.** + +
+گام – در یک عملیات ادغام یا پیچشی، اندازه گام S به تعداد پیکسل‌هایی که پنجره بعد از هر عملیات جابه‌جا می‌شود، اشاره دارد. +
+ +
+ + +**29. Zero-padding ― Zero-padding denotes the process of adding P zeroes to each side of the boundaries of the input. This value can either be manually specified or automatically set through one of the three modes detailed below:** + +
+حاشیه‌ی صفر – حاشیه‌ی صفر به فرآیند افزودن P صفر به هر طرف از کرانه‌های ورودی اشاره دارد. این مقدار می‌تواند به طور دستی مشخص شود یا به طور خودکار به سه روش زیر تعیین گردد: +
+ +
+ + +**30. [Mode, Value, Illustration, Purpose, Valid, Same, Full]** + +
+[نوع، مقدار، نگاره، هدف، Valid، Same، Full] +
+ +
+ + +**31. [No padding, Drops last convolution if dimensions do not match, Padding such that feature map size has size ⌈IS⌉, Output size is mathematically convenient, Also called 'half' padding, Maximum padding such that end convolutions are applied on the limits of the input, Filter 'sees' the input end-to-end]** + +
+[فاقد حاشیه، اگر ابعاد مطابقت ندارد آخرین کانولوشنی را رها کن، (اعمال) حاشیه به طوری که اندازه نگاشت ویژگی ⌈IS⌉ باشد، (محاسبه) اندازه خروجی به لحاظ ریاضیاتی آسان است، همچنین حاشیه‌ی 'نیمه' نامیده می‌شود، بالاترین حاشیه (اعمال می‌شود) به طوری که (عملیات) کانولوشنی انتهایی بر روی مرزهای ورودی اعمال می‌شود، فیلتر ورودی را به صورت پکپارچه 'می‌پیماید'] +
+ +
+ + +**32. Tuning hyperparameters** + +
+تنظیم ابرفراسنج‌ها +
+ +
+ + +**33. Parameter compatibility in convolution layer ― By noting I the length of the input volume size, F the length of the filter, P the amount of zero padding, S the stride, then the output size O of the feature map along that dimension is given by:** + +
+سازش‌پذیری فراسنج در لایه کانولوشنی – با ذکر I به عنوان طول اندازه توده ورودی، F طول فیلتر، P میزان حاشیه‌ی صفر، S گام، اندازه خروجی نگاشت ویژگی O در امتداد ابعاد خواهد بود: +
+ +
+ + +**34. [Input, Filter, Output]** + +
+[ورودی، فیلتر، خروجی] +
+ +
+ + +**35. Remark: often times, Pstart=Pend≜P, in which case we can replace Pstart+Pend by 2P in the formula above.** + +
+نکته: اغلب Pstart=Pend≜P است، در این صورت Pstart+Pend را می‌توان با 2 Pدر فرمول بالا جایگزین کرد. +
+ +
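A tiny helper reflecting the output-size formula above; the values of I, F, S and the padding amounts in the calls are illustrative:

```python
def conv_output_size(I, F, S, P_start, P_end):
    # O = (I - F + P_start + P_end) / S + 1 along one spatial dimension
    return (I - F + P_start + P_end) // S + 1

print(conv_output_size(I=32, F=5, S=1, P_start=2, P_end=2))  # 32: 'same'-style padding keeps the size
print(conv_output_size(I=32, F=5, S=1, P_start=0, P_end=0))  # 28: 'valid' (no padding) shrinks it
```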
+ + +**36. Understanding the complexity of the model ― In order to assess the complexity of a model, it is often useful to determine the number of parameters that its architecture will have. In a given layer of a convolutional neural network, it is done as follows:** + +
+درک پیچیدگی مدل – برای برآورد پیچیدگی مدل، اغلب تعیین تعداد فراسنج‌هایی که معماری آن می‌تواند داشته باشد، مفید است. در یک لایه مفروض شبکه پیچشی عصبی این امر به صورت زیر انجام می‌شود: +
+ +
+ + +**37. [Illustration, Input size, Output size, Number of parameters, Remarks]** + +
+[نگاره، اندازه ورودی، اندازه خروجی، تعداد فراسنج‌ها، ملاحظات] +
+ +
+

**38. [One bias parameter per filter, In most cases, S<F, A common choice for K is 2C]**

<br>
+[یک پیش‌قدر به ازای هر فیلتر، در بیشتر موارد S<F است، یک انتخاب رایج برای K، 2C است]
<br>
+ + +
+ + +**39. [Pooling operation done channel-wise, In most cases, S=F]** + +
+[عملیات ادغام به صورت کانال‌به‌کانال انجام میشود، در بیشتر موارد S=F است] +
+ +
+ +**40. [Input is flattened, One bias parameter per neuron, The number of FC neurons is free of structural constraints]** + +
+[ورودی مسطح شده است، یک پیش‌قدر به ازای هر نورون، تعداد نورون‌های FC فاقد محدودیت‌های ساختاری‌ست] +
+ +
+ + +**41. Receptive field ― The receptive field at layer k is the area denoted Rk×Rk of the input that each pixel of the k-th activation map can 'see'. By calling Fj the filter size of layer j and Si the stride value of layer i and with the convention S0=1, the receptive field at layer k can be computed with the formula:** + +
+ناحیه تاثیر – ناحیه تاثیر در لایه k محدوده‌ای از ورودی Rk×Rk است که هر پیکسلِ kاٌم نگاشت ویژگی می‌تواند 'ببیند'. با ذکر Fj به عنوان اندازه فیلتر لایه j و Si مقدار گام لایه i و با این توافق که S0=1 است، ناحیه تاثیر در لایه k با فرمول زیر محاسبه می‌شود: +
+ +
+ + +**42. In the example below, we have F1=F2=3 and S1=S2=1, which gives R2=1+2⋅1+2⋅1=5.** + +
+در مثال زیر داریم، F1=F2=3 و S1=S2=1 که منتج به R2=1+2⋅1+2⋅1=5 می‌شود. +
+ +
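A short sketch of the receptive-field formula above; it reproduces the F1=F2=3, S1=S2=1 example:

```python
def receptive_field(filter_sizes, strides):
    # R_k = 1 + sum_j (F_j - 1) * prod_{i < j} S_i, with S_0 = 1
    R, jump = 1, 1
    for F, S in zip(filter_sizes, strides):
        R += (F - 1) * jump
        jump *= S
    return R

print(receptive_field([3, 3], [1, 1]))  # 5, as in the example above
```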
+ + +**43. Commonly used activation functions** + +
+توابع فعال‌سازی پرکاربرد +
+ +
+ + +**44. Rectified Linear Unit ― The rectified linear unit layer (ReLU) is an activation function g that is used on all elements of the volume. It aims at introducing non-linearities to the network. Its variants are summarized in the table below:** + +
+تابع یکسوساز خطی – تابع یکسوساز خطی (ReLU) یک تابع فعال‌سازی g است که بر روی تمامی عناصر توده اعمال می‌شود. هدف آن ارائه (رفتار) غیرخطی به شبکه است. انواع آن در جدول زیر به‌صورت خلاصه آمده‌اند: +
+ +
+ + +**45. [ReLU, Leaky ReLU, ELU, with]** + +
+[ReLU ، ReLUنشت‌دار، ELU، با] +
+ +
+ + +**46. [Non-linearity complexities biologically interpretable, Addresses dying ReLU issue for negative values, Differentiable everywhere]** + +
+[پیچیدگی‌های غیر خطی که از دیدگاه زیستی قابل تفسیر هستند، مسئله افول ReLU برای مقادیر منفی را مهار می‌کند، در تمامی نقاط مشتق‌پذیر است] +
+ +
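Minimal NumPy versions of the three variants in the table above; the leak factor and α are placeholder values:

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0)

def leaky_relu(z, eps=0.01):
    return np.where(z > 0, z, eps * z)                    # small slope for negative inputs

def elu(z, alpha=1.0):
    return np.where(z > 0, z, alpha * (np.exp(z) - 1))    # smooth for negative inputs

z = np.array([-2.0, -0.5, 0.0, 1.5])
print(relu(z), leaky_relu(z), elu(z))
```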
+ + +**47. Softmax ― The softmax step can be seen as a generalized logistic function that takes as input a vector of scores x∈Rn and outputs a vector of output probability p∈Rn through a softmax function at the end of the architecture. It is defined as follows:** + +
+بیشینه‌ی هموار – مرحله بیشینه‌ی هموار را می‌توان به عنوان یک تابع لجستیکی تعمیم داده شده که یک بردار x∈Rn را از ورودی می‌گیرد و یک بردار خروجی احتمال p∈Rn، به‌واسطه‌ی تابع بیشینه‌ی هموار در انتهای معماری، تولید می‌کند. این تابع به‌صورت زیر تعریف می‌شود: +
+ +
+ + +**48. where** + +
+که +
+ +
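A numerically stable sketch of the softmax defined above; the score vector is a placeholder:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())     # subtracting the max avoids overflow without changing the result
    return e / e.sum()

scores = np.array([2.0, 1.0, 0.1])
p = softmax(scores)
print(p, p.sum())               # a probability vector summing to 1
```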
+ + +**49. Object detection** + +
+شناسایی شیء +
+ +
+ + +**50. Types of models ― There are 3 main types of object recognition algorithms, for which the nature of what is predicted is different. They are described in the table below:** + +
+انواع مدل‌ – سه نوع اصلی از الگوریتم‌های بازشناسایی وجود دارد، که ماهیت آنچه‌که شناسایی شده متفاوت است. این الگوریتم‌ها در جدول زیر توضیح داده شده‌اند: +
+ +
+ + +**51. [Image classification, Classification w. localization, Detection]** + +
+[دسته‌بندی تصویر، دسته‌بندی با موقعیت‌یابی، شناسایی] +
+ +
+ + +**52. [Teddy bear, Book]** + +
+[خرس تدی، کتاب] +
+ +
+ + +**53. [Classifies a picture, Predicts probability of object, Detects an object in a picture, Predicts probability of object and where it is located, Detects up to several objects in a picture, Predicts probabilities of objects and where they are located]** + +
+[یک عکس را دسته‌بندی می‌کند، احتمال شیء را پیش‌بینی می‌کند، یک شیء را در یک عکس شناسایی می‌کند، احتمال یک شیء و موقعیت آن را پیش‌بینی میکند، چندین شیء در یک عکس را شناسایی می‌کند، احتمال اشیاء و موقعیت آنها را پیش‌بینی می‌کند] +
+ +
+ + +**54. [Traditional CNN, Simplified YOLO, R-CNN, YOLO, R-CNN]** + +
+[CNN سنتی، YOLO ساده شده، R-CNN، YOLO، R-CNN] +
+ +
+ + +**55. Detection ― In the context of object detection, different methods are used depending on whether we just want to locate the object or detect a more complex shape in the image. The two main ones are summed up in the table below:** + +
+شناسایی – در مضمون شناسایی شیء، روشهای مختلفی بسته به اینکه آیا فقط می‌خواهیم موقعیت قرارگیری شیء را پیدا کنیم یا شکل پیچیده‌تری در تصویر را شناسایی کنیم، استفاده می‌شوند. دو مورد از اصلی ترین آنها در جدول زیر به‌صورت خلاصه آورده‌ شده‌اند: +
+ +
+ + +**56. [Bounding box detection, Landmark detection]** + +
+[شناسایی کادر محصورکننده، شناسایی نقاط برجسته]
<br>
+ +
+ + +**57. [Detects the part of the image where the object is located, Detects a shape or characteristics of an object (e.g. eyes), More granular]** + +
+[بخشی از تصویر که شیء در آن قرار گرفته را شناسایی می‌کند، یک شکل یا مشخصات یک شیء (مثل چشم‌ها) را شناسایی می‌کند، موشکافانه‌تر] +
+ +
+ + +**58. [Box of center (bx,by), height bh and width bw, Reference points (l1x,l1y), ..., (lnx,lny)]** + +
+[مرکزِ کادر (bx,by)، ارتفاع bh و عرض bw، نقاط مرجع (l1x,l1y), ..., (lnx,lny)] +
+ +
+ + +**59. Intersection over Union ― Intersection over Union, also known as IoU, is a function that quantifies how correctly positioned a predicted bounding box Bp is over the actual bounding box Ba. It is defined as:** + +
+نسبت هم‌پوشانی اشتراک به اجتماع - نسبت هم‌پوشانی اشتراک به اجتماع، همچنین به عنوان IoU شناخته می‌شود، تابعی‌ است که میزان موقعیت دقیق کادر محصورکننده Bp نسبت به کادر محصورکننده حقیقی Ba را می‌سنجد. این تابع به‌صورت زیر تعریف می‌شود: +
+ +
+ + +**60. Remark: we always have IoU∈[0,1]. By convention, a predicted bounding box Bp is considered as being reasonably good if IoU(Bp,Ba)⩾0.5.** + +
+نکته: همواره داریم IoU∈[0,1]. به صورت قرارداد، یک کادر محصورکننده Bp را می‌توان نسبتا خوب در نظر گرفت اگر IoU(Bp,Ba)⩾0.5 باشد. +
+ +
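A direct translation of the IoU definition above into Python; representing boxes as (x1, y1, x2, y2) corner coordinates is an assumption of this sketch:

```python
def iou(box_p, box_a):
    # Intersection rectangle of the predicted and actual boxes
    x1, y1 = max(box_p[0], box_a[0]), max(box_p[1], box_a[1])
    x2, y2 = min(box_p[2], box_a[2]), min(box_p[3], box_a[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)

    area_p = (box_p[2] - box_p[0]) * (box_p[3] - box_p[1])
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    return inter / (area_p + area_a - inter)

print(iou((0, 0, 10, 10), (5, 5, 15, 15)))  # ~0.14, below the usual 0.5 threshold
```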
+ + +**61. Anchor boxes ― Anchor boxing is a technique used to predict overlapping bounding boxes. In practice, the network is allowed to predict more than one box simultaneously, where each box prediction is constrained to have a given set of geometrical properties. For instance, the first prediction can potentially be a rectangular box of a given form, while the second will be another rectangular box of a different geometrical form.** + +
+کادرهای محوری – کادر بندی محوری روشی است که برای پیش‌بینی کادرهای محصورکننده هم‌پوشان استفاده می‌شود. در عمل، شبکه این اجازه را دارد که بیش از یک کادر به‌صورت هم‌زمان پیش‌بینی کند جایی‌که هر پیش‌بینی کادر مقید به داشتن یک مجموعه خصوصیات هندسی مفروض است. به عنوان مثال، اولین پیش‌بینی می‌تواند یک کادر مستطیلی با قالب خاص باشد حال آنکه کادر دوم، یک کادر مستطیلی محوری با قالب هندسی متفاوتی خواهد بود. +
+ +
+ + +**62. Non-max suppression ― The non-max suppression technique aims at removing duplicate overlapping bounding boxes of a same object by selecting the most representative ones. After having removed all boxes having a probability prediction lower than 0.6, the following steps are repeated while there are boxes remaining:** + +
+فروداشت غیربیشینه – هدف روش فروداشت غیربیشینه، حذف کادرهای محصورکننده هم‌پوشان تکراریِ دسته یکسان با انتخاب معرف‌ترین‌ها است. بعد از حذف همه کادرهایی که احتمال پیش‌بینی پایین‌تر از 0.6 دارند، مراحل زیر با وجود آنکه کادرهایی باقی می‌مانند، تکرار می‌شوند: +
+ +
+ + +**63. [For a given class, Step 1: Pick the box with the largest prediction probability., Step 2: Discard any box having an IoU⩾0.5 with the previous box.]** + +
+[برای یک دسته مفروض، گام اول: کادر با بالاترین احتمال پیش‌بینی را انتخاب کن، گام دوم: هر کادری که IoU≥0.5 نسبت به کادر پیشین دارد را رها کن.] +
+ +
+ + +**64. [Box predictions, Box selection of maximum probability, Overlap removal of same class, Final bounding boxes]** + +
+[پیش‌بینی کادرها، انتخاب کادرِ با احتمال بیشینه، حذف (کادر) همپوشان دسته یکسان، کادرهای محصورکننده نهایی] +
+ +
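A sketch of the two-step procedure above for a single class, using the thresholds quoted in the text; the `iou` helper from the previous sketch is repeated so this block runs on its own:

```python
def iou(b1, b2):
    x1, y1 = max(b1[0], b2[0]), max(b1[1], b2[1])
    x2, y2 = min(b1[2], b2[2]), min(b1[3], b2[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    a1 = (b1[2] - b1[0]) * (b1[3] - b1[1])
    a2 = (b2[2] - b2[0]) * (b2[3] - b2[1])
    return inter / (a1 + a2 - inter)

def non_max_suppression(boxes, scores, iou_threshold=0.5, score_threshold=0.6):
    # Keep only boxes above the probability threshold, sorted by decreasing score
    order = sorted((i for i in range(len(boxes)) if scores[i] >= score_threshold),
                   key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)                          # Step 1: largest remaining probability
        keep.append(best)
        order = [i for i in order                    # Step 2: discard boxes with IoU >= 0.5
                 if iou(boxes[i], boxes[best]) < iou_threshold]
    return keep

boxes = [(0, 0, 10, 10), (1, 1, 10, 10), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
print(non_max_suppression(boxes, scores))            # [0, 2]: the near-duplicate box 1 is suppressed
```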
+ + +**65. YOLO ― You Only Look Once (YOLO) is an object detection algorithm that performs the following steps:** + +
+YOLO - «شما فقط یک‌بار نگاه می‌کنید» (YOLO) یک الگوریتم شناسایی شیء است که مراحل زیر را اجرا می‌کند: +
+ +
+ + +**66. [Step 1: Divide the input image into a G×G grid., Step 2: For each grid cell, run a CNN that predicts y of the following form:, repeated k times]** + +
+[گام اول: تصویر ورودی را به یک مشبک G×G تقسیم کن، گام دوم: برای هر سلول مشبک، یک CNN که y را به شکل زیر پیش‌بینی می‌کند، اجرا کن:، k مرتبه تکرارشده] +
+ +
+ + +**67. where pc is the probability of detecting an object, bx,by,bh,bw are the properties of the detected bouding box, c1,...,cp is a one-hot representation of which of the p classes were detected, and k is the number of anchor boxes.** + +
+که pc احتمال شناسایی یک شیء است، bx,by,bh,bw اندازه‌های نسبی کادر محیطی شناسایی شده است، c1,...,cp نمایش «تک‌فعال» یک دسته از p دسته که تشخیص داده شده است، و k تعداد کادرهای محوری است. + +
+ +
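The shape of the prediction tensor y described above can be written out explicitly; the grid size G, number of anchor boxes k and number of classes p below are arbitrary illustrative values:

```python
import numpy as np

G, k, p = 7, 2, 20                      # grid cells per side, anchor boxes per cell, classes
y = np.zeros((G, G, k, 5 + p))          # per box: [pc, bx, by, bh, bw, c1, ..., cp]
print(y.shape)                          # (7, 7, 2, 25)
```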
+ + +**68. Step 3: Run the non-max suppression algorithm to remove any potential duplicate overlapping bounding boxes.** + +
+گام سوم: الگوریتم فروداشت غیربیشینه را برای حذف هر کادر محصورکننده هم‌پوشان تکراری بالقوه، اجرا کن. +
+ +
+ + +**69. [Original image, Division in GxG grid, Bounding box prediction, Non-max suppression]** + +
+[تصویر اصلی، تقسیم به GxG مشبک، پیش‌بینی کادر محصورکننده، فروداشت غیربیشینه] +
+ +
+ + +**70. Remark: when pc=0, then the network does not detect any object. In that case, the corresponding predictions bx,...,cp have to be ignored.** + +
+نکته: زمانی‌که pc=0 است، شبکه هیچ شیئی را شناسایی نمی‌کند. در چنین حالتی، پیش‌بینی‌های متناظر bx,…,cp بایستی نادیده گرفته شوند. +
+ +
+ + +**71. R-CNN ― Region with Convolutional Neural Networks (R-CNN) is an object detection algorithm that first segments the image to find potential relevant bounding boxes and then run the detection algorithm to find most probable objects in those bounding boxes.** + +
+R-CNN - ناحیه با شبکه‌های عصبی پیچشی (R-CNN) یک الگوریتم شناسایی شیء است که ابتدا تصویر را برای یافتن کادرهای محصورکننده مربوط بالقوه قطعه‌بندی می‌کند و سپس الگوریتم شناسایی را برای یافتن محتمل‌ترین اشیاء در این کادرهای محصور کننده اجرا می‌کند. +
+ +
+ + +**72. [Original image, Segmentation, Bounding box prediction, Non-max suppression]** + +
+[تصویر اصلی، قطعه بندی، پیش‌بینی کادر محصور کننده، فروداشت غیربیشینه] +
+ +
+ + +**73. Remark: although the original algorithm is computationally expensive and slow, newer architectures enabled the algorithm to run faster, such as Fast R-CNN and Faster R-CNN.** + +
+نکته: هرچند الگوریتم اصلی به لحاظ محاسباتی پرهزینه و کند است، معماری‌های جدید از قبیل Fast R-CNN و Faster R-CNN باعث شدند که الگوریتم سریعتر اجرا شود. +
+ +
+ + +**74. Face verification and recognition** + +
+تایید چهره و بازشناسایی +
+ +
+ + +**75. Types of models ― Two main types of model are summed up in table below:** + +
+انواع مدل – دو نوع اصلی از مدل در جدول زیر به‌صورت خلاصه آورده‌ شده‌اند: +
+ +
+ + +**76. [Face verification, Face recognition, Query, Reference, Database]** + +
+[تایید چهره، بازشناسایی چهره، جستار، مرجع، پایگاه داده] +
+ +
+ + +**77. [Is this the correct person?, One-to-one lookup, Is this one of the K persons in the database?, One-to-many lookup]** + +
+[فرد مورد نظر است؟، جستجوی یک‌به‌یک، این فرد یکی از K فرد پایگاه داده است؟، جستجوی یک‌به‌چند] +
+ +
+ + +**78. One Shot Learning ― One Shot Learning is a face verification algorithm that uses a limited training set to learn a similarity function that quantifies how different two given images are. The similarity function applied to two images is often noted d(image 1,image 2).** + +
+یادگیری یک‌باره‌ای – یادگیری یک‌باره‌ای یک الگوریتم تایید چهره است که از یک مجموعه آموزشی محدود برای یادگیری یک تابع مشابهت که میزان اختلاف دو تصویر مفروض را تعیین می‌کند، بهره می‌برد. تابع مشابهت اعمال‌شده بر روی دو تصویر اغلب با نماد d(image 1, image 2) نمایش داده می‌شود. +
+ +
+ + +**79. Siamese Network ― Siamese Networks aim at learning how to encode images to then quantify how different two images are. For a given input image x(i), the encoded output is often noted as f(x(i)).** + +
+شبکه‌ی Siamese - هدف شبکه‌ی Siamese یادگیری طریقه رمزنگاری تصاویر و سپس تعیین اختلاف دو تصویر است. برای یک تصویر مفروض ورودی x(i)، خروجی رمزنگاری شده اغلب با نماد f(x(i)) نمایش داده می‌شود. +
+ +
+ + +**80. Triplet loss ― The triplet loss ℓ is a loss function computed on the embedding representation of a triplet of images A (anchor), P (positive) and N (negative). The anchor and the positive example belong to a same class, while the negative example to another one. By calling α∈R+ the margin parameter, this loss is defined as follows:** + +
+خطای سه‌گانه – خطای سه‌گانه ℓ یک تابع خطا است که بر روی بازنمایی تعبیه‌ی سه‌گانه‌ی تصاویر A (محور)، P (مثبت) و N (منفی) محاسبه می‌شود. نمونه‌های محور (anchor) و مثبت به دسته یکسانی تعلق دارند، حال آنکه نمونه منفی به دسته دیگری تعلق دارد. با نامیدن α∈R+ (به عنوان) فراسنج حاشیه، این خطا به‌صورت زیر تعریف می‌شود: +
+ +
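As an illustration of the definition above, a small NumPy sketch of the triplet loss on three embedding vectors; the squared Euclidean distance and the margin value α=0.2 are common choices assumed here, not something fixed by the text.

```python
import numpy as np

def triplet_loss(f_a, f_p, f_n, alpha=0.2):
    """max(d(A,P) - d(A,N) + alpha, 0) on embedding vectors, with d the squared Euclidean distance."""
    d_pos = np.sum((f_a - f_p) ** 2)   # anchor vs. positive (same identity)
    d_neg = np.sum((f_a - f_n) ** 2)   # anchor vs. negative (different identity)
    return max(d_pos - d_neg + alpha, 0.0)

rng = np.random.default_rng(0)
anchor = rng.normal(size=128)
positive = anchor + 0.05 * rng.normal(size=128)   # close to the anchor
negative = rng.normal(size=128)                   # unrelated embedding
print(triplet_loss(anchor, positive, negative))   # 0.0 here: the margin is comfortably satisfied
```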
+ + +**81. Neural style transfer** + +
+انتقالِ سبک عصبی +
+ +
+ + +**82. Motivation ― The goal of neural style transfer is to generate an image G based on a given content C and a given style S.** + +
+انگیزه – هدف انتقالِ سبک عصبی تولید یک تصویر G بر مبنای یک محتوای مفروض C و سبک مفروض S است. +
+ +
+ + +**83. [Content C, Style S, Generated image G]** + +
+[محتوای C، سبک S، تصویر تولیدشده‌ی G] +
+ +
+ + +**84. Activation ― In a given layer l, the activation is noted a[l] and is of dimensions nH×nw×nc** + +
+فعال‌سازی – در یک لایه مفروض l، فعال‌سازی با a[l] نمایش داده می‌شود و به ابعاد nH×nw×nc است +
+ +
+ + +**85. Content cost function ― The content cost function Jcontent(C,G) is used to determine how the generated image G differs from the original content image C. It is defined as follows:** + +
+تابع هزینه‌ی محتوا – تابع هزینه‌ی محتوا Jcontent(C,G) برای تعیین میزان اختلاف تصویر تولیدشده G از تصویر اصلی C استفاده می‌شود. این تابع به‌صورت زیر تعریف می‌شود: +
+ +
+ + +**86. Style matrix ― The style matrix G[l] of a given layer l is a Gram matrix where each of its elements G[l]kk′ quantifies how correlated the channels k and k′ are. It is defined with respect to activations a[l] as follows:** + +
+ماتریسِ سبک - ماتریسِ سبک G[l] یک لایه مفروض l، یک ماتریس گرَم (Gram) است که هر کدام از عناصر G[l]kk′ میزان همبستگی کانال‌های k و k′ را می‌سنجند. این ماتریس نسبت به فعال‌سازی‌های a[l] به‌صورت زیر محاسبه می‌شود: +
+ +
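A minimal NumPy sketch of the style matrix above, computing G[l]kk′ = Σi,j a[l]ijk a[l]ijk′ from an activation volume of shape nH×nW×nC; the per-layer style cost shown after it uses a normalization constant that varies across implementations, so treat it as an assumption.

```python
import numpy as np

def gram_matrix(a):
    """Style (Gram) matrix of shape (n_C, n_C) from an activation volume a of shape (n_H, n_W, n_C)."""
    n_h, n_w, n_c = a.shape
    flat = a.reshape(n_h * n_w, n_c)   # one row per spatial position
    return flat.T @ flat               # G[k, k'] = sum over positions of a[..., k] * a[..., k']

def style_cost_layer(a_s, a_g):
    """Squared difference between the Gram matrices of the style and generated activations."""
    n_h, n_w, n_c = a_s.shape
    g_s, g_g = gram_matrix(a_s), gram_matrix(a_g)
    return np.sum((g_s - g_g) ** 2) / (2 * n_h * n_w * n_c) ** 2

rng = np.random.default_rng(1)
a_style, a_gen = rng.normal(size=(4, 4, 3)), rng.normal(size=(4, 4, 3))
print(gram_matrix(a_style).shape, style_cost_layer(a_style, a_gen))   # (3, 3) and a scalar cost
```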
+ + +**87. Remark: the style matrix for the style image and the generated image are noted G[l] (S) and G[l] (G) respectively.** + +
+نکته: ماتریس سبک برای تصویر سبک و تصویر تولید شده، به ترتیب با G[l] (S) و G[l] (G) نمایش داده می‌شوند. +
+ +
+ + +**88. Style cost function ― The style cost function Jstyle(S,G) is used to determine how the generated image G differs from the style S. It is defined as follows:** + +
+تابع هزینه‌ی سبک – تابع هزینه‌ی سبک Jstyle(S,G) برای تعیین میزان اختلاف تصویر تولیدشده G و سبک S استفاده می‌شود. این تابع به صورت زیر تعریف می‌شود: +
+ +
+ + +**89. Overall cost function ― The overall cost function is defined as being a combination of the content and style cost functions, weighted by parameters α,β, as follows:** + +
+تابع هزینه‌ی کل – تابع هزینه‌ی کل به صورت ترکیبی از توابع هزینه‌ی سبک و محتوا تعریف شده است که با فراسنج‌های α,β, به شکل زیر وزن‌دار شده است: +
+ +
+ + +**90. Remark: a higher value of α will make the model care more about the content while a higher value of β will make it care more about the style.** + +
+نکته: مقدار بیشتر α مدل را به توجه بیشتر به محتوا وا می‌دارد حال آنکه مقدار بیشتر β مدل را به توجه بیشتر به سبک وا می‌دارد. +
+ +
+ + +**91. Architectures using computational tricks** + +
+معماری‌هایی که از ترفندهای محاسباتی استفاده می‌کنند. +
+ +
+ + +**92. Generative Adversarial Network ― Generative adversarial networks, also known as GANs, are composed of a generative and a discriminative model, where the generative model aims at generating the most truthful output that will be fed into the discriminative which aims at differentiating the generated and true image.** + +
+شبکه‌ی هم‌آوردِ مولد – شبکه‌ی هم‌آوردِ مولد، همچنین با نام GANs شناخته می‌شوند، ترکیبی از یک مدل مولد و تمیزدهنده هستند، جایی‌که مدل مولد هدفش تولید واقعی‌ترین خروجی است که به (مدل) تمیزدهنده تغذیه می‌شود و این (مدل) هدفش تفکیک بین تصویر تولیدشده و واقعی است. +
+ +
+ + +**93. [Training, Noise, Real-world image, Generator, Discriminator, Real Fake]** + +
+[آموزش، نویز، تصویر دنیای واقعی، مولد، تمیز دهنده، واقعی بدلی] +
+ +
+ + +**94. Remark: use cases using variants of GANs include text to image, music generation and synthesis.** + +
+نکته: موارد استفاده‌ی گونه‌های مختلف GAN شامل تبدیل متن به تصویر، تولید موسیقی و سنتز است. +
+ +
+ + +**95. ResNet ― The Residual Network architecture (also called ResNet) uses residual blocks with a high number of layers meant to decrease the training error. The residual block has the following characterizing equation:** + +
+ResNet – معماری شبکه‌ی پسماند (همچنین با عنوان ResNet شناخته می‌شود) از بلاک‌های پسماند با تعداد لایه‌های زیاد به منظور کاهش خطای آموزش استفاده می‌کند. بلاک پسماند معادله‌ای با خصوصیات زیر دارد: +
+ +
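For illustration, a toy residual block built from fully connected layers instead of convolutions; the ReLU activation and the plain identity shortcut a[l+2] = g(z[l+2] + a[l]) are the usual choices and are assumed here.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0)

def residual_block(a_l, w1, w2):
    """Identity residual block: the skip connection adds the input back before the activation."""
    z1 = w1 @ a_l
    a1 = relu(z1)
    z2 = w2 @ a1
    return relu(z2 + a_l)   # the block only has to learn the residual on top of a_l

rng = np.random.default_rng(2)
a = rng.normal(size=8)
w1 = 0.1 * rng.normal(size=(8, 8))
w2 = 0.1 * rng.normal(size=(8, 8))
print(residual_block(a, w1, w2).shape)   # (8,): same dimension as the input, as required by the shortcut
```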
+ + +**96. Inception Network ― This architecture uses inception modules and aims at giving a try at different convolutions in order to increase its performance through features diversification. In particular, it uses the 1×1 convolution trick to limit the computational burden.** + +
+شبکه‌ی Inception – این معماری از ماژول‌های inception استفاده می‌کند و هدفش فرصت دادن به (عملیات) کانولوشنی مختلف برای افزایش کارایی از طریق تنوع‌بخشی ویژگی‌ها است. به طور خاص، این معماری از ترفند کانولوشنی 1×1 برای محدود سازی بار محاسباتی استفاده می‌کند. +
+ +
+ + +**97. The Deep Learning cheatsheets are now available in [target language].** + +
+راهنمای یادگیری عمیق هم اکنون به زبان [فارسی] در دسترس است. +
+ +
+ + +**98. Original authors** + +
+نویسندگان اصلی +
+ +
+ + +**99. Translated by X, Y and Z** + +
+ترجمه شده توسط X،Y و Z +
+ +
+ + +**100. Reviewed by X, Y and Z** + +
+بازبینی شده توسط X،Y و Z +
+ +
+ + +**101. View PDF version on GitHub** + +
+نسخه پی‌دی‌اف را در گیت‌هاب ببینید +
+ +
+ + +**102. By X and Y** + +
+توسط X و Y +
+ +
+ diff --git a/fa/cs-230-deep-learning-tips-and-tricks.md b/fa/cs-230-deep-learning-tips-and-tricks.md new file mode 100644 index 000000000..1248a06bf --- /dev/null +++ b/fa/cs-230-deep-learning-tips-and-tricks.md @@ -0,0 +1,586 @@ + +**Deep Learning Tips and Tricks translation** + +
+ +**1. Deep Learning Tips and Tricks cheatsheet** + +
+راهنمای کوتاه نکات و ترفندهای یادگیری عمیق +
+ +
+ + +**2. CS 230 - Deep Learning** + +
+کلاس CS 230 - یادگیری عمیق +
+ +
+ + +**3. Tips and tricks** + +
+نکات و ترفندها +
+ +
+ + +**4. [Data processing, Data augmentation, Batch normalization]** + +
+[پردازش داده، داده‌افزایی، نرمال‌سازی دسته‌ای] +
+ +
+ + +**5. [Training a neural network, Epoch, Mini-batch, Cross-entropy loss, Backpropagation, Gradient descent, Updating weights, Gradient checking]** + +
+[آموزش یک شبکه‌ی عصبی، تکرار(Epoch)، دسته‌ی کوچک، خطای آنتروپی متقاطع، انتشار معکوس، گرادیان نزولی، به‌روزرسانی وزن‌ها، وارسی گرادیان] +
+ +
+ + +**6. [Parameter tuning, Xavier initialization, Transfer learning, Learning rate, Adaptive learning rates]** + +
+[تنظیم فراسنج، مقداردهی اولیه Xavier،یادگیری انتقالی، نرخ یادگیری، نرخ یادگیری سازگارشونده] +
+ +
+ + +**7. [Regularization, Dropout, Weight regularization, Early stopping]** + +
+[نظام‌بخشی، برون‌اندازی، نظام‌بخشی وزن، توقف زودهنگام] +
+ +
+ + +**8. [Good practices, Overfitting small batch, Gradient checking]** + +
+[عادت‌های خوب، بیش‌برارزش دسته‌ی کوچک، وارسی گرادیان] +
+ +
+ + +**9. View PDF version on GitHub** + +
+نسخه پی‌دی‌اف را در گیت‌هاب ببینید +
+ +
+ + +**10. Data processing** + +
+پردازش داده +
+ +
+ + +**11. Data augmentation ― Deep learning models usually need a lot of data to be properly trained. It is often useful to get more data from the existing ones using data augmentation techniques. The main ones are summed up in the table below. More precisely, given the following input image, here are the techniques that we can apply:** + +
+داده‌افزایی ― مدل‌های یادگیری عمیق معمولا به داده‌های زیادی نیاز دارند تا بتوانند به خوبی آموزش ببینند. اغلب، استفاده از روش‌های داده‌افزایی برای گرفتن داده‌ی بیشتر از داده‌های موجود، مفید است. اصلی‌ترین آنها در جدول زیر به اختصار آمده‌اند. به عبارت دقیق‌تر، با در نظر گرفتن تصویر ورودی زیر، روش‌هایی که می‌توان اعمال کرد بدین شرح هستند: +
+ +
+ +**12. [Original, Flip, Rotation, Random crop]** + +
+[تصویر اصلی، قرینه، چرخش، برش تصادفی] +
+ +
+ + +**13. [Image without any modification, Flipped with respect to an axis for which the meaning of the image is preserved, Rotation with a slight angle, Simulates incorrect horizon calibration, Random focus on one part of the image, Several random crops can be done in a row]** + +
+[تصویر (آغازین) بدون هیچ‌گونه تغییری، قرینه‌شده نسبت به محوری که معنای (محتوای) تصویر را حفظ می‌کند، چرخش با زاویه‌ی اندک، خط افق نادرست را شبیه‌سازی می‌کند، روی ناحیه‌ای تصادفی از تصویر متمرکز می‌شود، چندین برش تصادفی را میتوان پشت‌سرهم انجام داد] +
+
+ + +**14. [Color shift, Noise addition, Information loss, Contrast change]** + +
+[تغییر رنگ، اضافه‌کردن نویز، هدررفت اطلاعات، تغییر تباین(کُنتراست)] +
+ +
+ + +**15. [Nuances of RGB is slightly changed, Captures noise that can occur with light exposure, Addition of noise, More tolerance to quality variation of inputs, Parts of image ignored, Mimics potential loss of parts of image, Luminosity changes, Controls difference in exposition due to time of day]** + +
+[عناصر RGB کمی تغییر کرده است، نویزی که در هنگام مواجه شدن با نور رخ می‌دهد را شبیه‌سازی می‌کند، افزودگی نویز، مقاومت بیشتر نسبت به تغییر کیفیت تصاویر ورودی، بخش‌هایی از تصویر نادیده گرفته می‌شوند، تقلید (شبیه سازی) هدررفت بالقوه بخش‌هایی از تصویر، تغییر درخشندگی، با توجه به زمان روز تفاوت نمایش (تصویر) را کنترل می‌کند] +
+ +
+ + +**16. Remark: data is usually augmented on the fly during training.** + +
+نکته: داده‌ها معمولا در فرآیند آموزش (به صورت درجا) افزایش پیدا می‌کنند. +
+ +
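A small NumPy sketch of a few of the augmentations listed above (flip, noise addition, color shift, random crop) on a dummy image array; rotation is omitted for brevity and the noise and shift magnitudes are arbitrary illustrative values.

```python
import numpy as np

def augment(image, rng):
    """Return simple variants of an (H, W, 3) uint8 image array."""
    flipped = image[:, ::-1, :]                                                        # horizontal flip
    noisy = np.clip(image + rng.normal(0, 10, image.shape), 0, 255).astype(np.uint8)   # noise addition
    shifted = np.clip(image * np.array([1.05, 1.0, 0.95]), 0, 255).astype(np.uint8)    # slight color shift
    h, w = image.shape[:2]
    top, left = rng.integers(0, h // 4), rng.integers(0, w // 4)
    crop = image[top:top + 3 * h // 4, left:left + 3 * w // 4]                         # random crop
    return flipped, noisy, shifted, crop

rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(64, 64, 3), dtype=np.uint8)
for variant in augment(img, rng):
    print(variant.shape)
```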
+ + +**17. Batch normalization ― It is a step of hyperparameter γ,β that normalizes the batch {xi}. By noting μB,σ2B the mean and variance of that we want to correct to the batch, it is done as follows:** + +
+نرمال‌سازی دسته‌ای ― یک مرحله از فراسنج‌های γ و β که دسته‌ی {xi} را نرمال می‌کند. نماد μB و σ2B به میانگین و وردایی دسته‌ای که می‌خواهیم آن را اصلاح کنیم اشاره دارد که به صورت زیر است: +
+ +
+ + +**18. It is usually done after a fully connected/convolutional layer and before a non-linearity layer and aims at allowing higher learning rates and reducing the strong dependence on initialization.** + +
+معمولا بعد از یک لایه‌ی تمام‌متصل یا لایه‌ی کانولوشنی و قبل از یک لایه‌ی غیرخطی اعمال می‌شود و امکان استفاده از نرخ یادگیری بالاتر را می‌دهد و همچنین باعث می‌شود که وابستگی شدید مدل به مقداردهی اولیه کاهش یابد. +
+ +
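A minimal sketch of the normalization step above for a batch of shape (batch, features): subtract the batch mean μB, divide by the square root of σ2B plus a small ε, then rescale with γ and shift with β; the ε value and the feature-wise layout are assumptions.

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    """Normalize a batch x of shape (batch, features), then rescale with gamma and shift with beta."""
    mu = x.mean(axis=0)                      # per-feature batch mean
    var = x.var(axis=0)                      # per-feature batch variance
    x_hat = (x - mu) / np.sqrt(var + eps)    # zero mean, unit variance per feature
    return gamma * x_hat + beta              # learnable scale and shift

rng = np.random.default_rng(0)
x = rng.normal(loc=3.0, scale=2.0, size=(32, 4))
out = batch_norm(x, gamma=np.ones(4), beta=np.zeros(4))
print(out.mean(axis=0).round(6), out.std(axis=0).round(3))   # ~0 and ~1 for every feature
```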
+ + +**19. Training a neural network** + +
+آموزش یک شبکه‌ی عصبی +
+ +
+ + +**20. Definitions** + +
+تعاریف +
+ +
+ + +**21. Epoch ― In the context of training a model, epoch is a term used to refer to one iteration where the model sees the whole training set to update its weights.** + +
+تکرار (epoch) ― در مضمون آموزش یک مدل، تکرار اصطلاحی است که به یک دور اشاره دارد که در آن مدل تمامی نمونه‌های آموزشی را برای به‌روزرسانی وزن‌ها می‌بیند. +
+ +
+ + +**22. Mini-batch gradient descent ― During the training phase, updating weights is usually not based on the whole training set at once due to computation complexities or one data point due to noise issues. Instead, the update step is done on mini-batches, where the number of data points in a batch is a hyperparameter that we can tune.** + +
+گرادیان نزولی دسته‌ی‌کوچک ― در فاز آموزش، به‌روزرسانی وزن‌ها معمولا بر مبنای تمامی مجموعه آموزش به علت پیچیدگی‌های محاسباتی، یا یک نمونه داده به علت مشکل نویز، نیست. در عوض، گام به‌روزرسانی بر روی دسته‌های کوچک انجام می شود، که تعداد نمونه‌های داده در یک دسته یک ابرفراسنج است که میتوان آن را تنظیم کرد. +
+ +
+ + +**23. Loss function ― In order to quantify how a given model performs, the loss function L is usually used to evaluate to what extent the actual outputs y are correctly predicted by the model outputs z.** + +
+تابع خطا ― به منظور سنجش کارایی یک مدل مفروض، معمولا از تابع خطای L برای ارزیابی اینکه تا چه حد خروجی حقیقی y به شکل صحیح توسط خروجی z مدل پیش‌بینی شده‌اند، استفاده می‌شود. +
+ +
+ + +**24. Cross-entropy loss ― In the context of binary classification in neural networks, the cross-entropy loss L(z,y) is commonly used and is defined as follows:** + +
+خطای آنتروپی متقاطع – در مضمون دسته‌بندی دودویی در شبکه‌های عصبی، عموما از تابع خطای آنتروپی متقاطع L(z,y) استفاده و به صورت زیر تعریف میشود: +
+ +
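A small NumPy sketch of the binary cross-entropy L(z,y) = −[y log(z) + (1−y) log(1−z)] averaged over a batch; the clipping constant only avoids log(0) and is an implementation detail, not part of the definition.

```python
import numpy as np

def cross_entropy(z, y, eps=1e-12):
    """Binary cross-entropy between predictions z in (0, 1) and labels y in {0, 1}."""
    z = np.clip(z, eps, 1 - eps)   # keep the logarithms finite
    return float(np.mean(-(y * np.log(z) + (1 - y) * np.log(1 - z))))

y = np.array([1, 0, 1, 1])
z = np.array([0.9, 0.2, 0.7, 0.6])
print(cross_entropy(z, y))   # ~0.30
```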
+ + +**25. Finding optimal weights** + +
+یافتن وزن‌های بهینه +
+ +
+ + +**26. Backpropagation ― Backpropagation is a method to update the weights in the neural network by taking into account the actual output and the desired output. The derivative with respect to each weight w is computed using the chain rule.** + +
+انتشار معکوس ― انتشار معکوس روشی برای به‌روزرسانی وزن‌ها با توجه به خروجی واقعی و خروجی مورد انتظار در شبکه‌ی عصبی است. مشتق نسبت به هر وزن w توسط قاعده‌ی زنجیری محاسبه می‌شود. +
+ +
+ + +**27. Using this method, each weight is updated with the rule:** + +
+با استفاده از این روش، هر وزن با قانون زیر به‌روزرسانی می‌شود: +
+ +
+ + +**28. Updating weights ― In a neural network, weights are updated as follows:** + +
+به‌روزرسانی وزن‌ها – در یک شبکه‌ی عصبی، وزن‌ها به شکل زیر به‌روزرسانی می‌شوند: +
+ +
+ + +**29. [Step 1: Take a batch of training data and perform forward propagation to compute the loss, Step 2: Backpropagate the loss to get the gradient of the loss with respect to each weight, Step 3: Use the gradients to update the weights of the network.]** + +
+[گام 1: یک دسته از داده‌های آموزشی گرفته شده و با استفاده از انتشار مستقیم خطا محاسبه می‌شود، گام 2: با استفاده از انتشار معکوس مشتق خطا نسبت به هر وزن محاسبه می‌شود، گام 3: با استفاده از مشتقات، وزن‌های شبکه به‌روزرسانی می‌شوند.] +
+ +
+ + +**30. [Forward propagation, Backpropagation, Weights update]** + +
+[انتشار مستقیم، انتشار معکوس، به‌روزرسانی وزنها] +
+ +
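To make the three steps above concrete, here is a toy mini-batch gradient descent loop on a linear model with a mean squared error loss; the model, learning rate and synthetic data are stand-ins chosen for illustration, not something prescribed by the cheatsheet.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(256, 3))
y = x @ np.array([1.5, -2.0, 0.5]) + 0.1 * rng.normal(size=256)   # synthetic targets

w, alpha, batch_size = np.zeros(3), 0.1, 32
for epoch in range(20):                            # one epoch = one full pass over the training set
    perm = rng.permutation(len(x))
    for start in range(0, len(x), batch_size):     # mini-batches instead of the whole set at once
        idx = perm[start:start + batch_size]
        xb, yb = x[idx], y[idx]
        pred = xb @ w                              # Step 1: forward pass (loss = mean squared error)
        grad = 2 * xb.T @ (pred - yb) / len(idx)   # Step 2: gradient of the loss w.r.t. w
        w -= alpha * grad                          # Step 3: update the weights
print(w.round(2))                                  # close to the true coefficients [1.5, -2.0, 0.5]
```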
+ + +**31. Parameter tuning** + +
+تنظیم فراسنج +
+ +
+ + +**32. Weights initialization** + +
+مقداردهی اولیه‌ی وزن‌ها +
+ +
+ + +**33. Xavier initialization ― Instead of initializing the weights in a purely random manner, Xavier initialization enables to have initial weights that take into account characteristics that are unique to the architecture.** + +
+مقداردهی‌ اولیه Xavier ― به‌جای مقداردهی اولیه‌ی وزن‌ها به شیوه‌ی کاملا تصادفی، مقداردهی اولیه Xavier این امکان را فراهم می‌سازد تا وزن‌های اولیه‌ای داشته باشیم که ویژگی‌های منحصر به فرد معماری را به حساب می‌آورند. +
+ +
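A sketch of one common form of Xavier initialization, drawing weights with variance 1/n_in so that activations keep a comparable scale from layer to layer; several variants exist (for example variance 2/(n_in+n_out), or a uniform distribution), so the exact formula here is an assumption.

```python
import numpy as np

def xavier_init(n_in, n_out, rng):
    """Draw a weight matrix with variance 1/n_in instead of purely arbitrary random values."""
    return rng.normal(0.0, np.sqrt(1.0 / n_in), size=(n_out, n_in))

rng = np.random.default_rng(0)
w = xavier_init(256, 128, rng)
print(w.std().round(3))   # close to 1/sqrt(256) = 0.0625
```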
+ + +**34. Transfer learning ― Training a deep learning model requires a lot of data and more importantly a lot of time. It is often useful to take advantage of pre-trained weights on huge datasets that took days/weeks to train, and leverage it towards our use case. Depending on how much data we have at hand, here are the different ways to leverage this:** + +
+یادگیری انتقالی ― آموزش یک مدل یادگیری عمیق به داده‌های زیاد و مهم‌تر از آن به زمان زیادی احتیاج دارد. اغلب بهتر است که از وزن‌های پیش‌آموخته روی پایگاه داده‌های عظیم که آموزش بر روی آن‌ها روزها یا هفته‌ها طول می‌کشند استفاده کرد، و آن‌ها را برای موارد استفاده‌ی خود به کار برد. بسته به میزان داده‌هایی که در اختیار داریم، در زیر روش‌های مختلفی که می‌توان از آنها بهره جست آورده شده‌اند: +
+ +
+ + +**35. [Training size, Illustration, Explanation]** + +
+[تعداد داده‌های آموزش، نگاره، توضیح] +
+ +
+ + +**36. [Small, Medium, Large]** + +
+[کوچک، متوسط، بزرگ] +
+ +
+ + +**37. [Freezes all layers, trains weights on softmax, Freezes most layers, trains weights on last layers and softmax, Trains weights on layers and softmax by initializing weights on pre-trained ones]** + +
+[منجمد کردن تمامی لایه‌ها، آموزش وزن‌ها در بیشینه‌ی هموار، منجمد کردن اکثر لایه‌ها، آموزش وزن‌ها در لایه‌های آخر و بیشینه‌ی هموار، آموزش وزن‌ها در (تمامی) لایه‌ها و بیشینه‌ی هموار با مقداردهی‌اولیه‌ی وزن‌ها بر طبق مقادیر پیش‌آموخته] +
+ +
+ + +**38. Optimizing convergence** + +
+بهینه‌سازی همگرایی +
+ +
+ + +**39. Learning rate ― The learning rate, often noted α or sometimes η, indicates at which pace the weights get updated. It can be fixed or adaptively changed. The current most popular method is called Adam, which is a method that adapts the learning rate. +** + +
+نرخ یادگیری – نرخ یادگیری اغلب با نماد α و گاهی اوقات با نماد η نمایش داده می‌شود و بیانگر سرعت (گام) به‌روزرسانی وزن‌ها است که می‌تواند مقداری ثابت داشته باشد یا به صورت سازگارشونده تغییر کند. محبوب‌ترین روش حال حاضر Adam نام دارد، روشی است که نرخ یادگیری را در حین فرآیند آموزش تنظیم می‌کند. +
+ +
+ + +**40. Adaptive learning rates ― Letting the learning rate vary when training a model can reduce the training time and improve the numerical optimal solution. While Adam optimizer is the most commonly used technique, others can also be useful. They are summed up in the table below:** + +
+نرخ‌های یادگیری سازگارشونده ― داشتن نرخ یادگیری متغیر در فرآیند آموزش یک مدل، می‌تواند زمان آموزش را کاهش دهد و راه‌حل بهینه عددی را بهبود ببخشد. با آنکه بهینه‌ساز Adam محبوب‌ترین روش مورد استفاده است، دیگر روش‌ها نیز می‌توانند مفید باشند. این روش‌ها در جدول زیر به اختصار آمده‌اند: +
+ +
+ + +**41. [Method, Explanation, Update of w, Update of b]** + +
+[روش، توضیح، به‌روزرسانی w، به‌روزرسانی b] +
+ +
+ + +**42. [Momentum, Dampens oscillations, Improvement to SGD, 2 parameters to tune]** + +
+[تکانه، نوسانات را تعدیل می‌دهد، بهبود SGD، دو فراسنج که نیاز به تنظیم دارند] +
+ +
+ + +**43. [RMSprop, Root Mean Square propagation, Speeds up learning algorithm by controlling oscillations]** + +
+[RMSprop، انتشار جذر میانگین مربعات، سرعت بخشیدن به الگوریتم یادگیری با کنترل نوسانات] +
+ +
+ + +**44. [Adam, Adaptive Moment estimation, Most popular method, 4 parameters to tune]** + +
+[Adam، تخمین سازگارشونده ممان، محبوب‌ترین روش، چهار فراسنج که نیاز به تنظیم دارند] +
+ +
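A minimal sketch of one Adam update with the commonly used default hyperparameters (α=0.001, β1=0.9, β2=0.999, ε=1e-8), which are assumptions here: it keeps a running mean of the gradient and of its square, applies bias correction, and scales the step accordingly.

```python
import numpy as np

def adam_step(w, grad, m, v, t, alpha=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for parameters w given the current gradient."""
    m = beta1 * m + (1 - beta1) * grad           # running mean of the gradient
    v = beta2 * v + (1 - beta2) * grad ** 2      # running mean of its square
    m_hat = m / (1 - beta1 ** t)                 # bias correction (t starts at 1)
    v_hat = v / (1 - beta2 ** t)
    w = w - alpha * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v

w = np.array([1.0, -2.0])
m, v = np.zeros_like(w), np.zeros_like(w)
for t in range(1, 101):                          # minimize f(w) = ||w||^2, whose gradient is 2w
    w, m, v = adam_step(w, 2 * w, m, v, t)
print(w.round(3))                                # slowly moving towards the minimum at [0, 0]
```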
+ + +**45. Remark: other methods include Adadelta, Adagrad and SGD.** + +
+نکته: سایر متدها شامل Adadelta، Adagrad و SGD هستند. +
+ +
+ + +**46. Regularization** + +
+نظام‌بخشی +
+ +
+ + +**47. Dropout ― Dropout is a technique used in neural networks to prevent overfitting the training data by dropping out neurons with probability p>0. It forces the model to avoid relying too much on particular sets of features.** + +
+برون‌اندازی – برون‌اندازی روشی است که در شبکه‌های عصبی برای جلوگیری از بیش‌برارزش بر روی داده‌های آموزشی با حذف تصادفی نورون‌ها با احتمال p>0 استفاده می‌شود. این روش مدل را مجبور می‌کند تا از تکیه کردن بیش‌از‌حد بر روی مجموعه خاصی از ویژگی‌ها خودداری کند. +
+ +
+ + +**48. Remark: most deep learning frameworks parametrize dropout through the 'keep' parameter 1−p.** + +
+نکته: بیشتر کتابخانه‌های یادگیری عمیق برون‌اندازی را با استفاده از فراسنج 'نگه‌داشتن' 1-p کنترل می‌کنند. +
+ +
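A sketch of inverted dropout, parametrized by the 'keep' probability 1−p as in the remark above: units are zeroed at random during training and the survivors are rescaled so the expected activation is unchanged; nothing is dropped at test time.

```python
import numpy as np

def dropout(a, keep_prob, rng, train=True):
    """Inverted dropout: drop neurons with probability p = 1 - keep_prob, rescale the survivors."""
    if not train:
        return a                                  # no dropout at test time
    mask = rng.random(a.shape) < keep_prob        # keep each unit with probability keep_prob
    return a * mask / keep_prob                   # rescaling keeps the expected activation unchanged

rng = np.random.default_rng(0)
a = np.ones((4, 8))
print(dropout(a, keep_prob=0.8, rng=rng))         # kept entries are scaled up to 1.25, dropped ones are 0
```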
+ + +**49. Weight regularization ― In order to make sure that the weights are not too large and that the model is not overfitting the training set, regularization techniques are usually performed on the model weights. The main ones are summed up in the table below:** + +
+نظام‌بخشی وزن – برای اطمینان از اینکه (مقادیر) وزن‌ها بیش‌ازحد بزرگ نیستند و مدل به مجموعه‌ی آموزش بیش‌برارزش نمی‌کند، روشهای نظام‌بخشی معمولا بر روی وزن‌های مدل اجرا می‌شوند. اصلی‌ترین آنها در جدول زیر به اختصار آمده‌اند: +
+ +
+ + +**50. [LASSO, Ridge, Elastic Net]** + +
+[LASSO, Ridge, Elastic Net] +
+
+ +**50 bis. Shrinks coefficients to 0, Good for variable selection, Makes coefficients smaller, Tradeoff between variable selection and small coefficients]** + +
+ضرایب را تا صفر کاهش می‌دهد، برای انتخاب متغیر مناسب است، ضرایب را کوچکتر می‌کند، بین انتخاب متغیر و ضرایب کوچک مصالحه می‌کند +
+ +
+ +**51. Early stopping ― This regularization technique stops the training process as soon as the validation loss reaches a plateau or starts to increase.** + +
+توقف زودهنگام ― این روش نظام‌بخشی، فرآیند آموزش را به محض اینکه خطای اعتبارسنجی ثابت می‌شود یا شروع به افزایش پیدا کند، متوقف می‌کند. +
+ +
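A minimal sketch of an early-stopping rule applied to a list of validation losses; the patience parameter (how many non-improving epochs to tolerate) is an assumption, since the text only states the stopping criterion informally.

```python
def early_stopping(val_losses, patience=3):
    """Stop once the validation loss has not improved for `patience` consecutive epochs."""
    best, wait = float("inf"), 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best, wait = loss, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch            # epoch at which training would stop
    return len(val_losses) - 1

print(early_stopping([0.9, 0.7, 0.6, 0.61, 0.63, 0.65, 0.7]))   # 5: three epochs without improvement
```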
+ + +**52. [Error, Validation, Training, early stopping, Epochs]** + +
+[خطا، اعتبارسنجی، آموزش، توقف زودهنگام، تکرارها] +
+ +
+ + +**53. Good practices** + +
+عادت‌های خوب +
+ +
+ + +**54. Overfitting small batch ― When debugging a model, it is often useful to make quick tests to see if there is any major issue with the architecture of the model itself. In particular, in order to make sure that the model can be properly trained, a mini-batch is passed inside the network to see if it can overfit on it. If it cannot, it means that the model is either too complex or not complex enough to even overfit on a small batch, let alone a normal-sized training set.** + +
+بیش‌برارزش روی دسته‌ی ‌کوچک ― هنگام اشکال‌زدایی یک مدل، اغلب مفید است که یک سری آزمایش‌های سریع برای اطمینان از اینکه هیچ مشکل عمده‌ای در معماری مدل وجود ندارد، انجام شود. به طورخاص، برای اطمینان از اینکه مدل می‌تواند به شکل صحیح آموزش ببیند، یک دسته‌ی‌ کوچک (از داده‌ها) به شبکه داده می‌شود تا دریابیم که مدل می‌تواند به آنها بیش‌برارزش کند. اگر نتواند، بدین معناست که مدل از پیچیدگی بالایی برخوردار است یا پیچیدگی کافی برای بیش‌برارزش شدن روی دسته‌ی‌ کوچک ندارد، چه برسد به یک مجموعه آموزشی با اندازه عادی. +
+ +
+ + +**55. Gradient checking ― Gradient checking is a method used during the implementation of the backward pass of a neural network. It compares the value of the analytical gradient to the numerical gradient at given points and plays the role of a sanity-check for correctness.** + +
+وارسی گرادیان – وارسی گرادیان روشی است که در طول پیاده‌سازی گذر روبه‌عقبِ یک شبکه‌ی عصبی استفاده می‌شود. این روش مقدار گرادیان تحلیلی را با گرادیان عددی در نقطه‌های مفروض مقایسه می‌کند و نقش بررسی درستی را ایفا میکند. +
+ +
+ + +**56. [Type, Numerical gradient, Analytical gradient]** + +
+[نوع، گرادیان عددی، گرادیان تحلیلی] +
+ +
+ + +**57. [Formula, Comments]** + +
+[فرمول، توضیحات] +
+ +
+ + +**58. [Expensive; loss has to be computed two times per dimension, Used to verify correctness of analytical implementation, Trade-off in choosing h not too small (numerical instability) nor too large (poor gradient approximation)]** + +
+[پرهزینه (از نظر محاسباتی)، خطا باید دو بار برای هر بُعد محاسبه شود، برای تایید صحت پیاده‌سازی تحلیلی استفاده می‌شود، مصالحه در انتخاب h: نه بسیار کوچک (ناپایداری عددی) و نه خیلی بزرگ (تخمین گرادیان ضعیف) باشد] +
+ +
+ + +**59. ['Exact' result, Direct computation, Used in the final implementation]** + +
+[نتیجه 'عینی'، محاسبه مستقیم، در پیاده‌سازی نهایی استفاده می‌شود] +
+ +
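The comparison in the table above can be sketched as follows: a hand-derived analytical gradient is checked against the centered numerical estimate (L(w+h) − L(w−h)) / (2h) computed once per dimension; the MSE loss and h=1e-5 are illustrative assumptions.

```python
import numpy as np

def loss(w, x, y):
    return np.mean((x @ w - y) ** 2)               # simple MSE loss for the check

def analytical_grad(w, x, y):
    return 2 * x.T @ (x @ w - y) / len(y)          # derived by hand: this is what we want to verify

def numerical_grad(w, x, y, h=1e-5):
    grad = np.zeros_like(w)
    for i in range(len(w)):                        # the loss is evaluated twice per dimension
        e = np.zeros_like(w)
        e[i] = h
        grad[i] = (loss(w + e, x, y) - loss(w - e, x, y)) / (2 * h)
    return grad

rng = np.random.default_rng(0)
x, y, w = rng.normal(size=(20, 3)), rng.normal(size=20), rng.normal(size=3)
diff = np.abs(analytical_grad(w, x, y) - numerical_grad(w, x, y)).max()
print(diff)   # tiny (around 1e-9 or smaller): the analytical implementation checks out
```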
+ + +**60. The Deep Learning cheatsheets are now available in [target language].** + +
+راهنمای یادگیری عمیق هم اکنون به زبان [فارسی] در دسترس است. +
+ +**61. Original authors** + +
+نویسندگان اصلی +
+ +
+ +**62.Translated by X, Y and Z** + +
+ترجمه شده توسط X،Y و Z +
+ +
+ +**63.Reviewed by X, Y and Z** + +
+بازبینی شده توسط X،Y و Z +
+ +
+ +**64.View PDF version on GitHub** + +
+نسخه پی‌دی‌اف را در گیت‌هاب ببینید +
+ +
+ +**65.By X and Y** + +
+توسط X و Y +
+ +
diff --git a/fa/cs-230-recurrent-neural-networks.md b/fa/cs-230-recurrent-neural-networks.md new file mode 100644 index 000000000..22a1e2106 --- /dev/null +++ b/fa/cs-230-recurrent-neural-networks.md @@ -0,0 +1,868 @@ +**Recurrent Neural Networks translation** + +
+ +**1. Recurrent Neural Networks cheatsheet** + +
+راهنمای کوتاه شبکه‌های عصبی برگشتی +
+ +
+ + +**2. CS 230 - Deep Learning** + +
+کلاس CS 230 - یادگیری عمیق +
+ +
+ + +**3. [Overview, Architecture structure, Applications of RNNs, Loss function, Backpropagation]** + +
+[نمای کلی، ساختار معماری، کاربردهایRNN ها، تابع خطا، انتشار معکوس] +
+ +
+ + +**4. [Handling long term dependencies, Common activation functions, Vanishing/exploding gradient, Gradient clipping, GRU/LSTM, Types of gates, Bidirectional RNN, Deep RNN]** + +
+[کنترل وابستگی‌های بلندمدت، توابع فعال‌سازی رایج، مشتق صفرشونده/منفجرشونده، برش گرادیان، GRU/LSTM، انواع دروازه، RNN دوسویه، RNN عمیق] +
+ +
+ + +**5. [Learning word representation, Notations, Embedding matrix, Word2vec, Skip-gram, Negative sampling, GloVe]** + +
+[یادگیری بازنمائی کلمه، نمادها، ماتریس تعبیه، Word2vec،skip-gram، نمونه‌برداری منفی، GloVe] +
+ +
+ + +**6. [Comparing words, Cosine similarity, t-SNE]** + +
+[مقایسه‌ی کلمات، شباهت کسینوسی، t-SNE] +
+ +
+ + +**7. [Language model, n-gram, Perplexity]** + +
+[مدل زبانی،ان‌گرام، سرگشتگی] +
+ +
+ + +**8. [Machine translation, Beam search, Length normalization, Error analysis, Bleu score]** + +
+[ترجمه‌ی ماشینی، جستجوی پرتو، نرمال‌سازی طول، تحلیل خطا، امتیاز Bleu] +
+ +
+ + +**9. [Attention, Attention model, Attention weights]** + +
+[ژرف‌نگری، مدل ژرف‌نگری، وزن‌های ژرف‌نگری] +
+ +
+ + +**10. Overview** + +
+نمای کلی +
+ +
+ + +**11. Architecture of a traditional RNN ― Recurrent neural networks, also known as RNNs, are a class of neural networks that allow previous outputs to be used as inputs while having hidden states. They are typically as follows:** + +
+معماری RNN سنتی ــ شبکه‌های عصبی برگشتی که همچنین با عنوان RNN شناخته می‌شوند، دسته‌ای از شبکه‌های عصبی‌اند که این امکان را می‌دهند خروجی‌های قبلی به‌عنوان ورودی استفاده شوند و در عین حال حالت‌های نهان داشته باشند. این شبکه‌ها به‌طور معمول عبارت‌اند از:
+ +
+ + +**12. For each timestep t, the activation a and the output y are expressed as follows:** + +
+به‌ازای هر گام زمانی t، فعال‌سازی a و خروجی y به‌صورت زیر بیان می‌شود: +
+ +
+ + +**13. and** + +
+و +
+ +
+ + +**14. where Wax,Waa,Wya,ba,by are coefficients that are shared temporally and g1,g2 activation functions.** + +
+که در آن Wax,Waa,Wya,ba,by ضرایبی‌اند که در راستای زمان به ‌اشتراک گذاشته می‌شوند و g1، g2 توابع فعال‌سازی‌ هستند. +
+ +
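A minimal NumPy sketch of one timestep of the recurrence above, with tanh and softmax standing in for the generic activations g1 and g2 (an assumption); the same weight matrices are reused at every timestep.

```python
import numpy as np

def rnn_step(a_prev, x_t, Waa, Wax, Wya, ba, by):
    """One timestep: a<t> = tanh(Waa a<t-1> + Wax x<t> + ba), y<t> = softmax(Wya a<t> + by)."""
    a_t = np.tanh(Waa @ a_prev + Wax @ x_t + ba)
    scores = Wya @ a_t + by
    y_t = np.exp(scores - scores.max())
    y_t /= y_t.sum()                              # softmax output
    return a_t, y_t

n_a, n_x, n_y = 5, 3, 4
rng = np.random.default_rng(0)
Waa, Wax, Wya = rng.normal(size=(n_a, n_a)), rng.normal(size=(n_a, n_x)), rng.normal(size=(n_y, n_a))
ba, by = np.zeros(n_a), np.zeros(n_y)

a = np.zeros(n_a)
for x_t in rng.normal(size=(6, n_x)):             # the same weights are shared across the 6 timesteps
    a, y = rnn_step(a, x_t, Waa, Wax, Wya, ba, by)
print(a.shape, y.shape, y.sum().round(3))          # (5,) (4,) 1.0
```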
+ + +**15. The pros and cons of a typical RNN architecture are summed up in the table below:** + +
+مزایا و معایب معماری RNN به‌صورت خلاصه در جدول زیر آورده شده‌اند: +
+ +
+ + +**16. [Advantages, Possibility of processing input of any length, Model size not increasing with size of input, Computation takes into account historical information, Weights are shared across time]** + +
+مزایا، امکان پردازش ورودی با هر طولی، اندازه‌ی مدل مطابق با اندازه‌ی ورودی افزایش نمی‌یابد، اطلاعات (زمان‌های) گذشته در محاسبه در نظر گرفته می‌شود، وزن‌ها در طول زمان به‌ اشتراک گذاشته می‌شوند] +
+ +
+ + +**17. [Drawbacks, Computation being slow, Difficulty of accessing information from a long time ago, Cannot consider any future input for the current state]** + +
+[معایب، محاسبه کند می‌شود، دشوار بودن دسترسی به اطلاعات مدت‌ها پیش، در نظر نگرفتن ورودی‌های بعدی در وضعیت جاری] +
+ +
+ + +**18. Applications of RNNs ― RNN models are mostly used in the fields of natural language processing and speech recognition. The different applications are summed up in the table below:** + +
+کاربردهایRNN ها ــ مدل‌های RNN غالباً در حوزه‌ی پردازش زبان طبیعی و حوزه‌ی بازشناسایی گفتار به کار می‌روند. کاربردهای مختلف آنها به صورت خلاصه در جدول زیر آورده شده‌اند: +
+ +
+ + +**19. [Type of RNN, Illustration, Example]** + +
+[نوع RNN، نگاره، مثال] +
+ +
+ + +**20. [One-to-one, One-to-many, Many-to-one, Many-to-many]** + +
+[یک به یک، یک به چند، چند به یک، چند به چند] +
+ +
+ + +**21. [Traditional neural network, Music generation, Sentiment classification, Name entity recognition, Machine translation]** + +
+[شبکه‌ی عصبی سنتی، تولید موسیقی، دسته‌بندی حالت احساسی، بازشناسایی موجودیت اسمی، ترجمه ماشینی] +
+ +
+ + +**22. Loss function ― In the case of a recurrent neural network, the loss function L of all time steps is defined based on the loss at every time step as follows:** + +
+تابع خطا ــ در شبکه عصبی برگشتی، تابع خطا L برای همه‌ی گام‌های زمانی براساس خطا در هر گام به صورت زیر محاسبه می‌شود: +
+ +
+ + +**23. Backpropagation through time ― Backpropagation is done at each point in time. At timestep T, the derivative of the loss L with respect to weight matrix W is expressed as follows:** + +
+انتشار معکوس در طول زمان ـــ انتشار معکوس در هر نقطه از زمان انجام می‌شود. در گام زمانی T، مشتق خطا L با توجه به ماتریس وزن W به‌صورت زیر بیان می‌شود: +
+ +
+ + +**24. Handling long term dependencies** + +
+کنترل وابستگی‌های بلندمدت +
+ +
+ + +**25. Commonly used activation functions ― The most common activation functions used in RNN modules are described below:** + +
+توابع فعال‌سازی پرکاربرد ـــ رایج‌ترین توابع فعال‌سازی به‌کاررفته در ماژول‌های RNN به شرح زیر است: +
+ +
+ + +**26. [Sigmoid, Tanh, RELU]** + +
+[سیگموید، تانژانت هذلولوی، یکسو ساز] +
+ +
+ + +**27. Vanishing/exploding gradient ― The vanishing and exploding gradient phenomena are often encountered in the context of RNNs. The reason why they happen is that it is difficult to capture long term dependencies because of multiplicative gradient that can be exponentially decreasing/increasing with respect to the number of layers.** + +
+مشتق صفرشونده/منفجرشونده ــ پدیده مشتق صفرشونده و منفجرشونده غالبا در بستر RNNها رخ می‌دهند. علت چنین رخدادی این است که به دلیل گرادیان ضربی، که می‌تواند با توجه به تعداد لایه‌ها به صورت نمایی کاهش/افزایش یابد، به‌دست آوردن وابستگی‌های بلندمدت سخت است. +
+ + +
+ + +**28. Gradient clipping ― It is a technique used to cope with the exploding gradient problem sometimes encountered when performing backpropagation. By capping the maximum value for the gradient, this phenomenon is controlled in practice.** + +
+برش گرادیان ــ یک روش برای مقابله با انفجار گرادیان است که گاهی اوقات هنگام انتشار معکوس رخ می‌دهد. با تعیین حداکثر مقدار برای گرادیان، این پدیده در عمل کنترل می‌شود. +
+ +
+ + +**29. clipped** + +
+برش ‌داده‌شده +
+ +
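A one-line sketch of gradient clipping as described above, capping every gradient entry to a maximum value before the weight update; clipping the global norm of the gradient instead of each entry is another widespread variant.

```python
import numpy as np

def clip_gradient(grad, cap=5.0):
    """Cap every gradient entry to [-cap, cap] to tame exploding gradients during backpropagation."""
    return np.clip(grad, -cap, cap)

g = np.array([0.3, -42.0, 7.5])
print(clip_gradient(g))   # [ 0.3 -5.   5. ]
```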
+ + +**30. Types of gates ― In order to remedy the vanishing gradient problem, specific gates are used in some types of RNNs and usually have a well-defined purpose. They are usually noted Γ and are equal to:** + +
+انواع دروازه ـــ برای حل مشکل مشتق صفرشونده/منفجرشونده، در برخی از انواع RNN ها، دروازه‌های خاصی استفاده می‌شود و این دروازه‌ها عموما هدف معینی دارند. این دروازه‌ها عموما با نمادΓ نمایش داده می‌شوند و برابرند با: +
+ +
+ + +**31. where W,U,b are coefficients specific to the gate and σ is the sigmoid function. The main ones are summed up in the table below:** + +
+که W,U,b ضرایب خاص دروازه و σ تابع سیگموید است. دروازه‌های اصلی به صورت خلاصه در جدول زیر آورده شده‌اند: +
+ +
+ + +**32. [Type of gate, Role, Used in]** + +
+[نوع دروازه، نقش، به‌کار رفته در] +
+ +
+ + +**33. [Update gate, Relevance gate, Forget gate, Output gate]** + +
+[دروازه‌ی به‌روزرسانی، دروازه‌ی ربط (میزان اهمیت)، دروازه‌ی فراموشی، دروازه‌ی خروجی] +
+ +
+ + +**34. [How much past should matter now?, Drop previous information?, Erase a cell or not?, How much to reveal of a cell?]** + +
+[چه میزان از گذشته اکنون اهمیت دارد؟ اطلاعات گذشته رها شوند؟ سلول حذف شود یا خیر؟ چه میزان از (محتوای) سلول آشکار شود؟] +
+ +
+ + +**35. [LSTM, GRU]** + +
+[LSTM، GRU] +
+ +
+ + +**36. GRU/LSTM ― Gated Recurrent Unit (GRU) and Long Short-Term Memory units (LSTM) deal with the vanishing gradient problem encountered by traditional RNNs, with LSTM being a generalization of GRU. Below is a table summing up the characterizing equations of each architecture:** + +
+GRU/LSTM ـــ واحد برگشتی دروازه‌دار (GRU) و واحدهای حافظه‌ی کوتاه‌-مدت طولانی (LSTM) مشکل مشتق صفرشونده که در RNNهای سنتی رخ می‌دهد، را بر طرف می‌کنند، درحالی‌که LSTM شکل عمومی‌تر GRU است. در جدول زیر، معادله‌های توصیف‌کنندهٔ هر معماری به صورت خلاصه آورده شده‌اند: +
+ +
+ + +**37. [Characterization, Gated Recurrent Unit (GRU), Long Short-Term Memory (LSTM), Dependencies]** + +
+[توصیف، واحد برگشتی دروازه‌دار (GRU)، حافظه‌ی کوتاه-مدت طولانی (LSTM)، وابستگی‌ها] +
+
+ + +**38. Remark: the sign ⋆ denotes the element-wise multiplication between two vectors.** + +
+نکته: نشانه‌ی * نمایان‌گر ضرب عنصربه‌عنصر دو بردار است. +
+ +
+ + +**39. Variants of RNNs ― The table below sums up the other commonly used RNN architectures:** + +
+انواع RNN ها ــ جدول زیر سایر معماری‌های پرکاربرد RNN را به صورت خلاصه نشان می‌دهد. +
+ +
+ + +**40. [Bidirectional (BRNN), Deep (DRNN)]** + +
+[دوسویه (BRNN)، عمیق (DRNN)] +
+ +
+ + +**41. Learning word representation** + +
+یادگیری بازنمائی کلمه +
+ +
+ + +**42. In this section, we note V the vocabulary and |V| its size.** + +
+در این بخش، برای اشاره به واژگان از V و برای اشاره به اندازه‌ی آن از |V| استفاده می‌کنیم. +
+ +
+ + +**43. Motivation and notations** + +
+انگیزه و نمادها +
+ +
+ + +**44. Representation techniques ― The two main ways of representing words are summed up in the table below:** + +
+روش‌های بازنمائی ― دو روش اصلی برای بازنمائی کلمات به صورت خلاصه در جدول زیر آورده شده‌اند: +
+ +
+ + +**45. [1-hot representation, Word embedding]** + +
+[بازنمائی تک‌فعال، تعبیه‌ی کلمه] +
+ +
+ + +**46. [teddy bear, book, soft]** + +
+[خرس تدی، کتاب، نرم] +
+ +
+ + +**47. [Noted ow, Naive approach, no similarity information, Noted ew, Takes into account words similarity]** + +
+[نشان داده شده با نماد ow، رویکرد ساده، فاقد اطلاعات تشابه، نشان داده شده با نماد ew، به‌حساب‌آوردن تشابه کلمات] +
+ +
+ + +**48. Embedding matrix ― For a given word w, the embedding matrix E is a matrix that maps its 1-hot representation ow to its embedding ew as follows:** + +
+ماتریس تعبیه ـــ به‌ ازای کلمه‌ی مفروض w ، ماتریس تعبیه E ماتریسی است که بازنمائی تک‌فعال ow را به نمایش تعبیه‌ی ew نگاشت می‌دهد: +
+ +
+ + +**49. Remark: learning the embedding matrix can be done using target/context likelihood models.** + +
+نکته: یادگیری ماتریس تعبیه را می‌توان با استفاده از مدل‌های درست‌نمایی هدف/متن(زمینه) انجام داد. +
+ +
+ + +**50. Word embeddings** + +
+(نمایش) تعبیه‌ی کلمه +
+ +
+ + +**51. Word2vec ― Word2vec is a framework aimed at learning word embeddings by estimating the likelihood that a given word is surrounded by other words. Popular models include skip-gram, negative sampling and CBOW.** + +
+Word2vec ― Word2vec چهارچوبی است که با محاسبه‌ی احتمال قرار گرفتن یک کلمه‌ی خاص در میان سایر کلمات، تعبیه‌های کلمه را یاد می‌گیرد. مدل‌های متداول شامل Skip-gram، نمونه‌برداری منفی و CBOW هستند. +
+ +
+ + +**52. [A cute teddy bear is reading, teddy bear, soft, Persian poetry, art]** + +
+[یک خرس تدی بامزه در حال مطالعه است، خرس تدی، نرم، شعر فارسی، هنر] +
+ +
+ + +**53. [Train network on proxy task, Extract high-level representation, Compute word embeddings]** + +
+[آموزش شبکه بر روی مسئله‌ی جایگزین، استخراج بازنمائی سطح بالا، محاسبه‌ی نمایش تعبیه‌ی کلمات] +
+ +
+ + +**54. Skip-gram ― The skip-gram word2vec model is a supervised learning task that learns word embeddings by assessing the likelihood of any given target word t happening with a context word c. By noting θt a parameter associated with t, the probability P(t|c) is given by:** + +
+Skip-gram ــ مدل اسکیپ‌گرام word2vec یک وظیفه‌ی یادگیری بانظارت است که تعبیه‌های کلمه را با ارزیابی احتمال وقوع کلمه‌ی t هدف با کلمه‌ی زمینه c یاد می‌گیرد. با توجه به اینکه نماد θt پارامتری مرتبط با t است، احتمال P(t|c) به‌صورت زیر به‌دست می‌آید: +
+ +
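A small sketch of the probability above: P(t|c) is a softmax of θt⋅ec over the whole vocabulary; the toy vocabulary size and random parameters are only for illustration, and the full sum over the vocabulary in the denominator is exactly the computational cost that the next remark points out.

```python
import numpy as np

def skip_gram_prob(theta, e_c):
    """P(t|c) for every target word t: softmax of theta_t . e_c over the whole vocabulary."""
    scores = theta @ e_c          # one score per vocabulary word
    scores -= scores.max()        # numerical stability
    p = np.exp(scores)
    return p / p.sum()            # the denominator runs over the entire vocabulary

vocab_size, dim = 10, 4
rng = np.random.default_rng(0)
theta = rng.normal(size=(vocab_size, dim))   # one parameter vector theta_t per target word
e_c = rng.normal(size=dim)                   # embedding of the context word c
p = skip_gram_prob(theta, e_c)
print(p.sum().round(3), p.argmax())          # probabilities over the 10 words sum to 1
```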
+ + +**55. Remark: summing over the whole vocabulary in the denominator of the softmax part makes this model computationally expensive. CBOW is another word2vec model using the surrounding words to predict a given word.** + +
+نکته: جمع کل واژگان در بخش مقسوم‌الیه بیشینه‌ی‌هموار باعث می‌شود که این مدل از لحاظ محاسباتی گران شود. مدل CBOW مدل word2vec دیگری ست که از کلمات اطراف برای پیش‌بینی یک کلمهٔ مفروض استفاده می‌کند. +
+ +
+ + +**56. Negative sampling ― It is a set of binary classifiers using logistic regressions that aim at assessing how a given context and a given target words are likely to appear simultaneously, with the models being trained on sets of k negative examples and 1 positive example. Given a context word c and a target word t, the prediction is expressed by:** + +
+نمونه‌گیری منفی ― مجموعه‌ای از دسته‌بندی‌های دودویی با استفاده از رگرسیون لجستیک است که مقصودش ارزیابی احتمال ظهور همزمان کلمه‌ی مفروض هدف و کلمه‌ی مفروض زمینه است، که در اینجا مدل‌ها براساس مجموعه k مثال منفی و 1 مثال مثبت آموزش می‌بینند. با توجه به کلمه‌ی مفروض زمینه c و کلمه‌ی مفروض هدف t، پیش‌بینی به صورت زیر بیان می‌شود: +
+ +
+ + +**57. Remark: this method is less computationally expensive than the skip-gram model.** + +
+نکته: این روش از لحاظ محاسباتی ارزان‌تر از مدل skip-gram است. +
+ +
+ + +**57bis. GloVe ― The GloVe model, short for global vectors for word representation, is a word embedding technique that uses a co-occurence matrix X where each Xi,j denotes the number of times that a target i occurred with a context j. Its cost function J is as follows:** + +
+GloVe ― مدل GloVe، مخفف بردارهای سراسری بازنمائی کلمه، یکی از روش‌های تعبیه کلمه است که از ماتریس هم‌رویدادی X استفاده می‌کند که در آن هر Xi,j به تعداد دفعاتی اشاره دارد که هدف i با زمینهٔ j رخ می‌دهد. تابع هزینه‌ی J به‌صورت زیر است: +
+ +
+ + +**58. where f is a weighting function such that Xi,j=0⟹f(Xi,j)=0. +Given the symmetry that e and θ play in this model, the final word embedding e(final)w is given by:** + +
+که در آن f تابع وزن‌دهی است، به‌طوری که Xi,j=0⟹f(Xi,j)=0. با توجه به تقارنی که e و θ در این مدل دارند، نمایش تعبیه‌ی نهایی کلمه‌ e(final)w به صورت زیر محاسبه می‌شود: +
+ +
+ + +**59. Remark: the individual components of the learned word embeddings are not necessarily interpretable.** + +
+تذکر: مولفه‌های مجزا در نمایش تعبیه‌ی یادگرفته‌شده‌ی کلمه الزاما قابل تفسیر نیستند. +
+ +
+ + +**60. Comparing words** + +
+مقایسه‌ی کلمات +
+ +
+ + +**61. Cosine similarity ― The cosine similarity between words w1 and w2 is expressed as follows:** + +
+شباهت کسینوسی - شباهت کسینوسی بین کلمات w1 و w2 به ‌صورت زیر بیان می‌شود: +
+ +
+ + +**62. Remark: θ is the angle between words w1 and w2.** + +
+نکته: θ زاویهٔ بین کلمات w1 و w2 است. +
+ +
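A direct sketch of the definition above on toy 3-dimensional embeddings (real word vectors have hundreds of dimensions); the example values are made up so that the two bear-like vectors come out similar and the unrelated one does not.

```python
import numpy as np

def cosine_similarity(e1, e2):
    """cos(theta) between two word embeddings: close to 1 means same direction, close to 0 unrelated."""
    return float(e1 @ e2 / (np.linalg.norm(e1) * np.linalg.norm(e2)))

e_teddy = np.array([0.8, 0.1, 0.7])
e_bear  = np.array([0.7, 0.2, 0.6])
e_poem  = np.array([-0.1, 0.9, 0.0])
print(cosine_similarity(e_teddy, e_bear), cosine_similarity(e_teddy, e_poem))   # high vs. near zero
```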
+ + +**63. t-SNE ― t-SNE (t-distributed Stochastic Neighbor Embedding) is a technique aimed at reducing high-dimensional embeddings into a lower dimensional space. In practice, it is commonly used to visualize word vectors in the 2D space.** + +
+t-SNE ― t-SNE (نمایش تعبیه‌ی همسایه‌ی تصادفی توزیع‌شده توسط توزیع t) روشی است که هدف آن کاهش تعبیه‌های ابعاد بالا به فضایی با ابعاد پایین‌تر است. این روش در تصویرسازی بردارهای کلمه در فضای 2 بعدی کاربرد فراوانی دارد. +
+ +
+ + +**64. [literature, art, book, culture, poem, reading, knowledge, entertaining, loveable, childhood, kind, teddy bear, soft, hug, cute, adorable]** + +
+[ادبیات، هنر، کتاب، فرهنگ، شعر، دانش، مفرح، دوست‌داشتنی، دوران کودکی، مهربان، خرس تدی، نرم، آغوش، بامزه، ناز] +
+ +
+ + +**65. Language model** + +
+مدل زبانی +
+ +
+ + +**66. Overview ― A language model aims at estimating the probability of a sentence P(y).** + +
+نمای کلی ـــ هدف مدل زبان تخمین احتمال جمله‌ی P(y) است. +
+ +
+ + +**67. n-gram model ― This model is a naive approach aiming at quantifying the probability that an expression appears in a corpus by counting its number of appearance in the training data.** + +
+مدل ان‌گرام ــ این مدل یک رویکرد ساده با هدف اندازه‌گیری احتمال نمایش یک عبارت در یک نوشته است که با دفعات تکرار آن در داده‌های آموزشی محاسبه می‌شود. +
+ +
+ + +**68. Perplexity ― Language models are commonly assessed using the perplexity metric, also known as PP, which can be interpreted as the inverse probability of the dataset normalized by the number of words T. The perplexity is such that the lower, the better and is defined as follows:** + +
+سرگشتگی ـــ مدل‌های زبانی معمولاً با معیار سرگشتگی، که با PP هم نمایش داده می‌شود، سنجیده می‌شوند؛ مقدار آن معکوس احتمال مجموعه‌داده است که با تعداد کلمات T نرمال شده است. هر چه سرگشتگی کمتر باشد بهتر است و به صورت زیر تعریف می‌شود: +
+ +
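A small sketch of the formula above in its log form, PP = exp(−(1/T) Σ log P(wt)); the per-word probabilities are invented for illustration.

```python
import numpy as np

def perplexity(word_probs):
    """PP = exp(-(1/T) * sum(log p_t)): the lower, the better the language model."""
    log_probs = np.log(np.asarray(word_probs))
    return float(np.exp(-log_probs.mean()))

probs = [0.2, 0.1, 0.25, 0.05]   # probability the model assigned to each word of a 4-word sentence
print(perplexity(probs))         # ~8; a uniform model over a vocabulary of size |V| would give PP = |V|
```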
+ + +**69. Remark: PP is commonly used in t-SNE.** + +
+نکته: PP عموما در t-SNE کاربرد دارد. +
+ +
+ + +**70. Machine translation** + +
+ترجمه ماشینی +
+ +
+ + +**71. Overview ― A machine translation model is similar to a language model except it has an encoder network placed before. For this reason, it is sometimes referred as a conditional language model. The goal is to find a sentence y such that:** + +
+نمای کلی ― مدل ترجمه‌ی ماشینی مشابه مدل زبانی است با این تفاوت که یک شبکه‌ی رمزنگار قبل از آن قرار گرفته است. به همین دلیل، گاهی اوقات به آن مدل زبان شرطی می‌گویند. هدف آن یافتن جمله y است بطوری که: +
+ +
+ + +**72. Beam search ― It is a heuristic search algorithm used in machine translation and speech recognition to find the likeliest sentence y given an input x.** + +
+جستجوی پرتو ― یک الگوریتم جستجوی اکتشافی است که در ترجمه‌ی ماشینی و بازتشخیص گفتار برای یافتن محتمل‌ترین جمله‌ی y باتوجه به ورودی مفروض x بکار برده می‌شود. +
+ +
+ + +**73. [Step 1: Find top B likely words y<1>, Step 2: Compute conditional probabilities y|x,y<1>,...,y, Step 3: Keep top B combinations x,y<1>,...,y, End process at a stop word]** + +
+[گام 1: یافتن B کلمه‌ی محتمل برتر y<1>، گام 2: محاسبه احتمالات شرطی y|x,y<1>,...,y، گام 3: نگه‌داشتن B ترکیب برتر x,y<1>,…,y، خاتمه فرآیند با کلمه‌ی توقف] +
+ +
+ + +**74. Remark: if the beam width is set to 1, then this is equivalent to a naive greedy search.** + +
+نکته: اگر پهنای پرتو 1 باشد، آنگاه با جست‌وجوی حریصانهٔ ساده برابر خواهد بود. +
+ +
+ + +**75. Beam width ― The beam width B is a parameter for beam search. Large values of B yield to better result but with slower performance and increased memory. Small values of B lead to worse results but is less computationally intensive. A standard value for B is around 10.** + +
+پهنای پرتو ـــ پهنای پرتوی B پارامتری برای جستجوی پرتو است. مقادیر بزرگ B به نتیجه بهتر منتهی می‌شوند اما عملکرد آهسته‌تری دارند و حافظه را افزایش می‌دهند. مقادیر کوچک B به نتایج بدتر منتهی می‌شوند اما بار محاسباتی پایین‌تری دارند. مقدار استاندارد B حدود 10 است. +
+ +
+ + +**76. Length normalization ― In order to improve numerical stability, beam search is usually applied on the following normalized objective, often called the normalized log-likelihood objective, defined as:** + +
+نرمال‌سازی طول ―‌ برای بهبود ثبات عددی، جستجوی پرتو معمولا با تابع هدف نرمال‌شده‌ی زیر اعمال می‌شود، که اغلب اوقات هدف درست‌نمایی لگاریتمی نرمال‌شده نامیده می‌شود و به‌صورت زیر تعریف می‌شود: +
+ +
+ + +**77. Remark: the parameter α can be seen as a softener, and its value is usually between 0.5 and 1.** + +
+تذکر: پارامتر α را می‌توان تعدیل‌کننده نامید و مقدارش معمولا بین 0.5 و 1 است. +
+ +
+ + +**78. Error analysis ― When obtaining a predicted translation ˆy that is bad, one can wonder why we did not get a good translation y∗ by performing the following error analysis:** + +
+تحلیل خطا ―زمانی‌که ترجمه‌ی پیش‌بینی‌شده‌ی ^y ی به‌دست می‌آید که مطلوب نیست، می‌توان با انجام تحلیل خطای زیر از خود پرسید که چرا ترجمه y* خوب نیست: +
+ +
+ + +**79. [Case, Root cause, Remedies]** + +
+[قضیه، ریشه‌ی مشکل، راه‌حل] +
+ +
+ + +**80. [Beam search faulty, RNN faulty, Increase beam width, Try different architecture, Regularize, Get more data]** + +
+[جستجوی پرتوی معیوب، RNN معیوب، افزایش پهنای پرتو، امتحان معماری‌های مختلف، استفاده از تنظیم‌کننده، جمع‌آوری داده‌های بیشتر]
+ +
+ + +**81. Bleu score ― The bilingual evaluation understudy (bleu) score quantifies how good a machine translation is by computing a similarity score based on n-gram precision. It is defined as follows:** + +
+امتیاز Bleu ― جایگزین ارزشیابی دوزبانه (bleu) میزان خوب بودن ترجمه ماشینی را با محاسبه‌ی امتیاز تشابه برمبنای دقت ان‌گرام اندازه‌گیری می‌کند. (این امتیاز) به صورت زیر تعریف می‌شود: +
+ +
+ + +**82. where pn is the bleu score on n-gram only defined as follows:** + +
+که pn امتیاز bleu تنها براساس ان‌گرام است و به صورت زیر تعریف می‌شود: +
+ +
+ + +**83. Remark: a brevity penalty may be applied to short predicted translations to prevent an artificially inflated bleu score.** + +
+تذکر: ممکن است برای پیشگیری از امتیاز اغراق آمیز تصنعیbleu ، برای ترجمه‌های پیش‌بینی‌شده‌ی کوتاه از جریمه اختصار استفاده شود.
+ +
+ + +**84. Attention** + +
+ژرف‌نگری +
+ +
+ + +**85. Attention model ― This model allows an RNN to pay attention to specific parts of the input that is considered as being important, which improves the performance of the resulting model in practice. By noting α the amount of attention that the output y should pay to the activation a and c the context at time t, we have:** + +
+مدل ژرف‌نگری ― این مدل به RNN این امکان را می‌دهد که به بخش‌های خاصی از ورودی که حائز اهمیت هستند توجه نشان دهد که در عمل باعث بهبود عملکرد مدل حاصل‌شده خواهد شد. اگر α به معنای مقدار توجهی باشد که خروجی y باید به فعال‌سازی a داشته باشد و c نشان‌دهنده‌ی زمینه (متن) در زمان t باشد، داریم: +
+ +
+ + +**86. with** + +
+با +
+ +
+ + +**87. Remark: the attention scores are commonly used in image captioning and machine translation.** + +
+نکته: امتیازات ژرف‌نگری عموما در عنوان‌سازی متنی برای تصویر (image captioning) و ترجمه ماشینی کاربرد دارد. +
+ +
+ + +**88. A cute teddy bear is reading Persian literature.** + +
+یک خرس تدی بامزه در حال خواندن ادبیات فارسی است. +
+ +
+ + +**89. Attention weight ― The amount of attention that the output y should pay to the activation a is given by α computed as follows:** + +
+وزن ژرف‌نگری ― مقدار توجهی که خروجی y باید به فعال‌سازی a داشته باشد به‌وسیله‌ی α به‌دست می‌آید که به‌صورت زیر محاسبه می‌شود: +
+ +
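A minimal sketch of turning alignment scores into attention weights α with a softmax and forming the context as the weighted sum of activations; how the scores themselves are produced (typically a small network over a⟨t′⟩ and the decoder state) is left out, and the toy numbers are assumptions.

```python
import numpy as np

def attention(scores, activations):
    """Softmax the alignment scores into weights alpha, then build the context as a weighted sum."""
    e = np.exp(scores - scores.max())
    alpha = e / e.sum()               # attention weights, they sum to 1
    context = alpha @ activations     # context vector c<t>
    return alpha, context

scores = np.array([2.0, 0.5, -1.0])                            # one alignment score per input timestep
activations = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])   # one activation a<t'> per input timestep
alpha, c = attention(scores, activations)
print(alpha.round(3), c.round(3))                              # weights sum to 1, context is their mix
```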
+ + +**90. Remark: computation complexity is quadratic with respect to Tx.** + +
+نکته: پیچیدگی محاسباتی به نسبت Tx از نوع درجه‌ی دوم است. +
+ +
+ + +**91. The Deep Learning cheatsheets are now available in [target language].** + +
+راهنمای یادگیری عمیق هم اکنون به زبان [فارسی] در دسترس است. +
+ +
+ +**92. Original authors** + +
+نویسندگان اصلی +
+ +
+ +**93. Translated by X, Y and Z** + +
+ترجمه شده توسط X،Y و Z +
+ +
+ +**94. Reviewed by X, Y and Z** + +
+بازبینی شده توسط X،Y و Z +
+ +
+ +**95. View PDF version on GitHub** + +
+نسخه پی‌دی‌اف را در گیت‌هاب ببینید +
+ +
+ +**96. By X and Y** + +
+توسط X و Y +
+ +
diff --git a/fr/cs-221-logic-models.md b/fr/cs-221-logic-models.md new file mode 100644 index 000000000..aa03a9b9a --- /dev/null +++ b/fr/cs-221-logic-models.md @@ -0,0 +1,462 @@ +**Logic-based models translation** [[webpage]](https://stanford.edu/~shervine/teaching/cs-221/cheatsheet-logic-models) + +
+ +**1. Logic-based models with propositional and first-order logic** + +⟶ Modèles basés sur la logique : logique propositionnelle et calcul des prédicats du premier ordre + +
+ + +**2. Basics** + +⟶ Bases + +
+ + +**3. Syntax of propositional logic ― By noting f,g formulas, and ¬,∧,∨,→,↔ connectives, we can write the following logical expressions:** + +⟶ Syntaxe de la logique propositionnelle - En notant f et g formules et ¬,∧,∨,→,↔ opérateurs, on peut écrire les expressions logiques suivantes : + +
+ + +**4. [Name, Symbol, Meaning, Illustration]** + +⟶ [Nom, Symbole, Signification, Illustration] + +
+ + +**5. [Affirmation, Negation, Conjunction, Disjunction, Implication, Biconditional]** + +⟶ [Affirmation, Négation, Conjonction, Disjonction, Implication, Biconditionnel] + +
+ + +**6. [not f, f and g, f or g, if f then g, f, that is to say g]** + +⟶ [non f, f et g, f ou g, si f alors g, f, c'est à dire g] + +
+ + +**7. Remark: formulas can be built up recursively out of these connectives.** + +⟶ Remarque : n'importe quelle formule peut être construite de manière récursive à partir de ces opérateurs. + +
+ + +**8. Model ― A model w denotes an assignment of binary weights to propositional symbols.** + +⟶ [Modèle - Un modèle w dénote une combinaison de valeurs binaires liées à des symboles propositionnels] + +
+ + +**9. Example: the set of truth values w={A:0,B:1,C:0} is one possible model to the propositional symbols A, B and C.** + +⟶ Exemple : l'ensemble de valeurs de vérité w={A:0,B:1,C:0} est un modèle possible pour les symboles propositionnels A, B et C. + +
+ + +**10. Interpretation function ― The interpretation function I(f,w) outputs whether model w satisfies formula f:** + +⟶ Interprétation - L'interprétation I(f,w) nous renseigne si le modèle w satisfait la formule f : + +
+ + +**11. Set of models ― M(f) denotes the set of models w that satisfy formula f. Mathematically speaking, we define it as follows:** + +⟶ Ensemble de modèles - M(f) dénote l'ensemble des modèles w qui satisfont la formule f. Sa définition mathématique est donnée par : + +
+ + +**12. Knowledge base** + +⟶ Base de connaissance + +
+ + +**13. Definition ― The knowledge base KB is the conjunction of all formulas that have been considered so far. The set of models of the knowledge base is the intersection of the set of models that satisfy each formula. In other words:** + +⟶ Définition - La base de connaissance KB est la conjonction de toutes les formules considérées jusqu'à présent. L'ensemble des modèles de la base de connaissance est l'intersection de l'ensemble des modèles satisfaisant chaque formule. En d'autres termes : + +
+ + +**14. Probabilistic interpretation ― The probability that query f is evaluated to 1 can be seen as the proportion of models w of the knowledge base KB that satisfy f, i.e.:** + +⟶ Interprétation en termes de probabilités - La probabilité que la requête f soit évaluée à 1 peut être vue comme la proportion des modèles w de la base de connaissance KB qui satisfait f, i.e. : + +
+ + +**15. Satisfiability ― The knowledge base KB is said to be satisfiable if at least one model w satisfies all its constraints. In other words:** + +⟶ Satisfaisabilité - La base de connaissance KB est dite satisfaisable si au moins un modèle w satisfait toutes ses contraintes. En d'autres termes : + +
+ + +**16. satisfiable** + +⟶ satisfaisable + +
+ + +**17. Remark: M(KB) denotes the set of models compatible with all the constraints of the knowledge base.** + +⟶ Remarque : M(KB) dénote l'ensemble des modèles compatibles avec toutes les contraintes de la base de connaissance. + +
+ + +**18. Relation between formulas and knowledge base - We define the following properties between the knowledge base KB and a new formula f:** + +⟶ Relation entre formules et base de connaissance - On définit les propriétés suivantes entre la base de connaissance KB et une nouvelle formule f : + +
+ + +**19. [Name, Mathematical formulation, Illustration, Notes]** + +⟶ [Nom, Formulation mathématique, Illustration, Notes] + +
+ + +**20. [KB entails f, KB contradicts f, f contingent to KB]** + +⟶ [KB déduit f, KB contredit f, f est contingent à KB] + +
+ + +**21. [f does not bring any new information, Also written KB⊨f, No model satisfies the constraints after adding f, Equivalent to KB⊨¬f, f does not contradict KB, f adds a non-trivial amount of information to KB]** + +⟶ [f n'apporte aucune nouvelle information, Aussi écrit KB⊨f, Aucun modèle ne satisfait les contraintes après l'ajout de f, Équivalent à KB⊨¬f, f ne contredit pas KB, f ajoute une quantité non-triviale d'information à KB] + +
+ + +**22. Model checking ― A model checking algorithm takes as input a knowledge base KB and outputs whether it is satisfiable or not.** + +⟶ Vérification de modèles - Un algorithme de vérification de modèles (model checking en anglais) prend comme argument une base de connaissance KB et nous renseigne si celle-ci est satisfaisable ou pas. + +
+ + +**23. Remark: popular model checking algorithms include DPLL and WalkSat.** + +⟶ Remarque : DPLL et WalkSat sont des exemples populaires d'algorithmes de vérification de modèles. + +
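As a companion to the two entries above, here is a tiny brute-force model checking sketch in Python: it enumerates every assignment of the propositional symbols and reports a satisfying model if one exists; the lambda-based encoding of formulas is only an illustrative convention, and real checkers such as DPLL or WalkSat are far more efficient.

```python
from itertools import product

def satisfiable(kb, symbols):
    """Brute-force model checking: try every assignment of the propositional symbols."""
    for values in product([0, 1], repeat=len(symbols)):
        w = dict(zip(symbols, values))            # a candidate model w
        if all(formula(w) for formula in kb):
            return True, w                        # w satisfies every formula of the knowledge base
    return False, None

# KB = {A or B, not A} over the symbols A and B
kb = [lambda w: w["A"] or w["B"], lambda w: not w["A"]]
print(satisfiable(kb, ["A", "B"]))                # (True, {'A': 0, 'B': 1})
```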
+ + +**24. Inference rule ― An inference rule of premises f1,...,fk and conclusion g is written:** + +⟶ Règle d'inférence - Une règle d'inférence de prémisses f1,...,fk et de conclusion g s'écrit : + +
+ + +**25. Forward inference algorithm ― From a set of inference rules Rules, this algorithm goes through all possible f1,...,fk and adds g to the knowledge base KB if a matching rule exists. This process is repeated until no more additions can be made to KB.** + +⟶ Algorithme de chaînage avant - Partant d'un ensemble de règles d'inférence Rules, l'algorithme de chaînage avant (en anglais forward inference algorithm) parcourt tous les f1,...,fk et ajoute g à la base de connaissance KB si une règle parvient à une telle conclusion. Cette démarche est répétée jusqu'à ce qu'aucun autre ajout ne puisse être fait à KB. + +
+ + +**26. Derivation ― We say that KB derives f (written KB⊢f) with rules Rules if f already is in KB or gets added during the forward inference algorithm using the set of rules Rules.** + +⟶ Dérivation - On dit que KB dérive f (noté KB⊢f) par le biais des règles Rules soit si f est déjà dans KB ou si elle se fait ajouter pendant l'application du chaînage avant utilisant les règles Rules. + +
+ + +**27. Properties of inference rules ― A set of inference rules Rules can have the following properties:** + +⟶ Propriétés des règles d'inférence - Un ensemble de règles d'inférence Rules peut avoir les propriétés suivantes : + +
+ + +**28. [Name, Mathematical formulation, Notes]** + +⟶ [Nom, Formulation mathématique, Notes] + +
+ + +**29. [Soundness, Completeness]** + +⟶ [Validité, Complétude] + +
+ + +**30. [Inferred formulas are entailed by KB, Can be checked one rule at a time, "Nothing but the truth", Formulas entailing KB are either already in the knowledge base or inferred from it, "The whole truth"]** + +⟶ [Les formules inférées sont déduites par KB, Peut être vérifiée une règle à la fois, "Rien que la vérité", Les formules déduites par KB sont soit déjà dans la base de connaissance, soit inférées de celle-ci, "La vérité dans sa totalité"] + +
+ + +**31. Propositional logic** + +⟶ Logique propositionnelle + +
+ + +**32. In this section, we will go through logic-based models that use logical formulas and inference rules. The idea here is to balance expressivity and computational efficiency.** + +⟶ Dans cette section, nous allons parcourir les modèles logiques utilisant des formules logiques et des règles d'inférence. L'idée est de trouver le juste milieu entre expressivité et efficacité. + +
+ + +**33. Horn clause ― By noting p1,...,pk and q propositional symbols, a Horn clause has the form:** + +⟶ Clause de Horn - En notant p1,...,pk et q des symboles propositionnels, une clause de Horn s'écrit : + +
+ + +**34. Remark: when q=false, it is called a "goal clause", otherwise we denote it as a "definite clause".** + +⟶ Remarque : quand q=false, cette clause de Horn est "négative", autrement elle est appelée "stricte". + +
+ + +**35. Modus ponens ― For propositional symbols f1,...,fk and p, the modus ponens rule is written:** + +⟶ Modus ponens - Sur les symboles propositionnels f1,...,fk et p, la règle de modus ponens est écrite : + +
+ + +**36. Remark: it takes linear time to apply this rule, as each application generate a clause that contains a single propositional symbol.** + +⟶ Remarque : l'application de cette règle se fait en temps linéaire, puisque chaque exécution génère une clause contenant un symbole propositionnel. + +
+ + +**37. Completeness ― Modus ponens is complete with respect to Horn clauses if we suppose that KB contains only Horn clauses and p is an entailed propositional symbol. Applying modus ponens will then derive p.** + +⟶ Complétude - Modus ponens est complet lorsqu'on le munit des clauses de Horn si l'on suppose que KB contient uniquement des clauses de Horn et que p est un symbole propositionnel qui est déduit. L'application de modus ponens dérivera alors p. + +
+ + +**38. Conjunctive normal form ― A conjunctive normal form (CNF) formula is a conjunction of clauses, where each clause is a disjunction of atomic formulas.** + +⟶ Forme normale conjonctive - La forme normale conjonctive (en anglais conjunctive normal form ou CNF) d'une formule est une conjonction de clauses, chacune d'entre elles étant une disjonction de formules atomiques. + +
+ + +**39. Remark: in other words, CNFs are ∧ of ∨.** + +⟶ Remarque : en d'autres termes, les CNFs sont des ∧ de ∨. + +
+ + +**40. Equivalent representation ― Every formula in propositional logic can be written into an equivalent CNF formula. The table below presents general conversion properties:** + +⟶ Représentation équivalente - Chaque formule en logique propositionnelle peut être écrite de manière équivalente sous la forme d'une formule CNF. Le tableau ci-dessous présente les propriétés principales permettant une telle conversion : + +
+ + +**41. [Rule name, Initial, Converted, Eliminate, Distribute, over]** + +⟶ [Nom de la règle, Début, Résultat, Élimine, Distribue, sur] + +
+ + +**42. Resolution rule ― For propositional symbols f1,...,fn, and g1,...,gm as well as p, the resolution rule is written:** + +⟶ Règle de résolution - Pour des symboles propositionnels f1,...,fn, et g1,...,gm ainsi que p, la règle de résolution s'écrit : + +
+ + +**43. Remark: it can take exponential time to apply this rule, as each application generates a clause that has a subset of the propositional symbols.** + +⟶ Remarque : l'application de cette règle peut prendre un temps exponentiel, vu que chaque itération génère une clause constituée d'une partie des symboles propositionnels. + +
+ + +**44. [Resolution-based inference ― The resolution-based inference algorithm follows the following steps:, Step 1: Convert all formulas into CNF, Step 2: Repeatedly apply resolution rule, Step 3: Return unsatisfiable if and only if False, is derived]** + +⟶ [Inférence basée sur la règle de résolution - L'algorithme d'inférence basée sur la règle de résolution se déroule en plusieurs étapes :, Étape 1 : Conversion de toutes les formules vers leur forme CNF, Étape 2 : Application répétée de la règle de résolution, Étape 3 : Renvoyer "non satisfaisable" si et seulement si False est dérivé] + +
+ + +**45. First-order logic** + +⟶ Calcul des prédicats du premier ordre + +
+ + +**46. The idea here is to use variables to yield more compact knowledge representations.** + +⟶ L'idée ici est d'utiliser des variables et ainsi permettre une représentation des connaissances plus compacte. + +
+ + +**47. [Model ― A model w in first-order logic maps:, constant symbols to objects, predicate symbols to tuple of objects]** + +⟶ [Modèle - Un modèle w en calcul des prédicats du premier ordre lie :, des symboles constants à des objets, des prédicats à n-uplets d'objets] + +
+ + +**48. Horn clause ― By noting x1,...,xn variables and a1,...,ak,b atomic formulas, the first-order logic version of a horn clause has the form:** + +⟶ Clause de Horn - En notant x1,...,xn variables et a1,...,ak,b formules atomiques, une clause de Horn pour le calcul des prédicats du premier ordre a la forme : + +
+ + +**49. Substitution ― A substitution θ maps variables to terms and Subst[θ,f] denotes the result of substitution θ on f.** + +⟶ Substitution - Une substitution θ lie les variables aux termes et Subst[θ,f] désigne le résultat de la substitution θ sur f. + +
+ + +**50. Unification ― Unification takes two formulas f and g and returns the most general substitution θ that makes them equal:** + +⟶ Unification - Une unification prend deux formules f et g et renvoie la substitution θ la plus générale les rendant égales : + +
+ + +**51. such that** + +⟶ tel que + +
+ + +**52. Note: Unify[f,g] returns Fail if no such θ exists.** + +⟶ Note : Unify[f,g] renvoie Fail si un tel θ n'existe pas. + +
+ + +**53. Modus ponens ― By noting x1,...,xn variables, a1,...,ak and a′1,...,a′k atomic formulas and by calling θ=Unify(a′1∧...∧a′k,a1∧...∧ak) the first-order logic version of modus ponens can be written:** + +⟶ Modus ponens - En notant x1,...,xn variables, a1,...,ak et a′1,...,a′k formules atomiques et en notant θ=Unify(a′1∧...∧a′k,a1∧...∧ak), modus ponens pour le calcul des prédicats du premier ordre s'écrit : + +
+ + +**54. Completeness ― Modus ponens is complete for first-order logic with only Horn clauses.** + +⟶ Complétude - Modus ponens est complet pour le calcul des prédicats du premier ordre lorsqu'il agit uniquement sur les clauses de Horn. + +
+ + +**55. Resolution rule ― By noting f1,...,fn, g1,...,gm, p, q formulas and by calling θ=Unify(p,q), the first-order logic version of the resolution rule can be written:** + +⟶ Règle de résolution - En notant f1,...,fn, g1,...,gm, p, q formules et en posant θ=Unify(p,q), la règle de résolution pour le calcul des prédicats du premier ordre s'écrit : + +<br>
+ + +**56. [Semi-decidability ― First-order logic, even restricted to only Horn clauses, is semi-decidable., if KB⊨f, forward inference on complete inference rules will prove f in finite time, if KB⊭f, no algorithm can show this in finite time]** + +⟶ [Semi-décidabilité - Le calcul des prédicats du premier ordre, même restreint aux clauses de Horn, n'est que semi-décidable., si KB⊨f, l'algorithme de chaînage avant sur des règles d'inférence complètes prouvera f en temps fini, si KB⊭f, aucun algorithme ne peut le prouver en temps fini] + +
+ + +**57. [Basics, Notations, Model, Interpretation function, Set of models]** + +⟶ [Bases, Notations, Modèle, Interprétation, Ensemble de modèles] + +
+ + +**58. [Knowledge base, Definition, Probabilistic interpretation, Satisfiability, Relationship with formulas, Forward inference, Rule properties]** + +⟶ [Base de connaissance, Définition, Interprétation en termes de probabilité, Satisfaisabilité, Lien avec les formules, Chaînage en avant, Propriétés des règles] + +
+ + +**59. [Propositional logic, Clauses, Modus ponens, Conjunctive normal form, Representation equivalence, Resolution]** + +⟶ [Logique propositionnelle, Clauses, Modus ponens, Forme normale conjonctive, Représentation équivalente, Résolution] + +
+ + +**60. [First-order logic, Substitution, Unification, Resolution rule, Modus ponens, Resolution, Semi-decidability]** + +⟶ [Calcul des prédicats du premier ordre, Substitution, Unification, Règle de résolution, Modus ponens, Résolution, Semi-décidabilité] + +
+ + +**61. View PDF version on GitHub** + +⟶ Voir la version PDF sur GitHub + +
+ + +**62. Original authors** + +⟶ Auteurs originaux. + +
+ + +**63. Translated by X, Y and Z** + +⟶ Traduit par X, Y et Z. + +
+ + +**64. Reviewed by X, Y and Z** + +⟶ Revu par X, Y et Z. + +
+ + +**65. By X and Y** + +⟶ Par X et Y. + +
+ + +**66. The Artificial Intelligence cheatsheets are now available in [target language].** + +⟶ Les pense-bêtes d'intelligence artificielle sont maintenant disponibles en français. diff --git a/fr/cs-221-reflex-models.md b/fr/cs-221-reflex-models.md new file mode 100644 index 000000000..7a7a489e1 --- /dev/null +++ b/fr/cs-221-reflex-models.md @@ -0,0 +1,539 @@ +**Reflex-based models translation** [[webpage]](https://stanford.edu/~shervine/teaching/cs-221/cheatsheet-reflex-models) + +
+ +**1. Reflex-based models with Machine Learning** + +⟶ Modèles basés sur le réflex : apprentissage automatique + +
+ + +**2. Linear predictors** + +⟶ Prédicteurs linéaires + +
+ + +**3. In this section, we will go through reflex-based models that can improve with experience, by going through samples that have input-output pairs.** + +⟶ Dans cette section, nous allons explorer les modèles basés sur le réflex qui peuvent s'améliorer avec l'expérience s'appuyant sur des données ayant une correspondance entrée-sortie. + +
+ + +**4. Feature vector ― The feature vector of an input x is noted ϕ(x) and is such that:** + +⟶ Vecteur caractéristique - Le vecteur caractéristique (en anglais feature vector) d'une entrée x est noté ϕ(x) et se décompose en : + +
+ + +**5. Score ― The score s(x,w) of an example (ϕ(x),y)∈Rd×R associated to a linear model of weights w∈Rd is given by the inner product:** + +⟶ Score - Le score s(x,w) d'un exemple (ϕ(x),y)∈Rd×R associé à un modèle linéaire de paramètres w∈Rd est donné par le produit scalaire : + +
+ + +**6. Classification** + +⟶ Classification + +
+ + +**7. Linear classifier ― Given a weight vector w∈Rd and a feature vector ϕ(x)∈Rd, the binary linear classifier fw is given by:** + +⟶ Classifieur linéaire - Étant donnés un vecteur de paramètres w∈Rd et un vecteur caractéristique ϕ(x)∈Rd, le classifieur linéaire binaire est donné par : + +
+ + +**8. if** + +⟶ si + +
+ + +**9. Margin ― The margin m(x,y,w)∈R of an example (ϕ(x),y)∈Rd×{−1,+1} associated to a linear model of weights w∈Rd quantifies the confidence of the prediction: larger values are better. It is given by:** + +⟶ Marge - La marge (en anglais margin) m(x,y,w)∈R d'un exemple (ϕ(x),y)∈Rd×{−1,+1} associée à un modèle linéaire de paramètre w∈Rd quantifie la confiance associée à une prédiction : plus cette valeur est grande, mieux c'est. Cette quantité est donnée par : + +
+ + +**10. Regression** + +⟶ Régression + +
+ + +**11. Linear regression ― Given a weight vector w∈Rd and a feature vector ϕ(x)∈Rd, the output of a linear regression of weights w denoted as fw is given by:** + +⟶ Régression linéaire - Étant donnés un vecteur de paramètres w∈Rd et un vecteur caractéristique ϕ(x)∈Rd, le résultat d'une régression linéaire de paramètre w, notée fw, est donné par : + +
+ + +**12. Residual ― The residual res(x,y,w)∈R is defined as being the amount by which the prediction fw(x) overshoots the target y:** + +⟶ Résidu - Le résidu res(x,y,w)∈R est défini comme étant la différence entre la prédiction fw(x) et la vraie valeur y. + +
+ + +**13. Loss minimization** + +⟶ Minimisation de la fonction objectif + +
+ + +**14. Loss function ― A loss function Loss(x,y,w) quantifies how unhappy we are with the weights w of the model in the prediction task of output y from input x. It is a quantity we want to minimize during the training process.** + +⟶ Fonction objectif - Une fonction objectif (en anglais loss function) Loss(x,y,w) traduit notre niveau d'insatisfaction avec les paramètres w du modèle dans la tâche de prédiction de la sortie y à partir de l'entrée x. C'est une quantité que l'on souhaite minimiser pendant la phase d'entraînement. + +
+ + +**15. Classification case - The classification of a sample x of true label y∈{−1,+1} with a linear model of weights w can be done with the predictor fw(x)≜sign(s(x,w)). In this situation, a metric of interest quantifying the quality of the classification is given by the margin m(x,y,w), and can be used with the following loss functions:** + +⟶ Cas de la classification - Trouver la classe d'un exemple x de vraie étiquette y∈{−1,+1} peut être fait par le biais d'un modèle linéaire de paramètre w à l'aide du prédicteur fw(x)≜sign(s(x,w)). La qualité de cette prédiction peut alors être évaluée au travers de la marge m(x,y,w) intervenant dans les fonctions objectif suivantes : + +<br>
+ + +**16. [Name, Illustration, Zero-one loss, Hinge loss, Logistic loss]** + +⟶ [Nom, Illustration, Fonction objectif zéro-un, Fonction objectif de Hinge, Fonction objectif logistique] + +
+ + +**17. Regression case - The prediction of a sample x of true label y∈R with a linear model of weights w can be done with the predictor fw(x)≜s(x,w). In this situation, a metric of interest quantifying the quality of the regression is given by the margin res(x,y,w) and can be used with the following loss functions:** + +⟶ Cas de la régression - Prédire la valeur y∈R associée à l'exemple x peut être fait par le biais d'un modèle linéaire de paramètre w à l'aide du prédicteur fw(x)≜s(x,w). La qualité de cette prédiction peut alors être évaluée au travers du résidu res(x,y,w) intervenant dans les fonctions objectif suivantes : + +<br>
+ + +**18. [Name, Squared loss, Absolute deviation loss, Illustration]** + +⟶ [Nom, Erreur quadratique, Erreur absolue, Illustration] + +
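For reference, the classification and regression losses listed above can be written as one-liners in terms of the margin m(x,y,w) and the residual res(x,y,w); this is a minimal sketch with helper names of our own choosing, not from the cheatsheet.

```python
import math

def zero_one_loss(m: float) -> float:
    """1{m <= 0}: counts a mistake whenever the margin is non-positive."""
    return 1.0 if m <= 0 else 0.0

def hinge_loss(m: float) -> float:
    return max(1.0 - m, 0.0)

def logistic_loss(m: float) -> float:
    return math.log(1.0 + math.exp(-m))

def squared_loss(res: float) -> float:
    return res ** 2

def absolute_deviation_loss(res: float) -> float:
    return abs(res)

print(hinge_loss(0.5), logistic_loss(0.5), squared_loss(-1.2))
```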
+ + +**19. Loss minimization framework ― In order to train a model, we want to minimize the training loss is defined as follows:** + +⟶ Processus de minimisation de la fonction objectif - Lors de l'entraînement d'un modèle, on souhaite minimiser la valeur de la fonction objectif évaluée sur l'ensemble d'entraînement : + +
+ + +**20. Non-linear predictors** + +⟶ Prédicteurs non linéaires + +
+ + +**21. k-nearest neighbors ― The k-nearest neighbors algorithm, commonly known as k-NN, is a non-parametric approach where the response of a data point is determined by the nature of its k neighbors from the training set. It can be used in both classification and regression settings.** + +⟶ k plus proches voisins - L'algorithme des k plus proches voisins (en anglais k-nearest neighbors ou k-NN) est une approche non paramétrique où la réponse associée à un exemple est déterminée par la nature de ses k plus proches voisins de l'ensemble d'entraînement. Cette démarche peut être utilisée pour la classification et la régression. + +
+ + +**22. Remark: the higher the parameter k, the higher the bias, and the lower the parameter k, the higher the variance.** + +⟶ Remarque : plus le paramètre k est grand, plus le biais est élevé. À l'inverse, la variance devient plus élevée lorsque l'on réduit la valeur k. + +
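A minimal k-NN sketch under the description above, using plain Euclidean distance and a majority vote; the toy training pairs are made up for illustration.

```python
from collections import Counter
import math

def knn_predict(train, x, k=3):
    """Majority vote among the k training points closest to x (Euclidean distance)."""
    neighbors = sorted(train, key=lambda pair: math.dist(pair[0], x))[:k]
    return Counter(label for _, label in neighbors).most_common(1)[0][0]

train = [((0.0, 0.0), -1), ((0.1, 0.2), -1), ((1.0, 1.0), +1), ((0.9, 1.1), +1)]
print(knn_predict(train, (0.95, 1.0), k=3))   # +1
```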
+ + +**23. Neural networks ― Neural networks are a class of models that are built with layers. Commonly used types of neural networks include convolutional and recurrent neural networks. The vocabulary around neural networks architectures is described in the figure below:** + +⟶ Réseaux de neurones - Les réseaux de neurones (en anglais neural networks) constituent une classe de modèles construits à partir de couches (en anglais layers). Parmi les types de réseaux populaires, on peut compter les réseaux de neurones convolutionnels et récurrents (abrégés respectivement en CNN et RNN en anglais). Une partie du vocabulaire associé aux réseaux de neurones est détaillée dans la figure ci-dessous : + +<br>
+ + +**24. [Input layer, Hidden layer, Output layer]** + +⟶ [Couche d'entrée, Couche cachée, Couche de sortie] + +
+ + +**25. By noting i the ith layer of the network and j the jth hidden unit of the layer, we have:** + +⟶ En notant i la i-ème couche du réseau et j son j-ième neurone, on a : + +
+ + +**26. where we note w, b, x, z the weight, bias, input and non-activated output of the neuron respectively.** + +⟶ où l'on note respectivement w, b, x, z le coefficient, le biais, l'entrée et la sortie non activée du neurone. + +<br>
+ + +**27. For a more detailed overview of the concepts above, check out the Supervised Learning cheatsheets!** + +⟶ Pour un aperçu plus détaillé des concepts ci-dessus, rendez-vous sur le pense-bête d'apprentissage supervisé ! + +
+ + +**28. Stochastic gradient descent** + +⟶ Algorithme du gradient stochastique + +
+ + +**29. Gradient descent ― By noting η∈R the learning rate (also called step size), the update rule for gradient descent is expressed with the learning rate and the loss function Loss(x,y,w) as follows:** + +⟶ Descente de gradient - En notant η∈R le taux d'apprentissage (en anglais learning rate ou step size), la règle de mise à jour des coefficients pour cet algorithme utilise la fonction objectif Loss(x,y,w) de la manière suivante : + +
+ + +**30. Stochastic updates ― Stochastic gradient descent (SGD) updates the parameters of the model one training example (ϕ(x),y)∈Dtrain at a time. This method leads to sometimes noisy, but fast updates.** + +⟶ Mises à jour stochastiques - L'algorithme du gradient stochastique (en anglais stochastic gradient descent ou SGD) met à jour les paramètres du modèle en parcourant les exemples (ϕ(x),y)∈Dtrain de l'ensemble d'entraînement un à un. Cette méthode engendre des mises à jour rapides à calculer mais qui manquent parfois de robustesse. + +
+ + +**31. Batch updates ― Batch gradient descent (BGD) updates the parameters of the model one batch of examples (e.g. the entire training set) at a time. This method computes stable update directions, at a greater computational cost.** + +⟶ Mises à jour par lot - L'algorithme du gradient par lot (en anglais batch gradient descent ou BGD) met à jour les paramètres du modèle en utilisant des lots entiers d'exemples (e.g. la totalité de l'ensemble d'entraînement) à la fois. Cette méthode calcule des directions de mise à jour des coefficients plus stables au prix d'un plus grand nombre de calculs. + +<br>
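Below is a minimal sketch of the stochastic update rule on the squared loss; the tiny dataset and step size are illustrative assumptions, and a batch update would simply average the gradient over all of Dtrain before each step.

```python
import random

def sgd_squared_loss(train, d, eta=0.05, steps=2000):
    """w <- w - eta * grad Loss(x, y, w) with Loss(x, y, w) = (w . phi(x) - y)^2, one example at a time."""
    w = [0.0] * d
    for _ in range(steps):
        phi_x, y = random.choice(train)                       # stochastic update: a single example
        pred = sum(wi * xi for wi, xi in zip(w, phi_x))
        grad = [2 * (pred - y) * xi for xi in phi_x]
        w = [wi - eta * gi for wi, gi in zip(w, grad)]
    return w

train = [((1.0, 1.0), 3.0), ((1.0, 2.0), 5.0)]                # consistent with w = (1, 2)
print(sgd_squared_loss(train, d=2))                           # approximately [1.0, 2.0]
```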
+ + +**32. Fine-tuning models** + +⟶ Peaufinage de modèle + +
+ + +**33. Hypothesis class ― A hypothesis class F is the set of possible predictors with a fixed ϕ(x) and varying w:** + +⟶ Classe d'hypothèses - Une classe d'hypothèses F est l'ensemble des prédicteurs candidats ayant un ϕ(x) fixé et dont le paramètre w peut varier. + +
+ + +**34. Logistic function ― The logistic function σ, also called the sigmoid function, is defined as:** + +⟶ Fonction logistique - La fonction logistique σ, aussi appelée en anglais sigmoid function, est définie par : + +
+ + +**35. Remark: we have σ′(z)=σ(z)(1−σ(z)).** + +⟶ Remarque : la dérivée de cette fonction s'écrit σ′(z)=σ(z)(1−σ(z)). + +
+ + +**36. Backpropagation ― The forward pass is done through fi, which is the value for the subexpression rooted at i, while the backward pass is done through gi=∂out∂fi and represents how fi influences the output.** + +⟶ Rétropropagation du gradient (en anglais backpropagation) - La propagation avant (en anglais forward pass) est effectuée via fi, valeur correspondant à l'expression appliquée à l'étape i. La propagation de l'erreur vers l'arrière (en anglais backward pass) se fait via gi=∂out∂fi et décrit la manière dont fi agit sur la sortie du réseau. + +
+ + +**37. Approximation and estimation error ― The approximation error ϵapprox represents how far the entire hypothesis class F is from the target predictor g∗, while the estimation error ϵest quantifies how good the predictor ^f is with respect to the best predictor f∗ of the hypothesis class F.** + +⟶ Erreur d'approximation et d'estimation - L'erreur d'approximation ϵapprox représente la distance entre la classe d'hypothèses F et le prédicteur optimal g∗. De son côté, l'erreur d'estimation quantifie la qualité du prédicteur ^f par rapport au meilleur prédicteur f∗ de la classe d'hypothèses F. + +
+ + +**38. Regularization ― The regularization procedure aims at avoiding the model to overfit the data and thus deals with high variance issues. The following table sums up the different types of commonly used regularization techniques:** + +⟶ Régularisation - Le but de la régularisation est d'empêcher le modèle de surapprendre (en anglais overfit) les données en s'occupant ainsi des problèmes de variance élevée. La table suivante résume les différents types de régularisation couramment utilisés : + +
+ + +**39. [Shrinks coefficients to 0, Good for variable selection, Makes coefficients smaller, Tradeoff between variable selection and small coefficients]** + +⟶ [Réduit les coefficients à 0, Bénéfique pour la sélection de variables, Rapetisse les coefficients, Compromis entre sélection de variables et coefficients de faible magnitude] + +<br>
+ + +**40. Hyperparameters ― Hyperparameters are the properties of the learning algorithm, and include features, regularization parameter λ, number of iterations T, step size η, etc.** + +⟶ Hyperparamètres - Les hyperparamètres sont les paramètres de l'algorithme d'apprentissage et incluent parmi d'autres le type de caractéristiques utilisé ainsi que le paramètre de régularisation λ, le nombre d'itérations T, le taux d'apprentissage η. + +
+ + +**41. Sets vocabulary ― When selecting a model, we distinguish 3 different parts of the data that we have as follows:** + +⟶ Vocabulaire ― Lors de la sélection d'un modèle, on divise les données en 3 différentes parties : + +
+ + +**42. [Training set, Validation set, Testing set]** + +⟶ [Données d'entraînement, Données de validation, Données de test] + +
+ + +**43. [Model is trained, Usually 80% of the dataset, Model is assessed, Usually 20% of the dataset, Also called hold-out or development set, Model gives predictions, Unseen data]** + +⟶ [Le modèle est entrainé, Constitue normalement 80% du jeu de données, Le modèle est évalué, Constitue normalement 20% du jeu de données, Aussi appelé données de développement (en anglais hold-out ou development set), Le modèle donne ses prédictions, Données jamais observées] + +
+ + +**44. Once the model has been chosen, it is trained on the entire dataset and tested on the unseen test set. These are represented in the figure below:** + +⟶ Une fois que le modèle a été choisi, il est entrainé sur le jeu de données entier et testé sur l'ensemble de test (qui n'a jamais été vu). Ces derniers sont représentés dans la figure ci-dessous : + +
+ + +**45. [Dataset, Unseen data, train, validation, test]** + +⟶ [Jeu de données, Données inconnues, entrainement, validation, test] + +
+ + +**46. For a more detailed overview of the concepts above, check out the Machine Learning tips and tricks cheatsheets!** + +⟶ Pour un aperçu plus détaillé des concepts ci-dessus, rendez-vous sur le pense-bête de petites astuces d'apprentissage automatique ! + +
+ + +**47. Unsupervised Learning** + +⟶ Apprentissage non supervisé + +
+ + +**48. The class of unsupervised learning methods aims at discovering the structure of the data, which may have of rich latent structures.** + +⟶ Les méthodes d'apprentissage non supervisé visent à découvrir la structure (parfois riche) des données. + +
+ + +**49. k-means** + +⟶ k-moyennes (en anglais k-means) + +
+ + +**50. Clustering ― Given a training set of input points Dtrain, the goal of a clustering algorithm is to assign each point ϕ(xi) to a cluster zi∈{1,...,k}** + +⟶ Partitionnement - Étant donné un ensemble d'entraînement Dtrain, le but d'un algorithme de partitionnement (en anglais clustering) est d'assigner chaque point ϕ(xi) à une partition zi∈{1,...,k}. + +
+ + +**51. Objective function ― The loss function for one of the main clustering algorithms, k-means, is given by:** + +⟶ Fonction objectif - La fonction objectif d'un des principaux algorithmes de partitionnement, k-moyennes, est donnée par : + +<br>
+ + +**52. Algorithm ― After randomly initializing the cluster centroids μ1,μ2,...,μk∈Rn, the k-means algorithm repeats the following step until convergence:** + +⟶ Algorithme ― Après avoir aléatoirement initialisé les centroïdes de partitions μ1,μ2,...,μk∈Rn, l'algorithme k-moyennes répète l'étape suivante jusqu'à convergence : + +<br>
+ + +**53. and** + +⟶ et + +
+ + +**54. [Means initialization, Cluster assignment, Means update, Convergence]** + +⟶ [Initialisation des moyennes, Assignation de partition, Mise à jour des moyennes, Convergence] + +
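A minimal k-means sketch following the assignment/update alternation above; the stopping criterion is simplified to a fixed number of iterations, and the sample points are made up.

```python
import math
import random

def kmeans(points, k, iters=100):
    """Alternate cluster assignment and means update (convergence test omitted for brevity)."""
    centroids = random.sample(points, k)                      # means initialization
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:                                      # cluster assignment
            j = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[j].append(p)
        for j, cluster in enumerate(clusters):                # means update
            if cluster:
                centroids[j] = tuple(sum(c) / len(cluster) for c in zip(*cluster))
    return centroids

points = [(0.0, 0.0), (0.1, 0.2), (5.0, 5.1), (5.2, 4.9)]
print(kmeans(points, k=2))
```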
+ + +**55. Principal Component Analysis** + +⟶ Analyse des composantes principales + +
+ + +**56. Eigenvalue, eigenvector ― Given a matrix A∈Rn×n, λ is said to be an eigenvalue of A if there exists a vector z∈Rn∖{0}, called eigenvector, such that we have:** + +⟶ Valeur propre, vecteur propre ― Étant donnée une matrice A∈Rn×n, λ est dite être une valeur propre de A s'il existe un vecteur z∈Rn∖{0}, appelé vecteur propre, tel que : + +<br>
+ + +**57. Spectral theorem ― Let A∈Rn×n. If A is symmetric, then A is diagonalizable by a real orthogonal matrix U∈Rn×n. By noting Λ=diag(λ1,...,λn), we have:** + +⟶ Théorème spectral ― Soit A∈Rn×n. Si A est symétrique, alors A est diagonalisable par une matrice réelle orthogonale U∈Rn×n. En notant Λ=diag(λ1,...,λn), on a : + +
+ + +**58. Remark: the eigenvector associated with the largest eigenvalue is called principal eigenvector of matrix A.** + +⟶ Remarque : le vecteur propre associé à la plus grande valeur propre est appelé le vecteur propre principal de la matrice A. + +
+ + +**59. Algorithm ― The Principal Component Analysis (PCA) procedure is a dimension reduction technique that projects the data on k dimensions by maximizing the variance of the data as follows:** + +⟶ Algorithme ― La procédure d'analyse des composantes principales (en anglais PCA - Principal Component Analysis) est une technique de réduction de dimension qui projette les données sur k dimensions en maximisant la variance des données de la manière suivante : + +
+ + +**60. Step 1: Normalize the data to have a mean of 0 and standard deviation of 1.** + +⟶ Étape 1: Normaliser les données pour avoir une moyenne de 0 et un écart-type de 1. + +
+ + +**61. [where, and]** + +⟶ [où, et] + +
+ + +**62. [Step 2: Compute Σ=1mm∑i=1ϕ(xi)ϕ(xi)T∈Rn×n, which is symmetric with real eigenvalues., Step 3: Compute u1,...,uk∈Rn the k orthogonal principal eigenvectors of Σ, i.e. the orthogonal eigenvectors of the k largest eigenvalues., Step 4: Project the data on spanR(u1,...,uk).]** + +⟶ [Étape 2: Calculer Σ=1mm∑i=1ϕ(xi)ϕ(xi)T∈Rn×n, qui est symétrique avec des valeurs propres réelles., Étape 3: Calculer u1,...,uk∈Rn les k vecteurs propres principaux orthogonaux de Σ, i.e. les vecteurs propres orthogonaux des k valeurs propres les plus grandes., Étape 4: Projeter les données sur spanR(u1,...,uk).] + +<br>
+ + +**63. This procedure maximizes the variance among all k-dimensional spaces.** + +⟶ Cette procédure maximise la variance sur tous les espaces à k dimensions. + +
+ + +**64. [Data in feature space, Find principal components, Data in principal components space]** + +⟶ [Données dans l'espace initial, Trouve les composantes principales, Données dans l'espace des composantes principales] + +
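The four PCA steps above can be sketched with NumPy's symmetric eigendecomposition as follows; this is an illustrative sketch rather than a reference implementation, and the sample matrix X is made up.

```python
import numpy as np

def pca(X, k):
    """Project m data points (rows of X) onto the k principal eigenvectors of Sigma."""
    X = (X - X.mean(axis=0)) / X.std(axis=0)       # Step 1: zero mean, unit standard deviation
    sigma = (X.T @ X) / X.shape[0]                 # Step 2: Sigma = (1/m) sum_i phi(x_i) phi(x_i)^T
    eigvals, eigvecs = np.linalg.eigh(sigma)       # Step 3: eigh returns eigenvalues in ascending order,
    U = eigvecs[:, -k:]                            #         keep the eigenvectors of the k largest ones
    return X @ U                                   # Step 4: project onto span(u_1, ..., u_k)

X = np.array([[2.5, 2.4], [0.5, 0.7], [2.2, 2.9], [1.9, 2.2], [3.1, 3.0]])
print(pca(X, k=1))
```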
+ + +**65. For a more detailed overview of the concepts above, check out the Unsupervised Learning cheatsheets!** + +⟶ Pour un aperçu plus détaillé des concepts ci-dessus, rendez-vous sur le pense-bête d'apprentissage non supervisé ! + +
+ + +**66. [Linear predictors, Feature vector, Linear classifier/regression, Margin]** + +⟶ [Prédicteurs linéaires, Vecteur caractéristique, Classification/régression linéaire, Marge] + +
+ + +**67. [Loss minimization, Loss function, Framework]** + +⟶ [Minimisation de la fonction objectif, Fonction objectif, Cadre] + +
+ + +**68. [Non-linear predictors, k-nearest neighbors, Neural networks]** + +⟶ [Prédicteurs non linéaires, k plus proches voisins, Réseaux de neurones] + +
+ + +**69. [Stochastic gradient descent, Gradient, Stochastic updates, Batch updates]** + +⟶ [Algorithme du gradient stochastique, Gradient, Mises à jour stochastiques, Mises à jour par lots] + +
+ + +**70. [Fine-tuning models, Hypothesis class, Backpropagation, Regularization, Sets vocabulary]** + +⟶ [Peaufiner les modèles, Classe d'hypothèses, Rétropropagation du gradient, Régularisation, Vocabulaire] + +
+ + +**71. [Unsupervised Learning, k-means, Principal components analysis]** + +⟶ [Apprentissage non supervisé, k-means, Analyse des composantes principales] + +
+ + +**72. View PDF version on GitHub** + +⟶ Voir la version PDF sur GitHub + +
+ + +**73. Original authors** + +⟶ Auteurs d'origine + +
+ + +**74. Translated by X, Y and Z** + +⟶ Traduit par X, Y et Z + +
+ + +**75. Reviewed by X, Y and Z** + +⟶ Revu par X, Y et Z + +
+ + +**76. By X and Y** + +⟶ De X et Y + +
+ + +**77. The Artificial Intelligence cheatsheets are now available in [target language].** + +⟶ Les pense-bêtes d'intelligence artificielle sont maintenant disponibles en français. diff --git a/fr/cs-221-states-models.md b/fr/cs-221-states-models.md new file mode 100644 index 000000000..20be6ebb7 --- /dev/null +++ b/fr/cs-221-states-models.md @@ -0,0 +1,980 @@ +**States-based models translation** [[webpage]](https://stanford.edu/~shervine/teaching/cs-221/cheatsheet-states-models) + +
+ +**1. States-based models with search optimization and MDP** + +⟶ Modèles basés sur les états : optimisation de parcours et MDPs + +
+ + +**2. Search optimization** + +⟶ Optimisation de parcours + +
+ + +**3. In this section, we assume that by accomplishing action a from state s, we deterministically arrive in state Succ(s,a). The goal here is to determine a sequence of actions (a1,a2,a3,a4,...) that starts from an initial state and leads to an end state. In order to solve this kind of problem, our objective will be to find the minimum cost path by using states-based models.** + +⟶ Dans cette section, nous supposons qu'en effectuant une action a à partir d'un état s, on arrive de manière déterministe à l'état Succ(s,a). Le but de cette étude est de déterminer une séquence d'actions (a1,a2,a3,a4,...) démarrant d'un état initial et aboutissant à un état final. Pour y parvenir, notre objectif est de minimiser le coût associé à ces actions à l'aide de modèles basés sur les états (en anglais states-based models). + +<br>
+ + +**4. Tree search** + +⟶ Parcours d'arbre + +
+ + +**5. This category of states-based algorithms explores all possible states and actions. It is quite memory efficient, and is suitable for huge state spaces but the runtime can become exponential in the worst cases.** + +⟶ Cette catégorie d'algorithmes explore tous les états et actions possibles. Même si leur consommation en mémoire est raisonnable et peut supporter des espaces d'états de taille très grande, ce type d'algorithmes est néanmoins susceptible d'engendrer des complexités en temps exponentielles dans le pire des cas. + +
+ + +**6. [Self-loop, More than a parent, Cycle, More than a root, Valid tree]** + +⟶ [Boucle, Plus d'un parent, Cycle, Plus d'une racine, Arbre valide] + +
+ + +**7. [Search problem ― A search problem is defined with:, a starting state sstart, possible actions Actions(s) from state s, action cost Cost(s,a) from state s with action a, successor Succ(s,a) of state s after action a, whether an end state was reached IsEnd(s)]** + +⟶ [Problème de recherche - Un problème de recherche est défini par :, un état de départ sstart, des actions Actions(s) pouvant être effectuées depuis l'état s, le coût de l'action Cost(s,a) depuis l'état s pour effectuer l'action a, le successeur Succ(s,a) de l'état s après avoir effectué l'action a, la connaissance d'avoir atteint ou non un état final IsEnd(s)] + +
+ + +**8. The objective is to find a path that minimizes the cost.** + +⟶ L'objectif est de trouver un chemin minimisant le coût total des actions utilisées. + +
+ + +**9. Backtracking search ― Backtracking search is a naive recursive algorithm that tries all possibilities to find the minimum cost path. Here, action costs can be either positive or negative.** + +⟶ Retour sur trace - L'algorithme de retour sur trace (en anglais backtracking search) est un algorithme récursif explorant naïvement toutes les possibilités jusqu'à trouver le chemin de coût minimal. Ici, les coûts des actions peuvent être positifs ou négatifs. + +<br>
+ + +**10. Breadth-first search (BFS) ― Breadth-first search is a graph search algorithm that does a level-by-level traversal. We can implement it iteratively with the help of a queue that stores at each step future nodes to be visited. For this algorithm, we can assume action costs to be equal to a constant c⩾0.** + +⟶ Parcours en largeur (BFS) - L'algorithme de parcours en largeur (en anglais breadth-first search ou BFS) est un algorithme de parcours de graphe traversant chaque niveau de manière successive. On peut le coder de manière itérative à l'aide d'une queue stockant à chaque étape les prochains nœuds à visiter. Cet algorithme suppose que le coût de toutes les actions est égal à une constante c⩾0. + +
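A minimal BFS sketch with the queue-based implementation mentioned above; the successor function of the toy problem (+1/+2 moves toward state 5) is an illustrative assumption.

```python
from collections import deque

def bfs(start, successors, is_end):
    """Level-by-level traversal with a queue; optimal when every action costs the same constant c >= 0."""
    frontier = deque([start])
    parent = {start: None}
    while frontier:
        s = frontier.popleft()
        if is_end(s):
            path = []
            while s is not None:
                path.append(s)
                s = parent[s]
            return path[::-1]
        for s_next in successors(s):
            if s_next not in parent:               # each state is queued at most once
                parent[s_next] = s
                frontier.append(s_next)
    return None

# Toy problem: states 0..5, actions +1 and +2, end state 5
print(bfs(0, lambda s: [s + 1, s + 2] if s < 5 else [], lambda s: s == 5))   # [0, 1, 3, 5]
```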
+ + +**11. Depth-first search (DFS) ― Depth-first search is a search algorithm that traverses a graph by following each path as deep as it can. We can implement it recursively, or iteratively with the help of a stack that stores at each step future nodes to be visited. For this algorithm, action costs are assumed to be equal to 0.** + +⟶ Parcours en profondeur (DFS) - L'algorithme de parcours en profondeur (en anglais depth-first search ou DFS) est un algorithme de parcours de graphe traversant chaque chemin qu'il emprunte aussi loin que possible. On peut le coder de manière récursive, ou itérative à l'aide d'une pile qui stocke à chaque étape les prochains nœuds à visiter. Cet algorithme suppose que le coût de toutes les actions est égal à 0. + +
+ + +**12. Iterative deepening ― The iterative deepening trick is a modification of the depth-first search algorithm so that it stops after reaching a certain depth, which guarantees optimality when all action costs are equal. Here, we assume that action costs are equal to a constant c⩾0.** + +⟶ Approfondissement itératif - L'astuce de l'approfondissement itératif (en anglais iterative deepening) est une modification de l'algorithme de DFS qui l'arrête après avoir atteint une certaine profondeur, garantissant l'optimalité de la solution trouvée quand toutes les actions ont un même coût constant c⩾0. + +
+ + +**13. Tree search algorithms summary ― By noting b the number of actions per state, d the solution depth, and D the maximum depth, we have:** + +⟶ Récapitulatif des algorithmes de parcours d'arbre - En notant b le nombre d'actions par état, d la profondeur de la solution et D la profondeur maximale, on a : + +
+ + +**14. [Algorithm, Action costs, Space, Time]** + +⟶ [Algorithme, Coût des actions, Espace, Temps] + +
+ + +**15. [Backtracking search, any, Breadth-first search, Depth-first search, DFS-Iterative deepening]** + +⟶ [Retour sur trace, peu importe, Parcours en largeur, Parcours en profondeur, DFS-approfondissement itératif] + +
+ + +**16. Graph search** + +⟶ Parcours de graphe + +
+ + +**17. This category of states-based algorithms aims at constructing optimal paths, enabling exponential savings. In this section, we will focus on dynamic programming and uniform cost search.** + +⟶ Cette catégorie d'algorithmes basés sur les états vise à trouver des chemins optimaux avec une complexité moins grande qu'exponentielle. Dans cette section, nous allons nous concentrer sur la programmation dynamique et la recherche à coût uniforme. + +
+ + +**18. Graph ― A graph is comprised of a set of vertices V (also called nodes) as well as a set of edges E (also called links).** + +⟶ Graphe - Un graphe se compose d'un ensemble de sommets V (aussi appelés noeuds) et d'arêtes E (appelés arcs lorsque le graphe est orienté). + +
+ + +**19. Remark: a graph is said to be acylic when there is no cycle.** + +⟶ Remarque : un graphe est dit être acyclique lorsqu'il ne contient pas de cycle. + +
+ + +**20. State ― A state is a summary of all past actions sufficient to choose future actions optimally.** + +⟶ État - Un état contient le résumé des actions passées suffisant pour choisir les actions futures de manière optimale. + +
+ + +**21. Dynamic programming ― Dynamic programming (DP) is a backtracking search algorithm with memoization (i.e. partial results are saved) whose goal is to find a minimum cost path from state s to an end state send. It can potentially have exponential savings compared to traditional graph search algorithms, and has the property to only work for acyclic graphs. For any given state s, the future cost is computed as follows:** + +⟶ Programmation dynamique - La programmation dynamique (en anglais dynamic programming ou DP) est un algorithme de recherche de type retour sur trace qui utilise le principe de mémoïsation (i.e. les résultats intermédiaires sont enregistrés) et ayant pour but de trouver le chemin à coût minimal allant de l'état s à l'état final send. Cette procédure peut potentiellement engendrer des économies exponentielles si on la compare aux algorithmes de parcours de graphe traditionnels, et a la propriété de ne marcher que dans le cas de graphes acycliques. Pour un état s donné, le coût futur est calculé de la manière suivante : + +
+ + +**22. [if, otherwise]** + +⟶ [si, sinon] + +
+ + +**23. Remark: the figure above illustrates a bottom-to-top approach whereas the formula provides the intuition of a top-to-bottom problem resolution.** + +⟶ Remarque : la figure ci-dessus illustre une approche ascendante alors que la formule nous donne l'intuition d'une résolution avec une approche descendante. + +
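The top-to-bottom (memoized) reading of the recurrence above can be sketched as follows; the toy acyclic walk/jump problem is an illustrative assumption.

```python
from functools import lru_cache

def make_future_cost(actions, cost, succ, is_end):
    """Memoized FutureCost(s) = min over a of Cost(s, a) + FutureCost(Succ(s, a)), for an acyclic problem."""
    @lru_cache(maxsize=None)
    def future_cost(s):
        if is_end(s):
            return 0.0
        return min(cost(s, a) + future_cost(succ(s, a)) for a in actions(s))
    return future_cost

# Toy acyclic problem: walk (+1, cost 1) or jump (+2, cost 1.5) from state 0 to state 3
future_cost = make_future_cost(
    actions=lambda s: ["walk", "jump"] if s <= 1 else ["walk"],
    cost=lambda s, a: 1.0 if a == "walk" else 1.5,
    succ=lambda s, a: s + (1 if a == "walk" else 2),
    is_end=lambda s: s == 3,
)
print(future_cost(0))   # 2.5
```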
+ + +**24. Types of states ― The table below presents the terminology when it comes to states in the context of uniform cost search:** + +⟶ Types d'états - La table ci-dessous présente la terminologie relative aux états dans le contexte de la recherche à coût uniforme : + +
+ + +**25. [State, Explanation]** + +⟶ [État, Explication] + +
+ + +**26. [Explored, Frontier, Unexplored]** + +⟶ [Exploré, Frontière, Inexploré] + +
+ + +**27. [States for which the optimal path has already been found, States seen for which we are still figuring out how to get there with the cheapest cost, States not seen yet]** + +⟶ [États pour lesquels le chemin optimal a déjà été trouvé, États rencontrés mais pour lesquels on se demande toujours comment s'y rendre avec un coût minimal, États non rencontrés jusqu'à présent] + +
+ + +**28. Uniform cost search ― Uniform cost search (UCS) is a search algorithm that aims at finding the shortest path from a state sstart to an end state send. It explores states s in increasing order of PastCost(s) and relies on the fact that all action costs are non-negative.** + +⟶ Recherche à coût uniforme - La recherche à coût uniforme (uniform cost search ou UCS en anglais) est un algorithme de recherche qui a pour but de trouver le chemin le plus court entre les états sstart et send. Celui-ci explore les états s en les triant par coût croissant de PastCost(s) et repose sur le fait que toutes les actions ont un coût non négatif. + +
+ + +**29. Remark 1: the UCS algorithm is logically equivalent to Dijkstra's algorithm.** + +⟶ Remarque 1 : UCS fonctionne de la même manière que l'algorithme de Dijkstra. + +
+ + +**30. Remark 2: the algorithm would not work for a problem with negative action costs, and adding a positive constant to make them non-negative would not solve the problem since this would end up being a different problem.** + +⟶ Remarque 2 : cet algorithme ne marche pas sur une configuration contenant des actions à coût négatif. Quelqu'un pourrait penser à ajouter une constante positive à tous les coûts, mais cela ne résoudrait rien puisque le problème résultant serait différent. + +
+ + +**31. Correctness theorem ― When a state s is popped from the frontier F and moved to explored set E, its priority is equal to PastCost(s) which is the minimum cost path from sstart to s.** + +⟶ Théorème de correction - Lorsqu'un état s passe de la frontière F à l'ensemble exploré E, sa priorité est égale à PastCost(s), représentant le chemin de coût minimal allant de sstart à s. + +
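A minimal UCS sketch with a priority queue keyed by PastCost(s); the toy walk/ride problem is an illustrative assumption.

```python
import heapq

def uniform_cost_search(start, successors, is_end):
    """Pop states in increasing order of PastCost(s); requires non-negative action costs."""
    frontier = [(0.0, start)]                     # priority queue keyed by PastCost
    explored = set()
    while frontier:
        past_cost, s = heapq.heappop(frontier)
        if s in explored:
            continue
        explored.add(s)                           # by the correctness theorem, PastCost(s) is now optimal
        if is_end(s):
            return past_cost
        for action_cost, s_next in successors(s):
            if s_next not in explored:
                heapq.heappush(frontier, (past_cost + action_cost, s_next))
    return float("inf")

# Toy problem: from state s, walk to s+1 (cost 1) or ride to 2*s (cost 2); start at 1, end at 4
succ = lambda s: [(1.0, s + 1)] + ([(2.0, 2 * s)] if 0 < 2 * s <= 4 else [])
print(uniform_cost_search(1, succ, lambda s: s == 4))   # 3.0 (walk 1->2, then ride 2->4)
```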
+ + +**32. Graph search algorithms summary ― By noting N the number of total states, n of which are explored before the end state send, we have:** + +⟶ Récapitulatif des algorithmes de parcours de graphe - En notant N le nombre total d'états dont n sont explorés avant l'état final send, on a : + +
+ + +**33. [Algorithm, Acyclicity, Costs, Time/space]** + +⟶ [Algorithme, Acyclicité, Coûts, Temps/Espace] + +
+ + +**34. [Dynamic programming, Uniform cost search]** + +⟶ [Programmation dynamique, Recherche à coût uniforme] + +
+ + +**35. Remark: the complexity countdown supposes the number of possible actions per state to be constant.** + +⟶ Remarque : ce décompte de la complexité suppose que le nombre d'actions possibles à partir de chaque état est constant. + +
+ + +**36. Learning costs** + +⟶ Apprendre les coûts + +
+ + +**37. Suppose we are not given the values of Cost(s,a), we want to estimate these quantities from a training set of minimizing-cost-path sequence of actions (a1,a2,...,ak).** + +⟶ Supposons que les valeurs de Cost(s,a) ne nous soient pas données. Nous souhaitons estimer ces quantités à partir d'un ensemble d'apprentissage de séquences d'actions (a1,a2,...,ak) formant des chemins de coût minimal. + +<br>
+ + +**38. [Structured perceptron ― The structured perceptron is an algorithm aiming at iteratively learning the cost of each state-action pair. At each step, it:, decreases the estimated cost of each state-action of the true minimizing path y given by the training data, increases the estimated cost of each state-action of the current predicted path y' inferred from the learned weights.]** + +⟶ [Perceptron structuré - L'algorithme du perceptron structuré vise à apprendre de manière itérative les coûts des paires état-action. À chaque étape, il :, fait décroître le coût estimé de chaque état-action du vrai chemin minimisant y donné par la base d'apprentissage, fait croître le coût estimé de chaque état-action du chemin y' prédit comme étant minimisant par les paramètres appris par l'algorithme.] + +
+ + +**39. Remark: there are several versions of the algorithm, one of which simplifies the problem to only learning the cost of each action a, and the other parametrizes Cost(s,a) to a feature vector of learnable weights.** + +⟶ Remarque : plusieurs versions de cet algorithme existent, l'une d'elles réduisant ce problème à l'apprentissage du coût de chaque action a et l'autre paramétrant chaque Cost(s,a) par un vecteur de paramètres pouvant être appris. + +<br>
+ + +**40. A* search** + +⟶ Algorithme A* + +
+ + +**41. Heuristic function ― A heuristic is a function h over states s, where each h(s) aims at estimating FutureCost(s), the cost of the path from s to send.** + +⟶ Fonction heuristique - Une heuristique est une fonction h opérant sur les états s, où chaque h(s) vise à estimer FutureCost(s), le coût du chemin optimal allant de s à send. + +
+ + +**42. Algorithm ― A∗ is a search algorithm that aims at finding the shortest path from a state s to an end state send. It explores states s in increasing order of PastCost(s)+h(s). It is equivalent to a uniform cost search with edge costs Cost′(s,a) given by:** + +⟶ Algorithme - A* est un algorithme de recherche visant à trouver le chemin le plus court entre un état s et un état final send. Il le fait en explorant les états s triés par ordre croissant de PastCost(s)+h(s). Cela revient à utiliser l'algorithme UCS où chaque arête est associée au coût Cost′(s,a) donné par : + +
+ + +**43. Remark: this algorithm can be seen as a biased version of UCS exploring states estimated to be closer to the end state.** + +⟶ Remarque : cet algorithme peut être vu comme une version biaisée de UCS explorant les états estimés comme étant plus proches de l'état final. + +
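A minimal A* sketch ordering the frontier by PastCost(s)+h(s), which matches running UCS with the reweighted costs Cost′(s,a); the toy grid problem and its Manhattan-distance heuristic are illustrative assumptions.

```python
import heapq

def a_star(start, successors, is_end, h):
    """Uniform cost search ordered by PastCost(s) + h(s); h is assumed consistent."""
    frontier = [(h(start), 0.0, start)]
    explored = set()
    while frontier:
        _, past_cost, s = heapq.heappop(frontier)
        if s in explored:
            continue
        explored.add(s)
        if is_end(s):
            return past_cost
        for action_cost, s_next in successors(s):
            if s_next not in explored:
                new_cost = past_cost + action_cost
                heapq.heappush(frontier, (new_cost + h(s_next), new_cost, s_next))
    return float("inf")

# Toy grid: move right or up (cost 1 each) from (0, 0) to (3, 3); Manhattan distance is consistent here
succ = lambda s: [(1.0, (s[0] + 1, s[1])), (1.0, (s[0], s[1] + 1))]
h = lambda s: abs(3 - s[0]) + abs(3 - s[1])
print(a_star((0, 0), succ, lambda s: s == (3, 3), h))   # 6.0
```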
+ + +**44. [Consistency ― A heuristic h is said to be consistent if it satisfies the two following properties:, For all states s and actions a, The end state verifies the following:]** + +⟶ [Consistance - Une heuristique h est dite consistante si elle satisfait les deux propriétés suivantes :, Pour tous états s et actions a, L'état final vérifie la propriété :] + +
+ + +**45. Correctness ― If h is consistent, then A∗ returns the minimum cost path.** + +⟶ Correction - Si h est consistante, alors A* renvoie le chemin de coût minimal. + +
+ + +**46. Admissibility ― A heuristic h is said to be admissible if we have:** + +⟶ Admissibilité - Une heuristique est dite admissible si l'on a : + +
+ + +**47. Theorem ― Let h(s) be a given heuristic. We have:** + +⟶ Théorème - Soit h(s) une heuristique. On a : + +
+ + +**48. [consistent, admissible]** + +⟶ [consistante, admissible] + +
+ + +**49. Efficiency ― A* explores all states s satisfying the following equation:** + +⟶ Efficacité - A* explore les états s satisfaisant l'équation : + +
+ + +**50. Remark: larger values of h(s) is better as this equation shows it will restrict the set of states s going to be explored.** + +⟶ Remarque : avoir h(s) élevé est préférable puisque cette équation montre que le nombre d'états s à explorer est alors réduit. + +
+ + +**51. Relaxation** + +⟶ Relaxation + +
+ + +**52. It is a framework for producing consistent heuristics. The idea is to find closed-form reduced costs by removing constraints and use them as heuristics.** + +⟶ C'est un type de procédure permettant de produire des heuristiques consistantes. L'idée est de trouver une fonction de coût facile à exprimer en enlevant des contraintes au problème, et ensuite l'utiliser en tant qu'heuristique. + +
+ + +**53. Relaxed search problem ― The relaxation of search problem P with costs Cost is noted Prel with costs Costrel, and satisfies the identity:** + +⟶ Relaxation d'un problème de recherche - La relaxation d'un problème de recherche P aux coûts Cost est noté Prel avec coûts Costrel, et vérifie la relation : + +
+ + +**54. Relaxed heuristic ― Given a relaxed search problem Prel, we define the relaxed heuristic h(s)=FutureCostrel(s) as the minimum cost path from s to an end state in the graph of costs Costrel(s,a).** + +⟶ Relaxation d'une heuristique - Étant donné la relaxation d'un problème de recherche Prel, on définit l'heuristique relaxée h(s)=FutureCostrel(s) comme étant le chemin de coût minimal allant de s à un état final dans le graphe de fonction de coût Costrel(s,a). + +
+ + +**55. Consistency of relaxed heuristics ― Let Prel be a given relaxed problem. By theorem, we have:** + +⟶ Consistance de la relaxation d'heuristiques - Soit Prel une relaxation d'un problème de recherche. Par théorème, on a : + +
+ + +**56. consistent** + +⟶ consistante + +
+ + +**57. [Tradeoff when choosing heuristic ― We have to balance two aspects in choosing a heuristic:, Computational efficiency: h(s)=FutureCostrel(s) must be easy to compute. It has to produce a closed form, easier search and independent subproblems., Good enough approximation: the heuristic h(s) should be close to FutureCost(s) and we have thus to not remove too many constraints.]** + +⟶ [Compromis lors du choix d'heuristique - Le choix d'heuristique repose sur un compromis entre :, Complexité de calcul : h(s)=FutureCostrel(s) doit être facile à calculer. De manière préférable, cette fonction peut s'exprimer de manière explicite et elle permet de diviser le problème en sous-parties indépendantes., Qualité de l'approximation : l'heuristique h(s) doit rester proche de FutureCost(s), il ne faut donc pas enlever trop de contraintes.] + +<br>
+ + +**58. Max heuristic ― Let h1(s), h2(s) be two heuristics. We have the following property:** + +⟶ Heuristique max - Soient h1(s) et h2(s) deux heuristiques. On a la propriété suivante : + +
+ + +**59. Markov decision processes** + +⟶ Processus de décision markovien + +
+ + +**60. In this section, we assume that performing action a from state s can lead to several states s′1,s′2,... in a probabilistic manner. In order to find our way between an initial state and an end state, our objective will be to find the maximum value policy by using Markov decision processes that help us cope with randomness and uncertainty.** + +⟶ Dans cette section, on suppose qu'effectuer l'action a à partir de l'état s peut mener de manière probabiliste à plusieurs états s′1,s′2,... Dans le but de trouver ce qu'il faudrait faire entre un état initial et un état final, on souhaite trouver une stratégie maximisant la quantité des récompenses en utilisant un outil adapté à l'imprévisibilité et l'incertitude : les processus de décision markoviens. + +
+ + +**61. Notations** + +⟶ Notations + +
+ + +**62. [Definition ― The objective of a Markov decision process is to maximize rewards. It is defined with:, a starting state sstart, possible actions Actions(s) from state s, transition probabilities T(s,a,s′) from s to s′ with action a, rewards Reward(s,a,s′) from s to s′ with action a, whether an end state was reached IsEnd(s), a discount factor 0⩽γ⩽1]** + +⟶ [Définition - L'objectif d'un processus de décision markovien (en anglais Markov decision process ou MDP) est de maximiser la quantité de récompenses. Un tel problème est défini par :, un état de départ sstart, l'ensemble des actions Actions(s) pouvant être effectuées à partir de l'état s, la probabilité de transition T(s,a,s′) de l'état s vers l'état s' après avoir pris l'action a, la récompense Reward(s,a,s′) pour être passé de l'état s à l'état s' après avoir pris l'action a, la connaissance d'avoir atteint ou non un état final IsEnd(s), un facteur de dévaluation 0⩽γ⩽1] + +
+ + +**63. Transition probabilities ― The transition probability T(s,a,s′) specifies the probability of going to state s′ after action a is taken in state s. Each s′↦T(s,a,s′) is a probability distribution, which means that:** + +⟶ Probabilités de transitions - La probabilité de transition T(s,a,s′) représente la probabilité de transitionner vers l'état s' après avoir effectué l'action a en étant dans l'état s. Chaque s′↦T(s,a,s′) est une loi de probabilité : + +
+ + +**64. states** + +⟶ états + +
+ + +**65. Policy ― A policy π is a function that maps each state s to an action a, i.e.** + +⟶ Politique - Une politique π est une fonction liant chaque état s à une action a, i.e. : + +
+ + +**66. Utility ― The utility of a path (s0,...,sk) is the discounted sum of the rewards on that path. In other words,** + +⟶ Utilité - L'utilité d'un chemin (s0,...,sk) est la somme des récompenses dévaluées récoltées sur ce chemin. En d'autres termes, + +
+ + +**67. The figure above is an illustration of the case k=4.** + +⟶ La figure ci-dessus illustre le cas k=4. + +
+ + +**68. Q-value ― The Q-value of a policy π at state s with action a, also noted Qπ(s,a), is the expected utility from state s after taking action a and then following policy π. It is defined as follows:** + +⟶ Q-value - La fonction de valeur des états-actions (Q-value en anglais) d'une politique π évaluée à l'état s avec l'action a, aussi notée Qπ(s,a), est l'espérance de l'utilité partant de l'état s avec l'action a et adoptant ensuite la politique π. Cette fonction est définie par : + +
+ + +**69. Value of a policy ― The value of a policy π from state s, also noted Vπ(s), is the expected utility by following policy π from state s over random paths. It is defined as follows:** + +⟶ Fonction de valeur des états d'une politique - La fonction de valeur des états d'une politique π évaluée à l'état s, aussi notée Vπ(s), est l'espérance de l'utilité partant de l'état s et adoptant ensuite la politique π. Cette fonction est définie par : + +
+ + +**70. Remark: Vπ(s) is equal to 0 if s is an end state.** + +⟶ Remarque : Vπ(s) vaut 0 si s est un état final. + +
+ + +**71. Applications** + +⟶ Applications + +
+ + +**72. [Policy evaluation ― Given a policy π, policy evaluation is an iterative algorithm that aims at estimating Vπ. It is done as follows:, Initialization: for all states s, we have:, Iteration: for t from 1 to TPE, we have, with]** + +⟶ [Évaluation d'une politique - Étant donnée une politique π, on peut utiliser l'algorithme itératif d'évaluation de politiques (en anglais policy evaluation) pour estimer Vπ :, Initialisation : pour tous les états s, on a, Itération : pour t allant de 1 à TPE, on a, avec] + +
+ + +**73. Remark: by noting S the number of states, A the number of actions per state, S′ the number of successors and T the number of iterations, then the time complexity is of O(TPESS′).** + +⟶ Remarque : en notant S le nombre d'états, A le nombre d'actions par états, S' le nombre de successeurs et T le nombre d'itérations, la complexité en temps est alors de O(TPESS′). + +
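A minimal policy evaluation sketch implementing the iteration above; the two-state toy MDP is an illustrative assumption.

```python
def policy_evaluation(states, policy, T, reward, is_end, gamma=1.0, t_pe=200):
    """Iteratively estimate V_pi: V(s) <- sum over s' of T(s, pi(s), s') [Reward(s, pi(s), s') + gamma V(s')]."""
    V = {s: 0.0 for s in states}                                  # initialization
    for _ in range(t_pe):                                         # iteration: t = 1, ..., T_PE
        V = {
            s: 0.0 if is_end(s) else sum(
                p * (reward(s, policy[s], s2) + gamma * V[s2])
                for s2, p in T(s, policy[s]).items()
            )
            for s in states
        }
    return V

# Toy MDP: in state 'in', action 'stay' gives reward 4 and stays with probability 2/3, otherwise ends with reward 10
states = ["in", "end"]
T = lambda s, a: {"in": 2 / 3, "end": 1 / 3}
R = lambda s, a, s2: 4.0 if s2 == "in" else 10.0
print(policy_evaluation(states, {"in": "stay"}, T, R, lambda s: s == "end"))   # V('in') converges to 18
```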
+ + +**74. Optimal Q-value ― The optimal Q-value Qopt(s,a) of state s with action a is defined to be the maximum Q-value attained by any policy starting. It is computed as follows:** + +⟶ Q-value optimale - La Q-value optimale Qopt(s,a) d'un état s avec l'action a est définie comme étant la Q-value maximale atteinte avec n'importe quelle politique. Elle est calculée avec la formule : + +
+ + +**75. Optimal value ― The optimal value Vopt(s) of state s is defined as being the maximum value attained by any policy. It is computed as follows:** + +⟶ Valeur optimale - La valeur optimale Vopt(s) d'un état s est définie comme étant la valeur maximum atteinte par n'importe quelle politique. Elle est calculée avec la formule : + +
+ + +**76. actions** + +⟶ actions + +
+ + +**77. Optimal policy ― The optimal policy πopt is defined as being the policy that leads to the optimal values. It is defined by:** + +⟶ Politique optimale - La politique optimale πopt est définie comme étant la politique liée aux valeurs optimales. Elle est définie par : + +
+ + +**78. [Value iteration ― Value iteration is an algorithm that finds the optimal value Vopt as well as the optimal policy πopt. It is done as follows:, Initialization: for all states s, we have:, Iteration: for t from 1 to TVI, we have:, with]** + +⟶ [Itération sur la valeur - L'algorithme d'itération sur la valeur (en anglais value iteration) vise à trouver la valeur optimale Vopt ainsi que la politique optimale πopt en deux temps :, Initialisation : pour tout état s, on a, Itération : pour t allant de 1 à TVI, on a, avec] + +
+ + +**79. Remark: if we have either γ<1 or the MDP graph being acyclic, then the value iteration algorithm is guaranteed to converge to the correct answer.** + +⟶ Remarque : si γ<1 ou si le graphe associé au processus de décision markovien est acyclique, alors l'algorithme d'itération sur la valeur est garanti de converger vers la bonne solution. + +
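A minimal value iteration sketch implementing the updates above and reading off the optimal policy at the end; the two-action toy MDP is an illustrative assumption.

```python
def value_iteration(states, actions, T, reward, is_end, gamma=0.9, t_vi=200):
    """Compute Vopt(s) = max over a of Qopt(s, a) by repeated Bellman updates, then extract the optimal policy."""
    V = {s: 0.0 for s in states}                                   # initialization
    Q = lambda s, a: sum(p * (reward(s, a, s2) + gamma * V[s2]) for s2, p in T(s, a).items())
    for _ in range(t_vi):                                          # iteration: t = 1, ..., T_VI
        V = {s: 0.0 if is_end(s) else max(Q(s, a) for a in actions(s)) for s in states}
    policy = {s: max(actions(s), key=lambda a: Q(s, a)) for s in states if not is_end(s)}
    return V, policy

# Toy MDP: 'stay' gives reward 5 and ends with probability 1/3, 'quit' gives reward 10 and always ends
states, acts = ["in", "end"], lambda s: ["stay", "quit"]
T = lambda s, a: {"in": 2 / 3, "end": 1 / 3} if a == "stay" else {"end": 1.0}
R = lambda s, a, s2: 5.0 if a == "stay" else 10.0
V, pi = value_iteration(states, acts, T, R, lambda s: s == "end", gamma=0.9)
print(V["in"], pi["in"])   # approximately 12.5 and 'stay'
```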
+ + +**80. When unknown transitions and rewards** + +⟶ Cas des transitions et récompenses inconnues + +
+ + +**81. Now, let's assume that the transition probabilities and the rewards are unknown.** + +⟶ On suppose maintenant que les probabilités de transition et les récompenses sont inconnues. + +
+ + +**82. Model-based Monte Carlo ― The model-based Monte Carlo method aims at estimating T(s,a,s′) and Reward(s,a,s′) using Monte Carlo simulation with:** + +⟶ Monte-Carlo basé sur modèle - La méthode de Monte-Carlo basée sur modèle (en anglais model-based Monte Carlo) vise à estimer T(s,a,s′) et Reward(s,a,s′) en utilisant des simulations de Monte-Carlo avec : + +
+ + +**83. [# times (s,a,s′) occurs, and]** + +⟶ [# de fois où (s,a,s') se produit, et] + +<br>
+ + +**84. These estimations will be then used to deduce Q-values, including Qπ and Qopt.** + +⟶ Ces estimations sont ensuite utilisées pour trouver les Q-values, ainsi que Qπ et Qopt. + +
+ + +**85. Remark: model-based Monte Carlo is said to be off-policy, because the estimation does not depend on the exact policy.** + +⟶ Remarque : la méthode de Monte-Carlo basée sur modèle est dite "hors politique" (en anglais "off-policy") car l'estimation produite ne dépend pas de la politique utilisée. + +
+ + +**86. Model-free Monte Carlo ― The model-free Monte Carlo method aims at directly estimating Qπ, as follows:** + +⟶ Monte-Carlo sans modèle - La méthode de Monte-Carlo sans modèle (en anglais model-free Monte Carlo) vise à directement estimer Qπ de la manière suivante : + +
+ + +**87. Qπ(s,a)=average of ut where st−1=s,at=a** + +⟶ Qπ(s,a)=moyenne de ut où st−1=s,at=a + +
+ + +**88. where ut denotes the utility starting at step t of a given episode.** + +⟶ où ut désigne l'utilité à partir de l'étape t d'un épisode donné. + +
+ + +**89. Remark: model-free Monte Carlo is said to be on-policy, because the estimated value is dependent on the policy π used to generate the data.** + +⟶ Remarque : la méthode de Monte-Carlo sans modèle est dite "sur politique" (en anglais "on-policy") car l'estimation produite dépend de la politique π utilisée pour générer les données. + +
+ + +**90. Equivalent formulation - By introducing the constant η=11+(#updates to (s,a)) and for each (s,a,u) of the training set, the update rule of model-free Monte Carlo has a convex combination formulation:** + +⟶ Formulation équivalente - En introduisant la constante η=1/(1+(#mises à jour de (s,a))) et pour chaque triplet (s,a,u) de la base d'apprentissage, la formule de récurrence de la méthode de Monte-Carlo sans modèle s'écrit à l'aide de la combinaison convexe : + +<br>
+ + +**91. as well as a stochastic gradient formulation:** + +⟶ ainsi qu'une formulation de type gradient stochastique : + +<br>
+ + +**92. SARSA ― State-action-reward-state-action (SARSA) is a boostrapping method estimating Qπ by using both raw data and estimates as part of the update rule. For each (s,a,r,s′,a′), we have:** + +⟶ SARSA - État-action-récompense-état-action (en anglais state-action-reward-state-action ou SARSA) est une méthode de bootstrap qui estime Qπ en utilisant à la fois des données réelles et estimées dans sa formule de mise à jour. Pour chaque (s,a,r,s′,a′), on a : + +
+ + +**93. Remark: the SARSA estimate is updated on the fly as opposed to the model-free Monte Carlo one where the estimate can only be updated at the end of the episode.** + +⟶ Remarque : l'estimation donnée par SARSA est mise à jour à la volée contrairement à celle donnée par la méthode de Monte-Carlo sans modèle où la mise à jour est uniquement effectuée à la fin de l'épisode. + +
+ + +**94. Q-learning ― Q-learning is an off-policy algorithm that produces an estimate for Qopt. On each (s,a,r,s′,a′), we have:** + +⟶ Q-learning - Le Q-apprentissage (en anglais Q-learning) est un algorithme hors politique (en anglais off-policy) donnant une estimation de Qopt. Pour chaque (s,a,r,s′,a′), on a : + +
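+
+Similarly, a hedged sketch of the Q-learning update, where the bootstrapped target takes the maximum over the next actions; the `actions` helper is a hypothetical stand-in for Actions(s):
+
+```python
+def q_learning_update(Q, actions, s, a, r, s_next, eta=0.1, gamma=0.95):
+    """One Q-learning update: the target bootstraps on the best estimated next action."""
+    best_next = max((Q.get((s_next, b), 0.0) for b in actions(s_next)), default=0.0)
+    Q[(s, a)] = (1 - eta) * Q.get((s, a), 0.0) + eta * (r + gamma * best_next)
+
+Q = {}
+actions = lambda s: ["a0", "a1"]          # hypothetical Actions(s)
+q_learning_update(Q, actions, "s0", "a1", r=1.0, s_next="s1")
+print(Q)                                  # {('s0', 'a1'): 0.1}
+```
+
+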
+ + +**95. Epsilon-greedy ― The epsilon-greedy policy is an algorithm that balances exploration with probability ϵ and exploitation with probability 1−ϵ. For a given state s, the policy πact is computed as follows:** + +⟶ Epsilon-glouton - La politique epsilon-gloutonne (en anglais epsilon-greedy) est un algorithme essayant de trouver un compromis entre l'exploration avec probabilité ϵ et l'exploitation avec probabilité 1-ϵ. Pour un état s, la politique πact est calculée par : + +
+ + +**96. [with probability, random from Actions(s)]** + +⟶ [avec probabilité, aléatoire venant d'Actions(s)] + +
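+
+A small illustrative sketch of the epsilon-greedy policy; the Q table and the action set are made up for the example:
+
+```python
+import random
+
+def epsilon_greedy(Q, actions, s, epsilon=0.1):
+    """Explore a random action with probability epsilon, otherwise exploit the Q estimates."""
+    if random.random() < epsilon:
+        return random.choice(actions(s))                       # exploration
+    return max(actions(s), key=lambda a: Q.get((s, a), 0.0))   # exploitation
+
+Q = {("s0", "a1"): 2.0}
+actions = lambda s: ["a0", "a1"]          # hypothetical Actions(s)
+print(epsilon_greedy(Q, actions, "s0"))   # usually "a1"
+```
+
+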
+ + +**97. Game playing** + +⟶ Jeux + +
+ + +**98. In games (e.g. chess, backgammon, Go), other agents are present and need to be taken into account when constructing our policy.** + +⟶ Dans les jeux (e.g. échecs, backgammon, Go), d'autres agents sont présents et doivent être pris en compte au moment d'élaborer une politique. + +
+ + +**99. Game tree ― A game tree is a tree that describes the possibilities of a game. In particular, each node is a decision point for a player and each root-to-leaf path is a possible outcome of the game.** + +⟶ Arbre de jeu - Un arbre de jeu est un arbre détaillant toutes les issues possibles d'un jeu. En particulier, chaque noeud représente un point de décision pour un joueur et chaque chemin liant la racine à une des feuilles traduit une possible instance du jeu. + +
+ + +**100. [Two-player zero-sum game ― It is a game where each state is fully observed and such that players take turns. It is defined with:, a starting state sstart, possible actions Actions(s) from state s, successors Succ(s,a) from states s with actions a, whether an end state was reached IsEnd(s), the agent's utility Utility(s) at end state s, the player Player(s) who controls state s]** + +⟶ [Jeu à somme nulle à deux joueurs - C'est un type de jeu où chaque état est entièrement observé et où les joueurs jouent de manière successive. On le définit par :, un état de départ sstart, de possibles actions Actions(s) partant de l'état s, du successeur Succ(s,a) de l'état s après avoir effectué l'action a, la connaissance d'avoir atteint ou non un état final IsEnd(s), l'utilité de l'agent Utility(s) à l'état final s, le joueur Player(s) qui contrôle l'état s] + +
+ + +**101. Remark: we will assume that the utility of the agent has the opposite sign of the one of the opponent.** + +⟶ Remarque : nous supposerons que l'utilité de l'agent est de signe opposé à celle de son adversaire. + +
+ + +**102. [Types of policies ― There are two types of policies:, Deterministic policies, noted πp(s), which are actions that player p takes in state s., Stochastic policies, noted πp(s,a)∈[0,1], which are probabilities that player p takes action a in state s.]** + +⟶ [Types de politiques - Il y a deux types de politiques :, Les politiques déterministes, notées πp(s), qui représentent pour tout s l'action que le joueur p prend dans l'état s., Les politiques stochastiques, notées πp(s,a)∈[0,1], qui sont décrites pour tout s et a par la probabilité que le joueur p prenne l'action a dans l'état s.] + +
+ + +**103. Expectimax ― For a given state s, the expectimax value Vexptmax(s) is the maximum expected utility of any agent policy when playing with respect to a fixed and known opponent policy πopp. It is computed as follows:** + +⟶ Expectimax - Pour un état donné s, la valeur d'expectimax Vexptmax(s) est l'utilité espérée maximum sur l'ensemble des politiques de l'agent lorsque celui-ci joue contre un adversaire de politique fixe et connue πopp. Cette valeur est calculée de la manière suivante : + +
+ + +**104. Remark: expectimax is the analog of value iteration for MDPs.** + +⟶ Remarque : expectimax est l'analogue de l'algorithme d'itération sur la valeur pour les MDPs. + +
+ + +**105. Minimax ― The goal of minimax policies is to find an optimal policy against an adversary by assuming the worst case, i.e. that the opponent is doing everything to minimize the agent's utility. It is done as follows:** + +⟶ Minimax - Le but des politiques minimax est de trouver une politique optimale contre un adversaire dont on suppose qu'il effectue les pires actions possibles, i.e. celles qui minimisent l'utilité de l'agent. La valeur correspondante est calculée par : + +
+ + +**106. Remark: we can extract πmax and πmin from the minimax value Vminimax.** + +⟶ Remarque : on peut déduire πmax et πmin à partir de la valeur minimax Vminimax. + +
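+
+To make the recursion concrete, here is a rough minimax sketch on a toy game (not part of the original cheatsheet); the `is_end`, `utility`, `actions` and `succ` callables are illustrative stand-ins for IsEnd(s), Utility(s), Actions(s) and Succ(s,a):
+
+```python
+def minimax(state, is_end, utility, actions, succ, player):
+    """Minimax value of `state`: the agent (player=+1) maximizes, the opponent minimizes."""
+    if is_end(state):
+        return utility(state)
+    values = [minimax(succ(state, a), is_end, utility, actions, succ, -player)
+              for a in actions(state)]
+    return max(values) if player == +1 else min(values)
+
+# Toy game: states are (remaining turns, running score); the utility is the final score.
+is_end = lambda s: s[0] == 0
+utility = lambda s: s[1]
+actions = lambda s: [-1, +1]
+succ = lambda s, a: (s[0] - 1, s[1] + a)
+print(minimax((2, 0), is_end, utility, actions, succ, player=+1))   # 0
+```
+
+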
+ + +**107. Minimax properties ― By noting V the value function, there are 3 properties around minimax to have in mind:** + +⟶ Propriétés de minimax - En notant V la fonction de valeur, il y a 3 propriétés sur minimax qu'il faut avoir à l'esprit : + +
+ + +**108. Property 1: if the agent were to change its policy to any πagent, then the agent would be no better off.** + +⟶ Propriété 1 : si l'agent changeait sa politique en un quelconque πagent, alors il ne s'en sortirait pas mieux. + +
+ + +**109. Property 2: if the opponent changes its policy from πmin to πopp, then he will be no better off.** + +⟶ Propriété 2 : si son adversaire change sa politique de πmin à πopp, alors il ne s'en sortira pas mieux. + +
+ + +**110. Property 3: if the opponent is known to be not playing the adversarial policy, then the minimax policy might not be optimal for the agent.** + +⟶ Propriété 3 : si l'on sait que son adversaire ne joue pas les pires actions possibles, alors la politique minimax peut ne pas être optimale pour l'agent. + +
+ + +**111. In the end, we have the following relationship:** + +⟶ À la fin, on a la relation suivante : + +
+ + +**112. Speeding up minimax** + +⟶ Accélération de minimax + +
+ + +**113. Evaluation function ― An evaluation function is a domain-specific and approximate estimate of the value Vminimax(s). It is noted Eval(s).** + +⟶ Fonction d'évaluation - Une fonction d'évaluation estime de manière approximative la valeur Vminimax(s) selon les paramètres du problème. Elle est notée Eval(s). + +
+ + +**114. Remark: FutureCost(s) is an analogy for search problems.** + +⟶ Remarque : l'analogue de cette fonction utilisé dans les problèmes de recherche est FutureCost(s). + +
+ + +**115. Alpha-beta pruning ― Alpha-beta pruning is a domain-general exact method optimizing the minimax algorithm by avoiding the unnecessary exploration of parts of the game tree. To do so, each player keeps track of the best value they can hope for (stored in α for the maximizing player and in β for the minimizing player). At a given step, the condition β<α means that the optimal path is not going to be in the current branch as the earlier player had a better option at their disposal.** + +⟶ Élagage alpha-bêta - L'élagage alpha-bêta (en anglais alpha-beta pruning) est une méthode exacte d'optimisation employée sur l'algorithme de minimax et a pour but d'éviter l'exploration de parties inutiles de l'arbre de jeu. Pour ce faire, chaque joueur garde en mémoire la meilleure valeur qu'il puisse espérer (appelée α chez le joueur maximisant et β chez le joueur minimisant). À une étape donnée, la condition β<α signifie que le chemin optimal ne peut pas passer par la branche actuelle puisque le joueur qui précédait avait une meilleure option à sa disposition. + +
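+
+A sketch of the same recursion with alpha-beta pruning added, reusing the toy game from the minimax example above (illustrative only); a branch is abandoned as soon as β⩽α, since the earlier player already had a better option:
+
+```python
+def alphabeta(state, is_end, utility, actions, succ, maximizing, alpha=float("-inf"), beta=float("inf")):
+    """Minimax value with alpha-beta pruning: stop exploring a branch once beta <= alpha."""
+    if is_end(state):
+        return utility(state)
+    if maximizing:
+        value = float("-inf")
+        for a in actions(state):
+            value = max(value, alphabeta(succ(state, a), is_end, utility, actions, succ, False, alpha, beta))
+            alpha = max(alpha, value)
+            if beta <= alpha:
+                break   # the minimizing player already had a better option higher up
+        return value
+    value = float("inf")
+    for a in actions(state):
+        value = min(value, alphabeta(succ(state, a), is_end, utility, actions, succ, True, alpha, beta))
+        beta = min(beta, value)
+        if beta <= alpha:
+            break       # the maximizing player already had a better option higher up
+    return value
+
+# Same toy game as in the minimax sketch above.
+is_end = lambda s: s[0] == 0
+utility = lambda s: s[1]
+actions = lambda s: [-1, +1]
+succ = lambda s, a: (s[0] - 1, s[1] + a)
+print(alphabeta((2, 0), is_end, utility, actions, succ, maximizing=True))   # 0
+```
+
+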
+ + +**116. TD learning ― Temporal difference (TD) learning is used when we don't know the transitions/rewards. The value is based on exploration policy. To be able to use it, we need to know rules of the game Succ(s,a). For each (s,a,r,s′), the update is done as follows:** + +⟶ TD learning - L'apprentissage par différence temporelle (en anglais temporal difference learning ou TD learning) est une méthode utilisée lorsque l'on ne connaît pas les transitions/récompenses. La valeur est alors basée sur la politique d'exploration. Pour pouvoir l'utiliser, on a besoin de connaître les règles du jeu Succ(s,a). Pour chaque (s,a,r,s′), la mise à jour des coefficients est faite de la manière suivante : + +
+ + +**117. Simultaneous games** + +⟶ Jeux simultanés + +
+ + +**118. This is the contrary of turn-based games, where there is no ordering on the player's moves.** + +⟶ Contrairement aux jeux joués tour à tour, il n'y a ici pas d'ordre prédéterminé sur les mouvements des joueurs. + +
+ + +**119. Single-move simultaneous game ― Let there be two players A and B, with given possible actions. We note V(a,b) to be A's utility if A chooses action a, B chooses action b. V is called the payoff matrix.** + +⟶ Jeu simultané à un mouvement - Soient deux joueurs A et B, munis de possibles actions. On note V(a,b) l'utilité de A si A choisit l'action a et B l'action b. V est appelée la matrice de profit (en anglais payoff matrix). + +
+ + +**120. [Strategies ― There are two main types of strategies:, A pure strategy is a single action:, A mixed strategy is a probability distribution over actions:]** + +⟶ [Stratégies - Il y a principalement deux types de stratégies :, Une stratégie pure est une seule action, Une stratégie mixte est une loi de probabilité sur les actions :] + +
+ + +**121. Game evaluation ― The value of the game V(πA,πB) when player A follows πA and player B follows πB is such that:** + +⟶ Évaluation de jeu - La valeur d'un jeu V(πA,πB) quand le joueur A suit πA et le joueur B suit πB est telle que : + +
+ + +**122. Minimax theorem ― By noting πA,πB ranging over mixed strategies, for every simultaneous two-player zero-sum game with a finite number of actions, we have:** + +⟶ Théorème Minimax - Soient πA et πB des stratégies mixtes. Pour chaque jeu à somme nulle à deux joueurs ayant un nombre fini d'actions, on a : + +
+ + +**123. Non-zero-sum games** + +⟶ Jeux à somme non nulle + +
+ + +**124. Payoff matrix ― We define Vp(πA,πB) to be the utility for player p.** + +⟶ Matrice de profit - On définit Vp(πA,πB) l'utilité du joueur p. + +
+ + +**125. Nash equilibrium ― A Nash equilibrium is (π∗A,π∗B) such that no player has an incentive to change its strategy. We have:** + +⟶ Équilibre de Nash - Un équilibre de Nash est défini par (π∗A,π∗B) tel qu'aucun joueur n'a d'intérêt de changer sa stratégie. On a : + +
+ + +**126. and** + +⟶ et + +
+ + +**127. Remark: in any finite-player game with finite number of actions, there exists at least one Nash equilibrium.** + +⟶ Remarque : dans un jeu à nombre de joueurs et d'actions finis, il existe au moins un équilibre de Nash. + +
+ + +**128. [Tree search, Backtracking search, Breadth-first search, Depth-first search, Iterative deepening]** + +⟶ [Parcours d'arbre, Retour sur trace, Parcours en largeur, Parcours en profondeur, Approfondissement itératif] + +
+ + +**129. [Graph search, Dynamic programming, Uniform cost search]** + +⟶ [Parcours de graphe, Programmation dynamique, Recherche à coût uniforme] + +
+ + +**130. [Learning costs, Structured perceptron]** + +⟶ [Apprendre les coûts, Perceptron structuré] + +
+ + +**131. [A star search, Heuristic function, Algorithm, Consistency, correctness, Admissibility, efficiency]** + +⟶ [A étoile, Fonction heuristique, Algorithme, Consistance, Correction, Admissibilité, Efficacité] + +
+ + +**132. [Relaxation, Relaxed search problem, Relaxed heuristic, Max heuristic]** + +⟶ [Relaxation, Relaxation d'un problème de recherche, Relaxation d'une heuristique, Heuristique max] + +
+ + +**133. [Markov decision processes, Overview, Policy evaluation, Value iteration, Transitions, rewards]** + +⟶ [Processus de décision markovien, Aperçu, Évaluation d'une politique, Itération sur la valeur, Transitions, Récompenses] + +
+ + +**134. [Game playing, Expectimax, Minimax, Speeding up minimax, Simultaneous games, Non-zero-sum games]** + +⟶ [Jeux, Expectimax, Minimax, Accélération de minimax, Jeux simultanés, Jeux à somme non nulle] + +
+ + +**135. View PDF version on GitHub** + +⟶ Voir la version PDF sur GitHub. + +
+ + +**136. Original authors** + +⟶ Auteurs d'origine. + +
+ + +**137. Translated by X, Y and Z** + +⟶ Traduit de l'anglais par X, Y et Z. + +
+ + +**138. Reviewed by X, Y and Z** + +⟶ Revu par X, Y et Z. + +
+ + +**139. By X and Y** + +⟶ De X et Y. + +
+ + +**140. The Artificial Intelligence cheatsheets are now available in [target language].** + +⟶ Les pense-bêtes d'intelligence artificielle sont maintenant disponibles en français ! diff --git a/fr/cs-221-variables-models.md b/fr/cs-221-variables-models.md new file mode 100644 index 000000000..9c802583b --- /dev/null +++ b/fr/cs-221-variables-models.md @@ -0,0 +1,617 @@ +**Variables-based models translation** [[webpage]](https://stanford.edu/~shervine/teaching/cs-221/cheatsheet-variables-models) + +
+ +**1. Variables-based models with CSP and Bayesian networks** + +⟶ Modèles basés sur les variables : CSP et réseaux bayésiens + +
+ + +**2. Constraint satisfaction problems** + +⟶ Problèmes de satisfaction de contraintes + +
+ + +**3. In this section, our objective is to find maximum weight assignments of variable-based models. One advantage compared to states-based models is that these algorithms are more convenient to encode problem-specific constraints.** + +⟶ Dans cette section, notre but est de trouver des affectations de poids maximum dans des modèles basés sur les variables. Un avantage comparé aux modèles basés sur les états est que ces algorithmes sont plus commodes lorsqu'il s'agit de transcrire des contraintes spécifiques à certains problèmes. + +
+ + +**4. Factor graphs** + +⟶ Graphes de facteurs + +
+ + +**5. Definition ― A factor graph, also referred to as a Markov random field, is a set of variables X=(X1,...,Xn) where Xi∈Domaini and m factors f1,...,fm with each fj(X)⩾0.** + +⟶ Définition - Un graphe de facteurs, aussi appelé champ aléatoire de Markov, est un ensemble de variables X=(X1,...,Xn) où Xi∈Domaini muni de m facteurs f1,...,fm où chaque fj(X)⩾0. + +
+ + +**6. Domain** + +⟶ Domaine + +
+ + +**7. Scope and arity ― The scope of a factor fj is the set of variables it depends on. The size of this set is called the arity.** + +⟶ Portée et arité - La portée (en anglais scope) d'un facteur fj est l'ensemble des variables dont il dépend. La taille de cet ensemble est appelée arité. + +
+ + +**8. Remark: factors of arity 1 and 2 are called unary and binary respectively.** + +⟶ Remarque : les facteurs d'arité 1 et 2 sont respectivement appelés unaire et binaire. + +
+ + +**9. Assignment weight ― Each assignment x=(x1,...,xn) yields a weight Weight(x) defined as being the product of all factors fj applied to that assignment. Its expression is given by:** + +⟶ Affectation de poids - Chaque affectation x=(x1,...,xn) donne un poids Weight(x) défini comme étant le produit de tous les facteurs fj appliqués à cette affectation. Son expression est donnée par : + +
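+
+As a quick illustration (not in the original cheatsheet), Weight(x) can be computed as a plain product over factor functions; the two-variable factor graph below is a made-up example:
+
+```python
+import math
+
+def weight(assignment, factors):
+    """Weight(x) is the product of all factors evaluated on the assignment x."""
+    return math.prod(f(assignment) for f in factors)
+
+# Made-up factor graph over two binary variables X1 and X2.
+factors = [
+    lambda x: 1.0 if x["X1"] != x["X2"] else 0.0,   # binary constraint X1 != X2
+    lambda x: 2.0 if x["X1"] == 1 else 1.0,         # unary preference for X1 = 1
+]
+print(weight({"X1": 1, "X2": 0}, factors))          # 2.0
+print(weight({"X1": 1, "X2": 1}, factors))          # 0.0 (constraint violated)
+```
+
+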
+ + +**10. Constraint satisfaction problem ― A constraint satisfaction problem (CSP) is a factor graph where all factors are binary; we call them to be constraints:** + +⟶ Problème de satisfaction de contraintes - Un problème de satisfaction de contraintes (en anglais constraint satisfaction problem ou CSP) est un graphe de facteurs où tous les facteurs sont binaires ; on les appelle "contraintes". + +
+ + +**11. Here, the constraint j with assignment x is said to be satisfied if and only if fj(x)=1.** + +⟶ Ici, on dit que l'affectation x satisfait la contrainte j si et seulement si fj(x)=1. + +
+ + +**12. Consistent assignment ― An assignment x of a CSP is said to be consistent if and only if Weight(x)=1, i.e. all constraints are satisfied.** + +⟶ Affectation consistante - Une affectation x d'un CSP est dite consistante si et seulement si Weight(x)=1, i.e. toutes les contraintes sont satisfaites. + +
+ + +**13. Dynamic ordering** + +⟶ Mise en ordre dynamique + +
+ + +**14. Dependent factors ― The set of dependent factors of variable Xi with partial assignment x is called D(x,Xi), and denotes the set of factors that link Xi to already assigned variables.** + +⟶ Facteurs dépendants - L'ensemble des facteurs dépendants de la variable Xi dont l'affectation partielle est x est appelé D(x,Xi) et désigne l'ensemble des facteurs liant Xi à des variables déjà affectées. + +
+ + +**15. Backtracking search ― Backtracking search is an algorithm used to find maximum weight assignments of a factor graph. At each step, it chooses an unassigned variable and explores its values by recursion. Dynamic ordering (i.e. choice of variables and values) and lookahead (i.e. early elimination of inconsistent options) can be used to explore the graph more efficiently, although the worst-case runtime stays exponential: O(|Domain|n).** + +⟶ Recherche avec retour sur trace - L'algorithme de recherche avec retour sur trace (en anglais backtracking search) est utilisé pour trouver l'affectation de poids maximum d'un graphe de facteurs. À chaque étape, une variable non assignée est choisie et ses valeurs sont explorées par récursivité. On peut utiliser un processus de mise en ordre dynamique sur le choix des variables et valeurs et/ou d'anticipation (i.e. élimination précoce d'options non consistantes) pour explorer le graphe de manière plus efficace. La complexité temporelle dans le pire des cas reste néanmoins exponentielle : O(|Domaine|n). + +
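+
+A minimal backtracking sketch that returns the maximum-weight assignment by exhaustive recursion; it uses a static variable ordering and no lookahead, both of which the dynamic ordering and forward checking discussed here would improve (illustrative example only):
+
+```python
+def backtrack(variables, domains, factors, assignment=None):
+    """Exhaustive backtracking: return (best_weight, best_assignment) over complete assignments."""
+    assignment = assignment if assignment is not None else {}
+    if len(assignment) == len(variables):
+        w = 1.0
+        for f in factors:
+            w *= f(assignment)
+        return w, dict(assignment)
+    var = next(v for v in variables if v not in assignment)   # static ordering, for simplicity
+    best = (0.0, None)
+    for value in domains[var]:
+        assignment[var] = value
+        best = max(best, backtrack(variables, domains, factors, assignment), key=lambda t: t[0])
+        del assignment[var]
+    return best
+
+variables = ["X1", "X2"]
+domains = {"X1": [0, 1], "X2": [0, 1]}
+factors = [lambda x: 1.0 if x["X1"] != x["X2"] else 0.0]       # one binary constraint
+print(backtrack(variables, domains, factors))                  # (1.0, {'X1': 0, 'X2': 1})
+```
+
+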
+ + +**16. [Forward checking ― It is a one-step lookahead heuristic that preemptively removes inconsistent values from the domains of neighboring variables. It has the following characteristics:, After assigning a variable Xi, it eliminates inconsistent values from the domains of all its neighbors., If any of these domains becomes empty, we stop the local backtracking search., If we un-assign a variable Xi, we have to restore the domain of its neighbors.]** + +⟶ [Vérification en avant - La vérification en avant (forward checking en anglais) est une heuristique d'anticipation à une étape qui enlève de manière préemptive les valeurs non consistantes des domaines des variables voisines. Cette méthode a les caractéristiques suivantes :, Après l'affectation d'une variable Xi, les valeurs non consistantes sont éliminées du domaine de tous ses voisins., Si l'un de ces domaines devient vide, la recherche locale s'arrête., Si l'on enlève l'affectation d'une variable Xi, on doit restaurer le domaine de ses voisins.] + +
+ + +**17. Most constrained variable ― It is a variable-level ordering heuristic that selects the next unassigned variable that has the fewest consistent values. This has the effect of making inconsistent assignments to fail earlier in the search, which enables more efficient pruning.** + +⟶ Variable la plus contrainte - L'heuristique de la variable la plus contrainte (en anglais most constrained variable ou MCV) sélectionne la prochaine variable sans affectation ayant le moins de valeurs consistantes. Cette procédure a pour effet de faire échouer les affectations impossibles plus tôt dans la recherche, permettant un élagage plus efficace. + +
+ + +**18. Least constrained value ― It is a value-level ordering heuristic that assigns the next value that yields the highest number of consistent values of neighboring variables. Intuitively, this procedure chooses first the values that are most likely to work.** + +⟶ Valeur la moins contraignante - L'heuristique de la valeur la moins contraignante (en anglais least constrained value ou LCV) sélectionne pour une variable donnée la prochaine valeur maximisant le nombre de valeurs consistantes chez les variables voisines. De manière intuitive, on peut dire que cette procédure choisit en premier les valeurs qui sont le plus susceptible de marcher. + +
+ + +**19. Remark: in practice, this heuristic is useful when all factors are constraints.** + +⟶ Remarque : en pratique, cette heuristique est utile quand tous les facteurs sont des contraintes. + +
+ + +**20. The example above is an illustration of the 3-color problem with backtracking search coupled with most constrained variable exploration and least constrained value heuristic, as well as forward checking at each step.** + +⟶ L'exemple ci-dessus est une illustration du problème de coloration de graphe à 3 couleurs en utilisant l'algorithme de recherche avec retour sur trace couplé avec les heuristiques de MCV, de LCV ainsi que de vérification en avant à chaque étape. + +
+ + +**21. [Arc consistency ― We say that arc consistency of variable Xl with respect to Xk is enforced when for each xl∈Domainl:, unary factors of Xl are non-zero, there exists at least one xk∈Domaink such that any factor between Xl and Xk is non-zero.]** + +⟶ [Arc-consistance - On dit que l'arc-consistance de la variable Xl par rapport à Xk est vérifiée lorsque pour tout xl∈Domainl :, les facteurs unaires de Xl sont non-nuls, il existe au moins un xk∈Domaink tel que n'importe quel facteur entre Xl et Xk est non nul.] + +
+ + +**22. AC-3 ― The AC-3 algorithm is a multi-step lookahead heuristic that applies forward checking to all relevant variables. After a given assignment, it performs forward checking and then successively enforces arc consistency with respect to the neighbors of variables for which the domain change during the process.** + +⟶ AC-3 - L'algorithme d'AC-3 est une heuristique qui applique le principe de vérification en avant à toutes les variables susceptibles d'être concernées. Après l'affectation d'une variable, cet algorithme effectue une vérification en avant et applique successivement l'arc-consistance avec tous les voisins de variables pour lesquels le domaine change. + +
+ + +**23. Remark: AC-3 can be implemented both iteratively and recursively.** + +⟶ Remarque : AC-3 peut être codé de manière itérative ou récursive. + +
+ + +**24. Approximate methods** + +⟶ Méthodes approximatives + +
+ + +**25. Beam search ― Beam search is an approximate algorithm that extends partial assignments of n variables of branching factor b=|Domain| by exploring the K top paths at each step. The beam size K∈{1,...,bn} controls the tradeoff between efficiency and accuracy. This algorithm has a time complexity of O(n⋅Kblog(Kb)).** + +⟶ Recherche en faisceau - L'algorithme de recherche en faisceau (en anglais beam search) est une technique approximative qui étend les affectations partielles de n variables de facteur de branchement b=|Domain| en explorant les K meilleurs chemins qui s'offrent à chaque étape. La largeur du faisceau K∈{1,...,bn} détermine le compromis entre efficacité et précision de l'algorithme. Sa complexité en temps est de O(n⋅Kblog(Kb)). + +
+ + +**26. The example below illustrates a possible beam search of parameters K=2, b=3 and n=5.** + +⟶ L'exemple ci-dessous illustre une recherche en faisceau de paramètres K=2, b=3 et n=5. + +
+ + +**27. Remark: K=1 corresponds to greedy search whereas K→+∞ is equivalent to BFS tree search.** + +⟶ Remarque : K=1 correspond à la recherche gloutonne alors que K→+∞ est équivalent à effectuer un parcours en largeur. + +
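+
+A rough beam search sketch that keeps the K highest-scoring partial assignments at each step; the scoring function and the domains are illustrative assumptions, not from the original cheatsheet:
+
+```python
+import heapq
+
+def beam_search(domains_in_order, score, K):
+    """Keep only the K highest-scoring partial assignments after extending each variable in turn."""
+    beam = [()]                                   # start from the empty partial assignment
+    for domain in domains_in_order:
+        candidates = [path + (v,) for path in beam for v in domain]
+        beam = heapq.nlargest(K, candidates, key=score)
+    return beam
+
+# Illustrative scoring: reward paths whose consecutive values differ.
+score = lambda path: sum(1 for a, b in zip(path, path[1:]) if a != b)
+print(beam_search([[0, 1, 2]] * 4, score, K=2))
+```
+
+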
+ + +**28. Iterated conditional modes ― Iterated conditional modes (ICM) is an iterative approximate algorithm that modifies the assignment of a factor graph one variable at a time until convergence. At step i, we assign to Xi the value v that maximizes the product of all factors connected to that variable.** + +⟶ Modes conditionnels itérés - L'algorithme des modes conditionnels itérés (en anglais iterated conditional modes ou ICM) est une technique itérative et approximative qui modifie l'affectation d'un graphe de facteurs une variable à la fois jusqu'à convergence. À l'étape i, Xi prend la valeur v qui maximise le produit de tous les facteurs connectés à cette variable. + +
+ + +**29. Remark: ICM may get stuck in local minima.** + +⟶ Remarque : il est possible qu'ICM reste bloqué dans un minimum local. + +
+ + +**30. [Gibbs sampling ― Gibbs sampling is an iterative approximate method that modifies the assignment of a factor graph one variable at a time until convergence. At step i:, we assign to each element u∈Domaini a weight w(u) that is the product of all factors connected to that variable, we sample v from the probability distribution induced by w and assign it to Xi.]** + +⟶ [Échantillonnage de Gibbs - La méthode d'échantillonnage de Gibbs (en anglais Gibbs sampling) est une technique itérative et approximative qui modifie les affectations d'un graphe de facteurs une variable à la fois jusqu'à convergence. À l'étape i :, on assigne à chaque élément u∈Domaini un poids w(u) qui est le produit de tous les facteurs connectés à cette variable, on échantillonne v de la loi de probabilité engendrée par w et on l'associe à Xi.] + +
+ + +**31. Remark: Gibbs sampling can be seen as the probabilistic counterpart of ICM. It has the advantage to be able to escape local minima in most cases.** + +⟶ Remarque : la méthode d'échantillonnage de Gibbs peut être vue comme étant la version probabiliste de ICM. Cette méthode a l'avantage de pouvoir échapper aux potentiels minimum locaux dans la plupart des situations. + +
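+
+An illustrative Gibbs sampling sweep over a tiny made-up factor graph; for simplicity the sketch multiplies all factors rather than only those connected to the resampled variable, which leaves the relative weights unchanged:
+
+```python
+import random
+
+def gibbs_step(assignment, variables, domains, factors):
+    """One sweep of Gibbs sampling: resample each variable from the weights its factors induce."""
+    for var in variables:
+        weights = []
+        for value in domains[var]:
+            assignment[var] = value
+            w = 1.0
+            for f in factors:      # tiny graph: taking all factors keeps the same ratios
+                w *= f(assignment)
+            weights.append(w)
+        if sum(weights) == 0:      # degenerate case: fall back to a uniform draw
+            weights = [1.0] * len(weights)
+        assignment[var] = random.choices(domains[var], weights=weights)[0]
+    return assignment
+
+random.seed(0)
+domains = {"X1": [0, 1], "X2": [0, 1]}
+factors = [lambda x: 2.0 if x["X1"] == x["X2"] else 1.0]
+state = {"X1": 0, "X2": 0}
+for _ in range(5):
+    state = gibbs_step(state, ["X1", "X2"], domains, factors)
+print(state)
+```
+
+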
+ + +**32. Factor graph transformations** + +⟶ Transformations sur les graphes de facteurs + +
+ + +**33. Independence ― Let A,B be a partitioning of the variables X. We say that A and B are independent if there are no edges between A and B and we write:** + +⟶ Indépendance - Soit A, B une partition des variables X. On dit que A et B sont indépendants s'il n'y a pas d'arête connectant A et B et on écrit : + +
+ + +**34. Remark: independence is the key property that allows us to solve subproblems in parallel.** + +⟶ Remarque : l'indépendance est une propriété importante car elle nous permet de décomposer la situation en sous-problèmes que l'on peut résoudre en parallèle. + +
+ + +**35. Conditional independence ― We say that A and B are conditionally independent given C if conditioning on C produces a graph in which A and B are independent. In this case, it is written:** + +⟶ Indépendance conditionnelle - On dit que A et B sont conditionnellement indépendants par rapport à C si le fait de conditionner sur C produit un graphe dans lequel A et B sont indépendants. Dans ce cas, on écrit : + +
+ + +**36. [Conditioning ― Conditioning is a transformation aiming at making variables independent that breaks up a factor graph into smaller pieces that can be solved in parallel and can use backtracking. In order to condition on a variable Xi=v, we do as follows:, Consider all factors f1,...,fk that depend on Xi, Remove Xi and f1,...,fk, Add gj(x) for j∈{1,...,k} defined as:]** + +⟶ [Conditionnement - Le conditionnement est une transformation visant à rendre des variables indépendantes et ainsi diviser un graphe de facteurs en pièces plus petites qui peuvent être traitées en parallèle et utiliser le retour sur trace. Pour conditionner par rapport à une variable Xi=v, on :, considère tous les facteurs f1,...,fk qui dépendent de Xi, enlève Xi et f1,...,fk, ajoute gj(x) pour j∈{1,...,k} défini par :] + +
+ + +**37. Markov blanket ― Let A⊆X be a subset of variables. We define MarkovBlanket(A) to be the neighbors of A that are not in A.** + +⟶ Couverture de Markov - Soit A⊆X une partie des variables. On définit MarkovBlanket(A) comme étant les voisins de A qui ne sont pas dans A. + +
+ + +**38. Proposition ― Let C=MarkovBlanket(A) and B=X∖(A∪C). Then we have:** + +⟶ Proposition - Soit C=MarkovBlanket(A) et B=X∖(A∪C). On a alors : + +
+ + +**39. [Elimination ― Elimination is a factor graph transformation that removes Xi from the graph and solves a small subproblem conditioned on its Markov blanket as follows:, Consider all factors fi,1,...,fi,k that depend on Xi, Remove Xi +and fi,1,...,fi,k, Add fnew,i(x) defined as:]** + +⟶ [Élimination - L'élimination est une transformation consistant à enlever Xi d'un graphe de facteurs pour ensuite résoudre un sous-problème conditionné sur sa couverture de Markov où l'on :, considère tous les facteurs fi,1,...,fi,k qui dépendent de Xi, enlève Xi et fi,1,...,fi,k, ajoute fnew,i(x) défini par :] + +
+ + +**40. Treewidth ― The treewidth of a factor graph is the maximum arity of any factor created by variable elimination with the best variable ordering. In other words,** + +⟶ Largeur arborescente - La largeur arborescente (en anglais treewidth) d'un graphe de facteurs est l'arité maximum de n'importe quel facteur créé par élimination avec le meilleur ordre de variable. En d'autres termes, + +
+ + +**41. The example below illustrates the case of a factor graph of treewidth 3.** + +⟶ L'exemple ci-dessous illustre le cas d'un graphe de facteurs ayant une largeur arborescente égale à 3. + +
+ + +**42. Remark: finding the best variable ordering is a NP-hard problem.** + +⟶ Remarque : trouver le meilleur ordre de variable est un problème NP-difficile. + +
+ + +**43. Bayesian networks** + +⟶ Réseaux bayésiens + +
+ + +**44. In this section, our goal will be to compute conditional probabilities. What is the probability of a query given evidence?** + +⟶ Dans cette section, notre but est de calculer des probabilités conditionnelles. Quelle est la probabilité d'un événement étant donné des observations ? + +
+ + +**45. Introduction** + +⟶ Introduction + +
+ + +**46. Explaining away ― Suppose causes C1 and C2 influence an effect E. Conditioning on the effect E and on one of the causes (say C1) changes the probability of the other cause (say C2). In this case, we say that C1 has explained away C2.** + +⟶ Explication - Supposons que les causes C1 et C2 influencent un effet E. Le conditionnement sur l'effet E et une des causes (disons C1) change la probabilité de l'autre cause (disons C2). Dans ce cas, on dit que C1 a expliqué C2. + +
+ + +**47. Directed acyclic graph ― A directed acyclic graph (DAG) is a finite directed graph with no directed cycles.** + +⟶ Graphe orienté acyclique - Un graphe orienté acyclique (en anglais directed acyclic graph ou DAG) est un graphe orienté fini sans cycle orienté. + +
+ + +**48. Bayesian network ― A Bayesian network is a directed acyclic graph (DAG) that specifies a joint distribution over random variables X=(X1,...,Xn) as a product of local conditional distributions, one for each node:** + +⟶ Réseau bayésien - Un réseau bayésien (en anglais Bayesian network) est un DAG qui définit une loi de probabilité jointe sur les variables aléatoires X=(X1,...,Xn) comme étant le produit des lois de probabilités conditionnelles locales (une pour chaque noeud) : + +
+ + +**49. Remark: Bayesian networks are factor graphs imbued with the language of probability.** + +⟶ Remarque : les réseaux bayésiens sont des graphes de facteurs imprégnés de concepts de probabilité. + +
+ + +**50. Locally normalized ― For each xParents(i), all factors are local conditional distributions. Hence they have to satisfy:** + +⟶ Normalisation locale - Pour chaque xParents(i), tous les facteurs sont localement des lois de probabilité conditionnelles. Elles doivent donc vérifier : + +
+ + +**51. As a result, sub-Bayesian networks and conditional distributions are consistent.** + +⟶ De ce fait, les sous-réseaux bayésiens et les distributions conditionnelles sont consistants. + +
+ + +**52. Remark: local conditional distributions are the true conditional distributions.** + +⟶ Remarque : les lois locales de probabilité conditionnelles sont de vraies lois de probabilité conditionnelles. + +
+ + +**53. Marginalization ― The marginalization of a leaf node yields a Bayesian network without that node.** + +⟶ Marginalisation - La marginalisation d'un noeud sans enfant donne un réseau bayésien sans ce noeud. + +
+ + +**54. Probabilistic programs** + +⟶ Programmes probabilistes + +
+ + +**55. Concept ― A probabilistic program randomizes variables assignment. That way, we can write down complex Bayesian networks that generate assignments without us having to explicitly specify associated probabilities.** + +⟶ Concept - Un programme probabiliste rend aléatoire l'affectation de variables. De ce fait, on peut imaginer des réseaux bayésiens compliqués pour la génération d'affectations sans avoir à écrire de manière explicite les probabilités associées. + +
+ + +**56. Remark: examples of probabilistic programs include Hidden Markov model (HMM), factorial HMM, naive Bayes, latent Dirichlet allocation, diseases and symptoms and stochastic block models.** + +⟶ Remarque : quelques exemples de programmes probabilistes incluent parmi d'autres le modèle de Markov caché (en anglais hidden Markov model ou HMM), le HMM factoriel, le modèle bayésien naïf (en anglais naive Bayes), l'allocation de Dirichlet latente (en anglais latent Dirichlet allocation ou LDA), le modèle maladies-symptômes et le modèle à blocs stochastiques (en anglais stochastic block model). + +
+ + +**57. Summary ― The table below summarizes the common probabilistic programs as well as their applications:** + +⟶ Récapitulatif - La table ci-dessous résume les programmes probabilistes les plus fréquents ainsi que leur champ d'application associé : + +
+ + +**58. [Program, Algorithm, Illustration, Example]** + +⟶ [Programme, Algorithme, Illustration, Exemple] + +
+ + +**59. [Markov Model, Hidden Markov Model (HMM), Factorial HMM, Naive Bayes, Latent Dirichlet Allocation (LDA)]** + +⟶ [Modèle de Markov, Modèle de Markov caché (HMM), HMM factoriel, Bayésien naïf, Allocation de Dirichlet latente (LDA)] + +
+ + +**60. [Generate, distribution]** + +⟶ [Génère, distribution] + +
+ + +**61. [Language modeling, Object tracking, Multiple object tracking, Document classification, Topic modeling]** + +⟶ [Modélisation du langage, Suivi d'objet, Suivi de plusieurs objets, Classification de document, Modélisation de sujet] + +
+ + +**62. Inference** + +⟶ Inférence + +
+ + +**63. [General probabilistic inference strategy ― The strategy to compute the probability P(Q|E=e) of query Q given evidence E=e is as follows:, Step 1: Remove variables that are not ancestors of the query Q or the evidence E by marginalization, Step 2: Convert Bayesian network to factor graph, Step 3: Condition on the evidence E=e, Step 4: Remove nodes disconnected from the query Q by marginalization, Step 5: Run a probabilistic inference algorithm (manual, variable elimination, Gibbs sampling, particle filtering)]** + +⟶ [Stratégie générale pour l'inférence probabiliste - La stratégie que l'on utilise pour calculer la probabilité P(Q|E=e) d'une requête Q étant donnée l'observation E=e est la suivante :, Étape 1 : on enlève les variables qui ne sont pas les ancêtres de la requête Q ou de l'observation E par marginalisation, Étape 2 : on convertit le réseau bayésien en un graphe de facteurs, Étape 3 : on conditionne sur l'observation E=e, Étape 4 : on enlève les noeuds déconnectés de la requête Q par marginalisation, Étape 5 : on lance un algorithme d'inférence probabiliste (manuel, élimination de variables, échantillonnage de Gibbs, filtrage particulaire)] + +
+ + +**64. Forward-backward algorithm ― This algorithm computes the exact value of P(H=hk|E=e) (smoothing query) for any k∈{1,...,L} in the case of an HMM of size L. To do so, we proceed in 3 steps:** + +⟶ Algorithme progressif-rétrogressif - L'algorithme progressif-rétrogressif (en anglais forward-backward) calcule la valeur exacte de P(H=hk|E=e) pour chaque k∈{1,...,L} dans le cas d'un HMM de taille L. Pour ce faire, on procède en 3 étapes : + +
+ + +**65. Step 1: for ..., compute ...** + +⟶ Étape 1 : pour ..., calculer ... + +
+ + +**66. with the convention F0=BL+1=1. From this procedure and these notations, we get that** + +⟶ avec la convention F0=BL+1=1. À partir de cette procédure et avec ces notations, on obtient + +
+ + +**67. Remark: this algorithm interprets each assignment to be a path where each edge hi−1→hi is of weight p(hi|hi−1)p(ei|hi).** + +⟶ Remarque : cet algorithme interprète une affectation comme étant un chemin où chaque arête hi−1→hi a un poids p(hi|hi−1)p(ei|hi). + +
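+
+A small sketch of the forward-backward procedure on a toy HMM; the weather/umbrella model and the 0-indexed message lists are assumptions made for the example, not part of the original cheatsheet:
+
+```python
+def forward_backward(evidence, states, p_init, p_trans, p_emit):
+    """Exact smoothing P(H_k = h | E = e) for a small HMM, via forward and backward messages."""
+    L = len(evidence)
+    F = [{h: p_init[h] * p_emit[h][evidence[0]] for h in states}]
+    for t in range(1, L):                      # forward pass
+        F.append({h: sum(F[t - 1][g] * p_trans[g][h] for g in states) * p_emit[h][evidence[t]]
+                  for h in states})
+    B = [dict.fromkeys(states, 1.0) for _ in range(L)]
+    for t in range(L - 2, -1, -1):             # backward pass
+        B[t] = {h: sum(p_trans[h][g] * p_emit[g][evidence[t + 1]] * B[t + 1][g] for g in states)
+                for h in states}
+    posteriors = []
+    for t in range(L):                         # combine and normalize
+        unnorm = {h: F[t][h] * B[t][h] for h in states}
+        Z = sum(unnorm.values())
+        posteriors.append({h: v / Z for h, v in unnorm.items()})
+    return posteriors
+
+states = ["rain", "sun"]
+p_init = {"rain": 0.5, "sun": 0.5}
+p_trans = {"rain": {"rain": 0.7, "sun": 0.3}, "sun": {"rain": 0.3, "sun": 0.7}}
+p_emit = {"rain": {"umbrella": 0.9, "none": 0.1}, "sun": {"umbrella": 0.2, "none": 0.8}}
+print(forward_backward(["umbrella", "umbrella", "none"], states, p_init, p_trans, p_emit))
+```
+
+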
+ + +**68. [Gibbs sampling ― This algorithm is an iterative approximate method that uses a small set of assignments (particles) to represent a large probability distribution. From a random assignment x, Gibbs sampling performs the following steps for i∈{1,...,n} until convergence:, For all u∈Domaini, compute the weight w(u) of assignment x where Xi=u, Sample v from the probability distribution induced by w: v∼P(Xi=v|X−i=x−i), Set Xi=v]** + +⟶ [Échantillonnage de Gibbs - L'algorithme d'échantillonnage de Gibbs (en anglais Gibbs sampling) est une méthode itérative et approximative qui utilise un petit ensemble d'affectations (particules) pour représenter une loi de probabilité. Pour une affectation aléatoire x, l'échantillonnage de Gibbs effectue les étapes suivantes pour i∈{1,...,n} jusqu'à convergence :, Pour tout u∈Domaini, on calcule le poids w(u) de l'affectation x où Xi=u, On échantillonne v de la loi de probabilité engendrée par w : v∼P(Xi=v|X−i=x−i), On pose Xi=v] + +
+ + +**69. Remark: X−i denotes X∖{Xi} and x−i represents the corresponding assignment.** + +⟶ Remarque : X−i désigne X∖{Xi} et x−i représente l'affectation correspondante. + +
+ + +**70. [Particle filtering ― This algorithm approximates the posterior density of state variables given the evidence of observation variables by keeping track of K particles at a time. Starting from a set of particles C of size K, we run the following 3 steps iteratively:, Step 1: proposal - For each old particle xt−1∈C, sample x from the transition probability distribution p(x|xt−1) and add x to a set C′., Step 2: weighting - Weigh each x of the set C′ by w(x)=p(et|x), where et is the evidence observed at time t., Step 3: resampling - Sample K elements from the set C′ using the probability distribution induced by w and store them in C: these are the current particles xt.]** + +⟶ [Filtrage particulaire - L'algorithme de filtrage particulaire (en anglais particle filtering) approxime la densité postérieure de variables d'états à partir des variables observées en suivant K particules à la fois. En commençant avec un ensemble de particules C de taille K, on répète les 3 étapes suivantes :, Étape 1 : proposition - Pour chaque particule xt−1∈C, on échantillonne x avec loi de probabilité p(x|xt−1) et on ajoute x à un ensemble C′., Étape 2 : pondération - On associe chaque x de l'ensemble C′ au poids w(x)=p(et|x), où et est l'observation vue à l'instant t., Étape 3 : ré-échantillonnage - On échantillonne K éléments de l'ensemble C′ en utilisant la loi de probabilité engendrée par w et on les met dans C : ce sont les particules actuelles xt.] + +
+ + +**71. Remark: a more expensive version of this algorithm also keeps track of past particles in the proposal step.** + +⟶ Remarque : une version plus coûteuse de cet algorithme tient aussi compte des particules passées à l'étape de proposition. + +
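+
+A hedged sketch of one proposal/weighting/resampling cycle of particle filtering on a made-up 1D tracking problem; the transition and likelihood models are illustrative assumptions:
+
+```python
+import math
+import random
+
+def particle_filter_step(particles, evidence_t, propose, weight):
+    """One proposal / weighting / resampling cycle; K = len(particles) particles are kept."""
+    proposals = [propose(x) for x in particles]               # step 1: proposal
+    weights = [weight(x, evidence_t) for x in proposals]      # step 2: weighting
+    return random.choices(proposals, weights=weights, k=len(particles))   # step 3: resampling
+
+random.seed(0)
+propose = lambda x: x + random.gauss(0.0, 1.0)                # illustrative transition p(x|x_prev)
+weight = lambda x, e: math.exp(-0.5 * (e - x) ** 2)           # illustrative likelihood p(e|x)
+particles = [0.0] * 100
+for obs in [1.0, 2.0, 2.5]:
+    particles = particle_filter_step(particles, obs, propose, weight)
+print(sum(particles) / len(particles))                        # rough posterior mean of the state
+```
+
+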
+ + +**72. Maximum likelihood ― If we don't know the local conditional distributions, we can learn them using maximum likelihood.** + +⟶ Maximum de vraisemblance - Si l'on ne connaît pas les lois de probabilité locales, on peut les trouver en utilisant le maximum de vraisemblance. + +
+ + +**73. Laplace smoothing ― For each distribution d and partial assignment (xParents(i),xi), add λ to countd(xParents(i),xi), then normalize to get probability estimates.** + +⟶ Lissage de Laplace - Pour chaque loi de probabilité d et affectation partielle (xParents(i),xi), on ajoute λ à countd(xParents(i),xi) et on normalise ensuite pour obtenir des probabilités. + +
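+
+For instance (illustrative only, with made-up counts), Laplace smoothing with λ=1 can be written as:
+
+```python
+from collections import Counter
+
+def laplace_estimate(counts, domain, lam=1.0):
+    """Add lam to every count, then normalize to obtain probability estimates."""
+    total = sum(counts[v] for v in domain) + lam * len(domain)
+    return {v: (counts[v] + lam) / total for v in domain}
+
+observed = Counter({"sun": 3, "rain": 1})                      # raw counts for one local distribution
+print(laplace_estimate(observed, ["sun", "rain", "snow"]))     # "snow" now gets a non-zero estimate
+```
+
+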
+ + +**74. Algorithm ― The Expectation-Maximization (EM) algorithm gives an efficient method at estimating the parameter θ through maximum likelihood estimation by repeatedly constructing a lower-bound on the likelihood (E-step) and optimizing that lower bound (M-step) as follows:** + +⟶ Espérance-maximisation - L'algorithme d'espérance-maximisation (en anglais expectation-maximization ou EM) est une méthode efficace utilisée pour estimer le paramètre θ via l'estimation du maximum de vraisemblance en construisant de manière répétée une borne inférieure de la vraisemblance (étape E) et en optimisant cette borne inférieure (étape M) : + +
+ + +**75. [E-step: Evaluate the posterior probability q(h) that each data point e came from a particular cluster h as follows:, M-step: Use the posterior probabilities q(h) as cluster specific weights on data points e to determine θ through maximum likelihood.]** + +⟶ [Étape E : on évalue la probabilité postérieure q(h) que chaque point e vienne d'une partition particulière h avec :, Étape M : on utilise la probabilité postérieure q(h) en tant que poids de la partition h sur les points e pour déterminer θ via maximum de vraisemblance] + +
+ + +**76. [Factor graphs, Arity, Assignment weight, Constraint satisfaction problem, Consistent assignment]** + +⟶ [Graphe de facteurs, Arité, Poids, Satisfaction de contraintes, Affectation consistante] + +
+ + +**77. [Dynamic ordering, Dependent factors, Backtracking search, Forward checking, Most constrained variable, Least constrained value]** + +⟶ [Mise en ordre dynamique, Facteurs dépendants, Retour sur trace, Vérification en avant, Variable la plus contrainte, Valeur la moins contraignante] + +
+ + +**78. [Approximate methods, Beam search, Iterated conditional modes, Gibbs sampling]** + +⟶ [Méthodes approximatives, Recherche en faisceau, Modes conditionnels itérés, Échantillonnage de Gibbs] + +
+ + +**79. [Factor graph transformations, Conditioning, Elimination]** + +⟶ [Transformations de graphes de facteurs, Conditionnement, Élimination] + +
+ + +**80. [Bayesian networks, Definition, Locally normalized, Marginalization]** + +⟶ [Réseaux bayésiens, Définition, Normalisé localement, Marginalisation] + +
+ + +**81. [Probabilistic program, Concept, Summary]** + +⟶ [Programme probabiliste, Concept, Récapitulatif] + +
+ + +**82. [Inference, Forward-backward algorithm, Gibbs sampling, Laplace smoothing]** + +⟶ [Inférence, Algorithme progressif-rétrogressif, Échantillonnage de Gibbs, Lissage de Laplace] + +
+ + +**83. View PDF version on GitHub** + +⟶ Voir la version PDF sur GitHub. + +
+ + +**84. Original authors** + +⟶ Auteurs d'origine. + +
+ + +**85. Translated by X, Y and Z** + +⟶ Traduit de l'anglais par X, Y et Z. + +
+ + +**86. Reviewed by X, Y and Z** + +⟶ Revu par X, Y et Z. + +
+ + +**87. By X and Y** + +⟶ De X et Y. + +
+ + +**88. The Artificial Intelligence cheatsheets are now available in [target language].** + +⟶ Les pense-bêtes d'intelligence artificielle sont maintenant disponibles en français ! diff --git a/fr/cheatsheet-deep-learning.md b/fr/cs-229-deep-learning.md similarity index 95% rename from fr/cheatsheet-deep-learning.md rename to fr/cs-229-deep-learning.md index 4045d723c..56073a5e8 100644 --- a/fr/cheatsheet-deep-learning.md +++ b/fr/cs-229-deep-learning.md @@ -120,7 +120,7 @@ **21. Convolutional layer requirement ― By noting W the input volume size, F the size of the convolutional layer neurons, P the amount of zero padding, then the number of neurons N that fit in a given volume is such that:** -⟶ Pré-requis de la couche convolutionelle ― Si l'on note W la taille du volume d'entrée, F la taille de la couche de neurones convolutionelle, P la quantité de zero padding, alors le nombre de neurones N qui tient dans un volume donné est tel que : +⟶ Pré-requis de la couche convolutionnelle ― Si l'on note W la taille du volume d'entrée, F la taille de la couche de neurones convolutionnelle, P la quantité de zero padding, alors le nombre de neurones N qui tient dans un volume donné est tel que :
@@ -132,7 +132,7 @@ **23. It is usually done after a fully connected/convolutional layer and before a non-linearity layer and aims at allowing higher learning rates and reducing the strong dependence on initialization.** -⟶ Cela est normalement effectué après une couche fully-connected/couche convolutionelle et avant une couche de non-linéarité et a pour but de permettre un taux d'apprentissage plus grand et de réduire une dépendance trop forte à l'initialisation. +⟶ Cela est normalement effectué après une couche fully-connected/couche convolutionnelle et avant une couche de non-linéarité et a pour but de permettre un taux d'apprentissage plus grand et de réduire une dépendance trop forte à l'initialisation.
diff --git a/fr/refresher-linear-algebra.md b/fr/cs-229-linear-algebra.md similarity index 92% rename from fr/refresher-linear-algebra.md rename to fr/cs-229-linear-algebra.md index 37329faa3..f1aea7efd 100644 --- a/fr/refresher-linear-algebra.md +++ b/fr/cs-229-linear-algebra.md @@ -42,7 +42,7 @@ **8. Identity matrix ― The identity matrix I∈Rn×n is a square matrix with ones in its diagonal and zero everywhere else:** -⟶ Matrice identitée ― La matrice identitée I∈Rn×n est une matrice carrée avec des 1 sur sa diagonale et des 0 partout ailleurs : +⟶ Matrice identité ― La matrice identité I∈Rn×n est une matrice carrée avec des 1 sur sa diagonale et des 0 partout ailleurs :
@@ -150,7 +150,7 @@ **26. Trace ― The trace of a square matrix A, noted tr(A), is the sum of its diagonal entries:** -⟶ Trace ― La trace d'une matrice carée A, notée tr(A), est définie comme la somme de ses coefficients diagonaux: +⟶ Trace ― La trace d'une matrice carrée A, notée tr(A), est définie comme la somme de ses coefficients diagonaux:
@@ -186,7 +186,7 @@ **32. Symmetric decomposition ― A given matrix A can be expressed in terms of its symmetric and antisymmetric parts as follows:** -⟶ Décomposition symmétrique ― Une matrice donnée A peut être exprimée en termes de ses parties symétrique et antisymétrique de la manière suivante : +⟶ Décomposition symétrique ― Une matrice donnée A peut être exprimée en termes de ses parties symétrique et antisymétrique de la manière suivante :
@@ -252,7 +252,7 @@ **43. Remark: similarly, a matrix A is said to be positive definite, and is noted A≻0, if it is a PSD matrix which satisfies for all non-zero vector x, xTAx>0.** -⟶ Remarque : de manière similaire, une matrice A est dite définie positive et est notée A≻0 si elle est semi-définie positive et que pour tout vector x non-nul, on a xTAx>0. +⟶ Remarque : de manière similaire, une matrice A est dite définie positive et est notée A≻0 si elle est semi-définie positive et que pour tout vecteur x non-nul, on a xTAx>0.
@@ -264,7 +264,7 @@ **45. Spectral theorem ― Let A∈Rn×n. If A is symmetric, then A is diagonalizable by a real orthogonal matrix U∈Rn×n. By noting Λ=diag(λ1,...,λn), we have:** -⟶ Théorème spectral ― Soit A∈Rn×n. Si A est symmétrique, alors A est diagonalisable par une matrice orthogonale réelle U∈Rn×n. En notant Λ=diag(λ1,...,λn), on a : +⟶ Théorème spectral ― Soit A∈Rn×n. Si A est symétrique, alors A est diagonalisable par une matrice orthogonale réelle U∈Rn×n. En notant Λ=diag(λ1,...,λn), on a :
@@ -300,7 +300,7 @@ **51. Hessian ― Let f:Rn→R be a function and x∈Rn be a vector. The hessian of f with respect to x is a n×n symmetric matrix, noted ∇2xf(x), such that:** -⟶ Hessienne ― Soit f:Rn→R une fonction et x∈Rn un vecteur. La hessienne de f par rapport à x est une matrice symmetrique n×n, notée ∇2xf(x), telle que : +⟶ Hessienne ― Soit f:Rn→R une fonction et x∈Rn un vecteur. La hessienne de f par rapport à x est une matrice symétrique n×n, notée ∇2xf(x), telle que :
diff --git a/fr/cheatsheet-machine-learning-tips-and-tricks.md b/fr/cs-229-machine-learning-tips-and-tricks.md similarity index 99% rename from fr/cheatsheet-machine-learning-tips-and-tricks.md rename to fr/cs-229-machine-learning-tips-and-tricks.md index d74182df0..2adf1db50 100644 --- a/fr/cheatsheet-machine-learning-tips-and-tricks.md +++ b/fr/cs-229-machine-learning-tips-and-tricks.md @@ -198,7 +198,7 @@ **34. [Shrinks coefficients to 0, Good for variable selection, Makes coefficients smaller, Tradeoff between variable selection and small coefficients]** -⟶ [Réduit les coefficients à 0, Bon pour la sélection de variables, Rend les coefficients plus petits, Compromis entre la selection de variables et la réduction de coefficients] +⟶ [Réduit les coefficients à 0, Bon pour la sélection de variables, Rend les coefficients plus petits, Compromis entre la sélection de variables et la réduction de coefficients]
diff --git a/fr/refresher-probability.md b/fr/cs-229-probability.md similarity index 98% rename from fr/refresher-probability.md rename to fr/cs-229-probability.md index fe4562f80..8e407b9b2 100644 --- a/fr/refresher-probability.md +++ b/fr/cs-229-probability.md @@ -36,7 +36,7 @@ **7. Axiom 2 ― The probability that at least one of the elementary events in the entire sample space will occur is 1, i.e:** -⟶ Axiome 2 ― La probabilité qu'au moins un des évènements élementaires de tout l'univers se produise est 1, i.e. +⟶ Axiome 2 ― La probabilité qu'au moins un des évènements élémentaires de tout l'univers se produise est 1, i.e.
@@ -120,7 +120,7 @@ **21. Random variable ― A random variable, often noted X, is a function that maps every element in a sample space to a real line.** -⟶ Variable aléatoire ― Une variable aléatoire, souvent notée X, est une fonction qui associe chaque élement de l'univers de probabilité à la droite des réels. +⟶ Variable aléatoire ― Une variable aléatoire, souvent notée X, est une fonction qui associe chaque élément de l'univers de probabilité à la droite des réels.
diff --git a/fr/cheatsheet-supervised-learning.md b/fr/cs-229-supervised-learning.md similarity index 96% rename from fr/cheatsheet-supervised-learning.md rename to fr/cs-229-supervised-learning.md index 2f4850d1f..b79583323 100644 --- a/fr/cheatsheet-supervised-learning.md +++ b/fr/cs-229-supervised-learning.md @@ -42,7 +42,7 @@ **8. [Discriminative model, Generative model, Goal, What's learned, Illustration, Examples]** -⟶ [Modèle discriminatif, Modèle génératif, But, Ce qui est appris, Illustration, Exemples] +⟶ [Modèle discriminant, Modèle génératif, But, Ce qui est appris, Illustration, Exemples]
@@ -66,7 +66,7 @@ **12. Loss function ― A loss function is a function L:(z,y)∈R×Y⟼L(z,y)∈R that takes as inputs the predicted value z corresponding to the real data value y and outputs how different they are. The common loss functions are summed up in the table below:** -⟶ Fonction de loss ― Une fonction de loss est une fonction L:(z,y)∈R×Y⟼L(z,y)∈R prennant comme entrée une valeur prédite z correspondant à une valeur réelle y, et nous renseigne sur la ressemblance de ces deux valeurs. Les fonctions de loss courantes sont récapitulées dans le tableau ci-dessous : +⟶ Fonction de loss ― Une fonction de loss est une fonction L:(z,y)∈R×Y⟼L(z,y)∈R prenant comme entrée une valeur prédite z correspondant à une valeur réelle y, et nous renseigne sur la ressemblance de ces deux valeurs. Les fonctions de loss courantes sont récapitulées dans le tableau ci-dessous :
@@ -138,7 +138,7 @@ **24. Normal equations ― By noting X the matrix design, the value of θ that minimizes the cost function is a closed-form solution such that:** -⟶ Équations normales ― En notant X la matrice de design, la valeur de θ qui minimize la fonction de cost a une solution de forme fermée tel que : +⟶ Équations normales ― En notant X la matrice de design, la valeur de θ qui minimise la fonction de cost a une solution de forme fermée tel que :
@@ -186,7 +186,7 @@ **32. Softmax regression ― A softmax regression, also called a multiclass logistic regression, is used to generalize logistic regression when there are more than 2 outcome classes. By convention, we set θK=0, which makes the Bernoulli parameter ϕi of each class i equal to:** -⟶ Régression softmax ― Une régression softmax, aussi appelée un régression logistique multiclasse, est utilisée pour généraliser la régression logistique lorsqu'il y a plus de 2 classes à prédire. Par convention, on fixe θK=0, ce qui oblige le paramètre de Bernoulli ϕi de chaque classe i à être égal à : +⟶ Régression softmax ― Une régression softmax, aussi appelée un régression logistique multi-classe, est utilisée pour généraliser la régression logistique lorsqu'il y a plus de 2 classes à prédire. Par convention, on fixe θK=0, ce qui oblige le paramètre de Bernoulli ϕi de chaque classe i à être égal à :
@@ -210,7 +210,7 @@ **36. Here are the most common exponential distributions summed up in the following table:** -⟶ Les distributions exponentielles les plus communémment rencontrées sont récapitulées dans le tableau ci-dessous : +⟶ Les distributions exponentielles les plus communément rencontrées sont récapitulées dans le tableau ci-dessous :
@@ -324,7 +324,7 @@ **55. A generative model first tries to learn how the data is generated by estimating P(x|y), which we can then use to estimate P(y|x) by using Bayes' rule.** -⟶ Un modèle génératif essaie d'abord d'apprendre comment les données sont générées en estimant P(x|y), nous permettant ensuite d'estimer P(y|x) par le biais du théorème de Bayes. +⟶ Un modèle génératif essaie d'abord d'apprendre comment les données sont générées en estimant P(x|y), nous permettant ensuite d'estimer P(y|x) par le biais du théorème de Bayes.
diff --git a/fr/cheatsheet-unsupervised-learning.md b/fr/cs-229-unsupervised-learning.md similarity index 95% rename from fr/cheatsheet-unsupervised-learning.md rename to fr/cs-229-unsupervised-learning.md index f64268a4b..7757f9539 100644 --- a/fr/cheatsheet-unsupervised-learning.md +++ b/fr/cs-229-unsupervised-learning.md @@ -12,7 +12,7 @@ **3. Motivation ― The goal of unsupervised learning is to find hidden patterns in unlabeled data {x(1),...,x(m)}.** -⟶ Motivation ― Le but de l'apprentissage non-supervisé est de trouver des formes cachées dans un jeu de données non-labelées {x(1),...,x(m)}. +⟶ Motivation ― Le but de l'apprentissage non-supervisé est de trouver des formes cachées dans un jeu de données non annotées {x(1),...,x(m)}.
@@ -66,7 +66,7 @@ **12. M-step: Use the posterior probabilities Qi(z(i)) as cluster specific weights on data points x(i) to separately re-estimate each cluster model as follows:** -⟶ M-step : Utiliser les probabilités postérieures Qi(z(i)) en tant que coefficients propres aux partitions sur les points x(i) pour ré-estimer séparemment chaque modèle de partition de la manière suivante : +⟶ M-step : Utiliser les probabilités postérieures Qi(z(i)) en tant que coefficients propres aux partitions sur les points x(i) pour ré-estimer séparément chaque modèle de partition de la manière suivante :
@@ -102,7 +102,7 @@ **18. Distortion function ― In order to see if the algorithm converges, we look at the distortion function defined as follows:** -⟶ Fonction de distortion ― Pour voir si l'algorithme converge, on regarde la fonction de distortion définie de la manière suivante : +⟶ Fonction de distorsion ― Pour voir si l'algorithme converge, on regarde la fonction de distorsion définie de la manière suivante :
@@ -192,7 +192,7 @@ **33. Spectral theorem ― Let A∈Rn×n. If A is symmetric, then A is diagonalizable by a real orthogonal matrix U∈Rn×n. By noting Λ=diag(λ1,...,λn), we have:** -⟶ Théorème spectral ― Soit A∈Rn×n. Si A est symmétrique, alors A est diagonalisable par une matrice réelle orthogonale U∈Rn×n. En notant Λ=diag(λ1,...,λn), on a : +⟶ Théorème spectral ― Soit A∈Rn×n. Si A est symétrique, alors A est diagonalisable par une matrice réelle orthogonale U∈Rn×n. En notant Λ=diag(λ1,...,λn), on a :
@@ -222,7 +222,7 @@ **38. Step 2: Compute Σ=1mm∑i=1x(i)x(i)T∈Rn×n, which is symmetric with real eigenvalues.** -⟶ Étape 2 : Calculer Σ=1mm∑i=1x(i)x(i)T∈Rn×n, qui est symmétrique et aux valeurs propres réelles. +⟶ Étape 2 : Calculer Σ=1mm∑i=1x(i)x(i)T∈Rn×n, qui est symétrique et aux valeurs propres réelles.
@@ -264,7 +264,7 @@ **45. Assumptions ― We assume that our data x has been generated by the n-dimensional source vector s=(s1,...,sn), where si are independent random variables, via a mixing and non-singular matrix A as follows:** -⟶ Hypothèses ― On suppose que nos données x ont été générées par un vecteur source à n dimensions s=(s1,...,sn), où les si sont des variables aléatoires indépendantes, par le biais d'une matrice de mélange et inversible A de la manière suivante : +⟶ Hypothèses ― On suppose que nos données x ont été générées par un vecteur source à n dimensions s=(s1,...,sn), où les si sont des variables aléatoires indépendantes, par le biais d'une matrice de mélange et inversible A de la manière suivante :
@@ -294,4 +294,4 @@ **50. Therefore, the stochastic gradient ascent learning rule is such that for each training example x(i), we update W as follows:** -⟶ Par conséquent, l'algorithme du gradient stochastique est tel que pour chaque example de ensemble d'apprentissage x(i), on met à jour W de la manière suivante : +⟶ Par conséquent, l'algorithme du gradient stochastique est tel que pour chaque exemple de ensemble d'apprentissage x(i), on met à jour W de la manière suivante : diff --git a/fr/cs-230-convolutional-neural-networks.md b/fr/cs-230-convolutional-neural-networks.md new file mode 100644 index 000000000..29cca030e --- /dev/null +++ b/fr/cs-230-convolutional-neural-networks.md @@ -0,0 +1,716 @@ +**Convolutional Neural Networks translation** + +
+ +**1. Convolutional Neural Networks cheatsheet** + +⟶ Pense-bête de réseaux de neurones convolutionnels + +
+ + +**2. CS 230 - Deep Learning** + +⟶ CS 230 - Apprentissage profond + +
+ + +**3. [Overview, Architecture structure]** + +⟶ [Vue d'ensemble, Structure de l'architecture] + +
+ + +**4. [Types of layer, Convolution, Pooling, Fully connected]** + +⟶ [Types de couche, Convolution, Pooling, Fully connected] + +
+ + +**5. [Filter hyperparameters, Dimensions, Stride, Padding]** + +⟶ [Paramètres du filtre, Dimensions, Stride, Padding] + +
+ + +**6. [Tuning hyperparameters, Parameter compatibility, Model complexity, Receptive field]** + +⟶ [Réglage des paramètres, Compatibilité des paramètres, Complexité du modèle, Champ récepteur] + +
+ + +**7. [Activation functions, Rectified Linear Unit, Softmax]** + +⟶ [Fonction d'activation, Unité linéaire rectifiée, Softmax] + +
+ + +**8. [Object detection, Types of models, Detection, Intersection over Union, Non-max suppression, YOLO, R-CNN]** + +⟶ [Détection d'objet, Types de modèle, Détection, Intersection sur union, Suppression non-max, YOLO, R-CNN] + +


**9. [Face verification/recognition, One shot learning, Siamese network, Triplet loss]**

⟶ [Vérification/reconnaissance de visage, Apprentissage en un coup, Réseau siamois, Loss triple]

<br>


**10. [Neural style transfer, Activation, Style matrix, Style/content cost function]**

⟶ [Transfert de style neuronal, Activation, Matrice de style, Fonction de coût de style/contenu]

<br>
+ + +**11. [Computational trick architectures, Generative Adversarial Net, ResNet, Inception Network]** + +⟶ [Architectures à astuces calculatoires, Generative Adversarial Net, ResNet, Inception Network] + +
+ + +**12. Overview** + +⟶ Vue d'ensemble + +
+ + +**13. Architecture of a traditional CNN ― Convolutional neural networks, also known as CNNs, are a specific type of neural networks that are generally composed of the following layers:** + +⟶ Architecture d'un CNN traditionnel ― Les réseaux de neurones convolutionnels (en anglais Convolutional neural networks), aussi connus sous le nom de CNNs, sont un type spécifique de réseaux de neurones qui sont généralement composés des couches suivantes : + +


**14. The convolution layer and the pooling layer can be fine-tuned with respect to hyperparameters that are described in the next sections.**

⟶ La couche convolutionnelle et la couche de pooling peuvent être ajustées en utilisant des paramètres qui sont décrits dans les sections suivantes.

<br>
+ + +**15. Types of layer** + +⟶ Types de couche + +
+ + +**16. Convolution layer (CONV) ― The convolution layer (CONV) uses filters that perform convolution operations as it is scanning the input I with respect to its dimensions. Its hyperparameters include the filter size F and stride S. The resulting output O is called feature map or activation map.** + +⟶ Couche convolutionnelle (CONV) ― La couche convolutionnelle (en anglais convolution layer) (CONV) utilise des filtres qui scannent l'entrée I suivant ses dimensions en effectuant des opérations de convolution. Elle peut être réglée en ajustant la taille du filtre F et le stride S. La sortie O de cette opération est appelée *feature map* ou aussi *activation map*. + +
+ + +**17. Remark: the convolution step can be generalized to the 1D and 3D cases as well.** + +⟶ Remarque : l'étape de convolution peut aussi être généralisée dans les cas 1D et 3D. + +
+ + +**18. Pooling (POOL) ― The pooling layer (POOL) is a downsampling operation, typically applied after a convolution layer, which does some spatial invariance. In particular, max and average pooling are special kinds of pooling where the maximum and average value is taken, respectively.** + +⟶ Pooling (POOL) ― La couche de pooling (en anglais pooling layer) (POOL) est une opération de sous-échantillonnage typiquement appliquée après une couche convolutionnelle. En particulier, les types de pooling les plus populaires sont le max et l'average pooling, où les valeurs maximales et moyennes sont prises, respectivement. + +
+ + +**19. [Type, Purpose, Illustration, Comments]** + +⟶ [Type, But, Illustration, Commentaires] + +


**20. [Max pooling, Average pooling, Each pooling operation selects the maximum value of the current view, Each pooling operation averages the values of the current view]**

⟶ [Max pooling, Average pooling, Chaque opération de pooling sélectionne la valeur maximale de la surface, Chaque opération de pooling calcule la moyenne des valeurs de la surface]

<br>


**21. [Preserves detected features, Most commonly used, Downsamples feature map, Used in LeNet]**

⟶ [Garde les caractéristiques détectées, Le plus communément utilisé, Sous-échantillonne la feature map, Utilisé dans LeNet]

<br>
+ + +**22. Fully Connected (FC) ― The fully connected layer (FC) operates on a flattened input where each input is connected to all neurons. If present, FC layers are usually found towards the end of CNN architectures and can be used to optimize objectives such as class scores.** + +⟶ Fully Connected (FC) ― La couche de fully connected (en anglais fully connected layer) (FC) s'applique sur une entrée préalablement aplatie où chaque entrée est connectée à tous les neurones. Les couches de fully connected sont typiquement présentes à la fin des architectures de CNN et peuvent être utilisées pour optimiser des objectifs tels que les scores de classe. + +
+ + +**23. Filter hyperparameters** + +⟶ Paramètres du filtre + +
+ + +**24. The convolution layer contains filters for which it is important to know the meaning behind its hyperparameters.** + +⟶ La couche convolutionnelle contient des filtres pour lesquels il est important de savoir comment ajuster ses paramètres. + +
+ + +**25. Dimensions of a filter ― A filter of size F×F applied to an input containing C channels is a F×F×C volume that performs convolutions on an input of size I×I×C and produces an output feature map (also called activation map) of size O×O×1.** + +⟶ Dimensions d'un filtre ― Un filtre de taille F×F appliqué à une entrée contenant C canaux est un volume de taille F×F×C qui effectue des convolutions sur une entrée de taille I×I×C et qui produit un feature map de sortie (aussi appelé activation map) de taille O×O×1. + +
+ + +**26. Filter** + +⟶ Filtre + +
+ + +**27. Remark: the application of K filters of size F×F results in an output feature map of size O×O×K.** + +⟶ Remarque : appliquer K filtres de taille F×F engendre un feature map de sortie de taille O×O×K. + +
+ + +**28. Stride ― For a convolutional or a pooling operation, the stride S denotes the number of pixels by which the window moves after each operation.** + +⟶ Stride ― Dans le contexte d'une opération de convolution ou de pooling, la stride S est un paramètre qui dénote le nombre de pixels par lesquels la fenêtre se déplace après chaque opération. + +


**29. Zero-padding ― Zero-padding denotes the process of adding P zeroes to each side of the boundaries of the input. This value can either be manually specified or automatically set through one of the three modes detailed below:**

⟶ Zero-padding ― Le zero-padding est une technique consistant à ajouter P zéros de chaque côté des frontières de l'entrée. Cette valeur peut être spécifiée soit manuellement, soit automatiquement par le biais d'une des configurations détaillées ci-dessous :

<br>
+ + +**30. [Mode, Value, Illustration, Purpose, Valid, Same, Full]** + +⟶ [Configuration, Valeur, Illustration, But, Valide, Pareil, Total] + +
+ + +**31. [No padding, Drops last convolution if dimensions do not match, Padding such that feature map size has size ⌈IS⌉, Output size is mathematically convenient, Also called 'half' padding, Maximum padding such that end convolutions are applied on the limits of the input, Filter 'sees' the input end-to-end]** + +⟶ [Pas de padding, Enlève la dernière opération de convolution si les dimensions ne collent pas, Le padding tel que la feature map est de taille ⌈IS⌉, La taille de sortie est mathématiquement satisfaisante, Aussi appelé 'demi' padding, Padding maximum tel que les dernières convolutions sont appliquées sur les bords de l'entrée, Le filtre 'voit' l'entrée du début à la fin] + +
+ + +**32. Tuning hyperparameters** + +⟶ Ajuster les paramètres + +
+ + +**33. Parameter compatibility in convolution layer ― By noting I the length of the input volume size, F the length of the filter, P the amount of zero padding, S the stride, then the output size O of the feature map along that dimension is given by:** + +⟶ Compatibilité des paramètres dans la couche convolutionnelle ― En notant I le côté du volume d'entrée, F la taille du filtre, P la quantité de zero-padding, S la stride, la taille O de la feature map de sortie suivant cette dimension est telle que : + +
+ + +**34. [Input, Filter, Output]** + +⟶ [Entrée, Filtre, Sortie] + +


**35. Remark: often times, Pstart=Pend≜P, in which case we can replace Pstart+Pend by 2P in the formula above.**

⟶ Remarque : on a souvent Pstart=Pend≜P, auquel cas on remplace Pstart+Pend par 2P dans la formule ci-dessus.

<br>
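
To make the relation above concrete, here is a minimal Python sketch (the helper name `conv_output_size` and the example values are illustrative, not part of the cheatsheet) that computes O along one dimension:

```python
def conv_output_size(I, F, S, P_start, P_end):
    """Output size O of a convolution along one dimension.

    I: input size, F: filter size, S: stride,
    P_start / P_end: zero-padding added on each side.
    """
    return (I - F + P_start + P_end) // S + 1

# Example: a 227x227 input, 11x11 filter, stride 4, no padding -> 55
print(conv_output_size(I=227, F=11, S=4, P_start=0, P_end=0))  # 55
```

<br>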
+ + +**36. Understanding the complexity of the model ― In order to assess the complexity of a model, it is often useful to determine the number of parameters that its architecture will have. In a given layer of a convolutional neural network, it is done as follows:** + +⟶ Comprendre la complexité du modèle ― Pour évaluer la complexité d'un modèle, il est souvent utile de déterminer le nombre de paramètres que l'architecture va avoir. Dans une couche donnée d'un réseau de neurones convolutionnels, on a : + +
+ + +**37. [Illustration, Input size, Output size, Number of parameters, Remarks]** + +⟶ [Illustration, Taille d'entrée, Taille de sortie, Nombre de paramètres, Remarques] + +


**38. [One bias parameter per filter, In most cases, S<F]**

⟶ [Un paramètre de biais par filtre, Dans la plupart des cas, S<F]

<br>


**39. [Pooling operation done channel-wise, In most cases, S=F]**

⟶ [L'opération de pooling est effectuée pour chaque canal, Dans la plupart des cas, S=F]

<br>
+ + +**40. [Input is flattened, One bias parameter per neuron, The number of FC neurons is free of structural constraints]** + +⟶ [L'entrée est aplatie, Un paramètre de biais par neurone, Le choix du nombre de neurones de FC est libre] + +
+ + +**41. Receptive field ― The receptive field at layer k is the area denoted Rk×Rk of the input that each pixel of the k-th activation map can 'see'. By calling Fj the filter size of layer j and Si the stride value of layer i and with the convention S0=1, the receptive field at layer k can be computed with the formula:** + +⟶ Champ récepteur ― Le champ récepteur à la couche k est la surface notée Rk×Rk de l'entrée que chaque pixel de la k-ième activation map peut 'voir'. En notant Fj la taille du filtre de la couche j et Si la valeur de stride de la couche i et avec la convention S0=1, le champ récepteur à la couche k peut être calculé de la manière suivante : + +
+ + +**42. In the example below, we have F1=F2=3 and S1=S2=1, which gives R2=1+2⋅1+2⋅1=5.** + +⟶ Dans l'exemple ci-dessous, on a F1=F2=3 et S1=S2=1, ce qui donne R2=1+2⋅1+2⋅1=5. + +
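
The receptive field formula can be checked numerically; the sketch below (illustrative helper name, assuming the convention S0=1 stated above) reproduces the R2=5 example:

```python
def receptive_field(filter_sizes, strides):
    """Receptive field R_k after k layers, with the convention S0 = 1.

    filter_sizes: [F1, ..., Fk], strides: [S1, ..., Sk]
    """
    R, jump = 1, 1  # jump = product of the strides of the previous layers
    for F, S in zip(filter_sizes, strides):
        R += (F - 1) * jump
        jump *= S
    return R

# Example from the text: F1=F2=3, S1=S2=1 gives R2 = 1 + 2*1 + 2*1 = 5
print(receptive_field([3, 3], [1, 1]))  # 5
```

<br>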
+ + +**43. Commonly used activation functions** + +⟶ Fonctions d'activation communément utilisées + +
+ + +**44. Rectified Linear Unit ― The rectified linear unit layer (ReLU) is an activation function g that is used on all elements of the volume. It aims at introducing non-linearities to the network. Its variants are summarized in the table below:** + +⟶ Unité linéaire rectifiée ― La couche d'unité linéaire rectifiée (en anglais rectified linear unit layer) (ReLU) est une fonction d'activation g qui est utilisée sur tous les éléments du volume. Elle a pour but d'introduire des complexités non-linéaires au réseau. Ses variantes sont récapitulées dans le tableau suivant : + +
+ + +**45. [ReLU, Leaky ReLU, ELU, with]** + +⟶ [ReLU, Leaky ReLU, ELU, avec] + +
+ + +**46. [Non-linearity complexities biologically interpretable, Addresses dying ReLU issue for negative values, Differentiable everywhere]** + +⟶ [Complexités non-linéaires interprétables d'un point de vue biologique, Répond au problème de dying ReLU, Dérivable partout] + +
+ + +**47. Softmax ― The softmax step can be seen as a generalized logistic function that takes as input a vector of scores x∈Rn and outputs a vector of output probability p∈Rn through a softmax function at the end of the architecture. It is defined as follows:** + +⟶ Softmax ― L'étape softmax peut être vue comme une généralisation de la fonction logistique qui prend comme argument un vecteur de scores x∈Rn et qui renvoie un vecteur de probabilités p∈Rn à travers une fonction softmax à la fin de l'architecture. Elle est définie de la manière suivante : + +
+ + +**48. where** + +⟶ où + +
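
As an illustration, a minimal numpy version of the softmax step (the max-shift is a numerical-stability detail not mentioned in the cheatsheet) could look like:

```python
import numpy as np

def softmax(x):
    """Softmax of a score vector x."""
    z = x - np.max(x)   # shifting by the max does not change the result
    e = np.exp(z)
    return e / np.sum(e)

scores = np.array([2.0, 1.0, 0.1])
p = softmax(scores)
print(p, p.sum())  # probabilities summing to 1
```

<br>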
+ + +**49. Object detection** + +⟶ Détection d'objet + +


**50. Types of models ― There are 3 main types of object recognition algorithms, for which the nature of what is predicted is different. They are described in the table below:**

⟶ Types de modèles ― Il y a 3 principaux types d'algorithme de reconnaissance d'objet, pour lesquels la nature de ce qui est prédit est différente. Ils sont décrits dans le tableau ci-dessous :

<br>
+ + +**51. [Image classification, Classification w. localization, Detection]** + +⟶ [Classification d'image, Classification avec localisation, Détection] + +
+ + +**52. [Teddy bear, Book]** + +⟶ [Ours en peluche, Livre] + +
+ + +**53. [Classifies a picture, Predicts probability of object, Detects an object in a picture, Predicts probability of object and where it is located, Detects up to several objects in a picture, Predicts probabilities of objects and where they are located]** + +⟶ [Classifie une image, Prédit la probabilité d'un objet, Détecte un objet dans une image, Prédit la probabilité de présence d'un objet et où il est situé, Peut détecter plusieurs objets dans une image, Prédit les probabilités de présence des objets et où ils sont situés] + +
+ + +**54. [Traditional CNN, Simplified YOLO, R-CNN, YOLO, R-CNN]** + +⟶ [CNN traditionnel, YOLO simplifié, R-CNN, YOLO, R-CNN] + +
+ + +**55. Detection ― In the context of object detection, different methods are used depending on whether we just want to locate the object or detect a more complex shape in the image. The two main ones are summed up in the table below:** + +⟶ Détection ― Dans le contexte de la détection d'objet, des méthodes différentes sont utilisées selon si l'on veut juste localiser l'objet ou alors détecter une forme plus complexe dans l'image. Les deux méthodes principales sont résumées dans le tableau ci-dessous : + +
+ + +**56. [Bounding box detection, Landmark detection]** + +⟶ [Détection de zone délimitante, Détection de forme complexe] + +
+ + +**57. [Detects the part of the image where the object is located, Detects a shape or characteristics of an object (e.g. eyes), More granular]** + +⟶ [Détecte la partie de l'image où l'objet est situé, Détecte la forme ou les caractéristiques d'un objet (e.g. yeux), Plus granulaire] + +
+ + +**58. [Box of center (bx,by), height bh and width bw, Reference points (l1x,l1y), ..., (lnx,lny)]** + +⟶ [Zone de centre (bx,by), hauteur bh et largeur bw, Points de référence (l1x,l1y), ..., (lnx,lny)] + +
+ + +**59. Intersection over Union ― Intersection over Union, also known as IoU, is a function that quantifies how correctly positioned a predicted bounding box Bp is over the actual bounding box Ba. It is defined as:** + +⟶ Intersection sur Union ― Intersection sur Union (en anglais Intersection over Union), aussi appelé IoU, est une fonction qui quantifie à quel point la zone délimitante prédite Bp est correctement positionnée par rapport à la zone délimitante vraie Ba. Elle est définie de la manière suivante : + +
+ + +**60. Remark: we always have IoU∈[0,1]. By convention, a predicted bounding box Bp is considered as being reasonably good if IoU(Bp,Ba)⩾0.5.** + +⟶ Remarque : on a toujours IoU∈[0,1]. Par convention, la prédiction Bp d'une zone délimitante est considérée comme étant satisfaisante si l'on a IoU(Bp,Ba)⩾0.5. + +
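
A small sketch of the IoU computation, assuming boxes are given by their (x1, y1, x2, y2) corners rather than the center/width/height parametrization used above:

```python
def iou(box_p, box_a):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2) corners."""
    x1 = max(box_p[0], box_a[0])
    y1 = max(box_p[1], box_a[1])
    x2 = min(box_p[2], box_a[2])
    y2 = min(box_p[3], box_a[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_p = (box_p[2] - box_p[0]) * (box_p[3] - box_p[1])
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    union = area_p + area_a - inter
    return inter / union if union > 0 else 0.0

print(iou((0, 0, 2, 2), (1, 1, 3, 3)))  # 1 / 7 ≈ 0.143
```

<br>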
+ + +**61. Anchor boxes ― Anchor boxing is a technique used to predict overlapping bounding boxes. In practice, the network is allowed to predict more than one box simultaneously, where each box prediction is constrained to have a given set of geometrical properties. For instance, the first prediction can potentially be a rectangular box of a given form, while the second will be another rectangular box of a different geometrical form.** + +⟶ Zone d'accroche ― La technique des zones d'accroche (en anglais anchor boxing) sert à prédire des zones délimitantes qui se chevauchent. En pratique, on permet au réseau de prédire plus d'une zone délimitante simultanément, où chaque zone prédite doit respecter une forme géométrique particulière. Par exemple, la première prédiction peut potentiellement être une zone rectangulaire d'une forme donnée, tandis qu'une seconde prédiction doit être une zone rectangulaire d'une autre forme. + +
+ + +**62. Non-max suppression ― The non-max suppression technique aims at removing duplicate overlapping bounding boxes of a same object by selecting the most representative ones. After having removed all boxes having a probability prediction lower than 0.6, the following steps are repeated while there are boxes remaining:** + +⟶ Suppression non-max ― La technique de suppression non-max (en anglais non-max suppression) a pour but d'enlever des zones délimitantes qui se chevauchent et qui prédisent un seul et même objet, en sélectionnant les zones les plus représentatives. Après avoir enlevé toutes les zones ayant une probabilité prédite de moins de 0.6, les étapes suivantes sont répétées pour éliminer les zones redondantes : + +
+ + +**63. [For a given class, Step 1: Pick the box with the largest prediction probability., Step 2: Discard any box having an IoU⩾0.5 with the previous box.]** + +⟶ [Pour une classe donnée, Étape 1 : Choisir la zone ayant la plus grande probabilité de prédiction., Étape 2 : Enlever toute zone ayant IoU⩾0.5 avec la zone choisie précédemment.] + +
+ + +**64. [Box predictions, Box selection of maximum probability, Overlap removal of same class, Final bounding boxes]** + +⟶ [Zones prédites, Sélection de la zone de probabilité maximum, Suppression des chevauchements d'une même classe, Zones délimitantes finales] + +
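
The greedy procedure described above can be sketched as follows, reusing the `iou` helper from the previous snippet (the 0.6 and 0.5 thresholds follow the text; the function name is illustrative):

```python
def non_max_suppression(boxes, scores, iou_threshold=0.5, score_threshold=0.6):
    """Greedy non-max suppression for a single class.

    boxes: list of (x1, y1, x2, y2), scores: list of prediction probabilities.
    Assumes the iou(box_p, box_a) helper from the sketch above.
    Returns the indices of the boxes that are kept.
    """
    # discard low-confidence boxes, then sort the rest by decreasing score
    candidates = [i for i, s in enumerate(scores) if s >= score_threshold]
    candidates.sort(key=lambda i: scores[i], reverse=True)

    kept = []
    while candidates:
        best = candidates.pop(0)          # Step 1: box with the largest probability
        kept.append(best)
        candidates = [i for i in candidates
                      if iou(boxes[i], boxes[best]) < iou_threshold]  # Step 2
    return kept

boxes = [(0, 0, 10, 10), (1, 1, 10, 10), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
print(non_max_suppression(boxes, scores))  # [0, 2]: the second box overlaps the first
```

<br>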
+ + +**65. YOLO ― You Only Look Once (YOLO) is an object detection algorithm that performs the following steps:** + +⟶ YOLO ― L'algorithme You Only Look Once (YOLO) est un algorithme de détection d'objet qui fonctionne de la manière suivante : + +
+ + +**66. [Step 1: Divide the input image into a G×G grid., Step 2: For each grid cell, run a CNN that predicts y of the following form:, repeated k times]** + +⟶ [Étape 1 : Diviser l'image d'entrée en une grille de taille G×G., Étape 2 : Pour chaque cellule, faire tourner un CNN qui prédit y de la forme suivante :, répété k fois] + +


**67. where pc is the probability of detecting an object, bx,by,bh,bw are the properties of the detected bouding box, c1,...,cp is a one-hot representation of which of the p classes were detected, and k is the number of anchor boxes.**

⟶ où pc est la probabilité de détecter un objet, bx,by,bh,bw sont les propriétés de la zone délimitante détectée, c1,...,cp est une représentation binaire (en anglais one-hot representation) indiquant laquelle des p classes a été détectée, et k est le nombre de zones d'accroche.

<br>


**68. Step 3: Run the non-max suppression algorithm to remove any potential duplicate overlapping bounding boxes.**

⟶ Étape 3 : Faire tourner l'algorithme de suppression non-max pour enlever les éventuelles zones délimitantes redondantes qui se chevauchent.

<br>
+ + +**69. [Original image, Division in GxG grid, Bounding box prediction, Non-max suppression]** + +⟶ [Image originale, Division en une grille de taille GxG, Prédiction de zone délimitante, Suppression non-max] + +
+ + +**70. Remark: when pc=0, then the network does not detect any object. In that case, the corresponding predictions bx,...,cp have to be ignored.** + +⟶ Remarque : lorsque pc=0, le réseau ne détecte plus d'objet. Dans ce cas, les prédictions correspondantes bx,...,cp doivent être ignorées. + +


**71. R-CNN ― Region with Convolutional Neural Networks (R-CNN) is an object detection algorithm that first segments the image to find potential relevant bounding boxes and then run the detection algorithm to find most probable objects in those bounding boxes.**

⟶ R-CNN ― L'algorithme de région avec des réseaux de neurones convolutionnels (en anglais Region with Convolutional Neural Networks) (R-CNN) est un algorithme de détection d'objet qui segmente d'abord l'image d'entrée pour trouver des zones délimitantes pertinentes, puis fait tourner un algorithme de détection pour trouver les objets les plus susceptibles d'apparaître dans ces zones délimitantes.

<br>
+ + +**72. [Original image, Segmentation, Bounding box prediction, Non-max suppression]** + +⟶ [Image originale, Segmentation, Prédiction de zone délimitante, Suppression non-max] + +


**73. Remark: although the original algorithm is computationally expensive and slow, newer architectures enabled the algorithm to run faster, such as Fast R-CNN and Faster R-CNN.**

⟶ Remarque : bien que l'algorithme original soit lent et coûteux en temps de calcul, de nouvelles architectures ont permis de faire tourner l'algorithme plus rapidement, telles que Fast R-CNN et Faster R-CNN.

<br>
+ + +**74. Face verification and recognition** + +⟶ Vérification et reconnaissance de visage + +
+ + +**75. Types of models ― Two main types of model are summed up in table below:** + +⟶ Types de modèles ― Deux principaux types de modèle sont récapitulés dans le tableau ci-dessous : + +
+ + +**76. [Face verification, Face recognition, Query, Reference, Database]** + +⟶ [Vérification de visage, Reconnaissance de visage, Requête, Référence, Base de données] + +


**77. [Is this the correct person?, One-to-one lookup, Is this one of the K persons in the database?, One-to-many lookup]**

⟶ [Est-ce la bonne personne ?, Recherche un-à-un, Est-ce une des K personnes dans la base de données ?, Recherche un-à-plusieurs]

<br>


**78. One Shot Learning ― One Shot Learning is a face verification algorithm that uses a limited training set to learn a similarity function that quantifies how different two given images are. The similarity function applied to two images is often noted d(image 1,image 2).**

⟶ Apprentissage en un coup ― L'apprentissage en un coup (en anglais One Shot Learning) est un algorithme de vérification de visage qui utilise un training set de petite taille pour apprendre une fonction de similarité qui quantifie à quel point deux images données sont différentes. La fonction de similarité appliquée à deux images est souvent notée d(image 1,image 2).

<br>


**79. Siamese Network ― Siamese Networks aim at learning how to encode images to then quantify how different two images are. For a given input image x(i), the encoded output is often noted as f(x(i)).**

⟶ Réseaux siamois ― Les réseaux siamois (en anglais Siamese Networks) ont pour but d'apprendre comment encoder des images pour quantifier le degré de différence de deux images données. Pour une image d'entrée donnée x(i), l'encodage de sortie est souvent noté f(x(i)).

<br>
+ + +**80. Triplet loss ― The triplet loss ℓ is a loss function computed on the embedding representation of a triplet of images A (anchor), P (positive) and N (negative). The anchor and the positive example belong to a same class, while the negative example to another one. By calling α∈R+ the margin parameter, this loss is defined as follows:** + +⟶ Loss triple ― Le loss triple (en anglais triplet loss) ℓ est une fonction de loss calculée sur une représentation encodée d'un triplet d'images A (accroche), P (positif), et N (négatif). L'exemple d'accroche et l'exemple positif appartiennent à la même classe, tandis que l'exemple négatif appartient à une autre. En notant α∈R+ le paramètre de marge, le loss est défini de la manière suivante : + +
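
A minimal sketch of the triplet loss on already-computed embeddings, assuming the squared Euclidean distance as the function d and a margin α=0.2 (both are common choices, not fixed by the cheatsheet):

```python
import numpy as np

def triplet_loss(f_a, f_p, f_n, alpha=0.2):
    """Triplet loss on the embeddings of anchor (A), positive (P) and negative (N).

    Uses the squared Euclidean distance between embeddings; alpha is the margin.
    """
    d_pos = np.sum((f_a - f_p) ** 2)
    d_neg = np.sum((f_a - f_n) ** 2)
    return max(d_pos - d_neg + alpha, 0.0)

f_a, f_p, f_n = np.array([0.1, 0.9]), np.array([0.2, 0.8]), np.array([0.9, 0.1])
print(triplet_loss(f_a, f_p, f_n))  # 0.0: the negative is already far enough
```

<br>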
+ + +**81. Neural style transfer** + +⟶ Transfert de style neuronal + +
+ + +**82. Motivation ― The goal of neural style transfer is to generate an image G based on a given content C and a given style S.** + +⟶ Motivation ― Le but du transfert de style neuronal (en anglais neural style transfer) est de générer une image G à partir d'un contenu C et d'un style S. + +
+ + +**83. [Content C, Style S, Generated image G]** + +⟶ [Contenu C, Style S, Image générée G] + +
+ + +**84. Activation ― In a given layer l, the activation is noted a[l] and is of dimensions nH×nw×nc** + +⟶ Activation ― Dans une couche l donnée, l'activation est notée a[l] et est de dimensions nH×nw×nc + +
+ + +**85. Content cost function ― The content cost function Jcontent(C,G) is used to determine how the generated image G differs from the original content image C. It is defined as follows:** + +⟶ Fonction de coût de contenu ― La fonction de coût de contenu (en anglais content cost function), notée Jcontenu(C,G), est utilisée pour quantifier à quel point l'image générée G diffère de l'image de contenu original C. Elle est définie de la manière suivante : + +


**86. Style matrix ― The style matrix G[l] of a given layer l is a Gram matrix where each of its elements G[l]kk′ quantifies how correlated the channels k and k′ are. It is defined with respect to activations a[l] as follows:**

⟶ Matrice de style ― La matrice de style (en anglais style matrix) G[l] d'une couche l donnée est une matrice de Gram dans laquelle chacun des éléments G[l]kk′ quantifie le degré de corrélation des canaux k et k′. Elle est définie en fonction des activations a[l] de la manière suivante :

<br>


**87. Remark: the style matrix for the style image and the generated image are noted G[l] (S) and G[l] (G) respectively.**

⟶ Remarque : les matrices de style de l'image de style et de l'image générée sont notées G[l] (S) et G[l] (G) respectivement.

<br>
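
The style matrix is just a Gram matrix over channels; a small numpy sketch for an activation volume of shape (nH, nW, nC) might be:

```python
import numpy as np

def gram_matrix(a):
    """Style (Gram) matrix of an activation volume a of shape (n_H, n_W, n_C).

    G[k, k'] = sum over all spatial positions of a[..., k] * a[..., k'].
    """
    n_H, n_W, n_C = a.shape
    a_flat = a.reshape(n_H * n_W, n_C)   # one row per spatial position
    return a_flat.T @ a_flat             # shape (n_C, n_C)

a = np.random.rand(4, 4, 3)
G = gram_matrix(a)
print(G.shape)  # (3, 3)
```

<br>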
+ + +**88. Style cost function ― The style cost function Jstyle(S,G) is used to determine how the generated image G differs from the style S. It is defined as follows:** + +⟶ Fonction de coût de style ― La fonction de coût de style (en anglais style cost function), notée Jstyle(S,G), est utilisée pour quantifier à quel point l'image générée G diffère de l'image de style S. Elle est définie de la manière suivante : + +
+ + +**89. Overall cost function ― The overall cost function is defined as being a combination of the content and style cost functions, weighted by parameters α,β, as follows:** + +⟶ Fonction de coût totale ― La fonction de coût totale (en anglais overall cost function) est définie comme étant une combinaison linéaire des fonctions de coût de contenu et de style, pondérées par les paramètres α,β, de la manière suivante : + +
+ + +**90. Remark: a higher value of α will make the model care more about the content while a higher value of β will make it care more about the style.** + +⟶ Remarque : plus α est grand, plus le modèle privilégiera le contenu et plus β est grand, plus le modèle sera fidèle au style. + +
+ + +**91. Architectures using computational tricks** + +⟶ Architectures utilisant des astuces de calcul + +
+ + +**92. Generative Adversarial Network ― Generative adversarial networks, also known as GANs, are composed of a generative and a discriminative model, where the generative model aims at generating the most truthful output that will be fed into the discriminative which aims at differentiating the generated and true image.** + +⟶ Réseau antagoniste génératif ― Les réseaux antagonistes génératifs (en anglais generative adversarial networks), aussi connus sous le nom de GANs, sont composés d'un modèle génératif et d'un modèle discriminatif, où le modèle génératif a pour but de générer des prédictions aussi réalistes que possibles, qui seront ensuite envoyées dans un modèle discriminatif qui aura pour but de différencier une image générée d'une image réelle. + +
+ + +**93. [Training, Noise, Real-world image, Generator, Discriminator, Real Fake]** + +⟶ [Training, Bruit, Image réelle, Générateur, Discriminant, Vrai faux] + +


**94. Remark: use cases using variants of GANs include text to image, music generation and synthesis.**

⟶ Remarque : les variantes de GANs sont utilisées dans des applications allant de la génération d'images à partir de texte à la génération et à la synthèse de musique.

<br>
+ + +**95. ResNet ― The Residual Network architecture (also called ResNet) uses residual blocks with a high number of layers meant to decrease the training error. The residual block has the following characterizing equation:** + +⟶ ResNet ― L'architecture du réseau résiduel (en anglais Residual Network), aussi appelé ResNet, utilise des blocs résiduels avec un nombre élevé de couches et a pour but de réduire l'erreur de training. Le bloc résiduel est caractérisé par l'équation suivante : + +
+ + +**96. Inception Network ― This architecture uses inception modules and aims at giving a try at different convolutions in order to increase its performance through features diversification. In particular, it uses the 1×1 convolution trick to limit the computational burden.** + +⟶ Inception Network ― Cette architecture utilise des modules d'inception et a pour but de tester toute sorte de configuration de convolution pour améliorer sa performance en diversifiant ses attributs. En particulier, elle utilise l'astuce de la convolution 1x1 pour limiter sa complexité de calcul. + +
+ + +**97. The Deep Learning cheatsheets are now available in [target language].** + +⟶ Les pense-bêtes d'apprentissage profond sont maintenant disponibles en français. + +
+ + +**98. Original authors** + +⟶ Auteurs + +
+ + +**99. Translated by X, Y and Z** + +⟶ Traduit par X, Y et Z + +
+ + +**100. Reviewed by X, Y and Z** + +⟶ Relu par X, Y et Z + +
+ + +**101. View PDF version on GitHub** + +⟶ Voir la version PDF sur GitHub + +
+ + +**102. By X and Y** + +⟶ Par X et Y + +
diff --git a/fr/cs-230-deep-learning-tips-and-tricks.md b/fr/cs-230-deep-learning-tips-and-tricks.md new file mode 100644 index 000000000..4c84b51f4 --- /dev/null +++ b/fr/cs-230-deep-learning-tips-and-tricks.md @@ -0,0 +1,457 @@ +**Deep Learning Tips and Tricks translation** + +
+ +**1. Deep Learning Tips and Tricks cheatsheet** + +⟶ Pense-bête de petites astuces d'apprentissage profond + +
+ + +**2. CS 230 - Deep Learning** + +⟶ CS 230 - Apprentissage profond + +
+ + +**3. Tips and tricks** + +⟶ Petites astuces + +
+ + +**4. [Data processing, Data augmentation, Batch normalization]** + +⟶ [Traitement des données, Augmentation des données, Normalisation de lot] + +


**5. [Training a neural network, Epoch, Mini-batch, Cross-entropy loss, Backpropagation, Gradient descent, Updating weights, Gradient checking]**

⟶ [Entraînement d'un réseau de neurones, Epoch, Mini-lot, Entropie croisée, Rétropropagation du gradient, Algorithme du gradient, Mise à jour des coefficients, Vérification de gradient]

<br>
+ + +**6. [Parameter tuning, Xavier initialization, Transfer learning, Learning rate, Adaptive learning rates]** + +⟶ [Ajustement de paramètres, Initialisation de Xavier, Apprentissage par transfert, Taux d'apprentissage, Taux d'apprentissage adaptatifs] + +
+ + +**7. [Regularization, Dropout, Weight regularization, Early stopping]** + +⟶ [Régularisation, Abandon, Régularisation des coefficients, Arrêt prématuré] + +
+ + +**8. [Good practices, Overfitting small batch, Gradient checking]** + +⟶ [Bonnes pratiques, Surapprentissage d'un mini-lot, Vérification de gradient] + +
+ + +**9. View PDF version on GitHub** + +⟶ Voir la version PDF sur GitHub + +
+ + +**10. Data processing** + +⟶ Traitement des données + +


**11. Data augmentation ― Deep learning models usually need a lot of data to be properly trained. It is often useful to get more data from the existing ones using data augmentation techniques. The main ones are summed up in the table below. More precisely, given the following input image, here are the techniques that we can apply:**

⟶ Augmentation des données ― Les modèles d'apprentissage profond ont typiquement besoin de beaucoup de données afin d'être entraînés convenablement. Il est souvent utile de générer plus de données à partir de celles déjà existantes à l'aide de techniques d'augmentation de données. Les techniques les plus souvent utilisées sont résumées dans le tableau ci-dessous. À partir d'une image, voici les techniques que l'on peut utiliser :

<br>
+ + +**12. [Original, Flip, Rotation, Random crop]** + +⟶ [Original, Symétrie axiale, Rotation, Recadrage aléatoire] + +
+ + +**13. [Image without any modification, Flipped with respect to an axis for which the meaning of the image is preserved, Rotation with a slight angle, Simulates incorrect horizon calibration, Random focus on one part of the image, Several random crops can be done in a row]** + +⟶ [Image sans aucune modification, Symétrie par rapport à un axe pour lequel le sens de l'image est conservé, Rotation avec un petit angle, Reproduit une calibration imparfaite de l'horizon, Concentration aléatoire sur une partie de l'image, Plusieurs rognements aléatoires peuvent être faits à la suite] + +
+ + +**14. [Color shift, Noise addition, Information loss, Contrast change]** + +⟶ [Changement de couleur, Addition de bruit, Perte d'information, Changement de contraste] + +


**15. [Nuances of RGB is slightly changed, Captures noise that can occur with light exposure, Addition of noise, More tolerance to quality variation of inputs, Parts of image ignored, Mimics potential loss of parts of image, Luminosity changes, Controls difference in exposition due to time of day]**

⟶ [Les nuances de RGB sont légèrement changées, Capture le bruit qui peut survenir avec de l'exposition lumineuse, Addition de bruit, Plus de tolérance envers la variation de la qualité de l'entrée, Parties de l'image ignorées, Imite des pertes potentielles de parties de l'image, Changement de luminosité, Contrôle la différence d'exposition due à l'heure de la journée]

<br>
+ + +**16. Remark: data is usually augmented on the fly during training.** + +⟶ Remarque : les données sont normalement augmentées à la volée durant l'étape de training. + +
+ + +**17. Batch normalization ― It is a step of hyperparameter γ,β that normalizes the batch {xi}. By noting μB,σ2B the mean and variance of that we want to correct to the batch, it is done as follows:** + +⟶ Normalisation de lot ― La normalisation de lot (en anglais batch normalization) est une étape qui normalise le lot {xi} avec un choix de paramètres γ,β. En notant μB,σ2B la moyenne et la variance de ce que l'on veut corriger du lot, on a : + +


**18. It is usually done after a fully connected/convolutional layer and before a non-linearity layer and aims at allowing higher learning rates and reducing the strong dependence on initialization.**

⟶ Ceci est couramment fait après une couche fully connected ou une couche de convolution, et avant une couche non-linéaire. Elle vise à permettre d'avoir de plus grands taux d'apprentissage et de réduire la forte dépendance à l'initialisation.

<br>
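
A minimal training-time sketch of batch normalization over a batch of feature vectors (inference-time running averages are left out):

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    """Batch normalization of a batch x of shape (batch_size, n_features).

    gamma and beta are the learned scale and shift parameters (one per feature).
    """
    mu = x.mean(axis=0)                      # per-feature batch mean
    var = x.var(axis=0)                      # per-feature batch variance
    x_hat = (x - mu) / np.sqrt(var + eps)    # normalized batch
    return gamma * x_hat + beta

x = np.random.randn(8, 4) * 3.0 + 10.0
out = batch_norm(x, gamma=np.ones(4), beta=np.zeros(4))
print(out.mean(axis=0).round(3), out.std(axis=0).round(3))  # ≈ 0 and ≈ 1
```

<br>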
+ + +**19. Training a neural network** + +⟶ Entraîner un réseau de neurones + +
+ + +**20. Definitions** + +⟶ Définitions + +
+ + +**21. Epoch ― In the context of training a model, epoch is a term used to refer to one iteration where the model sees the whole training set to update its weights.** + +⟶ Epoch ― Dans le contexte de l'entraînement d'un modèle, l'epoch est un terme utilisé pour référer à une itération où le modèle voit tout le training set pour mettre à jour ses coefficients. + +
+ + +**22. Mini-batch gradient descent ― During the training phase, updating weights is usually not based on the whole training set at once due to computation complexities or one data point due to noise issues. Instead, the update step is done on mini-batches, where the number of data points in a batch is a hyperparameter that we can tune.** + +⟶ Gradient descent sur mini-lots ― Durant la phase d'entraînement, la mise à jour des coefficients n'est souvent basée ni sur tout le training set d'un coup à cause de temps de calculs coûteux, ni sur un seul point à cause de bruits potentiels. À la place de cela, l'étape de mise à jour est faite sur des mini-lots, où le nombre de points dans un lot est un paramètre que l'on peut régler. + +
+ + +**23. Loss function ― In order to quantify how a given model performs, the loss function L is usually used to evaluate to what extent the actual outputs y are correctly predicted by the model outputs z.** + +⟶ Fonction de loss ― Pour pouvoir quantifier la performance d'un modèle donné, la fonction de loss (en anglais loss function) L est utilisée pour évaluer la mesure dans laquelle les sorties vraies y sont correctement prédites par les prédictions du modèle z. + +
+ + +**24. Cross-entropy loss ― In the context of binary classification in neural networks, the cross-entropy loss L(z,y) is commonly used and is defined as follows:** + +⟶ Entropie croisée ― Dans le contexte de la classification binaire d'un réseau de neurones, l'entropie croisée (en anglais cross-entropy loss) L(z,y) est couramment utilisée et est définie par : + +
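
For illustration, the binary cross-entropy averaged over a batch can be written as below (the clipping by a small ε is a numerical safeguard, not part of the definition):

```python
import numpy as np

def cross_entropy_loss(z, y, eps=1e-12):
    """Binary cross-entropy between predicted probabilities z and labels y."""
    z = np.clip(z, eps, 1 - eps)   # avoid log(0)
    return -np.mean(y * np.log(z) + (1 - y) * np.log(1 - z))

y = np.array([1, 0, 1, 1])
z = np.array([0.9, 0.2, 0.7, 0.6])
print(cross_entropy_loss(z, y))  # ≈ 0.3, small because predictions mostly agree
```

<br>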
+ + +**25. Finding optimal weights** + +⟶ Recherche de coefficients optimaux + +
+ + +**26. Backpropagation ― Backpropagation is a method to update the weights in the neural network by taking into account the actual output and the desired output. The derivative with respect to each weight w is computed using the chain rule.** + +⟶ Backpropagation ― La backpropagation est une méthode de mise à jour des coefficients d'un réseau de neurones en prenant en compte les sorties vraies et désirées. La dérivée par rapport à chaque coefficient w est calculée en utilisant la règle de la chaîne. + +
+ + +**27. Using this method, each weight is updated with the rule:** + +⟶ En utilisant cette méthode, chaque coefficient est mis à jour par : + +
+ + +**28. Updating weights ― In a neural network, weights are updated as follows:** + +⟶ Mettre à jour les coefficients ― Dans un réseau de neurones, les coefficients sont mis à jour par : + +
+ + +**29. [Step 1: Take a batch of training data and perform forward propagation to compute the loss, Step 2: Backpropagate the loss to get the gradient of the loss with respect to each weight, Step 3: Use the gradients to update the weights of the network.]** + +⟶ [Étape 1 : Prendre un lot de training data et effectuer une forward propagation pour calculer le loss, Étape 2 : Backpropaguer le loss pour obtenir le gradient du loss par rapport à chaque coefficient, Étape 3 : Utiliser les gradients pour mettre à jour les coefficients du réseau.] + +
+ + +**30. [Forward propagation, Backpropagation, Weights update]** + +⟶ [Forward propagation, Backpropagation, Mise à jour des coefficients] + +
+ + +**31. Parameter tuning** + +⟶ Réglage des paramètres + +
+ + +**32. Weights initialization** + +⟶ Initialisation des coefficients + +


**33. Xavier initialization ― Instead of initializing the weights in a purely random manner, Xavier initialization enables to have initial weights that take into account characteristics that are unique to the architecture.**

⟶ Initialisation de Xavier ― Au lieu de laisser les coefficients s'initialiser de manière purement aléatoire, l'initialisation de Xavier permet d'avoir des coefficients initiaux qui prennent en compte les caractéristiques uniques de l'architecture.

<br>
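
A common concrete form is the Glorot/Xavier uniform rule, sketched below (the uniform variant with limit sqrt(6/(fan_in+fan_out)) is one standard choice among several):

```python
import numpy as np

def xavier_init(fan_in, fan_out, rng=np.random.default_rng(0)):
    """Xavier (Glorot) uniform initialization of a (fan_in, fan_out) weight matrix.

    Weights are drawn from U(-limit, limit) with limit = sqrt(6 / (fan_in + fan_out)),
    which keeps the variance of activations roughly constant across layers.
    """
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, size=(fan_in, fan_out))

W = xavier_init(256, 128)
print(W.shape, W.std())   # std close to sqrt(2 / (fan_in + fan_out))
```

<br>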


**34. Transfer learning ― Training a deep learning model requires a lot of data and more importantly a lot of time. It is often useful to take advantage of pre-trained weights on huge datasets that took days/weeks to train, and leverage it towards our use case. Depending on how much data we have at hand, here are the different ways to leverage this:**

⟶ Apprentissage par transfert ― Entraîner un modèle d'apprentissage profond requiert beaucoup de données et surtout beaucoup de temps. Il est souvent utile de profiter de coefficients pré-entraînés sur d'énormes jeux de données qui ont pris des jours/semaines à entraîner, et de s'en servir pour notre cas d'usage. Selon la quantité de données que l'on a sous la main, voici différentes manières d'utiliser cette méthode :

<br>
+ + +**35. [Training size, Illustration, Explanation]** + +⟶ [Taille du training, Illustration, Explication] + +
+ + +**36. [Small, Medium, Large]** + +⟶ [Petit, Moyen, Grand] + +
+ + +**37. [Freezes all layers, trains weights on softmax, Freezes most layers, trains weights on last layers and softmax, Trains weights on layers and softmax by initializing weights on pre-trained ones]** + +⟶ [Gèle toutes les couches, entraîne les coefficients du softmax, Gèle la plupart des couches, entraîne les coefficients des dernières couches et du softmax, Entraîne les coefficients des couches et du softmax en initialisant les coefficients sur ceux qui ont été pré-entraînés] + +
+ + +**38. Optimizing convergence** + +⟶ Optimisation de la convergence + +
+ + +**39. Learning rate ― The learning rate, often noted α or sometimes η, indicates at which pace the weights get updated. It can be fixed or adaptively changed. The current most popular method is called Adam, which is a method that adapts the learning rate.** + +⟶ Taux d'apprentissage ― Le taux d'apprentissage (en anglais learning rate), souvent noté α ou η, indique la vitesse à laquelle les coefficients sont mis à jour. Il peut être fixe ou variable. La méthode actuelle la plus populaire est appelée Adam, qui est une méthode faisant varier le taux d'apprentissage. + +


**40. Adaptive learning rates ― Letting the learning rate vary when training a model can reduce the training time and improve the numerical optimal solution. While Adam optimizer is the most commonly used technique, others can also be useful. They are summed up in the table below:**

⟶ Taux d'apprentissage adaptatifs ― Laisser le taux d'apprentissage varier pendant la phase d'entraînement du modèle peut réduire le temps d'entraînement et améliorer la qualité de la solution numérique optimale. Bien que la méthode d'Adam soit la plus utilisée, d'autres peuvent aussi être utiles. Les différentes méthodes sont récapitulées dans le tableau ci-dessous :

<br>


**41. [Method, Explanation, Update of w, Update of b]**

⟶ [Méthode, Explication, Mise à jour de w, Mise à jour de b]

<br>
+ + +**42. [Momentum, Dampens oscillations, Improvement to SGD, 2 parameters to tune]** + +⟶ [Momentum, Amortit les oscillations, Amélioration par rapport à la méthode SGD, 2 paramètres à régler] + +
+ + +**43. [RMSprop, Root Mean Square propagation, Speeds up learning algorithm by controlling oscillations]** + +⟶ [RMSprop, Root Mean Square propagation, Accélère l'algorithme d'apprentissage en contrôlant les oscillations] + +
+ + +**44. [Adam, Adaptive Moment estimation, Most popular method, 4 parameters to tune]** + +⟶ [Adam, Adaptive Moment estimation, Méthode la plus populaire, 4 paramètres à régler] + +
+ + +**45. Remark: other methods include Adadelta, Adagrad and SGD.** + +⟶ Remarque : parmi les autres méthodes existantes, on trouve Adadelta, Adagrad et SGD. + +
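
As a rough sketch of how such an adaptive rule looks in practice, here is one Adam-style update of a single parameter array (the default values of α, β1, β2 and ε follow the values commonly reported for Adam):

```python
import numpy as np

def adam_update(w, dw, state, alpha=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update of parameter w given its gradient dw.

    state holds the running moment estimates m, v and the step counter t.
    """
    state["t"] += 1
    state["m"] = beta1 * state["m"] + (1 - beta1) * dw          # 1st moment (momentum)
    state["v"] = beta2 * state["v"] + (1 - beta2) * dw ** 2     # 2nd moment (RMSprop-like)
    m_hat = state["m"] / (1 - beta1 ** state["t"])              # bias correction
    v_hat = state["v"] / (1 - beta2 ** state["t"])
    return w - alpha * m_hat / (np.sqrt(v_hat) + eps)

w = np.array([1.0, -2.0])
state = {"m": np.zeros_like(w), "v": np.zeros_like(w), "t": 0}
w = adam_update(w, dw=np.array([0.5, -0.3]), state=state)
print(w)
```

<br>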
+ + +**46. Regularization** + +⟶ Régularisation + +
+ + +**47. Dropout ― Dropout is a technique used in neural networks to prevent overfitting the training data by dropping out neurons with probability p>0. It forces the model to avoid relying too much on particular sets of features.** + +⟶ Dropout ― Le dropout est une technique qui est destinée à empêcher le sur-ajustement sur les données de training en abandonnant des unités dans un réseau de neurones avec une probabilité p>0. Cela force le modèle à éviter de trop s'appuyer sur un ensemble particulier de features. + +
+ + +**48. Remark: most deep learning frameworks parametrize dropout through the 'keep' parameter 1−p.** + +⟶ Remarque : la plupart des frameworks d'apprentissage profond paramétrisent le dropout à travers le paramètre 'garder' 1-p. + +
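
A minimal "inverted dropout" sketch, which drops units with probability p and rescales by the keep probability 1−p so that nothing has to change at test time:

```python
import numpy as np

def dropout(a, p=0.5, training=True, rng=np.random.default_rng(0)):
    """Inverted dropout: each unit is dropped with probability p during training.

    Kept activations are rescaled by 1 - p so their expected value is unchanged.
    """
    if not training or p == 0.0:
        return a
    keep = 1.0 - p
    mask = rng.random(a.shape) < keep
    return a * mask / keep

a = np.ones((2, 4))
print(dropout(a, p=0.5))  # roughly half the units set to 0, the rest scaled to 2.0
```

<br>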
+ + +**49. Weight regularization ― In order to make sure that the weights are not too large and that the model is not overfitting the training set, regularization techniques are usually performed on the model weights. The main ones are summed up in the table below:** + +⟶ Régularisation de coefficient ― Pour s'assurer que les coefficients ne sont pas trop grands et que le modèle ne sur-ajuste pas sur le training set, on utilise des techniques de régularisation sur les coefficients du modèle. Les techniques principales sont résumées dans le tableau suivant : + +
+ + +**50. [LASSO, Ridge, Elastic Net]** + +⟶ [LASSO, Ridge, Elastic Net] + +
+ +**50 bis. [Shrinks coefficients to 0, Good for variable selection, Makes coefficients smaller, Tradeoff between variable selection and small coefficients]** + +⟶ [Réduit les coefficients à 0, Bon pour la sélection de variables, Rend les coefficients plus petits, Compromis entre la sélection de variables et la réduction de la taille des coefficients] + +
+ +**51. Early stopping ― This regularization technique stops the training process as soon as the validation loss reaches a plateau or starts to increase.** + +⟶ Arrêt prématuré ― L'arrêt prématuré (en anglais early stopping) est une technique de régularisation qui consiste à stopper l'étape d'entraînement dès que le loss de validation atteint un plateau ou commence à augmenter. + +
+ + +**52. [Error, Validation, Training, early stopping, Epochs]** + +⟶ [Erreur, Validation, Training, arrêt prématuré, Epochs] + +
+ + +**53. Good practices** + +⟶ Bonnes pratiques + +


**54. Overfitting small batch ― When debugging a model, it is often useful to make quick tests to see if there is any major issue with the architecture of the model itself. In particular, in order to make sure that the model can be properly trained, a mini-batch is passed inside the network to see if it can overfit on it. If it cannot, it means that the model is either too complex or not complex enough to even overfit on a small batch, let alone a normal-sized training set.**

⟶ Sur-ajuster un mini-lot ― Lorsque l'on débugge un modèle, il est souvent utile de faire de petits tests pour voir s'il y a un gros souci avec l'architecture du modèle lui-même. En particulier, pour s'assurer que le modèle peut être entraîné correctement, un mini-lot est passé dans le réseau pour voir s'il peut sur-ajuster dessus. Si le modèle ne peut pas le faire, cela signifie qu'il est soit trop complexe, soit pas assez complexe pour même sur-ajuster un mini-lot, et encore moins un training set de taille normale.

<br>
+ + +**55. Gradient checking ― Gradient checking is a method used during the implementation of the backward pass of a neural network. It compares the value of the analytical gradient to the numerical gradient at given points and plays the role of a sanity-check for correctness.** + +⟶ Gradient checking ― La méthode de gradient checking est utilisée durant l'implémentation d'un backward pass d'un réseau de neurones. Elle compare la valeur du gradient analytique par rapport au gradient numérique au niveau de certains points et joue un rôle de vérification élémentaire. + +
+ + +**56. [Type, Numerical gradient, Analytical gradient]** + +⟶ [Type, Gradient numérique, Gradient analytique] + +
+ + +**57. [Formula, Comments]** + +⟶ [Formule, Commentaires] + +
+ + +**58. [Expensive; loss has to be computed two times per dimension, Used to verify correctness of analytical implementation, Trade-off in choosing h not too small (numerical instability) nor too large (poor gradient approximation)]** + +⟶ [Coûteux; le loss doit être calculé deux fois par dimension, Utilisé pour vérifier l'exactitude d'une implémentation analytique, Compromis dans le choix de h entre pas trop petit (instabilité numérique) et pas trop grand (estimation du gradient approximative)] + +
+ + +**59. ['Exact' result, Direct computation, Used in the final implementation]** + +⟶ [Résultat 'exact', Calcul direct, Utilisé dans l'implémentation finale] + +
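
The numerical side of the check is the centered difference above; a small sketch comparing it against a known analytical gradient could be:

```python
import numpy as np

def numerical_gradient(f, x, h=1e-5):
    """Centered-difference estimate of the gradient of f at x
    (two loss evaluations per dimension)."""
    grad = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e.flat[i] = h
        grad.flat[i] = (f(x + e) - f(x - e)) / (2 * h)
    return grad

# Sanity check on f(x) = ||x||^2, whose analytical gradient is 2x
f = lambda x: np.sum(x ** 2)
x = np.array([1.0, -2.0, 0.5])
print(numerical_gradient(f, x))  # ≈ [ 2., -4.,  1.]
```

<br>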
+ + +**60. The Deep Learning cheatsheets are now available in [target language].** + +⟶ Les pense-bêtes d'apprentissage profond sont maintenant disponibles en français. + +
+ +**61. Original authors** + +⟶ Auteurs + +
+ +**62. Translated by X, Y and Z** + +⟶ Traduit par X, Y et Z + +
+ +**63. Reviewed by X, Y and Z** + +⟶ Relu par X, Y et Z + +
+ +**64. View PDF version on GitHub** + +⟶ Voir la version PDF sur GitHub + +
+ +**65. By X and Y** + +⟶ Par X et Y + +
diff --git a/fr/cs-230-recurrent-neural-networks.md b/fr/cs-230-recurrent-neural-networks.md new file mode 100644 index 000000000..e7d8f5343 --- /dev/null +++ b/fr/cs-230-recurrent-neural-networks.md @@ -0,0 +1,678 @@ +**Recurrent Neural Networks translation** + +
+ +**1. Recurrent Neural Networks cheatsheet** + +⟶ Pense-bête de réseaux de neurones récurrents + +
+ + +**2. CS 230 - Deep Learning** + +⟶ CS 230 - Apprentissage profond + +
+ + +**3. [Overview, Architecture structure, Applications of RNNs, Loss function, Backpropagation]** + +⟶ [Vue d'ensemble, Structure d'architecture, Applications des RNNs, Fonction de loss, Backpropagation] + +
+ + +**4. [Handling long term dependencies, Common activation functions, Vanishing/exploding gradient, Gradient clipping, GRU/LSTM, Types of gates, Bidirectional RNN, Deep RNN]** + +⟶ [Dépendances à long terme, Fonctions d'activation communes, Gradient qui disparait/explose, Coupure de gradient, GRU/LSTM, Types de porte, RNN bi-directionnel, RNN profond] + +
+ + +**5. [Learning word representation, Notations, Embedding matrix, Word2vec, Skip-gram, Negative sampling, GloVe]** + +⟶ [Apprentissage de la représentation de mots, Notations, Matrice de représentation, Word2vec, Skip-gram, Échantillonnage négatif, GloVe] + +
+ + +**6. [Comparing words, Cosine similarity, t-SNE]** + +⟶ [Comparaison des mots, Similarité cosinus, t-SNE] + +
+ + +**7. [Language model, n-gram, Perplexity]** + +⟶ [Modèle de langage, n-gram, Perplexité] + +
+ + +**8. [Machine translation, Beam search, Length normalization, Error analysis, Bleu score]** + +⟶ [Traduction machine, Recherche en faisceau, Normalisation de longueur, Analyse d'erreur, Score bleu] + +
+ + +**9. [Attention, Attention model, Attention weights]** + +⟶ [Attention, Modèle d'attention, Coefficients d'attention] + +
+ + +**10. Overview** + +⟶ Vue d'ensemble + +
+ + +**11. Architecture of a traditional RNN ― Recurrent neural networks, also known as RNNs, are a class of neural networks that allow previous outputs to be used as inputs while having hidden states. They are typically as follows:** + +⟶ Architecture d'un RNN traditionnel ― Les réseaux de neurones récurrents (en anglais recurrent neural networks), aussi appelés RNNs, sont une classe de réseaux de neurones qui permettent aux prédictions antérieures d'être utilisées comme entrées, par le biais d'états cachés (en anglais hidden states). Ils sont de la forme suivante : + +
+ + +**12. For each timestep t, the activation a and the output y are expressed as follows:** + +⟶ À l'instant t, l'activation a et la sortie y sont de la forme suivante : + +
+ + +**13. and** + +⟶ et + +
+ + +**14. where Wax,Waa,Wya,ba,by are coefficients that are shared temporally and g1,g2 activation functions.** + +⟶ où Wax,Waa,Wya,ba,by sont des coefficients indépendants du temps et où g1,g2 sont des fonctions d'activation. + +
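
A single timestep of this recurrence can be sketched as follows, assuming g1=tanh and g2=softmax (common choices, not imposed by the definition) and small random weights for illustration:

```python
import numpy as np

def rnn_step(x_t, a_prev, Wax, Waa, Wya, ba, by):
    """One timestep of a vanilla RNN, with g1 = tanh and g2 = softmax."""
    a_t = np.tanh(Waa @ a_prev + Wax @ x_t + ba)              # hidden state update
    z = Wya @ a_t + by
    y_t = np.exp(z - z.max()) / np.sum(np.exp(z - z.max()))  # softmax output
    return a_t, y_t

n_x, n_a, n_y = 3, 5, 2
rng = np.random.default_rng(0)
Wax = rng.standard_normal((n_a, n_x))
Waa = rng.standard_normal((n_a, n_a))
Wya = rng.standard_normal((n_y, n_a))
ba, by = np.zeros(n_a), np.zeros(n_y)
a, y = rnn_step(rng.standard_normal(n_x), np.zeros(n_a), Wax, Waa, Wya, ba, by)
print(a.shape, y.shape)  # (5,) (2,)
```

<br>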
+ + +**15. The pros and cons of a typical RNN architecture are summed up in the table below:** + +⟶ Les avantages et inconvénients des architectures de RNN traditionnelles sont résumés dans le tableau ci-dessous : + +
+ + +**16. [Advantages, Possibility of processing input of any length, Model size not increasing with size of input, Computation takes into account historical information, Weights are shared across time]** + +⟶ [Avantages, Possibilité de prendre en compte des entrées de toute taille, La taille du modèle n'augmente pas avec la taille de l'entrée, Les calculs prennent en compte les informations antérieures, Les coefficients sont indépendants du temps] + +


**17. [Drawbacks, Computation being slow, Difficulty of accessing information from a long time ago, Cannot consider any future input for the current state]**

⟶ [Inconvénients, Le temps de calcul est long, Difficulté d'accéder à des informations d'un passé lointain, Impossibilité de prendre en compte des informations futures pour un état donné]

<br>
+ + +**18. Applications of RNNs ― RNN models are mostly used in the fields of natural language processing and speech recognition. The different applications are summed up in the table below:** + +⟶ Applications des RNNs ― Les modèles RNN sont surtout utilisés dans les domaines du traitement automatique du langage naturel et de la reconnaissance vocale. Le tableau suivant détaille les applications principales à retenir : + +
+ + +**19. [Type of RNN, Illustration, Example]** + +⟶ [Type de RNN, Illustration, Exemple] + +
+ + +**20. [One-to-one, One-to-many, Many-to-one, Many-to-many]** + +⟶ [Un à un, Un à plusieurs, Plusieurs à un, Plusieurs à plusieurs] + +
+ + +**21. [Traditional neural network, Music generation, Sentiment classification, Name entity recognition, Machine translation]** + +⟶ [Réseau de neurones traditionnel, Génération de musique, Classification de sentiment, Reconnaissance d'entité, Traduction machine] + +


**22. Loss function ― In the case of a recurrent neural network, the loss function L of all time steps is defined based on the loss at every time step as follows:**

⟶ Fonction de loss ― Dans le contexte des réseaux de neurones récurrents, la fonction de loss L de tous les instants est définie à partir du loss à chaque instant t de la manière suivante :

<br>
+ + +**23. Backpropagation through time ― Backpropagation is done at each point in time. At timestep T, the derivative of the loss L with respect to weight matrix W is expressed as follows:** + +⟶ Backpropagation temporelle ― L'étape de backpropagation est appliquée dans la dimension temporelle. À l'instant T, la dérivée du loss L par rapport à la matrice de coefficients W est donnée par : + +
+ + +**24. Handling long term dependencies** + +⟶ Dépendances à long terme + +


**25. Commonly used activation functions ― The most common activation functions used in RNN modules are described below:**

⟶ Fonctions d'activation communément utilisées ― Les fonctions d'activation les plus utilisées dans les RNNs sont décrites ci-dessous :

<br>
+ + +**26. [Sigmoid, Tanh, RELU]** + +⟶ [Sigmoïde, Tanh, RELU] + +
+ + +**27. Vanishing/exploding gradient ― The vanishing and exploding gradient phenomena are often encountered in the context of RNNs. The reason why they happen is that it is difficult to capture long term dependencies because of multiplicative gradient that can be exponentially decreasing/increasing with respect to the number of layers.** + +⟶ Gradient qui disparait/explose ― Les phénomènes de gradient qui disparait et qui explose (en anglais vanishing gradient et exploding gradient) sont souvent rencontrés dans le contexte des RNNs. Ceci est dû au fait qu'il est difficile de capturer des dépendances à long terme à cause du gradient multiplicatif qui peut décroître/croître de manière exponentielle en fonction du nombre de couches. + +
+ + +**28. Gradient clipping ― It is a technique used to cope with the exploding gradient problem sometimes encountered when performing backpropagation. By capping the maximum value for the gradient, this phenomenon is controlled in practice.** + +⟶ Coupure de gradient ― Cette technique est utilisée pour atténuer le phénomène de gradient qui explose qui peut être rencontré lors de l'étape de backpropagation. En plafonnant la valeur qui peut être prise par le gradient, ce phénomène est maîtrisé en pratique. + +
+ + +**29. clipped** + +⟶ coupé + +
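
A minimal sketch of clipping by norm (the threshold value is arbitrary here; clipping element-wise by value is another common variant):

```python
import numpy as np

def clip_gradient(grad, max_norm=1.0):
    """Rescales the gradient so that its norm never exceeds max_norm."""
    norm = np.linalg.norm(grad)
    if norm > max_norm:
        grad = grad * (max_norm / norm)
    return grad

g = np.array([3.0, 4.0])               # norm 5
print(clip_gradient(g, max_norm=1.0))  # [0.6, 0.8], norm 1
```

<br>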
+ + +**30. Types of gates ― In order to remedy the vanishing gradient problem, specific gates are used in some types of RNNs and usually have a well-defined purpose. They are usually noted Γ and are equal to:** + +⟶ Types de porte ― Pour remédier au problème du gradient qui disparait, certains types de porte sont spécifiquement utilisés dans des variantes de RNNs et ont un but bien défini. Les portes sont souvent notées Γ et sont telles que : + +
+ + +**31. where W,U,b are coefficients specific to the gate and σ is the sigmoid function. The main ones are summed up in the table below:** + +⟶ où W,U,b sont des coefficients spécifiques à la porte et σ est une sigmoïde. Les portes à retenir sont récapitulées dans le tableau ci-dessous : + +
+ + +**32. [Type of gate, Role, Used in]** + +⟶ [Type de porte, Rôle, Utilisée dans] + +
+ + +**33. [Update gate, Relevance gate, Forget gate, Output gate]** + +⟶ [Porte d'actualisation, Porte de pertinence, Porte d'oubli, Porte de sortie] + +
+ + +**34. [How much past should matter now?, Drop previous information?, Erase a cell or not?, How much to reveal of a cell?]** + +⟶ [Dans quelle mesure le passé devrait être important ?, Enlever les informations précédentes ?, Enlever une cellule ?, Combien devrait-on révéler d'une cellule ?] + +
+ + +**35. [LSTM, GRU]** + +⟶ [LSTM, GRU] + +
+ + +**36. GRU/LSTM ― Gated Recurrent Unit (GRU) and Long Short-Term Memory units (LSTM) deal with the vanishing gradient problem encountered by traditional RNNs, with LSTM being a generalization of GRU. Below is a table summing up the characterizing equations of each architecture:** + +⟶ GRU/LSTM ― Les unités de porte récurrente (en anglais Gated Recurrent Unit) (GRU) et les unités de mémoire à long/court terme (en anglais Long Short-Term Memory units) (LSTM) atténuent le problème du gradient qui disparait rencontré par les RNNs traditionnels, le LSTM pouvant être vu comme une généralisation du GRU. Le tableau ci-dessous résume les équations caractéristiques de chacune de ces architectures : + +
+ + +**37. [Characterization, Gated Recurrent Unit (GRU), Long Short-Term Memory (LSTM), Dependencies]** + +⟶ [Caractérisation, Gated Recurrent Unit (GRU), Long Short-Term Memory (LSTM), Dépendances] + +
+ + +**38. Remark: the sign ⋆ denotes the element-wise multiplication between two vectors.** + +⟶ Remarque : le signe ⋆ dénote le produit de Hadamard entre deux vecteurs. + +
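As a hedged sketch (not the authors' code), a single GRU step using the update gate Γu and the relevance gate Γr can be written with NumPy as follows; the parameter names are assumptions made for this example:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x, a_prev, params):
    """One GRU time step; params maps each gate to its (W, U, b) triple."""
    Wu, Uu, bu = params["update"]      # update gate Γu
    Wr, Ur, br = params["relevance"]   # relevance gate Γr
    Wc, Uc, bc = params["candidate"]   # candidate activation c~
    gamma_u = sigmoid(Wu @ x + Uu @ a_prev + bu)
    gamma_r = sigmoid(Wr @ x + Ur @ a_prev + br)
    c_tilde = np.tanh(Wc @ x + Uc @ (gamma_r * a_prev) + bc)
    # ⋆ (element-wise product) interpolates between the candidate and the previous state
    a_next = gamma_u * c_tilde + (1.0 - gamma_u) * a_prev
    return a_next
```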
+ + +**39. Variants of RNNs ― The table below sums up the other commonly used RNN architectures:** + +⟶ Variantes des RNNs ― Le tableau ci-dessous récapitule les autres architectures RNN communément utilisées : + +
+ + +**40. [Bidirectional (BRNN), Deep (DRNN)]** + +⟶ [Bi-directionnel (BRNN), Profond (DRNN)] + +
+ + +**41. Learning word representation** + +⟶ Apprentissage de la représentation de mots + +
+ + +**42. In this section, we note V the vocabulary and |V| its size.** + +⟶ Dans cette section, on note V le vocabulaire et |V| sa taille. + +
+ + +**43. Motivation and notations** + +⟶ Motivation et notations + +
+ + +**44. Representation techniques ― The two main ways of representing words are summed up in the table below:** + +⟶ Techniques de représentation ― Les deux manières principales de représenter des mots sont décrites dans le tableau suivant : + +
+ + +**45. [1-hot representation, Word embedding]** + +⟶ [Représentation binaire, Représentation du mot] + +
+ + +**46. [teddy bear, book, soft]** + +⟶ [ours en peluche, livre, doux] + +
+ + +**47. [Noted ow, Naive approach, no similarity information, Noted ew, Takes into account words similarity]** + +⟶ [Noté ow, Approche naïve, pas d'information de similarité, Noté ew, Prend en compte la similarité des mots] + +
+ + +**48. Embedding matrix ― For a given word w, the embedding matrix E is a matrix that maps its 1-hot representation ow to its embedding ew as follows:** + +⟶ Matrice de représentation ― Pour un mot donné w, la matrice de représentation (en anglais embedding matrix) E est une matrice qui relie une représentation binaire ow à sa représentation correspondante ew de la manière suivante : + +
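A minimal NumPy sketch of the lookup, assuming E stores one column per vocabulary word (the shapes below are purely illustrative):

```python
import numpy as np

vocab_size, embed_dim = 10000, 300
E = np.random.randn(embed_dim, vocab_size)   # embedding matrix (random, illustrative)

w = 42                                       # index of word w in the vocabulary
o_w = np.zeros(vocab_size)
o_w[w] = 1.0                                 # 1-hot representation of w
e_w = E @ o_w                                # embedding of w, i.e. the column E[:, w]
assert np.allclose(e_w, E[:, w])
```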
+ + +**49. Remark: learning the embedding matrix can be done using target/context likelihood models.** + +⟶ Remarque : l'apprentissage de la matrice de représentation peut être effectué en utilisant des modèles probabilistes de cible/contexte. + +
+ + +**50. Word embeddings** + +⟶ Représentation de mots + +
+ + +**51. Word2vec ― Word2vec is a framework aimed at learning word embeddings by estimating the likelihood that a given word is surrounded by other words. Popular models include skip-gram, negative sampling and CBOW.** + +⟶ Word2vec ― Word2vec est un ensemble de techniques visant à apprendre comment représenter les mots en estimant la probabilité qu'un mot donné a d'être entouré par d'autres mots. Le skip-gram, l'échantillonnage négatif et le CBOW font partie des modèles les plus populaires. + +
+ + +**52. [A cute teddy bear is reading, teddy bear, soft, Persian poetry, art]** + +⟶ [Un ours en peluche mignon est en train de lire, ours en peluche, doux, poésie persane, art] + +
+ + +**53. [Train network on proxy task, Extract high-level representation, Compute word embeddings]** + +⟶ [Entraîner le réseau sur une tâche intermédiaire, Extraire la représentation de haut niveau, Calculer la représentation des mots] + +
+ + +**54. Skip-gram ― The skip-gram word2vec model is a supervised learning task that learns word embeddings by assessing the likelihood of any given target word t happening with a context word c. By noting θt a parameter associated with t, the probability P(t|c) is given by:** + +⟶ Skip-gram ― Le modèle word2vec skip-gram est une tâche d'apprentissage supervisé qui apprend la représentation des mots en évaluant la probabilité qu'un mot cible t donné apparaisse avec un mot contexte c. En notant θt le paramètre associé à t, la probabilité P(t|c) est donnée par : + +
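A sketch of that conditional probability in NumPy, assuming `theta` stacks one parameter vector θt per vocabulary word and `e_c` is the embedding of the context word (both names are assumptions for this example):

```python
import numpy as np

def skipgram_prob(theta, e_c, t):
    """P(t|c): softmax over the vocabulary of the scores θ·e_c, evaluated at target t."""
    scores = theta @ e_c                 # shape (|V|,)
    scores = scores - scores.max()       # numerical stability
    probs = np.exp(scores) / np.exp(scores).sum()
    return float(probs[t])
```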
+ + +**55. Remark: summing over the whole vocabulary in the denominator of the softmax part makes this model computationally expensive. CBOW is another word2vec model using the surrounding words to predict a given word.** + +⟶ Remarque : le fait de sommer sur l'ensemble du vocabulaire au dénominateur du softmax rend ce modèle coûteux en temps de calcul. CBOW est un autre modèle word2vec utilisant les mots avoisinants pour prédire un mot donné. + +
+ + +**56. Negative sampling ― It is a set of binary classifiers using logistic regressions that aim at assessing how a given context and a given target words are likely to appear simultaneously, with the models being trained on sets of k negative examples and 1 positive example. Given a context word c and a target word t, the prediction is expressed by:** + +⟶ Échantillonnage négatif ― Cette méthode consiste en un ensemble de classifieurs binaires à base de régressions logistiques qui visent à évaluer dans quelle mesure un mot contexte et un mot cible donnés sont susceptibles d'apparaître simultanément, les modèles étant entraînés sur des ensembles de k exemples négatifs et 1 exemple positif. Étant donnés un mot contexte c et un mot cible t, la prédiction est donnée par : + +
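With the same illustrative conventions as above, the corresponding prediction is a single sigmoid rather than a softmax over the vocabulary:

```python
import numpy as np

def negative_sampling_prob(theta_t, e_c):
    """P(y=1|c,t) = σ(θt·e_c): probability that target t and context c co-occur."""
    return 1.0 / (1.0 + np.exp(-(theta_t @ e_c)))
```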
+ + +**57. Remark: this method is less computationally expensive than the skip-gram model.** + +⟶ Remarque : cette méthode est moins coûteuse en calcul par rapport au modèle skip-gram. + +
+ + +**57bis. GloVe ― The GloVe model, short for global vectors for word representation, is a word embedding technique that uses a co-occurence matrix X where each Xi,j denotes the number of times that a target i occurred with a context j. Its cost function J is as follows:** + +⟶ GloVe ― Le modèle GloVe (en anglais global vectors for word representation) est une technique de représentation des mots qui utilise une matrice de co-occurrence X où chaque Xi,j correspond au nombre de fois qu'une cible i se produit avec un contexte j. Sa fonction de coût J est telle que : + +
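For reference, the weighted least-squares objective reads:

$$J(\theta)=\frac{1}{2}\sum_{i,j=1}^{|V|}f(X_{ij})\big(\theta_i^Te_j+b_i+b'_j-\log(X_{ij})\big)^2$$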
+ + +**58. where f is a weighting function such that Xi,j=0⟹f(Xi,j)=0. Given the symmetry that e and θ play in this model, the final word embedding e(final)w is given by:** + +⟶ où f est une fonction de pondération telle que Xi,j=0⟹f(Xi,j)=0. Étant donné la symétrie que e et θ jouent dans ce modèle, la représentation finale du mot e(final)w est donnée par : + +
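Because of that symmetry, the two sets of vectors are simply averaged:

$$e_w^{(\text{final})}=\frac{e_w+\theta_w}{2}$$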
+ + +**59. Remark: the individual components of the learned word embeddings are not necessarily interpretable.** + +⟶ Remarque : les composantes individuelles de la représentation apprise d'un mot ne sont pas nécessairement interprétables. + +
+ + +**60. Comparing words** + +⟶ Comparaison de mots + +
+ + +**61. Cosine similarity ― The cosine similarity between words w1 and w2 is expressed as follows:** + +⟶ Similarité cosinus ― La similarité cosinus (en anglais cosine similarity) entre les mots w1 et w2 est donnée par : + +
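A one-function NumPy sketch of this similarity:

```python
import numpy as np

def cosine_similarity(e_w1, e_w2):
    """cos(θ) = (w1·w2) / (||w1|| ||w2||), in [-1, 1]."""
    return float(e_w1 @ e_w2 / (np.linalg.norm(e_w1) * np.linalg.norm(e_w2)))
```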
+ + +**62. Remark: θ is the angle between words w1 and w2.** + +⟶ Remarque : θ est l'angle entre les mots w1 et w2. + +
+ + +**63. t-SNE ― t-SNE (t-distributed Stochastic Neighbor Embedding) is a technique aimed at reducing high-dimensional embeddings into a lower dimensional space. In practice, it is commonly used to visualize word vectors in the 2D space.** + +⟶ t-SNE ― La méthode t-SNE (en anglais t-distributed Stochastic Neighbor Embedding) est une technique visant à réduire une représentation dans un espace de haute dimension en un espace de plus faible dimension. En pratique, elle est communément utilisée pour visualiser les vecteurs-mots dans un espace 2D. + +
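For instance, assuming scikit-learn is available, a 2D projection of a set of word vectors could be obtained along these lines (array sizes are illustrative):

```python
import numpy as np
from sklearn.manifold import TSNE

embeddings = np.random.randn(1000, 300)      # 1000 illustrative word vectors
coords_2d = TSNE(n_components=2, perplexity=30.0).fit_transform(embeddings)
# coords_2d has shape (1000, 2) and can be scatter-plotted with word labels
```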
+ + +**64. [literature, art, book, culture, poem, reading, knowledge, entertaining, loveable, childhood, kind, teddy bear, soft, hug, cute, adorable]** + +⟶ [littérature, art, livre, culture, poème, lecture, connaissance, divertissant, aimable, enfance, gentil, ours en peluche, doux, câlin, mignon, adorable] + +
+ + +**65. Language model** + +⟶ Modèle de langage + +
+ + +**66. Overview ― A language model aims at estimating the probability of a sentence P(y).** + +⟶ Vue d'ensemble ― Un modèle de langage vise à estimer la probabilité d'une phrase P(y). + +
+ + +**67. n-gram model ― This model is a naive approach aiming at quantifying the probability that an expression appears in a corpus by counting its number of appearance in the training data.** + +⟶ Modèle n-gram ― Ce modèle consiste en une approche naïve qui vise à quantifier la probabilité qu'une expression apparaisse dans un corpus en comptant son nombre d'apparitions dans les données d'entraînement. + +
+ + +**68. Perplexity ― Language models are commonly assessed using the perplexity metric, also known as PP, which can be interpreted as the inverse probability of the dataset normalized by the number of words T. The perplexity is such that the lower, the better and is defined as follows:** + +⟶ Perplexité ― Les modèles de langage sont communément évalués en utilisant la perplexité, aussi notée PP, qui peut être interprétée comme étant la probabilité inverse des données, normalisée par le nombre de mots T. La perplexité est telle que plus elle est faible, mieux c'est. Elle est définie de la manière suivante : + +
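A small sketch of the computation, assuming `probs` contains the probability the model assigns to each of the T words of the dataset:

```python
import numpy as np

def perplexity(probs):
    """PP = exp(-(1/T) Σ log p(w_t)); the lower, the better."""
    probs = np.asarray(probs, dtype=float)
    return float(np.exp(-np.mean(np.log(probs))))
```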
+ + +**69. Remark: PP is commonly used in t-SNE.** + +⟶ Remarque : PP est souvent utilisée dans le cadre du t-SNE. + +
+ + +**70. Machine translation** + +⟶ Traduction machine + +
+ + +**71. Overview ― A machine translation model is similar to a language model except it has an encoder network placed before. For this reason, it is sometimes referred as a conditional language model. The goal is to find a sentence y such that:** + +⟶ Vue d'ensemble ― Un modèle de traduction machine est similaire à un modèle de langage ayant un réseau encodeur placé en amont. Pour cette raison, ce modèle est parfois appelé modèle de langage conditionnel. Le but est de trouver une phrase y telle que : + +
+ + +**72. Beam search ― It is a heuristic search algorithm used in machine translation and speech recognition to find the likeliest sentence y given an input x.** + +⟶ Recherche en faisceau ― Cette technique (en anglais beam search) est un algorithme de recherche heuristique, utilisé dans le cadre de la traduction machine et de la reconnaissance vocale, qui vise à trouver la phrase la plus probable y sachant l'entrée x. + +
+ + +**73. [Step 1: Find top B likely words y<1>, Step 2: Compute conditional probabilities y|x,y<1>,...,y, Step 3: Keep top B combinations x,y<1>,...,y, End process at a stop word]** + +⟶ [Étape 1 : Trouver les B mots les plus probables y<1>, Étape 2 : Calculer les probabilités conditionnelles y|x,y<1>,...,y, Étape 3 : Garder les B combinaisons les plus probables x,y<1>,...,y, Arrêter la procédure à un mot stop] + +
+ + +**74. Remark: if the beam width is set to 1, then this is equivalent to a naive greedy search.** + +⟶ Remarque : si la largeur du faisceau est prise égale à 1, alors ceci est équivalent à un algorithme glouton. + +
+ + +**75. Beam width ― The beam width B is a parameter for beam search. Large values of B yield to better result but with slower performance and increased memory. Small values of B lead to worse results but is less computationally intensive. A standard value for B is around 10.** + +⟶ Largeur du faisceau ― La largeur du faisceau (en anglais beam width) B est un paramètre de la recherche en faisceau. De grandes valeurs de B conduisent à de meilleurs résultats, mais au prix d'une mémoire plus importante et d'un temps de calcul plus long. De faibles valeurs de B conduisent à de moins bons résultats mais avec un coût de calcul plus faible. Une valeur de B aux alentours de 10 est standard. + +
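A schematic, non-optimized sketch of the procedure, assuming a step function `next_log_probs(prefix)` (a hypothetical helper for this example) that returns the log-probabilities of every possible next token:

```python
import numpy as np

def beam_search(next_log_probs, start_token, stop_token, B=10, max_len=50):
    """Keep the B most likely prefixes at each step until the stop token is produced."""
    beams = [([start_token], 0.0)]                 # (prefix, cumulative log-probability)
    finished = []
    for _ in range(max_len):
        candidates = []
        for prefix, score in beams:
            log_p = next_log_probs(prefix)         # shape (vocab_size,)
            for token in np.argsort(log_p)[-B:]:   # top-B continuations of this prefix
                candidates.append((prefix + [int(token)], score + float(log_p[token])))
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for prefix, score in candidates[:B]:       # keep the B best combinations overall
            (finished if prefix[-1] == stop_token else beams).append((prefix, score))
        if not beams:                              # every kept hypothesis has ended
            break
    return max(finished + beams, key=lambda c: c[1])
```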
+ + +**76. Length normalization ― In order to improve numerical stability, beam search is usually applied on the following normalized objective, often called the normalized log-likelihood objective, defined as:** + +⟶ Normalisation de longueur ― Pour que la stabilité numérique puisse être améliorée, la recherche en faisceau utilise un objectif normalisé, souvent appelé l'objectif de log-probabilité normalisé, défini par : + +
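In symbols, with Ty the length of the candidate sentence:

$$\frac{1}{T_y^{\alpha}}\sum_{t=1}^{T_y}\log\Big[p\big(y^{\langle t\rangle}\mid x,y^{\langle 1\rangle},\dots,y^{\langle t-1\rangle}\big)\Big]$$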
+ + +**77. Remark: the parameter α can be seen as a softener, and its value is usually between 0.5 and 1.** + +⟶ Remarque : le paramètre α peut être vu comme un facteur d'adoucissement, et sa valeur est souvent comprise entre 0.5 et 1. + +
+ + +**78. Error analysis ― When obtaining a predicted translation ˆy that is bad, one can wonder why we did not get a good translation y∗ by performing the following error analysis:** + +⟶ Analyse d'erreur ― Lorsque l'on obtient une mauvaise traduction prédite ˆy, on peut se demander la raison pour laquelle l'algorithme n'a pas obtenu une bonne traduction y∗ en faisant une analyse d'erreur de la manière suivante : + +
+ + +**79. [Case, Root cause, Remedies]** + +⟶ [Cas, Cause, Remèdes] + +
+ + +**80. [Beam search faulty, RNN faulty, Increase beam width, Try different architecture, Regularize, Get more data]** + +⟶ [Recherche en faisceau défectueuse, RNN défectueux, Augmenter la largeur du faisceau, Essayer une différente architecture, Régulariser, Obtenir plus de données] + +
+ + +**81. Bleu score ― The bilingual evaluation understudy (bleu) score quantifies how good a machine translation is by computing a similarity score based on n-gram precision. It is defined as follows:** + +⟶ Score bleu ― Le score bleu (en anglais bilingual evaluation understudy) a pour but de quantifier à quel point une traduction est bonne en calculant un score de similarité basé sur une précision n-gram. Il est défini de la manière suivante : + +
+ + +**82. where pn is the bleu score on n-gram only defined as follows:** + +⟶ où pn est le score bleu uniquement basé sur les n-gram, défini par : + +
+ + +**83. Remark: a brevity penalty may be applied to short predicted translations to prevent an artificially inflated bleu score.** + +⟶ Remarque : une pénalité de brièveté peut être appliquée aux traductions prédites courtes pour empêcher que le score bleu soit artificiellement haut. + +
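A simplified, single-reference sketch of the score (reference implementations handle more corner cases, such as multiple references):

```python
from collections import Counter
import numpy as np

def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def ngram_precision(candidate, reference, n):
    """Clipped n-gram precision p_n of the candidate against a single reference."""
    cand, ref = Counter(ngrams(candidate, n)), Counter(ngrams(reference, n))
    overlap = sum(min(count, ref[g]) for g, count in cand.items())
    return overlap / max(sum(cand.values()), 1)

def bleu(candidate, reference, max_n=4):
    precisions = [ngram_precision(candidate, reference, n) for n in range(1, max_n + 1)]
    if min(precisions) == 0.0:
        return 0.0
    # brevity penalty: penalize candidates shorter than the reference
    bp = min(1.0, np.exp(1.0 - len(reference) / len(candidate)))
    return float(bp * np.exp(np.mean(np.log(precisions))))
```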
+ + +**84. Attention** + +⟶ Attention + +
+ + +**85. Attention model ― This model allows an RNN to pay attention to specific parts of the input that is considered as being important, which improves the performance of the resulting model in practice. By noting α the amount of attention that the output y should pay to the activation a and c the context at time t, we have:** + +⟶ Modèle d'attention ― Le modèle d'attention (en anglais attention model) permet au RNN de mettre en valeur des parties spécifiques de l'entrée qui peuvent être considérées comme étant importantes, ce qui améliore la performance du modèle final en pratique. En notant α la quantité d'attention que la sortie y devrait porter à l'activation a et c le contexte à l'instant t, on a : + +
+ + +**86. with** + +⟶ avec + +
+ + +**87. Remark: the attention scores are commonly used in image captioning and machine translation.** + +⟶ Remarque : les scores d'attention sont communément utilisés dans la génération de légende d'image ainsi que dans la traduction machine. + +
+ + +**88. A cute teddy bear is reading Persian literature.** + +⟶ Un ours en peluche mignon est en train de lire de la littérature persane. + +
+ + +**89. Attention weight ― The amount of attention that the output y should pay to the activation a is given by α computed as follows:** + +⟶ Coefficient d'attention ― La quantité d'attention que la sortie y devrait porter à l'activation a est donnée par α, calculé de la manière suivante : + +
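A NumPy sketch of the weights and of the resulting context vector, where `scores` stands for the unnormalized relevance e⟨t,t′⟩ of each activation (an assumed input for this example):

```python
import numpy as np

def attention(scores, activations):
    """α = softmax(scores); context c = Σ_t' α⟨t,t'⟩ a⟨t'⟩."""
    scores = scores - scores.max()                  # numerical stability
    alpha = np.exp(scores) / np.exp(scores).sum()   # attention weights, sum to 1
    context = alpha @ activations                   # activations has shape (Tx, d)
    return alpha, context
```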
+ + +**90. Remark: computation complexity is quadratic with respect to Tx.** + +⟶ Remarque : la complexité de calcul est quadratique par rapport à Tx. + +
+ + +**91. The Deep Learning cheatsheets are now available in [target language].** + +⟶ Les pense-bêtes d'apprentissage profond sont maintenant disponibles en français. + +
+ +**92. Original authors** + +⟶ Auteurs + +
+ +**93. Translated by X, Y and Z** + +⟶ Traduit par X, Y et Z + +
+ +**94. Reviewed by X, Y and Z** + +⟶ Relu par X, Y et Z + +
+ +**95. View PDF version on GitHub** + +⟶ Voir la version PDF sur GitHub + +
+ +**96. By X and Y** + +⟶ Par X et Y + +
diff --git a/he/cheatsheet-deep-learning.md b/he/cheatsheet-deep-learning.md deleted file mode 100644 index a5aa3756c..000000000 --- a/he/cheatsheet-deep-learning.md +++ /dev/null @@ -1,321 +0,0 @@ -**1. Deep Learning cheatsheet** - -⟶ - -
- -**2. Neural Networks** - -⟶ - -
- -**3. Neural networks are a class of models that are built with layers. Commonly used types of neural networks include convolutional and recurrent neural networks.** - -⟶ - -
- -**4. Architecture ― The vocabulary around neural networks architectures is described in the figure below:** - -⟶ - -
- -**5. [Input layer, hidden layer, output layer]** - -⟶ - -
- -**6. By noting i the ith layer of the network and j the jth hidden unit of the layer, we have:** - -⟶ - -
- -**7. where we note w, b, z the weight, bias and output respectively.** - -⟶ - -
- -**8. Activation function ― Activation functions are used at the end of a hidden unit to introduce non-linear complexities to the model. Here are the most common ones:** - -⟶ - -
- -**9. [Sigmoid, Tanh, ReLU, Leaky ReLU]** - -⟶ - -
- -**10. Cross-entropy loss ― In the context of neural networks, the cross-entropy loss L(z,y) is commonly used and is defined as follows:** - -⟶ - -
- -**11. Learning rate ― The learning rate, often noted α or sometimes η, indicates at which pace the weights get updated. This can be fixed or adaptively changed. The current most popular method is called Adam, which is a method that adapts the learning rate.** - -⟶ - -
- -**12. Backpropagation ― Backpropagation is a method to update the weights in the neural network by taking into account the actual output and the desired output. The derivative with respect to weight w is computed using chain rule and is of the following form:** - -⟶ - -
- -**13. As a result, the weight is updated as follows:** - -⟶ - -
- -**14. Updating weights ― In a neural network, weights are updated as follows:** - -⟶ - -
- -**15. Step 1: Take a batch of training data.** - -⟶ - -
- -**16. Step 2: Perform forward propagation to obtain the corresponding loss.** - -⟶ - -
- -**17. Step 3: Backpropagate the loss to get the gradients.** - -⟶ - -
- -**18. Step 4: Use the gradients to update the weights of the network.** - -⟶ - -
- -**19. Dropout ― Dropout is a technique meant at preventing overfitting the training data by dropping out units in a neural network. In practice, neurons are either dropped with probability p or kept with probability 1−p** - -⟶ - -
- -**20. Convolutional Neural Networks** - -⟶ - -
- -**21. Convolutional layer requirement ― By noting W the input volume size, F the size of the convolutional layer neurons, P the amount of zero padding, then the number of neurons N that fit in a given volume is such that:** - -⟶ - -
- -**22. Batch normalization ― It is a step of hyperparameter γ,β that normalizes the batch {xi}. By noting μB,σ2B the mean and variance of that we want to correct to the batch, it is done as follows:** - -⟶ - -
- -**23. It is usually done after a fully connected/convolutional layer and before a non-linearity layer and aims at allowing higher learning rates and reducing the strong dependence on initialization.** - -⟶ - -
- -**24. Recurrent Neural Networks** - -⟶ - -
- -**25. Types of gates ― Here are the different types of gates that we encounter in a typical recurrent neural network:** - -⟶ - -
- -**26. [Input gate, forget gate, gate, output gate]** - -⟶ - -
- -**27. [Write to cell or not?, Erase a cell or not?, How much to write to cell?, How much to reveal cell?]** - -⟶ - -
- -**28. LSTM ― A long short-term memory (LSTM) network is a type of RNN model that avoids the vanishing gradient problem by adding 'forget' gates.** - -⟶ - -
- -**29. Reinforcement Learning and Control** - -⟶ - -
- -**30. The goal of reinforcement learning is for an agent to learn how to evolve in an environment.** - -⟶ - -
- -**31. Definitions** - -⟶ - -
- -**32. Markov decision processes ― A Markov decision process (MDP) is a 5-tuple (S,A,{Psa},γ,R) where:** - -⟶ - -
- -**33. S is the set of states** - -⟶ - -
- -**34. A is the set of actions** - -⟶ - -
- -**35. {Psa} are the state transition probabilities for s∈S and a∈A** - -⟶ - -
- -**36. γ∈[0,1[ is the discount factor** - -⟶ - -
- -**37. R:S×A⟶R or R:S⟶R is the reward function that the algorithm wants to maximize** - -⟶ - -
- -**38. Policy ― A policy π is a function π:S⟶A that maps states to actions.** - -⟶ - -
- -**39. Remark: we say that we execute a given policy π if given a state s we take the action a=π(s).** - -⟶ - -
- -**40. Value function ― For a given policy π and a given state s, we define the value function Vπ as follows:** - -⟶ - -
- -**41. Bellman equation ― The optimal Bellman equations characterizes the value function Vπ∗ of the optimal policy π∗:** - -⟶ - -
- -**42. Remark: we note that the optimal policy π∗ for a given state s is such that:** - -⟶ - -
- -**43. Value iteration algorithm ― The value iteration algorithm is in two steps:** - -⟶ - -
- -**44. 1) We initialize the value:** - -⟶ - -
- -**45. 2) We iterate the value based on the values before:** - -⟶ - -
- -**46. Maximum likelihood estimate ― The maximum likelihood estimates for the state transition probabilities are as follows:** - -⟶ - -
- -**47. times took action a in state s and got to s′** - -⟶ - -
- -**48. times took action a in state s** - -⟶ - -
- -**49. Q-learning ― Q-learning is a model-free estimation of Q, which is done as follows:** - -⟶ - -
- -**50. View PDF version on GitHub** - -⟶ - -
- -**51. [Neural Networks, Architecture, Activation function, Backpropagation, Dropout]** - -⟶ - -
- -**52. [Convolutional Neural Networks, Convolutional layer, Batch normalization]** - -⟶ - -
- -**53. [Recurrent Neural Networks, Gates, LSTM]** - -⟶ - -
- -**54. [Reinforcement learning, Markov decision processes, Value/policy iteration, Approximate dynamic programming, Policy search]** - -⟶ diff --git a/he/cheatsheet-machine-learning-tips-and-tricks.md b/he/cheatsheet-machine-learning-tips-and-tricks.md deleted file mode 100644 index 9712297b8..000000000 --- a/he/cheatsheet-machine-learning-tips-and-tricks.md +++ /dev/null @@ -1,285 +0,0 @@ -**1. Machine Learning tips and tricks cheatsheet** - -⟶ - -
- -**2. Classification metrics** - -⟶ - -
- -**3. In a context of a binary classification, here are the main metrics that are important to track in order to assess the performance of the model.** - -⟶ - -
- -**4. Confusion matrix ― The confusion matrix is used to have a more complete picture when assessing the performance of a model. It is defined as follows:** - -⟶ - -
- -**5. [Predicted class, Actual class]** - -⟶ - -
- -**6. Main metrics ― The following metrics are commonly used to assess the performance of classification models:** - -⟶ - -
- -**7. [Metric, Formula, Interpretation]** - -⟶ - -
- -**8. Overall performance of model** - -⟶ - -
- -**9. How accurate the positive predictions are** - -⟶ - -
- -**10. Coverage of actual positive sample** - -⟶ - -
- -**11. Coverage of actual negative sample** - -⟶ - -
- -**12. Hybrid metric useful for unbalanced classes** - -⟶ - -
- -**13. ROC ― The receiver operating curve, also noted ROC, is the plot of TPR versus FPR by varying the threshold. These metrics are are summed up in the table below:** - -⟶ - -
- -**14. [Metric, Formula, Equivalent]** - -⟶ - -
- -**15. AUC ― The area under the receiving operating curve, also noted AUC or AUROC, is the area below the ROC as shown in the following figure:** - -⟶ - -
- -**16. [Actual, Predicted]** - -⟶ - -
- -**17. Basic metrics ― Given a regression model f, the following metrics are commonly used to assess the performance of the model:** - -⟶ - -
- -**18. [Total sum of squares, Explained sum of squares, Residual sum of squares]** - -⟶ - -
- -**19. Coefficient of determination ― The coefficient of determination, often noted R2 or r2, provides a measure of how well the observed outcomes are replicated by the model and is defined as follows:** - -⟶ - -
- -**20. Main metrics ― The following metrics are commonly used to assess the performance of regression models, by taking into account the number of variables n that they take into consideration:** - -⟶ - -
- -**21. where L is the likelihood and ˆσ2 is an estimate of the variance associated with each response.** - -⟶ - -
- -**22. Model selection** - -⟶ - -
- -**23. Vocabulary ― When selecting a model, we distinguish 3 different parts of the data that we have as follows:** - -⟶ - -
- -**24. [Training set, Validation set, Testing set]** - -⟶ - -
- -**25. [Model is trained, Model is assessed, Model gives predictions]** - -⟶ - -
- -**26. [Usually 80% of the dataset, Usually 20% of the dataset]** - -⟶ - -
- -**27. [Also called hold-out or development set, Unseen data]** - -⟶ - -
- -**28. Once the model has been chosen, it is trained on the entire dataset and tested on the unseen test set. These are represented in the figure below:** - -⟶ - -
- -**29. Cross-validation ― Cross-validation, also noted CV, is a method that is used to select a model that does not rely too much on the initial training set. The different types are summed up in the table below:** - -⟶ - -
- -**30. [Training on k−1 folds and assessment on the remaining one, Training on n−p observations and assessment on the p remaining ones]** - -⟶ - -
- -**31. [Generally k=5 or 10, Case p=1 is called leave-one-out]** - -⟶ - -
- -**32. The most commonly used method is called k-fold cross-validation and splits the training data into k folds to validate the model on one fold while training the model on the k−1 other folds, all of this k times. The error is then averaged over the k folds and is named cross-validation error.** - -⟶ - -
- -**33. Regularization ― The regularization procedure aims at avoiding the model to overfit the data and thus deals with high variance issues. The following table sums up the different types of commonly used regularization techniques:** - -⟶ - -
- -**34. [Shrinks coefficients to 0, Good for variable selection, Makes coefficients smaller, Tradeoff between variable selection and small coefficients]** - -⟶ - -
- -**35. Diagnostics** - -⟶ - -
- -**36. Bias ― The bias of a model is the difference between the expected prediction and the correct model that we try to predict for given data points.** - -⟶ - -
- -**37. Variance ― The variance of a model is the variability of the model prediction for given data points.** - -⟶ - -
- -**38. Bias/variance tradeoff ― The simpler the model, the higher the bias, and the more complex the model, the higher the variance.** - -⟶ - -
- -**39. [Symptoms, Regression illustration, classification illustration, deep learning illustration, possible remedies]** - -⟶ - -
- -**40. [High training error, Training error close to test error, High bias, Training error slightly lower than test error, Very low training error, Training error much lower than test error, High variance]** - -⟶ - -
- -**41. [Complexify model, Add more features, Train longer, Perform regularization, Get more data]** - -⟶ - -
- -**42. Error analysis ― Error analysis is analyzing the root cause of the difference in performance between the current and the perfect models.** - -⟶ - -
- -**43. Ablative analysis ― Ablative analysis is analyzing the root cause of the difference in performance between the current and the baseline models.** - -⟶ - -
- -**44. Regression metrics** - -⟶ - -
- -**45. [Classification metrics, confusion matrix, accuracy, precision, recall, F1 score, ROC]** - -⟶ - -
- -**46. [Regression metrics, R squared, Mallow's CP, AIC, BIC]** - -⟶ - -
- -**47. [Model selection, cross-validation, regularization]** - -⟶ - -
- -**48. [Diagnostics, Bias/variance tradeoff, error/ablative analysis]** - -⟶ diff --git a/he/cheatsheet-supervised-learning.md b/he/cheatsheet-supervised-learning.md deleted file mode 100644 index a6b19ea1c..000000000 --- a/he/cheatsheet-supervised-learning.md +++ /dev/null @@ -1,567 +0,0 @@ -**1. Supervised Learning cheatsheet** - -⟶ - -
- -**2. Introduction to Supervised Learning** - -⟶ - -
- -**3. Given a set of data points {x(1),...,x(m)} associated to a set of outcomes {y(1),...,y(m)}, we want to build a classifier that learns how to predict y from x.** - -⟶ - -
- -**4. Type of prediction ― The different types of predictive models are summed up in the table below:** - -⟶ - -
- -**5. [Regression, Classifier, Outcome, Examples]** - -⟶ - -
- -**6. [Continuous, Class, Linear regression, Logistic regression, SVM, Naive Bayes]** - -⟶ - -
- -**7. Type of model ― The different models are summed up in the table below:** - -⟶ - -
- -**8. [Discriminative model, Generative model, Goal, What's learned, Illustration, Examples]** - -⟶ - -
- -**9. [Directly estimate P(y|x), Estimate P(x|y) to then deduce P(y|x), Decision boundary, Probability distributions of the data, Regressions, SVMs, GDA, Naive Bayes]** - -⟶ - -
- -**10. Notations and general concepts** - -⟶ - -
- -**11. Hypothesis ― The hypothesis is noted hθ and is the model that we choose. For a given input data x(i) the model prediction output is hθ(x(i)).** - -⟶ - -
- -**12. Loss function ― A loss function is a function L:(z,y)∈R×Y⟼L(z,y)∈R that takes as inputs the predicted value z corresponding to the real data value y and outputs how different they are. The common loss functions are summed up in the table below:** - -⟶ - -
- -**13. [Least squared error, Logistic loss, Hinge loss, Cross-entropy]** - -⟶ - -
- -**14. [Linear regression, Logistic regression, SVM, Neural Network]** - -⟶ - -
- -**15. Cost function ― The cost function J is commonly used to assess the performance of a model, and is defined with the loss function L as follows:** - -⟶ - -
- -**16. Gradient descent ― By noting α∈R the learning rate, the update rule for gradient descent is expressed with the learning rate and the cost function J as follows:** - -⟶ - -
- -**17. Remark: Stochastic gradient descent (SGD) is updating the parameter based on each training example, and batch gradient descent is on a batch of training examples.** - -⟶ - -
- -**18. Likelihood ― The likelihood of a model L(θ) given parameters θ is used to find the optimal parameters θ through maximizing the likelihood. In practice, we use the log-likelihood ℓ(θ)=log(L(θ)) which is easier to optimize. We have:** - -⟶ - -
- -**19. Newton's algorithm ― The Newton's algorithm is a numerical method that finds θ such that ℓ′(θ)=0. Its update rule is as follows:** - -⟶ - -
- -**20. Remark: the multidimensional generalization, also known as the Newton-Raphson method, has the following update rule:** - -⟶ - -
- -**21. Linear models** - -⟶ - -
- -**22. Linear regression** - -⟶ - -
- -**23. We assume here that y|x;θ∼N(μ,σ2)** - -⟶ - -
- -**24. Normal equations ― By noting X the matrix design, the value of θ that minimizes the cost function is a closed-form solution such that:** - -⟶ - -
- -**25. LMS algorithm ― By noting α the learning rate, the update rule of the Least Mean Squares (LMS) algorithm for a training set of m data points, which is also known as the Widrow-Hoff learning rule, is as follows:** - -⟶ - -
- -**26. Remark: the update rule is a particular case of the gradient ascent.** - -⟶ - -
- -**27. LWR ― Locally Weighted Regression, also known as LWR, is a variant of linear regression that weights each training example in its cost function by w(i)(x), which is defined with parameter τ∈R as:** - -⟶ - -
- -**28. Classification and logistic regression** - -⟶ - -
- -**29. Sigmoid function ― The sigmoid function g, also known as the logistic function, is defined as follows:** - -⟶ - -
- -**30. Logistic regression ― We assume here that y|x;θ∼Bernoulli(ϕ). We have the following form:** - -⟶ - -
- -**31. Remark: there is no closed form solution for the case of logistic regressions.** - -⟶ - -
- -**32. Softmax regression ― A softmax regression, also called a multiclass logistic regression, is used to generalize logistic regression when there are more than 2 outcome classes. By convention, we set θK=0, which makes the Bernoulli parameter ϕi of each class i equal to:** - -⟶ - -
- -**33. Generalized Linear Models** - -⟶ - -
- -**34. Exponential family ― A class of distributions is said to be in the exponential family if it can be written in terms of a natural parameter, also called the canonical parameter or link function, η, a sufficient statistic T(y) and a log-partition function a(η) as follows:** - -⟶ - -
- -**35. Remark: we will often have T(y)=y. Also, exp(−a(η)) can be seen as a normalization parameter that will make sure that the probabilities sum to one.** - -⟶ - -
- -**36. Here are the most common exponential distributions summed up in the following table:** - -⟶ - -
- -**37. [Distribution, Bernoulli, Gaussian, Poisson, Geometric]** - -⟶ - -
- -**38. Assumptions of GLMs ― Generalized Linear Models (GLM) aim at predicting a random variable y as a function fo x∈Rn+1 and rely on the following 3 assumptions:** - -⟶ - -
- -**39. Remark: ordinary least squares and logistic regression are special cases of generalized linear models.** - -⟶ - -
- -**40. Support Vector Machines** - -⟶ - -
- -**41: The goal of support vector machines is to find the line that maximizes the minimum distance to the line.** - -⟶ - -
- -**42: Optimal margin classifier ― The optimal margin classifier h is such that:** - -⟶ - -
- -**43: where (w,b)∈Rn×R is the solution of the following optimization problem:** - -⟶ - -
- -**44. such that** - -⟶ - -
- -**45. support vectors** - -⟶ - -
- -**46. Remark: the line is defined as wTx−b=0.** - -⟶ - -
- -**47. Hinge loss ― The hinge loss is used in the setting of SVMs and is defined as follows:** - -⟶ - -
- -**48. Kernel ― Given a feature mapping ϕ, we define the kernel K to be defined as:** - -⟶ - -
- -**49. In practice, the kernel K defined by K(x,z)=exp(−||x−z||22σ2) is called the Gaussian kernel and is commonly used.** - -⟶ - -
- -**50. [Non-linear separability, Use of a kernel mapping, Decision boundary in the original space]** - -⟶ - -
- -**51. Remark: we say that we use the "kernel trick" to compute the cost function using the kernel because we actually don't need to know the explicit mapping ϕ, which is often very complicated. Instead, only the values K(x,z) are needed.** - -⟶ - -
- -**52. Lagrangian ― We define the Lagrangian L(w,b) as follows:** - -⟶ - -
- -**53. Remark: the coefficients βi are called the Lagrange multipliers.** - -⟶ - -
- -**54. Generative Learning** - -⟶ - -
- -**55. A generative model first tries to learn how the data is generated by estimating P(x|y), which we can then use to estimate P(y|x) by using Bayes' rule.** - -⟶ - -
- -**56. Gaussian Discriminant Analysis** - -⟶ - -
- -**57. Setting ― The Gaussian Discriminant Analysis assumes that y and x|y=0 and x|y=1 are such that:** - -⟶ - -
- -**58. Estimation ― The following table sums up the estimates that we find when maximizing the likelihood:** - -⟶ - -
- -**59. Naive Bayes** - -⟶ - -
- -**60. Assumption ― The Naive Bayes model supposes that the features of each data point are all independent:** - -⟶ - -
- -**61. Solutions ― Maximizing the log-likelihood gives the following solutions, with k∈{0,1},l∈[[1,L]]** - -⟶ - -
- -**62. Remark: Naive Bayes is widely used for text classification and spam detection.** - -⟶ - -
- -**63. Tree-based and ensemble methods** - -⟶ - -
- -**64. These methods can be used for both regression and classification problems.** - -⟶ - -
- -**65. CART ― Classification and Regression Trees (CART), commonly known as decision trees, can be represented as binary trees. They have the advantage to be very interpretable.** - -⟶ - -
- -**66. Random forest ― It is a tree-based technique that uses a high number of decision trees built out of randomly selected sets of features. Contrary to the simple decision tree, it is highly uninterpretable but its generally good performance makes it a popular algorithm.** - -⟶ - -
- -**67. Remark: random forests are a type of ensemble methods.** - -⟶ - -
- -**68. Boosting ― The idea of boosting methods is to combine several weak learners to form a stronger one. The main ones are summed up in the table below:** - -⟶ - -
- -**69. [Adaptive boosting, Gradient boosting]** - -⟶ - -
- -**70. High weights are put on errors to improve at the next boosting step** - -⟶ - -
- -**71. Weak learners trained on remaining errors** - -⟶ - -
- -**72. Other non-parametric approaches** - -⟶ - -
- -**73. k-nearest neighbors ― The k-nearest neighbors algorithm, commonly known as k-NN, is a non-parametric approach where the response of a data point is determined by the nature of its k neighbors from the training set. It can be used in both classification and regression settings.** - -⟶ - -
- -**74. Remark: The higher the parameter k, the higher the bias, and the lower the parameter k, the higher the variance.** - -⟶ - -
- -**75. Learning Theory** - -⟶ - -
- -**76. Union bound ― Let A1,...,Ak be k events. We have:** - -⟶ - -
- -**77. Hoeffding inequality ― Let Z1,..,Zm be m iid variables drawn from a Bernoulli distribution of parameter ϕ. Let ˆϕ be their sample mean and γ>0 fixed. We have:** - -⟶ - -
- -**78. Remark: this inequality is also known as the Chernoff bound.** - -⟶ - -
- -**79. Training error ― For a given classifier h, we define the training error ˆϵ(h), also known as the empirical risk or empirical error, to be as follows:** - -⟶ - -
- -**80. Probably Approximately Correct (PAC) ― PAC is a framework under which numerous results on learning theory were proved, and has the following set of assumptions: ** - -⟶ - -
- -**81: the training and testing sets follow the same distribution ** - -⟶ - -
- -**82. the training examples are drawn independently** - -⟶ - -
- -**83. Shattering ― Given a set S={x(1),...,x(d)}, and a set of classifiers H, we say that H shatters S if for any set of labels {y(1),...,y(d)}, we have:** - -⟶ - -
- -**84. Upper bound theorem ― Let H be a finite hypothesis class such that |H|=k and let δ and the sample size m be fixed. Then, with probability of at least 1−δ, we have:** - -⟶ - -
- -**85. VC dimension ― The Vapnik-Chervonenkis (VC) dimension of a given infinite hypothesis class H, noted VC(H) is the size of the largest set that is shattered by H.** - -⟶ - -
- -**86. Remark: the VC dimension of H={set of linear classifiers in 2 dimensions} is 3.** - -⟶ - -
- -**87. Theorem (Vapnik) ― Let H be given, with VC(H)=d and m the number of training examples. With probability at least 1−δ, we have:** - -⟶ - -
- -**88. [Introduction, Type of prediction, Type of model]** - -⟶ - -
- -**89. [Notations and general concepts, loss function, gradient descent, likelihood]** - -⟶ - -
- -**90. [Linear models, linear regression, logistic regression, generalized linear models]** - -⟶ - -
- -**91. [Support vector machines, Optimal margin classifier, Hinge loss, Kernel]** - -⟶ - -
- -**92. [Generative learning, Gaussian Discriminant Analysis, Naive Bayes]** - -⟶ - -
- -**93. [Trees and ensemble methods, CART, Random forest, Boosting]** - -⟶ - -
- -**94. [Other methods, k-NN]** - -⟶ - -
- -**95. [Learning theory, Hoeffding inequality, PAC, VC dimension]** - -⟶ diff --git a/he/refresher-probability.md b/he/refresher-probability.md deleted file mode 100644 index 5c9b34656..000000000 --- a/he/refresher-probability.md +++ /dev/null @@ -1,381 +0,0 @@ -**1. Probabilities and Statistics refresher** - -⟶ - -
- -**2. Introduction to Probability and Combinatorics** - -⟶ - -
- -**3. Sample space ― The set of all possible outcomes of an experiment is known as the sample space of the experiment and is denoted by S.** - -⟶ - -
- -**4. Event ― Any subset E of the sample space is known as an event. That is, an event is a set consisting of possible outcomes of the experiment. If the outcome of the experiment is contained in E, then we say that E has occurred.** - -⟶ - -
- -**5. Axioms of probability For each event E, we denote P(E) as the probability of event E occuring.** - -⟶ - -
- -**6. Axiom 1 ― Every probability is between 0 and 1 included, i.e:** - -⟶ - -
- -**7. Axiom 2 ― The probability that at least one of the elementary events in the entire sample space will occur is 1, i.e:** - -⟶ - -
- -**8. Axiom 3 ― For any sequence of mutually exclusive events E1,...,En, we have:** - -⟶ - -
- -**9. Permutation ― A permutation is an arrangement of r objects from a pool of n objects, in a given order. The number of such arrangements is given by P(n,r), defined as:** - -⟶ - -
- -**10. Combination ― A combination is an arrangement of r objects from a pool of n objects, where the order does not matter. The number of such arrangements is given by C(n,r), defined as:** - -⟶ - -
- -**11. Remark: we note that for 0⩽r⩽n, we have P(n,r)⩾C(n,r)** - -⟶ - -
- -**12. Conditional Probability** - -⟶ - -
- -**13. Bayes' rule ― For events A and B such that P(B)>0, we have:** - -⟶ - -
- -**14. Remark: we have P(A∩B)=P(A)P(B|A)=P(A|B)P(B)** - -⟶ - -
- -**15. Partition ― Let {Ai,i∈[[1,n]]} be such that for all i, Ai≠∅. We say that {Ai} is a partition if we have:** - -⟶ - -
- -**16. Remark: for any event B in the sample space, we have P(B)=n∑i=1P(B|Ai)P(Ai).** - -⟶ - -
- -**17. Extended form of Bayes' rule ― Let {Ai,i∈[[1,n]]} be a partition of the sample space. We have:** - -⟶ - -
- -**18. Independence ― Two events A and B are independent if and only if we have:** - -⟶ - -
- -**19. Random Variables** - -⟶ - -
- -**20. Definitions** - -⟶ - -
- -**21. Random variable ― A random variable, often noted X, is a function that maps every element in a sample space to a real line.** - -⟶ - -
- -**22. Cumulative distribution function (CDF) ― The cumulative distribution function F, which is monotonically non-decreasing and is such that limx→−∞F(x)=0 and limx→+∞F(x)=1, is defined as:** - -⟶ - -
- -**23. Remark: we have P(a - -**24. Probability density function (PDF) ― The probability density function f is the probability that X takes on values between two adjacent realizations of the random variable.** - -⟶ - -
- -**25. Relationships involving the PDF and CDF ― Here are the important properties to know in the discrete (D) and the continuous (C) cases.** - -⟶ - -
- -**26. [Case, CDF F, PDF f, Properties of PDF]** - -⟶ - -
- -**27. Expectation and Moments of the Distribution ― Here are the expressions of the expected value E[X], generalized expected value E[g(X)], kth moment E[Xk] and characteristic function ψ(ω) for the discrete and continuous cases:** - -⟶ - -
- -**28. Variance ― The variance of a random variable, often noted Var(X) or σ2, is a measure of the spread of its distribution function. It is determined as follows:** - -⟶ - -
- -**29. Standard deviation ― The standard deviation of a random variable, often noted σ, is a measure of the spread of its distribution function which is compatible with the units of the actual random variable. It is determined as follows:** - -⟶ - -
- -**30. Transformation of random variables ― Let the variables X and Y be linked by some function. By noting fX and fY the distribution function of X and Y respectively, we have:** - -⟶ - -
- -**31. Leibniz integral rule ― Let g be a function of x and potentially c, and a,b boundaries that may depend on c. We have:** - -⟶ - -
- -**32. Probability Distributions** - -⟶ - -
- -**33. Chebyshev's inequality ― Let X be a random variable with expected value μ. For k,σ>0, we have the following inequality:** - -⟶ - -
- -**34. Main distributions ― Here are the main distributions to have in mind:** - -⟶ - -
- -**35. [Type, Distribution]** - -⟶ - -
- -**36. Jointly Distributed Random Variables** - -⟶ - -
- -**37. Marginal density and cumulative distribution ― From the joint density probability function fXY , we have** - -⟶ - -
- -**38. [Case, Marginal density, Cumulative function]** - -⟶ - -
- -**39. Conditional density ― The conditional density of X with respect to Y, often noted fX|Y, is defined as follows:** - -⟶ - -
- -**40. Independence ― Two random variables X and Y are said to be independent if we have:** - -⟶ - -
- -**41. Covariance ― We define the covariance of two random variables X and Y, that we note σ2XY or more commonly Cov(X,Y), as follows:** - -⟶ - -
- -**42. Correlation ― By noting σX,σY the standard deviations of X and Y, we define the correlation between the random variables X and Y, noted ρXY, as follows:** - -⟶ - -
- -**43. Remark 1: we note that for any random variables X,Y, we have ρXY∈[−1,1].** - -⟶ - -
- -**44. Remark 2: If X and Y are independent, then ρXY=0.** - -⟶ - -
- -**45. Parameter estimation** - -⟶ - -
- -**46. Definitions** - -⟶ - -
- -**47. Random sample ― A random sample is a collection of n random variables X1,...,Xn that are independent and identically distributed with X.** - -⟶ - -
- -**48. Estimator ― An estimator is a function of the data that is used to infer the value of an unknown parameter in a statistical model.** - -⟶ - -
- -**49. Bias ― The bias of an estimator ^θ is defined as being the difference between the expected value of the distribution of ^θ and the true value, i.e.:** - -⟶ - -
- -**50. Remark: an estimator is said to be unbiased when we have E[^θ]=θ.** - -⟶ - -
- -**51. Estimating the mean** - -⟶ - -
- -**52. Sample mean ― The sample mean of a random sample is used to estimate the true mean μ of a distribution, is often noted ¯¯¯¯¯X and is defined as follows:** - -⟶ - -
- -**53. Remark: the sample mean is unbiased, i.e E[¯¯¯¯¯X]=μ.** - -⟶ - -
- -**54. Central Limit Theorem ― Let us have a random sample X1,...,Xn following a given distribution with mean μ and variance σ2, then we have:** - -⟶ - -
- -**55. Estimating the variance** - -⟶ - -
- -**56. Sample variance ― The sample variance of a random sample is used to estimate the true variance σ2 of a distribution, is often noted s2 or ^σ2 and is defined as follows:** - -⟶ - -
- -**57. Remark: the sample variance is unbiased, i.e E[s2]=σ2.** - -⟶ - -
- -**58. Chi-Squared relation with sample variance ― Let s2 be the sample variance of a random sample. We have:** - -⟶ - -
- -**59. [Introduction, Sample space, Event, Permutation]** - -⟶ - -
- -**60. [Conditional probability, Bayes' rule, Independence]** - -⟶ - -
- -**61. [Random variables, Definitions, Expectation, Variance]** - -⟶ - -
- -**62. [Probability distributions, Chebyshev's inequality, Main distributions]** - -⟶ - -
- -**63. [Jointly distributed random variables, Density, Covariance, Correlation]** - -⟶ - -
- -**64. [Parameter estimation, Mean, Variance]** - -⟶ diff --git a/hi/cheatsheet-deep-learning.md b/hi/cheatsheet-deep-learning.md deleted file mode 100644 index a5aa3756c..000000000 --- a/hi/cheatsheet-deep-learning.md +++ /dev/null @@ -1,321 +0,0 @@ -**1. Deep Learning cheatsheet** - -⟶ - -
- -**2. Neural Networks** - -⟶ - -
- -**3. Neural networks are a class of models that are built with layers. Commonly used types of neural networks include convolutional and recurrent neural networks.** - -⟶ - -
- -**4. Architecture ― The vocabulary around neural networks architectures is described in the figure below:** - -⟶ - -
- -**5. [Input layer, hidden layer, output layer]** - -⟶ - -
- -**6. By noting i the ith layer of the network and j the jth hidden unit of the layer, we have:** - -⟶ - -
- -**7. where we note w, b, z the weight, bias and output respectively.** - -⟶ - -
- -**8. Activation function ― Activation functions are used at the end of a hidden unit to introduce non-linear complexities to the model. Here are the most common ones:** - -⟶ - -
- -**9. [Sigmoid, Tanh, ReLU, Leaky ReLU]** - -⟶ - -
- -**10. Cross-entropy loss ― In the context of neural networks, the cross-entropy loss L(z,y) is commonly used and is defined as follows:** - -⟶ - -
- -**11. Learning rate ― The learning rate, often noted α or sometimes η, indicates at which pace the weights get updated. This can be fixed or adaptively changed. The current most popular method is called Adam, which is a method that adapts the learning rate.** - -⟶ - -
- -**12. Backpropagation ― Backpropagation is a method to update the weights in the neural network by taking into account the actual output and the desired output. The derivative with respect to weight w is computed using chain rule and is of the following form:** - -⟶ - -
- -**13. As a result, the weight is updated as follows:** - -⟶ - -
- -**14. Updating weights ― In a neural network, weights are updated as follows:** - -⟶ - -
- -**15. Step 1: Take a batch of training data.** - -⟶ - -
- -**16. Step 2: Perform forward propagation to obtain the corresponding loss.** - -⟶ - -
- -**17. Step 3: Backpropagate the loss to get the gradients.** - -⟶ - -
- -**18. Step 4: Use the gradients to update the weights of the network.** - -⟶ - -
- -**19. Dropout ― Dropout is a technique meant at preventing overfitting the training data by dropping out units in a neural network. In practice, neurons are either dropped with probability p or kept with probability 1−p** - -⟶ - -
- -**20. Convolutional Neural Networks** - -⟶ - -
- -**21. Convolutional layer requirement ― By noting W the input volume size, F the size of the convolutional layer neurons, P the amount of zero padding, then the number of neurons N that fit in a given volume is such that:** - -⟶ - -
- -**22. Batch normalization ― It is a step of hyperparameter γ,β that normalizes the batch {xi}. By noting μB,σ2B the mean and variance of that we want to correct to the batch, it is done as follows:** - -⟶ - -
- -**23. It is usually done after a fully connected/convolutional layer and before a non-linearity layer and aims at allowing higher learning rates and reducing the strong dependence on initialization.** - -⟶ - -
- -**24. Recurrent Neural Networks** - -⟶ - -
- -**25. Types of gates ― Here are the different types of gates that we encounter in a typical recurrent neural network:** - -⟶ - -
- -**26. [Input gate, forget gate, gate, output gate]** - -⟶ - -
- -**27. [Write to cell or not?, Erase a cell or not?, How much to write to cell?, How much to reveal cell?]** - -⟶ - -
- -**28. LSTM ― A long short-term memory (LSTM) network is a type of RNN model that avoids the vanishing gradient problem by adding 'forget' gates.** - -⟶ - -
- -**29. Reinforcement Learning and Control** - -⟶ - -
- -**30. The goal of reinforcement learning is for an agent to learn how to evolve in an environment.** - -⟶ - -
- -**31. Definitions** - -⟶ - -
- -**32. Markov decision processes ― A Markov decision process (MDP) is a 5-tuple (S,A,{Psa},γ,R) where:** - -⟶ - -
- -**33. S is the set of states** - -⟶ - -
- -**34. A is the set of actions** - -⟶ - -
- -**35. {Psa} are the state transition probabilities for s∈S and a∈A** - -⟶ - -
- -**36. γ∈[0,1[ is the discount factor** - -⟶ - -
- -**37. R:S×A⟶R or R:S⟶R is the reward function that the algorithm wants to maximize** - -⟶ - -
- -**38. Policy ― A policy π is a function π:S⟶A that maps states to actions.** - -⟶ - -
- -**39. Remark: we say that we execute a given policy π if given a state s we take the action a=π(s).** - -⟶ - -
- -**40. Value function ― For a given policy π and a given state s, we define the value function Vπ as follows:** - -⟶ - -
- -**41. Bellman equation ― The optimal Bellman equations characterizes the value function Vπ∗ of the optimal policy π∗:** - -⟶ - -
- -**42. Remark: we note that the optimal policy π∗ for a given state s is such that:** - -⟶ - -
- -**43. Value iteration algorithm ― The value iteration algorithm is in two steps:** - -⟶ - -
- -**44. 1) We initialize the value:** - -⟶ - -
- -**45. 2) We iterate the value based on the values before:** - -⟶ - -
- -**46. Maximum likelihood estimate ― The maximum likelihood estimates for the state transition probabilities are as follows:** - -⟶ - -
- -**47. times took action a in state s and got to s′** - -⟶ - -
- -**48. times took action a in state s** - -⟶ - -
- -**49. Q-learning ― Q-learning is a model-free estimation of Q, which is done as follows:** - -⟶ - -
- -**50. View PDF version on GitHub** - -⟶ - -
- -**51. [Neural Networks, Architecture, Activation function, Backpropagation, Dropout]** - -⟶ - -
- -**52. [Convolutional Neural Networks, Convolutional layer, Batch normalization]** - -⟶ - -
- -**53. [Recurrent Neural Networks, Gates, LSTM]** - -⟶ - -
- -**54. [Reinforcement learning, Markov decision processes, Value/policy iteration, Approximate dynamic programming, Policy search]** - -⟶ diff --git a/hi/cheatsheet-supervised-learning.md b/hi/cheatsheet-supervised-learning.md deleted file mode 100644 index a6b19ea1c..000000000 --- a/hi/cheatsheet-supervised-learning.md +++ /dev/null @@ -1,567 +0,0 @@ -**1. Supervised Learning cheatsheet** - -⟶ - -
- -**2. Introduction to Supervised Learning** - -⟶ - -
- -**3. Given a set of data points {x(1),...,x(m)} associated to a set of outcomes {y(1),...,y(m)}, we want to build a classifier that learns how to predict y from x.** - -⟶ - -
- -**4. Type of prediction ― The different types of predictive models are summed up in the table below:** - -⟶ - -
- -**5. [Regression, Classifier, Outcome, Examples]** - -⟶ - -
- -**6. [Continuous, Class, Linear regression, Logistic regression, SVM, Naive Bayes]** - -⟶ - -
- -**7. Type of model ― The different models are summed up in the table below:** - -⟶ - -
- -**8. [Discriminative model, Generative model, Goal, What's learned, Illustration, Examples]** - -⟶ - -
- -**9. [Directly estimate P(y|x), Estimate P(x|y) to then deduce P(y|x), Decision boundary, Probability distributions of the data, Regressions, SVMs, GDA, Naive Bayes]** - -⟶ - -
- -**10. Notations and general concepts** - -⟶ - -
- -**11. Hypothesis ― The hypothesis is noted hθ and is the model that we choose. For a given input data x(i) the model prediction output is hθ(x(i)).** - -⟶ - -
- -**12. Loss function ― A loss function is a function L:(z,y)∈R×Y⟼L(z,y)∈R that takes as inputs the predicted value z corresponding to the real data value y and outputs how different they are. The common loss functions are summed up in the table below:** - -⟶ - -
- -**13. [Least squared error, Logistic loss, Hinge loss, Cross-entropy]** - -⟶ - -
- -**14. [Linear regression, Logistic regression, SVM, Neural Network]** - -⟶ - -
- -**15. Cost function ― The cost function J is commonly used to assess the performance of a model, and is defined with the loss function L as follows:** - -⟶ - -
- -**16. Gradient descent ― By noting α∈R the learning rate, the update rule for gradient descent is expressed with the learning rate and the cost function J as follows:** - -⟶ - -
- -**17. Remark: Stochastic gradient descent (SGD) is updating the parameter based on each training example, and batch gradient descent is on a batch of training examples.** - -⟶ - -
- -**18. Likelihood ― The likelihood of a model L(θ) given parameters θ is used to find the optimal parameters θ through maximizing the likelihood. In practice, we use the log-likelihood ℓ(θ)=log(L(θ)) which is easier to optimize. We have:** - -⟶ - -
- -**19. Newton's algorithm ― The Newton's algorithm is a numerical method that finds θ such that ℓ′(θ)=0. Its update rule is as follows:** - -⟶ - -
- -**20. Remark: the multidimensional generalization, also known as the Newton-Raphson method, has the following update rule:** - -⟶ - -
- -**21. Linear models** - -⟶ - -
- -**22. Linear regression** - -⟶ - -
- -**23. We assume here that y|x;θ∼N(μ,σ2)** - -⟶ - -
- -**24. Normal equations ― By noting X the matrix design, the value of θ that minimizes the cost function is a closed-form solution such that:** - -⟶ - -
- -**25. LMS algorithm ― By noting α the learning rate, the update rule of the Least Mean Squares (LMS) algorithm for a training set of m data points, which is also known as the Widrow-Hoff learning rule, is as follows:** - -⟶ - -
- -**26. Remark: the update rule is a particular case of the gradient ascent.** - -⟶ - -
- -**27. LWR ― Locally Weighted Regression, also known as LWR, is a variant of linear regression that weights each training example in its cost function by w(i)(x), which is defined with parameter τ∈R as:** - -⟶ - -
- -**28. Classification and logistic regression** - -⟶ - -
- -**29. Sigmoid function ― The sigmoid function g, also known as the logistic function, is defined as follows:** - -⟶ - -
- -**30. Logistic regression ― We assume here that y|x;θ∼Bernoulli(ϕ). We have the following form:** - -⟶ - -
- -**31. Remark: there is no closed form solution for the case of logistic regressions.** - -⟶ - -
- -**32. Softmax regression ― A softmax regression, also called a multiclass logistic regression, is used to generalize logistic regression when there are more than 2 outcome classes. By convention, we set θK=0, which makes the Bernoulli parameter ϕi of each class i equal to:** - -⟶ - -
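As a rough illustration of the normalization used here, a tiny Python sketch of the softmax mapping from raw scores to class probabilities (the example scores are made up):

```python
import numpy as np

def softmax(scores):
    """Turn a vector of class scores into probabilities that sum to one."""
    shifted = scores - np.max(scores)   # subtracting the max avoids overflow in exp
    exps = np.exp(shifted)
    return exps / exps.sum()

print(softmax(np.array([2.0, 1.0, 0.1])))  # e.g. [0.66, 0.24, 0.10]
```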
- -**33. Generalized Linear Models** - -⟶ - -
- -**34. Exponential family ― A class of distributions is said to be in the exponential family if it can be written in terms of a natural parameter, also called the canonical parameter or link function, η, a sufficient statistic T(y) and a log-partition function a(η) as follows:** - -⟶ - -
- -**35. Remark: we will often have T(y)=y. Also, exp(−a(η)) can be seen as a normalization parameter that will make sure that the probabilities sum to one.** - -⟶ - -
- -**36. Here are the most common exponential distributions summed up in the following table:** - -⟶ - -
- -**37. [Distribution, Bernoulli, Gaussian, Poisson, Geometric]** - -⟶ - -
**38. Assumptions of GLMs ― Generalized Linear Models (GLM) aim at predicting a random variable y as a function of x∈Rn+1 and rely on the following 3 assumptions:**

⟶

<br>
- -**39. Remark: ordinary least squares and logistic regression are special cases of generalized linear models.** - -⟶ - -
- -**40. Support Vector Machines** - -⟶ - -
- -**41: The goal of support vector machines is to find the line that maximizes the minimum distance to the line.** - -⟶ - -
- -**42: Optimal margin classifier ― The optimal margin classifier h is such that:** - -⟶ - -
- -**43: where (w,b)∈Rn×R is the solution of the following optimization problem:** - -⟶ - -
- -**44. such that** - -⟶ - -
- -**45. support vectors** - -⟶ - -
- -**46. Remark: the line is defined as wTx−b=0.** - -⟶ - -
- -**47. Hinge loss ― The hinge loss is used in the setting of SVMs and is defined as follows:** - -⟶ - -
- -**48. Kernel ― Given a feature mapping ϕ, we define the kernel K to be defined as:** - -⟶ - -
- -**49. In practice, the kernel K defined by K(x,z)=exp(−||x−z||22σ2) is called the Gaussian kernel and is commonly used.** - -⟶ - -
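The kernel value is cheap to evaluate directly; a short illustrative Python function (assuming a default σ=1, chosen arbitrarily here) could look like:

```python
import numpy as np

def gaussian_kernel(x, z, sigma=1.0):
    """K(x, z) = exp(-||x - z||^2 / (2 * sigma^2))."""
    diff = np.asarray(x, dtype=float) - np.asarray(z, dtype=float)
    return np.exp(-np.dot(diff, diff) / (2.0 * sigma ** 2))

print(gaussian_kernel([0.0, 0.0], [0.0, 0.0]))  # 1.0 for identical points
print(gaussian_kernel([0.0, 0.0], [3.0, 4.0]))  # decays quickly with distance
```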
- -**50. [Non-linear separability, Use of a kernel mapping, Decision boundary in the original space]** - -⟶ - -
- -**51. Remark: we say that we use the "kernel trick" to compute the cost function using the kernel because we actually don't need to know the explicit mapping ϕ, which is often very complicated. Instead, only the values K(x,z) are needed.** - -⟶ - -
- -**52. Lagrangian ― We define the Lagrangian L(w,b) as follows:** - -⟶ - -
- -**53. Remark: the coefficients βi are called the Lagrange multipliers.** - -⟶ - -
- -**54. Generative Learning** - -⟶ - -
- -**55. A generative model first tries to learn how the data is generated by estimating P(x|y), which we can then use to estimate P(y|x) by using Bayes' rule.** - -⟶ - -
- -**56. Gaussian Discriminant Analysis** - -⟶ - -
- -**57. Setting ― The Gaussian Discriminant Analysis assumes that y and x|y=0 and x|y=1 are such that:** - -⟶ - -
- -**58. Estimation ― The following table sums up the estimates that we find when maximizing the likelihood:** - -⟶ - -
- -**59. Naive Bayes** - -⟶ - -
- -**60. Assumption ― The Naive Bayes model supposes that the features of each data point are all independent:** - -⟶ - -
- -**61. Solutions ― Maximizing the log-likelihood gives the following solutions, with k∈{0,1},l∈[[1,L]]** - -⟶ - -
- -**62. Remark: Naive Bayes is widely used for text classification and spam detection.** - -⟶ - -
- -**63. Tree-based and ensemble methods** - -⟶ - -
- -**64. These methods can be used for both regression and classification problems.** - -⟶ - -
**65. CART ― Classification and Regression Trees (CART), commonly known as decision trees, can be represented as binary trees. They have the advantage of being very interpretable.**

⟶

<br>
- -**66. Random forest ― It is a tree-based technique that uses a high number of decision trees built out of randomly selected sets of features. Contrary to the simple decision tree, it is highly uninterpretable but its generally good performance makes it a popular algorithm.** - -⟶ - -
- -**67. Remark: random forests are a type of ensemble methods.** - -⟶ - -
- -**68. Boosting ― The idea of boosting methods is to combine several weak learners to form a stronger one. The main ones are summed up in the table below:** - -⟶ - -
- -**69. [Adaptive boosting, Gradient boosting]** - -⟶ - -
- -**70. High weights are put on errors to improve at the next boosting step** - -⟶ - -
- -**71. Weak learners trained on remaining errors** - -⟶ - -
- -**72. Other non-parametric approaches** - -⟶ - -
- -**73. k-nearest neighbors ― The k-nearest neighbors algorithm, commonly known as k-NN, is a non-parametric approach where the response of a data point is determined by the nature of its k neighbors from the training set. It can be used in both classification and regression settings.** - -⟶ - -
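A compact, illustrative sketch of the k-NN vote with Euclidean distance (the toy training points and k=3 are arbitrary):

```python
import numpy as np

def knn_predict(X_train, y_train, x, k=3):
    """Classify x by majority vote among its k nearest training points."""
    dists = np.linalg.norm(X_train - x, axis=1)
    nearest = np.argsort(dists)[:k]
    labels, counts = np.unique(y_train[nearest], return_counts=True)
    return labels[np.argmax(counts)]

X_train = np.array([[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [0.9, 1.1]])
y_train = np.array([0, 0, 1, 1])
print(knn_predict(X_train, y_train, np.array([0.2, 0.1])))  # -> 0
```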
- -**74. Remark: The higher the parameter k, the higher the bias, and the lower the parameter k, the higher the variance.** - -⟶ - -
- -**75. Learning Theory** - -⟶ - -
- -**76. Union bound ― Let A1,...,Ak be k events. We have:** - -⟶ - -
- -**77. Hoeffding inequality ― Let Z1,..,Zm be m iid variables drawn from a Bernoulli distribution of parameter ϕ. Let ˆϕ be their sample mean and γ>0 fixed. We have:** - -⟶ - -
- -**78. Remark: this inequality is also known as the Chernoff bound.** - -⟶ - -
- -**79. Training error ― For a given classifier h, we define the training error ˆϵ(h), also known as the empirical risk or empirical error, to be as follows:** - -⟶ - -
- -**80. Probably Approximately Correct (PAC) ― PAC is a framework under which numerous results on learning theory were proved, and has the following set of assumptions: ** - -⟶ - -
- -**81: the training and testing sets follow the same distribution ** - -⟶ - -
- -**82. the training examples are drawn independently** - -⟶ - -
- -**83. Shattering ― Given a set S={x(1),...,x(d)}, and a set of classifiers H, we say that H shatters S if for any set of labels {y(1),...,y(d)}, we have:** - -⟶ - -
- -**84. Upper bound theorem ― Let H be a finite hypothesis class such that |H|=k and let δ and the sample size m be fixed. Then, with probability of at least 1−δ, we have:** - -⟶ - -
- -**85. VC dimension ― The Vapnik-Chervonenkis (VC) dimension of a given infinite hypothesis class H, noted VC(H) is the size of the largest set that is shattered by H.** - -⟶ - -
- -**86. Remark: the VC dimension of H={set of linear classifiers in 2 dimensions} is 3.** - -⟶ - -
- -**87. Theorem (Vapnik) ― Let H be given, with VC(H)=d and m the number of training examples. With probability at least 1−δ, we have:** - -⟶ - -
- -**88. [Introduction, Type of prediction, Type of model]** - -⟶ - -
- -**89. [Notations and general concepts, loss function, gradient descent, likelihood]** - -⟶ - -
- -**90. [Linear models, linear regression, logistic regression, generalized linear models]** - -⟶ - -
- -**91. [Support vector machines, Optimal margin classifier, Hinge loss, Kernel]** - -⟶ - -
- -**92. [Generative learning, Gaussian Discriminant Analysis, Naive Bayes]** - -⟶ - -
- -**93. [Trees and ensemble methods, CART, Random forest, Boosting]** - -⟶ - -
- -**94. [Other methods, k-NN]** - -⟶ - -
- -**95. [Learning theory, Hoeffding inequality, PAC, VC dimension]** - -⟶ diff --git a/hi/cheatsheet-unsupervised-learning.md b/hi/cheatsheet-unsupervised-learning.md deleted file mode 100644 index d07b74750..000000000 --- a/hi/cheatsheet-unsupervised-learning.md +++ /dev/null @@ -1,340 +0,0 @@ -**1. Unsupervised Learning cheatsheet** - -⟶ - -
- -**2. Introduction to Unsupervised Learning** - -⟶ - -
- -**3. Motivation ― The goal of unsupervised learning is to find hidden patterns in unlabeled data {x(1),...,x(m)}.** - -⟶ - -
- -**4. Jensen's inequality ― Let f be a convex function and X a random variable. We have the following inequality:** - -⟶ - -
- -**5. Clustering** - -⟶ - -
- -**6. Expectation-Maximization** - -⟶ - -
- -**7. Latent variables ― Latent variables are hidden/unobserved variables that make estimation problems difficult, and are often denoted z. Here are the most common settings where there are latent variables:** - -⟶ - -
- -**8. [Setting, Latent variable z, Comments]** - -⟶ - -
- -**9. [Mixture of k Gaussians, Factor analysis]** - -⟶ - -
- -**10. Algorithm ― The Expectation-Maximization (EM) algorithm gives an efficient method at estimating the parameter θ through maximum likelihood estimation by repeatedly constructing a lower-bound on the likelihood (E-step) and optimizing that lower bound (M-step) as follows:** - -⟶ - -
- -**11. E-step: Evaluate the posterior probability Qi(z(i)) that each data point x(i) came from a particular cluster z(i) as follows:** - -⟶ - -
- -**12. M-step: Use the posterior probabilities Qi(z(i)) as cluster specific weights on data points x(i) to separately re-estimate each cluster model as follows:** - -⟶ - -
- -**13. [Gaussians initialization, Expectation step, Maximization step, Convergence]** - -⟶ - -
- -**14. k-means clustering** - -⟶ - -
- -**15. We note c(i) the cluster of data point i and μj the center of cluster j.** - -⟶ - -
- -**16. Algorithm ― After randomly initializing the cluster centroids μ1,μ2,...,μk∈Rn, the k-means algorithm repeats the following step until convergence:** - -⟶ - -
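For reference, a plain NumPy sketch of the two alternating steps (illustrative only; it does not handle empty clusters or multiple restarts, and the toy data are made up):

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Alternate cluster assignment and centroid update until convergence."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # assignment step: each point goes to its closest centroid
        labels = np.argmin(np.linalg.norm(X[:, None] - centroids[None], axis=2), axis=1)
        # update step: recompute each centroid as the mean of its assigned points
        new_centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

X = np.array([[0.0, 0.0], [0.2, 0.1], [5.0, 5.0], [5.1, 4.9]])
print(kmeans(X, k=2)[1])  # two centroids, near (0.1, 0.05) and (5.05, 4.95)
```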
- -**17. [Means initialization, Cluster assignment, Means update, Convergence]** - -⟶ - -
- -**18. Distortion function ― In order to see if the algorithm converges, we look at the distortion function defined as follows:** - -⟶ - -
- -**19. Hierarchical clustering** - -⟶ - -
- -**20. Algorithm ― It is a clustering algorithm with an agglomerative hierarchical approach that build nested clusters in a successive manner.** - -⟶ - -
**21. Types ― There are different sorts of hierarchical clustering algorithms that aim at optimizing different objective functions, which are summed up in the table below:**

⟶

<br>
- -**22. [Ward linkage, Average linkage, Complete linkage]** - -⟶ - -
- -**23. [Minimize within cluster distance, Minimize average distance between cluster pairs, Minimize maximum distance of between cluster pairs]** - -⟶ - -
- -**24. Clustering assessment metrics** - -⟶ - -
- -**25. In an unsupervised learning setting, it is often hard to assess the performance of a model since we don't have the ground truth labels as was the case in the supervised learning setting.** - -⟶ - -
- -**26. Silhouette coefficient ― By noting a and b the mean distance between a sample and all other points in the same class, and between a sample and all other points in the next nearest cluster, the silhouette coefficient s for a single sample is defined as follows:** - -⟶ - -
- -**27. Calinski-Harabaz index ― By noting k the number of clusters, Bk and Wk the between and within-clustering dispersion matrices respectively defined as** - -⟶ - -
- -**28. the Calinski-Harabaz index s(k) indicates how well a clustering model defines its clusters, such that the higher the score, the more dense and well separated the clusters are. It is defined as follows:** - -⟶ - -
- -**29. Dimension reduction** - -⟶ - -
- -**30. Principal component analysis** - -⟶ - -
- -**31. It is a dimension reduction technique that finds the variance maximizing directions onto which to project the data.** - -⟶ - -
- -**32. Eigenvalue, eigenvector ― Given a matrix A∈Rn×n, λ is said to be an eigenvalue of A if there exists a vector z∈Rn∖{0}, called eigenvector, such that we have:** - -⟶ - -
- -**33. Spectral theorem ― Let A∈Rn×n. If A is symmetric, then A is diagonalizable by a real orthogonal matrix U∈Rn×n. By noting Λ=diag(λ1,...,λn), we have:** - -⟶ - -
- -**34. diagonal** - -⟶ - -
- -**35. Remark: the eigenvector associated with the largest eigenvalue is called principal eigenvector of matrix A.** - -⟶ - -
- -**36. Algorithm ― The Principal Component Analysis (PCA) procedure is a dimension reduction technique that projects the data on k -dimensions by maximizing the variance of the data as follows:** - -⟶ - -
- -**37. Step 1: Normalize the data to have a mean of 0 and standard deviation of 1.** - -⟶ - -
- -**38. Step 2: Compute Σ=1mm∑i=1x(i)x(i)T∈Rn×n, which is symmetric with real eigenvalues.** - -⟶ - -
- -**39. Step 3: Compute u1,...,uk∈Rn the k orthogonal principal eigenvectors of Σ, i.e. the orthogonal eigenvectors of the k largest eigenvalues.** - -⟶ - -
- -**40. Step 4: Project the data on spanR(u1,...,uk).** - -⟶ - -
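The four steps can be condensed into a few lines of NumPy; this is an illustrative sketch under those steps, not the only way to implement PCA (np.linalg.eigh is used because Σ is symmetric, and the toy data are arbitrary):

```python
import numpy as np

def pca(X, k):
    """Steps 1-4 above: standardize, build the covariance matrix, take the top-k eigenvectors, project."""
    X_std = (X - X.mean(axis=0)) / X.std(axis=0)      # step 1: mean 0, standard deviation 1
    cov = (X_std.T @ X_std) / len(X_std)               # step 2: symmetric, real eigenvalues
    eigvals, eigvecs = np.linalg.eigh(cov)             # eigendecomposition of a symmetric matrix
    top = eigvecs[:, np.argsort(eigvals)[::-1][:k]]    # step 3: k principal eigenvectors
    return X_std @ top                                  # step 4: projection

X = np.array([[1.0, 2.1], [2.0, 3.9], [3.0, 6.2], [4.0, 7.8]])
print(pca(X, k=1))  # one coordinate per sample along the main direction of variance
```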
- -**41. This procedure maximizes the variance among all k-dimensional spaces.** - -⟶ - -
- -**42. [Data in feature space, Find principal components, Data in principal components space]** - -⟶ - -
- -**43. Independent component analysis** - -⟶ - -
- -**44. It is a technique meant to find the underlying generating sources.** - -⟶ - -
- -**45. Assumptions ― We assume that our data x has been generated by the n-dimensional source vector s=(s1,...,sn), where si are independent random variables, via a mixing and non-singular matrix A as follows:** - -⟶ - -
- -**46. The goal is to find the unmixing matrix W=A−1.** - -⟶ - -
- -**47. Bell and Sejnowski ICA algorithm ― This algorithm finds the unmixing matrix W by following the steps below:** - -⟶ - -
- -**48. Write the probability of x=As=W−1s as:** - -⟶ - -
- -**49. Write the log likelihood given our training data {x(i),i∈[[1,m]]} and by noting g the sigmoid function as:** - -⟶ - -
- -**50. Therefore, the stochastic gradient ascent learning rule is such that for each training example x(i), we update W as follows:** - -⟶ - -
- -**51. The Machine Learning cheatsheets are now available in Hindi.** - -⟶ - -
- -**52. Original authors** - -⟶ - -
- -**53. Translated by X, Y and Z** - -⟶ - -
- -**54. Reviewed by X, Y and Z** - -⟶ - -
- -**55. [Introduction, Motivation, Jensen's inequality]** - -⟶ - -
- -**56. [Clustering, Expectation-Maximization, k-means, Hierarchical clustering, Metrics]** - -⟶ - -
- -**57. [Dimension reduction, PCA, ICA]** - -⟶ diff --git a/hi/refresher-linear-algebra.md b/hi/refresher-linear-algebra.md deleted file mode 100644 index a6b440d1e..000000000 --- a/hi/refresher-linear-algebra.md +++ /dev/null @@ -1,339 +0,0 @@ -**1. Linear Algebra and Calculus refresher** - -⟶ - -
- -**2. General notations** - -⟶ - -
- -**3. Definitions** - -⟶ - -
- -**4. Vector ― We note x∈Rn a vector with n entries, where xi∈R is the ith entry:** - -⟶ - -
- -**5. Matrix ― We note A∈Rm×n a matrix with m rows and n columns, where Ai,j∈R is the entry located in the ith row and jth column:** - -⟶ - -
- -**6. Remark: the vector x defined above can be viewed as a n×1 matrix and is more particularly called a column-vector.** - -⟶ - -
- -**7. Main matrices** - -⟶ - -
- -**8. Identity matrix ― The identity matrix I∈Rn×n is a square matrix with ones in its diagonal and zero everywhere else:** - -⟶ - -
- -**9. Remark: for all matrices A∈Rn×n, we have A×I=I×A=A.** - -⟶ - -
- -**10. Diagonal matrix ― A diagonal matrix D∈Rn×n is a square matrix with nonzero values in its diagonal and zero everywhere else:** - -⟶ - -
- -**11. Remark: we also note D as diag(d1,...,dn).** - -⟶ - -
- -**12. Matrix operations** - -⟶ - -
- -**13. Multiplication** - -⟶ - -
- -**14. Vector-vector ― There are two types of vector-vector products:** - -⟶ - -
- -**15. inner product: for x,y∈Rn, we have:** - -⟶ - -
- -**16. outer product: for x∈Rm,y∈Rn, we have:** - -⟶ - -
- -**17. Matrix-vector ― The product of matrix A∈Rm×n and vector x∈Rn is a vector of size Rn, such that:** - -⟶ - -
- -**18. where aTr,i are the vector rows and ac,j are the vector columns of A, and xi are the entries of x.** - -⟶ - -
- -**19. Matrix-matrix ― The product of matrices A∈Rm×n and B∈Rn×p is a matrix of size Rn×p, such that:** - -⟶ - -
- -**20. where aTr,i,bTr,i are the vector rows and ac,j,bc,j are the vector columns of A and B respectively** - -⟶ - -
- -**21. Other operations** - -⟶ - -
- -**22. Transpose ― The transpose of a matrix A∈Rm×n, noted AT, is such that its entries are flipped:** - -⟶ - -
- -**23. Remark: for matrices A,B, we have (AB)T=BTAT** - -⟶ - -
- -**24. Inverse ― The inverse of an invertible square matrix A is noted A−1 and is the only matrix such that:** - -⟶ - -
- -**25. Remark: not all square matrices are invertible. Also, for matrices A,B, we have (AB)−1=B−1A−1** - -⟶ - -
- -**26. Trace ― The trace of a square matrix A, noted tr(A), is the sum of its diagonal entries:** - -⟶ - -
- -**27. Remark: for matrices A,B, we have tr(AT)=tr(A) and tr(AB)=tr(BA)** - -⟶ - -
- -**28. Determinant ― The determinant of a square matrix A∈Rn×n, noted |A| or det(A) is expressed recursively in terms of A∖i,∖j, which is the matrix A without its ith row and jth column, as follows:** - -⟶ - -
- -**29. Remark: A is invertible if and only if |A|≠0. Also, |AB|=|A||B| and |AT|=|A|.** - -⟶ - -
- -**30. Matrix properties** - -⟶ - -
- -**31. Definitions** - -⟶ - -
- -**32. Symmetric decomposition ― A given matrix A can be expressed in terms of its symmetric and antisymmetric parts as follows:** - -⟶ - -
- -**33. [Symmetric, Antisymmetric]** - -⟶ - -
- -**34. Norm ― A norm is a function N:V⟶[0,+∞[ where V is a vector space, and such that for all x,y∈V, we have:** - -⟶ - -
- -**35. N(ax)=|a|N(x) for a scalar** - -⟶ - -
- -**36. if N(x)=0, then x=0** - -⟶ - -
- -**37. For x∈V, the most commonly used norms are summed up in the table below:** - -⟶ - -
- -**38. [Norm, Notation, Definition, Use case]** - -⟶ - -
**39. Linear dependence ― A set of vectors is said to be linearly dependent if one of the vectors in the set can be defined as a linear combination of the others.**

⟶

<br>
- -**40. Remark: if no vector can be written this way, then the vectors are said to be linearly independent** - -⟶ - -
- -**41. Matrix rank ― The rank of a given matrix A is noted rank(A) and is the dimension of the vector space generated by its columns. This is equivalent to the maximum number of linearly independent columns of A.** - -⟶ - -
- -**42. Positive semi-definite matrix ― A matrix A∈Rn×n is positive semi-definite (PSD) and is noted A⪰0 if we have:** - -⟶ - -
- -**43. Remark: similarly, a matrix A is said to be positive definite, and is noted A≻0, if it is a PSD matrix which satisfies for all non-zero vector x, xTAx>0.** - -⟶ - -
- -**44. Eigenvalue, eigenvector ― Given a matrix A∈Rn×n, λ is said to be an eigenvalue of A if there exists a vector z∈Rn∖{0}, called eigenvector, such that we have:** - -⟶ - -
- -**45. Spectral theorem ― Let A∈Rn×n. If A is symmetric, then A is diagonalizable by a real orthogonal matrix U∈Rn×n. By noting Λ=diag(λ1,...,λn), we have:** - -⟶ - -
- -**46. diagonal** - -⟶ - -
- -**47. Singular-value decomposition ― For a given matrix A of dimensions m×n, the singular-value decomposition (SVD) is a factorization technique that guarantees the existence of U m×m unitary, Σ m×n diagonal and V n×n unitary matrices, such that:** - -⟶ - -
- -**48. Matrix calculus** - -⟶ - -
- -**49. Gradient ― Let f:Rm×n→R be a function and A∈Rm×n be a matrix. The gradient of f with respect to A is a m×n matrix, noted ∇Af(A), such that:** - -⟶ - -
- -**50. Remark: the gradient of f is only defined when f is a function that returns a scalar.** - -⟶ - -
- -**51. Hessian ― Let f:Rn→R be a function and x∈Rn be a vector. The hessian of f with respect to x is a n×n symmetric matrix, noted ∇2xf(x), such that:** - -⟶ - -
- -**52. Remark: the hessian of f is only defined when f is a function that returns a scalar** - -⟶ - -
- -**53. Gradient operations ― For matrices A,B,C, the following gradient properties are worth having in mind:** - -⟶ - -
- -**54. [General notations, Definitions, Main matrices]** - -⟶ - -
- -**55. [Matrix operations, Multiplication, Other operations]** - -⟶ - -
- -**56. [Matrix properties, Norm, Eigenvalue/Eigenvector, Singular-value decomposition]** - -⟶ - -
- -**57. [Matrix calculus, Gradient, Hessian, Operations]** - -⟶ diff --git a/hi/refresher-probability.md b/hi/refresher-probability.md deleted file mode 100644 index 5c9b34656..000000000 --- a/hi/refresher-probability.md +++ /dev/null @@ -1,381 +0,0 @@ -**1. Probabilities and Statistics refresher** - -⟶ - -
- -**2. Introduction to Probability and Combinatorics** - -⟶ - -
- -**3. Sample space ― The set of all possible outcomes of an experiment is known as the sample space of the experiment and is denoted by S.** - -⟶ - -
- -**4. Event ― Any subset E of the sample space is known as an event. That is, an event is a set consisting of possible outcomes of the experiment. If the outcome of the experiment is contained in E, then we say that E has occurred.** - -⟶ - -
**5. Axioms of probability ― For each event E, we denote P(E) as the probability of event E occurring.**

⟶

<br>
- -**6. Axiom 1 ― Every probability is between 0 and 1 included, i.e:** - -⟶ - -
- -**7. Axiom 2 ― The probability that at least one of the elementary events in the entire sample space will occur is 1, i.e:** - -⟶ - -
- -**8. Axiom 3 ― For any sequence of mutually exclusive events E1,...,En, we have:** - -⟶ - -
- -**9. Permutation ― A permutation is an arrangement of r objects from a pool of n objects, in a given order. The number of such arrangements is given by P(n,r), defined as:** - -⟶ - -
- -**10. Combination ― A combination is an arrangement of r objects from a pool of n objects, where the order does not matter. The number of such arrangements is given by C(n,r), defined as:** - -⟶ - -
- -**11. Remark: we note that for 0⩽r⩽n, we have P(n,r)⩾C(n,r)** - -⟶ - -
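With Python 3.8+, the standard library exposes both counts directly; a toy check of this remark with n=5 and r=3 (values chosen arbitrarily):

```python
from math import comb, perm

n, r = 5, 3
print(perm(n, r))  # P(5,3) = 5!/(5-3)! = 60 ordered arrangements
print(comb(n, r))  # C(5,3) = 5!/(3!(5-3)!) = 10 unordered choices, so P(n,r) >= C(n,r)
```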
- -**12. Conditional Probability** - -⟶ - -
- -**13. Bayes' rule ― For events A and B such that P(B)>0, we have:** - -⟶ - -
- -**14. Remark: we have P(A∩B)=P(A)P(B|A)=P(A|B)P(B)** - -⟶ - -
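A toy numeric check of Bayes' rule (all probabilities below are invented for the example):

```python
# P(A|B) = P(B|A) P(A) / P(B), with P(B) obtained from the partition {A, not A}.
p_a = 0.3               # prior P(A)
p_b_given_a = 0.8       # P(B|A)
p_b_given_not_a = 0.2   # P(B|not A)

p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)
p_a_given_b = p_b_given_a * p_a / p_b

print(p_a_given_b)  # 0.24 / 0.38, about 0.632
```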
- -**15. Partition ― Let {Ai,i∈[[1,n]]} be such that for all i, Ai≠∅. We say that {Ai} is a partition if we have:** - -⟶ - -
- -**16. Remark: for any event B in the sample space, we have P(B)=n∑i=1P(B|Ai)P(Ai).** - -⟶ - -
- -**17. Extended form of Bayes' rule ― Let {Ai,i∈[[1,n]]} be a partition of the sample space. We have:** - -⟶ - -
- -**18. Independence ― Two events A and B are independent if and only if we have:** - -⟶ - -
- -**19. Random Variables** - -⟶ - -
- -**20. Definitions** - -⟶ - -
- -**21. Random variable ― A random variable, often noted X, is a function that maps every element in a sample space to a real line.** - -⟶ - -
- -**22. Cumulative distribution function (CDF) ― The cumulative distribution function F, which is monotonically non-decreasing and is such that limx→−∞F(x)=0 and limx→+∞F(x)=1, is defined as:** - -⟶ - -
**23. Remark: we have P(a<X⩽B)=F(b)−F(a)**

⟶

<br>

**24. Probability density function (PDF) ― The probability density function f is the probability that X takes on values between two adjacent realizations of the random variable.**

⟶

<br>
- -**25. Relationships involving the PDF and CDF ― Here are the important properties to know in the discrete (D) and the continuous (C) cases.** - -⟶ - -
- -**26. [Case, CDF F, PDF f, Properties of PDF]** - -⟶ - -
- -**27. Expectation and Moments of the Distribution ― Here are the expressions of the expected value E[X], generalized expected value E[g(X)], kth moment E[Xk] and characteristic function ψ(ω) for the discrete and continuous cases:** - -⟶ - -
- -**28. Variance ― The variance of a random variable, often noted Var(X) or σ2, is a measure of the spread of its distribution function. It is determined as follows:** - -⟶ - -
- -**29. Standard deviation ― The standard deviation of a random variable, often noted σ, is a measure of the spread of its distribution function which is compatible with the units of the actual random variable. It is determined as follows:** - -⟶ - -
- -**30. Transformation of random variables ― Let the variables X and Y be linked by some function. By noting fX and fY the distribution function of X and Y respectively, we have:** - -⟶ - -
- -**31. Leibniz integral rule ― Let g be a function of x and potentially c, and a,b boundaries that may depend on c. We have:** - -⟶ - -
- -**32. Probability Distributions** - -⟶ - -
- -**33. Chebyshev's inequality ― Let X be a random variable with expected value μ. For k,σ>0, we have the following inequality:** - -⟶ - -
- -**34. Main distributions ― Here are the main distributions to have in mind:** - -⟶ - -
- -**35. [Type, Distribution]** - -⟶ - -
- -**36. Jointly Distributed Random Variables** - -⟶ - -
- -**37. Marginal density and cumulative distribution ― From the joint density probability function fXY , we have** - -⟶ - -
- -**38. [Case, Marginal density, Cumulative function]** - -⟶ - -
- -**39. Conditional density ― The conditional density of X with respect to Y, often noted fX|Y, is defined as follows:** - -⟶ - -
- -**40. Independence ― Two random variables X and Y are said to be independent if we have:** - -⟶ - -
- -**41. Covariance ― We define the covariance of two random variables X and Y, that we note σ2XY or more commonly Cov(X,Y), as follows:** - -⟶ - -
- -**42. Correlation ― By noting σX,σY the standard deviations of X and Y, we define the correlation between the random variables X and Y, noted ρXY, as follows:** - -⟶ - -
- -**43. Remark 1: we note that for any random variables X,Y, we have ρXY∈[−1,1].** - -⟶ - -
- -**44. Remark 2: If X and Y are independent, then ρXY=0.** - -⟶ - -
- -**45. Parameter estimation** - -⟶ - -
- -**46. Definitions** - -⟶ - -
- -**47. Random sample ― A random sample is a collection of n random variables X1,...,Xn that are independent and identically distributed with X.** - -⟶ - -
- -**48. Estimator ― An estimator is a function of the data that is used to infer the value of an unknown parameter in a statistical model.** - -⟶ - -
- -**49. Bias ― The bias of an estimator ^θ is defined as being the difference between the expected value of the distribution of ^θ and the true value, i.e.:** - -⟶ - -
- -**50. Remark: an estimator is said to be unbiased when we have E[^θ]=θ.** - -⟶ - -
- -**51. Estimating the mean** - -⟶ - -
- -**52. Sample mean ― The sample mean of a random sample is used to estimate the true mean μ of a distribution, is often noted ¯¯¯¯¯X and is defined as follows:** - -⟶ - -
- -**53. Remark: the sample mean is unbiased, i.e E[¯¯¯¯¯X]=μ.** - -⟶ - -
- -**54. Central Limit Theorem ― Let us have a random sample X1,...,Xn following a given distribution with mean μ and variance σ2, then we have:** - -⟶ - -
- -**55. Estimating the variance** - -⟶ - -
- -**56. Sample variance ― The sample variance of a random sample is used to estimate the true variance σ2 of a distribution, is often noted s2 or ^σ2 and is defined as follows:** - -⟶ - -
- -**57. Remark: the sample variance is unbiased, i.e E[s2]=σ2.** - -⟶ - -
- -**58. Chi-Squared relation with sample variance ― Let s2 be the sample variance of a random sample. We have:** - -⟶ - -
- -**59. [Introduction, Sample space, Event, Permutation]** - -⟶ - -
- -**60. [Conditional probability, Bayes' rule, Independence]** - -⟶ - -
- -**61. [Random variables, Definitions, Expectation, Variance]** - -⟶ - -
- -**62. [Probability distributions, Chebyshev's inequality, Main distributions]** - -⟶ - -
- -**63. [Jointly distributed random variables, Density, Covariance, Correlation]** - -⟶ - -
- -**64. [Parameter estimation, Mean, Variance]** - -⟶ diff --git a/id/cs-230-convolutional-neural-networks.md b/id/cs-230-convolutional-neural-networks.md new file mode 100644 index 000000000..4f22dfc35 --- /dev/null +++ b/id/cs-230-convolutional-neural-networks.md @@ -0,0 +1,715 @@ +**Convolutional Neural Networks translation** + +
+ +**1. Convolutional Neural Networks cheatsheet** + +⟶Cheatsheet Convolutional Neural Network + +
+ + +**2. CS 230 - Deep Learning** + +⟶Deep Learning + +
**3. [Overview, Architecture structure]**

⟶[Intisari, Struktur arsitektur]

<br>
+ + +**4. [Types of layer, Convolution, Pooling, Fully connected]** + +⟶[Jenis-jenis layer, Konvolusi, Pooling, Fully connected] + +
+ + +**5. [Filter hyperparameters, Dimensions, Stride, Padding]** + +⟶[Hiperparameter filter, Dimensi, Stride, Padding] + +
+ + +**6. [Tuning hyperparameters, Parameter compatibility, Model complexity, Receptive field]** + +⟶[Penyetelan hiperparameter, Kesesuaian parameter, Kompleksitas model, Receptive field] + +
+ + +**7. [Activation functions, Rectified Linear Unit, Softmax]** + +⟶[Fungsi-fungsi aktifasi, Rectified Linear Unit, Softmax] + +
+ + +**8. [Object detection, Types of models, Detection, Intersection over Union, Non-max suppression, YOLO, R-CNN]** + +⟶[Deteksi objek, Tipe-tipe model, Deteksi, Intersection over Union, Non-max suppression, YOLO, R-CNN] + +
+ + +**9. [Face verification/recognition, One shot learning, Siamese network, Triplet loss]** + +⟶[Verifikasi/pengenal wajah, One shot learning, Siamese network, Loss triplet] + +
+ + +**10. [Neural style transfer, Activation, Style matrix, Style/content cost function]** + +⟶[Transfer neural style, Aktifasi, Matriks style, Fungsi cost style/konten] + +
**11. [Computational trick architectures, Generative Adversarial Net, ResNet, Inception Network]**

⟶[Arsitektur trik komputasional, Generative Adversarial Net, ResNet, Inception Network]

<br>
+ + +**12. Overview** + +⟶Ringkasan + +
**13. Architecture of a traditional CNN ― Convolutional neural networks, also known as CNNs, are a specific type of neural networks that are generally composed of the following layers:**

⟶Arsitektur CNN tradisional - Convolutional neural network, juga dikenal sebagai CNN, adalah sebuah tipe khusus dari neural network yang secara umum terdiri dari layer-layer berikut:

<br>
**14. The convolution layer and the pooling layer can be fine-tuned with respect to hyperparameters that are described in the next sections.**

⟶Layer konvolusi dan layer pooling dapat disesuaikan terhadap hiperparameter yang dijelaskan pada bagian selanjutnya.

<br>
+ + +**15. Types of layer** + +⟶Jenis-jenis layer + +
+ + +**16. Convolution layer (CONV) ― The convolution layer (CONV) uses filters that perform convolution operations as it is scanning the input I with respect to its dimensions. Its hyperparameters include the filter size F and stride S. The resulting output O is called feature map or activation map.** + +⟶Layer convolution - Layer convolution (CONV) menggunakan banyak filter yang dapat melakukan operasi konvolusi karena CONV memindai input I dengan memperhatikan dimensinya. Hiperparameter dari CONV meliputi ukuran filter F dan stride S. Keluaran hasil O disebut feature map atau activation map. + +
+ + +**17. Remark: the convolution step can be generalized to the 1D and 3D cases as well.** + +⟶Catatan: tahap konvolusi dapat digeneralisasi juga dalam kasus 1D dan 3D. + +
+ + +**18. Pooling (POOL) ― The pooling layer (POOL) is a downsampling operation, typically applied after a convolution layer, which does some spatial invariance. In particular, max and average pooling are special kinds of pooling where the maximum and average value is taken, respectively.** + +⟶Pooling (POOL) - Layer pooling adalah sebuah operasi downsampling, biasanya diaplikasikan setelah lapisan konvolusi, yang menyebabkan invarian spasial. Pada khususnya, pooling max dan average merupakan jenis-jenis pooling spesial di mana masing-masing nilai maksimal dan rata-rata diambil. + +
+ + +**19. [Type, Purpose, Illustration, Comments]** + +⟶[Jenis, Tujuan, Ilustrasi, Komentar] + +
+ + +**20. [Max pooling, Average pooling, Each pooling operation selects the maximum value of the current view, Each pooling operation averages the values of the current view]** + +⟶[Max pooling, Average pooling, Setiap operasi pooling mewakili nilai maksimal dari tampilan terbaru, setiap operasi pooling meratakan nilai-nilai dari tampilan terbaru] + +
+ + +**21. [Preserves detected features, Most commonly used, Downsamples feature map, Used in LeNet]** + +⟶[Mempertahankan fitur yang terdeteksi, yang paling sering digunakan, Downsamples feature map, dipakai di LeNet] + +
**22. Fully Connected (FC) ― The fully connected layer (FC) operates on a flattened input where each input is connected to all neurons. If present, FC layers are usually found towards the end of CNN architectures and can be used to optimize objectives such as class scores.**

⟶Fully Connected (FC) - Fully connected layer (FC) menangani sebuah masukan yang dijadikan 1D di mana setiap masukan terhubung ke seluruh neuron. Bila ada, lapisan-lapisan FC biasanya ditemukan pada akhir arsitektur CNN dan dapat digunakan untuk mengoptimalkan hasil seperti skor-skor kelas (pada kasus klasifikasi).

<br>
+ + +**23. Filter hyperparameters** + +⟶Hiperparameter filter + +
+ + +**24. The convolution layer contains filters for which it is important to know the meaning behind its hyperparameters.** + +⟶Layer konvolusi mengandung penyaring yang penting untuk dimengerti tentang maksud dari penyaring hiperparameter tersebut. +
+ + +**25. Dimensions of a filter ― A filter of size F×F applied to an input containing C channels is a F×F×C volume that performs convolutions on an input of size I×I×C and produces an output feature map (also called activation map) of size O×O×1.** + +⟶Dimensi dari sebuah filter - Sebuah filter dengan ukuran FxF diaplikasikan pada sebuah input yang memuat C channel memiliki volume FxFxC yang melakukan konvolusi pada sebuah input masukan dengan ukuran IxIxC dan menghasilkan sebuah keluaran feature map (juga dikenal activation map) dengan ukuran O×O×1 + +
+ + +**26. Filter** + +⟶Filter + +
+ + +**27. Remark: the application of K filters of size F×F results in an output feature map of size O×O×K.** + +⟶Catatan: pengaplikasian dari penyaring F dengan ukuran FxF menghasilkan sebuah keluaran fitur peta dengan ukuran O×O×K. + +
**28. Stride ― For a convolutional or a pooling operation, the stride S denotes the number of pixels by which the window moves after each operation.**

⟶Stride - Untuk sebuah operasi konvolusi atau pooling, stride S melambangkan jumlah pixel yang dilewati window setelah setiap operasi.

<br>
+ + +**29. Zero-padding ― Zero-padding denotes the process of adding P zeroes to each side of the boundaries of the input. This value can either be manually specified or automatically set through one of the three modes detailed below:** + +⟶Zero-padding - Zero-padding melambangkan proses penambahan P nilai 0 pada setiap sisi akhir dari masukan. Nilai dari zero-padding dapat dispesifikasikan secara manual atau secara otomatis melalui salah satu dari tiga mode yang dijelaskan dibawah ini: + +
+ + +**30. [Mode, Value, Illustration, Purpose, Valid, Same, Full]** + +⟶[Mode, Nilai, Ilustrasi, Tujuan, Valid, Same, Full] + +
+ + +**31. [No padding, Drops last convolution if dimensions do not match, Padding such that feature map size has size ⌈IS⌉, Output size is mathematically convenient, Also called 'half' padding, Maximum padding such that end convolutions are applied on the limits of the input, Filter 'sees' the input end-to-end]** + +⟶[No padding, Hapus konvolusi terakhir jika dimensi tidak sesuai, Padding yang menghasilkan feature map dengan ukuran ⌈IS⌉, Ukuran keluaran cocok secara matematis, Juga disebut 'half' padding, Maximum padding menjadikan akhir konvolusi dipasangkan pada batasan dari input, Filter 'melihat' masukan end-to-end] + +
+ + +**32. Tuning hyperparameters** + +⟶Menyetel hiperparameter + +
+ + +**33. Parameter compatibility in convolution layer ― By noting I the length of the input volume size, F the length of the filter, P the amount of zero padding, S the stride, then the output size O of the feature map along that dimension is given by:** + +⟶Kompabilitas parameter pada lapisan konvolusi - Dengan menuliskan I sebagai panjang dari ukuran volume masukan, F sebagai panjang dari filter, P sebagai jumlah dari zero padding, S sebagai stride, maka ukuran keluaran 0 dari feature map pada dimensi tersebut ditandai dengan: + +
+ + +**34. [Input, Filter, Output]** + +⟶[Masukan, Filter, Keluaran] + +
+ + +**35. Remark: often times, Pstart=Pend≜P, in which case we can replace Pstart+Pend by 2P in the formula above.** + +⟶Catatan: sering, Pstart=Pend≜P, pada kasus tersebut kita dapat mengganti Pstart+Pend dengan 2P pada formula di atas. + +
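A small helper (illustrative only, one spatial dimension) that evaluates this output-size formula; the input size, filter size, padding and stride values are arbitrary examples:

```python
def conv_output_size(i, f, p_start, p_end, s):
    """O = (I - F + P_start + P_end) / S + 1 along one spatial dimension."""
    return (i - f + p_start + p_end) // s + 1

print(conv_output_size(32, 5, 2, 2, 1))  # 32: 'same'-style padding keeps the size
print(conv_output_size(32, 5, 0, 0, 1))  # 28: 'valid' (no padding) shrinks it
```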
+ + +**36. Understanding the complexity of the model ― In order to assess the complexity of a model, it is often useful to determine the number of parameters that its architecture will have. In a given layer of a convolutional neural network, it is done as follows:** + +⟶Memahami kompleksitas dari model - Untuk menilai kompleksitas dari sebuah model, sangatlah penting untuk menentukan jumlah parameter yang arsitektur dari model akan miliki. Pada sebuah convolutional neural network, hal tersebut dilakukan sebagai berikut: + +
+ + +**37. [Illustration, Input size, Output size, Number of parameters, Remarks]** + +⟶[Ilustrasi, Ukuran masukan, Ukuran keluaran, Jumlah parameter, Catatan] + +
**38. [One bias parameter per filter, In most cases, S<F, A common choice for K is 2C]**

⟶[Satu parameter bias untuk setiap filter, Pada banyak kasus, S<F, sebuah pilihan umum untuk K adalah 2C]

<br>
+ + +**39. [Pooling operation done channel-wise, In most cases, S=F]** + +⟶[Operasi pooling yang dilakukan dengan channel-wise, Pada banyak kasus, S=F] + +
+ + +**40. [Input is flattened, One bias parameter per neuron, The number of FC neurons is free of structural constraints]** + +⟶[Masukan diratakan, satu parameter bias untuk setiap neuron, Jumlah dari neuron FC adalah terbebas dari batasan struktural] + +
**41. Receptive field ― The receptive field at layer k is the area denoted Rk×Rk of the input that each pixel of the k-th activation map can 'see'. By calling Fj the filter size of layer j and Si the stride value of layer i and with the convention S0=1, the receptive field at layer k can be computed with the formula:**

⟶Receptive field - Receptive field pada layer k adalah area yang dinotasikan RkxRk dari masukan yang setiap pixel dari k-th activation map dapat "melihat". Dengan menyebut Fj sebagai ukuran filter dari lapisan j dan Si sebagai nilai stride dari lapisan i dan dengan konvensi S0=1, receptive field pada lapisan k dapat dihitung dengan formula:

<br>
+ + +**42. In the example below, we have F1=F2=3 and S1=S2=1, which gives R2=1+2⋅1+2⋅1=5.** + +⟶Pada contoh dibawah ini, kita memiliki F1=F2=3 dan S1=S2=1, yang menghasilkan R2=1+2⋅1+2⋅1=5. + +
+ + +**43. Commonly used activation functions** + +⟶Fungsi-fungsi aktifasi yang biasa dipakai + +
+ + +**44. Rectified Linear Unit ― The rectified linear unit layer (ReLU) is an activation function g that is used on all elements of the volume. It aims at introducing non-linearities to the network. Its variants are summarized in the table below:** + +⟶Rectified Linear Unit - Layer rectified linear unit (ReLU) adalah sebuat fungsi aktivasi g yang digunakan pada seluruh elemen volume. Unit ini bertujuan untuk menempatkan non-linearitas pada jaringan. Variasi-variasi ReLU ini dirangkum pada tabel di bawah ini: + +
+ + +**45. [ReLU, Leaky ReLU, ELU, with]** + +⟶[ReLU, Leaky ReLU, ELU, dengan] + +
+ + +**46. [Non-linearity complexities biologically interpretable, Addresses dying ReLU issue for negative values, Differentiable everywhere]** + +⟶[Kompleksitas non-linearitas yang dapat ditafsirkan secara biologi, Menangani permasalahan dying ReLU yang bernilai negatif, Yang dapat dibedakan di mana pun] + +
**47. Softmax ― The softmax step can be seen as a generalized logistic function that takes as input a vector of scores x∈Rn and outputs a vector of output probability p∈Rn through a softmax function at the end of the architecture. It is defined as follows:**

⟶Softmax - Langkah softmax dapat dilihat sebagai sebuah fungsi logistik umum yang menerima masukan berupa vektor skor x∈Rn dan mengeluarkan vektor probabilitas keluaran p∈Rn melalui sebuah fungsi softmax pada akhir dari arsitektur. Softmax didefinisikan sebagai berikut:

<br>
+ + +**48. where** + +⟶Di mana + +
+ + +**49. Object detection** + +⟶Deteksi objek + +
+ + +**50. Types of models ― There are 3 main types of object recognition algorithms, for which the nature of what is predicted is different. They are described in the table below:** + +⟶Tipe-tipe model - Ada tiga tipe utama dari algoritma rekognisi objek, yang mana hakikat yang diprediksi tersebut berbeda. Tipe-tipe tersebut dijelaskan pada tabel di bawah ini: + +
+ + +**51. [Image classification, Classification w. localization, Detection]** + +⟶[Klasifikasi gambar, Klasifikasi w. lokalisasi, Deteksi] + +
+ + +**52. [Teddy bear, Book]** + +⟶[Boneka beruang, Buku] + +
+ + +**53. [Classifies a picture, Predicts probability of object, Detects an object in a picture, Predicts probability of object and where it is located, Detects up to several objects in a picture, Predicts probabilities of objects and where they are located]** + +⟶[Mengklasifikasikan sebuah gambar, Memprediksi probabilitas dari objek, Mendeteksi objek pada sebuah gambar, Memprediksi probabilitas dari objek dan lokasinya pada gambar, Mendeteksi hingga beberapa objek pada sebuah gambar, Memprediksi probabilitas dari objek-objek dan dimana lokasi mereka] + +
+ + +**54. [Traditional CNN, Simplified YOLO, R-CNN, YOLO, R-CNN]** + +⟶[CNN tradisional, Simplified YOLO, R-CNN, YOLO, R-CNN] + +
+ + +**55. Detection ― In the context of object detection, different methods are used depending on whether we just want to locate the object or detect a more complex shape in the image. The two main ones are summed up in the table below:** + +⟶Deteksi - Pada objek deteksi, metode yang berbeda digunakan tergantung apakah kita hanya ingin untuk mengetahui lokasi objek atau mendeteksi sebuah bentuk yang lebih rumit pada gambar. Dua metode yang utama dirangkum pada tabel dibawah ini: + +
+ + +**56. [Bounding box detection, Landmark detection]** + +⟶[Deteksi bounding box, Deteksi landmark] + +
**57. [Detects the part of the image where the object is located, Detects a shape or characteristics of an object (e.g. eyes), More granular]**

⟶[Mendeteksi bagian dari gambar di mana objek berlokasi, Mendeteksi bentuk atau karakteristik dari sebuah objek (contoh: mata), Lebih granular]

<br>
**58. [Box of center (bx,by), height bh and width bw, Reference points (l1x,l1y), ..., (lnx,lny)]**

⟶[Pusat dari box (bx,by), tinggi bh dan lebar bw, Poin referensi (l1x,l1y), ..., (lnx,lny)]

<br>
**59. Intersection over Union ― Intersection over Union, also known as IoU, is a function that quantifies how correctly positioned a predicted bounding box Bp is over the actual bounding box Ba. It is defined as:**

⟶Intersection over Union - Intersection over Union, juga dikenal sebagai IoU, adalah sebuah fungsi yang mengkuantifikasi seberapa benar posisi dari sebuah prediksi bounding box Bp terhadap bounding box yang sebenarnya Ba. IoU didefinisikan sebagai berikut:

<br>
+ + +**60. Remark: we always have IoU∈[0,1]. By convention, a predicted bounding box Bp is considered as being reasonably good if IoU(Bp,Ba)⩾0.5.** + +⟶Perlu diperhatikan: kita selalu memiliki nilai IoU∈[0,1]. Umumnya, sebuah prediksi bounding box dianggap cukup bagus jika IoU(Bp,Ba)⩾0.5. + +
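An illustrative Python version of this definition for axis-aligned boxes given as (x1, y1, x2, y2) corners (a common but assumed convention, not specified by the cheatsheet itself):

```python
def iou(box_a, box_b):
    """Intersection over Union of two axis-aligned boxes (x1, y1, x2, y2)."""
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

print(iou((0, 0, 2, 2), (1, 1, 3, 3)))  # 1 / 7, about 0.14
```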
+ + +**61. Anchor boxes ― Anchor boxing is a technique used to predict overlapping bounding boxes. In practice, the network is allowed to predict more than one box simultaneously, where each box prediction is constrained to have a given set of geometrical properties. For instance, the first prediction can potentially be a rectangular box of a given form, while the second will be another rectangular box of a different geometrical form.** + +⟶Anchor boxes ― Anchor boxing adalah sebuah teknik yang digunakan untuk memprediksi bounding box yang overlap. Pada pengaplikasiannya, network diperbolehkan untuk memprediksi lebih dari satu box secara bersamaan, dimana setiap prediksi box dibatasi untuk memiliki kumpulan properti geometri. Contohnya, prediksi pertama dapat berupa sebuah box persegi panjang untuk sebuah bentuk, sedangkan prediksi kedua adalah persegi panjang lainnya dengan bentuk geometri yang berbeda. + +
+ + +**62. Non-max suppression ― The non-max suppression technique aims at removing duplicate overlapping bounding boxes of a same object by selecting the most representative ones. After having removed all boxes having a probability prediction lower than 0.6, the following steps are repeated while there are boxes remaining:** + +⟶Non-max suppression ― Teknik non-max suppression bertujuan untuk menghapus duplikasi bounding box yang overlap satu sama lain dari sebuah objek yang sama dengan memilih box yang paling representatif. Setelah menghapus seluruh box dengan prediksi probability lebih kecil dari 0.6, langkah berikut diulang selama terdapat box tersisa. + +
+ + +**63. [For a given class, Step 1: Pick the box with the largest prediction probability., Step 2: Discard any box having an IoU⩾0.5 with the previous box.]** + +⟶[Untuk sebuah kelas, Langkah 1: Pilih box dengan probabilitas prediksi tertinggi., Langkah 2: Singkirkan box manapun yang yang memiliki IoU⩾0.5 dengan box yang dipilih pada tahap 1.] + +
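A self-contained, illustrative sketch of these two steps (greedy non-max suppression); the 0.6 score cut-off and 0.5 IoU threshold follow the text, while the example boxes and scores are made up:

```python
def box_iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union

def non_max_suppression(boxes, scores, iou_thresh=0.5, score_thresh=0.6):
    """Drop low-score boxes, then repeatedly keep the highest-score box and
    discard remaining boxes whose IoU with it is >= iou_thresh."""
    order = sorted((i for i, s in enumerate(scores) if s >= score_thresh),
                   key=lambda i: scores[i], reverse=True)
    kept = []
    while order:
        best, order = order[0], order[1:]
        kept.append(best)
        order = [i for i in order if box_iou(boxes[best], boxes[i]) < iou_thresh]
    return kept

boxes = [(0, 0, 10, 10), (1, 1, 10, 10), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
print(non_max_suppression(boxes, scores))  # [0, 2]: the second box overlaps the first too much
```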
+ + +**64. [Box predictions, Box selection of maximum probability, Overlap removal of same class, Final bounding boxes]** + +⟶[Prediksi-prediksi box, Seleksi box dari probabilitas tertinggi, Penghapusan overlap pada kelas yang sama, Bounding box akhir] + +
+ + +**65. YOLO ― You Only Look Once (YOLO) is an object detection algorithm that performs the following steps:** + +⟶YOLO - You Only Look Once (YOLO) adalah sebuah algoritma deteksi objek yang melakukan langkah-langkah berikut: + +
**66. [Step 1: Divide the input image into a G×G grid., Step 2: For each grid cell, run a CNN that predicts y of the following form:, repeated k times]**

⟶[Langkah 1: Bagi gambar masukan ke dalam sebuah grid dengan ukuran GxG, Langkah 2: Untuk setiap sel grid, gunakan sebuah CNN yang memprediksi y dengan bentuk sebagai berikut:, lakukan sebanyak k kali]

<br>
+ + +**67. where pc is the probability of detecting an object, bx,by,bh,bw are the properties of the detected bouding box, c1,...,cp is a one-hot representation of which of the p classes were detected, and k is the number of anchor boxes.** + +⟶dimana pc adalah deteksi probabilitas dari sebuah objek, bx,by,bh,bw adalah properti dari box bounding yang terdeteksi, c1,...,cp adalah representasi one-hot yang mana p classes terdeteksi, dan k adalah jumlah box anchor. + +
**68. Step 3: Run the non-max suppression algorithm to remove any potential duplicate overlapping bounding boxes.**

⟶Langkah 3: Jalankan algoritma non-max suppression untuk menghapus setiap potensi duplikasi bounding box yang saling tumpang tindih.

<br>
+ + +**69. [Original image, Division in GxG grid, Bounding box prediction, Non-max suppression]** + +⟶[Gambar asli, Pembagian kedalam grid berukuran GxG, Prediksi box bounding, Non-max suppression] + +
+ + +**70. Remark: when pc=0, then the network does not detect any object. In that case, the corresponding predictions bx,...,cp have to be ignored.** + +⟶Perlu diperhatikan: ketika pc=0, maka netwok tidak mendeteksi objek apapun. Pada kasus seperti itu, prediksi yang bersangkutan bx,...,cp harus diabaikan. + +
+ + +**71. R-CNN ― Region with Convolutional Neural Networks (R-CNN) is an object detection algorithm that first segments the image to find potential relevant bounding boxes and then run the detection algorithm to find most probable objects in those bounding boxes.** + +⟶R-CNN ― Region with Convolutional Neural Networks (R-CNN) adalah sebuah algoritma objek deteksi yang pertama-tama mensegmentasi gambar untuk menemukan potensial box-box bounding yang relevan dan selanjutnya menjalankan algoritma deteksi untuk menemukan objek yang paling memungkinkan pada box-box bounding tersebut.. + +
+ + +**72. [Original image, Segmentation, Bounding box prediction, Non-max suppression]** + +⟶[Gambar asli, Segmentasi, Prediksi box bounding, Non-max suppressio] + +
**73. Remark: although the original algorithm is computationally expensive and slow, newer architectures enabled the algorithm to run faster, such as Fast R-CNN and Faster R-CNN.**

⟶Perlu diperhatikan: meskipun algoritma asli dari R-CNN membutuhkan komputasi resource yang besar dan lambat, arsitektur terbaru memungkinkan algoritma untuk memiliki performa yang lebih cepat, yang dikenal sebagai Fast R-CNN dan Faster R-CNN.

<br>
+ + +**74. Face verification and recognition** + +⟶Verifikasi wajah dan rekognisi + +
+ + +**75. Types of models ― Two main types of model are summed up in table below:** + +⟶Jenis-jenis model - Dua jenis tipe utama dirangkum pada tabel dibawah ini: + +
**76. [Face verification, Face recognition, Query, Reference, Database]**

⟶[Verifikasi wajah, Rekognisi wajah, Query, Referensi, Database]

<br>
+ + +**77. [Is this the correct person?, One-to-one lookup, Is this one of the K persons in the database?, One-to-many lookup]** + +⟶[Apakah ini adalah orang yang sesuai?, One-to-one lookup, Apakah ini salah satu dari K orang pada database?, One-to-many lookup] + +
+ + +**78. One Shot Learning ― One Shot Learning is a face verification algorithm that uses a limited training set to learn a similarity function that quantifies how different two given images are. The similarity function applied to two images is often noted d(image 1,image 2).** + +⟶One Shot Learning ― One Shot Learning adalah sebuah algoritma verifikasi wajah yang menggunakan sebuah training set yang terbatas untuk belajar fungsi kemiripan yang mengkuantifikasi seberapa berbeda dua gambar yang diberikan. Fungsi kemiripan yang diaplikasikan pada dua gambar sering dinotasikan sebagai d(image 1,image 2). + +
+ + +**79. Siamese Network ― Siamese Networks aim at learning how to encode images to then quantify how different two images are. For a given input image x(i), the encoded output is often noted as f(x(i)).** + +⟶Siamese Network ― Siamese Networks didesain untuk mengkodekan gambar dan mengkuantifikasi seberapa berbeda dua buah gambar. Untuk sebuah gambar masukan x(i), keluaran yang dikodekan sering dinotasikan sebagai f(x(i)). + +
+ + +**80. Triplet loss ― The triplet loss ℓ is a loss function computed on the embedding representation of a triplet of images A (anchor), P (positive) and N (negative). The anchor and the positive example belong to a same class, while the negative example to another one. By calling α∈R+ the margin parameter, this loss is defined as follows:** + +⟶Loss triplet - Loss triplet adalah sebuah fungsi loss yang dihitung pada representasi embedding dari sebuah tiga pasang gambar A (anchor), P (positif) dan N (negatif). Sampel anchor dan positif berdasal dari sebuah kelas yang sama, sedangkan sampel negatif berasal dari kelas yang lain. Dengan menuliskan α∈R+ sebagai parameter margin, fungsi loss ini dapat didefinisikan sebagai berikut: + +
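An illustrative computation of this loss on made-up 2-dimensional embeddings, taking d as the squared Euclidean distance (one common choice, assumed here) and α=0.2 as an arbitrary margin:

```python
import numpy as np

def triplet_loss(f_a, f_p, f_n, alpha=0.2):
    """max(d(A,P) - d(A,N) + alpha, 0) on embedded vectors f(A), f(P), f(N)."""
    d_pos = np.sum((f_a - f_p) ** 2)   # anchor-positive distance
    d_neg = np.sum((f_a - f_n) ** 2)   # anchor-negative distance
    return max(d_pos - d_neg + alpha, 0.0)

f_a = np.array([0.1, 0.2])
f_p = np.array([0.12, 0.19])   # same identity: close to the anchor
f_n = np.array([0.9, 0.7])     # different identity: far from the anchor
print(triplet_loss(f_a, f_p, f_n))  # 0.0 here, since the margin is already satisfied
```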
+ + +**81. Neural style transfer** + +⟶Transfer neural style + +
**82. Motivation ― The goal of neural style transfer is to generate an image G based on a given content C and a given style S.**

⟶Motivasi - Tujuan dari transfer neural style adalah untuk menghasilkan sebuah gambar G berdasarkan sebuah konten C dan sebuah style S yang diberikan.

<br>
+ + +**83. [Content C, Style S, Generated image G]** + +⟶[Konten C, Style S, gambar yang dihasilkan G] + +
+ + +**84. Activation ― In a given layer l, the activation is noted a[l] and is of dimensions nH×nw×nc** + +⟶Aktifasi - Pada sebuah layer l, aktifasi dinotasikan sebagai a[l] dan berdimensi nH×nw×nc + +
**85. Content cost function ― The content cost function Jcontent(C,G) is used to determine how the generated image G differs from the original content image C. It is defined as follows:**

⟶Fungsi cost content - Fungsi cost content Jcontent(C,G) digunakan untuk menghitung perbedaan antara gambar yang dihasilkan G dan gambar konten yang sebenarnya C. Fungsi cost content didefinisikan sebagai berikut:

<br>
+ + +**86. Style matrix ― The style matrix G[l] of a given layer l is a Gram matrix where each of its elements G[l]kk′ quantifies how correlated the channels k and k′ are. It is defined with respect to activations a[l] as follows:** + +⟶Matriks style - Matriks style G[l] dari sebuah layer l adalah sebuah matrix Gram dimana setiap elemennya G[l]kk′ mengkuantifikasi seberapa besar korelasi antara channel k dan k'. Matriks style didefinisikan terhadap aktifasi a[l] sebagai berikut: + +
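An illustrative NumPy sketch of the Gram computation for an activation volume of shape (nH, nW, nC); the random activation below is only a placeholder:

```python
import numpy as np

def style_matrix(activation):
    """Gram matrix G[l]: entry (k, k') sums the products of channels k and k'
    over all spatial positions of the activation volume."""
    n_h, n_w, n_c = activation.shape
    flat = activation.reshape(n_h * n_w, n_c)
    return flat.T @ flat   # shape (n_C, n_C)

a = np.random.rand(4, 4, 3)    # made-up activation a[l]
print(style_matrix(a).shape)   # (3, 3)
```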
+ + +**87. Remark: the style matrix for the style image and the generated image are noted G[l] (S) and G[l] (G) respectively.** + +⟶Perlu diperhatikan: matriks style untuk gambar style dan gambar yang dihasilkan masing-masing dituliskan sebagai G[l] (S) dan G[l] (G). + +
+ + +**88. Style cost function ― The style cost function Jstyle(S,G) is used to determine how the generated image G differs from the style S. It is defined as follows:** + +⟶Fungsi cost style - Fungsi cost style Jstyle(S,G) digunakan untuk menentukan perbedaan antara gambar yang dihasilkan G dengan style yang diberikan S. Fungsi tersebut definisikan sebagai berikut: + +
+ + +**89. Overall cost function ― The overall cost function is defined as being a combination of the content and style cost functions, weighted by parameters α,β, as follows:** + +⟶Fungsi cost overall - Fungsi cost overall didefinisikan sebagai sebuah kombinasi dari fungsi cost konten dan syle, dibobotkan oleh parameter α,β, sebagai berikut: + +
**90. Remark: a higher value of α will make the model care more about the content while a higher value of β will make it care more about the style.**

⟶Perlu diperhatikan: semakin tinggi nilai α akan membuat model lebih memperhatikan konten sedangkan semakin tinggi nilai β akan membuat model lebih memperhatikan style.

<br>
+ + +**91. Architectures using computational tricks** + +⟶Arsitektur menggunakan trik komputasi. + +
**92. Generative Adversarial Network ― Generative adversarial networks, also known as GANs, are composed of a generative and a discriminative model, where the generative model aims at generating the most truthful output that will be fed into the discriminative which aims at differentiating the generated and true image.**

⟶Generative Adversarial Network - Generative adversarial networks, juga dikenal sebagai GANs, terdiri dari sebuah model generatif dan sebuah model diskriminatif, dimana model generatif didesain untuk menghasilkan keluaran palsu yang mendekati keluaran sebenarnya yang akan diberikan kepada model diskriminatif yang didesain untuk membedakan gambar palsu dan gambar sebenarnya.

<br>
+ + +**93. [Training, Noise, Real-world image, Generator, Discriminator, Real Fake]** + +⟶[Training, Noise, Gambar real-world, Generator, Discriminator, Real Fake] + +
**94. Remark: use cases using variants of GANs include text to image, music generation and synthesis.**

⟶Perlu diperhatikan: penggunaan dari variasi GANs meliputi sistem yang dapat mengubah teks ke gambar, serta menghasilkan dan mensintesis musik.

<br>
+ + +**95. ResNet ― The Residual Network architecture (also called ResNet) uses residual blocks with a high number of layers meant to decrease the training error. The residual block has the following characterizing equation:** + +⟶ ResNet ― Arsitektur Residual Network (juga disebut ResNet) menggunakan blok-blok residual dengan jumlah layer yang banyak untuk mengurangi training error. Blok residual memiliki karakteristik formula sebagai berikut: + +
+ + +**96. Inception Network ― This architecture uses inception modules and aims at giving a try at different convolutions in order to increase its performance through features diversification. In particular, it uses the 1×1 convolution trick to limit the computational burden.** + +⟶Inception Network ― Arsitektur ini menggunakan modul inception dan didesain dengan tujuan untuk meningkatkan performa network melalu diversifikasi fitur dengan menggunakan CNN yang berbeda-beda. Khususnya, inception model menggunakan trik 1×1 CNN untuk membatasi beban komputasi. + +
+ + +**97. The Deep Learning cheatsheets are now available in [target language].** + +⟶Deep Learning cheatsheet sekarang tersedia di [Bahasa Indonesia] + +
+ + +**98. Original authors** + +⟶Penulis asli + +
+ + +**99. Translated by X, Y and Z** + +⟶Diterjemahkan oleh X, Y dan Z + +
+ + +**100. Reviewed by X, Y and Z** + +⟶Diulas oleh X, Y dan Z + +
+ + +**101. View PDF version on GitHub** + +⟶Lihat versi PDF pada GitHub + +
+ + +**102. By X and Y** + +⟶Oleh X dan Y + +
diff --git a/he/refresher-linear-algebra.md b/ja/cs-229-linear-algebra.md similarity index 51% rename from he/refresher-linear-algebra.md rename to ja/cs-229-linear-algebra.md index a6b440d1e..c806cb4ca 100644 --- a/he/refresher-linear-algebra.md +++ b/ja/cs-229-linear-algebra.md @@ -1,339 +1,342 @@ **1. Linear Algebra and Calculus refresher** ⟶ - +線形代数と微積分の復習
**2. General notations** ⟶ - +一般表記
**3. Definitions** ⟶ - +定義
**4. Vector ― We note x∈Rn a vector with n entries, where xi∈R is the ith entry:** ⟶ - +ベクトル - x∈Rn はn個の要素を持つベクトルを表し、xi∈R はi番目の要素を表します。
**5. Matrix ― We note A∈Rm×n a matrix with m rows and n columns, where Ai,j∈R is the entry located in the ith row and jth column:** ⟶ - +行列 - m行n列の行列を A∈Rm×n と表記し、Ai,j∈R はi行目のj列目の要素を指します。
**6. Remark: the vector x defined above can be viewed as a n×1 matrix and is more particularly called a column-vector.** ⟶ - +備考:上記で定義されたベクトル x は n×1 の行列と見なすことができ、列ベクトルと呼ばれます。
**7. Main matrices** ⟶ - +主な行列の種類
**8. Identity matrix ― The identity matrix I∈Rn×n is a square matrix with ones in its diagonal and zero everywhere else:** ⟶ - +単位行列 - 単位行列 I∈Rn×n は、対角成分に 1 が並び、他は全て 0 となる正方行列です。
**9. Remark: for all matrices A∈Rn×n, we have A×I=I×A=A.** ⟶ - +備考:すべての行列 A∈Rn×n に対して、A×I=I×A=A となります。
**10. Diagonal matrix ― A diagonal matrix D∈Rn×n is a square matrix with nonzero values in its diagonal and zero everywhere else:** ⟶ - +対角行列 - 対角行列 D∈Rn×n は、対角成分の値が 0 以外で、それ以外は 0 である正方行列です。
**11. Remark: we also note D as diag(d1,...,dn).** ⟶ - +備考:Dをdiag(d1,...,dn) とも表記します。
**12. Matrix operations** ⟶ - +行列演算
**13. Multiplication** ⟶ - +行列乗算
**14. Vector-vector ― There are two types of vector-vector products:** ⟶ - +ベクトル-ベクトル - ベクトル-ベクトル積には2種類あります。
**15. inner product: for x,y∈Rn, we have:** ⟶ - +内積: x,y∈Rn に対して、内積の定義は下記の通りです:
**16. outer product: for x∈Rm,y∈Rn, we have:** ⟶ - +外積: x∈Rm,y∈Rn に対して、外積の定義は下記の通りです:
**17. Matrix-vector ― The product of matrix A∈Rm×n and vector x∈Rn is a vector of size Rn, such that:** ⟶ - +行列-ベクトル - 行列 A∈Rm×n とベクトル x∈Rn の積は以下の条件を満たすようなサイズ Rn のベクトルです。
**18. where aTr,i are the vector rows and ac,j are the vector columns of A, and xi are the entries of x.** ⟶ - +上記 aTr,i は A の行ベクトルで、ac,j は A の列ベクトルです。 xi は x の要素です。
**19. Matrix-matrix ― The product of matrices A∈Rm×n and B∈Rn×p is a matrix of size Rn×p, such that:**

⟶

-
+行列-行列 - 行列 A∈Rm×n と B∈Rn×p の積は以下の条件を満たすようなサイズ Rm×p の行列です。(Note: "Rn×p" in the English source is a typo for Rm×p.)

<br>
**20. where aTr,i,bTr,i are the vector rows and ac,j,bc,j are the vector columns of A and B respectively** ⟶ - +aTr,i,bTr,i は A と B の行ベクトルで ac,j,bc,j は A と B の列ベクトルです。
**21. Other operations** ⟶ - +その他の演算
**22. Transpose ― The transpose of a matrix A∈Rm×n, noted AT, is such that its entries are flipped:**

⟶

-
+転置 ― A∈Rm×n の転置行列は AT と表記し、A の行と列の要素を入れ替えた行列です。

<br>
**23. Remark: for matrices A,B, we have (AB)T=BTAT**

⟶

-
+備考: 行列AとBの場合、(AB)T=BTAT となります。

<br>
**24. Inverse ― The inverse of an invertible square matrix A is noted A−1 and is the only matrix such that:** ⟶ - +逆行列 ― 可逆正方行列 A の逆行列は A-1 と表記し、 以下の条件を満たす唯一の行列です。
**25. Remark: not all square matrices are invertible. Also, for matrices A,B, we have (AB)−1=B−1A−1** ⟶ - +備考: すべての正方行列が可逆とは限りません。 行列 A,B については、(AB)−1=B−1A−1
**26. Trace ― The trace of a square matrix A, noted tr(A), is the sum of its diagonal entries:** ⟶ - +跡 - 正方行列 A の跡は、tr(A) と表記し、その対角成分の要素の和です。
**27. Remark: for matrices A,B, we have tr(AT)=tr(A) and tr(AB)=tr(BA)** ⟶ - +備考: 行列 A,B の場合: tr(AT)=tr(A) と tr(AB)=tr(BA) となります。
**28. Determinant ― The determinant of a square matrix A∈Rn×n, noted |A| or det(A) is expressed recursively in terms of A∖i,∖j, which is the matrix A without its ith row and jth column, as follows:**

⟶

-
+行列式 ― 正方行列 A∈Rn×n の行列式は |A| または det(A) と表記し、i番目の行とj番目の列を取り除いた行列 A∖i,∖j を用いて、以下のように再帰的に表現されます:

<br>
**29. Remark: A is invertible if and only if |A|≠0. Also, |AB|=|A||B| and |AT|=|A|.** ⟶ - +備考: |A|≠0の場合に限り、行列は可逆行列です。また |AB|=|A||B| と |AT|=|A|。
**30. Matrix properties** ⟶ - +行列の性質
**31. Definitions** ⟶ - +定義
**32. Symmetric decomposition ― A given matrix A can be expressed in terms of its symmetric and antisymmetric parts as follows:** ⟶ - +対称分解 ― 行列Aは次のように対称および反対称的な部分で表現できます。
**33. [Symmetric, Antisymmetric]** ⟶ - +[対称、反対称]
**34. Norm ― A norm is a function N:V⟶[0,+∞[ where V is a vector space, and such that for all x,y∈V, we have:**

⟶

-
+ノルム ― ノルムとは関数 N:V⟶[0,+∞[ のことで、ここで V はベクトル空間であり、すべての x,y∈V に対して以下の条件を満たします:

<br>
**35. N(ax)=|a|N(x) for a scalar** ⟶ - +スカラー a に対して N(ax)=|a|N(x)
**36. if N(x)=0, then x=0** ⟶ - +N(x)= 0ならば x = 0
**37. For x∈V, the most commonly used norms are summed up in the table below:** ⟶ - +x∈Vに対して、最も多用されているノルムは、以下の表にまとめられています。
**38. [Norm, Notation, Definition, Use case]** ⟶ - +[ノルム、表記法、定義、使用事例]
**39. Linearly dependence ― A set of vectors is said to be linearly dependent if one of the vectors in the set can be defined as a linear combination of the others.** ⟶ - +線形従属 ― ベクトルの集合に対して、少なくともどれか一つのベクトルを他のベクトルの線形結合として定義できる場合、その集合が線形従属であるといいます。
**40. Remark: if no vector can be written this way, then the vectors are said to be linearly independent** ⟶ - +備考:この方法でベクトルを書くことができない場合、ベクトルは線形独立していると言われます。
**41. Matrix rank ― The rank of a given matrix A is noted rank(A) and is the dimension of the vector space generated by its columns. This is equivalent to the maximum number of linearly independent columns of A.** ⟶ - +行列の階数 ― 行列Aの階数は rank(A) と表記し、列空間の次元を表します。これは、Aの線形独立の列の最大数に相当します。
**42. Positive semi-definite matrix ― A matrix A∈Rn×n is positive semi-definite (PSD) and is noted A⪰0 if we have:** ⟶ - +半正定値行列 ― 行列A、A∈Rn×nに対して、以下の式が成り立つならば、 Aを半正定値(PSD)といい、A⪰0 と表記します。
**43. Remark: similarly, a matrix A is said to be positive definite, and is noted A≻0, if it is a PSD matrix which satisfies for all non-zero vector x, xTAx>0.** ⟶ - +備考: 同様に、全ての非ゼロベクトルx、xTAx>0 に対して条件を満たすような行列Aは正定値行列といい、A≻0 と表記します。
**44. Eigenvalue, eigenvector ― Given a matrix A∈Rn×n, λ is said to be an eigenvalue of A if there exists a vector z∈Rn∖{0}, called eigenvector, such that we have:** ⟶ - +固有値、固有ベクトル ― 行列A、A∈Rn×n に対して、以下の条件を満たすようなベクトルz、z∈Rn∖{0} が存在するならば、λ は固有値といい、z は固有ベクトルといいます。
**45. Spectral theorem ― Let A∈Rn×n. If A is symmetric, then A is diagonalizable by a real orthogonal matrix U∈Rn×n. By noting Λ=diag(λ1,...,λn), we have:** ⟶ - +スペクトル定理 ― A∈Rn×n とします。A が対称ならば、A は実直交行列 U∈Rn×n によって対角化可能です。Λ=diag(λ1,...,λn) と表記すると、次のように表現できます。
**46. diagonal** ⟶ - +対角
**47. Singular-value decomposition ― For a given matrix A of dimensions m×n, the singular-value decomposition (SVD) is a factorization technique that guarantees the existence of U m×m unitary, Σ m×n diagonal and V n×n unitary matrices, such that:** ⟶ - +特異値分解 ― A を m×n の行列とします。特異値分解(SVD)は、ユニタリ行列 U m×m、Σ m×n の対角行列、およびユニタリ行列 V n×n の存在を保証する因数分解手法で、以下の条件を満たします。
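For a quick numerical check of the spectral theorem and the singular-value decomposition above, here is a minimal NumPy sketch (the matrices are arbitrary examples, not taken from the cheatsheet):

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])          # symmetric matrix

# Spectral theorem: A = U Λ U^T with U orthogonal, Λ = diag(λ1, ..., λn)
eigvals, U = np.linalg.eigh(A)
assert np.allclose(U @ np.diag(eigvals) @ U.T, A)

# Singular-value decomposition of a rectangular matrix: B = U Σ V^T
B = np.arange(6, dtype=float).reshape(2, 3)
Um, s, Vt = np.linalg.svd(B)
Sigma = np.zeros_like(B)            # Σ is m×n and diagonal
Sigma[:2, :2] = np.diag(s)
assert np.allclose(Um @ Sigma @ Vt, B)
```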
**48. Matrix calculus** ⟶ - +行列微積分
**49. Gradient ― Let f:Rm×n→R be a function and A∈Rm×n be a matrix. The gradient of f with respect to A is a m×n matrix, noted ∇Af(A), such that:** ⟶ - +勾配 ― f:Rm×n→R を関数とし、A∈Rm×n を行列とします。 A に対する f の勾配は m×n 行列で、∇Af(A) と表記し、次の条件を満たします。
**50. Remark: the gradient of f is only defined when f is a function that returns a scalar.** ⟶ - +備考: f の勾配は、f がスカラーを返す関数であるときに限り存在します。
**51. Hessian ― Let f:Rn→R be a function and x∈Rn be a vector. The hessian of f with respect to x is a n×n symmetric matrix, noted ∇2xf(x), such that:** ⟶ - +ヘッセ行列 ― f:Rn→R を関数とし、x∈Rn をベクトルとします。 x に対する f のヘッセ行列は、n×n 対称行列で ∇2xf(x) と表記し、以下の条件を満たします。
**52. Remark: the hessian of f is only defined when f is a function that returns a scalar** ⟶ - +備考: f のヘッセ行列は、f がスカラーを返す関数である場合に限り存在します。
**53. Gradient operations ― For matrices A,B,C, the following gradient properties are worth having in mind:**

⟶

-
+勾配演算 ― 行列 A,B,C について、以下の勾配の性質は覚えておく価値があります:

<br>
**54. [General notations, Definitions, Main matrices]** ⟶ - +[表記, 定義, 主な行列の種類]
**55. [Matrix operations, Multiplication, Other operations]** ⟶ - +[行列演算, 乗算, その他の演算]
**56. [Matrix properties, Norm, Eigenvalue/Eigenvector, Singular-value decomposition]** ⟶ - +[行列特性, 行列ノルム, 固有値/固有ベクトル, 特異値分解]
**57. [Matrix calculus, Gradient, Hessian, Operations]** ⟶ +[行列微積分, 勾配, ヘッセ行列, 演算] diff --git a/ja/cs-229-probability.md b/ja/cs-229-probability.md new file mode 100644 index 000000000..16fca9ea5 --- /dev/null +++ b/ja/cs-229-probability.md @@ -0,0 +1,381 @@ +**1. Probabilities and Statistics refresher** + +⟶確率と統計の復習 + +
+ +**2. Introduction to Probability and Combinatorics** + +⟶確率と組合せの導入 + +
+ +**3. Sample space ― The set of all possible outcomes of an experiment is known as the sample space of the experiment and is denoted by S.** + +⟶標本空間 - ある試行のすべての起こりうる結果の集合はその試行の標本空間として知られ、Sと表します。 + +
+ +**4. Event ― Any subset E of the sample space is known as an event. That is, an event is a set consisting of possible outcomes of the experiment. If the outcome of the experiment is contained in E, then we say that E has occurred.** + +⟶事象 - 標本空間の任意の部分集合Eを事象と言います。つまり、ある事象はある試行の起こりうる結果により構成された集合です。ある試行結果がEに含まれるなら、Eが起きたと言います。 + +
+ +**5. Axioms of probability ― For each event E, we denote P(E) as the probability of event E occuring.** + +⟶確率の公理 - 各事象Eに対して、事象Eが起こる確率をP(E)と書きます。 + +
+ +**6. Axiom 1 ― Every probability is between 0 and 1 included, i.e:** + +⟶公理1 - すべての確率は0と1を含んでその間にあります。すなわち: + +
+ +**7. Axiom 2 ― The probability that at least one of the elementary events in the entire sample space will occur is 1, i.e:** + +⟶公理2 - 標本空間全体において少なくとも一つの根元事象が起こる確率は1です。すなわち: + +
+ +**8. Axiom 3 ― For any sequence of mutually exclusive events E1,...,En, we have:** + +⟶公理3 - 互いに排反な事象の任意の数列E1,...,Enに対し、次が成り立ちます: + +
+ +**9. Permutation ― A permutation is an arrangement of r objects from a pool of n objects, in a given order. The number of such arrangements is given by P(n,r), defined as:** + +⟶順列(Permutation) - 順列とはn個のものの中からr個をある順序で並べた配列です。このような配列の数はP(n,r)と表し、次のように定義します: + +
+ +**10. Combination ― A combination is an arrangement of r objects from a pool of n objects, where the order does not matter. The number of such arrangements is given by C(n,r), defined as:** + +⟶組合せ(Combination) - 組合せはn個の中からr個の順番を勘案しない配列です。このような配列の数はC(n,r)と表し、次のように定義します: + +
+ +**11. Remark: we note that for 0⩽r⩽n, we have P(n,r)⩾C(n,r)** + +⟶注釈: 0⩽r⩽nのとき、P(n,r)⩾C(n,r)となります。 + +
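As a quick sanity check of P(n,r) and C(n,r) above, here is a minimal Python sketch (assumes Python 3.8+ for math.perm/math.comb; the helper names and the values n=5, r=2 are ours):

```python
import math

def n_permutations(n: int, r: int) -> int:
    """P(n,r) = n! / (n-r)!  (ordered arrangements)"""
    return math.perm(n, r)

def n_combinations(n: int, r: int) -> int:
    """C(n,r) = n! / (r! (n-r)!)  (order does not matter)"""
    return math.comb(n, r)

n, r = 5, 2
print(n_permutations(n, r))   # 20
print(n_combinations(n, r))   # 10
assert n_permutations(n, r) >= n_combinations(n, r)   # P(n,r) ⩾ C(n,r) for 0 ⩽ r ⩽ n
```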
+ +**12. Conditional Probability** + +⟶条件付き確率 + +
+ +**13. Bayes' rule ― For events A and B such that P(B)>0, we have:** + +⟶ベイズの定理 - P(B)>0であるような事象A, Bに対して、次が成り立ちます: + +
+ +**14. Remark: we have P(A∩B)=P(A)P(B|A)=P(A|B)P(B)** + +⟶注釈: P(A∩B)=P(A)P(B|A)=P(A|B)P(B)となります。 + +
+ +**15. Partition ― Let {Ai,i∈[[1,n]]} be such that for all i, Ai≠∅. We say that {Ai} is a partition if we have:** + +⟶分割(Partition) - {Ai,i∈[[1,n]]}はすべてのiに対してAi≠∅としましょう。次が成り立つとき、{Ai}は分割であると言います: + +
+ +**16. Remark: for any event B in the sample space, we have P(B)=n∑i=1P(B|Ai)P(Ai).** + +⟶注釈: 標本空間において任意の事象Bに対して、P(B)=n∑i=1P(B|Ai)P(Ai)が成り立ちます。 + +
+ +**17. Extended form of Bayes' rule ― Let {Ai,i∈[[1,n]]} be a partition of the sample space. We have:** + +⟶ベイズの定理の応用 - {Ai,i∈[[1,n]]}を標本空間の分割とすると、次が成り立ちます: + +
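To make the partition-based form of Bayes' rule concrete, a small worked example in Python over the two-event partition {A, A^c} (all probabilities below are made-up illustrative numbers):

```python
# P(A): prior, P(B|A) and P(B|A^c): likelihoods of a positive test result
p_disease = 0.01                 # P(A)
p_pos_given_disease = 0.95       # P(B|A)
p_pos_given_healthy = 0.10       # P(B|A^c)

# Law of total probability over the partition: P(B) = Σi P(B|Ai) P(Ai)
p_pos = p_pos_given_disease * p_disease + p_pos_given_healthy * (1 - p_disease)

# Bayes' rule: P(A|B) = P(B|A) P(A) / P(B)
p_disease_given_pos = p_pos_given_disease * p_disease / p_pos
print(round(p_disease_given_pos, 4))   # ≈ 0.0876
```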
+ +**18. Independence ― Two events A and B are independent if and only if we have:** + +⟶独立性 - 次が成り立ちかつその場合に限り(必要十分)、2つの事象AとBは独立であるといいます: + +
+ +**19. Random Variables** + +⟶確率変数 + +
+ +**20. Definitions** + +⟶定義 + +
+ +**21. Random variable ― A random variable, often noted X, is a function that maps every element in a sample space to a real line.** + +⟶確率変数 - 確率変数は、よくXと表記され、ある標本空間のすべての要素を実数直線に対応させる関数です。 + +
+ +**22. Cumulative distribution function (CDF) ― The cumulative distribution function F, which is monotonically non-decreasing and is such that limx→−∞F(x)=0 and limx→+∞F(x)=1, is defined as:** + +⟶累積分布関数(CDF) - 累積分布関数Fは、単調非減少かつlimx→−∞F(x)=0 and limx→+∞F(x)=1であり、次のように定義されます: + +
+
+**23. Remark: we have P(a<X⩽b)=F(b)−F(a)**
+
+⟶注釈: P(a<X⩽b)=F(b)−F(a)が成り立ちます。
+
+<br>
+
+**24. Probability density function (PDF) ― The probability density function f is the probability that X takes on values between two adjacent realizations of the random variable.**
+
+⟶確率密度関数(PDF) - 確率密度関数fは確率変数Xが2つの隣接する実現値の間の値をとる確率です。
+
+<br>
+ +**25. Relationships involving the PDF and CDF ― Here are the important properties to know in the discrete (D) and the continuous (C) cases.** + +⟶PDFとCDFについての関係性 - 離散値(D)と連続値(C)のそれぞれの場合について知っておくべき重要な特性をここに挙げます。 + +
+ +**26. [Case, CDF F, PDF f, Properties of PDF]** + +⟶[種類、CDF F、PDF f、PDFの特性] + +
+ +**27. Expectation and Moments of the Distribution ― Here are the expressions of the expected value E[X], generalized expected value E[g(X)], kth moment E[Xk] and characteristic function ψ(ω) for the discrete and continuous cases:** + +⟶分布の期待値と積率 - 離散値と連続値のそれぞれの場合における期待値E[X]、一般化した期待値E[g(X)]、k次の積率E[Xk]と特性関数ψ(ω)をここに挙げます: + +
+ +**28. Variance ― The variance of a random variable, often noted Var(X) or σ2, is a measure of the spread of its distribution function. It is determined as follows:** + +⟶分散(Variance) - 確率変数の分散は、よくVar(X)またはσ2と表記され、その確率変数の分布関数のばらつきの尺度です。次のように計算されます。 + +
+ +**29. Standard deviation ― The standard deviation of a random variable, often noted σ, is a measure of the spread of its distribution function which is compatible with the units of the actual random variable. It is determined as follows:** + +⟶標準偏差(Standard deviation) - 確率変数の標準偏差は、よくσと表記され、その確率変数の分布関数のばらつきの尺度であり、その確率変数の単位に則ったものです。次のように計算されます。 + +
+ +**30. Transformation of random variables ― Let the variables X and Y be linked by some function. By noting fX and fY the distribution function of X and Y respectively, we have:** + +⟶確率変数の変換 - 変数XとYはなんらかの関数により関連づけられているとします。fXとfYをそれぞれXとYの分布関数として表記すると次が成り立ちます: + +
+ +**31. Leibniz integral rule ― Let g be a function of x and potentially c, and a,b boundaries that may depend on c. We have:** + +⟶ライプニッツの積分則 - gをxと潜在的にcの関数とし、a,bをcに従属的な境界とすると、次が成り立ちます。 + +
+ +**32. Probability Distributions** + +⟶確率分布 + +
+ +**33. Chebyshev's inequality ― Let X be a random variable with expected value μ. For k,σ>0, we have the following inequality:** + +⟶チェビシェフの不等式 - Xを期待値μの確率変数とします。k,σ>0のとき次の不等式が成り立ちます: + +
+ +**34. Main distributions ― Here are the main distributions to have in mind:** + +⟶主な分布 - 覚えておくべき主な分布をここに挙げます: + +
+ +**35. [Type, Distribution]** + +⟶[種類、分布] + +
+ +**36. Jointly Distributed Random Variables** + +⟶同時分布の確率変数 + +
+ +**37. Marginal density and cumulative distribution ― From the joint density probability function fXY , we have** + +⟶周辺密度と累積分布 - 同時確率密度関数fXYから次が成り立ちます。 + +
+ +**38. [Case, Marginal density, Cumulative function]** + +⟶[種類、周辺密度、累積関数] + +
+ +**39. Conditional density ― The conditional density of X with respect to Y, often noted fX|Y, is defined as follows:** + +⟶条件付き密度(Conditional density) - Yに対するXの条件付き密度はよくfX|Yと表記され、次のように定義されます: + +
+ +**40. Independence ― Two random variables X and Y are said to be independent if we have:** + +⟶独立性(Independence) - 2つの確率変数XとYは次が成り立つとき、独立であると言います: + +
+ +**41. Covariance ― We define the covariance of two random variables X and Y, that we note σ2XY or more commonly Cov(X,Y), as follows:** + +⟶共分散(Covariance) - 2つの確率変数XとYの共分散を、σ2XYまたはより一般的にはCov(X,Y)と表記し、次のように定義します: + +
+ +**42. Correlation ― By noting σX,σY the standard deviations of X and Y, we define the correlation between the random variables X and Y, noted ρXY, as follows:** + +⟶相関係数(Correlation) - X, Yの標準偏差をσX,σYと表記し、確率変数X,Yの相関関係をρXYと表記し、次のように定義します: + +
+ +**43. Remark 1: we note that for any random variables X,Y, we have ρXY∈[−1,1].** + +⟶注釈 1: 任意の確率変数X,Yに対してρXY∈[−1,1]が成り立ちます。 + +
+ +**44. Remark 2: If X and Y are independent, then ρXY=0.** + +⟶注釈 2: XとYが独立ならば、ρXY=0です。 + +
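The covariance and correlation definitions above can be checked on simulated data; a minimal NumPy sketch (the linear relationship between X and Y is an arbitrary example):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=10_000)
y = 2 * x + rng.normal(size=10_000)           # correlated with x by construction

cov_xy = np.mean((x - x.mean()) * (y - y.mean()))   # Cov(X,Y) = E[(X−μX)(Y−μY)]
rho_xy = cov_xy / (x.std() * y.std())               # ρXY = Cov(X,Y) / (σX σY)
print(round(cov_xy, 2), round(rho_xy, 2))           # ≈ 2 and ≈ 0.9, with ρXY ∈ [−1,1]
```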
+ +**45. Parameter estimation** + +⟶母数推定 + +
+ +**46. Definitions** + +⟶定義 + +
+ +**47. Random sample ― A random sample is a collection of n random variables X1,...,Xn that are independent and identically distributed with X.** + +⟶確率標本(Random sample) - 確率標本とはXに従う独立同分布のn個の確率変数X1,...,Xnの集合です。 + +
+ +**48. Estimator ― An estimator is a function of the data that is used to infer the value of an unknown parameter in a statistical model.** + +⟶推定量(Estimator) - 推定量とは統計モデルにおける未知のパラメータの値を推定するのに用いられるデータの関数です。 + +
+
+**49. Bias ― The bias of an estimator ^θ is defined as being the difference between the expected value of the distribution of ^θ and the true value, i.e.:**
+
+⟶偏り(Bias) - 推定量^θの偏りは^θの分布の期待値と真の値との差として定義されます。すなわち:
+
+<br>
+ +**50. Remark: an estimator is said to be unbiased when we have E[^θ]=θ.** + +⟶注釈: E[^θ]=θが成り立つとき、推定量は不偏であるといいます。 + +
+ +**51. Estimating the mean** + +⟶平均の推定 + +
+ +**52. Sample mean ― The sample mean of a random sample is used to estimate the true mean μ of a distribution, is often noted ¯¯¯¯¯X and is defined as follows:** + +⟶標本平均(Sample mean) - 確率標本の標本平均は、ある分布の真の平均μを推定するのに用いられ、よく¯¯¯¯¯Xと表記され、次のように定義されます: + +
+ +**53. Remark: the sample mean is unbiased, i.e E[¯¯¯¯¯X]=μ.** + +⟶注釈: 標本平均は不偏です。すなわちE[¯¯¯¯¯X]=μが成り立ちます。 + +
+ +**54. Central Limit Theorem ― Let us have a random sample X1,...,Xn following a given distribution with mean μ and variance σ2, then we have:** + +⟶中心極限定理 - 確率標本X1,...,Xnが平均μと分散σ2を持つある分布に従うとすると、次が成り立ちます: + +
+ +**55. Estimating the variance** + +⟶分散の推定 + +
+ +**56. Sample variance ― The sample variance of a random sample is used to estimate the true variance σ2 of a distribution, is often noted s2 or ^σ2 and is defined as follows:** + +⟶標本分散 - 確率標本の標本分散は、ある分布の真の分散σ2を推定するのに用いられ、よくs2または^σ2と表記され、次のように定義されます: + +
+ +**57. Remark: the sample variance is unbiased, i.e E[s2]=σ2.** + +⟶注釈: 標本分散は不偏です。すなわちE[s2]=σ2が成り立ちます。 + +
+ +**58. Chi-Squared relation with sample variance ― Let s2 be the sample variance of a random sample. We have:** + +⟶標本分散とカイ二乗分布との関係 - 確率標本の標本分散をs2とすると、次が成り立ちます: + +
+ +**59. [Introduction, Sample space, Event, Permutation]** + +⟶[導入、標本空間、事象、順列] + +
+ +**60. [Conditional probability, Bayes' rule, Independence]** + +⟶[条件付き確率、ベイズの定理、独立] + +
+ +**61. [Random variables, Definitions, Expectation, Variance]** + +⟶[確率変数、定義、期待値、分散] + +
+ +**62. [Probability distributions, Chebyshev's inequality, Main distributions]** + +⟶[確率分布、チェビシェフの不等式、主な分布] + +
+ +**63. [Jointly distributed random variables, Density, Covariance, Correlation]** + +⟶[同時分布の確率変数、密度、共分散、相関係数] + +
+ +**64. [Parameter estimation, Mean, Variance]** + +⟶[母数推定、平均、分散] diff --git a/ja/cs-229-supervised-learning.md b/ja/cs-229-supervised-learning.md new file mode 100644 index 000000000..71f63afdd --- /dev/null +++ b/ja/cs-229-supervised-learning.md @@ -0,0 +1,567 @@ +**1. Supervised Learning cheatsheet** + +⟶教師あり学習チートシート + +
+ +**2. Introduction to Supervised Learning** + +⟶教師あり学習入門 + +
+ +**3. Given a set of data points {x(1),...,x(m)} associated to a set of outcomes {y(1),...,y(m)}, we want to build a classifier that learns how to predict y from x.** + +⟶入力が{x(1),...,x(m)}、出力が{y(1),...,y(m)}であるとき、xからyを予測する分類器を構築したい。 + +
+ +**4. Type of prediction ― The different types of predictive models are summed up in the table below:** + +⟶予測の種類 ― 様々な種類の予測モデルは下表に集約される: + +
+ +**5. [Regression, Classifier, Outcome, Examples]** + +⟶回帰、分類、出力、例 + +
+ +**6. [Continuous, Class, Linear regression, Logistic regression, SVM, Naive Bayes]** + +⟶連続値、クラス、線形回帰、ロジスティック回帰、SVM、ナイーブベイズ + +
+ +**7. Type of model ― The different models are summed up in the table below:** + +⟶モデルの種類 ― 様々な種類のモデルは下表に集約される: + +
+ +**8. [Discriminative model, Generative model, Goal, What's learned, Illustration, Examples]** + +⟶判別モデル、生成モデル、目的、学習対象、イメージ図、例 + +
+ +**9. [Directly estimate P(y|x), Estimate P(x|y) to then deduce P(y|x), Decision boundary, Probability distributions of the data, Regressions, SVMs, GDA, Naive Bayes]** + +⟶P(y|x)の直接推定、後にP(y|x)を推測するためのP(x|y)の推定、決定境界、データの確率分布、回帰、SVM、GDA、ナイーブベイズ + +
+ +**10. Notations and general concepts** + +⟶記法と全般的な概念 + +
+ +**11. Hypothesis ― The hypothesis is noted hθ and is the model that we choose. For a given input data x(i) the model prediction output is hθ(x(i)).** + +⟶仮説 ― 仮説はhθと表され、選択されたモデルのことである。与えられた入力x(i)に対して、モデルの予測結果はhθ(x(i))である。 + +
+ +**12. Loss function ― A loss function is a function L:(z,y)∈R×Y⟼L(z,y)∈R that takes as inputs the predicted value z corresponding to the real data value y and outputs how different they are. The common loss functions are summed up in the table below:** + +⟶損失関数 ― 損失関数とは(z,y)∈R×Y⟼L(z,y)∈Rを満たす関数Lで、予測値zとそれに対応する正解データ値yを入力とし、その誤差を出力するものである。一般的な損失関数は次表に集約される: + +
+ +**13. [Least squared error, Logistic loss, Hinge loss, Cross-entropy]** + +⟶最小2乗誤差、ロジスティック損失、ヒンジ損失、交差エントロピー + +
+ +**14. [Linear regression, Logistic regression, SVM, Neural Network]** + +⟶線形回帰、ロジスティック回帰、SVM、ニューラルネットワーク + +
+ +**15. Cost function ― The cost function J is commonly used to assess the performance of a model, and is defined with the loss function L as follows:** + +⟶コスト関数 ― コスト関数Jは一般的にモデルの性能を評価するために用いられ、損失関数をLとして次のように定義される: + +
+ +**16. Gradient descent ― By noting α∈R the learning rate, the update rule for gradient descent is expressed with the learning rate and the cost function J as follows:** + +⟶勾配降下法 ― 学習率をα∈Rとし、勾配降下法における更新ルールは学習率とコスト関数Jを用いて次のように表される: + +
+
+**17. Remark: Stochastic gradient descent (SGD) is updating the parameter based on each training example, and batch gradient descent is on a batch of training examples.**
+
+⟶備考:確率的勾配降下法(SGD)は個々の学習標本ごとにパラメータを更新し、バッチ勾配降下法は学習標本のバッチごとに更新する。
+
+<br>
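A minimal sketch of the gradient-descent update rule above, applied to least-squares linear regression in NumPy (the data, learning rate and iteration count are arbitrary choices of ours):

```python
import numpy as np

np.random.seed(0)
m, n = 100, 3
X = np.hstack([np.ones((m, 1)), np.random.randn(m, n)])   # design matrix with intercept column
true_theta = np.array([1.0, 2.0, -1.0, 0.5])
y = X @ true_theta + 0.1 * np.random.randn(m)

alpha = 0.1                              # learning rate α
theta = np.zeros(n + 1)
for _ in range(1000):
    grad = X.T @ (X @ theta - y) / m     # ∇θ J(θ) for J(θ) = 1/(2m) Σ (hθ(x)−y)²
    theta -= alpha * grad                # θ ← θ − α ∇θ J(θ)

print(theta)   # ≈ true_theta
```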
+ +**18. Likelihood ― The likelihood of a model L(θ) given parameters θ is used to find the optimal parameters θ through maximizing the likelihood. In practice, we use the log-likelihood ℓ(θ)=log(L(θ)) which is easier to optimize. We have:** + +⟶尤度 ― パラメータをθとすると、あるモデルの尤度L(θ)を最大にすることにより最適なパラメータを求められる。実際には、最適化しやすい対数尤度ℓ(θ)=log(L(θ))を用いる。すなわち: + +
+ +**19. Newton's algorithm ― The Newton's algorithm is a numerical method that finds θ such that ℓ′(θ)=0. Its update rule is as follows:** + +⟶ニュートン法 ― ニュートン法とはℓ′(θ)=0となるθを求める数値法である。その更新ルールは次の通りである: + +
+ +**20. Remark: the multidimensional generalization, also known as the Newton-Raphson method, has the following update rule:** + +⟶備考:多次元一般化またはニュートン-ラフソン法の更新ルールは次の通りである: + +
+ +**21. Linear models** + +⟶線形モデル + +
+ +**22. Linear regression** + +⟶線形回帰 + +
+ +**23. We assume here that y|x;θ∼N(μ,σ2)** + +⟶ここでy|x;θ∼N(μ,σ2)であるとする。 + +
+ +**24. Normal equations ― By noting X the matrix design, the value of θ that minimizes the cost function is a closed-form solution such that:** + +⟶正規方程式 ― Xを行列とすると、コスト関数を最小化するθの値は次のような閉形式の解である: + +
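The closed-form solution can be checked directly; a minimal NumPy sketch on synthetic data (solving the linear system XᵀXθ = Xᵀy rather than forming the inverse explicitly):

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.hstack([np.ones((50, 1)), rng.normal(size=(50, 2))])   # full column rank design matrix
y = X @ np.array([0.5, 2.0, -1.0]) + 0.01 * rng.normal(size=50)

theta = np.linalg.solve(X.T @ X, X.T @ y)   # θ = (XᵀX)⁻¹ Xᵀ y
print(theta)                                # ≈ [0.5, 2.0, -1.0]
```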
+ +**25. LMS algorithm ― By noting α the learning rate, the update rule of the Least Mean Squares (LMS) algorithm for a training set of m data points, which is also known as the Widrow-Hoff learning rule, is as follows:** + +⟶最小2乗法 ― 学習率をαとすると、m個のデータ点からなる学習データに対する最小2乗法(LMSアルゴリズム)による更新ルールは、ウィドロウ-ホフの学習規則としても知られており、次の通りである: + +
+ +**26. Remark: the update rule is a particular case of the gradient ascent.** + +⟶備考:この更新ルールは勾配上昇法の特殊な例である。 + +
+ +**27. LWR ― Locally Weighted Regression, also known as LWR, is a variant of linear regression that weights each training example in its cost function by w(i)(x), which is defined with parameter τ∈R as:** + +⟶局所重み付き回帰 ― 局所重み付き回帰は、LWRとも呼ばれ、線形回帰の派生形である。パラメータをτ∈Rとして次のように定義されるw(i)(x)により、個々の学習標本をそのコスト関数において重み付けする: + +
+ +**28. Classification and logistic regression** + +⟶分類とロジスティック回帰 + +
+ +**29. Sigmoid function ― The sigmoid function g, also known as the logistic function, is defined as follows:** + +⟶シグモイド関数 ― シグモイド関数gは、ロジスティック関数とも呼ばれ、次のように定義される: + +
+ +**30. Logistic regression ― We assume here that y|x;θ∼Bernoulli(ϕ). We have the following form:** + +⟶ロジスティック回帰 ― ここでy|x;θ∼Bernoulli(ϕ)であるとすると、次の形式を得る: + +
+ +**31. Remark: there is no closed form solution for the case of logistic regressions.** + +⟶備考:ロジスティック回帰については閉形式の解は存在しない。 + +
+ +**32. Softmax regression ― A softmax regression, also called a multiclass logistic regression, is used to generalize logistic regression when there are more than 2 outcome classes. By convention, we set θK=0, which makes the Bernoulli parameter ϕi of each class i equal to:** + +⟶ソフトマックス回帰 ― ソフトマックス回帰は、多クラス分類ロジスティック回帰とも呼ばれ、3個以上の結果クラスがある場合にロジスティック回帰を一般化するためのものである。慣習的に、θK=0とすると、各クラスiのベルヌーイ分布のパラメータϕiは次と等しくなる: + +
+ +**33. Generalized Linear Models** + +⟶一般化線形モデル + +
+ +**34. Exponential family ― A class of distributions is said to be in the exponential family if it can be written in terms of a natural parameter, also called the canonical parameter or link function, η, a sufficient statistic T(y) and a log-partition function a(η) as follows:** + +⟶指数分布族 ― ある分布の集合は指数分布族と呼ばれ、正準パラメータまたはリンク関数とも呼ばれる自然パラメータη、十分統計量T(y)及び対数分配関数a(η)を用いて、次のように表される: + +
+ +**35. Remark: we will often have T(y)=y. Also, exp(−a(η)) can be seen as a normalization parameter that will make sure that the probabilities sum to one.** + +⟶備考:T(y)=yとすることが多い。また、exp(−a(η))は確率の合計が1になることを保証する正規化定数と見なせる。 + +
+ +**36. Here are the most common exponential distributions summed up in the following table:** + +⟶最も一般的な指数分布族は下表に集約される: + +
+ +**37. [Distribution, Bernoulli, Gaussian, Poisson, Geometric]** + +⟶分布、ベルヌーイ、ガウス、ポワソン、幾何 + +
+ +**38. Assumptions of GLMs ― Generalized Linear Models (GLM) aim at predicting a random variable y as a function of x∈Rn+1 and rely on the following 3 assumptions:** + +⟶GLMの仮定 ― 一般化線形モデル(GLM)はランダムな変数yをx∈Rn+1の関数として予測することを目的とし、次の3つの仮定に依拠する: + +
+ +**39. Remark: ordinary least squares and logistic regression are special cases of generalized linear models.** + +⟶備考:最小2乗回帰とロジスティック回帰は一般化線形モデルの特殊な例である。 + +
+ +**40. Support Vector Machines** + +⟶サポートベクターマシン + +
+ +**41: The goal of support vector machines is to find the line that maximizes the minimum distance to the line.** + +⟶サポートベクターマシンの目的は、データ点からの最短距離が最大となる境界線を求めることである。 + +
+ +**42: Optimal margin classifier ― The optimal margin classifier h is such that:** + +⟶最適マージン分類器 ― 最適マージン分類器hは次のようなものである: + +
+ +**43: where (w,b)∈Rn×R is the solution of the following optimization problem:** + +⟶ここで、(w,b)∈Rn×Rは次の最適化問題の解である: + +
+ +**44. such that** + +⟶ただし + +
+ +**45. support vectors** + +⟶サポートベクター + +
+ +**46. Remark: the line is defined as wTx−b=0.** + +⟶備考:直線はwTx−b=0と定義する。 + +
+ +**47. Hinge loss ― The hinge loss is used in the setting of SVMs and is defined as follows:** + +⟶ヒンジ損失 ― ヒンジ損失はSVMの設定に用いられ、次のように定義される: + +
+ +**48. Kernel ― Given a feature mapping ϕ, we define the kernel K to be defined as:** + +⟶カーネル ― 特徴写像をϕとすると、カーネルKは次のように定義される: + +
+ +**49. In practice, the kernel K defined by K(x,z)=exp(−||x−z||22σ2) is called the Gaussian kernel and is commonly used.** + +⟶実際には、K(x,z)=exp(−||x−z||22σ2)と定義され、ガウシアンカーネルと呼ばれるカーネルKがよく使われる。 + +
+ +**50. [Non-linear separability, Use of a kernel mapping, Decision boundary in the original space]** + +⟶非線形分離問題、カーネル写像の適用、元の空間における決定境界 + +
+ +**51. Remark: we say that we use the "kernel trick" to compute the cost function using the kernel because we actually don't need to know the explicit mapping ϕ, which is often very complicated. Instead, only the values K(x,z) are needed.** + +⟶備考:カーネルを用いてコスト関数を計算する「カーネルトリック」を用いる。なぜなら、明示的な写像ϕを実際には知る必要はないし、それはしばしば非常に複雑になってしまうからである。代わりに、K(x,z)の値のみが必要である。 + +
+ +**52. Lagrangian ― We define the Lagrangian L(w,b) as follows:** + +⟶ラグランジアン ― ラグランジアンL(w,b)を次のように定義する: + +
+ +**53. Remark: the coefficients βi are called the Lagrange multipliers.** + +⟶備考:係数βiはラグランジュ乗数と呼ばれる。 + +
+ +**54. Generative Learning** + +⟶生成学習 + +
+ +**55. A generative model first tries to learn how the data is generated by estimating P(x|y), which we can then use to estimate P(y|x) by using Bayes' rule.** + +⟶生成モデルは、P(x|y)を推定することによりデータがどのように生成されるのかを学習しようとする。それはベイズの定理を用いてP(y|x)を推定するために使える。 + +
+ +**56. Gaussian Discriminant Analysis** + +⟶ガウシアン判別分析 + +
+ +**57. Setting ― The Gaussian Discriminant Analysis assumes that y and x|y=0 and x|y=1 are such that:** + +⟶前提条件 ― ガウシアン判別分析はyとx|y=0とx|y=1は次のようであることを前提とする: + +
+ +**58. Estimation ― The following table sums up the estimates that we find when maximizing the likelihood:** + +⟶推定 ― 尤度を最大にすると得られる推定量は下表に集約される: + +
+ +**59. Naive Bayes** + +⟶ナイーブベイズ + +
+ +**60. Assumption ― The Naive Bayes model supposes that the features of each data point are all independent:** + +⟶仮定 ― ナイーブベイズモデルは、個々のデータ点の特徴量が全て独立であると仮定する: + +
+ +**61. Solutions ― Maximizing the log-likelihood gives the following solutions, with k∈{0,1},l∈[[1,L]]** + +⟶解 ― 対数尤度を最大にすると次の解を得る。ただし、k∈{0,1},l∈[[1,L]]とする。 + +
+ +**62. Remark: Naive Bayes is widely used for text classification and spam detection.** + +⟶備考:ナイーブベイズはテキスト分類やスパム検知に幅広く使われている。 + +
+ +**63. Tree-based and ensemble methods** + +⟶決定木とアンサンブル学習 + +
+ +**64. These methods can be used for both regression and classification problems.** + +⟶これらの方法は回帰と分類問題の両方に使える。 + +
+ +**65. CART ― Classification and Regression Trees (CART), commonly known as decision trees, can be represented as binary trees. They have the advantage to be very interpretable.** + +⟶CART ― 分類・回帰木 (CART)は、一般には決定木として知られ、二分木として表される。非常に解釈しやすいという利点がある。 + +
+ +**66. Random forest ― It is a tree-based technique that uses a high number of decision trees built out of randomly selected sets of features. Contrary to the simple decision tree, it is highly uninterpretable but its generally good performance makes it a popular algorithm.** + +⟶ランダムフォレスト ― これは決定木をベースにしたもので、ランダムに選択された特徴量の集合から構築された多数の決定木を用いる。単純な決定木と異なり、非常に解釈しにくいが、一般的に良い性能が出るのでよく使われるアルゴリズムである。 + +
+ +**67. Remark: random forests are a type of ensemble methods.** + +⟶備考:ランダムフォレストはアンサンブル学習の一種である。 + +
+ +**68. Boosting ― The idea of boosting methods is to combine several weak learners to form a stronger one. The main ones are summed up in the table below:** + +⟶ブースティング ― ブースティングの考え方は、複数の弱い学習器を束ねることで1つのより強い学習器を作るというものである。主なものは次の表に集約される: + +
+ +**69. [Adaptive boosting, Gradient boosting]** + +⟶[適応的ブースティング、勾配ブースティング] + +
+ +**70. High weights are put on errors to improve at the next boosting step** + +⟶次のブースティングステップにて改善すべき誤分類に大きい重みが課される。 + +
+ +**71. Weak learners trained on remaining errors** + +⟶残っている誤分類を弱い学習器が学習する。 + +
+ +**72. Other non-parametric approaches** + +⟶他のノンパラメトリックな手法 + +
+ +**73. k-nearest neighbors ― The k-nearest neighbors algorithm, commonly known as k-NN, is a non-parametric approach where the response of a data point is determined by the nature of its k neighbors from the training set. It can be used in both classification and regression settings.** + +⟶k近傍法 ― k近傍法は、一般的にk-NNとして知られ、あるデータ点の応答はそのk個の最近傍点の性質によって決まるノンパラメトリックな手法である。分類と回帰の両方に用いることができる。 + +
+ +**74. Remark: The higher the parameter k, the higher the bias, and the lower the parameter k, the higher the variance.** + +⟶備考:パラメータkが大きくなるほど、バイアスが大きくなる。パラメータkが小さくなるほど、分散が大きくなる。 + +
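A minimal k-NN sketch illustrating the description above (Euclidean distance and majority vote; the toy data and function name are ours):

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_query, k=3):
    dists = np.linalg.norm(X_train - x_query, axis=1)       # distance to every training point
    nearest = np.argsort(dists)[:k]                         # indices of the k closest points
    return Counter(y_train[nearest]).most_common(1)[0][0]   # majority label among the k neighbors

X_train = np.array([[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [0.9, 1.1]])
y_train = np.array([0, 0, 1, 1])
print(knn_predict(X_train, y_train, np.array([0.2, 0.1]), k=3))   # -> 0
```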
+ +**75. Learning Theory** + +⟶学習理論 + +
+ +**76. Union bound ― Let A1,...,Ak be k events. We have:** + +⟶和集合上界 ― A1,...,Akというk個の事象があるとき、次が成り立つ: + +
+ +**77. Hoeffding inequality ― Let Z1,..,Zm be m iid variables drawn from a Bernoulli distribution of parameter ϕ. Let ˆϕ be their sample mean and γ>0 fixed. We have:** + +⟶ヘフディング不等式 ― パラメータϕのベルヌーイ分布から得られるm個の独立同分布変数をZ1,..,Zmとする。その標本平均をˆϕとし、γは正の定数であるとすると、次が成り立つ: + +
+ +**78. Remark: this inequality is also known as the Chernoff bound.** + +⟶備考:この不等式はチェルノフ上界としても知られる。 + +
+ +**79. Training error ― For a given classifier h, we define the training error ˆϵ(h), also known as the empirical risk or empirical error, to be as follows:** + +⟶学習誤差 ― ある分類器hに対して、学習誤差、あるいは経験損失か経験誤差としても知られるˆϵ(h)を次のように定義する: + +
+ +**80. Probably Approximately Correct (PAC) ― PAC is a framework under which numerous results on learning theory were proved, and has the following set of assumptions: ** + +⟶確率的に近似的に正しい (PAC) ― PACとは、その下で学習理論に関する様々な業績が証明されてきたフレームワークであり、次の前提がある: + +
+ +**81: the training and testing sets follow the same distribution ** + +⟶学習データと検証データは同じ分布に従う。 + +
+ +**82. the training examples are drawn independently** + +⟶学習標本は独立に取得される。 + +
+ +**83. Shattering ― Given a set S={x(1),...,x(d)}, and a set of classifiers H, we say that H shatters S if for any set of labels {y(1),...,y(d)}, we have:** + +⟶細分化 ― 集合S={x(1),...,x(d)}と分類器の集合Hがあるとき、もし任意のラベル{y(1),...,y(d)}の集合に対して次が成り立つとき、HはSを細分化する: + +
+ +**84. Upper bound theorem ― Let H be a finite hypothesis class such that |H|=k and let δ and the sample size m be fixed. Then, with probability of at least 1−δ, we have:** + +⟶上界定理 ― Hを|H|=kで有限の仮説集合とし、δとサンプルサイズmは定数とする。そのとき、少なくとも1-δの確率で次が成り立つ: + +
+ +**85. VC dimension ― The Vapnik-Chervonenkis (VC) dimension of a given infinite hypothesis class H, noted VC(H) is the size of the largest set that is shattered by H.** + +⟶VC次元 ― ある仮説集合Hのヴァプニク・チェルヴォーネンキス次元 (VC)は、VC(H)と表記され、それはHによって細分化される最大の集合のサイズである。 + +
+ +**86. Remark: the VC dimension of H={set of linear classifiers in 2 dimensions} is 3.** + +⟶備考:2次元の線形分類器の集合であるHのVC次元は3である。 + +
+ +**87. Theorem (Vapnik) ― Let H be given, with VC(H)=d and m the number of training examples. With probability at least 1−δ, we have:** + +⟶定理(ヴァプニク) ― あるHについてVC(H)=dであり、mを学習標本の数とする。少なくとも1−δの確率で次が成り立つ: + +
+ +**88. [Introduction, Type of prediction, Type of model]** + +⟶[導入、予測の種類、モデルの種類] + +
+ +**89. [Notations and general concepts, loss function, gradient descent, likelihood]** + +⟶[記法と全般的な概念、損失関数、勾配降下、尤度] + +
+
+**90. [Linear models, linear regression, logistic regression, generalized linear models]**
+
+⟶[線形モデル、線形回帰、ロジスティック回帰、一般化線形モデル]
+
+<br>
+
+**91. [Support vector machines, Optimal margin classifier, Hinge loss, Kernel]**
+
+⟶[サポートベクターマシン、最適マージン分類器、ヒンジ損失、カーネル]
+
+<br>
+
+**92. [Generative learning, Gaussian Discriminant Analysis, Naive Bayes]**
+
+⟶[生成学習、ガウシアン判別分析、ナイーブベイズ]
+
+<br>
+
+**93. [Trees and ensemble methods, CART, Random forest, Boosting]**
+
+⟶[ツリーとアンサンブル学習、CART、ランダムフォレスト、ブースティング]
+
+<br>
+ +**94. [Other methods, k-NN]** + +⟶[他の手法、k近傍法] + +
+ +**95. [Learning theory, Hoeffding inequality, PAC, VC dimension]** + +⟶[学習理論、ヘフディング不等式、PAC、VC次元] diff --git a/ja/cs-229-unsupervised-learning.md b/ja/cs-229-unsupervised-learning.md new file mode 100644 index 000000000..cc8111e7c --- /dev/null +++ b/ja/cs-229-unsupervised-learning.md @@ -0,0 +1,339 @@ +**1. Unsupervised Learning cheatsheet** + +⟶教師なし学習チートシート + +
+ +**2. Introduction to Unsupervised Learning** + +⟶教師なし学習の概要 + +
+ +**3. Motivation ― The goal of unsupervised learning is to find hidden patterns in unlabeled data {x(1),...,x(m)}.** + +⟶モチベーション - 教師なし学習の目的はラベルのないデータ{x(1),...,x(m)}に隠されたパターンを探すことです。 + +
+ +**4. Jensen's inequality ― Let f be a convex function and X a random variable. We have the following inequality:** + +⟶イェンセンの不等式 - fを凸関数、Xを確率変数とすると、次の不等式が成り立ちます: + +
+ +**5. Clustering** + +⟶クラスタリング + +
+ +**6. Expectation-Maximization** + +⟶期待値最大化 + +
+ +**7. Latent variables ― Latent variables are hidden/unobserved variables that make estimation problems difficult, and are often denoted z. Here are the most common settings where there are latent variables:** + +⟶潜在変数 - 潜在変数は推定問題を困難にする隠れた/観測されていない変数であり、多くの場合zで示されます。潜在変数がある最も一般的な設定は次のとおりです: + +
+ +**8. [Setting, Latent variable z, Comments]** + +⟶[設定、潜在変数z、コメント] + +
+ +**9. [Mixture of k Gaussians, Factor analysis]** + +⟶[k個のガウス分布の混合、因子分析] + +
+ +**10. Algorithm ― The Expectation-Maximization (EM) algorithm gives an efficient method at estimating the parameter θ through maximum likelihood estimation by repeatedly constructing a lower-bound on the likelihood (E-step) and optimizing that lower bound (M-step) as follows:** + +⟶アルゴリズム - EMアルゴリズムは次のように尤度の下限の構築(E-ステップ)と、その下限の最適化(M-ステップ)を繰り返し行うことによる最尤推定によりパラメーターθを推定する効率的な方法を提供します: + +
+ +**11. E-step: Evaluate the posterior probability Qi(z(i)) that each data point x(i) came from a particular cluster z(i) as follows:** + +⟶E-ステップ: 各データポイントx(i)が特定クラスターz(i)に由来する事後確率Qi(z(i))を次のように評価します: + +
+ +**12. M-step: Use the posterior probabilities Qi(z(i)) as cluster specific weights on data points x(i) to separately re-estimate each cluster model as follows:** + +⟶M-ステップ: 事後確率Qi(z(i))をデータポイントx(i)のクラスター固有の重みとして使い、次のように各クラスターモデルを個別に再推定します: + +
+ +**13. [Gaussians initialization, Expectation step, Maximization step, Convergence]** + +⟶[ガウス分布初期化、期待値ステップ、最大化ステップ、収束] + +
+ +**14. k-means clustering** + +⟶k平均法 + +
+ +**15. We note c(i) the cluster of data point i and μj the center of cluster j.** + +⟶データポイントiのクラスタをc(i)、クラスタjの中心をμjと表記します。 + +
+ +**16. Algorithm ― After randomly initializing the cluster centroids μ1,μ2,...,μk∈Rn, the k-means algorithm repeats the following step until convergence:** + +⟶クラスターの重心μ1,μ2,...,μk∈Rnをランダムに初期化後、k-meansアルゴリズムが収束するまで次のようなステップを繰り返します: + +
+ +**17. [Means initialization, Cluster assignment, Means update, Convergence]** + +⟶ [平均の初期化、クラスター割り当て、平均の更新、収束] + +
+ +**18. Distortion function ― In order to see if the algorithm converges, we look at the distortion function defined as follows:** + +⟶ひずみ関数 - アルゴリズムが収束するかどうかを確認するため、次のように定義されたひずみ関数を参照します: + +
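A minimal NumPy sketch of the two repeated k-means steps and the distortion function above (initialization from random data points; the empty-cluster edge case is not handled in this sketch):

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    mu = X[rng.choice(len(X), size=k, replace=False)]   # means initialization
    for _ in range(n_iter):
        # cluster assignment: c(i) = argmin_j ||x(i) − μj||²
        c = np.argmin(((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=-1), axis=1)
        # means update (an empty cluster would give NaN here)
        mu = np.array([X[c == j].mean(axis=0) for j in range(k)])
    distortion = ((X - mu[c]) ** 2).sum()                # J(c, μ)
    return c, mu, distortion

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(size=(50, 2)), rng.normal(size=(50, 2)) + 5])
c, mu, J = kmeans(X, k=2)
print(mu.round(2), J.round(2))
```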
+ +**19. Hierarchical clustering** + +⟶ 階層的クラスタリング + +
+ +**20. Algorithm ― It is a clustering algorithm with an agglomerative hierarchical approach that build nested clusters in a successive manner.** + +⟶アルゴリズム - これは入れ子になったクラスタを逐次的に構築する凝集階層アプローチによるクラスタリングアルゴリズムです。 + +
+ +**21. Types ― There are different sorts of hierarchical clustering algorithms that aims at optimizing different objective functions, which is summed up in the table below:** + +⟶ 種類 ― 様々な目的関数を最適化するための様々な種類の階層クラスタリングアルゴリズムが以下の表にまとめられています。 + +
+ +**22. [Ward linkage, Average linkage, Complete linkage]** + +⟶ [ウォードリンケージ、平均リンケージ、完全リンケージ] + +
+ +**23. [Minimize within cluster distance, Minimize average distance between cluster pairs, Minimize maximum distance of between cluster pairs]** + +⟶ [クラスター内の距離最小化、クラスターペア間の平均距離の最小化、クラスターペア間の最大距離の最小化] + +
+ +**24. Clustering assessment metrics** + +⟶ クラスタリング評価指標 + +
+ +**25. In an unsupervised learning setting, it is often hard to assess the performance of a model since we don't have the ground truth labels as was the case in the supervised learning setting.** + +⟶ 教師なし学習では、教師あり学習の場合のような正解ラベルがないため、モデルの性能を評価することが困難な場合が多くあります。 + +
+ +**26. Silhouette coefficient ― By noting a and b the mean distance between a sample and all other points in the same class, and between a sample and all other points in the next nearest cluster, the silhouette coefficient s for a single sample is defined as follows:** + +⟶ シルエット係数 ― ある1つのサンプルと同じクラス内のその他全ての点との平均距離をa、そのサンプルから最も近いクラスタ内の全ての点との平均距離をbと表記すると、そのサンプルのシルエット係数sは次のように定義されます: + +
+ +**27. Calinski-Harabaz index ― By noting k the number of clusters, Bk and Wk the between and within-clustering dispersion matrices respectively defined as** + +⟶ Calinski-Harabazインデックス ― クラスタの数をkと表記すると、クラスタ間およびクラスタ内の分散行列であるBkおよびWkはそれぞれ以下のように定義されます。 + +
+ +**28. the Calinski-Harabaz index s(k) indicates how well a clustering model defines its clusters, such that the higher the score, the more dense and well separated the clusters are. It is defined as follows:** + +⟶ Calinski-Harabazインデックスs(k)はクラスタリングモデルが各クラスタをどの程度適切に定義しているかを示します。つまり、スコアが高いほど、各クラスタはより密で、十分に分離されています。それは次のように定義されます: + +
+ +**29. Dimension reduction** + +⟶ 次元削減 + +
+ +**30. Principal component analysis** + +⟶ 主成分分析 + +
+ +**31. It is a dimension reduction technique that finds the variance maximizing directions onto which to project the data.** + +⟶ これは分散を最大にするデータの射影方向を見つける次元削減手法です。 + +
+ +**32. Eigenvalue, eigenvector ― Given a matrix A∈Rn×n, λ is said to be an eigenvalue of A if there exists a vector z∈Rn∖{0}, called eigenvector, such that we have:** + +⟶ 固有値、固有ベクトル - 行列 A∈Rn×nが与えられたとき、次の式で固有ベクトルと呼ばれるベクトルz∈Rn∖{0}が存在した場合に、λはAの固有値と呼ばれます。 + +
+ +**33. Spectral theorem ― Let A∈Rn×n. If A is symmetric, then A is diagonalizable by a real orthogonal matrix U∈Rn×n. By noting Λ=diag(λ1,...,λn), we have:** + +⟶ スペクトル定理 - A∈Rn×nとする。Aが対称のとき、Aは実直交行列U∈Rn×nを用いて対角化可能です。Λ=diag(λ1,...,λn)と表記することで、次の式を得ます。 + +
+
+**34. diagonal**
+
+⟶ 対角
+
+<br>
+ +**35. Remark: the eigenvector associated with the largest eigenvalue is called principal eigenvector of matrix A.** + +⟶ 注釈: 最大固有値に対応する固有ベクトルは行列Aの第1固有ベクトルと呼ばれる。 + +
+ +**36. Algorithm ― The Principal Component Analysis (PCA) procedure is a dimension reduction technique that projects the data on k dimensions by maximizing the variance of the data as follows:** + +⟶ アルゴリズム ― 主成分分析(PCA)の過程は、次のようにデータの分散を最大化することによりデータをk次元に射影する次元削減の技術です。 + +
+ +**37. Step 1: Normalize the data to have a mean of 0 and standard deviation of 1.** + +⟶ ステップ1:平均が0で標準偏差が1となるようにデータを正規化します。 + +
+
+**38. Step 2: Compute Σ=1mm∑i=1x(i)x(i)T∈Rn×n, which is symmetric with real eigenvalues.**
+
+⟶ ステップ2:対称で実固有値を持つΣ=1mm∑i=1x(i)x(i)T∈Rn×nを計算します。
+
+<br>
+
+**39. Step 3: Compute u1,...,uk∈Rn the k orthogonal principal eigenvectors of Σ, i.e. the orthogonal eigenvectors of the k largest eigenvalues.**
+
+⟶ ステップ3:Σのk個の直交する主固有ベクトルu1,...,uk∈Rn、すなわちk個の最大固有値に対応する直交固有ベクトルを計算します。
+
+<br>
+ +**40. Step 4: Project the data on spanR(u1,...,uk).** + +⟶ ステップ4:データをspanR(u1,...,uk)に射影します。 + +
+ +**41. This procedure maximizes the variance among all k-dimensional spaces.** + +⟶ この過程は全てのk次元空間の間の分散を最大化します。 + +
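Steps 1-4 above, written out as a minimal NumPy sketch (the data and the choice of k are arbitrary):

```python
import numpy as np

def pca(X, k):
    X = (X - X.mean(axis=0)) / X.std(axis=0)     # step 1: normalize to mean 0, std 1
    Sigma = X.T @ X / len(X)                     # step 2: symmetric matrix with real eigenvalues
    eigvals, eigvecs = np.linalg.eigh(Sigma)     # step 3: eigenvectors (ascending eigenvalues)
    U = eigvecs[:, ::-1][:, :k]                  #         keep the k principal eigenvectors
    return X @ U                                 # step 4: project onto span(u1, ..., uk)

X = np.random.randn(200, 5)
print(pca(X, k=2).shape)   # (200, 2)
```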
+ +**42. [Data in feature space, Find principal components, Data in principal components space]** + +⟶ [特徴空間内のデータ、主成分の探索、主成分空間内のデータ] + +
+ +**43. Independent component analysis** + +⟶ 独立成分分析 + +
+ +**44. It is a technique meant to find the underlying generating sources.** + +⟶ 隠れた生成源を見つけることを意図した技術です。 + +
+ +**45. Assumptions ― We assume that our data x has been generated by the n-dimensional source vector s=(s1,...,sn), where si are independent random variables, via a mixing and non-singular matrix A as follows:** + +⟶ 仮定 ― 混合かつ非特異行列Aを通じて、データxはn次元の元となるベクトルs=(s1,...,sn)から次のように生成されると仮定します。ただしsiは独立でランダムな変数です: + +
+ +**46. The goal is to find the unmixing matrix W=A−1.** + +⟶ 非混合行列W=A−1を見つけることが目的です。 + +
+ +**47. Bell and Sejnowski ICA algorithm ― This algorithm finds the unmixing matrix W by following the steps below:** + +⟶ ベルとシノスキーのICAアルゴリズム ― このアルゴリズムは非混合行列Wを次のステップによって見つけます: + +
+ +**48. Write the probability of x=As=W−1s as:** + +⟶ x=As=W−1sの確率を次のように表します: + +
+ +**49. Write the log likelihood given our training data {x(i),i∈[[1,m]]} and by noting g the sigmoid function as:** + +⟶ 学習データを{x(i),i∈[[1,m]]}、シグモイド関数をgとし、対数尤度を次のように表します: + +
+ +**50. Therefore, the stochastic gradient ascent learning rule is such that for each training example x(i), we update W as follows:** + +⟶ そのため、確率的勾配上昇法の学習規則は、学習サンプルx(i)に対して次のようにWを更新するものです: + +
+ +**51. The Machine Learning cheatsheets are now available in [target language].** + +⟶ 機械学習チートシートは日本語で読めます。 + +
+ +**52. Original authors** + +⟶ 原著者 + +
+ +**53. Translated by X, Y and Z** + +⟶ X・Y・Z 訳 + +
+ +**54. Reviewed by X, Y and Z** + +⟶ X・Y・Z 校正 + +
+ +**55. [Introduction, Motivation, Jensen's inequality]** + +⟶ [導入、動機、イェンセンの不等式] + +
+ +**56. [Clustering, Expectation-Maximization, k-means, Hierarchical clustering, Metrics]** + +⟶[クラスタリング、期待値最大化法、k-means、階層クラスタリング、指標] + +
+ +**57. [Dimension reduction, PCA, ICA]** + +⟶ [次元削減、PCA、ICA] diff --git a/ja/cs-230-convolutional-neural-networks.md b/ja/cs-230-convolutional-neural-networks.md new file mode 100644 index 000000000..178592414 --- /dev/null +++ b/ja/cs-230-convolutional-neural-networks.md @@ -0,0 +1,717 @@ +**Convolutional Neural Networks translation** + +
+ +**1. Convolutional Neural Networks cheatsheet** + +⟶ 畳み込みニューラルネットワーク チートシート + +
+ + +**2. CS 230 - Deep Learning** + +⟶ CS 230 - ディープラーニング + +
+ + +**3. [Overview, Architecture structure]** + +⟶ [概要、アーキテクチャ構造] + +
+ + +**4. [Types of layer, Convolution, Pooling, Fully connected]** + +⟶ [層の種類、畳み込み、プーリング、全結合] + +
+ + +**5. [Filter hyperparameters, Dimensions, Stride, Padding]** + +⟶ [フィルタハイパーパラメータ、次元、ストライド、パディング] + +
+ + +**6. [Tuning hyperparameters, Parameter compatibility, Model complexity, Receptive field]** + +⟶ [ハイパーパラメータの調整、パラメータの互換性、モデルの複雑さ、受容野] + +
+ + +**7. [Activation functions, Rectified Linear Unit, Softmax]** + +⟶ [活性化関数、正規化線形ユニット、ソフトマックス] + +
+ + +**8. [Object detection, Types of models, Detection, Intersection over Union, Non-max suppression, YOLO, R-CNN]** + +⟶ [物体検出、モデルの種類、検出、IoU、非極大抑制、YOLO、R-CNN] + +
+ + +**9. [Face verification/recognition, One shot learning, Siamese network, Triplet loss]** + +⟶ [顔認証/認識、One shot学習、シャムネットワーク、トリプレット損失] + +
+ + +**10. [Neural style transfer, Activation, Style matrix, Style/content cost function]** + +⟶ [ニューラルスタイル変換、活性化、スタイル行列、スタイル/コンテンツコスト関数] + +
+ + +**11. [Computational trick architectures, Generative Adversarial Net, ResNet, Inception Network]** + +⟶ [計算トリックアーキテクチャ、敵対的生成ネットワーク、ResNet、インセプションネットワーク] + +
+ + +**12. Overview** + +⟶ 概要 + +
+ + +**13. Architecture of a traditional CNN ― Convolutional neural networks, also known as CNNs, are a specific type of neural networks that are generally composed of the following layers:** + +⟶ 伝統的な畳み込みニューラルネットワークのアーキテクチャ - CNNとしても知られる畳み込みニューラルネットワークは一般的に次の層で構成される特定種類のニューラルネットワークです。 + +
+ + +**14. The convolution layer and the pooling layer can be fine-tuned with respect to hyperparameters that are described in the next sections.** + +⟶ 畳み込み層とプーリング層は次のセクションで説明されるハイパーパラメータに関してファインチューニングできます。 + +
+ + +**15. Types of layer** + +⟶ 層の種類 + +
+ + +**16. Convolution layer (CONV) ― The convolution layer (CONV) uses filters that perform convolution operations as it is scanning the input I with respect to its dimensions. Its hyperparameters include the filter size F and stride S. The resulting output O is called feature map or activation map.** + +⟶ 畳み込み層 (CONV) - 畳み込み層 (CONV)は入力Iを各次元に関して走査する時に、畳み込み演算を行うフィルタを使用します。畳み込み層のハイパーパラメータにはフィルタサイズFとストライドSが含まれます。結果出力Oは特徴マップまたは活性化マップと呼ばれます。 + +
+ + +**17. Remark: the convolution step can be generalized to the 1D and 3D cases as well.** + +⟶ 注: 畳み込みステップは1次元や3次元の場合にも一般化できます。 + +
+ + +**18. Pooling (POOL) ― The pooling layer (POOL) is a downsampling operation, typically applied after a convolution layer, which does some spatial invariance. In particular, max and average pooling are special kinds of pooling where the maximum and average value is taken, respectively.** + +⟶ プーリング (POOL) - プーリング層 (POOL)は位置不変性をもつ縮小操作で、通常は畳み込み層の後に適用されます。特に、最大及び平均プーリングはそれぞれ最大と平均値が取られる特別な種類のプーリングです。 + +
+ + +**19. [Type, Purpose, Illustration, Comments]** + +⟶ [種類、目的、図、コメント] + +
+ + +**20. [Max pooling, Average pooling, Each pooling operation selects the maximum value of the current view, Each pooling operation averages the values of the current view]** + +⟶ [最大プーリング、平均プーリング、各プーリング操作は現在のビューの中から最大値を選ぶ、各プーリング操作は現在のビューに含まれる値を平均する] + +
+ + +**21. [Preserves detected features, Most commonly used, Downsamples feature map, Used in LeNet]** + +⟶ [検出された特徴の保持、最も一般的な利用、特徴マップをダウンサンプリング、LeNetでの利用] + +
+ + +**22. Fully Connected (FC) ― The fully connected layer (FC) operates on a flattened input where each input is connected to all neurons. If present, FC layers are usually found towards the end of CNN architectures and can be used to optimize objectives such as class scores.** + +⟶ 全結合 (FC) - 全結合 (FC) 層は平坦化された入力に対して演算を行います。各入力は全てのニューロンに接続されています。FC層が存在する場合、通常CNNアーキテクチャの末尾に向かって見られ、クラススコアなどの目的を最適化するため利用できます。 + +
+ + +**23. Filter hyperparameters** + +⟶ フィルタハイパーパラメータ + +
+ + +**24. The convolution layer contains filters for which it is important to know the meaning behind its hyperparameters.** + +⟶ 畳み込み層にはハイパーパラメータの背後にある意味を知ることが重要なフィルタが含まれています。 + +
+ + +**25. Dimensions of a filter ― A filter of size F×F applied to an input containing C channels is a F×F×C volume that performs convolutions on an input of size I×I×C and produces an output feature map (also called activation map) of size O×O×1.** + +⟶ フィルタの次元 - C個のチャネルを含む入力に適用されるF×Fサイズのフィルタの体積はF×F×Cで、それはI×I×Cサイズの入力に対して畳み込みを実行してO×O×1サイズの特徴マップ(活性化マップとも呼ばれる)出力を生成します。 + + +
+ + +**26. Filter** + +⟶ フィルタ + +
+ + +**27. Remark: the application of K filters of size F×F results in an output feature map of size O×O×K.** + +⟶ 注: F×FサイズのK個のフィルタを適用すると、O×O×Kサイズの特徴マップの出力を得られます。 + +
+ + +**28. Stride ― For a convolutional or a pooling operation, the stride S denotes the number of pixels by which the window moves after each operation.** + +⟶ ストライド - 畳み込みまたはプーリング操作において、ストライドSは各操作の後にウィンドウを移動させるピクセル数を表します。 + +
+ + +**29. Zero-padding ― Zero-padding denotes the process of adding P zeroes to each side of the boundaries of the input. This value can either be manually specified or automatically set through one of the three modes detailed below:** + +⟶ ゼロパディング - ゼロパディングとは入力の各境界に対してP個のゼロを追加するプロセスを意味します。この値は手動で指定することも、以下に詳述する3つのモードのいずれかを使用して自動的に設定することもできます。 + +
+ + +**30. [Mode, Value, Illustration, Purpose, Valid, Same, Full]** + +⟶ [モード、値、図、目的、Valid、Same、Full] + +
+
+
+**31. [No padding, Drops last convolution if dimensions do not match, Padding such that feature map size has size ⌈IS⌉, Output size is mathematically convenient, Also called 'half' padding, Maximum padding such that end convolutions are applied on the limits of the input, Filter 'sees' the input end-to-end]**
+
+⟶ [パディングなし、次元が合わない場合は最後の畳み込みを破棄、特徴マップのサイズが⌈IS⌉になるようなパディング、出力サイズは数学的に扱いやすい、「ハーフ」パディングとも呼ばれる、入力の一番端まで畳み込みが適用されるような最大パディング、フィルタは入力を端から端まで「見る」]
+
+<br>
+ + +**32. Tuning hyperparameters** + +⟶ ハイパーパラメータの調整 + +
+ + +**33. Parameter compatibility in convolution layer ― By noting I the length of the input volume size, F the length of the filter, P the amount of zero padding, S the stride, then the output size O of the feature map along that dimension is given by:** + +⟶ 畳み込み層内のパラメータ互換性 - Iを入力ボリュームサイズの長さ、Fをフィルタの長さ、Pをゼロパディングの量, Sをストライドとすると、その次元に沿った特徴マップの出力サイズOは次式で与えられます: + +
+ + +**34. [Input, Filter, Output]** + +⟶ [入力、フィルタ、出力] + +
+ + +**35. Remark: often times, Pstart=Pend≜P, in which case we can replace Pstart+Pend by 2P in the formula above.** + +⟶ 注: 多くの場合Pstart=Pend≜Pであり、上記の式のPstart+Pendを2Pに置き換える事ができます。 + +
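A minimal helper for the output-size formula above, along one spatial dimension (the function name and example values are ours):

```python
def conv_output_size(i: int, f: int, p_start: int, p_end: int, s: int) -> int:
    # O = (I − F + Pstart + Pend) / S + 1; floor division is exact for valid parameter combinations
    return (i - f + p_start + p_end) // s + 1

# Example: a 32x32 input, 5x5 filter, no padding (P=0), stride 1 -> 28x28 feature map
print(conv_output_size(32, 5, 0, 0, 1))   # 28
```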
+ + +**36. Understanding the complexity of the model ― In order to assess the complexity of a model, it is often useful to determine the number of parameters that its architecture will have. In a given layer of a convolutional neural network, it is done as follows:** + +⟶ モデルの複雑さを理解する - モデルの複雑さを評価するために、モデルのアーキテクチャが持つパラメータの数を測定することがしばしば有用です。畳み込みニューラルネットワークの各層では、以下のように行なわれます: + +
+ + +**37. [Illustration, Input size, Output size, Number of parameters, Remarks]** + +⟶ [図、入力サイズ、出力サイズ、パラメータの数、備考] + +
+
+
+**38. [One bias parameter per filter, In most cases, S<F, A common choice for K is 2C]**
+
+⟶[フィルタごとに1つのバイアスパラメータ、ほとんどの場合、S<F、Kは2Cとするのが一般的]
+
+<br>
+
+
+**39. [Pooling operation done channel-wise, In most cases, S=F]**
+
+⟶ [チャネルごとに行われるプーリング操作、ほとんどの場合、S=F]
+
+<br>
+ + +**40. [Input is flattened, One bias parameter per neuron, The number of FC neurons is free of structural constraints]** + +⟶ [入力は平坦化される、ニューロンごとにひとつのバイアスパラメータ、FCのニューロンの数には構造的制約がない] + +
+ + +**41. Receptive field ― The receptive field at layer k is the area denoted Rk×Rk of the input that each pixel of the k-th activation map can 'see'. By calling Fj the filter size of layer j and Si the stride value of layer i and with the convention S0=1, the receptive field at layer k can be computed with the formula:** + +⟶ 受容野 - 層kにおける受容野は、k番目の活性化マップの各ピクセルが「見る」ことができる入力のRk×Rkの領域です。層jのフィルタサイズをFj、層iのストライド値をSiとし、慣例に従ってS0=1とすると、層kでの受容野は次の式で計算されます: + +
+ + +**42. In the example below, we have F1=F2=3 and S1=S2=1, which gives R2=1+2⋅1+2⋅1=5.** + +⟶ 下記の例のようにF1=F2=3、S1=S2=1とすると、R2=1+2⋅1+2⋅1=5となります。 + +
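The receptive-field formula can be written as a small helper and checked against the example above (S0=1 by convention; the function name is ours):

```python
def receptive_field(filter_sizes, strides):
    # Rk = 1 + Σ_{j=1..k} (Fj − 1) · Π_{i=0..j−1} Si, with S0 = 1
    rf = 1
    jump = 1                          # running product of the strides of previous layers
    for f, s in zip(filter_sizes, strides):
        rf += (f - 1) * jump
        jump *= s
    return rf

print(receptive_field([3, 3], [1, 1]))   # 5, matching R2 = 1 + 2·1 + 2·1
```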
+ + +**43. Commonly used activation functions** + +⟶ よく使われる活性化関数 + +
+ + +**44. Rectified Linear Unit ― The rectified linear unit layer (ReLU) is an activation function g that is used on all elements of the volume. It aims at introducing non-linearities to the network. Its variants are summarized in the table below:** + +⟶ 正規化線形ユニット - 正規化線形ユニット層(ReLU)はボリュームの全ての要素に利用される活性化関数gです。ReLUの目的は非線型性をネットワークに導入することです。変種は以下の表でまとめられています: + +
+ + +**45. [ReLU, Leaky ReLU, ELU, with]** + +⟶[ReLU、Leaky ReLU、ELU、ただし] + +
+ + +**46. [Non-linearity complexities biologically interpretable, Addresses dying ReLU issue for negative values, Differentiable everywhere]** + +⟶ [生物学的に解釈可能な非線形複雑性、負の値に対してReLUが死んでいる問題への対処、どこても微分可能] + +
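Element-wise sketches of the three activations in the table above (the α values are common illustrative defaults, not prescribed by the cheatsheet):

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0)                          # g(z) = max(z, 0)

def leaky_relu(z, alpha=0.01):
    return np.where(z > 0, z, alpha * z)             # small slope α for negative values

def elu(z, alpha=1.0):
    return np.where(z > 0, z, alpha * (np.exp(z) - 1))   # smooth, differentiable everywhere

z = np.array([-2.0, -0.5, 0.0, 1.5])
print(relu(z), leaky_relu(z), elu(z))
```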
+ + +**47. Softmax ― The softmax step can be seen as a generalized logistic function that takes as input a vector of scores x∈Rn and outputs a vector of output probability p∈Rn through a softmax function at the end of the architecture. It is defined as follows:** + +⟶ ソフトマックス - ソフトマックスのステップは入力としてスコアx∈Rnのベクトルを取り、アーキテクチャの最後にあるソフトマックス関数を通じて確率p∈Rnのベクトルを出力する一般化されたロジスティック関数として見ることができます。次のように定義されます: + +
+ + +**48. where** + +⟶ ここで + +
+ + +**49. Object detection** + +⟶ 物体検出 + +
+ + +**50. Types of models ― There are 3 main types of object recognition algorithms, for which the nature of what is predicted is different. They are described in the table below:** + +⟶ モデルの種類 - 物体認識アルゴリズムは主に3つの種類があり、予測されるものの性質は異なります。次の表で説明されています: + +
+ + +**51. [Image classification, Classification w. localization, Detection]** + +⟶ [画像分類、位置特定を伴う分類、検出] + +
+ + +**52. [Teddy bear, Book]** + +⟶ [テディベア、本] + +
+ + +**53. [Classifies a picture, Predicts probability of object, Detects an object in a picture, Predicts probability of object and where it is located, Detects up to several objects in a picture, Predicts probabilities of objects and where they are located]** + +⟶ [画像の分類、物体の確率の予測, 画像内の物体の検出、物体の確率とその位置の予測、画像内の複数の物体の検出、複数の物体の確率と位置の予測] + +
+
+**54. [Traditional CNN, Simplified YOLO, R-CNN, YOLO, R-CNN]**
+
+⟶ [伝統的なCNN、単純化されたYOLO、R-CNN、YOLO、R-CNN]
+
+<br>
+ + +**55. Detection ― In the context of object detection, different methods are used depending on whether we just want to locate the object or detect a more complex shape in the image. The two main ones are summed up in the table below:** + +⟶ 検出 - 物体検出の文脈では、画像内の物体の位置を特定したいだけなのかあるいは複雑な形状を検出したいのかによって、異なる方法が使用されます。二つの主なものは次の表でまとめられています: + +
+ + +**56. [Bounding box detection, Landmark detection]** + +⟶ [バウンディングボックス検出、ランドマーク検出] + +
+ + +**57. [Detects the part of the image where the object is located, Detects a shape or characteristics of an object (e.g. eyes), More granular]** + +⟶ [物体が配置されている画像の部分の検出、物体(たとえば目)の形状または特徴の検出、詳細] + +
+ + +**58. [Box of center (bx,by), height bh and width bw, Reference points (l1x,l1y), ..., (lnx,lny)]** + +⟶ [中心(bx, by)、高さbh、幅bwのボックス、参照点(l1x,l1y), ..., (lnx,lny)] + +
+ + +**59. Intersection over Union ― Intersection over Union, also known as IoU, is a function that quantifies how correctly positioned a predicted bounding box Bp is over the actual bounding box Ba. It is defined as:** + +⟶ Intersection over Union - Intersection over Union (IoUとしても知られる)は予測された境界ボックスBpが実際の境界ボックスBaに対してどれだけ正しく配置されているかを定量化する関数です。次のように定義されます: + +
+ + +**60. Remark: we always have IoU∈[0,1]. By convention, a predicted bounding box Bp is considered as being reasonably good if IoU(Bp,Ba)⩾0.5.** + +⟶ 注:常にIoU∈[0,1]となります。慣例では、IoU(Bp,Ba)⩾0.5の場合、予測された境界ボックスBpはそこそこ良いと見なされます。 + +
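A minimal IoU sketch for axis-aligned boxes given as (x1, y1, x2, y2) corners (the box format and names are ours):

```python
def iou(box_a, box_b):
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # intersection rectangle (zero area if the boxes do not overlap)
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union

print(iou((0, 0, 2, 2), (1, 1, 3, 3)))   # 1/7 ≈ 0.143
```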
+ + +**61. Anchor boxes ― Anchor boxing is a technique used to predict overlapping bounding boxes. In practice, the network is allowed to predict more than one box simultaneously, where each box prediction is constrained to have a given set of geometrical properties. For instance, the first prediction can potentially be a rectangular box of a given form, while the second will be another rectangular box of a different geometrical form.** + +⟶ アンカーボックス - アンカーボクシングは重なり合う境界ボックスを予測するために使用される手法です。 実際には、ネットワークは同時に複数のボックスを予測することを許可されており、各ボックスの予測は特定の幾何学的属性の組み合わせを持つように制約されます。例えば、最初の予測は特定の形式の長方形のボックスになる可能性があり、2番目の予測は異なる幾何学的形式の別の長方形のボックスになります。 + +
+ + +**62. Non-max suppression ― The non-max suppression technique aims at removing duplicate overlapping bounding boxes of a same object by selecting the most representative ones. After having removed all boxes having a probability prediction lower than 0.6, the following steps are repeated while there are boxes remaining:** + +⟶ 非極大抑制 - 非極大抑制技術のねらいは、最も代表的なものを選択することによって、同じ物体の重複した重なり合う境界ボックスを除去することです。0.6未満の予測確率を持つボックスを全て除去した後、残りのボックスがある間、以下の手順が繰り返されます: + +
+ + +**63. [For a given class, Step 1: Pick the box with the largest prediction probability., Step 2: Discard any box having an IoU⩾0.5 with the previous box.]** + +⟶ [特定のクラスに対して、ステップ1: 最大の予測確率を持つボックスを選ぶ。ステップ2: そのボックスに対してIoU⩾0.5となる全てのボックスを破棄する。] + +
+ + +**64. [Box predictions, Box selection of maximum probability, Overlap removal of same class, Final bounding boxes]** + +⟶ [ボックス予測、最大確率のボックス選択、同じクラスの重複除去、最終的な境界ボックス] + +
+ + +**65. YOLO ― You Only Look Once (YOLO) is an object detection algorithm that performs the following steps:** + +⟶ YOLO - You Only Look Once (YOLO)は次の手順を実行する物体検出アルゴリズムです: + +
+ + +**66. [Step 1: Divide the input image into a G×G grid., Step 2: For each grid cell, run a CNN that predicts y of the following form:, repeated k times]** + +⟶ [ステップ1: 入力画像をGxGグリッドに分割する。ステップ2: 各グリッドセルに対して次の形式のyを予測するCNNを実行する:,k回繰り返す。] + +
+ + +**67. where pc is the probability of detecting an object, bx,by,bh,bw are the properties of the detected bouding box, c1,...,cp is a one-hot representation of which of the p classes were detected, and k is the number of anchor boxes.** + +⟶ ここで、pcは物体を検出する確率、bx,by,bh,bwは検出された境界ボックスの属性、c1, ..., cpはp個のクラスのうちどれが検出されたかのOne-hot表現、kはアンカーボックスの数です。 + +
+ + +**68. Step 3: Run the non-max suppression algorithm to remove any potential duplicate overlapping bounding boxes.** + +⟶ ステップ3: 重複する可能性のある重なり合う境界ボックスを全て除去するため、非極大抑制アルゴリズムを実行する。 + +
+ + +**69. [Original image, Division in GxG grid, Bounding box prediction, Non-max suppression]** + +⟶ [元の画像、GxGグリッドでの分割、境界ボックス予測、非極大抑制] + +
+ + +**70. Remark: when pc=0, then the network does not detect any object. In that case, the corresponding predictions bx,...,cp have to be ignored.** + +⟶ 注: pc=0のとき、ネットワークは物体を検出しません。その場合には、対応する予測 bx, ..., cpは無視する必要があります。 + +
+ + +**71. R-CNN ― Region with Convolutional Neural Networks (R-CNN) is an object detection algorithm that first segments the image to find potential relevant bounding boxes and then run the detection algorithm to find most probable objects in those bounding boxes.** + +⟶ R-CNN - Region with Convolutional Neural Networks (R-CNN)は物体検出アルゴリズムで、最初に画像をセグメント化して潜在的に関連する境界ボックスを見つけ、次に検出アルゴリズムを実行してそれらの境界ボックス内で最も可能性の高い物体を見つけます。 + +
+ + +**72. [Original image, Segmentation, Bounding box prediction, Non-max suppression]** + +⟶ [元の画像、セグメンテーション、境界ボックス予測、非極大抑制] + +
+ + +**73. Remark: although the original algorithm is computationally expensive and slow, newer architectures enabled the algorithm to run faster, such as Fast R-CNN and Faster R-CNN.** + +⟶ 注: 元のアルゴリズムは計算コストが高くて遅いですが、Fast R-CNNやFaster R-CNNなどの、より新しいアーキテクチャではアルゴリズムをより速く実行できます。 + +
+ + +**74. Face verification and recognition** + +⟶ 顔認証及び認識 + +
+ + +**75. Types of models ― Two main types of model are summed up in table below:** + +⟶ モデルの種類 - 2種類の主要なモデルが次の表にまとめられています: + +
+ + +**76. [Face verification, Face recognition, Query, Reference, Database]** + +⟶ [顔認証、顔認識、クエリ、参照、データベース] + +
+ + +**77. [Is this the correct person?, One-to-one lookup, Is this one of the K persons in the database?, One-to-many lookup]** + +⟶ [これは正しい人ですか?、1対1検索、これはデータベース内のK人のうちの1人ですか?、1対多検索] + +
+ + +**78. One Shot Learning ― One Shot Learning is a face verification algorithm that uses a limited training set to learn a similarity function that quantifies how different two given images are. The similarity function applied to two images is often noted d(image 1,image 2).** + +⟶ ワンショット学習 - ワンショット学習は限られた学習セットを利用して、2つの与えられた画像の違いを定量化する類似度関数を学習する顔認証アルゴリズムです。2つの画像に適用される類似度関数はしばしばd(画像1, 画像2)と記されます。 + +
+ + +**79. Siamese Network ― Siamese Networks aim at learning how to encode images to then quantify how different two images are. For a given input image x(i), the encoded output is often noted as f(x(i)).** + +⟶ シャムネットワーク - シャムネットワークは画像のエンコード方法を学習して2つの画像の違いを定量化することを目的としています。与えられた入力画像x(i)に対してエンコードされた出力はしばしばf(x(i))と記されます。 + +
+ + +**80. Triplet loss ― The triplet loss ℓ is a loss function computed on the embedding representation of a triplet of images A (anchor), P (positive) and N (negative). The anchor and the positive example belong to a same class, while the negative example to another one. By calling α∈R+ the margin parameter, this loss is defined as follows:** + +⟶ トリプレット損失 - トリプレット損失ℓは3つ組の画像A(アンカー)、P(ポジティブ)、N(ネガティブ)の埋め込み表現で計算される損失関数です。アンカーとポジティブ例は同じクラスに属し、ネガティブ例は別のクラスに属します。マージンパラメータをα∈R+と呼ぶことによってこの損失は次のように定義されます: + +
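As a worked illustration of the definition above, the sketch below evaluates the triplet loss max(d(A,P)−d(A,N)+α, 0) on three embedding vectors, assuming the squared Euclidean distance for d; the margin value 0.2 is an arbitrary example.

```python
import numpy as np

def triplet_loss(f_a, f_p, f_n, alpha=0.2):
    # f_a, f_p, f_n: embeddings of the anchor, positive and negative images
    d_ap = np.sum((f_a - f_p) ** 2)   # anchor-positive distance (same class)
    d_an = np.sum((f_a - f_n) ** 2)   # anchor-negative distance (other class)
    return max(d_ap - d_an + alpha, 0.0)
```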
+ + +**81. Neural style transfer** + +⟶ ニューラルスタイル変換 + +
+ + +**82. Motivation ― The goal of neural style transfer is to generate an image G based on a given content C and a given style S.** + +⟶ モチベーション - ニューラルスタイル変換の目的は与えられたコンテンツCとスタイルSに基づく画像Gを生成することです。 + +
+ + +**83. [Content C, Style S, Generated image G]** + +⟶ [コンテンツC、スタイルS、生成された画像G] + +
+ + +**84. Activation ― In a given layer l, the activation is noted a[l] and is of dimensions nH×nw×nc** + +⟶ 活性化 - 層lにおける活性化はa[l]と表記され、次元はnH×nw×ncです。 + +
+ + +**85. Content cost function ― The content cost function Jcontent(C,G) is used to determine how the generated image G differs from the original content image C. It is defined as follows:** + +⟶ コンテンツコスト関数 - Jcontent(C, G)というコンテンツコスト関数は生成された画像Gと元のコンテンツ画像Cとの違いを測定するため利用されます。以下のように定義されます: + +
+ + +**86. Style matrix ― The style matrix G[l] of a given layer l is a Gram matrix where each of its elements G[l]kk′ quantifies how correlated the channels k and k′ are. It is defined with respect to activations a[l] as follows:** + +⟶ スタイル行列 - 与えられた層lのスタイル行列G[l]はグラム行列で、各要素G[l]kk′がチャネルkとk′の相関関係を定量化します。活性化a[l]に関して次のように定義されます: + +
+ + +**87. Remark: the style matrix for the style image and the generated image are noted G[l] (S) and G[l] (G) respectively.** + +⟶ 注: スタイル画像及び生成された画像に対するスタイル行列はそれぞれG[l] (S)、G[l] (G)と表記されます。 + +
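A possible NumPy sketch of the style (Gram) matrix of one layer, assuming the activation is stored as an array of shape (nH, nW, nC); the flattening convention is an implementation choice, not prescribed by the text.

```python
import numpy as np

def gram_matrix(a):
    # a: activation a[l] of shape (n_H, n_W, n_C)
    n_h, n_w, n_c = a.shape
    flat = a.reshape(n_h * n_w, n_c)   # one row per spatial position
    return flat.T @ flat               # G[k, k'] measures how correlated channels k and k' are
```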
+ + +**88. Style cost function ― The style cost function Jstyle(S,G) is used to determine how the generated image G differs from the style S. It is defined as follows:** + +⟶ スタイルコスト関数 - スタイルコスト関数Jstyle(S,G)は生成された画像GとスタイルSとの違いを測定するため利用されます。以下のように定義されます: + +
+ + +**89. Overall cost function ― The overall cost function is defined as being a combination of the content and style cost functions, weighted by parameters α,β, as follows:** + +⟶ 全体のコスト関数 - 全体のコスト関数は以下のようにパラメータα,βによって重み付けされたコンテンツ及びスタイルコスト関数の組み合わせとして定義されます: + +
+ + +**90. Remark: a higher value of α will make the model care more about the content while a higher value of β will make it care more about the style.** + +⟶ 注: αの値を大きくするとモデルはコンテンツを重視し、βの値を大きくするとスタイルを重視します。 + +
+ + +**91. Architectures using computational tricks** + +⟶ 計算トリックを使うアーキテクチャ + +
+ + +**92. Generative Adversarial Network ― Generative adversarial networks, also known as GANs, are composed of a generative and a discriminative model, where the generative model aims at generating the most truthful output that will be fed into the discriminative which aims at differentiating the generated and true image.** + +⟶ 敵対的生成ネットワーク - 敵対的生成ネットワーク(GANsとも呼ばれる)は生成モデルと識別モデルで構成されます。生成モデルの目的は、生成された画像と本物の画像を区別することを目的とする識別モデルに与えられる、最も本物らしい出力を生成することです。 + +
+ + +**93. [Training set, Noise, Real-world image, Generator, Discriminator, Real Fake]** + +⟶ [学習セット、ノイズ、現実世界の画像、生成器、識別器、真偽] + +
+ + +**94. Remark: use cases using variants of GANs include text to image, music generation and synthesis.** + +⟶ 注: GANsの変種を使用するユースケースにはテキストからの画像生成, 音楽生成及び合成があります。 + +
+ + +**95. ResNet ― The Residual Network architecture (also called ResNet) uses residual blocks with a high number of layers meant to decrease the training error. The residual block has the following characterizing equation:** + +⟶ ResNet - Residual Networkアーキテクチャ(ResNetとも呼ばれる)は学習エラーを減らすため多数の層がある残差ブロックを使用します。残差ブロックは次の特性方程式を有します: + +
+ + +**96. Inception Network ― This architecture uses inception modules and aims at giving a try at different convolutions in order to increase its performance through features diversification. In particular, it uses the 1×1 convolution trick to limit the computational burden.** + +⟶ インセプションネットワーク - このアーキテクチャはインセプションモジュールを利用し、特徴量の多様化を通じてパーフォーマンスを向上させるため、様々な畳み込みを試すことを目的としています。特に、計算負荷を限定するため1×1畳み込みトリックを使います。 + +
+ + +**97. The Deep Learning cheatsheets are now available in [target language].** + +⟶ ディープラーニングのチートシートが[日本語]で利用可能になりました。 + +
+ + +**98. Original authors** + +⟶ 原著者 + +
+ + +**99. Translated by X, Y and Z** + +⟶ X・Y・Z 訳 + +
+ + +**100. Reviewed by X, Y and Z** + +⟶ X・Y・Z 校正 + +
+ + +**101. View PDF version on GitHub** + +⟶ GitHubでPDF版を見る + +
+ + +**102. By X and Y** + +⟶ X・Y 著 + +
diff --git a/ja/cs-230-deep-learning-tips-and-tricks.md b/ja/cs-230-deep-learning-tips-and-tricks.md new file mode 100644 index 000000000..a7de15349 --- /dev/null +++ b/ja/cs-230-deep-learning-tips-and-tricks.md @@ -0,0 +1,457 @@ +**Deep Learning Tips and Tricks translation** + +
+ +**1. Deep Learning Tips and Tricks cheatsheet** + +⟶深層学習(ディープラーニング)のアドバイスやコツのチートシート + +
+ + +**2. CS 230 - Deep Learning** + +⟶CS 230 - 深層学習 + +
+ + +**3. Tips and tricks** + +⟶アドバイスやコツ + +
+ + +**4. [Data processing, Data augmentation, Batch normalization]** + +⟶データ処理、Data augmentation (データ拡張)、Batch normalization (バッチ正規化) + +
+ + +**5. [Training a neural network, Epoch, Mini-batch, Cross-entropy loss, Backpropagation, Gradient descent, Updating weights, Gradient checking]** + +⟶ニューラルネットワークの学習、エポック、ミニバッチ、交差エントロピー誤差、誤差逆伝播法、勾配降下法、重み更新、勾配チェック + +
+ + +**6. [Parameter tuning, Xavier initialization, Transfer learning, Learning rate, Adaptive learning rates]** + +⟶パラメータチューニング、Xavier初期化、転移学習、学習率、適応学習率 + +
**7. [Regularization, Dropout, Weight regularization, Early stopping]**

⟶正則化、Dropout (ドロップアウト)、重みの正則化、Early stopping (学習の早々な終了)

<br>
+ + +**8. [Good practices, Overfitting small batch, Gradient checking]** + +⟶おすすめの技法、小さいバッチの過学習、勾配チェック + +
+ + +**9. View PDF version on GitHub** + +⟶GitHubでPDF版を見る + +
+ + +**10. Data processing** + +⟶データ処理 + +
+ + +**11. Data augmentation ― Deep learning models usually need a lot of data to be properly trained. It is often useful to get more data from the existing ones using data augmentation techniques. The main ones are summed up in the table below. More precisely, given the following input image, here are the techniques that we can apply:** + +⟶Data augmentation (データ拡張) - 大抵の場合は、深層学習のモデルを適切に訓練するには大量のデータが必要です。Data augmentation という技術を用いて既存のデータから、データを増やすことがよく役立ちます。以下、Data augmentation の主な手法はまとまっています。より正確には、以下の入力画像に対して、下記の技術を適用できます。 + +
+ + +**12. [Original, Flip, Rotation, Random crop]** + +⟶元の画像、反転、回転、ランダムな切り抜き + +
+ + +**13. [Image without any modification, Flipped with respect to an axis for which the meaning of the image is preserved, Rotation with a slight angle, Simulates incorrect horizon calibration, Random focus on one part of the image, Several random crops can be done in a row]** + +⟶何も変更されていない画像、画像の意味が変わらない軸における反転、わずかな角度の回転、不正確な水平線の校正(calibration)をシミュレートする、画像の一部へのランダムなフォーカス、連続して数回のランダムな切り抜きが可能 + +
+ + +**14. [Color shift, Noise addition, Information loss, Contrast change]** + +⟶カラーシフト、ノイズの付加、情報損失、コントラスト(鮮やかさ)の修正 + +
+ + +**15. [Nuances of RGB is slightly changed, Captures noise that can occur with light exposure, Addition of noise, More tolerance to quality variation of inputs, Parts of image ignored, Mimics potential loss of parts of image, Luminosity changes, Controls difference in exposition due to time of day]** + +⟶RGBのわずかな修正、照らされ方によるノイズを捉える、ノイズの付加、入力画像の品質のばらつきへの耐性の強化、画像の一部を無視、画像の一部が欠ける可能性を再現する、明るさの変化、時刻による露出の違いをコントロールする + +
+ + +**16. Remark: data is usually augmented on the fly during training.** + +⟶備考:データ拡張は基本的には学習時に臨機応変に行われる。 + +
+ + +**17. Batch normalization ― It is a step of hyperparameter γ,β that normalizes the batch {xi}. By noting μB,σ2B the mean and variance of that we want to correct to the batch, it is done as follows:** + +⟶batch normalization - ハイパーパラメータ γ、β によってバッチ {xi} を正規化するステップです。修正を加えたいバッチの平均と分散をμB,σ2Bと表記すると、以下のように行えます。 + +
+ + +**18. It is usually done after a fully connected/convolutional layer and before a non-linearity layer and aims at allowing higher learning rates and reducing the strong dependence on initialization.** + +⟶より高い学習率を利用可能にし初期化への強い依存を減らすことを目的として、基本的には全結合層・畳み込み層のあとで非線形層の前に行います。 + +
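A minimal sketch of the batch normalization step described above, normalizing a mini-batch by its own statistics and then applying the learnable γ and β; the small epsilon added to the variance is a standard numerical-stability assumption not mentioned in the text.

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    # x: mini-batch of shape (batch_size, n_features)
    mu = x.mean(axis=0)                      # batch mean mu_B
    var = x.var(axis=0)                      # batch variance sigma^2_B
    x_hat = (x - mu) / np.sqrt(var + eps)    # normalized batch
    return gamma * x_hat + beta              # scale and shift with the learned parameters
```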
+ + +**19. Training a neural network** + +⟶ニューラルネットワークの学習 + +
+ + +**20. Definitions** + +⟶定義 + +
+ + +**21. Epoch ― In the context of training a model, epoch is a term used to refer to one iteration where the model sees the whole training set to update its weights.** + +⟶エポック - モデル学習においてエポックとは学習の繰り返しの中の1回を指す用語で、1エポックの間にモデルは全学習データからその重みを更新します。 + +
+ + +**22. Mini-batch gradient descent ― During the training phase, updating weights is usually not based on the whole training set at once due to computation complexities or one data point due to noise issues. Instead, the update step is done on mini-batches, where the number of data points in a batch is a hyperparameter that we can tune.** + +⟶ミニバッチ勾配降下法 - 学習段階では、計算が複雑になりすぎるため通常は全データを一度に使って重みを更新することはありません。またノイズが問題になるため1つのデータポイントだけを使って重みを更新することもありません。代わりに、更新はミニバッチごとに行われます。各バッチに含まれるデータポイントの数は調整可能なハイパーパラメータです。 + +
+ + +**23. Loss function ― In order to quantify how a given model performs, the loss function L is usually used to evaluate to what extent the actual outputs y are correctly predicted by the model outputs z.** + +⟶損失関数 - 得られたモデルの性能を数値化するために、モデルの出力zが実際の出力yをどの程度正確に予測できているかを評価する損失関数Lが通常使われます。 + +
+ + +**24. Cross-entropy loss ― In the context of binary classification in neural networks, the cross-entropy loss L(z,y) is commonly used and is defined as follows:** + +⟶交差エントロピー誤差 - ニューラルネットワークにおける二項分類では、交差エントロピー誤差L(z,y)が一般的に使用されており、以下のように定義されています。 + +
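For reference, a small sketch of the binary cross-entropy L(z,y)=−[y log(z)+(1−y) log(1−z)]; clipping z away from 0 and 1 is an added safeguard, not part of the definition.

```python
import numpy as np

def cross_entropy(z, y, eps=1e-12):
    # z: predicted probability, y: true label in {0, 1}
    z = np.clip(z, eps, 1 - eps)   # avoid log(0)
    return -(y * np.log(z) + (1 - y) * np.log(1 - z))
```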
+ + +**25. Finding optimal weights** + +⟶最適な重みの探索 + +
+ + +**26. Backpropagation ― Backpropagation is a method to update the weights in the neural network by taking into account the actual output and the desired output. The derivative with respect to each weight w is computed using the chain rule.** + +⟶誤差逆伝播法 - 実際の出力と期待される出力の差に基づいてニューラルネットワークの重みを更新する手法です。各重みwに関する微分は連鎖律を用いて計算されます。 + +
+ + +**27. Using this method, each weight is updated with the rule:** + +⟶この方法を使用することで、それぞれの重みはそのルールにしたがって更新されます。 + +
+ + +**28. Updating weights ― In a neural network, weights are updated as follows:** + +⟶重みの更新 - ニューラルネットワークでは、以下の方法にしたがって重みが更新されます。 + +
+ + +**29. [Step 1: Take a batch of training data and perform forward propagation to compute the loss, Step 2: Backpropagate the loss to get the gradient of the loss with respect to each weight, Step 3: Use the gradients to update the weights of the network.]** + +⟶ステップ1:訓練データのバッチを用いて順伝播で損失を計算します。ステップ2:損失を逆伝播させて各重みに関する損失の勾配を求めます。ステップ3:求めた勾配を用いてネットワークの重みを更新します。 + +
+ + +**30. [Forward propagation, Backpropagation, Weights update]** + +⟶順伝播、逆伝播、重みの更新 + +
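The three steps above could be sketched as follows; `model` is a hypothetical object exposing `forward`, `backward` and `weights`, used purely for illustration rather than any specific framework API.

```python
def training_step(model, x_batch, y_batch, lr=0.01):
    loss = model.forward(x_batch, y_batch)   # Step 1: forward propagation to compute the loss
    grads = model.backward()                 # Step 2: backpropagate to get dL/dw for each weight
    for w, g in zip(model.weights, grads):   # Step 3: gradient descent update w <- w - lr * dL/dw
        w -= lr * g
    return loss
```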
+ + +**31. Parameter tuning** + +⟶パラメータチューニング + +
+ + +**32. Weights initialization** + +⟶重みの初期化 + +
+ + +**33. Xavier initialization ― Instead of initializing the weights in a purely random manner, Xavier initialization enables to have initial weights that take into account characteristics that are unique to the architecture.** + +⟶Xavier初期化 - 完全にランダムな方法で重みを初期化するのではなく、そのアーキテクチャのユニークな特徴を考慮に入れて重みを初期化する方法です。 + +
+ + +**34. Transfer learning ― Training a deep learning model requires a lot of data and more importantly a lot of time. It is often useful to take advantage of pre-trained weights on huge datasets that took days/weeks to train, and leverage it towards our use case. Depending on how much data we have at hand, here are the different ways to leverage this:** + +⟶転移学習 - 深層学習のモデルを学習させるには大量のデータと何よりも時間が必要です。膨大なデータセットから数日・数週間をかけて構築した学習済みモデルを利用し、自身のユースケースに活かすことは有益であることが多いです。手元にあるデータ量次第ではありますが、これを利用する以下の方法があります。 + +
+ + +**35. [Training size, Illustration, Explanation]** + +⟶学習サイズ、図、解説 + +
+ + +**36. [Small, Medium, Large]** + +⟶小、中、大 + +
+ + +**37. [Freezes all layers, trains weights on softmax, Freezes most layers, trains weights on last layers and softmax, Trains weights on layers and softmax by initializing weights on pre-trained ones]** + +⟶全層を凍結し、softmaxの重みを学習させる、大半の層を凍結し、最終層とsoftmaxの重みを学習させる、学習済みの重みで初期化して各層とsoftmaxの重みを学習させる + +
+ + +**38. Optimizing convergence** + +⟶収束の最適化 + +
**39. Learning rate ― The learning rate, often noted α or sometimes η, indicates at which pace the weights get updated. It can be fixed or adaptively changed. The current most popular method is called Adam, which is a method that adapts the learning rate.
**

⟶学習率 - 多くの場合αや時々ηと表記される学習率とは、重みの更新速度を表しています。学習率は固定することもできる上に、適応的に変更することもできます。現在もっとも使用される手法は、学習率を適切に調整するAdamと呼ばれる手法です。

<br>
+ + +**40. Adaptive learning rates ― Letting the learning rate vary when training a model can reduce the training time and improve the numerical optimal solution. While Adam optimizer is the most commonly used technique, others can also be useful. They are summed up in the table below:** + +⟶適応学習率法 - モデルを学習させる際に学習率を変動させると、学習時間の短縮や精度の向上につながります。Adamがもっとも一般的に使用されている手法ですが、他の手法も役立つことがあります。それらの手法を下記の表にまとめました。 + +
+ + +**41. [Method, Explanation, Update of w, Update of b]** + +⟶手法、解説、wの更新、bの更新 + +
+ + +**42. [Momentum, Dampens oscillations, Improvement to SGD, 2 parameters to tune]** + +⟶Momentum(運動量)、振動を抑制する、SGDの改良、チューニングするパラメータは2つ + +
+ + +**43. [RMSprop, Root Mean Square propagation, Speeds up learning algorithm by controlling oscillations]** + +⟶RMSprop, 二乗平均平方根のプロパゲーション、振動をコントロールすることで学習アルゴリズムを高速化する + +
+ + +**44. [Adam, Adaptive Moment estimation, Most popular method, 4 parameters to tune]** + +⟶Adam, Adaptive Moment estimation, もっとも人気のある手法、チューニングするパラメータは4つ + +
+ + +**45. Remark: other methods include Adadelta, Adagrad and SGD.** + +⟶備考:他にAdadelta, Adagrad, SGD などの手法があります。 + +
+ + +**46. Regularization** + +⟶正則化 + +
+ + +**47. Dropout ― Dropout is a technique used in neural networks to prevent overfitting the training data by dropping out neurons with probability p>0. It forces the model to avoid relying too much on particular sets of features.** + +⟶ドロップアウト - ドロップアウトとは、ニューラルネットワークで過学習を避けるためにp>0の確率でノードをドロップアウト(無効化)する手法です。モデルが特定の特徴量に依存しすぎることを避けるよう強制します。 + +
+ + +**48. Remark: most deep learning frameworks parametrize dropout through the 'keep' parameter 1−p.** + +⟶備考:ほとんどの深層学習のフレームワークでは、ドロップアウトを'keep'というパラメータ(1-p)でパラメータ化します。 + +
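A possible sketch of (inverted) dropout at training time, dropping each unit with probability p and rescaling by the 'keep' probability 1−p mentioned in the remark above; the inverted-dropout rescaling is one common convention, assumed here.

```python
import numpy as np

def dropout(a, p=0.5, training=True):
    if not training:
        return a                              # no dropout at test time
    keep = 1.0 - p
    mask = (np.random.rand(*a.shape) < keep)  # keep each unit with probability 1 - p
    return a * mask / keep                    # rescale so the expected activation is unchanged
```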
+ + +**49. Weight regularization ― In order to make sure that the weights are not too large and that the model is not overfitting the training set, regularization techniques are usually performed on the model weights. The main ones are summed up in the table below:** + +⟶重みの正則化 - 重みが大きくなりすぎず、モデルが過学習しないようにするため、モデルの重みに対して正則化を行います。主な正則化手法は以下の表にまとめられています。 + +
+ + +**50. [LASSO, Ridge, Elastic Net]** + +⟶LASSO, Ridge, Elastic Net + +
**50 bis. [Shrinks coefficients to 0, Good for variable selection, Makes coefficients smaller, Tradeoff between variable selection and small coefficients]**

⟶係数を0へ小さくする、変数選択に適している、係数を小さくする、変数選択と小さい係数のトレードオフ

<br>
+ +**51. Early stopping ― This regularization technique stops the training process as soon as the validation loss reaches a plateau or starts to increase.** + +⟶Early stopping - バリデーションの損失が変化しなくなるか、あるいは増加し始めたときに学習を早々に止める正則化方法 + +
+ + +**52. [Error, Validation, Training, early stopping, Epochs]** + +⟶損失、評価、学習、early stopping、エポック + +
+ + +**53. Good practices** + +⟶おすすめの技法 + +
+ + +**54. Overfitting small batch ― When debugging a model, it is often useful to make quick tests to see if there is any major issue with the architecture of the model itself. In particular, in order to make sure that the model can be properly trained, a mini-batch is passed inside the network to see if it can overfit on it. If it cannot, it means that the model is either too complex or not complex enough to even overfit on a small batch, let alone a normal-sized training set.** + +⟶小さいバッチの過学習 - モデルをデバッグするとき、モデル自体の構造に大きな問題がないか確認するため簡易的なテストが役に立つことが多いです。特に、モデルを正しく学習できることを確認するため、ミニバッチをネットワークに渡してそれを過学習できるかを見ます。もしできなければ、モデルは複雑すぎるか単純すぎるかのいずれかであることを意味し、普通サイズの学習データセットはもちろん、小さいバッチですら過学習できないのです。 + +
+ + +**55. Gradient checking ― Gradient checking is a method used during the implementation of the backward pass of a neural network. It compares the value of the analytical gradient to the numerical gradient at given points and plays the role of a sanity-check for correctness.** + +⟶Gradient checking (勾配チェック) - Gradient checking とは、ニューラルネットワークの逆伝播を実装する際に用いられる手法です。特定の点で解析的勾配と数値的勾配とを比較する手法で、逆伝播の実装が正しいことを確認できます。 + +
+ + +**56. [Type, Numerical gradient, Analytical gradient]** + +⟶種類、数値的勾配、解析的勾配 + +
+ + +**57. [Formula, Comments]** + +⟶公式、コメント + +
+ + +**58. [Expensive; loss has to be computed two times per dimension, Used to verify correctness of analytical implementation, Trade-off in choosing h not too small (numerical instability) nor too large (poor gradient approximation)]** + +⟶計算コストが高い;損失を次元ごとに2回計算する必要がある、解析的実装が正しいかのチェックに用いられる、hを選ぶ時に小さすぎると数値不安定になり、大きすぎると勾配近似が不正確になるというトレードオフがある + +
+ + +**59. ['Exact' result, Direct computation, Used in the final implementation]** + +⟶「正しい」結果、直接的な計算、最終的な実装で使われる + +
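Putting the two columns of the table together, a minimal gradient check on a scalar function might look like the sketch below; the step size h and the tolerance are illustrative values.

```python
def gradient_check(f, grad_f, x, h=1e-5, tol=1e-7):
    numerical = (f(x + h) - f(x - h)) / (2 * h)   # loss evaluated twice per dimension
    analytical = grad_f(x)
    return abs(numerical - analytical) < tol      # sanity-check of the analytical implementation

# Example: f(x) = x^2 has derivative 2x
assert gradient_check(lambda x: x ** 2, lambda x: 2 * x, x=3.0)
```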
**60. The Deep Learning cheatsheets are now available in [target language].**

⟶深層学習のチートシートは[対象言語]で利用可能になりました。

<br>

**61. Original authors**

⟶原著者

<br>
+ +**62.Translated by X, Y and Z** + +⟶X・Y・Z 訳 + +
+ +**63.Reviewed by X, Y and Z** + +⟶X・Y・Z 校正 + +
+ +**64.View PDF version on GitHub** + +⟶GitHubでPDF版を見る + +
+ +**65.By X and Y** + +⟶X・Y 著 + +
diff --git a/ja/cs-230-recurrent-neural-networks.md b/ja/cs-230-recurrent-neural-networks.md new file mode 100644 index 000000000..e366a86de --- /dev/null +++ b/ja/cs-230-recurrent-neural-networks.md @@ -0,0 +1,678 @@ +**Recurrent Neural Networks translation** + +
+ +**1. Recurrent Neural Networks cheatsheet** + +⟶リカレントニューラルネットワーク チートシート + +
**2. CS 230 - Deep Learning**

⟶CS 230 - ディープラーニング

<br>
+ + +**3. [Overview, Architecture structure, Applications of RNNs, Loss function, Backpropagation]** + +⟶[概要、アーキテクチャの構造、RNNの応用アプリケーション、損失関数、逆伝播] + +
**4. [Handling long term dependencies, Common activation functions, Vanishing/exploding gradient, Gradient clipping, GRU/LSTM, Types of gates, Bidirectional RNN, Deep RNN]**

⟶[長期依存関係の処理、活性化関数、勾配消失・爆発、勾配クリッピング、GRU/LSTM、ゲートの種類、双方向性RNN、ディープ(深層学習)RNN]

<br>
**5. [Learning word representation, Notations, Embedding matrix, Word2vec, Skip-gram, Negative sampling, GloVe]**

⟶[単語表現の学習、ノーテーション、埋め込み行列、Word2vec、スキップグラム、ネガティブサンプリング、GloVe]

<br>
+ + +**6. [Comparing words, Cosine similarity, t-SNE]** + +⟶[単語の比較、コサイン類似度、t-SNE] + +
+ + +**7. [Language model, n-gram, Perplexity]** + +⟶[言語モデル、n-gramモデル、パープレキシティ] + +
+ + +**8. [Machine translation, Beam search, Length normalization, Error analysis, Bleu score]** + +⟶[機械翻訳、ビームサーチ、単語長の正規化、エラー分析、BLEUスコア(機械翻訳比較スコア)] + +
+ + +**9. [Attention, Attention model, Attention weights]** + +⟶[アテンション、アテンションモデル、アテンションウェイト] + +
+ + +**10. Overview** + +⟶概要 + +
+ + +**11. Architecture of a traditional RNN ― Recurrent neural networks, also known as RNNs, are a class of neural networks that allow previous outputs to be used as inputs while having hidden states. They are typically as follows:** + +⟶一般的なRNNのアーキテクチャ - RNNとして知られるリカレントニューラルネットワークは、隠れ層の状態を利用して、前の出力を次の入力として取り扱うことを可能にするニューラルネットワークの一種です。一般的なモデルは下記のようになります: + +
+ + +**12. For each timestep t, the activation a and the output y are expressed as follows:** + +⟶それぞれの時点 t において活性化関数の状態 a と出力 y は下記のように表現されます: + +
+ + +**13. and** + +⟶そして + +
+ + +**14. where Wax,Waa,Wya,ba,by are coefficients that are shared temporally and g1,g2 activation functions.** + +⟶ここで、Wax,Waa,Wya,ba,by は全ての時点で共有される係数であり、g1,g2 は活性化関数です。 + +
+ + +**15. The pros and cons of a typical RNN architecture are summed up in the table below:** + +⟶一般的なRNNのアーキテクチャ利用の長所・短所については下記の表にまとめられています。 + +
+ + +**16. [Advantages, Possibility of processing input of any length, Model size not increasing with size of input, Computation takes into account historical information, Weights are shared across time]** + +⟶[長所、任意の長さの入力の処理可能性、入力サイズに応じて大きくならないモデルサイズ、時系列情報を考慮した計算、全ての時点で共有される重み] + +
+ + +**17. [Drawbacks, Computation being slow, Difficulty of accessing information from a long time ago, Cannot consider any future input for the current state]** + +⟶[短所、遅い計算、長い時間軸での情報の利用の困難性、現在の状態から将来の入力が予測不可能] + +
**18. Applications of RNNs ― RNN models are mostly used in the fields of natural language processing and speech recognition. The different applications are summed up in the table below:**

⟶RNNの応用 - RNNモデルは主に自然言語処理と音声認識の分野で使用されます。さまざまな応用例が以下の表にまとめられています:

<br>
+ + +**19. [Type of RNN, Illustration, Example]** + +⟶[RNNの種類、図、例] + +
+ + +**20. [One-to-one, One-to-many, Many-to-one, Many-to-many]** + +⟶[一対一、一対多、多対一、多対多] + +
+ + +**21. [Traditional neural network, Music generation, Sentiment classification, Name entity recognition, Machine translation]** + +⟶[伝統的なニューラルネットワーク、音楽生成、感情分類、固有表現認識、機械翻訳] + +
+ + +**22. Loss function ― In the case of a recurrent neural network, the loss function L of all time steps is defined based on the loss at every time step as follows:** + +⟶損失関数 - リカレントニューラルネットワークの場合、時間軸全体での損失関数Lは、各時点での損失に基づき、次のように定義されます: + +
+ + +**23. Backpropagation through time ― Backpropagation is done at each point in time. At timestep T, the derivative of the loss L with respect to weight matrix W is expressed as follows:** + +⟶時間軸での誤差逆伝播法 - 誤差逆伝播(バックプロパゲーション)が各時点で行われます。時刻 T における、重み行列 W に関する損失 L の導関数は以下のように表されます: + +
+ + +**24. Handling long term dependencies** + +⟶長期依存関係の処理 + +
+ + +**25. Commonly used activation functions ― The most common activation functions used in RNN modules are described below:** + +⟶一般的に使用される活性化関数 - RNNモジュールで使用される最も一般的な活性化関数を以下に説明します: + +
+ + +**26. [Sigmoid, Tanh, RELU]** + +⟶[シグモイド、ハイパボリックタンジェント、RELU] + +
+ + +**27. Vanishing/exploding gradient ― The vanishing and exploding gradient phenomena are often encountered in the context of RNNs. The reason why they happen is that it is difficult to capture long term dependencies because of multiplicative gradient that can be exponentially decreasing/increasing with respect to the number of layers.** + +⟶勾配消失と勾配爆発について - 勾配消失と勾配爆発の現象は、RNNでよく見られます。これらの現象が起こる理由は、掛け算の勾配が層の数に対して指数関数的に減少/増加する可能性があるため、長期の依存関係を捉えるのが難しいからです。 + +
+ + +**28. Gradient clipping ― It is a technique used to cope with the exploding gradient problem sometimes encountered when performing backpropagation. By capping the maximum value for the gradient, this phenomenon is controlled in practice.** + +⟶勾配クリッピング - 誤差逆伝播法を実行するときに時折発生する勾配爆発問題に対処するために使用される手法です。勾配の上限値を定義することで、実際にこの現象が抑制されます。 + +
+ + +**29. clipped** + +⟶clipped + +
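A small sketch of gradient clipping by norm; capping at 5.0 is an arbitrary example of the maximum value mentioned above.

```python
import numpy as np

def clip_gradient(grad, max_value=5.0):
    norm = np.linalg.norm(grad)
    if norm > max_value:
        grad = grad * (max_value / norm)   # cap the gradient norm at max_value
    return grad
```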
+ + +**30. Types of gates ― In order to remedy the vanishing gradient problem, specific gates are used in some types of RNNs and usually have a well-defined purpose. They are usually noted Γ and are equal to:** + +⟶ゲートの種類 - 勾配消失問題を解決するために、特定のゲートがいくつかのRNNで使用され、通常明確に定義された目的を持っています。それらは通常Γと記され、以下のように定義されます: + +
+ + +**31. where W,U,b are coefficients specific to the gate and σ is the sigmoid function. The main ones are summed up in the table below:** + +⟶ここで、W、U、bはゲート固有の係数、σはシグモイド関数です。主なものは以下の表にまとめられています: + +
+ + +**32. [Type of gate, Role, Used in]** + +⟶[ゲートの種類、役割、下記で使用] + +
+ + +**33. [Update gate, Relevance gate, Forget gate, Output gate]** + +⟶[更新ゲート、関連ゲート、忘却ゲート、出力ゲート] + +
+ + +**34. [How much past should matter now?, Drop previous information?, Erase a cell or not?, How much to reveal of a cell?]** + +⟶[過去情報はどのくらい重要ですか?、前の情報を削除しますか?、セルを消去しますか?しませんか?、セルをどのくらい見せますか?] + +
+ + +**35. [LSTM, GRU]** + +⟶[LSTM、GRU] + +
+ + +**36. GRU/LSTM ― Gated Recurrent Unit (GRU) and Long Short-Term Memory units (LSTM) deal with the vanishing gradient problem encountered by traditional RNNs, with LSTM being a generalization of GRU. Below is a table summing up the characterizing equations of each architecture:** + +⟶GRU/LSTM - ゲート付きリカレントユニット(GRU)およびロングショートタームメモリユニット(LSTM)は、従来のRNNが直面した勾配消失問題を解決しようとします。LSTMはGRUを一般化したものです。各アーキテクチャを特徴づける式を以下の表にまとめます: + +
+ + +**37. [Characterization, Gated Recurrent Unit (GRU), Long Short-Term Memory (LSTM), Dependencies]** + +⟶[特徴づけ、ゲート付きリカレントユニット(GRU)、ロングショートタームメモリ(LSTM)、依存関係] + +
+ + +**38. Remark: the sign ⋆ denotes the element-wise multiplication between two vectors.** + +⟶備考:記号 ⋆ は2つのベクトル間の要素ごとの乗算を表します。 + +
+ + +**39. Variants of RNNs ― The table below sums up the other commonly used RNN architectures:** + +⟶RNNの変種 - 一般的に使用されている他のRNNアーキテクチャを以下の表にまとめます: + +
+ + +**40. [Bidirectional (BRNN), Deep (DRNN)]** + +⟶[双方向(BRNN)、ディープ(DRNN)] + +
+ + +**41. Learning word representation** + +⟶単語表現の学習 + +
+ + +**42. In this section, we note V the vocabulary and |V| its size.** + +⟶この節では、Vは語彙、そして|V|は語彙のサイズを表します。 + +
+ + +**43. Motivation and notations** + +⟶動機と表記 + +
+ + +**44. Representation techniques ― The two main ways of representing words are summed up in the table below:** + +⟶表現のテクニック - 単語を表現する2つの主な方法は、以下の表にまとめられています。 + +
+ + +**45. [1-hot representation, Word embedding]** + +⟶[1-hot表現、単語埋め込み(単語分散表現)] + +
+ + +**46. [teddy bear, book, soft]** + +⟶[テディベア、本、柔らかい] + +
+ + +**47. [Noted ow, Naive approach, no similarity information, Noted ew, Takes into account words similarity]** + +⟶[owの表記、素朴なアプローチ、類似性のない情報、ewの表記、単語の類似性の考慮] + +
+ + +**48. Embedding matrix ― For a given word w, the embedding matrix E is a matrix that maps its 1-hot representation ow to its embedding ew as follows:** + +⟶埋め込み行列(分散表現行列) - 与えられた単語wに対して、埋め込み行列Eは、1-hot表現owを以下のように埋め込み行列ewに写像します: + +
+ + +**49. Remark: learning the embedding matrix can be done using target/context likelihood models.** + +⟶注:埋め込み行列は、ターゲット/コンテキスト尤度モデルを使用して学習できます。 + +
+ + +**50. Word embeddings** + +⟶単語の埋め込み + +
+ + +**51. Word2vec ― Word2vec is a framework aimed at learning word embeddings by estimating the likelihood that a given word is surrounded by other words. Popular models include skip-gram, negative sampling and CBOW.** + +⟶Word2vec - Word2vecは、ある単語が他の単語の周辺にある可能性を推定することで、単語の埋め込みの重みを学習することを目的としたフレームワークです。人気のあるモデルは、スキップグラム、ネガティブサンプリング、およびCBOWです。 + +
+ + +**52. [A cute teddy bear is reading, teddy bear, soft, Persian poetry, art]** + +⟶[かわいいテディベアが読んでいる、テディベア、柔らかい、ペルシャ詩、芸術] + +
+ + +**53. [Train network on proxy task, Extract high-level representation, Compute word embeddings]** + +⟶[代理タスクでのネットワークの訓練、高水準表現の抽出、単語埋め込み重みの計算] + +
+ + +**54. Skip-gram ― The skip-gram word2vec model is a supervised learning task that learns word embeddings by assessing the likelihood of any given target word t happening with a context word c. By noting θt a parameter associated with t, the probability P(t|c) is given by:** + +⟶スキップグラム - スキップグラムword2vecモデルは、あるターゲット単語tがコンテキスト単語cと一緒に出現する確率を評価することで単語の埋め込みを学習する教師付き学習タスクです。tに関するパラメータをθtと表記すると、その確率P(t|c) は以下の式で与えられます: + +
+ + +**55. Remark: summing over the whole vocabulary in the denominator of the softmax part makes this model computationally expensive. CBOW is another word2vec model using the surrounding words to predict a given word.** + +⟶注:softmax部分の分母の語彙全体を合計するため、このモデルの計算コストは高くなります。 CBOWは、ある単語を予測するため周辺単語を使用する別のタイプのword2vecモデルです。 + +
+ + +**56. Negative sampling ― It is a set of binary classifiers using logistic regressions that aim at assessing how a given context and a given target words are likely to appear simultaneously, with the models being trained on sets of k negative examples and 1 positive example. Given a context word c and a target word t, the prediction is expressed by:** + +⟶ネガティブサンプリング - ロジスティック回帰を使用したバイナリ分類器のセットで、特定の文脈とあるターゲット単語が同時に出現する確率を評価することを目的としています。モデルはk個のネガティブな例と1つのポジティブな例のセットで訓練されます。コンテキスト単語cとターゲット単語tが与えられると、予測は次のように表現されます。 + +
+ + +**57. Remark: this method is less computationally expensive than the skip-gram model.** + +⟶注:この方法の計算コストは、スキップグラムモデルよりも少ないです。 + +
+ + +**57bis. GloVe ― The GloVe model, short for global vectors for word representation, is a word embedding technique that uses a co-occurence matrix X where each Xi,j denotes the number of times that a target i occurred with a context j. Its cost function J is as follows:** + +⟶GloVe - GloVeモデルは、単語表現のためのグローバルベクトルの略で、共起行列Xを使用する単語の埋め込み手法です。ここで、各Xi,jは、ターゲットiがコンテキストjで発生した回数を表します。そのコスト関数Jは以下の通りです: + +
**58. where f is a weighting function such that Xi,j=0⟹f(Xi,j)=0.
Given the symmetry that e and θ play in this model, the final word embedding e(final)w is given by:**

⟶ここで、fはXi,j =0⟹f(Xi,j)= 0となるような重み関数です。このモデルでeとθが果たす対称性を考えると、最後の単語の埋め込みe(final)wは以下のようになります:

<br>
+ + +**59. Remark: the individual components of the learned word embeddings are not necessarily interpretable.** + +⟶注:学習された単語の埋め込みの個々の要素は、必ずしも解釈可能ではありません。 + +
+ + +**60. Comparing words** + +⟶単語の比較 + +
+ + +**61. Cosine similarity ― The cosine similarity between words w1 and w2 is expressed as follows:** + +⟶コサイン類似度 - 単語w1とw2のコサイン類似度は次のように表されます + +
+ + +**62. Remark: θ is the angle between words w1 and w2.** + +⟶注:θは単語w1とw2の間の角度です。 + +
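A direct NumPy transcription of the cosine similarity between two embedding vectors, for illustration:

```python
import numpy as np

def cosine_similarity(w1, w2):
    # cos(theta) between the embeddings of two words
    return np.dot(w1, w2) / (np.linalg.norm(w1) * np.linalg.norm(w2))
```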
+ + +**63. t-SNE ― t-SNE (t-distributed Stochastic Neighbor Embedding) is a technique aimed at reducing high-dimensional embeddings into a lower dimensional space. In practice, it is commonly used to visualize word vectors in the 2D space.** + +⟶ t-SNE − t-SNE(t−分布型確率的近傍埋め込み)は、高次元埋め込みから低次元埋め込み空間への次元削減を目的とした手法です。実際には、2次元空間で単語ベクトルを視覚化するために使用されます。 + +
+ + +**64. [literature, art, book, culture, poem, reading, knowledge, entertaining, loveable, childhood, kind, teddy bear, soft, hug, cute, adorable]** + +⟶[文学、芸術、本、文化、詩、読書、知識、面白い、愛らしい、幼年期、親切、テディベア、柔らかい、抱擁、かわいい、愛らしい] + +
+ + +**65. Language model** + +⟶言語モデル + +
+ + +**66. Overview ― A language model aims at estimating the probability of a sentence P(y).** + +⟶概要 - 言語モデルは文の確率P(y)を推定することを目的としています。 + +
+ + +**67. n-gram model ― This model is a naive approach aiming at quantifying the probability that an expression appears in a corpus by counting its number of appearance in the training data.** + +⟶n-gramモデル - このモデルは、トレーニングデータでの出現数を数えることによって、ある表現がコーパスに出現する確率を定量化することを目的とした単純なアプローチです。 + +
+ + +**68. Perplexity ― Language models are commonly assessed using the perplexity metric, also known as PP, which can be interpreted as the inverse probability of the dataset normalized by the number of words T. The perplexity is such that the lower, the better and is defined as follows:** + +⟶パープレキシティ - 言語モデルは一般的に、PPとも呼ばれるパープレキシティメトリックを使用して評価されます。これは、単語数Tにより正規化されたデータセットの逆確率と解釈できます。パープレキシティは低いほど良く、次のように定義されます: +(訳注:パープレキシティの数値はより低いものがより選択しやすい単語として評価されます。10であれば10個の中から1つ、10000であれば10000個の中から1つ選択されます。) + +
+ + +**69. Remark: PP is commonly used in t-SNE.** + +⟶注:PPはt-SNEで一般的に使用されています。 + +
+ + +**70. Machine translation** + +⟶機械翻訳 + +
+ + +**71. Overview ― A machine translation model is similar to a language model except it has an encoder network placed before. For this reason, it is sometimes referred as a conditional language model. The goal is to find a sentence y such that:** + +⟶概要 - 機械翻訳モデルは、エンコーダーネットワークのロジックが最初に付加されている以外は、言語モデルと似ています。このため、条件付き言語モデルと呼ばれることもあります。目的は次のような文yを見つけることです: + +
+ + +**72. Beam search ― It is a heuristic search algorithm used in machine translation and speech recognition to find the likeliest sentence y given an input x.** + +⟶ビーム検索 - 入力xが与えられたとき最も可能性の高い文yを見つけるために、機械翻訳と音声認識で使用されるヒューリスティック探索アルゴリズムです。 + +
**73. [Step 1: Find top B likely words y<1>, Step 2: Compute conditional probabilities y<k>|x,y<1>,...,y<k−1>, Step 3: Keep top B combinations x,y<1>,...,y<k>, End process at a stop word]**

⟶[ステップ1:上位B個の高い確率を持つ単語y<1>を見つけ、ステップ2:条件付き確率y<k>|x,y<1>,...,y<k−1>を計算し、ステップ3:上位B個の組み合わせx,y<1>,...,y<k>を保持し、あるストップワードでプロセスを終了します]

<br>
+ + +**74. Remark: if the beam width is set to 1, then this is equivalent to a naive greedy search.** + +⟶注意:ビーム幅が1に設定されている場合、これは単純な貪欲法と同等です。 + +
+ + +**75. Beam width ― The beam width B is a parameter for beam search. Large values of B yield to better result but with slower performance and increased memory. Small values of B lead to worse results but is less computationally intensive. A standard value for B is around 10.** + +⟶ビーム幅 - ビーム幅Bはビーム検索のパラメータです。 Bの値を大きくするとより良い結果が得られますが、探索パフォーマンスは低下し、メモリ使用量が増加します。 Bの値が小さいと結果が悪くなりますが、計算量は少なくなります。 Bの標準値は10前後です。 + +
+ + +**76. Length normalization ― In order to improve numerical stability, beam search is usually applied on the following normalized objective, often called the normalized log-likelihood objective, defined as:** + +⟶文章の長さの正規化 - 数値の安定性を向上させるために、ビーム検索は通常、正規化(対数尤度正規化)された目的関数に対して適用され、次のように定義されます: + +
+ + +**77. Remark: the parameter α can be seen as a softener, and its value is usually between 0.5 and 1.** + +⟶注:パラメータαは緩衝パラメータと見なされ、その値は通常、0.5から1の間です。 + +
+ + +**78. Error analysis ― When obtaining a predicted translation ˆy that is bad, one can wonder why we did not get a good translation y∗ by performing the following error analysis:** + +⟶エラー分析 - 予測されたˆyの翻訳が良くない場合、以下のようなエラー分析を実行することで、なぜy∗のような良い翻訳を得られなかったのか考えることが可能です: + +
+ + +**79. [Case, Root cause, Remedies]** + +⟶[症例、根本原因、改善策] + +
+ + +**80. [Beam search faulty, RNN faulty, Increase beam width, Try different architecture, Regularize, Get more data]** + +⟶[ビーム検索の誤り、RNNの誤り、ビーム幅の拡大、さまざまなアーキテクチャを試す、正則化、データをさらに取得] + +
+ + +**81. Bleu score ― The bilingual evaluation understudy (bleu) score quantifies how good a machine translation is by computing a similarity score based on n-gram precision. It is defined as follows:** + +⟶Bleuスコア - Bleu(Bilingual evaluation understudy)スコアは、n-gramの精度に基づき類似性スコアを計算することで、機械翻訳がどれほど優れているかを定量化します。以下のように定義されています: + +
+ + +**82. where pn is the bleu score on n-gram only defined as follows:** + +⟶ここで、pnはn-gramでのbleuスコアで下記のようにだけ定義されています: + +
+ + +**83. Remark: a brevity penalty may be applied to short predicted translations to prevent an artificially inflated bleu score.** + +⟶注:人為的に水増しされたブルースコアを防ぐために、短い翻訳評価には簡潔さへのペナルティが適用される場合があります。 + +
+ + +**84. Attention** + +⟶アテンション + +
+ + +**85. Attention model ― This model allows an RNN to pay attention to specific parts of the input that is considered as being important, which improves the performance of the resulting model in practice. By noting α the amount of attention that the output y should pay to the activation a and c the context at time t, we have:** + +⟶アテンションモデル - このモデルを使用するとRNNは重要であると考えられる入力の特定部分に注目することができ、得られるモデルの性能が実際に向上します。時刻tにおいて、出力yが活性化関数aとコンテキストcとに払うべき注意量をαと表記すると次のようになります: + +
+ + +**86. with** + +⟶および + +
+ + +**87. Remark: the attention scores are commonly used in image captioning and machine translation.** + +⟶注:アテンションスコアは、一般的に画像のキャプション作成および機械翻訳で使用されています。 + +
+ + +**88. A cute teddy bear is reading Persian literature.** + +⟶かわいいテディベアがペルシャ文学を読んでいます。 + +
+ + +**89. Attention weight ― The amount of attention that the output y should pay to the activation a is given by α computed as follows:** + +⟶アテンションの重み - 出力yが活性化関数aに払うべき注意量αは次のように計算されます。 + +
+ + +**90. Remark: computation complexity is quadratic with respect to Tx.** + +⟶注:この計算の複雑さはTxに関して2次です。 + +
+ + +**91. The Deep Learning cheatsheets are now available in [target language].** + +⟶ディープラーニングのチートシートが[日本語]で利用可能になりました。 + +
+ +**92. Original authors** + +⟶原著者 + +
+ +**93. Translated by X, Y and Z** + +⟶X・Y・Z 訳 + +
+ +**94. Reviewed by X, Y and Z** + +⟶X・Y・Z 校正 + +
+ +**95. View PDF version on GitHub** + +⟶GitHubでPDF版を見る + +
+ +**96. By X and Y** + +⟶X・Y 著 + +
diff --git a/ko/cs-229-linear-algebra.md b/ko/cs-229-linear-algebra.md new file mode 100644 index 000000000..2342a1619 --- /dev/null +++ b/ko/cs-229-linear-algebra.md @@ -0,0 +1,340 @@ +**1. Linear Algebra and Calculus refresher** + +⟶ 선형대수와 미적분학 복습 + +
+ +**2. General notations** + +⟶ 일반적인 표기법 + +
+ +**3. Definitions** + +⟶ 정의 + +
+ +**4. Vector ― We note x∈Rn a vector with n entries, where xi∈R is the ith entry:** + +⟶ 벡터 - x∈Rn는 n개의 요소를 가진 벡터이고, xi∈R는 i번째 요소이다. + +
+ +**5. Matrix ― We note A∈Rm×n a matrix with m rows and n columns, where Ai,j∈R is the entry located in the ith row and jth column:** + +⟶ 행렬 - A∈Rm×n는 m개의 행과 n개의 열을 가진 행렬이고, Ai,j∈R는 i번째 행, j번째 열에 있는 원소이다. + +
+ +**6. Remark: the vector x defined above can be viewed as a n×1 matrix and is more particularly called a column-vector.** + +⟶ 비고 : 위에서 정의된 벡터 x는 n×1행렬로 볼 수 있으며, 열벡터라고도 불린다. + +
+ +**7. Main matrices** + +⟶ 주요 행렬 + +
+ +**8. Identity matrix ― The identity matrix I∈Rn×n is a square matrix with ones in its diagonal and zero everywhere else:** + +⟶ 단위행렬 - 단위행렬 I∈Rn×n는 대각성분이 모두 1이고 대각성분이 아닌 성분은 모두 0인 정사각행렬이다. + +
+ +**9. Remark: for all matrices A∈Rn×n, we have A×I=I×A=A.** + +⟶ 비고 : 모든 행렬 A∈Rn×n에 대하여, A×I=I×A=A를 만족한다. + +
+ +**10. Diagonal matrix ― A diagonal matrix D∈Rn×n is a square matrix with nonzero values in its diagonal and zero everywhere else:** + +⟶ 대각행렬 - 대각행렬 D∈Rn×n는 대각성분은 모두 0이 아니고, 대각성분이 아닌 성분은 모두 0인 정사각행렬이다. + +
+ +**11. Remark: we also note D as diag(d1,...,dn).** + +⟶ 비고 : D를 diag(d1,...,dn)라고도 표시한다. + +
+ +**12. Matrix operations** + +⟶ 행렬 연산 + +
+ +**13. Multiplication** + +⟶ 곱셈 + +
+ +**14. Vector-vector ― There are two types of vector-vector products:** + +⟶ 벡터-벡터 – 벡터 간 연산에는 두 가지 종류가 있다. + +
+ +**15. inner product: for x,y∈Rn, we have:** + +⟶ 내적 : x,y∈Rn에 대하여, + +
+ +**16. outer product: for x∈Rm,y∈Rn, we have:** + +⟶ 외적 : x∈Rm,y∈Rn에 대하여, + +
+ +**17. Matrix-vector ― The product of matrix A∈Rm×n and vector x∈Rn is a vector of size Rn, such that:** + +⟶ 행렬-벡터 - 행렬 A∈Rm×n와 벡터 x∈Rn의 곱은 다음을 만족하는 Rn크기의 벡터이다. + +
+ +**18. where aTr,i are the vector rows and ac,j are the vector columns of A, and xi are the entries of x.** + +⟶ aTr,i는 A의 벡터행, ac,j는 A의 벡터열, xi는 x의 성분이다. + +
+ +**19. Matrix-matrix ― The product of matrices A∈Rm×n and B∈Rn×p is a matrix of size Rn×p, such that:** + +⟶ 행렬-행렬 - 행렬 A∈Rm×n와 행렬 B∈Rn×p의 곱은 다음을 만족하는 Rn×p크기의 행렬이다. + +
+ +**20. where aTr,i,bTr,i are the vector rows and ac,j,bc,j are the vector columns of A and B respectively** + +⟶ aTr,i,bTr,i는 A,B의 벡터행, ac,j,bc,j는 A,B의 벡터열이다. + +
+ +**21. Other operations** + +⟶ 그 외 연산 + +
+ +**22. Transpose ― The transpose of a matrix A∈Rm×n, noted AT, is such that its entries are flipped:** + +⟶ 전치 - 행렬 A∈Rm×n의 전치 AT는 모든 성분을 뒤집은 것이다. + +
+ +**23. Remark: for matrices A,B, we have (AB)T=BTAT** + +⟶ 비고 - 행렬 A,B에 대하여, (AB)T=BTAT가 성립힌다. + +
+ +**24. Inverse ― The inverse of an invertible square matrix A is noted A−1 and is the only matrix such that:** + +⟶ 역행렬 - 가역행렬 A의 역행렬은 A-1로 표기하며, 유일하다. + +
+ +**25. Remark: not all square matrices are invertible. Also, for matrices A,B, we have (AB)−1=B−1A−1** + +⟶ 모든 정사각행렬이 역행렬을 갖는 것은 아니다. 그리고, 행렬 A,B에 대하여 (AB)−1=B−1A−1가 성립힌다. + +
+ +**26. Trace ― The trace of a square matrix A, noted tr(A), is the sum of its diagonal entries:** + +⟶ 대각합 – 정사각행렬 A의 대각합 tr(A)는 대각성분의 합이다. + +
+ +**27. Remark: for matrices A,B, we have tr(AT)=tr(A) and tr(AB)=tr(BA)** + +⟶ 비고 : 행렬 A,B에 대하여, tr(AT)=tr(A)와 tr(AB)=tr(BA)가 성립힌다. + +
+ +**28. Determinant ― The determinant of a square matrix A∈Rn×n, noted |A| or det(A) is expressed recursively in terms of A∖i,∖j, which is the matrix A without its ith row and jth column, as follows:** + +⟶ 행렬식 - 정사각행렬 A∈Rn×n의 행렬식 |A| 또는 det(A)는 i번째 행과 j번째 열이 없는 행렬 A인 A∖i,∖j에 대해 재귀적으로 표현된다. + +
+ +**29. Remark: A is invertible if and only if |A|≠0. Also, |AB|=|A||B| and |AT|=|A|.** + +⟶ 비고 : A가 가역일 필요충분조건은 |A|≠0이다. 또한 |AB|=|A||B|와 |AT|=|A|도 그렇다. + +
+ +**30. Matrix properties** + +⟶ 행렬의 성질 + +
+ +**31. Definitions** + +⟶ 정의 + +
+ +**32. Symmetric decomposition ― A given matrix A can be expressed in terms of its symmetric and antisymmetric parts as follows:** + +⟶ 대칭 분해 - 주어진 행렬 A는 다음과 같이 대칭과 비대칭 부분으로 표현될 수 있다. + +
+ +**33. [Symmetric, Antisymmetric]** + +⟶ [대칭, 비대칭] + +
+ +**34. Norm ― A norm is a function N:V⟶[0,+∞] where V is a vector space, and such that for all x,y∈V, we have:** + +⟶ 노름 – V는 벡터공간일 때, 노름은 모든 x,y∈V에 대해 다음을 만족하는 함수 N:V⟶[0,+∞]이다. + +
+ +**35. N(ax)=|a|N(x) for a scalar** + +⟶ scalar a에 대해서 N(ax)=|a|N(x)를 만족한다. + +
+ +**36. if N(x)=0, then x=0** + +⟶ N(x)=0이면 x=0이다. + +
+ +**37. For x∈V, the most commonly used norms are summed up in the table below:** + +⟶ x∈V에 대해, 가장 일반적으로 사용되는 규범이 아래 표에 요약되어 있다. + +
+ +**38. [Norm, Notation, Definition, Use case]** + +⟶ [규범, 표기법, 정의, 유스케이스] + +
+ +**39. Linearly dependence ― A set of vectors is said to be linearly dependent if one of the vectors in the set can be defined as a linear combination of the others.** + +⟶ 일차 종속 - 집합 내의 벡터 중 하나가 다른 벡터들의 선형결합으로 정의될 수 있으면, 그 벡터 집합은 일차 종속이라고 한다. + +
+ +**40. Remark: if no vector can be written this way, then the vectors are said to be linearly independent** + +⟶ 비고 : 어느 벡터도 이런 방식으로 표현될 수 없다면, 그 벡터들은 일차 독립이라고 한다. + +
+ +**41. Matrix rank ― The rank of a given matrix A is noted rank(A) and is the dimension of the vector space generated by its columns. This is equivalent to the maximum number of linearly independent columns of A.** + +⟶ 행렬 랭크 - 주어진 행렬 A의 랭크는 열에 의해 생성된 벡터공간의 차원이고, rank(A)라고 쓴다. 이는 A의 선형독립인 열의 최대 수와 동일하다. + +
+ +**42. Positive semi-definite matrix ― A matrix A∈Rn×n is positive semi-definite (PSD) and is noted A⪰0 if we have:** + +⟶ 양의 준정부호 행렬 – 행렬 A∈Rn×n는 다음을 만족하면 양의 준정부호(PSD)라고 하고 A⪰0라고 쓴다. + +
+ +**43. Remark: similarly, a matrix A is said to be positive definite, and is noted A≻0, if it is a PSD matrix which satisfies for all non-zero vector x, xTAx>0.** + +⟶ 비고 : 마찬가지로 PSD 행렬이 모든 0이 아닌 벡터 x에 대하여 xTAx>0를 만족하면 행렬 A를 양의 정부호라고 말하고 A≻0라고 쓴다. + +
+ +**44. Eigenvalue, eigenvector ― Given a matrix A∈Rn×n, λ is said to be an eigenvalue of A if there exists a vector z∈Rn∖{0}, called eigenvector, such that we have:** + +⟶ 고유값, 고유벡터 - 주어진 행렬 A∈Rn×n에 대하여, 다음을 만족하는 벡터 z∈Rn∖{0}가 존재하면, z를 고유벡터라고 부르고, λ를 A의 고유값이라고 부른다. + +
+ +**45. Spectral theorem ― Let A∈Rn×n. If A is symmetric, then A is diagonalizable by a real orthogonal matrix U∈Rn×n. By noting Λ=diag(λ1,...,λn), we have:** + +⟶ 스펙트럼 정리 – A∈Rn×n라고 하자. A가 대칭이면, A는 실수 직교행렬 U∈Rn×n에 의해 대각화 가능하다. Λ=diag(λ1,...,λn)인 것에 주목하면, 다음을 만족한다. + +
+ +**46. diagonal** + +⟶ 대각 + +
+ +**47. Singular-value decomposition ― For a given matrix A of dimensions m×n, the singular-value decomposition (SVD) is a factorization technique that guarantees the existence of U m×m unitary, Σ m×n diagonal and V n×n unitary matrices, such that:** + +⟶ 특이값 분해 – 주어진 m×n차원 행렬 A에 대하여, 특이값 분해(SVD)는 다음과 같이 U m×m 유니터리와 Σ m×n 대각 및 V n×n 유니터리 행렬의 존재를 보증하는 인수분해 기술이다. + +
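As an illustration of the factorization A = UΣV^T, a small NumPy check on an arbitrary 3×2 matrix (the example matrix is made up):

```python
import numpy as np

A = np.array([[3.0, 1.0], [1.0, 3.0], [0.0, 2.0]])   # m x n with m=3, n=2
U, s, Vt = np.linalg.svd(A)                           # U: m x m, s: singular values, Vt: n x n
Sigma = np.zeros(A.shape)
Sigma[:len(s), :len(s)] = np.diag(s)                  # m x n diagonal matrix
assert np.allclose(A, U @ Sigma @ Vt)                 # A = U Sigma V^T
```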
+ +**48. Matrix calculus** + +⟶ 행렬 미적분 + +
+ +**49. Gradient ― Let f:Rm×n→R be a function and A∈Rm×n be a matrix. The gradient of f with respect to A is a m×n matrix, noted ∇Af(A), such that:** + +⟶ 그라디언트 – f:Rm×n→R는 함수이고 A∈Rm×n는 행렬이라 하자. A에 대한 f의 그라디언트 ∇Af(A)는 다음을 만족하는 m×n 행렬이다. + +
**50. Remark: the gradient of f is only defined when f is a function that returns a scalar.**

⟶ 비고 : f의 그라디언트는 f가 스칼라를 반환하는 함수일 때만 정의된다.

<br>
+ +**51. Hessian ― Let f:Rn→R be a function and x∈Rn be a vector. The hessian of f with respect to x is a n×n symmetric matrix, noted ∇2xf(x), such that:** + +⟶ 헤시안 – f:Rn→R는 함수이고 x∈Rn는 벡터라고 하자. x에 대한 f의 헤시안 ∇2xf(x)는 다음을 만족하는 n×n 대칭행렬이다. + +
+ +**52. Remark: the hessian of f is only defined when f is a function that returns a scalar** + +⟶ 비고 : f의 헤시안은 f가 스칼라를 반환하는 함수일 때만 정의된다. + +
+ +**53. Gradient operations ― For matrices A,B,C, the following gradient properties are worth having in mind:** + +⟶ 그라디언트 연산 – 행렬 A,B,C에 대하여, 다음 그라디언트 성질을 염두해두는 것이 좋다. + +
+ +**54. [General notations, Definitions, Main matrices]** + +⟶ [일반적인 표기법, 정의, 주요 행렬] + +
+ +**55. [Matrix operations, Multiplication, Other operations]** + +⟶ [행렬 연산, 곱셈, 다른 연산] + +
+ +**56. [Matrix properties, Norm, Eigenvalue/Eigenvector, Singular-value decomposition]** + +⟶ [행렬 성질, 노름, 고유값/고유벡터, 특이값 분해] + +
+ +**57. [Matrix calculus, Gradient, Hessian, Operations]** + +⟶ [행렬 미적분, 그라디언트, 헤시안, 연산] + diff --git a/ko/cs-229-machine-learning-tips-and-tricks.md b/ko/cs-229-machine-learning-tips-and-tricks.md new file mode 100644 index 000000000..d6732e145 --- /dev/null +++ b/ko/cs-229-machine-learning-tips-and-tricks.md @@ -0,0 +1,285 @@ +**1. Machine Learning tips and tricks cheatsheet** + +⟶머신러닝 팁과 트릭 치트시트 + +
+ +**2. Classification metrics** + +⟶분류 측정 항목 + +
+ +**3. In a context of a binary classification, here are the main metrics that are important to track in order to assess the performance of the model.** + +⟶이진 분류 상황에서 모델의 성능을 평가하기 위해 눈 여겨 봐야하는 주요 측정 항목이 여기에 있습니다. + +
+ +**4. Confusion matrix ― The confusion matrix is used to have a more complete picture when assessing the performance of a model. It is defined as follows:** + +⟶혼동 행렬 ― 혼동 행렬은 모델의 성능을 평가할 때, 보다 큰 그림을 보기위해 사용됩니다. 이는 다음과 같이 정의됩니다. + +
+ +**5. [Predicted class, Actual class]** + +⟶[예측된 클래스, 실제 클래스] + +
+ +**6. Main metrics ― The following metrics are commonly used to assess the performance of classification models:** + +⟶주요 측정 항목들 ― 다음 측정 항목들은 주로 분류 모델의 성능을 평가할 때 사용됩니다. + +
+ +**7. [Metric, Formula, Interpretation]** + +⟶[측정 항목, 공식, 해석] + +
+ +**8. Overall performance of model** + +⟶전반적인 모델의 성능 + +
+ +**9. How accurate the positive predictions are** + +⟶예측된 양성이 정확한 정도 + +
+ +**10. Coverage of actual positive sample** + +⟶실제 양성의 예측 정도 + +
+ +**11. Coverage of actual negative sample** + +⟶실제 음성의 예측 정도 + +
+ +**12. Hybrid metric useful for unbalanced classes** + +⟶불균형 클래스에 유용한 하이브리드 측정 항목 + +
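To tie the formulas together, a small sketch computing the main metrics from the confusion-matrix counts (true/false positives and negatives); it assumes none of the denominators is zero.

```python
def classification_metrics(tp, fp, tn, fn):
    accuracy  = (tp + tn) / (tp + fp + tn + fn)   # overall performance of the model
    precision = tp / (tp + fp)                    # how accurate the positive predictions are
    recall    = tp / (tp + fn)                    # coverage of the actual positive samples
    f1        = 2 * precision * recall / (precision + recall)  # hybrid metric for unbalanced classes
    return accuracy, precision, recall, f1
```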
+ +**13. ROC ― The receiver operating curve, also noted ROC, is the plot of TPR versus FPR by varying the threshold. These metrics are are summed up in the table below:** + +⟶ROC(Receiver Operating Curve) ― ROC 곡선은 임계값의 변화에 따른 TPR 대 FPR의 플롯입니다. 이 측정 항목은 아래 표에 요약되어 있습니다: + +
+ +**14. [Metric, Formula, Equivalent]** + +⟶[측정 항목, 공식, 같은 측도] + +
+ +**15. AUC ― The area under the receiving operating curve, also noted AUC or AUROC, is the area below the ROC as shown in the following figure:** + +⟶AUC(Area Under the receiving operating Curve) ― AUC 또는 AUROC라고도 하는 이 측정 항목은 다음 그림과 같이 ROC 곡선 아래의 영역입니다: + +
+ +**16. [Actual, Predicted]** + +⟶[실제값, 예측된 값] + +
+ +**17. Basic metrics ― Given a regression model f, the following metrics are commonly used to assess the performance of the model:** + +⟶기본 측정 항목 ― 회귀 모델 f가 주어졌을때, 다음의 측정 항목들은 모델의 성능을 평가할 때 주로 사용됩니다: + +
+ +**18. [Total sum of squares, Explained sum of squares, Residual sum of squares]** + +⟶[총 제곱합, 설명된 제곱합, 잔차 제곱합] + +
+ +**19. Coefficient of determination ― The coefficient of determination, often noted R2 or r2, provides a measure of how well the observed outcomes are replicated by the model and is defined as follows:** + +⟶결정 계수 ― 종종 R2 또는 r2로 표시되는 결정 계수는 관측된 결과가 모델에 의해 얼마나 잘 재현되는지를 측정하는 측도로서 다음과 같이 정의됩니다: + +
+ +**20. Main metrics ― The following metrics are commonly used to assess the performance of regression models, by taking into account the number of variables n that they take into consideration:** + +⟶주요 측정 항목들 ― 다음 측정 항목들은 주로 변수의 수를 고려하여 회귀 모델의 성능을 평가할 때 사용됩니다: + +
+ +**21. where L is the likelihood and ˆσ2 is an estimate of the variance associated with each response.** + +⟶여기서 L은 가능도이고 ^σ2는 각각의 반응과 관련된 분산의 추정값입니다. + +
+ +**22. Model selection** + +⟶모델 선택 + +
+ +**23. Vocabulary ― When selecting a model, we distinguish 3 different parts of the data that we have as follows:** + +⟶어휘 ― 모델을 선택할 때 우리는 다음과 같이 가지고 있는 데이터를 세 부분으로 구분합니다: + +
+ +**24. [Training set, Validation set, Testing set]** + +⟶[학습 세트, 검증 세트, 테스트 세트] + +
+ +**25. [Model is trained, Model is assessed, Model gives predictions]** + +⟶[모델 훈련, 모델 평가, 모델 예측] + +
+ +**26. [Usually 80% of the dataset, Usually 20% of the dataset]** + +⟶[주로 데이터 세트의 80%, 주로 데이터 세트의 20%] + +
+ +**27. [Also called hold-out or development set, Unseen data]** + +⟶[홀드아웃 또는 개발 세트라고도하는, 보지 않은 데이터] + +
+ +**28. Once the model has been chosen, it is trained on the entire dataset and tested on the unseen test set. These are represented in the figure below:** + +⟶모델이 선택되면 전체 데이터 세트에 대해 학습을 하고 보지 않은 데이터에서 테스트합니다. 이는 아래 그림에 나타나있습니다. + +
+ +**29. Cross-validation ― Cross-validation, also noted CV, is a method that is used to select a model that does not rely too much on the initial training set. The different types are summed up in the table below:** + +⟶교차-검증 ― CV라고도하는 교차-검증은 초기의 학습 세트에 지나치게 의존하지 않는 모델을 선택하는데 사용되는 방법입니다. 다양한 유형이 아래 표에 요약되어 있습니다: + +
+ +**30. [Training on k−1 folds and assessment on the remaining one, Training on n−p observations and assessment on the p remaining ones]** + +⟶[k-1 폴드에 대한 학습과 나머지 1폴드에 대한 평가, n-p개 관측치에 대한 학습과 나머지 p개 관측치에 대한 평가] + +
+ +**31. [Generally k=5 or 10, Case p=1 is called leave-one-out]** + +⟶[일반적으로 k=5 또는 10, p=1인 케이스는 leave-one-out] + +
+ +**32. The most commonly used method is called k-fold cross-validation and splits the training data into k folds to validate the model on one fold while training the model on the k−1 other folds, all of this k times. The error is then averaged over the k folds and is named cross-validation error.** + +⟶가장 일반적으로 사용되는 방법은 k-폴드 교차-검증이라고하며 이는 학습 데이터를 k개의 폴드로 분할하고, 그 중 k-1개의 폴드로 모델을 학습하는 동시에 나머지 1개의 폴드로 모델을 검증합니다. 이 작업을 k번 수행합니다. 오류는 k 폴드에 대해 평균화되고 교차-검증 오류라고 부릅니다. + +
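A possible sketch of k-fold cross-validation; `model_fn` is a hypothetical factory returning an object with scikit-learn-style `fit` and `predict` methods, X and y are assumed to be NumPy arrays, and the misclassification rate is the assumed error measure.

```python
import numpy as np

def k_fold_cv_error(model_fn, X, y, k=5):
    folds = np.array_split(np.random.permutation(len(X)), k)
    errors = []
    for i in range(k):
        val_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        model = model_fn()
        model.fit(X[train_idx], y[train_idx])                            # train on k-1 folds
        errors.append(np.mean(model.predict(X[val_idx]) != y[val_idx]))  # assess on the held-out fold
    return np.mean(errors)                                               # cross-validation error
```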
+ +**33. Regularization ― The regularization procedure aims at avoiding the model to overfit the data and thus deals with high variance issues. The following table sums up the different types of commonly used regularization techniques:** + +⟶정규화 ― 정규화 절차는 데이터에 대한 모델의 과적합을 피하고 분산이 커지는 문제를 처리하는 것을 목표로 합니다. 다음의 표는 일반적으로 사용되는 정규화 기법의 여러 유형을 요약한 것입니다: + +
+ +**34. [Shrinks coefficients to 0, Good for variable selection, Makes coefficients smaller, Tradeoff between variable selection and small coefficients]** + +⟶[계수를 0으로 축소, 변수 선택에 좋음, 계수를 작게 함, 변수 선택과 작은 계수 간의 트래이드오프] + +
+ +**35. Diagnostics** + +⟶진단 + +
+ +**36. Bias ― The bias of a model is the difference between the expected prediction and the correct model that we try to predict for given data points.** + +⟶편향 ― 모델의 편향은 기대되는 예측과 주어진 데이터 포인트에 대해 예측하려고하는 올바른 모델 간의 차이입니다. + +
+ +**37. Variance ― The variance of a model is the variability of the model prediction for given data points.** + +⟶분산 ― 모델의 분산은 주어진 데이터 포인트에 대한 모델 예측의 가변성입니다. + +
+ +**38. Bias/variance tradeoff ― The simpler the model, the higher the bias, and the more complex the model, the higher the variance.** + +⟶편향/분산 트래이드오프 ― 모델이 간단할수록 편향이 높아지고 모델이 복잡할수록 분산이 커집니다. + +
+ +**39. [Symptoms, Regression illustration, classification illustration, deep learning illustration, possible remedies]** + +⟶[증상, 회귀 일러스트레이션, 분류 일러스트레이션, 딥러닝 일러스트레이션, 가능한 처리방법] + +
+ +**40. [High training error, Training error close to test error, High bias, Training error slightly lower than test error, Very low training error, Training error much lower than test error, High variance]** + +⟶[높은 학습 오류, 테스트 오류에 가까운 학습 오류, 높은 편향, 테스트 에러 보다 약간 낮은 학습 오류, 매우 낮은 학습 오류, 테스트 오류보다 훨씬 낮은 학습 오류, 높은 분산] + +
+ +**41. [Complexify model, Add more features, Train longer, Perform regularization, Get more data]** + +⟶[모델 복잡화, 특징 추가, 학습 증대, 정규화 수행, 추가 데이터 수집] + +
+ +**42. Error analysis ― Error analysis is analyzing the root cause of the difference in performance between the current and the perfect models.** + +⟶오류 분석 ― 오류 분석은 현재 모델과 완벽한 모델 간의 성능 차이의 근본 원인을 분석합니다. + +
+ +**43. Ablative analysis ― Ablative analysis is analyzing the root cause of the difference in performance between the current and the baseline models.** + +⟶애블러티브 분석 ― 애블러티브 분석은 현재 모델과 베이스라인 모델 간의 성능 차이의 근본 원인을 분석합니다. + +
+ +**44. Regression metrics** + +⟶회귀 측정 항목 + +
+ +**45. [Classification metrics, confusion matrix, accuracy, precision, recall, F1 score, ROC]** + +⟶[분류 측정 항목, 혼동 행렬, 정확도, 정밀도, 리콜, F1 스코어, ROC] + +
+ +**46. [Regression metrics, R squared, Mallow's CP, AIC, BIC]** + +⟶[회귀 측정 항목, R 스퀘어, 맬로우의 CP, AIC, BIC] + +
+ +**47. [Model selection, cross-validation, regularization]** + +⟶[모델 선택, 교차-검증, 정규화] + +
+ +**48. [Diagnostics, Bias/variance tradeoff, error/ablative analysis]** + +⟶[진단, 편향/분산 트래이드오프, 오류/애블러티브 분석] diff --git a/ko/cs-229-probability.md b/ko/cs-229-probability.md new file mode 100644 index 000000000..53ec90c53 --- /dev/null +++ b/ko/cs-229-probability.md @@ -0,0 +1,381 @@ + +**1. Probabilities and Statistics refresher** + +⟶확률과 통계 + +
+ +**2. Introduction to Probability and Combinatorics** + +⟶확률과 조합론 소개 + +
+ +**3. Sample space ― The set of all possible outcomes of an experiment is known as the sample space of the experiment and is denoted by S.** + +⟶표본 공간 ― 시행의 가능한 모든 결과 집합은 시행의 표본 공간으로 알려져 있으며 S로 표기합니다. + +
+ +**4. Event ― Any subset E of the sample space is known as an event. That is, an event is a set consisting of possible outcomes of the experiment. If the outcome of the experiment is contained in E, then we say that E has occurred.** + +⟶사건 ― 표본 공간의 모든 부분 집합 E를 사건이라고 합니다. 즉, 사건은 시행 가능한 결과로 구성된 집합입니다. 시행 결과가 E에 포함된다면, E가 발생했다고 이야기합니다. + +
+ +**5. Axioms of probability ― For each event E, we denote P(E) as the probability of event E occuring.** + +⟶확률의 공리 ― 각 사건 E에 대하여, 우리는 사건 E가 발생할 확률을 P(E)로 나타냅니다. + +
+ +**6. Axiom 1 ― Every probability is between 0 and 1 included, i.e:** + +⟶공리 1 ― 모든 확률은 0과 1사이에 포함됩니다, 즉: + +
+ +**7. Axiom 2 ― The probability that at least one of the elementary events in the entire sample space will occur is 1, i.e:** + +⟶공리 2 ― 전체 표본 공간에서 적어도 하나의 근원 사건이 발생할 확률은 1입니다. 즉: + +
+ +**8. Axiom 3 ― For any sequence of mutually exclusive events E1,...,En, we have:** + +⟶공리 3 ― 서로 배반인 어떤 연속적인 사건 E1,...,En 에 대하여, 우리는 다음을 가집니다: + +
+ +**9. Permutation ― A permutation is an arrangement of r objects from a pool of n objects, in a given order. The number of such arrangements is given by P(n,r), defined as:** + +⟶순열(Permutation) ― 순열은 n개의 객체들로부터 r개의 객체들의 순서를 고려한 배열입니다. 그러한 배열의 수는 P (n, r)에 의해 주어지며, 다음과 같이 정의됩니다: + +
+ +**10. Combination ― A combination is an arrangement of r objects from a pool of n objects, where the order does not matter. The number of such arrangements is given by C(n,r), defined as:** + +⟶조합(Combination) ― 조합은 n개의 객체들로부터 r개의 객체들의 순서를 고려하지 않은 배열입니다. 그러한 배열의 수는 다음과 같이 정의되는 C(n, r)에 의해 주어집니다: + +
+ +**11. Remark: we note that for 0⩽r⩽n, we have P(n,r)⩾C(n,r)** + +⟶비고 :우리는 for 0⩽r⩽n에 대해, P(n,r)⩾C(n,r)를 가집니다. + +
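A quick numerical check of the two counting formulas above, written with the standard library for illustration:

```python
from math import factorial

def P(n, r):
    return factorial(n) // factorial(n - r)                   # ordered arrangements

def C(n, r):
    return factorial(n) // (factorial(r) * factorial(n - r))  # unordered selections

assert P(5, 2) == 20 and C(5, 2) == 10                        # and indeed P(n,r) >= C(n,r)
```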
+ +**12. Conditional Probability** + +⟶조건부 확률 + +
+ +**13. Bayes' rule ― For events A and B such that P(B)>0, we have:** + +⟶베이즈 규칙 ― P(B)>0인 사건 A, B에 대해, 우리는 다음을 가집니다: + +
+ +**14. Remark: we have P(A∩B)=P(A)P(B|A)=P(A|B)P(B)** + +⟶비고 :우리는 P(A∩B)=P(A)P(B|A)=P(A|B)P(B)를 가집니다. + +
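A tiny worked example of Bayes' rule with made-up numbers (all probabilities below are illustrative assumptions, not values from the text):

```python
p_b_given_a = 0.99                             # assumed P(B|A)
p_a = 0.01                                     # assumed P(A)
p_b = p_b_given_a * p_a + 0.05 * (1 - p_a)     # P(B) via total probability (0.05 = assumed P(B|not A))
p_a_given_b = p_b_given_a * p_a / p_b          # Bayes' rule: P(A|B) = P(B|A)P(A)/P(B)
print(round(p_a_given_b, 3))                   # ≈ 0.167
```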
+ +**15. Partition ― Let {Ai,i∈[[1,n]]} be such that for all i, Ai≠∅. We say that {Ai} is a partition if we have:** + +⟶파티션(Partition)― {Ai, i∈ [[1, n]]}은 모든 i에 대해 Ai ≠ ∅이라고 해봅시다. 우리는 {Ai}가 다음과 같은 경우 파티션이라고 말합니다. + +
+ +**16. Remark: for any event B in the sample space, we have P(B)=n∑i=1P(B|Ai)P(Ai).** + +⟶비고 : 표본 공간에서 어떤 사건 B에 대해서 우리는 P(B) = nΣi = 1P (B | Ai) P (Ai)를 가집니다. + +
+ +**17. Extended form of Bayes' rule ― Let {Ai,i∈[[1,n]]} be a partition of the sample space. We have:** + +⟶베이즈 규칙의 확장된 형태 ― {Ai,i∈[[1,n]]}를 표본 공간의 파티션이라고 합시다. 우리는 다음을 가집니다.: + +
+ +**18. Independence ― Two events A and B are independent if and only if we have:** + +⟶독립성 ― 다음의 경우에만 두 사건 A, B가 독립적입니다: + +
+ +**19. Random Variables** + +⟶확률 변수 + +
+ +**20. Definitions** + +⟶정의 + +
+ +**21. Random variable ― A random variable, often noted X, is a function that maps every element in a sample space to a real line.** + +⟶확률 변수 ― 주로 X라고 표기된 확률 변수는 표본 공간의 모든 요소를 ​​실선에 대응시키는 함수입니다. + +
+ +**22. Cumulative distribution function (CDF) ― The cumulative distribution function F, which is monotonically non-decreasing and is such that limx→−∞F(x)=0 and limx→+∞F(x)=1, is defined as:** + +⟶누적 분포 함수 (CDF) ― 단조 감소하지 않고 limx → -∞F (x) = 0 이고, limx → + ∞F (x) = 1 인 누적 분포 함수 F는 다음과 같이 정의됩니다: + +
**23. Remark: we have P(a<X⩽b)=F(b)−F(a)**

⟶비고 : P(a<X⩽b)=F(b)−F(a)를 가집니다.

<br>

**24. Probability density function (PDF) ― The probability density function f is the probability that X takes on values between two adjacent realizations of the random variable.**

⟶확률 밀도 함수 (PDF) ― 확률 밀도 함수 f는 인접한 두 확률 변수의 사이에 X가 포함될 확률입니다.

<br>
+ +**25. Relationships involving the PDF and CDF ― Here are the important properties to know in the discrete (D) and the continuous (C) cases.** + +⟶PDF와 CDF의 관계 ― 이산 (D)과 연속 (C) 예시에서 알아야 할 중요한 특성이 있습니다. + +
+ +**26. [Case, CDF F, PDF f, Properties of PDF]** + +⟶[예시, CDF F, PDF f, PDF의 특성] + +
+ +**27. Expectation and Moments of the Distribution ― Here are the expressions of the expected value E[X], generalized expected value E[g(X)], kth moment E[Xk] and characteristic function ψ(ω) for the discrete and continuous cases:** + +⟶분포의 기대값과 적률 ― 이산 혹은 연속일 때, 기대값 E[X], 일반화된 기대값 E[g(X)], k번째 적률 E[Xk] 및 특성 함수 ψ(ω) : + +
+ +**28. Variance ― The variance of a random variable, often noted Var(X) or σ2, is a measure of the spread of its distribution function. It is determined as follows:** + +⟶분산 (Variance) ― 주로 Var(X) 또는 σ2이라고 표기된 확률 변수의 분산은 분포 함수의 산포(Spread)를 측정한 값입니다. 이는 다음과 같이 결정됩니다: + +
+ +**29. Standard deviation ― The standard deviation of a random variable, often noted σ, is a measure of the spread of its distribution function which is compatible with the units of the actual random variable. It is determined as follows:** + +⟶표준 편차(Standard Deviation) ― 표준 편차는 실제 확률 변수의 단위를 사용할 수 있는 분포 함수의 산포(Spread)를 측정하는 측도입니다. 이는 다음과 같이 결정됩니다: +
+ +**30. Transformation of random variables ― Let the variables X and Y be linked by some function. By noting fX and fY the distribution function of X and Y respectively, we have:** + +⟶확률 변수의 변환 ― 변수 X와 Y를 어떤 함수로 연결되도록 해봅시다. fX와 fY에 각각 X와 Y의 분포 함수를 표기하면 다음과 같습니다: + +
+ +**31. Leibniz integral rule ― Let g be a function of x and potentially c, and a,b boundaries that may depend on c. We have:** + +⟶라이프니츠 적분 규칙 ― g를 x의 함수로, 잠재적으로 c라고 해봅시다. 그리고 c에 종속적인 경계 a, b에 대해 우리는 다음을 가집니다: + +
+ +**32. Probability Distributions** + +⟶확률 분포 + +
+ +**33. Chebyshev's inequality ― Let X be a random variable with expected value μ. For k,σ>0, we have the following inequality:** + +⟶체비쇼프 부등식 ― X를 기대값 μ의 확률 변수라고 해봅시다. k에 대하여, σ>0이면 다음과 같은 부등식을 가집니다: + +
+ +**34. Main distributions ― Here are the main distributions to have in mind:** + +⟶주요 분포들― 기억해야 할 주요 분포들이 여기 있습니다: + +
+ +**35. [Type, Distribution]** + +⟶[타입(Type), 분포] + +
+ +**36. Jointly Distributed Random Variables** + +⟶결합 분포 확률 변수 + +
+ +**37. Marginal density and cumulative distribution ― From the joint density probability function fXY , we have** + +⟶주변 밀도와 누적 분포 ― 결합 밀도 확률 함수 fXY로부터 우리는 다음을 가집니다 + +
+ +**38. [Case, Marginal density, Cumulative function]** + +⟶[예시, 주변 밀도, 누적 함수] + +
+ +**39. Conditional density ― The conditional density of X with respect to Y, often noted fX|Y, is defined as follows:** + +⟶조건부 밀도 ― 주로 fX|Y로 표기되는 Y에 대한 X의 조건부 밀도는 다음과 같이 정의됩니다: + +
+ +**40. Independence ― Two random variables X and Y are said to be independent if we have:** + +⟶독립성 ― 두 확률 변수 X와 Y는 다음과 같은 경우에 독립적이라고 합니다: + +
+ +**41. Covariance ― We define the covariance of two random variables X and Y, that we note σ2XY or more commonly Cov(X,Y), as follows:** + +⟶공분산 ― 다음과 같이 두 확률 변수 X와 Y의 공분산을 σ2XY 혹은 더 일반적으로는 Cov(X,Y)로 정의합니다: + +
+ +**42. Correlation ― By noting σX,σY the standard deviations of X and Y, we define the correlation between the random variables X and Y, noted ρXY, as follows:** + +⟶상관관계 ― σX, σY로 X와 Y의 표준 편차를 표기함으로써 ρXY로 표기된 임의의 변수 X와 Y 사이의 상관관계를 다음과 같이 정의합니다: + +
+ +**43. Remark 1: we note that for any random variables X,Y, we have ρXY∈[−1,1].** + +⟶비고 1 : 우리는 임의의 확률 변수 X, Y에 대해 ρXY∈ [-1,1]를 가진다고 말합니다. + +
+ +**44. Remark 2: If X and Y are independent, then ρXY=0.** + +⟶비고 2 : X와 Y가 독립이라면 ρXY=0입니다. + +
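A minimal numpy sketch of the covariance and correlation definitions above, on illustrative synthetic data (the variable names, seed, and sample sizes are arbitrary, not from the original cheatsheet):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=1000)
y = 0.5 * x + rng.normal(scale=0.5, size=1000)   # correlated with x

# Covariance: Cov(X, Y) = E[XY] - E[X]E[Y]
cov_xy = np.mean(x * y) - np.mean(x) * np.mean(y)

# Correlation: rho = Cov(X, Y) / (sigma_X * sigma_Y), always in [-1, 1]
rho_xy = cov_xy / (np.std(x) * np.std(y))

print(cov_xy, rho_xy)
print(np.corrcoef(x, y)[0, 1])   # library value for comparison
```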
+ +**45. Parameter estimation** + +⟶모수 추정 + +
+ +**46. Definitions** + +⟶정의 + +
+ +**47. Random sample ― A random sample is a collection of n random variables X1,...,Xn that are independent and identically distributed with X.** + +⟶확률 표본 ― 확률 표본은 X와 독립적으로 동일하게 분포하는 n개의 확률 변수 X1, ..., Xn의 모음입니다. + +
+ +**48. Estimator ― An estimator is a function of the data that is used to infer the value of an unknown parameter in a statistical model.** + +⟶추정량 ― 추정량은 통계 모델에서 알 수 없는 모수의 값을 추론하는 데 사용되는 데이터의 함수입니다. + +
+ +**49. Bias ― The bias of an estimator ^θ is defined as being the difference between the expected value of the distribution of ^θ and the true value, i.e.:** + +⟶편향 ― 추정량 ^θ의 편향은 ^θ 분포의 기대값과 실제값 사이의 차이로 정의됩니다. 즉,: + +
+ +**50. Remark: an estimator is said to be unbiased when we have E[^θ]=θ.** + +⟶비고 : 추정량은 E [^ θ]=θ 일 때, 비 편향적이라고 말합니다. + +
+ +**51. Estimating the mean** + +⟶평균 추정 + +
+ +**52. Sample mean ― The sample mean of a random sample is used to estimate the true mean μ of a distribution, is often noted ¯¯¯¯¯X and is defined as follows:** + +⟶표본 평균 ― 랜덤 표본의 표본 평균은 분포의 실제 평균 μ를 추정하는 데 사용되며 종종 다음과 같이 정의됩니다: + +
+ +**53. Remark: the sample mean is unbiased, i.e E[¯¯¯¯¯X]=μ.** + +⟶비고 : 표본 평균은 비 편향적입니다, 즉 E[¯¯¯¯¯X]=μ. + +
+ +**54. Central Limit Theorem ― Let us have a random sample X1,...,Xn following a given distribution with mean μ and variance σ2, then we have:** + +⟶중심 극한 정리 ― 평균 μ와 분산 σ2를 갖는 주어진 분포를 따르는 랜덤 표본 X1, ..., Xn을 가정해 봅시다 그러면 우리는 다음을 가집니다: + +
+ +**55. Estimating the variance** + +⟶분산 추정 + +
+ +**56. Sample variance ― The sample variance of a random sample is used to estimate the true variance σ2 of a distribution, is often noted s2 or ^σ2 and is defined as follows:** + +⟶표본 분산 ― 랜덤 표본의 표본 분산은 분포의 실제 분산 σ2를 추정하는 데 사용되며 종종 s2 또는 σ2로 표기되며 다음과 같이 정의됩니다: + +
+ +**57. Remark: the sample variance is unbiased, i.e E[s2]=σ2.** + +⟶비고 : 표본 분산은 비 편향적입니다, 즉 E[s2]=σ2. + +
+ +**58. Chi-Squared relation with sample variance ― Let s2 be the sample variance of a random sample. We have:** + +⟶표본 분산과 카이 제곱의 관계 ― s2를 랜덤 표본의 표본 분산이라고 합시다. 우리는 다음을 가집니다: + +
+ +**59. [Introduction, Sample space, Event, Permutation]** + +⟶[소개, 표본 공간, 사건, 순열] + +
+ +**60. [Conditional probability, Bayes' rule, Independence]** + +⟶[조건부 확률, 베이즈 규칙, 독립] + +
+ +**61. [Random variables, Definitions, Expectation, Variance]** + +⟶[확률 변수, 정의, 기대값, 분산] + +
+ +**62. [Probability distributions, Chebyshev's inequality, Main distributions]** + +⟶[확률 분포, 체비쇼프 부등식, 주요 분포] + +
+ +**63. [Jointly distributed random variables, Density, Covariance, Correlation]** + +⟶[결합 분포의 확률 변수, 밀도, 공분산, 상관관계] + +
+ +**64. [Parameter estimation, Mean, Variance]** + +⟶[모수 추정, 평균, 분산] diff --git a/ko/cs-229-unsupervised-learning.md b/ko/cs-229-unsupervised-learning.md new file mode 100644 index 000000000..e961a88cc --- /dev/null +++ b/ko/cs-229-unsupervised-learning.md @@ -0,0 +1,340 @@ +**1. Unsupervised Learning cheatsheet** + +⟶ 비지도 학습 cheatsheet + +
+ +**2. Introduction to Unsupervised Learning** + +⟶ 비지도 학습 소개 + +
+ +**3. Motivation ― The goal of unsupervised learning is to find hidden patterns in unlabeled data {x(1),...,x(m)}.** + +⟶ 동기부여 - 비지도학습의 목표는 {x(1),...,x(m)}와 같이 라벨링이 되어있지 않은 데이터 내의 숨겨진 패턴을 찾는것이다. + +
+ +**4. Jensen's inequality ― Let f be a convex function and X a random variable. We have the following inequality:** + +⟶ 옌센 부등식 - f를 볼록함수, X를 확률변수라고 하자. 그러면 아래와 같은 부등식이 성립한다: + +
+ +**5. Clustering** + +⟶ 군집화 + +
+ +**6. Expectation-Maximization** + +⟶ 기댓값 최대화 + +
+ +**7. Latent variables ― Latent variables are hidden/unobserved variables that make estimation problems difficult, and are often denoted z. Here are the most common settings where there are latent variables:** + +⟶ 잠재변수 - 잠재변수들은 숨겨져있거나 관측되지 않는 변수들을 말하며, 이러한 변수들은 추정문제의 어려움을 가져온다. 그리고 잠재변수는 종종 z로 표기되어진다. 일반적인 잠재변수로 구성되어져있는 형태들을 살펴보자 + +
+ +**8. [Setting, Latent variable z, Comments]** + +⟶ 표기형태, 잠재변수 z, 주석 + +
+ +**9. [Mixture of k Gaussians, Factor analysis]** + +⟶ 가우시안 혼합모델, 요인분석 + +
+ +**10. Algorithm ― The Expectation-Maximization (EM) algorithm gives an efficient method at estimating the parameter θ through maximum likelihood estimation by repeatedly constructing a lower-bound on the likelihood (E-step) and optimizing that lower bound (M-step) as follows:** + +⟶ 알고리즘 - 기댓값 최대화 (EM) 알고리즘은 모수 θ를 추정하는 효율적인 방법을 제공해준다. 모수 θ의 추정은 아래와 같이 우도의 아래 경계지점을 구성하는(E-step)과 그 우도의 아래 경계지점을 최적화하는(M-step)들의 반복적인 최대우도측정을 통해 추정된다. + +
+ +**11. E-step: Evaluate the posterior probability Qi(z(i)) that each data point x(i) came from a particular cluster z(i) as follows:** + +⟶ E-step : 각각의 데이터 포인트 x(i)은 특정 클러스터 z(i)로 부터 발생한 후 사후확률Qi(z(i))를 평가한다. 아래의 식 참조 + +
+ +**12. M-step: Use the posterior probabilities Qi(z(i)) as cluster specific weights on data points x(i) to separately re-estimate each cluster model as follows:** + +⟶ M-step : 데이터 포인트 x(i)에 대한 클러스트의 특정 가중치로 사후확률 Qi(z(i))을 사용, 각 클러스트 모델을 개별적으로 재평가한다. 아래의 식 참조 + +
+ +**13. [Gaussians initialization, Expectation step, Maximization step, Convergence]** + +⟶ Gaussians 초기값, 기대 단계, 최대화 단계, 수렴 + +
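The E-step/M-step loop above can be sketched for a mixture of two 1-D Gaussians as follows; this is an illustrative numpy toy rather than the cheatsheet's own code, and the data, number of iterations, and initial parameters are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy data drawn from two 1-D Gaussians (the latent variable z is which component)
x = np.concatenate([rng.normal(-2.0, 1.0, 300), rng.normal(3.0, 1.0, 200)])

# Initialise means, variances and mixing weights of k = 2 Gaussians
mu, var, pi = np.array([-1.0, 1.0]), np.array([1.0, 1.0]), np.array([0.5, 0.5])

def gaussian(x, mu, var):
    return np.exp(-(x - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)

for _ in range(50):
    # E-step: posterior Q_i(z_i = j) for each data point and each component
    resp = pi * gaussian(x[:, None], mu, var)          # shape (n, 2)
    resp /= resp.sum(axis=1, keepdims=True)
    # M-step: re-estimate each component with the responsibilities as weights
    nk = resp.sum(axis=0)
    mu = (resp * x[:, None]).sum(axis=0) / nk
    var = (resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk
    pi = nk / len(x)

print(mu, var, pi)   # should approach the parameters used to generate the data
```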
+ +**14. k-means clustering** + +⟶ k-평균 군집화 + +
+ +**15. We note c(i) the cluster of data point i and μj the center of cluster j.** + +⟶ c(i)는 데이터 포인트 i가 속한 군집을, μj는 군집 j의 중심을 나타낸다. + +
+ +**16. Algorithm ― After randomly initializing the cluster centroids μ1,μ2,...,μk∈Rn, the k-means algorithm repeats the following step until convergence:** + +⟶ 알고리즘 - 군집 중앙에 μ1,μ2,...,μk∈Rn 와 같이 무작위로 초기값을 잡은 후, k-평균 알고리즘이 수렴될때 까지 아래와 같은 단계를 반복한다. + +
+ +**17. [Means initialization, Cluster assignment, Means update, Convergence]** + +⟶ 평균 초기값, 군집분할, 평균 재조정, 수렴 + +
+ +**18. Distortion function ― In order to see if the algorithm converges, we look at the distortion function defined as follows:** + +⟶ 왜곡 함수 - 알고리즘이 수렴하는지를 확인하기 위해서는 아래와 같은 왜곡함수를 정의해야 한다. + +
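A compact numpy sketch of the k-means loop and its distortion function J, assuming two well-separated toy clusters; the initialization scheme and iteration count are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (100, 2)), rng.normal(5, 1, (100, 2))])
k = 2

# Means initialization: pick k random data points as centroids
mu = X[rng.choice(len(X), k, replace=False)]

for _ in range(20):
    # Cluster assignment: each point goes to its closest centroid
    d = np.linalg.norm(X[:, None, :] - mu[None, :, :], axis=2)   # (n, k)
    c = d.argmin(axis=1)
    # Means update: each centroid becomes the mean of its assigned points
    mu = np.array([X[c == j].mean(axis=0) for j in range(k)])

# Distortion function J(c, mu): summed squared distance to assigned centroids
J = ((X - mu[c]) ** 2).sum()
print(mu, J)
```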
+ +**19. Hierarchical clustering** + +⟶ 계층적 군집분석 + +
+ +**20. Algorithm ― It is a clustering algorithm with an agglomerative hierarchical approach that build nested clusters in a successive manner.** + +⟶ 알고리즘 - 연속적 방식으로 중첩된 클러스트를 구축하는 결합형 계층적 접근방식을 사용하는 군집 알고리즘이다. + +
+ +**21. Types ― There are different sorts of hierarchical clustering algorithms that aims at optimizing different objective functions, which is summed up in the table below:** + +⟶ 종류 - 다양한 목적함수의 최적화를 목표로하는 다양한 종류의 계층적 군집분석 알고리즘들이 있으며, 아래 표와 같이 요약되어있다. + +
+ +**22. [Ward linkage, Average linkage, Complete linkage]** + +⟶ Ward 연결법, 평균 연결법, 완전 연결법 + +
+ +**23. [Minimize within cluster distance, Minimize average distance between cluster pairs, Minimize maximum distance of between cluster pairs]** + +⟶ 군집 거리 내에서의 최소화, 한쌍의 군집간 평균거리의 최소화, 한쌍의 군집간 최대거리의 최소화 + +
+ +**24. Clustering assessment metrics** + +⟶ 군집화 평가 metrics + +
+ +**25. In an unsupervised learning setting, it is often hard to assess the performance of a model since we don't have the ground truth labels as was the case in the supervised learning setting.** + +⟶ 비지도학습 환경에서는, 지도학습 환경과는 다르게 실측자료에 라벨링이 없기 때문에 종종 모델에 대한 성능평가가 어렵다. + +
+ +**26. Silhouette coefficient ― By noting a and b the mean distance between a sample and all other points in the same class, and between a sample and all other points in the next nearest cluster, the silhouette coefficient s for a single sample is defined as follows:** + +⟶ 실루엣 계수 - a와 b를 같은 클래스의 다른 모든점과 샘플 사이의 평균거리와 다음 가장 가까운 군집의 다른 모든 점과 샘플사이의 평균거리로 표기하면 단일 샘플에 대한 실루엣 계수 s는 다음과 같이 정의할 수 있다. + +
+ +**27. Calinski-Harabaz index ― By noting k the number of clusters, Bk and Wk the between and within-clustering dispersion matrices respectively defined as** + +⟶ Calinski-Harabaz 색인 - 군집의 개수를 k로, 군집 간 분산행렬과 군집 내 분산행렬을 각각 Bk와 Wk로 표기하면, 다음과 같이 정의된다. + +
+ +**28. the Calinski-Harabaz index s(k) indicates how well a clustering model defines its clusters, such that the higher the score, the more dense and well separated the clusters are. It is defined as follows:** + +⟶ Calinski-Harabaz 색인 s(k)는 군집모델이 군집화를 얼마나 잘 정의하는지를 나타낸다. 가령 높은 점수일수록 군집이 더욱 밀도있으며 잘 분리되는 형태이다. 아래와 같은 정의를 따른다. + +
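Assuming scikit-learn is available, both assessment metrics can be computed directly on a clustering result; the toy data and model below are purely illustrative.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score, calinski_harabasz_score

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (100, 2)), rng.normal(6, 1, (100, 2))])

labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# Silhouette coefficient: mean of (b - a) / max(a, b) over all samples
print(silhouette_score(X, labels))
# Calinski-Harabasz index: higher means denser, better separated clusters
print(calinski_harabasz_score(X, labels))
```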
+ +**29. Dimension reduction** + +⟶ 차원 축소 + +
+ +**30. Principal component analysis** + +⟶ 주성분 분석 + +
+ +**31. It is a dimension reduction technique that finds the variance maximizing directions onto which to project the data.** + +⟶ 데이터를 투영했을 때 분산이 최대가 되는 방향을 찾는 차원축소 기법이다. + +
+ +**32. Eigenvalue, eigenvector ― Given a matrix A∈Rn×n, λ is said to be an eigenvalue of A if there exists a vector z∈Rn∖{0}, called eigenvector, such that we have:** + +⟶ 고유값, 고유벡터 - 행렬 A∈Rn×n이 주어질 때, 고유벡터라고 불리는 벡터 z∈Rn∖{0}가 존재하여 다음을 만족하면 λ를 A의 고유값이라고 한다: + +
+ +**33. Spectral theorem ― Let A∈Rn×n. If A is symmetric, then A is diagonalizable by a real orthogonal matrix U∈Rn×n. By noting Λ=diag(λ1,...,λn), we have:** + +⟶ 스펙트럼 정리 - A∈Rn×n이라고 하자. 만약 A가 대칭이라면, A는 실수 직교 행렬 U∈Rn×n에 의해 대각화될 수 있다. Λ=diag(λ1,...,λn)로 표기하면 다음을 가진다: + +
+ +**34. diagonal** + +⟶ 대각선 + +
+ +**35. Remark: the eigenvector associated with the largest eigenvalue is called principal eigenvector of matrix A.** + +⟶ 참조: 가장 큰 고유값과 연관된 고유 벡터를 행렬 A의 주요 고유벡터라고 부른다 + +
+ +**36. Algorithm ― The Principal Component Analysis (PCA) procedure is a dimension reduction technique that projects the data on k +dimensions by maximizing the variance of the data as follows:** + +⟶ 알고리즘 - 주성분 분석(PCA) 절차는 데이터 분산을 최대화하여 k 차원의 데이터를 투영하는 차원 축소 기술로 다음과 같이 따른다. + +
+ +**37. Step 1: Normalize the data to have a mean of 0 and standard deviation of 1.** + +⟶ 1단계: 평균을 0으로 표준편차가 1이되도록 데이터를 표준화한다. + +
+ +**38. Step 2: Compute Σ=1mm∑i=1x(i)x(i)T∈Rn×n, which is symmetric with real eigenvalues.** + +⟶ 2단계: 실제 고유값과 대칭인 Σ=1mm∑i=1x(i)x(i)T∈Rn×n를 계산합니다. + +
+ +**39. Step 3: Compute u1,...,uk∈Rn the k orthogonal principal eigenvectors of Σ, i.e. the orthogonal eigenvectors of the k largest eigenvalues.** + +⟶ 3단계: Σ의 k개의 직교 주고유벡터 u1,...,uk∈Rn를 계산한다. 다시 말하면, 가장 큰 k개의 고유값에 대응하는 직교 고유벡터이다. + +
+ +**40. Step 4: Project the data on spanR(u1,...,uk).** + +⟶ 4단계: R(u1,...,uk) 범위에 데이터를 투영하자. + +
+ +**41. This procedure maximizes the variance among all k-dimensional spaces.** + +⟶ 해당 절차는 모든 k-차원의 공간들 사이에 분산을 최대화 하는것이다. + +
+ +**42. [Data in feature space, Find principal components, Data in principal components space]** + +⟶ 변수공간의 데이터, 주요성분들 찾기, 주요성분공간의 데이터 + +
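The four PCA steps above can be sketched in plain numpy as below; the toy data and the choice k=2 are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3)) @ np.array([[3, 0, 0], [0, 1, 0], [0, 0, 0.2]])
k = 2

# Step 1: normalize to zero mean and unit standard deviation
Xn = (X - X.mean(axis=0)) / X.std(axis=0)

# Step 2: empirical covariance matrix (symmetric, real eigenvalues)
sigma = Xn.T @ Xn / len(Xn)

# Step 3: k orthogonal eigenvectors of the k largest eigenvalues
vals, vecs = np.linalg.eigh(sigma)          # eigh returns ascending eigenvalues
U = vecs[:, np.argsort(vals)[::-1][:k]]

# Step 4: project the data on span(u1, ..., uk)
Z = Xn @ U
print(Z.shape)   # (200, 2)
```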
+ +**43. Independent component analysis** + +⟶ 독립성분분석 + +
+ +**44. It is a technique meant to find the underlying generating sources.** + +⟶ 근원적인 생성원을 찾기위한 기술을 의미한다. + +
+ +**45. Assumptions ― We assume that our data x has been generated by the n-dimensional source vector s=(s1,...,sn), where si are independent random variables, via a mixing and non-singular matrix A as follows:** + +⟶ 가정 - 다음과 같이 우리는 데이터 x가 n차원의 소스벡터 s=(s1,...,sn)에서부터 생성되었음을 가정한다. 이때 si는 독립적인 확률변수에서 나왔으며, 혼합 및 비특이 행렬 A를 통해 생성된다고 가정한다. + +
+ +**46. The goal is to find the unmixing matrix W=A−1.** + +⟶ 비혼합 행렬 W=A−1를 찾는 것을 목표로 한다. + +
+ +**47. Bell and Sejnowski ICA algorithm ― This algorithm finds the unmixing matrix W by following the steps below:** + +⟶ Bell과 Sejnowski 독립성분분석(ICA) 알고리즘 - 다음의 단계들을 따르는 비혼합 행렬 W를 찾는 알고리즘이다. + +
+ +**48. Write the probability of x=As=W−1s as:** + +⟶ x=As=W−1s의 확률을 다음과 같이 기술한다. + +
+ +**49. Write the log likelihood given our training data {x(i),i∈[[1,m]]} and by noting g the sigmoid function as:** + +⟶ 주어진 학습데이터 {x(i),i∈[[1,m]]}에 로그우도를 기술하고 시그모이드 함수 g를 다음과 같이 표기한다. + +
+ +**50. Therefore, the stochastic gradient ascent learning rule is such that for each training example x(i), we update W as follows:** + +⟶ 그러므로, 확률적 경사상승 학습 규칙은 각 학습예제 x(i)에 대해서 다음과 같이 W를 업데이트하는 것과 같다. + +
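A rough numpy sketch of the Bell and Sejnowski update rule above, on a toy 2-source mixture; the learning rate, number of passes, and source distribution are arbitrary choices for illustration, not prescribed by the cheatsheet.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 2, 5000
S = rng.laplace(size=(m, n))                 # independent, non-Gaussian sources s
A = np.array([[1.0, 0.5], [0.3, 1.0]])       # mixing, non-singular matrix
X = S @ A.T                                  # observed mixtures x = A s

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

W = np.eye(n)                                # unmixing matrix estimate
alpha = 0.001
for _ in range(10):
    for x in X:
        # Stochastic gradient ascent step on the log-likelihood
        grad = np.outer(1 - 2 * sigmoid(W @ x), x) + np.linalg.inv(W.T)
        W = W + alpha * grad

print(W @ A)   # should be close to a scaled permutation matrix
```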
+ +**51. The Machine Learning cheatsheets are now available in Korean.** + +⟶ 머신러닝 cheatsheets는 현재 한국어로 제공된다. + +
+ +**52. Original authors** + +⟶ 원저자 + +
+ +**53. Translated by X, Y and Z** + +⟶ X,Y,Z에 의해 번역되다. + +
+ +**54. Reviewed by X, Y and Z** + +⟶ X,Y,Z에 의해 검토되다. + +
+ +**55. [Introduction, Motivation, Jensen's inequality]** + +⟶ 소개, 동기부여, 얀센 부등식 + +
+ +**56. [Clustering, Expectation-Maximization, k-means, Hierarchical clustering, Metrics]** + +⟶ 군집화, 기댓값-최대화, k-means, 계층적 군집화, 측정지표 + +
+ +**57. [Dimension reduction, PCA, ICA]** + +⟶ 차원축소, 주성분분석(PCA), 독립성분분석(ICA) diff --git a/pt/cheatsheet-machine-learning-tips-and-tricks.md b/pt/cheatsheet-machine-learning-tips-and-tricks.md deleted file mode 100644 index 9712297b8..000000000 --- a/pt/cheatsheet-machine-learning-tips-and-tricks.md +++ /dev/null @@ -1,285 +0,0 @@ -**1. Machine Learning tips and tricks cheatsheet** - -⟶ - -
- -**2. Classification metrics** - -⟶ - -
- -**3. In a context of a binary classification, here are the main metrics that are important to track in order to assess the performance of the model.** - -⟶ - -
- -**4. Confusion matrix ― The confusion matrix is used to have a more complete picture when assessing the performance of a model. It is defined as follows:** - -⟶ - -
- -**5. [Predicted class, Actual class]** - -⟶ - -
- -**6. Main metrics ― The following metrics are commonly used to assess the performance of classification models:** - -⟶ - -
- -**7. [Metric, Formula, Interpretation]** - -⟶ - -
- -**8. Overall performance of model** - -⟶ - -
- -**9. How accurate the positive predictions are** - -⟶ - -
- -**10. Coverage of actual positive sample** - -⟶ - -
- -**11. Coverage of actual negative sample** - -⟶ - -
- -**12. Hybrid metric useful for unbalanced classes** - -⟶ - -
- -**13. ROC ― The receiver operating curve, also noted ROC, is the plot of TPR versus FPR by varying the threshold. These metrics are are summed up in the table below:** - -⟶ - -
- -**14. [Metric, Formula, Equivalent]** - -⟶ - -
- -**15. AUC ― The area under the receiving operating curve, also noted AUC or AUROC, is the area below the ROC as shown in the following figure:** - -⟶ - -
- -**16. [Actual, Predicted]** - -⟶ - -
- -**17. Basic metrics ― Given a regression model f, the following metrics are commonly used to assess the performance of the model:** - -⟶ - -
- -**18. [Total sum of squares, Explained sum of squares, Residual sum of squares]** - -⟶ - -
- -**19. Coefficient of determination ― The coefficient of determination, often noted R2 or r2, provides a measure of how well the observed outcomes are replicated by the model and is defined as follows:** - -⟶ - -
- -**20. Main metrics ― The following metrics are commonly used to assess the performance of regression models, by taking into account the number of variables n that they take into consideration:** - -⟶ - -
- -**21. where L is the likelihood and ˆσ2 is an estimate of the variance associated with each response.** - -⟶ - -
- -**22. Model selection** - -⟶ - -
- -**23. Vocabulary ― When selecting a model, we distinguish 3 different parts of the data that we have as follows:** - -⟶ - -
- -**24. [Training set, Validation set, Testing set]** - -⟶ - -
- -**25. [Model is trained, Model is assessed, Model gives predictions]** - -⟶ - -
- -**26. [Usually 80% of the dataset, Usually 20% of the dataset]** - -⟶ - -
- -**27. [Also called hold-out or development set, Unseen data]** - -⟶ - -
- -**28. Once the model has been chosen, it is trained on the entire dataset and tested on the unseen test set. These are represented in the figure below:** - -⟶ - -
- -**29. Cross-validation ― Cross-validation, also noted CV, is a method that is used to select a model that does not rely too much on the initial training set. The different types are summed up in the table below:** - -⟶ - -
- -**30. [Training on k−1 folds and assessment on the remaining one, Training on n−p observations and assessment on the p remaining ones]** - -⟶ - -
- -**31. [Generally k=5 or 10, Case p=1 is called leave-one-out]** - -⟶ - -
- -**32. The most commonly used method is called k-fold cross-validation and splits the training data into k folds to validate the model on one fold while training the model on the k−1 other folds, all of this k times. The error is then averaged over the k folds and is named cross-validation error.** - -⟶ - -
- -**33. Regularization ― The regularization procedure aims at avoiding the model to overfit the data and thus deals with high variance issues. The following table sums up the different types of commonly used regularization techniques:** - -⟶ - -
- -**34. [Shrinks coefficients to 0, Good for variable selection, Makes coefficients smaller, Tradeoff between variable selection and small coefficients]** - -⟶ - -
- -**35. Diagnostics** - -⟶ - -
- -**36. Bias ― The bias of a model is the difference between the expected prediction and the correct model that we try to predict for given data points.** - -⟶ - -
- -**37. Variance ― The variance of a model is the variability of the model prediction for given data points.** - -⟶ - -
- -**38. Bias/variance tradeoff ― The simpler the model, the higher the bias, and the more complex the model, the higher the variance.** - -⟶ - -
- -**39. [Symptoms, Regression illustration, classification illustration, deep learning illustration, possible remedies]** - -⟶ - -
- -**40. [High training error, Training error close to test error, High bias, Training error slightly lower than test error, Very low training error, Training error much lower than test error, High variance]** - -⟶ - -
- -**41. [Complexify model, Add more features, Train longer, Perform regularization, Get more data]** - -⟶ - -
- -**42. Error analysis ― Error analysis is analyzing the root cause of the difference in performance between the current and the perfect models.** - -⟶ - -
- -**43. Ablative analysis ― Ablative analysis is analyzing the root cause of the difference in performance between the current and the baseline models.** - -⟶ - -
- -**44. Regression metrics** - -⟶ - -
- -**45. [Classification metrics, confusion matrix, accuracy, precision, recall, F1 score, ROC]** - -⟶ - -
- -**46. [Regression metrics, R squared, Mallow's CP, AIC, BIC]** - -⟶ - -
- -**47. [Model selection, cross-validation, regularization]** - -⟶ - -
- -**48. [Diagnostics, Bias/variance tradeoff, error/ablative analysis]** - -⟶ diff --git a/pt/cheatsheet-deep-learning.md b/pt/cs-229-deep-learning.md similarity index 100% rename from pt/cheatsheet-deep-learning.md rename to pt/cs-229-deep-learning.md diff --git a/pt/refresher-linear-algebra.md b/pt/cs-229-linear-algebra.md similarity index 100% rename from pt/refresher-linear-algebra.md rename to pt/cs-229-linear-algebra.md diff --git a/pt/cs-229-machine-learning-tips-and-tricks.md b/pt/cs-229-machine-learning-tips-and-tricks.md new file mode 100644 index 000000000..4bad4360f --- /dev/null +++ b/pt/cs-229-machine-learning-tips-and-tricks.md @@ -0,0 +1,284 @@ +**1. Machine Learning tips and tricks cheatsheet** + +⟶ Dicas e Truques de Aprendizado de Máquina + +
+ +**2. Classification metrics** + +⟶ Métricas de classificação + +
+ +**3. In a context of a binary classification, here are the main metrics that are important to track in order to assess the performance of the model.** + +⟶ Em um contexto de classificação binária, estas são as principais métricas que são importantes acompanhar para avaliar a desempenho do modelo. + +
+ +**4. Confusion matrix ― The confusion matrix is used to have a more complete picture when assessing the performance of a model. It is defined as follows:** + +⟶ Matriz de confusão ― A matriz de confusão (confusion matrix) é usada para termos um cenário mais completo quando estamos avaliando o desempenho de um modelo. Ela é definida conforme a seguir: + +
+ +**5. [Predicted class, Actual class]** + +⟶ [Classe prevista, Classe real] + +
+ +**6. Main metrics ― The following metrics are commonly used to assess the performance of classification models:** + +⟶ Principais métricas - As seguintes métricas são comumente usadas para avaliar o desempenho de modelos de classificação: + +
+ +**7. [Metric, Formula, Interpretation]** + +⟶ [Métrica, Fórmula, Interpretação] + +
+ +**8. Overall performance of model** + +⟶ Desempenho geral do modelo + +
+ +**9. How accurate the positive predictions are** + +⟶ Quão precisas são as predições positivas + +
+ +**10. Coverage of actual positive sample** + +⟶ Cobertura da amostra positiva real + +
+ +**11. Coverage of actual negative sample** + +⟶ Cobertura da amostra negativa real + +
+ +**12. Hybrid metric useful for unbalanced classes** + +⟶ Métrica híbrida útil para classes desequilibradas + +
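Assuming scikit-learn, the metrics of this table can be computed from a pair of label vectors; the example labels below are illustrative only and not part of the original cheatsheet.

```python
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score)

y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0, 1, 0]

# Confusion matrix: rows are actual classes, columns are predicted classes
print(confusion_matrix(y_true, y_pred))

print("accuracy :", accuracy_score(y_true, y_pred))    # overall performance of model
print("precision:", precision_score(y_true, y_pred))   # how accurate positive predictions are
print("recall   :", recall_score(y_true, y_pred))      # coverage of actual positive samples
print("F1 score :", f1_score(y_true, y_pred))          # hybrid metric for unbalanced classes
```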
+ +**13. ROC ― The receiver operating curve, also noted ROC, is the plot of TPR versus FPR by varying the threshold. These metrics are are summed up in the table below:** + +⟶ ROC - A curva de operação do receptor, também chamada ROC (Receiver Operating Characteristic), é o gráfico de TPR versus FPR variando o limiar. Essas métricas estão resumidas na tabela abaixo: + +
+ +**14. [Metric, Formula, Equivalent]** + +⟶ [Métrica, Fórmula, Equivalente] + +
+ +**15. AUC ― The area under the receiving operating curve, also noted AUC or AUROC, is the area below the ROC as shown in the following figure:** + +⟶ AUC - A área sob a curva de operação de recebimento, também chamado AUC ou AUROC, é a área abaixo da ROC como mostrada na figura a seguir: + +
+ +**16. [Actual, Predicted]** + +⟶ [Real, Previsto] + +
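A small scikit-learn sketch of ROC and AUC on illustrative scores; the probabilities below are made up for the example.

```python
from sklearn.metrics import roc_curve, roc_auc_score

y_true = [0, 0, 1, 1, 0, 1, 0, 1]
scores = [0.1, 0.4, 0.35, 0.8, 0.2, 0.7, 0.5, 0.9]   # predicted probabilities

# ROC: TPR versus FPR as the decision threshold varies
fpr, tpr, thresholds = roc_curve(y_true, scores)
print(list(zip(fpr, tpr)))

# AUC: area under that curve
print(roc_auc_score(y_true, scores))
```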
+ +**17. Basic metrics ― Given a regression model f, the following metrics are commonly used to assess the performance of the model:** + +⟶ Métricas básicas - Dado um modelo de regresão f, as seguintes métricas são geralmente utilizadas para avaliar o desempenho do modelo: + +
+ +**18. [Total sum of squares, Explained sum of squares, Residual sum of squares]** + +⟶ [Soma total dos quadrados, Soma explicada dos quadrados, Soma residual dos quadrados] + +
+ +**19. Coefficient of determination ― The coefficient of determination, often noted R2 or r2, provides a measure of how well the observed outcomes are replicated by the model and is defined as follows:** + +⟶ Coeficiente de determinação - O coeficiente de determinação, frequentemente escrito como R2 ou r2, fornece uma medida de quão bem os resultados observados são replicados pelo modelo e é definido como se segue: + +
+ +**20. Main metrics ― The following metrics are commonly used to assess the performance of regression models, by taking into account the number of variables n that they take into consideration:** + +⟶ Principais métricas - As seguintes métricas são comumente utilizadas para avaliar o desempenho de modelos de regressão, levando em conta o número de variáveis n que eles consideram: + +
+ +**21. where L is the likelihood and ˆσ2 is an estimate of the variance associated with each response.** + +⟶ onde L é a verossimilhança (likelihood) e ˆσ2 é uma estimativa da variância associada a cada resposta. + +
+ +**22. Model selection** + +⟶ Seleção de Modelo + +
+ +**23. Vocabulary ― When selecting a model, we distinguish 3 different parts of the data that we have as follows:** + +⟶ Vocabulário ― Ao selecionar um modelo, nós consideramos 3 diferentes partes dos dados que possuímos conforme a seguir: + +
+ +**24. [Training set, Validation set, Testing set]** + +⟶ [Conjunto de treino, Conjunto de validação, Conjunto de Teste] + +
+ +**25. [Model is trained, Model is assessed, Model gives predictions]** + +⟶ [Modelo é treinado, Modelo é avaliado, Modelo fornece previsões] + +
+ +**26. [Usually 80% of the dataset, Usually 20% of the dataset]** + +⟶ [Geralmente 80% do conjunto de dados, Geralmente 20% do conjunto de dados] + +
+ +**27. [Also called hold-out or development set, Unseen data]** + +⟶ [Também chamado de hold-out ou conjunto de desenvolvimento, Dados não vistos] + +
+ +**28. Once the model has been chosen, it is trained on the entire dataset and tested on the unseen test set. These are represented in the figure below:** + +⟶ Uma vez que o modelo é escolhido, ele é treinado no conjunto inteiro de dados e testado no conjunto de dados de testes não vistos. São representados na figura abaixo: + +
+ +**29. Cross-validation ― Cross-validation, also noted CV, is a method that is used to select a model that does not rely too much on the initial training set. The different types are summed up in the table below:** + +⟶ Validação cruzada - Validação cruzada, também chamada de CV (Cross-Validation), é um método utilizado para selecionar um modelo que não depende muito do conjunto de treinamento inicial. Os diferente tipos estão resumidos na tabela abaixo: + +
+ +**30. [Training on k−1 folds and assessment on the remaining one, Training on n−p observations and assessment on the p remaining ones]** + +⟶ [Treino em k-1 partes e teste sobre o restante, Treino em n-p observações e teste sobre p restantes] + +
+ +**31. [Generally k=5 or 10, Case p=1 is called leave-one-out]** + +⟶ [Geralmente k=5 ou 10, caso p=1 é chamado leave-one-out (deixe-um-fora)] + +
+ +**32. The most commonly used method is called k-fold cross-validation and splits the training data into k folds to validate the model on one fold while training the model on the k−1 other folds, all of this k times. The error is then averaged over the k folds and is named cross-validation error.** + +⟶ O método mais frequentemente usado é chamado k-fold cross-validation e divide os dados de treinamento em k partes para validar o modelo em uma parte enquanto treina o modelo nas outras k−1 partes, tudo isso k vezes. A média do erro sobre as k partes é então calculada e é chamada erro de validação cruzada (cross-validation error). + +
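As a sketch of k-fold cross-validation with scikit-learn; the dataset and estimator below are illustrative placeholders, not anything mandated by the cheatsheet.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000)

# k-fold CV: train on k-1 folds, validate on the remaining one, k times
scores = cross_val_score(model, X, y, cv=5)
print(scores, scores.mean())   # the mean over folds is the cross-validation score
```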
+ +**33. Regularization ― The regularization procedure aims at avoiding the model to overfit the data and thus deals with high variance issues. The following table sums up the different types of commonly used regularization techniques:** + +⟶ Regularização ― O procedimento de regularização (regularization) visa evitar que o modelo sobreajuste os dados e portanto lide com os problemas de alta variância. A tabela a seguir resume os diferentes tipos de técnicas de regularização comumente utilizadas: +
+ +**34. [Shrinks coefficients to 0, Good for variable selection, Makes coefficients smaller, Tradeoff between variable selection and small coefficients]** + +⟶ [Diminui coeficientes para 0, Bom para seleção de variáveis, Faz o coeficiente menor, Balanço entre seleção de variáveis e coeficientes pequenos] + +
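The three regularization flavours of the table above can be compared on toy data with scikit-learn's Lasso, Ridge and ElasticNet; the penalty values below are arbitrary illustrative choices.

```python
import numpy as np
from sklearn.linear_model import ElasticNet, Lasso, Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))
y = X[:, 0] * 3.0 + rng.normal(scale=0.1, size=100)   # only the first feature matters

# LASSO (L1): shrinks some coefficients exactly to 0, good for variable selection
# Ridge (L2): makes all coefficients smaller
# Elastic Net: tradeoff between variable selection and small coefficients
for model in (Lasso(alpha=0.1), Ridge(alpha=1.0), ElasticNet(alpha=0.1, l1_ratio=0.5)):
    model.fit(X, y)
    print(type(model).__name__, np.round(model.coef_, 2))
```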
+ +**35. Diagnostics** + +⟶ Diagnóstico + +
+ +**36. Bias ― The bias of a model is the difference between the expected prediction and the correct model that we try to predict for given data points.** + +⟶ Viés - O viés (bias) de um modelo é a diferença entre a predição esperada e o modelo correto que nós tentamos prever para determinados pontos de dados. + +
+ +**37. Variance ― The variance of a model is the variability of the model prediction for given data points.** + +⟶ Variância - A variância (variance) de um modelo é a variabilidade da previsão do modelo para determinados pontos de dados. + +
+ +**38. Bias/variance tradeoff ― The simpler the model, the higher the bias, and the more complex the model, the higher the variance.** + +⟶ Balanço viés/variância ― Quanto mais simples o modelo, maior o viés e, quanto mais complexo o modelo, maior a variância. + +
+ +**39. [Symptoms, Regression illustration, classification illustration, deep learning illustration, possible remedies]** + +⟶ [Sintomas, Exemplo de regressão, Exemplo de classificação, Exemplo de Deep Learning, Possíveis remédios] + +
+ +**40. [High training error, Training error close to test error, High bias, Training error slightly lower than test error, Very low training error, Training error much lower than test error, High variance]** + +⟶ [Erro de treinamento elevado, Erro de treinamento próximo ao erro de teste, Viés elevado, Erro de treinamento ligeiramente menor que erro de teste, Erro de treinamento muito baixo, Erro de treinamento muito menor que erro de teste, Alta variância] + +
+ +**41. [Complexify model, Add more features, Train longer, Perform regularization, Get more data]** + +⟶ [Modelo de complexificação, Adicionar mais recursos, Treinar mais, Executar a regularização, Obter mais dados] + +
+ +**42. Error analysis ― Error analysis is analyzing the root cause of the difference in performance between the current and the perfect models.** + +⟶ Análise de erro - Análise de erro (error analysis) é a análise da causa raiz da diferença no desempenho entre o modelo atual e o modelo perfeito. + +
+ +**43. Ablative analysis ― Ablative analysis is analyzing the root cause of the difference in performance between the current and the baseline models.** + +⟶ Análise ablativa - Análise ablativa (ablative analysis) é a análise da causa raiz da diferença no desempenho entre o modelo atual e o modelo base. + +
+ +**44. Regression metrics** + +⟶ Métricas de regressão + +
+ +**45. [Classification metrics, confusion matrix, accuracy, precision, recall, specificity, F1 score, ROC]** + +⟶ [Métricas de classificação, Matriz de confusão, acurácia, precisão, revocação/sensibilidade, especifidade, F1 score, ROC] + +
+ +**46. [Regression metrics, R squared, Mallow's CP, AIC, BIC]** + +⟶ [Métricas de Regressão, R quadrado, Mallow's CP, AIC, BIC] + +
+ +**47. [Model selection, cross-validation, regularization]** + +⟶ [Seleção de modelo, validação cruzada, regularização] + +
+ +**48. [Diagnostics, Bias/variance tradeoff, error/ablative analysis]** + +⟶ [Diagnóstico, Balanço viés/variância, Análise de erro/ablativa] diff --git a/pt/refresher-probability.md b/pt/cs-229-probability.md similarity index 100% rename from pt/refresher-probability.md rename to pt/cs-229-probability.md diff --git a/pt/cheatsheet-supervised-learning.md b/pt/cs-229-supervised-learning.md similarity index 100% rename from pt/cheatsheet-supervised-learning.md rename to pt/cs-229-supervised-learning.md diff --git a/pt/cheatsheet-unsupervised-learning.md b/pt/cs-229-unsupervised-learning.md similarity index 100% rename from pt/cheatsheet-unsupervised-learning.md rename to pt/cs-229-unsupervised-learning.md diff --git a/pt/cs-230-convolutional-neural-networks.md b/pt/cs-230-convolutional-neural-networks.md new file mode 100644 index 000000000..4934d7c2f --- /dev/null +++ b/pt/cs-230-convolutional-neural-networks.md @@ -0,0 +1,718 @@ +**Convolutional Neural Networks translation** + +
+ +**1. Convolutional Neural Networks cheatsheet** + +⟶ Dicas de Redes Neurais Convolucionais + +
+ + +**2. CS 230 - Deep Learning** + +⟶ CS 230 - Aprendizagem profunda + +
+ + +**3. [Overview, Architecture structure]** + +⟶ [Visão geral, Estrutura arquitetural] + +
+ + +**4. [Types of layer, Convolution, Pooling, Fully connected]** + +⟶ [Tipos de camadas, Convolução, Pooling, Totalmente conectada] + +
+ + +**5. [Filter hyperparameters, Dimensions, Stride, Padding]** + +⟶ [Hiperparâmetros de filtro, Dimensões, Passo, Preenchimento] + +
+ + +**6. [Tuning hyperparameters, Parameter compatibility, Model complexity, Receptive field]** + +⟶[Ajustando hiperparâmetros, Compatibilidade de parâmetros, Complexidade de modelo, Campo receptivo] + +
+ + +**7. [Activation functions, Rectified Linear Unit, Softmax]** + +⟶ [Funções de Ativação, Unidade Linear Retificada, Softmax] + +
+ + +**8. [Object detection, Types of models, Detection, Intersection over Union, Non-max suppression, YOLO, R-CNN]** + +⟶[Detecção de objetos, Tipos de modelos, Detecção, Intersecção por União, Supressão não-máxima, YOLO, R-CNN] + +
+ + +**9. [Face verification/recognition, One shot learning, Siamese network, Triplet loss]** + +⟶ [Verificação / reconhecimento facial, Aprendizado de disparo único, Rede siamesa, Perda tripla] + +
+ + +**10. [Neural style transfer, Activation, Style matrix, Style/content cost function]** + +⟶ [Transferência de estilo neural, Ativação, Matriz de estilo, Função de custo de estilo/conteúdo] + +
+ + +**11. [Computational trick architectures, Generative Adversarial Net, ResNet, Inception Network]** + +⟶ [Arquiteturas de truques computacionais, Rede Adversarial Generativa, ResNet, Rede de Iniciação] + +
+ + +**12. Overview** + +⟶ Visão geral + +
+ + +**13. Architecture of a traditional CNN ― Convolutional neural networks, also known as CNNs, are a specific type of neural networks that are generally composed of the following layers:** + +⟶ Arquitetura de uma RNC tradicional (CNN) - Redes neurais convolucionais, também conhecidas como CNN (em inglês), são tipos específicos de redes neurais que geralmente são compostas pelas seguintes camadas: + +
+ + +**14. The convolution layer and the pooling layer can be fine-tuned with respect to hyperparameters that are described in the next sections.** + +⟶ A camada convolucional e a camadas de pooling podem ter um ajuste fino considerando os hiperparâmetros que estão descritos nas próximas seções. + +
+ + +**15. Types of layer** + +⟶ Tipos de camadas + +
+ + +**16. Convolution layer (CONV) ― The convolution layer (CONV) uses filters that perform convolution operations as it is scanning the input I with respect to its dimensions. Its hyperparameters include the filter size F and stride S. The resulting output O is called feature map or activation map.** + +⟶ Camada convolucional (CONV) - A camada convolucional (CONV) usa filtros que realizam operações de convolução conforme eles escaneiam a entrada I com relação a suas dimensões. Seus hiperparâmetros incluem o tamanho do filtro F e o passo S. O resultado O é chamado de mapa de recursos (feature map) ou mapa de ativação. + +
+ + +**17. Remark: the convolution step can be generalized to the 1D and 3D cases as well.** + +⟶ Observação: o passo de convolução também pode ser generalizado para os casos 1D e 3D. + +
+ + +**18. Pooling (POOL) ― The pooling layer (POOL) is a downsampling operation, typically applied after a convolution layer, which does some spatial invariance. In particular, max and average pooling are special kinds of pooling where the maximum and average value is taken, respectively.** + +⟶ Pooling (POOL) - A camada de pooling (POOL) é uma operação de amostragem (downsampling), tipicamente aplicada depois de uma camada convolucional, que faz alguma invariância espacial. Em particular, pooling máximo e médio são casos especiais de pooling onde o máximo e o médio valor são obtidos, respectivamente. + +
+ + +**19. [Type, Purpose, Illustration, Comments]** + +⟶ [Tipo, Propósito, Ilustração, Comentários] + +
+ + +**20. [Max pooling, Average pooling, Each pooling operation selects the maximum value of the current view, Each pooling operation averages the values of the current view]** + +⟶ [Pooling máximo, Pooling médio, Cada operação de pooling seleciona o valor máximo da exibição atual, Cada operação de pooling calcula a média dos valores da exibição atual] + +
+ + +**21. [Preserves detected features, Most commonly used, Downsamples feature map, Used in LeNet]** + +⟶ [Preserva os recursos detectados, Mais comumente usados, Mapa de recursos de amostragem (downsample), Usado no LeNet] + + +
+ +**22. Fully Connected (FC) ― The fully connected layer (FC) operates on a flattened input where each input is connected to all neurons. If present, FC layers are usually found towards the end of CNN architectures and can be used to optimize objectives such as class scores.** + +⟶ Totalmente Conectado (FC) - A camada totalmente conectada (FC) opera em uma entrada achatada, onde cada entrada é conectada a todos os neurônios. Se estiver presente, as camadas FC geralmente são encontradas no final das arquiteturas da CNN e podem ser usadas para otimizar objetivos, como pontuações de classes. + +
+ + +**23. Filter hyperparameters** + +⟶ Hiperparâmetros de filtros + +
+ + +**24. The convolution layer contains filters for which it is important to know the meaning behind its hyperparameters.** + +⟶ A camada de convolução contém filtros para os quais é importante conhecer o significado por trás de seus hiperparâmetros. + +
+ + +**25. Dimensions of a filter ― A filter of size F×F applied to an input containing C channels is a F×F×C volume that performs convolutions on an input of size I×I×C and produces an output feature map (also called activation map) of size O×O×1.** + +⟶ Dimensões de um filtro - Um filtro de tamanho F×F aplicado a uma entrada contendo C canais é um volume de tamanho F×F×C que executa convoluções em uma entrada de tamanho I×I×C e produz um mapa de recursos (também chamado de mapa de ativação) da saída de tamanho O×O×1. + +
+ + +**26. Filter** + +⟶ Filtros + +
+ + +**27. Remark: the application of K filters of size F×F results in an output feature map of size O×O×K.** + +⟶ Observação: a aplicação de K filtros de tamanho F×F resulta em um mapa de recursos de saída de tamanho O×O×K. + +
+ + +**28. Stride ― For a convolutional or a pooling operation, the stride S denotes the number of pixels by which the window moves after each operation.** + +⟶ Passo - Para uma operação convolucional ou de pooling, o passo S denota o número de pixels que a janela se move após cada operação. + +
+ +**29. Zero-padding ― Zero-padding denotes the process of adding P zeroes to each side of the boundaries of the input. This value can either be manually specified or automatically set through one of the three modes detailed below:** + +⟶ Zero preenchimento (Zero-padding) - Zero preenchimento denota o processo de adicionar P zeros em cada lado das fronteiras de entrada. Esse valor pode ser especificado manualmente ou automaticamente ajustado através de um dos três modos detalhados abaixo: + +
+ + +**30. [Mode, Value, Illustration, Purpose, Valid, Same, Full]** + +⟶ [Modo, Valor, Ilustração, Propósito, Válido, Idêntico, Completo] + +
+ + +**31. [No padding, Drops last convolution if dimensions do not match, Padding such that feature map size has size ⌈IS⌉, Output size is mathematically convenient, Also called 'half' padding, Maximum padding such that end convolutions are applied on the limits of the input, Filter 'sees' the input end-to-end]** + +⟶ [Sem preenchimento, Descarta a última convolução se as dimensões não corresponderem, Preenchimento de tal forma que o tamanho do mapa de recursos tenha tamanho ⌈IS⌉, Tamanho da saída é matematicamente conveniente, Também chamado de 'meio' preenchimento, Preenchimento máximo de tal forma que convoluções finais são aplicadas nos limites de a entrada, Filtro 'vê' a entrada de ponta a ponta] + +
+ + +**32. Tuning hyperparameters** + +⟶ Ajuste de hiperparâmetros + +
+ + +**33. Parameter compatibility in convolution layer ― By noting I the length of the input volume size, F the length of the filter, P the amount of zero padding, S the stride, then the output size O of the feature map along that dimension is given by:** + +⟶ Compatibilidade de parâmetro na camada convolucional - Considerando I o comprimento do tamanho do volume da entrada, F o tamanho do filtro, P a quantidade de preenchimento de zero (zero-padding) e S o tamanho do passo, então o tamanho de saída O do mapa de recursos ao longo dessa dimensão é dado por: + + +
+ + +**34. [Input, Filter, Output]** + +⟶ [Entrada, Filtro, Saída] + +
+ + +**35. Remark: often times, Pstart=Pend≜P, in which case we can replace Pstart+Pend by 2P in the formula above.** + +⟶ Observação: diversas vezes, Pstart=Pend≜P, em cujo caso podemos substituir Pstart+Pen por 2P na fórmula acima. + +
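A tiny helper illustrating the output-size rule O=(I−F+Pstart+Pend)/S+1 stated above; it is pure Python and the example numbers are arbitrary.

```python
def conv_output_size(i, f, s, p_start=0, p_end=0):
    """Output length O of a convolution/pooling along one dimension."""
    return (i - f + p_start + p_end) // s + 1

# A 32x32 input, 5x5 filter, stride 1, 'valid' padding (P = 0) -> 28x28 feature map
print(conv_output_size(32, 5, 1))          # 28
# Same filter with padding of 2 on each side -> 32x32 ('same'-style output)
print(conv_output_size(32, 5, 1, 2, 2))    # 32
```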
+ + +**36. Understanding the complexity of the model ― In order to assess the complexity of a model, it is often useful to determine the number of parameters that its architecture will have. In a given layer of a convolutional neural network, it is done as follows:** + +⟶ Entendendo a complexidade do modelo - Para avaliar a complexidade de um modelo, é geralmente útil determinar o número de parâmetros que a arquitetura deverá ter. Em uma determinada camada de uma rede neural convolucional, ela é dada da seguinte forma: + +
+ + +**37. [Illustration, Input size, Output size, Number of parameters, Remarks]** + +⟶ [Ilustração, Tamanho da entrada, Tamanho da saída, Número de parâmetros, Observações] + +
+ +**38. [One bias parameter per filter, In most cases, S<F, A common choice for K is 2C]** + +⟶ [Um parâmetro de viés (bias parameter) por filtro, Na maior parte dos casos, S<F, Uma escolha comum para K é 2C] + +<br> + +**39. [Pooling operation done channel-wise, In most cases, S=F]** + +⟶ [Operação de pooling feita pelo canal, Na maior parte dos casos, S=F] + +
+ + +**40. [Input is flattened, One bias parameter per neuron, The number of FC neurons is free of structural constraints]** + +⟶ [Entrada é achatada, Um parâmetro de viés (bias parameter) por neurônio, O número de neurônios FC está livre de restrições estruturais] + +
+ + +**41. Receptive field ― The receptive field at layer k is the area denoted Rk×Rk of the input that each pixel of the k-th activation map can 'see'. By calling Fj the filter size of layer j and Si the stride value of layer i and with the convention S0=1, the receptive field at layer k can be computed with the formula:** + +⟶ Campo receptivo - O campo receptivo na camada k é a área denotada por Rk×Rk da entrada que cada pixel do k-ésimo mapa de ativação pode 'ver'. Ao chamar Fj o tamanho do filtro da camada j e Si o valor do passo da camada i e com a convenção S0=1, o campo receptivo na camada k pode ser calculado com a fórmula: + +
+ + +**42. In the example below, we have F1=F2=3 and S1=S2=1, which gives R2=1+2⋅1+2⋅1=5.** + +⟶ No exemplo abaixo, temos que F1=F2=3 e S1=S2=1, o que resulta em R2=1+2⋅1+2⋅1=5. + +
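The receptive-field formula above can be checked numerically with a short helper; the function name is illustrative only.

```python
def receptive_field(filter_sizes, strides):
    """R_k = 1 + sum_j (F_j - 1) * prod_{i<j} S_i, with the convention S_0 = 1."""
    r, jump = 1, 1            # jump = product of the strides of the previous layers
    for f, s in zip(filter_sizes, strides):
        r += (f - 1) * jump
        jump *= s
    return r

# Example from the text: F1 = F2 = 3 and S1 = S2 = 1 gives R2 = 5
print(receptive_field([3, 3], [1, 1]))   # 5
```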
+ + +**43. Commonly used activation functions** + +⟶ Funções de ativação comumente usadas + +
+ + +**44. Rectified Linear Unit ― The rectified linear unit layer (ReLU) is an activation function g that is used on all elements of the volume. It aims at introducing non-linearities to the network. Its variants are summarized in the table below:** + +⟶ Unidade Linear Retificada (Rectified Linear Unit) - A camada unitária linear retificada (ReLU) é uma função de ativação g que é usada em todos os elementos do volume. Tem como objetivo introduzir não linearidades na rede. Suas variantes estão resumidas na tabela abaixo: + +
+ + +**45. [ReLU, Leaky ReLU, ELU, with]** + +⟶ [ReLU, Leaky ReLU, ELU, com] + +
+ + +**46. [Non-linearity complexities biologically interpretable, Addresses dying ReLU issue for negative values, Differentiable everywhere]** + +⟶ [Complexidades de não-linearidade biologicamente interpretáveis, Endereça o problema da ReLU para valores negativos, Diferenciável em todos os lugares] + +
+ + +**47. Softmax ― The softmax step can be seen as a generalized logistic function that takes as input a vector of scores x∈Rn and outputs a vector of output probability p∈Rn through a softmax function at the end of the architecture. It is defined as follows:** + +⟶ Softmax - O passo de softmax pode ser visto como uma função logística generalizada que pega como entrada um vetor de pontuações x∈Rn e retorna um vetor de probabilidades p∈Rn através de uma função softmax no final da arquitetura. É definida como: + +
+ + +**48. where** + +⟶ onde + +
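Minimal numpy versions of the activation functions in this section, kept as plain reference implementations rather than anything framework-specific; the test scores are arbitrary.

```python
import numpy as np

def relu(x):
    return np.maximum(0, x)

def leaky_relu(x, eps=0.01):
    return np.where(x > 0, x, eps * x)

def elu(x, alpha=1.0):
    return np.where(x > 0, x, alpha * (np.exp(x) - 1))

def softmax(x):
    # Subtracting the max is a standard numerical-stability trick
    e = np.exp(x - np.max(x))
    return e / e.sum()

scores = np.array([2.0, 1.0, -1.0])
print(relu(scores), leaky_relu(scores), elu(scores))
print(softmax(scores), softmax(scores).sum())   # probabilities summing to 1
```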
+ + +**49. Object detection** + +⟶ Detecção de objeto + +
+ + +**50. Types of models ― There are 3 main types of object recognition algorithms, for which the nature of what is predicted is different. They are described in the table below:** + +⟶ Tipos de modelos - Existem 3 tipos de algoritmos de reconhecimento de objetos, para o qual a natureza do que é previsto é diferente para cada um. Eles estão descritos na tabela abaixo: + +
+ + +**51. [Image classification, Classification w. localization, Detection]** + +⟶ [Classificação de imagem, Classificação com localização, Detecção] + +
+ + +**52. [Teddy bear, Book]** + +⟶ [Urso de pelúcia, Livro] + +
+ + +**53. [Classifies a picture, Predicts probability of object, Detects an object in a picture, Predicts probability of object and where it is located, Detects up to several objects in a picture, Predicts probabilities of objects and where they are located]** + +⟶ [Classifica uma imagem, Prevê a probabilidade de um objeto, Detecta um objeto em uma imagem, Prevê a probabilidade de objeto e onde ele está localizado, Detecta vários objetos em uma imagem, Prevê probabilidades de objetos e onde eles estão localizados] + +
+ + +**54. [Traditional CNN, Simplified YOLO, R-CNN, YOLO, R-CNN]** + +⟶ [CNN tradicional, YOLO simplificado, R-CNN, YOLO, R-CNN] + +
+ + +**55. Detection ― In the context of object detection, different methods are used depending on whether we just want to locate the object or detect a more complex shape in the image. The two main ones are summed up in the table below:** + +⟶ Detecção - No contexto da detecção de objetos, diferentes métodos são usados dependendo se apenas queremos localizar o objeto ou detectar uma forma mais complexa na imagem. Os dois principais são resumidos na tabela abaixo: + +
+ + +**56. [Bounding box detection, Landmark detection]** + +⟶ [Detecção de caixa limite, Detecção de marco] + +
+ + +**57. [Detects the part of the image where the object is located, Detects a shape or characteristics of an object (e.g. eyes), More granular]** + +⟶ [Detecta parte da imagem onde o objeto está localizado, Detecta a forma ou característica de um objeto (e.g. olhos), Mais granular] + +
+ + +**58. [Box of center (bx,by), height bh and width bw, Reference points (l1x,l1y), ..., (lnx,lny)]** + +⟶ [Caixa central (bx,by), altura bh e largura bw, Pontos de referência (l1x,l1y), ..., (lnx,lny)] + +
+ + +**59. Intersection over Union ― Intersection over Union, also known as IoU, is a function that quantifies how correctly positioned a predicted bounding box Bp is over the actual bounding box Ba. It is defined as:** + +⟶ Interseção sobre União (Intersection over Union) - Interseção sobre União, também conhecida como IoU, é uma função que quantifica quão corretamente posicionado uma caixa de delimitação predita Bp está sobre a caixa de delimitação real Ba. É definida por: + +
+ + +**60. Remark: we always have IoU∈[0,1]. By convention, a predicted bounding box Bp is considered as being reasonably good if IoU(Bp,Ba)⩾0.5.** + +⟶ Observação: temos que IoU∈[0,1]. Por convenção, uma caixa de delimitação predita Bp é considerada razoavelmente boa se IoU(Bp,Ba)⩾0.5. + +
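A small pure-Python IoU helper matching the definition above, with two illustrative box pairs; boxes are assumed to be given as (x1, y1, x2, y2) corners.

```python
def iou(box_p, box_a):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    x1 = max(box_p[0], box_a[0])
    y1 = max(box_p[1], box_a[1])
    x2 = min(box_p[2], box_a[2])
    y2 = min(box_p[3], box_a[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_p = (box_p[2] - box_p[0]) * (box_p[3] - box_p[1])
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    return inter / (area_p + area_a - inter)

print(iou((0, 0, 10, 10), (5, 5, 15, 15)))   # 25 / 175 ≈ 0.14, below the 0.5 convention
print(iou((0, 0, 10, 10), (1, 1, 10, 10)))   # 0.81, a reasonably good prediction
```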
+ + +**61. Anchor boxes ― Anchor boxing is a technique used to predict overlapping bounding boxes. In practice, the network is allowed to predict more than one box simultaneously, where each box prediction is constrained to have a given set of geometrical properties. For instance, the first prediction can potentially be a rectangular box of a given form, while the second will be another rectangular box of a different geometrical form.** + +⟶ Caixas de ancoragem (Anchor boxes) - Caixas de ancoragem é uma técnica usada para predizer caixas de delimitação que se sobrepõem. Na prática, a rede tem permissão para predizer mais de uma caixa simultaneamente, onde cada caixa prevista é restrita a ter um dado conjunto de propriedades geométricas. Por exemplo, a primeira predição pode ser potencialmente uma caixa retangular de uma determinada forma, enquanto a segunda pode ser outra caixa retangular de uma forma geométrica diferente. + +
+ + +**62. Non-max suppression ― The non-max suppression technique aims at removing duplicate overlapping bounding boxes of a same object by selecting the most representative ones. After having removed all boxes having a probability prediction lower than 0.6, the following steps are repeated while there are boxes remaining:** + +⟶ Supressão não máxima (Non-max suppression) - A técnica supressão não máxima visa remover caixas de delimitação de um mesmo objeto que estão duplicadas e se sobrepõem, selecionando as mais representativas. Depois de ter removido todas as caixas que contém uma predição menor que 0.6. os seguintes passos são repetidos enquanto existem caixas remanescentes: + +
+ + +**63. [For a given class, Step 1: Pick the box with the largest prediction probability., Step 2: Discard any box having an IoU⩾0.5 with the previous box.]** + +⟶ [Para uma dada classe, Passo 1: Pegue a caixa com a maior predição de probabilidade., Passo 2: Descarte todas as caixas que tem IoU⩾0.5 com a caixa anterior.] + +
+ + +**64. [Box predictions, Box selection of maximum probability, Overlap removal of same class, Final bounding boxes]** + +⟶ [Predição de caixa, Seleção de caixa com máxima probabilidade, Remoção de sobreposições da mesma classe, Caixas de delimitação final] + +
+ + +**65. YOLO ― You Only Look Once (YOLO) is an object detection algorithm that performs the following steps:** + +⟶ YOLO - Você Apenas Vê Uma Vez (You Only Look Once - YOLO) é um algoritmo de detecção de objeto que realiza os seguintes passos: + +
+ + +**66. [Step 1: Divide the input image into a G×G grid., Step 2: For each grid cell, run a CNN that predicts y of the following form:, repeated k times]** + +⟶ [Passo 1: Divide a imagem de entrada em uma grade G×G., Passo 2: Para cada célula da grade, roda uma CNN que prevê o valor y da seguinte forma:, repita k vezes] + +
+ +**67. where pc is the probability of detecting an object, bx,by,bh,bw are the properties of the detected bounding box, c1,...,cp is a one-hot representation of which of the p classes were detected, and k is the number of anchor boxes.** + +⟶ onde pc é a probabilidade de detecção do objeto, bx,by,bh,bw são as propriedades das caixas delimitadoras detectadas, c1,...,cp é uma representação única (one-hot representation) de quais das classes p foram detectadas, e k é o número de caixas de ancoragem. + +
+ + +**68. Step 3: Run the non-max suppression algorithm to remove any potential duplicate overlapping bounding boxes.** + +⟶ Passo 3: Rode o algoritmo de supressão não máximo para remover qualquer caixa delimitadora duplicada e que se sobrepõe. + +
+ + +**69. [Original image, Division in GxG grid, Bounding box prediction, Non-max suppression]** + +⟶ [Imagem original, Divisão em uma grade GxG, Caixa delimitadora prevista, Supressão não máxima] + +
+ + +**70. Remark: when pc=0, then the network does not detect any object. In that case, the corresponding predictions bx,...,cp have to be ignored.** + +⟶ Observação: Quando pc=0, então a rede não detecta nenhum objeto. Nesse caso, as predições correspondentes bx,...,cp devem ser ignoradas. + +
+ + +**71. R-CNN ― Region with Convolutional Neural Networks (R-CNN) is an object detection algorithm that first segments the image to find potential relevant bounding boxes and then run the detection algorithm to find most probable objects in those bounding boxes.** + +⟶ R-CNN - Região com Redes Neurais Convolucionais (R-CNN) é um algoritmo de detecção de objetos que primeiro segmenta a imagem para encontrar potenciais caixas de delimitação relevantes e então roda o algoritmo de detecção para encontrar os objetos mais prováveis dentro das caixas de delimitação. + +
+ + +**72. [Original image, Segmentation, Bounding box prediction, Non-max suppression]** + +⟶ [Imagem original, Segmentação, Predição da caixa delimitadora, Supressão não-máxima] + +
+ + +**73. Remark: although the original algorithm is computationally expensive and slow, newer architectures enabled the algorithm to run faster, such as Fast R-CNN and Faster R-CNN.** + +⟶ Observação: embora o algoritmo original seja computacionalmente caro e lento, arquiteturas mais recentes, como o Fast R-CNN e o Faster R-CNN, permitiram que o algoritmo fosse executado mais rapidamente. + +
+ + +**74. Face verification and recognition** + +⟶ Verificação facial e reconhecimento + +
+ + +**75. Types of models ― Two main types of model are summed up in table below:** + +⟶ Tipos de modelos - Os dois principais tipos de modelos são resumidos na tabela abaixo: + +
+ + +**76. [Face verification, Face recognition, Query, Reference, Database]** + +⟶ [Verificação facial, Reconhecimento facial, Consulta, Referência, Banco de dados] + +
+ + +**77. [Is this the correct person?, One-to-one lookup, Is this one of the K persons in the database?, One-to-many lookup]** + +⟶ [Esta é a pessoa correta?, Pesquisa um-para-um, Esta é uma das K pessoas no banco de dados?, Pesquisa um-para-muitos] + +
+ + +**78. One Shot Learning ― One Shot Learning is a face verification algorithm that uses a limited training set to learn a similarity function that quantifies how different two given images are. The similarity function applied to two images is often noted d(image 1,image 2).** + +⟶ Aprendizado de Disparo Único (One Shot Learning) - One Shot Learning é um algoritmo de verificação facial que utiliza um conjunto de treinamento limitado para aprender uma função de similaridade que quantifica o quão diferentes são as duas imagens. A função de similaridade aplicada a duas imagens é frequentemente denotada como d(imagem 1, imagem 2). + +
+ + +**79. Siamese Network ― Siamese Networks aim at learning how to encode images to then quantify how different two images are. For a given input image x(i), the encoded output is often noted as f(x(i)).** + +⟶ Rede Siamesa (Siamese Network) - Siamese Networks buscam aprender como codificar imagens para depois quantificar quão diferentes são as duas imagens. Para uma imagem de entrada x(i), o resultado codificado é normalmente denotado como f(x(i)). + +
+ +**80. Triplet loss ― The triplet loss ℓ is a loss function computed on the embedding representation of a triplet of images A (anchor), P (positive) and N (negative). The anchor and the positive example belong to a same class, while the negative example to another one. By calling α∈R+ the margin parameter, this loss is defined as follows:** + +⟶ Perda tripla (Triplet loss) - A perda tripla ℓ é uma função de perda (loss function) computada na representação de incorporação (embedding) de três imagens A (âncora), P (positiva) e N (negativa). O exemplo da âncora e positivo pertencem à mesma classe, enquanto o exemplo negativo pertence a uma classe diferente. Chamando o parâmetro de margem de α∈R+, essa função de perda é definida da seguinte forma: + +
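The triplet loss can be written in a few lines of numpy; the embeddings below are made-up vectors and the margin α=0.2 is an arbitrary illustrative choice.

```python
import numpy as np

def triplet_loss(f_a, f_p, f_n, alpha=0.2):
    """l(A, P, N) = max(d(A, P) - d(A, N) + alpha, 0) on embedding vectors."""
    d_ap = np.sum((f_a - f_p) ** 2)
    d_an = np.sum((f_a - f_n) ** 2)
    return max(d_ap - d_an + alpha, 0.0)

f_anchor = np.array([0.1, 0.9, 0.2])
f_positive = np.array([0.12, 0.88, 0.21])   # same identity: close embedding
f_negative = np.array([0.8, 0.1, 0.5])      # different identity: far embedding
print(triplet_loss(f_anchor, f_positive, f_negative))   # 0.0 when the margin is satisfied
```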
+ + +**81. Neural style transfer** + +⟶ Transferência de estilo neural + +
+ + +**82. Motivation ― The goal of neural style transfer is to generate an image G based on a given content C and a given style S.** + +⟶ Motivação - O objetivo da transferência de estilo neural é gerar uma imagem G baseada num dado conteúdo C com um estilo S. + +
+ +**83. [Content C, Style S, Generated image G]** + +⟶ [Conteúdo C, Estilo S, Imagem gerada G] + +
+ + +**84. Activation ― In a given layer l, the activation is noted a[l] and is of dimensions nH×nw×nc** + +⟶ Ativação - Em uma dada camada l, a ativação é denotada como a[l] e suas dimensões são nH×nw×nc + +
+ + +**85. Content cost function ― The content cost function Jcontent(C,G) is used to determine how the generated image G differs from the original content image C. It is defined as follows:** + +⟶ Função de custo de conteúdo (Content cost function) - A função de custo de conteúdo Jcontent(C,G) é usada para determinar como a imagem gerada G difere da imagem de conteúdo original C. Ela é definida da seguinte forma: + +
+ + +**86. Style matrix ― The style matrix G[l] of a given layer l is a Gram matrix where each of its elements G[l]kk′ quantifies how correlated the channels k and k′ are. It is defined with respect to activations a[l] as follows:** + +⟶ Matriz de estilo - A matriz de estilo G[l] de uma determinada camada l é uma matriz de Gram em que cada um dos seus elementos G[l]kk′ quantifica quão correlacionados são os canais k e k′. Ela é definida com respeito às ativações a[l] da seguinte forma: + +<br>
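A short NumPy sketch of the style (Gram) matrix, assuming channel-last activations of shape nH×nW×nC with toy dimensions:

```python
import numpy as np

def gram_matrix(a):
    """Style matrix G[l]: G[k,k'] sums a[..., k] * a[..., k'] over all spatial positions."""
    n_H, n_W, n_C = a.shape
    flat = a.reshape(n_H * n_W, n_C)  # one row per spatial position
    return flat.T @ flat              # shape (n_C, n_C)

rng = np.random.default_rng(2)
a_l = rng.normal(size=(14, 14, 256))  # activations a[l]
print(gram_matrix(a_l).shape)         # (256, 256)
```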
+ + +**87. Remark: the style matrix for the style image and the generated image are noted G[l] (S) and G[l] (G) respectively.** + +⟶ Observação: as matrizes de estilo da imagem de estilo e da imagem gerada são denotadas como G[l] (S) e G[l] (G), respectivamente. + +<br>
+ + +**88. Style cost function ― The style cost function Jstyle(S,G) is used to determine how the generated image G differs from the style S. It is defined as follows:** + +⟶ Função de custo de estilo (Style cost function) - A função de custo de estilo Jstyle(S,G) é usada para determinar como a imagem gerada G difere do estilo S. Ela é definida da seguinte forma: + +
+ + +**89. Overall cost function ― The overall cost function is defined as being a combination of the content and style cost functions, weighted by parameters α,β, as follows:** + +⟶ Função de custo geral (Overall cost function) - A função de custo geral é definida como sendo uma combinação das funções de custo de conteúdo e de estilo, ponderada pelos parâmetros α,β, como mostrado abaixo: + +<br>
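Putting the pieces together, a rough single-layer sketch of J(G)=αJcontent(C,G)+βJstyle(S,G); the style-term normalisation and the α, β values follow common convention and are assumptions rather than something fixed by the text:

```python
import numpy as np

def gram(a):
    n_H, n_W, n_C = a.shape
    flat = a.reshape(n_H * n_W, n_C)
    return flat.T @ flat

def overall_cost(a_C, a_S, a_G, alpha=10.0, beta=40.0):
    """J(G) = α·J_content(C,G) + β·J_style(S,G), evaluated on a single layer l."""
    n_H, n_W, n_C = a_G.shape
    j_content = 0.5 * np.sum((a_C - a_G) ** 2)
    j_style = np.sum((gram(a_S) - gram(a_G)) ** 2) / (2 * n_H * n_W * n_C) ** 2
    return alpha * j_content + beta * j_style

rng = np.random.default_rng(3)
a_C, a_S, a_G = rng.normal(size=(3, 14, 14, 64))  # activations of C, S and G at layer l
print(overall_cost(a_C, a_S, a_G))
```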
+ + +**90. Remark: a higher value of α will make the model care more about the content while a higher value of β will make it care more about the style.** + +⟶ Observação: um valor de α maior irá fazer com que o modelo se preocupe mais com o conteúdo enquanto um maior valor de β irá fazer com que ele se preocupe mais com o estilo. + +
+ + +**91. Architectures using computational tricks** + +⟶ Arquiteturas usando truques computacionais + +
+ + +**92. Generative Adversarial Network ― Generative adversarial networks, also known as GANs, are composed of a generative and a discriminative model, where the generative model aims at generating the most truthful output that will be fed into the discriminative which aims at differentiating the generated and true image.** + +⟶ Rede Adversarial Generativa (Generative Adversarial Network) - As Generative Adversarial Networks, também conhecidas como GANs, são compostas de um modelo generativo e um modelo discriminativo, onde o modelo generativo visa gerar a saída mais realista possível, que será alimentada ao modelo discriminativo, o qual visa diferenciar a imagem gerada da imagem verdadeira. + +<br>
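As a rough illustration of the adversarial objective (not a full training loop), the sketch below evaluates the usual discriminator and generator losses on a toy batch; `D` and `G` are arbitrary stand-ins rather than trained networks:

```python
import numpy as np

rng = np.random.default_rng(4)
Wg = rng.normal(size=(16, 64))  # stand-in generator weights (hypothetical)

def G(z):
    """Stand-in generator: maps noise z to a fake 'image'."""
    return np.tanh(z @ Wg)

def D(x):
    """Stand-in discriminator: probability that x is a real image."""
    return 1.0 / (1.0 + np.exp(-x.mean(axis=1)))

real = rng.normal(size=(8, 64))      # batch of real-world images (flattened)
fake = G(rng.normal(size=(8, 16)))   # batch of generated images

# The discriminator tries to label real as 1 and fake as 0...
d_loss = -np.mean(np.log(D(real)) + np.log(1.0 - D(fake)))
# ...while the generator tries to push D(fake) towards 1.
g_loss = -np.mean(np.log(D(fake)))
print(d_loss, g_loss)
```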
+ + +**93. [Training, Noise, Real-world image, Generator, Discriminator, Real Fake]** + +⟶ [Treinamento, Ruído, Imagem real, Gerador, Discriminador, Real Falsa] + +<br>
+ + +**94. Remark: use cases using variants of GANs include text to image, music generation and synthesis.** + +⟶ Observação: casos de uso usando variações de GANs incluem texto para imagem, geração de música e síntese. + +
+ + +**95. ResNet ― The Residual Network architecture (also called ResNet) uses residual blocks with a high number of layers meant to decrease the training error. The residual block has the following characterizing equation:** + +⟶ ResNet - A arquitetura de Rede Residual (também chamada de ResNet) usa blocos residuais com um alto número de camadas para diminuir o erro de treinamento. O bloco residual possui a seguinte equação caracterizadora: + +
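A minimal sketch of the characterizing equation a[l+2]=g(a[l]+z[l+2]); fully connected sub-layers and ReLU are used here for simplicity instead of the convolutional layers a real ResNet would use:

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def residual_block(a_l, W1, b1, W2, b2):
    """a[l+2] = g(a[l] + z[l+2]): two sub-layers plus a skip connection before the non-linearity."""
    a_l1 = relu(W1 @ a_l + b1)   # a[l+1]
    z_l2 = W2 @ a_l1 + b2        # z[l+2]
    return relu(a_l + z_l2)      # skip connection a[l] added before applying g

rng = np.random.default_rng(5)
n = 32
a = rng.normal(size=n)
W1, W2 = rng.normal(size=(2, n, n)) * 0.1
b1, b2 = np.zeros(n), np.zeros(n)
print(residual_block(a, W1, b1, W2, b2).shape)  # (32,)
```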
+ + +**96. Inception Network ― This architecture uses inception modules and aims at giving a try at different convolutions in order to increase its performance through features diversification. In particular, it uses the 1×1 convolution trick to limit the computational burden.** + +⟶ Rede Inception (Inception Network) - Esta arquitetura utiliza módulos inception e visa experimentar diferentes convoluções, a fim de aumentar seu desempenho através da diversificação de recursos. Em particular, ela usa o truque da convolução 1×1 para limitar a carga computacional. + +<br>
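A back-of-the-envelope illustration of why the 1×1 convolution trick limits the computational burden; the 28×28 spatial size and the channel counts are illustrative choices, not values from the text:

```python
# Multiplication counts for a 5x5 convolution applied directly vs. through a 1x1 "bottleneck".
H = W = 28
c_in, c_mid, c_out = 192, 16, 32

direct = H * W * c_out * (5 * 5 * c_in)           # 5x5 conv straight from 192 to 32 channels
bottleneck = (H * W * c_mid * (1 * 1 * c_in)      # 1x1 conv reducing 192 -> 16 channels
              + H * W * c_out * (5 * 5 * c_mid))  # then the 5x5 conv on only 16 channels
print(direct, bottleneck, round(direct / bottleneck, 1))  # roughly a 10x reduction here
```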
+ + +**97. The Deep Learning cheatsheets are now available in [target language].** + +⟶ Os resumos de Aprendizagem Profunda estão disponíveis em português. + +
+ + +**98. Original authors** + +⟶ Autores Originais + +
+ + +**99. Translated by X, Y and Z** + +⟶ Traduzido por Leticia Portella + +
+ + +**100. Reviewed by X, Y and Z** + +⟶ Revisado por Gabriel Fonseca + +
+ + +**101. View PDF version on GitHub** + +⟶ Ver versão em PDF no GitHub. + +
+ + +**102. By X and Y** + +⟶ Por X e Y + +
diff --git a/ru/cheatsheet-deep-learning.md b/ru/cheatsheet-deep-learning.md deleted file mode 100644 index a5aa3756c..000000000 --- a/ru/cheatsheet-deep-learning.md +++ /dev/null @@ -1,321 +0,0 @@ -**1. Deep Learning cheatsheet** - -⟶ - -
- -**2. Neural Networks** - -⟶ - -
- -**3. Neural networks are a class of models that are built with layers. Commonly used types of neural networks include convolutional and recurrent neural networks.** - -⟶ - -
- -**4. Architecture ― The vocabulary around neural networks architectures is described in the figure below:** - -⟶ - -
- -**5. [Input layer, hidden layer, output layer]** - -⟶ - -
- -**6. By noting i the ith layer of the network and j the jth hidden unit of the layer, we have:** - -⟶ - -
- -**7. where we note w, b, z the weight, bias and output respectively.** - -⟶ - -
- -**8. Activation function ― Activation functions are used at the end of a hidden unit to introduce non-linear complexities to the model. Here are the most common ones:** - -⟶ - -
- -**9. [Sigmoid, Tanh, ReLU, Leaky ReLU]** - -⟶ - -
- -**10. Cross-entropy loss ― In the context of neural networks, the cross-entropy loss L(z,y) is commonly used and is defined as follows:** - -⟶ - -
- -**11. Learning rate ― The learning rate, often noted α or sometimes η, indicates at which pace the weights get updated. This can be fixed or adaptively changed. The current most popular method is called Adam, which is a method that adapts the learning rate.** - -⟶ - -
- -**12. Backpropagation ― Backpropagation is a method to update the weights in the neural network by taking into account the actual output and the desired output. The derivative with respect to weight w is computed using chain rule and is of the following form:** - -⟶ - -
- -**13. As a result, the weight is updated as follows:** - -⟶ - -
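To make the chain rule and the resulting update concrete, here is a tiny worked example on a single sigmoid unit; the squared loss and the learning rate are arbitrary illustrative choices, not something fixed by the text:

```python
import numpy as np

# Single sigmoid unit: a = sigmoid(w*x + b), with squared loss L = 1/2 (a - y)^2.
x, y = 1.5, 0.0
w, b = 0.8, -0.2

z = w * x + b
a = 1.0 / (1.0 + np.exp(-z))

dL_da = a - y                   # ∂L/∂a
da_dz = a * (1.0 - a)           # ∂a/∂z (sigmoid derivative)
dz_dw = x                       # ∂z/∂w
dL_dw = dL_da * da_dz * dz_dw   # chain rule: ∂L/∂w

alpha = 0.1
w = w - alpha * dL_dw           # weight update w <- w - α ∂L/∂w
print(dL_dw, w)
```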
- -**14. Updating weights ― In a neural network, weights are updated as follows:** - -⟶ - -
- -**15. Step 1: Take a batch of training data.** - -⟶ - -
- -**16. Step 2: Perform forward propagation to obtain the corresponding loss.** - -⟶ - -
- -**17. Step 3: Backpropagate the loss to get the gradients.** - -⟶ - -
- -**18. Step 4: Use the gradients to update the weights of the network.** - -⟶ - -
- -**19. Dropout ― Dropout is a technique meant at preventing overfitting the training data by dropping out units in a neural network. In practice, neurons are either dropped with probability p or kept with probability 1−p** - -⟶ - -
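A minimal sketch of dropout at training time; the rescaling of kept units by 1/(1−p) ("inverted dropout") is a common convention assumed here, not something stated in the text:

```python
import numpy as np

def dropout(a, p, rng):
    """Drop each unit with probability p; kept units are rescaled by 1/(1-p)."""
    mask = (rng.random(a.shape) >= p).astype(a.dtype)
    return a * mask / (1.0 - p)

rng = np.random.default_rng(6)
activations = rng.normal(size=(4, 8))
print(dropout(activations, p=0.5, rng=rng))
```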
- -**20. Convolutional Neural Networks** - -⟶ - -
- -**21. Convolutional layer requirement ― By noting W the input volume size, F the size of the convolutional layer neurons, P the amount of zero padding, then the number of neurons N that fit in a given volume is such that:** - -⟶ - -
- -**22. Batch normalization ― It is a step of hyperparameter γ,β that normalizes the batch {xi}. By noting μB,σ2B the mean and variance of that we want to correct to the batch, it is done as follows:** - -⟶ - -
- -**23. It is usually done after a fully connected/convolutional layer and before a non-linearity layer and aims at allowing higher learning rates and reducing the strong dependence on initialization.** - -⟶ - -
- -**24. Recurrent Neural Networks** - -⟶ - -
- -**25. Types of gates ― Here are the different types of gates that we encounter in a typical recurrent neural network:** - -⟶ - -
- -**26. [Input gate, forget gate, gate, output gate]** - -⟶ - -
- -**27. [Write to cell or not?, Erase a cell or not?, How much to write to cell?, How much to reveal cell?]** - -⟶ - -
- -**28. LSTM ― A long short-term memory (LSTM) network is a type of RNN model that avoids the vanishing gradient problem by adding 'forget' gates.** - -⟶ - -
- -**29. Reinforcement Learning and Control** - -⟶ - -
- -**30. The goal of reinforcement learning is for an agent to learn how to evolve in an environment.** - -⟶ - -
- -**31. Definitions** - -⟶ - -
- -**32. Markov decision processes ― A Markov decision process (MDP) is a 5-tuple (S,A,{Psa},γ,R) where:** - -⟶ - -
- -**33. S is the set of states** - -⟶ - -
- -**34. A is the set of actions** - -⟶ - -
- -**35. {Psa} are the state transition probabilities for s∈S and a∈A** - -⟶ - -
- -**36. γ∈[0,1[ is the discount factor** - -⟶ - -
- -**37. R:S×A⟶R or R:S⟶R is the reward function that the algorithm wants to maximize** - -⟶ - -
- -**38. Policy ― A policy π is a function π:S⟶A that maps states to actions.** - -⟶ - -
- -**39. Remark: we say that we execute a given policy π if given a state s we take the action a=π(s).** - -⟶ - -
- -**40. Value function ― For a given policy π and a given state s, we define the value function Vπ as follows:** - -⟶ - -
- -**41. Bellman equation ― The optimal Bellman equations characterizes the value function Vπ∗ of the optimal policy π∗:** - -⟶ - -
- -**42. Remark: we note that the optimal policy π∗ for a given state s is such that:** - -⟶ - -
- -**43. Value iteration algorithm ― The value iteration algorithm is in two steps:** - -⟶ - -
- -**44. 1) We initialize the value:** - -⟶ - -
- -**45. 2) We iterate the value based on the values before:** - -⟶ - -
- -**46. Maximum likelihood estimate ― The maximum likelihood estimates for the state transition probabilities are as follows:** - -⟶ - -
- -**47. times took action a in state s and got to s′** - -⟶ - -
- -**48. times took action a in state s** - -⟶ - -
- -**49. Q-learning ― Q-learning is a model-free estimation of Q, which is done as follows:** - -⟶ - -
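A minimal sketch of one tabular Q-learning update; the learning rate, discount factor and state/action sizes are arbitrary choices:

```python
import numpy as np

def q_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.9):
    """Q[s,a] <- Q[s,a] + α · (r + γ · max_a' Q[s',a'] - Q[s,a])."""
    target = r + gamma * np.max(Q[s_next])
    Q[s, a] += alpha * (target - Q[s, a])
    return Q

Q = np.zeros((5, 2))  # toy table: 5 states, 2 actions
Q = q_update(Q, s=0, a=1, r=1.0, s_next=3)
print(Q)
```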
- -**50. View PDF version on GitHub** - -⟶ - -
- -**51. [Neural Networks, Architecture, Activation function, Backpropagation, Dropout]** - -⟶ - -
- -**52. [Convolutional Neural Networks, Convolutional layer, Batch normalization]** - -⟶ - -
- -**53. [Recurrent Neural Networks, Gates, LSTM]** - -⟶ - -
- -**54. [Reinforcement learning, Markov decision processes, Value/policy iteration, Approximate dynamic programming, Policy search]** - -⟶ diff --git a/ru/cheatsheet-machine-learning-tips-and-tricks.md b/ru/cheatsheet-machine-learning-tips-and-tricks.md deleted file mode 100644 index 9712297b8..000000000 --- a/ru/cheatsheet-machine-learning-tips-and-tricks.md +++ /dev/null @@ -1,285 +0,0 @@ -**1. Machine Learning tips and tricks cheatsheet** - -⟶ - -
- -**2. Classification metrics** - -⟶ - -
- -**3. In a context of a binary classification, here are the main metrics that are important to track in order to assess the performance of the model.** - -⟶ - -
- -**4. Confusion matrix ― The confusion matrix is used to have a more complete picture when assessing the performance of a model. It is defined as follows:** - -⟶ - -
- -**5. [Predicted class, Actual class]** - -⟶ - -
- -**6. Main metrics ― The following metrics are commonly used to assess the performance of classification models:** - -⟶ - -
- -**7. [Metric, Formula, Interpretation]** - -⟶ - -
- -**8. Overall performance of model** - -⟶ - -
- -**9. How accurate the positive predictions are** - -⟶ - -
- -**10. Coverage of actual positive sample** - -⟶ - -
- -**11. Coverage of actual negative sample** - -⟶ - -
- -**12. Hybrid metric useful for unbalanced classes** - -⟶ - -
- -**13. ROC ― The receiver operating curve, also noted ROC, is the plot of TPR versus FPR by varying the threshold. These metrics are summed up in the table below:** - -⟶ - -<br>
- -**14. [Metric, Formula, Equivalent]** - -⟶ - -
- -**15. AUC ― The area under the receiving operating curve, also noted AUC or AUROC, is the area below the ROC as shown in the following figure:** - -⟶ - -
- -**16. [Actual, Predicted]** - -⟶ - -
- -**17. Basic metrics ― Given a regression model f, the following metrics are commonly used to assess the performance of the model:** - -⟶ - -
- -**18. [Total sum of squares, Explained sum of squares, Residual sum of squares]** - -⟶ - -
- -**19. Coefficient of determination ― The coefficient of determination, often noted R2 or r2, provides a measure of how well the observed outcomes are replicated by the model and is defined as follows:** - -⟶ - -
- -**20. Main metrics ― The following metrics are commonly used to assess the performance of regression models, by taking into account the number of variables n that they take into consideration:** - -⟶ - -
- -**21. where L is the likelihood and ˆσ2 is an estimate of the variance associated with each response.** - -⟶ - -
- -**22. Model selection** - -⟶ - -
- -**23. Vocabulary ― When selecting a model, we distinguish 3 different parts of the data that we have as follows:** - -⟶ - -
- -**24. [Training set, Validation set, Testing set]** - -⟶ - -
- -**25. [Model is trained, Model is assessed, Model gives predictions]** - -⟶ - -
- -**26. [Usually 80% of the dataset, Usually 20% of the dataset]** - -⟶ - -
- -**27. [Also called hold-out or development set, Unseen data]** - -⟶ - -
- -**28. Once the model has been chosen, it is trained on the entire dataset and tested on the unseen test set. These are represented in the figure below:** - -⟶ - -
- -**29. Cross-validation ― Cross-validation, also noted CV, is a method that is used to select a model that does not rely too much on the initial training set. The different types are summed up in the table below:** - -⟶ - -
- -**30. [Training on k−1 folds and assessment on the remaining one, Training on n−p observations and assessment on the p remaining ones]** - -⟶ - -
- -**31. [Generally k=5 or 10, Case p=1 is called leave-one-out]** - -⟶ - -
- -**32. The most commonly used method is called k-fold cross-validation and splits the training data into k folds to validate the model on one fold while training the model on the k−1 other folds, all of this k times. The error is then averaged over the k folds and is named cross-validation error.** - -⟶ - -
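A self-contained sketch of k-fold cross-validation; the least-squares model and the mean-squared-error metric below are placeholders for whichever model and error the reader cares about:

```python
import numpy as np

def k_fold_cv_error(X, y, fit, predict, k=5, seed=0):
    """Average validation error over k folds: train on k-1 folds, evaluate on the held-out one."""
    idx = np.random.default_rng(seed).permutation(len(X))
    folds = np.array_split(idx, k)
    errors = []
    for i in range(k):
        val = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        model = fit(X[train], y[train])
        errors.append(np.mean((predict(model, X[val]) - y[val]) ** 2))
    return float(np.mean(errors))

def fit(X, y):       # placeholder model: ordinary least squares
    return np.linalg.lstsq(X, y, rcond=None)[0]

def predict(w, X):
    return X @ w

rng = np.random.default_rng(7)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=100)
print(k_fold_cv_error(X, y, fit, predict, k=5))
```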
- -**33. Regularization ― The regularization procedure aims at avoiding the model to overfit the data and thus deals with high variance issues. The following table sums up the different types of commonly used regularization techniques:** - -⟶ - -
- -**34. [Shrinks coefficients to 0, Good for variable selection, Makes coefficients smaller, Tradeoff between variable selection and small coefficients]** - -⟶ - -
- -**35. Diagnostics** - -⟶ - -
- -**36. Bias ― The bias of a model is the difference between the expected prediction and the correct model that we try to predict for given data points.** - -⟶ - -
- -**37. Variance ― The variance of a model is the variability of the model prediction for given data points.** - -⟶ - -
- -**38. Bias/variance tradeoff ― The simpler the model, the higher the bias, and the more complex the model, the higher the variance.** - -⟶ - -
- -**39. [Symptoms, Regression illustration, classification illustration, deep learning illustration, possible remedies]** - -⟶ - -
- -**40. [High training error, Training error close to test error, High bias, Training error slightly lower than test error, Very low training error, Training error much lower than test error, High variance]** - -⟶ - -
- -**41. [Complexify model, Add more features, Train longer, Perform regularization, Get more data]** - -⟶ - -
- -**42. Error analysis ― Error analysis is analyzing the root cause of the difference in performance between the current and the perfect models.** - -⟶ - -
- -**43. Ablative analysis ― Ablative analysis is analyzing the root cause of the difference in performance between the current and the baseline models.** - -⟶ - -
- -**44. Regression metrics** - -⟶ - -
- -**45. [Classification metrics, confusion matrix, accuracy, precision, recall, F1 score, ROC]** - -⟶ - -
- -**46. [Regression metrics, R squared, Mallow's CP, AIC, BIC]** - -⟶ - -
- -**47. [Model selection, cross-validation, regularization]** - -⟶ - -
- -**48. [Diagnostics, Bias/variance tradeoff, error/ablative analysis]** - -⟶ diff --git a/ru/cheatsheet-supervised-learning.md b/ru/cheatsheet-supervised-learning.md deleted file mode 100644 index a6b19ea1c..000000000 --- a/ru/cheatsheet-supervised-learning.md +++ /dev/null @@ -1,567 +0,0 @@ -**1. Supervised Learning cheatsheet** - -⟶ - -
- -**2. Introduction to Supervised Learning** - -⟶ - -
- -**3. Given a set of data points {x(1),...,x(m)} associated to a set of outcomes {y(1),...,y(m)}, we want to build a classifier that learns how to predict y from x.** - -⟶ - -
- -**4. Type of prediction ― The different types of predictive models are summed up in the table below:** - -⟶ - -
- -**5. [Regression, Classifier, Outcome, Examples]** - -⟶ - -
- -**6. [Continuous, Class, Linear regression, Logistic regression, SVM, Naive Bayes]** - -⟶ - -
- -**7. Type of model ― The different models are summed up in the table below:** - -⟶ - -
- -**8. [Discriminative model, Generative model, Goal, What's learned, Illustration, Examples]** - -⟶ - -
- -**9. [Directly estimate P(y|x), Estimate P(x|y) to then deduce P(y|x), Decision boundary, Probability distributions of the data, Regressions, SVMs, GDA, Naive Bayes]** - -⟶ - -
- -**10. Notations and general concepts** - -⟶ - -
- -**11. Hypothesis ― The hypothesis is noted hθ and is the model that we choose. For a given input data x(i) the model prediction output is hθ(x(i)).** - -⟶ - -
- -**12. Loss function ― A loss function is a function L:(z,y)∈R×Y⟼L(z,y)∈R that takes as inputs the predicted value z corresponding to the real data value y and outputs how different they are. The common loss functions are summed up in the table below:** - -⟶ - -
- -**13. [Least squared error, Logistic loss, Hinge loss, Cross-entropy]** - -⟶ - -
- -**14. [Linear regression, Logistic regression, SVM, Neural Network]** - -⟶ - -
- -**15. Cost function ― The cost function J is commonly used to assess the performance of a model, and is defined with the loss function L as follows:** - -⟶ - -
- -**16. Gradient descent ― By noting α∈R the learning rate, the update rule for gradient descent is expressed with the learning rate and the cost function J as follows:** - -⟶ - -
- -**17. Remark: Stochastic gradient descent (SGD) is updating the parameter based on each training example, and batch gradient descent is on a batch of training examples.** - -⟶ - -
- -**18. Likelihood ― The likelihood of a model L(θ) given parameters θ is used to find the optimal parameters θ through maximizing the likelihood. In practice, we use the log-likelihood ℓ(θ)=log(L(θ)) which is easier to optimize. We have:** - -⟶ - -
- -**19. Newton's algorithm ― The Newton's algorithm is a numerical method that finds θ such that ℓ′(θ)=0. Its update rule is as follows:** - -⟶ - -
- -**20. Remark: the multidimensional generalization, also known as the Newton-Raphson method, has the following update rule:** - -⟶ - -
- -**21. Linear models** - -⟶ - -
- -**22. Linear regression** - -⟶ - -
- -**23. We assume here that y|x;θ∼N(μ,σ2)** - -⟶ - -
- -**24. Normal equations ― By noting X the design matrix, the value of θ that minimizes the cost function is a closed-form solution such that:** - -⟶ - -<br>
- -**25. LMS algorithm ― By noting α the learning rate, the update rule of the Least Mean Squares (LMS) algorithm for a training set of m data points, which is also known as the Widrow-Hoff learning rule, is as follows:** - -⟶ - -
- -**26. Remark: the update rule is a particular case of the gradient ascent.** - -⟶ - -
- -**27. LWR ― Locally Weighted Regression, also known as LWR, is a variant of linear regression that weights each training example in its cost function by w(i)(x), which is defined with parameter τ∈R as:** - -⟶ - -
- -**28. Classification and logistic regression** - -⟶ - -
- -**29. Sigmoid function ― The sigmoid function g, also known as the logistic function, is defined as follows:** - -⟶ - -
- -**30. Logistic regression ― We assume here that y|x;θ∼Bernoulli(ϕ). We have the following form:** - -⟶ - -
- -**31. Remark: there is no closed form solution for the case of logistic regressions.** - -⟶ - -
- -**32. Softmax regression ― A softmax regression, also called a multiclass logistic regression, is used to generalize logistic regression when there are more than 2 outcome classes. By convention, we set θK=0, which makes the Bernoulli parameter ϕi of each class i equal to:** - -⟶ - -
- -**33. Generalized Linear Models** - -⟶ - -
- -**34. Exponential family ― A class of distributions is said to be in the exponential family if it can be written in terms of a natural parameter, also called the canonical parameter or link function, η, a sufficient statistic T(y) and a log-partition function a(η) as follows:** - -⟶ - -
- -**35. Remark: we will often have T(y)=y. Also, exp(−a(η)) can be seen as a normalization parameter that will make sure that the probabilities sum to one.** - -⟶ - -
- -**36. Here are the most common exponential distributions summed up in the following table:** - -⟶ - -
- -**37. [Distribution, Bernoulli, Gaussian, Poisson, Geometric]** - -⟶ - -
- -**38. Assumptions of GLMs ― Generalized Linear Models (GLM) aim at predicting a random variable y as a function of x∈Rn+1 and rely on the following 3 assumptions:** - -⟶ - -<br>
- -**39. Remark: ordinary least squares and logistic regression are special cases of generalized linear models.** - -⟶ - -
- -**40. Support Vector Machines** - -⟶ - -
- -**41: The goal of support vector machines is to find the line that maximizes the minimum distance to the line.** - -⟶ - -
- -**42: Optimal margin classifier ― The optimal margin classifier h is such that:** - -⟶ - -
- -**43: where (w,b)∈Rn×R is the solution of the following optimization problem:** - -⟶ - -
- -**44. such that** - -⟶ - -
- -**45. support vectors** - -⟶ - -
- -**46. Remark: the line is defined as wTx−b=0.** - -⟶ - -
- -**47. Hinge loss ― The hinge loss is used in the setting of SVMs and is defined as follows:** - -⟶ - -
- -**48. Kernel ― Given a feature mapping ϕ, we define the kernel K to be defined as:** - -⟶ - -
- -**49. In practice, the kernel K defined by K(x,z)=exp(−||x−z||22σ2) is called the Gaussian kernel and is commonly used.** - -⟶ - -
- -**50. [Non-linear separability, Use of a kernel mapping, Decision boundary in the original space]** - -⟶ - -
- -**51. Remark: we say that we use the "kernel trick" to compute the cost function using the kernel because we actually don't need to know the explicit mapping ϕ, which is often very complicated. Instead, only the values K(x,z) are needed.** - -⟶ - -
- -**52. Lagrangian ― We define the Lagrangian L(w,b) as follows:** - -⟶ - -
- -**53. Remark: the coefficients βi are called the Lagrange multipliers.** - -⟶ - -
- -**54. Generative Learning** - -⟶ - -
- -**55. A generative model first tries to learn how the data is generated by estimating P(x|y), which we can then use to estimate P(y|x) by using Bayes' rule.** - -⟶ - -
- -**56. Gaussian Discriminant Analysis** - -⟶ - -
- -**57. Setting ― The Gaussian Discriminant Analysis assumes that y and x|y=0 and x|y=1 are such that:** - -⟶ - -
- -**58. Estimation ― The following table sums up the estimates that we find when maximizing the likelihood:** - -⟶ - -
- -**59. Naive Bayes** - -⟶ - -
- -**60. Assumption ― The Naive Bayes model supposes that the features of each data point are all independent:** - -⟶ - -
- -**61. Solutions ― Maximizing the log-likelihood gives the following solutions, with k∈{0,1},l∈[[1,L]]** - -⟶ - -
- -**62. Remark: Naive Bayes is widely used for text classification and spam detection.** - -⟶ - -
- -**63. Tree-based and ensemble methods** - -⟶ - -
- -**64. These methods can be used for both regression and classification problems.** - -⟶ - -
- -**65. CART ― Classification and Regression Trees (CART), commonly known as decision trees, can be represented as binary trees. They have the advantage to be very interpretable.** - -⟶ - -
- -**66. Random forest ― It is a tree-based technique that uses a high number of decision trees built out of randomly selected sets of features. Contrary to the simple decision tree, it is highly uninterpretable but its generally good performance makes it a popular algorithm.** - -⟶ - -
- -**67. Remark: random forests are a type of ensemble methods.** - -⟶ - -
- -**68. Boosting ― The idea of boosting methods is to combine several weak learners to form a stronger one. The main ones are summed up in the table below:** - -⟶ - -
- -**69. [Adaptive boosting, Gradient boosting]** - -⟶ - -
- -**70. High weights are put on errors to improve at the next boosting step** - -⟶ - -
- -**71. Weak learners trained on remaining errors** - -⟶ - -
- -**72. Other non-parametric approaches** - -⟶ - -
- -**73. k-nearest neighbors ― The k-nearest neighbors algorithm, commonly known as k-NN, is a non-parametric approach where the response of a data point is determined by the nature of its k neighbors from the training set. It can be used in both classification and regression settings.** - -⟶ - -
- -**74. Remark: The higher the parameter k, the higher the bias, and the lower the parameter k, the higher the variance.** - -⟶ - -
- -**75. Learning Theory** - -⟶ - -
- -**76. Union bound ― Let A1,...,Ak be k events. We have:** - -⟶ - -
- -**77. Hoeffding inequality ― Let Z1,..,Zm be m iid variables drawn from a Bernoulli distribution of parameter ϕ. Let ˆϕ be their sample mean and γ>0 fixed. We have:** - -⟶ - -
- -**78. Remark: this inequality is also known as the Chernoff bound.** - -⟶ - -
- -**79. Training error ― For a given classifier h, we define the training error ˆϵ(h), also known as the empirical risk or empirical error, to be as follows:** - -⟶ - -
- -**80. Probably Approximately Correct (PAC) ― PAC is a framework under which numerous results on learning theory were proved, and has the following set of assumptions: ** - -⟶ - -
- -**81: the training and testing sets follow the same distribution ** - -⟶ - -
- -**82. the training examples are drawn independently** - -⟶ - -
- -**83. Shattering ― Given a set S={x(1),...,x(d)}, and a set of classifiers H, we say that H shatters S if for any set of labels {y(1),...,y(d)}, we have:** - -⟶ - -
- -**84. Upper bound theorem ― Let H be a finite hypothesis class such that |H|=k and let δ and the sample size m be fixed. Then, with probability of at least 1−δ, we have:** - -⟶ - -
- -**85. VC dimension ― The Vapnik-Chervonenkis (VC) dimension of a given infinite hypothesis class H, noted VC(H) is the size of the largest set that is shattered by H.** - -⟶ - -
- -**86. Remark: the VC dimension of H={set of linear classifiers in 2 dimensions} is 3.** - -⟶ - -
- -**87. Theorem (Vapnik) ― Let H be given, with VC(H)=d and m the number of training examples. With probability at least 1−δ, we have:** - -⟶ - -
- -**88. [Introduction, Type of prediction, Type of model]** - -⟶ - -
- -**89. [Notations and general concepts, loss function, gradient descent, likelihood]** - -⟶ - -
- -**90. [Linear models, linear regression, logistic regression, generalized linear models]** - -⟶ - -
- -**91. [Support vector machines, Optimal margin classifier, Hinge loss, Kernel]** - -⟶ - -
- -**92. [Generative learning, Gaussian Discriminant Analysis, Naive Bayes]** - -⟶ - -
- -**93. [Trees and ensemble methods, CART, Random forest, Boosting]** - -⟶ - -
- -**94. [Other methods, k-NN]** - -⟶ - -
- -**95. [Learning theory, Hoeffding inequality, PAC, VC dimension]** - -⟶ diff --git a/ru/cheatsheet-unsupervised-learning.md b/ru/cheatsheet-unsupervised-learning.md deleted file mode 100644 index e18b3f50f..000000000 --- a/ru/cheatsheet-unsupervised-learning.md +++ /dev/null @@ -1,340 +0,0 @@ -**1. Unsupervised Learning cheatsheet** - -⟶ - -
- -**2. Introduction to Unsupervised Learning** - -⟶ - -
- -**3. Motivation ― The goal of unsupervised learning is to find hidden patterns in unlabeled data {x(1),...,x(m)}.** - -⟶ - -
- -**4. Jensen's inequality ― Let f be a convex function and X a random variable. We have the following inequality:** - -⟶ - -
- -**5. Clustering** - -⟶ - -
- -**6. Expectation-Maximization** - -⟶ - -
- -**7. Latent variables ― Latent variables are hidden/unobserved variables that make estimation problems difficult, and are often denoted z. Here are the most common settings where there are latent variables:** - -⟶ - -
- -**8. [Setting, Latent variable z, Comments]** - -⟶ - -
- -**9. [Mixture of k Gaussians, Factor analysis]** - -⟶ - -
- -**10. Algorithm ― The Expectation-Maximization (EM) algorithm gives an efficient method at estimating the parameter θ through maximum likelihood estimation by repeatedly constructing a lower-bound on the likelihood (E-step) and optimizing that lower bound (M-step) as follows:** - -⟶ - -
- -**11. E-step: Evaluate the posterior probability Qi(z(i)) that each data point x(i) came from a particular cluster z(i) as follows:** - -⟶ - -
- -**12. M-step: Use the posterior probabilities Qi(z(i)) as cluster specific weights on data points x(i) to separately re-estimate each cluster model as follows:** - -⟶ - -
- -**13. [Gaussians initialization, Expectation step, Maximization step, Convergence]** - -⟶ - -
- -**14. k-means clustering** - -⟶ - -
- -**15. We note c(i) the cluster of data point i and μj the center of cluster j.** - -⟶ - -
- -**16. Algorithm ― After randomly initializing the cluster centroids μ1,μ2,...,μk∈Rn, the k-means algorithm repeats the following step until convergence:** - -⟶ - -
- -**17. [Means initialization, Cluster assignment, Means update, Convergence]** - -⟶ - -
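A plain NumPy sketch of the two alternating k-means steps (cluster assignment, then centroid update); the initialisation scheme and the toy data are arbitrary:

```python
import numpy as np

def k_means(X, k, n_iter=100, seed=0):
    """Assign each point to its closest centroid, then move each centroid to its cluster mean."""
    rng = np.random.default_rng(seed)
    mu = X[rng.choice(len(X), size=k, replace=False)]  # random initialisation of the centroids
    for _ in range(n_iter):
        c = np.argmin(((X[:, None, :] - mu[None, :, :]) ** 2).sum(-1), axis=1)  # assignment step
        mu = np.array([X[c == j].mean(axis=0) if np.any(c == j) else mu[j]      # update step
                       for j in range(k)])
    return c, mu

rng = np.random.default_rng(8)
X = np.vstack([rng.normal(loc, 0.3, size=(50, 2)) for loc in ([0, 0], [3, 3], [0, 4])])
labels, centroids = k_means(X, k=3)
print(centroids)
```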
- -**18. Distortion function ― In order to see if the algorithm converges, we look at the distortion function defined as follows:** - -⟶ - -
- -**19. Hierarchical clustering** - -⟶ - -
- -**20. Algorithm ― It is a clustering algorithm with an agglomerative hierarchical approach that build nested clusters in a successive manner.** - -⟶ - -
- -**21. Types ― There are different sorts of hierarchical clustering algorithms that aims at optimizing different objective functions, which is summed up in the table below:** - -⟶ - -
- -**22. [Ward linkage, Average linkage, Complete linkage]** - -⟶ - -
- -**23. [Minimize within cluster distance, Minimize average distance between cluster pairs, Minimize maximum distance of between cluster pairs]** - -⟶ - -
- -**24. Clustering assessment metrics** - -⟶ - -
- -**25. In an unsupervised learning setting, it is often hard to assess the performance of a model since we don't have the ground truth labels as was the case in the supervised learning setting.** - -⟶ - -
- -**26. Silhouette coefficient ― By noting a and b the mean distance between a sample and all other points in the same class, and between a sample and all other points in the next nearest cluster, the silhouette coefficient s for a single sample is defined as follows:** - -⟶ - -
- -**27. Calinski-Harabaz index ― By noting k the number of clusters, Bk and Wk the between and within-clustering dispersion matrices respectively defined as** - -⟶ - -
- -**28. the Calinski-Harabaz index s(k) indicates how well a clustering model defines its clusters, such that the higher the score, the more dense and well separated the clusters are. It is defined as follows:** - -⟶ - -
- -**29. Dimension reduction** - -⟶ - -
- -**30. Principal component analysis** - -⟶ - -
- -**31. It is a dimension reduction technique that finds the variance maximizing directions onto which to project the data.** - -⟶ - -
- -**32. Eigenvalue, eigenvector ― Given a matrix A∈Rn×n, λ is said to be an eigenvalue of A if there exists a vector z∈Rn∖{0}, called eigenvector, such that we have:** - -⟶ - -
- -**33. Spectral theorem ― Let A∈Rn×n. If A is symmetric, then A is diagonalizable by a real orthogonal matrix U∈Rn×n. By noting Λ=diag(λ1,...,λn), we have:** - -⟶ - -
- -**34. diagonal** - -⟶ - -
- -**35. Remark: the eigenvector associated with the largest eigenvalue is called principal eigenvector of matrix A.** - -⟶ - -
- -**36. Algorithm ― The Principal Component Analysis (PCA) procedure is a dimension reduction technique that projects the data on k -dimensions by maximizing the variance of the data as follows:** - -⟶ - -
- -**37. Step 1: Normalize the data to have a mean of 0 and standard deviation of 1.** - -⟶ - -
- -**38. Step 2: Compute Σ=1mm∑i=1x(i)x(i)T∈Rn×n, which is symmetric with real eigenvalues.** - -⟶ - -
- -**39. Step 3: Compute u1,...,uk∈Rn the k orthogonal principal eigenvectors of Σ, i.e. the orthogonal eigenvectors of the k largest eigenvalues.** - -⟶ - -
- -**40. Step 4: Project the data on spanR(u1,...,uk).** - -⟶ - -
- -**41. This procedure maximizes the variance among all k-dimensional spaces.** - -⟶ - -
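A short sketch of the four PCA steps above, using the eigendecomposition of the empirical covariance matrix; the data and dimensions are toy choices:

```python
import numpy as np

def pca(X, k):
    """Steps 1-4: standardise, form Σ, take the top-k eigenvectors, project."""
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)      # step 1: mean 0, standard deviation 1
    sigma = (Xs.T @ Xs) / len(Xs)                  # step 2: Σ, symmetric with real eigenvalues
    eigvals, eigvecs = np.linalg.eigh(sigma)
    U = eigvecs[:, np.argsort(eigvals)[::-1][:k]]  # step 3: top-k principal eigenvectors
    return Xs @ U                                  # step 4: projection on span(u1, ..., uk)

rng = np.random.default_rng(9)
X = rng.normal(size=(200, 5)) @ rng.normal(size=(5, 5))
print(pca(X, k=2).shape)  # (200, 2)
```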
- -**42. [Data in feature space, Find principal components, Data in principal components space]** - -⟶ - -
- -**43. Independent component analysis** - -⟶ - -
- -**44. It is a technique meant to find the underlying generating sources.** - -⟶ - -
- -**45. Assumptions ― We assume that our data x has been generated by the n-dimensional source vector s=(s1,...,sn), where si are independent random variables, via a mixing and non-singular matrix A as follows:** - -⟶ - -
- -**46. The goal is to find the unmixing matrix W=A−1.** - -⟶ - -
- -**47. Bell and Sejnowski ICA algorithm ― This algorithm finds the unmixing matrix W by following the steps below:** - -⟶ - -
- -**48. Write the probability of x=As=W−1s as:** - -⟶ - -
- -**49. Write the log likelihood given our training data {x(i),i∈[[1,m]]} and by noting g the sigmoid function as:** - -⟶ - -
- -**50. Therefore, the stochastic gradient ascent learning rule is such that for each training example x(i), we update W as follows:** - -⟶ - -
- -**51. The Machine Learning cheatsheets are now available in Russian.** - -⟶ - -
- -**52. Original authors** - -⟶ - -
- -**53. Translated by X, Y and Z** - -⟶ - -
- -**54. Reviewed by X, Y and Z** - -⟶ - -
- -**55. [Introduction, Motivation, Jensen's inequality]** - -⟶ - -
- -**56. [Clustering, Expectation-Maximization, k-means, Hierarchical clustering, Metrics]** - -⟶ - -
- -**57. [Dimension reduction, PCA, ICA]** - -⟶ diff --git a/ru/refresher-linear-algebra.md b/ru/refresher-linear-algebra.md deleted file mode 100644 index a6b440d1e..000000000 --- a/ru/refresher-linear-algebra.md +++ /dev/null @@ -1,339 +0,0 @@ -**1. Linear Algebra and Calculus refresher** - -⟶ - -
- -**2. General notations** - -⟶ - -
- -**3. Definitions** - -⟶ - -
- -**4. Vector ― We note x∈Rn a vector with n entries, where xi∈R is the ith entry:** - -⟶ - -
- -**5. Matrix ― We note A∈Rm×n a matrix with m rows and n columns, where Ai,j∈R is the entry located in the ith row and jth column:** - -⟶ - -
- -**6. Remark: the vector x defined above can be viewed as a n×1 matrix and is more particularly called a column-vector.** - -⟶ - -
- -**7. Main matrices** - -⟶ - -
- -**8. Identity matrix ― The identity matrix I∈Rn×n is a square matrix with ones in its diagonal and zero everywhere else:** - -⟶ - -
- -**9. Remark: for all matrices A∈Rn×n, we have A×I=I×A=A.** - -⟶ - -
- -**10. Diagonal matrix ― A diagonal matrix D∈Rn×n is a square matrix with nonzero values in its diagonal and zero everywhere else:** - -⟶ - -
- -**11. Remark: we also note D as diag(d1,...,dn).** - -⟶ - -
- -**12. Matrix operations** - -⟶ - -
- -**13. Multiplication** - -⟶ - -
- -**14. Vector-vector ― There are two types of vector-vector products:** - -⟶ - -
- -**15. inner product: for x,y∈Rn, we have:** - -⟶ - -
- -**16. outer product: for x∈Rm,y∈Rn, we have:** - -⟶ - -
- -**17. Matrix-vector ― The product of matrix A∈Rm×n and vector x∈Rn is a vector of size Rm, such that:** - -⟶ - -<br>
- -**18. where aTr,i are the vector rows and ac,j are the vector columns of A, and xi are the entries of x.** - -⟶ - -
- -**19. Matrix-matrix ― The product of matrices A∈Rm×n and B∈Rn×p is a matrix of size Rm×p, such that:** - -⟶ - -<br>
- -**20. where aTr,i,bTr,i are the vector rows and ac,j,bc,j are the vector columns of A and B respectively** - -⟶ - -
- -**21. Other operations** - -⟶ - -
- -**22. Transpose ― The transpose of a matrix A∈Rm×n, noted AT, is such that its entries are flipped:** - -⟶ - -
- -**23. Remark: for matrices A,B, we have (AB)T=BTAT** - -⟶ - -
- -**24. Inverse ― The inverse of an invertible square matrix A is noted A−1 and is the only matrix such that:** - -⟶ - -
- -**25. Remark: not all square matrices are invertible. Also, for matrices A,B, we have (AB)−1=B−1A−1** - -⟶ - -
- -**26. Trace ― The trace of a square matrix A, noted tr(A), is the sum of its diagonal entries:** - -⟶ - -
- -**27. Remark: for matrices A,B, we have tr(AT)=tr(A) and tr(AB)=tr(BA)** - -⟶ - -
- -**28. Determinant ― The determinant of a square matrix A∈Rn×n, noted |A| or det(A) is expressed recursively in terms of A∖i,∖j, which is the matrix A without its ith row and jth column, as follows:** - -⟶ - -
- -**29. Remark: A is invertible if and only if |A|≠0. Also, |AB|=|A||B| and |AT|=|A|.** - -⟶ - -
- -**30. Matrix properties** - -⟶ - -
- -**31. Definitions** - -⟶ - -
- -**32. Symmetric decomposition ― A given matrix A can be expressed in terms of its symmetric and antisymmetric parts as follows:** - -⟶ - -
- -**33. [Symmetric, Antisymmetric]** - -⟶ - -
- -**34. Norm ― A norm is a function N:V⟶[0,+∞[ where V is a vector space, and such that for all x,y∈V, we have:** - -⟶ - -
- -**35. N(ax)=|a|N(x) for a scalar** - -⟶ - -
- -**36. if N(x)=0, then x=0** - -⟶ - -
- -**37. For x∈V, the most commonly used norms are summed up in the table below:** - -⟶ - -
- -**38. [Norm, Notation, Definition, Use case]** - -⟶ - -
- -**39. Linear dependence ― A set of vectors is said to be linearly dependent if one of the vectors in the set can be defined as a linear combination of the others.** - -⟶ - -<br>
- -**40. Remark: if no vector can be written this way, then the vectors are said to be linearly independent** - -⟶ - -
- -**41. Matrix rank ― The rank of a given matrix A is noted rank(A) and is the dimension of the vector space generated by its columns. This is equivalent to the maximum number of linearly independent columns of A.** - -⟶ - -
- -**42. Positive semi-definite matrix ― A matrix A∈Rn×n is positive semi-definite (PSD) and is noted A⪰0 if we have:** - -⟶ - -
- -**43. Remark: similarly, a matrix A is said to be positive definite, and is noted A≻0, if it is a PSD matrix which satisfies for all non-zero vector x, xTAx>0.** - -⟶ - -
- -**44. Eigenvalue, eigenvector ― Given a matrix A∈Rn×n, λ is said to be an eigenvalue of A if there exists a vector z∈Rn∖{0}, called eigenvector, such that we have:** - -⟶ - -
- -**45. Spectral theorem ― Let A∈Rn×n. If A is symmetric, then A is diagonalizable by a real orthogonal matrix U∈Rn×n. By noting Λ=diag(λ1,...,λn), we have:** - -⟶ - -
- -**46. diagonal** - -⟶ - -
- -**47. Singular-value decomposition ― For a given matrix A of dimensions m×n, the singular-value decomposition (SVD) is a factorization technique that guarantees the existence of U m×m unitary, Σ m×n diagonal and V n×n unitary matrices, such that:** - -⟶ - -
- -**48. Matrix calculus** - -⟶ - -
- -**49. Gradient ― Let f:Rm×n→R be a function and A∈Rm×n be a matrix. The gradient of f with respect to A is a m×n matrix, noted ∇Af(A), such that:** - -⟶ - -
- -**50. Remark: the gradient of f is only defined when f is a function that returns a scalar.** - -⟶ - -
- -**51. Hessian ― Let f:Rn→R be a function and x∈Rn be a vector. The hessian of f with respect to x is a n×n symmetric matrix, noted ∇2xf(x), such that:** - -⟶ - -
- -**52. Remark: the hessian of f is only defined when f is a function that returns a scalar** - -⟶ - -
- -**53. Gradient operations ― For matrices A,B,C, the following gradient properties are worth having in mind:** - -⟶ - -
- -**54. [General notations, Definitions, Main matrices]** - -⟶ - -
- -**55. [Matrix operations, Multiplication, Other operations]** - -⟶ - -
- -**56. [Matrix properties, Norm, Eigenvalue/Eigenvector, Singular-value decomposition]** - -⟶ - -
- -**57. [Matrix calculus, Gradient, Hessian, Operations]** - -⟶ diff --git a/ru/refresher-probability.md b/ru/refresher-probability.md deleted file mode 100644 index 5c9b34656..000000000 --- a/ru/refresher-probability.md +++ /dev/null @@ -1,381 +0,0 @@ -**1. Probabilities and Statistics refresher** - -⟶ - -
- -**2. Introduction to Probability and Combinatorics** - -⟶ - -
- -**3. Sample space ― The set of all possible outcomes of an experiment is known as the sample space of the experiment and is denoted by S.** - -⟶ - -
- -**4. Event ― Any subset E of the sample space is known as an event. That is, an event is a set consisting of possible outcomes of the experiment. If the outcome of the experiment is contained in E, then we say that E has occurred.** - -⟶ - -
- -**5. Axioms of probability For each event E, we denote P(E) as the probability of event E occuring.** - -⟶ - -
- -**6. Axiom 1 ― Every probability is between 0 and 1 included, i.e:** - -⟶ - -
- -**7. Axiom 2 ― The probability that at least one of the elementary events in the entire sample space will occur is 1, i.e:** - -⟶ - -
- -**8. Axiom 3 ― For any sequence of mutually exclusive events E1,...,En, we have:** - -⟶ - -
- -**9. Permutation ― A permutation is an arrangement of r objects from a pool of n objects, in a given order. The number of such arrangements is given by P(n,r), defined as:** - -⟶ - -
- -**10. Combination ― A combination is an arrangement of r objects from a pool of n objects, where the order does not matter. The number of such arrangements is given by C(n,r), defined as:** - -⟶ - -
- -**11. Remark: we note that for 0⩽r⩽n, we have P(n,r)⩾C(n,r)** - -⟶ - -
- -**12. Conditional Probability** - -⟶ - -
- -**13. Bayes' rule ― For events A and B such that P(B)>0, we have:** - -⟶ - -
- -**14. Remark: we have P(A∩B)=P(A)P(B|A)=P(A|B)P(B)** - -⟶ - -
- -**15. Partition ― Let {Ai,i∈[[1,n]]} be such that for all i, Ai≠∅. We say that {Ai} is a partition if we have:** - -⟶ - -
- -**16. Remark: for any event B in the sample space, we have P(B)=n∑i=1P(B|Ai)P(Ai).** - -⟶ - -
- -**17. Extended form of Bayes' rule ― Let {Ai,i∈[[1,n]]} be a partition of the sample space. We have:** - -⟶ - -
- -**18. Independence ― Two events A and B are independent if and only if we have:** - -⟶ - -
- -**19. Random Variables** - -⟶ - -
- -**20. Definitions** - -⟶ - -
- -**21. Random variable ― A random variable, often noted X, is a function that maps every element in a sample space to a real line.** - -⟶ - -
- -**22. Cumulative distribution function (CDF) ― The cumulative distribution function F, which is monotonically non-decreasing and is such that limx→−∞F(x)=0 and limx→+∞F(x)=1, is defined as:** - -⟶ - -
-**23. Remark: we have P(a<X⩽b)=F(b)−F(a)** - -⟶ - -<br> - -**24. Probability density function (PDF) ― The probability density function f is the probability that X takes on values between two adjacent realizations of the random variable.** - -⟶ - -<br>
- -**25. Relationships involving the PDF and CDF ― Here are the important properties to know in the discrete (D) and the continuous (C) cases.** - -⟶ - -
- -**26. [Case, CDF F, PDF f, Properties of PDF]** - -⟶ - -
- -**27. Expectation and Moments of the Distribution ― Here are the expressions of the expected value E[X], generalized expected value E[g(X)], kth moment E[Xk] and characteristic function ψ(ω) for the discrete and continuous cases:** - -⟶ - -
- -**28. Variance ― The variance of a random variable, often noted Var(X) or σ2, is a measure of the spread of its distribution function. It is determined as follows:** - -⟶ - -
- -**29. Standard deviation ― The standard deviation of a random variable, often noted σ, is a measure of the spread of its distribution function which is compatible with the units of the actual random variable. It is determined as follows:** - -⟶ - -
- -**30. Transformation of random variables ― Let the variables X and Y be linked by some function. By noting fX and fY the distribution function of X and Y respectively, we have:** - -⟶ - -
- -**31. Leibniz integral rule ― Let g be a function of x and potentially c, and a,b boundaries that may depend on c. We have:** - -⟶ - -
- -**32. Probability Distributions** - -⟶ - -
- -**33. Chebyshev's inequality ― Let X be a random variable with expected value μ. For k,σ>0, we have the following inequality:** - -⟶ - -
- -**34. Main distributions ― Here are the main distributions to have in mind:** - -⟶ - -
- -**35. [Type, Distribution]** - -⟶ - -
- -**36. Jointly Distributed Random Variables** - -⟶ - -
- -**37. Marginal density and cumulative distribution ― From the joint density probability function fXY , we have** - -⟶ - -
- -**38. [Case, Marginal density, Cumulative function]** - -⟶ - -
- -**39. Conditional density ― The conditional density of X with respect to Y, often noted fX|Y, is defined as follows:** - -⟶ - -
- -**40. Independence ― Two random variables X and Y are said to be independent if we have:** - -⟶ - -
- -**41. Covariance ― We define the covariance of two random variables X and Y, that we note σ2XY or more commonly Cov(X,Y), as follows:** - -⟶ - -
- -**42. Correlation ― By noting σX,σY the standard deviations of X and Y, we define the correlation between the random variables X and Y, noted ρXY, as follows:** - -⟶ - -
- -**43. Remark 1: we note that for any random variables X,Y, we have ρXY∈[−1,1].** - -⟶ - -
- -**44. Remark 2: If X and Y are independent, then ρXY=0.** - -⟶ - -
- -**45. Parameter estimation** - -⟶ - -
- -**46. Definitions** - -⟶ - -
- -**47. Random sample ― A random sample is a collection of n random variables X1,...,Xn that are independent and identically distributed with X.** - -⟶ - -
- -**48. Estimator ― An estimator is a function of the data that is used to infer the value of an unknown parameter in a statistical model.** - -⟶ - -
- -**49. Bias ― The bias of an estimator ^θ is defined as being the difference between the expected value of the distribution of ^θ and the true value, i.e.:** - -⟶ - -
- -**50. Remark: an estimator is said to be unbiased when we have E[^θ]=θ.** - -⟶ - -
- -**51. Estimating the mean** - -⟶ - -
- -**52. Sample mean ― The sample mean of a random sample is used to estimate the true mean μ of a distribution, is often noted ¯¯¯¯¯X and is defined as follows:** - -⟶ - -
- -**53. Remark: the sample mean is unbiased, i.e E[¯¯¯¯¯X]=μ.** - -⟶ - -
- -**54. Central Limit Theorem ― Let us have a random sample X1,...,Xn following a given distribution with mean μ and variance σ2, then we have:** - -⟶ - -
- -**55. Estimating the variance** - -⟶ - -
- -**56. Sample variance ― The sample variance of a random sample is used to estimate the true variance σ2 of a distribution, is often noted s2 or ^σ2 and is defined as follows:** - -⟶ - -
- -**57. Remark: the sample variance is unbiased, i.e E[s2]=σ2.** - -⟶ - -
- -**58. Chi-Squared relation with sample variance ― Let s2 be the sample variance of a random sample. We have:** - -⟶ - -
- -**59. [Introduction, Sample space, Event, Permutation]** - -⟶ - -
- -**60. [Conditional probability, Bayes' rule, Independence]** - -⟶ - -
- -**61. [Random variables, Definitions, Expectation, Variance]** - -⟶ - -
- -**62. [Probability distributions, Chebyshev's inequality, Main distributions]** - -⟶ - -
- -**63. [Jointly distributed random variables, Density, Covariance, Correlation]** - -⟶ - -
- -**64. [Parameter estimation, Mean, Variance]** - -⟶ diff --git a/template/cheatsheet-deep-learning.md b/template/cheatsheet-deep-learning.md deleted file mode 100644 index a5aa3756c..000000000 --- a/template/cheatsheet-deep-learning.md +++ /dev/null @@ -1,321 +0,0 @@ -**1. Deep Learning cheatsheet** - -⟶ - -
- -**2. Neural Networks** - -⟶ - -
- -**3. Neural networks are a class of models that are built with layers. Commonly used types of neural networks include convolutional and recurrent neural networks.** - -⟶ - -
- -**4. Architecture ― The vocabulary around neural networks architectures is described in the figure below:** - -⟶ - -
- -**5. [Input layer, hidden layer, output layer]** - -⟶ - -
- -**6. By noting i the ith layer of the network and j the jth hidden unit of the layer, we have:** - -⟶ - -
- -**7. where we note w, b, z the weight, bias and output respectively.** - -⟶ - -
- -**8. Activation function ― Activation functions are used at the end of a hidden unit to introduce non-linear complexities to the model. Here are the most common ones:** - -⟶ - -
- -**9. [Sigmoid, Tanh, ReLU, Leaky ReLU]** - -⟶ - -
- -**10. Cross-entropy loss ― In the context of neural networks, the cross-entropy loss L(z,y) is commonly used and is defined as follows:** - -⟶ - -
- -**11. Learning rate ― The learning rate, often noted α or sometimes η, indicates at which pace the weights get updated. This can be fixed or adaptively changed. The current most popular method is called Adam, which is a method that adapts the learning rate.** - -⟶ - -
- -**12. Backpropagation ― Backpropagation is a method to update the weights in the neural network by taking into account the actual output and the desired output. The derivative with respect to weight w is computed using chain rule and is of the following form:** - -⟶ - -
- -**13. As a result, the weight is updated as follows:** - -⟶ - -
- -**14. Updating weights ― In a neural network, weights are updated as follows:** - -⟶ - -
- -**15. Step 1: Take a batch of training data.** - -⟶ - -
- -**16. Step 2: Perform forward propagation to obtain the corresponding loss.** - -⟶ - -
- -**17. Step 3: Backpropagate the loss to get the gradients.** - -⟶ - -
- -**18. Step 4: Use the gradients to update the weights of the network.** - -⟶ - -
- -**19. Dropout ― Dropout is a technique meant at preventing overfitting the training data by dropping out units in a neural network. In practice, neurons are either dropped with probability p or kept with probability 1−p** - -⟶ - -
- -**20. Convolutional Neural Networks** - -⟶ - -
- -**21. Convolutional layer requirement ― By noting W the input volume size, F the size of the convolutional layer neurons, P the amount of zero padding, then the number of neurons N that fit in a given volume is such that:** - -⟶ - -
- -**22. Batch normalization ― It is a step of hyperparameter γ,β that normalizes the batch {xi}. By noting μB,σ2B the mean and variance of that we want to correct to the batch, it is done as follows:** - -⟶ - -
- -**23. It is usually done after a fully connected/convolutional layer and before a non-linearity layer and aims at allowing higher learning rates and reducing the strong dependence on initialization.** - -⟶ - -
- -**24. Recurrent Neural Networks** - -⟶ - -
- -**25. Types of gates ― Here are the different types of gates that we encounter in a typical recurrent neural network:** - -⟶ - -
- -**26. [Input gate, forget gate, gate, output gate]** - -⟶ - -
- -**27. [Write to cell or not?, Erase a cell or not?, How much to write to cell?, How much to reveal cell?]** - -⟶ - -
- -**28. LSTM ― A long short-term memory (LSTM) network is a type of RNN model that avoids the vanishing gradient problem by adding 'forget' gates.** - -⟶ - -
- -**29. Reinforcement Learning and Control** - -⟶ - -
- -**30. The goal of reinforcement learning is for an agent to learn how to evolve in an environment.** - -⟶ - -
- -**31. Definitions** - -⟶ - -
- -**32. Markov decision processes ― A Markov decision process (MDP) is a 5-tuple (S,A,{Psa},γ,R) where:** - -⟶ - -
- -**33. S is the set of states** - -⟶ - -
- -**34. A is the set of actions** - -⟶ - -
- -**35. {Psa} are the state transition probabilities for s∈S and a∈A** - -⟶ - -
- -**36. γ∈[0,1[ is the discount factor** - -⟶ - -
- -**37. R:S×A⟶R or R:S⟶R is the reward function that the algorithm wants to maximize** - -⟶ - -
- -**38. Policy ― A policy π is a function π:S⟶A that maps states to actions.** - -⟶ - -
- -**39. Remark: we say that we execute a given policy π if given a state s we take the action a=π(s).** - -⟶ - -
- -**40. Value function ― For a given policy π and a given state s, we define the value function Vπ as follows:** - -⟶ - -
- -**41. Bellman equation ― The optimal Bellman equations characterizes the value function Vπ∗ of the optimal policy π∗:** - -⟶ - -
- -**42. Remark: we note that the optimal policy π∗ for a given state s is such that:** - -⟶ - -
- -**43. Value iteration algorithm ― The value iteration algorithm is in two steps:** - -⟶ - -
- -**44. 1) We initialize the value:** - -⟶ - -
- -**45. 2) We iterate the value based on the values before:** - -⟶ - -
- -**46. Maximum likelihood estimate ― The maximum likelihood estimates for the state transition probabilities are as follows:** - -⟶ - -
- -**47. times took action a in state s and got to s′** - -⟶ - -
- -**48. times took action a in state s** - -⟶ - -
- -**49. Q-learning ― Q-learning is a model-free estimation of Q, which is done as follows:** - -⟶ - -
- -**50. View PDF version on GitHub** - -⟶ - -
- -**51. [Neural Networks, Architecture, Activation function, Backpropagation, Dropout]** - -⟶ - -
- -**52. [Convolutional Neural Networks, Convolutional layer, Batch normalization]** - -⟶ - -
- -**53. [Recurrent Neural Networks, Gates, LSTM]** - -⟶ - -
- -**54. [Reinforcement learning, Markov decision processes, Value/policy iteration, Approximate dynamic programming, Policy search]** - -⟶ diff --git a/template/cheatsheet-machine-learning-tips-and-tricks.md b/template/cheatsheet-machine-learning-tips-and-tricks.md deleted file mode 100644 index 9712297b8..000000000 --- a/template/cheatsheet-machine-learning-tips-and-tricks.md +++ /dev/null @@ -1,285 +0,0 @@ -**1. Machine Learning tips and tricks cheatsheet** - -⟶ - -
- -**2. Classification metrics** - -⟶ - -
- -**3. In a context of a binary classification, here are the main metrics that are important to track in order to assess the performance of the model.** - -⟶ - -
- -**4. Confusion matrix ― The confusion matrix is used to have a more complete picture when assessing the performance of a model. It is defined as follows:** - -⟶ - -
- -**5. [Predicted class, Actual class]** - -⟶ - -
- -**6. Main metrics ― The following metrics are commonly used to assess the performance of classification models:** - -⟶ - -
- -**7. [Metric, Formula, Interpretation]** - -⟶ - -
- -**8. Overall performance of model** - -⟶ - -
- -**9. How accurate the positive predictions are** - -⟶ - -
- -**10. Coverage of actual positive sample** - -⟶ - -
- -**11. Coverage of actual negative sample** - -⟶ - -
- -**12. Hybrid metric useful for unbalanced classes** - -⟶ - -
- -**13. ROC ― The receiver operating curve, also noted ROC, is the plot of TPR versus FPR by varying the threshold. These metrics are summed up in the table below:** - -⟶ - -<br>
- -**14. [Metric, Formula, Equivalent]** - -⟶ - -
- -**15. AUC ― The area under the receiving operating curve, also noted AUC or AUROC, is the area below the ROC as shown in the following figure:** - -⟶ - -
- -**16. [Actual, Predicted]** - -⟶ - -
- -**17. Basic metrics ― Given a regression model f, the following metrics are commonly used to assess the performance of the model:** - -⟶ - -
- -**18. [Total sum of squares, Explained sum of squares, Residual sum of squares]** - -⟶ - -
- -**19. Coefficient of determination ― The coefficient of determination, often noted R2 or r2, provides a measure of how well the observed outcomes are replicated by the model and is defined as follows:** - -⟶ - -
- -**20. Main metrics ― The following metrics are commonly used to assess the performance of regression models, by taking into account the number of variables n that they take into consideration:** - -⟶ - -
- -**21. where L is the likelihood and ˆσ2 is an estimate of the variance associated with each response.** - -⟶ - -
- -**22. Model selection** - -⟶ - -
- -**23. Vocabulary ― When selecting a model, we distinguish 3 different parts of the data that we have as follows:** - -⟶ - -
- -**24. [Training set, Validation set, Testing set]** - -⟶ - -
- -**25. [Model is trained, Model is assessed, Model gives predictions]** - -⟶ - -
- -**26. [Usually 80% of the dataset, Usually 20% of the dataset]** - -⟶ - -
- -**27. [Also called hold-out or development set, Unseen data]** - -⟶ - -
- -**28. Once the model has been chosen, it is trained on the entire dataset and tested on the unseen test set. These are represented in the figure below:** - -⟶ - -
- -**29. Cross-validation ― Cross-validation, also noted CV, is a method that is used to select a model that does not rely too much on the initial training set. The different types are summed up in the table below:** - -⟶ - -
- -**30. [Training on k−1 folds and assessment on the remaining one, Training on n−p observations and assessment on the p remaining ones]** - -⟶ - -
- -**31. [Generally k=5 or 10, Case p=1 is called leave-one-out]** - -⟶ - -
- -**32. The most commonly used method is called k-fold cross-validation and splits the training data into k folds to validate the model on one fold while training the model on the k−1 other folds, all of this k times. The error is then averaged over the k folds and is named cross-validation error.** - -⟶ - -
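To make the procedure concrete, here is a minimal NumPy sketch of k-fold cross-validation; the least-squares model and the synthetic data are illustrative choices, not part of the original text:

```python
import numpy as np

def k_fold_cv_error(X, y, k=5, seed=0):
    """Estimate the cross-validation error of a least-squares model with k folds."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    folds = np.array_split(idx, k)
    errors = []
    for i in range(k):
        val = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        # Train on the k-1 remaining folds (closed-form least squares).
        w, *_ = np.linalg.lstsq(X[train], y[train], rcond=None)
        # Validate on the held-out fold.
        errors.append(np.mean((X[val] @ w - y[val]) ** 2))
    return np.mean(errors)  # cross-validation error, averaged over the k folds

X = np.random.randn(100, 3)
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * np.random.randn(100)
print(k_fold_cv_error(X, y, k=5))
```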
- -**33. Regularization ― The regularization procedure aims at avoiding the model to overfit the data and thus deals with high variance issues. The following table sums up the different types of commonly used regularization techniques:** - -⟶ - -
- -**34. [Shrinks coefficients to 0, Good for variable selection, Makes coefficients smaller, Tradeoff between variable selection and small coefficients]** - -⟶ - -
- -**35. Diagnostics** - -⟶ - -
- -**36. Bias ― The bias of a model is the difference between the expected prediction and the correct model that we try to predict for given data points.** - -⟶ - -
- -**37. Variance ― The variance of a model is the variability of the model prediction for given data points.** - -⟶ - -
- -**38. Bias/variance tradeoff ― The simpler the model, the higher the bias, and the more complex the model, the higher the variance.** - -⟶ - -
- -**39. [Symptoms, Regression illustration, classification illustration, deep learning illustration, possible remedies]** - -⟶ - -
- -**40. [High training error, Training error close to test error, High bias, Training error slightly lower than test error, Very low training error, Training error much lower than test error, High variance]** - -⟶ - -
- -**41. [Complexify model, Add more features, Train longer, Perform regularization, Get more data]** - -⟶ - -
- -**42. Error analysis ― Error analysis is analyzing the root cause of the difference in performance between the current and the perfect models.** - -⟶ - -
- -**43. Ablative analysis ― Ablative analysis is analyzing the root cause of the difference in performance between the current and the baseline models.** - -⟶ - -
- -**44. Regression metrics** - -⟶ - -
- -**45. [Classification metrics, confusion matrix, accuracy, precision, recall, F1 score, ROC]** - -⟶ - -
- -**46. [Regression metrics, R squared, Mallow's CP, AIC, BIC]** - -⟶ - -
- -**47. [Model selection, cross-validation, regularization]** - -⟶ - -
- -**48. [Diagnostics, Bias/variance tradeoff, error/ablative analysis]** - -⟶ diff --git a/template/cheatsheet-supervised-learning.md b/template/cheatsheet-supervised-learning.md deleted file mode 100644 index a6b19ea1c..000000000 --- a/template/cheatsheet-supervised-learning.md +++ /dev/null @@ -1,567 +0,0 @@ -**1. Supervised Learning cheatsheet** - -⟶ - -
- -**2. Introduction to Supervised Learning** - -⟶ - -
- -**3. Given a set of data points {x(1),...,x(m)} associated to a set of outcomes {y(1),...,y(m)}, we want to build a classifier that learns how to predict y from x.** - -⟶ - -
- -**4. Type of prediction ― The different types of predictive models are summed up in the table below:** - -⟶ - -
- -**5. [Regression, Classifier, Outcome, Examples]** - -⟶ - -
- -**6. [Continuous, Class, Linear regression, Logistic regression, SVM, Naive Bayes]** - -⟶ - -
- -**7. Type of model ― The different models are summed up in the table below:** - -⟶ - -
- -**8. [Discriminative model, Generative model, Goal, What's learned, Illustration, Examples]** - -⟶ - -
- -**9. [Directly estimate P(y|x), Estimate P(x|y) to then deduce P(y|x), Decision boundary, Probability distributions of the data, Regressions, SVMs, GDA, Naive Bayes]** - -⟶ - -
- -**10. Notations and general concepts** - -⟶ - -
- -**11. Hypothesis ― The hypothesis is noted hθ and is the model that we choose. For a given input data x(i) the model prediction output is hθ(x(i)).** - -⟶ - -
- -**12. Loss function ― A loss function is a function L:(z,y)∈R×Y⟼L(z,y)∈R that takes as inputs the predicted value z corresponding to the real data value y and outputs how different they are. The common loss functions are summed up in the table below:** - -⟶ - -
- -**13. [Least squared error, Logistic loss, Hinge loss, Cross-entropy]** - -⟶ - -
- -**14. [Linear regression, Logistic regression, SVM, Neural Network]** - -⟶ - -
- -**15. Cost function ― The cost function J is commonly used to assess the performance of a model, and is defined with the loss function L as follows:** - -⟶ - -
- -**16. Gradient descent ― By noting α∈R the learning rate, the update rule for gradient descent is expressed with the learning rate and the cost function J as follows:** - -⟶ - -
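As a minimal illustration of this update rule, the following NumPy sketch runs batch gradient descent on the least-squares cost; the learning rate and synthetic data are illustrative:

```python
import numpy as np

def gradient_descent(X, y, alpha=0.1, n_iters=500):
    """theta <- theta - alpha * grad J(theta), with J the mean squared error."""
    theta = np.zeros(X.shape[1])
    m = len(y)
    for _ in range(n_iters):
        grad = (X.T @ (X @ theta - y)) / m  # gradient of the cost J
        theta -= alpha * grad               # gradient descent update rule
    return theta

X = np.c_[np.ones(50), np.random.randn(50, 2)]
y = X @ np.array([0.5, 2.0, -1.0])
print(gradient_descent(X, y))
```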
- -**17. Remark: Stochastic gradient descent (SGD) is updating the parameter based on each training example, and batch gradient descent is on a batch of training examples.** - -⟶ - -
- -**18. Likelihood ― The likelihood of a model L(θ) given parameters θ is used to find the optimal parameters θ through maximizing the likelihood. In practice, we use the log-likelihood ℓ(θ)=log(L(θ)) which is easier to optimize. We have:** - -⟶ - -
- -**19. Newton's algorithm ― The Newton's algorithm is a numerical method that finds θ such that ℓ′(θ)=0. Its update rule is as follows:** - -⟶ - -
- -**20. Remark: the multidimensional generalization, also known as the Newton-Raphson method, has the following update rule:** - -⟶ - -
- -**21. Linear models** - -⟶ - -
- -**22. Linear regression** - -⟶ - -
- -**23. We assume here that y|x;θ∼N(μ,σ2)** - -⟶ - -
-
-**24. Normal equations ― By noting X the design matrix, the value of θ that minimizes the cost function is a closed-form solution such that:**
-
-⟶
-
-<br>
- -**25. LMS algorithm ― By noting α the learning rate, the update rule of the Least Mean Squares (LMS) algorithm for a training set of m data points, which is also known as the Widrow-Hoff learning rule, is as follows:** - -⟶ - -
- -**26. Remark: the update rule is a particular case of the gradient ascent.** - -⟶ - -
- -**27. LWR ― Locally Weighted Regression, also known as LWR, is a variant of linear regression that weights each training example in its cost function by w(i)(x), which is defined with parameter τ∈R as:** - -⟶ - -
- -**28. Classification and logistic regression** - -⟶ - -
- -**29. Sigmoid function ― The sigmoid function g, also known as the logistic function, is defined as follows:** - -⟶ - -
- -**30. Logistic regression ― We assume here that y|x;θ∼Bernoulli(ϕ). We have the following form:** - -⟶ - -
- -**31. Remark: there is no closed form solution for the case of logistic regressions.** - -⟶ - -
- -**32. Softmax regression ― A softmax regression, also called a multiclass logistic regression, is used to generalize logistic regression when there are more than 2 outcome classes. By convention, we set θK=0, which makes the Bernoulli parameter ϕi of each class i equal to:** - -⟶ - -
- -**33. Generalized Linear Models** - -⟶ - -
- -**34. Exponential family ― A class of distributions is said to be in the exponential family if it can be written in terms of a natural parameter, also called the canonical parameter or link function, η, a sufficient statistic T(y) and a log-partition function a(η) as follows:** - -⟶ - -
- -**35. Remark: we will often have T(y)=y. Also, exp(−a(η)) can be seen as a normalization parameter that will make sure that the probabilities sum to one.** - -⟶ - -
- -**36. Here are the most common exponential distributions summed up in the following table:** - -⟶ - -
- -**37. [Distribution, Bernoulli, Gaussian, Poisson, Geometric]** - -⟶ - -
-
-**38. Assumptions of GLMs ― Generalized Linear Models (GLM) aim at predicting a random variable y as a function of x∈Rn+1 and rely on the following 3 assumptions:**
-
-⟶
-
-<br>
- -**39. Remark: ordinary least squares and logistic regression are special cases of generalized linear models.** - -⟶ - -
- -**40. Support Vector Machines** - -⟶ - -
- -**41: The goal of support vector machines is to find the line that maximizes the minimum distance to the line.** - -⟶ - -
- -**42: Optimal margin classifier ― The optimal margin classifier h is such that:** - -⟶ - -
- -**43: where (w,b)∈Rn×R is the solution of the following optimization problem:** - -⟶ - -
- -**44. such that** - -⟶ - -
- -**45. support vectors** - -⟶ - -
- -**46. Remark: the line is defined as wTx−b=0.** - -⟶ - -
- -**47. Hinge loss ― The hinge loss is used in the setting of SVMs and is defined as follows:** - -⟶ - -
- -**48. Kernel ― Given a feature mapping ϕ, we define the kernel K to be defined as:** - -⟶ - -
-
-**49. In practice, the kernel K defined by K(x,z)=exp(−||x−z||2/(2σ2)) is called the Gaussian kernel and is commonly used.**
-
-⟶
-
-<br>
- -**50. [Non-linear separability, Use of a kernel mapping, Decision boundary in the original space]** - -⟶ - -
- -**51. Remark: we say that we use the "kernel trick" to compute the cost function using the kernel because we actually don't need to know the explicit mapping ϕ, which is often very complicated. Instead, only the values K(x,z) are needed.** - -⟶ - -
- -**52. Lagrangian ― We define the Lagrangian L(w,b) as follows:** - -⟶ - -
- -**53. Remark: the coefficients βi are called the Lagrange multipliers.** - -⟶ - -
- -**54. Generative Learning** - -⟶ - -
- -**55. A generative model first tries to learn how the data is generated by estimating P(x|y), which we can then use to estimate P(y|x) by using Bayes' rule.** - -⟶ - -
- -**56. Gaussian Discriminant Analysis** - -⟶ - -
- -**57. Setting ― The Gaussian Discriminant Analysis assumes that y and x|y=0 and x|y=1 are such that:** - -⟶ - -
- -**58. Estimation ― The following table sums up the estimates that we find when maximizing the likelihood:** - -⟶ - -
- -**59. Naive Bayes** - -⟶ - -
- -**60. Assumption ― The Naive Bayes model supposes that the features of each data point are all independent:** - -⟶ - -
- -**61. Solutions ― Maximizing the log-likelihood gives the following solutions, with k∈{0,1},l∈[[1,L]]** - -⟶ - -
- -**62. Remark: Naive Bayes is widely used for text classification and spam detection.** - -⟶ - -
- -**63. Tree-based and ensemble methods** - -⟶ - -
- -**64. These methods can be used for both regression and classification problems.** - -⟶ - -
- -**65. CART ― Classification and Regression Trees (CART), commonly known as decision trees, can be represented as binary trees. They have the advantage to be very interpretable.** - -⟶ - -
- -**66. Random forest ― It is a tree-based technique that uses a high number of decision trees built out of randomly selected sets of features. Contrary to the simple decision tree, it is highly uninterpretable but its generally good performance makes it a popular algorithm.** - -⟶ - -
- -**67. Remark: random forests are a type of ensemble methods.** - -⟶ - -
- -**68. Boosting ― The idea of boosting methods is to combine several weak learners to form a stronger one. The main ones are summed up in the table below:** - -⟶ - -
- -**69. [Adaptive boosting, Gradient boosting]** - -⟶ - -
- -**70. High weights are put on errors to improve at the next boosting step** - -⟶ - -
- -**71. Weak learners trained on remaining errors** - -⟶ - -
- -**72. Other non-parametric approaches** - -⟶ - -
- -**73. k-nearest neighbors ― The k-nearest neighbors algorithm, commonly known as k-NN, is a non-parametric approach where the response of a data point is determined by the nature of its k neighbors from the training set. It can be used in both classification and regression settings.** - -⟶ - -
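A short sketch of k-NN classification with Euclidean distance and a majority vote; the helper name and toy data are ours:

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x, k=5):
    """Predict the label of x by majority vote among its k nearest training points."""
    dists = np.linalg.norm(X_train - x, axis=1)   # Euclidean distances to all points
    nearest = np.argsort(dists)[:k]               # indices of the k nearest neighbors
    return Counter(y_train[nearest]).most_common(1)[0][0]

X_train = np.array([[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [0.9, 1.1]])
y_train = np.array([0, 0, 1, 1])
print(knn_predict(X_train, y_train, np.array([0.8, 0.9]), k=3))  # -> 1
```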
- -**74. Remark: The higher the parameter k, the higher the bias, and the lower the parameter k, the higher the variance.** - -⟶ - -
- -**75. Learning Theory** - -⟶ - -
- -**76. Union bound ― Let A1,...,Ak be k events. We have:** - -⟶ - -
- -**77. Hoeffding inequality ― Let Z1,..,Zm be m iid variables drawn from a Bernoulli distribution of parameter ϕ. Let ˆϕ be their sample mean and γ>0 fixed. We have:** - -⟶ - -
- -**78. Remark: this inequality is also known as the Chernoff bound.** - -⟶ - -
- -**79. Training error ― For a given classifier h, we define the training error ˆϵ(h), also known as the empirical risk or empirical error, to be as follows:** - -⟶ - -
- -**80. Probably Approximately Correct (PAC) ― PAC is a framework under which numerous results on learning theory were proved, and has the following set of assumptions: ** - -⟶ - -
- -**81: the training and testing sets follow the same distribution ** - -⟶ - -
- -**82. the training examples are drawn independently** - -⟶ - -
- -**83. Shattering ― Given a set S={x(1),...,x(d)}, and a set of classifiers H, we say that H shatters S if for any set of labels {y(1),...,y(d)}, we have:** - -⟶ - -
- -**84. Upper bound theorem ― Let H be a finite hypothesis class such that |H|=k and let δ and the sample size m be fixed. Then, with probability of at least 1−δ, we have:** - -⟶ - -
- -**85. VC dimension ― The Vapnik-Chervonenkis (VC) dimension of a given infinite hypothesis class H, noted VC(H) is the size of the largest set that is shattered by H.** - -⟶ - -
- -**86. Remark: the VC dimension of H={set of linear classifiers in 2 dimensions} is 3.** - -⟶ - -
- -**87. Theorem (Vapnik) ― Let H be given, with VC(H)=d and m the number of training examples. With probability at least 1−δ, we have:** - -⟶ - -
- -**88. [Introduction, Type of prediction, Type of model]** - -⟶ - -
- -**89. [Notations and general concepts, loss function, gradient descent, likelihood]** - -⟶ - -
- -**90. [Linear models, linear regression, logistic regression, generalized linear models]** - -⟶ - -
- -**91. [Support vector machines, Optimal margin classifier, Hinge loss, Kernel]** - -⟶ - -
- -**92. [Generative learning, Gaussian Discriminant Analysis, Naive Bayes]** - -⟶ - -
- -**93. [Trees and ensemble methods, CART, Random forest, Boosting]** - -⟶ - -
- -**94. [Other methods, k-NN]** - -⟶ - -
- -**95. [Learning theory, Hoeffding inequality, PAC, VC dimension]** - -⟶ diff --git a/template/cheatsheet-unsupervised-learning.md b/template/cheatsheet-unsupervised-learning.md deleted file mode 100644 index 827d815a3..000000000 --- a/template/cheatsheet-unsupervised-learning.md +++ /dev/null @@ -1,340 +0,0 @@ -**1. Unsupervised Learning cheatsheet** - -⟶ - -
- -**2. Introduction to Unsupervised Learning** - -⟶ - -
- -**3. Motivation ― The goal of unsupervised learning is to find hidden patterns in unlabeled data {x(1),...,x(m)}.** - -⟶ - -
- -**4. Jensen's inequality ― Let f be a convex function and X a random variable. We have the following inequality:** - -⟶ - -
- -**5. Clustering** - -⟶ - -
- -**6. Expectation-Maximization** - -⟶ - -
- -**7. Latent variables ― Latent variables are hidden/unobserved variables that make estimation problems difficult, and are often denoted z. Here are the most common settings where there are latent variables:** - -⟶ - -
- -**8. [Setting, Latent variable z, Comments]** - -⟶ - -
- -**9. [Mixture of k Gaussians, Factor analysis]** - -⟶ - -
- -**10. Algorithm ― The Expectation-Maximization (EM) algorithm gives an efficient method at estimating the parameter θ through maximum likelihood estimation by repeatedly constructing a lower-bound on the likelihood (E-step) and optimizing that lower bound (M-step) as follows:** - -⟶ - -
- -**11. E-step: Evaluate the posterior probability Qi(z(i)) that each data point x(i) came from a particular cluster z(i) as follows:** - -⟶ - -
- -**12. M-step: Use the posterior probabilities Qi(z(i)) as cluster specific weights on data points x(i) to separately re-estimate each cluster model as follows:** - -⟶ - -
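Putting the E-step and M-step together, here is an illustrative sketch of EM for a 1-D mixture of Gaussians; the initialization scheme and variable names are assumptions of this example:

```python
import numpy as np

def em_gmm_1d(x, k=2, n_iters=100, seed=0):
    """EM for a 1-D mixture of k Gaussians (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    phi = np.full(k, 1.0 / k)                  # mixture weights
    mu = rng.choice(x, size=k, replace=False)  # random initialization of the means
    var = np.full(k, np.var(x))
    for _ in range(n_iters):
        # E-step: posterior probability that each data point came from each cluster.
        dens = np.exp(-0.5 * (x[:, None] - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)
        q = phi * dens
        q /= q.sum(axis=1, keepdims=True)
        # M-step: re-estimate each cluster with the posteriors as weights.
        nk = q.sum(axis=0)
        phi = nk / len(x)
        mu = (q * x[:, None]).sum(axis=0) / nk
        var = (q * (x[:, None] - mu) ** 2).sum(axis=0) / nk
    return phi, mu, var

x = np.concatenate([np.random.normal(-2, 1, 200), np.random.normal(3, 0.5, 200)])
print(em_gmm_1d(x))
```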
- -**13. [Gaussians initialization, Expectation step, Maximization step, Convergence]** - -⟶ - -
- -**14. k-means clustering** - -⟶ - -
- -**15. We note c(i) the cluster of data point i and μj the center of cluster j.** - -⟶ - -
- -**16. Algorithm ― After randomly initializing the cluster centroids μ1,μ2,...,μk∈Rn, the k-means algorithm repeats the following step until convergence:** - -⟶ - -
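The two alternating steps translate into a few lines of NumPy; this is an illustrative sketch, with the random initialization and toy data chosen for the example:

```python
import numpy as np

def k_means(X, k=2, n_iters=100, seed=0):
    """Lloyd's algorithm: alternate cluster assignment and centroid update."""
    rng = np.random.default_rng(seed)
    mu = X[rng.choice(len(X), size=k, replace=False)]   # random centroid initialization
    for _ in range(n_iters):
        # Cluster assignment: each point goes to its closest centroid.
        c = np.argmin(np.linalg.norm(X[:, None, :] - mu, axis=2), axis=1)
        # Means update: each centroid becomes the mean of its assigned points.
        mu = np.array([X[c == j].mean(axis=0) for j in range(k)])
    return c, mu

X = np.vstack([np.random.randn(50, 2), np.random.randn(50, 2) + 5])
c, mu = k_means(X, k=2)
print(mu)
```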
- -**17. [Means initialization, Cluster assignment, Means update, Convergence]** - -⟶ - -
- -**18. Distortion function ― In order to see if the algorithm converges, we look at the distortion function defined as follows:** - -⟶ - -
- -**19. Hierarchical clustering** - -⟶ - -
-
-**20. Algorithm ― It is a clustering algorithm with an agglomerative hierarchical approach that builds nested clusters in a successive manner.**
-
-⟶
-
-<br>
-
-**21. Types ― There are different sorts of hierarchical clustering algorithms that aim at optimizing different objective functions, which are summed up in the table below:**
-
-⟶
-
-<br>
- -**22. [Ward linkage, Average linkage, Complete linkage]** - -⟶ - -
-
-**23. [Minimize within cluster distance, Minimize average distance between cluster pairs, Minimize maximum distance between cluster pairs]**
-
-⟶
-
-<br>
- -**24. Clustering assessment metrics** - -⟶ - -
- -**25. In an unsupervised learning setting, it is often hard to assess the performance of a model since we don't have the ground truth labels as was the case in the supervised learning setting.** - -⟶ - -
- -**26. Silhouette coefficient ― By noting a and b the mean distance between a sample and all other points in the same class, and between a sample and all other points in the next nearest cluster, the silhouette coefficient s for a single sample is defined as follows:** - -⟶ - -
- -**27. Calinski-Harabaz index ― By noting k the number of clusters, Bk and Wk the between and within-clustering dispersion matrices respectively defined as** - -⟶ - -
- -**28. the Calinski-Harabaz index s(k) indicates how well a clustering model defines its clusters, such that the higher the score, the more dense and well separated the clusters are. It is defined as follows:** - -⟶ - -
- -**29. Dimension reduction** - -⟶ - -
- -**30. Principal component analysis** - -⟶ - -
- -**31. It is a dimension reduction technique that finds the variance maximizing directions onto which to project the data.** - -⟶ - -
- -**32. Eigenvalue, eigenvector ― Given a matrix A∈Rn×n, λ is said to be an eigenvalue of A if there exists a vector z∈Rn∖{0}, called eigenvector, such that we have:** - -⟶ - -
- -**33. Spectral theorem ― Let A∈Rn×n. If A is symmetric, then A is diagonalizable by a real orthogonal matrix U∈Rn×n. By noting Λ=diag(λ1,...,λn), we have:** - -⟶ - -
- -**34. diagonal** - -⟶ - -
- -**35. Remark: the eigenvector associated with the largest eigenvalue is called principal eigenvector of matrix A.** - -⟶ - -
- -**36. Algorithm ― The Principal Component Analysis (PCA) procedure is a dimension reduction technique that projects the data on k -dimensions by maximizing the variance of the data as follows:** - -⟶ - -
- -**37. Step 1: Normalize the data to have a mean of 0 and standard deviation of 1.** - -⟶ - -
-
-**38. Step 2: Compute Σ=(1/m)∑mi=1x(i)x(i)T∈Rn×n, which is symmetric with real eigenvalues.**
-
-⟶
-
-<br>
- -**39. Step 3: Compute u1,...,uk∈Rn the k orthogonal principal eigenvectors of Σ, i.e. the orthogonal eigenvectors of the k largest eigenvalues.** - -⟶ - -
- -**40. Step 4: Project the data on spanR(u1,...,uk).** - -⟶ - -
- -**41. This procedure maximizes the variance among all k-dimensional spaces.** - -⟶ - -
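The four steps above can be sketched directly with NumPy; this is illustrative, and np.linalg.eigh is used because Σ is symmetric:

```python
import numpy as np

def pca(X, k=2):
    """Project X (m x n) onto its k principal components."""
    # Step 1: normalize the data to mean 0 and standard deviation 1.
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    # Step 2: empirical covariance matrix, symmetric with real eigenvalues.
    sigma = Z.T @ Z / len(Z)
    # Step 3: eigenvectors of the k largest eigenvalues (eigh returns them in ascending order).
    eigvals, eigvecs = np.linalg.eigh(sigma)
    U = eigvecs[:, np.argsort(eigvals)[::-1][:k]]
    # Step 4: project the data on span(u1, ..., uk).
    return Z @ U

X = np.random.randn(200, 5) @ np.random.randn(5, 5)
print(pca(X, k=2).shape)  # (200, 2)
```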
- -**42. [Data in feature space, Find principal components, Data in principal components space]** - -⟶ - -
- -**43. Independent component analysis** - -⟶ - -
- -**44. It is a technique meant to find the underlying generating sources.** - -⟶ - -
- -**45. Assumptions ― We assume that our data x has been generated by the n-dimensional source vector s=(s1,...,sn), where si are independent random variables, via a mixing and non-singular matrix A as follows:** - -⟶ - -
- -**46. The goal is to find the unmixing matrix W=A−1.** - -⟶ - -
- -**47. Bell and Sejnowski ICA algorithm ― This algorithm finds the unmixing matrix W by following the steps below:** - -⟶ - -
- -**48. Write the probability of x=As=W−1s as:** - -⟶ - -
- -**49. Write the log likelihood given our training data {x(i),i∈[[1,m]]} and by noting g the sigmoid function as:** - -⟶ - -
- -**50. Therefore, the stochastic gradient ascent learning rule is such that for each training example x(i), we update W as follows:** - -⟶ - -
- -**51. The Machine Learning cheatsheets are now available in Japanese.** - -⟶ - -
- -**52. Original authors** - -⟶ - -
- -**53. Translated by X, Y and Z** - -⟶ - -
- -**54. Reviewed by X, Y and Z** - -⟶ - -
- -**55. [Introduction, Motivation, Jensen's inequality]** - -⟶ - -
- -**56. [Clustering, Expectation-Maximization, k-means, Hierarchical clustering, Metrics]** - -⟶ - -
- -**57. [Dimension reduction, PCA, ICA]** - -⟶ diff --git a/template/cs-221-logic-models.md b/template/cs-221-logic-models.md new file mode 100644 index 000000000..8be03acc4 --- /dev/null +++ b/template/cs-221-logic-models.md @@ -0,0 +1,462 @@ +**Logic-based models translation** [[webpage]](https://stanford.edu/~shervine/teaching/cs-221/cheatsheet-logic-models) + +
+ +**1. Logic-based models with propositional and first-order logic** + +⟶ + +
+ + +**2. Basics** + +⟶ + +
+ + +**3. Syntax of propositional logic ― By noting f,g formulas, and ¬,∧,∨,→,↔ connectives, we can write the following logical expressions:** + +⟶ + +
+ + +**4. [Name, Symbol, Meaning, Illustration]** + +⟶ + +
+ + +**5. [Affirmation, Negation, Conjunction, Disjunction, Implication, Biconditional]** + +⟶ + +
+ + +**6. [not f, f and g, f or g, if f then g, f, that is to say g]** + +⟶ + +
+ + +**7. Remark: formulas can be built up recursively out of these connectives.** + +⟶ + +
+ + +**8. Model ― A model w denotes an assignment of binary weights to propositional symbols.** + +⟶ + +
+ + +**9. Example: the set of truth values w={A:0,B:1,C:0} is one possible model to the propositional symbols A, B and C.** + +⟶ + +
+ + +**10. Interpretation function ― The interpretation function I(f,w) outputs whether model w satisfies formula f:** + +⟶ + +
+ + +**11. Set of models ― M(f) denotes the set of models w that satisfy formula f. Mathematically speaking, we define it as follows:** + +⟶ + +
+ + +**12. Knowledge base** + +⟶ + +
+ + +**13. Definition ― The knowledge base KB is the conjunction of all formulas that have been considered so far. The set of models of the knowledge base is the intersection of the set of models that satisfy each formula. In other words:** + +⟶ + +
+ + +**14. Probabilistic interpretation ― The probability that query f is evaluated to 1 can be seen as the proportion of models w of the knowledge base KB that satisfy f, i.e.:** + +⟶ + +
+ + +**15. Satisfiability ― The knowledge base KB is said to be satisfiable if at least one model w satisfies all its constraints. In other words:** + +⟶ + +
+ + +**16. satisfiable** + +⟶ + +
+ + +**17. Remark: M(KB) denotes the set of models compatible with all the constraints of the knowledge base.** + +⟶ + +
+ + +**18. Relation between formulas and knowledge base - We define the following properties between the knowledge base KB and a new formula f:** + +⟶ + +
+ + +**19. [Name, Mathematical formulation, Illustration, Notes]** + +⟶ + +
+ + +**20. [KB entails f, KB contradicts f, f contingent to KB]** + +⟶ + +
+ + +**21. [f does not bring any new information, Also written KB⊨f, No model satisfies the constraints after adding f, Equivalent to KB⊨¬f, f does not contradict KB, f adds a non-trivial amount of information to KB]** + +⟶ + +
+ + +**22. Model checking ― A model checking algorithm takes as input a knowledge base KB and outputs whether it is satisfiable or not.** + +⟶ + +
+ + +**23. Remark: popular model checking algorithms include DPLL and WalkSat.** + +⟶ + +
+ + +**24. Inference rule ― An inference rule of premises f1,...,fk and conclusion g is written:** + +⟶ + +
+ + +**25. Forward inference algorithm ― From a set of inference rules Rules, this algorithm goes through all possible f1,...,fk and adds g to the knowledge base KB if a matching rule exists. This process is repeated until no more additions can be made to KB.** + +⟶ + +
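A minimal sketch of the forward inference loop for propositional Horn clauses, where each rule is encoded as a (premises, conclusion) pair; the encoding is an assumption of this example:

```python
def forward_inference(facts, rules):
    """Repeatedly apply modus ponens until no more symbols can be added to KB."""
    kb = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            # If all premises f1, ..., fk are in KB, add the conclusion g.
            if set(premises) <= kb and conclusion not in kb:
                kb.add(conclusion)
                changed = True
    return kb

rules = [({"rain"}, "wet"), ({"wet", "cold"}, "ice")]
print(forward_inference({"rain", "cold"}, rules))  # {'rain', 'cold', 'wet', 'ice'}
```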
+ + +**26. Derivation ― We say that KB derives f (written KB⊢f) with rules Rules if f already is in KB or gets added during the forward inference algorithm using the set of rules Rules.** + +⟶ + +
+ + +**27. Properties of inference rules ― A set of inference rules Rules can have the following properties:** + +⟶ + +
+ + +**28. [Name, Mathematical formulation, Notes]** + +⟶ + +
+ + +**29. [Soundness, Completeness]** + +⟶ + +
+ + +**30. [Inferred formulas are entailed by KB, Can be checked one rule at a time, "Nothing but the truth", Formulas entailing KB are either already in the knowledge base or inferred from it, "The whole truth"]** + +⟶ + +
+ + +**31. Propositional logic** + +⟶ + +
+ + +**32. In this section, we will go through logic-based models that use logical formulas and inference rules. The idea here is to balance expressivity and computational efficiency.** + +⟶ + +
+ + +**33. Horn clause ― By noting p1,...,pk and q propositional symbols, a Horn clause has the form:** + +⟶ + +
+ + +**34. Remark: when q=false, it is called a "goal clause", otherwise we denote it as a "definite clause".** + +⟶ + +
+ + +**35. Modus ponens ― For propositional symbols f1,...,fk and p, the modus ponens rule is written:** + +⟶ + +
+
+
+**36. Remark: it takes linear time to apply this rule, as each application generates a clause that contains a single propositional symbol.**
+
+⟶
+
+<br>
+ + +**37. Completeness ― Modus ponens is complete with respect to Horn clauses if we suppose that KB contains only Horn clauses and p is an entailed propositional symbol. Applying modus ponens will then derive p.** + +⟶ + +
+ + +**38. Conjunctive normal form ― A conjunctive normal form (CNF) formula is a conjunction of clauses, where each clause is a disjunction of atomic formulas.** + +⟶ + +
+ + +**39. Remark: in other words, CNFs are ∧ of ∨.** + +⟶ + +
+ + +**40. Equivalent representation ― Every formula in propositional logic can be written into an equivalent CNF formula. The table below presents general conversion properties:** + +⟶ + +
+ + +**41. [Rule name, Initial, Converted, Eliminate, Distribute, over]** + +⟶ + +
+ + +**42. Resolution rule ― For propositional symbols f1,...,fn, and g1,...,gm as well as p, the resolution rule is written:** + +⟶ + +
+ + +**43. Remark: it can take exponential time to apply this rule, as each application generates a clause that has a subset of the propositional symbols.** + +⟶ + +
+ + +**44. [Resolution-based inference ― The resolution-based inference algorithm follows the following steps:, Step 1: Convert all formulas into CNF, Step 2: Repeatedly apply resolution rule, Step 3: Return unsatisfiable if and only if False, is derived]** + +⟶ + +
+ + +**45. First-order logic** + +⟶ + +
+ + +**46. The idea here is to use variables to yield more compact knowledge representations.** + +⟶ + +
+ + +**47. [Model ― A model w in first-order logic maps:, constant symbols to objects, predicate symbols to tuple of objects]** + +⟶ + +
+ + +**48. Horn clause ― By noting x1,...,xn variables and a1,...,ak,b atomic formulas, the first-order logic version of a horn clause has the form:** + +⟶ + +
+ + +**49. Substitution ― A substitution θ maps variables to terms and Subst[θ,f] denotes the result of substitution θ on f.** + +⟶ + +
+ + +**50. Unification ― Unification takes two formulas f and g and returns the most general substitution θ that makes them equal:** + +⟶ + +
+ + +**51. such that** + +⟶ + +
+ + +**52. Note: Unify[f,g] returns Fail if no such θ exists.** + +⟶ + +
+ + +**53. Modus ponens ― By noting x1,...,xn variables, a1,...,ak and a′1,...,a′k atomic formulas and by calling θ=Unify(a′1∧...∧a′k,a1∧...∧ak) the first-order logic version of modus ponens can be written:** + +⟶ + +
+ + +**54. Completeness ― Modus ponens is complete for first-order logic with only Horn clauses.** + +⟶ + +
+ + +**55. Resolution rule ― By noting f1,...,fn, g1,...,gm, p, q formulas and by calling θ=Unify(p,q), the first-order logic version of the resolution rule can be written:** + +⟶ + +
+ + +**56. [Semi-decidability ― First-order logic, even restricted to only Horn clauses, is semi-decidable., if KB⊨f, forward inference on complete inference rules will prove f in finite time, if KB⊭f, no algorithm can show this in finite time]** + +⟶ + +
+ + +**57. [Basics, Notations, Model, Interpretation function, Set of models]** + +⟶ + +
+ + +**58. [Knowledge base, Definition, Probabilistic interpretation, Satisfiability, Relationship with formulas, Forward inference, Rule properties]** + +⟶ + +
+ + +**59. [Propositional logic, Clauses, Modus ponens, Conjunctive normal form, Representation equivalence, Resolution]** + +⟶ + +
+ + +**60. [First-order logic, Substitution, Unification, Resolution rule, Modus ponens, Resolution, Semi-decidability]** + +⟶ + +
+ + +**61. View PDF version on GitHub** + +⟶ + +
+ + +**62. Original authors** + +⟶ + +
+ + +**63. Translated by X, Y and Z** + +⟶ + +
+ + +**64. Reviewed by X, Y and Z** + +⟶ + +
+ + +**65. By X and Y** + +⟶ + +
+ + +**66. The Artificial Intelligence cheatsheets are now available in [target language].** + +⟶ diff --git a/template/cs-221-reflex-models.md b/template/cs-221-reflex-models.md new file mode 100644 index 000000000..f64a380b0 --- /dev/null +++ b/template/cs-221-reflex-models.md @@ -0,0 +1,539 @@ +**Reflex-based models translation** [[webpage]](https://stanford.edu/~shervine/teaching/cs-221/cheatsheet-reflex-models) + +
+ +**1. Reflex-based models with Machine Learning** + +⟶ + +
+ + +**2. Linear predictors** + +⟶ + +
+ + +**3. In this section, we will go through reflex-based models that can improve with experience, by going through samples that have input-output pairs.** + +⟶ + +
+ + +**4. Feature vector ― The feature vector of an input x is noted ϕ(x) and is such that:** + +⟶ + +
+ + +**5. Score ― The score s(x,w) of an example (ϕ(x),y)∈Rd×R associated to a linear model of weights w∈Rd is given by the inner product:** + +⟶ + +
+ + +**6. Classification** + +⟶ + +
+ + +**7. Linear classifier ― Given a weight vector w∈Rd and a feature vector ϕ(x)∈Rd, the binary linear classifier fw is given by:** + +⟶ + +
+ + +**8. if** + +⟶ + +
+ + +**9. Margin ― The margin m(x,y,w)∈R of an example (ϕ(x),y)∈Rd×{−1,+1} associated to a linear model of weights w∈Rd quantifies the confidence of the prediction: larger values are better. It is given by:** + +⟶ + +
+ + +**10. Regression** + +⟶ + +
+ + +**11. Linear regression ― Given a weight vector w∈Rd and a feature vector ϕ(x)∈Rd, the output of a linear regression of weights w denoted as fw is given by:** + +⟶ + +
+ + +**12. Residual ― The residual res(x,y,w)∈R is defined as being the amount by which the prediction fw(x) overshoots the target y:** + +⟶ + +
+ + +**13. Loss minimization** + +⟶ + +
+ + +**14. Loss function ― A loss function Loss(x,y,w) quantifies how unhappy we are with the weights w of the model in the prediction task of output y from input x. It is a quantity we want to minimize during the training process.** + +⟶ + +
+ + +**15. Classification case - The classification of a sample x of true label y∈{−1,+1} with a linear model of weights w can be done with the predictor fw(x)≜sign(s(x,w)). In this situation, a metric of interest quantifying the quality of the classification is given by the margin m(x,y,w), and can be used with the following loss functions:** + +⟶ + +
+ + +**16. [Name, Illustration, Zero-one loss, Hinge loss, Logistic loss]** + +⟶ + +
+ + +**17. Regression case - The prediction of a sample x of true label y∈R with a linear model of weights w can be done with the predictor fw(x)≜s(x,w). In this situation, a metric of interest quantifying the quality of the regression is given by the margin res(x,y,w) and can be used with the following loss functions:** + +⟶ + +
+ + +**18. [Name, Squared loss, Absolute deviation loss, Illustration]** + +⟶ + +
+
+
+**19. Loss minimization framework ― In order to train a model, we want to minimize the training loss, which is defined as follows:**
+
+⟶
+
+<br>
+ + +**20. Non-linear predictors** + +⟶ + +
+ + +**21. k-nearest neighbors ― The k-nearest neighbors algorithm, commonly known as k-NN, is a non-parametric approach where the response of a data point is determined by the nature of its k neighbors from the training set. It can be used in both classification and regression settings.** + +⟶ + +
+ + +**22. Remark: the higher the parameter k, the higher the bias, and the lower the parameter k, the higher the variance.** + +⟶ + +
+ + +**23. Neural networks ― Neural networks are a class of models that are built with layers. Commonly used types of neural networks include convolutional and recurrent neural networks. The vocabulary around neural networks architectures is described in the figure below:** + +⟶ + +
+ + +**24. [Input layer, Hidden layer, Output layer]** + +⟶ + +
+ + +**25. By noting i the ith layer of the network and j the jth hidden unit of the layer, we have:** + +⟶ + +
+ + +**26. where we note w, b, x, z the weight, bias, input and non-activated output of the neuron respectively.** + +⟶ + +
+ + +**27. For a more detailed overview of the concepts above, check out the Supervised Learning cheatsheets!** + +⟶ + +
+ + +**28. Stochastic gradient descent** + +⟶ + +
+ + +**29. Gradient descent ― By noting η∈R the learning rate (also called step size), the update rule for gradient descent is expressed with the learning rate and the loss function Loss(x,y,w) as follows:** + +⟶ + +
+ + +**30. Stochastic updates ― Stochastic gradient descent (SGD) updates the parameters of the model one training example (ϕ(x),y)∈Dtrain at a time. This method leads to sometimes noisy, but fast updates.** + +⟶ + +
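As an illustration, the sketch below runs stochastic updates on the hinge loss for a binary linear classifier, one example (ϕ(x),y) at a time; the toy data and step size are ours:

```python
import numpy as np

def sgd_hinge(phi_x, y, eta=0.1, n_epochs=20):
    """SGD on the hinge loss max(0, 1 - margin) for a binary linear classifier."""
    w = np.zeros(phi_x.shape[1])
    for _ in range(n_epochs):
        for x_i, y_i in zip(phi_x, y):          # one training example at a time
            margin = (w @ x_i) * y_i
            if margin < 1:                       # non-zero subgradient of the hinge loss
                w += eta * y_i * x_i             # w <- w - eta * grad Loss
    return w

phi_x = np.array([[1.0, 2.0], [2.0, 1.0], [-1.0, -1.5], [-2.0, -1.0]])
y = np.array([1, 1, -1, -1])
w = sgd_hinge(phi_x, y)
print(np.sign(phi_x @ w))  # predictions fw(x) = sign(s(x, w))
```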
+ + +**31. Batch updates ― Batch gradient descent (BGD) updates the parameters of the model one batch of examples (e.g. the entire training set) at a time. This method computes stable update directions, at a greater computational cost.** + +⟶ + +
+ + +**32. Fine-tuning models** + +⟶ + +
+ + +**33. Hypothesis class ― A hypothesis class F is the set of possible predictors with a fixed ϕ(x) and varying w:** + +⟶ + +
+ + +**34. Logistic function ― The logistic function σ, also called the sigmoid function, is defined as:** + +⟶ + +
+ + +**35. Remark: we have σ′(z)=σ(z)(1−σ(z)).** + +⟶ + +
+ + +**36. Backpropagation ― The forward pass is done through fi, which is the value for the subexpression rooted at i, while the backward pass is done through gi=∂out∂fi and represents how fi influences the output.** + +⟶ + +
+ + +**37. Approximation and estimation error ― The approximation error ϵapprox represents how far the entire hypothesis class F is from the target predictor g∗, while the estimation error ϵest quantifies how good the predictor ^f is with respect to the best predictor f∗ of the hypothesis class F.** + +⟶ + +
+ + +**38. Regularization ― The regularization procedure aims at avoiding the model to overfit the data and thus deals with high variance issues. The following table sums up the different types of commonly used regularization techniques:** + +⟶ + +
+ + +**39. [Shrinks coefficients to 0, Good for variable selection, Makes coefficients smaller, Tradeoff between variable selection and small coefficients]** + +⟶ + +
+ + +**40. Hyperparameters ― Hyperparameters are the properties of the learning algorithm, and include features, regularization parameter λ, number of iterations T, step size η, etc.** + +⟶ + +
+ + +**41. Sets vocabulary ― When selecting a model, we distinguish 3 different parts of the data that we have as follows:** + +⟶ + +
+ + +**42. [Training set, Validation set, Testing set]** + +⟶ + +
+ + +**43. [Model is trained, Usually 80% of the dataset, Model is assessed, Usually 20% of the dataset, Also called hold-out or development set, Model gives predictions, Unseen data]** + +⟶ + +
+ + +**44. Once the model has been chosen, it is trained on the entire dataset and tested on the unseen test set. These are represented in the figure below:** + +⟶ + +
+ + +**45. [Dataset, Unseen data, train, validation, test]** + +⟶ + +
+ + +**46. For a more detailed overview of the concepts above, check out the Machine Learning tips and tricks cheatsheets!** + +⟶ + +
+ + +**47. Unsupervised Learning** + +⟶ + +
+
+
+**48. The class of unsupervised learning methods aims at discovering the structure of the data, which may have rich latent structures.**
+
+⟶
+
+<br>
+ + +**49. k-means** + +⟶ + +
+ + +**50. Clustering ― Given a training set of input points Dtrain, the goal of a clustering algorithm is to assign each point ϕ(xi) to a cluster zi∈{1,...,k}** + +⟶ + +
+ + +**51. Objective function ― The loss function for one of the main clustering algorithms, k-means, is given by:** + +⟶ + +
+ + +**52. Algorithm ― After randomly initializing the cluster centroids μ1,μ2,...,μk∈Rn, the k-means algorithm repeats the following step until convergence:** + +⟶ + +
+ + +**53. and** + +⟶ + +
+ + +**54. [Means initialization, Cluster assignment, Means update, Convergence]** + +⟶ + +
+ + +**55. Principal Component Analysis** + +⟶ + +
+ + +**56. Eigenvalue, eigenvector ― Given a matrix A∈Rn×n, λ is said to be an eigenvalue of A if there exists a vector z∈Rn∖{0}, called eigenvector, such that we have:** + +⟶ + +
+ + +**57. Spectral theorem ― Let A∈Rn×n. If A is symmetric, then A is diagonalizable by a real orthogonal matrix U∈Rn×n. By noting Λ=diag(λ1,...,λn), we have:** + +⟶ + +
+ + +**58. Remark: the eigenvector associated with the largest eigenvalue is called principal eigenvector of matrix A.** + +⟶ + +
+ + +**59. Algorithm ― The Principal Component Analysis (PCA) procedure is a dimension reduction technique that projects the data on k dimensions by maximizing the variance of the data as follows:** + +⟶ + +
+ + +**60. Step 1: Normalize the data to have a mean of 0 and standard deviation of 1.** + +⟶ + +
+ + +**61. [where, and]** + +⟶ + +
+
+
+**62. [Step 2: Compute Σ=(1/m)∑mi=1ϕ(xi)ϕ(xi)T∈Rn×n, which is symmetric with real eigenvalues., Step 3: Compute u1,...,uk∈Rn the k orthogonal principal eigenvectors of Σ, i.e. the orthogonal eigenvectors of the k largest eigenvalues., Step 4: Project the data on spanR(u1,...,uk).]**
+
+⟶
+
+<br>
+ + +**63. This procedure maximizes the variance among all k-dimensional spaces.** + +⟶ + +
+ + +**64. [Data in feature space, Find principal components, Data in principal components space]** + +⟶ + +
+ + +**65. For a more detailed overview of the concepts above, check out the Unsupervised Learning cheatsheets!** + +⟶ + +
+ + +**66. [Linear predictors, Feature vector, Linear classifier/regression, Margin]** + +⟶ + +
+ + +**67. [Loss minimization, Loss function, Framework]** + +⟶ + +
+ + +**68. [Non-linear predictors, k-nearest neighbors, Neural networks]** + +⟶ + +
+ + +**69. [Stochastic gradient descent, Gradient, Stochastic updates, Batch updates]** + +⟶ + +
+ + +**70. [Fine-tuning models, Hypothesis class, Backpropagation, Regularization, Sets vocabulary]** + +⟶ + +
+ + +**71. [Unsupervised Learning, k-means, Principal components analysis]** + +⟶ + +
+ + +**72. View PDF version on GitHub** + +⟶ + +
+ + +**73. Original authors** + +⟶ + +
+ + +**74. Translated by X, Y and Z** + +⟶ + +
+ + +**75. Reviewed by X, Y and Z** + +⟶ + +
+ + +**76. By X and Y** + +⟶ + +
+ + +**77. The Artificial Intelligence cheatsheets are now available in [target language].** + +⟶ diff --git a/template/cs-221-states-models.md b/template/cs-221-states-models.md new file mode 100644 index 000000000..e21270f89 --- /dev/null +++ b/template/cs-221-states-models.md @@ -0,0 +1,980 @@ +**States-based models translation** [[webpage]](https://stanford.edu/~shervine/teaching/cs-221/cheatsheet-states-models) + +
+ +**1. States-based models with search optimization and MDP** + +⟶ + +
+ + +**2. Search optimization** + +⟶ + +
+ + +**3. In this section, we assume that by accomplishing action a from state s, we deterministically arrive in state Succ(s,a). The goal here is to determine a sequence of actions (a1,a2,a3,a4,...) that starts from an initial state and leads to an end state. In order to solve this kind of problem, our objective will be to find the minimum cost path by using states-based models.** + +⟶ + +
+ + +**4. Tree search** + +⟶ + +
+ + +**5. This category of states-based algorithms explores all possible states and actions. It is quite memory efficient, and is suitable for huge state spaces but the runtime can become exponential in the worst cases.** + +⟶ + +
+ + +**6. [Self-loop, More than a parent, Cycle, More than a root, Valid tree]** + +⟶ + +
+ + +**7. [Search problem ― A search problem is defined with:, a starting state sstart, possible actions Actions(s) from state s, action cost Cost(s,a) from state s with action a, successor Succ(s,a) of state s after action a, whether an end state was reached IsEnd(s)]** + +⟶ + +
+ + +**8. The objective is to find a path that minimizes the cost.** + +⟶ + +
+ + +**9. Backtracking search ― Backtracking search is a naive recursive algorithm that tries all possibilities to find the minimum cost path. Here, action costs can be either positive or negative.** + +⟶ + +
+ + +**10. Breadth-first search (BFS) ― Breadth-first search is a graph search algorithm that does a level-by-level traversal. We can implement it iteratively with the help of a queue that stores at each step future nodes to be visited. For this algorithm, we can assume action costs to be equal to a constant c⩾0.** + +⟶ + +
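A minimal queue-based sketch of BFS that returns a shortest path under equal action costs; the graph encoding is illustrative:

```python
from collections import deque

def bfs(succ, s_start, is_end):
    """Level-by-level traversal; finds a minimum-length path when all costs equal c >= 0."""
    frontier = deque([s_start])
    parent = {s_start: None}
    while frontier:
        s = frontier.popleft()
        if is_end(s):
            path = []
            while s is not None:   # reconstruct the path back to the start state
                path.append(s)
                s = parent[s]
            return path[::-1]
        for s_next in succ(s):
            if s_next not in parent:   # not yet visited
                parent[s_next] = s
                frontier.append(s_next)
    return None

graph = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}
print(bfs(lambda s: graph[s], "A", lambda s: s == "D"))  # ['A', 'B', 'D']
```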
+ + +**11. Depth-first search (DFS) ― Depth-first search is a search algorithm that traverses a graph by following each path as deep as it can. We can implement it recursively, or iteratively with the help of a stack that stores at each step future nodes to be visited. For this algorithm, action costs are assumed to be equal to 0.** + +⟶ + +
+ + +**12. Iterative deepening ― The iterative deepening trick is a modification of the depth-first search algorithm so that it stops after reaching a certain depth, which guarantees optimality when all action costs are equal. Here, we assume that action costs are equal to a constant c⩾0.** + +⟶ + +
+ + +**13. Tree search algorithms summary ― By noting b the number of actions per state, d the solution depth, and D the maximum depth, we have:** + +⟶ + +
+ + +**14. [Algorithm, Action costs, Space, Time]** + +⟶ + +
+ + +**15. [Backtracking search, any, Breadth-first search, Depth-first search, DFS-Iterative deepening]** + +⟶ + +
+ + +**16. Graph search** + +⟶ + +
+ + +**17. This category of states-based algorithms aims at constructing optimal paths, enabling exponential savings. In this section, we will focus on dynamic programming and uniform cost search.** + +⟶ + +
+ + +**18. Graph ― A graph is comprised of a set of vertices V (also called nodes) as well as a set of edges E (also called links).** + +⟶ + +
+
+
+**19. Remark: a graph is said to be acyclic when there is no cycle.**
+
+⟶
+
+<br>
+ + +**20. State ― A state is a summary of all past actions sufficient to choose future actions optimally.** + +⟶ + +
+ + +**21. Dynamic programming ― Dynamic programming (DP) is a backtracking search algorithm with memoization (i.e. partial results are saved) whose goal is to find a minimum cost path from state s to an end state send. It can potentially have exponential savings compared to traditional graph search algorithms, and has the property to only work for acyclic graphs. For any given state s, the future cost is computed as follows:** + +⟶ + +
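Memoized recursion is all that is needed for this future-cost computation; the sketch below uses functools.lru_cache on an illustrative acyclic problem (the "walk"/"jump" actions and their costs are ours):

```python
from functools import lru_cache

# Illustrative acyclic search problem: states 0..n, action "walk" (+1, cost 1)
# and action "jump" (+2, cost 3); the end state is n.
n = 10

@lru_cache(maxsize=None)
def future_cost(s):
    """FutureCost(s) = 0 if IsEnd(s), else min over a of Cost(s,a) + FutureCost(Succ(s,a))."""
    if s == n:                                     # IsEnd(s)
        return 0
    options = [1 + future_cost(s + 1)]             # walk
    if s + 2 <= n:
        options.append(3 + future_cost(s + 2))     # jump
    return min(options)

print(future_cost(0))  # minimum cost from the start state
```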
+ + +**22. [if, otherwise]** + +⟶ + +
+ + +**23. Remark: the figure above illustrates a bottom-to-top approach whereas the formula provides the intuition of a top-to-bottom problem resolution.** + +⟶ + +
+ + +**24. Types of states ― The table below presents the terminology when it comes to states in the context of uniform cost search:** + +⟶ + +
+ + +**25. [State, Explanation]** + +⟶ + +
+ + +**26. [Explored, Frontier, Unexplored]** + +⟶ + +
+ + +**27. [States for which the optimal path has already been found, States seen for which we are still figuring out how to get there with the cheapest cost, States not seen yet]** + +⟶ + +
+ + +**28. Uniform cost search ― Uniform cost search (UCS) is a search algorithm that aims at finding the shortest path from a state sstart to an end state send. It explores states s in increasing order of PastCost(s) and relies on the fact that all action costs are non-negative.** + +⟶ + +
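A priority-queue sketch of UCS, popping states in increasing order of PastCost(s); the toy graph is illustrative:

```python
import heapq

def uniform_cost_search(succ_cost, s_start, is_end):
    """Explore states in increasing order of PastCost(s); action costs must be non-negative."""
    frontier = [(0, s_start)]
    explored = set()
    while frontier:
        past_cost, s = heapq.heappop(frontier)
        if s in explored:
            continue
        explored.add(s)                      # past_cost is now the minimum cost to reach s
        if is_end(s):
            return past_cost
        for s_next, cost in succ_cost(s):
            if s_next not in explored:
                heapq.heappush(frontier, (past_cost + cost, s_next))
    return float("inf")

graph = {"A": [("B", 1), ("C", 4)], "B": [("C", 1), ("D", 5)], "C": [("D", 1)], "D": []}
print(uniform_cost_search(lambda s: graph[s], "A", lambda s: s == "D"))  # 3
```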
+ + +**29. Remark 1: the UCS algorithm is logically equivalent to Dijkstra's algorithm.** + +⟶ + +
+ + +**30. Remark 2: the algorithm would not work for a problem with negative action costs, and adding a positive constant to make them non-negative would not solve the problem since this would end up being a different problem.** + +⟶ + +
+ + +**31. Correctness theorem ― When a state s is popped from the frontier F and moved to explored set E, its priority is equal to PastCost(s) which is the minimum cost path from sstart to s.** + +⟶ + +
+ + +**32. Graph search algorithms summary ― By noting N the number of total states, n of which are explored before the end state send, we have:** + +⟶ + +
+ + +**33. [Algorithm, Acyclicity, Costs, Time/space]** + +⟶ + +
+ + +**34. [Dynamic programming, Uniform cost search]** + +⟶ + +
+ + +**35. Remark: the complexity countdown supposes the number of possible actions per state to be constant.** + +⟶ + +
+ + +**36. Learning costs** + +⟶ + +
+
+
+**37. Suppose we are not given the values of Cost(s,a); we want to estimate these quantities from a training set of minimum-cost-path sequences of actions (a1,a2,...,ak).**
+
+⟶
+
+<br>
+ + +**38. [Structured perceptron ― The structured perceptron is an algorithm aiming at iteratively learning the cost of each state-action pair. At each step, it:, decreases the estimated cost of each state-action of the true minimizing path y given by the training data, increases the estimated cost of each state-action of the current predicted path y' inferred from the learned weights.]** + +⟶ + +
+ + +**39. Remark: there are several versions of the algorithm, one of which simplifies the problem to only learning the cost of each action a, and the other parametrizes Cost(s,a) to a feature vector of learnable weights.** + +⟶ + +
+ + +**40. A* search** + +⟶ + +
+ + +**41. Heuristic function ― A heuristic is a function h over states s, where each h(s) aims at estimating FutureCost(s), the cost of the path from s to send.** + +⟶ + +
+ + +**42. Algorithm ― A∗ is a search algorithm that aims at finding the shortest path from a state s to an end state send. It explores states s in increasing order of PastCost(s)+h(s). It is equivalent to a uniform cost search with edge costs Cost′(s,a) given by:** + +⟶ + +
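Equivalently, A* can be run by ordering the frontier by PastCost(s)+h(s), as in this sketch; the toy problem and the heuristic h(s)=5−s are illustrative (and consistent for the unit-cost moves used here):

```python
import heapq

def a_star(succ_cost, h, s_start, is_end):
    """Explore states in increasing order of PastCost(s) + h(s), with h consistent."""
    frontier = [(h(s_start), 0, s_start)]
    explored = set()
    while frontier:
        _, past_cost, s = heapq.heappop(frontier)
        if s in explored:
            continue
        explored.add(s)
        if is_end(s):
            return past_cost
        for s_next, cost in succ_cost(s):
            if s_next not in explored:
                new_cost = past_cost + cost
                heapq.heappush(frontier, (new_cost + h(s_next), new_cost, s_next))
    return float("inf")

# Illustrative 1-D walk to state 5 with unit costs and the consistent heuristic h(s) = 5 - s.
succ = lambda s: [(s - 1, 1), (s + 1, 1)] if s > 0 else [(s + 1, 1)]
print(a_star(succ, lambda s: 5 - s, 0, lambda s: s == 5))  # 5
```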
+ + +**43. Remark: this algorithm can be seen as a biased version of UCS exploring states estimated to be closer to the end state.** + +⟶ + +
+ + +**44. [Consistency ― A heuristic h is said to be consistent if it satisfies the two following properties:, For all states s and actions a, The end state verifies the following:]** + +⟶ + +
+ + +**45. Correctness ― If h is consistent, then A∗ returns the minimum cost path.** + +⟶ + +
+ + +**46. Admissibility ― A heuristic h is said to be admissible if we have:** + +⟶ + +
+ + +**47. Theorem ― Let h(s) be a given heuristic. We have:** + +⟶ + +
+ + +**48. [consistent, admissible]** + +⟶ + +
+ + +**49. Efficiency ― A* explores all states s satisfying the following equation:** + +⟶ + +
+
+
+**50. Remark: larger values of h(s) are better, as this equation shows they will restrict the set of states s going to be explored.**
+
+⟶
+
+<br>
+ + +**51. Relaxation** + +⟶ + +
+ + +**52. It is a framework for producing consistent heuristics. The idea is to find closed-form reduced costs by removing constraints and use them as heuristics.** + +⟶ + +
+ + +**53. Relaxed search problem ― The relaxation of search problem P with costs Cost is noted Prel with costs Costrel, and satisfies the identity:** + +⟶ + +
+ + +**54. Relaxed heuristic ― Given a relaxed search problem Prel, we define the relaxed heuristic h(s)=FutureCostrel(s) as the minimum cost path from s to an end state in the graph of costs Costrel(s,a).** + +⟶ + +
+ + +**55. Consistency of relaxed heuristics ― Let Prel be a given relaxed problem. By theorem, we have:** + +⟶ + +
+ + +**56. consistent** + +⟶ + +
+ + +**57. [Tradeoff when choosing heuristic ― We have to balance two aspects in choosing a heuristic:, Computational efficiency: h(s)=FutureCostrel(s) must be easy to compute. It has to produce a closed form, easier search and independent subproblems., Good enough approximation: the heuristic h(s) should be close to FutureCost(s) and we have thus to not remove too many constraints.]** + +⟶ + +
+ + +**58. Max heuristic ― Let h1(s), h2(s) be two heuristics. We have the following property:** + +⟶ + +
+ + +**59. Markov decision processes** + +⟶ + +
+ + +**60. In this section, we assume that performing action a from state s can lead to several states s′1,s′2,... in a probabilistic manner. In order to find our way between an initial state and an end state, our objective will be to find the maximum value policy by using Markov decision processes that help us cope with randomness and uncertainty.** + +⟶ + +
+ + +**61. Notations** + +⟶ + +
+ + +**62. [Definition ― The objective of a Markov decision process is to maximize rewards. It is defined with:, a starting state sstart, possible actions Actions(s) from state s, transition probabilities T(s,a,s′) from s to s′ with action a, rewards Reward(s,a,s′) from s to s′ with action a, whether an end state was reached IsEnd(s), a discount factor 0⩽γ⩽1]** + +⟶ + +
+ + +**63. Transition probabilities ― The transition probability T(s,a,s′) specifies the probability of going to state s′ after action a is taken in state s. Each s′↦T(s,a,s′) is a probability distribution, which means that:** + +⟶ + +
+ + +**64. states** + +⟶ + +
+ + +**65. Policy ― A policy π is a function that maps each state s to an action a, i.e.** + +⟶ + +
+ + +**66. Utility ― The utility of a path (s0,...,sk) is the discounted sum of the rewards on that path. In other words,** + +⟶ + +
+ + +**67. The figure above is an illustration of the case k=4.** + +⟶ + +
+ + +**68. Q-value ― The Q-value of a policy π at state s with action a, also noted Qπ(s,a), is the expected utility from state s after taking action a and then following policy π. It is defined as follows:** + +⟶ + +
+ + +**69. Value of a policy ― The value of a policy π from state s, also noted Vπ(s), is the expected utility by following policy π from state s over random paths. It is defined as follows:** + +⟶ + +
+ + +**70. Remark: Vπ(s) is equal to 0 if s is an end state.** + +⟶ + +
+ + +**71. Applications** + +⟶ + +
+ + +**72. [Policy evaluation ― Given a policy π, policy evaluation is an iterative algorithm that aims at estimating Vπ. It is done as follows:, Initialization: for all states s, we have:, Iteration: for t from 1 to TPE, we have, with]** + +⟶ + +
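An illustrative sketch of iterative policy evaluation on a tiny two-state MDP; since the policy π is fixed, the transition probabilities and rewards below are already composed with π (all names and numbers are invented for the example):

```python
# Under the fixed policy, each state either stays (reward 1) or moves to the other
# state (reward 0), each with probability 0.5.
states = ["s1", "s2"]
T = {("s1", "s1"): 0.5, ("s1", "s2"): 0.5, ("s2", "s1"): 0.5, ("s2", "s2"): 0.5}
R = {("s1", "s1"): 1.0, ("s1", "s2"): 0.0, ("s2", "s1"): 0.0, ("s2", "s2"): 1.0}
gamma, T_PE = 0.9, 200

V = {s: 0.0 for s in states}                      # initialization: V(s) = 0 for all states
for _ in range(T_PE):                             # iteration
    V = {s: sum(T[s, sp] * (R[s, sp] + gamma * V[sp]) for sp in states) for s in states}
print(V)  # both values converge to 0.5 / (1 - 0.9) = 5.0
```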
+ + +**73. Remark: by noting S the number of states, A the number of actions per state, S′ the number of successors and T the number of iterations, then the time complexity is of O(TPESS′).** + +⟶ + +
+ + +**74. Optimal Q-value ― The optimal Q-value Qopt(s,a) of state s with action a is defined to be the maximum Q-value attained by any policy starting. It is computed as follows:** + +⟶ + +
+ + +**75. Optimal value ― The optimal value Vopt(s) of state s is defined as being the maximum value attained by any policy. It is computed as follows:** + +⟶ + +
+ + +**76. actions** + +⟶ + +
+ + +**77. Optimal policy ― The optimal policy πopt is defined as being the policy that leads to the optimal values. It is defined by:** + +⟶ + +
+ + +**78. [Value iteration ― Value iteration is an algorithm that finds the optimal value Vopt as well as the optimal policy πopt. It is done as follows:, Initialization: for all states s, we have:, Iteration: for t from 1 to TVI, we have:, with]** + +⟶ + +
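A dictionary-based sketch of value iteration on an illustrative two-state, two-action MDP; the transition model and rewards are invented for the example:

```python
# Action "stay" keeps the current state (reward 1 in s1, 0.1 in s2);
# action "switch" moves to the other state with probability 0.9 (reward 0).
states = ["s1", "s2"]
actions = ["stay", "switch"]

def transitions(s, a):
    other = "s2" if s == "s1" else "s1"
    if a == "stay":
        return [(s, 1.0, 1.0 if s == "s1" else 0.1)]   # (s', T(s,a,s'), Reward(s,a,s'))
    return [(other, 0.9, 0.0), (s, 0.1, 0.0)]

gamma, T_VI = 0.9, 200
V = {s: 0.0 for s in states}                           # initialization
for _ in range(T_VI):                                  # iteration
    V = {s: max(sum(p * (r + gamma * V[sp]) for sp, p, r in transitions(s, a))
                for a in actions) for s in states}
pi_opt = {s: max(actions, key=lambda a: sum(p * (r + gamma * V[sp])
                                            for sp, p, r in transitions(s, a)))
          for s in states}
print(V, pi_opt)
```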
+ + +**79. Remark: if we have either γ<1 or the MDP graph being acyclic, then the value iteration algorithm is guaranteed to converge to the correct answer.** + +⟶ + +
+ + +**80. When unknown transitions and rewards** + +⟶ + +
+ + +**81. Now, let's assume that the transition probabilities and the rewards are unknown.** + +⟶ + +
+ + +**82. Model-based Monte Carlo ― The model-based Monte Carlo method aims at estimating T(s,a,s′) and Reward(s,a,s′) using Monte Carlo simulation with: ** + +⟶ + +
+ + +**83. [# times (s,a,s′) occurs, and]** + +⟶ + +
+ + +**84. These estimations will be then used to deduce Q-values, including Qπ and Qopt.** + +⟶ + +
+ + +**85. Remark: model-based Monte Carlo is said to be off-policy, because the estimation does not depend on the exact policy.** + +⟶ + +
+ + +**86. Model-free Monte Carlo ― The model-free Monte Carlo method aims at directly estimating Qπ, as follows:** + +⟶ + +
+ + +**87. Qπ(s,a)=average of ut where st−1=s,at=a** + +⟶ + +
+ + +**88. where ut denotes the utility starting at step t of a given episode.** + +⟶ + +
+ + +**89. Remark: model-free Monte Carlo is said to be on-policy, because the estimated value is dependent on the policy π used to generate the data.** + +⟶ + +
+
+
+**90. Equivalent formulation - By introducing the constant η=1/(1+#updates to (s,a)) and for each (s,a,u) of the training set, the update rule of model-free Monte Carlo has a convex combination formulation:**
+
+⟶
+
+<br>
+ + +**91. as well as a stochastic gradient formulation:** + +⟶ + +
+
+
+**92. SARSA ― State-action-reward-state-action (SARSA) is a bootstrapping method estimating Qπ by using both raw data and estimates as part of the update rule. For each (s,a,r,s′,a′), we have:**
+
+⟶
+
+<br>
+ + +**93. Remark: the SARSA estimate is updated on the fly as opposed to the model-free Monte Carlo one where the estimate can only be updated at the end of the episode.** + +⟶ + +
+ + +**94. Q-learning ― Q-learning is an off-policy algorithm that produces an estimate for Qopt. On each (s,a,r,s′,a′), we have:** + +⟶ + +
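A sketch of tabular Q-learning with an epsilon-greedy behavior policy on a tiny chain environment; the step(s,a) interface returning (reward, next state, done) is an assumption of this example:

```python
import random
from collections import defaultdict

def q_learning(step, actions, s_start, n_episodes=500, eta=0.1, gamma=0.9, eps=0.1):
    """Off-policy estimate of Qopt, behaving epsilon-greedily while updating greedily."""
    Q = defaultdict(float)
    for _ in range(n_episodes):
        s, done = s_start, False
        while not done:
            # Epsilon-greedy: explore with probability eps, exploit otherwise.
            if random.random() < eps:
                a = random.choice(actions)
            else:
                a = max(actions, key=lambda act: Q[s, act])
            r, s_next, done = step(s, a)
            target = r + gamma * max(Q[s_next, a2] for a2 in actions) * (not done)
            Q[s, a] += eta * (target - Q[s, a])      # Q-learning update
            s = s_next
    return Q

# Tiny chain: states 0..3, actions +1/-1, reward 1 for reaching state 3 (episode ends).
def step(s, a):
    s_next = min(max(s + a, 0), 3)
    return (1.0 if s_next == 3 else 0.0), s_next, s_next == 3

print(max(q_learning(step, [1, -1], 0).items(), key=lambda kv: kv[1]))
```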
+ + +**95. Epsilon-greedy ― The epsilon-greedy policy is an algorithm that balances exploration with probability ϵ and exploitation with probability 1−ϵ. For a given state s, the policy πact is computed as follows:** + +⟶ + +
+ + +**96. [with probability, random from Actions(s)]** + +⟶ + +
+ + +**97. Game playing** + +⟶ + +
+ + +**98. In games (e.g. chess, backgammon, Go), other agents are present and need to be taken into account when constructing our policy.** + +⟶ + +
+ + +**99. Game tree ― A game tree is a tree that describes the possibilities of a game. In particular, each node is a decision point for a player and each root-to-leaf path is a possible outcome of the game.** + +⟶ + +
+ + +**100. [Two-player zero-sum game ― It is a game where each state is fully observed and such that players take turns. It is defined with:, a starting state sstart, possible actions Actions(s) from state s, successors Succ(s,a) from states s with actions a, whether an end state was reached IsEnd(s), the agent's utility Utility(s) at end state s, the player Player(s) who controls state s]** + +⟶ + +
+ + +**101. Remark: we will assume that the utility of the agent has the opposite sign of the one of the opponent.** + +⟶ + +
+ + +**102. [Types of policies ― There are two types of policies:, Deterministic policies, noted πp(s), which are actions that player p takes in state s., Stochastic policies, noted πp(s,a)∈[0,1], which are probabilities that player p takes action a in state s.]** + +⟶ + +
+ + +**103. Expectimax ― For a given state s, the expectimax value Vexptmax(s) is the maximum expected utility of any agent policy when playing with respect to a fixed and known opponent policy πopp. It is computed as follows:** + +⟶ + +
+ + +**104. Remark: expectimax is the analog of value iteration for MDPs.** + +⟶ + +
+ + +**105. Minimax ― The goal of minimax policies is to find an optimal policy against an adversary by assuming the worst case, i.e. that the opponent is doing everything to minimize the agent's utility. It is done as follows:** + +⟶ + +
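A recursive sketch of the minimax value on a toy game tree given as nested lists, with leaves holding the agent's utilities; this encoding is ours:

```python
def minimax(s, is_end, utility, succ, player):
    """Vminimax(s): the agent maximizes, the opponent minimizes the agent's utility."""
    if is_end(s):
        return utility(s)
    values = [minimax(sp, is_end, utility, succ,
                      "opp" if player == "agent" else "agent")
              for sp in succ(s)]
    return max(values) if player == "agent" else min(values)

tree = [[3, 5], [2, 9]]                               # leaves are the agent's utilities
succ = lambda s: s if isinstance(s, list) else []
print(minimax(tree, lambda s: not isinstance(s, list), lambda s: s, succ, "agent"))  # 3
```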
+ + +**106. Remark: we can extract πmax and πmin from the minimax value Vminimax.** + +⟶ + +
+ + +**107. Minimax properties ― By noting V the value function, there are 3 properties around minimax to have in mind:** + +⟶ + +
+ + +**108. Property 1: if the agent were to change its policy to any πagent, then the agent would be no better off.** + +⟶ + +
+ + +**109. Property 2: if the opponent changes its policy from πmin to πopp, then he will be no better off.** + +⟶ + +
+ + +**110. Property 3: if the opponent is known to be not playing the adversarial policy, then the minimax policy might not be optimal for the agent.** + +⟶ + +
+ + +**111. In the end, we have the following relationship:** + +⟶ + +
+ + +**112. Speeding up minimax** + +⟶ + +
+ + +**113. Evaluation function ― An evaluation function is a domain-specific and approximate estimate of the value Vminimax(s). It is noted Eval(s).** + +⟶ + +
+ + +**114. Remark: FutureCost(s) is an analogy for search problems.** + +⟶ + +
+ + +**115. Alpha-beta pruning ― Alpha-beta pruning is a domain-general exact method optimizing the minimax algorithm by avoiding the unnecessary exploration of parts of the game tree. To do so, each player keeps track of the best value they can hope for (stored in α for the maximizing player and in β for the minimizing player). At a given step, the condition β<α means that the optimal path is not going to be in the current branch as the earlier player had a better option at their disposal.** + +⟶ + +
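The same toy tree with alpha-beta pruning; the branch containing the leaf 9 is cut as soon as β⩽α (encoding as above, illustrative):

```python
def alphabeta(s, is_end, utility, succ, maximizing, alpha=float("-inf"), beta=float("inf")):
    """Minimax value of s, pruning branches as soon as beta <= alpha."""
    if is_end(s):
        return utility(s)
    value = float("-inf") if maximizing else float("inf")
    for sp in succ(s):
        v = alphabeta(sp, is_end, utility, succ, not maximizing, alpha, beta)
        if maximizing:
            value, alpha = max(value, v), max(alpha, v)
        else:
            value, beta = min(value, v), min(beta, v)
        if beta <= alpha:       # the other player already has a better option elsewhere
            break
    return value

tree = [[3, 5], [2, 9]]         # the leaf 9 is never explored thanks to pruning
succ = lambda s: s if isinstance(s, list) else []
print(alphabeta(tree, lambda s: not isinstance(s, list), lambda s: s, succ, True))  # 3
```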


**116. TD learning ― Temporal difference (TD) learning is used when we don't know the transitions/rewards. The value is based on the exploration policy. To be able to use it, we need to know the rules of the game, Succ(s,a). For each (s,a,r,s′), the update is done as follows:**

⟶

<br>
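
A minimal sketch of one TD update for a linear value function V(s;w)=w⋅ϕ(s); the feature map `phi`, learning rate `eta` and discount factor `gamma` are assumptions for illustration.

```python
import numpy as np

def td_update(w, phi, s, r, s_next, eta=0.1, gamma=1.0):
    """One temporal-difference update for a linear value function V(s; w) = w . phi(s).
    Moves the prediction at s towards the target r + gamma * V(s')."""
    prediction = w @ phi(s)
    target = r + gamma * (w @ phi(s_next))
    # For a linear model, the gradient of V(s; w) with respect to w is phi(s).
    return w - eta * (prediction - target) * phi(s)
```
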
+ + +**117. Simultaneous games** + +⟶ + +


**118. Contrary to turn-based games, there is no ordering on the players' moves.**

⟶

<br>


**119. Single-move simultaneous game ― Let there be two players A and B, with given possible actions. We note V(a,b) A's utility if A chooses action a and B chooses action b. V is called the payoff matrix.**

⟶

<br>
+ + +**120. [Strategies ― There are two main types of strategies:, A pure strategy is a single action:, A mixed strategy is a probability distribution over actions:]** + +⟶ + +
+ + +**121. Game evaluation ― The value of the game V(πA,πB) when player A follows πA and player B follows πB is such that:** + +⟶ + +
+ + +**122. Minimax theorem ― By noting πA,πB ranging over mixed strategies, for every simultaneous two-player zero-sum game with a finite number of actions, we have:** + +⟶ + +
+ + +**123. Non-zero-sum games** + +⟶ + +
+ + +**124. Payoff matrix ― We define Vp(πA,πB) to be the utility for player p.** + +⟶ + +
+ + +**125. Nash equilibrium ― A Nash equilibrium is (π∗A,π∗B) such that no player has an incentive to change its strategy. We have:** + +⟶ + +
+ + +**126. and** + +⟶ + +


**127. Remark: in any finite-player game with a finite number of actions, there exists at least one Nash equilibrium.**

⟶

<br>
+ + +**128. [Tree search, Backtracking search, Breadth-first search, Depth-first search, Iterative deepening]** + +⟶ + +
+ + +**129. [Graph search, Dynamic programming, Uniform cost search]** + +⟶ + +
+ + +**130. [Learning costs, Structured perceptron]** + +⟶ + +
+ + +**131. [A star search, Heuristic function, Algorithm, Consistency, correctness, Admissibility, efficiency]** + +⟶ + +
+ + +**132. [Relaxation, Relaxed search problem, Relaxed heuristic, Max heuristic]** + +⟶ + +
+ + +**133. [Markov decision processes, Overview, Policy evaluation, Value iteration, Transitions, rewards]** + +⟶ + +
+ + +**134. [Game playing, Expectimax, Minimax, Speeding up minimax, Simultaneous games, Non-zero-sum games]** + +⟶ + +
+ + +**135. View PDF version on GitHub** + +⟶ + +
+ + +**136. Original authors** + +⟶ + +
+ + +**137. Translated by X, Y and Z** + +⟶ + +
+ + +**138. Reviewed by X, Y and Z** + +⟶ + +
+ + +**139. By X and Y** + +⟶ + +
+ + +**140. The Artificial Intelligence cheatsheets are now available in [target language].** + +⟶ diff --git a/template/cs-221-variables-models.md b/template/cs-221-variables-models.md new file mode 100644 index 000000000..f55ef0270 --- /dev/null +++ b/template/cs-221-variables-models.md @@ -0,0 +1,617 @@ +**Variables-based models translation** [[webpage]](https://stanford.edu/~shervine/teaching/cs-221/cheatsheet-variables-models) + +
+ +**1. Variables-based models with CSP and Bayesian networks** + +⟶ + +
+ + +**2. Constraint satisfaction problems** + +⟶ + +
+ + +**3. In this section, our objective is to find maximum weight assignments of variable-based models. One advantage compared to states-based models is that these algorithms are more convenient to encode problem-specific constraints.** + +⟶ + +
+ + +**4. Factor graphs** + +⟶ + +
+ + +**5. Definition ― A factor graph, also referred to as a Markov random field, is a set of variables X=(X1,...,Xn) where Xi∈Domaini and m factors f1,...,fm with each fj(X)⩾0.** + +⟶ + +
+ + +**6. Domain** + +⟶ + +
+ + +**7. Scope and arity ― The scope of a factor fj is the set of variables it depends on. The size of this set is called the arity.** + +⟶ + +
+ + +**8. Remark: factors of arity 1 and 2 are called unary and binary respectively.** + +⟶ + +
+ + +**9. Assignment weight ― Each assignment x=(x1,...,xn) yields a weight Weight(x) defined as being the product of all factors fj applied to that assignment. Its expression is given by:** + +⟶ + +


**10. Constraint satisfaction problem ― A constraint satisfaction problem (CSP) is a factor graph where all factors are binary; we call these factors constraints:**

⟶

<br>
+ + +**11. Here, the constraint j with assignment x is said to be satisfied if and only if fj(x)=1.** + +⟶ + +
+ + +**12. Consistent assignment ― An assignment x of a CSP is said to be consistent if and only if Weight(x)=1, i.e. all constraints are satisfied.** + +⟶ + +
+ + +**13. Dynamic ordering** + +⟶ + +
+ + +**14. Dependent factors ― The set of dependent factors of variable Xi with partial assignment x is called D(x,Xi), and denotes the set of factors that link Xi to already assigned variables.** + +⟶ + +
+ + +**15. Backtracking search ― Backtracking search is an algorithm used to find maximum weight assignments of a factor graph. At each step, it chooses an unassigned variable and explores its values by recursion. Dynamic ordering (i.e. choice of variables and values) and lookahead (i.e. early elimination of inconsistent options) can be used to explore the graph more efficiently, although the worst-case runtime stays exponential: O(|Domain|n).** + +⟶ + +
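
A minimal sketch of backtracking search over a factor graph, using a fixed variable ordering for simplicity; `partial_weight` is a hypothetical helper that returns the product of all factors whose scope is fully covered by the partial assignment (and 1.0 for the empty assignment).

```python
def backtracking_search(variables, domains, partial_weight):
    """Backtracking search for a maximum-weight complete assignment of a factor graph."""
    best = {"assignment": None, "weight": 0.0}

    def recurse(i, assignment):
        w = partial_weight(assignment)
        if w == 0:                      # dead branch: prune early
            return
        if i == len(variables):         # complete assignment reached
            if w > best["weight"]:
                best["assignment"], best["weight"] = dict(assignment), w
            return
        x = variables[i]                # static ordering for simplicity
        for v in domains[x]:
            assignment[x] = v
            recurse(i + 1, assignment)
            del assignment[x]

    recurse(0, {})
    return best["assignment"], best["weight"]
```
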
+ + +**16. [Forward checking ― It is a one-step lookahead heuristic that preemptively removes inconsistent values from the domains of neighboring variables. It has the following characteristics:, After assigning a variable Xi, it eliminates inconsistent values from the domains of all its neighbors., If any of these domains becomes empty, we stop the local backtracking search., If we un-assign a variable Xi, we have to restore the domain of its neighbors.]** + +⟶ + +


**17. Most constrained variable ― It is a variable-level ordering heuristic that selects the next unassigned variable that has the fewest consistent values. This has the effect of making inconsistent assignments fail earlier in the search, which enables more efficient pruning.**

⟶

<br>
+ + +**18. Least constrained value ― It is a value-level ordering heuristic that assigns the next value that yields the highest number of consistent values of neighboring variables. Intuitively, this procedure chooses first the values that are most likely to work.** + +⟶ + +
+ + +**19. Remark: in practice, this heuristic is useful when all factors are constraints.** + +⟶ + +
+ + +**20. The example above is an illustration of the 3-color problem with backtracking search coupled with most constrained variable exploration and least constrained value heuristic, as well as forward checking at each step.** + +⟶ + +
+ + +**21. [Arc consistency ― We say that arc consistency of variable Xl with respect to Xk is enforced when for each xl∈Domainl:, unary factors of Xl are non-zero, there exists at least one xk∈Domaink such that any factor between Xl and Xk is non-zero.]** + +⟶ + +


**22. AC-3 ― The AC-3 algorithm is a multi-step lookahead heuristic that applies forward checking to all relevant variables. After a given assignment, it performs forward checking and then successively enforces arc consistency with respect to the neighbors of variables whose domains changed during the process.**

⟶

<br>
+ + +**23. Remark: AC-3 can be implemented both iteratively and recursively.** + +⟶ + +
+ + +**24. Approximate methods** + +⟶ + +
+ + +**25. Beam search ― Beam search is an approximate algorithm that extends partial assignments of n variables of branching factor b=|Domain| by exploring the K top paths at each step. The beam size K∈{1,...,bn} controls the tradeoff between efficiency and accuracy. This algorithm has a time complexity of O(n⋅Kblog(Kb)).** + +⟶ + +
+ + +**26. The example below illustrates a possible beam search of parameters K=2, b=3 and n=5.** + +⟶ + +
+ + +**27. Remark: K=1 corresponds to greedy search whereas K→+∞ is equivalent to BFS tree search.** + +⟶ + +
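
A minimal sketch of beam search over partial assignments, again assuming a hypothetical `partial_weight` helper that returns the product of the factors determined by a partial assignment. With K=1 this reduces to greedy search, matching the remark above.

```python
def beam_search(variables, domains, partial_weight, K=2):
    """Approximate maximum-weight assignment: keep only the K best partial
    assignments when extending to the next variable."""
    beam = [({}, 1.0)]                                 # (partial assignment, weight)
    for x in variables:
        candidates = []
        for assignment, _ in beam:
            for v in domains[x]:
                extended = {**assignment, x: v}
                candidates.append((extended, partial_weight(extended)))
        # Keep the K highest-weight extensions.
        beam = sorted(candidates, key=lambda item: item[1], reverse=True)[:K]
    return max(beam, key=lambda item: item[1])
```
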
+ + +**28. Iterated conditional modes ― Iterated conditional modes (ICM) is an iterative approximate algorithm that modifies the assignment of a factor graph one variable at a time until convergence. At step i, we assign to Xi the value v that maximizes the product of all factors connected to that variable.** + +⟶ + +
+ + +**29. Remark: ICM may get stuck in local minima.** + +⟶ + +
+ + +**30. [Gibbs sampling ― Gibbs sampling is an iterative approximate method that modifies the assignment of a factor graph one variable at a time until convergence. At step i:, we assign to each element u∈Domaini a weight w(u) that is the product of all factors connected to that variable, we sample v from the probability distribution induced by w and assign it to Xi.]** + +⟶ + +


**31. Remark: Gibbs sampling can be seen as the probabilistic counterpart of ICM. It has the advantage of being able to escape local minima in most cases.**

⟶

<br>
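
A minimal sketch of Gibbs sampling on a factor graph, starting from a complete assignment; `factors_of(x)` is a hypothetical helper returning the factors connected to variable x, each being a function of the full assignment.

```python
import random

def gibbs_sampling(variables, domains, factors_of, assignment, num_sweeps=100):
    """Gibbs sampling: resample one variable at a time from the distribution
    induced by the factors that touch it."""
    for _ in range(num_sweeps):
        for x in variables:
            weights = []
            for u in domains[x]:
                assignment[x] = u
                w = 1.0
                for f in factors_of(x):          # only factors touching x matter
                    w *= f(assignment)
                weights.append(w)
            if sum(weights) == 0:                # degenerate case: fall back to uniform
                assignment[x] = random.choice(domains[x])
            else:
                assignment[x] = random.choices(domains[x], weights=weights)[0]
    return assignment
```
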
+ + +**32. Factor graph transformations** + +⟶ + +
+ + +**33. Independence ― Let A,B be a partitioning of the variables X. We say that A and B are independent if there are no edges between A and B and we write:** + +⟶ + +
+ + +**34. Remark: independence is the key property that allows us to solve subproblems in parallel.** + +⟶ + +
+ + +**35. Conditional independence ― We say that A and B are conditionally independent given C if conditioning on C produces a graph in which A and B are independent. In this case, it is written:** + +⟶ + +


**36. [Conditioning ― Conditioning is a transformation aimed at making variables independent; it breaks up a factor graph into smaller pieces that can be solved in parallel and can use backtracking. In order to condition on a variable Xi=v, we do as follows:, Consider all factors f1,...,fk that depend on Xi, Remove Xi and f1,...,fk, Add gj(x) for j∈{1,...,k} defined as:]**

⟶

<br>
+ + +**37. Markov blanket ― Let A⊆X be a subset of variables. We define MarkovBlanket(A) to be the neighbors of A that are not in A.** + +⟶ + +
+ + +**38. Proposition ― Let C=MarkovBlanket(A) and B=X∖(A∪C). Then we have:** + +⟶ + +
+ + +**39. [Elimination ― Elimination is a factor graph transformation that removes Xi from the graph and solves a small subproblem conditioned on its Markov blanket as follows:, Consider all factors fi,1,...,fi,k that depend on Xi, Remove Xi +and fi,1,...,fi,k, Add fnew,i(x) defined as:]** + +⟶ + +
+ + +**40. Treewidth ― The treewidth of a factor graph is the maximum arity of any factor created by variable elimination with the best variable ordering. In other words,** + +⟶ + +
+ + +**41. The example below illustrates the case of a factor graph of treewidth 3.** + +⟶ + +


**42. Remark: finding the best variable ordering is an NP-hard problem.**

⟶

<br>
+ + +**43. Bayesian networks** + +⟶ + +
+ + +**44. In this section, our goal will be to compute conditional probabilities. What is the probability of a query given evidence?** + +⟶ + +
+ + +**45. Introduction** + +⟶ + +
+ + +**46. Explaining away ― Suppose causes C1 and C2 influence an effect E. Conditioning on the effect E and on one of the causes (say C1) changes the probability of the other cause (say C2). In this case, we say that C1 has explained away C2.** + +⟶ + +
+ + +**47. Directed acyclic graph ― A directed acyclic graph (DAG) is a finite directed graph with no directed cycles.** + +⟶ + +
+ + +**48. Bayesian network ― A Bayesian network is a directed acyclic graph (DAG) that specifies a joint distribution over random variables X=(X1,...,Xn) as a product of local conditional distributions, one for each node:** + +⟶ + +
+ + +**49. Remark: Bayesian networks are factor graphs imbued with the language of probability.** + +⟶ + +
+ + +**50. Locally normalized ― For each xParents(i), all factors are local conditional distributions. Hence they have to satisfy:** + +⟶ + +
+ + +**51. As a result, sub-Bayesian networks and conditional distributions are consistent.** + +⟶ + +
+ + +**52. Remark: local conditional distributions are the true conditional distributions.** + +⟶ + +
+ + +**53. Marginalization ― The marginalization of a leaf node yields a Bayesian network without that node.** + +⟶ + +
+ + +**54. Probabilistic programs** + +⟶ + +


**55. Concept ― A probabilistic program randomizes variable assignments. That way, we can write down complex Bayesian networks that generate assignments without us having to explicitly specify associated probabilities.**

⟶

<br>
+ + +**56. Remark: examples of probabilistic programs include Hidden Markov model (HMM), factorial HMM, naive Bayes, latent Dirichlet allocation, diseases and symptoms and stochastic block models.** + +⟶ + +
+ + +**57. Summary ― The table below summarizes the common probabilistic programs as well as their applications:** + +⟶ + +
+ + +**58. [Program, Algorithm, Illustration, Example]** + +⟶ + +
+ + +**59. [Markov Model, Hidden Markov Model (HMM), Factorial HMM, Naive Bayes, Latent Dirichlet Allocation (LDA)]** + +⟶ + +
+ + +**60. [Generate, distribution]** + +⟶ + +
+ + +**61. [Language modeling, Object tracking, Multiple object tracking, Document classification, Topic modeling]** + +⟶ + +
+ + +**62. Inference** + +⟶ + +
+ + +**63. [General probabilistic inference strategy ― The strategy to compute the probability P(Q|E=e) of query Q given evidence E=e is as follows:, Step 1: Remove variables that are not ancestors of the query Q or the evidence E by marginalization, Step 2: Convert Bayesian network to factor graph, Step 3: Condition on the evidence E=e, Step 4: Remove nodes disconnected from the query Q by marginalization, Step 5: Run a probabilistic inference algorithm (manual, variable elimination, Gibbs sampling, particle filtering)]** + +⟶ + +
+ + +**64. Forward-backward algorithm ― This algorithm computes the exact value of P(H=hk|E=e) (smoothing query) for any k∈{1,...,L} in the case of an HMM of size L. To do so, we proceed in 3 steps:** + +⟶ + +
+ + +**65. Step 1: for ..., compute ...** + +⟶ + +
+ + +**66. with the convention F0=BL+1=1. From this procedure and these notations, we get that** + +⟶ + +
+ + +**67. Remark: this algorithm interprets each assignment to be a path where each edge hi−1→hi is of weight p(hi|hi−1)p(ei|hi).** + +⟶ + +
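
A minimal numpy sketch of the forward-backward algorithm for an HMM; `pi`, `T` and `emission` are hypothetical names for the initial distribution, the transition matrix and the evidence likelihoods, and an explicit initial distribution is used here instead of the F0=1 convention of the text.

```python
import numpy as np

def forward_backward(pi, T, emission):
    """Smoothing in an HMM: returns P(H_k = h | E = e) for every time step.
    pi[j]          : assumed initial distribution p(h_1 = j)
    T[i, j]        : assumed transition probability p(h_k = j | h_{k-1} = i)
    emission[k][j] : assumed evidence likelihood of the observation at step k given h = j"""
    L, n = len(emission), len(pi)
    F = np.zeros((L, n))                          # forward messages
    B = np.ones((L, n))                           # backward messages, B_L = 1
    F[0] = pi * emission[0]
    for k in range(1, L):                         # forward pass
        F[k] = (F[k - 1] @ T) * emission[k]
    for k in range(L - 2, -1, -1):                # backward pass
        B[k] = T @ (emission[k + 1] * B[k + 1])
    S = F * B                                     # unnormalized smoothing values
    return S / S.sum(axis=1, keepdims=True)       # P(H_k | E = e)
```
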
+ + +**68. [Gibbs sampling ― This algorithm is an iterative approximate method that uses a small set of assignments (particles) to represent a large probability distribution. From a random assignment x, Gibbs sampling performs the following steps for i∈{1,...,n} until convergence:, For all u∈Domaini, compute the weight w(u) of assignment x where Xi=u, Sample v from the probability distribution induced by w: v∼P(Xi=v|X−i=x−i), Set Xi=v]** + +⟶ + +
+ + +**69. Remark: X−i denotes X∖{Xi} and x−i represents the corresponding assignment.** + +⟶ + +
+ + +**70. [Particle filtering ― This algorithm approximates the posterior density of state variables given the evidence of observation variables by keeping track of K particles at a time. Starting from a set of particles C of size K, we run the following 3 steps iteratively:, Step 1: proposal - For each old particle xt−1∈C, sample x from the transition probability distribution p(x|xt−1) and add x to a set C′., Step 2: weighting - Weigh each x of the set C′ by w(x)=p(et|x), where et is the evidence observed at time t., Step 3: resampling - Sample K elements from the set C′ using the probability distribution induced by w and store them in C: these are the current particles xt.]** + +⟶ + +
+ + +**71. Remark: a more expensive version of this algorithm also keeps track of past particles in the proposal step.** + +⟶ + +
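
A minimal sketch of one particle-filtering iteration (proposal, weighting, resampling); `transition_sample` and `evidence_likelihood` are hypothetical stand-ins for sampling from p(x|xt−1) and evaluating p(et|x).

```python
import random

def particle_filter_step(particles, transition_sample, evidence_likelihood, K):
    """One iteration of particle filtering with K particles."""
    # Step 1: proposal - move every particle through the transition model.
    proposals = [transition_sample(x) for x in particles]
    # Step 2: weighting - weigh each proposal by how well it explains the evidence.
    weights = [evidence_likelihood(x) for x in proposals]
    # Step 3: resampling - draw K particles proportionally to their weights.
    return random.choices(proposals, weights=weights, k=K)
```
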
+ + +**72. Maximum likelihood ― If we don't know the local conditional distributions, we can learn them using maximum likelihood.** + +⟶ + +
+ + +**73. Laplace smoothing ― For each distribution d and partial assignment (xParents(i),xi), add λ to countd(xParents(i),xi), then normalize to get probability estimates.** + +⟶ + +


**74. Algorithm ― The Expectation-Maximization (EM) algorithm gives an efficient method for estimating the parameter θ through maximum likelihood estimation by repeatedly constructing a lower bound on the likelihood (E-step) and optimizing that lower bound (M-step) as follows:**

⟶

<br>
+ + +**75. [E-step: Evaluate the posterior probability q(h) that each data point e came from a particular cluster h as follows:, M-step: Use the posterior probabilities q(h) as cluster specific weights on data points e to determine θ through maximum likelihood.]** + +⟶ + +
+ + +**76. [Factor graphs, Arity, Assignment weight, Constraint satisfaction problem, Consistent assignment]** + +⟶ + +
+ + +**77. [Dynamic ordering, Dependent factors, Backtracking search, Forward checking, Most constrained variable, Least constrained value]** + +⟶ + +
+ + +**78. [Approximate methods, Beam search, Iterated conditional modes, Gibbs sampling]** + +⟶ + +
+ + +**79. [Factor graph transformations, Conditioning, Elimination]** + +⟶ + +
+ + +**80. [Bayesian networks, Definition, Locally normalized, Marginalization]** + +⟶ + +
+ + +**81. [Probabilistic program, Concept, Summary]** + +⟶ + +
+ + +**82. [Inference, Forward-backward algorithm, Gibbs sampling, Laplace smoothing]** + +⟶ + +
+ + +**83. View PDF version on GitHub** + +⟶ + +
+ + +**84. Original authors** + +⟶ + +
+ + +**85. Translated by X, Y and Z** + +⟶ + +
+ + +**86. Reviewed by X, Y and Z** + +⟶ + +
+ + +**87. By X and Y** + +⟶ + +
+ + +**88. The Artificial Intelligence cheatsheets are now available in [target language].** + +⟶ diff --git a/ar/cheatsheet-deep-learning.md b/template/cs-229-deep-learning.md similarity index 98% rename from ar/cheatsheet-deep-learning.md rename to template/cs-229-deep-learning.md index a5aa3756c..a7770a048 100644 --- a/ar/cheatsheet-deep-learning.md +++ b/template/cs-229-deep-learning.md @@ -1,3 +1,7 @@ +**Deep learning translation** [[webpage]](https://stanford.edu/~shervine/teaching/cs-229/cheatsheet-deep-learning) + +
+ **1. Deep Learning cheatsheet** ⟶ diff --git a/de/refresher-linear-algebra.md b/template/cs-229-linear-algebra.md similarity index 97% rename from de/refresher-linear-algebra.md rename to template/cs-229-linear-algebra.md index a6b440d1e..dced85397 100644 --- a/de/refresher-linear-algebra.md +++ b/template/cs-229-linear-algebra.md @@ -1,3 +1,7 @@ +**Linear Algebra and Calculus translation** [[webpage]](https://stanford.edu/~shervine/teaching/cs-229/refresher-algebra-calculus) + +
+ **1. Linear Algebra and Calculus refresher** ⟶ diff --git a/hi/cheatsheet-machine-learning-tips-and-tricks.md b/template/cs-229-machine-learning-tips-and-tricks.md similarity index 97% rename from hi/cheatsheet-machine-learning-tips-and-tricks.md rename to template/cs-229-machine-learning-tips-and-tricks.md index 9712297b8..edba03259 100644 --- a/hi/cheatsheet-machine-learning-tips-and-tricks.md +++ b/template/cs-229-machine-learning-tips-and-tricks.md @@ -1,3 +1,7 @@ +**Machine Learning tips and tricks translation** [[webpage]](https://stanford.edu/~shervine/teaching/cs-229/cheatsheet-machine-learning-tips-and-tricks) + +
+ **1. Machine Learning tips and tricks cheatsheet** ⟶ diff --git a/de/refresher-probability.md b/template/cs-229-probability.md similarity index 98% rename from de/refresher-probability.md rename to template/cs-229-probability.md index 5c9b34656..b8be13004 100644 --- a/de/refresher-probability.md +++ b/template/cs-229-probability.md @@ -1,3 +1,7 @@ +**Probabilities and Statistics translation** [[webpage]](https://stanford.edu/~shervine/teaching/cs-229/refresher-probabilities-statistics) + +
+ **1. Probabilities and Statistics refresher** ⟶ diff --git a/de/cheatsheet-supervised-learning.md b/template/cs-229-supervised-learning.md similarity index 98% rename from de/cheatsheet-supervised-learning.md rename to template/cs-229-supervised-learning.md index a6b19ea1c..d82685e6e 100644 --- a/de/cheatsheet-supervised-learning.md +++ b/template/cs-229-supervised-learning.md @@ -1,3 +1,7 @@ +**Supervised Learning translation** [[webpage]](https://stanford.edu/~shervine/teaching/cs-229/cheatsheet-supervised-learning) + +
+ **1. Supervised Learning cheatsheet** ⟶ diff --git a/he/cheatsheet-unsupervised-learning.md b/template/cs-229-unsupervised-learning.md similarity index 96% rename from he/cheatsheet-unsupervised-learning.md rename to template/cs-229-unsupervised-learning.md index 40724eb28..18fafef8c 100644 --- a/he/cheatsheet-unsupervised-learning.md +++ b/template/cs-229-unsupervised-learning.md @@ -1,3 +1,7 @@ +**Unsupervised Learning translation** [[webpage]](https://stanford.edu/~shervine/teaching/cs-229/cheatsheet-unsupervised-learning) + +
+ **1. Unsupervised Learning cheatsheet** ⟶ @@ -299,7 +303,7 @@ dimensions by maximizing the variance of the data as follows:**
-**51. The Machine Learning cheatsheets are now available in Hebrew.** +**51. The Machine Learning cheatsheets are now available in [target language].** ⟶ diff --git a/template/cs-230-convolutional-neural-networks.md b/template/cs-230-convolutional-neural-networks.md new file mode 100644 index 000000000..94006a675 --- /dev/null +++ b/template/cs-230-convolutional-neural-networks.md @@ -0,0 +1,716 @@ +**Convolutional Neural Networks translation** [[webpage]](https://stanford.edu/~shervine/teaching/cs-230/cheatsheet-convolutional-neural-networks) + +
+ +**1. Convolutional Neural Networks cheatsheet** + +⟶ + +
+ + +**2. CS 230 - Deep Learning** + +⟶ + +
+ + +**3. [Overview, Architecture structure]** + +⟶ + +
+ + +**4. [Types of layer, Convolution, Pooling, Fully connected]** + +⟶ + +
+ + +**5. [Filter hyperparameters, Dimensions, Stride, Padding]** + +⟶ + +
+ + +**6. [Tuning hyperparameters, Parameter compatibility, Model complexity, Receptive field]** + +⟶ + +
+ + +**7. [Activation functions, Rectified Linear Unit, Softmax]** + +⟶ + +
+ + +**8. [Object detection, Types of models, Detection, Intersection over Union, Non-max suppression, YOLO, R-CNN]** + +⟶ + +
+ + +**9. [Face verification/recognition, One shot learning, Siamese network, Triplet loss]** + +⟶ + +
+ + +**10. [Neural style transfer, Activation, Style matrix, Style/content cost function]** + +⟶ + +
+ + +**11. [Computational trick architectures, Generative Adversarial Net, ResNet, Inception Network]** + +⟶ + +
+ + +**12. Overview** + +⟶ + +
+ + +**13. Architecture of a traditional CNN ― Convolutional neural networks, also known as CNNs, are a specific type of neural networks that are generally composed of the following layers:** + +⟶ + +
+ + +**14. The convolution layer and the pooling layer can be fine-tuned with respect to hyperparameters that are described in the next sections.** + +⟶ + +
+ + +**15. Types of layer** + +⟶ + +
+ + +**16. Convolution layer (CONV) ― The convolution layer (CONV) uses filters that perform convolution operations as it is scanning the input I with respect to its dimensions. Its hyperparameters include the filter size F and stride S. The resulting output O is called feature map or activation map.** + +⟶ + +
+ + +**17. Remark: the convolution step can be generalized to the 1D and 3D cases as well.** + +⟶ + +


**18. Pooling (POOL) ― The pooling layer (POOL) is a downsampling operation, typically applied after a convolution layer, which introduces some spatial invariance. In particular, max and average pooling are special kinds of pooling where the maximum and average value is taken, respectively.**

⟶

<br>
+ + +**19. [Type, Purpose, Illustration, Comments]** + +⟶ + +
+ + +**20. [Max pooling, Average pooling, Each pooling operation selects the maximum value of the current view, Each pooling operation averages the values of the current view]** + +⟶ + +
+ + +**21. [Preserves detected features, Most commonly used, Downsamples feature map, Used in LeNet]** + +⟶ + +
+ + +**22. Fully Connected (FC) ― The fully connected layer (FC) operates on a flattened input where each input is connected to all neurons. If present, FC layers are usually found towards the end of CNN architectures and can be used to optimize objectives such as class scores.** + +⟶ + +
+ + +**23. Filter hyperparameters** + +⟶ + +
+ + +**24. The convolution layer contains filters for which it is important to know the meaning behind its hyperparameters.** + +⟶ + +
+ + +**25. Dimensions of a filter ― A filter of size F×F applied to an input containing C channels is a F×F×C volume that performs convolutions on an input of size I×I×C and produces an output feature map (also called activation map) of size O×O×1.** + +⟶ + +
+ + +**26. Filter** + +⟶ + +
+ + +**27. Remark: the application of K filters of size F×F results in an output feature map of size O×O×K.** + +⟶ + +
+ + +**28. Stride ― For a convolutional or a pooling operation, the stride S denotes the number of pixels by which the window moves after each operation.** + +⟶ + +
+ + +**29. Zero-padding ― Zero-padding denotes the process of adding P zeroes to each side of the boundaries of the input. This value can either be manually specified or automatically set through one of the three modes detailed below:** + +⟶ + +
+ + +**30. [Mode, Value, Illustration, Purpose, Valid, Same, Full]** + +⟶ + +


**31. [No padding, Drops last convolution if dimensions do not match, Padding such that the feature map has size ⌈I/S⌉, Output size is mathematically convenient, Also called 'half' padding, Maximum padding such that end convolutions are applied on the limits of the input, Filter 'sees' the input end-to-end]**

⟶

<br>
+ + +**32. Tuning hyperparameters** + +⟶ + +
+ + +**33. Parameter compatibility in convolution layer ― By noting I the length of the input volume size, F the length of the filter, P the amount of zero padding, S the stride, then the output size O of the feature map along that dimension is given by:** + +⟶ + +
+ + +**34. [Input, Filter, Output]** + +⟶ + +
+ + +**35. Remark: often times, Pstart=Pend≜P, in which case we can replace Pstart+Pend by 2P in the formula above.** + +⟶ + +
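
A small helper illustrating the output-size formula O=(I−F+Pstart+Pend)/S+1 from above, with illustrative numbers chosen so that the division is exact.

```python
def conv_output_size(I, F, S, P_start=0, P_end=0):
    """Output size O of a feature map along one dimension:
    O = (I - F + P_start + P_end) / S + 1."""
    return (I - F + P_start + P_end) // S + 1

# Illustrative numbers: a 32-pixel input, a 5x5 filter, stride 1 and
# padding 2 on each side keep the spatial size unchanged.
assert conv_output_size(I=32, F=5, S=1, P_start=2, P_end=2) == 32
```
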
+ + +**36. Understanding the complexity of the model ― In order to assess the complexity of a model, it is often useful to determine the number of parameters that its architecture will have. In a given layer of a convolutional neural network, it is done as follows:** + +⟶ + +


**38. [One bias parameter per filter, In most cases, S<F]**

⟶

<br>


**39. [Pooling operation done channel-wise, In most cases, S=F]**

⟶

<br>
+ + +**38. [One bias parameter per filter, In most cases, S + + +**39. [Pooling operation done channel-wise, In most cases, S=F]** + +⟶ + +
+ + +**40. [Input is flattened, One bias parameter per neuron, The number of FC neurons is free of structural constraints]** + +⟶ + +
+ + +**41. Receptive field ― The receptive field at layer k is the area denoted Rk×Rk of the input that each pixel of the k-th activation map can 'see'. By calling Fj the filter size of layer j and Si the stride value of layer i and with the convention S0=1, the receptive field at layer k can be computed with the formula:** + +⟶ + +
+ + +**42. In the example below, we have F1=F2=3 and S1=S2=1, which gives R2=1+2⋅1+2⋅1=5.** + +⟶ + +
+ + +**43. Commonly used activation functions** + +⟶ + +
+ + +**44. Rectified Linear Unit ― The rectified linear unit layer (ReLU) is an activation function g that is used on all elements of the volume. It aims at introducing non-linearities to the network. Its variants are summarized in the table below:** + +⟶ + +
+ + +**45. [ReLU, Leaky ReLU, ELU, with]** + +⟶ + +
+ + +**46. [Non-linearity complexities biologically interpretable, Addresses dying ReLU issue for negative values, Differentiable everywhere]** + +⟶ + +
+ + +**47. Softmax ― The softmax step can be seen as a generalized logistic function that takes as input a vector of scores x∈Rn and outputs a vector of output probability p∈Rn through a softmax function at the end of the architecture. It is defined as follows:** + +⟶ + +
+ + +**48. where** + +⟶ + +
+ + +**49. Object detection** + +⟶ + +
+ + +**50. Types of models ― There are 3 main types of object recognition algorithms, for which the nature of what is predicted is different. They are described in the table below:** + +⟶ + +
+ + +**51. [Image classification, Classification w. localization, Detection]** + +⟶ + +
+ + +**52. [Teddy bear, Book]** + +⟶ + +
+ + +**53. [Classifies a picture, Predicts probability of object, Detects an object in a picture, Predicts probability of object and where it is located, Detects up to several objects in a picture, Predicts probabilities of objects and where they are located]** + +⟶ + +
+ + +**54. [Traditional CNN, Simplified YOLO, R-CNN, YOLO, R-CNN]** + +⟶ + +
+ + +**55. Detection ― In the context of object detection, different methods are used depending on whether we just want to locate the object or detect a more complex shape in the image. The two main ones are summed up in the table below:** + +⟶ + +
+ + +**56. [Bounding box detection, Landmark detection]** + +⟶ + +
+ + +**57. [Detects the part of the image where the object is located, Detects a shape or characteristics of an object (e.g. eyes), More granular]** + +⟶ + +
+ + +**58. [Box of center (bx,by), height bh and width bw, Reference points (l1x,l1y), ..., (lnx,lny)]** + +⟶ + +
+ + +**59. Intersection over Union ― Intersection over Union, also known as IoU, is a function that quantifies how correctly positioned a predicted bounding box Bp is over the actual bounding box Ba. It is defined as:** + +⟶ + +
+ + +**60. Remark: we always have IoU∈[0,1]. By convention, a predicted bounding box Bp is considered as being reasonably good if IoU(Bp,Ba)⩾0.5.** + +⟶ + +
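
A minimal sketch of IoU for axis-aligned boxes, assuming the corner format (x1, y1, x2, y2).

```python
def iou(box_p, box_a):
    """Intersection over Union of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    x1 = max(box_p[0], box_a[0]); y1 = max(box_p[1], box_a[1])
    x2 = min(box_p[2], box_a[2]); y2 = min(box_p[3], box_a[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)          # overlapping area
    area_p = (box_p[2] - box_p[0]) * (box_p[3] - box_p[1])
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    union = area_p + area_a - inter
    return inter / union if union > 0 else 0.0
```
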
+ + +**61. Anchor boxes ― Anchor boxing is a technique used to predict overlapping bounding boxes. In practice, the network is allowed to predict more than one box simultaneously, where each box prediction is constrained to have a given set of geometrical properties. For instance, the first prediction can potentially be a rectangular box of a given form, while the second will be another rectangular box of a different geometrical form.** + +⟶ + +


**62. Non-max suppression ― The non-max suppression technique aims at removing duplicate overlapping bounding boxes of the same object by selecting the most representative ones. After having removed all boxes with a predicted probability lower than 0.6, the following steps are repeated while there are boxes remaining:**

⟶

<br>
+ + +**63. [For a given class, Step 1: Pick the box with the largest prediction probability., Step 2: Discard any box having an IoU⩾0.5 with the previous box.]** + +⟶ + +
+ + +**64. [Box predictions, Box selection of maximum probability, Overlap removal of same class, Final bounding boxes]** + +⟶ + +
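
A minimal sketch of non-max suppression for one class; `iou_fn` can be any IoU helper such as the one sketched above, and the 0.6 and 0.5 thresholds follow the description.

```python
def non_max_suppression(boxes, scores, iou_fn, prob_threshold=0.6, iou_threshold=0.5):
    """Keep the most representative boxes: drop low-probability boxes, then repeatedly
    pick the highest-scoring box and discard boxes that overlap it too much."""
    candidates = [(b, s) for b, s in zip(boxes, scores) if s >= prob_threshold]
    candidates.sort(key=lambda bs: bs[1], reverse=True)
    kept = []
    while candidates:
        best, _ = candidates.pop(0)                    # highest remaining probability
        kept.append(best)
        candidates = [(b, s) for b, s in candidates if iou_fn(b, best) < iou_threshold]
    return kept
```
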
+ + +**65. YOLO ― You Only Look Once (YOLO) is an object detection algorithm that performs the following steps:** + +⟶ + +
+ + +**66. [Step 1: Divide the input image into a G×G grid., Step 2: For each grid cell, run a CNN that predicts y of the following form:, repeated k times]** + +⟶ + +


**67. where pc is the probability of detecting an object, bx,by,bh,bw are the properties of the detected bounding box, c1,...,cp is a one-hot representation of which of the p classes were detected, and k is the number of anchor boxes.**

⟶

<br>
+ + +**68. Step 3: Run the non-max suppression algorithm to remove any potential duplicate overlapping bounding boxes.** + +⟶ + +
+ + +**69. [Original image, Division in GxG grid, Bounding box prediction, Non-max suppression]** + +⟶ + +
+ + +**70. Remark: when pc=0, then the network does not detect any object. In that case, the corresponding predictions bx,...,cp have to be ignored.** + +⟶ + +


**71. R-CNN ― Region with Convolutional Neural Networks (R-CNN) is an object detection algorithm that first segments the image to find potentially relevant bounding boxes and then runs the detection algorithm to find the most probable objects in those bounding boxes.**

⟶

<br>
+ + +**72. [Original image, Segmentation, Bounding box prediction, Non-max suppression]** + +⟶ + +
+ + +**73. Remark: although the original algorithm is computationally expensive and slow, newer architectures enabled the algorithm to run faster, such as Fast R-CNN and Faster R-CNN.** + +⟶ + +
+ + +**74. Face verification and recognition** + +⟶ + +
+ + +**75. Types of models ― Two main types of model are summed up in table below:** + +⟶ + +
+ + +**76. [Face verification, Face recognition, Query, Reference, Database]** + +⟶ + +
+ + +**77. [Is this the correct person?, One-to-one lookup, Is this one of the K persons in the database?, One-to-many lookup]** + +⟶ + +
+ + +**78. One Shot Learning ― One Shot Learning is a face verification algorithm that uses a limited training set to learn a similarity function that quantifies how different two given images are. The similarity function applied to two images is often noted d(image 1,image 2).** + +⟶ + +
+ + +**79. Siamese Network ― Siamese Networks aim at learning how to encode images to then quantify how different two images are. For a given input image x(i), the encoded output is often noted as f(x(i)).** + +⟶ + +


**80. Triplet loss ― The triplet loss ℓ is a loss function computed on the embedding representation of a triplet of images A (anchor), P (positive) and N (negative). The anchor and the positive example belong to the same class, while the negative example belongs to another one. By calling α∈R+ the margin parameter, this loss is defined as follows:**

⟶

<br>
+ + +**81. Neural style transfer** + +⟶ + +
+ + +**82. Motivation ― The goal of neural style transfer is to generate an image G based on a given content C and a given style S.** + +⟶ + +
+ + +**83. [Content C, Style S, Generated image G]** + +⟶ + +
+ + +**84. Activation ― In a given layer l, the activation is noted a[l] and is of dimensions nH×nw×nc** + +⟶ + +
+ + +**85. Content cost function ― The content cost function Jcontent(C,G) is used to determine how the generated image G differs from the original content image C. It is defined as follows:** + +⟶ + +
+ + +**86. Style matrix ― The style matrix G[l] of a given layer l is a Gram matrix where each of its elements G[l]kk′ quantifies how correlated the channels k and k′ are. It is defined with respect to activations a[l] as follows:** + +⟶ + +
+ + +**87. Remark: the style matrix for the style image and the generated image are noted G[l] (S) and G[l] (G) respectively.** + +⟶ + +
+ + +**88. Style cost function ― The style cost function Jstyle(S,G) is used to determine how the generated image G differs from the style S. It is defined as follows:** + +⟶ + +
+ + +**89. Overall cost function ― The overall cost function is defined as being a combination of the content and style cost functions, weighted by parameters α,β, as follows:** + +⟶ + +
+ + +**90. Remark: a higher value of α will make the model care more about the content while a higher value of β will make it care more about the style.** + +⟶ + +
+ + +**91. Architectures using computational tricks** + +⟶ + +


**92. Generative Adversarial Network ― Generative adversarial networks, also known as GANs, are composed of a generative and a discriminative model, where the generative model aims at generating the most truthful output, which is then fed into the discriminative model that aims at differentiating between the generated and the true image.**

⟶

<br>
+ + +**93. [Training, Noise, Real-world image, Generator, Discriminator, Real Fake]** + +⟶ + +
+ + +**94. Remark: use cases using variants of GANs include text to image, music generation and synthesis.** + +⟶ + +
+ + +**95. ResNet ― The Residual Network architecture (also called ResNet) uses residual blocks with a high number of layers meant to decrease the training error. The residual block has the following characterizing equation:** + +⟶ + +
+ + +**96. Inception Network ― This architecture uses inception modules and aims at giving a try at different convolutions in order to increase its performance through features diversification. In particular, it uses the 1×1 convolution trick to limit the computational burden.** + +⟶ + +
+ + +**97. The Deep Learning cheatsheets are now available in [target language].** + +⟶ + +
+ + +**98. Original authors** + +⟶ + +
+ + +**99. Translated by X, Y and Z** + +⟶ + +
+ + +**100. Reviewed by X, Y and Z** + +⟶ + +
+ + +**101. View PDF version on GitHub** + +⟶ + +
+ + +**102. By X and Y** + +⟶ + +
diff --git a/template/cs-230-deep-learning-tips-and-tricks.md b/template/cs-230-deep-learning-tips-and-tricks.md new file mode 100644 index 000000000..75127ac5d --- /dev/null +++ b/template/cs-230-deep-learning-tips-and-tricks.md @@ -0,0 +1,457 @@ +**Deep Learning Tips and Tricks translation** [[webpage]](https://stanford.edu/~shervine/teaching/cs-230/cheatsheet-deep-learning-tips-and-tricks) + +
+ +**1. Deep Learning Tips and Tricks cheatsheet** + +⟶ + +
+ + +**2. CS 230 - Deep Learning** + +⟶ + +
+ + +**3. Tips and tricks** + +⟶ + +
+ + +**4. [Data processing, Data augmentation, Batch normalization]** + +⟶ + +
+ + +**5. [Training a neural network, Epoch, Mini-batch, Cross-entropy loss, Backpropagation, Gradient descent, Updating weights, Gradient checking]** + +⟶ + +
+ + +**6. [Parameter tuning, Xavier initialization, Transfer learning, Learning rate, Adaptive learning rates]** + +⟶ + +
+ + +**7. [Regularization, Dropout, Weight regularization, Early stopping]** + +⟶ + +
+ + +**8. [Good practices, Overfitting small batch, Gradient checking]** + +⟶ + +
+ + +**9. View PDF version on GitHub** + +⟶ + +
+ + +**10. Data processing** + +⟶ + +
+ + +**11. Data augmentation ― Deep learning models usually need a lot of data to be properly trained. It is often useful to get more data from the existing ones using data augmentation techniques. The main ones are summed up in the table below. More precisely, given the following input image, here are the techniques that we can apply:** + +⟶ + +
+ + +**12. [Original, Flip, Rotation, Random crop]** + +⟶ + +
+ + +**13. [Image without any modification, Flipped with respect to an axis for which the meaning of the image is preserved, Rotation with a slight angle, Simulates incorrect horizon calibration, Random focus on one part of the image, Several random crops can be done in a row]** + +⟶ + +
+ + +**14. [Color shift, Noise addition, Information loss, Contrast change]** + +⟶ + +
+ + +**15. [Nuances of RGB is slightly changed, Captures noise that can occur with light exposure, Addition of noise, More tolerance to quality variation of inputs, Parts of image ignored, Mimics potential loss of parts of image, Luminosity changes, Controls difference in exposition due to time of day]** + +⟶ + +
+ + +**16. Remark: data is usually augmented on the fly during training.** + +⟶ + +


**17. Batch normalization ― It is a step, with hyperparameters γ,β, that normalizes the batch {xi}. By noting μB,σ2B the mean and variance of the batch that we want to correct, it is done as follows:**

⟶

<br>
+ + +**18. It is usually done after a fully connected/convolutional layer and before a non-linearity layer and aims at allowing higher learning rates and reducing the strong dependence on initialization.** + +⟶ + +
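
A minimal numpy sketch of the batch-normalization forward pass; the small `eps` constant is the usual numerical-stability assumption.

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    """Batch normalization of a mini-batch x of shape (batch, features):
    normalize with the batch mean/variance, then rescale with gamma and shift with beta."""
    mu = x.mean(axis=0)                  # mu_B
    var = x.var(axis=0)                  # sigma_B^2
    x_hat = (x - mu) / np.sqrt(var + eps)
    return gamma * x_hat + beta
```
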
+ + +**19. Training a neural network** + +⟶ + +
+ + +**20. Definitions** + +⟶ + +
+ + +**21. Epoch ― In the context of training a model, epoch is a term used to refer to one iteration where the model sees the whole training set to update its weights.** + +⟶ + +


**22. Mini-batch gradient descent ― During the training phase, updating the weights is usually based neither on the whole training set at once (due to computational complexity) nor on a single data point (due to noise issues). Instead, the update step is done on mini-batches, where the number of data points in a batch is a hyperparameter that we can tune.**

⟶

<br>
+ + +**23. Loss function ― In order to quantify how a given model performs, the loss function L is usually used to evaluate to what extent the actual outputs y are correctly predicted by the model outputs z.** + +⟶ + +
+ + +**24. Cross-entropy loss ― In the context of binary classification in neural networks, the cross-entropy loss L(z,y) is commonly used and is defined as follows:** + +⟶ + +
+ + +**25. Finding optimal weights** + +⟶ + +
+ + +**26. Backpropagation ― Backpropagation is a method to update the weights in the neural network by taking into account the actual output and the desired output. The derivative with respect to each weight w is computed using the chain rule.** + +⟶ + +
+ + +**27. Using this method, each weight is updated with the rule:** + +⟶ + +
+ + +**28. Updating weights ― In a neural network, weights are updated as follows:** + +⟶ + +
+ + +**29. [Step 1: Take a batch of training data and perform forward propagation to compute the loss, Step 2: Backpropagate the loss to get the gradient of the loss with respect to each weight, Step 3: Use the gradients to update the weights of the network.]** + +⟶ + +
+ + +**30. [Forward propagation, Backpropagation, Weights update]** + +⟶ + +
+ + +**31. Parameter tuning** + +⟶ + +
+ + +**32. Weights initialization** + +⟶ + +


**33. Xavier initialization ― Instead of initializing the weights in a purely random manner, Xavier initialization makes it possible to have initial weights that take into account characteristics that are unique to the architecture.**

⟶

<br>


**34. Transfer learning ― Training a deep learning model requires a lot of data and more importantly a lot of time. It is often useful to take advantage of weights pre-trained on huge datasets that took days/weeks to train, and leverage them for our use case. Depending on how much data we have at hand, here are the different ways to leverage this:**

⟶

<br>
+ + +**35. [Training size, Illustration, Explanation]** + +⟶ + +
+ + +**36. [Small, Medium, Large]** + +⟶ + +
+ + +**37. [Freezes all layers, trains weights on softmax, Freezes most layers, trains weights on last layers and softmax, Trains weights on layers and softmax by initializing weights on pre-trained ones]** + +⟶ + +
+ + +**38. Optimizing convergence** + +⟶ + +


**39. Learning rate ― The learning rate, often noted α or sometimes η, indicates at which pace the weights get updated. It can be fixed or adaptively changed. The currently most popular method is Adam, which adapts the learning rate.**

⟶

<br>
+ + +**40. Adaptive learning rates ― Letting the learning rate vary when training a model can reduce the training time and improve the numerical optimal solution. While Adam optimizer is the most commonly used technique, others can also be useful. They are summed up in the table below:** + +⟶ + +
+ + +**41. [Method, Explanation, Update of w, Update of b]** + +⟶ + +
+ + +**42. [Momentum, Dampens oscillations, Improvement to SGD, 2 parameters to tune]** + +⟶ + +
+ + +**43. [RMSprop, Root Mean Square propagation, Speeds up learning algorithm by controlling oscillations]** + +⟶ + +
+ + +**44. [Adam, Adaptive Moment estimation, Most popular method, 4 parameters to tune]** + +⟶ + +
+ + +**45. Remark: other methods include Adadelta, Adagrad and SGD.** + +⟶ + +
+ + +**46. Regularization** + +⟶ + +
+ + +**47. Dropout ― Dropout is a technique used in neural networks to prevent overfitting the training data by dropping out neurons with probability p>0. It forces the model to avoid relying too much on particular sets of features.** + +⟶ + +
+ + +**48. Remark: most deep learning frameworks parametrize dropout through the 'keep' parameter 1−p.** + +⟶ + +
+ + +**49. Weight regularization ― In order to make sure that the weights are not too large and that the model is not overfitting the training set, regularization techniques are usually performed on the model weights. The main ones are summed up in the table below:** + +⟶ + +
+ + +**50. [LASSO, Ridge, Elastic Net]** + +⟶ + +

**50 bis. [Shrinks coefficients to 0, Good for variable selection, Makes coefficients smaller, Tradeoff between variable selection and small coefficients]**

⟶

<br>
+ +**51. Early stopping ― This regularization technique stops the training process as soon as the validation loss reaches a plateau or starts to increase.** + +⟶ + +
+ + +**52. [Error, Validation, Training, early stopping, Epochs]** + +⟶ + +
+ + +**53. Good practices** + +⟶ + +
+ + +**54. Overfitting small batch ― When debugging a model, it is often useful to make quick tests to see if there is any major issue with the architecture of the model itself. In particular, in order to make sure that the model can be properly trained, a mini-batch is passed inside the network to see if it can overfit on it. If it cannot, it means that the model is either too complex or not complex enough to even overfit on a small batch, let alone a normal-sized training set.** + +⟶ + +
+ + +**55. Gradient checking ― Gradient checking is a method used during the implementation of the backward pass of a neural network. It compares the value of the analytical gradient to the numerical gradient at given points and plays the role of a sanity-check for correctness.** + +⟶ + +
+ + +**56. [Type, Numerical gradient, Analytical gradient]** + +⟶ + +
+ + +**57. [Formula, Comments]** + +⟶ + +
+ + +**58. [Expensive; loss has to be computed two times per dimension, Used to verify correctness of analytical implementation, Trade-off in choosing h not too small (numerical instability) nor too large (poor gradient approximation)]** + +⟶ + +
+ + +**59. ['Exact' result, Direct computation, Used in the final implementation]** + +⟶ + +
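
A minimal sketch of the numerical (centered-difference) gradient used for gradient checking, with a toy loss whose analytical gradient is known.

```python
def numerical_gradient(loss, w, i, h=1e-5):
    """Centered-difference estimate of d(loss)/d(w_i), used to sanity-check an
    analytical gradient: (loss(w + h*e_i) - loss(w - h*e_i)) / (2h)."""
    w_plus, w_minus = list(w), list(w)
    w_plus[i] += h
    w_minus[i] -= h
    return (loss(w_plus) - loss(w_minus)) / (2 * h)

# Illustrative check on loss(w) = w0^2 + 3*w1, whose analytical gradient is (2*w0, 3).
loss = lambda w: w[0] ** 2 + 3 * w[1]
assert abs(numerical_gradient(loss, [1.0, 2.0], 0) - 2.0) < 1e-4
assert abs(numerical_gradient(loss, [1.0, 2.0], 1) - 3.0) < 1e-4
```
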


**60. The Deep Learning cheatsheets are now available in [target language].**

⟶

<br>


**61. Original authors**

⟶

<br>
+ +**62.Translated by X, Y and Z** + +⟶ + +
+ +**63.Reviewed by X, Y and Z** + +⟶ + +
+ +**64.View PDF version on GitHub** + +⟶ + +
+ +**65.By X and Y** + +⟶ + +
diff --git a/template/cs-230-recurrent-neural-networks.md b/template/cs-230-recurrent-neural-networks.md new file mode 100644 index 000000000..bd3c638bc --- /dev/null +++ b/template/cs-230-recurrent-neural-networks.md @@ -0,0 +1,677 @@ +**Recurrent Neural Networks translation** [[webpage]](https://stanford.edu/~shervine/teaching/cs-230/cheatsheet-recurrent-neural-networks) + +
+ +**1. Recurrent Neural Networks cheatsheet** + +⟶ + +
+ + +**2. CS 230 - Deep Learning** + +⟶ + +
+ + +**3. [Overview, Architecture structure, Applications of RNNs, Loss function, Backpropagation]** + +⟶ + +
+ + +**4. [Handling long term dependencies, Common activation functions, Vanishing/exploding gradient, Gradient clipping, GRU/LSTM, Types of gates, Bidirectional RNN, Deep RNN]** + +⟶ + +
+ + +**5. [Learning word representation, Notations, Embedding matrix, Word2vec, Skip-gram, Negative sampling, GloVe]** + +⟶ + +
+ + +**6. [Comparing words, Cosine similarity, t-SNE]** + +⟶ + +
+ + +**7. [Language model, n-gram, Perplexity]** + +⟶ + +
+ + +**8. [Machine translation, Beam search, Length normalization, Error analysis, Bleu score]** + +⟶ + +
+ + +**9. [Attention, Attention model, Attention weights]** + +⟶ + +
+ + +**10. Overview** + +⟶ + +
+ + +**11. Architecture of a traditional RNN ― Recurrent neural networks, also known as RNNs, are a class of neural networks that allow previous outputs to be used as inputs while having hidden states. They are typically as follows:** + +⟶ + +
+ + +**12. For each timestep t, the activation a and the output y are expressed as follows:** + +⟶ + +
+ + +**13. and** + +⟶ + +
+ + +**14. where Wax,Waa,Wya,ba,by are coefficients that are shared temporally and g1,g2 activation functions.** + +⟶ + +
+ + +**15. The pros and cons of a typical RNN architecture are summed up in the table below:** + +⟶ + +
+ + +**16. [Advantages, Possibility of processing input of any length, Model size not increasing with size of input, Computation takes into account historical information, Weights are shared across time]** + +⟶ + +
+ + +**17. [Drawbacks, Computation being slow, Difficulty of accessing information from a long time ago, Cannot consider any future input for the current state]** + +⟶ + +
+ + +**18. Applications of RNNs ― RNN models are mostly used in the fields of natural language processing and speech recognition. The different applications are summed up in the table below:** + +⟶ + +
+ + +**19. [Type of RNN, Illustration, Example]** + +⟶ + +
+ + +**20. [One-to-one, One-to-many, Many-to-one, Many-to-many]** + +⟶ + +
+ + +**21. [Traditional neural network, Music generation, Sentiment classification, Name entity recognition, Machine translation]** + +⟶ + +
+ + +**22. Loss function ― In the case of a recurrent neural network, the loss function L of all time steps is defined based on the loss at every time step as follows:** + +⟶ + +
+ + +**23. Backpropagation through time ― Backpropagation is done at each point in time. At timestep T, the derivative of the loss L with respect to weight matrix W is expressed as follows:** + +⟶ + +
+ + +**24. Handling long term dependencies** + +⟶ + +
+ + +**25. Commonly used activation functions ― The most common activation functions used in RNN modules are described below:** + +⟶ + +
+ + +**26. [Sigmoid, Tanh, RELU]** + +⟶ + +
+ + +**27. Vanishing/exploding gradient ― The vanishing and exploding gradient phenomena are often encountered in the context of RNNs. The reason why they happen is that it is difficult to capture long term dependencies because of multiplicative gradient that can be exponentially decreasing/increasing with respect to the number of layers.** + +⟶ + +
+ + +**28. Gradient clipping ― It is a technique used to cope with the exploding gradient problem sometimes encountered when performing backpropagation. By capping the maximum value for the gradient, this phenomenon is controlled in practice.** + +⟶ + +
+ + +**29. clipped** + +⟶ + +
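
A minimal numpy sketch of gradient clipping by norm; the `max_norm` cap is an illustrative choice.

```python
import numpy as np

def clip_gradient(grad, max_norm=1.0):
    """Rescale the gradient so that its norm never exceeds max_norm,
    mitigating the exploding-gradient problem."""
    norm = np.linalg.norm(grad)
    if norm > max_norm:
        return grad * (max_norm / norm)
    return grad
```
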
+ + +**30. Types of gates ― In order to remedy the vanishing gradient problem, specific gates are used in some types of RNNs and usually have a well-defined purpose. They are usually noted Γ and are equal to:** + +⟶ + +
+ + +**31. where W,U,b are coefficients specific to the gate and σ is the sigmoid function. The main ones are summed up in the table below:** + +⟶ + +
+ + +**32. [Type of gate, Role, Used in]** + +⟶ + +
+ + +**33. [Update gate, Relevance gate, Forget gate, Output gate]** + +⟶ + +
+ + +**34. [How much past should matter now?, Drop previous information?, Erase a cell or not?, How much to reveal of a cell?]** + +⟶ + +
+ + +**35. [LSTM, GRU]** + +⟶ + +
+ + +**36. GRU/LSTM ― Gated Recurrent Unit (GRU) and Long Short-Term Memory units (LSTM) deal with the vanishing gradient problem encountered by traditional RNNs, with LSTM being a generalization of GRU. Below is a table summing up the characterizing equations of each architecture:** + +⟶ + +
+ + +**37. [Characterization, Gated Recurrent Unit (GRU), Long Short-Term Memory (LSTM), Dependencies]** + +⟶ + +
+ + +**38. Remark: the sign ⋆ denotes the element-wise multiplication between two vectors.** + +⟶ + +
+ + +**39. Variants of RNNs ― The table below sums up the other commonly used RNN architectures:** + +⟶ + +
+ + +**40. [Bidirectional (BRNN), Deep (DRNN)]** + +⟶ + +
+ + +**41. Learning word representation** + +⟶ + +
+ + +**42. In this section, we note V the vocabulary and |V| its size.** + +⟶ + +
+ + +**43. Motivation and notations** + +⟶ + +
+ + +**44. Representation techniques ― The two main ways of representing words are summed up in the table below:** + +⟶ + +
+ + +**45. [1-hot representation, Word embedding]** + +⟶ + +
+ + +**46. [teddy bear, book, soft]** + +⟶ + +
+ + +**47. [Noted ow, Naive approach, no similarity information, Noted ew, Takes into account words similarity]** + +⟶ + +
+ + +**48. Embedding matrix ― For a given word w, the embedding matrix E is a matrix that maps its 1-hot representation ow to its embedding ew as follows:** + +⟶ + +
+ + +**49. Remark: learning the embedding matrix can be done using target/context likelihood models.** + +⟶ + +
+ + +**50. Word embeddings** + +⟶ + +
+ + +**51. Word2vec ― Word2vec is a framework aimed at learning word embeddings by estimating the likelihood that a given word is surrounded by other words. Popular models include skip-gram, negative sampling and CBOW.** + +⟶ + +
+ + +**52. [A cute teddy bear is reading, teddy bear, soft, Persian poetry, art]** + +⟶ + +
+ + +**53. [Train network on proxy task, Extract high-level representation, Compute word embeddings]** + +⟶ + +
+ + +**54. Skip-gram ― The skip-gram word2vec model is a supervised learning task that learns word embeddings by assessing the likelihood of any given target word t happening with a context word c. By noting θt a parameter associated with t, the probability P(t|c) is given by:** + +⟶ + +
+ + +**55. Remark: summing over the whole vocabulary in the denominator of the softmax part makes this model computationally expensive. CBOW is another word2vec model using the surrounding words to predict a given word.** + +⟶ + +


**56. Negative sampling ― It is a set of binary classifiers using logistic regression that aim at assessing how likely a given context word and a given target word are to appear simultaneously, with the models being trained on sets of k negative examples and 1 positive example. Given a context word c and a target word t, the prediction is expressed by:**

⟶

<br>
+ + +**57. Remark: this method is less computationally expensive than the skip-gram model.** + +⟶ + +


**57bis. GloVe ― The GloVe model, short for global vectors for word representation, is a word embedding technique that uses a co-occurrence matrix X where each Xi,j denotes the number of times that a target i occurred with a context j. Its cost function J is as follows:**

⟶

<br>
+ + +**58. where f is a weighting function such that Xi,j=0⟹f(Xi,j)=0. +Given the symmetry that e and θ play in this model, the final word embedding e(final)w is given by:** + +⟶ + +
+ + +**59. Remark: the individual components of the learned word embeddings are not necessarily interpretable.** + +⟶ + +
+ + +**60. Comparing words** + +⟶ + +
+ + +**61. Cosine similarity ― The cosine similarity between words w1 and w2 is expressed as follows:** + +⟶ + +
+ + +**62. Remark: θ is the angle between words w1 and w2.** + +⟶ + +
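
A minimal numpy sketch of cosine similarity between two embedding vectors.

```python
import numpy as np

def cosine_similarity(w1, w2):
    """Cosine similarity between two word embeddings:
    cos(theta) = (w1 . w2) / (||w1|| * ||w2||)."""
    return float(w1 @ w2 / (np.linalg.norm(w1) * np.linalg.norm(w2)))
```
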
+ + +**63. t-SNE ― t-SNE (t-distributed Stochastic Neighbor Embedding) is a technique aimed at reducing high-dimensional embeddings into a lower dimensional space. In practice, it is commonly used to visualize word vectors in the 2D space.** + +⟶ + +
+ + +**64. [literature, art, book, culture, poem, reading, knowledge, entertaining, loveable, childhood, kind, teddy bear, soft, hug, cute, adorable]** + +⟶ + +
+ + +**65. Language model** + +⟶ + +
+ + +**66. Overview ― A language model aims at estimating the probability of a sentence P(y).** + +⟶ + +


**67. n-gram model ― This model is a naive approach aiming at quantifying the probability that an expression appears in a corpus by counting its number of appearances in the training data.**

⟶

<br>
+ + +**68. Perplexity ― Language models are commonly assessed using the perplexity metric, also known as PP, which can be interpreted as the inverse probability of the dataset normalized by the number of words T. The perplexity is such that the lower, the better and is defined as follows:** + +⟶ + +
+ + +**69. Remark: PP is commonly used in t-SNE.** + +⟶ + +
+ + +**70. Machine translation** + +⟶ + +


**71. Overview ― A machine translation model is similar to a language model except it has an encoder network placed before it. For this reason, it is sometimes referred to as a conditional language model. The goal is to find a sentence y such that:**

⟶

<br>
+ + +**72. Beam search ― It is a heuristic search algorithm used in machine translation and speech recognition to find the likeliest sentence y given an input x.** + +⟶ + +


**73. [Step 1: Find top B likely words y<1>, Step 2: Compute conditional probabilities y<k>|x,y<1>,...,y<k−1>, Step 3: Keep top B combinations x,y<1>,...,y<k>, End process at a stop word]**

⟶

<br>
+ + +**74. Remark: if the beam width is set to 1, then this is equivalent to a naive greedy search.** + +⟶ + +


**75. Beam width ― The beam width B is a parameter for beam search. Large values of B yield better results but with slower performance and increased memory. Small values of B lead to worse results but are less computationally intensive. A standard value for B is around 10.**

⟶

<br>
+ + +**76. Length normalization ― In order to improve numerical stability, beam search is usually applied on the following normalized objective, often called the normalized log-likelihood objective, defined as:** + +⟶ + +
+ + +**77. Remark: the parameter α can be seen as a softener, and its value is usually between 0.5 and 1.** + +⟶ + +
+ + +**78. Error analysis ― When obtaining a predicted translation ˆy that is bad, one can wonder why we did not get a good translation y∗ by performing the following error analysis:** + +⟶ + +
+ + +**79. [Case, Root cause, Remedies]** + +⟶ + +
+ + +**80. [Beam search faulty, RNN faulty, Increase beam width, Try different architecture, Regularize, Get more data]** + +⟶ + +
+ + +**81. Bleu score ― The bilingual evaluation understudy (bleu) score quantifies how good a machine translation is by computing a similarity score based on n-gram precision. It is defined as follows:** + +⟶ + +

**82. where pn is the bleu score on n-grams only, defined as follows:**

⟶

<br>
+ + +**83. Remark: a brevity penalty may be applied to short predicted translations to prevent an artificially inflated bleu score.** + +⟶ + +
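
A simplified sketch of the score for a single reference translation and uniform weights over n-gram orders (the sentences are made up; production implementations also handle multiple references and smoothing):

```python
import math
from collections import Counter

def ngrams(words, n):
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]

def bleu(candidate, reference, N=2):
    # bleu = BP * exp((1/N) * sum_n log p_n), with p_n the clipped n-gram precision
    precisions = []
    for n in range(1, N + 1):
        cand, ref = Counter(ngrams(candidate, n)), Counter(ngrams(reference, n))
        matches = sum(min(count, ref[g]) for g, count in cand.items())
        precisions.append(matches / max(sum(cand.values()), 1))
    if any(p == 0 for p in precisions):
        return 0.0
    bp = min(1.0, math.exp(1 - len(reference) / len(candidate)))  # brevity penalty
    return bp * math.exp(sum(math.log(p) for p in precisions) / N)

print(bleu("the cat sat on the mat".split(), "the cat is on the mat".split()))  # ≈ 0.71
```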
+ + +**84. Attention** + +⟶ + +

**85. Attention model ― This model allows an RNN to pay attention to specific parts of the input that are considered important, which improves the performance of the resulting model in practice. By noting α the amount of attention that the output y should pay to the activation a and c the context at time t, we have:**

⟶

<br>
+ + +**86. with** + +⟶ + +
+ + +**87. Remark: the attention scores are commonly used in image captioning and machine translation.** + +⟶ + +
+ + +**88. A cute teddy bear is reading Persian literature.** + +⟶ + +
+ + +**89. Attention weight ― The amount of attention that the output y should pay to the activation a is given by α computed as follows:** + +⟶ + +
+ + +**90. Remark: computation complexity is quadratic with respect to Tx.** + +⟶ + +
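
A minimal sketch of how the weights α are obtained from raw scores by a softmax and then used to form the context vector (the scores and activations below are toy values):

```python
import numpy as np

def attention_context(scores, activations):
    # alpha_t = exp(e_t) / sum_t' exp(e_t'), then c = sum_t alpha_t * a_t
    e = np.asarray(scores, dtype=float)
    alpha = np.exp(e - e.max())          # subtract the max for numerical stability
    alpha /= alpha.sum()
    c = alpha @ np.asarray(activations, dtype=float)
    return alpha, c

# Toy example: Tx = 3 input positions with 2-dimensional activations
alpha, c = attention_context([2.0, 0.5, -1.0], [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
print(alpha)  # attention weights, sum to 1
print(c)      # context vector for this decoding step
```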
+ + +**91. The Deep Learning cheatsheets are now available in [target language].** + +⟶ + +
+ +**92. Original authors** + +⟶ + +
+ +**93. Translated by X, Y and Z** + +⟶ + +
+ +**94. Reviewed by X, Y and Z** + +⟶ + +
+ +**95. View PDF version on GitHub** + +⟶ + +
+ +**96. By X and Y** + +⟶ + +
diff --git a/template/refresher-linear-algebra.md b/template/refresher-linear-algebra.md deleted file mode 100644 index a6b440d1e..000000000 --- a/template/refresher-linear-algebra.md +++ /dev/null @@ -1,339 +0,0 @@ -**1. Linear Algebra and Calculus refresher** - -⟶ - -
- -**2. General notations** - -⟶ - -
- -**3. Definitions** - -⟶ - -
- -**4. Vector ― We note x∈Rn a vector with n entries, where xi∈R is the ith entry:** - -⟶ - -
- -**5. Matrix ― We note A∈Rm×n a matrix with m rows and n columns, where Ai,j∈R is the entry located in the ith row and jth column:** - -⟶ - -
- -**6. Remark: the vector x defined above can be viewed as a n×1 matrix and is more particularly called a column-vector.** - -⟶ - -
- -**7. Main matrices** - -⟶ - -
- -**8. Identity matrix ― The identity matrix I∈Rn×n is a square matrix with ones in its diagonal and zero everywhere else:** - -⟶ - -
- -**9. Remark: for all matrices A∈Rn×n, we have A×I=I×A=A.** - -⟶ - -
- -**10. Diagonal matrix ― A diagonal matrix D∈Rn×n is a square matrix with nonzero values in its diagonal and zero everywhere else:** - -⟶ - -
- -**11. Remark: we also note D as diag(d1,...,dn).** - -⟶ - -
- -**12. Matrix operations** - -⟶ - -
- -**13. Multiplication** - -⟶ - -
- -**14. Vector-vector ― There are two types of vector-vector products:** - -⟶ - -
- -**15. inner product: for x,y∈Rn, we have:** - -⟶ - -
- -**16. outer product: for x∈Rm,y∈Rn, we have:** - -⟶ - -
- -**17. Matrix-vector ― The product of matrix A∈Rm×n and vector x∈Rn is a vector of size Rn, such that:** - -⟶ - -
- -**18. where aTr,i are the vector rows and ac,j are the vector columns of A, and xi are the entries of x.** - -⟶ - -
- -**19. Matrix-matrix ― The product of matrices A∈Rm×n and B∈Rn×p is a matrix of size Rn×p, such that:** - -⟶ - -
- -**20. where aTr,i,bTr,i are the vector rows and ac,j,bc,j are the vector columns of A and B respectively** - -⟶ - -
- -**21. Other operations** - -⟶ - -
- -**22. Transpose ― The transpose of a matrix A∈Rm×n, noted AT, is such that its entries are flipped:** - -⟶ - -
- -**23. Remark: for matrices A,B, we have (AB)T=BTAT** - -⟶ - -
- -**24. Inverse ― The inverse of an invertible square matrix A is noted A−1 and is the only matrix such that:** - -⟶ - -
- -**25. Remark: not all square matrices are invertible. Also, for matrices A,B, we have (AB)−1=B−1A−1** - -⟶ - -
- -**26. Trace ― The trace of a square matrix A, noted tr(A), is the sum of its diagonal entries:** - -⟶ - -
- -**27. Remark: for matrices A,B, we have tr(AT)=tr(A) and tr(AB)=tr(BA)** - -⟶ - -
- -**28. Determinant ― The determinant of a square matrix A∈Rn×n, noted |A| or det(A) is expressed recursively in terms of A∖i,∖j, which is the matrix A without its ith row and jth column, as follows:** - -⟶ - -
- -**29. Remark: A is invertible if and only if |A|≠0. Also, |AB|=|A||B| and |AT|=|A|.** - -⟶ - -
- -**30. Matrix properties** - -⟶ - -
- -**31. Definitions** - -⟶ - -
- -**32. Symmetric decomposition ― A given matrix A can be expressed in terms of its symmetric and antisymmetric parts as follows:** - -⟶ - -
- -**33. [Symmetric, Antisymmetric]** - -⟶ - -
- -**34. Norm ― A norm is a function N:V⟶[0,+∞[ where V is a vector space, and such that for all x,y∈V, we have:** - -⟶ - -
- -**35. N(ax)=|a|N(x) for a scalar** - -⟶ - -
- -**36. if N(x)=0, then x=0** - -⟶ - -
- -**37. For x∈V, the most commonly used norms are summed up in the table below:** - -⟶ - -
- -**38. [Norm, Notation, Definition, Use case]** - -⟶ - -
- -**39. Linearly dependence ― A set of vectors is said to be linearly dependent if one of the vectors in the set can be defined as a linear combination of the others.** - -⟶ - -
- -**40. Remark: if no vector can be written this way, then the vectors are said to be linearly independent** - -⟶ - -
- -**41. Matrix rank ― The rank of a given matrix A is noted rank(A) and is the dimension of the vector space generated by its columns. This is equivalent to the maximum number of linearly independent columns of A.** - -⟶ - -
- -**42. Positive semi-definite matrix ― A matrix A∈Rn×n is positive semi-definite (PSD) and is noted A⪰0 if we have:** - -⟶ - -
- -**43. Remark: similarly, a matrix A is said to be positive definite, and is noted A≻0, if it is a PSD matrix which satisfies for all non-zero vector x, xTAx>0.** - -⟶ - -
- -**44. Eigenvalue, eigenvector ― Given a matrix A∈Rn×n, λ is said to be an eigenvalue of A if there exists a vector z∈Rn∖{0}, called eigenvector, such that we have:** - -⟶ - -
- -**45. Spectral theorem ― Let A∈Rn×n. If A is symmetric, then A is diagonalizable by a real orthogonal matrix U∈Rn×n. By noting Λ=diag(λ1,...,λn), we have:** - -⟶ - -
- -**46. diagonal** - -⟶ - -
- -**47. Singular-value decomposition ― For a given matrix A of dimensions m×n, the singular-value decomposition (SVD) is a factorization technique that guarantees the existence of U m×m unitary, Σ m×n diagonal and V n×n unitary matrices, such that:** - -⟶ - -
- -**48. Matrix calculus** - -⟶ - -
- -**49. Gradient ― Let f:Rm×n→R be a function and A∈Rm×n be a matrix. The gradient of f with respect to A is a m×n matrix, noted ∇Af(A), such that:** - -⟶ - -
- -**50. Remark: the gradient of f is only defined when f is a function that returns a scalar.** - -⟶ - -
- -**51. Hessian ― Let f:Rn→R be a function and x∈Rn be a vector. The hessian of f with respect to x is a n×n symmetric matrix, noted ∇2xf(x), such that:** - -⟶ - -
- -**52. Remark: the hessian of f is only defined when f is a function that returns a scalar** - -⟶ - -
- -**53. Gradient operations ― For matrices A,B,C, the following gradient properties are worth having in mind:** - -⟶ - -
- -**54. [General notations, Definitions, Main matrices]** - -⟶ - -
- -**55. [Matrix operations, Multiplication, Other operations]** - -⟶ - -
- -**56. [Matrix properties, Norm, Eigenvalue/Eigenvector, Singular-value decomposition]** - -⟶ - -
- -**57. [Matrix calculus, Gradient, Hessian, Operations]** - -⟶ diff --git a/template/refresher-probability.md b/template/refresher-probability.md deleted file mode 100644 index 5c9b34656..000000000 --- a/template/refresher-probability.md +++ /dev/null @@ -1,381 +0,0 @@ -**1. Probabilities and Statistics refresher** - -⟶ - -
- -**2. Introduction to Probability and Combinatorics** - -⟶ - -
- -**3. Sample space ― The set of all possible outcomes of an experiment is known as the sample space of the experiment and is denoted by S.** - -⟶ - -
- -**4. Event ― Any subset E of the sample space is known as an event. That is, an event is a set consisting of possible outcomes of the experiment. If the outcome of the experiment is contained in E, then we say that E has occurred.** - -⟶ - -
- -**5. Axioms of probability For each event E, we denote P(E) as the probability of event E occuring.** - -⟶ - -
- -**6. Axiom 1 ― Every probability is between 0 and 1 included, i.e:** - -⟶ - -
- -**7. Axiom 2 ― The probability that at least one of the elementary events in the entire sample space will occur is 1, i.e:** - -⟶ - -
- -**8. Axiom 3 ― For any sequence of mutually exclusive events E1,...,En, we have:** - -⟶ - -
- -**9. Permutation ― A permutation is an arrangement of r objects from a pool of n objects, in a given order. The number of such arrangements is given by P(n,r), defined as:** - -⟶ - -
- -**10. Combination ― A combination is an arrangement of r objects from a pool of n objects, where the order does not matter. The number of such arrangements is given by C(n,r), defined as:** - -⟶ - -
- -**11. Remark: we note that for 0⩽r⩽n, we have P(n,r)⩾C(n,r)** - -⟶ - -
- -**12. Conditional Probability** - -⟶ - -
- -**13. Bayes' rule ― For events A and B such that P(B)>0, we have:** - -⟶ - -
- -**14. Remark: we have P(A∩B)=P(A)P(B|A)=P(A|B)P(B)** - -⟶ - -
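
A small numerical illustration of Bayes' rule (all probabilities below are invented for the example):

```python
# Hypothetical setting: A = "has the condition", B = "test is positive"
p_A = 0.01             # P(A), the prior
p_B_given_A = 0.95     # P(B|A)
p_B_given_notA = 0.05  # P(B|¬A)

# Law of total probability: P(B) = P(B|A)P(A) + P(B|¬A)P(¬A)
p_B = p_B_given_A * p_A + p_B_given_notA * (1 - p_A)

# Bayes' rule: P(A|B) = P(B|A)P(A) / P(B)
p_A_given_B = p_B_given_A * p_A / p_B
print(round(p_A_given_B, 3))  # ≈ 0.161, despite the 95% sensitive test
```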
- -**15. Partition ― Let {Ai,i∈[[1,n]]} be such that for all i, Ai≠∅. We say that {Ai} is a partition if we have:** - -⟶ - -
- -**16. Remark: for any event B in the sample space, we have P(B)=n∑i=1P(B|Ai)P(Ai).** - -⟶ - -
- -**17. Extended form of Bayes' rule ― Let {Ai,i∈[[1,n]]} be a partition of the sample space. We have:** - -⟶ - -
- -**18. Independence ― Two events A and B are independent if and only if we have:** - -⟶ - -
- -**19. Random Variables** - -⟶ - -
- -**20. Definitions** - -⟶ - -
- -**21. Random variable ― A random variable, often noted X, is a function that maps every element in a sample space to a real line.** - -⟶ - -
- -**22. Cumulative distribution function (CDF) ― The cumulative distribution function F, which is monotonically non-decreasing and is such that limx→−∞F(x)=0 and limx→+∞F(x)=1, is defined as:** - -⟶ - -

**23. Remark: we have P(a<X⩽b)=F(b)−F(a)**

⟶

<br>


**24. Probability density function (PDF) ― The probability density function f is the probability that X takes on values between two adjacent realizations of the random variable.**

⟶

<br>
- -**25. Relationships involving the PDF and CDF ― Here are the important properties to know in the discrete (D) and the continuous (C) cases.** - -⟶ - -
- -**26. [Case, CDF F, PDF f, Properties of PDF]** - -⟶ - -
- -**27. Expectation and Moments of the Distribution ― Here are the expressions of the expected value E[X], generalized expected value E[g(X)], kth moment E[Xk] and characteristic function ψ(ω) for the discrete and continuous cases:** - -⟶ - -
- -**28. Variance ― The variance of a random variable, often noted Var(X) or σ2, is a measure of the spread of its distribution function. It is determined as follows:** - -⟶ - -
- -**29. Standard deviation ― The standard deviation of a random variable, often noted σ, is a measure of the spread of its distribution function which is compatible with the units of the actual random variable. It is determined as follows:** - -⟶ - -
- -**30. Transformation of random variables ― Let the variables X and Y be linked by some function. By noting fX and fY the distribution function of X and Y respectively, we have:** - -⟶ - -
- -**31. Leibniz integral rule ― Let g be a function of x and potentially c, and a,b boundaries that may depend on c. We have:** - -⟶ - -
- -**32. Probability Distributions** - -⟶ - -
- -**33. Chebyshev's inequality ― Let X be a random variable with expected value μ. For k,σ>0, we have the following inequality:** - -⟶ - -
- -**34. Main distributions ― Here are the main distributions to have in mind:** - -⟶ - -
- -**35. [Type, Distribution]** - -⟶ - -
- -**36. Jointly Distributed Random Variables** - -⟶ - -
- -**37. Marginal density and cumulative distribution ― From the joint density probability function fXY , we have** - -⟶ - -
- -**38. [Case, Marginal density, Cumulative function]** - -⟶ - -
- -**39. Conditional density ― The conditional density of X with respect to Y, often noted fX|Y, is defined as follows:** - -⟶ - -
- -**40. Independence ― Two random variables X and Y are said to be independent if we have:** - -⟶ - -
- -**41. Covariance ― We define the covariance of two random variables X and Y, that we note σ2XY or more commonly Cov(X,Y), as follows:** - -⟶ - -
- -**42. Correlation ― By noting σX,σY the standard deviations of X and Y, we define the correlation between the random variables X and Y, noted ρXY, as follows:** - -⟶ - -
- -**43. Remark 1: we note that for any random variables X,Y, we have ρXY∈[−1,1].** - -⟶ - -
- -**44. Remark 2: If X and Y are independent, then ρXY=0.** - -⟶ - -
- -**45. Parameter estimation** - -⟶ - -
- -**46. Definitions** - -⟶ - -
- -**47. Random sample ― A random sample is a collection of n random variables X1,...,Xn that are independent and identically distributed with X.** - -⟶ - -
- -**48. Estimator ― An estimator is a function of the data that is used to infer the value of an unknown parameter in a statistical model.** - -⟶ - -
- -**49. Bias ― The bias of an estimator ^θ is defined as being the difference between the expected value of the distribution of ^θ and the true value, i.e.:** - -⟶ - -
- -**50. Remark: an estimator is said to be unbiased when we have E[^θ]=θ.** - -⟶ - -
- -**51. Estimating the mean** - -⟶ - -
- -**52. Sample mean ― The sample mean of a random sample is used to estimate the true mean μ of a distribution, is often noted ¯¯¯¯¯X and is defined as follows:** - -⟶ - -
- -**53. Remark: the sample mean is unbiased, i.e E[¯¯¯¯¯X]=μ.** - -⟶ - -
- -**54. Central Limit Theorem ― Let us have a random sample X1,...,Xn following a given distribution with mean μ and variance σ2, then we have:** - -⟶ - -
- -**55. Estimating the variance** - -⟶ - -
- -**56. Sample variance ― The sample variance of a random sample is used to estimate the true variance σ2 of a distribution, is often noted s2 or ^σ2 and is defined as follows:** - -⟶ - -
- -**57. Remark: the sample variance is unbiased, i.e E[s2]=σ2.** - -⟶ - -
- -**58. Chi-Squared relation with sample variance ― Let s2 be the sample variance of a random sample. We have:** - -⟶ - -
- -**59. [Introduction, Sample space, Event, Permutation]** - -⟶ - -
- -**60. [Conditional probability, Bayes' rule, Independence]** - -⟶ - -
- -**61. [Random variables, Definitions, Expectation, Variance]** - -⟶ - -
- -**62. [Probability distributions, Chebyshev's inequality, Main distributions]** - -⟶ - -
- -**63. [Jointly distributed random variables, Density, Covariance, Correlation]** - -⟶ - -
- -**64. [Parameter estimation, Mean, Variance]** - -⟶ diff --git a/tr/cheatsheet-machine-learning-tips-and-tricks.md b/tr/cheatsheet-machine-learning-tips-and-tricks.md deleted file mode 100644 index 9712297b8..000000000 --- a/tr/cheatsheet-machine-learning-tips-and-tricks.md +++ /dev/null @@ -1,285 +0,0 @@ -**1. Machine Learning tips and tricks cheatsheet** - -⟶ - -
- -**2. Classification metrics** - -⟶ - -
- -**3. In a context of a binary classification, here are the main metrics that are important to track in order to assess the performance of the model.** - -⟶ - -
- -**4. Confusion matrix ― The confusion matrix is used to have a more complete picture when assessing the performance of a model. It is defined as follows:** - -⟶ - -
- -**5. [Predicted class, Actual class]** - -⟶ - -
- -**6. Main metrics ― The following metrics are commonly used to assess the performance of classification models:** - -⟶ - -
- -**7. [Metric, Formula, Interpretation]** - -⟶ - -
- -**8. Overall performance of model** - -⟶ - -
- -**9. How accurate the positive predictions are** - -⟶ - -
- -**10. Coverage of actual positive sample** - -⟶ - -
- -**11. Coverage of actual negative sample** - -⟶ - -
- -**12. Hybrid metric useful for unbalanced classes** - -⟶ - -

**13. ROC ― The receiver operating curve, also noted ROC, is the plot of TPR versus FPR by varying the threshold. These metrics are summed up in the table below:**

⟶

<br>
- -**14. [Metric, Formula, Equivalent]** - -⟶ - -

**15. AUC ― The area under the receiver operating curve, also noted AUC or AUROC, is the area below the ROC as shown in the following figure:**

⟶

<br>
- -**16. [Actual, Predicted]** - -⟶ - -
- -**17. Basic metrics ― Given a regression model f, the following metrics are commonly used to assess the performance of the model:** - -⟶ - -
- -**18. [Total sum of squares, Explained sum of squares, Residual sum of squares]** - -⟶ - -
- -**19. Coefficient of determination ― The coefficient of determination, often noted R2 or r2, provides a measure of how well the observed outcomes are replicated by the model and is defined as follows:** - -⟶ - -
- -**20. Main metrics ― The following metrics are commonly used to assess the performance of regression models, by taking into account the number of variables n that they take into consideration:** - -⟶ - -
- -**21. where L is the likelihood and ˆσ2 is an estimate of the variance associated with each response.** - -⟶ - -
- -**22. Model selection** - -⟶ - -
- -**23. Vocabulary ― When selecting a model, we distinguish 3 different parts of the data that we have as follows:** - -⟶ - -
- -**24. [Training set, Validation set, Testing set]** - -⟶ - -
- -**25. [Model is trained, Model is assessed, Model gives predictions]** - -⟶ - -
- -**26. [Usually 80% of the dataset, Usually 20% of the dataset]** - -⟶ - -
- -**27. [Also called hold-out or development set, Unseen data]** - -⟶ - -
- -**28. Once the model has been chosen, it is trained on the entire dataset and tested on the unseen test set. These are represented in the figure below:** - -⟶ - -
- -**29. Cross-validation ― Cross-validation, also noted CV, is a method that is used to select a model that does not rely too much on the initial training set. The different types are summed up in the table below:** - -⟶ - -
- -**30. [Training on k−1 folds and assessment on the remaining one, Training on n−p observations and assessment on the p remaining ones]** - -⟶ - -
- -**31. [Generally k=5 or 10, Case p=1 is called leave-one-out]** - -⟶ - -
- -**32. The most commonly used method is called k-fold cross-validation and splits the training data into k folds to validate the model on one fold while training the model on the k−1 other folds, all of this k times. The error is then averaged over the k folds and is named cross-validation error.** - -⟶ - -
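
A minimal k-fold cross-validation sketch, written for user-supplied `fit` and `error` functions (both hypothetical), here illustrated with least squares on random data:

```python
import numpy as np

def k_fold_cv_error(X, y, fit, error, k=5, seed=0):
    # Split the indices into k folds, train on k-1 folds, evaluate on the held-out one
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    errors = []
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        model = fit(X[train], y[train])
        errors.append(error(model, X[test], y[test]))
    return np.mean(errors)  # the cross-validation error

# Example with ordinary least squares as the model being validated
fit = lambda X, y: np.linalg.lstsq(X, y, rcond=None)[0]
error = lambda w, X, y: np.mean((X @ w - y) ** 2)
X, y = np.random.randn(100, 3), np.random.randn(100)
print(k_fold_cv_error(X, y, fit, error, k=5))
```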

**33. Regularization ― The regularization procedure aims at preventing the model from overfitting the data and thus deals with high variance issues. The following table sums up the different types of commonly used regularization techniques:**

⟶

<br>
- -**34. [Shrinks coefficients to 0, Good for variable selection, Makes coefficients smaller, Tradeoff between variable selection and small coefficients]** - -⟶ - -
- -**35. Diagnostics** - -⟶ - -
- -**36. Bias ― The bias of a model is the difference between the expected prediction and the correct model that we try to predict for given data points.** - -⟶ - -
- -**37. Variance ― The variance of a model is the variability of the model prediction for given data points.** - -⟶ - -
- -**38. Bias/variance tradeoff ― The simpler the model, the higher the bias, and the more complex the model, the higher the variance.** - -⟶ - -
- -**39. [Symptoms, Regression illustration, classification illustration, deep learning illustration, possible remedies]** - -⟶ - -
- -**40. [High training error, Training error close to test error, High bias, Training error slightly lower than test error, Very low training error, Training error much lower than test error, High variance]** - -⟶ - -
- -**41. [Complexify model, Add more features, Train longer, Perform regularization, Get more data]** - -⟶ - -
- -**42. Error analysis ― Error analysis is analyzing the root cause of the difference in performance between the current and the perfect models.** - -⟶ - -
- -**43. Ablative analysis ― Ablative analysis is analyzing the root cause of the difference in performance between the current and the baseline models.** - -⟶ - -
- -**44. Regression metrics** - -⟶ - -
- -**45. [Classification metrics, confusion matrix, accuracy, precision, recall, F1 score, ROC]** - -⟶ - -
- -**46. [Regression metrics, R squared, Mallow's CP, AIC, BIC]** - -⟶ - -
- -**47. [Model selection, cross-validation, regularization]** - -⟶ - -
- -**48. [Diagnostics, Bias/variance tradeoff, error/ablative analysis]** - -⟶ diff --git a/tr/cheatsheet-supervised-learning.md b/tr/cheatsheet-supervised-learning.md deleted file mode 100644 index a6b19ea1c..000000000 --- a/tr/cheatsheet-supervised-learning.md +++ /dev/null @@ -1,567 +0,0 @@ -**1. Supervised Learning cheatsheet** - -⟶ - -
- -**2. Introduction to Supervised Learning** - -⟶ - -
- -**3. Given a set of data points {x(1),...,x(m)} associated to a set of outcomes {y(1),...,y(m)}, we want to build a classifier that learns how to predict y from x.** - -⟶ - -
- -**4. Type of prediction ― The different types of predictive models are summed up in the table below:** - -⟶ - -
- -**5. [Regression, Classifier, Outcome, Examples]** - -⟶ - -
- -**6. [Continuous, Class, Linear regression, Logistic regression, SVM, Naive Bayes]** - -⟶ - -
- -**7. Type of model ― The different models are summed up in the table below:** - -⟶ - -
- -**8. [Discriminative model, Generative model, Goal, What's learned, Illustration, Examples]** - -⟶ - -
- -**9. [Directly estimate P(y|x), Estimate P(x|y) to then deduce P(y|x), Decision boundary, Probability distributions of the data, Regressions, SVMs, GDA, Naive Bayes]** - -⟶ - -
- -**10. Notations and general concepts** - -⟶ - -
- -**11. Hypothesis ― The hypothesis is noted hθ and is the model that we choose. For a given input data x(i) the model prediction output is hθ(x(i)).** - -⟶ - -
- -**12. Loss function ― A loss function is a function L:(z,y)∈R×Y⟼L(z,y)∈R that takes as inputs the predicted value z corresponding to the real data value y and outputs how different they are. The common loss functions are summed up in the table below:** - -⟶ - -
- -**13. [Least squared error, Logistic loss, Hinge loss, Cross-entropy]** - -⟶ - -
- -**14. [Linear regression, Logistic regression, SVM, Neural Network]** - -⟶ - -
- -**15. Cost function ― The cost function J is commonly used to assess the performance of a model, and is defined with the loss function L as follows:** - -⟶ - -
- -**16. Gradient descent ― By noting α∈R the learning rate, the update rule for gradient descent is expressed with the learning rate and the cost function J as follows:** - -⟶ - -
- -**17. Remark: Stochastic gradient descent (SGD) is updating the parameter based on each training example, and batch gradient descent is on a batch of training examples.** - -⟶ - -
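
A minimal batch gradient descent sketch on the mean squared error cost (the learning rate, iteration count and toy data are illustrative choices):

```python
import numpy as np

def gradient_descent(X, y, alpha=0.1, n_iters=200):
    # J(theta) = (1/2m) ||X theta - y||^2 ; update: theta <- theta - alpha * grad J(theta)
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(n_iters):
        grad = X.T @ (X @ theta - y) / m
        theta -= alpha * grad
    return theta

# Toy data generated from known weights [2, -1] plus a little noise
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
y = X @ np.array([2.0, -1.0]) + 0.01 * rng.normal(size=50)
print(gradient_descent(X, y))  # should be close to [2, -1]
```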
- -**18. Likelihood ― The likelihood of a model L(θ) given parameters θ is used to find the optimal parameters θ through maximizing the likelihood. In practice, we use the log-likelihood ℓ(θ)=log(L(θ)) which is easier to optimize. We have:** - -⟶ - -
- -**19. Newton's algorithm ― The Newton's algorithm is a numerical method that finds θ such that ℓ′(θ)=0. Its update rule is as follows:** - -⟶ - -
- -**20. Remark: the multidimensional generalization, also known as the Newton-Raphson method, has the following update rule:** - -⟶ - -
- -**21. Linear models** - -⟶ - -
- -**22. Linear regression** - -⟶ - -
- -**23. We assume here that y|x;θ∼N(μ,σ2)** - -⟶ - -

**24. Normal equations ― By noting X the design matrix, the value of θ that minimizes the cost function is a closed-form solution such that:**

⟶

<br>
- -**25. LMS algorithm ― By noting α the learning rate, the update rule of the Least Mean Squares (LMS) algorithm for a training set of m data points, which is also known as the Widrow-Hoff learning rule, is as follows:** - -⟶ - -
- -**26. Remark: the update rule is a particular case of the gradient ascent.** - -⟶ - -
- -**27. LWR ― Locally Weighted Regression, also known as LWR, is a variant of linear regression that weights each training example in its cost function by w(i)(x), which is defined with parameter τ∈R as:** - -⟶ - -
- -**28. Classification and logistic regression** - -⟶ - -
- -**29. Sigmoid function ― The sigmoid function g, also known as the logistic function, is defined as follows:** - -⟶ - -
- -**30. Logistic regression ― We assume here that y|x;θ∼Bernoulli(ϕ). We have the following form:** - -⟶ - -
- -**31. Remark: there is no closed form solution for the case of logistic regressions.** - -⟶ - -
- -**32. Softmax regression ― A softmax regression, also called a multiclass logistic regression, is used to generalize logistic regression when there are more than 2 outcome classes. By convention, we set θK=0, which makes the Bernoulli parameter ϕi of each class i equal to:** - -⟶ - -
- -**33. Generalized Linear Models** - -⟶ - -
- -**34. Exponential family ― A class of distributions is said to be in the exponential family if it can be written in terms of a natural parameter, also called the canonical parameter or link function, η, a sufficient statistic T(y) and a log-partition function a(η) as follows:** - -⟶ - -
- -**35. Remark: we will often have T(y)=y. Also, exp(−a(η)) can be seen as a normalization parameter that will make sure that the probabilities sum to one.** - -⟶ - -
- -**36. Here are the most common exponential distributions summed up in the following table:** - -⟶ - -
- -**37. [Distribution, Bernoulli, Gaussian, Poisson, Geometric]** - -⟶ - -

**38. Assumptions of GLMs ― Generalized Linear Models (GLM) aim at predicting a random variable y as a function of x∈Rn+1 and rely on the following 3 assumptions:**

⟶

<br>
- -**39. Remark: ordinary least squares and logistic regression are special cases of generalized linear models.** - -⟶ - -
- -**40. Support Vector Machines** - -⟶ - -
- -**41: The goal of support vector machines is to find the line that maximizes the minimum distance to the line.** - -⟶ - -
- -**42: Optimal margin classifier ― The optimal margin classifier h is such that:** - -⟶ - -
- -**43: where (w,b)∈Rn×R is the solution of the following optimization problem:** - -⟶ - -
- -**44. such that** - -⟶ - -
- -**45. support vectors** - -⟶ - -
- -**46. Remark: the line is defined as wTx−b=0.** - -⟶ - -
- -**47. Hinge loss ― The hinge loss is used in the setting of SVMs and is defined as follows:** - -⟶ - -
- -**48. Kernel ― Given a feature mapping ϕ, we define the kernel K to be defined as:** - -⟶ - -
- -**49. In practice, the kernel K defined by K(x,z)=exp(−||x−z||22σ2) is called the Gaussian kernel and is commonly used.** - -⟶ - -
- -**50. [Non-linear separability, Use of a kernel mapping, Decision boundary in the original space]** - -⟶ - -
- -**51. Remark: we say that we use the "kernel trick" to compute the cost function using the kernel because we actually don't need to know the explicit mapping ϕ, which is often very complicated. Instead, only the values K(x,z) are needed.** - -⟶ - -
- -**52. Lagrangian ― We define the Lagrangian L(w,b) as follows:** - -⟶ - -
- -**53. Remark: the coefficients βi are called the Lagrange multipliers.** - -⟶ - -
- -**54. Generative Learning** - -⟶ - -
- -**55. A generative model first tries to learn how the data is generated by estimating P(x|y), which we can then use to estimate P(y|x) by using Bayes' rule.** - -⟶ - -
- -**56. Gaussian Discriminant Analysis** - -⟶ - -
- -**57. Setting ― The Gaussian Discriminant Analysis assumes that y and x|y=0 and x|y=1 are such that:** - -⟶ - -
- -**58. Estimation ― The following table sums up the estimates that we find when maximizing the likelihood:** - -⟶ - -
- -**59. Naive Bayes** - -⟶ - -
- -**60. Assumption ― The Naive Bayes model supposes that the features of each data point are all independent:** - -⟶ - -
- -**61. Solutions ― Maximizing the log-likelihood gives the following solutions, with k∈{0,1},l∈[[1,L]]** - -⟶ - -
- -**62. Remark: Naive Bayes is widely used for text classification and spam detection.** - -⟶ - -
- -**63. Tree-based and ensemble methods** - -⟶ - -
- -**64. These methods can be used for both regression and classification problems.** - -⟶ - -

**65. CART ― Classification and Regression Trees (CART), commonly known as decision trees, can be represented as binary trees. They have the advantage of being very interpretable.**

⟶

<br>
- -**66. Random forest ― It is a tree-based technique that uses a high number of decision trees built out of randomly selected sets of features. Contrary to the simple decision tree, it is highly uninterpretable but its generally good performance makes it a popular algorithm.** - -⟶ - -
- -**67. Remark: random forests are a type of ensemble methods.** - -⟶ - -
- -**68. Boosting ― The idea of boosting methods is to combine several weak learners to form a stronger one. The main ones are summed up in the table below:** - -⟶ - -
- -**69. [Adaptive boosting, Gradient boosting]** - -⟶ - -
- -**70. High weights are put on errors to improve at the next boosting step** - -⟶ - -
- -**71. Weak learners trained on remaining errors** - -⟶ - -
- -**72. Other non-parametric approaches** - -⟶ - -
- -**73. k-nearest neighbors ― The k-nearest neighbors algorithm, commonly known as k-NN, is a non-parametric approach where the response of a data point is determined by the nature of its k neighbors from the training set. It can be used in both classification and regression settings.** - -⟶ - -
- -**74. Remark: The higher the parameter k, the higher the bias, and the lower the parameter k, the higher the variance.** - -⟶ - -
- -**75. Learning Theory** - -⟶ - -
- -**76. Union bound ― Let A1,...,Ak be k events. We have:** - -⟶ - -
- -**77. Hoeffding inequality ― Let Z1,..,Zm be m iid variables drawn from a Bernoulli distribution of parameter ϕ. Let ˆϕ be their sample mean and γ>0 fixed. We have:** - -⟶ - -
- -**78. Remark: this inequality is also known as the Chernoff bound.** - -⟶ - -
- -**79. Training error ― For a given classifier h, we define the training error ˆϵ(h), also known as the empirical risk or empirical error, to be as follows:** - -⟶ - -
- -**80. Probably Approximately Correct (PAC) ― PAC is a framework under which numerous results on learning theory were proved, and has the following set of assumptions: ** - -⟶ - -
- -**81: the training and testing sets follow the same distribution ** - -⟶ - -
- -**82. the training examples are drawn independently** - -⟶ - -
- -**83. Shattering ― Given a set S={x(1),...,x(d)}, and a set of classifiers H, we say that H shatters S if for any set of labels {y(1),...,y(d)}, we have:** - -⟶ - -
- -**84. Upper bound theorem ― Let H be a finite hypothesis class such that |H|=k and let δ and the sample size m be fixed. Then, with probability of at least 1−δ, we have:** - -⟶ - -
- -**85. VC dimension ― The Vapnik-Chervonenkis (VC) dimension of a given infinite hypothesis class H, noted VC(H) is the size of the largest set that is shattered by H.** - -⟶ - -
- -**86. Remark: the VC dimension of H={set of linear classifiers in 2 dimensions} is 3.** - -⟶ - -
- -**87. Theorem (Vapnik) ― Let H be given, with VC(H)=d and m the number of training examples. With probability at least 1−δ, we have:** - -⟶ - -
- -**88. [Introduction, Type of prediction, Type of model]** - -⟶ - -
- -**89. [Notations and general concepts, loss function, gradient descent, likelihood]** - -⟶ - -
- -**90. [Linear models, linear regression, logistic regression, generalized linear models]** - -⟶ - -
- -**91. [Support vector machines, Optimal margin classifier, Hinge loss, Kernel]** - -⟶ - -
- -**92. [Generative learning, Gaussian Discriminant Analysis, Naive Bayes]** - -⟶ - -
- -**93. [Trees and ensemble methods, CART, Random forest, Boosting]** - -⟶ - -
- -**94. [Other methods, k-NN]** - -⟶ - -
- -**95. [Learning theory, Hoeffding inequality, PAC, VC dimension]** - -⟶ diff --git a/tr/cheatsheet-unsupervised-learning.md b/tr/cheatsheet-unsupervised-learning.md deleted file mode 100644 index 5eae29ed8..000000000 --- a/tr/cheatsheet-unsupervised-learning.md +++ /dev/null @@ -1,340 +0,0 @@ -**1. Unsupervised Learning cheatsheet** - -⟶ - -
- -**2. Introduction to Unsupervised Learning** - -⟶ - -
- -**3. Motivation ― The goal of unsupervised learning is to find hidden patterns in unlabeled data {x(1),...,x(m)}.** - -⟶ - -
- -**4. Jensen's inequality ― Let f be a convex function and X a random variable. We have the following inequality:** - -⟶ - -
- -**5. Clustering** - -⟶ - -
- -**6. Expectation-Maximization** - -⟶ - -
- -**7. Latent variables ― Latent variables are hidden/unobserved variables that make estimation problems difficult, and are often denoted z. Here are the most common settings where there are latent variables:** - -⟶ - -
- -**8. [Setting, Latent variable z, Comments]** - -⟶ - -
- -**9. [Mixture of k Gaussians, Factor analysis]** - -⟶ - -

**10. Algorithm ― The Expectation-Maximization (EM) algorithm gives an efficient method for estimating the parameter θ through maximum likelihood estimation by repeatedly constructing a lower-bound on the likelihood (E-step) and optimizing that lower bound (M-step) as follows:**

⟶

<br>
- -**11. E-step: Evaluate the posterior probability Qi(z(i)) that each data point x(i) came from a particular cluster z(i) as follows:** - -⟶ - -
- -**12. M-step: Use the posterior probabilities Qi(z(i)) as cluster specific weights on data points x(i) to separately re-estimate each cluster model as follows:** - -⟶ - -
- -**13. [Gaussians initialization, Expectation step, Maximization step, Convergence]** - -⟶ - -
- -**14. k-means clustering** - -⟶ - -
- -**15. We note c(i) the cluster of data point i and μj the center of cluster j.** - -⟶ - -
- -**16. Algorithm ― After randomly initializing the cluster centroids μ1,μ2,...,μk∈Rn, the k-means algorithm repeats the following step until convergence:** - -⟶ - -
- -**17. [Means initialization, Cluster assignment, Means update, Convergence]** - -⟶ - -
- -**18. Distortion function ― In order to see if the algorithm converges, we look at the distortion function defined as follows:** - -⟶ - -
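
A minimal sketch of the two alternating steps and of the distortion function, assuming Euclidean distance and a random initialization from the data points:

```python
import numpy as np

def k_means(X, k, n_iters=100, seed=0):
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]  # random initialization
    for _ in range(n_iters):
        # Cluster assignment: each point goes to its closest centroid
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Means update: each centroid becomes the mean of its assigned points
        centroids = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
                              for j in range(k)])
    distortion = ((X - centroids[labels]) ** 2).sum()  # J(c, mu)
    return labels, centroids, distortion

# Two well-separated toy clusters
X = np.vstack([np.random.randn(50, 2), np.random.randn(50, 2) + 5])
labels, centroids, J = k_means(X, k=2)
print(centroids, J)
```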
- -**19. Hierarchical clustering** - -⟶ - -

**20. Algorithm ― It is a clustering algorithm with an agglomerative hierarchical approach that builds nested clusters in a successive manner.**

⟶

<br>

**21. Types ― There are different sorts of hierarchical clustering algorithms that aim at optimizing different objective functions, which are summed up in the table below:**

⟶

<br>
- -**22. [Ward linkage, Average linkage, Complete linkage]** - -⟶ - -
- -**23. [Minimize within cluster distance, Minimize average distance between cluster pairs, Minimize maximum distance of between cluster pairs]** - -⟶ - -
- -**24. Clustering assessment metrics** - -⟶ - -
- -**25. In an unsupervised learning setting, it is often hard to assess the performance of a model since we don't have the ground truth labels as was the case in the supervised learning setting.** - -⟶ - -
- -**26. Silhouette coefficient ― By noting a and b the mean distance between a sample and all other points in the same class, and between a sample and all other points in the next nearest cluster, the silhouette coefficient s for a single sample is defined as follows:** - -⟶ - -
- -**27. Calinski-Harabaz index ― By noting k the number of clusters, Bk and Wk the between and within-clustering dispersion matrices respectively defined as** - -⟶ - -
- -**28. the Calinski-Harabaz index s(k) indicates how well a clustering model defines its clusters, such that the higher the score, the more dense and well separated the clusters are. It is defined as follows:** - -⟶ - -
- -**29. Dimension reduction** - -⟶ - -
- -**30. Principal component analysis** - -⟶ - -
- -**31. It is a dimension reduction technique that finds the variance maximizing directions onto which to project the data.** - -⟶ - -
- -**32. Eigenvalue, eigenvector ― Given a matrix A∈Rn×n, λ is said to be an eigenvalue of A if there exists a vector z∈Rn∖{0}, called eigenvector, such that we have:** - -⟶ - -
- -**33. Spectral theorem ― Let A∈Rn×n. If A is symmetric, then A is diagonalizable by a real orthogonal matrix U∈Rn×n. By noting Λ=diag(λ1,...,λn), we have:** - -⟶ - -
- -**34. diagonal** - -⟶ - -
- -**35. Remark: the eigenvector associated with the largest eigenvalue is called principal eigenvector of matrix A.** - -⟶ - -
- -**36. Algorithm ― The Principal Component Analysis (PCA) procedure is a dimension reduction technique that projects the data on k -dimensions by maximizing the variance of the data as follows:** - -⟶ - -
- -**37. Step 1: Normalize the data to have a mean of 0 and standard deviation of 1.** - -⟶ - -
- -**38. Step 2: Compute Σ=1mm∑i=1x(i)x(i)T∈Rn×n, which is symmetric with real eigenvalues.** - -⟶ - -
- -**39. Step 3: Compute u1,...,uk∈Rn the k orthogonal principal eigenvectors of Σ, i.e. the orthogonal eigenvectors of the k largest eigenvalues.** - -⟶ - -
- -**40. Step 4: Project the data on spanR(u1,...,uk).** - -⟶ - -
- -**41. This procedure maximizes the variance among all k-dimensional spaces.** - -⟶ - -
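
A minimal NumPy sketch of the four steps listed above (assuming no feature has zero standard deviation):

```python
import numpy as np

def pca(X, k):
    # Step 1: normalize each feature to mean 0 and standard deviation 1
    Xn = (X - X.mean(axis=0)) / X.std(axis=0)
    # Step 2: Sigma = (1/m) * sum_i x(i) x(i)^T
    sigma = Xn.T @ Xn / len(Xn)
    # Step 3: orthogonal eigenvectors of the k largest eigenvalues (eigh handles symmetric matrices)
    eigvals, eigvecs = np.linalg.eigh(sigma)
    U = eigvecs[:, np.argsort(eigvals)[::-1][:k]]
    # Step 4: project the data on span(u1, ..., uk)
    return Xn @ U

X = np.random.randn(200, 5)
print(pca(X, k=2).shape)  # (200, 2)
```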
- -**42. [Data in feature space, Find principal components, Data in principal components space]** - -⟶ - -
- -**43. Independent component analysis** - -⟶ - -
- -**44. It is a technique meant to find the underlying generating sources.** - -⟶ - -
- -**45. Assumptions ― We assume that our data x has been generated by the n-dimensional source vector s=(s1,...,sn), where si are independent random variables, via a mixing and non-singular matrix A as follows:** - -⟶ - -
- -**46. The goal is to find the unmixing matrix W=A−1.** - -⟶ - -
- -**47. Bell and Sejnowski ICA algorithm ― This algorithm finds the unmixing matrix W by following the steps below:** - -⟶ - -
- -**48. Write the probability of x=As=W−1s as:** - -⟶ - -
- -**49. Write the log likelihood given our training data {x(i),i∈[[1,m]]} and by noting g the sigmoid function as:** - -⟶ - -
- -**50. Therefore, the stochastic gradient ascent learning rule is such that for each training example x(i), we update W as follows:** - -⟶ - -
- -**51. The Machine Learning cheatsheets are now available in Turkish.** - -⟶ - -
- -**52. Original authors** - -⟶ - -
- -**53. Translated by X, Y and Z** - -⟶ - -
- -**54. Reviewed by X, Y and Z** - -⟶ - -
- -**55. [Introduction, Motivation, Jensen's inequality]** - -⟶ - -
- -**56. [Clustering, Expectation-Maximization, k-means, Hierarchical clustering, Metrics]** - -⟶ - -
- -**57. [Dimension reduction, PCA, ICA]** - -⟶ diff --git a/tr/cs-221-logic-models.md b/tr/cs-221-logic-models.md new file mode 100644 index 000000000..23476dd86 --- /dev/null +++ b/tr/cs-221-logic-models.md @@ -0,0 +1,462 @@ +**Logic-based models translation** [[webpage]](https://stanford.edu/~shervine/teaching/cs-221/cheatsheet-logic-models) + +
+ +**1. Logic-based models with propositional and first-order logic** + +⟶ Önermeli ve birinci dereceden mantık (Lojik) temelli modeller + +
+ + +**2. Basics** + +⟶ Temeller + +
+ + +**3. Syntax of propositional logic ― By noting f,g formulas, and ¬,∧,∨,→,↔ connectives, we can write the following logical expressions:** + +⟶ Önerme mantığının sözdizimi ― f, g formülleri ve ¬,∧,∨,→,↔ bağlayıcılarını belirterek, aşağıdaki mantıksal ifadeleri yazabiliriz: + +
+ + +**4. [Name, Symbol, Meaning, Illustration]** + +⟶ [Ad, Sembol, Anlamı, Gösterim] + +
+ + +**5. [Affirmation, Negation, Conjunction, Disjunction, Implication, Biconditional]** + +⟶ [Doğrulama, Dışlayan, Kesişim, Birleşim, Implication, İki koşullu] + +
+ + +**6. [not f, f and g, f or g, if f then g, f, that is to say g]** + +⟶ [f değil, f ve g, f veya g, eğer f'den g çıkarsa, f, f ve g'nin ortak olduğu bölge] + +
+ + +**7. Remark: formulas can be built up recursively out of these connectives.** + +⟶ Not: Bu bağlantılar dışında tekrarlayan formüller oluşturulabilir. + +
+ + +**8. Model ― A model w denotes an assignment of binary weights to propositional symbols.** + +⟶ Model - w modeli, ikili sembollerin önermeli sembollere atanmasını belirtir. + +
+ + +**9. Example: the set of truth values w={A:0,B:1,C:0} is one possible model to the propositional symbols A, B and C.** + +⟶ Örnek: w = {A: 0, B: 1, C: 0} doğruluk değerleri kümesi, A, B ve C önermeli semboller için olası bir modeldir. + +
+ + +**10. Interpretation function ― The interpretation function I(f,w) outputs whether model w satisfies formula f:** + +⟶ Yorumlama fonksiyonu ― Yorumlama fonksiyonu I(f,w), w modelinin f formülüne uygun olup olmadığını gösterir: + +
+ + +**11. Set of models ― M(f) denotes the set of models w that satisfy formula f. Mathematically speaking, we define it as follows:** + +⟶ Modellerin seti ― M(f), f formülünü sağlayan model setini belirtir. Matematiksel konuşursak, şöyle tanımlarız: + +
+ + +**12. Knowledge base** + +⟶ Bilgi temelli + +
+ + +**13. Definition ― The knowledge base KB is the conjunction of all formulas that have been considered so far. The set of models of the knowledge base is the intersection of the set of models that satisfy each formula. In other words:** + +⟶ Tanım ― Bilgi temeli (KB-Knowledgde Base), şu ana kadar düşünülen tüm formüllerin birleşimidir. Bilgi temelinin model kümesi, her formülü karşılayan model dizisinin kesişimidir. Diğer bir deyişle: + +
+ + +**14. Probabilistic interpretation ― The probability that query f is evaluated to 1 can be seen as the proportion of models w of the knowledge base KB that satisfy f, i.e.:** + +⟶ Olasılıksal yorumlama ― f sorgusunun 1 olarak değerlendirilmesi olasılığı, f'yi sağlayan bilgi temeli KB'nin w modellerinin oranı olarak görülebilir, yani: + +
+ + +**15. Satisfiability ― The knowledge base KB is said to be satisfiable if at least one model w satisfies all its constraints. In other words:** + +⟶ Gerçeklenebilirlik ― En az bir modelin tüm kısıtlamaları yerine getirmesi durumunda KB'nin bilgi temelinin gerçeklenebilir olduğu söylenir. Diğer bir deyişle: + +
+ + +**16. satisfiable** + +⟶ Karşılanabilirlik + +
+ + +**17. Remark: M(KB) denotes the set of models compatible with all the constraints of the knowledge base.** + +⟶ Not: M(KB), bilgi temelinin tüm kısıtları ile uyumlu model kümesini belirtir. + +
+ + +**18. Relation between formulas and knowledge base - We define the following properties between the knowledge base KB and a new formula f:** + +⟶ Formüller ve bilgi temeli arasındaki ilişki - Bilgi temeli KB ile yeni bir formül f arasında aşağıdaki özellikleri tanımlarız: + +
+ + +**19. [Name, Mathematical formulation, Illustration, Notes]** + +⟶ [Adı, Matematiksel formülü, Gösterim, Notlar] + +
+ + +**20. [KB entails f, KB contradicts f, f contingent to KB]** + +⟶ [KB f içerir, KB f içermez, f koşullu KB] + +
+ + +**21. [f does not bring any new information, Also written KB⊨f, No model satisfies the constraints after adding f, Equivalent to KB⊨¬f, f does not contradict KB, f adds a non-trivial amount of information to KB]** + +⟶ [f yeni bir bilgi getirmiyor, Ayrıca KB⊨f yazıyor, Hiçbir model f ekledikten sonra kısıtlamaları yerine getirmiyor, f KB'ye eşdeğer, f KB'ye aykırı değil, f KB'ye önemsiz miktarda bilgi ekliyor] + +
+ + +**22. Model checking ― A model checking algorithm takes as input a knowledge base KB and outputs whether it is satisfiable or not.** + +⟶ Model denetimi - Bir model denetimi algoritması, KB'nin bilgi temelini girdi olarak alır ve bunun gerçeklenebilir/karşılanabilir olup olmadığını çıkarır. + +
+ + +**23. Remark: popular model checking algorithms include DPLL and WalkSat.** + +⟶ Not: popüler model kontrol algoritmaları DPLL ve WalkSat'ı içerir. + +
+ + +**24. Inference rule ― An inference rule of premises f1,...,fk and conclusion g is written:** + +⟶ Çıkarım kuralı - f1, ..., fk ve sonuç g yapısının çıkarım kuralı şöyle yazılmıştır: + +
+ + +**25. Forward inference algorithm ― From a set of inference rules Rules, this algorithm goes through all possible f1,...,fk and adds g to the knowledge base KB if a matching rule exists. This process is repeated until no more additions can be made to KB.** + +⟶ İleri çıkarım algoritması - Çıkarım kurallarından Kurallar, bu algoritma mümkün olan tüm f1, ..., fk'den geçer ve eşleşen bir kural varsa, KB bilgi tabanına g ekler. Bu işlem KB'ye daha fazla ekleme yapılamayana kadar tekrar edilir. + +
+ + +**26. Derivation ― We say that KB derives f (written KB⊢f) with rules Rules if f already is in KB or gets added during the forward inference algorithm using the set of rules Rules.** + +⟶ Türetme - f'nin KB içerisindeyse veya kurallar kurallarını kullanarak ileri çıkarım algoritması sırasında eklenmişse, KB'nin kurallar ile f (KB⊢f yazılır) türettiğini söylüyoruz. + +
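
A minimal sketch of forward inference with modus ponens on definite clauses; the facts and rules below are hypothetical toy symbols, not from the cheatsheet:

```python
def forward_inference(facts, rules):
    # Repeatedly apply modus ponens: if all premises p1,...,pk are derived, add the conclusion q
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if set(premises) <= derived and conclusion not in derived:
                derived.add(conclusion)
                changed = True
    return derived

facts = {"Rain"}
rules = [(("Rain",), "Wet"), (("Wet",), "Slippery")]
print(forward_inference(facts, rules))  # {'Rain', 'Wet', 'Slippery'}
```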
+ + +**27. Properties of inference rules ― A set of inference rules Rules can have the following properties:** + +⟶ Çıkarım kurallarının özellikleri - Çıkarım kurallarının kümesi Kurallar aşağıdaki özelliklere sahip olabilir: + +
+ + +**28. [Name, Mathematical formulation, Notes]** + +⟶ [Adı, Matematiksel formülü, Notlar] + +
+ + +**29. [Soundness, Completeness]** + +⟶ [Sağlamlık, Tamlık] + +
+ + +**30. [Inferred formulas are entailed by KB, Can be checked one rule at a time, "Nothing but the truth", Formulas entailing KB are either already in the knowledge base or inferred from it, "The whole truth"]** + +⟶ [Çıkarılan formüller KB tarafından sağlanmıştır, Her defasında bir kural kontrol edilebilir, ya KB'yi içeren Formüller ya bilgi tabanında zaten vardır "Gerçeğinden başka bir şey yok", ya da ondan çıkarılan "Tüm gerçek" değerlerdir] + +
+ + +**31. Propositional logic** + +⟶ Önerme mantığı + +
+ + +**32. In this section, we will go through logic-based models that use logical formulas and inference rules. The idea here is to balance expressivity and computational efficiency.** + +⟶ Bu bölümde, mantıksal formülleri ve çıkarım kurallarını kullanan mantık tabanlı modelleri inceleyeceğiz. Buradaki fikir ifade ve hesaplamanın verimliliğini dengelemektir. + +
+ + +**33. Horn clause ― By noting p1,...,pk and q propositional symbols, a Horn clause has the form:** + +⟶ Horn cümlesi ― p1, ..., pk ve q önerme sembollerini not ederek, bir Horn cümlesi şu şekildedir (Matematiksel mantık ve mantık programlamada, kural gibi özel bir biçime sahip mantıksal formüllere Horn cümlesi denir.): + +
+ + +**34. Remark: when q=false, it is called a "goal clause", otherwise we denote it as a "definite clause".** + +⟶ Not: q = false olduğunda, "hedeflenen bir cümle" olarak adlandırılır, aksi takdirde "kesin bir cümle" olarak adlandırırız + +
+ + +**35. Modus ponens ― For propositional symbols f1,...,fk and p, the modus ponens rule is written:** + +⟶ Modus ponens - f1, ..., fk ve p önermeli semboller için modus ponens kuralı yazılır (Modus ponens: Önerme mantığında, modus ponens bir çıkarım kuralıdır. "P, Q anlamına gelir ve P'nin doğru olduğu iddia edilir, bu yüzden Q doğru olmalı" şeklinde özetlenebilir. Modus ponens, başka bir geçerli argüman biçimi olan modus tollens ile yakından ilgilidir.): + +
+ + +**36. Remark: it takes linear time to apply this rule, as each application generate a clause that contains a single propositional symbol.** + +⟶ Not: Her uygulama tek bir önermeli sembol içeren bir cümle oluşturduğundan, bu kuralın uygulanması doğrusal bir zaman alır. + +
+ + +**37. Completeness ― Modus ponens is complete with respect to Horn clauses if we suppose that KB contains only Horn clauses and p is an entailed propositional symbol. Applying modus ponens will then derive p.** + +⟶ Tamlık ― KB'nin sadece Horn cümleleri içerdiğini ve p'nin zorunlu bir teklif sembolü olduğunu varsayalım, Hornus cümlelerine göre Modus ponenleri tamamlanmıştır. Modus ponens uygulanması daha sonra p'yi türetir. + +
+ + +**38. Conjunctive normal form ― A conjunctive normal form (CNF) formula is a conjunction of clauses, where each clause is a disjunction of atomic formulas.** + +⟶ Konjunktif (Birleştirici) normal form - Bir konjonktif normal form (CNF) formülü, her bir cümlenin atomik formüllerin bir ayrıntısı olduğu cümle birleşimidir. + +
+ + +**39. Remark: in other words, CNFs are ∧ of ∨.** + +⟶ Açıklama: başka bir deyişle, CNF'ler ∨ ait ∧ bulunmaktadır. + +
+ + +**40. Equivalent representation ― Every formula in propositional logic can be written into an equivalent CNF formula. The table below presents general conversion properties:** + +⟶ Eşdeğer temsil - Önerme mantığındaki her formül eşdeğer bir CNF formülüne yazılabilir. Aşağıdaki tabloda genel dönüşüm özellikleri gösterilmektedir: + +
+ + +**41. [Rule name, Initial, Converted, Eliminate, Distribute, over]** + +⟶ [Kural adı, Başlangıç, Dönüştürülmüş, Eleme, Dağıtma, üzerine] + +
+ + +**42. Resolution rule ― For propositional symbols f1,...,fn, and g1,...,gm as well as p, the resolution rule is written:** + +⟶ Çözünürlük kuralı - f1, ..., fn ve g1, ..., gm önerme sembolleri için, p, çözümleme kuralı yazılır: + +
+ + +**43. Remark: it can take exponential time to apply this rule, as each application generates a clause that has a subset of the propositional symbols.** + +⟶ Not: Her uygulama, teklif sembollerinin alt kümesine sahip bir cümle oluşturduğundan, bu kuralı uygulamak için üssel olarak zaman alabilir. + +
+ + +**44. [Resolution-based inference ― The resolution-based inference algorithm follows the following steps:, Step 1: Convert all formulas into CNF, Step 2: Repeatedly apply resolution rule, Step 3: Return unsatisfiable if and only if False, is derived]** + +⟶ [Çözünürlük tabanlı çıkarım - Çözünürlük tabanlı çıkarım algoritması, aşağıdaki adımları izler :, Adım 1: Tüm formülleri CNF'ye dönüştürün, Adım 2: Tekrar tekrar, çözünürlük kuralını uygulayın, Adım 3: Yanlışsa türetilmişse tatmin edici olmayan dönüş yapın] + +
+ + +**45. First-order logic** + +⟶ Birinci dereceden mantık + +
+ + +**46. The idea here is to use variables to yield more compact knowledge representations.** + +⟶ Buradaki fikir, daha kompakt bilgi sunumları sağlamak için değişkenleri kullanmaktır. + +
+ + +**47. [Model ― A model w in first-order logic maps:, constant symbols to objects, predicate symbols to tuple of objects]** + +⟶ [Model ― Birinci mertebeden mantık haritalarında bir w modeli :, nesnelere sabit semboller, nesnelerin dizisini sembolize etmek için tahmin] + +
+ + +**48. Horn clause ― By noting x1,...,xn variables and a1,...,ak,b atomic formulas, the first-order logic version of a horn clause has the form:** + +⟶ Horn cümlesi - x1, ..., xn değişkenleri ve a1, ..., ak, b atomik formüllerine dikkat çekerek, bir boynuz maddesinin birinci derece mantık versiyonu aşağıdaki şekildedir: + +
+ + +**49. Substitution ― A substitution θ maps variables to terms and Subst[θ,f] denotes the result of substitution θ on f.** + +⟶ Yer değiştirme - Bir yerdeğiştirme değişkenleri terimlerle eşler ve Subst[θ,f] yerdeğiştirme sonucunu f olarak belirtir. + +
+ + +**50. Unification ― Unification takes two formulas f and g and returns the most general substitution θ that makes them equal:** + +⟶ Birleştirme ― Birleştirme f ve g'nin iki formülünü alır ve onları eşit yapan en genel ikameyi θ verir: + +
+ + +**51. such that** + +⟶ öyle ki + +
+ + +**52. Note: Unify[f,g] returns Fail if no such θ exists.** + +⟶ Not: Unify[f,g], eğer böyle bir θ yoksa Fail döndürür. + +
+ + +**53. Modus ponens ― By noting x1,...,xn variables, a1,...,ak and a′1,...,a′k atomic formulas and by calling θ=Unify(a′1∧...∧a′k,a1∧...∧ak) the first-order logic version of modus ponens can be written:** + +⟶ Modus ponens ― x1, ..., xn değişkenleri, a1, ..., ak ve a′1, ..., a′k atomik formüllerine dikkat ederek ve θ=Unify(a′1∧...∧a′k,a1∧...∧ak) modus ponenlerin birinci dereceden mantık versiyonu yazılabilir: + +
+ + +**54. Completeness ― Modus ponens is complete for first-order logic with only Horn clauses.** + +⟶ Tamlık - Modus ponens sadece Horn cümleleriyle birinci dereceden mantık için tamamlanmıştır. + +
+ + +**55. Resolution rule ― By noting f1,...,fn, g1,...,gm, p, q formulas and by calling θ=Unify(p,q), the first-order logic version of the resolution rule can be written:** + +⟶ Çözünürlük kuralı ― f1,...,fn,g1,...,gm, p, q formüllerini not ederek ve θ=Unify(p,q) ifadesini kullanarak, çözümleme kuralının birinci dereceden mantık sürümü yazılabilir. : + +
+ + +**56. [Semi-decidability ― First-order logic, even restricted to only Horn clauses, is semi-decidable., if KB⊨f, forward inference on complete inference rules will prove f in finite time, if KB⊭f, no algorithm can show this in finite time]** + +⟶ Yarı-karar verilebilirlik ― Birinci dereceden mantık, sadece Horn cümleleriyle sınırlı olsa bile, yarı karar verilebilir eğer KB⊨f ise f sonsuz zamanlıdır. KB⊭f ise sonsuz zamanlı olabilirliği gösteren algoritma yoktur. + +
+ + +**57. [Basics, Notations, Model, Interpretation function, Set of models]** + +⟶ [Temeller, Notasyon, Model, Yorumlama fonksiyonu, Modellerin kümesi] + +
+ + +**58. [Knowledge base, Definition, Probabilistic interpretation, Satisfiability, Relationship with formulas, Forward inference, Rule properties]** + +⟶ [Bilgi temeli, Tanım, Olasılıksal yorumlama, Gerçeklenebilirlik, Formüllerle İlişki, İleri çıkarım, Kural özellikleri] + +
+ + +**59. [Propositional logic, Clauses, Modus ponens, Conjunctive normal form, Representation equivalence, Resolution]** + +⟶ [Önerme mantığı, Cümleler, Modus ponens, Eşlenik (Conjunctive) normal form, Temsil eşdeğeri, Çözüm] + +
+ + +**60. [First-order logic, Substitution, Unification, Resolution rule, Modus ponens, Resolution, Semi-decidability]** + +⟶ [Birinci derece mantık, Değiştirme, Birleştirme, Çözünürlük kuralı, Modus ponens, Çözünürlük, Yarı-karar verilebilirlik] + +
+ + +**61. View PDF version on GitHub** + +⟶ GitHub'da PDF sürümünü görüntüleyin + +
+ + +**62. Original authors** + +⟶ Orijinal yazarlar + +
+ + +**63. Translated by X, Y and Z** + +⟶ X, Y ve Z tarafından çevrilmiştir + +
+ + +**64. Reviewed by X, Y and Z** + +⟶ X, Y ve Z tarafından gözden geçirilmiştir + +
+ + +**65. By X and Y** + +⟶ X ve Y ile + +
+ + +**66. The Artificial Intelligence cheatsheets are now available in [target language].** + +⟶ Yapay Zeka el kitabı şimdi [Türkçe] mevcuttur. diff --git a/tr/cs-221-reflex-models.md b/tr/cs-221-reflex-models.md new file mode 100644 index 000000000..e1aea4a79 --- /dev/null +++ b/tr/cs-221-reflex-models.md @@ -0,0 +1,538 @@ +**Reflex-based models translation** [[webpage]](https://stanford.edu/~shervine/teaching/cs-221/cheatsheet-reflex-models) + +
+ +**1. Reflex-based models with Machine Learning** + +⟶ Makine Öğrenmesi ile Refleks-temelli modeller + +
+ + +**2. Linear predictors** + +⟶ Doğrusal öngörücüler + +
+ + +**3. In this section, we will go through reflex-based models that can improve with experience, by going through samples that have input-output pairs.** + +⟶ Bu bölümde, girdi-çıktı çiftleri olan örneklerden geçerek, deneyim ile gelişebilecek refleks-temelli modelleri göreceğiz. + +
+ + +**4. Feature vector ― The feature vector of an input x is noted ϕ(x) and is such that:** + +⟶ Öznitelik vektörü ― x girişinin öznitelik vektörü ϕ (x) olarak not edilir ve şöyledir: + +
+ + +**5. Score ― The score s(x,w) of an example (ϕ(x),y)∈Rd×R associated to a linear model of weights w∈Rd is given by the inner product:** + +⟶ Puan - Bir örneğin s(x, w)si ni (ϕ(x),y))∈Rd×R, w∈Rd doğrusal ağırlık modeline bağlı olarak: + +
+ + +**6. Classification** + +⟶ Sınıflandırma + +
+ + +**7. Linear classifier ― Given a weight vector w∈Rd and a feature vector ϕ(x)∈Rd, the binary linear classifier fw is given by:** + +⟶ Doğrusal sınıflandırıcı - Bir ağırlık vektörü w∈Rd ve bir öznitelik vektörü ϕ(x)∈Rd verildiğinde, ikili doğrusal sınıflandırıcı fw şöyle verilir: + +

**8. if**

⟶ Eğer

<br>


**9. Margin ― The margin m(x,y,w)∈R of an example (ϕ(x),y)∈Rd×{−1,+1} associated to a linear model of weights w∈Rd quantifies the confidence of the prediction: larger values are better. It is given by:**

⟶ Marj ― (ϕ(x),y)∈Rd×{−1,+1} örneğinin m(x,y,w)∈R marjları w∈Rd doğrusal ağırlık modeliyle ilişkili olarak, tahminin güvenirliği ölçülür: daha büyük değerler daha iyidir. Şöyle ifade edilir:

<br>
+ + +**10. Regression** + +⟶ Bağlanım (Regression) + +
+ + +**11. Linear regression ― Given a weight vector w∈Rd and a feature vector ϕ(x)∈Rd, the output of a linear regression of weights w denoted as fw is given by:** + +⟶ Doğrusal bağlanım (Linear regression) - w∈Rd bir ağırlık vektörü ve bir öznitelik vektörü ϕ(x)∈Rd verildiğinde, fw olarak belirtilen ağırlıkların doğrusal bir bağlanım" çıktısı şöyle verilir: + +
+ + +**12. Residual ― The residual res(x,y,w)∈R is defined as being the amount by which the prediction fw(x) overshoots the target y:** + +⟶ Artık (Residual) - Artık res(x,y,w)∈R, fw(x) tahmininin y hedefini aştığı miktar olarak tanımlanır: + +
+ + +**13. Loss minimization** + +⟶ Kayıp/Yitim minimizasyonu + +
+ + +**14. Loss function ― A loss function Loss(x,y,w) quantifies how unhappy we are with the weights w of the model in the prediction task of output y from input x. It is a quantity we want to minimize during the training process.** + +⟶ Kayıp fonksiyonu - Kayıp fonksiyonu Loss(x,y,w), x girişinden y çıktısının öngörme görevindeki model ağırlıkları ile ne kadar mutsuz olduğumuzu belirler. Bu değer eğitim sürecinde en aza indirmek istediğimiz bir miktar. + +
+ + +**15. Classification case - The classification of a sample x of true label y∈{−1,+1} with a linear model of weights w can be done with the predictor fw(x)≜sign(s(x,w)). In this situation, a metric of interest quantifying the quality of the classification is given by the margin m(x,y,w), and can be used with the following loss functions:** + +⟶ Sınıflandırma durumu - Doğru etiket y∈{−1,+1} değerinin x örneğinin doğrusal ağırlık w modeliyle sınıflandırılması fw(x)≜sign(s(x,w)) belirleyicisi ile yapılabilir. Bu durumda, sınıflandırma kalitesini ölçen bir fayda ölçütü m(x,y,w) marjı ile verilir ve aşağıdaki kayıp fonksiyonlarıyla birlikte kullanılabilir: + +
+ + +**16. [Name, Illustration, Zero-one loss, Hinge loss, Logistic loss]** + +⟶ [Ad, Örnekleme, Sıfır-bir kayıp, Menteşe kaybı, Lojistik kaybı] + +
+ + +**17. Regression case - The prediction of a sample x of true label y∈R with a linear model of weights w can be done with the predictor fw(x)≜s(x,w). In this situation, a metric of interest quantifying the quality of the regression is given by the margin res(x,y,w) and can be used with the following loss functions:** + +⟶ Regresyon durumu - Doğru etiket y∈R değerinin x örneğinin bir doğrusal ağırlık modeli w ile öngörülmesi fw(x)≜s(x,w) öngörüsü ile yapılabilir. Bu durumda, regresyonun kalitesini ölçen bir fayda ölçütü res(x,y,w) marjı ile verilir ve aşağıdaki kayıp fonksiyonlarıyla birlikte kullanılabilir: + +
+ + +**18. [Name, Squared loss, Absolute deviation loss, Illustration]** + +⟶ [Ad, Kareler kaybı, Mutlak sapma kaybı, Görselleştirme] + +
+ + +**19. Loss minimization framework ― In order to train a model, we want to minimize the training loss is defined as follows:** + +⟶ Kayıp minimize etme çerçevesi (framework) - Bir modeli eğitmek için, eğitim kaybını en aza indirmek istiyoruz; eğitim kaybı aşağıdaki gibi tanımlanır: + +
<br>
+ + +**20. Non-linear predictors** + +⟶ Doğrusal olmayan öngörücüler + +
+ + +**21. k-nearest neighbors ― The k-nearest neighbors algorithm, commonly known as k-NN, is a non-parametric approach where the response of a data point is determined by the nature of its k neighbors from the training set. It can be used in both classification and regression settings.** + +⟶ k-en yakın komşu - Yaygın olarak k-NN olarak bilinen k-en yakın komşu algoritması, bir veri noktasının tepkisinin eğitim kümesinden k komşularının yapısı tarafından belirlendiği parametrik olmayan bir yaklaşımdır. Hem sınıflandırma hem de regresyon ayarlarında kullanılabilir. + +
+ + +**22. Remark: the higher the parameter k, the higher the bias, and the lower the parameter k, the higher the variance.** + +⟶ Not: k parametresi ne kadar yüksekse, önyargı (bias) o kadar yüksek ve k parametresi ne kadar düşükse, varyans o kadar yüksek olur. + +
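The following is an illustrative sketch (not part of the original cheatsheet) of the k-NN idea described above, assuming plain Euclidean distance and a majority vote; the function and variable names are made up for the example.

```python
import numpy as np

def knn_predict(X_train, y_train, x, k=3):
    """Classify point x by majority vote among its k nearest training points."""
    dists = np.linalg.norm(X_train - x, axis=1)       # Euclidean distance to every training point
    nearest = np.argsort(dists)[:k]                   # indices of the k closest points
    labels, counts = np.unique(y_train[nearest], return_counts=True)
    return labels[np.argmax(counts)]                  # majority label

X_train = np.array([[0.0, 0.0], [0.1, 0.2], [1.0, 1.1], [0.9, 1.0]])
y_train = np.array([0, 0, 1, 1])
print(knn_predict(X_train, y_train, np.array([0.95, 1.05]), k=3))  # -> 1
```

<br>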
+ + +**23. Neural networks ― Neural networks are a class of models that are built with layers. Commonly used types of neural networks include convolutional and recurrent neural networks. The vocabulary around neural networks architectures is described in the figure below:** + +⟶ Yapay sinir ağları - Yapay sinir ağları katmanlarla oluşturulmuş bir model sınıfıdır. Yaygın olarak kullanılan sinir ağları, evrişimli ve tekrarlayan sinir ağlarını içerir. Yapay sinir ağları mimarisi etrafındaki kelime bilgisi aşağıdaki şekilde tanımlanmıştır: + +
+ + +**24. [Input layer, Hidden layer, Output layer]** + +⟶ [Giriş katmanı, Gizli katman, Çıkış katmanı] + +
+ + +**25. By noting i the ith layer of the network and j the jth hidden unit of the layer, we have:** + +⟶ i, ağın i. katmanı ve j, katmanın j. gizli birimi olacak şekilde aşağıdaki gibi ifade edilir: + +
+ + +**26. where we note w, b, x, z the weight, bias, input and non-activated output of the neuron respectively.** + +⟶ w, b, x, z değerleri sırasıyla nöronun ağırlığını, önyargısını (bias), girdisini ve aktive edilmemiş çıkışını ifade eder. + +
<br>
+ + +**27. For a more detailed overview of the concepts above, check out the Supervised Learning cheatsheets!** + +⟶ Yukarıdaki kavramlara daha ayrıntılı bir bakış için, Gözetimli Öğrenme el kitabına göz atın! + +
+ + +**28. Stochastic gradient descent** + +⟶ Stokastik gradyan inişi (Bayır inişi) + +
+ + +**29. Gradient descent ― By noting η∈R the learning rate (also called step size), the update rule for gradient descent is expressed with the learning rate and the loss function Loss(x,y,w) as follows:** + +⟶ Gradyan inişi (Bayır inişi) - η∈R öğrenme oranını (aynı zamanda adım boyutu olarak da bilinir) dikkate alınarak, gradyan inişine ilişkin güncelleme kuralı, öğrenme oranı ve Loss(x,y,w) kayıp fonksiyonu ile aşağıdaki şekilde ifade edilir: + +
+ + +**30. Stochastic updates ― Stochastic gradient descent (SGD) updates the parameters of the model one training example (ϕ(x),y)∈Dtrain at a time. This method leads to sometimes noisy, but fast updates.** + +⟶ Stokastik güncellemeler - Stokastik gradyan inişi (SGİ / SGD), modelin parametrelerini her seferinde bir eğitim örneği (ϕ(x),y)∈Değitim kullanarak günceller. Bu yöntem bazen gürültülü, ancak hızlı güncellemelere yol açar. + +
<br>
+ + +**31. Batch updates ― Batch gradient descent (BGD) updates the parameters of the model one batch of examples (e.g. the entire training set) at a time. This method computes stable update directions, at a greater computational cost.** + +⟶ Yığın/küme güncellemeler - Yığın gradyan inişi (YGİ / BGD), bir seferde bir grup örnek (örneğin, tüm eğitim kümesi) parametrelerini günceller. Bu yöntem daha yüksek bir hesaplama maliyetiyle kararlı güncelleme talimatlarını hesaplar. + +
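As a rough illustration of the two update schemes above (not from the original cheatsheet), the sketch below applies them to the squared loss Loss(x,y,w)=(w⋅ϕ(x)−y)²; the learning rate, epoch count and toy data are arbitrary choices made for the example.

```python
import numpy as np

def grad_squared_loss(w, phi_x, y):
    """Gradient of (w . phi(x) - y)^2 with respect to w."""
    return 2 * (np.dot(w, phi_x) - y) * phi_x

def sgd(data, eta=0.1, epochs=200):
    """Stochastic updates: one training example (phi(x), y) at a time."""
    w = np.zeros(len(data[0][0]))
    for _ in range(epochs):
        for phi_x, y in data:
            w -= eta * grad_squared_loss(w, phi_x, y)
    return w

def batch_gd(data, eta=0.1, epochs=200):
    """Batch updates: one averaged gradient over the whole training set per step."""
    w = np.zeros(len(data[0][0]))
    for _ in range(epochs):
        w -= eta * np.mean([grad_squared_loss(w, phi_x, y) for phi_x, y in data], axis=0)
    return w

data = [(np.array([1.0, x]), 2 * x + 1) for x in np.linspace(0, 1, 20)]
print(sgd(data), batch_gd(data))   # both approach the true weights [1, 2]
```

<br>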
+ + +**32. Fine-tuning models** + +⟶ İnce ayar (Fine-tuning) modelleri + +
+ + +**33. Hypothesis class ― A hypothesis class F is the set of possible predictors with a fixed ϕ(x) and varying w:** + +⟶ Hipotez sınıfı - Bir hipotez sınıfı F, sabit bir ϕ (x) ve değişken w ile olası öngörücü kümesidir: + +
+ + +**34. Logistic function ― The logistic function σ, also called the sigmoid function, is defined as:** + +⟶ Lojistik fonksiyon - Ayrıca sigmoid fonksiyon olarak da adlandırılan lojistik fonksiyon σ, şöyle tanımlanır: + +
+ + +**35. Remark: we have σ′(z)=σ(z)(1−σ(z)).** + +⟶ Not: σ′(z)=σ(z)(1−σ(z)) şeklinde ifade edilir. + +
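A quick numerical check of the identity σ′(z)=σ(z)(1−σ(z)) stated in the remark above (illustrative only, not part of the original cheatsheet):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# finite-difference derivative vs. the closed form sigma(z) * (1 - sigma(z))
z, h = 0.7, 1e-6
numeric = (sigmoid(z + h) - sigmoid(z - h)) / (2 * h)
analytic = sigmoid(z) * (1 - sigmoid(z))
print(numeric, analytic)   # both ~0.2217
```

<br>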
+ + +**36. Backpropagation ― The forward pass is done through fi, which is the value for the subexpression rooted at i, while the backward pass is done through gi=∂out∂fi and represents how fi influences the output.** + +⟶ Geri yayılım - İleriye geçiş, i'de yer alan alt ifadenin değeri olan fi ile yapılırken, geriye doğru geçiş gi=∂out∂fi aracılığıyla yapılır ve fi'nin çıkışı nasıl etkilediğini gösterir. + +
+ + +**37. Approximation and estimation error ― The approximation error ϵapprox represents how far the entire hypothesis class F is from the target predictor g∗, while the estimation error ϵest quantifies how good the predictor ^f is with respect to the best predictor f∗ of the hypothesis class F.** + +⟶ Yaklaşım ve kestirim hatası - Yaklaşım hatası ϵapprox, tüm F hipotez sınıfının hedef öngörücü g∗'dan ne kadar uzak olduğunu gösterirken, kestirim hatası ϵest, ^f öngörücüsünün F hipotez sınıfının en iyi öngörücüsü f∗'ya göre ne kadar iyi olduğunu gösterir. + +
<br>
+ + +**38. Regularization ― The regularization procedure aims at avoiding the model to overfit the data and thus deals with high variance issues. The following table sums up the different types of commonly used regularization techniques:** + +⟶ Düzenlileştirme (Regularization) - Düzenlileştirme prosedürü, modelin verilerin aşırı öğrenmesinden kaçınmayı amaçlar ve böylece yüksek değişkenlik sorunlarıyla ilgilenir. Aşağıdaki tablo, yaygın olarak kullanılan düzenlileştirme tekniklerinin farklı türlerini özetlemektedir: + +
+ + +**39. [Shrinks coefficients to 0, Good for variable selection, Makes coefficients smaller, Tradeoff between variable selection and small coefficients]** + +⟶ [Katsayıları 0'a düşürür, Değişken seçimi için iyi, Katsayıları daha küçük yapar, Değişken seçimi ile küçük katsayılar arasında ödünleşim] + +
+ + +**40. Hyperparameters ― Hyperparameters are the properties of the learning algorithm, and include features, regularization parameter λ, number of iterations T, step size η, etc.** + +⟶ Hiperparametreler - Hiperparametreler, öğrenme algoritmasının özellikleridir; öznitelikleri, düzenlileştirme parametresi λ'yı, yineleme sayısı T'yi, adım büyüklüğü η'yı vb. içerir. + +
<br>
+ + +**41. Sets vocabulary ― When selecting a model, we distinguish 3 different parts of the data that we have as follows:** + +⟶ Kümeler - Bir model seçerken, veriyi aşağıdaki gibi 3 farklı parçaya ayırırız: + +
+ + +**42. [Training set, Validation set, Testing set]** + +⟶ [Eğitim kümesi, Doğrulama kümesi, Test kümesi] + +
+ + +**43. [Model is trained, Usually 80% of the dataset, Model is assessed, Usually 20% of the dataset, Also called hold-out or development set, Model gives predictions, Unseen data]** + +⟶ [Model eğitilir, Veri kümesinin genellikle %80'i, Model değerlendirilir, Veri kümesinin genellikle %20'si, Ayrıca tutma veya geliştirme kümesi olarak da adlandırılır, Model tahminlerini verir, Görünmeyen veriler] + +
+ + +**44. Once the model has been chosen, it is trained on the entire dataset and tested on the unseen test set. These are represented in the figure below:** + +⟶ Model seçildikten sonra, tüm veri kümesi üzerinde eğitilir ve görünmeyen test kümesinde test edilir. Bunlar aşağıdaki şekilde gösterilmektedir: + +
+ + +**45. [Dataset, Unseen data, train, validation, test]** + +⟶ [Veri kümesi, Görünmeyen veriler, eğitim, doğrulama, test] + +
+ + +**46. For a more detailed overview of the concepts above, check out the Machine Learning tips and tricks cheatsheets!** + +⟶ Yukarıdaki kavramlara daha ayrıntılı bir bakış için, Makine Öğrenmesi ipuçları ve püf noktaları el kitabına göz atın! + +
<br>
+ + +**47. Unsupervised Learning** + +⟶ Gözetimsiz Öğrenme + +
+ + +**48. The class of unsupervised learning methods aims at discovering the structure of the data, which may have of rich latent structures.** + +⟶ Gözetimsiz öğrenme yöntemlerinin sınıfı, zengin gizli yapılara sahip olabilecek verilerin yapısını keşfetmeyi amaçlamaktadır. + +
+ + +**49. k-means** + +⟶ k-ortalama + +
+ + +**50. Clustering ― Given a training set of input points Dtrain, the goal of a clustering algorithm is to assign each point ϕ(xi) to a cluster zi∈{1,...,k}** + +⟶ Kümeleme - Dtrain giriş noktalarından oluşan bir eğitim kümesi göz önüne alındığında, kümeleme algoritmasının amacı, her bir ϕ(xi) noktasını zi∈{1,...,k} kümesine atamaktır. + +
+ + +**51. Objective function ― The loss function for one of the main clustering algorithms, k-means, is given by:** + +⟶ Amaç fonksiyonu - Ana kümeleme algoritmalarından biri olan k-ortalama için kayıp fonksiyonu şöyle ifade edilir: + +
+ + +**52. Algorithm ― After randomly initializing the cluster centroids μ1,μ2,...,μk∈Rn, the k-means algorithm repeats the following step until convergence:** + +⟶ Algoritma - Küme merkezlerini μ1,μ2,...,μk∈Rn kümesini rasgele başlattıktan sonra, k-ortalama algoritması yakınsayana kadar aşağıdaki adımı tekrarlar: + +
+ + +**53. and** + +⟶ ve + +
+ + +**54. [Means initialization, Cluster assignment, Means update, Convergence]** + +⟶ [Ortalamaların ilklendirilmesi, Küme ataması, Ortalamaların güncellenmesi, Yakınsama] + +
<br>
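An illustrative k-means sketch following the alternation described above (means initialization, cluster assignment, means update, convergence); it is not part of the original cheatsheet and glosses over details such as restarts, keeping the old centroid whenever a cluster ends up empty.

```python
import numpy as np

def kmeans(points, k, iters=100, seed=0):
    rng = np.random.default_rng(seed)
    centroids = points[rng.choice(len(points), size=k, replace=False)]   # means initialization
    for _ in range(iters):
        # cluster assignment: each point goes to its closest centroid
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        z = np.argmin(dists, axis=1)
        # means update (keep the old centroid if a cluster is empty)
        new_centroids = np.array([points[z == j].mean(axis=0) if np.any(z == j) else centroids[j]
                                  for j in range(k)])
        if np.allclose(new_centroids, centroids):                        # convergence
            break
        centroids = new_centroids
    return z, centroids

points = np.array([[0.0, 0.0], [0.1, 0.1], [5.0, 5.0], [5.1, 4.9]])
print(kmeans(points, k=2))
```

<br>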
+ + +**55. Principal Component Analysis** + +⟶ Temel Bileşenler Analizi + +
+ + +**56. Eigenvalue, eigenvector ― Given a matrix A∈Rn×n, λ is said to be an eigenvalue of A if there exists a vector z∈Rn∖{0}, called eigenvector, such that we have:** + +⟶ Özdeğer, özvektör - Bir A∈Rn×n matrisi verildiğinde, z∈Rn∖{0} olacak şekilde bir vektör varsa λ, A'nın bir öz değeri olduğu söylenir, aşağıdaki gibi ifade edilir: + +
+ + +**57. Spectral theorem ― Let A∈Rn×n. If A is symmetric, then A is diagonalizable by a real orthogonal matrix U∈Rn×n. By noting Λ=diag(λ1,...,λn), we have:** + +⟶ Spektral teoremi - A∈Rn×n olsun. A simetrik ise, o zaman A gerçek ortogonal matris U∈Rn×n olacak şekilde köşegenleştirilebilir. Λ=diag(λ1,...,λn) formülü dikkate alınarak aşağıdaki gibi ifade edilir: + +
+ + +**58. Remark: the eigenvector associated with the largest eigenvalue is called principal eigenvector of matrix A.** + +⟶ Not: En büyük özdeğerle ilişkilendirilen özvektör, A matrisinin temel özvektörüdür. + +
+ + +**59. Algorithm ― The Principal Component Analysis (PCA) procedure is a dimension reduction technique that projects the data on k dimensions by maximizing the variance of the data as follows:** + +⟶ Algoritma - Temel Bileşenler Analizi (PCA) prosedürü, verilerin varyansını en üst düzeye çıkararak k boyutlarına indirgeyen bir boyut küçültme tekniğidir: + +
+ + +**60. Step 1: Normalize the data to have a mean of 0 and standard deviation of 1.** + +⟶ Adım 1: Verileri ortalama 0 ve 1 standart sapma olacak şekilde normalize edin. + +
+ + +**61. [where, and]** + +⟶ [koşul, ve] + +
+ + +**62. [Step 2: Compute Σ=1mm∑i=1ϕ(xi)ϕ(xi)T∈Rn×n, which is symmetric with real eigenvalues., Step 3: Compute u1,...,uk∈Rn the k orthogonal principal eigenvectors of Σ, i.e. the orthogonal eigenvectors of the k largest eigenvalues., Step 4: Project the data on spanR(u1,...,uk).]** + +⟶ [Adım 2: Hesaplama Σ=1mm∑i=1ϕ(xi)ϕ(xi)T∈Rn×n, ki bu, gerçek özdeğerlerle simetriktir., Adım 3: Hesaplama u1,...,uk∈Rn k'nin ortogonal ana özvektörleri, yani k en büyük özdeğerlerin ortogonal özvektörleri., Adım 4: spanR(u1,...,uk)'daki verilerin izdüşümünü al.] + +
<br>
+ + +**63. This procedure maximizes the variance among all k-dimensional spaces.** + +⟶ Bu prosedür, tüm k boyutlu uzaylar arasında varyansı en üst düzeye çıkarır. + +
<br>
+ + +**64. [Data in feature space, Find principal components, Data in principal components space]** + +⟶ [Öznitelik uzayındaki veriler, Asıl bileşenleri bulma, Asıl bileşenler uzayındaki veriler] + +
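The four PCA steps above can be sketched as follows (illustrative only, not from the original cheatsheet; `numpy.linalg.eigh` is used because the matrix built in step 2 is symmetric, and the toy data is made up for the example):

```python
import numpy as np

def pca(X, k):
    """Project the rows of X onto the k principal directions."""
    # Step 1: normalize to mean 0 and standard deviation 1
    X = (X - X.mean(axis=0)) / X.std(axis=0)
    # Step 2: symmetric matrix with real eigenvalues
    sigma = X.T @ X / len(X)
    # Step 3: orthogonal eigenvectors of the k largest eigenvalues
    eigvals, eigvecs = np.linalg.eigh(sigma)          # eigh returns eigenvalues in ascending order
    U = eigvecs[:, np.argsort(eigvals)[::-1][:k]]
    # Step 4: project the data on span(u1, ..., uk)
    return X @ U

X = np.array([[2.5, 2.4], [0.5, 0.7], [2.2, 2.9], [1.9, 2.2], [3.1, 3.0]])
print(pca(X, k=1))
```

<br>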
+ + +**65. For a more detailed overview of the concepts above, check out the Unsupervised Learning cheatsheets!** + +⟶ Yukarıdaki kavramlara daha ayrıntılı bir genel bakış için, Gözetimsiz Öğrenme el kitaplarına göz atın! + +
+ + +**66. [Linear predictors, Feature vector, Linear classifier/regression, Margin]** + +⟶ [Doğrusal öngörücüler, Öznitelik vektörü, Doğrusal sınıflandırıcı/regresyon, Marj] + +
+ + +**67. [Loss minimization, Loss function, Framework]** + +⟶ [Kayıp minimizasyonu, Kayıp fonksiyonu, Çerçeve (Framework)] + +
+ + +**68. [Non-linear predictors, k-nearest neighbors, Neural networks]** + +⟶ [Doğrusal olmayan öngörücüler, k-en yakın komşular, Yapay sinir ağları] + +
+ + +**69. [Stochastic gradient descent, Gradient, Stochastic updates, Batch updates]** + +⟶ [Stokastik Dereceli Azalma/Bayır İnişi, Gradyan, Stokastik güncellemeler, Yığın/Küme (Batch) güncellemeler] + +
+ + +**70. [Fine-tuning models, Hypothesis class, Backpropagation, Regularization, Sets vocabulary]** + +⟶ [Hassas ayar modeller, Hipotez sınıfı, Geri yayılım, Düzenlileştirme (Regularization), Kelime dizisi] + +
+ + +**71. [Unsupervised Learning, k-means, Principal components analysis]** + +⟶ [Gözetimsiz Öğrenme, k-ortalama, Temel bileşenler analizi] + +
+ + +**72. View PDF version on GitHub** + +⟶ GitHub'da PDF sürümünü görüntüleyin + +
+ + +**73. Original authors** + +⟶ Orijinal yazarlar + +
+ + +**74. Translated by X, Y and Z** + +⟶ X, Y ve Z tarafından çevrilmiştir + +
+ + +**75. Reviewed by X, Y and Z** + +⟶ X, Y ve Z tarafından gözden geçirilmiştir + +
+ + +**76. By X and Y** + +⟶ X ve Y ile + +
+ + +**77. The Artificial Intelligence cheatsheets are now available in [target language].** + +⟶ Yapay Zeka el kitabı şimdi [hedef dilde] mevcuttur. diff --git a/tr/cs-221-states-models.md b/tr/cs-221-states-models.md new file mode 100644 index 000000000..bceddce2b --- /dev/null +++ b/tr/cs-221-states-models.md @@ -0,0 +1,980 @@ +**States-based models translation** [[webpage]](https://stanford.edu/~shervine/teaching/cs-221/cheatsheet-states-models) + +
+ +**1. States-based models with search optimization and MDP** + +⟶ Arama optimizasyonu ve Markov karar sürecine (MDP) sahip durum-temelli modeller + +
+ + +**2. Search optimization** + +⟶ Arama optimizasyonu + +
+ + +**3. In this section, we assume that by accomplishing action a from state s, we deterministically arrive in state Succ(s,a). The goal here is to determine a sequence of actions (a1,a2,a3,a4,...) that starts from an initial state and leads to an end state. In order to solve this kind of problem, our objective will be to find the minimum cost path by using states-based models.** + +⟶ Bu bölümde, s durumunda a eylemini gerçekleştirdiğimizde, Succ(s,a) durumuna varacağımızı varsayıyoruz. Burada amaç, başlangıç durumundan başlayıp bitiş durumuna götüren bir eylem dizisi (a1,a2,a3,a4,...) belirlenmesidir. Bu tür bir problemi çözmek için, amacımız durum-temelli modelleri kullanarak asgari (minimum) maliyet yolunu bulmak olacaktır. + +
+ + +**4. Tree search** + +⟶ Ağaç arama + +
+ + +**5. This category of states-based algorithms explores all possible states and actions. It is quite memory efficient, and is suitable for huge state spaces but the runtime can become exponential in the worst cases.** + +⟶ Bu durum-temelli algoritmalar, olası bütün durum ve eylemleri araştırırlar. Oldukça bellek verimli ve büyük durum uzayları için uygundurlar ancak çalışma zamanı en kötü durumlarda üstel olabilir. + +
+ + +**6. [Self-loop, More than a parent, Cycle, More than a root, Valid tree]** + +⟶ [Kendinden-Döngü(Self-loop), Bir ebeveynden (parent) daha fazlası, Çevrim, Bir kökten daha fazlası, Geçerli ağaç] + +
+ + +**7. [Search problem ― A search problem is defined with:, a starting state sstart, possible actions Actions(s) from state s, action cost Cost(s,a) from state s with action a, successor Succ(s,a) of state s after action a, whether an end state was reached IsEnd(s)]** + +⟶ [Arama problemi ― Bir arama problemi aşağıdaki şekilde tanımlanmaktadır:, bir başlangıç durumu sstart, s durumunda gerçekleşebilecek olası eylemler Actions(s), s durumunda gerçekleşen a eyleminin eylem maliyeti Cost(s,a), a eyleminden sonraki varılacak durum Succ(s,a), son duruma ulaşılıp ulaşılamadığı IsEnd(s)] + +
+ + +**8. The objective is to find a path that minimizes the cost.** + +⟶ Amaç, maliyeti en aza indiren bir yol bulmaktır. + +
+ + +**9. Backtracking search ― Backtracking search is a naive recursive algorithm that tries all possibilities to find the minimum cost path. Here, action costs can be either positive or negative.** + +⟶ Geri izleme araması ― Geri izleme araması, asgari (minimum) maliyet yolunu bulmak için tüm olasılıkları deneyen saf (naive) bir özyinelemeli algoritmadır. Burada, eylem maliyetleri pozitif ya da negatif olabilir. + +
+ + +**10. Breadth-first search (BFS) ― Breadth-first search is a graph search algorithm that does a level-by-level traversal. We can implement it iteratively with the help of a queue that stores at each step future nodes to be visited. For this algorithm, we can assume action costs to be equal to a constant c⩾0.** + +⟶ Genişlik öncelikli arama (Breadth-first search-BFS) ― Genişlik öncelikli arama, seviye seviye arama yapan bir çizge arama algoritmasıdır. Gelecekte her adımda ziyaret edilecek düğümleri tutan bir kuyruk yardımıyla yinelemeli olarak gerçekleyebiliriz. Bu algoritma için, eylem maliyetlerinin belirli bir sabite c⩾0 eşit olduğunu kabul edebiliriz. + +
+ + +**11. Depth-first search (DFS) ― Depth-first search is a search algorithm that traverses a graph by following each path as deep as it can. We can implement it recursively, or iteratively with the help of a stack that stores at each step future nodes to be visited. For this algorithm, action costs are assumed to be equal to 0.** + +⟶ Derinlik öncelikli arama (Depth-first search-DFS) ― Derinlik öncelikli arama, her bir yolu olabildiğince derin bir şekilde takip ederek çizgeyi dolaşan bir arama algoritmasıdır. Bu algoritmayı, ziyaret edilecek gelecek düğümleri her adımda bir yığın yardımıyla saklayarak, yinelemeli (recursively) ya da tekrarlı (iteratively) olarak uygulayabiliriz. Bu algoritma için eylem maliyetlerinin 0 olduğu varsayılmaktadır. + +
+ + +**12. Iterative deepening ― The iterative deepening trick is a modification of the depth-first search algorithm so that it stops after reaching a certain depth, which guarantees optimality when all action costs are equal. Here, we assume that action costs are equal to a constant c⩾0.** + +⟶ Tekrarlı derinleşme ― Tekrarlı derinleşme hilesi, derinlik-ilk arama algoritmasının değiştirilmiş bir halidir, böylece belirli bir derinliğe ulaştıktan sonra durur, bu da tüm işlem maliyetleri eşit olduğunda en iyiliği (optimal) garanti eder. Burada, işlem maliyetlerinin c⩾0 gibi sabit bir değere eşit olduğunu varsayıyoruz. + +
+ + +**13. Tree search algorithms summary ― By noting b the number of actions per state, d the solution depth, and D the maximum depth, we have:** + +⟶ Ağaç arama algoritmaları özeti ― B durum başına eylem sayısını, d çözüm derinliğini ve D en yüksek (maksimum) derinliği ifade ederse, o zaman: + +
+ + +**14. [Algorithm, Action costs, Space, Time]** + +⟶ [Algoritma, Eylem maliyetleri, Arama uzayı, Zaman] + +
+ + +**15. [Backtracking search, any, Breadth-first search, Depth-first search, DFS-Iterative deepening]** + +⟶ [Geri izleme araması, herhangi bir şey, Genişlik öncelikli arama, Derinlik öncelikli arama, DFS - Tekrarlı derinleşme] + +
+ + +**16. Graph search** + +⟶ Çizge arama + +
+ + +**17. This category of states-based algorithms aims at constructing optimal paths, enabling exponential savings. In this section, we will focus on dynamic programming and uniform cost search.** + +⟶ Bu durum-temelli algoritmalar kategorisi, üssel tasarruf sağlayan en iyi (optimal) yolları oluşturmayı amaçlar. Bu bölümde, dinamik programlama ve tek tip maliyet araştırması üzerinde duracağız. + +
+ + +**18. Graph ― A graph is comprised of a set of vertices V (also called nodes) as well as a set of edges E (also called links).** + +⟶ Çizge ― Bir çizge, V köşeler (düğüm olarak da adlandırılır) kümesi ile E kenarlar (bağlantı olarak da adlandırılır) kümesinden oluşur. + +
+ + +**19. Remark: a graph is said to be acylic when there is no cycle.** + +⟶ Not: çevrim olmadığında, bir çizgenin asiklik (çevrimsiz) olduğu söylenir. + +
+ + +**20. State ― A state is a summary of all past actions sufficient to choose future actions optimally.** + +⟶ Durum ― Bir durum gelecekteki eylemleri en iyi (optimal) şekilde seçmek için, yeterli tüm geçmiş eylemlerin özetidir. + +
+ + +**21. Dynamic programming ― Dynamic programming (DP) is a backtracking search algorithm with memoization (i.e. partial results are saved) whose goal is to find a minimum cost path from state s to an end state send. It can potentially have exponential savings compared to traditional graph search algorithms, and has the property to only work for acyclic graphs. For any given state s, the future cost is computed as follows:** + +⟶ Dinamik programlama ― Dinamik programlama (DP), amacı s durumundan bitiş durumu olan send'e kadar asgari(minimum) maliyet yolunu bulmak olan hatırlamalı (memoization) (başka bir deyişle kısmi sonuçlar kaydedilir) bir geri izleme (backtracking) arama algoritmasıdır. Geleneksel çizge arama algoritmalarına kıyasla üstel olarak tasarruf sağlayabilir ve yalnızca asiklik (çevrimsiz) çizgeler ile çalışma özelliğine sahiptir. Herhangi bir durum için gelecekteki maliyet aşağıdaki gibi hesaplanır: + +
+ + +**22. [if, otherwise]** + +⟶ [eğer, aksi taktirde] + +
+ + +**23. Remark: the figure above illustrates a bottom-to-top approach whereas the formula provides the intuition of a top-to-bottom problem resolution.** + +⟶ Not: Yukarıdaki şekil, aşağıdan yukarıya bir yaklaşımı sergilerken, formül ise yukarıdan aşağıya bir önsezi ile problem çözümü sağlar. + +
+ + +**24. Types of states ― The table below presents the terminology when it comes to states in the context of uniform cost search:** + +⟶ Durum türleri ― Tek tip maliyet araştırması bağlamındaki durumlara ilişkin terminoloji aşağıdaki tabloda sunulmaktadır: + +
+ + +**25. [State, Explanation]** + +⟶ [Durum, Açıklama] + +
+ + +**26. [Explored, Frontier, Unexplored]** + +⟶ [Keşfedilmiş, Sırada (Frontier), Keşfedilmemiş] + +
+ + +**27. [States for which the optimal path has already been found, States seen for which we are still figuring out how to get there with the cheapest cost, States not seen yet]** + +⟶ [En iyi (optimal) yolun daha önce bulunduğu durumlar, Görülen ancak hala en ucuza nasıl gidileceği hesaplanmaya çalışılan durumlar, Daha önce görülmeyen durumlar] + +
+ + +**28. Uniform cost search ― Uniform cost search (UCS) is a search algorithm that aims at finding the shortest path from a state sstart to an end state send. It explores states s in increasing order of PastCost(s) and relies on the fact that all action costs are non-negative.** + +⟶ Tek tip maliyet araması ― Tek tip maliyet araması (Uniform cost search - UCS) bir başlangıç durumu olan Sstart, ile bir bitiş durumu olan Send arasındaki en kısa yolu bulmayı amaçlayan bir arama algoritmasıdır. Bu algoritma s durumlarını artan geçmiş maliyetleri olan PastCost(s)'a göre araştırır ve eylem maliyetlerinin negatif olmayacağı kuralına dayanır. + +
+ + +**29. Remark 1: the UCS algorithm is logically equivalent to Dijkstra's algorithm.** + +⟶ Not 1: UCS algoritması mantıksal olarak Dijkstra algoritması ile aynıdır. + +
+ + +**30. Remark 2: the algorithm would not work for a problem with negative action costs, and adding a positive constant to make them non-negative would not solve the problem since this would end up being a different problem.** + +⟶ Not 2: Algoritma, negatif eylem maliyetleriyle ilgili bir problem için çalışmaz ve negatif olmayan bir hale getirmek için pozitif bir sabit eklemek problemi çözmez, çünkü problem farklı bir problem haline gelmiş olur. + +
+ + +**31. Correctness theorem ― When a state s is popped from the frontier F and moved to explored set E, its priority is equal to PastCost(s) which is the minimum cost path from sstart to s.** + +⟶ Doğruluk teoremi ― S durumu sıradaki (frontier) F'den çıkarılır ve daha önceden keşfedilmiş olan E kümesine taşınırsa, önceliği başlangıç durumu olan Sstart'dan, s durumuna kadar asgari (minimum) maliyet yolu olan PastCost(s)'e eşittir. + +
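The uniform cost search described above can be sketched with a priority queue keyed by PastCost(s) (illustrative, not from the original cheatsheet; the `actions`, `cost` and `succ` callables mirror the search-problem definition and the toy problem is made up for the example):

```python
import heapq

def uniform_cost_search(s_start, is_end, actions, cost, succ):
    """Explore states in increasing order of PastCost(s); all action costs must be >= 0."""
    frontier = [(0.0, s_start)]            # priority queue keyed by past cost
    explored = set()
    while frontier:
        past_cost, s = heapq.heappop(frontier)
        if s in explored:
            continue
        explored.add(s)
        if is_end(s):
            return past_cost
        for a in actions(s):
            heapq.heappush(frontier, (past_cost + cost(s, a), succ(s, a)))
    return float("inf")

# toy chain 0 -> 1 -> ... -> 5 with unit steps, plus a costly direct jump
print(uniform_cost_search(
    0,
    is_end=lambda s: s == 5,
    actions=lambda s: ["step", "jump"] if s == 0 else ["step"],
    cost=lambda s, a: 1 if a == "step" else 10,
    succ=lambda s, a: s + 1 if a == "step" else 5,
))   # -> 5 (five unit steps beat the cost-10 jump)
```

<br>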
+ + +**32. Graph search algorithms summary ― By noting N the number of total states, n of which are explored before the end state send, we have:** + +⟶ Çizge arama algoritmaları özeti ― N toplam durumların sayısı, n-bitiş durumu(Send)'ndan önce keşfedilen durum sayısı ise: + +
+ + +**33. [Algorithm, Acyclicity, Costs, Time/space]** + +⟶ [Algoritma, Asiklik (Çevrimsizlik), Maliyetler, Zaman/arama uzayı] + +
+ + +**34. [Dynamic programming, Uniform cost search]** + +⟶ [Dinamik programlama, Tek tip maliyet araması] + +
+ + +**35. Remark: the complexity countdown supposes the number of possible actions per state to be constant.** + +⟶ Not: Karmaşıklık geri sayımı, her durum için olası eylemlerin sayısını sabit olarak kabul eder. + +
+ + +**36. Learning costs** + +⟶ Öğrenme maliyetleri + +
+ + +**37. Suppose we are not given the values of Cost(s,a), we want to estimate these quantities from a training set of minimizing-cost-path sequence of actions (a1,a2,...,ak).** + +⟶ Diyelim ki, Cost(s,a) değerleri verilmedi ve biz bu değerleri maliyet yolu eylem dizisini,(a1,a2,...,ak), en aza indiren bir eğitim kümesinden tahmin etmek istiyoruz. + +
+ + +**38. [Structured perceptron ― The structured perceptron is an algorithm aiming at iteratively learning the cost of each state-action pair. At each step, it:, decreases the estimated cost of each state-action of the true minimizing path y given by the training data, increases the estimated cost of each state-action of the current predicted path y' inferred from the learned weights.]** + +⟶ [Yapılandırılmış algılayıcı ― Yapılandırılmış algılayıcı, her bir durum-eylem çiftinin maliyetini tekrarlı (iteratively) olarak öğrenmeyi amaçlayan bir algoritmadır. Her bir adımda, algılayıcı:, eğitim verilerinden elde edilen gerçek asgari (minimum) y yolunun her bir durum-eylem çiftinin tahmini (estimated) maliyetini azaltır, öğrenilen ağırlıklardan elde edilen şimdiki tahmini(predicted) y' yolununun durum-eylem çiftlerinin tahmini maliyetini artırır.] + +
+ + +**39. Remark: there are several versions of the algorithm, one of which simplifies the problem to only learning the cost of each action a, and the other parametrizes Cost(s,a) to a feature vector of learnable weights.** + +⟶ Not: Algoritmanın birkaç sürümü vardır, bunlardan biri problemi sadece her bir a eyleminin maliyetini öğrenmeye indirger, bir diğeri ise öğrenilebilir ağırlık öznitelik vektörünü, Cost(s,a)'nın parametresi haline getirir. + +
+ + +**40. A* search** + +⟶ A* arama + +
+ + +**41. Heuristic function ― A heuristic is a function h over states s, where each h(s) aims at estimating FutureCost(s), the cost of the path from s to send.** + +⟶ Sezgisel işlev(Heuristic function) ― Sezgisel, s durumu üzerinde işlem yapan bir h fonksiyonudur, burada her bir h(s), s ile send arasındaki yol maliyeti olan FutureCost(s)'yi tahmin etmeyi amaçlar. + +
+ + +**42. Algorithm ― A∗ is a search algorithm that aims at finding the shortest path from a state s to an end state send. It explores states s in increasing order of PastCost(s)+h(s). It is equivalent to a uniform cost search with edge costs Cost′(s,a) given by:** + +⟶ Algoritma ― A∗, s durumu ile send bitiş durumu arasındaki en kısa yolu bulmayı amaçlayan bir arama algoritmasıdır. Bahse konu algoritma PastCost(s)+h(s)'yi artan sıra ile araştırır. Aşağıda verilenler ışığında kenar maliyetlerini de içeren tek tip maliyet aramasına eşittir: + +
+ + +**43. Remark: this algorithm can be seen as a biased version of UCS exploring states estimated to be closer to the end state.** + +⟶ Not: Bu algoritma, son duruma yakın olduğu tahmin edilen durumları araştıran tek tip maliyet aramasının taraflı bir sürümü olarak görülebilir. + +
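The same frontier-based scheme, reordered by PastCost(s)+h(s), gives an illustrative A* sketch (again not part of the original cheatsheet; the heuristic in the toy example is assumed consistent and all names are made up):

```python
import heapq

def a_star(s_start, is_end, actions, cost, succ, h):
    """Explore states in increasing order of PastCost(s) + h(s)."""
    frontier = [(h(s_start), 0.0, s_start)]
    explored = set()
    while frontier:
        _, past_cost, s = heapq.heappop(frontier)
        if s in explored:
            continue
        explored.add(s)
        if is_end(s):
            return past_cost
        for a in actions(s):
            s2 = succ(s, a)
            g2 = past_cost + cost(s, a)
            heapq.heappush(frontier, (g2 + h(s2), g2, s2))
    return float("inf")

# walk on the integer line from 0 to 8; heuristic = remaining distance (consistent)
print(a_star(0, lambda s: s == 8,
             actions=lambda s: [-1, +1],
             cost=lambda s, a: 1,
             succ=lambda s, a: s + a,
             h=lambda s: abs(8 - s)))   # -> 8
```

<br>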
+ + +**44. [Consistency ― A heuristic h is said to be consistent if it satisfies the two following properties:, For all states s and actions a, The end state verifies the following:]** + +⟶ [Tutarlılık ― Bir sezgisel h, aşağıdaki iki özelliği sağlaması durumunda tutarlıdır denilebilir:, Bütün s durumları ve a eylemleri için, bitiş durumu aşağıdakileri doğrular:] + +
+ + +**45. Correctness ― If h is consistent, then A∗ returns the minimum cost path.** + +⟶ Doğruluk ― Eğer h tutarlı ise o zaman A∗ algoritması asgari (minimum) maliyet yolunu döndürür. + +
+ + +**46. Admissibility ― A heuristic h is said to be admissible if we have:** + +⟶ Kabul edilebilirlik ― Bir sezgisel h kabul edilebilirdir eğer: + +
+ + +**47. Theorem ― Let h(s) be a given heuristic. We have:** + +⟶ Teorem ― h(s) sezgisel olsun ve: + +
+ + +**48. [consistent, admissible]** + +⟶ [tutarlı, kabul edilebilir] + +
+ + +**49. Efficiency ― A* explores all states s satisfying the following equation:** + +⟶ Verimlilik ― A* algoritması aşağıdaki eşitliği sağlayan bütün s durumlarını araştırır: + +
+ + +**50. Remark: larger values of h(s) is better as this equation shows it will restrict the set of states s going to be explored.** + +⟶ Not: h(s)'nin yüksek değerleri, bu eşitliğin araştırılacak olan s durum kümesini kısıtlayacak olması nedeniyle daha iyidir. + +
+ + +**51. Relaxation** + +⟶ Rahatlama + +
+ + +**52. It is a framework for producing consistent heuristics. The idea is to find closed-form reduced costs by removing constraints and use them as heuristics.** + +⟶ Bu tutarlı sezgisel için bir altyapıdır (framework). Buradaki fikir, kısıtlamaları kaldırarak kapalı şekilli (closed-form) düşük maliyetler bulmak ve bunları sezgisel olarak kullanmaktır. + +
+ + +**53. Relaxed search problem ― The relaxation of search problem P with costs Cost is noted Prel with costs Costrel, and satisfies the identity:** + +⟶ Rahat arama problemi (Relaxed search problem) ― Cost maliyetli bir arama probleminin rahatlaması, Costrel maliyetli Prel ile ifade edilir ve kimliği karşılar (satisfies the identity) : + +
+ + +**54. Relaxed heuristic ― Given a relaxed search problem Prel, we define the relaxed heuristic h(s)=FutureCostrel(s) as the minimum cost path from s to an end state in the graph of costs Costrel(s,a).** + +⟶ Rahat sezgisel (Relaxed heuristic) ― Bir Prel rahat arama problemi verildiğinde, h(s)=FutureCostrel(s) rahat sezgisel eşitliğini Costrel(s,a) maliyet çizgesindeki s durumu ile bir bitiş durumu arasındaki asgari(minimum) maliyet yolu olarak tanımlarız. + +
+ + +**55. Consistency of relaxed heuristics ― Let Prel be a given relaxed problem. By theorem, we have:** + +⟶ Rahat sezgisel tutarlılığı ― Prel bir rahat problem olarak verilmiş olsun. Teoreme göre: + +
+ + +**56. consistent** + +⟶ tutarlı + +
+ + +**57. [Tradeoff when choosing heuristic ― We have to balance two aspects in choosing a heuristic:, Computational efficiency: h(s)=FutureCostrel(s) must be easy to compute. It has to produce a closed form, easier search and independent subproblems., Good enough approximation: the heuristic h(s) should be close to FutureCost(s) and we have thus to not remove too many constraints.]** + +⟶ [Sezgisel seçiminde ödünleşim (tradeoff) ― Sezgisel seçiminde iki yönü dengelemeliyiz:, Hesaplamalı verimlilik: h(s)=FutureCostrel(s) eşitliği kolay hesaplanabilir olmalıdır. Kapalı bir şekil, daha kolay arama ve bağımsız alt problemler üretmesi gerekir., Yeterince iyi yaklaşım: sezgisel h(s), FutureCost(s) işlevine yakın olmalı ve bu nedenle çok fazla kısıtlamayı ortadan kaldırmamalıyız.] + +
+ + +**58. Max heuristic ― Let h1(s), h2(s) be two heuristics. We have the following property:** + +⟶ En yüksek sezgisel ― h1(s) ve h2(s) aşağıdaki özelliklere sahip iki adet sezgisel olsun: + +
+ + +**59. Markov decision processes** + +⟶ Markov karar süreçleri + +
+ + +**60. In this section, we assume that performing action a from state s can lead to several states s′1,s′2,... in a probabilistic manner. In order to find our way between an initial state and an end state, our objective will be to find the maximum value policy by using Markov decision processes that help us cope with randomness and uncertainty.** + +⟶ Bu bölümde, s durumunda a eyleminin gerçekleştirilmesinin olasılıksal olarak birden fazla durum,(s′1,s′2,...), ile sonuçlanacağını kabul ediyoruz. Başlangıç durumu ile bitiş durumu arasındaki yolu bulmak için amacımız, rastgelelilik ve belirsizlik ile başa çıkabilmek için yardımcı olan Markov karar süreçlerini kullanarak en yüksek değer politikasını bulmak olacaktır. + +
+ + +**61. Notations** + +⟶ Gösterimler + +
+ + +**62. [Definition ― The objective of a Markov decision process is to maximize rewards. It is defined with:, a starting state sstart, possible actions Actions(s) from state s, transition probabilities T(s,a,s′) from s to s′ with action a, rewards Reward(s,a,s′) from s to s′ with action a, whether an end state was reached IsEnd(s), a discount factor 0⩽γ⩽1]** + +⟶ [Tanım ― Markov karar sürecinin amacı ödülleri en yüksek seviyeye çıkarmaktır. Markov karar süreci aşağıdaki bileşenlerden oluşmaktadır:, başlangıç durumu sstart, s durumunda gerçekleştirilebilecek olası eylemler Actions(s), s durumunda a eyleminin gerçekleştirilmesi ile s′ durumuna geçiş olasılıkları T(s,a,s′), s durumunda a eyleminin gerçekleştirilmesi ile elde edilen ödüller Reward(s,a,s′), bitiş durumuna ulaşılıp ulaşılamadığı IsEnd(s), indirim faktörü 0⩽γ⩽1] + +
+ + +**63. Transition probabilities ― The transition probability T(s,a,s′) specifies the probability of going to state s′ after action a is taken in state s. Each s′↦T(s,a,s′) is a probability distribution, which means that:** + +⟶ Geçiş olasılıkları ― Geçiş olasılığı T(s,a,s′) s durumundayken gerçekleştirilen a eylemi neticesinde s′ durumuna gitme olasılığını belirtir. Her bir s′↦T(s,a,s′) aşağıda belirtildiği gibi bir olasılık dağılımıdır: + +
+ + +**64. states** + +⟶ durumlar + +
+ + +**65. Policy ― A policy π is a function that maps each state s to an action a, i.e.** + +⟶ Politika ― Bir π politikası her s durumunu bir a eylemi ile ilişkilendiren bir işlevdir. + +
+ + +**66. Utility ― The utility of a path (s0,...,sk) is the discounted sum of the rewards on that path. In other words,** + +⟶ Fayda ― Bir (s0,...,sk) yolunun faydası, o yol üzerindeki ödüllerin indirimli toplamıdır. Diğer bir deyişle, + +
+ + +**67. The figure above is an illustration of the case k=4.** + +⟶ Yukarıdaki şekil k=4 durumunun bir gösterimidir. + +
+ + +**68. Q-value ― The Q-value of a policy π at state s with action a, also noted Qπ(s,a), is the expected utility from state s after taking action a and then following policy π. It is defined as follows:** + +⟶ Q-değeri ― S durumunda gerçekleştirilen bir a eylemi için π politikasının Q-değeri, Qπ(s,a) olarak da gösterilir, a eylemini gerçekleştirip ve sonrasında π politikasını takiben s durumundan beklenen faydadır. Q-değeri aşağıdaki şekilde tanımlanmaktadır: + +
+ + +**69. Value of a policy ― The value of a policy π from state s, also noted Vπ(s), is the expected utility by following policy π from state s over random paths. It is defined as follows:** + +⟶ Bir politikanın değeri ― S durumundaki π politikasının değeri,Vπ(s) olarak da gösterilir, rastgele yollar üzerinde s durumundaki π politikasını izleyerek elde edilen beklenen faydadır. S durumundaki π politikasının değeri aşağıdaki gibi tanımlanır: + +
+ + +**70. Remark: Vπ(s) is equal to 0 if s is an end state.** + +⟶ Not: Eğer s bitiş durumu ise Vπ(s) sıfıra eşittir. + +
+ + +**71. Applications** + +⟶ Uygulamalar + +
+ + +**72. [Policy evaluation ― Given a policy π, policy evaluation is an iterative algorithm that aims at estimating Vπ. It is done as follows:, Initialization: for all states s, we have:, Iteration: for t from 1 to TPE, we have, with]** + +⟶ [Politika değerlendirme ― Bir π politikası verildiğinde, politika değerlendirme, Vπ'yi tahmin etmeyi amaçlayan tekrarlı (iterative) bir algoritmadır. Politika değerlendirme aşağıdaki gibi yapılmaktadır:, İlklendirme: bütün s durumları için:, Tekrar: 1'den TPE'ye kadar her t için, ile] + +
<br>
+ + +**73. Remark: by noting S the number of states, A the number of actions per state, S′ the number of successors and T the number of iterations, then the time complexity is of O(TPESS′).** + +⟶ Not: S durum sayısını, A her bir durum için eylem sayısını, S′ ardılların (successors) sayısını ve T yineleme sayısını gösterdiğinde, zaman karmaşıklığı O(TPESS′) olur. + +
+ + +**74. Optimal Q-value ― The optimal Q-value Qopt(s,a) of state s with action a is defined to be the maximum Q-value attained by any policy starting. It is computed as follows:** + +⟶ En iyi Q-değeri ― S durumunda a eylemi gerçekleştirildiğinde bu durumun en iyi Q-değeri,Qopt(s,a), herhangi bir politika başlangıcında elde edilen en yüksek Q-değeri olarak tanımlanmaktadır. En iyi Q-değeri aşağıdaki gibi hesaplanmaktadır: + +
+ + +**75. Optimal value ― The optimal value Vopt(s) of state s is defined as being the maximum value attained by any policy. It is computed as follows:** + +⟶ En iyi değer ― S durumunun en iyi değeri olan Vopt(s), herhangi bir politika ile elde edilen en yüksek değer olarak tanımlanmaktadır. En iyi değer aşağıdaki gibi hesaplanmaktadır: + +
+ + +**76. actions** + +⟶ eylemler + +
+ + +**77. Optimal policy ― The optimal policy πopt is defined as being the policy that leads to the optimal values. It is defined by:** + +⟶ En iyi politika ― En iyi politika olan πopt, en iyi değerlere götüren politika olarak tanımlanmaktadır. En iyi politika aşağıdaki gibi tanımlanmaktadır: + +
+ + +**78. [Value iteration ― Value iteration is an algorithm that finds the optimal value Vopt as well as the optimal policy πopt. It is done as follows:, Initialization: for all states s, we have:, Iteration: for t from 1 to TVI, we have:, with]** + +⟶ [Değer tekrarı(iteration) ― Değer tekrarı(iteration) en iyi politika olan πopt, yanında en iyi değeri Vopt'ı, bulan bir algoritmadır. Değer tekrarı(iteration) aşağıdaki gibi yapılmaktadır:, İlklendirme: bütün s durumları için:, Tekrar: 1'den TVI'ya kadar her bir t için:, ile] + +
+ + +**79. Remark: if we have either γ<1 or the MDP graph being acyclic, then the value iteration algorithm is guaranteed to converge to the correct answer.** + +⟶ Not: Eğer γ<1 ya da Markov karar süreci (Markov Decision Process - MDP) asiklik (çevrimsiz) olursa, o zaman değer tekrarı algoritmasının doğru cevaba yakınsayacağı garanti edilir. + +
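An illustrative value iteration sketch for the update described above (not from the original cheatsheet); `T(s,a)` is assumed to return (successor, probability) pairs, and the two-state toy MDP is made up for the example:

```python
def value_iteration(states, actions, T, reward, is_end, gamma=0.9, iters=100):
    """V_opt(s) <- max_a sum_s' T(s,a,s') * (Reward(s,a,s') + gamma * V_opt(s'))."""
    V = {s: 0.0 for s in states}
    for _ in range(iters):
        for s in states:
            if is_end(s):
                continue                              # V_opt of an end state stays 0
            V[s] = max(
                sum(p * (reward(s, a, s2) + gamma * V[s2]) for s2, p in T(s, a))
                for a in actions(s)
            )
    return V

# toy MDP: from 'run' we reach 'end' (reward 10) or stay in 'run' (reward 0), each with prob. 0.5
states = ["run", "end"]
V = value_iteration(
    states,
    actions=lambda s: ["go"],
    T=lambda s, a: [("end", 0.5), ("run", 0.5)],
    reward=lambda s, a, s2: 10 if s2 == "end" else 0,
    is_end=lambda s: s == "end",
)
print(V)   # V['run'] converges to about 9.09
```

<br>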
+ + +**80. When unknown transitions and rewards** + +⟶ Bilinmeyen geçişler ve ödüller + +
+ + +**81. Now, let's assume that the transition probabilities and the rewards are unknown.** + +⟶ Şimdi, geçiş olasılıklarının ve ödüllerin bilinmediğini varsayalım. + +
+ + +**82. Model-based Monte Carlo ― The model-based Monte Carlo method aims at estimating T(s,a,s′) and Reward(s,a,s′) using Monte Carlo simulation with: ** + +⟶ Model-temelli Monte Carlo ― Model-temelli Monte Carlo yöntemi, T(s,a,s′) ve Reward(s,a,s′) işlevlerini Monte Carlo benzetimi kullanarak aşağıdaki formüllere uygun bir şekilde tahmin etmeyi amaçlar: + +
+ + +**83. [# times (s,a,s′) occurs, and]** + +⟶ [# kere (s,a,s′) gerçekleşme sayısı, ve] + +
+ + +**84. These estimations will be then used to deduce Q-values, including Qπ and Qopt.** + +⟶ Bu tahminler daha sonra Qπ ve Qopt'yi içeren Q-değerleri çıkarımı için kullanılacaktır. + +
+ + +**85. Remark: model-based Monte Carlo is said to be off-policy, because the estimation does not depend on the exact policy.** + +⟶ Not: model-tabanlı Monte Carlo'nun politika dışı olduğu söyleniyor, çünkü tahmin kesin politikaya bağlı değildir. + +
+ + +**86. Model-free Monte Carlo ― The model-free Monte Carlo method aims at directly estimating Qπ, as follows:** + +⟶ Model içermeyen Monte Carlo ― Model içermeyen Monte Carlo yöntemi aşağıdaki şekilde doğrudan Qπ'yi tahmin etmeyi amaçlar: + +
+ + +**87. Qπ(s,a)=average of ut where st−1=s,at=a** + +⟶ Qπ(s,a)= ortalama ut , st−1=s ve at=a olduğunda + +
+ + +**88. where ut denotes the utility starting at step t of a given episode.** + +⟶ ut belirli bir bölümün t anında başlayan faydayı ifade etmektedir. + +
+ + +**89. Remark: model-free Monte Carlo is said to be on-policy, because the estimated value is dependent on the policy π used to generate the data.** + +⟶ Not: model içermeyen Monte Carlo'nun politikaya dahil olduğu söyleniyor, çünkü tahmini değer veriyi üretmek için kullanılan π politikasına bağlıdır. + +
+ + +**90. Equivalent formulation - By introducing the constant η=11+(#updates to (s,a)) and for each (s,a,u) of the training set, the update rule of model-free Monte Carlo has a convex combination formulation:** + +⟶ Eşdeğer formülasyon - Sabit tanımı η=11+(#güncelleme sayısı (s,a) ) ve eğitim kümesinin her bir (s,a,u) üçlemesi için, model içermeyen Monte Carlo'nun güncelleme kuralı dışbükey bir kombinasyon formülasyonuna sahiptir: + +
+ + +**91. as well as a stochastic gradient formulation:** + +⟶ olasılıksal bayır formülasyonu yanında: + +
+ + +**92. SARSA ― State-action-reward-state-action (SARSA) is a boostrapping method estimating Qπ by using both raw data and estimates as part of the update rule. For each (s,a,r,s′,a′), we have:** + +⟶ SARSA ― Durum-eylem-ödül-durum-eylem (State-Action-Reward-State-Action - SARSA), hem ham verileri hem de güncelleme kuralının bir parçası olarak tahminleri kullanarak Qπ'yi tahmin eden bir destekleme yöntemidir. Her bir (s,a,r,s′,a′) için: + +
+ + +**93. Remark: the SARSA estimate is updated on the fly as opposed to the model-free Monte Carlo one where the estimate can only be updated at the end of the episode.** + +⟶ Not: the SARSA tahmini, tahminin yalnızca bölüm sonunda güncellenebildiği model içermeyen Monte Carlo yönteminin aksine anında güncellenir. + +
+ + +**94. Q-learning ― Q-learning is an off-policy algorithm that produces an estimate for Qopt. On each (s,a,r,s′,a′), we have:** + +⟶ Q-öğrenme ― Q-öğrenme, Qopt için tahmin üreten politikaya dahil olmayan bir algoritmadır. Her bir (s,a,r,s′,a′) için: + +
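A single Q-learning update can be sketched as below (illustrative only, not from the original cheatsheet; the sketch ignores the special case where s′ is an end state, in which the max term would be dropped, and all names are made up):

```python
def q_learning_update(Q, s, a, r, s_next, actions, eta=0.5, gamma=0.9):
    """One off-policy update toward r + gamma * max_a' Qopt(s', a')."""
    target = r + gamma * max((Q.get((s_next, a2), 0.0) for a2 in actions(s_next)), default=0.0)
    Q[(s, a)] = (1 - eta) * Q.get((s, a), 0.0) + eta * target
    return Q

Q = {}
q_learning_update(Q, s="s0", a="right", r=1.0, s_next="s1",
                  actions=lambda s: ["left", "right"])
print(Q)   # {('s0', 'right'): 0.5}
```

<br>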
+ + +**95. Epsilon-greedy ― The epsilon-greedy policy is an algorithm that balances exploration with probability ϵ and exploitation with probability 1−ϵ. For a given state s, the policy πact is computed as follows:** + +⟶ Epsilon-açgözlü ― Epsilon-açgözlü politika, ϵ olasılıkla araştırmayı ve 1−ϵ olasılıkla sömürüyü dengeleyen bir algoritmadır. Her bir s durumu için, πact politikası aşağıdaki şekilde hesaplanır: + +
+ + +**96. [with probability, random from Actions(s)]** + +⟶ [olasılıkla, Actions(s) eylem kümesi içinden rastgele] + +
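An illustrative epsilon-greedy policy over a dictionary of Q-value estimates (the names and default values are assumptions of the example, not part of the original cheatsheet):

```python
import random

def epsilon_greedy(s, Q, actions, epsilon=0.1):
    """Explore with probability epsilon, otherwise exploit the current Q estimates."""
    if random.random() < epsilon:
        return random.choice(actions(s))                        # exploration
    return max(actions(s), key=lambda a: Q.get((s, a), 0.0))    # exploitation

print(epsilon_greedy("s0", {("s0", "right"): 0.5}, actions=lambda s: ["left", "right"]))
```

<br>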
+ + +**97. Game playing** + +⟶ Oyun oynama + +
+ + +**98. In games (e.g. chess, backgammon, Go), other agents are present and need to be taken into account when constructing our policy.** + +⟶ Oyunlarda (örneğin satranç, tavla, Go), başka oyuncular vardır ve politikamızı oluştururken göz önünde bulundurulması gerekir. + +
+ + +**99. Game tree ― A game tree is a tree that describes the possibilities of a game. In particular, each node is a decision point for a player and each root-to-leaf path is a possible outcome of the game.** + +⟶ Oyun ağacı ― Oyun ağacı, bir oyunun olasılıklarını tarif eden bir ağaçtır. Özellikle, her bir düğüm, oyuncu için bir karar noktasıdır ve her bir kökten (root) yaprağa (leaf) giden yol oyunun olası bir sonucudur. + +
+ + +**100. [Two-player zero-sum game ― It is a game where each state is fully observed and such that players take turns. It is defined with:, a starting state sstart, possible actions Actions(s) from state s, successors Succ(s,a) from states s with actions a, whether an end state was reached IsEnd(s), the agent's utility Utility(s) at end state s, the player Player(s) who controls state s]** + +⟶ [İki oyunculu sıfır toplamlı oyun ― Her durumun tamamen gözlendiği ve oyuncuların sırayla oynadığı bir oyundur. Aşağıdaki gibi tanımlanır:, bir başlangıç durumu sstart, s durumunda gerçekleştirilebilecek olası eylemler Actions(s), s durumunda a eylemi gerçekleştirildiğindeki ardıllar Succ(s,a), bir bitiş durumuna ulaşılıp ulaşılmadığı IsEnd(s), s bitiş durumunda etmenin elde ettiği fayda Utility(s), s durumunu kontrol eden oyuncu Player(s)] + +
+ + +**101. Remark: we will assume that the utility of the agent has the opposite sign of the one of the opponent.** + +⟶ Not: Oyuncu faydasının işaretinin, rakibinin faydasının tersi olacağını varsayacağız. + +
+ + +**102. [Types of policies ― There are two types of policies:, Deterministic policies, noted πp(s), which are actions that player p takes in state s., Stochastic policies, noted πp(s,a)∈[0,1], which are probabilities that player p takes action a in state s.]** + +⟶ [Politika türleri ― İki tane politika türü vardır:, πp(s) olarak gösterilen belirlenimci politikalar , p oyuncusunun s durumunda gerçekleştirdiği eylemler., πp(s,a)∈[0,1] olarak gösterilen olasılıksal politikalar, p oyuncusunun s durumunda a eylemini gerçekleştirme olasılıkları.] + +
+ + +**103. Expectimax ― For a given state s, the expectimax value Vexptmax(s) is the maximum expected utility of any agent policy when playing with respect to a fixed and known opponent policy πopp. It is computed as follows:** + +⟶ En yüksek beklenen değer(Expectimax) ― Belirli bir s durumu için, en yüksek beklenen değer olan Vexptmax(s), sabit ve bilinen bir rakip politikası olan πopp'a göre oynarken, bir oyuncu politikasının en yüksek beklenen faydasıdır. En yüksek beklenen değer(Expectimax) aşağıdaki gibi hesaplanmaktadır: + +
+ + +**104. Remark: expectimax is the analog of value iteration for MDPs.** + +⟶ Not: En yüksek beklenen değer(Expectimax), MDP'ler için değer yinelemenin analog halidir. + +
+ + +**105. Minimax ― The goal of minimax policies is to find an optimal policy against an adversary by assuming the worst case, i.e. that the opponent is doing everything to minimize the agent's utility. It is done as follows:** + +⟶ En küçük-en büyük (minimax) ― En küçük-enbüyük (minimax) politikaların amacı en kötü durumu kabul ederek, diğer bir deyişle; rakip, oyuncunun faydasını en aza indirmek için her şeyi yaparken, rakibe karşı en iyi politikayı bulmaktır. En küçük-en büyük(minimax) aşağıdaki şekilde yapılır: + +
+ + +**106. Remark: we can extract πmax and πmin from the minimax value Vminimax.** + +⟶ Not: πmax ve πmin değerleri, en küçük-en büyük olan Vminimax'dan elde edilebilir. + +
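A bare-bones minimax sketch over an explicit game tree (illustrative only, not from the original cheatsheet; the tree, utilities and player labels are made up for the example):

```python
def minimax(s, is_end, utility, player, actions, succ):
    """Agent ('max') maximizes, opponent ('min') minimizes the agent's utility."""
    if is_end(s):
        return utility(s)
    values = [minimax(succ(s, a), is_end, utility, player, actions, succ) for a in actions(s)]
    return max(values) if player(s) == "max" else min(values)

# toy game tree: 'max' moves at the root, 'min' at the middle layer, leaves hold utilities
tree = {"root": ["L", "R"], "L": ["LL", "LR"], "R": ["RL", "RR"]}
leaves = {"LL": 3, "LR": 5, "RL": 2, "RR": 9}

print(minimax(
    "root",
    is_end=lambda s: s in leaves,
    utility=lambda s: leaves[s],
    player=lambda s: "max" if s == "root" else "min",
    actions=lambda s: range(len(tree[s])),
    succ=lambda s, a: tree[s][a],
))   # -> 3: max plays L, since min would answer R with RL (utility 2)
```

<br>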
+ + +**107. Minimax properties ― By noting V the value function, there are 3 properties around minimax to have in mind:** + +⟶ En küçük-en büyük (minimax) özellikleri ― V değer fonksiyonunu ifade ederse, En küçük-en büyük (minimax) ile ilgili aklımızda bulundurmamız gereken 3 özellik vardır: + +
+ + +**108. Property 1: if the agent were to change its policy to any πagent, then the agent would be no better off.** + +⟶ Özellik 1: Oyuncu politikasını herhangi bir πagent ile değiştirecek olsaydı, o zaman oyuncu daha iyi olmazdı. + +
+ + +**109. Property 2: if the opponent changes its policy from πmin to πopp, then he will be no better off.** + +⟶ Özellik 2: Eğer rakip oyuncu politikasını πmin'den πopp'a değiştirecek olsaydı, o zaman rakip oyuncu daha iyi olamazdı. + +
+ + +**110. Property 3: if the opponent is known to be not playing the adversarial policy, then the minimax policy might not be optimal for the agent.** + +⟶ Özellik 3: Eğer rakip oyuncunun muhalif (adversarial) politikayı oynamadığı biliniyorsa, o zaman en küçük-en büyük(minimax) politika oyuncu için en iyi (optimal) olmayabilir. + +
<br>
+ + +**111. In the end, we have the following relationship:** + +⟶ Sonunda, aşağıda belirtildiği gibi bir ilişkiye sahip oluruz: + +
+ + +**112. Speeding up minimax** + +⟶ En küçük-en büyük (minimax) hızlandırma + +
+ + +**113. Evaluation function ― An evaluation function is a domain-specific and approximate estimate of the value Vminimax(s). It is noted Eval(s).** + +⟶ Değerlendirme işlevi ― Değerlendirme işlevi, alana özgü (domain-specific) ve Vminimax(s) değerinin yaklaşık bir tahminidir. Eval(s) olarak ifade edilmektedir. + +
+ + +**114. Remark: FutureCost(s) is an analogy for search problems.** + +⟶ Not: FutureCost(s) arama problemleri için bir benzetmedir(analogy). + +
+ + +**115. Alpha-beta pruning ― Alpha-beta pruning is a domain-general exact method optimizing the minimax algorithm by avoiding the unnecessary exploration of parts of the game tree. To do so, each player keeps track of the best value they can hope for (stored in α for the maximizing player and in β for the minimizing player). At a given step, the condition β<α means that the optimal path is not going to be in the current branch as the earlier player had a better option at their disposal.** + +⟶ Alpha-beta budama ― Alfa-beta budama, oyun ağacının parçalarının gereksiz yere keşfedilmesini önleyerek en küçük-en büyük(minimax) algoritmasını en iyileyen (optimize eden) alana-özgü olmayan genel bir yöntemdir. Bunu yapmak için, her oyuncu ümit edebileceği en iyi değeri takip eder (maksimize eden oyuncu için α'da ve minimize eden oyuncu için β'de saklanır). Belirli bir adımda, β <α koşulu, önceki oyuncunun emrinde daha iyi bir seçeneğe sahip olması nedeniyle en iyi (optimal) yolun mevcut dalda olamayacağı anlamına gelir. + +
+ + +**116. TD learning ― Temporal difference (TD) learning is used when we don't know the transitions/rewards. The value is based on exploration policy. To be able to use it, we need to know rules of the game Succ(s,a). For each (s,a,r,s′), the update is done as follows:** + +⟶ TD öğrenme ― Geçici fark (Temporal difference - TD) öğrenmesi, geçiş/ödülleri bilmediğimiz zaman kullanılır. Değer, keşif politikasına dayanır. Bunu kullanabilmek için, oyununun kurallarını,Succ (s, a), bilmemiz gerekir. Her bir (s,a,r,s′) için, güncelleme aşağıdaki şekilde yapılır: + +
+ + +**117. Simultaneous games** + +⟶ Eşzamanlı oyunlar + +
+ + +**118. This is the contrary of turn-based games, where there is no ordering on the player's moves.** + +⟶ Bu, oyuncunun hamlelerinin sıralı olmadığı sıra temelli oyunların tam tersidir. + +
+ + +**119. Single-move simultaneous game ― Let there be two players A and B, with given possible actions. We note V(a,b) to be A's utility if A chooses action a, B chooses action b. V is called the payoff matrix.** + +⟶ Tek-hamleli eşzamanlı oyun ― Olası hareketlere sahip A ve B iki oyuncu olsun. V(a,b), A'nın a eylemini ve B'nin de b eylemini seçtiği A'nın faydasını ifade eder. V, getiri dizeyi olarak adlandırılır. + +
+ + +**120. [Strategies ― There are two main types of strategies:, A pure strategy is a single action:, A mixed strategy is a probability distribution over actions:]** + +⟶ [Stratejiler ― İki tane ana strateji türü vardır:, Saf strateji, tek bir eylemdir:, Karışık strateji, eylemler üzerindeki bir olasılık dağılımıdır:] + +
+ + +**121. Game evaluation ― The value of the game V(πA,πB) when player A follows πA and player B follows πB is such that:** + +⟶ Oyun değerlendirme ― oyuncu A πA'yı ve oyuncu B de πB'yi izlediğinde, Oyun değeri V(πA,πB): + +
+ + +**122. Minimax theorem ― By noting πA,πB ranging over mixed strategies, for every simultaneous two-player zero-sum game with a finite number of actions, we have:** + +⟶ En küçük-en büyük (minimax) teoremi ― ΠA, πB’nin karma stratejilere göre değiştiğini belirterek, sonlu sayıda eylem ile eşzamanlı her iki oyunculu sıfır toplamlı oyun için: + +
+ + +**123. Non-zero-sum games** + +⟶ Sıfır toplamı olmayan oyunlar + +
+ + +**124. Payoff matrix ― We define Vp(πA,πB) to be the utility for player p.** + +⟶ Getiri matrisi ― Vp(πA,πB)'yi oyuncu p'nin faydası olarak tanımlıyoruz. + +
+ + +**125. Nash equilibrium ― A Nash equilibrium is (π∗A,π∗B) such that no player has an incentive to change its strategy. We have:** + +⟶ Nash dengesi ― Nash dengesi, hiçbir oyuncunun stratejisini değiştirmeye teşviki olmadığı bir (π∗A,π∗B) ikilisidir: + +
<br>
+ + +**126. and** + +⟶ ve + +
+ + +**127. Remark: in any finite-player game with finite number of actions, there exists at least one Nash equilibrium.** + +⟶ Not: sonlu sayıda eylem olan herhangi bir sonlu oyunculu oyunda, en azından bir tane Nash dengesi mevcuttur. + +
<br>
+ + +**128. [Tree search, Backtracking search, Breadth-first search, Depth-first search, Iterative deepening]** + +⟶ [Ağaç arama, Geri izleme araması, Genişlik öncelikli arama, Derinlik öncelikli arama, Tekrarlı (Iterative) derinleşme] + +
+ + +**129. [Graph search, Dynamic programming, Uniform cost search]** + +⟶ [Çizge arama, Dinamik programlama, Tek tip maliyet araması] + +
+ + +**130. [Learning costs, Structured perceptron]** + +⟶ [Öğrenme maliyetleri, Yapısal algılayıcı] + +
+ + +**131. [A star search, Heuristic function, Algorithm, Consistency, correctness, Admissibility, efficiency]** + +⟶ [A yıldız arama, Sezgisel işlev, Algoritma, Tutarlılık, doğruluk, kabul edilebilirlik, verimlilik] + +
+ + +**132. [Relaxation, Relaxed search problem, Relaxed heuristic, Max heuristic]** + +⟶ [Rahatlama, Rahat arama problemi, Rahat sezgisel, En yüksek sezgisel] + +
+ + +**133. [Markov decision processes, Overview, Policy evaluation, Value iteration, Transitions, rewards]** + +⟶ [Markov karar süreçleri, Genel bakış, Politika değerlendirme, Değer yineleme, Geçişler, ödüller] + +
+ + +**134. [Game playing, Expectimax, Minimax, Speeding up minimax, Simultaneous games, Non-zero-sum games]** + +⟶ [Oyun oynama, En yüksek beklenti, En küçük-en büyük, En küçük-en büyük hızlandırma, Eşzamanlı oyunlar, Sıfır toplamı olmayan oyunlar] + +
+ + +**135. View PDF version on GitHub** + +⟶ GitHub'da PDF sürümünü görüntüleyin + +
+ + +**136. Original authors** + +⟶ Asıl yazarlar + +
+ + +**137. Translated by X, Y and Z** + +⟶ X, Y ve Z tarafından tercüme edilmiştir. + +
+ + +**138. Reviewed by X, Y and Z** + +⟶ X, Y ve Z tarafından gözden geçirilmiştir. + +
<br>
+ + +**139. By X and Y** + +⟶ X ve Y ile + +
+ + +**140. The Artificial Intelligence cheatsheets are now available in [target language].** + +⟶ Yapay Zeka el kitapları artık [hedef dilde] mevcuttur. diff --git a/tr/cs-221-variables-models.md b/tr/cs-221-variables-models.md new file mode 100644 index 000000000..aac242e96 --- /dev/null +++ b/tr/cs-221-variables-models.md @@ -0,0 +1,617 @@ +**Variables-based models translation** [[webpage]](https://stanford.edu/~shervine/teaching/cs-221/cheatsheet-variables-models) + +
+ +**1. Variables-based models with CSP and Bayesian networks** + +⟶ CSP ile değişken-temelli modeller ve Bayesçi ağlar + +
<br>
+ + +**2. Constraint satisfaction problems** + +⟶ Kısıt memnuniyet problemleri + +
<br>
+ + +**3. In this section, our objective is to find maximum weight assignments of variable-based models. One advantage compared to states-based models is that these algorithms are more convenient to encode problem-specific constraints.** + +⟶ Bu bölümde hedefimiz değişken-temelli modellerin maksimum ağırlık atamalarını bulmaktır. Durum temelli modellerle kıyaslandığında, bu algoritmaların probleme özgü kısıtları kodlamak için daha uygun olmaları bir avantajdır. + +
<br>
+ + +**4. Factor graphs** + +⟶ 4. Faktör grafikleri + +
+ + +**5. Definition ― A factor graph, also referred to as a Markov random field, is a set of variables X=(X1,...,Xn) where Xi∈Domaini and m factors f1,...,fm with each fj(X)⩾0.** + +⟶5. Tanımlama - Markov rasgele alanı olarak da adlandırılan faktör grafiği, Xi∈Domaini ve herbir fj(X)⩾0 olan f1,...,fm m faktör olmak üzere X=(X1,...,Xn) değişkenler kümesidir. + +
+ + +**6. Domain** + +⟶ 6. Etki Alanı (Domain) + +
+ + +**7. Scope and arity ― The scope of a factor fj is the set of variables it depends on. The size of this set is called the arity.** + +⟶ 7. Kapsam ve ilişki derecesi - Fj faktörünün kapsamı, dayandığı değişken kümesidir. Bu kümenin boyutuna ilişki derecesi (arity) denir. + +
+ + +**8. Remark: factors of arity 1 and 2 are called unary and binary respectively.** + +⟶ 8. Not: Faktörlerin ilişki derecesi 1 ve 2 olanlarına sırasıyla tek ve ikili denir. + +
+ + +**9. Assignment weight ― Each assignment x=(x1,...,xn) yields a weight Weight(x) defined as being the product of all factors fj applied to that assignment. Its expression is given by:** + +⟶9. Atama ağırlığı - Her atama x = (x1, ..., xn), o atamaya uygulanan tüm faktörlerin çarpımı olarak tanımlanan bir Ağırlık (x) ağırlığı verir.Şöyle ifade edilir: + +
+ + +**10. Constraint satisfaction problem ― A constraint satisfaction problem (CSP) is a factor graph where all factors are binary; we call them to be constraints:** + +⟶ 10. Kısıt memnuniyet problemi - Kısıtlama memnuniyet problemi (constraint satisfaction problem-CSP), tüm faktörlerin ikili olduğu bir faktör grafiğidir; bunları kısıt olarak adlandırıyoruz: + +
+ + +**11. Here, the constraint j with assignment x is said to be satisfied if and only if fj(x)=1.** + +⟶11.Burada, j kısıtlı x ataması ancak ve ancak fj(x)=1 olduğunda uygundur (satisfied) denir. + +
+ + +**12. Consistent assignment ― An assignment x of a CSP is said to be consistent if and only if Weight(x)=1, i.e. all constraints are satisfied.** + +⟶ 12.Tutarlı atama - Bir CSP'nin bir x atamasının, yalnızca Ağırlık (x) = 1 olduğunda, yani tüm kısıtların yerine getirilmesi durumunda tutarlı olduğu söylenir. + +
+ + +**13. Dynamic ordering** + +⟶ 13. Dinamik düzenleşim (Dynamic ordering) + +
+ + +**14. Dependent factors ― The set of dependent factors of variable Xi with partial assignment x is called D(x,Xi), and denotes the set of factors that link Xi to already assigned variables.** + +⟶ 14. Bağımlı faktörler - x kısmi ataması verilen Xi değişkeninin bağımlı faktörlerinin kümesi D(x,Xi) olarak adlandırılır ve Xi'yi önceden atanmış değişkenlere bağlayan faktörlerin kümesini belirtir. + +
+ + +**15. Backtracking search ― Backtracking search is an algorithm used to find maximum weight assignments of a factor graph. At each step, it chooses an unassigned variable and explores its values by recursion. Dynamic ordering (i.e. choice of variables and values) and lookahead (i.e. early elimination of inconsistent options) can be used to explore the graph more efficiently, although the worst-case runtime stays exponential: O(|Domain|n).** + +⟶ 15. Geri izleme araması - Geri izleme araması, bir faktör grafiğinin maksimum ağırlık atamalarını bulmak için kullanılan bir algoritmadır. Her adımda, atanmamış bir değişken seçer ve değerlerini özyineleme ile arar. Dinamik düzenleşim (yani değişkenlerin ve değerlerin seçimi) ve ileriye bakış (lookahead, yani tutarsız seçeneklerin erken elenmesi), en kötü durum çalışma süresi üssel (O(|Domain|n)) kalsa da, grafiği daha verimli aramak için kullanılabilir. + +
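To make the procedure above concrete, here is a minimal Python sketch of backtracking search for the maximum-weight assignment of a small factor graph; the variables, domains and factors are made-up illustrations, not part of the original cheatsheet.

```python
# Illustrative factor graph: hypothetical variables, domains and factors.
domains = {"X1": [0, 1], "X2": [0, 1], "X3": [0, 1]}
factors = [
    (("X1", "X2"), lambda a, b: 1.0 if a != b else 0.2),
    (("X2", "X3"), lambda a, b: 1.0 if a == b else 0.5),
]

def weight(assignment):
    """Product of all factors whose scope is fully assigned."""
    w = 1.0
    for scope, f in factors:
        if all(v in assignment for v in scope):
            w *= f(*(assignment[v] for v in scope))
    return w

def backtrack(assignment, remaining, best):
    if not remaining:
        w = weight(assignment)
        if w > best[0]:
            best[0], best[1] = w, dict(assignment)
        return
    x = remaining[0]                    # static ordering; dynamic ordering would choose here
    for v in domains[x]:
        assignment[x] = v
        if weight(assignment) > 0:      # lookahead: prune partial assignments of weight 0
            backtrack(assignment, remaining[1:], best)
        del assignment[x]

best = [0.0, None]
backtrack({}, list(domains), best)
print(best)                             # e.g. [1.0, {'X1': 0, 'X2': 1, 'X3': 1}]
```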
+ + +**16. [Forward checking ― It is a one-step lookahead heuristic that preemptively removes inconsistent values from the domains of neighboring variables. It has the following characteristics:, After assigning a variable Xi, it eliminates inconsistent values from the domains of all its neighbors., If any of these domains becomes empty, we stop the local backtracking search., If we un-assign a variable Xi, we have to restore the domain of its neighbors.]** + +⟶ 16. [İleri kontrol - Tutarsız değerleri komşu değişkenlerin etki alanlarından önceden kaldıran tek adımlı bir ileriye bakış sezgiselidir. Aşağıdaki özelliklere sahiptir:, Bir Xi değişkenini atadıktan sonra, tüm komşularının etki alanlarından tutarsız değerleri eler., Bu etki alanlarından herhangi biri boş olursa, yerel geri izleme araması durdurulur., Bir Xi değişkeninin ataması geri alınırsa, komşularının etki alanları eski haline getirilmelidir.] + +
+ + +**17. Most constrained variable ― It is a variable-level ordering heuristic that selects the next unassigned variable that has the fewest consistent values. This has the effect of making inconsistent assignments to fail earlier in the search, which enables more efficient pruning.** + +⟶ 17. En kısıtlı değişken - En az tutarlı değere sahip bir sonraki atanmamış değişkeni seçen, değişken seviyeli sezgisel düzenleşimdir. Bu, daha verimli budama olanağı sağlayan aramada daha önce başarısız olmak için tutarsız atamalar yapma etkisine sahiptir. + +
+ + +**18. Least constrained value ― It is a value-level ordering heuristic that assigns the next value that yields the highest number of consistent values of neighboring variables. Intuitively, this procedure chooses first the values that are most likely to work.** + +⟶ 18. En düşük kısıtlı değer - Komşu değişkenlerin en yüksek tutarlı değerlerini elde ederek bir sonrakini veren seviye düzenleyici sezgisel bir değerdir. Sezgisel olarak, bu prosedür önce çalışması en muhtemel olan değerleri seçer. + +
+ + +**19. Remark: in practice, this heuristic is useful when all factors are constraints.** + +⟶ 19. Not: Uygulamada, bu sezgisel yaklaşım tüm faktörler kısıtlı olduğunda kullanışlıdır. + +
+ + +**20. The example above is an illustration of the 3-color problem with backtracking search coupled with most constrained variable exploration and least constrained value heuristic, as well as forward checking at each step.** + +⟶ 20. Yukarıdaki örnek, en kısıtlı değişken keşfi ve sezgisel en düşük kısıtlı değerin yanı sıra, her adımda ileri kontrol ile birleştirilmiş geri izleme arama ile 3 renk probleminin bir gösterimidir. + +
+ + +**21. [Arc consistency ― We say that arc consistency of variable Xl with respect to Xk is enforced when for each xl∈Domainl:, unary factors of Xl are non-zero, there exists at least one xk∈Domaink such that any factor between Xl and Xk is non-zero.]** + +⟶ 21. [Ark tutarlılığı (Arc consistency) - Xl değişkeninin Xk'ye göre ark tutarlılığının, her bir xl∈Domainl için şu koşullar sağlandığında geçerli kılındığı söylenir:, Xl'in tekli (unary) faktörleri sıfır değildir, Xl ve Xk arasındaki herhangi bir faktörün sıfır olmadığı en az bir xk∈Domaink vardır.] + +
+ + +**22. AC-3 ― The AC-3 algorithm is a multi-step lookahead heuristic that applies forward checking to all relevant variables. After a given assignment, it performs forward checking and then successively enforces arc consistency with respect to the neighbors of variables for which the domain change during the process.** + +⟶ 22. AC-3 - AC-3 algoritması, tüm ilgili değişkenlere ileri kontrol uygulayan çok adımlı sezgisel bir bakış açısıdır. Belirli bir görevden sonra ileriye doğru kontrol yapar ve ardından işlem sırasında etki alanının değiştiği değişkenlerin komşularına göre ark tutarlılığını ardı ardına uygular. + +
+ + +**23. Remark: AC-3 can be implemented both iteratively and recursively.** + +⟶ 23. Not: AC-3, tekrarlı ve özyinelemeli olarak uygulanabilir. + +
+ + +**24. Approximate methods** + +⟶24. Yaklaşık yöntemler (Approximate methods) + +
+ + +**25. Beam search ― Beam search is an approximate algorithm that extends partial assignments of n variables of branching factor b=|Domain| by exploring the K top paths at each step. The beam size K∈{1,...,bn} controls the tradeoff between efficiency and accuracy. This algorithm has a time complexity of O(n⋅Kblog(Kb)).** + +⟶ 25. Işın araması (Beam search) - Işın araması, her adımda en iyi K yolu keşfederek, b=|Domain| dallanma faktörlü n değişkenin kısmi atamalarını genişleten yaklaşık bir algoritmadır. Işın boyutu K∈{1,...,bn}, verimlilik ile doğruluk arasındaki ödünleşimi kontrol eder. Bu algoritmanın zaman karmaşıklığı O(n⋅Kblog(Kb))'dir. + +
+ + +**26. The example below illustrates a possible beam search of parameters K=2, b=3 and n=5.** + +⟶ 26. Aşağıdaki örnek, K = 2, b = 3 ve n = 5 parametreleri ile muhtemel ışın aramasını (beam search) göstermektedir. + +
+ + +**27. Remark: K=1 corresponds to greedy search whereas K→+∞ is equivalent to BFS tree search.** + +⟶ 27. Not: K = 1 açgözlü aramaya (greedy search) karşılık gelirken K → + ∞, BFS ağaç aramasına eşdeğerdir. + +
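The following Python sketch illustrates the beam-search idea described above; `partial_weight` is an assumed helper returning the product of the factors that are fully assigned, so this is a sketch under that assumption rather than a definitive implementation.

```python
import heapq

def beam_search(variables, domains, partial_weight, K):
    """Keep the K best partial assignments while assigning one variable at a time."""
    beam = [{}]                                        # start from the empty assignment
    for x in variables:
        candidates = []
        for assignment in beam:
            for v in domains[x]:
                extended = {**assignment, x: v}
                candidates.append((partial_weight(extended), extended))
        # prune: keep only the K highest-weight extensions (K=1 is greedy search)
        beam = [a for _, a in heapq.nlargest(K, candidates, key=lambda t: t[0])]
    return max(beam, key=partial_weight)
```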
+ + +**28. Iterated conditional modes ― Iterated conditional modes (ICM) is an iterative approximate algorithm that modifies the assignment of a factor graph one variable at a time until convergence. At step i, we assign to Xi the value v that maximizes the product of all factors connected to that variable.** + +⟶28. Tekrarlanmış koşullu modlar - Tekrarlanmış koşullu modlar (Iterated conditional modes-ICM), yakınsamaya kadar bir seferde bir değişkenli bir faktör grafiğinin atanmasını değiştiren yinelemeli bir yaklaşık algoritmadır. İ adımında, Xi'ye, bu değişkene bağlı tüm faktörlerin çarpımını maksimize eden v değeri atanır. + +
+ + +**29. Remark: ICM may get stuck in local minima.** + +⟶ 29. Not: ICM yerel minimumda takılıp kalabilir. + +
+ + +**30. [Gibbs sampling ― Gibbs sampling is an iterative approximate method that modifies the assignment of a factor graph one variable at a time until convergence. At step i:, we assign to each element u∈Domaini a weight w(u) that is the product of all factors connected to that variable, we sample v from the probability distribution induced by w and assign it to Xi.]** + +⟶ 30. [Gibbs örneklemesi - Gibbs örneklemesi, yakınsamaya kadar bir seferde bir değişken grafik faktörünün atanmasını değiştiren yinelemeli bir yaklaşık yöntemdir. İ adımında, her bir u∈Domain olan öğeye , bu değişkene bağlı tüm faktörlerin çarpımı olan bir ağırlık w (u) atanır, v'yi w tarafından indüklenen olasılık dağılımından örnek alır ve Xi'ye atanır.] + +
+ + +**31. Remark: Gibbs sampling can be seen as the probabilistic counterpart of ICM. It has the advantage to be able to escape local minima in most cases.** + +⟶ 31. Not: Gibbs örneklemesi, ICM'nin olasılıksal karşılığı olarak görülebilir. Çoğu durumda yerel minimumlardan kaçabilme avantajına sahiptir. + +
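A small Python sketch of one Gibbs-sampling sweep over the variables may help; `connected_factor_product(assignment, x)` is an assumed helper returning the product of the factors connected to x, so the snippet only sketches the mechanics under that assumption.

```python
import random

def gibbs_sweep(assignment, variables, domains, connected_factor_product):
    """Resample each variable from the distribution induced by its connected factors."""
    for x in variables:
        weights = []
        for u in domains[x]:
            assignment[x] = u
            weights.append(connected_factor_product(assignment, x))
        # sample v proportionally to w(u) and assign it to x
        assignment[x] = random.choices(domains[x], weights=weights)[0]
    return assignment
```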
+ + +**32. Factor graph transformations** + +⟶ 32. Faktör grafiği dönüşümleri + +
+ + +**33. Independence ― Let A,B be a partitioning of the variables X. We say that A and B are independent if there are no edges between A and B and we write:** + +⟶ 33. Bağımsızlık - A, B, X değişkenlerinin bir bölümü olsun. A ve B arasında kenar yoksa A ve B'nin bağımsız olduğu söylenir ve şöyle ifade edilir: + +
+ + +**34. Remark: independence is the key property that allows us to solve subproblems in parallel.** + +⟶ 34. Not: bağımsızlık, alt sorunları paralel olarak çözmemize olanak sağlayan bir kilit özelliktir. + +
+ + +**35. Conditional independence ― We say that A and B are conditionally independent given C if conditioning on C produces a graph in which A and B are independent. In this case, it is written:** + +⟶ 35. Koşullu bağımsızlık - Eğer C'nin şartlandırılması, A ve B'nin bağımsız olduğu bir grafik üretiyorsa A ve B verilen C koşulundan bağımsızdır. Bu durumda şöyle yazılır: + +
+ + +**36. [Conditioning ― Conditioning is a transformation aiming at making variables independent that breaks up a factor graph into smaller pieces that can be solved in parallel and can use backtracking. In order to condition on a variable Xi=v, we do as follows:, Consider all factors f1,...,fk that depend on Xi, Remove Xi and f1,...,fk, Add gj(x) for j∈{1,...,k} defined as:]** + +⟶ 36. [Koşullandırma - Koşullandırma, bir faktör grafiğini paralel olarak çözülebilen ve geriye doğru izlemeyi kullanabilen daha küçük parçalara bölen değişkenleri bağımsız kılmayı amaçlayan bir dönüşümdür. Xi = v değişkeninde koşullandırmak için aşağıdakileri yaparız: Xi'ye bağlı tüm f1, ..., fk faktörlerini göz önünde bulundurun, Xi ve f1, ..., fk öğelerini kaldırın, j∈ {1, ..., k} için gj (x) ekleyin:] + +
+ + +**37. Markov blanket ― Let A⊆X be a subset of variables. We define MarkovBlanket(A) to be the neighbors of A that are not in A.** + +⟶ 37. Markov blanket - A⊆X değişkenlerin bir alt kümesi olsun. MarkovBlanket'i (A), A'da olmayan A'nın komşuları olarak tanımlıyoruz. + +
+ + +**38. Proposition ― Let C=MarkovBlanket(A) and B=X∖(A∪C). Then we have:** + +⟶ 38. Önerme - C=MarkovBlanket(A) ve B=X∖(A∪C) olsun. Bu durumda: + +
+ + +**39. [Elimination ― Elimination is a factor graph transformation that removes Xi from the graph and solves a small subproblem conditioned on its Markov blanket as follows:, Consider all factors fi,1,...,fi,k that depend on Xi, Remove Xi +and fi,1,...,fi,k, Add fnew,i(x) defined as:]** + +⟶ 39. [Eliminasyon - Eliminasyon, Xi'yi grafikten ayıran ve Markov blanket de şartlandırılmış küçük bir alt sorunu çözen bir faktör grafiği dönüşümüdür: Xi'ye bağlı tüm fi, 1, ..., fi, k faktörlerini göz önünde bulundurun, Xi ve fi, 1, ..., fi, k, kaldır, fnew ekleyin, i (x) şöyle tanımlanır:] + +
+ + +**40. Treewidth ― The treewidth of a factor graph is the maximum arity of any factor created by variable elimination with the best variable ordering. In other words,** + +⟶ 40. Ağaç genişliği (Treewidth) - Bir faktör grafiğinin ağaç genişliği, değişken elemeli en iyi değişken sıralamasıyla oluşturulan herhangi bir faktörün maksimum ilişki derecesidir. Diğer bir deyişle, + +
+ + +**41. The example below illustrates the case of a factor graph of treewidth 3.** + +⟶ 41. Aşağıdaki örnek, ağaç genişliği 3 olan faktör grafiğini gösterir. + +
+ + +**42. Remark: finding the best variable ordering is a NP-hard problem.** + +⟶ 42. Not: en iyi değişken sıralamasını bulmak NP-zor (NP-hard) bir problemdir. + +
+ + +**43. Bayesian networks** + +⟶ 43. Bayesçi ağlar + +
+ + +**44. In this section, our goal will be to compute conditional probabilities. What is the probability of a query given evidence?** + +⟶ 44. Bu bölümde amacımız koşullu olasılıkları hesaplamak olacaktır. Kanıt verildiğinde bir sorgunun olasılığı nedir? + +
+ + +**45. Introduction** + +⟶ 45. Giriş + +
+ + +**46. Explaining away ― Suppose causes C1 and C2 influence an effect E. Conditioning on the effect E and on one of the causes (say C1) changes the probability of the other cause (say C2). In this case, we say that C1 has explained away C2.** + +⟶ 46. Açıklayarak eleme (Explaining away) - C1 ve C2 sebeplerinin bir E etkisini etkilediğini varsayalım. E etkisi ve sebeplerden biri (örneğin C1) üzerinde koşullandırma, diğer sebebin (örneğin C2) olasılığını değiştirir. Bu durumda, C1'in C2'yi açıklayarak elediği söylenir. + +
+ + +**47. Directed acyclic graph ― A directed acyclic graph (DAG) is a finite directed graph with no directed cycles.** + +⟶47. Yönlü çevrimsiz çizge - Yönlü çevrimsiz bir çizge (Directed acyclic graph-DAG), yönlendirilmiş çevrimleri olmayan sonlu bir yönlü çizgedir. + +
+ + +**48. Bayesian network ― A Bayesian network is a directed acyclic graph (DAG) that specifies a joint distribution over random variables X=(X1,...,Xn) as a product of local conditional distributions, one for each node:** + +⟶48. Bayesçi ağ - Her düğüm için bir tane olmak üzere, yerel koşullu dağılımların bir çarpımı olarak, X = (X1, ..., Xn) rasgele değişkenleri üzerindeki bir ortak dağılımı belirten yönlü bir çevrimsiz çizgedir: + +
+ + +**49. Remark: Bayesian networks are factor graphs imbued with the language of probability.** + +⟶ 49. Not: Bayesçi ağlar olasılık diliyle bütünleşik faktör grafikleridir. + +
+ + +**50. Locally normalized ― For each xParents(i), all factors are local conditional distributions. Hence they have to satisfy:** + +⟶ 50. Yerel olarak normalleştirilmiş - Her xParents (i) için tüm faktörler yerel koşullu dağılımlardır. Bu nedenle yerine getirmek zorundalar: + +
+ + +**51. As a result, sub-Bayesian networks and conditional distributions are consistent.** + +⟶51. Sonuç olarak, alt-Bayesçi ağlar ve koşullu dağılımlar tutarlıdır. + +
+ + +**52. Remark: local conditional distributions are the true conditional distributions.** + +⟶ 52. Not: Yerel koşullu dağılımlar gerçek koşullu dağılımlardır. + +
+ + +**53. Marginalization ― The marginalization of a leaf node yields a Bayesian network without that node.** + +⟶ 53. Marjinalleşme - Bir yaprak düğümünün marjinalleşmesi, o düğüm olmaksızın bir Bayesçi ağı sağlar. + +
+ + +**54. Probabilistic programs** + +⟶ 54. Olasılık programları + +
+ + +**55. Concept ― A probabilistic program randomizes variables assignment. That way, we can write down complex Bayesian networks that generate assignments without us having to explicitly specify associated probabilities.** + +⟶ 55. Konsept - Olasılıklı bir program değişkenlerin atanmasını randomize eder. Bu şekilde, ilişkili olasılıkları açıkça belirtmek zorunda kalmadan atamalar üreten karmaşık Bayesçi ağlar yazılabilir. + +
+ + +**56. Remark: examples of probabilistic programs include Hidden Markov model (HMM), factorial HMM, naive Bayes, latent Dirichlet allocation, diseases and symptoms and stochastic block models.** + +⟶ 56. Not: Olasılık programlarına örnekler arasında Gizli Markov modeli (Hidden Markov model-HMM), faktöriyel HMM, naif Bayes (naive Bayes), gizli Dirichlet tahsisi (latent Dirichlet allocation), hastalıklar ve semptomları belirtirler ve stokastik blok modelleri bulunmaktadır. + +
+ + +**57. Summary ― The table below summarizes the common probabilistic programs as well as their applications:** + +⟶ 57. Özet - Aşağıdaki tablo, ortak olasılıklı programları ve bunların uygulamalarını özetlemektedir: + +
+ + +**58. [Program, Algorithm, Illustration, Example]** + +⟶ 58. [Program, Algoritma, Gösterim, Örnek] + +
+ + +**59. [Markov Model, Hidden Markov Model (HMM), Factorial HMM, Naive Bayes, Latent Dirichlet Allocation (LDA)]** + +⟶ 59. [Markov Modeli, Gizli Markov Modeli (HMM), Faktöriyel HMM, Naif Bayes, Gizli Dirichlet Tahsisi (Latent Dirichlet Allocation-LDA)] + +
+ + +**60. [Generate, distribution]** + +⟶ 60. [Üretim, Dağılım] + +
+ + +**61. [Language modeling, Object tracking, Multiple object tracking, Document classification, Topic modeling]** + +⟶ 61. [Dil modelleme, Nesne izleme, Çoklu nesne izleme, Belge sınıflandırma, Konu modelleme] + +
+ + +**62. Inference** + +⟶ 62. Çıkarım + +
+ + +**63. [General probabilistic inference strategy ― The strategy to compute the probability P(Q|E=e) of query Q given evidence E=e is as follows:, Step 1: Remove variables that are not ancestors of the query Q or the evidence E by marginalization, Step 2: Convert Bayesian network to factor graph, Step 3: Condition on the evidence E=e, Step 4: Remove nodes disconnected from the query Q by marginalization, Step 5: Run a probabilistic inference algorithm (manual, variable elimination, Gibbs sampling, particle filtering)]** + +⟶ 63. [Genel olasılıksal çıkarım stratejisi - E = e kanıtı verilen Q sorgusunun P (Q | E = e) olasılığını hesaplama stratejisi aşağıdaki gibidir : Adım 1: Q sorgusunun ataları olmayan değişkenlerini ya da marjinalleştirme yoluyla E kanıtını silin, Adım 2: Bayesçi ağı faktör grafiğine dönüştürün, Adım 3: Kanıtın koşulu E = e, Adım 4: Q sorgusu ile bağlantısı kesilen düğümleri marjinalleştirme yoluyla silin, Adım 5: Olasılıklı bir çıkarım algoritması çalıştırın (kılavuz, değişken eleme, Gibbs örneklemesi, parçacık filtreleme)] + +
+ + +**64. Forward-backward algorithm ― This algorithm computes the exact value of P(H=hk|E=e) (smoothing query) for any k∈{1,...,L} in the case of an HMM of size L. To do so, we proceed in 3 steps:** + +⟶ 64. İleri-geri algoritma - Bu algoritma, L boyutunda bir HMM durumunda herhangi bir k∈ {1, ..., L} için P (H = hk | E = e) (düzeltme sorgusu) değerini hesaplar. Bunu yapmak için 3 adımda ilerlenir: + +
+ + +**65. Step 1: for ..., compute ...** + +⟶ 65. Adım 1: ... için (for), hesapla ... + +
+ + +**66. with the convention F0=BL+1=1. From this procedure and these notations, we get that** + +⟶ 66. F0 = BL + 1 = 1 kuralı ile. Bu prosedürden ve bu notasyonlardan anlıyoruz ki + +
+ + +**67. Remark: this algorithm interprets each assignment to be a path where each edge hi−1→hi is of weight p(hi|hi−1)p(ei|hi).** + +⟶ 67. Not: bu algoritma, her bir atamada her bir kenarın hi − 1 → hi'nin p (hi | hi − 1) p (ei | hi) olduğu bir yol olduğunu yorumlar. + +
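For illustration only, here is a Python sketch of the forward-backward computation for a toy HMM; the dictionaries `start`, `trans` and `emit` (initial, transition and emission probabilities) are assumed inputs chosen for this sketch.

```python
def forward_backward(states, evidence, start, trans, emit):
    """Return the smoothing posteriors P(H_t = h | E = e) for every time step t."""
    L = len(evidence)
    # forward messages F_t(h)
    F = [{h: start[h] * emit[h][evidence[0]] for h in states}]
    for t in range(1, L):
        F.append({h: emit[h][evidence[t]] *
                     sum(F[t - 1][hp] * trans[hp][h] for hp in states) for h in states})
    # backward messages B_t(h), with B_{L-1} = 1 by convention
    B = [{h: 1.0 for h in states}]
    for t in range(L - 2, -1, -1):
        B.insert(0, {h: sum(trans[h][hn] * emit[hn][evidence[t + 1]] * B[0][hn]
                            for hn in states) for h in states})
    posteriors = []
    for t in range(L):
        unnorm = {h: F[t][h] * B[t][h] for h in states}
        Z = sum(unnorm.values())
        posteriors.append({h: unnorm[h] / Z for h in states})
    return posteriors
```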
+ + +**68. [Gibbs sampling ― This algorithm is an iterative approximate method that uses a small set of assignments (particles) to represent a large probability distribution. From a random assignment x, Gibbs sampling performs the following steps for i∈{1,...,n} until convergence:, For all u∈Domaini, compute the weight w(u) of assignment x where Xi=u, Sample v from the probability distribution induced by w: v∼P(Xi=v|X−i=x−i), Set Xi=v]** + +⟶ 68. [Gibbs örneklemesi - Bu algoritma, büyük bir olasılık dağılımını temsil etmek için küçük bir atama (parçacık) kümesi kullanan tekrarlı bir yaklaşık yöntemdir. Rasgele bir x atamasından başlayarak Gibbs örneklemesi, i∈{1,...,n} için yakınsamaya kadar aşağıdaki adımları uygular:, Tüm u∈Domaini için, Xi=u olacak şekilde x atamasının w(u) ağırlığını hesaplayın, w tarafından indüklenen olasılık dağılımından v'yi örnekleyin: v∼P(Xi=v|X−i=x−i), Xi=v olarak ayarlayın] + +
+ + +**69. Remark: X−i denotes X∖{Xi} and x−i represents the corresponding assignment.** + +⟶ 69. Not: X − i, X ∖ {Xi} ve x − i, karşılık gelen atamayı temsil eder. + +
+ + +**70. [Particle filtering ― This algorithm approximates the posterior density of state variables given the evidence of observation variables by keeping track of K particles at a time. Starting from a set of particles C of size K, we run the following 3 steps iteratively:, Step 1: proposal - For each old particle xt−1∈C, sample x from the transition probability distribution p(x|xt−1) and add x to a set C′., Step 2: weighting - Weigh each x of the set C′ by w(x)=p(et|x), where et is the evidence observed at time t., Step 3: resampling - Sample K elements from the set C′ using the probability distribution induced by w and store them in C: these are the current particles xt.]** + +⟶70. [Parçacık filtreleme - Bu algoritma, bir seferde K parçacıklarını takip ederek gözlem değişkenlerinin kanıtı olarak verilen durum değişkenlerinin önceki yoğunluğuna yaklaşır.K boyutunda bir C parçacığı kümesinden başlayarak, aşağıdaki 3 adım tekrarlı olarak çalıştırılır: Adım 1: teklif - Her eski parçacık xt − 1∈C için, geçiş olasılığı dağılımından p (x | xt − 1) örnek x'i alın ve C ′ye ekleyin. Adım 2: ağırlıklandırma - C ′nin her x değerini w (x) = p (et | x) ile ağırlıklandırın, burada et t zamanında gözlemlenen kanıttır, Adım 3: yeniden örnekleme - w ile indüklenen olasılık dağılımını kullanarak C kümesinden örnek K elemanlarını C cinsinden saklayın: bunlar şuanki xt parçacıklarıdır.] + +
+ + +**71. Remark: a more expensive version of this algorithm also keeps track of past particles in the proposal step.** + +⟶ 71. Not: Bu algoritmanın daha pahalı bir versiyonu da teklif adımındaki geçmiş katılımcıların kaydını tutar. + +
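A minimal Python sketch of one propose / weight / resample step may clarify the three stages above; `sample_transition(x)` and `emission_prob(e, x)` are hypothetical helpers standing in for the transition and emission models.

```python
import random

def particle_filter_step(particles, evidence_t, sample_transition, emission_prob):
    # Step 1: proposal - move each particle through the transition model
    proposals = [sample_transition(x) for x in particles]
    # Step 2: weighting - weigh each proposal by the likelihood of the observed evidence
    weights = [emission_prob(evidence_t, x) for x in proposals]
    # Step 3: resampling - draw K particles proportionally to their weights
    K = len(particles)
    return random.choices(proposals, weights=weights, k=K)
```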
+ + +**72. Maximum likelihood ― If we don't know the local conditional distributions, we can learn them using maximum likelihood.** + +⟶ 72. Maksimum olabilirlik - Yerel koşullu dağılımları bilmiyorsak, maksimum olasılık kullanarak bunları öğrenebiliriz. + +
+ + +**73. Laplace smoothing ― For each distribution d and partial assignment (xParents(i),xi), add λ to countd(xParents(i),xi), then normalize to get probability estimates.** + +⟶ 73. Laplace yumuşatma - Her d dağılımı ve (xParents (i), xi) kısmi ataması için, countd(xParents (i), xi)'a λ ekleyin, ardından olasılık tahminlerini almak için normalleştirin. + +
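The add-λ estimate described above can be sketched in a few lines of Python; the count data in the usage line is a made-up toy example.

```python
def laplace_estimates(counts, values, lam=1.0):
    """Add lam to each count, then normalize per parent assignment to get probabilities."""
    parents = {p for (p, _) in counts}
    probs = {}
    for p in parents:
        smoothed = {v: counts.get((p, v), 0) + lam for v in values}
        total = sum(smoothed.values())
        probs[p] = {v: c / total for v, c in smoothed.items()}
    return probs

# toy usage: counts of x_i values given one parent value
print(laplace_estimates({("rain", "wet"): 3, ("rain", "dry"): 1}, ["wet", "dry"]))
```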
+ + +**74. Algorithm ― The Expectation-Maximization (EM) algorithm gives an efficient method at estimating the parameter θ through maximum likelihood estimation by repeatedly constructing a lower-bound on the likelihood (E-step) and optimizing that lower bound (M-step) as follows:** + +⟶ 74. Algoritma - Beklenti-Maksimizasyon (EM) algoritması, olasılığa art arda bir alt sınır oluşturarak (E-adım) tekrarlayarak ve bu alt sınırın (M-adımını) optimize ederek θ parametresini maksimum olasılık tahmini ile tahmin etmede aşağıdaki gibi etkin bir yöntem sunar : + +
+ + +**75. [E-step: Evaluate the posterior probability q(h) that each data point e came from a particular cluster h as follows:, M-step: Use the posterior probabilities q(h) as cluster specific weights on data points e to determine θ through maximum likelihood.]** + +⟶ 75. [E-adım: Her bir e veri noktasının belirli bir h kümesinden gelme sonsal olasılığı q(h)'yi şu şekilde değerlendirin:, M-adım: θ'yı maksimum olabilirlik yoluyla belirlemek için sonsal olasılıklar q(h)'yi e veri noktaları üzerinde kümeye özgü ağırlıklar olarak kullanın.] + +
+ + +**76. [Factor graphs, Arity, Assignment weight, Constraint satisfaction problem, Consistent assignment]** + +⟶ 76. [Faktör grafikleri, İlişki Derecesi, Atama ağırlığı, Kısıt memnuniyet sorunu, Tutarlı atama] + +
+ + +**77. [Dynamic ordering, Dependent factors, Backtracking search, Forward checking, Most constrained variable, Least constrained value]** + +⟶ 77. [Dinamik düzenleşim, Bağımlı faktörler, Geri izleme araması, İleriye dönük kontrol, En kısıtlı değişken, En düşük kısıtlanmış değer] + +
+ + +**78. [Approximate methods, Beam search, Iterated conditional modes, Gibbs sampling]** + +⟶ 78. [Yaklaşık yöntemler, Işın arama , Tekrarlı koşullu modlar, Gibbs örneklemesi] + +
+ + +**79. [Factor graph transformations, Conditioning, Elimination]** + +⟶ 79. [Faktör grafiği dönüşümleri, Koşullandırma, Eleme] + +
+ + +**80. [Bayesian networks, Definition, Locally normalized, Marginalization]** + +⟶ 80. [Bayesçi ağlar, Tanım, Yerel normalleştirme, Marjinalleşme] + +
+ + +**81. [Probabilistic program, Concept, Summary]** + +⟶ 81. [Olasılık programı, Kavram, Özet] + +
+ + +**82. [Inference, Forward-backward algorithm, Gibbs sampling, Laplace smoothing]** + +⟶ 82. [Çıkarım, İleri-geri algoritması, Gibbs örneklemesi, Laplace yumuşatması] + +
+ + +**83. View PDF version on GitHub** + +⟶ 83. GitHub'da PDF versiyonun görüntüleyin + +
+ + +**84. Original authors** + +⟶ 84. Orijinal yazarlar + +
+ + +**85. Translated by X, Y and Z** + +⟶ 85. X, Y ve Z tarafından çevrilmiştir. + +
+ + +**86. Reviewed by X, Y and Z** + +⟶ 86. X,Y,Z tarafından kontrol edilmiştir. + +
+ + +**87. By X and Y** + +⟶ 87. X ve Y ile + +
+ + +**88. The Artificial Intelligence cheatsheets are now available in [target language].** + +⟶88. Yapay Zeka el kitapları artık [hedef dilde] mevcuttur. diff --git a/tr/cheatsheet-deep-learning.md b/tr/cs-229-deep-learning.md similarity index 92% rename from tr/cheatsheet-deep-learning.md rename to tr/cs-229-deep-learning.md index da5226222..7c8b3e29e 100644 --- a/tr/cheatsheet-deep-learning.md +++ b/tr/cs-229-deep-learning.md @@ -24,7 +24,7 @@ **5. [Input layer, hidden layer, output layer]** -⟶ [Giriş katmanı, gizli katman, ürün katmanı] +⟶ [Giriş katmanı, gizli katman, çıkış katmanı]
@@ -60,7 +60,7 @@ **11. Learning rate ― The learning rate, often noted α or sometimes η, indicates at which pace the weights get updated. This can be fixed or adaptively changed. The current most popular method is called Adam, which is a method that adapts the learning rate.** -⟶ Öğrenme derecesi ― Öğrenme derecesi, sıklıkla α veya bazen η olarak belirtilir, ağırlıkların hangi tempoda güncellendiğini gösterir. Bu derece sabit olabilir veya uyarlamalı olarak değişebilir. Mevcut en gözde yöntem Adam olarak adlandırılan ve öğrenme oranını uyarlayan bir yöntemdir. +⟶ Öğrenme oranı ― Öğrenme oranı, sıklıkla α veya bazen η olarak belirtilir, ağırlıkların hangi tempoda güncellendiğini gösterir. Bu derece sabit olabilir veya uyarlamalı olarak değişebilir. Mevcut en gözde yöntem Adam olarak adlandırılan ve öğrenme oranını uyarlayan bir yöntemdir.
@@ -150,7 +150,7 @@ **26. [Input gate, forget gate, gate, output gate]** -⟶ [Girdi kapısı, unutma kapısı, kapı, ürün kapısı] +⟶ [Girdi kapısı, unutma kapısı, kapı, çıktı kapısı]
@@ -294,28 +294,28 @@ **50. View PDF version on GitHub** -⟶ +⟶ GitHub'da PDF sürümünü görüntüle
**51. [Neural Networks, Architecture, Activation function, Backpropagation, Dropout]** -⟶ +⟶ [Yapay Sinir Ağları, Mimari, Aktivasyon fonksiyonu, Geri yayılım, Seyreltme]
**52. [Convolutional Neural Networks, Convolutional layer, Batch normalization]** -⟶ +⟶ [Evrişimsel Sinir Ağları, Evrişim katmanı, Toplu normalizasyon] 
**53. [Recurrent Neural Networks, Gates, LSTM]** -⟶ +⟶ [Yinelenen Sinir Ağları, Kapılar, LSTM]
**54. [Reinforcement learning, Markov decision processes, Value/policy iteration, Approximate dynamic programming, Policy search]** -⟶ +⟶ [Pekiştirmeli öğrenme, Markov karar süreçleri, Değer/politika iterasyonu, Yaklaşık dinamik programlama, Politika araştırması] diff --git a/tr/refresher-linear-algebra.md b/tr/cs-229-linear-algebra.md similarity index 100% rename from tr/refresher-linear-algebra.md rename to tr/cs-229-linear-algebra.md diff --git a/tr/cs-229-machine-learning-tips-and-tricks.md b/tr/cs-229-machine-learning-tips-and-tricks.md new file mode 100644 index 000000000..b12670229 --- /dev/null +++ b/tr/cs-229-machine-learning-tips-and-tricks.md @@ -0,0 +1,290 @@ +**1. Machine Learning tips and tricks cheatsheet** + +⟶ Makine Öğrenmesi ipuçları ve püf noktaları el kitabı + +
+ +**2. Classification metrics** + +⟶ Sınıflandırma metrikleri + +
+ +**3. In a context of a binary classification, here are the main metrics that are important to track in order to assess the performance of the model.** + +⟶ İkili bir sınıflandırma durumunda, modelin performansını değerlendirmek için gerekli olan ana metrikler aşağıda verilmiştir. + +
+ +**4. Confusion matrix ― The confusion matrix is used to have a more complete picture when assessing the performance of a model. It is defined as follows:** + +⟶ Karışıklık matrisi - Karışıklık matrisi, bir modelin performansını değerlendirirken daha eksiksiz bir sonuca sahip olmak için kullanılır. Aşağıdaki şekilde tanımlanmıştır: + +
+ +**5. [Predicted class, Actual class]** + +⟶ [Tahmini sınıf, Gerçek sınıf] + +
+ +**6. Main metrics ― The following metrics are commonly used to assess the performance of classification models:** + +⟶ Ana metrikler - Sınıflandırma modellerinin performansını değerlendirmek için aşağıda verilen metrikler yaygın olarak kullanılmaktadır: + +
+ +**7. [Metric, Formula, Interpretation]** + +⟶ [Metrik, Formül, Açıklama] + +
+ +**8. Overall performance of model** + +⟶ Modelin genel performansı + +
+ +**9. How accurate the positive predictions are** + +⟶ Pozitif tahminlerin ne kadar doğru olduğu + +
+ +**10. Coverage of actual positive sample** + +⟶ Gerçek pozitif örneklerin oranı + +
+ +**11. Coverage of actual negative sample** + +⟶ Gerçek negatif örneklerin oranı + +
+ +**12. Hybrid metric useful for unbalanced classes** + +⟶ Dengesiz sınıflar için yararlı hibrit metrik + +
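As an illustration of the metrics listed above, a minimal Python sketch computes them directly from raw confusion-matrix counts (TP, FP, FN, TN are assumed to be given).

```python
def classification_metrics(TP, FP, FN, TN):
    accuracy  = (TP + TN) / (TP + TN + FP + FN)          # overall performance
    precision = TP / (TP + FP)                           # how accurate the positive predictions are
    recall    = TP / (TP + FN)                           # coverage of actual positive samples (TPR)
    specificity = TN / (TN + FP)                         # coverage of actual negative samples (TNR)
    f1 = 2 * precision * recall / (precision + recall)   # hybrid metric for unbalanced classes
    return {"accuracy": accuracy, "precision": precision, "recall": recall,
            "specificity": specificity, "F1": f1}
```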
+ +**13. ROC ― The receiver operating curve, also noted ROC, is the plot of TPR versus FPR by varying the threshold. These metrics are are summed up in the table below:** + +⟶ İşlem Karakteristik Eğrisi (ROC) ― İşlem Karakteristik Eğrisi (receiver operating curve), eşik değeri değiştirilerek Doğru Pozitif Oranı-Yanlış Pozitif Oranı grafiğidir. Bu metrikler aşağıdaki tabloda özetlenmiştir: + +
+ +**14. [Metric, Formula, Equivalent]** + +⟶ [Metrik, Formül, Eşdeğer] + +
+ +**15. AUC ― The area under the receiving operating curve, also noted AUC or AUROC, is the area below the ROC as shown in the following figure:** + +⟶ Eğri Altında Kalan Alan (AUC) ― Aynı zamanda AUC veya AUROC olarak belirtilen işlem karakteristik eğrisi altındaki alan, aşağıdaki şekilde gösterildiği gibi İşlem Karakteristik Eğrisi (ROC)'nin altındaki alandır: + +
+ +**16. [Actual, Predicted]** + +⟶ [Gerçek, Tahmin Edilen] + +
+ +**17. Basic metrics ― Given a regression model f, the following metrics are commonly used to assess the performance of the model:** + +⟶ Temel metrikler - Bir f regresyon modeli verildiğinde aşağıdaki metrikler genellikle modelin performansını değerlendirmek için kullanılır: + +
+ +**18. [Total sum of squares, Explained sum of squares, Residual sum of squares]** + +⟶ [Toplam kareler toplamı, Açıklanan kareler toplamı, Artık kareler toplamı] + +
+ +**19. Coefficient of determination ― The coefficient of determination, often noted R2 or r2, provides a measure of how well the observed outcomes are replicated by the model and is defined as follows:** + +⟶ Belirleme katsayısı - Genellikle R2 veya r2 olarak belirtilen belirleme katsayısı, gözlemlenen sonuçların model tarafından ne kadar iyi kopyalandığının bir ölçütüdür ve aşağıdaki gibi tanımlanır: + +
+ +**20. Main metrics ― The following metrics are commonly used to assess the performance of regression models, by taking into account the number of variables n that they take into consideration:** + +⟶ Ana metrikler - Aşağıdaki metrikler, göz önüne aldıkları değişken sayısını dikkate alarak regresyon modellerinin performansını değerlendirmek için yaygın olarak kullanılır: + +
+ +**21. where L is the likelihood and ˆσ2 is an estimate of the variance associated with each response.** + +⟶ burada L olabilirlik ve ˆσ2, her bir yanıtla ilişkili varyansın bir tahminidir. + +
+ +**22. Model selection** + +⟶ Model seçimi + +
+ +**23. Vocabulary ― When selecting a model, we distinguish 3 different parts of the data that we have as follows:** + +⟶ Kelime Bilgisi - Bir model seçerken, aşağıdaki gibi sahip olduğumuz verileri 3 farklı parçaya ayırırız: + +
+ +**24. [Training set, Validation set, Testing set]** + +⟶ [Eğitim seti, Doğrulama seti, Test seti] + +
+ +**25. [Model is trained, Model is assessed, Model gives predictions]** + +⟶ [Model eğitildi, Model değerlendirildi, Model tahminleri gerçekleştiriyor] + +
+ +**26. [Usually 80% of the dataset, Usually 20% of the dataset]** + +⟶ [Genelde veri kümesinin %80'i, Genelde veri kümesinin %20'si] + +
+ +**27. [Also called hold-out or development set, Unseen data]** + +⟶ [Ayrıca doğrulama için bir kısmını bekletme veya geliştirme seti olarak da bilinir, Görülmemiş veri] + +
+ +**28. Once the model has been chosen, it is trained on the entire dataset and tested on the unseen test set. These are represented in the figure below:** + +⟶ Model bir kere seçildikten sonra, tüm veri seti üzerinde eğitilir ve görünmeyen test setinde test edilir. Bunlar aşağıdaki şekilde gösterilmiştir: + +
+ +**29. Cross-validation ― Cross-validation, also noted CV, is a method that is used to select a model that does not rely too much on the initial training set. The different types are summed up in the table below:** + +⟶ Çapraz doğrulama ― Çapraz doğrulama, başlangıçtaki eğitim setine çok fazla güvenmeyen bir modeli seçmek için kullanılan bir yöntemdir. Farklı tipleri aşağıdaki tabloda özetlenmiştir: + +
+ +**30. [Training on k−1 folds and assessment on the remaining one, Training on n−p observations and assessment on the p remaining ones]** + +⟶ [k − 1 katı üzerinde eğitim ve geriye kalanlar üzerinde değerlendirme, n − p gözlemleri üzerine eğitim ve kalan p üzerinde değerlendirme] + +
+ +**31. [Generally k=5 or 10, Case p=1 is called leave-one-out]** + +⟶ [Genel olarak k=5 veya 10, Durum p=1'e bir tanesini dışarıda bırak denir] + +
+ +**32. The most commonly used method is called k-fold cross-validation and splits the training data into k folds to validate the model on one fold while training the model on the k−1 other folds, all of this k times. The error is then averaged over the k folds and is named cross-validation error.** + +⟶ En yaygın olarak kullanılan yöntem k-kat çapraz doğrulama olarak adlandırılır ve k-1 diğer katlarda olmak üzere, bu k sürelerinin hepsinde model eğitimi yapılırken, modeli bir kat üzerinde doğrulamak için eğitim verilerini k katlarına ayırır. Hata için daha sonra k-katlar üzerinden ortalama alınır ve çapraz doğrulama hatası olarak adlandırılır. + +
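A minimal Python sketch of k-fold cross-validation follows; `model_error(train, test)` is an assumed user-supplied helper that trains on one split and reports the error on the other.

```python
def k_fold_cv_error(data, k, model_error):
    n = len(data)
    fold_size = n // k                      # any remainder items are ignored in this sketch
    errors = []
    for i in range(k):
        test = data[i * fold_size:(i + 1) * fold_size]
        train = data[:i * fold_size] + data[(i + 1) * fold_size:]
        errors.append(model_error(train, test))
    return sum(errors) / k                  # cross-validation error
```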
+ +**33. Regularization ― The regularization procedure aims at avoiding the model to overfit the data and thus deals with high variance issues. The following table sums up the different types of commonly used regularization techniques:** + +⟶ Düzenlileştirme (Regularization) - Düzenlileştirme prosedürü, modelin verileri aşırı öğrenmesinden kaçınılmasını ve dolayısıyla yüksek varyans sorunları ile ilgilenmeyi amaçlamaktadır. Aşağıdaki tablo, yaygın olarak kullanılan düzenlileştirme tekniklerinin farklı türlerini özetlemektedir: + + +
+ +**34. [Shrinks coefficients to 0, Good for variable selection, Makes coefficients smaller, Tradeoff between variable selection and small coefficients]** + +⟶ [Katsayıları 0'a küçültür, Değişken seçimi için iyi, Katsayıları daha küçük yapar, Değişken seçimi ile küçük katsayılar arasında ödünleşim] + +
+ +**35. Diagnostics** + +⟶ Tanı + +
+ +**36. Bias ― The bias of a model is the difference between the expected prediction and the correct model that we try to predict for given data points.** + +⟶ Önyargı - Bir modelin önyargısı, beklenen tahmin ve verilen veri noktaları için tahmin etmeye çalıştığımız doğru model arasındaki farktır. + +
+ +**37. Variance ― The variance of a model is the variability of the model prediction for given data points.** + +⟶ Varyans - Bir modelin varyansı, belirli veri noktaları için model tahmininin değişkenliğidir. + +
+ +**38. Bias/variance tradeoff ― The simpler the model, the higher the bias, and the more complex the model, the higher the variance.** + +⟶ Önyargı/varyans ödünleşimi - Model ne kadar basitse önyargı o kadar yüksek; model ne kadar karmaşıksa varyans o kadar yüksektir. + +
+ +**39. [Symptoms, Regression illustration, classification illustration, deep learning illustration, possible remedies]** + +⟶ [Belirtiler, Regresyon illüstrasyonu, sınıflandırma illüstrasyonu, derin öğrenme illüstrasyonu, olası çareler] + +
+ +**40. [High training error, Training error close to test error, High bias, Training error slightly lower than test error, Very low training error, Training error much lower than test error, High variance]** + +⟶ [Yüksek eğitim hatası, Test hatasına yakın eğitim hatası, Yüksek önyargı, Test hatasından biraz daha düşük eğitim hatası, Çok düşük eğitim hatası, Test hatasının çok altında eğitim hatası, Yüksek varyans] + +
+ +**41. [Complexify model, Add more features, Train longer, Perform regularization, Get more data]** + +⟶ [Modeli karmaşıklaştır, Daha fazla özellik ekle, Daha uzun süre eğit, Düzenlileştirme uygula, Daha fazla veri edin] + +
+ +**42. Error analysis ― Error analysis is analyzing the root cause of the difference in performance between the current and the perfect models.** + +⟶ Hata analizi - Hata analizinde mevcut ve mükemmel modeller arasındaki performans farkının temel nedeni analiz edilir. + +
+ +**43. Ablative analysis ― Ablative analysis is analyzing the root cause of the difference in performance between the current and the baseline models.** + +⟶ Ablatif analiz - Ablatif analizde mevcut ve başlangıç modelleri arasındaki performans farkının temel nedeni analiz edilir. + +
+ +**44. Regression metrics** + +⟶ Regresyon metrikleri + +
+ +**45. [Classification metrics, confusion matrix, accuracy, precision, recall, F1 score, ROC]** + +⟶ [Sınıflandırma metrikleri, karışıklık matrisi, doğruluk, kesinlik, geri çağırma, F1 skoru, ROC] + +
+ +**46. [Regression metrics, R squared, Mallow's CP, AIC, BIC]** + +⟶ [Regresyon metrikleri, R karesi, Mallow'un CP'si, AIC, BIC] + +
+ +**48. [Diagnostics, Bias/variance tradeoff, error/ablative analysis]** + +⟶ [Tanı, Önyargı/varyans ödünleşimi, hata/ablatif analiz] + +
+ +**48. [Diagnostics, Bias/variance tradeoff, error/ablative analysis]** + +⟶ [Tanı, Önyargı/varyans çelişkisi, hata/ablatif analiz] diff --git a/tr/cs-229-probability.md b/tr/cs-229-probability.md new file mode 100644 index 000000000..5e30fe358 --- /dev/null +++ b/tr/cs-229-probability.md @@ -0,0 +1,381 @@ +**1. Probabilities and Statistics refresher** + +⟶ Olasılık ve İstatistik hatırlatma + +
+ +**2. Introduction to Probability and Combinatorics** + +⟶ Olasılık ve Kombinasyonlara Giriş + +
+ +**3. Sample space ― The set of all possible outcomes of an experiment is known as the sample space of the experiment and is denoted by S.** + +⟶ Örnek alanı - Bir deneyin olası tüm sonuçlarının kümesidir, deneyin örnek alanı olarak bilinir ve S ile gösterilir. + +
+ +**4. Event ― Any subset E of the sample space is known as an event. That is, an event is a set consisting of possible outcomes of the experiment. If the outcome of the experiment is contained in E, then we say that E has occurred.** + +⟶ Olay - Örnek alanın herhangi bir E alt kümesi, olay olarak bilinir. Yani bir olay, deneyin olası sonuçlarından oluşan bir kümedir. Deneyin sonucu E'de varsa, E'nin gerçekleştiğini söyleriz. + +
+ +**5. Axioms of probability: For each event E, we denote P(E) as the probability of event E occuring.** + +⟶ Olasılık aksiyomları: Her E olayı için, E olayının meydana gelme olasılığı P (E) olarak ifade edilir. + +
+ +**6. Axiom 1 ― Every probability is between 0 and 1 included, i.e:** + +⟶ Aksiyom 1 - Her olasılık 0 ve 1 de dahil olmak üzere 0 ve 1 arasındadır, yani: + +
+ +**7. Axiom 2 ― The probability that at least one of the elementary events in the entire sample space will occur is 1, i.e:** + +⟶ Aksiyom 2 - Tüm örnek uzayındaki temel olaylardan en az birinin ortaya çıkma olasılığı 1'dir, yani: + +
+ +**8. Axiom 3 ― For any sequence of mutually exclusive events E1,...,En, we have:** + +⟶ Aksiyom 3 - Karşılıklı ayrık (mutually exclusive) olaylardan oluşan herhangi bir E1,...,En dizisi için elimizde: + +
+ +**9. Permutation ― A permutation is an arrangement of r objects from a pool of n objects, in a given order. The number of such arrangements is given by P(n,r), defined as:** + +⟶ Permütasyon - Permütasyon, n nesneler havuzundan r nesnelerinin belirli bir sıra ile düzenlenmesidir. Bu tür düzenlemelerin sayısı P (n, r) tarafından aşağıdaki gibi tanımlanır: + +
+ +**10. Combination ― A combination is an arrangement of r objects from a pool of n objects, where the order does not matter. The number of such arrangements is given by C(n,r), defined as:** + +⟶ Kombinasyon - Bir kombinasyon, sıranın önemli olmadığı n nesneler havuzundan r nesnelerinin bir düzenlemesidir. Bu tür düzenlemelerin sayısı C (n, r) tarafından aşağıdaki gibi tanımlanır: + +
+ +**11. Remark: we note that for 0⩽r⩽n, we have P(n,r)⩾C(n,r)** + +⟶ Not: 0⩽r⩽n için P (n, r) ⩾C (n, r) değerine sahibiz. + +
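For concreteness, P(n,r) and C(n,r) can be computed directly from factorials, as in the short Python sketch below.

```python
from math import factorial

def P(n, r):
    return factorial(n) // factorial(n - r)                       # ordered arrangements

def C(n, r):
    return factorial(n) // (factorial(r) * factorial(n - r))      # unordered selections

assert P(5, 2) == 20 and C(5, 2) == 10 and P(5, 2) >= C(5, 2)
```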
+ +**12. Conditional Probability** + +⟶ Koşullu Olasılık + +
+ +**13. Bayes' rule ― For events A and B such that P(B)>0, we have:** + +⟶ Bayes kuralı - A ve B olayları için P (B)> 0 olacak şekilde: + +
+ +**14. Remark: we have P(A∩B)=P(A)P(B|A)=P(A|B)P(B)** + +⟶ Not: P(A∩B)=P(A)P(B|A)=P(A|B)P(B) + +
+ +**15. Partition ― Let {Ai,i∈[[1,n]]} be such that for all i, Ai≠∅. We say that {Ai} is a partition if we have:** + +⟶ Parça - Tüm i değerleri için Ai≠∅ olmak üzere {Ai,i∈[[1,n]]} olsun. {Ai} bir parça olduğunu söyleriz eğer : + +
+ +**16. Remark: for any event B in the sample space, we have P(B)=n∑i=1P(B|Ai)P(Ai).** + +⟶ Not: Örneklem uzaydaki herhangi bir B olayı için P(B)=n∑i=1P(B|Ai)P(Ai)'ye sahibiz. + +
+ +**17. Extended form of Bayes' rule ― Let {Ai,i∈[[1,n]]} be a partition of the sample space. We have:** + +⟶ Genişletilmiş Bayes kuralı formu - {Ai,i∈[[1,n]]} örneklem uzayının bir bölümü olsun. Elde edilen: + +
+ +**18. Independence ― Two events A and B are independent if and only if we have:** + +⟶ Bağımsızlık - İki olay A ve B birbirinden bağımsızdır ancak ve ancak eğer: + +
+ +**19. Random Variables** + +⟶ Rastgele Değişkenler + +
+ +**20. Definitions** + +⟶ Tanımlamalar + +
+ +**21. Random variable ― A random variable, often noted X, is a function that maps every element in a sample space to a real line.** + +⟶ Rastgele değişken - Genellikle X olarak ifade edilen rastgele bir değişken, bir örneklem uzayındaki her öğeyi gerçek bir çizgiye eşleyen bir fonksiyondur. + +
+ +**22. Cumulative distribution function (CDF) ― The cumulative distribution function F, which is monotonically non-decreasing and is such that limx→−∞F(x)=0 and limx→+∞F(x)=1, is defined as:** + +⟶ Kümülatif dağılım fonksiyonu (KDF/ Cumulative distribution function-CDF) - Monotonik olarak azalmayan ve limx→−∞F(x)=0 ve limx→+∞F(x)=1 olacak şekilde kümülatif dağılım fonksiyonu F şu şekilde tanımlanır: + +
+ +**23. Remark: we have P(a<X⩽B)=F(b)−F(a)** + +⟶ Not: P(a<X⩽B)=F(b)−F(a) elde ederiz. + +
 + +**24. Probability density function (PDF) ― The probability density function f is the probability that X takes on values between two adjacent realizations of the random variable.** + +⟶ Olasılık yoğunluğu fonksiyonu (OYF/Probability density function-PDF) - Olasılık yoğunluğu fonksiyonu f, X'in rastgele değişkenin iki bitişik gerçekleşmesi arasındaki değerleri alması ihtimalidir. + +
+ +**25. Relationships involving the PDF and CDF ― Here are the important properties to know in the discrete (D) and the continuous (C) cases.** + +⟶ OYF ve KDF'yi içeren ilişkiler - Ayrık (D) ve sürekli (C) olaylarında bilmeniz gereken önemli özelliklerdir. + +
+ +**26. [Case, CDF F, PDF f, Properties of PDF]** + +⟶ [Olay, KDF F, OYF f, OYF Özellikleri] + +
+ +**27. Expectation and Moments of the Distribution ― Here are the expressions of the expected value E[X], generalized expected value E[g(X)], kth moment E[Xk] and characteristic function ψ(ω) for the discrete and continuous cases:** + +⟶ Beklenti ve Dağılım Momentleri - Burada, ayrık ve sürekli durumlar için beklenen değer E[X], genelleştirilmiş beklenen değer E[g(X)], k. Moment E[Xk] ve karakteristik fonksiyon ψ(ω) ifadeleri verilmiştir : + +
+ +**28. Variance ― The variance of a random variable, often noted Var(X) or σ2, is a measure of the spread of its distribution function. It is determined as follows:** + +⟶ Varyans - Genellikle Var(X) veya σ2 olarak ifade edilen rastgele değişkenin varyansı, dağılım fonksiyonunun yayılmasının bir ölçüsüdür. Aşağıdaki şekilde belirlenir: + +
+ +**29. Standard deviation ― The standard deviation of a random variable, often noted σ, is a measure of the spread of its distribution function which is compatible with the units of the actual random variable. It is determined as follows:** + +⟶ Standart sapma - Genellikle σ olarak ifade edilen rastgele bir değişkenin standart sapması, gerçek rastgele değişkenin birimleriyle uyumlu olan dağılım fonksiyonunun yayılmasının bir ölçüsüdür. Aşağıdaki şekilde belirlenir: + +
+ +**30. Transformation of random variables ― Let the variables X and Y be linked by some function. By noting fX and fY the distribution function of X and Y respectively, we have:** + +⟶ Rastgele değişkenlerin dönüşümü - X ve Y değişkenleri bir fonksiyon ile birbirine bağlı olsun. fX ve fY sırasıyla X ve Y'nin dağılım fonksiyonlarını göstermek üzere, elimizde: + +
+ +**31. Leibniz integral rule ― Let g be a function of x and potentially c, and a,b boundaries that may depend on c. We have:** + +⟶ Leibniz integral kuralı - g, x'in ve potansiyel olarak c'nin bir fonksiyonu; a ve b ise c'ye bağlı olabilecek sınırlar olsun. Elimizde: + +
+ +**32. Probability Distributions** + +⟶ Olasılık Dağılımları + +
+ +**33. Chebyshev's inequality ― Let X be a random variable with expected value μ. For k,σ>0, we have the following inequality:** + +⟶ Chebyshev'in eşitsizliği - X beklenen değeri μ olan rastgele bir değişken olsun. K, σ>0 için aşağıdaki eşitsizliği elde edilir: + +
+ +**34. Main distributions ― Here are the main distributions to have in mind:** + +⟶ Ana dağıtımlar - İşte akılda tutulması gereken ana dağıtımlar: + +
+ +**35. [Type, Distribution]** + +⟶ [Tür, Dağılım] + +
+ +**36. Jointly Distributed Random Variables** + +⟶ Ortak Dağılımlı Rastgele Değişkenler + +
+ +**37. Marginal density and cumulative distribution ― From the joint density probability function fXY , we have** + +⟶ Marjinal yoğunluk ve kümülatif dağılım - fXY ortak yoğunluk olasılık fonksiyonundan, + +
+ +**38. [Case, Marginal density, Cumulative function]** + +⟶ [Olay, Marjinal yoğunluk, Kümülatif fonksiyon] + +
+ +**39. Conditional density ― The conditional density of X with respect to Y, often noted fX|Y, is defined as follows:** + +⟶ Koşullu yoğunluk - Y'ye göre X'in koşullu yoğunluğu, genellikle fX|Y olarak elde edilir: + +
+ +**40. Independence ― Two random variables X and Y are said to be independent if we have:** + +⟶ Bağımsızlık - İki rastgele değişkenin X ve Y olması durumunda bağımsız olduğu söylenir: + +
+ +**41. Covariance ― We define the covariance of two random variables X and Y, that we note σ2XY or more commonly Cov(X,Y), as follows:** + +⟶ Kovaryans - σ2XY veya daha genel olarak Cov(X,Y) olarak elde ettiğimiz iki rastgele değişken olan X ve Y'nin kovaryansını aşağıdaki gibi tanımlarız: + +
+ +**42. Correlation ― By noting σX,σY the standard deviations of X and Y, we define the correlation between the random variables X and Y, noted ρXY, as follows:** + +⟶ Korelasyon - σX, σY, X ve Y'nin standart sapmalarını elde ederek, ρXY olarak belirtilen rastgele X ve Y değişkenleri arasındaki korelasyonu şu şekilde tanımlarız: + +
+ +**43. Remark 1: we note that for any random variables X,Y, we have ρXY∈[−1,1].** + +⟶ Not 1: X, Y'nin herhangi bir rastgele değişkeni için ρXY∈ [note1,1] olduğuna dikkat edin. + +
+ +**44. Remark 2: If X and Y are independent, then ρXY=0.** + +⟶ Not 2: Eğer X ve Y bağımsızsa, ρXY = 0 olur. + +
+ +**45. Parameter estimation** + +⟶ Parametre tahmini (kestirimi) + +
+ +**46. Definitions** + +⟶ Tanımlamalar + +
+ +**47. Random sample ― A random sample is a collection of n random variables X1,...,Xn that are independent and identically distributed with X.** + +⟶ Rastgele örnek - Rastgele bir örnek, bağımsız ve X ile aynı şekilde dağılmış n adet rastgele değişkenden (X1,...,Xn) oluşan bir koleksiyondur. + +
+ +**48. Estimator ― An estimator is a function of the data that is used to infer the value of an unknown parameter in a statistical model.** + +⟶ Tahminci (Kestirimci) - Tahmin edici, istatistiksel bir modelde bilinmeyen bir parametrenin değerini ortaya çıkarmak için kullanılan verilerin bir fonksiyonudur. + +
+ +**49. Bias ― The bias of an estimator ^θ is defined as being the difference between the expected value of the distribution of ^θ and the true value, i.e.:** + +⟶ Önyargı - Bir tahmin edicinin önyargısı ^ θ, ^ θ dağılımının beklenen değeri ile gerçek değer arasındaki fark olarak tanımlanır, yani: + +
+ +**50. Remark: an estimator is said to be unbiased when we have E[^θ]=θ.** + +⟶ Not: E [^ θ] = θ olduğunda bir tahmincinin tarafsız olduğu söylenir. + +
+ +**51. Estimating the mean** + +⟶ Ortalamayı tahmin etme + +
+ +**52. Sample mean ― The sample mean of a random sample is used to estimate the true mean μ of a distribution, is often noted ¯¯¯¯¯X and is defined as follows:** + +⟶ Örnek ortalaması - Rastgele bir numunenin numune ortalaması, dağılımın gerçek ortalamasını to tahmin etmek için kullanılır, genellikle ¯¯¯¯¯X olarak belirtilir ve şöyle tanımlanır: + +
+ +**53. Remark: the sample mean is unbiased, i.e E[¯¯¯¯¯X]=μ.** + +⟶ Not: örnek ortalama tarafsız, yani: E[¯¯¯¯¯X]=μ. + +
+ +**54. Central Limit Theorem ― Let us have a random sample X1,...,Xn following a given distribution with mean μ and variance σ2, then we have:** + +⟶ Merkezi Limit Teoremi - Ortalaması μ ve varyansı σ2 olan belirli bir dağılıma sahip rastgele bir X1,...,Xn örneğimiz olsun; bu durumda elimizde: + +
+ +**55. Estimating the variance** + +⟶ Varyansı tahmin etmek + +
+ +**56. Sample variance ― The sample variance of a random sample is used to estimate the true variance σ2 of a distribution, is often noted s2 or ^σ2 and is defined as follows:** + +⟶ Örnek varyansı - Rastgele bir örneğin örnek varyansı, bir dağılımın σ2 gerçek varyansını tahmin etmek için kullanılır, genellikle s2 veya ^σ2 olarak elde edilir ve aşağıdaki gibi tanımlanır: + +
+ +**57. Remark: the sample variance is unbiased, i.e E[s2]=σ2.** + +⟶ Not: Örneklem sapması yansızdır,E[s2]=σ2. + +
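A short Python sketch of the two estimators above, using the n−1 denominator that makes the sample variance unbiased:

```python
def sample_mean(xs):
    return sum(xs) / len(xs)

def sample_variance(xs):
    m = sample_mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)   # divide by n-1 for unbiasedness
```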
+ +**58. Chi-Squared relation with sample variance ― Let s2 be the sample variance of a random sample. We have:** + +⟶ Örnek varyansı ile ki-kare ilişkisi - s2, rastgele bir örneğin örnek varyansı olsun. Elde edilir: + +
+ +**59. [Introduction, Sample space, Event, Permutation]** + +⟶ [Giriş, Örnek uzay, Olay, Permütasyon] + +
+ +**60. [Conditional probability, Bayes' rule, Independence]** + +⟶ [Koşullu olasılık, Bayes kuralı, Bağımsızlık] + +
+ +**61. [Random variables, Definitions, Expectation, Variance]** + +⟶ [Rastgele değişkenler, Tanımlamalar, Beklenti, Varyans] + +
+ +**62. [Probability distributions, Chebyshev's inequality, Main distributions]** + +⟶ [Olasılık dağılımları, Chebyshev eşitsizliği, Ana dağılımlar] + +
+ +**63. [Jointly distributed random variables, Density, Covariance, Correlation]** + +⟶ [Ortak dağınık rastgele değişkenler, Yoğunluk, Kovaryans, Korelasyon] + +
+ +**64. [Parameter estimation, Mean, Variance]** + +⟶ [Parameter tahmini, Ortalama, Varyans] diff --git a/tr/cs-229-supervised-learning.md b/tr/cs-229-supervised-learning.md new file mode 100644 index 000000000..90d816803 --- /dev/null +++ b/tr/cs-229-supervised-learning.md @@ -0,0 +1,567 @@ +**1. Supervised Learning cheatsheet** + +⟶ Gözetimli Öğrenme El kitabı + +
+ +**2. Introduction to Supervised Learning** + +⟶ Gözetimli Öğrenmeye Giriş + +
+ +**3. Given a set of data points {x(1),...,x(m)} associated to a set of outcomes {y(1),...,y(m)}, we want to build a classifier that learns how to predict y from x.** + +⟶ {y(1),...,y(m)} çıktı kümesi ile ilişkili olan {x(1),...,x(m)} veri noktalarının kümesi göz önüne alındığında, y'den x'i nasıl tahmin edebileceğimizi öğrenen bir sınıflandırıcı tasarlamak istiyoruz. + +
+ +**4. Type of prediction ― The different types of predictive models are summed up in the table below:** + +⟶ Tahmin türü ― Farklı tahmin modelleri aşağıdaki tabloda özetlenmiştir: + +
+ +**5. [Regression, Classifier, Outcome, Examples]** + +⟶ [Regresyon, Sınıflandırıcı, Çıktı , Örnekler] + +
+ +**6. [Continuous, Class, Linear regression, Logistic regression, SVM, Naive Bayes]** + +⟶ [Sürekli, Sınıf, Lineer regresyon (bağlanım), Lojistik regresyon (bağlanım), Destek Vektör Makineleri (DVM), Naive Bayes] + +
+ +**7. Type of model ― The different models are summed up in the table below:** + +⟶ Model türleri ― Farklı modeller aşağıdaki tabloda özetlenmiştir: + +
+ +**8. [Discriminative model, Generative model, Goal, What's learned, Illustration, Examples]** + +⟶ [Ayırt edici model, Üretici model, Amaç, Öğrenilenler, Örnekleme, Örnekler] + +
+ +**9. [Directly estimate P(y|x), Estimate P(x|y) to then deduce P(y|x), Decision boundary, Probability distributions of the data, Regressions, SVMs, GDA, Naive Bayes]** + +⟶ [ Doğrudan tahmin P (y|x), P (y|x)'i tahmin etmek için P(x|y)'i tahmin etme, Karar Sınırı, Verilerin olasılık dağılımı, Regresyon, Destek Vektör Makineleri, Gauss Diskriminant Analizi, Naive Bayes] + +
+ +**10. Notations and general concepts** + +⟶ Gösterimler ve genel konsept + +
+ +**11. Hypothesis ― The hypothesis is noted hθ and is the model that we choose. For a given input data x(i) the model prediction output is hθ(x(i)).** + +⟶ Hipotez ― Hipotez hθ olarak belirtilmiştir ve bu bizim seçtiğimiz modeldir. Verilen x(i) verisi için modelin tahminlediği çıktı hθ(x(i))'dir. + +
+ +**12. Loss function ― A loss function is a function L:(z,y)∈R×Y⟼L(z,y)∈R that takes as inputs the predicted value z corresponding to the real data value y and outputs how different they are. The common loss functions are summed up in the table below:** + +⟶ Kayıp fonksiyonu ― L:(z,y)∈R×Y⟼L(z,y)∈R şeklinde tanımlanan bir kayıp fonksiyonu y gerçek değerine karşılık geleceği öngörülen z değerini girdi olarak alan ve ne kadar farklı olduklarını gösteren bir fonksiyondur. Yaygın kayıp fonksiyonları aşağıdaki tabloda özetlenmiştir: + +
+ +**13. [Least squared error, Logistic loss, Hinge loss, Cross-entropy]** + +⟶ [En küçük kareler hatası, Lojistik yitimi (kaybı), Menteşe yitimi (kaybı), Çapraz entropi] + +
+ +**14. [Linear regression, Logistic regression, SVM, Neural Network]** + +⟶ [Lineer regresyon (bağlanım), Lojistik regresyon (bağlanım), Destek Vektör Makineleri, Sinir Ağı] + +
+ +**15. Cost function ― The cost function J is commonly used to assess the performance of a model, and is defined with the loss function L as follows:** + +⟶ Maliyet fonksiyonu ― J maliyet fonksiyonu genellikle bir modelin performansını değerlendirmek için kullanılır ve L kayıp fonksiyonu aşağıdaki gibi tanımlanır: + +
+ +**16. Gradient descent ― By noting α∈R the learning rate, the update rule for gradient descent is expressed with the learning rate and the cost function J as follows:** + +⟶ Bayır inişi ― α∈R öğrenme oranı olmak üzere, bayır inişinin güncelleme kuralı, öğrenme oranı ve J maliyet fonksiyonu ile aşağıdaki gibi ifade edilir: +
+ + +**17. Remark: Stochastic gradient descent (SGD) is updating the parameter based on each training example, and batch gradient descent is on a batch of training examples.** + +⟶ Not: Stokastik bayır inişi her eğitim örneğine bağlı olarak parametreyi günceller, ve yığın bayır inişi bir dizi eğitim örneği üzerindedir. + +
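To illustrate the update rule θ←θ−α∇J(θ), here is a small Python sketch of the batch version; `grad_J` is an assumed callable returning the gradient of the cost function, and the toy usage minimizes a simple quadratic.

```python
def gradient_descent(theta, grad_J, alpha=0.1, n_steps=1000):
    """Repeatedly apply theta <- theta - alpha * grad_J(theta)."""
    for _ in range(n_steps):
        theta = [t - alpha * g for t, g in zip(theta, grad_J(theta))]
    return theta

# toy usage: minimize J(theta) = theta_0^2 + theta_1^2, whose gradient is 2*theta
print(gradient_descent([3.0, -2.0], lambda th: [2 * t for t in th]))   # approaches [0, 0]
```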
+
+**18. Likelihood ― The likelihood of a model L(θ) given parameters θ is used to find the optimal parameters θ through maximizing the likelihood. In practice, we use the log-likelihood ℓ(θ)=log(L(θ)) which is easier to optimize. We have:**
+
+⟶ Olabilirlik ― θ parametreleri verilen bir modelin olabilirliği L(θ), olabilirliği maksimize ederek en uygun θ parametrelerini bulmak için kullanılır. Uygulamada, optimize edilmesi daha kolay olan log-olabilirlik ℓ(θ)=log(L(θ)) kullanılır. Sahip olduklarımız:
+
+<br>
+ +**19. Newton's algorithm ― The Newton's algorithm is a numerical method that finds θ such that ℓ′(θ)=0. Its update rule is as follows:** + +⟶ Newton'un algoritması - ℓ′(θ)=0 olacak şekilde bir θ bulan nümerik bir yöntemdir. Güncelleme kuralı aşağıdaki gibidir: + +
+ +**20. Remark: the multidimensional generalization, also known as the Newton-Raphson method, has the following update rule:** + +⟶ Not: Newton-Raphson yöntemi olarak da bilinen çok boyutlu genelleme aşağıdaki güncelleme kuralına sahiptir: + +
+ +**21. Linear models** + +⟶ Lineer modeller + +
+ +**22. Linear regression** + +⟶ Lineer regresyon + +
+
+**23. We assume here that y|x;θ∼N(μ,σ2)**
+
+⟶ y|x;θ∼N(μ,σ2) olduğunu varsayıyoruz
+
+<br>
+
+**24. Normal equations ― By noting X the design matrix, the value of θ that minimizes the cost function is a closed-form solution such that:**
+
+⟶ Normal denklemler ― X tasarım matrisi olmak üzere, maliyet fonksiyonunu en aza indiren θ değeri kapalı formlu bir çözümdür ve şöyledir:
+
+<br>
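+
+A minimal sketch of this closed-form solution, assuming XᵀX is invertible; the design matrix and targets below are made up for the example:
+
+```python
+import numpy as np
+
+rng = np.random.default_rng(1)
+X = rng.normal(size=(50, 4))          # design matrix, one row per example
+y = X @ np.array([2.0, 0.0, -1.0, 3.0]) + 0.05 * rng.normal(size=50)
+
+# theta = (X^T X)^{-1} X^T y, solved without forming the explicit inverse
+theta = np.linalg.solve(X.T @ X, X.T @ y)
+```
+
+<br>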
+
+**25. LMS algorithm ― By noting α the learning rate, the update rule of the Least Mean Squares (LMS) algorithm for a training set of m data points, which is also known as the Widrow-Hoff learning rule, is as follows:**
+
+⟶ En Küçük Ortalama Kareler algoritması (Least Mean Squares-LMS) ― α öğrenme oranı olmak üzere, m veri noktasından oluşan bir eğitim kümesi için, Widrow-Hoff öğrenme kuralı olarak da bilinen En Küçük Ortalama Kareler (LMS) algoritmasının güncelleme kuralı aşağıdaki gibidir:
+
+<br>
+ +**26. Remark: the update rule is a particular case of the gradient ascent.** + +⟶ Not: güncelleme kuralı, bayır yükselişinin özel bir halidir. + +
+
+**27. LWR ― Locally Weighted Regression, also known as LWR, is a variant of linear regression that weights each training example in its cost function by w(i)(x), which is defined with parameter τ∈R as:**
+
+⟶ Yerel Ağırlıklı Regresyon (Locally Weighted Regression-LWR) ― LWR olarak da bilinen Yerel Ağırlıklı Regresyon, maliyet fonksiyonunda her eğitim örneğini, τ∈R parametresi ile aşağıdaki gibi tanımlanan w(i)(x) ile ağırlıklandıran doğrusal regresyonun bir çeşididir:
+
+<br>
+ +**28. Classification and logistic regression** + +⟶ Sınıflandırma ve lojistik regresyon + +
+ +**29. Sigmoid function ― The sigmoid function g, also known as the logistic function, is defined as follows:** + +⟶ Sigmoid fonksiyonu - Lojistik fonksiyonu olarak da bilinen sigmoid fonksiyonu g, aşağıdaki gibi tanımlanır: + +
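+
+A direct, illustrative transcription of the definition:
+
+```python
+import numpy as np
+
+def sigmoid(z):
+    # g(z) = 1 / (1 + e^(-z)), maps any real score into (0, 1)
+    return 1.0 / (1.0 + np.exp(-z))
+
+print(sigmoid(np.array([-2.0, 0.0, 2.0])))  # ≈ [0.119, 0.5, 0.881]
+```
+
+<br>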
+ +**30. Logistic regression ― We assume here that y|x;θ∼Bernoulli(ϕ). We have the following form:** + +⟶ Lojistik regresyon - y|x;θ∼Bernoulli(ϕ) olduğunu varsayıyoruz. Aşağıdaki forma sahibiz: + +
+ +**31. Remark: there is no closed form solution for the case of logistic regressions.** + +⟶ Not: Lojistik regresyon durumunda kapalı form çözümü yoktur. + +
+
+**32. Softmax regression ― A softmax regression, also called a multiclass logistic regression, is used to generalize logistic regression when there are more than 2 outcome classes. By convention, we set θK=0, which makes the Bernoulli parameter ϕi of each class i equal to:**
+
+⟶ Softmax regresyonu ― Çok sınıflı lojistik regresyon olarak da adlandırılan softmax regresyonu, 2'den fazla sonuç sınıfı olduğunda lojistik regresyonu genelleştirmek için kullanılır. Genel kabul olarak θK=0 alınır; bu da her i sınıfının Bernoulli parametresi ϕi'yi şuna eşit yapar:
+
+<br>
+ +**33. Generalized Linear Models** + +⟶ Genelleştirilmiş Lineer Modeller + +
+ +**34. Exponential family ― A class of distributions is said to be in the exponential family if it can be written in terms of a natural parameter, also called the canonical parameter or link function, η, a sufficient statistic T(y) and a log-partition function a(η) as follows:** + +⟶ Üstel aile - Eğer kanonik parametre veya bağlantı fonksiyonu olarak adlandırılan doğal bir parametre η, yeterli bir istatistik T (y) ve aşağıdaki gibi bir log-partition fonksiyonu a (η) şeklinde yazılabilirse, dağılım sınıfının üstel ailede olduğu söylenir: + +
+
+**35. Remark: we will often have T(y)=y. Also, exp(−a(η)) can be seen as a normalization parameter that will make sure that the probabilities sum to one.**
+
+⟶ Not: Sıklıkla T(y)=y olur. Ayrıca exp(−a(η)), olasılıkların toplamının bir olmasını sağlayan bir normalleştirme parametresi olarak görülebilir.
+
+<br>
+ +**36. Here are the most common exponential distributions summed up in the following table:** + +⟶ Aşağıdaki tabloda özetlenen en yaygın üstel dağılımlar: + +
+ +**37. [Distribution, Bernoulli, Gaussian, Poisson, Geometric]** + +⟶ [Dağılım, Bernoulli, Gauss, Poisson, Geometrik] + +
+
+**38. Assumptions of GLMs ― Generalized Linear Models (GLM) aim at predicting a random variable y as a function of x∈Rn+1 and rely on the following 3 assumptions:**
+
+⟶ Genelleştirilmiş Lineer Modellerin (Generalized Linear Models-GLM) Varsayımları ― Genelleştirilmiş Lineer Modeller, rastgele bir y değişkenini x∈Rn+1'in bir fonksiyonu olarak tahmin etmeyi amaçlar ve aşağıdaki 3 varsayıma dayanır:
+
+<br>
+ +**39. Remark: ordinary least squares and logistic regression are special cases of generalized linear models.** + +⟶ Not: sıradan en küçük kareler ve lojistik regresyon, genelleştirilmiş doğrusal modellerin özel durumlarıdır. + +
+ +**40. Support Vector Machines** + +⟶ Destek Vektör Makineleri + +
+
+**41: The goal of support vector machines is to find the line that maximizes the minimum distance to the line.**
+
+⟶ Destek Vektör Makinelerinin amacı, doğruya olan minimum mesafeyi maksimize eden doğruyu bulmaktır.
+
+<br>
+ +**42: Optimal margin classifier ― The optimal margin classifier h is such that:** + +⟶ Optimal marj sınıflandırıcısı - h optimal marj sınıflandırıcısı şöyledir: + +
+ +**43: where (w,b)∈Rn×R is the solution of the following optimization problem:** + +⟶ burada (w,b)∈Rn×R, aşağıdaki optimizasyon probleminin çözümüdür: + +
+ +**44. such that** + +⟶ öyle ki + +
+ +**45. support vectors** + +⟶ destek vektörleri + +
+ +**46. Remark: the line is defined as wTx−b=0.** + +⟶ Not: doğru wTx−b=0 şeklinde tanımlanır. + +
+ +**47. Hinge loss ― The hinge loss is used in the setting of SVMs and is defined as follows:** + +⟶ Menteşe yitimi (kaybı) - Menteşe yitimi Destek Vektör Makinelerinin ayarlarında kullanılır ve aşağıdaki gibi tanımlanır: + +
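+
+A small sketch of the hinge loss, assuming labels y∈{−1,+1} and raw scores z:
+
+```python
+import numpy as np
+
+def hinge_loss(z, y):
+    # L(z, y) = max(0, 1 - y*z)
+    return np.maximum(0.0, 1.0 - y * z)
+
+print(hinge_loss(np.array([2.0, 0.3, -1.0]), np.array([1, 1, 1])))  # [0.  0.7 2. ]
+```
+
+<br>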
+
+**48. Kernel ― Given a feature mapping ϕ, we define the kernel K to be defined as:**
+
+⟶ Çekirdek ― ϕ gibi bir özellik haritası verildiğinde, K çekirdeğini aşağıdaki gibi tanımlarız:
+
+<br>
+**49. In practice, the kernel K defined by K(x,z)=exp(−||x−z||²/(2σ²)) is called the Gaussian kernel and is commonly used.**
+
+⟶ Uygulamada, K(x,z)=exp(−||x−z||²/(2σ²)) ile tanımlanan K çekirdeği Gauss çekirdeği olarak adlandırılır ve yaygın olarak kullanılır.
+
+<br>
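+
+An illustrative implementation of this kernel; the bandwidth σ=1 is an arbitrary choice:
+
+```python
+import numpy as np
+
+def gaussian_kernel(x, z, sigma=1.0):
+    # K(x, z) = exp(-||x - z||^2 / (2 * sigma^2))
+    return np.exp(-np.sum((x - z) ** 2) / (2 * sigma ** 2))
+
+print(gaussian_kernel(np.array([0.0, 0.0]), np.array([1.0, 1.0])))  # exp(-1) ≈ 0.368
+```
+
+<br>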
+ +**50. [Non-linear separability, Use of a kernel mapping, Decision boundary in the original space]** + +⟶ [Lineer olmayan ayrılabilirlik, Çekirdek Haritalamının Kullanımı, Orjinal uzayda karar sınırı] + +
+ +**51. Remark: we say that we use the "kernel trick" to compute the cost function using the kernel because we actually don't need to know the explicit mapping ϕ, which is often very complicated. Instead, only the values K(x,z) are needed.** + +⟶ Not: Çekirdeği kullanarak maliyet fonksiyonunu hesaplamak için "çekirdek numarası" nı kullandığımızı söylüyoruz çünkü genellikle çok karmaşık olan ϕ açık haritalamasını bilmeye gerek yok. Bunun yerine, yalnızca K(x,z) değerlerine ihtiyacımız vardır. + +
+ +**52. Lagrangian ― We define the Lagrangian L(w,b) as follows:** + +⟶ Lagranj - Lagranj L(w,b) şeklinde şöyle tanımlanır: + +
+ +**53. Remark: the coefficients βi are called the Lagrange multipliers.** + +⟶ Not: βi katsayılarına Lagranj çarpanları denir. + +
+ +**54. Generative Learning** + +⟶ Üretici Öğrenme + +
+
+**55. A generative model first tries to learn how the data is generated by estimating P(x|y), which we can then use to estimate P(y|x) by using Bayes' rule.**
+
+⟶ Üretici bir model, önce P(x|y)'yi tahmin ederek verinin nasıl üretildiğini öğrenmeye çalışır; daha sonra bu tahmin, Bayes kuralı kullanılarak P(y|x)'i kestirmek için kullanılabilir.
+
+<br>
+ +**56. Gaussian Discriminant Analysis** + +⟶ Gauss Diskriminant (Ayırtaç) Analizi + +
+ +**57. Setting ― The Gaussian Discriminant Analysis assumes that y and x|y=0 and x|y=1 are such that:** + +⟶ Yöntem - Gauss Diskriminant Analizi y ve x|y=0 ve x|y=1 'in şu şekilde olduğunu varsayar: + +
+ +**58. Estimation ― The following table sums up the estimates that we find when maximizing the likelihood:** + +⟶ Tahmin - Aşağıdaki tablo, olasılığı en üst düzeye çıkarırken bulduğumuz tahminleri özetlemektedir: + +
+ +**59. Naive Bayes** + +⟶ Naive Bayes + +
+ +**60. Assumption ― The Naive Bayes model supposes that the features of each data point are all independent:** + +⟶ Varsayım - Naive Bayes modeli, her veri noktasının özelliklerinin tamamen bağımsız olduğunu varsayar: + +
+
+**61. Solutions ― Maximizing the log-likelihood gives the following solutions, with k∈{0,1},l∈[[1,L]]**
+
+⟶ Çözümler ― Log-olabilirliğin maksimize edilmesi, k∈{0,1},l∈[[1,L]] olmak üzere aşağıdaki çözümleri verir:
+
+<br>
+ +**62. Remark: Naive Bayes is widely used for text classification and spam detection.** + +⟶ Not: Naive Bayes, metin sınıflandırması ve spam tespitinde yaygın olarak kullanılır. + +
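+
+A toy sketch of the independence assumption in action, with made-up binary features and Laplace (+1) smoothing added for numerical stability (the smoothing is a common practical choice, not part of the statement above):
+
+```python
+import numpy as np
+
+X = np.array([[1, 1, 0], [1, 0, 0], [0, 1, 1], [0, 0, 1]])  # e.g. word presence
+y = np.array([1, 1, 0, 0])                                   # e.g. spam / not spam
+
+phi_y = y.mean()                                             # P(y=1)
+phi_j1 = (X[y == 1].sum(axis=0) + 1) / (np.sum(y == 1) + 2)  # P(x_j=1 | y=1), smoothed
+phi_j0 = (X[y == 0].sum(axis=0) + 1) / (np.sum(y == 0) + 2)  # P(x_j=1 | y=0), smoothed
+
+def predict(x):
+    # Compare log P(y=k) + sum_j log P(x_j | y=k) under the independence assumption
+    lp1 = np.log(phi_y) + np.sum(x * np.log(phi_j1) + (1 - x) * np.log(1 - phi_j1))
+    lp0 = np.log(1 - phi_y) + np.sum(x * np.log(phi_j0) + (1 - x) * np.log(1 - phi_j0))
+    return int(lp1 > lp0)
+
+print(predict(np.array([1, 1, 0])))  # most likely class for a new point
+```
+
+<br>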
+ +**63. Tree-based and ensemble methods** + +⟶ Ağaç temelli ve topluluk yöntemleri + +
+ +**64. These methods can be used for both regression and classification problems.** + +⟶ Bu yöntemler hem regresyon hem de sınıflandırma problemleri için kullanılabilir. + +
+
+**65. CART ― Classification and Regression Trees (CART), commonly known as decision trees, can be represented as binary trees. They have the advantage to be very interpretable.**
+
+⟶ CART ― Genellikle karar ağaçları olarak bilinen Sınıflandırma ve Regresyon Ağaçları (Classification and Regression Trees (CART)), ikili ağaçlar olarak temsil edilebilir. Oldukça yorumlanabilir olma avantajına sahiptirler.
+
+<br>
+
+**66. Random forest ― It is a tree-based technique that uses a high number of decision trees built out of randomly selected sets of features. Contrary to the simple decision tree, it is highly uninterpretable but its generally good performance makes it a popular algorithm.**
+
+⟶ Rastgele orman ― Rastgele seçilen özelliklerden oluşan çok sayıda karar ağacı kullanan ağaç tabanlı bir tekniktir. Basit karar ağacının tersine, oldukça yorumlanamaz bir yapıdadır ancak genel olarak iyi performansı onu popüler bir algoritma yapar.
+
+<br>
+ +**67. Remark: random forests are a type of ensemble methods.** + +⟶ Not: Rastgele ormanlar topluluk yöntemlerindendir. + +
+ +**68. Boosting ― The idea of boosting methods is to combine several weak learners to form a stronger one. The main ones are summed up in the table below:** + +⟶ Artırım - Artırım yöntemlerinin temel fikri bazı zayıf öğrenicileri biraraya getirerek güçlü bir öğrenici oluşturmaktır. Temel yöntemler aşağıdaki tabloda özetlenmiştir: + +
+ +**69. [Adaptive boosting, Gradient boosting]** + +⟶ [Adaptif artırma, Gradyan artırma] + +
+
+**70. High weights are put on errors to improve at the next boosting step**
+
+⟶ Bir sonraki artırma adımında iyileştirme sağlamak için hatalara yüksek ağırlıklar verilir.
+
+<br>
+ +**71. Weak learners trained on remaining errors** + +⟶ Zayıf öğreniciler kalan hatalar üzerinde eğitildi + +
+ +**72. Other non-parametric approaches** + +⟶ Diğer parametrik olmayan yaklaşımlar + +
+ +**73. k-nearest neighbors ― The k-nearest neighbors algorithm, commonly known as k-NN, is a non-parametric approach where the response of a data point is determined by the nature of its k neighbors from the training set. It can be used in both classification and regression settings.** + +⟶ k-en yakın komşular - genellikle k-NN olarak adlandırılan k- en yakın komşular algoritması, bir veri noktasının tepkisi eğitim kümesindeki kendi k komşularının doğası ile belirlenen parametrik olmayan bir yaklaşımdır. Hem sınıflandırma hem de regresyon yöntemleri için kullanılabilir. + +
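+
+A small sketch of the classification variant, assuming Euclidean distance and k=3 (both arbitrary choices here):
+
+```python
+import numpy as np
+
+def knn_predict(X_train, y_train, x, k=3):
+    # Majority vote among the k training points closest to x
+    dists = np.linalg.norm(X_train - x, axis=1)
+    nearest = np.argsort(dists)[:k]
+    return np.argmax(np.bincount(y_train[nearest]))
+
+X_train = np.array([[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [0.9, 1.1]])
+y_train = np.array([0, 0, 1, 1])
+print(knn_predict(X_train, y_train, np.array([0.8, 0.9])))  # -> 1
+```
+
+<br>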
+
+**74. Remark: The higher the parameter k, the higher the bias, and the lower the parameter k, the higher the variance.**
+
+⟶ Not: k parametresi ne kadar yüksekse yanlılık o kadar yüksek, k parametresi ne kadar düşükse varyans o kadar yüksek olur.
+
+<br>
+ +**75. Learning Theory** + +⟶ Öğrenme Teorisi + +
+ +**76. Union bound ― Let A1,...,Ak be k events. We have:** + +⟶ Birleşim sınırı - A1,...,Ak k olayları olsun. Sahip olduklarımız: + +
+
+**77. Hoeffding inequality ― Let Z1,..,Zm be m iid variables drawn from a Bernoulli distribution of parameter ϕ. Let ˆϕ be their sample mean and γ>0 fixed. We have:**
+
+⟶ Hoeffding eşitsizliği ― Z1,..,Zm, ϕ parametreli bir Bernoulli dağılımından çekilen m adet bağımsız ve özdeş dağılımlı (iid) değişken olsun. ˆϕ bunların örneklem ortalaması ve γ>0 sabit olsun. Sahip olduklarımız:
+
+<br>
+ +**78. Remark: this inequality is also known as the Chernoff bound.** + +⟶ Not: Bu eşitsizlik, Chernoff sınırı olarak da bilinir. + +
+ +**79. Training error ― For a given classifier h, we define the training error ˆϵ(h), also known as the empirical risk or empirical error, to be as follows:** + +⟶ Eğitim hatası - Belirli bir h sınıflandırıcısı için, ampirik risk veya ampirik hata olarak da bilinen eğitim hatasını ˆϵ (h) şöyle tanımlarız: + +
+ +**80. Probably Approximately Correct (PAC) ― PAC is a framework under which numerous results on learning theory were proved, and has the following set of assumptions: ** + +⟶ Olası Yaklaşık Doğru (Probably Approximately Correct (PAC)) ― PAC, öğrenme teorisi üzerine sayısız sonuçların kanıtlandığı ve aşağıdaki varsayımlara sahip olan bir çerçevedir: +
+ + +**81: the training and testing sets follow the same distribution ** + +⟶ eğitim ve test kümeleri aynı dağılımı takip ediyor + +
+
+**82. the training examples are drawn independently**
+
+⟶ eğitim örnekleri birbirinden bağımsız olarak örneklenir
+
+<br>
+
+**83. Shattering ― Given a set S={x(1),...,x(d)}, and a set of classifiers H, we say that H shatters S if for any set of labels {y(1),...,y(d)}, we have:**
+
+⟶ Parçalanma ― S={x(1),...,x(d)} kümesi ve H sınıflandırıcılar kümesi verildiğinde, herhangi bir {y(1),...,y(d)} etiket kümesi için aşağıdaki koşul sağlanıyorsa H'nin S'yi parçaladığını söyleriz:
+
+<br>
+ +**84. Upper bound theorem ― Let H be a finite hypothesis class such that |H|=k and let δ and the sample size m be fixed. Then, with probability of at least 1−δ, we have:** + +⟶ Üst sınır teoremi ― |H|=k , δ ve örneklem sayısı m'nin sabit olduğu sonlu bir hipotez sınıfı H olsun. Ardından, en az 1−δ olasılığı ile elimizde: + +
+ +**85. VC dimension ― The Vapnik-Chervonenkis (VC) dimension of a given infinite hypothesis class H, noted VC(H) is the size of the largest set that is shattered by H.** + +⟶ VC boyutu ― VC(H) olarak ifade edilen belirli bir sonsuz H hipotez sınıfının Vapnik-Chervonenkis (VC) boyutu, H tarafından parçalanan en büyük kümenin boyutudur. + +
+ +**86. Remark: the VC dimension of H={set of linear classifiers in 2 dimensions} is 3.** + +⟶ Not: H = {2 boyutta doğrusal sınıflandırıcılar kümesi}'nin VC boyutu 3'tür. + +
+ +**87. Theorem (Vapnik) ― Let H be given, with VC(H)=d and m the number of training examples. With probability at least 1−δ, we have:** + +⟶ Teorem (Vapnik) - H, VC(H)=d ve eğitim örneği sayısı m verilmiş olsun. En az 1−δ olasılığı ile, sahip olduklarımız: + +
+ +**88. [Introduction, Type of prediction, Type of model]** + +⟶ [Giriş, Tahmin türü, Model türü] + +
+
+**89. [Notations and general concepts, loss function, gradient descent, likelihood]**
+
+⟶ [Notasyonlar ve genel kavramlar, kayıp fonksiyonu, bayır inişi, olabilirlik]
+
+<br>
+ +**90. [Linear models, linear regression, logistic regression, generalized linear models]** + +⟶ [Lineer modeller, Lineer regresyon, lojistik regresyon, genelleştirilmiş lineer modeller] + +
+ +**91. [Support vector machines, Optimal margin classifier, Hinge loss, Kernel]** + +⟶ [Destek vektör makineleri, optimal marj sınıflandırıcı, Menteşe yitimi, Çekirdek] + +
+ +**92. [Generative learning, Gaussian Discriminant Analysis, Naive Bayes]** + +⟶ [Üretici öğrenme, Gauss Diskriminant Analizi, Naive Bayes] + +
+
+**93. [Trees and ensemble methods, CART, Random forest, Boosting]**
+
+⟶ [Ağaçlar ve topluluk yöntemleri, CART, Rastgele orman, Artırma]
+
+<br>
+ +**94. [Other methods, k-NN]** + +⟶ [Diğer yöntemler, k-NN] + +
+ +**95. [Learning theory, Hoeffding inequality, PAC, VC dimension]** + +⟶ [Öğrenme teorisi, Hoeffding eşitsizliği, PAC, VC boyutu] diff --git a/tr/cs-229-unsupervised-learning.md b/tr/cs-229-unsupervised-learning.md new file mode 100644 index 000000000..c6392c414 --- /dev/null +++ b/tr/cs-229-unsupervised-learning.md @@ -0,0 +1,340 @@ +**1. Unsupervised Learning cheatsheet** + +⟶ Gözetimsiz Öğrenme El Kitabı + +
+ +**2. Introduction to Unsupervised Learning** + +⟶ Gözetimsiz Öğrenmeye Giriş + +
+ +**3. Motivation ― The goal of unsupervised learning is to find hidden patterns in unlabeled data {x(1),...,x(m)}.** + +⟶ Motivasyon ― Gözetimsiz öğrenmenin amacı etiketlenmemiş verilerdeki gizli örüntüleri bulmaktır {x (1), ..., x (m)}. + +
+ +**4. Jensen's inequality ― Let f be a convex function and X a random variable. We have the following inequality:** + +⟶ Jensen eşitsizliği - f bir konveks fonksiyon ve X bir rastgele değişken olsun. Aşağıdaki eşitsizliklerimiz: + +
+ +**5. Clustering** + +⟶ Kümeleme + +
+ +**6. Expectation-Maximization** + +⟶ Beklenti-Ençoklama (Maksimizasyon) + +
+ +**7. Latent variables ― Latent variables are hidden/unobserved variables that make estimation problems difficult, and are often denoted z. Here are the most common settings where there are latent variables:** + +⟶ Gizli değişkenler - Gizli değişkenler, tahmin problemlerini zorlaştıran ve çoğunlukla z olarak adlandırılan gizli / gözlemlenmemiş değişkenlerdir. Gizli değişkenlerin bulunduğu yerlerdeki en yaygın ayarlar şöyledir: + +
+
+**8. [Setting, Latent variable z, Comments]**
+
+⟶ [Yöntem, Gizli değişken z, Açıklamalar]
+
+<br>
+ +**9. [Mixture of k Gaussians, Factor analysis]** + +⟶ [K Gaussianların birleşimi, Faktör analizi] + +
+ +**10. Algorithm ― The Expectation-Maximization (EM) algorithm gives an efficient method at estimating the parameter θ through maximum likelihood estimation by repeatedly constructing a lower-bound on the likelihood (E-step) and optimizing that lower bound (M-step) as follows:** + +⟶ Algoritma - Beklenti-Ençoklama (Maksimizasyon) (BE) algoritması, θ parametresinin maksimum olabilirlik kestirimiyle tahmin edilmesinde, olasılığa ard arda alt sınırlar oluşturan (E-adımı) ve bu alt sınırın (M-adımı) aşağıdaki gibi optimize edildiği etkin bir yöntem sunar: + +
+ +**11. E-step: Evaluate the posterior probability Qi(z(i)) that each data point x(i) came from a particular cluster z(i) as follows:** + +⟶ E-adımı: Her bir veri noktasının x(i)'in belirli bir kümeden z(i) geldiğinin sonsal olasılık değerinin Qi(z(i)) hesaplanması aşağıdaki gibidir: + +
+ +**12. M-step: Use the posterior probabilities Qi(z(i)) as cluster specific weights on data points x(i) to separately re-estimate each cluster model as follows:** + +⟶ M-adımı: Her bir küme modelini ayrı ayrı yeniden tahmin etmek için x(i) veri noktalarındaki kümeye özgü ağırlıklar olarak Qi(z(i)) sonsal olasılıklarının kullanımı aşağıdaki gibidir: + +
+ +**13. [Gaussians initialization, Expectation step, Maximization step, Convergence]** + +⟶ [Gauss ilklendirme, Beklenti adımı, Maksimizasyon adımı, Yakınsaklık] + +
+ +**14. k-means clustering** + +⟶ k-ortalamalar (k-means) kümeleme + +
+ +**15. We note c(i) the cluster of data point i and μj the center of cluster j.** + +⟶ C(i), i veri noktasının bulunduğu küme olmak üzere, μj j kümesinin merkez noktasıdır. + +
+ +**16. Algorithm ― After randomly initializing the cluster centroids μ1,μ2,...,μk∈Rn, the k-means algorithm repeats the following step until convergence:** + +⟶ Algoritma - Küme ortalamaları μ1, μ2, ..., μk∈Rn rasgele olarak başlatıldıktan sonra, k-ortalamalar algoritması yakınsayana kadar aşağıdaki adımı tekrar eder: + +
+ +**17. [Means initialization, Cluster assignment, Means update, Convergence]** + +⟶ [Başlangıç ortalaması, Küme Tanımlama, Ortalama Güncelleme, Yakınsama] + +
+ +**18. Distortion function ― In order to see if the algorithm converges, we look at the distortion function defined as follows:** + +⟶ Bozulma fonksiyonu - Algoritmanın yakınsadığını görmek için aşağıdaki gibi tanımlanan bozulma fonksiyonuna bakarız: + +
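+
+A compact sketch of the algorithm together with the distortion function J; it assumes no cluster ever becomes empty, and the toy data are made up:
+
+```python
+import numpy as np
+
+def kmeans(X, k, n_iter=50, seed=0):
+    rng = np.random.default_rng(seed)
+    mu = X[rng.choice(len(X), k, replace=False)]               # random centroid init
+    for _ in range(n_iter):
+        # Cluster assignment step: closest centroid for every point
+        c = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(-1).argmin(axis=1)
+        # Means update step
+        mu = np.array([X[c == j].mean(axis=0) for j in range(k)])
+    c = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(-1).argmin(axis=1)
+    distortion = ((X - mu[c]) ** 2).sum()                      # J(c, mu) from above
+    return c, mu, distortion
+
+rng = np.random.default_rng(3)
+X = np.vstack([rng.normal(0.0, 0.1, (20, 2)), rng.normal(1.0, 0.1, (20, 2))])
+print(kmeans(X, k=2)[2])   # distortion stays small for well-separated blobs
+```
+
+<br>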
+ +**19. Hierarchical clustering** + +⟶ Hiyerarşik kümeleme + +
+ +**20. Algorithm ― It is a clustering algorithm with an agglomerative hierarchical approach that build nested clusters in a successive manner.** + +⟶ Algoritma - Ardışık olarak iç içe geçmiş kümelerden oluşturan hiyerarşik bir yaklaşıma sahip bir kümeleme algoritmasıdır. + +
+ +**21. Types ― There are different sorts of hierarchical clustering algorithms that aims at optimizing different objective functions, which is summed up in the table below:** + +⟶ Türler - Aşağıdaki tabloda özetlenen farklı amaç fonksiyonlarını optimize etmeyi amaçlayan farklı hiyerarşik kümeleme algoritmaları vardır: + +
+ +**22. [Ward linkage, Average linkage, Complete linkage]** + +⟶ [Ward bağlantı, Ortalama bağlantı, Tam bağlantı] + +
+ +**23. [Minimize within cluster distance, Minimize average distance between cluster pairs, Minimize maximum distance of between cluster pairs]** + +⟶ [Küme mesafesi içinde minimize edin, Küme çiftleri arasındaki ortalama uzaklığı en aza indirin, Küme çiftleri arasındaki maksimum uzaklığı en aza indirin] + +
+ +**24. Clustering assessment metrics** + +⟶ Kümeleme değerlendirme metrikleri + +
+ +**25. In an unsupervised learning setting, it is often hard to assess the performance of a model since we don't have the ground truth labels as was the case in the supervised learning setting.** + +⟶ Gözetimsiz bir öğrenme ortamında, bir modelin performansını değerlendirmek çoğu zaman zordur, çünkü gözetimli öğrenme ortamında olduğu gibi, gerçek referans etiketlere sahip değiliz. + +
+ +**26. Silhouette coefficient ― By noting a and b the mean distance between a sample and all other points in the same class, and between a sample and all other points in the next nearest cluster, the silhouette coefficient s for a single sample is defined as follows:** + +⟶ Siluet katsayısı - Bir örnek ile aynı sınıftaki diğer tüm noktalar arasındaki ortalama mesafeyi ve bir örnek ile bir sonraki en yakın kümedeki diğer tüm noktalar arasındaki ortalama mesafeyi not ederek, tek bir örnek için siluet katsayısı aşağıdaki gibi tanımlanır: + +
+ +**27. Calinski-Harabaz index ― By noting k the number of clusters, Bk and Wk the between and within-clustering dispersion matrices respectively defined as** + +⟶ Calinski-Harabaz indeksi - k kümelerin sayısını belirtmek üzere Bk ve Wk sırasıyla, kümeler arası ve küme içi dağılım matrisleri olarak aşağıdaki gibi tanımlanır + +
+ +**28. the Calinski-Harabaz index s(k) indicates how well a clustering model defines its clusters, such that the higher the score, the more dense and well separated the clusters are. It is defined as follows:** + +⟶ Calinski-Harabaz indeksi s(k), kümelenme modelinin kümeleri ne kadar iyi tanımladığını gösterir, böylece skor ne kadar yüksek olursa, kümeler daha yoğun ve iyi ayrılır. Aşağıdaki şekilde tanımlanmıştır: + +
+ +**29. Dimension reduction** + +⟶ Boyut küçültme + +
+ +**30. Principal component analysis** + +⟶ Temel bileşenler analizi + +
+
+**31. It is a dimension reduction technique that finds the variance maximizing directions onto which to project the data.**
+
+⟶ Verinin üzerine yansıtılacağı, varyansı maksimize eden yönleri bulan bir boyut küçültme tekniğidir.
+
+<br>
+ +**32. Eigenvalue, eigenvector ― Given a matrix A∈Rn×n, λ is said to be an eigenvalue of A if there exists a vector z∈Rn∖{0}, called eigenvector, such that we have:** + +⟶ Özdeğer, özvektör - Bir matris A∈Rn×n verildiğinde λ'nın, özvektör olarak adlandırılan bir vektör z∈Rn∖{0} varsa, A'nın bir özdeğeri olduğu söylenir: + +
+
+**33. Spectral theorem ― Let A∈Rn×n. If A is symmetric, then A is diagonalizable by a real orthogonal matrix U∈Rn×n. By noting Λ=diag(λ1,...,λn), we have:**
+
+⟶ Spektral teorem ― A∈Rn×n olsun. Eğer A simetrikse, A gerçek bir ortogonal matris U∈Rn×n ile köşegenleştirilebilir. Λ=diag(λ1,...,λn) olmak üzere, elimizde:
+
+<br>
+ +**34. diagonal** + +⟶ diyagonal + +
+ +**35. Remark: the eigenvector associated with the largest eigenvalue is called principal eigenvector of matrix A.** + +⟶ Not: En büyük özdeğere sahip özvektör, matris A'nın temel özvektörü olarak adlandırılır. + +
+ +**36. Algorithm ― The Principal Component Analysis (PCA) procedure is a dimension reduction technique that projects the data on k +dimensions by maximizing the variance of the data as follows:** + +⟶ Algoritma - Temel Bileşen Analizi (TBA) yöntemi, verilerin aşağıdaki gibi varyansı en üst düzeye çıkararak veriyi k boyutlarına yansıtan bir boyut azaltma tekniğidir: + +
+ +**37. Step 1: Normalize the data to have a mean of 0 and standard deviation of 1.** + +⟶ Adım 1: Verileri ortalama 0 ve standart sapma 1 olacak şekilde normalleştirin. + +
+ +**38. Step 2: Compute Σ=1mm∑i=1x(i)x(i)T∈Rn×n, which is symmetric with real eigenvalues.** + +⟶ Adım 2: Gerçek özdeğerler ile simetrik olan Σ=1mm∑i=1x(i)x(i)T∈Rn×n hesaplayın. + +
+
+**39. Step 3: Compute u1,...,uk∈Rn the k orthogonal principal eigenvectors of Σ, i.e. the orthogonal eigenvectors of the k largest eigenvalues.**
+
+⟶ Adım 3: Σ'nın k adet ortogonal ana özvektörü u1,...,uk∈Rn'yi, yani en büyük k özdeğerin ortogonal özvektörlerini hesaplayın.
+
+<br>
+
+**40. Step 4: Project the data on spanR(u1,...,uk).**
+
+⟶ Adım 4: Veriyi spanR(u1,...,uk) üzerine yansıtın.
+
+<br>
+ +**41. This procedure maximizes the variance among all k-dimensional spaces.** + +⟶ Bu yöntem tüm k-boyutlu uzaylar arasındaki varyansı en üst düzeye çıkarır. + +
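+
+The four steps above as a short NumPy sketch, assuming no feature has zero variance:
+
+```python
+import numpy as np
+
+def pca(X, k):
+    # Step 1: normalize each feature to mean 0 and standard deviation 1
+    Xn = (X - X.mean(axis=0)) / X.std(axis=0)
+    # Step 2: symmetric matrix with real eigenvalues
+    sigma = Xn.T @ Xn / len(Xn)
+    # Step 3: orthogonal eigenvectors of the k largest eigenvalues
+    vals, vecs = np.linalg.eigh(sigma)        # eigh returns ascending eigenvalues
+    U = vecs[:, np.argsort(vals)[::-1][:k]]
+    # Step 4: project the data on span(u1, ..., uk)
+    return Xn @ U
+```
+
+<br>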
+ +**42. [Data in feature space, Find principal components, Data in principal components space]** + +⟶ [Öznitelik uzayındaki veri, Temel bileşenleri bul, Temel bileşenler uzayındaki veri] + +
+ +**43. Independent component analysis** + +⟶ Bağımsız bileşen analizi + +
+ +**44. It is a technique meant to find the underlying generating sources.** + +⟶ Temel oluşturan kaynakları bulmak için kullanılan bir tekniktir. + +
+ +**45. Assumptions ― We assume that our data x has been generated by the n-dimensional source vector s=(s1,...,sn), where si are independent random variables, via a mixing and non-singular matrix A as follows:** + +⟶ Varsayımlar - Verilerin x'in n boyutlu kaynak vektörü s=(s1, ..., sn) tarafından üretildiğini varsayıyoruz, burada si bağımsız rasgele değişkenler, bir karışım ve tekil olmayan bir matris A ile aşağıdaki gibi: + +
+
+**46. The goal is to find the unmixing matrix W=A−1.**
+
+⟶ Amaç, karışımı çözen (unmixing) W=A−1 matrisini bulmaktır.
+
+<br>
+
+**47. Bell and Sejnowski ICA algorithm ― This algorithm finds the unmixing matrix W by following the steps below:**
+
+⟶ Bell ve Sejnowski ICA algoritması ― Bu algoritma, aşağıdaki adımları izleyerek karışımı çözen W matrisini bulur:
+
+<br>
+ +**48. Write the probability of x=As=W−1s as:** + +⟶ X=As=W−1s olasılığını aşağıdaki gibi yazınız: + +
+ +**49. Write the log likelihood given our training data {x(i),i∈[[1,m]]} and by noting g the sigmoid function as:** + +⟶ Eğitim verisi {x(i),i∈[[1, m]]} ve g sigmoid fonksiyonunu not ederek log olasılığını yazınız: + +
+ +**50. Therefore, the stochastic gradient ascent learning rule is such that for each training example x(i), we update W as follows:** + +⟶ Bu nedenle, rassal (stokastik) eğim yükselme öğrenme kuralı, her bir eğitim örneği için x(i), W'yi aşağıdaki gibi güncelleştiririz: + +
+ +**51. The Machine Learning cheatsheets are now available in Turkish.** + +⟶ Makine Öğrenmesi El Kitabı artık Türkçe dilinde mevcuttur. + +
+
+**52. Original authors**
+
+⟶ Orijinal yazarlar
+
+<br>
+
+**53. Translated by X, Y and Z**
+
+⟶ X, Y ve Z tarafından çevrilmiştir.
+
+<br>
+
+**54. Reviewed by X, Y and Z**
+
+⟶ X, Y ve Z tarafından gözden geçirilmiştir.
+
+<br>
+ +**55. [Introduction, Motivation, Jensen's inequality]** + +⟶ [Giriş, Motivasyon, Jensen'in eşitsizliği] + +
+ +**56. [Clustering, Expectation-Maximization, k-means, Hierarchical clustering, Metrics]** + +⟶ [Kümeleme, Beklenti-Ençoklama (Maksimizasyon), k-ortalamalar, Hiyerarşik kümeleme, Metrikler] + +
+ +**57. [Dimension reduction, PCA, ICA]** + +⟶ [Boyut küçültme, TBA(PCA), BBA(ICA)] diff --git a/tr/cs-230-convolutional-neural-networks.md b/tr/cs-230-convolutional-neural-networks.md new file mode 100644 index 000000000..e1fd03e51 --- /dev/null +++ b/tr/cs-230-convolutional-neural-networks.md @@ -0,0 +1,712 @@ +**1. Convolutional Neural Networks cheatsheet** + +⟶ Evrişimli Sinir Ağları el kitabı + +
+ + +**2. CS 230 - Deep Learning** + +⟶ CS 230 - Derin Öğrenme + +
+ + +**3. [Overview, Architecture structure]** + +⟶ [Genel bakış, Mimari yapı] + +
+ + +**4. [Types of layer, Convolution, Pooling, Fully connected]** + +⟶ [Katman tipleri, Evrişim, Ortaklama, Tam bağlantı] + +
+ + +**5. [Filter hyperparameters, Dimensions, Stride, Padding]** + +⟶ [Filtre hiperparametreleri, Boyut, Adım aralığı/Adım kaydırma, Ekleme/Doldurma] + +
+ + +**6. [Tuning hyperparameters, Parameter compatibility, Model complexity, Receptive field]** + +⟶ [Hiperparametrelerin ayarlanması, Parametre uyumluluğu, Model karmaşıklığı, Receptive field] + +
+ + +**7. [Activation functions, Rectified Linear Unit, Softmax]** + +⟶ [Aktivasyon fonksiyonları, Düzeltilmiş Doğrusal Birim, Softmax] + +
+ + +**8. [Object detection, Types of models, Detection, Intersection over Union, Non-max suppression, YOLO, R-CNN]** + +⟶ [Nesne algılama, Model tipleri, Algılama, Kesiştirilmiş Bölgeler, Maksimum olmayan bastırma, YOLO, R-CNN] + +
+ + +**9. [Face verification/recognition, One shot learning, Siamese network, Triplet loss]** + +⟶ [Yüz doğrulama/tanıma, Tek atış öğrenme, Siamese ağ, Üçlü yitim/kayıp] + +
+ + +**10. [Neural style transfer, Activation, Style matrix, Style/content cost function]** + +⟶ [Sinirsel stil aktarımı, Aktivasyon, Stil matrisi, Stil/içerik maliyet fonksiyonu] + +
+ + +**11. [Computational trick architectures, Generative Adversarial Net, ResNet, Inception Network]** + +⟶ [İşlemsel püf nokta mimarileri, Çekişmeli Üretici Ağ, ResNet, Inception Ağı] + +
+ + +**12. Overview** + +⟶ Genel bakış + +
+ + +**13. Architecture of a traditional CNN ― Convolutional neural networks, also known as CNNs, are a specific type of neural networks that are generally composed of the following layers:** + +⟶ Geleneksel bir CNN (Evrişimli Sinir Ağı) mimarisi - CNN'ler olarak da bilinen evrişimli sinir ağları, genellikle aşağıdaki katmanlardan oluşan belirli bir tür sinir ağıdır: + +
+ + +**14. The convolution layer and the pooling layer can be fine-tuned with respect to hyperparameters that are described in the next sections.** + +⟶ Evrişim katmanı ve ortaklama katmanı, sonraki bölümlerde açıklanan hiperparametreler ile ince ayar (fine-tuned) yapılabilir. + +
+ + +**15. Types of layer** + +⟶ Katman tipleri + +
+ + +**16. Convolution layer (CONV) ― The convolution layer (CONV) uses filters that perform convolution operations as it is scanning the input I with respect to its dimensions. Its hyperparameters include the filter size F and stride S. The resulting output O is called feature map or activation map.** + +⟶ Evrişim katmanı (CONV) ― Evrişim katmanı (CONV) evrişim işlemlerini gerçekleştiren filtreleri, I girişini boyutlarına göre tararken kullanır. Hiperparametreleri F filtre boyutunu ve S adımını içerir. Elde edilen çıktı O, öznitelik haritası veya aktivasyon haritası olarak adlandırılır. + +
+ + +**17. Remark: the convolution step can be generalized to the 1D and 3D cases as well.** + +⟶ Not: evrişim adımı, 1B ve 3B durumlarda da genelleştirilebilir (B: boyut). + +
+
+
+**18. Pooling (POOL) ― The pooling layer (POOL) is a downsampling operation, typically applied after a convolution layer, which does some spatial invariance. In particular, max and average pooling are special kinds of pooling where the maximum and average value is taken, respectively.**
+
+⟶ Ortaklama (POOL) ― Ortaklama katmanı (POOL), tipik olarak bir evrişim katmanından sonra uygulanan ve bir miktar uzamsal değişmezlik sağlayan bir aşağı örnekleme (downsampling) işlemidir. Özellikle, maksimum ve ortalama ortaklama, sırasıyla maksimum ve ortalama değerin alındığı özel ortaklama türleridir.
+
+<br>
+ + +**19. [Type, Purpose, Illustration, Comments]** + +⟶ [Tip, Amaç, Görsel Açıklama, Açıklama] + +
+ + +**20. [Max pooling, Average pooling, Each pooling operation selects the maximum value of the current view, Each pooling operation averages the values of the current view]** + +⟶ [Maksimum ortaklama, Ortalama ortaklama, Her ortaklama işlemi, geçerli matrisin maksimum değerini seçer, Her ortaklama işlemi, geçerli matrisin değerlerinin ortalaması alır.] + +
+
+
+**21. [Preserves detected features, Most commonly used, Downsamples feature map, Used in LeNet]**
+
+⟶ [Algılanan özellikleri korur, En çok kullanılan, Öznitelik haritasını aşağı örnekler, LeNet'te kullanılmıştır]
+
+<br>
+ + +**22. Fully Connected (FC) ― The fully connected layer (FC) operates on a flattened input where each input is connected to all neurons. If present, FC layers are usually found towards the end of CNN architectures and can be used to optimize objectives such as class scores.** + +⟶ Tam Bağlantı (FC) ― Tam bağlı katman (FC), her girişin tüm nöronlara bağlı olduğu bir giriş üzerinde çalışır. Eğer varsa, FC katmanları genellikle CNN mimarisinin sonuna doğru bulunur ve sınıf skorları gibi hedefleri optimize etmek için kullanılabilir. + +
+
+
+**23. Filter hyperparameters**
+
+⟶ Filtre hiperparametreleri
+
+<br>
+ + +**24. The convolution layer contains filters for which it is important to know the meaning behind its hyperparameters.** + +⟶ Evrişim katmanı, hiperparametrelerinin ardındaki anlamı bilmenin önemli olduğu filtreler içerir. + +
+
+
+**25. Dimensions of a filter ― A filter of size F×F applied to an input containing C channels is a F×F×C volume that performs convolutions on an input of size I×I×C and produces an output feature map (also called activation map) of size O×O×1.**
+
+⟶ Bir filtrenin boyutları ― C kanal içeren bir girişe uygulanan F×F boyutundaki bir filtre, I×I×C boyutundaki bir giriş üzerinde evrişim gerçekleştiren F×F×C hacminde bir filtredir ve O×O×1 boyutunda bir çıkış öznitelik haritası (aktivasyon haritası olarak da adlandırılır) üretir.
+
+<br>
+ + +**26. Filter** + +⟶ Filtre + +
+ + +**27. Remark: the application of K filters of size F×F results in an output feature map of size O×O×K.** + +⟶ Not: F×F boyutunda K filtrelerinin uygulanması, O×O×K boyutunda bir çıktı öznitelik haritasının oluşmasını sağlar. + +
+ + +**28. Stride ― For a convolutional or a pooling operation, the stride S denotes the number of pixels by which the window moves after each operation.** + +⟶ Adım aralığı ― Evrişimli veya bir ortaklama işlemi için, S adımı (adım aralığı), her işlemden sonra pencerenin hareket ettiği piksel sayısını belirtir. + +
+ + +**29. Zero-padding ― Zero-padding denotes the process of adding P zeroes to each side of the boundaries of the input. This value can either be manually specified or automatically set through one of the three modes detailed below:** + +⟶ Sıfır ekleme/doldurma ― Sıfır ekleme/doldurma, girişin sınırlarının her bir tarafına P sıfır ekleme işlemini belirtir. Bu değer manuel olarak belirlenebilir veya aşağıda detaylandırılan üç moddan biri ile otomatik olarak ayarlanabilir: + +
+ + +**30. [Mode, Value, Illustration, Purpose, Valid, Same, Full]** + +⟶ [Mod, Değer, Görsel Açıklama, Amaç, Geçerli, Aynı, Tüm] + +
+ + +**31. [No padding, Drops last convolution if dimensions do not match, Padding such that feature map size has size ⌈IS⌉, Output size is mathematically convenient, Also called 'half' padding, Maximum padding such that end convolutions are applied on the limits of the input, Filter 'sees' the input end-to-end]** + +⟶ [Ekleme/doldurma yok, Boyutlar uyuşmuyorsa son evrişimi düşürür, Öznitelik harita büyüklüğüne sahip ekleme/doldurma ⌈IS⌉, Çıktı boyutu matematiksel olarak uygundur, 'Yarım' ekleme olarak da bilinir, Son konvolüsyonların giriş sınırlarına uygulandığı maksimum ekleme, Filtre girişi uçtan uca "görür"] + +
+ + +**32. Tuning hyperparameters** + +⟶ Hiperparametreleri ayarlama + +
+
+
+**33. Parameter compatibility in convolution layer ― By noting I the length of the input volume size, F the length of the filter, P the amount of zero padding, S the stride, then the output size O of the feature map along that dimension is given by:**
+
+⟶ Evrişim katmanında parametre uyumluluğu ― I giriş hacminin uzunluğu, F filtrenin uzunluğu, P sıfır ekleme miktarı ve S adım aralığı olmak üzere, bu boyut boyunca öznitelik haritasının çıkış büyüklüğü O şöyle verilir:
+
+<br>
+ + +**34. [Input, Filter, Output]** + +⟶ [Giriş, Filtre, Çıktı] + +
+ + +**35. Remark: often times, Pstart=Pend≜P, in which case we can replace Pstart+Pend by 2P in the formula above.** + +⟶ Not: çoğunlukla, Pstart=Pend≜P, bu durumda Pstart+Pend'i yukarıdaki formülde 2P ile değiştirebiliriz. + +
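+
+A one-line check of this formula; the sizes in the example are arbitrary:
+
+```python
+def conv_output_size(i, f, p_start, p_end, s):
+    # O = (I - F + Pstart + Pend) / S + 1, along one spatial dimension
+    return (i - f + p_start + p_end) // s + 1
+
+# 32x32 input, 5x5 filter, padding 2 on each side, stride 1 -> output size 32
+print(conv_output_size(32, 5, 2, 2, 1))
+```
+
+<br>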
+ + +**36. Understanding the complexity of the model ― In order to assess the complexity of a model, it is often useful to determine the number of parameters that its architecture will have. In a given layer of a convolutional neural network, it is done as follows:** + +⟶ Modelin karmaşıklığını anlama - Bir modelin karmaşıklığını değerlendirmek için mimarisinin sahip olacağı parametrelerin sayısını belirlemek genellikle yararlıdır. Bir evrişimsli sinir ağının belirli bir katmanında, aşağıdaki şekilde yapılır: + +
+ + +**37. [Illustration, Input size, Output size, Number of parameters, Remarks]** + +⟶ [Görsel Açıklama, Giriş boyutu, Çıkış boyutu, Parametre sayısı, Not] + +
+
+
+**38. [One bias parameter per filter, In most cases, S<F, A common choice for K is 2C]**
+
+⟶ [Filtre başına bir bias parametresi, Çoğu durumda S<F, K için yaygın bir seçim 2C]
+
+<br>
+
+
+**39. [Pooling operation done channel-wise, In most cases, S=F]**
+
+⟶ [Ortaklama işlemi kanal bazında yapılır, Çoğu durumda S=F]
+
+<br>
+
+
+**40. [Input is flattened, One bias parameter per neuron, The number of FC neurons is free of structural constraints]**
+
+⟶ [Girdi düzleştirilir, Nöron başına bir bias parametresi, Tam bağlantılı (FC) nöronların sayısı yapısal kısıtlamalardan bağımsızdır]
+
+<br>
+
+
+**41. Receptive field ― The receptive field at layer k is the area denoted Rk×Rk of the input that each pixel of the k-th activation map can 'see'. By calling Fj the filter size of layer j and Si the stride value of layer i and with the convention S0=1, the receptive field at layer k can be computed with the formula:**
+
+⟶ Algı alanı (Receptive field) ― k katmanındaki algı alanı, k-ıncı aktivasyon haritasının her bir pikselinin 'görebildiği' girişin Rk×Rk olarak gösterilen alanıdır. Fj, j katmanının filtre boyutu ve Si, i katmanının adım aralığı olmak üzere, S0=1 kabulüyle, k katmanındaki algı alanı şu formülle hesaplanabilir:
+
+<br>
+ + +**42. In the example below, we have F1=F2=3 and S1=S2=1, which gives R2=1+2⋅1+2⋅1=5.** + +⟶ Aşağıdaki örnekte, F1=F2=3 ve S1=S2=1 için R2=1+2⋅1+2⋅1=5 sonucu elde edilir. + +
+ + +**43. Commonly used activation functions** + +⟶ Yaygın olarak kullanılan aktivasyon fonksiyonları + +
+ + +**44. Rectified Linear Unit ― The rectified linear unit layer (ReLU) is an activation function g that is used on all elements of the volume. It aims at introducing non-linearities to the network. Its variants are summarized in the table below:** + +⟶ Düzeltilmiş Doğrusal Birim ― Düzeltilmiş doğrusal birim katmanı (ReLU), (g)'nin tüm elemanlarında kullanılan bir aktivasyon fonksiyonudur. Doğrusal olmamaları ile ağın öğrenmesi amaçlanmaktadır. Çeşitleri aşağıdaki tabloda özetlenmiştir: + +
+ + +**45. [ReLU, Leaky ReLU, ELU, with]** + +⟶[ReLU, Sızıntı ReLU, ELU, ile] + +
+ + +**46. [Non-linearity complexities biologically interpretable, Addresses dying ReLU issue for negative values, Differentiable everywhere]** + +⟶ [Doğrusal olmama karmaşıklığı biyolojik olarak yorumlanabilir, Negatif değerler için ölen ReLU sorununu giderir, Her yerde türevlenebilir] + +
+ + +**47. Softmax ― The softmax step can be seen as a generalized logistic function that takes as input a vector of scores x∈Rn and outputs a vector of output probability p∈Rn through a softmax function at the end of the architecture. It is defined as follows:** + +⟶ Softmax ― Softmax adımı, x∈Rn skorlarının bir vektörünü girdi olarak alan ve mimarinin sonunda softmax fonksiyonundan p∈Rn çıkış olasılık vektörünü oluşturan genelleştirilmiş bir lojistik fonksiyon olarak görülebilir. Aşağıdaki gibi tanımlanır: + +
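+
+Illustrative sketches of the three variants; the slope and α values below are typical defaults rather than prescribed ones:
+
+```python
+import numpy as np
+
+def relu(z):
+    return np.maximum(0.0, z)
+
+def leaky_relu(z, slope=0.01):
+    # Small negative slope addresses the "dying ReLU" issue for negative values
+    return np.where(z > 0, z, slope * z)
+
+def elu(z, alpha=1.0):
+    # Smooth variant; with alpha=1 it is differentiable everywhere
+    return np.where(z > 0, z, alpha * (np.exp(z) - 1.0))
+```
+
+<br>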
+ + +**48. where** + +⟶ buna karşılık + +
+ + +**49. Object detection** + +⟶ Nesne algılama + +
+ + +**50. Types of models ― There are 3 main types of object recognition algorithms, for which the nature of what is predicted is different. They are described in the table below:** + +⟶ Model tipleri ― Burada, nesne tanıma algoritmasının doğası gereği 3 farklı kestirim türü vardır. Aşağıdaki tabloda açıklanmıştır: + +
+ + +**51. [Image classification, Classification w. localization, Detection]** + +⟶ [Görüntü sınıflandırma, Sınıflandırma ve lokalizasyon (konumlama), Algılama] + +
+ + +**52. [Teddy bear, Book]** + +⟶ [Oyuncak ayı, Kitap] + +
+ + +**53. [Classifies a picture, Predicts probability of object, Detects an object in a picture, Predicts probability of object and where it is located, Detects up to several objects in a picture, Predicts probabilities of objects and where they are located]** + +⟶ [Bir görüntüyü sınıflandırır, Nesnenin olasılığını tahmin eder, Görüntüdeki bir nesneyi algılar/tanır, Nesnenin olasılığını ve bulunduğu yeri tahmin eder, Bir görüntüdeki birden fazla nesneyi algılar, Nesnelerin olasılıklarını ve nerede olduklarını tahmin eder] + +
+ + +**54. [Traditional CNN, Simplified YOLO, R-CNN, YOLO, R-CNN]** + +⟶ [Geleneksel CNN, Basitleştirilmiş YOLO (You-Only-Look-Once), R-CNN (R: Region - Bölge), YOLO, R-CNN] + +
+ + +**55. Detection ― In the context of object detection, different methods are used depending on whether we just want to locate the object or detect a more complex shape in the image. The two main ones are summed up in the table below:** + +⟶ Algılama ― Nesne algılama bağlamında, nesneyi konumlandırmak veya görüntüdeki daha karmaşık bir şekli tespit etmek isteyip istemediğimize bağlı olarak farklı yöntemler kullanılır. İki ana tablo aşağıdaki tabloda özetlenmiştir: + +
+ + +**56. [Bounding box detection, Landmark detection]** + +⟶ [Sınırlayıcı kutu ile tespit, Karakteristik nokta algılama] + +
+ + +**57. [Detects the part of the image where the object is located, Detects a shape or characteristics of an object (e.g. eyes), More granular]** + +⟶ [Görüntüde nesnenin bulunduğu yeri algılar, Bir nesnenin şeklini veya özelliklerini algılar (örneğin gözler), Daha ayrıntılı] + +
+ + +**58. [Box of center (bx,by), height bh and width bw, Reference points (l1x,l1y), ..., (lnx,lny)]** + +⟶ [Kutu merkezi (bx,by), yükseklik bh ve genişlik bw, Referans noktalar (l1x,l1y), ..., (lnx,lny)] + +
+ + +**59. Intersection over Union ― Intersection over Union, also known as IoU, is a function that quantifies how correctly positioned a predicted bounding box Bp is over the actual bounding box Ba. It is defined as:** + +⟶ Kesiştirilmiş Bölgeler - Kesiştirilmiş Bölgeler, IoU (Intersection over Union) olarak da bilinir, Birleştirilmiş sınırlama kutusu, tahmin edilen sınırlama kutusu (Bp) ile gerçek sınırlama kutusu Ba üzerinde ne kadar doğru konumlandırıldığını ölçen bir fonksiyondur. Olarak tanımlanır: + +
+
+
+**59. Intersection over Union ― Intersection over Union, also known as IoU, is a function that quantifies how correctly positioned a predicted bounding box Bp is over the actual bounding box Ba. It is defined as:**
+
+⟶ Kesiştirilmiş Bölgeler (Intersection over Union) ― IoU olarak da bilinen Kesiştirilmiş Bölgeler, tahmin edilen sınırlayıcı kutu Bp'nin gerçek sınırlayıcı kutu Ba'ya göre ne kadar doğru konumlandırıldığını ölçen bir fonksiyondur ve şöyle tanımlanır:
+
+<br>
+
+
+**60. Remark: we always have IoU∈[0,1]. By convention, a predicted bounding box Bp is considered as being reasonably good if IoU(Bp,Ba)⩾0.5.**
+
+⟶ Not: Her zaman IoU∈[0,1] olur. Kural olarak, tahmin edilen bir sınırlayıcı kutu Bp, IoU(Bp,Ba)⩾0.5 ise makul derecede iyi kabul edilir.
+
+<br>
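+
+A small sketch of IoU, assuming boxes encoded by their (x1, y1, x2, y2) corners (the cheatsheet describes boxes by center, height and width, so this encoding is an assumption):
+
+```python
+def iou(box_p, box_a):
+    # Intersection area divided by union area of two axis-aligned boxes
+    x1, y1 = max(box_p[0], box_a[0]), max(box_p[1], box_a[1])
+    x2, y2 = min(box_p[2], box_a[2]), min(box_p[3], box_a[3])
+    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
+    area_p = (box_p[2] - box_p[0]) * (box_p[3] - box_p[1])
+    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
+    return inter / (area_p + area_a - inter)
+
+print(iou((0, 0, 2, 2), (1, 1, 3, 3)))  # 1 / 7 ≈ 0.143
+```
+
+<br>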
+ + +**62. Non-max suppression ― The non-max suppression technique aims at removing duplicate overlapping bounding boxes of a same object by selecting the most representative ones. After having removed all boxes having a probability prediction lower than 0.6, the following steps are repeated while there are boxes remaining:** + +⟶ Maksimum olmayan bastırma - Maksimum olmayan bastırma tekniği, nesne için yinelenen ve örtüşen öneri kutuları içinde en uygun temsilleri seçerek örtüşmesi düşük olan kutuları kaldırmayı amaçlar. Olasılık tahmini 0.6'dan daha düşük olan tüm kutuları çıkardıktan sonra, kalan kutular ile aşağıdaki adımlar tekrarlanır: + +
+ + +**63. [For a given class, Step 1: Pick the box with the largest prediction probability., Step 2: Discard any box having an IoU⩾0.5 with the previous box.]** + +⟶ [Verilen bir sınıf için, Adım 1: En büyük tahmin olasılığı olan kutuyu seçin., Adım 2: Önceki kutuyla IoU⩾0.5 olan herhangi bir kutuyu çıkarın.] + +
+ + +**64. [Box predictions, Box selection of maximum probability, Overlap removal of same class, Final bounding boxes]** + +⟶ [Kutu tahmini/kestirimi, Maksimum olasılığa göre kutu seçimi, Aynı sınıf için örtüşme kaldırma, Son sınırlama kutuları] + +
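+
+A plain-Python sketch of these two steps, assuming (x1, y1, x2, y2) corner boxes and a single class:
+
+```python
+def box_iou(a, b):
+    # Intersection over Union of two corner-encoded boxes
+    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
+    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
+    inter = ix * iy
+    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
+    return inter / union
+
+def non_max_suppression(boxes, scores, iou_thr=0.5, score_thr=0.6):
+    # Keep boxes in decreasing score order, dropping those that overlap a kept box
+    order = [i for i in sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
+             if scores[i] >= score_thr]
+    keep = []
+    while order:
+        best = order.pop(0)            # Step 1: most confident remaining box
+        keep.append(best)
+        order = [i for i in order      # Step 2: discard boxes with IoU >= 0.5 with it
+                 if box_iou(boxes[i], boxes[best]) < iou_thr]
+    return keep
+```
+
+<br>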
+ + +**65. YOLO ― You Only Look Once (YOLO) is an object detection algorithm that performs the following steps:** + +⟶ YOLO ― You Only Look Once (YOLO), aşağıdaki adımları uygulayan bir nesne algılama algoritmasıdır: + +
+ + +**66. [Step 1: Divide the input image into a G×G grid., Step 2: For each grid cell, run a CNN that predicts y of the following form:, repeated k times]** + +⟶ [Adım 1: Giriş görüntüsünü G×G kare parçalara (hücrelere) bölün., Adım 2: Her bir hücre için, aşağıdaki formdan y'yi öngören bir CNN çalıştırın: k kez tekrarlayın] + +
+ + +**67. where pc is the probability of detecting an object, bx,by,bh,bw are the properties of the detected bouding box, c1,...,cp is a one-hot representation of which of the p classes were detected, and k is the number of anchor boxes.** + +⟶ pc'nin bir nesneyi algılama olasılığı olduğu durumlarda, bx, by, bh, bw tespit edilen olası sınırlayıcı kutusunun özellikleridir, cl, ..., cp, p sınıflarının tespit edilen one-hot temsildir ve k öneri (anchor) kutularının sayısıdır. + +
+ + +**68. Step 3: Run the non-max suppression algorithm to remove any potential duplicate overlapping bounding boxes.** + +⟶ Adım3: Potansiyel yineli çakışan sınırlayıcı kutuları kaldırmak için maksimum olmayan bastırma algoritmasını çalıştır. + +
+ + +**69. [Original image, Division in GxG grid, Bounding box prediction, Non-max suppression]** + +⟶ [Orijinal görüntü, GxG kare parçalara (hücrelere) bölünmesi, Sınırlayıcı kutu kestirimi, Maksimum olmayan bastırma] + +
+ + +**70. Remark: when pc=0, then the network does not detect any object. In that case, the corresponding predictions bx,...,cp have to be ignored.** + +⟶ Not: pc=0 olduğunda, ağ herhangi bir nesne algılamamaktadır. Bu durumda, ilgili bx, ..., cp tahminleri dikkate alınmamalıdır. + +
+ + +**71. R-CNN ― Region with Convolutional Neural Networks (R-CNN) is an object detection algorithm that first segments the image to find potential relevant bounding boxes and then run the detection algorithm to find most probable objects in those bounding boxes.** + +⟶ R-CNN - Evrişimli Sinir Ağları ile Bölge Bulma (R-CNN), potansiyel olarak sınırlayıcı kutuları bulmak için görüntüyü bölütleyen (segmente eden) ve daha sonra sınırlayıcı kutularda en olası nesneleri bulmak için algılama algoritmasını çalıştıran bir nesne algılama algoritmasıdır. + +
+ + +**72. [Original image, Segmentation, Bounding box prediction, Non-max suppression]** + +⟶ [Orijinal görüntü, Bölütleme (Segmentasyon), Sınırlayıcu kutu kestirimi, Maksimum olmayan bastırma] + +
+ + +**73. Remark: although the original algorithm is computationally expensive and slow, newer architectures enabled the algorithm to run faster, such as Fast R-CNN and Faster R-CNN.** + +⟶ Not: Orijinal algoritma hesaplamalı olarak maliyetli ve yavaş olmasına rağmen, yeni mimariler algoritmanın Hızlı R-CNN ve Daha Hızlı R-CNN gibi daha hızlı çalışmasını sağlamıştır. + +
+ + +**74. Face verification and recognition** + +⟶ Yüz doğrulama ve tanıma + +
+ + +**75. Types of models ― Two main types of model are summed up in table below:** + +⟶ Model tipleri ― İki temel model aşağıdaki tabloda özetlenmiştir: + +
+ + +**76. [Face verification, Face recognition, Query, Reference, Database]** + +⟶ [Yüz doğrulama, Yüz tanıma, Sorgu, Kaynak, Veri tabanı] + +
+ + +**77. [Is this the correct person?, One-to-one lookup, Is this one of the K persons in the database?, One-to-many lookup]** + +⟶ [Bu doğru kişi mi?, Bire bir arama, Veritabanındaki K kişilerden biri mi?, Bire-çok arama] + +
+ + +**78. One Shot Learning ― One Shot Learning is a face verification algorithm that uses a limited training set to learn a similarity function that quantifies how different two given images are. The similarity function applied to two images is often noted d(image 1,image 2).** + +⟶ Tek Atış (Onr-Shot) Öğrenme - Tek Atış Öğrenme, verilen iki görüntünün ne kadar farklı olduğunu belirleyen benzerlik fonksiyonunu öğrenmek için sınırlı bir eğitim seti kullanan bir yüz doğrulama algoritmasıdır. İki resme uygulanan benzerlik fonksiyonu sıklıkla kaydedilir (resim 1, resim 2). + +
+
+
+**78. One Shot Learning ― One Shot Learning is a face verification algorithm that uses a limited training set to learn a similarity function that quantifies how different two given images are. The similarity function applied to two images is often noted d(image 1,image 2).**
+
+⟶ Tek Atış (One-Shot) Öğrenme ― Tek Atış Öğrenme, verilen iki görüntünün ne kadar farklı olduğunu ölçen bir benzerlik fonksiyonunu öğrenmek için sınırlı bir eğitim kümesi kullanan bir yüz doğrulama algoritmasıdır. İki görüntüye uygulanan benzerlik fonksiyonu genellikle d(görüntü 1, görüntü 2) olarak gösterilir.
+
+<br>
+ + +**80. Triplet loss ― The triplet loss ℓ is a loss function computed on the embedding representation of a triplet of images A (anchor), P (positive) and N (negative). The anchor and the positive example belong to a same class, while the negative example to another one. By calling α∈R+ the margin parameter, this loss is defined as follows:** + +⟶ Üçlü kayıp - Üçlü kayıp ℓ, A (öneri), P (pozitif) ve N (negatif) görüntülerinin üçlüsünün gömülü gösterimde hesaplanan bir kayıp fonksiyonudur. Öneri ve pozitif örnek aynı sınıfa aitken, negatif örnek bir diğerine aittir. α∈R+ marjın parametresini çağırarak, bu kayıp aşağıdaki gibi tanımlanır: + +
+ + +**81. Neural style transfer** + +⟶ Sinirsel stil transferi (aktarımı) + +
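+
+A sketch of this loss, assuming d is the squared Euclidean distance between embeddings and taking α=0.2 as an example margin:
+
+```python
+import numpy as np
+
+def triplet_loss(f_a, f_p, f_n, alpha=0.2):
+    # max(d(A, P) - d(A, N) + alpha, 0) with d the squared Euclidean distance
+    d_pos = np.sum((f_a - f_p) ** 2)
+    d_neg = np.sum((f_a - f_n) ** 2)
+    return max(d_pos - d_neg + alpha, 0.0)
+```
+
+<br>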
+ + +**82. Motivation ― The goal of neural style transfer is to generate an image G based on a given content C and a given style S.** + +⟶ Motivasyon ― Sinirsel stil transferinin amacı, verilen bir C içeriğine ve verilen bir S stiline dayanan bir G görüntüsü oluşturmaktır. + +
+ + +**83. [Content C, Style S, Generated image G]** + +⟶ [İçerik C, Stil S, Oluşturulan görüntü G] + +
+ + +**84. Activation ― In a given layer l, the activation is noted a[l] and is of dimensions nH×nw×nc** + +⟶ Aktivasyon ― Belirli bir l katmanında, aktivasyon [l] olarak gösterilir ve nH×nw×nc boyutlarındadır + +
+ + +**85. Content cost function ― The content cost function Jcontent(C,G) is used to determine how the generated image G differs from the original content image C. It is defined as follows:** + +⟶ İçerik maliyeti fonksiyonu ― İçerik maliyeti fonksiyonu Jcontent(C,G), G oluşturulan görüntüsünün, C orijinal içerik görüntüsünden ne kadar farklı olduğunu belirlemek için kullanılır.Aşağıdaki gibi tanımlanır: + +
+ + +**86. Style matrix ― The style matrix G[l] of a given layer l is a Gram matrix where each of its elements G[l]kk′ quantifies how correlated the channels k and k′ are. It is defined with respect to activations a[l] as follows:** + +⟶ Stil matrisi - Stil matrisi G[l], belirli bir l katmanının her birinin G[l]kk′ elemanlarının k ve k′ kanallarının ne kadar ilişkili olduğunu belirlediği bir Gram matristir. A[l] aktivasyonlarına göre aşağıdaki gibi tanımlanır: + +
+ + +**87. Remark: the style matrix for the style image and the generated image are noted G[l] (S) and G[l] (G) respectively.** + +⟶ Not: Stil görüntüsü ve oluşturulan görüntü için stil matrisi, sırasıyla G[l] (S) ve G[l] (G) olarak belirtilmiştir. + +
+ + +**88. Style cost function ― The style cost function Jstyle(S,G) is used to determine how the generated image G differs from the style S. It is defined as follows:** + +⟶ Stil maliyeti fonksiyonu - Stil maliyeti fonksiyonu Jstyle(S,G), oluşturulan G görüntüsünün S stilinden ne kadar farklı olduğunu belirlemek için kullanılır. Aşağıdaki gibi tanımlanır: + +
+ + +**89. Overall cost function ― The overall cost function is defined as being a combination of the content and style cost functions, weighted by parameters α,β, as follows:** + +⟶ Genel maliyet fonksiyonu - Genel maliyet fonksiyonu, α, β parametreleriyle ağırlıklandırılan içerik ve stil maliyet fonksiyonlarının bir kombinasyonu olarak tanımlanır: + +
+ + +**90. Remark: a higher value of α will make the model care more about the content while a higher value of β will make it care more about the style.** + +⟶ Not: yüksek bir α değeri modelin içeriğe daha fazla önem vermesini sağlarken, yüksek bir β değeri de stile önem verir. + +
+ + +**91. Architectures using computational tricks** + +⟶ Hesaplama ipuçları kullanan mimariler + +
+ + +**92. Generative Adversarial Network ― Generative adversarial networks, also known as GANs, are composed of a generative and a discriminative model, where the generative model aims at generating the most truthful output that will be fed into the discriminative which aims at differentiating the generated and true image.** + +⟶ Çekişmeli Üretici Ağlar - GAN olarak da bilinen çekişmeli üretici ağlar, modelin üretici denen ve gerçek imajı ayırt etmeyi amaçlayan ayırıcıya beslenecek en doğru çıktının oluşturulmasını amaçladığı üretici ve ayırt edici bir modelden oluşur. + +
+
+
+**92. Generative Adversarial Network ― Generative adversarial networks, also known as GANs, are composed of a generative and a discriminative model, where the generative model aims at generating the most truthful output that will be fed into the discriminative which aims at differentiating the generated and true image.**
+
+⟶ Çekişmeli Üretici Ağlar ― GAN olarak da bilinen çekişmeli üretici ağlar, bir üretici ve bir ayırt edici modelden oluşur; üretici model, üretilen görüntü ile gerçek görüntüyü ayırt etmeyi amaçlayan ayırt edici modele beslenecek en gerçekçi çıktıyı üretmeyi amaçlar.
+
+<br>
+ + +**94. Remark: use cases using variants of GANs include text to image, music generation and synthesis.** + +⟶ Not: GAN'ın kullanım alanları, yazıdan görüntüye, müzik üretimi ve sentezi. + +
+ + +**95. ResNet ― The Residual Network architecture (also called ResNet) uses residual blocks with a high number of layers meant to decrease the training error. The residual block has the following characterizing equation:** + +⟶ ResNet ― Artık Ağ mimarisi (ResNet olarak da bilinir), eğitim hatasını azaltmak için çok sayıda katman içeren artık bloklar kullanır. Artık blok aşağıdaki karakterizasyon denklemine sahiptir: + +
+ + +**96. Inception Network ― This architecture uses inception modules and aims at giving a try at different convolutions in order to increase its performance through features diversification. In particular, it uses the 1×1 convolution trick to limit the computational burden.** + +⟶ Inception Ağ ― Bu mimari inception modüllerini kullanır ve özelliklerini çeşitlendirme yoluyla performansını artırmak için farklı evrişim kombinasyonları denemeyi amaçlamaktadır. Özellikle, hesaplama yükünü sınırlamak için 1x1 evrişm hilesini kullanır. + +
+
+
+**96. Inception Network ― This architecture uses inception modules and aims at giving a try at different convolutions in order to increase its performance through features diversification. In particular, it uses the 1×1 convolution trick to limit the computational burden.**
+
+⟶ Inception Ağı ― Bu mimari, inception modüllerini kullanır ve öznitelikleri çeşitlendirme yoluyla performansını artırmak için farklı evrişim kombinasyonlarını denemeyi amaçlar. Özellikle, hesaplama yükünü sınırlamak için 1×1 evrişim hilesini kullanır.
+
+<br>
+
+
+**97. The Deep Learning cheatsheets are now available in [target language].**
+
+⟶ Derin Öğrenme el kitapları artık [hedef dilde] mevcuttur.
+
+<br>
+ + +**99. Translated by X, Y and Z** + +⟶ X, Y ve Z tarafından çevirildi + +
+ + +**100. Reviewed by X, Y and Z** + +⟶ X, Y ve Z tarafından kontrol edildi + +
+ + +**101. View PDF version on GitHub** + +⟶ GitHub'da PDF sürümünü görüntüleyin + +
+
+
+**102. By X and Y**
+
+⟶ X ve Y tarafından
+
+<br>
diff --git a/tr/cs-230-deep-learning-tips-and-tricks.md b/tr/cs-230-deep-learning-tips-and-tricks.md new file mode 100644 index 000000000..8bc96d387 --- /dev/null +++ b/tr/cs-230-deep-learning-tips-and-tricks.md @@ -0,0 +1,450 @@ +**1. Deep Learning Tips and Tricks cheatsheet** + +⟶ Derin öğrenme püf noktaları ve ipuçları el kitabı + +
+ + +**2. CS 230 - Deep Learning** + +⟶ CS 230 - Derin Öğrenme + +
+ + +**3. Tips and tricks** + +⟶ Püf noktaları ve ipuçları + +
+ + +**4. [Data processing, Data augmentation, Batch normalization]** + +⟶ [Veri işleme, Veri artırma, Küme normalizasyonu] + +
+
+**5. [Training a neural network, Epoch, Mini-batch, Cross-entropy loss, Backpropagation, Gradient descent, Updating weights, Gradient checking]**
+
+⟶ [Bir sinir ağının eğitilmesi, Dönem (Epok), Mini-küme, Çapraz-entropi yitimi (kaybı), Geriye yayılım, Gradyan (Bayır) iniş, Ağırlıkların güncellenmesi, Gradyan (Bayır) kontrolü]
+
+<br>
+ + +**6. [Parameter tuning, Xavier initialization, Transfer learning, Learning rate, Adaptive learning rates]** + +⟶ [Parametrelerin ayarlanması, Xavier başlatma, Transfer öğrenme, Öğrenme oranı, Uyarlamalı öğrenme oranları] + +
+ + +**7. [Regularization, Dropout, Weight regularization, Early stopping]** + +⟶ [Düzenlileştirme, Seyreltme, Ağırlıkların düzeltilmesi, Erken durdurma] + +
+ + +**8. [Good practices, Overfitting small batch, Gradient checking]** + +⟶ [İyi örnekler, Küçük kümelerin aşırı öğrenmesi, Gradyan kontrolü] + +
+ + +**9. View PDF version on GitHub** + +⟶ GitHub'da PDF sürümünü görüntüleyin + +
+ + +**10. Data processing** + +⟶ Veri işleme + +
+ + +**11. Data augmentation ― Deep learning models usually need a lot of data to be properly trained. It is often useful to get more data from the existing ones using data augmentation techniques. The main ones are summed up in the table below. More precisely, given the following input image, here are the techniques that we can apply:** + +⟶ Veri artırma ― Derin öğrenme modelleri genellikle uygun şekilde eğitilmek için çok fazla veriye ihtiyaç duyar. Veri artırma tekniklerini kullanarak mevcut verilerden daha fazla veri üretmek genellikle yararlıdır. Temel işlemler aşağıdaki tabloda özetlenmiştir. Daha doğrusu, aşağıdaki girdi görüntüsüne bakıldığında, uygulayabileceğimiz teknikler şunlardır: + +
+ + +**12. [Original, Flip, Rotation, Random crop]** + +⟶ [Orijinal, Çevirme, Rotasyon (Yönlendirme), Rastgele kırpma/kesme] + +
+ + +**13. [Image without any modification, Flipped with respect to an axis for which the meaning of the image is preserved, Rotation with a slight angle, Simulates incorrect horizon calibration, Random focus on one part of the image, Several random crops can be done in a row]** + +⟶ [Herhangi bir değişiklik yapılmamış görüntü, Görüntünün anlamının korunduğu bir eksene göre çevrilmiş görüntü, Hafif açılı döndürme, Yanlış yatay kalibrasyonu simule eder, Görüntünün bir bölümüne rastgele odaklanma, Arka arkaya birkaç rasgele kesme yapılabilir] + +
+ + +**14. [Color shift, Noise addition, Information loss, Contrast change]** + +⟶ [Renk değişimi, Gürültü ekleme, Bilgi kaybı, Kontrast değişimi] + +
+
+
+**15. [Nuances of RGB is slightly changed, Captures noise that can occur with light exposure, Addition of noise, More tolerance to quality variation of inputs, Parts of image ignored, Mimics potential loss of parts of image, Luminosity changes, Controls difference in exposition due to time of day]**
+
+⟶ [RGB nüanslarının hafifçe değiştirilmesi, Işığa maruz kalma ile oluşabilecek gürültüyü yakalar, Gürültü ekleme, Girdilerin kalite değişimine daha fazla tolerans, Görüntünün bazı kısımlarının yok sayılması, Görüntü parçalarının olası kaybını taklit eder, Parlaklık değişimleri, Günün saatine bağlı pozlama farkını kontrol eder]
+
+<br>
+ + +**16. Remark: data is usually augmented on the fly during training.** + +⟶ Not: Veriler genellikle eğitim sırasında artırılır. + +
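
A minimal numpy sketch of a few of the augmentation operations listed above (flip, random crop, color shift); the toy image, sizes and shift ranges are assumptions for illustration only:

```python
import numpy as np

rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(32, 32, 3), dtype=np.uint8)  # toy RGB image

def horizontal_flip(img):
    """Mirror the image along its vertical axis."""
    return img[:, ::-1, :]

def random_crop(img, size=24):
    """Take a random square crop of the given size."""
    h, w, _ = img.shape
    top = rng.integers(0, h - size + 1)
    left = rng.integers(0, w - size + 1)
    return img[top:top + size, left:left + size, :]

def color_shift(img, max_shift=20):
    """Add a small random offset to each RGB channel."""
    shift = rng.integers(-max_shift, max_shift + 1, size=(1, 1, 3))
    return np.clip(img.astype(int) + shift, 0, 255).astype(np.uint8)

augmented = [horizontal_flip(image), random_crop(image), color_shift(image)]
print([a.shape for a in augmented])
```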
+
+
+**17. Batch normalization ― It is a step of hyperparameter γ,β that normalizes the batch {xi}. By noting μB,σ2B the mean and variance of that we want to correct to the batch, it is done as follows:**
+
+⟶ Küme normalleştirme ― γ,β hiperparametrelerine sahip, {xi} kümesini normalleştiren bir adımdır. Kümede düzeltmek istediğimiz ortalama ve varyansı μB,σ2B ile gösterirsek, bu işlem şu şekilde yapılır:
+
+<br>
+ + +**18. It is usually done after a fully connected/convolutional layer and before a non-linearity layer and aims at allowing higher learning rates and reducing the strong dependence on initialization.** + +⟶ Genellikle tam-tüm bağlı/evrişimli bir katmandan sonra ve doğrusal olmayan bir katmandan önce yapılır. Daha yüksek öğrenme oranlarına izin vermeyi ve başlangıç durumuna güçlü bir şekilde bağımlılığı azaltmayı amaçlar. + +
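
As a rough numpy sketch (assuming a [batch, features] input and toy values), the forward normalization described above could look like:

```python
import numpy as np

def batch_norm_forward(x, gamma, beta, eps=1e-5):
    """Normalize a mini-batch x with its own mean/variance, then rescale with gamma, beta."""
    mu = x.mean(axis=0)                  # per-feature batch mean
    var = x.var(axis=0)                  # per-feature batch variance
    x_hat = (x - mu) / np.sqrt(var + eps)
    return gamma * x_hat + beta

x = np.random.randn(8, 4) * 3.0 + 1.0
out = batch_norm_forward(x, gamma=np.ones(4), beta=np.zeros(4))
print(out.mean(axis=0).round(6), out.std(axis=0).round(3))  # ~0 mean, ~1 std per feature
```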
+ + +**19. Training a neural network** + +⟶ Bir sinir ağının eğitilmesi + +
+ + +**20. Definitions** + +⟶ Tanımlamalar + +
+ + +**21. Epoch ― In the context of training a model, epoch is a term used to refer to one iteration where the model sees the whole training set to update its weights.** + +⟶ Dönem (Epok/Epoch) ― Bir modelin eğitimi kapsamında, modelin ağırlıklarını güncellemek için tüm eğitim setini kullandığı bir yinelemeye ifade etmek için kullanılan bir terimdir. + +
+
+
+**22. Mini-batch gradient descent ― During the training phase, updating weights is usually not based on the whole training set at once due to computation complexities or one data point due to noise issues. Instead, the update step is done on mini-batches, where the number of data points in a batch is a hyperparameter that we can tune.**
+
+⟶ Mini-küme gradyan (bayır) iniş ― Eğitim aşamasında ağırlıkların güncellenmesi, genellikle ne hesaplama karmaşıklığı nedeniyle tüm eğitim setine ne de gürültü sorunları nedeniyle tek bir veri noktasına dayanır. Bunun yerine güncelleme adımı, bir kümedeki veri noktası sayısının ayarlayabileceğimiz bir hiperparametre olduğu mini-kümeler üzerinde yapılır.
+
+<br>
+ + +**23. Loss function ― In order to quantify how a given model performs, the loss function L is usually used to evaluate to what extent the actual outputs y are correctly predicted by the model outputs z.** + +⟶ Yitim fonksiyonu ― Belirli bir modelin nasıl bir performans gösterdiğini ölçmek için, L yitim (kayıp) fonksiyonu genellikle y gerçek çıktıların, z model çıktıları tarafından ne kadar doğru tahmin edildiğini değerlendirmek için kullanılır. + +
+ + +**24. Cross-entropy loss ― In the context of binary classification in neural networks, the cross-entropy loss L(z,y) is commonly used and is defined as follows:** + +⟶ Çapraz-entropi kaybı ― Yapay sinir ağlarında ikili sınıflandırma bağlamında, çapraz entropi kaybı L (z, y) yaygın olarak kullanılır ve şöyle tanımlanır: + +
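
A small numpy sketch of the binary cross-entropy loss defined above, with made-up labels and predictions:

```python
import numpy as np

def binary_cross_entropy(z, y, eps=1e-12):
    """L(z, y) = -[y*log(z) + (1-y)*log(1-z)], averaged over the batch,
    where z are predicted probabilities and y the true 0/1 labels."""
    z = np.clip(z, eps, 1 - eps)
    return float(np.mean(-(y * np.log(z) + (1 - y) * np.log(1 - z))))

y = np.array([1, 0, 1, 1])
z = np.array([0.9, 0.2, 0.7, 0.6])
print(binary_cross_entropy(z, y))
```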
+ + +**25. Finding optimal weights** + +⟶ Optimum ağırlıkların bulunması + +
+ + +**26. Backpropagation ― Backpropagation is a method to update the weights in the neural network by taking into account the actual output and the desired output. The derivative with respect to each weight w is computed using the chain rule.** + +⟶ Geriye yayılım ― Geri yayılım, asıl çıktıyı ve istenen çıktıyı dikkate alarak sinir ağındaki ağırlıkları güncellemek için kullanılan bir yöntemdir. Her bir ağırlığa göre türev, zincir kuralı kullanılarak hesaplanır. + +
+ + +**27. Using this method, each weight is updated with the rule:** + +⟶ Bu yöntemi kullanarak, her ağırlık kurala göre güncellenir: + +
+ + +**28. Updating weights ― In a neural network, weights are updated as follows:** + +⟶ Ağırlıkların güncellenmesi ― Bir sinir ağında, ağırlıklar aşağıdaki gibi güncellenir: + +
+ + +**29. [Step 1: Take a batch of training data and perform forward propagation to compute the loss, Step 2: Backpropagate the loss to get the gradient of the loss with respect to each weight, Step 3: Use the gradients to update the weights of the network.]** + +⟶ [Adım 1: Bir küme eğitim verisi alın ve kaybı hesaplamak için ileriye doğru ilerleyin, Step 2: Her ağırlığa göre kaybın derecesini elde etmek için kaybı tekrar geriye doğru yayın, Adım 3: Ağın ağırlıklarını güncellemek için gradyanları kullanın.] + + +
+ + +**30. [Forward propagation, Backpropagation, Weights update]** + +⟶ [İleri yayılım, Geriye yayılım, Ağırlıkların güncellenmesi] + +
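
The three steps above (forward propagation, backpropagation, weight update) can be sketched end to end on a toy logistic-regression "network"; the data, learning rate and number of steps are arbitrary assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(16, 3))                      # mini-batch of 16 examples, 3 features
y = (X[:, 0] + X[:, 1] > 0).astype(float)         # toy binary labels
w, b, lr = np.zeros(3), 0.0, 0.1

for step in range(100):
    # Step 1: forward propagation and loss (binary cross-entropy)
    z = 1.0 / (1.0 + np.exp(-(X @ w + b)))
    loss = -np.mean(y * np.log(z + 1e-12) + (1 - y) * np.log(1 - z + 1e-12))
    # Step 2: backpropagate to get the gradient of the loss w.r.t. w and b
    dz = (z - y) / len(y)
    dw, db = X.T @ dz, dz.sum()
    # Step 3: use the gradients to update the weights
    w -= lr * dw
    b -= lr * db

print(round(loss, 4))
```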
+ + +**31. Parameter tuning** + +⟶ Parametre ayarlama + +
+ + +**32. Weights initialization** + +⟶ Ağırlıkların başlangıçlandırılması + +
+ + +**33. Xavier initialization ― Instead of initializing the weights in a purely random manner, Xavier initialization enables to have initial weights that take into account characteristics that are unique to the architecture.** + +⟶ Xavier başlangıcı (ilklendirme) ― Ağırlıkları tamamen rastgele bir şekilde başlatmak yerine, Xavier başlangıcı, mimariye özgü özellikleri dikkate alan ilk ağırlıkların alınmasını sağlar. + +
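
A minimal sketch of the Xavier (Glorot) uniform scheme mentioned above, assuming numpy and arbitrary layer sizes:

```python
import numpy as np

def xavier_uniform(n_in, n_out, rng=np.random.default_rng(0)):
    """Draw weights from U(-limit, limit) with limit = sqrt(6 / (n_in + n_out))."""
    limit = np.sqrt(6.0 / (n_in + n_out))
    return rng.uniform(-limit, limit, size=(n_in, n_out))

W = xavier_uniform(256, 128)
print(W.shape, round(W.std(), 3))
```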
+
+
+**34. Transfer learning ― Training a deep learning model requires a lot of data and more importantly a lot of time. It is often useful to take advantage of pre-trained weights on huge datasets that took days/weeks to train, and leverage it towards our use case. Depending on how much data we have at hand, here are the different ways to leverage this:**
+
+⟶ Transfer öğrenme ― Bir derin öğrenme modelini eğitmek çok fazla veri ve daha da önemlisi çok zaman gerektirir. Eğitilmesi günler/haftalar süren dev veri setleri üzerinde önceden eğitilmiş ağırlıklardan yararlanmak ve bunları kendi kullanım durumumuza uyarlamak genellikle faydalıdır. Elimizde ne kadar veri olduğuna bağlı olarak, bundan yararlanmanın farklı yolları şunlardır:
+
+<br>
+ + +**35. [Training size, Illustration, Explanation]** + +⟶ [Eğitim boyutu, Görselleştirme, Açıklama] + +
+ + +**36. [Small, Medium, Large]** + +⟶ [Küçük, Orta, Büyük] + +
+ + +**37. [Freezes all layers, trains weights on softmax, Freezes most layers, trains weights on last layers and softmax, Trains weights on layers and softmax by initializing weights on pre-trained ones]** + +⟶ [Tüm katmanlar dondurulur, Softmax'taki ağırlıkları eğitilir, Çoğu katmanlar dondurulur, son katmanlar ve softmax katmanı ağırlıklar ile eğitilir, Önceden eğitilerek elde edilen ağırlıkları kullanarak katmanlar ve softmax için kullanır] + +
+ + +**38. Optimizing convergence** + +⟶ Yakınsamayı optimize etmek + +
+ + +**39. Learning rate ― The learning rate, often noted α or sometimes η, indicates at which pace the weights get updated. It can be fixed or adaptively changed. The current most popular method is called Adam, which is a method that adapts the learning rate.** + +⟶ Öğrenme oranı (adımı) ― Genellikle α veya bazen η olarak belirtilen öğrenme oranı, ağırlıkların hangi hızda güncellendiğini belirler. Sabitlenebilir veya uyarlanabilir şekilde değiştirilebilir. Mevcut en popüler yöntemin adı Adam'dır ve öğrenme hızını ayarlayan bir yöntemdir. + +
+ +**40. Adaptive learning rates ― Letting the learning rate vary when training a model can reduce the training time and improve the numerical optimal solution. While Adam optimizer is the most commonly used technique, others can also be useful. They are summed up in the table below:** + +⟶ Uyarlanabilir öğrenme oranları ― Bir modelin eğitilmesi sırasında öğrenme oranının değişmesine izin vermek eğitim süresini kısaltabilir ve sayısal optimum çözümü iyileştirebilir. Adam optimizasyonu yöntemi en çok kullanılan teknik olmasına rağmen, diğer yöntemler de faydalı olabilir. Bunlar aşağıdaki tabloda özetlenmiştir: + +
+ + +**41. [Method, Explanation, Update of w, Update of b]** + +⟶ [Yöntem, Açıklama, w'ların güncellenmesi, b'nin güncellenmesi] + +
+ + +**42. [Momentum, Dampens oscillations, Improvement to SGD, 2 parameters to tune]** + +⟶ [Momentum, Osilasyonların azaltılması/yumuşatılması, SGD (Stokastik Gradyan/Bayır İniş) iyileştirmesi, Ayarlanacak 2 parametre] + +
+ + +**43. [RMSprop, Root Mean Square propagation, Speeds up learning algorithm by controlling oscillations]** + +⟶ [RMSprop, Ortalama Karekök yayılımı, Osilasyonları kontrol ederek öğrenme algoritmasını hızlandırır] + +
+ + +**44. [Adam, Adaptive Moment estimation, Most popular method, 4 parameters to tune]** + +⟶ [Adam, Uyarlamalı Moment tahmini/kestirimi, En popüler yöntem, Ayarlanacak 4 parametre] + +
+
+
+**45. Remark: other methods include Adadelta, Adagrad and SGD.**
+
+⟶ Not: diğer yöntemler arasında Adadelta, Adagrad ve SGD bulunur.
+
+<br>
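
The momentum and RMSprop update rules summarized in the table above can be sketched as follows (numpy, arbitrary hyperparameters and a made-up gradient):

```python
import numpy as np

def momentum_update(w, dw, v, lr=0.01, beta=0.9):
    """Momentum: keep an exponential moving average of the gradients."""
    v = beta * v + (1 - beta) * dw
    return w - lr * v, v

def rmsprop_update(w, dw, s, lr=0.01, beta=0.999, eps=1e-8):
    """RMSprop: scale each step by a running average of squared gradients."""
    s = beta * s + (1 - beta) * dw ** 2
    return w - lr * dw / (np.sqrt(s) + eps), s

w, v, s = np.zeros(3), np.zeros(3), np.zeros(3)
dw = np.array([0.5, -1.0, 0.1])
w, v = momentum_update(w, dw, v)
w, s = rmsprop_update(w, dw, s)
print(w)
```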
+ + +**46. Regularization** + +⟶ Düzenlileştirme + +
+ + +**47. Dropout ― Dropout is a technique used in neural networks to prevent overfitting the training data by dropping out neurons with probability p>0. It forces the model to avoid relying too much on particular sets of features.** + +⟶ Seyreltme ― Seyreltme, sinir ağlarında, p>0 olasılıklı nöronları silerek eğitim verilerinin fazla kullanılmaması için kullanılan bir tekniktir. Modeli, belirli özellik kümelerine çok fazla güvenmekten kaçınmaya zorlar. + +
+
+
+**48. Remark: most deep learning frameworks parametrize dropout through the 'keep' parameter 1−p.**
+
+⟶ Not: çoğu derin öğrenme kütüphanesi, seyreltmeyi 'keep' ('tutma') parametresi 1−p aracılığıyla parametrize eder.
+
+<br>
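
A short inverted-dropout sketch in numpy, using the keep probability 1−p mentioned above (the activation values are toy data):

```python
import numpy as np

def dropout_forward(a, p_drop=0.3, rng=np.random.default_rng(0)):
    """Inverted dropout: zero activations with probability p_drop and rescale
    the survivors by keep = 1 - p_drop so the expected activation is unchanged."""
    keep = 1.0 - p_drop
    mask = (rng.random(a.shape) < keep) / keep
    return a * mask

a = np.ones((4, 5))
print(dropout_forward(a))
```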
+ + +**49. Weight regularization ― In order to make sure that the weights are not too large and that the model is not overfitting the training set, regularization techniques are usually performed on the model weights. The main ones are summed up in the table below:** + +⟶ Ağırlık düzenlileştirme ― Ağırlıkların çok büyük olmadığından ve modelin eğitim setine uygun olmadığından emin olmak için, genellikle model ağırlıklarında düzenlileştirme teknikleri uygulanır. Temel olanlar aşağıdaki tabloda özetlenmiştir: + +
+ + +**50. [LASSO, Ridge, Elastic Net]** + +⟶ [LASSO, Ridge, Elastic Net] + +
+ +**50 bis. Shrinks coefficients to 0, Good for variable selection, Makes coefficients smaller, Tradeoff between variable selection and small coefficients]** + +⟶ [Katsayıları 0'a düşürür, Değişken seçimi için iyi, Katsayıları daha küçük yapar, Değişken seçimi ile küçük katsayılar arasında ödünleşim sağlar] + +
+ +**51. Early stopping ― This regularization technique stops the training process as soon as the validation loss reaches a plateau or starts to increase.** + +⟶ Erken durdurma ― Bu düzenleme tekniği, onaylama kaybı bir stabilliğe ulaştığında veya artmaya başladığında eğitim sürecini durdurur. + +
+ + +**52. [Error, Validation, Training, early stopping, Epochs]** + +⟶ [Hata, Geçerleme/Doğrulama, Eğitim, erken durdurma, Epochs] + +
+ + +**53. Good practices** + +⟶ İyi uygulamalar + +
+ + +**54. Overfitting small batch ― When debugging a model, it is often useful to make quick tests to see if there is any major issue with the architecture of the model itself. In particular, in order to make sure that the model can be properly trained, a mini-batch is passed inside the network to see if it can overfit on it. If it cannot, it means that the model is either too complex or not complex enough to even overfit on a small batch, let alone a normal-sized training set.** + +⟶ Küçük kümelerin ezberlenmesi ― Bir modelde hata ayıklama yaparken, modelin mimarisinde büyük bir sorun olup olmadığını görmek için hızlı testler yapmak genellikle yararlıdır. Özellikle, modelin uygun şekilde eğitilebildiğinden emin olmak için, ezberleyecek mi diye görmek için ağ içinde bir mini küme ile eğitilir. Olmazsa, modelin normal boyutta bir eğitim setini bırakmadan, küçük bir kümeyi bile ezberleyecek kadar çok karmaşık ya da yeterince karmaşık olmadığı anlamına gelir. + +
+ + +**55. Gradient checking ― Gradient checking is a method used during the implementation of the backward pass of a neural network. It compares the value of the analytical gradient to the numerical gradient at given points and plays the role of a sanity-check for correctness.** + +⟶ Gradyanların kontrolü ― Gradyan kontrolü, bir sinir ağının geriye doğru geçişinin uygulanması sırasında kullanılan bir yöntemdir. Analitik gradyanların değerini verilen noktalardaki sayısal gradyanlarla karşılaştırır ve doğruluk için bir kontrol rolü oynar. + +
+ + +**56. [Type, Numerical gradient, Analytical gradient]** + +⟶ [Tip, Sayısal gradyan, Analitik gradyan] + +
+ + +**57. [Formula, Comments]** + +⟶ [Formül, Açıklamalar] + +
+ + +**58. [Expensive; loss has to be computed two times per dimension, Used to verify correctness of analytical implementation, Trade-off in choosing h not too small (numerical instability) nor too large (poor gradient approximation)]** + +⟶ [Maliyetli; Kayıp, boyut başına iki kere hesaplanmalı, Analitik uygulamanın doğruluğunu anlamak için kullanılır, Ne çok küçük (sayısal dengesizlik) ne de çok büyük (zayıf gradyan yaklaşımı) seçimi yapılmalı, bunun için ödünleşim gerekir] + +
+ + +**59. ['Exact' result, Direct computation, Used in the final implementation]** + +⟶ ['Kesin' sonuç, Doğrudan hesaplama, Son uygulamada kullanılır] + +
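
A minimal gradient-checking sketch comparing the centered-difference numerical gradient from the table above to an analytical one, on an assumed toy loss:

```python
import numpy as np

def numerical_gradient(f, x, h=1e-5):
    """Centered difference (f(x+h) - f(x-h)) / (2h) for each coordinate of x."""
    grad = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = h
        grad[i] = (f(x + e) - f(x - e)) / (2 * h)
    return grad

f = lambda x: np.sum(x ** 2)          # toy loss
analytic = lambda x: 2 * x            # its exact gradient
x = np.array([1.0, -2.0, 0.5])
print(np.max(np.abs(numerical_gradient(f, x) - analytic(x))))  # should be tiny (~1e-10)
```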
+
+
+**60. The Deep Learning cheatsheets are now available in [target language].**
+
+⟶ Derin Öğrenme el kitapları şimdi [hedef dilde] mevcuttur.
+
+<br>
+
+**61. Original authors**
+
+⟶ Orijinal yazarlar
+
+<br>
+ +**62.Translated by X, Y and Z** + +⟶ X, Y ve Z tarafından çevirildi + +
+ +**63.Reviewed by X, Y and Z** + +⟶ X, Y ve Z tarafından gözden geçirildi + +
+ +**64.View PDF version on GitHub** + +⟶ GitHub'da PDF sürümünü görüntüleyin + +
+ +**65.By X and Y** + +⟶ X ve Y tarafından + +
diff --git a/tr/cs-230-recurrent-neural-networks.md b/tr/cs-230-recurrent-neural-networks.md new file mode 100644 index 000000000..17536b665 --- /dev/null +++ b/tr/cs-230-recurrent-neural-networks.md @@ -0,0 +1,674 @@ +**1. Recurrent Neural Networks cheatsheet** + +⟶ Tekrarlayan Yapay Sinir Ağları (Recurrent Neural Networks-RNN) El Kitabı + +
+ + +**2. CS 230 - Deep Learning** + +⟶ CS 230 - Derin Öğrenme + +
+ + +**3. [Overview, Architecture structure, Applications of RNNs, Loss function, Backpropagation]** + +⟶ [Genel bakış, Mimari yapı, RNN'lerin uygulamaları, Kayıp fonksiyonu, Geriye Yayılım] + +
+ + +**4. [Handling long term dependencies, Common activation functions, Vanishing/exploding gradient, Gradient clipping, GRU/LSTM, Types of gates, Bidirectional RNN, Deep RNN]** + +⟶ [Uzun vadeli bağımlılıkların ele alınması, Ortak aktivasyon fonksiyonları, Gradyanın kaybolması / patlaması, Gradyan kırpma, GRU / LSTM, Kapı tipleri, Çift Yönlü RNN, Derin RNN] + +
+ + +**5. [Learning word representation, Notations, Embedding matrix, Word2vec, Skip-gram, Negative sampling, GloVe]** + +⟶ [Kelime gösterimini öğrenme, Notasyonlar, Gömme matrisi, Word2vec, Skip-gram, Negatif örnekleme, GloVe] + +
+ + +**6. [Comparing words, Cosine similarity, t-SNE]** + +⟶ [Kelimeleri karşılaştırmak, Cosine benzerliği, t-SNE] + +
+ + +**7. [Language model, n-gram, Perplexity]** + +⟶ [Dil modeli, n-gram, Karışıklık] + +
+ + +**8. [Machine translation, Beam search, Length normalization, Error analysis, Bleu score]** + +⟶ [Makine çevirisi, Işın araması, Uzunluk normalizasyonu, Hata analizi, Bleu skoru] + +
+ + +**9. [Attention, Attention model, Attention weights]** + +⟶ [Dikkat, Dikkat modeli, Dikkat ağırlıkları] + +
+ + +**10. Overview** + +⟶ Genel Bakış + +
+ + +**11. Architecture of a traditional RNN ― Recurrent neural networks, also known as RNNs, are a class of neural networks that allow previous outputs to be used as inputs while having hidden states. They are typically as follows:** + +⟶ Geleneksel bir RNN mimarisi - RNN'ler olarak da bilinen tekrarlayan sinir ağları, gizli durumlara sahipken önceki çıktıların girdi olarak kullanılmasına izin veren bir sinir ağları sınıfıdır. Tipik olarak aşağıdaki gibidirler: + +
+ + +**12. For each timestep t, the activation a and the output y are expressed as follows:** + +⟶ Her bir t zamanında, a aktivasyonu ve y çıktısı aşağıdaki gibi ifade edilir: + +
+ + +**13. and** + +⟶ ve + +
+ + +**14. where Wax,Waa,Wya,ba,by are coefficients that are shared temporally and g1,g2 activation functions.** + +⟶ burada Wax,Waa,Wya,ba,by geçici olarak paylaşılan katsayılardır ve g1,g2 aktivasyon fonksiyonlarıdır. + +
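
A small numpy sketch of the cell equations above, with g1 = tanh and g2 = softmax and randomly initialized, toy-sized coefficient matrices:

```python
import numpy as np

rng = np.random.default_rng(0)
n_x, n_a, n_y = 4, 8, 3                        # input, hidden and output sizes
Wax, Waa = rng.normal(size=(n_a, n_x)), rng.normal(size=(n_a, n_a))
Wya = rng.normal(size=(n_y, n_a))
ba, by = np.zeros((n_a, 1)), np.zeros((n_y, 1))

def rnn_cell_step(x_t, a_prev):
    """a<t> = tanh(Waa a<t-1> + Wax x<t> + ba),  y<t> = softmax(Wya a<t> + by)."""
    a_t = np.tanh(Waa @ a_prev + Wax @ x_t + ba)
    z = Wya @ a_t + by
    y_t = np.exp(z - z.max()) / np.sum(np.exp(z - z.max()))
    return a_t, y_t

a = np.zeros((n_a, 1))
for t in range(5):                              # unroll over 5 timesteps
    a, y = rnn_cell_step(rng.normal(size=(n_x, 1)), a)
print(y.ravel(), y.sum())
```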
+ + +**15. The pros and cons of a typical RNN architecture are summed up in the table below:** + +⟶ Tipik bir RNN mimarisinin artıları ve eksileri aşağıdaki tabloda özetlenmiştir: + +
+ + +**16. [Advantages, Possibility of processing input of any length, Model size not increasing with size of input, Computation takes into account historical information, Weights are shared across time]** + +⟶ [Avantajlar, Herhangi bir uzunluktaki girdilerin işlenmesi imkanı, Girdi büyüklüğüyle artmayan model boyutu, Geçmiş bilgileri dikkate alarak hesaplama, Zaman içinde paylaşılan ağırlıklar] + +
+ + +**17. [Drawbacks, Computation being slow, Difficulty of accessing information from a long time ago, Cannot consider any future input for the current state]** + +⟶ [Dezavantajları, Yavaş hesaplama, Uzun zaman önceki bilgiye erişme zorluğu, Mevcut durum için gelecekteki herhangi bir girdinin düşünülememesi] + +
+ + +**18. Applications of RNNs ― RNN models are mostly used in the fields of natural language processing and speech recognition. The different applications are summed up in the table below:** + +⟶ RNN'lerin Uygulamaları ― RNN modelleri çoğunlukla doğal dil işleme ve konuşma tanıma alanlarında kullanılır. Farklı uygulamalar aşağıdaki tabloda özetlenmiştir: + +
+ + +**19. [Type of RNN, Illustration, Example]** + +⟶ [RNN Türü, Örnekleme, Örnek] + +
+ + +**20. [One-to-one, One-to-many, Many-to-one, Many-to-many]** + +⟶ [Bire bir, Bire çok, Çoka bir, Çoka çok] + +
+ + +**21. [Traditional neural network, Music generation, Sentiment classification, Name entity recognition, Machine translation]** + +⟶ [Geleneksel sinir ağı, Müzik üretimi, Duygu sınıflandırma, İsim varlık tanıma, Makine çevirisi] + +
+ + +**22. Loss function ― In the case of a recurrent neural network, the loss function L of all time steps is defined based on the loss at every time step as follows:** + +⟶ Kayıp fonksiyonu ― Tekrarlayan bir sinir ağı olması durumunda, tüm zaman dilimlerindeki L kayıp fonksiyonu, her zaman dilimindeki kayıbı temel alınarak aşağıdaki gibi tanımlanır: + +
+ + +**23. Backpropagation through time ― Backpropagation is done at each point in time. At timestep T, the derivative of the loss L with respect to weight matrix W is expressed as follows:** + +⟶ Zamanla geri yayılım ― Geriye yayılım zamanın her noktasında yapılır. T zaman diliminde, ağırlık matrisi W'ye göre L kaybının türevi aşağıdaki gibi ifade edilir: + +
+ + +**24. Handling long term dependencies** + +⟶ Uzun vadeli bağımlılıkların ele alınması + +
+ + +**25. Commonly used activation functions ― The most common activation functions used in RNN modules are described below:** + +⟶ Yaygın olarak kullanılan aktivasyon fonksiyonları ― RNN modüllerinde kullanılan en yaygın aktivasyon fonksiyonları aşağıda açıklanmıştır: + +
+ + +**26. [Sigmoid, Tanh, RELU]** + +⟶ [Sigmoid, Tanh, RELU] + +
+ + +**27. Vanishing/exploding gradient ― The vanishing and exploding gradient phenomena are often encountered in the context of RNNs. The reason why they happen is that it is difficult to capture long term dependencies because of multiplicative gradient that can be exponentially decreasing/increasing with respect to the number of layers.** + +⟶ Kaybolan / patlayan gradyan ― Kaybolan ve patlayan gradyan fenomenlerine RNN'ler bağlamında sıklıkla rastlanır. Bunların olmasının nedeni, katman sayısına göre katlanarak azalan / artan olabilen çarpımsal gradyan nedeniyle uzun vadeli bağımlılıkları yakalamanın zor olmasıdır. + +
+ + +**28. Gradient clipping ― It is a technique used to cope with the exploding gradient problem sometimes encountered when performing backpropagation. By capping the maximum value for the gradient, this phenomenon is controlled in practice.** + +⟶ Gradyan kırpma ― Geri yayılım işlemi sırasında bazen karşılaşılan patlayan gradyan sorunuyla başa çıkmak için kullanılan bir tekniktir. Gradyan için maksimum değeri sınırlayarak, bu durum pratikte kontrol edilir. + +
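
A minimal sketch of clipping a gradient by its norm, as described above (numpy, with an assumed threshold of 5):

```python
import numpy as np

def clip_gradient(grad, max_norm=5.0):
    """Rescale the gradient so its L2 norm never exceeds max_norm."""
    norm = np.linalg.norm(grad)
    if norm > max_norm:
        grad = grad * (max_norm / norm)
    return grad

g = np.array([30.0, -40.0])        # norm 50, large enough to destabilize an update
print(clip_gradient(g), np.linalg.norm(clip_gradient(g)))
```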
+ + +**29. clipped** + +⟶ kırpılmış + +
+ + +**30. Types of gates ― In order to remedy the vanishing gradient problem, specific gates are used in some types of RNNs and usually have a well-defined purpose. They are usually noted Γ and are equal to:** + +⟶ Giriş Kapıları Çeşitleri ― Kaybolan gradyan problemini çözmek için bazı RNN türlerinde belirli kapılar kullanılır ve genellikle iyi tanımlanmış bir amaca sahiptir. Genellikle Γ olarak ifade edilir ve şuna eşittir: + +
+ + +**31. where W,U,b are coefficients specific to the gate and σ is the sigmoid function. The main ones are summed up in the table below:** + +⟶ burada W, U, b kapıya özgü katsayılardır ve σ ise sigmoid fonksiyondur. Temel olanlar aşağıdaki tabloda özetlenmiştir: + +
+ + +**32. [Type of gate, Role, Used in]** + +⟶ [Kapının tipi, Rol, Kullanılan] + +
+ + +**33. [Update gate, Relevance gate, Forget gate, Output gate]** + +⟶ [Güncelleme kapısı, Uygunluk kapısı, Unutma kapısı, Çıkış kapısı] + +
+ + +**34. [How much past should matter now?, Drop previous information?, Erase a cell or not?, How much to reveal of a cell?]** + +⟶ [Şimdi ne kadar geçmiş olması gerekir?, Önceki bilgiyi bırak?, Bir hücreyi sil ya da silme?, Bir hücreyi ortaya çıkarmak için ne kadar?] + +
+ + +**35. [LSTM, GRU]** + +⟶ [LSTM, GRU] + +
+ + +**36. GRU/LSTM ― Gated Recurrent Unit (GRU) and Long Short-Term Memory units (LSTM) deal with the vanishing gradient problem encountered by traditional RNNs, with LSTM being a generalization of GRU. Below is a table summing up the characterizing equations of each architecture:** + +⟶ GRU/LSTM ― Geçitli Tekrarlayan Birim (Gated Recurrent Unit-GRU) ve Uzun Kısa Süreli Bellek Birimleri (Long Short-Term Memory-LSTM), geleneksel RNN'lerin karşılaştığı kaybolan gradyan problemini ele alır, LSTM ise GRU'nun genelleştirilmiş halidir. Her bir mimarinin karakterizasyon denklemlerini özetleyen tablo aşağıdadır: + +
+ + +**37. [Characterization, Gated Recurrent Unit (GRU), Long Short-Term Memory (LSTM), Dependencies]** + +⟶ [Karakterizasyon, Geçitli Tekrarlayan Birim (GRU), Uzun Kısa Süreli Bellek (LSTM), Bağımlılıklar] + +
+ + +**38. Remark: the sign ⋆ denotes the element-wise multiplication between two vectors.** + +⟶ Not: ⋆ işareti iki vektör arasındaki birimsel çarpımı belirtir. + +
+ + +**39. Variants of RNNs ― The table below sums up the other commonly used RNN architectures:** + +⟶ RNN varyantları ― Aşağıdaki tablo, diğer yaygın kullanılan RNN mimarilerini özetlemektedir: + +
+ + +**40. [Bidirectional (BRNN), Deep (DRNN)]** + +⟶ [Çift Yönlü (Bidirectional-BRNN), Derin (Deep-DRNN)] + +
+ + +**41. Learning word representation** + +⟶ Kelime temsilini öğrenme + +
+
+
+**42. In this section, we note V the vocabulary and |V| its size.**
+
+⟶ Bu bölümde V kelime dağarcığını (sözcük kümesini), |V| ise onun boyutunu ifade eder.
+
+<br>
+ + +**43. Motivation and notations** + +⟶ Motivasyon ve notasyon + +
+ + +**44. Representation techniques ― The two main ways of representing words are summed up in the table below:** + +⟶ Temsil etme teknikleri ― Kelimeleri temsil etmenin iki temel yolu aşağıdaki tabloda özetlenmiştir: + +
+ + +**45. [1-hot representation, Word embedding]** + +⟶ [1-hot gösterim, Kelime gömme] + +
+ + +**46. [teddy bear, book, soft]** + +⟶ [oyuncak ayı, kitap, yumuşak] + +
+
+
+**47. [Noted ow, Naive approach, no similarity information, Noted ew, Takes into account words similarity]**
+
+⟶ [ow olarak gösterilir, Naif yaklaşım, benzerlik bilgisi yok, ew olarak gösterilir, Kelime benzerliğini dikkate alır]
+
+<br>
+
+
+**48. Embedding matrix ― For a given word w, the embedding matrix E is a matrix that maps its 1-hot representation ow to its embedding ew as follows:**
+
+⟶ Gömme matrisi ― Belirli bir w kelimesi için E gömme matrisi, onun 1-hot gösterimi ow'yu ew gömmesine aşağıdaki gibi eşleyen bir matristir:
+
+<br>
+ + +**49. Remark: learning the embedding matrix can be done using target/context likelihood models.** + +⟶ Not: Gömme matrisinin öğrenilmesi hedef / içerik olabilirlik modelleri kullanılarak yapılabilir. + +
+ + +**50. Word embeddings** + +⟶ Kelime gömmeleri + +
+ + +**51. Word2vec ― Word2vec is a framework aimed at learning word embeddings by estimating the likelihood that a given word is surrounded by other words. Popular models include skip-gram, negative sampling and CBOW.** + +⟶ Word2vec ― Word2vec, belirli bir kelimenin diğer kelimelerle çevrili olma olasılığını tahmin ederek kelime gömmelerini öğrenmeyi amaçlayan bir çerçevedir. Popüler modeller arasında skip-gram, negatif örnekleme ve CBOW bulunur. + +
+ + +**52. [A cute teddy bear is reading, teddy bear, soft, Persian poetry, art]** + +⟶ [Sevimli ayıcık okuyor, ayıcık, yumuşak, Farsça şiir, sanat] + +
+ + +**53. [Train network on proxy task, Extract high-level representation, Compute word embeddings]** + +⟶ [Proxy görevinde ağı eğitme, üst düzey gösterimi çıkartme, Kelime gömme hesaplama] + +
+
+
+**54. Skip-gram ― The skip-gram word2vec model is a supervised learning task that learns word embeddings by assessing the likelihood of any given target word t happening with a context word c. By noting θt a parameter associated with t, the probability P(t|c) is given by:**
+
+⟶ Skip-gram ― Skip-gram word2vec modeli, verilen herhangi bir t hedef kelimesinin c bağlam kelimesi ile birlikte görülme olasılığını değerlendirerek kelime gömmelerini öğrenen denetimli bir öğrenme görevidir. t ile ilişkili parametre θt ile gösterildiğinde, P(t|c) olasılığı şu şekilde verilir:
+
+<br>
+ + +**55. Remark: summing over the whole vocabulary in the denominator of the softmax part makes this model computationally expensive. CBOW is another word2vec model using the surrounding words to predict a given word.** + +⟶ Not: Softmax bölümünün paydasındaki tüm kelime dağarcığını toplamak, bu modeli hesaplama açısından maliyetli kılar. CBOW, verilen bir kelimeyi tahmin etmek için çevreleyen kelimeleri kullanan başka bir word2vec modelidir. + +
+ + +**56. Negative sampling ― It is a set of binary classifiers using logistic regressions that aim at assessing how a given context and a given target words are likely to appear simultaneously, with the models being trained on sets of k negative examples and 1 positive example. Given a context word c and a target word t, the prediction is expressed by:** + +⟶ Negatif örnekleme - Belirli bir bağlamın ve belirli bir hedef kelimenin eşzamanlı olarak ortaya çıkmasının muhtemel olup olmadığının değerlendirilmesini, modellerin k negatif örnek kümeleri ve 1 pozitif örnek kümesinde eğitilmesini hedefleyen, lojistik regresyon kullanan bir ikili sınıflandırma kümesidir. Bağlam sözcüğü c ve hedef sözcüğü t göz önüne alındığında, tahmin şöyle ifade edilir: + +
+ + +**57. Remark: this method is less computationally expensive than the skip-gram model.** + +⟶ Not: Bu yöntem, skip-gram modelinden daha az hesaplamalıdır. + +
+ + +**57bis. GloVe ― The GloVe model, short for global vectors for word representation, is a word embedding technique that uses a co-occurence matrix X where each Xi,j denotes the number of times that a target i occurred with a context j. Its cost function J is as follows:** + +⟶ GloVe ― Kelime gösterimi için Global vektörler tanımının kısaltılmış hali olan GloVe, eşzamanlı bir X matrisi kullanan ki burada her bir Xi,j , bir hedefin bir j bağlamında gerçekleştiği sayısını belirten bir kelime gömme tekniğidir. Maliyet fonksiyonu J aşağıdaki gibidir: + +
+ + +**58. where f is a weighting function such that Xi,j=0⟹f(Xi,j)=0. +Given the symmetry that e and ? play in this model, the final word embedding e(final)w is given by:** + +⟶ f, Xi,j=0⟹f(Xi,j)=0 olacak şekilde bir ağırlıklandırma fonksiyonudur. +Bu modelde e ve θ'nin oynadığı simetri göz önüne alındığında, e (final) w'nin kelime gömmesi şöyle ifade edilir: + +
+ + +**59. Remark: the individual components of the learned word embeddings are not necessarily interpretable.** + +⟶ Not: Öğrenilen kelime gömme bileşenlerinin ayrı ayrı bileşenleri tam olarak yorumlanamaz. + +
+ + +**60. Comparing words** + +⟶ Kelimelerin karşılaştırılması + +
+ + +**61. Cosine similarity ― The cosine similarity between words w1 and w2 is expressed as follows:** + +⟶ Kosinüs benzerliği ― w1 ve w2 kelimeleri arasındaki kosinüs benzerliği şu şekilde ifade edilir: + +
+ + +**62. Remark: θ is the angle between words w1 and w2.** + +⟶ Not: θ, w1 ve w2 kelimeleri arasındaki açıdır. + +
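
A minimal sketch of the cosine similarity above, using made-up 3-dimensional "word vectors" purely for illustration:

```python
import numpy as np

def cosine_similarity(w1, w2):
    """cos(theta) = (w1 . w2) / (||w1|| * ||w2||) between two word vectors."""
    return float(np.dot(w1, w2) / (np.linalg.norm(w1) * np.linalg.norm(w2)))

teddy = np.array([0.9, 0.1, 0.0])     # hypothetical embeddings
soft = np.array([0.8, 0.3, 0.1])
poetry = np.array([0.0, 0.2, 0.9])
print(cosine_similarity(teddy, soft), cosine_similarity(teddy, poetry))
```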
+ + +**63. t-SNE ― t-SNE (t-distributed Stochastic Neighbor Embedding) is a technique aimed at reducing high-dimensional embeddings into a lower dimensional space. In practice, it is commonly used to visualize word vectors in the 2D space.** + +⟶ t-SNE ― t-SNE (t-dağıtımlı Stokastik Komşu Gömme), yüksek boyutlu gömmeleri daha düşük boyutlu bir alana indirmeyi amaçlayan bir tekniktir. Uygulamada, kelime uzaylarını 2B alanda görselleştirmek için yaygın olarak kullanılır. + +
+ + +**64. [literature, art, book, culture, poem, reading, knowledge, entertaining, loveable, childhood, kind, teddy bear, soft, hug, cute, adorable]** + +⟶ [edebiyat, sanat, kitap, kültür, şiir, okuma, bilgi, eğlendirici, sevimli, çocukluk, kibar, ayıcık, yumuşak, sarılmak, sevimli, sevimli] + +
+ + +**65. Language model** + +⟶ Dil modeli + +
+ + +**66. Overview ― A language model aims at estimating the probability of a sentence P(y).** + +⟶ Genel bakış - Bir dil modeli P(y) cümlesinin olasılığını tahmin etmeyi amaçlar. + +
+ + +**67. n-gram model ― This model is a naive approach aiming at quantifying the probability that an expression appears in a corpus by counting its number of appearance in the training data.** + +⟶ n-gram modeli ― Bu model, eğitim verilerindeki görünüm sayısını sayarak bir ifadenin bir korpusta ortaya çıkma olasılığını ölçmeyi amaçlayan naif bir yaklaşımdır. + +
+ + +**68. Perplexity ― Language models are commonly assessed using the perplexity metric, also known as PP, which can be interpreted as the inverse probability of the dataset normalized by the number of words T. The perplexity is such that the lower, the better and is defined as follows:** + +⟶ Karışıklık - Dil modelleri yaygın olarak, PP olarak da bilinen karışıklık metriği kullanılarak değerlendirilir ve bunlar T kelimelerinin sayısıyla normalize edilmiş veri setinin ters olasılığı olarak yorumlanabilir. Karışıklık, daha düşük, daha iyi ve şöyle tanımlanır: + +
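
A short sketch of the perplexity formula above, assuming we already have the model's probability for each observed token:

```python
import numpy as np

def perplexity(token_probs):
    """PP = exp(-(1/T) * sum(log p_t)): the inverse probability of the data,
    normalized by the number of tokens T (lower is better)."""
    token_probs = np.asarray(token_probs, dtype=float)
    return float(np.exp(-np.mean(np.log(token_probs))))

print(perplexity([0.2, 0.5, 0.1, 0.4]))   # model probabilities of each observed token
```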
+ + +**69. Remark: PP is commonly used in t-SNE.** + +⟶ Not: PP, t-SNE'de yaygın olarak kullanılır. + +
+ + +**70. Machine translation** + +⟶ Makine çevirisi + +
+ + +**71. Overview ― A machine translation model is similar to a language model except it has an encoder network placed before. For this reason, it is sometimes referred as a conditional language model. The goal is to find a sentence y such that:** + +⟶ Genel bakış ― Bir makine çeviri modeli, daha önce yerleştirilmiş bir kodlayıcı ağına sahip olması dışında, bir dil modeline benzer. Bu nedenle, bazen koşullu dil modeli olarak da adlandırılır. Amaç şu şekilde bir cümle bulmaktır: + +
+ + +**72. Beam search ― It is a heuristic search algorithm used in machine translation and speech recognition to find the likeliest sentence y given an input x.** + +⟶ Işın arama ― Makine çevirisinde ve konuşma tanımada kullanılan ve x girişi verilen en olası cümleyi bulmak için kullanılan sezgisel bir arama algoritmasıdır. + +
+ + +**73. [Step 1: Find top B likely words y<1>, Step 2: Compute conditional probabilities y|x,y<1>,...,y, Step 3: Keep top B combinations x,y<1>,...,y, End process at a stop word]** + +⟶ [Adım 1: En olası B kelimeleri bulun y<1>, 2. Adım: Koşullu olasılıkları hesaplayın y|x,y<1>, ..., y, 3. Adım: En olası B kombinasyonlarını koruyun x,y<1>, ..., y, İşlemi durdurarak sonlandırın] + +
+ + +**74. Remark: if the beam width is set to 1, then this is equivalent to a naive greedy search.** + +⟶ Not: Eğer ışın genişliği 1 olarak ayarlanmışsa, bu naif (naive) bir açgözlü (greedy) aramaya eşdeğerdir. + +
+ + +**75. Beam width ― The beam width B is a parameter for beam search. Large values of B yield to better result but with slower performance and increased memory. Small values of B lead to worse results but is less computationally intensive. A standard value for B is around 10.** + +⟶ Işın genişliği ― Işın genişliği B, ışın araması için bir parametredir. Daha yüksek B değerleri daha iyi sonuç elde edilmesini sağlar fakat daha düşük performans ve daha yüksek hafıza ile. Küçük B değerleri daha kötü sonuçlara neden olur, ancak hesaplama açısından daha az yoğundur. B için standart bir değer 10 civarındadır. + +
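
A simplified beam-search sketch with beam width B, assuming (unrealistically, for brevity) a fixed table of per-step log-probabilities rather than probabilities conditioned on the decoded prefix:

```python
import numpy as np

def beam_search(step_log_probs, B=2):
    """Keep the top-B partial sequences at each step, scored by summed log-probability."""
    beams = [([], 0.0)]
    for log_probs in step_log_probs:
        candidates = [(seq + [v], score + lp)
                      for seq, score in beams
                      for v, lp in enumerate(log_probs)]
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:B]
    return beams

table = np.log(np.array([[0.6, 0.3, 0.1],
                         [0.2, 0.5, 0.3],
                         [0.1, 0.1, 0.8]]))
print(beam_search(table, B=2))
```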
+ + +**76. Length normalization ― In order to improve numerical stability, beam search is usually applied on the following normalized objective, often called the normalized log-likelihood objective, defined as:** + +⟶ Uzunluk normalizasyonu ― Sayısal stabiliteyi arttırmak için, ışın arama genellikle, aşağıdaki gibi tanımlanan normalize edilmiş log-olabilirlik amacı olarak adlandırılan normalize edilmiş hedefe uygulanır: + +
+ + +**77. Remark: the parameter α can be seen as a softener, and its value is usually between 0.5 and 1.** + +⟶ Not: α parametresi yumuşatıcı olarak görülebilir ve değeri genellikle 0,5 ile 1 arasındadır. + +
+ + +**78. Error analysis ― When obtaining a predicted translation ˆy that is bad, one can wonder why we did not get a good translation y∗ by performing the following error analysis:** + +⟶ Hata analizi ― Kötü bir çeviri elde edildiğinde, aşağıdaki hata analizini yaparak neden iyi bir çeviri almadığımızı araştırabiliriz: + +
+ + +**79. [Case, Root cause, Remedies]** + +⟶ [Durum, Ana neden, Çözümler] + +
+ + +**80. [Beam search faulty, RNN faulty, Increase beam width, Try different architecture, Regularize, Get more data]** + +⟶ [Işın arama hatası, RNN hatası, Işın genişliğini artırma, farklı mimariyi deneme, Düzenlileştirme, Daha fazla bilgi edinme] + +
+ + +**81. Bleu score ― The bilingual evaluation understudy (bleu) score quantifies how good a machine translation is by computing a similarity score based on n-gram precision. It is defined as follows:** + +⟶ Bleu puanı ― İki dilli değerlendirme alt ölçeği (bleu) puanı, makine çevirisinin ne kadar iyi olduğunu, n-gram hassasiyetine dayalı bir benzerlik puanı hesaplayarak belirler. Aşağıdaki gibi tanımlanır: + +
+ + +**82. where pn is the bleu score on n-gram only defined as follows:** + +⟶ pn, n-gramdaki bleu skorunun sadece aşağıdaki şekilde tanımlandığı durumlarda: + +
+ + +**83. Remark: a brevity penalty may be applied to short predicted translations to prevent an artificially inflated bleu score.** + +⟶ Not: Yapay olarak şişirilmiş bir bleu skorunu önlemek için kısa öngörülen çevirilere küçük bir ceza verilebilir. + +
+ + +**84. Attention** + +⟶ Dikkat + +
+
+
+**85. Attention model ― This model allows an RNN to pay attention to specific parts of the input that is considered as being important, which improves the performance of the resulting model in practice. By noting α the amount of attention that the output y should pay to the activation a and c the context at time t, we have:**
+
+⟶ Dikkat modeli ― Bu model, bir RNN'nin, girdinin önemli olduğu düşünülen belirli kısımlarına dikkat etmesine olanak sağlar; bu da ortaya çıkan modelin pratikteki performansını artırır. y çıktısının a aktivasyonuna vermesi gereken dikkat miktarını α ve t anındaki bağlamı c ile gösterirsek, şunu elde ederiz:
+
+<br>
+ + +**86. with** + +⟶ ile + +
+ + +**87. Remark: the attention scores are commonly used in image captioning and machine translation.** + +⟶ Not: Dikkat skorları, görüntü altyazılama ve makine çevirisinde yaygın olarak kullanılır. + +
+ + +**88. A cute teddy bear is reading Persian literature.** + +⟶ Sevimli bir oyuncak ayı Fars edebiyatı okuyor. + +
+ + +**89. Attention weight ― The amount of attention that the output y should pay to the activation a is given by α computed as follows:** + +⟶ Dikkat ağırlığı ― Y çıktısının a aktivasyonuna vermesi gereken dikkat miktarı, aşağıdaki gibi hesaplanan α ile verilir: + +
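
A minimal sketch of turning attention scores into weights α via a softmax and forming the context vector, with toy scores and activations:

```python
import numpy as np

def attention_weights(scores):
    """alpha = softmax over the attention scores for one output step."""
    e = np.exp(scores - np.max(scores))
    return e / e.sum()

def context_vector(alphas, activations):
    """c = sum_t' alpha_t' * a_t'; weighted average of the encoder activations."""
    return activations.T @ alphas

scores = np.array([2.0, 0.5, -1.0])                  # one score per input position
a = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])   # 3 positions, hidden size 2
alphas = attention_weights(scores)
print(alphas, context_vector(alphas, a))
```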
+ + +**90. Remark: computation complexity is quadratic with respect to Tx.** + +⟶ Not: hesaplama karmaşıklığı Tx'e göre ikinci derecedendir. + +
+ + +**91. The Deep Learning cheatsheets are now available in [target language].** + +⟶ Derin Öğrenme el kitapları şimdi [hedef dilde] mevcuttur. + +
+ +**92. Original authors** + +⟶Orijinal yazarlar + +
+ +**93. Translated by X, Y and Z** + +⟶ X, Y ve Z tarafından çevrilmiştir. + +
+ +**94. Reviewed by X, Y and Z** + +⟶ X, Y ve Z tarafından gözden geçirilmiştir. + +
+ +**95. View PDF version on GitHub** + +⟶ GitHub'da PDF versiyonunu görüntüleyin. + +
+ +**96. By X and Y** + +⟶ X ve Y tarafından + +
diff --git a/tr/refresher-probability.md b/tr/refresher-probability.md deleted file mode 100644 index 5c9b34656..000000000 --- a/tr/refresher-probability.md +++ /dev/null @@ -1,381 +0,0 @@ -**1. Probabilities and Statistics refresher** - -⟶ - -
- -**2. Introduction to Probability and Combinatorics** - -⟶ - -
- -**3. Sample space ― The set of all possible outcomes of an experiment is known as the sample space of the experiment and is denoted by S.** - -⟶ - -
- -**4. Event ― Any subset E of the sample space is known as an event. That is, an event is a set consisting of possible outcomes of the experiment. If the outcome of the experiment is contained in E, then we say that E has occurred.** - -⟶ - -
- -**5. Axioms of probability For each event E, we denote P(E) as the probability of event E occuring.** - -⟶ - -
- -**6. Axiom 1 ― Every probability is between 0 and 1 included, i.e:** - -⟶ - -
- -**7. Axiom 2 ― The probability that at least one of the elementary events in the entire sample space will occur is 1, i.e:** - -⟶ - -
- -**8. Axiom 3 ― For any sequence of mutually exclusive events E1,...,En, we have:** - -⟶ - -
- -**9. Permutation ― A permutation is an arrangement of r objects from a pool of n objects, in a given order. The number of such arrangements is given by P(n,r), defined as:** - -⟶ - -
- -**10. Combination ― A combination is an arrangement of r objects from a pool of n objects, where the order does not matter. The number of such arrangements is given by C(n,r), defined as:** - -⟶ - -
- -**11. Remark: we note that for 0⩽r⩽n, we have P(n,r)⩾C(n,r)** - -⟶ - -
- -**12. Conditional Probability** - -⟶ - -
- -**13. Bayes' rule ― For events A and B such that P(B)>0, we have:** - -⟶ - -
- -**14. Remark: we have P(A∩B)=P(A)P(B|A)=P(A|B)P(B)** - -⟶ - -
- -**15. Partition ― Let {Ai,i∈[[1,n]]} be such that for all i, Ai≠∅. We say that {Ai} is a partition if we have:** - -⟶ - -
- -**16. Remark: for any event B in the sample space, we have P(B)=n∑i=1P(B|Ai)P(Ai).** - -⟶ - -
- -**17. Extended form of Bayes' rule ― Let {Ai,i∈[[1,n]]} be a partition of the sample space. We have:** - -⟶ - -
- -**18. Independence ― Two events A and B are independent if and only if we have:** - -⟶ - -
- -**19. Random Variables** - -⟶ - -
- -**20. Definitions** - -⟶ - -
- -**21. Random variable ― A random variable, often noted X, is a function that maps every element in a sample space to a real line.** - -⟶ - -
- -**22. Cumulative distribution function (CDF) ― The cumulative distribution function F, which is monotonically non-decreasing and is such that limx→−∞F(x)=0 and limx→+∞F(x)=1, is defined as:** - -⟶ - -
- -**23. Remark: we have P(a - -**24. Probability density function (PDF) ― The probability density function f is the probability that X takes on values between two adjacent realizations of the random variable.** - -⟶ - -
- -**25. Relationships involving the PDF and CDF ― Here are the important properties to know in the discrete (D) and the continuous (C) cases.** - -⟶ - -
- -**26. [Case, CDF F, PDF f, Properties of PDF]** - -⟶ - -
- -**27. Expectation and Moments of the Distribution ― Here are the expressions of the expected value E[X], generalized expected value E[g(X)], kth moment E[Xk] and characteristic function ψ(ω) for the discrete and continuous cases:** - -⟶ - -
- -**28. Variance ― The variance of a random variable, often noted Var(X) or σ2, is a measure of the spread of its distribution function. It is determined as follows:** - -⟶ - -
- -**29. Standard deviation ― The standard deviation of a random variable, often noted σ, is a measure of the spread of its distribution function which is compatible with the units of the actual random variable. It is determined as follows:** - -⟶ - -
- -**30. Transformation of random variables ― Let the variables X and Y be linked by some function. By noting fX and fY the distribution function of X and Y respectively, we have:** - -⟶ - -
- -**31. Leibniz integral rule ― Let g be a function of x and potentially c, and a,b boundaries that may depend on c. We have:** - -⟶ - -
- -**32. Probability Distributions** - -⟶ - -
- -**33. Chebyshev's inequality ― Let X be a random variable with expected value μ. For k,σ>0, we have the following inequality:** - -⟶ - -
- -**34. Main distributions ― Here are the main distributions to have in mind:** - -⟶ - -
- -**35. [Type, Distribution]** - -⟶ - -
- -**36. Jointly Distributed Random Variables** - -⟶ - -
- -**37. Marginal density and cumulative distribution ― From the joint density probability function fXY , we have** - -⟶ - -
- -**38. [Case, Marginal density, Cumulative function]** - -⟶ - -
- -**39. Conditional density ― The conditional density of X with respect to Y, often noted fX|Y, is defined as follows:** - -⟶ - -
- -**40. Independence ― Two random variables X and Y are said to be independent if we have:** - -⟶ - -
- -**41. Covariance ― We define the covariance of two random variables X and Y, that we note σ2XY or more commonly Cov(X,Y), as follows:** - -⟶ - -
- -**42. Correlation ― By noting σX,σY the standard deviations of X and Y, we define the correlation between the random variables X and Y, noted ρXY, as follows:** - -⟶ - -
- -**43. Remark 1: we note that for any random variables X,Y, we have ρXY∈[−1,1].** - -⟶ - -
- -**44. Remark 2: If X and Y are independent, then ρXY=0.** - -⟶ - -
- -**45. Parameter estimation** - -⟶ - -
- -**46. Definitions** - -⟶ - -
- -**47. Random sample ― A random sample is a collection of n random variables X1,...,Xn that are independent and identically distributed with X.** - -⟶ - -
- -**48. Estimator ― An estimator is a function of the data that is used to infer the value of an unknown parameter in a statistical model.** - -⟶ - -
- -**49. Bias ― The bias of an estimator ^θ is defined as being the difference between the expected value of the distribution of ^θ and the true value, i.e.:** - -⟶ - -
- -**50. Remark: an estimator is said to be unbiased when we have E[^θ]=θ.** - -⟶ - -
- -**51. Estimating the mean** - -⟶ - -
- -**52. Sample mean ― The sample mean of a random sample is used to estimate the true mean μ of a distribution, is often noted ¯¯¯¯¯X and is defined as follows:** - -⟶ - -
- -**53. Remark: the sample mean is unbiased, i.e E[¯¯¯¯¯X]=μ.** - -⟶ - -
- -**54. Central Limit Theorem ― Let us have a random sample X1,...,Xn following a given distribution with mean μ and variance σ2, then we have:** - -⟶ - -
- -**55. Estimating the variance** - -⟶ - -
- -**56. Sample variance ― The sample variance of a random sample is used to estimate the true variance σ2 of a distribution, is often noted s2 or ^σ2 and is defined as follows:** - -⟶ - -
- -**57. Remark: the sample variance is unbiased, i.e E[s2]=σ2.** - -⟶ - -
- -**58. Chi-Squared relation with sample variance ― Let s2 be the sample variance of a random sample. We have:** - -⟶ - -
- -**59. [Introduction, Sample space, Event, Permutation]** - -⟶ - -
- -**60. [Conditional probability, Bayes' rule, Independence]** - -⟶ - -
- -**61. [Random variables, Definitions, Expectation, Variance]** - -⟶ - -
- -**62. [Probability distributions, Chebyshev's inequality, Main distributions]** - -⟶ - -
- -**63. [Jointly distributed random variables, Density, Covariance, Correlation]** - -⟶ - -
- -**64. [Parameter estimation, Mean, Variance]** - -⟶ diff --git a/uk/cs-229-probability.md b/uk/cs-229-probability.md new file mode 100644 index 000000000..a09ab965d --- /dev/null +++ b/uk/cs-229-probability.md @@ -0,0 +1,381 @@ +**1. Probabilities and Statistics refresher** + +⟶ Швидке повторення з теорії ймовірностей та комбінаторики. + +
+ +**2. Introduction to Probability and Combinatorics** + +⟶ Вступ до теорії ймовірностей та комбінаторики. + +
+ +**3. Sample space ― The set of all possible outcomes of an experiment is known as the sample space of the experiment and is denoted by S.** + +⟶ Простір елементарних подій ― Множина всіх можливих результатiв експерименту називається простором елементарних подій і позначається літерою S. + +
+ +**4. Event ― Any subset E of the sample space is known as an event. That is, an event is a set consisting of possible outcomes of the experiment. If the outcome of the experiment is contained in E, then we say that E has occurred.** + +⟶ Випадкова подія - будь-яка підмножина E, що належить до певного простору елементарних подій, називається подією. Таким чином, подія це множина, що містить можливі результати експерименту. Якщо результати експерименту містяться в Е, тоді ми говоримо що Е відбулася. + +
+ +**5. Axioms of probability For each event E, we denote P(E) as the probability of event E occuring.** + +⟶ Аксіоми теорії ймовірностей. Для кожної події Е, P(E) є ймовірністю події Е. + +
+ +**6. Axiom 1 ― Every probability is between 0 and 1 included, i.e:** + +⟶ Аксіома 1 - Всі ймовірності існують між 0 та 1 включно. + +
+ +**7. Axiom 2 ― The probability that at least one of the elementary events in the entire sample space will occur is 1, i.e:** + +⟶ Аксіома 2 - Ймовірність що як мінімум одна подія з простору елементарних подій відбудеться дорівнює 1. + +
+ +**8. Axiom 3 ― For any sequence of mutually exclusive events E1,...,En, we have:** + +⟶ Аксіома 3 - Для будь-якої послідовності взаємновиключних подій E1,...,En, ми маємо: + +
+ +**9. Permutation ― A permutation is an arrangement of r objects from a pool of n objects, in a given order. The number of such arrangements is given by P(n,r), defined as:** + +⟶ Підстановка - підстановка це спосіб вибору r об'єктів з набору n об'єктів в певному порядку. Кількість таких способів вибору задається через P(n,r): + +
+ +**10. Combination ― A combination is an arrangement of r objects from a pool of n objects, where the order does not matter. The number of such arrangements is given by C(n,r), defined as:** + +⟶ Комбiнацiя - комбiнацiя це спосіб вибору r об'єктів з набору n об'єктів, де порядок не має значення. Кількість таких способів вибору задається через C(n,r): + +
+ +**11. Remark: we note that for 0⩽r⩽n, we have P(n,r)⩾C(n,r)** + +⟶ Примітка: ми зауважуємо що для 0⩽r⩽n, ми маємо P(n,r)⩾C(n,r) + +
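
A quick check of the permutation/combination counts and the remark above, using the Python standard library (math.perm and math.comb, available in Python 3.8+):

```python
from math import comb, factorial, perm

n, r = 5, 3
print(perm(n, r))                                      # P(n, r) = n! / (n - r)! = 60
print(comb(n, r))                                      # C(n, r) = n! / (r! (n - r)!) = 10
print(factorial(n) // factorial(n - r) == perm(n, r))  # same value, so P(n,r) >= C(n,r) here
```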
+ +**12. Conditional Probability** + +⟶ Умовна ймовірність + +
+ +**13. Bayes' rule ― For events A and B such that P(B)>0, we have:** + +⟶ Теорема Баєса - Для подій А і В таких що P(B)>0, маємо: + +
+ +**14. Remark: we have P(A∩B)=P(A)P(B|A)=P(A|B)P(B)** + +⟶ Примітка: P(A∩B)=P(A)P(B|A)=P(A|B)P(B) + +
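
A small numerical sketch of Bayes' rule with the denominator expanded over the partition {A, not A}; the probabilities are made-up values for illustration:

```python
# P(A|B) = P(B|A) P(A) / P(B), with P(B) = P(B|A)P(A) + P(B|not A)P(not A)
p_a = 0.01             # assumed prior P(A)
p_b_given_a = 0.95     # assumed P(B|A)
p_b_given_not_a = 0.05 # assumed P(B|not A)

p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)
p_a_given_b = p_b_given_a * p_a / p_b
print(round(p_a_given_b, 4))   # ~0.161
```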
+ +**15. Partition ― Let {Ai,i∈[[1,n]]} be such that for all i, Ai≠∅. We say that {Ai} is a partition if we have:** + +⟶ Поділ множини - Нехай {Ai,i∈[[1,n]]} буде таким для всіх i, Ai≠∅. Ми називаємо {Ai} поділом множини якщо: + +
+ +**16. Remark: for any event B in the sample space, we have P(B)=n∑i=1P(B|Ai)P(Ai).** + +⟶ Примітка: для будь-якої події В в просторі елементарних подій, маємо P(B)=n∑i=1P(B|Ai)P(Ai). + +
+ +**17. Extended form of Bayes' rule ― Let {Ai,i∈[[1,n]]} be a partition of the sample space. We have:** + +⟶ Розгорнута форма теореми Баєса - Нехай {Ai,i∈[[1,n]]} буде поділом множини простору елементарних подій. Маємо: + +
+ +**18. Independence ― Two events A and B are independent if and only if we have:** + +⟶ Незалежність - Дві події А і В є незалежними якщо і тільки якщо ми маємо: + +
+ +**19. Random Variables** + +⟶ Випадкові змінні + +
+ +**20. Definitions** + +⟶ Означення + +
+ +**21. Random variable ― A random variable, often noted X, is a function that maps every element in a sample space to a real line.** + +⟶ Випадкова змінна - Випадкова змінна, часто означена X, є функцією що проектує кожну подію в просторі елементарних подій на реальну лінію. + +
+
+**22. Cumulative distribution function (CDF) ― The cumulative distribution function F, which is monotonically non-decreasing and is such that limx→−∞F(x)=0 and limx→+∞F(x)=1, is defined as:**
+
+⟶ Функція розподілу ймовірностей (CDF) ― Функція розподілу ймовірностей F, яка є монотонно неспадною і такою, що limx→−∞F(x)=0 та limx→+∞F(x)=1, визначається як:
+
+<br>
+ +**23. Remark: we have P(a + +**24. Probability density function (PDF) ― The probability density function f is the probability that X takes on values between two adjacent realizations of the random variable.** + +⟶ Функція густини імовірності (PDF) - Функція густини імовірності F є імовірністю що X набирає значень між двома сусідніми випадковими величинами. + +
+
+**25. Relationships involving the PDF and CDF ― Here are the important properties to know in the discrete (D) and the continuous (C) cases.**
+
+⟶ Залежність між PDF та CDF ― Ось важливі властивості, які варто знати для дискретного (D) та неперервного (C) випадків.
+
+<br>
+ +**26. [Case, CDF F, PDF f, Properties of PDF]** + +⟶ [Випадок, CDF F, PDF f, характеристики PDF] + +
+ +**27. Expectation and Moments of the Distribution ― Here are the expressions of the expected value E[X], generalized expected value E[g(X)], kth moment E[Xk] and characteristic function ψ(ω) for the discrete and continuous cases:** + +⟶ Математичне сподівання і моменти випадкового значення - Ось вирази очікуваного значення E[X], узагальненого очікуваного значення E[g(X)], k-го моменту E[Xk] та характеристичною функцією ψ(ω) дискретного або неперервного значення величини: + +
+ +**28. Variance ― The variance of a random variable, often noted Var(X) or σ2, is a measure of the spread of its distribution function. It is determined as follows:** + +⟶ Дисперсія випадкової змiнної - Дисперсія випадкової змiнної, що позначається Var(X) або σ2 є мірою величини розподілення значень Функції. Вона визначаєтья: + +
+ +**29. Standard deviation ― The standard deviation of a random variable, often noted σ, is a measure of the spread of its distribution function which is compatible with the units of the actual random variable. It is determined as follows:** + +⟶ Стандартне відхилення - Стандартне відхилення випадкової величини, що позначається σ, є мірою величини розподілення значень функції, сумісною з одиницями випадкової величини. Вона визначаєтья: + +
+ +**30. Transformation of random variables ― Let the variables X and Y be linked by some function. By noting fX and fY the distribution function of X and Y respectively, we have:** + +⟶ Перетворення випадкових величин - Нехай змінні X та Y будуть поєднані певною функцією. Називаючи fX та fY розподілом відповідно функцій X та Y, маємо: + +
+ +**31. Leibniz integral rule ― Let g be a function of x and potentially c, and a,b boundaries that may depend on c. We have:** + +⟶ Інтегральне правило Лейбніца - Нехай g буде функцією x і потенційно c, і a,b будуть кордонами що можуть залежати від с. Маємо : + +
+ +**32. Probability Distributions** + +⟶ Розподіл ймовірностей + +
+ +**33. Chebyshev's inequality ― Let X be a random variable with expected value μ. For k,σ>0, we have the following inequality:** + +⟶ Нерівність Чебишова ― Нехай X буде випадковою змінною з очікуваною велечиною μ. Для k,σ>0, маємо наступну нерівність : + +
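
A quick Monte-Carlo sanity check of Chebyshev's inequality P(|X−μ|≥kσ)≤1/k², assuming numpy and standard normal samples:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=0.0, scale=1.0, size=1_000_000)   # mu = 0, sigma = 1
for k in (2, 3):
    empirical = np.mean(np.abs(x - 0.0) >= k * 1.0)
    print(k, round(empirical, 4), "<=", round(1 / k**2, 4))  # empirical tail vs the bound
```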
+ +**34. Main distributions ― Here are the main distributions to have in mind:** + +⟶ Головні розподіли - Ось кілька найважливіших розподілів які варто знати: + +
+ +**35. [Type, Distribution]** + +⟶ [Тип, Розподіл] + +
+ +**36. Jointly Distributed Random Variables** + +⟶ Спільно розподілені випадкові величини + +
+ +**37. Marginal density and cumulative distribution ― From the joint density probability function fXY , we have** + +⟶ Відособлена густина та розподіл ймовірностей - Виходячи з формули спільної густини ймовірностей fXY, маємо : + +
+ +**38. [Case, Marginal density, Cumulative function]** + +⟶ [Випадок, Відособлена густина, Розподіл ймовірностей] + +
+ +**39. Conditional density ― The conditional density of X with respect to Y, often noted fX|Y, is defined as follows:** + +⟶ Умовна густина ― Умовна густина X відносно Y, означена fX|Y, визначаєтья: + +
+
+**40. Independence ― Two random variables X and Y are said to be independent if we have:**
+
+⟶ Незалежність ― Дві випадкові величини X та Y називаються незалежними, якщо виконується:
+
+<br>
+ +**41. Covariance ― We define the covariance of two random variables X and Y, that we note σ2XY or more commonly Cov(X,Y), as follows:** + +⟶ Коваріація ― Коваріація двох випадкових змінних X та Y, що означена як σ2XY або частіше як Cov(X,Y), визначаєтья : + +
+
+**42. Correlation ― By noting σX,σY the standard deviations of X and Y, we define the correlation between the random variables X and Y, noted ρXY, as follows:**
+
+⟶ Кореляція ― Позначивши через σX,σY стандартні відхилення X та Y, визначаємо кореляцію між випадковими величинами X та Y, позначену ρXY, так:
+
+<br>
+
+**43. Remark 1: we note that for any random variables X,Y, we have ρXY∈[−1,1].**
+
+⟶ Примітка 1: ми зазначаємо, що для будь-яких випадкових змінних X, Y, маємо ρXY∈[−1,1].
+
+<br>
+
+**44. Remark 2: If X and Y are independent, then ρXY=0.**
+
+⟶ Примітка 2: Якщо X та Y є незалежними, тоді ρXY=0.
+
+<br>
+ +**45. Parameter estimation** + +⟶ Оцінювання параметрів + +
+ +**46. Definitions** + +⟶ Визначення + +
+ +**47. Random sample ― A random sample is a collection of n random variables X1,...,Xn that are independent and identically distributed with X.** + +⟶ Випадкова вибірка ― Випадкова вибірка це набір випадкових змінних X1,...,Xn які є незалежними і ідентично розподіленими в X. + +
+
+**48. Estimator ― An estimator is a function of the data that is used to infer the value of an unknown parameter in a statistical model.**
+
+⟶ Статистична оцінка - Статистична оцінка це функція даних, що використовується для оцінювання невідомого параметра статистичної моделі.
+
+<br>
+
+**49. Bias ― The bias of an estimator ^θ is defined as being the difference between the expected value of the distribution of ^θ and the true value, i.e.:**
+
+⟶ Систематична похибка ― Систематична похибка статистичної оцінки ^θ визначається як різниця очікуваної величини розподілу ^θ і фактичної величини, тобто:
+
+<br>
+
+**50. Remark: an estimator is said to be unbiased when we have E[^θ]=θ.**
+
+⟶ Примітка: оцінка не має похибки, якщо E[^θ]=θ.
+
+<br>
+ +**51. Estimating the mean** + +⟶ Оцінка середнього значення + +
+
+**52. Sample mean ― The sample mean of a random sample is used to estimate the true mean μ of a distribution, is often noted ¯¯¯¯¯X and is defined as follows:**
+
+⟶ Середнє значення вибірки ― Середнє значення випадкової вибірки, що позначається ¯¯¯¯¯X, використовується для оцінки справжнього середнього μ розподілу і визначається:
+
+<br>
+
+**53. Remark: the sample mean is unbiased, i.e E[¯¯¯¯¯X]=μ.**
+
+⟶ Примітка: середнє значення вибірки не має похибки, тобто E[¯¯¯¯¯X]=μ.
+
+<br>
+
+**54. Central Limit Theorem ― Let us have a random sample X1,...,Xn following a given distribution with mean μ and variance σ2, then we have:**
+
+⟶ Центральна гранична теорема ― Маючи випадкову вибірку X1,...,Xn, що слідує заданому розподілу з середнім значенням μ та дисперсією σ2, маємо:
+
+<br>
+ +**55. Estimating the variance** + +⟶ Розрахунок дисперсії + +
+
+**56. Sample variance ― The sample variance of a random sample is used to estimate the true variance σ2 of a distribution, is often noted s2 or ^σ2 and is defined as follows:**
+
+⟶ Дисперсія вибірки ― Дисперсія випадкової вибірки, що позначається s2 або ^σ2, використовується для оцінки справжньої дисперсії σ2 розподілу і визначається:
+
+<br>
+ +**57. Remark: the sample variance is unbiased, i.e E[s2]=σ2.** + +⟶ Примітка: дисперсія вибірки не має похибки, тобто E[s2]=σ2. + +
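Entries 52-57 can be checked with simulated data; the normal sample below is arbitrary, and the only point is the n-1 denominator (ddof=1) that makes the sample variance unbiased.

```python
import numpy as np

rng = np.random.default_rng(1)
sample = rng.normal(loc=0.0, scale=2.0, size=50)   # true mean 0, true variance 4

sample_mean = sample.mean()        # unbiased estimate of mu
biased_var = sample.var(ddof=0)    # divides by n
unbiased_var = sample.var(ddof=1)  # divides by n-1, so E[s^2] = sigma^2

print(sample_mean, biased_var, unbiased_var)
```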
+
+**58. Chi-Squared relation with sample variance ― Let s2 be the sample variance of a random sample. We have:**
+
+⟶ Розподіл хі-квадрат та дисперсія вибірки ― Нехай s2 буде дисперсією випадкової вибірки. Маємо:
+
+<br>
+
+**59. [Introduction, Sample space, Event, Permutation]**
+
+⟶ [Вступ, Простір елементарних подій, Подія, Перестановка]
+
+<br>
+
+**60. [Conditional probability, Bayes' rule, Independence]**
+
+⟶ [Умовна ймовірність, Теорема Баєса, Незалежність]
+
+<br>
+ +**61. [Random variables, Definitions, Expectation, Variance]** + +⟶ [Випадкові змінні, Означення, Очікування, Дисперсія] + +
+ +**62. [Probability distributions, Chebyshev's inequality, Main distributions]** + +⟶ [Розподіли ймовірності, Нерівність Чебишова, Головні розподіли] + +
+ +**63. [Jointly distributed random variables, Density, Covariance, Correlation]** + +⟶ [Спільно розподілені випадкові величини, Щільність, Коваріація, Кореляція] + +
+ +**64. [Parameter estimation, Mean, Variance]** + +⟶ [Оцінювання параметрів, Середнє значення, Дисперсія] diff --git a/zh-tw/cheatsheet-deep-learning.md b/zh-tw/cs-229-deep-learning.md similarity index 100% rename from zh-tw/cheatsheet-deep-learning.md rename to zh-tw/cs-229-deep-learning.md diff --git a/zh/refresher-linear-algebra.md b/zh-tw/cs-229-linear-algebra.md similarity index 58% rename from zh/refresher-linear-algebra.md rename to zh-tw/cs-229-linear-algebra.md index 6cef234fe..36d4cef5d 100644 --- a/zh/refresher-linear-algebra.md +++ b/zh-tw/cs-229-linear-algebra.md @@ -1,339 +1,338 @@ 1. **Linear Algebra and Calculus refresher** ⟶ - +線性代數與微積分回顧
2. **General notations** ⟶ - +通用符號
3. **Definitions** ⟶ - +定義
4. **Vector ― We note x∈Rn a vector with n entries, where xi∈R is the ith entry:** ⟶ - +向量 - 我們定義 x∈Rn 是一個向量,包含 n 維元素,xi∈R 是第 i 維元素:
5. **Matrix ― We note A∈Rm×n a matrix with m rows and n columns, where Ai,j∈R is the entry located in the ith row and jth column:** ⟶ - +矩陣 - 我們定義 A∈Rm×n 是一個 m 列 n 行的矩陣,Ai,j∈R 代表位在第 i 列第 j 行的元素:
6. **Remark: the vector x defined above can be viewed as a n×1 matrix and is more particularly called a column-vector.** ⟶ - +注意:上述定義的向量 x 可以視為 nx1 的矩陣,或是更常被稱為行向量
7. **Main matrices** ⟶ - +主要的矩陣
8. **Identity matrix ― The identity matrix I∈Rn×n is a square matrix with ones in its diagonal and zero everywhere else:** ⟶ - +單位矩陣 - 單位矩陣 I∈Rn×n 是一個方陣,其主對角線皆為 1,其餘皆為 0
9. **Remark: for all matrices A∈Rn×n, we have A×I=I×A=A.** ⟶ - +注意:對於所有矩陣 A∈Rn×n,我們有 A×I=I×A=A
10. **Diagonal matrix ― A diagonal matrix D∈Rn×n is a square matrix with nonzero values in its diagonal and zero everywhere else:** ⟶ - +對角矩陣 - 對角矩陣 D∈Rn×n 是一個方陣,其主對角線為非 0,其餘皆為 0
11. **Remark: we also note D as diag(d1,...,dn).** ⟶ - +注意:我們令 D 為 diag(d1,...,dn)
12. **Matrix operations** ⟶ - +矩陣運算
13. **Multiplication** ⟶ - +乘法
14. **Vector-vector ― There are two types of vector-vector products:** ⟶ - +向量-向量 - 有兩種類型的向量-向量相乘:
15. **inner product: for x,y∈Rn, we have:** ⟶ - +內積:對於 x,y∈Rn,我們可以得到:
16. **outer product: for x∈Rm,y∈Rn, we have:** ⟶ - +外積:對於 x∈Rm,y∈Rn,我們可以得到:
17. **Matrix-vector ― The product of matrix A∈Rm×n and vector x∈Rn is a vector of size Rn, such that:** ⟶ - +矩陣-向量 - 矩陣 A∈Rm×n 和向量 x∈Rn 的乘積是一個大小為 Rm 的向量,使得:
18. **where aTr,i are the vector rows and ac,j are the vector columns of A, and xi are the entries of x.** ⟶ - +其中 aTr,i 是 A 的列向量、ac,j 是 A 的行向量、xi 是 x 的元素
19. **Matrix-matrix ― The product of matrices A∈Rm×n and B∈Rn×p is a matrix of size Rn×p, such that:** ⟶ - +矩陣-矩陣:矩陣 A∈Rm×n 和 B∈Rn×p 的乘積為一個大小 Rm×p 的矩陣,使得:
20. **where aTr,i,bTr,i are the vector rows and ac,j,bc,j are the vector columns of A and B respectively** ⟶ - +其中,aTr,i,bTr,i 和 ac,j,bc,j 分別是 A 和 B 的列向量與行向量
21. **Other operations** ⟶ - +其他操作
22. **Transpose ― The transpose of a matrix A∈Rm×n, noted AT, is such that its entries are flipped:** ⟶ - +轉置 - 一個矩陣的轉置矩陣 A∈Rm×n,記作 AT,指的是其中元素的翻轉:
23. **Remark: for matrices A,B, we have (AB)T=BTAT** ⟶ - +注意:對於矩陣 A、B,我們有 (AB)T=BTAT
24. **Inverse ― The inverse of an invertible square matrix A is noted A−1 and is the only matrix such that:**

⟶

-
+反矩陣 - 一個可逆方陣 A 的反矩陣記作 A−1,它是唯一滿足下式的矩陣:

<br>
25. **Remark: not all square matrices are invertible. Also, for matrices A,B, we have (AB)−1=B−1A−1** ⟶ - +注意:並非所有的方陣都是可逆的。同樣的,對於矩陣 A、B 來說,我們有 (AB)−1=B−1A−1
26. **Trace ― The trace of a square matrix A, noted tr(A), is the sum of its diagonal entries:** ⟶ - +跡 - 一個方陣 A 的跡,記作 tr(A),指的是主對角線元素之合:
27. **Remark: for matrices A,B, we have tr(AT)=tr(A) and tr(AB)=tr(BA)** ⟶ - +注意:對於矩陣 A、B 來說,我們有 tr(AT)=tr(A) 及 tr(AB)=tr(BA)
28. **Determinant ― The determinant of a square matrix A∈Rn×n, noted |A| or det(A) is expressed recursively in terms of A∖i,∖j, which is the matrix A without its ith row and jth column, as follows:** ⟶ - +行列式 - 一個方陣 A∈Rn×n 的行列式,記作|A| 或 det(A),可以透過 A∖i,∖j 來遞迴表示,它是一個沒有第 i 列和第 j 行的矩陣 A:
29. **Remark: A is invertible if and only if |A|≠0. Also, |AB|=|A||B| and |AT|=|A|.** ⟶ - +注意:A 是一個可逆矩陣,若且唯若 |A|≠0。同樣的,|AB|=|A||B| 且 |AT|=|A|
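The identities in entries 22-29, such as (AB)T=BTAT, (AB)−1=B−1A−1, tr(AB)=tr(BA) and |AB|=|A||B|, are easy to spot-check numerically. The sketch below assumes the random matrices are invertible, which holds almost surely for Gaussian entries.

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.normal(size=(3, 3))
B = rng.normal(size=(3, 3))

assert np.allclose((A @ B).T, B.T @ A.T)                    # (AB)^T = B^T A^T
assert np.allclose(np.linalg.inv(A @ B),
                   np.linalg.inv(B) @ np.linalg.inv(A))     # (AB)^-1 = B^-1 A^-1
assert np.isclose(np.trace(A @ B), np.trace(B @ A))         # tr(AB) = tr(BA)
assert np.isclose(np.linalg.det(A @ B),
                  np.linalg.det(A) * np.linalg.det(B))      # |AB| = |A||B|
```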
30. **Matrix properties** ⟶ - +矩陣的性質
31. **Definitions** ⟶ - +定義
32. **Symmetric decomposition ― A given matrix A can be expressed in terms of its symmetric and antisymmetric parts as follows:** ⟶ - +對稱分解 - 給定一個矩陣 A,它可以透過其對稱和反對稱的部分表示如下:
33. **[Symmetric, Antisymmetric]** ⟶ - +[對稱, 反對稱]
34. **Norm ― A norm is a function N:V⟶[0,+∞[ where V is a vector space, and such that for all x,y∈V, we have:** ⟶ - +範數 - 範數指的是一個函式 N:V⟶[0,+∞[,其中 V 是一個向量空間,且對於所有 x,y∈V,我們有:
35. **N(ax)=|a|N(x) for a scalar** ⟶ - +對一個純量來說,我們有 N(ax)=|a|N(x)
36. **if N(x)=0, then x=0** ⟶ - +若 N(x)=0 時,則 x=0
37. **For x∈V, the most commonly used norms are summed up in the table below:** ⟶ - +對於 x∈V,最常用的範數總結如下表:
38. **[Norm, Notation, Definition, Use case]** ⟶ - +[範數, 表示法, 定義, 使用情境]
39. **Linearly dependence ― A set of vectors is said to be linearly dependent if one of the vectors in the set can be defined as a linear combination of the others.**

⟶

-
+線性相關 - 當集合中的一個向量可以被定義為集合中其他向量的線性組合時,則稱此集合的向量為線性相關

<br>
40. **Remark: if no vector can be written this way, then the vectors are said to be linearly independent** ⟶ - +注意:如果沒有向量可以如上表示時,則稱此集合的向量彼此為線性獨立
41. **Matrix rank ― The rank of a given matrix A is noted rank(A) and is the dimension of the vector space generated by its columns. This is equivalent to the maximum number of linearly independent columns of A.**

⟶

-
+矩陣的秩 - 一個矩陣 A 的秩記作 rank(A),指的是其行向量所生成的向量空間維度,等價於 A 的線性獨立行向量的最大數量

<br>
42. **Positive semi-definite matrix ― A matrix A∈Rn×n is positive semi-definite (PSD) and is noted A⪰0 if we have:** ⟶ - +半正定矩陣 - 當以下成立時,一個矩陣 A∈Rn×n 是半正定矩陣 (PSD),且記作A⪰0:
43. **Remark: similarly, a matrix A is said to be positive definite, and is noted A≻0, if it is a PSD matrix which satisfies for all non-zero vector x, xTAx>0.** ⟶ - +注意:同樣的,一個矩陣 A 是一個半正定矩陣 (PSD),且滿足所有非零向量 x,xTAx>0 時,稱之為正定矩陣,記作 A≻0
44. **Eigenvalue, eigenvector ― Given a matrix A∈Rn×n, λ is said to be an eigenvalue of A if there exists a vector z∈Rn∖{0}, called eigenvector, such that we have:** ⟶ - +特徵值、特徵向量 - 給定一個矩陣 A∈Rn×n,當存在一個向量 z∈Rn∖{0} 時,此向量被稱為特徵向量,λ 稱之為 A 的特徵值,且滿足:
45. **Spectral theorem ― Let A∈Rn×n. If A is symmetric, then A is diagonalizable by a real orthogonal matrix U∈Rn×n. By noting Λ=diag(λ1,...,λn), we have:** ⟶ - +譜分解 - 令 A∈Rn×n,如果 A 是對稱的,則 A 可以被一個實數正交矩陣 U∈Rn×n 給對角化。令 Λ=diag(λ1,...,λn),我們得到:
46. **diagonal** ⟶ - +對角線
47. **Singular-value decomposition ― For a given matrix A of dimensions m×n, the singular-value decomposition (SVD) is a factorization technique that guarantees the existence of U m×m unitary, Σ m×n diagonal and V n×n unitary matrices, such that:** ⟶ - +奇異值分解 - 對於給定維度為 mxn 的矩陣 A,其奇異值分解指的是一種因子分解技巧,保證存在 mxm 的單式矩陣 U、對角線矩陣 Σ m×n 和 nxn 的單式矩陣 V,滿足:
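Entries 44-47 in code: the spectral theorem applied to a symmetric matrix and a singular-value decomposition, both checked by reconstruction. The matrix M below is arbitrary and only the shapes matter.

```python
import numpy as np

rng = np.random.default_rng(3)
M = rng.normal(size=(4, 3))

S = M.T @ M                             # symmetric, so S = U Lambda U^T with U orthogonal
eigvals, U_sym = np.linalg.eigh(S)
assert np.allclose(S, U_sym @ np.diag(eigvals) @ U_sym.T)

U, sing, Vt = np.linalg.svd(M)          # M = U Sigma V^T (U, V unitary, Sigma diagonal)
Sigma = np.zeros_like(M)
Sigma[:sing.size, :sing.size] = np.diag(sing)
assert np.allclose(M, U @ Sigma @ Vt)
```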
48. **Matrix calculus** ⟶ - +矩陣導數
49. **Gradient ― Let f:Rm×n→R be a function and A∈Rm×n be a matrix. The gradient of f with respect to A is a m×n matrix, noted ∇Af(A), such that:** ⟶ - +梯度 - 令 f:Rm×n→R 是一個函式,且 A∈Rm×n 是一個矩陣。f 相對於 A 的梯度是一個 mxn 的矩陣,記作 ∇Af(A),滿足:
50. **Remark: the gradient of f is only defined when f is a function that returns a scalar.** ⟶ - +注意:f 的梯度僅在 f 為一個函數且該函數回傳一個純量時有效
51. **Hessian ― Let f:Rn→R be a function and x∈Rn be a vector. The hessian of f with respect to x is a n×n symmetric matrix, noted ∇2xf(x), such that:** ⟶ - +海森 - 令 f:Rn→R 是一個函式,且 x∈Rn 是一個向量,則一個 f 的海森對於向量 x 是一個 nxn 的對稱矩陣,記作 ∇2xf(x),滿足:
52. **Remark: the hessian of f is only defined when f is a function that returns a scalar** ⟶ - +注意:f 的海森僅在 f 為一個函數且該函數回傳一個純量時有效
53. **Gradient operations ― For matrices A,B,C, the following gradient properties are worth having in mind:** - +梯度運算 - 對於矩陣 A、B、C,下列的梯度性質值得牢牢記住: ⟶ -
- 54. **[General notations, Definitions, Main matrices]** ⟶ - +[通用符號, 定義, 主要矩陣]
55. **[Matrix operations, Multiplication, Other operations]** ⟶ - +[矩陣運算, 矩陣乘法, 其他運算]
56. **[Matrix properties, Norm, Eigenvalue/Eigenvector, Singular-value decomposition]** ⟶ - +[矩陣性質, 範數, 特徵值/特徵向量, 奇異值分解]
57. **[Matrix calculus, Gradient, Hessian, Operations]** ⟶ +[矩陣導數, 梯度, 海森, 運算] \ No newline at end of file diff --git a/zh/cheatsheet-machine-learning-tips-and-tricks.md b/zh-tw/cs-229-machine-learning-tips-and-tricks.md similarity index 59% rename from zh/cheatsheet-machine-learning-tips-and-tricks.md rename to zh-tw/cs-229-machine-learning-tips-and-tricks.md index 61fab788c..b7a5db1c0 100644 --- a/zh/cheatsheet-machine-learning-tips-and-tricks.md +++ b/zh-tw/cs-229-machine-learning-tips-and-tricks.md @@ -1,285 +1,257 @@ 1. **Machine Learning tips and tricks cheatsheet** ⟶ - +機器學習秘訣和技巧參考手冊
2. **Classification metrics** ⟶ - +分類器的評估指標
3. **In a context of a binary classification, here are the main metrics that are important to track in order to assess the performance of the model.** ⟶ - +在二元分類的問題上,底下是主要用來衡量模型表現的指標
4. **Confusion matrix ― The confusion matrix is used to have a more complete picture when assessing the performance of a model. It is defined as follows:** ⟶ - +混淆矩陣 - 混淆矩陣是用來衡量模型整體表現的指標
5. **[Predicted class, Actual class]** ⟶ - +[預測類別, 真實類別]
6. **Main metrics ― The following metrics are commonly used to assess the performance of classification models:** ⟶ - +主要的衡量指標 - 底下的指標經常用在評估分類模型的表現
7. **[Metric, Formula, Interpretation]** ⟶ - +[指標, 公式, 解釋]
8. **Overall performance of model** ⟶ - +模型的整體表現
9. **How accurate the positive predictions are**

⟶

-
+正類別的預測有多精準

<br>
10. **Coverage of actual positive sample** ⟶ - +實際正的樣本的覆蓋率有多少
11. **Coverage of actual negative sample** ⟶ - +實際負的樣本的覆蓋率
12. **Hybrid metric useful for unbalanced classes** ⟶ - +對於非平衡類別相當有用的混合指標
13. **ROC ― The receiver operating curve, also noted ROC, is the plot of TPR versus FPR by varying the threshold. These metrics are are summed up in the table below:** ⟶ - +ROC - 接收者操作特徵曲線 (ROC Curve),又被稱為 ROC,是透過改變閥值來表示 TPR 和 FPR 之間關係的圖形。這些指標總結如下:
14. **[Metric, Formula, Equivalent]** ⟶ - +[衡量指標, 公式, 等同於]
15. **AUC ― The area under the receiving operating curve, also noted AUC or AUROC, is the area below the ROC as shown in the following figure:** ⟶ - +AUC - 在接收者操作特徵曲線 (ROC) 底下的面積,也稱為 AUC 或 AUROC:
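Entries 4-15 correspond directly to scikit-learn helpers; the labels and scores below are made up, and 0.5 is just an example threshold.

```python
import numpy as np
from sklearn.metrics import (confusion_matrix, precision_score, recall_score,
                             f1_score, roc_auc_score)

y_true = np.array([0, 0, 1, 1, 1, 0, 1, 0])
y_score = np.array([0.1, 0.4, 0.35, 0.8, 0.7, 0.2, 0.9, 0.6])
y_pred = (y_score >= 0.5).astype(int)          # threshold the scores

print(confusion_matrix(y_true, y_pred))        # rows: actual class, columns: predicted class
print(precision_score(y_true, y_pred),
      recall_score(y_true, y_pred),
      f1_score(y_true, y_pred))
print(roc_auc_score(y_true, y_score))          # area under the ROC curve
```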
16. **[Actual, Predicted]** ⟶ - +[實際值, 預測值]
17. **Basic metrics ― Given a regression model f, the following metrics are commonly used to assess the performance of the model:** ⟶ - +基本的指標 - 給定一個迴歸模型 f,底下是經常用來評估此模型的指標:
18. **[Total sum of squares, Explained sum of squares, Residual sum of squares]** ⟶ - +[總平方和, 被解釋平方和, 殘差平方和]
19. **Coefficient of determination ― The coefficient of determination, often noted R2 or r2, provides a measure of how well the observed outcomes are replicated by the model and is defined as follows:** ⟶ - +決定係數 - 決定係數又被稱為 R2 or r2,它提供了模型是否具備復現觀測結果的能力。定義如下:
20. **Main metrics ― The following metrics are commonly used to assess the performance of regression models, by taking into account the number of variables n that they take into consideration:** ⟶ - +主要的衡量指標 - 藉由考量變數 n 的數量,我們經常用使用底下的指標來衡量迴歸模型的表現:
21. **where L is the likelihood and ˆσ2 is an estimate of the variance associated with each response.** ⟶ - +當中,L 代表的是概似估計,ˆσ2 則是變異數的估計
22. **Model selection** ⟶ - +模型選擇
23. **Vocabulary ― When selecting a model, we distinguish 3 different parts of the data that we have as follows:** ⟶ - +詞彙 - 當進行模型選擇時,我們會針對資料進行以下區分:
24. **[Training set, Validation set, Testing set]** ⟶ - +[訓練資料集, 驗證資料集, 測試資料集]
25. **[Model is trained, Model is assessed, Model gives predictions]** ⟶ - +[用來訓練模型, 用來評估模型, 模型用來預測用的資料集]
26. **[Usually 80% of the dataset, Usually 20% of the dataset]** ⟶ - +[通常是 80% 的資料集, 通常是 20% 的資料集]
27. **[Also called hold-out or development set, Unseen data]** ⟶ - +[又被稱為 hold-out 資料集或開發資料集, 模型沒看過的資料集]
28. **Once the model has been chosen, it is trained on the entire dataset and tested on the unseen test set. These are represented in the figure below:** ⟶ - +當模型被選擇後,就會使用整個資料集來做訓練,並且在沒看過的資料集上做測試。你可以參考以下的圖表:
29. **Cross-validation ― Cross-validation, also noted CV, is a method that is used to select a model that does not rely too much on the initial training set. The different types are summed up in the table below:** ⟶ - +交叉驗證 - 交叉驗證,又稱之為 CV,它是一種不特別依賴初始訓練集來挑選模型的方法。幾種不同的方法如下:
-30. [**Training on k−1 folds and assessment on the remaining one, Training on n−p observations and assessment on the p remaining ones]** +30. **[Training on k−1 folds and assessment on the remaining one, Training on n−p observations and assessment on the p remaining ones]** ⟶ - +[把資料分成 k 份,利用 k-1 份資料來訓練,剩下的一份用來評估模型效能, 在 n-p 份資料上進行訓練,剩下的 p 份資料用來評估模型效能]
31. **[Generally k=5 or 10, Case p=1 is called leave-one-out]** ⟶ - +[一般來說 k=5 或 10, 當 p=1 時,又稱為 leave-one-out]
32. **The most commonly used method is called k-fold cross-validation and splits the training data into k folds to validate the model on one fold while training the model on the k−1 other folds, all of this k times. The error is then averaged over the k folds and is named cross-validation error.**

⟶

-
+最常用到的方法叫做 k-fold 交叉驗證。它將訓練資料切成 k 份,在 k-1 份資料上進行訓練,而剩下的一份用來評估模型的效能,這樣的流程會重複 k 次。最後計算出來的模型損失是 k 次結果的平均,又稱為交叉驗證損失值。

<br>
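A minimal sketch of the k-fold procedure described in entry 32, using scikit-learn with k=5; the regression data and the choice of a linear model are arbitrary.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(4)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=100)

scores = cross_val_score(LinearRegression(), X, y, cv=5,
                         scoring="neg_mean_squared_error")
cv_error = -scores.mean()          # cross-validation error averaged over the 5 folds
print(cv_error)
```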
33. **Regularization ― The regularization procedure aims at avoiding the model to overfit the data and thus deals with high variance issues. The following table sums up the different types of commonly used regularization techniques:**

⟶

-
+正規化 - 正規化程序的目的是避免模型對訓練資料過擬合,進而處理高變異的問題。底下的表格整理了常見的正規化技巧:

<br>
34. **[Shrinks coefficients to 0, Good for variable selection, Makes coefficients smaller, Tradeoff between variable selection and small coefficients]** ⟶ - +[將係數縮減為 0, 有利變數的選擇, 將係數變得更小, 在變數的選擇和小係數之間作權衡]
35. **Diagnostics** ⟶ - +診斷
36. **Bias ― The bias of a model is the difference between the expected prediction and the correct model that we try to predict for given data points.** ⟶ - +偏差 - 模型的偏差指的是模型預測值與實際值之間的差異
37. **Variance ― The variance of a model is the variability of the model prediction for given data points.** ⟶ - +變異 - 變異指的是模型在預測資料時的變異程度
38. **Bias/variance tradeoff ― The simpler the model, the higher the bias, and the more complex the model, the higher the variance.** ⟶ - +偏差/變異的權衡 - 越簡單的模型,偏差就越大。而越複雜的模型,變異就越大
39. **[Symptoms, Regression illustration, classification illustration, deep learning illustration, possible remedies]** ⟶ - +[現象, 迴歸圖示, 分類圖示, 深度學習圖示, 可能的解法]
40. **[High training error, Training error close to test error, High bias, Training error slightly lower than test error, Very low training error, Training error much lower than test error, High variance]** ⟶ - +[訓練錯誤較高, 訓練錯誤和測試錯誤接近, 高偏差, 訓練誤差會稍微比測試誤差低, 訓練誤差很低, 訓練誤差比測試誤差低很多, 高變異]
41. **[Complexify model, Add more features, Train longer, Perform regularization, Get more data]** ⟶ - +[使用較複雜的模型, 增加更多特徵, 訓練更久, 採用正規化化的方法, 取得更多資料]
42. **Error analysis ― Error analysis is analyzing the root cause of the difference in performance between the current and the perfect models.** ⟶ - +誤差分析 - 誤差分析指的是分析目前使用的模型和最佳模型之間差距的根本原因
43. **Ablative analysis ― Ablative analysis is analyzing the root cause of the difference in performance between the current and the baseline models.** ⟶ - -
- -44. **Regression metrics** - -⟶ - +銷蝕分析 (Ablative analysis) - 銷蝕分析指的是分析目前模型和基準模型之間差異的根本原因
- -45. **[Classification metrics, confusion matrix, accuracy, precision, recall, F1 score, ROC]** - -⟶ - -
- -46. **[Regression metrics, R squared, Mallow's CP, AIC, BIC]** - -⟶ - -
- -47. **[Model selection, cross-validation, regularization]** - -⟶ - -
- -48. **[Diagnostics, Bias/variance tradeoff, error/ablative analysis]** - -⟶ diff --git a/zh/refresher-probability.md b/zh-tw/cs-229-probability.md similarity index 56% rename from zh/refresher-probability.md rename to zh-tw/cs-229-probability.md index 52e0056e0..0db481cf5 100644 --- a/zh/refresher-probability.md +++ b/zh-tw/cs-229-probability.md @@ -1,381 +1,382 @@ 1. **Probabilities and Statistics refresher** ⟶ - +機率和統計回顧
2. **Introduction to Probability and Combinatorics**

⟶

-
+機率與組合數學介紹

<br>
3. **Sample space ― The set of all possible outcomes of an experiment is known as the sample space of the experiment and is denoted by S.** ⟶ - +樣本空間 - 一個實驗的所有可能結果的集合稱之為這個實驗的樣本空間,記做 S
4. **Event ― Any subset E of the sample space is known as an event. That is, an event is a set consisting of possible outcomes of the experiment. If the outcome of the experiment is contained in E, then we say that E has occurred.**

⟶

-
+事件 - 樣本空間的任何子集合 E 被稱之為一個事件。也就是說,一個事件是實驗的可能結果的集合。如果該實驗的結果包含 E,我們稱 E 發生

<br>
5. **Axioms of probability For each event E, we denote P(E) as the probability of event E occuring.** ⟶ - +機率公理。對於每個事件 E,我們用 P(E) 表示事件 E 發生的機率
6. **Axiom 1 ― Every probability is between 0 and 1 included, i.e:** ⟶ - +公理 1 - 每一個機率值介於 0 到 1 之間,包含兩端點。即:
7. **Axiom 2 ― The probability that at least one of the elementary events in the entire sample space will occur is 1, i.e:** ⟶ - +公理 2 - 至少一個基本事件出現在整個樣本空間中的機率是 1。即:
8. **Axiom 3 ― For any sequence of mutually exclusive events E1,...,En, we have:** ⟶ - +公理 3 - 對於任何互斥的事件 E1,...,En,我們定義如下:
9. **Permutation ― A permutation is an arrangement of r objects from a pool of n objects, in a given order. The number of such arrangements is given by P(n,r), defined as:** ⟶ - +排列 - 排列指的是從 n 個相異的物件中,取出 r 個物件按照固定順序重新安排,這樣安排的數量用 P(n,r) 來表示,定義為:
10. **Combination ― A combination is an arrangement of r objects from a pool of n objects, where the order does not matter. The number of such arrangements is given by C(n,r), defined as:** ⟶ - +組合 - 組合指的是從 n 個物件中,取出 r 個物件,但不考慮他的順序。這樣組合要考慮的數量用 C(n,r) 來表示,定義為:
11. **Remark: we note that for 0⩽r⩽n, we have P(n,r)⩾C(n,r)** ⟶ - +注意:對於 0⩽r⩽n,我們會有 P(n,r)⩾C(n,r)
12. **Conditional Probability** ⟶ - +條件機率
13. **Bayes' rule ― For events A and B such that P(B)>0, we have:** ⟶ - +貝氏定理 - 對於事件 A 和 B 滿足 P(B)>0 時,我們定義如下:
14. **Remark: we have P(A∩B)=P(A)P(B|A)=P(A|B)P(B)** ⟶ - +注意:P(A∩B)=P(A)P(B|A)=P(A|B)P(B)
15. **Partition ― Let {Ai,i∈[[1,n]]} be such that for all i, Ai≠∅. We say that {Ai} is a partition if we have:** ⟶ - +分割 - 令 {Ai,i∈[[1,n]]} 對所有的 i,Ai≠∅,我們說 {Ai} 是一個分割,當底下成立時:
16. **Remark: for any event B in the sample space, we have P(B)=n∑i=1P(B|Ai)P(Ai).** ⟶ - +注意:對於任何在樣本空間的事件 B 來說,P(B)=n∑i=1P(B|Ai)P(Ai)
17. **Extended form of Bayes' rule ― Let {Ai,i∈[[1,n]]} be a partition of the sample space. We have:** ⟶ - +貝氏定理的擴展 - 令 {Ai,i∈[[1,n]]} 為樣本空間的一個分割,我們定義:
18. **Independence ― Two events A and B are independent if and only if we have:** ⟶ - +獨立 - 當以下條件滿足時,兩個事件 A 和 B 為獨立事件:
19. **Random Variables** ⟶ - +隨機變數
20. **Definitions** ⟶ - +定義
21. **Random variable ― A random variable, often noted X, is a function that maps every element in a sample space to a real line.** ⟶ - +隨機變數 - 一個隨機變數 X,它是一個將樣本空間中的每個元素映射到實數域的函數
22. **Cumulative distribution function (CDF) ― The cumulative distribution function F, which is monotonically non-decreasing and is such that limx→−∞F(x)=0 and limx→+∞F(x)=1, is defined as:** ⟶ - +累積分佈函數 (CDF) - 累積分佈函數 F 是單調遞增的函數,其 limx→−∞F(x)=0 且 limx→+∞F(x)=1,定義如下:
23. **Remark: we have P(a<X⩽b)=F(b)−F(a).**

⟶

-
+注意:我們有 P(a<X⩽b)=F(b)−F(a)

<br>

24. **Probability density function (PDF) ― The probability density function f is the probability that X takes on values between two adjacent realizations of the random variable.**

⟶

-
+機率密度函數 - 機率密度函數 f 是隨機變數 X 在兩個相鄰的實數值附近取值的機率

<br>
25. **Relationships involving the PDF and CDF ― Here are the important properties to know in the discrete (D) and the continuous (C) cases.** ⟶ - +機率密度函數和累積分佈函數的關係 - 底下是一些關於離散 (D) 和連續 (C) 的情況下的重要屬性
26. **[Case, CDF F, PDF f, Properties of PDF]** ⟶ - +[情況, 累積分佈函數 F, 機率密度函數 f, 機率密度函數的屬性]
27. **Expectation and Moments of the Distribution ― Here are the expressions of the expected value E[X], generalized expected value E[g(X)], kth moment E[Xk] and characteristic function ψ(ω) for the discrete and continuous cases:** ⟶ - +分佈的期望值和動差 - 底下是期望值 E[X]、一般期望值 E[g(X)]、第 k 個動差和特徵函數 ψ(ω) 在離散和連續的情況下的表示式:
28. **Variance ― The variance of a random variable, often noted Var(X) or σ2, is a measure of the spread of its distribution function. It is determined as follows:** ⟶ - +變異數 - 隨機變數的變異數通常表示為 Var(X) 或 σ2,用來衡量一個分佈離散程度的指標。其表示如下:
29. **Standard deviation ― The standard deviation of a random variable, often noted σ, is a measure of the spread of its distribution function which is compatible with the units of the actual random variable. It is determined as follows:** ⟶ - +標準差 - 一個隨機變數的標準差通常表示為 σ,用來衡量一個分佈離散程度的指標,其單位和實際的隨機變數相容,表示如下:
30. **Transformation of random variables ― Let the variables X and Y be linked by some function. By noting fX and fY the distribution function of X and Y respectively, we have:** ⟶ - +隨機變數的轉換 - 令變數 X 和 Y 由某個函式連結在一起。我們定義 fX 和 fY 是 X 和 Y 的分佈函式,可以得到:
31. **Leibniz integral rule ― Let g be a function of x and potentially c, and a,b boundaries that may depend on c. We have:**

⟶

-
+萊布尼茲積分法則 - 令 g 為 x 和 c 的函數,a 和 b 是依賴於 c 的邊界,我們得到:

<br>
32. **Probability Distributions** ⟶ - +機率分佈
33. **Chebyshev's inequality ― Let X be a random variable with expected value μ. For k,σ>0, we have the following inequality:** ⟶ - +柴比雪夫不等式 - 令 X 是一隨機變數,期望值為 μ。對於 k, σ>0,我們有以下不等式:
34. **Main distributions ― Here are the main distributions to have in mind:** ⟶ - +主要的分佈 - 底下是我們需要熟悉的幾個主要的不等式:
35. **[Type, Distribution]** ⟶ - +[種類, 分佈]
36. **Jointly Distributed Random Variables** ⟶ - +聯合分佈隨機變數
37. **Marginal density and cumulative distribution ― From the joint density probability function fXY , we have** ⟶ - +邊緣密度和累積分佈 - 從聯合密度機率函數 fXY 中我們可以得到:
38. **[Case, Marginal density, Cumulative function]** ⟶ - +[種類, 邊緣密度函數, 累積函數]
39. **Conditional density ― The conditional density of X with respect to Y, often noted fX|Y, is defined as follows:** ⟶ - +條件密度 - X 對於 Y 的條件密度,通常用 fX|Y 表示如下:
40. **Independence ― Two random variables X and Y are said to be independent if we have:** ⟶ - +獨立 - 當滿足以下條件時,我們稱隨機變數 X 和 Y 互相獨立:
41. **Covariance ― We define the covariance of two random variables X and Y, that we note σ2XY or more commonly Cov(X,Y), as follows:** ⟶ - +共變異數 - 我們定義隨機變數 X 和 Y 的共變異數為 σ2XY 或 Cov(X,Y) 如下:
42. **Correlation ― By noting σX,σY the standard deviations of X and Y, we define the correlation between the random variables X and Y, noted ρXY, as follows:** ⟶ - +相關性 - 我們定義 σX、σY 為 X 和 Y 的標準差,而 X 和 Y 的相關係數 ρXY 定義如下:
43. **Remark 1: we note that for any random variables X,Y, we have ρXY∈[−1,1].** ⟶ - +注意一:對於任何隨機變數 X 和 Y 來說,ρXY∈[−1,1] 成立
44. **Remark 2: If X and Y are independent, then ρXY=0.** ⟶ - +注意二:當 X 和 Y 獨立時,ρXY=0
45. **Parameter estimation** ⟶ - +參數估計
46. **Definitions** ⟶ - +定義
47. **Random sample ― A random sample is a collection of n random variables X1,...,Xn that are independent and identically distributed with X.** ⟶ - +隨機抽樣 - 隨機抽樣指的是 n 個隨機變數 X1,...,Xn 和 X 獨立且同分佈的集合
48. **Estimator ― An estimator is a function of the data that is used to infer the value of an unknown parameter in a statistical model.** ⟶ - +估計量 - 估計量是一個資料的函數,用來推斷在統計模型中未知參數的值
49. **Bias ― The bias of an estimator ^θ is defined as being the difference between the expected value of the distribution of ^θ and the true value, i.e.:** ⟶ - +偏差 - 一個估計量的偏差 ^θ 定義為 ^θ 分佈期望值和真實值之間的差距:
50. **Remark: an estimator is said to be unbiased when we have E[^θ]=θ.** ⟶ - +注意:當 E[^θ]=θ 時,我們稱為不偏估計量
51. **Estimating the mean** ⟶ - +預估平均數
52. **Sample mean ― The sample mean of a random sample is used to estimate the true mean μ of a distribution, is often noted ¯X and is defined as follows:** ⟶ - +樣本平均 - 一個隨機樣本的樣本平均是用來預估一個分佈的真實平均 μ,通常我們用 ¯X 來表示,定義如下:
53. **Remark: the sample mean is unbiased, i.e E[¯X]=μ.** ⟶ - +注意:當 E[¯X]=μ 時,則為不偏樣本平均
54. **Central Limit Theorem ― Let us have a random sample X1,...,Xn following a given distribution with mean μ and variance σ2, then we have:** ⟶ - +中央極限定理 - 當我們有一個隨機樣本 X1,...,Xn 滿足一個給定的分佈,其平均數為 μ,變異數為 σ2,我們有:
55. **Estimating the variance** ⟶ - +估計變異數
56. **Sample variance ― The sample variance of a random sample is used to estimate the true variance σ2 of a distribution, is often noted s2 or ^σ2 and is defined as follows:** ⟶ - +樣本變異數 - 一個隨機樣本的樣本變異數是用來估計一個分佈的真實變異數 σ2,通常使用 s2 或 ^σ2 來表示,定義如下:
57. **Remark: the sample variance is unbiased, i.e E[s2]=σ2.** ⟶ - +注意:當 E[s2]=σ2 時,稱之為不偏樣本變異數
58. **Chi-Squared relation with sample variance ― Let s2 be the sample variance of a random sample. We have:** ⟶ - +與樣本變異數的卡方關聯 - 令 s2 是一個隨機樣本的樣本變異數,我們可以得到:
-59. **[Introduction, Sample space, Event, Permutation]** +**59. [Introduction, Sample space, Event, Permutation]** ⟶ - +[介紹, 樣本空間, 事件, 排列]
-60. **[Conditional probability, Bayes' rule, Independence]** +**60. [Conditional probability, Bayes' rule, Independence]** ⟶ - +[條件機率, 貝氏定理, 獨立性]
-61. **[Random variables, Definitions, Expectation, Variance]** +**61. [Random variables, Definitions, Expectation, Variance]** ⟶ - +[隨機變數, 定義, 期望值, 變異數]
-62. **[Probability distributions, Chebyshev's inequality, Main distributions]** +**62. [Probability distributions, Chebyshev's inequality, Main distributions]** ⟶ - +[機率分佈, 柴比雪夫不等式, 主要分佈]
-63. **[Jointly distributed random variables, Density, Covariance, Correlation]** +**63. [Jointly distributed random variables, Density, Covariance, Correlation]** ⟶ - +[聯合分佈隨機變數, 密度, 共變異數, 相關]
-64. **[Parameter estimation, Mean, Variance]** +**64. [Parameter estimation, Mean, Variance]** ⟶ +[參數估計, 平均數, 變異數] \ No newline at end of file diff --git a/zh-tw/cs-229-supervised-learning.md b/zh-tw/cs-229-supervised-learning.md new file mode 100644 index 000000000..0b329e8db --- /dev/null +++ b/zh-tw/cs-229-supervised-learning.md @@ -0,0 +1,352 @@ +1. **Supervised Learning cheatsheet** + +⟶ 監督式學習參考手冊 + +2. **Introduction to Supervised Learning** + +⟶ 監督式學習介紹 + +3. **Given a set of data points {x(1),...,x(m)} associated to a set of outcomes {y(1),...,y(m)}, we want to build a classifier that learns how to predict y from x.** + +⟶ 給定一組資料點 {x(1),...,x(m)},以及對應的一組輸出 {y(1),...,y(m)},我們希望建立一個分類器,用來學習如何從 x 來預測 y + +4. **Type of prediction ― The different types of predictive models are summed up in the table below:** + +⟶ 預測的種類 - 根據預測的種類不同,我們將預測模型分為底下幾種: + +5. **[Regression, Classifier, Outcome, Examples]** + +⟶ [迴歸, 分類器, 結果, 範例] + +6. **[Continuous, Class, Linear regression, Logistic regression, SVM, Naive Bayes]** + +⟶ [連續, 類別, 線性迴歸, 邏輯迴歸, 支援向量機 (SVM) , 單純貝式分類器] + +7. **Type of model ― The different models are summed up in the table below:** + +⟶ 模型種類 - 不同種類的模型歸納如下表: + +8. **[Discriminative model, Generative model, Goal, What's learned, Illustration, Examples]** + +⟶ [判別模型, 生成模型, 目標, 學到什麼, 示意圖, 範例] + +9. **[Directly estimate P(y|x), Estimate P(x|y) to then deduce P(y|x), Decision boundary, Probability distributions of the data, Regressions, SVMs, GDA, Naive Bayes]** + +⟶ [直接估計 P(y|x), 先估計 P(x|y),然後推論出 P(y|x), 決策分界線, 資料的機率分佈, 迴歸, 支援向量機 (SVM), 高斯判別分析 (GDA), 單純貝氏 (Naive Bayes)] + +10. **Notations and general concepts** + +⟶ 符號及一般概念 + +11. **Hypothesis ― The hypothesis is noted hθ and is the model that we choose. For a given input data x(i) the model prediction output is hθ(x(i)).** + +⟶ 假設 - 我們使用 hθ 來代表所選擇的模型,對於給定的輸入資料 x(i),模型預測的輸出是 hθ(x(i)) + +12. **Loss function ― A loss function is a function L:(z,y)∈R×Y⟼L(z,y)∈R that takes as inputs the predicted value z corresponding to the real data value y and outputs how different they are. The common loss functions are summed up in the table below:** + +⟶ 損失函數 - 損失函數是一個函數 L:(z,y)∈R×Y⟼L(z,y)∈R, +目的在於計算預測值 z 和實際值 y 之間的差距。底下是一些常見的損失函數: + +13. **[Least squared error, Logistic loss, Hinge loss, Cross-entropy]** + +⟶ [最小平方法, Logistic 損失函數, Hinge 損失函數, 交叉熵] + +14. **[Linear regression, Logistic regression, SVM, Neural Network]** + +⟶ [線性迴歸, 邏輯迴歸, 支援向量機 (SVM), 神經網路] + +15. **Cost function ― The cost function J is commonly used to assess the performance of a model, and is defined with the loss function L as follows:** + +⟶ 代價函數 - 代價函數 J 通常用來評估一個模型的表現,它可以透過損失函數 L 來定義: + +16. **Gradient descent ― By noting α∈R the learning rate, the update rule for gradient descent is expressed with the learning rate and the cost function J as follows:** + +⟶ 梯度下降 - 使用 α∈R 表示學習速率,我們透過學習速率和代價函數來使用梯度下降的方法找出網路參數更新的方法可以表示為: + +17. **Remark: Stochastic gradient descent (SGD) is updating the parameter based on each training example, and batch gradient descent is on a batch of training examples.** + +⟶ 注意:隨機梯度下降法 (SGD) 使用每一個訓練資料來更新參數。而批次梯度下降法則是透過一個批次的訓練資料來更新參數。 + +18. **Likelihood ― The likelihood of a model L(θ) given parameters θ is used to find the optimal parameters θ through maximizing the likelihood. In practice, we use the log-likelihood ℓ(θ)=log(L(θ)) which is easier to optimize. We have:** + +⟶ 概似估計 - 在給定參數 θ 的條件下,一個模型 L(θ) 的概似估計的目的是透過最大概似估計法來找到最佳的參數。實務上,我們會使用對數概似估計函數 (log-likelihood) ℓ(θ)=log(L(θ)),會比較容易最佳化。如下: + +19. 
**Newton's algorithm ― The Newton's algorithm is a numerical method that finds θ such that ℓ′(θ)=0. Its update rule is as follows:** + +⟶ 牛頓演算法 - 牛頓演算法是一個數值方法,目的在於找到一個 θ,讓 ℓ′(θ)=0。其更新的規則為: + +20. **Remark: the multidimensional generalization, also known as the Newton-Raphson method, has the following update rule:** + +⟶ 注意:多維度正規化的方法,或又被稱之為牛頓-拉弗森 (Newton-Raphson) 演算法,是透過以下的規則更新: + +21. **Linear models** + +⟶ 線性模型 + +22. **Linear regression** + +⟶ 線性迴歸 + +23. **We assume here that y|x;θ∼N(μ,σ2)** + +⟶ 我們假設 y|x;θ∼N(μ,σ2) + +24. **Normal equations ― By noting X the matrix design, the value of θ that minimizes the cost function is a closed-form solution such that:** + +⟶ 正規方程法 - 我們使用 X 代表矩陣,讓代價函數最小的 θ 值有一個封閉解,如下: + +25. **LMS algorithm ― By noting α the learning rate, the update rule of the Least Mean Squares (LMS) algorithm for a training set of m data points, which is also known as the Widrow-Hoff learning rule, is as follows:** + +⟶ 最小均方演算法 (LMS) - 我們使用 α 表示學習速率,針對 m 個訓練資料,透過最小均方演算法的更新規則,或是叫做 Widrow-Hoff 學習法如下: + +26. **Remark: the update rule is a particular case of the gradient ascent.** + +⟶ 注意:這個更新的規則是梯度上升的一種特例 + +27. **LWR ― Locally Weighted Regression, also known as LWR, is a variant of linear regression that weights each training example in its cost function by w(i)(x), which is defined with parameter τ∈R as:** + +⟶ 局部加權迴歸 ,又稱為 LWR,是線性洄歸的變形,通過w(i)(x) 對其成本函數中的每個訓練樣本進行加權,其中參數 τ∈R 定義為: + +28. **Classification and logistic regression** + +⟶ 分類與邏輯迴歸 + +29. **Sigmoid function ― The sigmoid function g, also known as the logistic function, is defined as follows:** + +⟶ Sigmoid 函數 - Sigmoid 函數 g,也可以稱為邏輯函數定義如下: + +30. **Logistic regression ― We assume here that y|x;θ∼Bernoulli(ϕ). We have the following form:** + +⟶ 邏輯迴歸 - 我們假設 y|x;θ∼Bernoulli(ϕ),請參考以下: + +31. **Remark: there is no closed form solution for the case of logistic regressions.** + +⟶ 注意:對於這種情況的邏輯迴歸,並沒有一個封閉解 + +32. **Softmax regression ― A softmax regression, also called a multiclass logistic regression, is used to generalize logistic regression when there are more than 2 outcome classes. By convention, we set θK=0, which makes the Bernoulli parameter ϕi of each class i equal to:** + +⟶ Softmax 迴歸 - Softmax 迴歸又稱做多分類邏輯迴歸,目的是用在超過兩個以上的分類時的迴歸使用。按照慣例,我們設定 θK=0,讓每一個類別的 Bernoulli 參數 ϕi 等同於: + +33. **Generalized Linear Models** + +⟶ 廣義線性模型 + +34. **Exponential family ― A class of distributions is said to be in the exponential family if it can be written in terms of a natural parameter, also called the canonical parameter or link function, η, a sufficient statistic T(y) and a log-partition function a(η) as follows:** + +⟶ 指數族分佈 - 一個分佈如果可以透過自然參數 (或稱之為正準參數或連結函數) η、充分統計量 T(y) 和對數區分函數 (log-partition function) a(η) 來表示時,我們就稱這個分佈是屬於指數族分佈。該分佈可以表示如下: + +35. **Remark: we will often have T(y)=y. Also, exp(−a(η)) can be seen as a normalization parameter that will make sure that the probabilities sum to one.** + +⟶ 注意:我們經常讓 T(y)=y,同時,exp(−a(η)) 可以看成是一個正規化的參數,目的在於讓機率總和為一。 + +36. **Here are the most common exponential distributions summed up in the following table:** + +⟶ 底下是最常見的指數分佈: + +37. **[Distribution, Bernoulli, Gaussian, Poisson, Geometric]** + +⟶ [分佈, 白努利 (Bernoulli), 高斯 (Gaussian), 卜瓦松 (Poisson), 幾何 (Geometric)] + +38. **Assumptions of GLMs ― Generalized Linear Models (GLM) aim at predicting a random variable y as a function fo x∈Rn+1 and rely on the following 3 assumptions:** + +⟶ 廣義線性模型的假設 - 廣義線性模型 (GLM) 的目的在於,給定 x∈Rn+1,要預測隨機變數 y,同時它依賴底下三個假設: + +39. 
**Remark: ordinary least squares and logistic regression are special cases of generalized linear models.** + +⟶ 注意:最小平方法和邏輯迴歸是廣義線性模型的一種特例 + +40. **Support Vector Machines** + +⟶ 支援向量機 + +41. **The goal of support vector machines is to find the line that maximizes the minimum distance to the line.** + +⟶ 支援向量機的目的在於找到一條決策邊界和資料樣本之間最大化最小距離的線 + +42. **Optimal margin classifier ― The optimal margin classifier h is such that:** + +⟶ 最佳的邊界分類器 - 最佳的邊界分類器可以表示為: + +43. **where (w,b)∈Rn×R is the solution of the following optimization problem:** + +⟶ 其中,(w,b)∈Rn×R 是底下最佳化問題的答案: + +44. **such that** + +⟶ 使得 + +45. **support vectors** + +⟶ 支援向量 + +46. **Remark: the line is defined as wTx−b=0.** + +⟶ 注意:該條直線定義為 wTx−b=0 + +47. **Hinge loss ― The hinge loss is used in the setting of SVMs and is defined as follows:** + +⟶ Hinge 損失函數 - Hinge 損失函數用在支援向量機上,定義如下: + +48. **Kernel ― Given a feature mapping ϕ, we define the kernel K to be defined as:** + +⟶ 核(函數) - 給定特徵轉換 ϕ,我們定義核(函數) K 為: + +49. **In practice, the kernel K defined by K(x,z)=exp(−||x−z||22σ2) is called the Gaussian kernel and is commonly used.** + +⟶ 實務上,K(x,z)=exp(−||x−z||22σ2) 定義的核(函數) K,一般稱作高斯核(函數)。這種核(函數)經常被使用 + +50. **[Non-linear separability, Use of a kernel mapping, Decision boundary in the original space]** + +⟶ [非線性可分, 使用核(函數)進行映射, 原始空間中的決策邊界] + +51. **Remark: we say that we use the "kernel trick" to compute the cost function using the kernel because we actually don't need to know the explicit mapping ϕ, which is often very complicated. Instead, only the values K(x,z) are needed.** + +⟶ 注意:我們使用 "核(函數)技巧" 來計算代價函數時,不需要真正的知道映射函數 ϕ,這個函數非常複雜。相反的,我們只需要知道 K(x,z) 的值即可。 + +52. **Lagrangian ― We define the Lagrangian L(w,b) as follows:** + +⟶ Lagrangian - 我們將 Lagrangian L(w,b) 定義如下: + +53. **Remark: the coefficients βi are called the Lagrange multipliers.** + +⟶ 注意:係數 βi 稱為 Lagrange 乘數 + +54. **Generative Learning** + +⟶ 生成學習 + +55. **A generative model first tries to learn how the data is generated by estimating P(x|y), which we can then use to estimate P(y|x) by using Bayes' rule.** + +⟶ 生成模型嘗試透過預估 P(x|y) 來學習資料如何生成,而我們可以透過貝氏定理來預估 P(y|x) + +56. **Gaussian Discriminant Analysis** + +⟶ 高斯判別分析 + +57. **Setting ― The Gaussian Discriminant Analysis assumes that y and x|y=0 and x|y=1 are such that:** + +⟶ 設定 - 高斯判別分析針對 y、x|y=0 和 x|y=1 進行以下假設: + +58. **Estimation ― The following table sums up the estimates that we find when maximizing the likelihood:** + +⟶ 估計 - 底下的表格總結了我們在最大概似估計時的估計值: + +59. **Naive Bayes** + +⟶ 單純貝氏 + +60. **Assumption ― The Naive Bayes model supposes that the features of each data point are all independent:** + +⟶ 假設 - 單純貝氏模型會假設每個資料點的特徵都是獨立的。 + +61. **Solutions ― Maximizing the log-likelihood gives the following solutions, with k∈{0,1},l∈[[1,L]]** + +⟶ 解決方法 - 最大化對數概似估計來給出以下解答,k∈{0,1},l∈[[1,L]] + +62. **Remark: Naive Bayes is widely used for text classification and spam detection.** + +⟶ 注意:單純貝氏廣泛應用在文字分類和垃圾信件偵測上 + +63. **Tree-based and ensemble methods** + +⟶ 基於樹狀結構的學習和整體學習 + +64. **These methods can be used for both regression and classification problems.** + +⟶ 這些方法可以應用在迴歸或分類問題上 + +65. **CART ― Classification and Regression Trees (CART), commonly known as decision trees, can be represented as binary trees. They have the advantage to be very interpretable.** + +⟶ CART - 分類與迴歸樹 (CART),通常稱之為決策數,可以被表示為二元樹。它的優點是具有可解釋性。 + +66. **Random forest ― It is a tree-based technique that uses a high number of decision trees built out of randomly selected sets of features. 
Contrary to the simple decision tree, it is highly uninterpretable but its generally good performance makes it a popular algorithm.** + +⟶ 隨機森林 - 這是一個基於樹狀結構的方法,它使用大量經由隨機挑選的特徵所建構的決策樹。與單純的決策樹不同,它通常具有高度不可解釋性,但它的效能通常很好,所以是一個相當流行的演算法。 + +67. **Remark: random forests are a type of ensemble methods.** + +⟶ 注意:隨機森林是一種整體學習方法 + +68. **Boosting ― The idea of boosting methods is to combine several weak learners to form a stronger one. The main ones are summed up in the table below:** + +⟶ 增強學習 (Boosting) - 增強學習方法的概念是結合數個弱學習模型來變成強學習模型。主要的分類如下: + +69. **[Adaptive boosting, Gradient boosting]** + +⟶ [自適應增強, 梯度增強] + +70. **High weights are put on errors to improve at the next boosting step** + +⟶ 在下一輪的提升步驟中,錯誤的部分會被賦予較高的權重 + +71. **Weak learners trained on remaining errors** + +⟶ 弱學習器會負責訓練剩下的錯誤 + +72. **Other non-parametric approaches** + +⟶ 其他非參數方法 + +73. **k-nearest neighbors ― The k-nearest neighbors algorithm, commonly known as k-NN, is a non-parametric approach where the response of a data point is determined by the nature of its k neighbors from the training set. It can be used in both classification and regression settings.** + +⟶ k-最近鄰 - k-最近鄰演算法,又稱之為 k-NN,是一個非參數的方法,其中資料點的決定是透過訓練集中最近的 k 個鄰居而決定。它可以用在分類和迴歸問題上。 + +74. **Remark: The higher the parameter k, the higher the bias, and the lower the parameter k, the higher the variance.** + +⟶ 注意:參數 k 的值越大,偏差越大。k 的值越小,變異越大。 + +75. **Learning Theory** + +⟶ 學習理論 + +76. **Union bound ― Let A1,...,Ak be k events. We have:** + +⟶ 聯集上界 - 令 A1,...,Ak 為 k 個事件,我們有: + +77. **Hoeffding inequality ― Let Z1,..,Zm be m iid variables drawn from a Bernoulli distribution of parameter ϕ. Let ˆϕ be their sample mean and γ>0 fixed. We have:** + +⟶ 霍夫丁不等式 - 令 Z1,..,Zm 為 m 個從參數 ϕ 的白努利分佈中抽出的獨立同分佈 (iid) 的變數。令 ˆϕ 為其樣本平均、固定 γ>0,我們可以得到: + +78. **Remark: this inequality is also known as the Chernoff bound.** + +⟶ 注意:這個不等式也被稱之為 Chernoff 界線 + +79. **Training error ― For a given classifier h, we define the training error ˆϵ(h), also known as the empirical risk or empirical error, to be as follows:** + +⟶ 訓練誤差 - 對於一個分類器 h,我們定義訓練誤差為 ˆϵ(h),也可以稱為經驗風險或經驗誤差。定義如下: + +80. **Probably Approximately Correct (PAC) ― PAC is a framework under which numerous results on learning theory were proved, and has the following set of assumptions: ** + +⟶ 可能近似正確 (PAC) - PAC 是一個框架,有許多學習理論都證明其有效性。它包含以下假設: + +81: **the training and testing sets follow the same distribution** + +⟶ 訓練和測試資料集具有相同的分佈 + +82. **the training examples are drawn independently** + +⟶ 訓練資料集之間彼此獨立 + +83. **Shattering ― Given a set S={x(1),...,x(d)}, and a set of classifiers H, we say that H shatters S if for any set of labels {y(1),...,y(d)}, we have:** + +⟶ 打散 (Shattering) - 給定一個集合 S={x(1),...,x(d)} 以及一組分類器的集合 H,如果對於任何一組標籤 {y(1),...,y(d)},H 都能打散 S,定義如下: + +84. **Upper bound theorem ― Let H be a finite hypothesis class such that |H|=k and let δ and the sample size m be fixed. Then, with probability of at least 1−δ, we have:** + +⟶ 上限定理 - 令 H 是一個有限假設類別,使 |H|=k 且令 δ 和樣本大小 m 固定,結著,在機率至少為 1−δ 的情況下,我們得到: + +85. **VC dimension ― The Vapnik-Chervonenkis (VC) dimension of a given infinite hypothesis class H, noted VC(H) is the size of the largest set that is shattered by H.** + +⟶ VC 維度 - 一個有限假設類別的 Vapnik-Chervonenkis (VC) 維度 VC(H) 指的是 H 最多能夠打散的數量 + +86. **Remark: the VC dimension of H={set of linear classifiers in 2 dimensions} is 3.** + +⟶ 注意:H={2 維的線性分類器} 的 VC 維度為 3 + +87. **Theorem (Vapnik) ― Let H be given, with VC(H)=d and m the number of training examples. 
With probability at least 1−δ, we have:**

⟶ 定理 (Vapnik) - 令 H 已給定,VC(H)=d 且 m 是訓練資料集的數量,在機率至少為 1−δ 的情況下,我們得到:

88. **Known as Adaboost**

⟶ 被稱為 Adaboost
diff --git a/zh/cheatsheet-unsupervised-learning.md b/zh-tw/cs-229-unsupervised-learning.md
similarity index 59%
rename from zh/cheatsheet-unsupervised-learning.md
rename to zh-tw/cs-229-unsupervised-learning.md
index 93708b826..0f6d5ee34 100644
--- a/zh/cheatsheet-unsupervised-learning.md
+++ b/zh-tw/cs-229-unsupervised-learning.md
@@ -1,339 +1,298 @@
1. **Unsupervised Learning cheatsheet**

⟶

-
+非監督式學習參考手冊

<br>
2. **Introduction to Unsupervised Learning** ⟶ - +非監督式學習介紹
3. **Motivation ― The goal of unsupervised learning is to find hidden patterns in unlabeled data {x(1),...,x(m)}.** ⟶ - +動機 - 非監督式學習的目的是要找出未標籤資料 {x(1),...,x(m)} 之間的隱藏模式
4. **Jensen's inequality ― Let f be a convex function and X a random variable. We have the following inequality:** ⟶ - +Jensen's 不等式 - 令 f 為一個凸函數、X 為一個隨機變數,我們可以得到底下這個不等式:
5. **Clustering** ⟶ - +分群
6. **Expectation-Maximization** ⟶ - +最大期望值
7. **Latent variables ― Latent variables are hidden/unobserved variables that make estimation problems difficult, and are often denoted z. Here are the most common settings where there are latent variables:** ⟶ - +潛在變數 (Latent variables) - 潛在變數指的是隱藏/沒有觀察到的變數,這會讓問題的估計變得困難,我們通常使用 z 來代表它。底下是潛在變數的常見設定:
8. **[Setting, Latent variable z, Comments]** ⟶ - +[設定, 潛在變數 z, 評論]
9. **[Mixture of k Gaussians, Factor analysis]** ⟶ - +[k 元高斯模型, 因素分析]
10. **Algorithm ― The Expectation-Maximization (EM) algorithm gives an efficient method at estimating the parameter θ through maximum likelihood estimation by repeatedly constructing a lower-bound on the likelihood (E-step) and optimizing that lower bound (M-step) as follows:** ⟶ - +演算法 - 最大期望演算法 (EM Algorithm) 透過重複建構一個概似函數的下界 (E-step) 和最佳化下界 (M-step) 來進行最大概似估計給出參數 θ 的高效率估計方法:
11. **E-step: Evaluate the posterior probability Qi(z(i)) that each data point x(i) came from a particular cluster z(i) as follows:** ⟶ - +E-step: 評估後驗機率 Qi(z(i)),其中每個資料點 x(i) 來自於一個特定的群集 z(i),如下:
12. **M-step: Use the posterior probabilities Qi(z(i)) as cluster specific weights on data points x(i) to separately re-estimate each cluster model as follows:** ⟶ - +M-step: 使用後驗機率 Qi(z(i)) 作為資料點 x(i) 在群集中特定的權重,用來分別重新估計每個群集,如下:
13. **[Gaussians initialization, Expectation step, Maximization step, Convergence]** ⟶ - +[高斯分佈初始化, E-Step, M-Step, 收斂]
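In practice the E-step/M-step loop of entries 10-13 is rarely coded by hand; scikit-learn's GaussianMixture runs it internally. The two synthetic blobs below are only for illustration.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(5)
X = np.vstack([rng.normal(loc=-2.0, size=(100, 2)),
               rng.normal(loc=3.0, size=(100, 2))])

gmm = GaussianMixture(n_components=2, random_state=0).fit(X)   # EM under the hood
posteriors = gmm.predict_proba(X)   # the Qi(z(i)) of the E-step
print(gmm.means_)                   # estimated cluster means after convergence
```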
14. **k-means clustering** ⟶ - +k-means 分群法
15. **We note c(i) the cluster of data point i and μj the center of cluster j.** ⟶ - +我們使用 c(i) 表示資料 i 屬於某群,而 μj 則是群 j 的中心
16. **Algorithm ― After randomly initializing the cluster centroids μ1,μ2,...,μk∈Rn, the k-means algorithm repeats the following step until convergence:** ⟶ - +演算法 - 在隨機初始化群集中心點 μ1,μ2,...,μk∈Rn 後,k-means 演算法重複以下步驟直到收斂:
17. **[Means initialization, Cluster assignment, Means update, Convergence]** ⟶ - +[中心點初始化, 指定群集, 更新中心點, 收斂]
18. **Distortion function ― In order to see if the algorithm converges, we look at the distortion function defined as follows:** ⟶ - +畸變函數 - 為了確認演算法是否收斂,我們定義以下的畸變函數:
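A bare-bones version of the k-means loop from entries 14-18 (assignment step, mean-update step, distortion); the 2-D points are synthetic, the initialization is simply k random rows, and clusters are assumed to stay non-empty.

```python
import numpy as np

rng = np.random.default_rng(6)
X = np.vstack([rng.normal(loc=-2.0, size=(100, 2)),
               rng.normal(loc=3.0, size=(100, 2))])
k = 2
centroids = X[rng.choice(len(X), size=k, replace=False)]

for _ in range(20):                                    # a few iterations suffice here
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    labels = dists.argmin(axis=1)                      # c(i): index of the closest centroid
    centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])

distortion = ((X - centroids[labels]) ** 2).sum()      # the distortion function J
print(centroids, distortion)
```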
19. **Hierarchical clustering** ⟶ - +階層式分群法
20. **Algorithm ― It is a clustering algorithm with an agglomerative hierarchical approach that build nested clusters in a successive manner.** ⟶ - +演算法 - 階層式分群法是透過一種階層架構的方式,將資料建立為一種連續層狀結構的形式。
21. **Types ― There are different sorts of hierarchical clustering algorithms that aims at optimizing different objective functions, which is summed up in the table below:** ⟶ - +類型 - 底下是幾種不同類型的階層式分群法,差別在於要最佳化的目標函式的不同,請參考底下:
22. **[Ward linkage, Average linkage, Complete linkage]** ⟶ - +[Ward 鏈結距離, 平均鏈結距離, 完整鏈結距離]
23. **[Minimize within cluster distance, Minimize average distance between cluster pairs, Minimize maximum distance of between cluster pairs]** ⟶ - +[最小化群內距離, 最小化各群彼此的平均距離, 最小化各群彼此的最大距離]
24. **Clustering assessment metrics** ⟶ - +分群衡量指標
25. **In an unsupervised learning setting, it is often hard to assess the performance of a model since we don't have the ground truth labels as was the case in the supervised learning setting.** ⟶ - +在非監督式學習中,通常很難去評估一個模型的好壞,因為我們沒有擁有像在監督式學習任務中正確答案的標籤
26. **Silhouette coefficient ― By noting a and b the mean distance between a sample and all other points in the same class, and between a sample and all other points in the next nearest cluster, the silhouette coefficient s for a single sample is defined as follows:** ⟶ - +輪廓係數 (Silhouette coefficient) - 我們指定 a 為一個樣本點和相同群集中其他資料點的平均距離、b 為一個樣本點和下一個最接近群集其他資料點的平均距離,輪廓係數 s 對於此一樣本點的定義為:
27. **Calinski-Harabaz index ― By noting k the number of clusters, Bk and Wk the between and within-clustering dispersion matrices respectively defined as** ⟶ - +Calinski-Harabaz 指標 - 定義 k 是群集的數量,Bk 和 Wk 分別是群內和群集之間的離差矩陣 (dispersion matrices):
28. **the Calinski-Harabaz index s(k) indicates how well a clustering model defines its clusters, such that the higher the score, the more dense and well separated the clusters are. It is defined as follows:** ⟶ - +Calinski-Harabaz 指標 s(k) 指出分群模型的好壞,此指標的值越高,代表分群模型的表現越好。定義如下:
29. **Dimension reduction** ⟶ - +維度縮減
30. **Principal component analysis** ⟶ - +主成份分析
31. **It is a dimension reduction technique that finds the variance maximizing directions onto which to project the data.** ⟶ - +這是一個維度縮減的技巧,在於找到投影資料的最大方差
32. **Eigenvalue, eigenvector ― Given a matrix A∈Rn×n, λ is said to be an eigenvalue of A if there exists a vector z∈Rn∖{0}, called eigenvector, such that we have:** ⟶ - +特徵值、特徵向量 - 給定一個矩陣 A∈Rn×n,我們說 λ 是 A 的特徵值,當存在一個特徵向量 z∈Rn∖{0},使得:
33. **Spectral theorem ― Let A∈Rn×n. If A is symmetric, then A is diagonalizable by a real orthogonal matrix U∈Rn×n. By noting Λ=diag(λ1,...,λn), we have:**

⟶

-
+譜定理 - 令 A∈Rn×n,如果 A 是對稱的,則 A 可以透過實正交矩陣 U∈Rn×n 對角化。當 Λ=diag(λ1,...,λn),我們得到:

<br>
34. **diagonal** ⟶ - +對角線
35. **Remark: the eigenvector associated with the largest eigenvalue is called principal eigenvector of matrix A.**

⟶

-
+注意:與最大特徵值所關聯的特徵向量稱為矩陣 A 的主特徵向量

<br>
36. **Algorithm ― The Principal Component Analysis (PCA) procedure is a dimension reduction technique that projects the data on k dimensions by maximizing the variance of the data as follows:** ⟶ - +演算法 - 主成份分析 (PCA) 是一種維度縮減的技巧,它會透過尋找資料最大變異的方式,將資料投影在 k 維空間上:
37. **Step 1: Normalize the data to have a mean of 0 and standard deviation of 1.** ⟶ - +第一步:正規化資料,讓資料平均為 0,變異數為 1
38. **Step 2: Compute Σ=1mm∑i=1x(i)x(i)T∈Rn×n, which is symmetric with real eigenvalues.**

⟶

-
+第二步:計算 Σ=1mm∑i=1x(i)x(i)T∈Rn×n,它是具有實數特徵值的對稱矩陣

<br>
39. **Step 3: Compute u1,...,uk∈Rn the k orthogonal principal eigenvectors of Σ, i.e. the orthogonal eigenvectors of the k largest eigenvalues.**

⟶

-
+第三步:計算 Σ 的 k 個正交主特徵向量 u1,...,uk∈Rn,也就是 k 個最大特徵值所對應的正交特徵向量

<br>
40. **Step 4: Project the data on spanR(u1,...,uk).**

⟶

-
+第四步:將資料投影到 spanR(u1,...,uk)

<br>
41. **This procedure maximizes the variance among all k-dimensional spaces.** ⟶ - +這個步驟會最大化所有 k 維空間的變異數
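The four PCA steps of entries 36-41, written out with NumPy; the data matrix and the choice k=2 are arbitrary and only illustrate the procedure.

```python
import numpy as np

rng = np.random.default_rng(7)
X = rng.normal(size=(200, 5))
k = 2

X_norm = (X - X.mean(axis=0)) / X.std(axis=0)        # step 1: zero mean, unit std
Sigma = (X_norm.T @ X_norm) / len(X_norm)            # step 2: empirical covariance
eigvals, eigvecs = np.linalg.eigh(Sigma)             # symmetric, real eigenvalues
top_k = eigvecs[:, np.argsort(eigvals)[::-1][:k]]    # step 3: top-k principal eigenvectors
X_proj = X_norm @ top_k                              # step 4: projection on their span
print(X_proj.shape)
```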
42. **[Data in feature space, Find principal components, Data in principal components space]** ⟶ - +[資料在特徵空間, 尋找主成分, 資料在主成分空間]
43. **Independent component analysis** ⟶ - +獨立成分分析
44. **It is a technique meant to find the underlying generating sources.** ⟶ - +這是用來尋找潛在生成來源的技巧
45. **Assumptions ― We assume that our data x has been generated by the n-dimensional source vector s=(s1,...,sn), where si are independent random variables, via a mixing and non-singular matrix A as follows:** ⟶ - +假設 - 我們假設資料 x 是從 n 維的來源向量 s=(s1,...,sn) 產生,si 為獨立變數,透過一個混合與非奇異矩陣 A 產生如下:
46. **The goal is to find the unmixing matrix W=A−1.** ⟶ - +目的在於找到一個 unmixing 矩陣 W=A−1
47. **Bell and Sejnowski ICA algorithm ― This algorithm finds the unmixing matrix W by following the steps below:** ⟶ - +Bell 和 Sejnowski 獨立成份分析演算法 - 此演算法透過以下步驟來找到 unmixing 矩陣:
48. **Write the probability of x=As=W−1s as:** ⟶ - +紀錄 x=As=W−1s 的機率如下:
49. **Write the log likelihood given our training data {x(i),i∈[[1,m]]} and by noting g the sigmoid function as:** ⟶ - +在給定訓練資料 {x(i),i∈[[1,m]]} 的情況下,其對數概似估計函數與定義 g 為 sigmoid 函數如下:
50. **Therefore, the stochastic gradient ascent learning rule is such that for each training example x(i), we update W as follows:** ⟶ - -
- -51. **The Machine Learning cheatsheets are now available in Mandarin.** - -⟶ - -
- -52. **Original authors** - -⟶ - -
- -53. **Translated by X, Y and Z** - -⟶ - -
- -54. **Reviewed by X, Y and Z** - -⟶ - -
- -55. **[Introduction, Motivation, Jensen's inequality]** - -⟶ - -
- -56. **[Clustering, Expectation-Maximization, k-means, Hierarchical clustering, Metrics]** - -⟶ - -
57. **[Dimension reduction, PCA, ICA]**

⟶

+因此,隨機梯度上升學習規則對每個訓練樣本 x(i) 來說,我們透過以下方法來更新 W:
diff --git a/zh/cheatsheet-deep-learning.md b/zh/cheatsheet-deep-learning.md
deleted file mode 100644
index a7604ccc6..000000000
--- a/zh/cheatsheet-deep-learning.md
+++ /dev/null
@@ -1,321 +0,0 @@
-1. **Deep Learning cheatsheet**
-
-⟶
-
-<br>
- -2. **Neural Networks** - -⟶ - -
- -3. **Neural networks are a class of models that are built with layers. Commonly used types of neural networks include convolutional and recurrent neural networks.** - -⟶ - -
- -4. **Architecture ― The vocabulary around neural networks architectures is described in the figure below:** - -⟶ - -
- -5. **[Input layer, hidden layer, output layer]** - -⟶ - -
- -6. **By noting i the ith layer of the network and j the jth hidden unit of the layer, we have:** - -⟶ - -
- -7. **where we note w, b, z the weight, bias and output respectively.** - -⟶ - -
- -8. **Activation function ― Activation functions are used at the end of a hidden unit to introduce non-linear complexities to the model. Here are the most common ones:** - -⟶ - -
- -9. **[Sigmoid, Tanh, ReLU, Leaky ReLU]** - -⟶ - -
- -10. **Cross-entropy loss ― In the context of neural networks, the cross-entropy loss L(z,y) is commonly used and is defined as follows:** - -⟶ - -
- -11. **Learning rate ― The learning rate, often noted α or sometimes η, indicates at which pace the weights get updated. This can be fixed or adaptively changed. The current most popular method is called Adam, which is a method that adapts the learning rate.** - -⟶ - -
- -12. **Backpropagation ― Backpropagation is a method to update the weights in the neural network by taking into account the actual output and the desired output. The derivative with respect to weight w is computed using chain rule and is of the following form:** - -⟶ - -
- -13. **As a result, the weight is updated as follows:** - -⟶ - -
- -14. **Updating weights ― In a neural network, weights are updated as follows:** - -⟶ - -
- -15. **Step 1: Take a batch of training data.** - -⟶ - -
- -16. **Step 2: Perform forward propagation to obtain the corresponding loss.** - -⟶ - -
- -17. **Step 3: Backpropagate the loss to get the gradients.** - -⟶ - -
- -18. **Step 4: Use the gradients to update the weights of the network.** - -⟶ - -
- -19. **Dropout ― Dropout is a technique meant at preventing overfitting the training data by dropping out units in a neural network. In practice, neurons are either dropped with probability p or kept with probability 1−p** - -⟶ - -
- -20. **Convolutional Neural Networks** - -⟶ - -
- -21. **Convolutional layer requirement ― By noting W the input volume size, F the size of the convolutional layer neurons, P the amount of zero padding, then the number of neurons N that fit in a given volume is such that:** - -⟶ - -
- -22. **Batch normalization ― It is a step of hyperparameter γ,β that normalizes the batch {xi}. By noting μB,σ2B the mean and variance of that we want to correct to the batch, it is done as follows:** - -⟶ - -
- -23. **It is usually done after a fully connected/convolutional layer and before a non-linearity layer and aims at allowing higher learning rates and reducing the strong dependence on initialization.** - -⟶ - -
- -24. **Recurrent Neural Networks** - -⟶ - -
- -25. **Types of gates ― Here are the different types of gates that we encounter in a typical recurrent neural network:** - -⟶ - -
- -26. **[Input gate, forget gate, gate, output gate]** - -⟶ - -
- -27. **[Write to cell or not?, Erase a cell or not?, How much to write to cell?, How much to reveal cell?]** - -⟶ - -
- -28. **LSTM ― A long short-term memory (LSTM) network is a type of RNN model that avoids the vanishing gradient problem by adding 'forget' gates.** - -⟶ - -
- -29. **Reinforcement Learning and Control** - -⟶ - -
- -30. **The goal of reinforcement learning is for an agent to learn how to evolve in an environment.** - -⟶ - -
- -31. **Definitions** - -⟶ - -
- -32. **Markov decision processes ― A Markov decision process (MDP) is a 5-tuple (S,A,{Psa},γ,R) where:** - -⟶ - -
- -33. **S is the set of states** - -⟶ - -
- -34. **A is the set of actions** - -⟶ - -
- -35. **{Psa} are the state transition probabilities for s∈S and a∈A** - -⟶ - -
- -36. **γ∈[0,1[ is the discount factor** - -⟶ - -
- -37. **R:S×A⟶R or R:S⟶R is the reward function that the algorithm wants to maximize** - -⟶ - -
- -38. **Policy ― A policy π is a function π:S⟶A that maps states to actions.** - -⟶ - -
- -39. **Remark: we say that we execute a given policy π if given a state s we take the action a=π(s).** - -⟶ - -
- -40. **Value function ― For a given policy π and a given state s, we define the value function Vπ as follows:** - -⟶ - -
- -41. **Bellman equation ― The optimal Bellman equations characterizes the value function Vπ∗ of the optimal policy π∗:** - -⟶ - -
- -42. **Remark: we note that the optimal policy π∗ for a given state s is such that:** - -⟶ - -
- -43. **Value iteration algorithm ― The value iteration algorithm is in two steps:** - -⟶ - -
- -44. **1) We initialize the value:** - -⟶ - -
- -45. **2) We iterate the value based on the values before:** - -⟶ - -
- -46. **Maximum likelihood estimate ― The maximum likelihood estimates for the state transition probabilities are as follows:** - -⟶ - -
- -47. **times took action a in state s and got to s′** - -⟶ - -
- -48. **times took action a in state s** - -⟶ - -
- -49. **Q-learning ― Q-learning is a model-free estimation of Q, which is done as follows:** - -⟶ - -
- -50. **View PDF version on GitHub** - -⟶ - -
- -51. **[Neural Networks, Architecture, Activation function, Backpropagation, Dropout]** - -⟶ - -
- -52. **[Convolutional Neural Networks, Convolutional layer, Batch normalization]** - -⟶ - -
- -53. **[Recurrent Neural Networks, Gates, LSTM]** - -⟶ - -
- -54. **[Reinforcement learning, Markov decision processes, Value/policy iteration, Approximate dynamic programming, Policy search]** - -⟶ diff --git a/zh/cheatsheet-supervised-learning.md b/zh/cs-229-supervised-learning.md similarity index 100% rename from zh/cheatsheet-supervised-learning.md rename to zh/cs-229-supervised-learning.md From 54f3c693879e1311326e91cdca5fc080742a1c0c Mon Sep 17 00:00:00 2001 From: Shervine Amidi Date: Mon, 4 Nov 2019 22:42:56 -0800 Subject: [PATCH 448/531] Remove duplicates --- CONTRIBUTORS | 3 --- 1 file changed, 3 deletions(-) diff --git a/CONTRIBUTORS b/CONTRIBUTORS index 96388ca4f..e54d4b44b 100644 --- a/CONTRIBUTORS +++ b/CONTRIBUTORS @@ -62,9 +62,6 @@ AlisterTA (translation of deep learning tips and tricks) Erfan Noury (review of deep learning tips and tricks) - AlisterTA (translation of deep learning tips and tricks) - Erfan Noury (review of deep learning tips and tricks) - Erfan Noury (translation of linear algebra) Mohammad Karimi (review of linear algebra) From 430d263a319c64a54d97f8cb70d8b94c57b95f4b Mon Sep 17 00:00:00 2001 From: Shervine Amidi Date: Mon, 4 Nov 2019 22:55:45 -0800 Subject: [PATCH 449/531] Add contributors --- CONTRIBUTORS | 3 +++ 1 file changed, 3 insertions(+) diff --git a/CONTRIBUTORS b/CONTRIBUTORS index db83fa3d0..6e57234bd 100644 --- a/CONTRIBUTORS +++ b/CONTRIBUTORS @@ -6,6 +6,9 @@ Amjad Khatabi (review of linear algebra) Mazen Melibari (review of linear algebra) + Mahmoud Aslan (translation of probabilities and statistics) + Fares Al-Quaneier (review of probabilities and statistics) + --de --es From 4dc093a7522d517a600bcbde250e5cbf83f480a3 Mon Sep 17 00:00:00 2001 From: Shervine Amidi Date: Mon, 4 Nov 2019 23:14:36 -0800 Subject: [PATCH 450/531] Update [ar] progress --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index 0e6dfaf73..bb2d4ec9a 100644 --- a/README.md +++ b/README.md @@ -54,7 +54,7 @@ Please make sure to propose the translation of **only one** cheatsheet per pull ### CS 229 (Machine Learning) | |[Deep learning](https://github.com/shervinea/cheatsheet-translation/blob/master/template/cs-229-deep-learning.md)|[Supervised](https://github.com/shervinea/cheatsheet-translation/blob/master/template/cs-229-supervised-learning.md)|[Unsupervised](https://github.com/shervinea/cheatsheet-translation/blob/master/template/cs-229-unsupervised-learning.md)|[ML tips](https://github.com/shervinea/cheatsheet-translation/blob/master/template/cs-229-machine-learning-tips-and-tricks.md)|[Probabilities](https://github.com/shervinea/cheatsheet-translation/blob/master/template/cs-229-probability.md)|[Algebra](https://github.com/shervinea/cheatsheet-translation/blob/master/template/cs-229-linear-algebra.md)| |:---|:---:|:---:|:---:|:---:|:---:|:---:| -|**العَرَبِيَّة**|done|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/87)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/88)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/83)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/182)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/85)| +|**العَرَبِيَّة**|done|done|done|done|done|done| |**Català**|not started|not started|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/47)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/47)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/47)| |**Deutsch**|[in 
progress](https://github.com/shervinea/cheatsheet-translation/pull/106)|not started|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/135)|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/136)| |**Español**|done|done|done|done|done|done| From 328d9cce10af3a1b970e7f64463de7cba8ad061c Mon Sep 17 00:00:00 2001 From: Shervine Amidi Date: Mon, 4 Nov 2019 23:29:40 -0800 Subject: [PATCH 451/531] Add contributors --- CONTRIBUTORS | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/CONTRIBUTORS b/CONTRIBUTORS index dc4167fc2..3c61867fd 100644 --- a/CONTRIBUTORS +++ b/CONTRIBUTORS @@ -147,6 +147,12 @@ Gregory Reshetniak (translation of probabilities and statistics) Denys (review of probabilities and statistics) +--vi + Tran Tuan Anh (translation of supervised learning) + Dam Minh Tien (review of supervised learning) + Hung Nguyễn (review of supervised learning) + Nguyễn Trí Minh (review of supervised learning) + --zh Wang Hongnian (translation of supervised learning) Xiaohu Zhu (朱小虎) (review of supervised learning) From 91341d5cb3b6f02440367c3dfaab41179a28724d Mon Sep 17 00:00:00 2001 From: Shervine Amidi Date: Mon, 4 Nov 2019 23:39:33 -0800 Subject: [PATCH 452/531] Update [vi] progress --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index bb2d4ec9a..99bb01266 100644 --- a/README.md +++ b/README.md @@ -73,7 +73,7 @@ Please make sure to propose the translation of **only one** cheatsheet per pull |**Русский**|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/21)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/21)|not started|not started|not started|not started| |**Türkçe**|done|done|done|done|done|done| |**Українська**|not started|not started|not started|not started|done|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/95)| -|**Tiếng Việt**|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/159)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/162)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/177)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/160)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/175)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/176)| +|**Tiếng Việt**|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/159)|done|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/177)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/160)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/175)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/176)| |**简体中文**|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/12)|done|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/48)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/7)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/73)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/72)| |**繁體中文**|done|done|done|done|done|done| From 53f96a7bea5ee1f07b77a8f8e54cb0a511e31d85 Mon Sep 17 00:00:00 2001 From: Shervine Amidi Date: Tue, 5 Nov 2019 22:08:27 -0800 Subject: [PATCH 453/531] Add [zh-tw] progress --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index 99bb01266..8b6e3a0f7 100644 
--- a/README.md +++ b/README.md @@ -101,7 +101,7 @@ Please make sure to propose the translation of **only one** cheatsheet per pull |**Українська**|not started|not started|not started| |**Tiếng Việt**|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/180)|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/178)| |**简体中文**|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/181)|not started| -|**繁體中文**|not started|not started|not started| +|**繁體中文**|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/196)|not started|not started| ## Acknowledgements Thank you everyone for your help! Please do not forget to add your name to the `CONTRIBUTORS` file so that we can give you proper credit in the cheatsheets' [official website](https://stanford.edu/~shervine/teaching). From 3c5916ad4821eaaa1a68a405c46a740d881db76a Mon Sep 17 00:00:00 2001 From: qunaieer Date: Wed, 6 Nov 2019 18:51:13 +0300 Subject: [PATCH 454/531] Adding RTL tags to cs-229-deep-learning.md for AR --- ar/cs-229-deep-learning.md | 120 +++++++++++++++++++++++++++++++++++-- 1 file changed, 114 insertions(+), 6 deletions(-) diff --git a/ar/cs-229-deep-learning.md b/ar/cs-229-deep-learning.md index d4cf59da6..197538d2b 100644 --- a/ar/cs-229-deep-learning.md +++ b/ar/cs-229-deep-learning.md @@ -2,322 +2,430 @@ **1. Deep Learning cheatsheet** ⟶ +
ملخص مختصر التعلم العميق +

**2. Neural Networks** ⟶ +
الشبكة العصبونية الاصطناعية(Neural Networks) +

**3. Neural networks are a class of models that are built with layers. Commonly used types of neural networks include convolutional and recurrent neural networks.** ⟶ +
الشبكة العصبونية الاصطناعية هي عبارة عن نوع من النماذج يبنى من عدة طبقات، أكثر هذه الأنواع استخداماً هي الشبكات الالتفافية والشبكات العصبونية المتكررة +

**4. Architecture ― The vocabulary around neural networks architectures is described in the figure below:** ⟶ +
البنية - المصطلحات حول بنية الشبكة العصبونية موضح في الشكل ادناة +

**5. [Input layer, hidden layer, output layer]** ⟶ +
[طبقة ادخال, طبقة مخفية, طبقة اخراج ] +

**6. By noting i the ith layer of the network and j the jth hidden unit of the layer, we have:** ⟶ +
عبر تدوين i كالطبقة رقم i و j للدلالة على رقم الوحده الخفية في تلك الطبقة , نحصل على: +

**7. where we note w, b, z the weight, bias and output respectively.** ⟶ +
حيث نعرف w, b, z كالوزن , و معامل التعديل , و الناتج حسب الترتيب. +

**8. Activation function ― Activation functions are used at the end of a hidden unit to introduce non-linear complexities to the model. Here are the most common ones:** ⟶ +
دالة التفعيل(Activation function) - دالة التفعيل تستخدم في نهاية الوحده الخفية لتضمن المكونات الغير خطية للنموذج. هنا بعض دوال التفعيل الشائعة +

**9. [Sigmoid, Tanh, ReLU, Leaky ReLU]** ⟶ +
[Sigmoid, Tanh, ReLU, Leaky ReLU] +

**10. Cross-entropy loss ― In the context of neural networks, the cross-entropy loss L(z,y) is commonly used and is defined as follows:** ⟶ +
دالة الانتروبيا التقاطعية للخسارة(Cross-entropy loss) - في سياق الشبكات العصبونية, دالة الأنتروبيا L(z,y) تستخدم و تعرف كالاتي: +

**11. Learning rate ― The learning rate, often noted α or sometimes η, indicates at which pace the weights get updated. This can be fixed or adaptively changed. The current most popular method is called Adam, which is a method that adapts the learning rate.** ⟶ +
معدل التعلم(Learning rate) - معدل التعلم, يرمز , و هو مؤشر في اي تجاة يتم تحديث الاوزان. يمكن تثبيت هذا المعامل او تحديثة بشكل تأقلمي . حاليا اكثر النسب شيوعا تدعى Adam , وهي طريقة تجعل هذه النسبة سرعة التعلم بشكل تأقلمي α او η ب , +

**12. Backpropagation ― Backpropagation is a method to update the weights in the neural network by taking into account the actual output and the desired output. The derivative with respect to weight w is computed using chain rule and is of the following form:** ⟶ +
التغذية الخلفية(Backpropagation) - التغذية الخلفية هي طريقة لتحديث الاوزان في الشبكة العصبونية عبر اعتبار القيم الحقيقة للناتج مع القيمة المطلوبة للخرج. المشتقة بالنسبة للوزن w يتم حسابها باستخدام قاعدة التسلسل و تكون عبر الشكل الاتي: +

**13. As a result, the weight is updated as follows:** ⟶ +
كنتيجة , الوزن سيتم تحديثة كالتالي: +

**14. Updating weights ― In a neural network, weights are updated as follows:** ⟶ +
تحديث الاوزان - في الشبكات العصبونية , يتم تحديث الاوزان كما يلي: +

**15. Step 1: Take a batch of training data.** ⟶ +
الخطوة 1: خذ حزمة من بيانات التدريب +

**16. Step 2: Perform forward propagation to obtain the corresponding loss.** ⟶ +
الخطوة 2: قم بعملية التغذيه الامامية لحساب الخسارة الناتجة +

**17. Step 3: Backpropagate the loss to get the gradients.** ⟶ +
الخطوة 3: قم بتغذية خلفية للخساره للحصول على دالة الانحدار +

**18. Step 4: Use the gradients to update the weights of the network.** ⟶ +
الخطوة 4: استخدم قيم الانحدار لتحديث اوزان الشبكة +

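To make steps 1-4 above concrete, here is a minimal NumPy sketch of one gradient-descent training loop for a small one-hidden-layer network with a sigmoid output and cross-entropy loss. The layer sizes, the synthetic batch and the learning rate α=0.1 are illustrative assumptions, not values prescribed by the cheatsheet.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(16, 3))                  # step 1: take a batch of training data
y = (X.sum(axis=1, keepdims=True) > 0) * 1.0  # toy binary labels

W1, b1 = 0.1 * rng.normal(size=(3, 4)), np.zeros(4)
W2, b2 = 0.1 * rng.normal(size=(4, 1)), np.zeros(1)
alpha = 0.1                                   # learning rate

for _ in range(100):
    # step 2: forward propagation and the corresponding loss
    a1 = np.tanh(X @ W1 + b1)
    yhat = 1 / (1 + np.exp(-(a1 @ W2 + b2)))  # sigmoid output
    loss = -np.mean(y * np.log(yhat) + (1 - y) * np.log(1 - yhat))

    # step 3: backpropagate the loss (chain rule) to get the gradients
    dz2 = (yhat - y) / len(X)
    dW2, db2 = a1.T @ dz2, dz2.sum(axis=0)
    dz1 = (dz2 @ W2.T) * (1 - a1 ** 2)        # tanh'(z) = 1 - tanh(z)^2
    dW1, db1 = X.T @ dz1, dz1.sum(axis=0)

    # step 4: use the gradients to update the weights: w <- w - alpha * dL/dw
    W1 -= alpha * dW1; b1 -= alpha * db1
    W2 -= alpha * dW2; b2 -= alpha * db2
```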
**19. Dropout ― Dropout is a technique meant at preventing overfitting the training data by dropping out units in a neural network. In practice, neurons are either dropped with probability p or kept with probability 1−p** ⟶ +
الاسقاط(Dropout) - الاسقاط هي طريقة الغرض منها منع التكيف الزائد للنموذج في بيانات التدريب عبر اسقاط بعض الواحدات في الشبكة العصبونية, العصبونات يتم اما اسقاطها باحتمالية p او الحفاظ عليها باحتمالية 1-p. +

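As a small illustration of the p / 1-p rule described in entry 19, the following sketch applies (inverted) dropout to a layer's activations; the drop probability p=0.5 and the activation values are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
p = 0.5                                     # probability of dropping a unit (assumed)
a = rng.normal(size=(4, 8))                 # activations of a hidden layer
mask = (rng.random(a.shape) > p) / (1 - p)  # keep with probability 1-p, then rescale
a_train = a * mask                          # dropout is applied at training time only
```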
**20. Convolutional Neural Networks** ⟶ +
الشبكات العصبونية الالتفافية(CNN) +

**21. Convolutional layer requirement ― By noting W the input volume size, F the size of the convolutional layer neurons, P the amount of zero padding, then the number of neurons N that fit in a given volume is such that:** ⟶ +
احتياج الطبقة الالتفافية - عبر رمز w لحجم المدخل , F حجم العصبونات للطبقة الالتفافية , P عدد الحشوات الصفرية , فأن N عدد العصبونات لكل حجم معطى يحسب عبر الاتي: +

**22. Batch normalization ― It is a step of hyperparameter γ,β that normalizes the batch {xi}. By noting μB,σ2B the mean and variance of that we want to correct to the batch, it is done as follows:** ⟶ +
تنظيم الحزمة(Batch normalization) - هي خطوه من قيم التحسين الخاصة γ,β والتي تعدل الحزمة {xi}. لنجعل μB,σ2B المتوسط و الانحراف للحزمة المعنية و نريد تصحيح هذه الحزمة, يتم ذلك كالتالي: +

**23. It is usually done after a fully connected/convolutional layer and before a non-linearity layer and aims at allowing higher learning rates and reducing the strong dependence on initialization.** ⟶ +
في الغالب تتم بعد الطبقة الالتفافية أو المتصلة كليا و قبل طبقة التغيرات الغير خطية و تهدف للسماح للسرعات التعليم العالية للتقليل من الاعتمادية القوية للقيم الاولية. - +

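A minimal sketch of the normalization described in entry 22, assuming a batch of activations and arbitrary values for γ, β and the numerical-stability constant ε:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=3.0, scale=2.0, size=(32, 10))  # a batch {xi} of activations
gamma, beta, eps = 1.0, 0.0, 1e-5

mu_B = x.mean(axis=0)                              # batch mean
var_B = x.var(axis=0)                              # batch variance
x_hat = (x - mu_B) / np.sqrt(var_B + eps)          # normalize the batch
y = gamma * x_hat + beta                           # scale and shift with gamma, beta
```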
**24. Recurrent Neural Networks** ⟶ +
(RNN)الشبكات العصبونية التكرارية +

**25. Types of gates ― Here are the different types of gates that we encounter in a typical recurrent neural network:** ⟶ +
انواع البوابات - هنا الانواع المختلفة التي ممكن مواجهتها في الشبكة العصبونية الاعتيادية: +

**26. [Input gate, forget gate, gate, output gate]** ⟶ +
[بوابة ادخال, بوابة نسيان, بوابة منفذ, بوابة اخراج ] +

**27. [Write to cell or not?, Erase a cell or not?, How much to write to cell?, How much to reveal cell?]** ⟶ +
[كتابة ام عدم كتابة الى الخلية؟, مسح ام عدم مسح الخلية؟, كمية الكتابة الى الخلية ؟ , مدى الافصاح عن الخلية ؟ ] +

**28. LSTM ― A long short-term memory (LSTM) network is a type of RNN model that avoids the vanishing gradient problem by adding 'forget' gates.** ⟶ +
LSTM - ذاكرة طويلة قصير الامد (long short-term memory) هي نوع من نموذج ال RNN تستخدم لتجنب مشكلة اختفاء الانحدار عبر اضافة بوابات النسيان. +

**29. Reinforcement Learning and Control** ⟶ +
التعلم و التحكم المعزز(Reinforcement Learning) +

**30. The goal of reinforcement learning is for an agent to learn how to evolve in an environment.** ⟶ +
الهدف من التعلم المعزز للعميل الذكي هو التعلم لكيفية التأقلم في اي بيئة. +

**31. Definitions** ⟶ +
تعريفات +

**32. Markov decision processes ― A Markov decision process (MDP) is a 5-tuple (S,A,{Psa},γ,R) where:** ⟶ +
عملية ماركوف لاتخاذ القرار - عملية ماركوف لاتخاذ القرار هي سلسلة خماسية (S,A,{Psa},γ,R) حيث - +

**33. S is the set of states** ⟶ +
S هي مجموعة من حالات البيئة +

**34. A is the set of actions** ⟶ +
A هي مجموعة من حالات الاجراءات -
+
+
+ **35. {Psa} are the state transition probabilities for s∈S and a∈A** ⟶ +
{Psa} هو حالة احتمال الانتقال من الحالة s∈S و a∈A +

**36. γ∈[0,1[ is the discount factor** ⟶ +
γ∈[0,1[ هي عامل الخصم +

**37. R:S×A⟶R or R:S⟶R is the reward function that the algorithm wants to maximize** ⟶ +
R:S×A⟶R or R:S⟶R هي دالة المكافأة والتي تعمل الخوارزمية على جعلها اعلى قيمة +

**38. Policy ― A policy π is a function π:S⟶A that maps states to actions.** ⟶ +
دالة القواعد - دالة القواعد π:S⟶A هي التي تقوم بترجمة الحالات الى اجراءات. +

**39. Remark: we say that we execute a given policy π if given a state s we take the action a=π(s).** ⟶ +
ملاحظة: نقول ان النموذج ينفذ القاعدة المعينه π للحالة المعطاة s ان نتخذ الاجراءa=π(s). +

**40. Value function ― For a given policy π and a given state s, we define the value function Vπ as follows:** ⟶ +
دالة القاعدة - لاي قاعدة معطاة π و حالة s, نقوم بتعريف دالة القيمة Vπ كما يلي: +

**41. Bellman equation ― The optimal Bellman equations characterizes the value function Vπ∗ of the optimal policy π∗:** ⟶ +
معادلة بيلمان - معادلات بيلمان المثلى تشخص دالة القيمة دالة القيمة Vπ∗ π∗:للقاعدة المثلى +

**42. Remark: we note that the optimal policy π∗ for a given state s is such that:** ⟶ +
π∗ للحالة المعطاه s تعطى كاالتالي: ملاحظة: نلاحظ ان القاعدة المثلى +

**43. Value iteration algorithm ― The value iteration algorithm is in two steps:** ⟶ +
خوارزمية تكرار القيمة(Value iteration algorithm) - خوارزمية تكرار القيمة تكون في خطوتين: +

**44. 1) We initialize the value:** ⟶ +
1) نقوم بوضع قيمة اولية: +

**45. 2) We iterate the value based on the values before:** ⟶ +
2) نقوم بتكرير القيمة حسب القيم السابقة: - +

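Entries 43-45 can be illustrated on a tiny made-up MDP with 3 states and 2 actions; the rewards, transition probabilities and γ=0.9 below are assumptions chosen only to show the two steps of the algorithm.

```python
import numpy as np

n_states, n_actions, gamma = 3, 2, 0.9
R = np.array([0.0, 0.0, 1.0])                 # reward per state (assumed)
P = np.array([                                # P[s, a, s']: transition probabilities
    [[0.8, 0.2, 0.0], [0.1, 0.9, 0.0]],
    [[0.5, 0.0, 0.5], [0.0, 0.9, 0.1]],
    [[0.0, 0.0, 1.0], [0.0, 0.0, 1.0]],
])

V = np.zeros(n_states)                        # 1) initialize the value
for _ in range(100):                          # 2) iterate based on the previous values
    V = R + gamma * np.max(P @ V, axis=1)     # Bellman optimality backup
```

After enough sweeps V approaches the value of the optimal policy characterized by the Bellman equation of entry 41, and an optimal policy follows by taking the arg max over actions as in entry 42.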
+ **46. Maximum likelihood estimate ― The maximum likelihood estimates for the state transition probabilities are as follows:** ⟶ +
تقدير الامكانية القصوى - تقديرات الامكانية القصوى (تقدير الاحتمال الأرجح) لحتماليات انتقال الحالة تكون كما يلي : +

**47. times took action a in state s and got to s′** ⟶ +
اوقات تنفيذ الاجراء a في الحالة s و انتقلت الى s' - +

+ **48. times took action a in state s** ⟶ +
اوقات تنفيذ الاجراء a في الحالة s +

**49. Q-learning ― Q-learning is a model-free estimation of Q, which is done as follows:** ⟶ +
التعلم-Q (Q-learning) -هي طريقة غير منمذجة لتقدير Q , و تتم كالاتي: -
+
+
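A model-free sketch of the Q-learning update in entry 49, on a made-up 5-state chain where reaching the right-most state yields a reward of 1; the step size α, the discount γ and the random behaviour policy are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions = 5, 2             # actions: 0 = move left, 1 = move right
alpha, gamma = 0.5, 0.9                # step size and discount factor
Q = np.zeros((n_states, n_actions))

for _ in range(500):                   # episodes generated by a random behaviour policy
    s = rng.integers(n_states - 1)
    while s != n_states - 1:
        a = rng.integers(n_actions)
        s_next = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
        r = 1.0 if s_next == n_states - 1 else 0.0
        # Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
        Q[s, a] += alpha * (r + gamma * np.max(Q[s_next]) - Q[s, a])
        s = s_next
```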
+ **50. View PDF version on GitHub** ⟶ +
قم باستعراض نسخة ال PDF على GitHub +

**51. [Neural Networks, Architecture, Activation function, Backpropagation, Dropout]** ⟶ +
[شبكات عصبونية, البنية , دالة التفعيل , التغذية الخلفية , الاسقاط ] +

**52. [Convolutional Neural Networks, Convolutional layer, Batch normalization]** ⟶ +
[ الشبكة العصبونية الالتفافية , طبقة التفافية , تنظيم الحزمة ] +

**53. [Recurrent Neural Networks, Gates, LSTM]** ⟶ +
[الشبكة العصبونية التكرارية , البوابات , LSTM] +

**54. [Reinforcement learning, Markov decision processes, Value/policy iteration, Approximate dynamic programming, Policy search]** ⟶ +
[التعلم المعزز , عملية ماركوف لاتخاذ القرار , تكرير القيمة / القاعدة , بحث القاعدة] +
From bc9fc2f7fb2aa66b6c5498d77322ad030d4dd80b Mon Sep 17 00:00:00 2001 From: qunaieer Date: Wed, 6 Nov 2019 18:53:03 +0300 Subject: [PATCH 455/531] Name Correction The corrected name is: Fares Al-Qunaieer --- CONTRIBUTORS | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/CONTRIBUTORS b/CONTRIBUTORS index a4e0ac689..b9bc05925 100644 --- a/CONTRIBUTORS +++ b/CONTRIBUTORS @@ -6,17 +6,17 @@ Amjad Khatabi (review of linear algebra) Mazen Melibari (review of linear algebra) - Fares Al-Quaneier (translation of machine learning tips and tricks) + Fares Al-Qunaieer (translation of machine learning tips and tricks) Zaid Alyafeai (review of machine learning tips and tricks) Mahmoud Aslan (translation of probabilities and statistics) - Fares Al-Quaneier (review of probabilities and statistics) + Fares Al-Qunaieer (review of probabilities and statistics) - Fares Al-Quaneier (translation of supervised learning) + Fares Al-Qunaieer (translation of supervised learning) Zaid Alyafeai (review of supervised learning) Redouane Lguensat (translation of unsupervised learning) - Fares Al-Quaneier (review of unsupervised learning) + Fares Al-Qunaieer (review of unsupervised learning) --de From ed0523d89d8dc34d05f6f00aa31d70605320909a Mon Sep 17 00:00:00 2001 From: Shervine Amidi Date: Sat, 9 Nov 2019 10:35:53 -0800 Subject: [PATCH 456/531] Template fix --- template/cs-230-deep-learning-tips-and-tricks.md | 13 +++---------- 1 file changed, 3 insertions(+), 10 deletions(-) diff --git a/template/cs-230-deep-learning-tips-and-tricks.md b/template/cs-230-deep-learning-tips-and-tricks.md index 75127ac5d..e1778de36 100644 --- a/template/cs-230-deep-learning-tips-and-tricks.md +++ b/template/cs-230-deep-learning-tips-and-tricks.md @@ -268,8 +268,7 @@
-**39. Learning rate ― The learning rate, often noted α or sometimes η, indicates at which pace the weights get updated. It can be fixed or adaptively changed. The current most popular method is called Adam, which is a method that adapts the learning rate. -** +**39. Learning rate ― The learning rate, often noted α or sometimes η, indicates at which pace the weights get updated. It can be fixed or adaptively changed. The current most popular method is called Adam, which is a method that adapts the learning rate.** ⟶ @@ -346,13 +345,7 @@
-**50. [LASSO, Ridge, Elastic Net]** - -⟶ - -
- -**50 bis. Shrinks coefficients to 0, Good for variable selection, Makes coefficients smaller, Tradeoff between variable selection and small coefficients]** +**50. [LASSO, Ridge, Elastic Net, Shrinks coefficients to 0, Good for variable selection, Makes coefficients smaller, Tradeoff between variable selection and small coefficients]** ⟶ @@ -421,7 +414,7 @@
-**60. The Deep Learning cheatsheets are now available in [target language]. +**60. The Deep Learning cheatsheets are now available in [target language].** ⟶ From e91a6d3e862f98b68fa897cf973b5e0adf2c22ec Mon Sep 17 00:00:00 2001 From: Shervine Amidi Date: Sat, 9 Nov 2019 13:59:50 -0800 Subject: [PATCH 457/531] Small template fixes --- template/cs-229-supervised-learning.md | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/template/cs-229-supervised-learning.md b/template/cs-229-supervised-learning.md index d82685e6e..9a0a1901a 100644 --- a/template/cs-229-supervised-learning.md +++ b/template/cs-229-supervised-learning.md @@ -242,19 +242,19 @@
-**41: The goal of support vector machines is to find the line that maximizes the minimum distance to the line.** +**41. The goal of support vector machines is to find the line that maximizes the minimum distance to the line.** ⟶
-**42: Optimal margin classifier ― The optimal margin classifier h is such that:** +**42. Optimal margin classifier ― The optimal margin classifier h is such that:** ⟶
-**43: where (w,b)∈Rn×R is the solution of the following optimization problem:** +**43. where (w,b)∈Rn×R is the solution of the following optimization problem:** ⟶ @@ -476,13 +476,13 @@
-**80. Probably Approximately Correct (PAC) ― PAC is a framework under which numerous results on learning theory were proved, and has the following set of assumptions: ** +**80. Probably Approximately Correct (PAC) ― PAC is a framework under which numerous results on learning theory were proved, and has the following set of assumptions:** ⟶
-**81: the training and testing sets follow the same distribution ** +**81: the training and testing sets follow the same distribution** ⟶ From d5c36330a3a8d3acec05cbf63400c4277dd611a0 Mon Sep 17 00:00:00 2001 From: tuananhhedspibk Date: Sun, 10 Nov 2019 22:59:39 +0900 Subject: [PATCH 458/531] vi translate for unsupervised-learning --- vi/cs-229-unsupervised-learning.md | 344 +++++++++++++++++++++++++++++ 1 file changed, 344 insertions(+) create mode 100644 vi/cs-229-unsupervised-learning.md diff --git a/vi/cs-229-unsupervised-learning.md b/vi/cs-229-unsupervised-learning.md new file mode 100644 index 000000000..fb5615793 --- /dev/null +++ b/vi/cs-229-unsupervised-learning.md @@ -0,0 +1,344 @@ +**Unsupervised Learning translation** [[webpage]](https://stanford.edu/~shervine/teaching/cs-229/cheatsheet-unsupervised-learning) + +
+ +**1. Unsupervised Learning cheatsheet** + +⟶ Cheatsheet học không giám sát + +
+ +**2. Introduction to Unsupervised Learning** + +⟶ Giới thiệu về học không giám sát + +
+ +**3. Motivation ― The goal of unsupervised learning is to find hidden patterns in unlabeled data {x(1),...,x(m)}.** + +⟶ Động lực ― Mục tiêu của học không giám sát là tìm được mẫu ẩn (hidden pattern) trong tập dữ liệu không được gán nhãn {x(1),...,x(m)}. + +
+ +**4. Jensen's inequality ― Let f be a convex function and X a random variable. We have the following inequality:** + +⟶ Bất đẳng thức Jensen - Cho f là một hàm lồi và X là một biến ngẫu nhiên. Chúng ta có bất đẳng thức sau: + +
+ +**5. Clustering** + +⟶ Phân cụm + +
+ +**6. Expectation-Maximization** + +⟶ Tối đa hoá kì vọng + +
+ +**7. Latent variables ― Latent variables are hidden/unobserved variables that make estimation problems difficult, and are often denoted z. Here are the most common settings where there are latent variables:** + +⟶ Các biến Latent - Các biến Latent là các biến ẩn/không thấy được khiến cho việc ước lượng trở nên khó khăn, và thường được kí hiệu là z. Đây là các thiết lập phổ biến mà các biến latent thường có: + +
+ +**8. [Setting, Latent variable z, Comments]** + +⟶ [Thiết lập, Biến Latent z, Các bình luận] + +
+ +**9. [Mixture of k Gaussians, Factor analysis]** + +⟶ + +
+ +**10. Algorithm ― The Expectation-Maximization (EM) algorithm gives an efficient method at estimating the parameter θ through maximum likelihood estimation by repeatedly constructing a lower-bound on the likelihood (E-step) and optimizing that lower bound (M-step) as follows:** + +⟶ + +
+ +**11. E-step: Evaluate the posterior probability Qi(z(i)) that each data point x(i) came from a particular cluster z(i) as follows:** + +⟶ + +
+ +**12. M-step: Use the posterior probabilities Qi(z(i)) as cluster specific weights on data points x(i) to separately re-estimate each cluster model as follows:** + +⟶ + +
+ +**13. [Gaussians initialization, Expectation step, Maximization step, Convergence]** + +⟶ + +
+ +**14. k-means clustering** + +⟶ + +
+ +**15. We note c(i) the cluster of data point i and μj the center of cluster j.** + +⟶ + +
+ +**16. Algorithm ― After randomly initializing the cluster centroids μ1,μ2,...,μk∈Rn, the k-means algorithm repeats the following step until convergence:** + +⟶ + +
+ +**17. [Means initialization, Cluster assignment, Means update, Convergence]** + +⟶ + +
+ +**18. Distortion function ― In order to see if the algorithm converges, we look at the distortion function defined as follows:** + +⟶ + +
+ +**19. Hierarchical clustering** + +⟶ + +
+ +**20. Algorithm ― It is a clustering algorithm with an agglomerative hierarchical approach that build nested clusters in a successive manner.** + +⟶ + +
+ +**21. Types ― There are different sorts of hierarchical clustering algorithms that aims at optimizing different objective functions, which is summed up in the table below:** + +⟶ + +
+ +**22. [Ward linkage, Average linkage, Complete linkage]** + +⟶ + +
+ +**23. [Minimize within cluster distance, Minimize average distance between cluster pairs, Minimize maximum distance of between cluster pairs]** + +⟶ + +
+ +**24. Clustering assessment metrics** + +⟶ + +
+ +**25. In an unsupervised learning setting, it is often hard to assess the performance of a model since we don't have the ground truth labels as was the case in the supervised learning setting.** + +⟶ + +
+ +**26. Silhouette coefficient ― By noting a and b the mean distance between a sample and all other points in the same class, and between a sample and all other points in the next nearest cluster, the silhouette coefficient s for a single sample is defined as follows:** + +⟶ + +
+ +**27. Calinski-Harabaz index ― By noting k the number of clusters, Bk and Wk the between and within-clustering dispersion matrices respectively defined as** + +⟶ + +
+ +**28. the Calinski-Harabaz index s(k) indicates how well a clustering model defines its clusters, such that the higher the score, the more dense and well separated the clusters are. It is defined as follows:** + +⟶ + +
+ +**29. Dimension reduction** + +⟶ + +
+ +**30. Principal component analysis** + +⟶ + +
+ +**31. It is a dimension reduction technique that finds the variance maximizing directions onto which to project the data.** + +⟶ + +
+ +**32. Eigenvalue, eigenvector ― Given a matrix A∈Rn×n, λ is said to be an eigenvalue of A if there exists a vector z∈Rn∖{0}, called eigenvector, such that we have:** + +⟶ + +
+ +**33. Spectral theorem ― Let A∈Rn×n. If A is symmetric, then A is diagonalizable by a real orthogonal matrix U∈Rn×n. By noting Λ=diag(λ1,...,λn), we have:** + +⟶ + +
+ +**34. diagonal** + +⟶ + +
+ +**35. Remark: the eigenvector associated with the largest eigenvalue is called principal eigenvector of matrix A.** + +⟶ + +
+ +**36. Algorithm ― The Principal Component Analysis (PCA) procedure is a dimension reduction technique that projects the data on k +dimensions by maximizing the variance of the data as follows:** + +⟶ + +
+ +**37. Step 1: Normalize the data to have a mean of 0 and standard deviation of 1.** + +⟶ + +
+ +**38. Step 2: Compute Σ=1mm∑i=1x(i)x(i)T∈Rn×n, which is symmetric with real eigenvalues.** + +⟶ + +
+ +**39. Step 3: Compute u1,...,uk∈Rn the k orthogonal principal eigenvectors of Σ, i.e. the orthogonal eigenvectors of the k largest eigenvalues.** + +⟶ + +
+ +**40. Step 4: Project the data on spanR(u1,...,uk).** + +⟶ + +
+ +**41. This procedure maximizes the variance among all k-dimensional spaces.** + +⟶ + +
+ +**42. [Data in feature space, Find principal components, Data in principal components space]** + +⟶ + +
+ +**43. Independent component analysis** + +⟶ + +
+ +**44. It is a technique meant to find the underlying generating sources.** + +⟶ + +
+ +**45. Assumptions ― We assume that our data x has been generated by the n-dimensional source vector s=(s1,...,sn), where si are independent random variables, via a mixing and non-singular matrix A as follows:** + +⟶ + +
+ +**46. The goal is to find the unmixing matrix W=A−1.** + +⟶ + +
+ +**47. Bell and Sejnowski ICA algorithm ― This algorithm finds the unmixing matrix W by following the steps below:** + +⟶ + +
+ +**48. Write the probability of x=As=W−1s as:** + +⟶ + +
+ +**49. Write the log likelihood given our training data {x(i),i∈[[1,m]]} and by noting g the sigmoid function as:** + +⟶ + +
+ +**50. Therefore, the stochastic gradient ascent learning rule is such that for each training example x(i), we update W as follows:** + +⟶ + +
+ +**51. The Machine Learning cheatsheets are now available in [target language].** + +⟶ + +
+ +**52. Original authors** + +⟶ + +
+ +**53. Translated by X, Y and Z** + +⟶ + +
+ +**54. Reviewed by X, Y and Z** + +⟶ + +
+ +**55. [Introduction, Motivation, Jensen's inequality]** + +⟶ + +
+ +**56. [Clustering, Expectation-Maximization, k-means, Hierarchical clustering, Metrics]** + +⟶ + +
+ +**57. [Dimension reduction, PCA, ICA]** + +⟶ From 85ff82ad60f488efaee8ea047b33abbc39d379f8 Mon Sep 17 00:00:00 2001 From: Shervine Amidi Date: Sun, 10 Nov 2019 23:42:07 -0800 Subject: [PATCH 459/531] Rename cheatsheet-supervised-learning.md to cs-229-supervised-learning.md --- ...sheet-supervised-learning.md => cs-229-supervised-learning.md} | 0 1 file changed, 0 insertions(+), 0 deletions(-) rename vi/{cheatsheet-supervised-learning.md => cs-229-supervised-learning.md} (100%) diff --git a/vi/cheatsheet-supervised-learning.md b/vi/cs-229-supervised-learning.md similarity index 100% rename from vi/cheatsheet-supervised-learning.md rename to vi/cs-229-supervised-learning.md From 0423bd24b0498f9781f2bd07fa69347745035b9b Mon Sep 17 00:00:00 2001 From: tuananhhedspibk Date: Mon, 11 Nov 2019 22:48:41 +0900 Subject: [PATCH 460/531] vi translate for unsupervised learning --- vi/cs-229-unsupervised-learning.md | 16 ++++++++-------- 1 file changed, 8 insertions(+), 8 deletions(-) diff --git a/vi/cs-229-unsupervised-learning.md b/vi/cs-229-unsupervised-learning.md index fb5615793..8b7cedb58 100644 --- a/vi/cs-229-unsupervised-learning.md +++ b/vi/cs-229-unsupervised-learning.md @@ -52,49 +52,49 @@ **9. [Mixture of k Gaussians, Factor analysis]** -⟶ +⟶ [Sự kết hợp của k Gaussians, Phân tích hệ số]
**10. Algorithm ― The Expectation-Maximization (EM) algorithm gives an efficient method at estimating the parameter θ through maximum likelihood estimation by repeatedly constructing a lower-bound on the likelihood (E-step) and optimizing that lower bound (M-step) as follows:** -⟶ +⟶ Thuật toán - Thuật toán tối đa hoá kì vọng (EM) mang lại một phương thức có hiệu quả trong việc ước lượng tham số θ thông qua tối đa hoá giá trị ước lượng likelihood bằng cách lặp lại việc tạo nên một cận dưới cho likelihood (E-step) và tối ưu hoá cận dưới (M-step) như sau:
**11. E-step: Evaluate the posterior probability Qi(z(i)) that each data point x(i) came from a particular cluster z(i) as follows:** -⟶ +⟶ E-step: Đánh giá xác suất hậu nghiệm Qi(z(i)) cho mỗi điểm dữ liệu x(i) đến từ một cụm z(i) cụ thể như sau:
**12. M-step: Use the posterior probabilities Qi(z(i)) as cluster specific weights on data points x(i) to separately re-estimate each cluster model as follows:** -⟶ +⟶ M-step: Sử dụng xác suất hậu nghiệm Qi(z(i)) như các trọng số cụ thể của cụm trên các điểm dữ liệu x(i) để ước lượng lại một cách riêng biệt cho mỗi mô hình cụm như sau:
**13. [Gaussians initialization, Expectation step, Maximization step, Convergence]** -⟶ +⟶ [Khởi tạo Gaussians, Bước kì vọng, Bước tối đa hoá, Hội tụ]
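As an illustration of entries 10-12, here is a compact EM loop for a mixture of two 1-D Gaussians; the synthetic data and the initial guesses for the weights, means and variances are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-2, 1, 300), rng.normal(3, 1, 200)])

phi = np.array([0.5, 0.5])              # mixture weights
mu = np.array([-1.0, 1.0])              # initial means
var = np.array([1.0, 1.0])              # initial variances

for _ in range(50):
    # E-step: posterior probability Q_i(z) that each x_i came from each Gaussian
    pdf = np.exp(-(x[:, None] - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)
    w = phi * pdf
    w /= w.sum(axis=1, keepdims=True)
    # M-step: re-estimate each Gaussian using the posteriors as weights
    Nk = w.sum(axis=0)
    phi = Nk / len(x)
    mu = (w * x[:, None]).sum(axis=0) / Nk
    var = (w * (x[:, None] - mu) ** 2).sum(axis=0) / Nk
```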
**14. k-means clustering** -⟶ +⟶ Phân cụm k-means
**15. We note c(i) the cluster of data point i and μj the center of cluster j.** -⟶ +⟶ Chúng ta kí hiệu c(i) là cụm của điểm dữ liệu i và μj là điểm trung tâm của cụm j.
**16. Algorithm ― After randomly initializing the cluster centroids μ1,μ2,...,μk∈Rn, the k-means algorithm repeats the following step until convergence:** -⟶ +⟶ Thuật toán - Sau khi khởi tạo ngẫu nhiên các tâm của cụm (centroids) μ1,μ2,...,μk∈Rn, thuật toán k-means lặp lại bước sau cho đến khi hội tụ:
From cc285264ebd7be61dc60ea483c746c38651c9c42 Mon Sep 17 00:00:00 2001 From: tuananhhedspibk Date: Tue, 12 Nov 2019 22:45:35 +0900 Subject: [PATCH 461/531] vi translate for unsupervised learning --- vi/cs-229-unsupervised-learning.md | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/vi/cs-229-unsupervised-learning.md b/vi/cs-229-unsupervised-learning.md index 8b7cedb58..7e4f08041 100644 --- a/vi/cs-229-unsupervised-learning.md +++ b/vi/cs-229-unsupervised-learning.md @@ -100,31 +100,31 @@ **17. [Means initialization, Cluster assignment, Means update, Convergence]** -⟶ +⟶ [Khởi tạo giá trị trung bình, Gán cụm, Cập nhật giá trị trung bình, Hội tụ]
**18. Distortion function ― In order to see if the algorithm converges, we look at the distortion function defined as follows:** -⟶ +⟶ Hàm Distortion - Để nhận biết khi nào thuật toán hội tụ, chúng ta sẽ xem xét hàm distortion được định nghĩa như sau:
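Entries 16-18 can be summarized in a few lines of NumPy; the two synthetic blobs, the choice k=2 and the fixed number of iterations are assumptions made for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (100, 2)), rng.normal(5, 1, (100, 2))])
k = 2
mu = X[rng.choice(len(X), k, replace=False)]   # random centroid initialization

for _ in range(20):
    d = np.linalg.norm(X[:, None, :] - mu[None, :, :], axis=2)
    c = d.argmin(axis=1)                       # cluster assignment step
    mu = np.array([X[c == j].mean(axis=0) for j in range(k)])  # means update
    J = ((X - mu[c]) ** 2).sum()               # distortion, non-increasing over iterations
```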
**19. Hierarchical clustering** -⟶ +⟶ Hierarchical clustering
**20. Algorithm ― It is a clustering algorithm with an agglomerative hierarchical approach that build nested clusters in a successive manner.** -⟶ +⟶ Thuật toán - Là một thuật toán phân cụm với cách tiếp cận phân cấp kết tập, cách tiếp cận này sẽ xây dựng các cụm lồng nhau theo một quy tắc nối tiếp.
**21. Types ― There are different sorts of hierarchical clustering algorithms that aims at optimizing different objective functions, which is summed up in the table below:** -⟶ +⟶ Các loại - Các loại thuật toán hierarchical clustering khác nhau với mục tiêu là tối ưu hoá các hàm đối tượng khác nhau sẽ được tổng kết trong bảng dưới đây:
From ba0580783bfabc394070d0b4fb7d256b276a7279 Mon Sep 17 00:00:00 2001 From: Nerd <43769314+tt-anh-eole@users.noreply.github.com> Date: Wed, 13 Nov 2019 18:12:36 +0900 Subject: [PATCH 462/531] vi translate unsupervised learning --- vi/cs-229-unsupervised-learning.md | 72 +++++++++++++++--------------- 1 file changed, 36 insertions(+), 36 deletions(-) diff --git a/vi/cs-229-unsupervised-learning.md b/vi/cs-229-unsupervised-learning.md index 7e4f08041..e12309283 100644 --- a/vi/cs-229-unsupervised-learning.md +++ b/vi/cs-229-unsupervised-learning.md @@ -130,215 +130,215 @@ **22. [Ward linkage, Average linkage, Complete linkage]** -⟶ +⟶ [Liên kết Ward, Liên kết trung bình, Liên kết hoàn chỉnh]
**23. [Minimize within cluster distance, Minimize average distance between cluster pairs, Minimize maximum distance of between cluster pairs]** -⟶ +⟶ [Tối thiểu hoá trong phạm vi khoảng cách của một cụm, Tối thiểu hoá khoảng cách trung bình giữa các cặp cụm, Tối thiểu hoá khoảng cách tối đa giữa các cặp cụm]
**24. Clustering assessment metrics** -⟶ +⟶ Các số liệu đánh giá phân cụm
**25. In an unsupervised learning setting, it is often hard to assess the performance of a model since we don't have the ground truth labels as was the case in the supervised learning setting.** -⟶ +⟶ Trong quá trình thiết lập học không giám sát, khá khó khăn để đánh giá hiệu năng của một mô hình vì chúng ta không có các nhãn đủ tin cậy như trong trường hợp của học có giám sát.
**26. Silhouette coefficient ― By noting a and b the mean distance between a sample and all other points in the same class, and between a sample and all other points in the next nearest cluster, the silhouette coefficient s for a single sample is defined as follows:** -⟶ +⟶ Hệ số Silhouette - Bằng việc kí hiệu a và b là khoảng cách trung bình giữa một điểm mẫu với các điểm khác trong cùng một lớp, và giữa một điểm mẫu với các điểm khác thuộc cụm kế cận gần nhất, hệ số silhouette s đối với một điểm mẫu đơn được định nghĩa như sau:
**27. Calinski-Harabaz index ― By noting k the number of clusters, Bk and Wk the between and within-clustering dispersion matrices respectively defined as** -⟶ +⟶ Chỉ số Calinski-Harabaz - Bằng việc kí hiệu k là số cụm, các chỉ số Bk và Wk về độ phân tán giữa và trong một cụm lần lượt được định nghĩa như là
**28. the Calinski-Harabaz index s(k) indicates how well a clustering model defines its clusters, such that the higher the score, the more dense and well separated the clusters are. It is defined as follows:** -⟶ +⟶ Chỉ số Calinski-Harabaz s(k) cho biết một mô hình phân cụm xác định các cụm của nó tốt đến đâu, theo đó điểm số càng cao thì các cụm càng dày đặc và càng được phân tách tốt. Nó được định nghĩa như sau: 
**29. Dimension reduction** -⟶ +⟶ Giảm số chiều dữ liệu
**30. Principal component analysis** -⟶ +⟶ Principal component analysis
**31. It is a dimension reduction technique that finds the variance maximizing directions onto which to project the data.** -⟶ +⟶ Là một kĩ thuật giảm số chiều dữ liệu, kĩ thuật này sẽ tìm các hướng tối đa hoá phương sai để chiếu dữ liệu trên đó.
**32. Eigenvalue, eigenvector ― Given a matrix A∈Rn×n, λ is said to be an eigenvalue of A if there exists a vector z∈Rn∖{0}, called eigenvector, such that we have:** -⟶ +⟶ Giá trị riêng, vector riêng - Cho ma trận A∈Rn×n, λ là giá trị riêng của A nếu tồn tại một vector z∈Rn∖{0}, gọi là vector riêng, mà ta có như sau:
**33. Spectral theorem ― Let A∈Rn×n. If A is symmetric, then A is diagonalizable by a real orthogonal matrix U∈Rn×n. By noting Λ=diag(λ1,...,λn), we have:** -⟶ +⟶ Định lý Spectral - Với A∈Rn×n. Nếu A đối xứng thì A có thể chéo hoá bởi một ma trận trực giao U∈Rn×n. Bằng việc kí hiệu Λ=diag(λ1,...,λn), ta có:
**34. diagonal** -⟶ +⟶ đường chéo
**35. Remark: the eigenvector associated with the largest eigenvalue is called principal eigenvector of matrix A.** -⟶ +⟶ Chú thích: vector riêng tương ứng với giá trị riêng lớn nhất được gọi là vector riêng chính của ma trận A.
**36. Algorithm ― The Principal Component Analysis (PCA) procedure is a dimension reduction technique that projects the data on k dimensions by maximizing the variance of the data as follows:** -⟶ +⟶ Thuật toán - Principal Component Analysis (PCA) là một kĩ thuật giảm số chiều dữ liệu, nó sẽ chiếu dữ liệu lên k chiều bằng cách tối đa hoá phương sai của dữ liệu như sau:
**37. Step 1: Normalize the data to have a mean of 0 and standard deviation of 1.** -⟶ +⟶ Bước 1: Chuẩn hoá dữ liệu để có giá trị trung bình bằng 0 và độ lệch chuẩn bằng 1.
**38. Step 2: Compute Σ=1mm∑i=1x(i)x(i)T∈Rn×n, which is symmetric with real eigenvalues.** -⟶ +⟶ Bước 2: Tính Σ=1mm∑i=1x(i)x(i)T∈Rn×n, là đối xứng với các giá trị riêng thực.
**39. Step 3: Compute u1,...,uk∈Rn the k orthogonal principal eigenvectors of Σ, i.e. the orthogonal eigenvectors of the k largest eigenvalues.** -⟶ +⟶ Bước 3: Tính u1,...,uk∈Rn là k vector riêng trực giao của Σ, tức các vector trực giao riêng của k giá trị riêng lớn nhất.
**40. Step 4: Project the data on spanR(u1,...,uk).** -⟶ +⟶ Bước 4: Chiếu dữ liệu lên spanR(u1,...,uk).
**41. This procedure maximizes the variance among all k-dimensional spaces.** -⟶ +⟶ Thủ tục này tối đa hoá phương sai giữa các không gian k-chiều.
**42. [Data in feature space, Find principal components, Data in principal components space]** -⟶ +⟶ [Dữ liệu trong không gian đặc trưng, Tìm các thành phần chính, Dữ liệu trong không gian các thành phần chính]
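A direct transcription of steps 1-4 above; the 200×5 data matrix and the choice k=2 are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
k = 2

X = (X - X.mean(axis=0)) / X.std(axis=0)       # step 1: normalize the data
Sigma = X.T @ X / len(X)                       # step 2: symmetric, real eigenvalues
eigval, eigvec = np.linalg.eigh(Sigma)
U = eigvec[:, np.argsort(eigval)[::-1][:k]]    # step 3: top-k principal eigenvectors
Z = X @ U                                      # step 4: project onto span(u1,...,uk)
```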
**43. Independent component analysis** -⟶ +⟶ Independent component analysis
**44. It is a technique meant to find the underlying generating sources.** -⟶ +⟶ Là một kĩ thuật tìm các nguồn tạo cơ bản.
**45. Assumptions ― We assume that our data x has been generated by the n-dimensional source vector s=(s1,...,sn), where si are independent random variables, via a mixing and non-singular matrix A as follows:** -⟶ +⟶ Giả định - Chúng ta giả sử rằng dữ liệu x của chúng ta được tạo ra bởi vector nguồn n-chiều s=(s1,...,sn), với si là các biến ngẫu nhiên độc lập, thông qua một ma trận mixing và non-singular A như sau:
**46. The goal is to find the unmixing matrix W=A−1.** -⟶ +⟶ Mục tiêu là tìm ma trận unmixing W=A−1.
**47. Bell and Sejnowski ICA algorithm ― This algorithm finds the unmixing matrix W by following the steps below:** -⟶ +⟶ Giải thuật Bell và Sejnowski ICA - Giải thuật này tìm ma trận unmixing W bằng các bước dưới đây:
**48. Write the probability of x=As=W−1s as:** -⟶ +⟶ Ghi xác suất của x=As=W−1s như là:
**49. Write the log likelihood given our training data {x(i),i∈[[1,m]]} and by noting g the sigmoid function as:** -⟶ +⟶ Ghi log likelihood cho dữ liệu huấn luyện {x(i),i∈[[1,m]]} của chúng ta và bằng cách kí hiệu g là hàm sigmoid như là:
**50. Therefore, the stochastic gradient ascent learning rule is such that for each training example x(i), we update W as follows:** -⟶ +⟶ Vì thế, quy tắc học của stochastic gradient ascent là cho mỗi ví dụ huấn luyện x(i), chúng ta cập nhật W như sau:
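A sketch of the stochastic gradient-ascent update of entries 47-50, written in its usual form W ← W + α((1−2g(Wx))xᵀ + (Wᵀ)⁻¹); the two Laplace-distributed sources, the mixing matrix A and the learning rate α are assumptions used only to exercise the rule.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 2, 5000
s = rng.laplace(size=(m, n))                # independent, non-Gaussian sources
A = np.array([[1.0, 0.5], [0.3, 1.0]])      # mixing matrix (assumed)
x = s @ A.T                                 # observations x = A s

def g(z):                                   # sigmoid
    return 1 / (1 + np.exp(-z))

W = np.eye(n)                               # estimate of the unmixing matrix
alpha = 0.001

for x_i in x:                               # one stochastic update per example x(i)
    W += alpha * (np.outer(1 - 2 * g(W @ x_i), x_i) + np.linalg.inv(W.T))
```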
**51. The Machine Learning cheatsheets are now available in [target language].** -⟶ +⟶ Machine Learning cheatsheets hiện đã có bản [tiếng Việt].
**52. Original authors** -⟶ +⟶ Các tác giả
**53. Translated by X, Y and Z** -⟶ +⟶ Được dịch bởi X, Y và Z
**54. Reviewed by X, Y and Z** -⟶ +⟶ Được review bởi X, Y và Z
**55. [Introduction, Motivation, Jensen's inequality]** -⟶ +⟶ [Giới thiệu, Động lực, Bất đẳng thức Jensen]
**56. [Clustering, Expectation-Maximization, k-means, Hierarchical clustering, Metrics]** -⟶ +⟶ [Phân cụm, Tối đa hoá kì vọng, k-means, Hierarchical clustering, Các chỉ số]
**57. [Dimension reduction, PCA, ICA]** -⟶ +⟶ [Giảm số chiều dữ liệu, PCA, ICA] From da08f29db9c97b4b0f16f477ee71c993b2c0a6ab Mon Sep 17 00:00:00 2001 From: Shervine Amidi Date: Fri, 15 Nov 2019 00:29:00 -0800 Subject: [PATCH 463/531] Update [vi] progress --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index 8b6e3a0f7..80baf9c4f 100644 --- a/README.md +++ b/README.md @@ -73,7 +73,7 @@ Please make sure to propose the translation of **only one** cheatsheet per pull |**Русский**|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/21)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/21)|not started|not started|not started|not started| |**Türkçe**|done|done|done|done|done|done| |**Українська**|not started|not started|not started|not started|done|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/95)| -|**Tiếng Việt**|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/159)|done|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/177)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/160)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/175)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/176)| +|**Tiếng Việt**|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/159)|done|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/199)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/160)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/175)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/176)| |**简体中文**|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/12)|done|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/48)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/7)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/73)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/72)| |**繁體中文**|done|done|done|done|done|done| From 0e355cf19c6a80dc4d86aa8b81122904388e6a4e Mon Sep 17 00:00:00 2001 From: tuananhhedspibk Date: Fri, 15 Nov 2019 22:32:13 +0900 Subject: [PATCH 464/531] vi translate for rnn --- vi/cs-230-recurrent-neural-networks.md | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/vi/cs-230-recurrent-neural-networks.md b/vi/cs-230-recurrent-neural-networks.md index 1de42c45d..4bc42fc00 100644 --- a/vi/cs-230-recurrent-neural-networks.md +++ b/vi/cs-230-recurrent-neural-networks.md @@ -74,14 +74,14 @@ **11. Architecture of a traditional RNN ― Recurrent neural networks, also known as RNNs, are a class of neural networks that allow previous outputs to be used as inputs while having hidden states. They are typically as follows:** -⟶ Kiến trúc của một mạng RNN truyền thống - Các mạng neural hồi quy, còn được biến đến như là RNNs, là một lớp của mạng neural cho phép đầu ra của tầng trước được sử dụng như đầu vào của tầng kế tiếp khi có các trạng thái ẩn. Thông thường là như sau: +⟶ Kiến trúc của một mạng RNN truyền thống - Các mạng neural hồi quy, còn được biến đến như là RNNs, là một lớp của mạng neural cho phép đầu ra được sử dụng như đầu vào trong khi có các trạng thái ẩn. Thông thường là như sau:
**12. For each timestep t, the activation a and the output y are expressed as follows:** -⟶ Tại mỗi bước t, hàm activation a và đầu ra y được biểu diễn như sau: +⟶ Tại mỗi bước t, giá trị kích hoạt a và đầu ra y được biểu diễn như sau:
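For reference, the recurrence of entry 12 can be written in a few lines of NumPy; all sizes, weights and inputs below are arbitrary assumptions, and g2 is taken to be the identity.

```python
import numpy as np

rng = np.random.default_rng(0)
n_x, n_a, n_y, T = 3, 5, 2, 4                 # input, hidden, output sizes; timesteps
Wax, Waa = rng.normal(size=(n_a, n_x)), rng.normal(size=(n_a, n_a))
Wya = rng.normal(size=(n_y, n_a))
ba, by = np.zeros((n_a, 1)), np.zeros((n_y, 1))

a = np.zeros((n_a, 1))                        # a<0>
for t in range(T):
    x_t = rng.normal(size=(n_x, 1))           # input at timestep t
    a = np.tanh(Waa @ a + Wax @ x_t + ba)     # a<t> = g1(Waa a<t-1> + Wax x<t> + ba)
    y = Wya @ a + by                          # y<t> = g2(Wya a<t> + by)
```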
@@ -284,7 +284,7 @@ **41. Learning word representation** -⟶ Học thể hiện từ +⟶ Học từ đại diện
@@ -305,7 +305,7 @@ **44. Representation techniques ― The two main ways of representing words are summed up in the table below:** -⟶ Các kĩ thuật biểu diễn - Có hai cách chính của biểu diễn các từ được tổng kết ở bảng bên dưới: +⟶ Các kĩ thuật biểu diễn - Có hai cách chính để biểu diễn từ được tổng kết ở bảng bên dưới:
@@ -558,7 +558,7 @@ Given the symmetry that e and θ play in this model, the final word embedding e( **79. [Case, Root cause, Remedies]** -⟶ [Trường hợp, Nguyên nhân xâu xa, Biện pháp khắc phục] +⟶ [Trường hợp, Nguyên nhân sâu xa, Biện pháp khắc phục]
@@ -600,7 +600,7 @@ Given the symmetry that e and θ play in this model, the final word embedding e( **85. Attention model ― This model allows an RNN to pay attention to specific parts of the input that is considered as being important, which improves the performance of the resulting model in practice. By noting α the amount of attention that the output y should pay to the activation a and c the context at time t, we have:** -⟶ Attention model - Mô hình này cho phép một RNN chú ý lên các phần cụ thể của đầu vào được xem xét là quan trọng, nó giúp cải thiện hiệu năng của mô hình kết quả trong thực tế. Bằng việc kí hiệu α là mức độ chú ý mà đầu ra y nên có đối với hàm kích hoạt a và c là ngữ cảnh ở thời điểm t, chúng ta có: +⟶ Attention model - Mô hình này cho phép một RNN tập trung lên các phần cụ thể của đầu vào được xem xét là quan trọng, nó giúp cải thiện hiệu năng của mô hình kết quả trong thực tế. Bằng việc kí hiệu α là mức độ chú ý mà đầu ra y nên có đối với hàm kích hoạt a và c là ngữ cảnh ở thời điểm t, chúng ta có:
From 24cb255c829c2229c8114de1a762207ec726468d Mon Sep 17 00:00:00 2001 From: tuananhhedspibk Date: Wed, 20 Nov 2019 22:28:06 +0900 Subject: [PATCH 465/531] vi translate for rnn --- vi/cs-230-recurrent-neural-networks.md | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/vi/cs-230-recurrent-neural-networks.md b/vi/cs-230-recurrent-neural-networks.md index 4bc42fc00..91316df26 100644 --- a/vi/cs-230-recurrent-neural-networks.md +++ b/vi/cs-230-recurrent-neural-networks.md @@ -102,7 +102,7 @@ **15. The pros and cons of a typical RNN architecture are summed up in the table below:** -⟶ pros và cons của một kiến trúc RNN thông thường được tổng kết ở bảng dưới đây: +⟶ Ưu và nhược điểm của một kiến trúc RNN thông thường được tổng kết ở bảng dưới đây:
@@ -263,7 +263,7 @@ **38. Remark: the sign ⋆ denotes the element-wise multiplication between two vectors.** -⟶ Chú ý: kí hiệu ⋆ chỉ phép nhân nguyên tố giữa hai vectors. +⟶ Chú ý: kí hiệu ⋆ chỉ phép nhân từng phần tử với nhau giữa hai vectors.
@@ -403,7 +403,7 @@ **57bis. GloVe ― The GloVe model, short for global vectors for word representation, is a word embedding technique that uses a co-occurence matrix X where each Xi,j denotes the number of times that a target i occurred with a context j. Its cost function J is as follows:** -⟶ GloVe - Mô hình GloVe, viết tắt của global vectors for word representation, nó là một kĩ thuật word embedding sử dụng ma trận đồng thời X với mỗi Xi,j là số lần mà target i xảy ra tại ngữ cảnh j. Cost function J của nó như sau: +⟶ GloVe - Mô hình GloVe, viết tắt của global vectors for word representation, nó là một kĩ thuật word embedding sử dụng ma trận đồng xuất hiện X với mỗi Xi,j là số lần mà từ đích (target) i xuất hiện tại ngữ cảnh j. Cost function J của nó như sau:
@@ -600,7 +600,7 @@ Given the symmetry that e and θ play in this model, the final word embedding e( **85. Attention model ― This model allows an RNN to pay attention to specific parts of the input that is considered as being important, which improves the performance of the resulting model in practice. By noting α the amount of attention that the output y should pay to the activation a and c the context at time t, we have:** -⟶ Attention model - Mô hình này cho phép một RNN tập trung lên các phần cụ thể của đầu vào được xem xét là quan trọng, nó giúp cải thiện hiệu năng của mô hình kết quả trong thực tế. Bằng việc kí hiệu α là mức độ chú ý mà đầu ra y nên có đối với hàm kích hoạt a và c là ngữ cảnh ở thời điểm t, chúng ta có: +⟶ Attention model - Mô hình này cho phép một RNN tập trung vào các phần cụ thể của đầu vào được xem xét là quan trọng, nó giúp cải thiện hiệu năng của mô hình kết quả trong thực tế. Bằng việc kí hiệu α là mức độ chú ý mà đầu ra y nên có đối với hàm kích hoạt a và c là ngữ cảnh ở thời điểm t, chúng ta có:
From 7a357edb94dfed69cafd40d2d67b6a1c3afb8266 Mon Sep 17 00:00:00 2001 From: Shervine Amidi Date: Thu, 21 Nov 2019 21:53:07 -0800 Subject: [PATCH 466/531] Add progress [vi] --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index 80baf9c4f..39bed2883 100644 --- a/README.md +++ b/README.md @@ -99,7 +99,7 @@ Please make sure to propose the translation of **only one** cheatsheet per pull |**Русский**|not started|not started|not started| |**Türkçe**|done|done|done| |**Українська**|not started|not started|not started| -|**Tiếng Việt**|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/180)|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/178)| +|**Tiếng Việt**|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/180)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/184)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/178)| |**简体中文**|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/181)|not started| |**繁體中文**|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/196)|not started|not started| From 2477552057753361657165064b4233565543e211 Mon Sep 17 00:00:00 2001 From: Shervine Amidi Date: Fri, 10 Jan 2020 23:28:45 -0800 Subject: [PATCH 467/531] Add fa progress --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index 39bed2883..efdbe8cb1 100644 --- a/README.md +++ b/README.md @@ -39,7 +39,7 @@ Please make sure to propose the translation of **only one** cheatsheet per pull |:---|:---:|:---:|:---:|:---:| |**Deutsch**|not started|not started|not started|not started| |**Español**|not started|not started|not started|not started| -|**فارسی**|not started|not started|not started|not started| +|**فارسی**|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/200)|not started|not started|not started| |**Français**|done|done|done|done| |**עִבְרִית**|not started|not started|not started|not started| |**Italiano**|not started|not started|not started|not started| From 55120622acda2a4ce9269416a9c2dd31331dc198 Mon Sep 17 00:00:00 2001 From: Shervine Amidi Date: Sun, 26 Jan 2020 17:27:30 -0800 Subject: [PATCH 468/531] Delete .DS_Store --- .DS_Store | Bin 6148 -> 0 bytes 1 file changed, 0 insertions(+), 0 deletions(-) delete mode 100644 .DS_Store diff --git a/.DS_Store b/.DS_Store deleted file mode 100644 index 5008ddfcf53c02e82d7eee2e57c38e5672ef89f6..0000000000000000000000000000000000000000 GIT binary patch literal 0 HcmV?d00001 literal 6148 zcmeH~Jr2S!425mzP>H1@V-^m;4Wg<&0T*E43hX&L&p$$qDprKhvt+--jT7}7np#A3 zem<@ulZcFPQ@L2!n>{z**++&mCkOWA81W14cNZlEfg7;MkzE(HCqgga^y>{tEnwC%0;vJ&^%eQ zLs35+`xjp>T0 Date: Thu, 6 Feb 2020 14:51:15 +0800 Subject: [PATCH 469/531] Apply suggestions from code review MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit レビュー確認させていただきました。妥当なSuggestionで勉強になりました、ありがとうございます。 MLTの皆様、一昨年出来心でGithubの方でContributeした幼稚な翻訳ですが丁寧なレビューありがとうございます。こちらの大変遅い対応で日本語版公開の遅れに寄与してしまったことをお詫び申し上げます。 Co-Authored-By: for_tokyo <42432573+for-tokyo@users.noreply.github.com> Co-Authored-By: Yoshiyuki Nakai 中井喜之 <8402782+yoshiyukinakai@users.noreply.github.com> --- ja/cheatsheet-deep-learning.md | 62 +++++++++++++++++----------------- 1 file changed, 31 insertions(+), 31 deletions(-) diff --git a/ja/cheatsheet-deep-learning.md b/ja/cheatsheet-deep-learning.md index 322a9f2cc..50557a63f 100644 --- 
a/ja/cheatsheet-deep-learning.md +++ b/ja/cheatsheet-deep-learning.md @@ -12,13 +12,13 @@ **3. Neural networks are a class of models that are built with layers. Commonly used types of neural networks include convolutional and recurrent neural networks.** -⟶ ニューラルネットワークとは複数の層を用いて組まれる数学モデルです。代表的なネットワークとして畳み込みと再帰型ニューラルネットワークが挙げられます。 +⟶ ニューラルネットワークとは複数の層を用いて構成されたモデルの種類を指します。一般的に利用されるネットワークとして畳み込みニューラルネットワークとリカレントニューラルネットワークが挙げられます。
**4. Architecture ― The vocabulary around neural networks architectures is described in the figure below:** -⟶ 構造 - ニューラルネットワークを組む上で重要な用語は以下の図により説明されます: +⟶ アーキテクチャ - ニューラルネットワークに関する用語は以下の図により説明されます:
@@ -36,49 +36,49 @@ **7. where we note w, b, z the weight, bias and output respectively.** -⟶ この場合重み付けをw、バイアス項をb、出力をzとします。 +⟶ この場合重みをw、バイアス項をb、出力をzとします。
**8. Activation function ― Activation functions are used at the end of a hidden unit to introduce non-linear complexities to the model. Here are the most common ones:** -⟶ 活性化関数 ー ユニットの出力に非線形性を与える関数を活性化関数といいます。一般的には以下の関数がよく使われます: +⟶ 活性化関数 ー 活性化関数はモデルに非線形性を与えるために隠れユニットの最後で利用されます。一般的には以下の関数がよく使われます:
**9. [Sigmoid, Tanh, ReLU, Leaky ReLU]** -⟶ [Sigmoid(シグモイド関数), Tanh(双曲線関数), ReLU(ランプ関数), Leaky ReLU] +⟶ [Sigmoid(シグモイド関数), Tanh(双曲線関数), ReLU(正規化線形ユニット), Leaky ReLU(漏洩ReLU)]
**10. Cross-entropy loss ― In the context of neural networks, the cross-entropy loss L(z,y) is commonly used and is defined as follows:** -⟶ 交差エントロピーロス ー ニューラルネットにおいて交差エントロピーロスL(z,y)は頻繁に使われ、以下のように定義されています: +⟶ 交差エントロピーロス ー ニューラルネットにおいて交差エントロピーロスL(z,y)は一般的に使われ、以下のように定義されています:
**11. Learning rate ― The learning rate, often noted α or sometimes η, indicates at which pace the weights get updated. This can be fixed or adaptively changed. The current most popular method is called Adam, which is a method that adapts the learning rate.** -⟶ 学習率 ー αやηで表される学習率は勾配法による重み付けのアップデートをする速度を表します。学習率は固定または適応的に変更することができます。現在一般的に使われている学習法はAdam(アダム)であり、学習率を適用させる方法です。 +⟶ 学習率 ー 学習率は多くの場合α、しばしばηで表記され、勾配法による重み付けのアップデートをする速度を表します。学習率は固定または適応的に変更することができます。現在一般的に使われている学習法はAdam(アダム)であり、学習率を適用的に変更する方法です。
**12. Backpropagation ― Backpropagation is a method to update the weights in the neural network by taking into account the actual output and the desired output. The derivative with respect to weight w is computed using chain rule and is of the following form:** -⟶ 誤差逆伝播法(backpropagation)ー 誤差逆伝播法はニューラルネットの期待される出力値と実際の出力の差異を考慮し重み付けのアップデートをする方法の一つです。重みwに関する導関数は連鎖規則を使用して計算され、次の形式で表される: +⟶ 誤差逆伝播法(backpropagation)ー 誤差逆伝播法はニューラルネットにおいて実際の出力と期待される出力との差異を考慮して重みを更新する方法の一つです。重みwに関する導関数は連鎖律を使用して計算され、次の形式で表されます:
**13. As a result, the weight is updated as follows:** -⟶ 結果、重みは以下のようにアップデートされます: +⟶ 結果として、重みは以下のように更新されます:
**14. Updating weights ― In a neural network, weights are updated as follows:** -⟶ 重みアップデート ー ニューラルネットでは以下のように重みがアップデートされます: +⟶ 重みの更新 ー ニューラルネットでは以下のように重みが更新されます:
@@ -90,25 +90,25 @@ **16. Step 2: Perform forward propagation to obtain the corresponding loss.** -⟶ ステップ2: フォワードプロパゲーションを行い誤差を求める。 +⟶ ステップ2: 順伝播を行いそれに対する誤差を求める。
**17. Step 3: Backpropagate the loss to get the gradients.** -⟶ 求められた誤差を用い、傾斜を計算する。 +⟶ 誤差を逆伝播し、勾配を計算する。
**18. Step 4: Use the gradients to update the weights of the network.** -⟶ 傾斜を使い誤差が小さくなるように重みを調整する。 +⟶ 勾配を使いネットワークの重みを更新する。
**19. Dropout ― Dropout is a technique meant at preventing overfitting the training data by dropping out units in a neural network. In practice, neurons are either dropped with probability p or kept with probability 1−p** -⟶ドロップアウト ー ドロップアウトはニューラルネット内の一部のユニットを非活性化させることにより過学習を防ぐテクニックである。実際には、ニューロンはある確率pで非活性、1-pの確率で活性化されるようになってる。 +⟶ドロップアウト ー ドロップアウトはニューラルネット内の一部のユニットを無効にすることで学習データへの過学習を防ぐテクニックです。実際には、ニューロンは確率pで無効、確率1-pで有効のどちらかになります。
@@ -120,13 +120,13 @@ **21. Convolutional layer requirement ― By noting W the input volume size, F the size of the convolutional layer neurons, P the amount of zero padding, then the number of neurons N that fit in a given volume is such that:** -⟶ 畳み込みレイヤーの条件 ー Wを入力サイズ、Fを畳み込みレイヤーニューロンのサイズ、Pをゼロパディングの量とすると、与えられた体積に収まるニューロン数Nは次のようになります。 +⟶ 畳み込みレイヤーの条件 ー Wを入力サイズ、Fを畳み込み層のニューロンのサイズ、Pをゼロ埋めの量とすると、これらに対応するニューロンの数Nは次のようになります。
**22. Batch normalization ― It is a step of hyperparameter γ,β that normalizes the batch {xi}. By noting μB,σ2B the mean and variance of that we want to correct to the batch, it is done as follows:** -⟶ バッチ正規化 ー バッチ{xi}を正規化するハイパーパラメータγ、βのステップです。バッチに修正したい平均値と分散値をμB,σ2Bとすると、正規化は以下のように行われます: +⟶ バッチ正規化 ー バッチ{xi}を正規化するハイパーパラメータγ、βのステップです。補正を行うバッチの平均と分散をμB,σ2Bとすると、正規化は以下のように行われます:
@@ -138,7 +138,7 @@ **24. Recurrent Neural Networks** -⟶ 再帰型ニューラルネットワーク (RNN) +⟶ リカレントニューラルネットワーク (RNN)
@@ -168,7 +168,7 @@ **29. Reinforcement Learning and Control** -⟶ 強化学習とコントロール +⟶ 強化学習と制御
@@ -192,13 +192,13 @@ **33. S is the set of states** -⟶ Sは状態の有限集合 +⟶ Sは状態の集合
**34. A is the set of actions** -⟶ Aは行動の有限集合 +⟶ Aは行動の集合
@@ -210,7 +210,7 @@ **36. γ∈[0,1[ is the discount factor** -⟶ γ∈[0,1[は割引因子と呼ばれる値 +⟶ γ∈[0,1[は割引因子
@@ -222,31 +222,31 @@ **38. Policy ― A policy π is a function π:S⟶A that maps states to actions.** -⟶ 政策 - 政策πは状態と行動を写像する関数π:S⟶A +⟶ 方策 - 方策πは状態を行動に写像する関数π:S⟶A
**39. Remark: we say that we execute a given policy π if given a state s we take the action a=π(s).** -⟶ 備考: 状態sを与えられた際に行動a=π(s)を行うことを政策πを実行すると言う。 +⟶ 備考: 状態sを与えられた際に行動a=π(s)を行うことを方策πを実行すると言います。
**40. Value function ― For a given policy π and a given state s, we define the value function Vπ as follows:** -⟶ 価値関数 - ある政策πとある状態sにおいて価値関数Vπを以下のように定義する: +⟶ 価値関数 - ある方策πとある状態sにおける価値関数Vπを以下のように定義します:
**41. Bellman equation ― The optimal Bellman equations characterizes the value function Vπ∗ of the optimal policy π∗:** -⟶ ベルマン方程式 - 政策πをとった価値関数Vπ∗に対する最適なベルマン方程式: +⟶ ベルマン方程式 - 最適ベルマン方程式は最適方策π∗の価値関数Vπ∗で記述されます:
**42. Remark: we note that the optimal policy π∗ for a given state s is such that:** -⟶ 備考: 与えられた状態sに対する最適方針π*はこのようになります: +⟶ 備考: 与えられた状態sに対する最適方策π*はこのようになります:
@@ -258,7 +258,7 @@ **44. 1) We initialize the value:** -⟶ 1) 値を初期化する。 +⟶ 1) 値を初期化する:
@@ -288,7 +288,7 @@ **49. Q-learning ― Q-learning is a model-free estimation of Q, which is done as follows:** -⟶ Q学習 ー Q学習は数学モデルを使わないQ値の評価手法であり、以下のように行われる: +⟶ Q学習 ― Q学習はモデルフリーのQ値の推定であり、以下のように行われます:
@@ -306,16 +306,16 @@ **52. [Convolutional Neural Networks, Convolutional layer, Batch normalization]** -⟶ [畳み込みニューラルネットワーク, 畳み込み層, バッチノーマライゼーション] +⟶ [畳み込みニューラルネットワーク, 畳み込み層, バッチ正規化]
**53. [Recurrent Neural Networks, Gates, LSTM]** -⟶ [再帰型ニューラルネットワーク, ゲート, LSTM] +⟶ [リカレントニューラルネットワーク, ゲート, LSTM]
**54. [Reinforcement learning, Markov decision processes, Value/policy iteration, Approximate dynamic programming, Policy search]** -⟶ [強化学習, マルコフ決定過程, バリュー/ポリシー反復, 近似動的計画法, ポリシーサーチ] +⟶ [強化学習, マルコフ決定過程, 価値/方策反復, 近似動的計画法, 方策探索] From 7b16f911e402d2ea9c29baeebac2a1881d858d26 Mon Sep 17 00:00:00 2001 From: Shervine Amidi Date: Sun, 9 Feb 2020 00:02:50 -0800 Subject: [PATCH 470/531] Update ja progress --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index efdbe8cb1..b97bd9fb6 100644 --- a/README.md +++ b/README.md @@ -66,7 +66,7 @@ Please make sure to propose the translation of **only one** cheatsheet per pull |**Magyar**|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/124)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/124)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/124)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/124)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/124)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/124)| |**Bahasa Indonesia**|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/154)|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/139)|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/151)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/150)| |**Italiano**|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/78)|not started|not started|not started|not started|not started| -|**日本語**|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/96)|done|done|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/99)|done|done| +|**日本語**|done|done|done|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/99)|done|done| |**한국어**|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/80)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/90)|done|done|done|done| |**Polski**|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/8)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/8)|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/8)|not started|not started| |**Português**|done|done|done|done|done|done| From 8dea07b7b33e911c5d3710893603901c2dbb2bc9 Mon Sep 17 00:00:00 2001 From: Shervine Amidi Date: Sun, 9 Feb 2020 00:10:32 -0800 Subject: [PATCH 471/531] Add ja contributors --- CONTRIBUTORS | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/CONTRIBUTORS b/CONTRIBUTORS index b9bc05925..eff44e7e5 100644 --- a/CONTRIBUTORS +++ b/CONTRIBUTORS @@ -109,6 +109,11 @@ Yoshiyuki Nakai (review of convolutional neural networks) Linh Dang (review of convolutional neural networks) + Taichi Kato (translation of deep learning) + Dan Lillrank (review of deep learning) + Yoshiyuki Nakai (review of deep learning) + Yuki Tokyo (review of deep learning) + Kamuela Lau (translation of deep learning tips and tricks) Yoshiyuki Nakai (review of deep learning tips and tricks) Hiroki Mori (review of deep learning tips and tricks) From 9cc2607ec3b2e38d2e5a14a00cdc15cb36c05f1f Mon Sep 17 00:00:00 2001 From: aepiotti Date: Wed, 19 Feb 2020 10:03:51 +0100 Subject: [PATCH 472/531] Initial commit [it] cs-229-linear-algebra.md --- it/cs-229-linear-algebra.md | 343 ++++++++++++++++++++++++++++++++++++ 1 file changed, 343 insertions(+) create 
mode 100644 it/cs-229-linear-algebra.md diff --git a/it/cs-229-linear-algebra.md b/it/cs-229-linear-algebra.md new file mode 100644 index 000000000..dced85397 --- /dev/null +++ b/it/cs-229-linear-algebra.md @@ -0,0 +1,343 @@ +**Linear Algebra and Calculus translation** [[webpage]](https://stanford.edu/~shervine/teaching/cs-229/refresher-algebra-calculus) + +
+ +**1. Linear Algebra and Calculus refresher** + +⟶ + +
+ +**2. General notations** + +⟶ + +
+ +**3. Definitions** + +⟶ + +
+ +**4. Vector ― We note x∈Rn a vector with n entries, where xi∈R is the ith entry:** + +⟶ + +
+ +**5. Matrix ― We note A∈Rm×n a matrix with m rows and n columns, where Ai,j∈R is the entry located in the ith row and jth column:** + +⟶ + +
+ +**6. Remark: the vector x defined above can be viewed as a n×1 matrix and is more particularly called a column-vector.** + +⟶ + +
+ +**7. Main matrices** + +⟶ + +
+ +**8. Identity matrix ― The identity matrix I∈Rn×n is a square matrix with ones in its diagonal and zero everywhere else:** + +⟶ + +
+ +**9. Remark: for all matrices A∈Rn×n, we have A×I=I×A=A.** + +⟶ + +
+ +**10. Diagonal matrix ― A diagonal matrix D∈Rn×n is a square matrix with nonzero values in its diagonal and zero everywhere else:** + +⟶ + +
+ +**11. Remark: we also note D as diag(d1,...,dn).** + +⟶ + +
+ +**12. Matrix operations** + +⟶ + +
+ +**13. Multiplication** + +⟶ + +
+ +**14. Vector-vector ― There are two types of vector-vector products:** + +⟶ + +
+ +**15. inner product: for x,y∈Rn, we have:** + +⟶ + +
+ +**16. outer product: for x∈Rm,y∈Rn, we have:** + +⟶ + +
+ +**17. Matrix-vector ― The product of matrix A∈Rm×n and vector x∈Rn is a vector of size Rn, such that:** + +⟶ + +
+ +**18. where aTr,i are the vector rows and ac,j are the vector columns of A, and xi are the entries of x.** + +⟶ + +
+ +**19. Matrix-matrix ― The product of matrices A∈Rm×n and B∈Rn×p is a matrix of size Rn×p, such that:** + +⟶ + +
+ +**20. where aTr,i,bTr,i are the vector rows and ac,j,bc,j are the vector columns of A and B respectively** + +⟶ + +
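The products defined in entries 14-19 can be checked numerically; the shapes below are arbitrary illustrative choices:

```python
import numpy as np

x, y = np.random.randn(3), np.random.randn(4)
A, B = np.random.randn(2, 3), np.random.randn(3, 4)

inner = x @ x              # inner product x^T x, a scalar
outer = np.outer(x, y)     # outer product x y^T, a 3x4 matrix
Ax = A @ x                 # matrix-vector product, shape (2,)
AB = A @ B                 # matrix-matrix product, shape (2, 4)
print(inner, outer.shape, Ax.shape, AB.shape)
```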
+ +**21. Other operations** + +⟶ + +
+ +**22. Transpose ― The transpose of a matrix A∈Rm×n, noted AT, is such that its entries are flipped:** + +⟶ + +
+ +**23. Remark: for matrices A,B, we have (AB)T=BTAT** + +⟶ + +
+ +**24. Inverse ― The inverse of an invertible square matrix A is noted A−1 and is the only matrix such that:** + +⟶ + +
+ +**25. Remark: not all square matrices are invertible. Also, for matrices A,B, we have (AB)−1=B−1A−1** + +⟶ + +
+ +**26. Trace ― The trace of a square matrix A, noted tr(A), is the sum of its diagonal entries:** + +⟶ + +
+ +**27. Remark: for matrices A,B, we have tr(AT)=tr(A) and tr(AB)=tr(BA)** + +⟶ + +
+ +**28. Determinant ― The determinant of a square matrix A∈Rn×n, noted |A| or det(A) is expressed recursively in terms of A∖i,∖j, which is the matrix A without its ith row and jth column, as follows:** + +⟶ + +
+ +**29. Remark: A is invertible if and only if |A|≠0. Also, |AB|=|A||B| and |AT|=|A|.** + +⟶ + +
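A quick numerical check of the identities in entries 22-29, namely (AB)T=BTAT, (AB)−1=B−1A−1, tr(AB)=tr(BA) and |AB|=|A||B|; random 3×3 matrices (invertible with probability one) are assumed:

```python
import numpy as np

np.random.seed(1)
A, B = np.random.randn(3, 3), np.random.randn(3, 3)

assert np.allclose((A @ B).T, B.T @ A.T)                      # (AB)^T = B^T A^T
assert np.allclose(np.linalg.inv(A @ B),
                   np.linalg.inv(B) @ np.linalg.inv(A))       # (AB)^-1 = B^-1 A^-1
assert np.allclose(np.trace(A @ B), np.trace(B @ A))          # tr(AB) = tr(BA)
assert np.allclose(np.linalg.det(A @ B),
                   np.linalg.det(A) * np.linalg.det(B))       # |AB| = |A||B|
```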
+ +**30. Matrix properties** + +⟶ + +
+ +**31. Definitions** + +⟶ + +
+ +**32. Symmetric decomposition ― A given matrix A can be expressed in terms of its symmetric and antisymmetric parts as follows:** + +⟶ + +
+ +**33. [Symmetric, Antisymmetric]** + +⟶ + +
+ +**34. Norm ― A norm is a function N:V⟶[0,+∞[ where V is a vector space, and such that for all x,y∈V, we have:** + +⟶ + +
+ +**35. N(ax)=|a|N(x) for a scalar** + +⟶ + +
+ +**36. if N(x)=0, then x=0** + +⟶ + +
+ +**37. For x∈V, the most commonly used norms are summed up in the table below:** + +⟶ + +
+ +**38. [Norm, Notation, Definition, Use case]** + +⟶ + +
+ +**39. Linearly dependence ― A set of vectors is said to be linearly dependent if one of the vectors in the set can be defined as a linear combination of the others.** + +⟶ + +
+ +**40. Remark: if no vector can be written this way, then the vectors are said to be linearly independent** + +⟶ + +
+ +**41. Matrix rank ― The rank of a given matrix A is noted rank(A) and is the dimension of the vector space generated by its columns. This is equivalent to the maximum number of linearly independent columns of A.** + +⟶ + +
+ +**42. Positive semi-definite matrix ― A matrix A∈Rn×n is positive semi-definite (PSD) and is noted A⪰0 if we have:** + +⟶ + +
+ +**43. Remark: similarly, a matrix A is said to be positive definite, and is noted A≻0, if it is a PSD matrix which satisfies for all non-zero vector x, xTAx>0.** + +⟶ + +
+ +**44. Eigenvalue, eigenvector ― Given a matrix A∈Rn×n, λ is said to be an eigenvalue of A if there exists a vector z∈Rn∖{0}, called eigenvector, such that we have:** + +⟶ + +
+ +**45. Spectral theorem ― Let A∈Rn×n. If A is symmetric, then A is diagonalizable by a real orthogonal matrix U∈Rn×n. By noting Λ=diag(λ1,...,λn), we have:** + +⟶ + +
+ +**46. diagonal** + +⟶ + +
+ +**47. Singular-value decomposition ― For a given matrix A of dimensions m×n, the singular-value decomposition (SVD) is a factorization technique that guarantees the existence of U m×m unitary, Σ m×n diagonal and V n×n unitary matrices, such that:** + +⟶ + +
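Entries 44-47 (eigendecomposition of a symmetric matrix and the singular-value decomposition) sketched numerically; the matrices are random examples:

```python
import numpy as np

np.random.seed(2)
M = np.random.randn(4, 4)
A = (M + M.T) / 2                       # a symmetric matrix

# Spectral theorem: A = U diag(lambda) U^T with U real orthogonal
lam, U = np.linalg.eigh(A)
assert np.allclose(A, U @ np.diag(lam) @ U.T)

# SVD of an arbitrary rectangular matrix
R = np.random.randn(5, 3)
U2, S, Vt = np.linalg.svd(R, full_matrices=False)
assert np.allclose(R, U2 @ np.diag(S) @ Vt)
```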
+ +**48. Matrix calculus** + +⟶ + +
+ +**49. Gradient ― Let f:Rm×n→R be a function and A∈Rm×n be a matrix. The gradient of f with respect to A is a m×n matrix, noted ∇Af(A), such that:** + +⟶ + +
+ +**50. Remark: the gradient of f is only defined when f is a function that returns a scalar.** + +⟶ + +
+ +**51. Hessian ― Let f:Rn→R be a function and x∈Rn be a vector. The hessian of f with respect to x is a n×n symmetric matrix, noted ∇2xf(x), such that:** + +⟶ + +
+ +**52. Remark: the hessian of f is only defined when f is a function that returns a scalar** + +⟶ + +
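A finite-difference sanity check of the gradient definition in entries 49-52, on the scalar-valued function f(x)=xTx whose gradient is 2x (and whose Hessian is 2I); the step size is an arbitrary assumption:

```python
import numpy as np

def f(x):
    return float(x @ x)               # a function that returns a scalar

def numerical_gradient(f, x, h=1e-6):
    g = np.zeros_like(x)
    for i in range(len(x)):
        e = np.zeros_like(x)
        e[i] = h
        g[i] = (f(x + e) - f(x - e)) / (2 * h)
    return g

x = np.array([1.0, -2.0, 0.5])
print(numerical_gradient(f, x))       # ~ [2, -4, 1], i.e. 2x as expected
```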
+ +**53. Gradient operations ― For matrices A,B,C, the following gradient properties are worth having in mind:** + +⟶ + +
+ +**54. [General notations, Definitions, Main matrices]** + +⟶ + +
+ +**55. [Matrix operations, Multiplication, Other operations]** + +⟶ + +
+ +**56. [Matrix properties, Norm, Eigenvalue/Eigenvector, Singular-value decomposition]** + +⟶ + +
+ +**57. [Matrix calculus, Gradient, Hessian, Operations]** + +⟶ From eb54df21e179b5d6a826eeaec916ba867c58e2fc Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?=C3=86=20Piotti?= <46027345+aepiotti@users.noreply.github.com> Date: Wed, 19 Feb 2020 10:32:09 +0100 Subject: [PATCH 473/531] Update cs-229-linear-algebra.md up to #11 --- it/cs-229-linear-algebra.md | 24 +++++++++++++----------- 1 file changed, 13 insertions(+), 11 deletions(-) diff --git a/it/cs-229-linear-algebra.md b/it/cs-229-linear-algebra.md index dced85397..5130689cc 100644 --- a/it/cs-229-linear-algebra.md +++ b/it/cs-229-linear-algebra.md @@ -1,70 +1,72 @@ **Linear Algebra and Calculus translation** [[webpage]](https://stanford.edu/~shervine/teaching/cs-229/refresher-algebra-calculus) +⟶ Algebra lineare e Calcolo numerico +
**1. Linear Algebra and Calculus refresher** -⟶ +⟶ Ripasso di Algebra lineare e Calcolo
**2. General notations** -⟶ +⟶ Notazione generale
**3. Definitions** -⟶ +⟶ Definizioni
**4. Vector ― We note x∈Rn a vector with n entries, where xi∈R is the ith entry:** -⟶ +⟶ Vettore - Definiamo x∈Rn un vettore con n elementi, dove xi∈R è l'i-esimo elemento:
**5. Matrix ― We note A∈Rm×n a matrix with m rows and n columns, where Ai,j∈R is the entry located in the ith row and jth column:**

-⟶
+⟶ Matrice - Definiamo A∈Rm×n una matrice con m righe e n colonne, dove Ai,j∈R è l'elemento posizionato alla i-esima riga e j-esima colonna:

<br>
**6. Remark: the vector x defined above can be viewed as a n×1 matrix and is more particularly called a column-vector.** -⟶ +⟶ Osservazione: il vettore x, definito precedentemente, può essere visto come una matrice nx1 ed è chiamato, più particolarmente, un vettore colonna.
**7. Main matrices** -⟶ +⟶ Matrici principali
**8. Identity matrix ― The identity matrix I∈Rn×n is a square matrix with ones in its diagonal and zero everywhere else:** -⟶ +⟶ matrice identità - La matrice identità I∈Rn×n è una matrice quadrata con tutti 1 sulla diagonale principale e 0 in tutte le altre posizioni:
**9. Remark: for all matrices A∈Rn×n, we have A×I=I×A=A.** -⟶ +⟶ Osservazione: per tutte le matrici A∈Rn×n, si ha che A×I=I×A=A.
**10. Diagonal matrix ― A diagonal matrix D∈Rn×n is a square matrix with nonzero values in its diagonal and zero everywhere else:** -⟶ +⟶ matrice diagonale - Una matrice diagonale D∈Rn×n è una matrice quadrata con valori diversi da zero sulla diagonale principale e zero in tutte le altre posizioni:
**11. Remark: we also note D as diag(d1,...,dn).** -⟶ +⟶ Osservazione: definiamo, inoltre, D come diag(d1,...,dn)
From ba8f508abeddb8d7de4b3bd8a844d64525886224 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?=C3=86=20Piotti?= <46027345+aepiotti@users.noreply.github.com> Date: Wed, 19 Feb 2020 10:54:31 +0100 Subject: [PATCH 474/531] Update cs-229-linear-algebra.md up to #25 --- it/cs-229-linear-algebra.md | 40 ++++++++++++++++++------------------- 1 file changed, 20 insertions(+), 20 deletions(-) diff --git a/it/cs-229-linear-algebra.md b/it/cs-229-linear-algebra.md index 5130689cc..95cf6d14c 100644 --- a/it/cs-229-linear-algebra.md +++ b/it/cs-229-linear-algebra.md @@ -24,13 +24,13 @@ **4. Vector ― We note x∈Rn a vector with n entries, where xi∈R is the ith entry:** -⟶ Vettore - Definiamo x∈Rn un vettore con n elementi, dove xi∈R è l'i-esimo elemento: +⟶ Vettore ― Definiamo x∈Rn un vettore con n elementi, dove xi∈R è l'i-esimo elemento:
**5. Matrix ― We note A∈Rm×n a matrix with m rows and n columns, where Ai,j∈R is the entry located in the ith row and jth column:**

-⟶ Matrice - Definiamo A∈Rm×n una matrice con m righe e n colonne, dove Ai,j∈R è l'elemento posizionato alla i-esima riga e j-esima colonna:
+⟶ Matrice ― Definiamo A∈Rm×n una matrice con m righe e n colonne, dove Ai,j∈R è l'elemento posizionato alla i-esima riga e j-esima colonna:

<br>
@@ -48,19 +48,19 @@ **8. Identity matrix ― The identity matrix I∈Rn×n is a square matrix with ones in its diagonal and zero everywhere else:** -⟶ matrice identità - La matrice identità I∈Rn×n è una matrice quadrata con tutti 1 sulla diagonale principale e 0 in tutte le altre posizioni: +⟶ matrice identità ― La matrice identità I∈Rn×n è una matrice quadrata con tutti 1 sulla diagonale principale e 0 in tutte le altre posizioni:
**9. Remark: for all matrices A∈Rn×n, we have A×I=I×A=A.** -⟶ Osservazione: per tutte le matrici A∈Rn×n, si ha che A×I=I×A=A. +⟶ Osservazione: per tutte le matrici A∈Rn×n, abbiamo che A×I=I×A=A.
**10. Diagonal matrix ― A diagonal matrix D∈Rn×n is a square matrix with nonzero values in its diagonal and zero everywhere else:** -⟶ matrice diagonale - Una matrice diagonale D∈Rn×n è una matrice quadrata con valori diversi da zero sulla diagonale principale e zero in tutte le altre posizioni: +⟶ matrice diagonale ― Una matrice diagonale D∈Rn×n è una matrice quadrata con valori diversi da zero sulla diagonale principale e zero in tutte le altre posizioni:
@@ -72,91 +72,91 @@ **12. Matrix operations** -⟶ +⟶ Operazioni sulle matrici
**13. Multiplication** -⟶ +⟶ Moltiplicazione
**14. Vector-vector ― There are two types of vector-vector products:** -⟶ +⟶ Vettore-vettore ― Ci sono due tipi di prodotto vettore-vettore:
**15. inner product: for x,y∈Rn, we have:** -⟶ +⟶ prodotto scalare: per ogni x,y∈Rn, abbiamo che:
**16. outer product: for x∈Rm,y∈Rn, we have:** -⟶ +⟶ prodotto vettoriale: per ogni x∈Rm,y∈Rn, abbiamo che:
**17. Matrix-vector ― The product of matrix A∈Rm×n and vector x∈Rn is a vector of size Rn, such that:** -⟶ +⟶ Matrice-vettore ― Il prodotto di una matrice A∈Rm×n ed un vettore x∈Rn, è un vettore di dimensione Rn, tale che:
**18. where aTr,i are the vector rows and ac,j are the vector columns of A, and xi are the entries of x.** -⟶ +⟶ dove aTr,i sono i vettori riga, ac,j sono i vettori colonna di A e xi sono gli elementi di x.
**19. Matrix-matrix ― The product of matrices A∈Rm×n and B∈Rn×p is a matrix of size Rn×p, such that:**

-⟶
+⟶ Matrice-matrice — Il prodotto di matrici A∈Rm×n e B∈Rn×p è una matrice di dimensione Rn×p, tale che:

<br>
**20. where aTr,i,bTr,i are the vector rows and ac,j,bc,j are the vector columns of A and B respectively** -⟶ +⟶ dove aTr,i,bTr,i sono i vettori riga e ac,j,bc,j sono i vettori colonna rispettivamente di A e di B
**21. Other operations** -⟶ +⟶ Altre operazioni
**22. Transpose ― The transpose of a matrix A∈Rm×n, noted AT, is such that its entries are flipped:** -⟶ +⟶ Trasposta — La trasposta di una matrice A∈Rm×n, indicata con AT, è tale che i suoi elementi sono scambiati:
**23. Remark: for matrices A,B, we have (AB)T=BTAT** -⟶ +⟶ Osservazione: per le matrici A,B abbiamo che (AB)T=BTAT
**24. Inverse ― The inverse of an invertible square matrix A is noted A−1 and is the only matrix such that:** -⟶ +⟶ Inversa — L'inversa di una matrice quadrata invertibile A è indicata con A-1 ed è l'unica matrice tale che:
**25. Remark: not all square matrices are invertible. Also, for matrices A,B, we have (AB)−1=B−1A−1** -⟶ +⟶ Osservazione: non tutte le matrici quadrate sono invertibili. Inoltre, per le matrici A,B, abbiamo che (AB)−1=B−1A−1
**26. Trace ― The trace of a square matrix A, noted tr(A), is the sum of its diagonal entries:** -⟶ +⟶
From ff531b743eb9bc6eacbf10dddf7c94df1e01d974 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?=C3=86=20Piotti?= <46027345+aepiotti@users.noreply.github.com> Date: Wed, 19 Feb 2020 11:15:20 +0100 Subject: [PATCH 475/531] Update cs-229-linear-algebra.md up to #42 --- it/cs-229-linear-algebra.md | 39 +++++++++++++++++++------------------ 1 file changed, 20 insertions(+), 19 deletions(-) diff --git a/it/cs-229-linear-algebra.md b/it/cs-229-linear-algebra.md index 95cf6d14c..bbce49096 100644 --- a/it/cs-229-linear-algebra.md +++ b/it/cs-229-linear-algebra.md @@ -90,13 +90,13 @@ **15. inner product: for x,y∈Rn, we have:** -⟶ prodotto scalare: per ogni x,y∈Rn, abbiamo che: +⟶ prodotto scalare: per x,y∈Rn, abbiamo che:
**16. outer product: for x∈Rm,y∈Rn, we have:** -⟶ prodotto vettoriale: per ogni x∈Rm,y∈Rn, abbiamo che: +⟶ prodotto vettoriale: per x∈Rm,y∈Rn, abbiamo che:
@@ -156,103 +156,104 @@ **26. Trace ― The trace of a square matrix A, noted tr(A), is the sum of its diagonal entries:** -⟶ +⟶ Traccia — La traccia di una matrice quadrata A, indicata con tr(A), è la somma degli elementi sulla diagonale principale:
**27. Remark: for matrices A,B, we have tr(AT)=tr(A) and tr(AB)=tr(BA)**

-⟶
+⟶ Osservazione: per le matrici A,B, abbiamo che tr(AT)=tr(A) e tr(AB)=tr(BA)

<br>
**28. Determinant ― The determinant of a square matrix A∈Rn×n, noted |A| or det(A) is expressed recursively in terms of A∖i,∖j, which is the matrix A without its ith row and jth column, as follows:**

-⟶
+⟶ Determinante — Il determinante di una matrice quadrata A∈Rn×n, indicato con |A| o det(A), è espresso ricorsivamente in termini di A∖i,∖j, che è la matrice A senza l'i-esima riga e la j-esima colonna, come segue:

<br>
**29. Remark: A is invertible if and only if |A|≠0. Also, |AB|=|A||B| and |AT|=|A|.** -⟶ +⟶ Osservazione: A è invertibile se e solo se |A|≠0. Inoltre, |AB|=|A||B| e |AT|=|A|.
**30. Matrix properties** -⟶ +⟶ Proprietà delle matrici
**31. Definitions** -⟶ +⟶ Definizioni
**32. Symmetric decomposition ― A given matrix A can be expressed in terms of its symmetric and antisymmetric parts as follows:**

-⟶
+⟶ Scomposizione simmetrica — Una matrice A può essere espressa tramite la sua componente simmetrica ed antisimmetrica come segue:

<br>
**33. [Symmetric, Antisymmetric]** -⟶ +⟶ [Simmetrica, Antisimmetrica]
**34. Norm ― A norm is a function N:V⟶[0,+∞[ where V is a vector space, and such that for all x,y∈V, we have:**

-⟶
+⟶ Norma — La norma è una funzione N:V⟶[0,+∞[ dove V è uno spazio vettoriale, tale che per
+x,y∈V, abbiamo che:

<br>
**35. N(ax)=|a|N(x) for a scalar** -⟶ +⟶ N(ax)=|a|N(x) per uno scalare
**36. if N(x)=0, then x=0**

-⟶
+⟶ se N(x)=0, allora x=0

<br>
**37. For x∈V, the most commonly used norms are summed up in the table below:** -⟶ +⟶ Per x∈V, le norme più usate comunemente sono riassunte nella tabella seguente:
**38. [Norm, Notation, Definition, Use case]** -⟶ +⟶ [Norma, Notazione, Definizione, Caso d'uso]
**39. Linearly dependence ― A set of vectors is said to be linearly dependent if one of the vectors in the set can be defined as a linear combination of the others.** -⟶ +⟶ Linearmente dipendente — Un insieme di vettori è detto linearmente dipendente se uno dei vettori nell'insieme può essere definito come combinazione lineare degli altri.
**40. Remark: if no vector can be written this way, then the vectors are said to be linearly independent** -⟶ +⟶ Osservazione: se nessun vettore può essere scritto in questo modo, allora i vettori sono detti linearmente indipendenti
**41. Matrix rank ― The rank of a given matrix A is noted rank(A) and is the dimension of the vector space generated by its columns. This is equivalent to the maximum number of linearly independent columns of A.**

-⟶
+⟶ Rango di una matrice — Il rango di una data matrice A si indica con rank(A) ed è la dimensione dello spazio vettoriale generato dalle sue colonne. Questo è equivalente al numero massimo di colonne linearmente indipendenti di A.

<br>
**42. Positive semi-definite matrix ― A matrix A∈Rn×n is positive semi-definite (PSD) and is noted A⪰0 if we have:** -⟶ +⟶ Matrice semidefinita positiva — Una matrice A∈Rn×n è semidefinita positiva (PSD) ed è indicata da A⪰0, se abbiamo che:
From badb23f9edfb6b1cfa5c6b92cecfd12023a817aa Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?=C3=86=20Piotti?= <46027345+aepiotti@users.noreply.github.com> Date: Wed, 19 Feb 2020 11:58:13 +0100 Subject: [PATCH 476/531] Update cs-229-linear-algebra.md all translated, //TODO review --- it/cs-229-linear-algebra.md | 32 ++++++++++++++++---------------- 1 file changed, 16 insertions(+), 16 deletions(-) diff --git a/it/cs-229-linear-algebra.md b/it/cs-229-linear-algebra.md index bbce49096..83aeedb31 100644 --- a/it/cs-229-linear-algebra.md +++ b/it/cs-229-linear-algebra.md @@ -235,7 +235,7 @@ x,y∈V, abbiamo che: **39. Linearly dependence ― A set of vectors is said to be linearly dependent if one of the vectors in the set can be defined as a linear combination of the others.** -⟶ Linearmente dipendente — Un insieme di vettori è detto linearmente dipendente se uno dei vettori nell'insieme può essere definito come combinazione lineare degli altri. +⟶ Linearmente dipendente — Un insieme di vettori è detto linearmente dipendente, se uno dei vettori dell'insieme può essere definito come combinazione lineare degli altri.
@@ -259,88 +259,88 @@ x,y∈V, abbiamo che: **43. Remark: similarly, a matrix A is said to be positive definite, and is noted A≻0, if it is a PSD matrix which satisfies for all non-zero vector x, xTAx>0.** -⟶ +⟶ Osservazione: analogamente, una matrice A è detta definita positiva, ed è indicata con A≻0, se è una matrice PSD che soddisfa per ogni vettore x non nullo, xTAx>0.
**44. Eigenvalue, eigenvector ― Given a matrix A∈Rn×n, λ is said to be an eigenvalue of A if there exists a vector z∈Rn∖{0}, called eigenvector, such that we have:** -⟶ +⟶ Autovalore, autovettore — Data una matrice A∈Rn×n, si dice che λ è un autovalore di A, se esiste un vettore z∈Rn∖{0}, chiamato autovettore, tale che abbiamo:
**45. Spectral theorem ― Let A∈Rn×n. If A is symmetric, then A is diagonalizable by a real orthogonal matrix U∈Rn×n. By noting Λ=diag(λ1,...,λn), we have:** -⟶ +⟶ Teorema spettrale — Sia A∈Rn×n. Se A è simmetrico, allora A è diagonalizzabile da una matrice reale ortogonale U∈Rn×n. Osservando Λ=diag(λ1,...,λn), abbiamo che:
**46. diagonal** -⟶ +⟶ diagonale
**47. Singular-value decomposition ― For a given matrix A of dimensions m×n, the singular-value decomposition (SVD) is a factorization technique that guarantees the existence of U m×m unitary, Σ m×n diagonal and V n×n unitary matrices, such that:**

-⟶
+⟶ Decomposizione ai valori singolari — Per una data matrice A di dimensione m×n, la decomposizione ai valori singolari (SVD) è una tecnica di fattorizzazione che garantisce l'esistenza della matrice unitaria U m×m, della matrice diagonale Σ m×n e della matrice unitaria V n×n, tale che:

<br>
**48. Matrix calculus**

-⟶
+⟶ Calcolo matriciale

<br>
**49. Gradient ― Let f:Rm×n→R be a function and A∈Rm×n be a matrix. The gradient of f with respect to A is a m×n matrix, noted ∇Af(A), such that:** -⟶ +⟶ Gradiente — Sia f:Rm×n→R una funzione e A∈Rm×n una matrice. Il gradiente di f in funzione di A è una matrice m×n, indicata con ∇Af(A), tale che:
**50. Remark: the gradient of f is only defined when f is a function that returns a scalar.** -⟶ +⟶ Osservazione: il gradiente di f è definito solamente quando f è una funzione che restituisce uno scalare.
**51. Hessian ― Let f:Rn→R be a function and x∈Rn be a vector. The hessian of f with respect to x is a n×n symmetric matrix, noted ∇2xf(x), such that:**

-⟶
+⟶ Matrice Hessiana — Sia f:Rn→R una funzione e x∈Rn un vettore. La matrice hessiana di f in funzione di x è una matrice simmetrica n×n, indicata con ∇2xf(x), tale che:

<br>
**52. Remark: the hessian of f is only defined when f is a function that returns a scalar** -⟶ +⟶ Osservazione: la matrice Hessiana di f è definita solamente quando f è una funzione che restituisce uno scalare
**53. Gradient operations ― For matrices A,B,C, the following gradient properties are worth having in mind:** -⟶ +⟶ Operazioni del gradiente — Per le matrici A,B,C, vale la pena ricordare le seguenti proprietà del gradiente:
**54. [General notations, Definitions, Main matrices]** -⟶ +⟶ [Notazione generale, Definizioni, Matrici principali]
**55. [Matrix operations, Multiplication, Other operations]** -⟶ +⟶ [Operazioni tra matrici, Moltiplicazione, Altre operazioni]
**56. [Matrix properties, Norm, Eigenvalue/Eigenvector, Singular-value decomposition]** -⟶ +⟶ [Proprietà delle matrici, Norma, Autovalore/Autovettore, Decomposizione ai valori singolari]
**57. [Matrix calculus, Gradient, Hessian, Operations]** -⟶ +⟶ [Calcolo tra matrici, Gradiente, Matrice Hessiana, Operazioni] From 340a42c97d8c1da968681ce2db1ea9eeb9c08d48 Mon Sep 17 00:00:00 2001 From: Nicola Dall'Asen Date: Sat, 22 Feb 2020 21:07:44 +0100 Subject: [PATCH 477/531] translate up to 24 --- it/cs-229-probability.md | 385 +++++++++++++++++++++++++++++++++++++++ 1 file changed, 385 insertions(+) create mode 100644 it/cs-229-probability.md diff --git a/it/cs-229-probability.md b/it/cs-229-probability.md new file mode 100644 index 000000000..611c60052 --- /dev/null +++ b/it/cs-229-probability.md @@ -0,0 +1,385 @@ +**Probabilities and Statistics translation** [[webpage]](https://stanford.edu/~shervine/teaching/cs-229/refresher-probabilities-statistics) + +
+ +**1. Probabilities and Statistics refresher** + +⟶ Ripasso di Probabilità e Statistica + +
+
+**2. Introduction to Probability and Combinatorics**
+
+⟶ Introduzione alla Probabilità e al Calcolo Combinatorio
+
+<br>
+ +**3. Sample space ― The set of all possible outcomes of an experiment is known as the sample space of the experiment and is denoted by S.** + +⟶ Spazio campionario ― L'insieme di tutti i risultati possibili di un esperimento è noto come spazio campionario dell'esperimento ed è chiamato S. + +
+
+**4. Event ― Any subset E of the sample space is known as an event. That is, an event is a set consisting of possible outcomes of the experiment. If the outcome of the experiment is contained in E, then we say that E has occurred.**
+
+⟶ Evento ― Ogni sottoinsieme E dello spazio campionario è chiamato evento. Un evento è quindi un insieme di possibili risultati dell'esperimento. Se il risultato dell'esperimento è contenuto in E, diciamo che E è accaduto.
+
+<br>
+ +**5. Axioms of probability For each event E, we denote P(E) as the probability of event E occuring.** + +⟶ Assiomi della probabilità Per ogni evento E, chiamiamo P(E) la probabilità che E accada. + +
+ +**6. Axiom 1 ― Every probability is between 0 and 1 included, i.e:** + +⟶ Assioma 1 ― Ogni probabilità ha valore tra 0 e 1 inclusi, quindi: + +
+ +**7. Axiom 2 ― The probability that at least one of the elementary events in the entire sample space will occur is 1, i.e:** + +⟶ Assioma 2 ― La probabilità che almeno uno degli eventi elementari dell'intero spazio campionario avvenga è 1, quindi: + +
+ +**8. Axiom 3 ― For any sequence of mutually exclusive events E1,...,En, we have:** + +⟶ Assioma 3 ― Per ogni sequenza di eventi mutualmente esclusivi E1, ..., En, abbiamo: + +
+
+**9. Permutation ― A permutation is an arrangement of r objects from a pool of n objects, in a given order. The number of such arrangements is given by P(n,r), defined as:**
+
+⟶ Permutazione ― Una permutazione è un raggruppamento di r oggetti fra n disponibili in un ordine dato. Il numero di tali raggruppamenti è dato da P(n,r) definito come:
+
+<br>
+ +**10. Combination ― A combination is an arrangement of r objects from a pool of n objects, where the order does not matter. The number of such arrangements is given by C(n,r), defined as:** + +⟶ Combinazione ― Una combinazione è un raggruppamento di r oggetti fra n disponibili dove l'ordine non importa. Il numero di tali raggruppamenti è dato da C(n,r) definito come: + +
+ +**11. Remark: we note that for 0⩽r⩽n, we have P(n,r)⩾C(n,r)** + +⟶ Osservazione: notiamo che per 0⩽r⩽n abbiamo che P(n,r)⩾C(n,r) + +
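The counts in entries 9-11, P(n,r)=n!/(n−r)! and C(n,r)=n!/(r!(n−r)!), computed with the standard library for a small example:

```python
from math import comb, factorial, perm

def P(n, r):
    return factorial(n) // factorial(n - r)

def C(n, r):
    return factorial(n) // (factorial(r) * factorial(n - r))

print(P(5, 3), perm(5, 3))   # 60 60
print(C(5, 3), comb(5, 3))   # 10 10, and indeed P(n,r) >= C(n,r) for 0 <= r <= n
```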
+ +**12. Conditional Probability** + +⟶ Probabilità Condizionata + +
+ +**13. Bayes' rule ― For events A and B such that P(B)>0, we have:** + +⟶ Teorema di Bayes ― Dati due eventi A e B tali che P(B)>0, abbiamo che: + +
+ +**14. Remark: we have P(A∩B)=P(A)P(B|A)=P(A|B)P(B)** + +⟶ Osservazione: abbiamo che P(A∩B)=P(A)P(B|A)=P(A|B)P(B) + +
+ +**15. Partition ― Let {Ai,i∈[[1,n]]} be such that for all i, Ai≠∅. We say that {Ai} is a partition if we have:** + +⟶ Partizione ― Sia {Ai,i∈[[1,n]]} tale che for ogni i, Ai≠∅. Diciamo che {Ai} è una partizione se abbiamo che: + +
+ +**16. Remark: for any event B in the sample space, we have P(B)=n∑i=1P(B|Ai)P(Ai).** + +⟶ Osservazione: per ogni evento B nello spazio campionario, abbiamo che P(B)=n∑i=1P(B|Ai)P(Ai). + +
+ +**17. Extended form of Bayes' rule ― Let {Ai,i∈[[1,n]]} be a partition of the sample space. We have:** + +⟶ Forma estesa del teorema di Bayes ― Sia {Ai,i∈[[1,n]]} una partizione dello spazio campionario. Abbiamo che: + +
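A small numeric illustration of entries 13-17, using a two-event partition {A1,A2} of the sample space; all probability values are made up for the example:

```python
# Partition of the sample space (assumed values, P(A1) + P(A2) = 1)
P_A = {"A1": 0.3, "A2": 0.7}
P_B_given_A = {"A1": 0.9, "A2": 0.2}      # conditional probabilities of an event B

# Law of total probability: P(B) = sum_i P(B|Ai) P(Ai)
P_B = sum(P_B_given_A[a] * P_A[a] for a in P_A)

# Extended Bayes' rule: P(A1|B) = P(B|A1) P(A1) / P(B)
P_A1_given_B = P_B_given_A["A1"] * P_A["A1"] / P_B
print(round(P_B, 3), round(P_A1_given_B, 3))   # 0.41 and ~0.659
```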
+ +**18. Independence ― Two events A and B are independent if and only if we have:** + +⟶ Indipendenza ― Due eventi A e B sono indipendenti se e solo se abbiamo che: + +
+ +**19. Random Variables** + +⟶ Variabili Aleatorie + +
+ +**20. Definitions** + +⟶ Definizioni + +
+ +**21. Random variable ― A random variable, often noted X, is a function that maps every element in a sample space to a real line.** + +⟶ Variabile aleatoria ― Una variabile aleatoria, spesso chiamata X, è una funzione dagli elementi dello spazio campionario a un reale. + +
+ +**22. Cumulative distribution function (CDF) ― The cumulative distribution function F, which is monotonically non-decreasing and is such that limx→−∞F(x)=0 and limx→+∞F(x)=1, is defined as:** + +⟶ Funzione di ripartizione (cumulativa) ― La funzione di ripartizione F, che è monotona non-decrescente e tale che limx→−∞F(x)=0 e limx→+∞F(x)=1, è definita come: + +
+
+**23. Remark: we have P(a<X⩽b)=F(b)−F(a)**
+
+⟶ Osservazione: abbiamo che P(a<X⩽b)=F(b)−F(a)
+
+<br>
+
+**24. Probability density function (PDF) ― The probability density function f is the probability that X takes on values between two adjacent realizations of the random variable.**
+
+⟶ Funzione di densità ― La funzione di densità f è la probabilità che X assuma un valore tra due realizzazioni consecutive della variabile aleatoria.
+
+<br>
+ +**25. Relationships involving the PDF and CDF ― Here are the important properties to know in the discrete (D) and the continuous (C) cases.** + +⟶ + +
+ +**26. [Case, CDF F, PDF f, Properties of PDF]** + +⟶ + +
+ +**27. Expectation and Moments of the Distribution ― Here are the expressions of the expected value E[X], generalized expected value E[g(X)], kth moment E[Xk] and characteristic function ψ(ω) for the discrete and continuous cases:** + +⟶ + +
+ +**28. Variance ― The variance of a random variable, often noted Var(X) or σ2, is a measure of the spread of its distribution function. It is determined as follows:** + +⟶ + +
+ +**29. Standard deviation ― The standard deviation of a random variable, often noted σ, is a measure of the spread of its distribution function which is compatible with the units of the actual random variable. It is determined as follows:** + +⟶ + +
+ +**30. Transformation of random variables ― Let the variables X and Y be linked by some function. By noting fX and fY the distribution function of X and Y respectively, we have:** + +⟶ + +
+ +**31. Leibniz integral rule ― Let g be a function of x and potentially c, and a,b boundaries that may depend on c. We have:** + +⟶ + +
+ +**32. Probability Distributions** + +⟶ + +
+ +**33. Chebyshev's inequality ― Let X be a random variable with expected value μ. For k,σ>0, we have the following inequality:** + +⟶ + +
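An empirical look at Chebyshev's inequality from entry 33, P(|X−μ|⩾kσ)⩽1/k²; the exponential distribution and sample size are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.exponential(scale=1.0, size=100_000)   # mean 1 and standard deviation 1
mu, sigma, k = X.mean(), X.std(), 2.0

empirical = np.mean(np.abs(X - mu) >= k * sigma)
print(empirical, "<=", 1 / k**2)               # e.g. ~0.05 <= 0.25
```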
+ +**34. Main distributions ― Here are the main distributions to have in mind:** + +⟶ + +
+ +**35. [Type, Distribution]** + +⟶ + +
+ +**36. Jointly Distributed Random Variables** + +⟶ + +
+ +**37. Marginal density and cumulative distribution ― From the joint density probability function fXY , we have** + +⟶ + +
+ +**38. [Case, Marginal density, Cumulative function]** + +⟶ + +
+ +**39. Conditional density ― The conditional density of X with respect to Y, often noted fX|Y, is defined as follows:** + +⟶ + +
+ +**40. Independence ― Two random variables X and Y are said to be independent if we have:** + +⟶ + +
+ +**41. Covariance ― We define the covariance of two random variables X and Y, that we note σ2XY or more commonly Cov(X,Y), as follows:** + +⟶ + +
+ +**42. Correlation ― By noting σX,σY the standard deviations of X and Y, we define the correlation between the random variables X and Y, noted ρXY, as follows:** + +⟶ + +
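Covariance and correlation as in entries 41-42, estimated on simulated data; the linear dependence between X and Y is built into the example:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=10_000)
Y = 2.0 * X + rng.normal(scale=0.5, size=10_000)   # correlated with X by construction

cov_xy = np.cov(X, Y)[0, 1]        # Cov(X,Y)
rho_xy = np.corrcoef(X, Y)[0, 1]   # rho = Cov(X,Y) / (sigma_X * sigma_Y), lies in [-1, 1]
print(round(cov_xy, 2), round(rho_xy, 3))
```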
+ +**43. Remark 1: we note that for any random variables X,Y, we have ρXY∈[−1,1].** + +⟶ + +
+ +**44. Remark 2: If X and Y are independent, then ρXY=0.** + +⟶ + +
+ +**45. Parameter estimation** + +⟶ + +
+ +**46. Definitions** + +⟶ + +
+ +**47. Random sample ― A random sample is a collection of n random variables X1,...,Xn that are independent and identically distributed with X.** + +⟶ + +
+ +**48. Estimator ― An estimator is a function of the data that is used to infer the value of an unknown parameter in a statistical model.** + +⟶ + +
+ +**49. Bias ― The bias of an estimator ^θ is defined as being the difference between the expected value of the distribution of ^θ and the true value, i.e.:** + +⟶ + +
+ +**50. Remark: an estimator is said to be unbiased when we have E[^θ]=θ.** + +⟶ + +
+ +**51. Estimating the mean** + +⟶ + +
+ +**52. Sample mean ― The sample mean of a random sample is used to estimate the true mean μ of a distribution, is often noted ¯¯¯¯¯X and is defined as follows:** + +⟶ + +
+ +**53. Remark: the sample mean is unbiased, i.e E[¯¯¯¯¯X]=μ.** + +⟶ + +
+ +**54. Central Limit Theorem ― Let us have a random sample X1,...,Xn following a given distribution with mean μ and variance σ2, then we have:** + +⟶ + +
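A quick simulation of the central limit theorem in entries 52-54: means of samples from a skewed (exponential) distribution concentrate around μ with variance σ²/n; the sample sizes are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(2)
n, n_trials = 50, 20_000
samples = rng.exponential(scale=1.0, size=(n_trials, n))   # mu = 1, sigma^2 = 1

sample_means = samples.mean(axis=1)
print(round(sample_means.mean(), 3))   # ~ mu = 1
print(round(sample_means.var(), 4))    # ~ sigma^2 / n = 0.02
```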
+ +**55. Estimating the variance** + +⟶ + +
+ +**56. Sample variance ― The sample variance of a random sample is used to estimate the true variance σ2 of a distribution, is often noted s2 or ^σ2 and is defined as follows:** + +⟶ + +
+ +**57. Remark: the sample variance is unbiased, i.e E[s2]=σ2.** + +⟶ + +
+ +**58. Chi-Squared relation with sample variance ― Let s2 be the sample variance of a random sample. We have:** + +⟶ + +
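Entries 56-57 in a small Monte Carlo check: with the 1/(n−1) normalization the sample variance is unbiased, E[s²]=σ²; the distribution, σ² and sample size are assumptions of the example:

```python
import numpy as np

rng = np.random.default_rng(3)
n, n_trials, sigma = 10, 50_000, 2.0
data = rng.normal(loc=0.0, scale=sigma, size=(n_trials, n))

s2 = data.var(axis=1, ddof=1)      # ddof=1 gives the unbiased estimator s^2
print(round(s2.mean(), 3))         # ~ sigma^2 = 4.0
```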
+ +**59. [Introduction, Sample space, Event, Permutation]** + +⟶ + +
+ +**60. [Conditional probability, Bayes' rule, Independence]** + +⟶ + +
+ +**61. [Random variables, Definitions, Expectation, Variance]** + +⟶ + +
+ +**62. [Probability distributions, Chebyshev's inequality, Main distributions]** + +⟶ + +
+ +**63. [Jointly distributed random variables, Density, Covariance, Correlation]** + +⟶ + +
+ +**64. [Parameter estimation, Mean, Variance]** + +⟶ From d6b6e3dc1a62cfb54d3b0858335e8858cf073ce1 Mon Sep 17 00:00:00 2001 From: Nicola Dall'Asen Date: Sat, 22 Feb 2020 22:41:18 +0100 Subject: [PATCH 478/531] translate up to 48 --- it/cs-229-probability.md | 68 ++++++++++++++++++++-------------------- 1 file changed, 34 insertions(+), 34 deletions(-) diff --git a/it/cs-229-probability.md b/it/cs-229-probability.md index 611c60052..7c7a58200 100644 --- a/it/cs-229-probability.md +++ b/it/cs-229-probability.md @@ -148,205 +148,205 @@ **25. Relationships involving the PDF and CDF ― Here are the important properties to know in the discrete (D) and the continuous (C) cases.** -⟶ +⟶ Relazioni tra funzione di densità e di ripartizione ― Sono riportate le proprietà importanti da sapere nel caso discreto (D) e continuo (C).
**26. [Case, CDF F, PDF f, Properties of PDF]** -⟶ +⟶ [Caso, funzione di ripartizione F, funzione di densità f, Proprietà della funzione di densità]
**27. Expectation and Moments of the Distribution ― Here are the expressions of the expected value E[X], generalized expected value E[g(X)], kth moment E[Xk] and characteristic function ψ(ω) for the discrete and continuous cases:** -⟶ +⟶ Valore atteso e Momenti della Distribuzione ― Sono riportate le espressioni del valore atteso E[X], valore atteso generalizzato E[g(X)], momento k-esimo E[Xk] e funzione caratteristica ψ(ω) per il caso discreto e continuo:
**28. Variance ― The variance of a random variable, often noted Var(X) or σ2, is a measure of the spread of its distribution function. It is determined as follows:** -⟶ +⟶ Varianza ― La varianza di una variable aleatoria, spesso denotata da Var(X) o σ2, è una misura della variabilità della funzione di distribuzione. È determinata nel modo seguente:
**29. Standard deviation ― The standard deviation of a random variable, often noted σ, is a measure of the spread of its distribution function which is compatible with the units of the actual random variable. It is determined as follows:** -⟶ +⟶ Deviazione standard ― La deviazione standard di una variabile aleatoria, spesso denotata da σ, è una misura della variabilità della funzione di distribuzione che è compatibile con l'unità di misura della variabile aleatoria. È determinata nel modo seguente:
**30. Transformation of random variables ― Let the variables X and Y be linked by some function. By noting fX and fY the distribution function of X and Y respectively, we have:** -⟶ +⟶ Trasformazione di una variabile aleatoria ― Siano X e Y variabili collegate da qualche funzione. Siano fX e fY le funzioni di distribuzione di X e Y, rispettivamente. Abbiamo che:
**31. Leibniz integral rule ― Let g be a function of x and potentially c, and a,b boundaries that may depend on c. We have:**

-⟶
+⟶ Regola di integrazione di Leibniz ― Sia g una funzione di x e potenzialmente c, e siano a e b estremi che possono dipendere da c. Abbiamo che:

<br>
**32. Probability Distributions** -⟶ +⟶ Distribuzioni di Probabilità
**33. Chebyshev's inequality ― Let X be a random variable with expected value μ. For k,σ>0, we have the following inequality:**

-⟶
+⟶ Disuguaglianza di Chebyshev ― Sia X una variabile aleatoria con valore atteso μ. Per k,σ>0 abbiamo la seguente disuguaglianza:

<br>
**34. Main distributions ― Here are the main distributions to have in mind:** -⟶ +⟶ Distribuzioni principali ― Sono presentati le distribuzioni principali da tenere a mente:
**35. [Type, Distribution]**

-⟶
+⟶ [Tipologia, Distribuzione]

<br>
**36. Jointly Distributed Random Variables** -⟶ +⟶ Distribuzione congiunta di variabili aleatorie
**37. Marginal density and cumulative distribution ― From the joint density probability function fXY , we have** -⟶ +⟶ Densità marginale e distribuzione cumulativa ― Dalla funzione di densità congiunta fXY abbiamo che:
**38. [Case, Marginal density, Cumulative function]** -⟶ +⟶ [Caso, Densità marginale, Funzione cumulativa]
**39. Conditional density ― The conditional density of X with respect to Y, often noted fX|Y, is defined as follows:** -⟶ +⟶ Densità condizionata ― La densità condizionata di X rispetto a Y, spesso denotata come fX|Y, è definita come:
**40. Independence ― Two random variables X and Y are said to be independent if we have:** -⟶ +⟶ Indipendenza― Due variabili aleatorie X e Y si dicono indipendenti se:
**41. Covariance ― We define the covariance of two random variables X and Y, that we note σ2XY or more commonly Cov(X,Y), as follows:** -⟶ +⟶ Covarianza ― Si definisce la covarianza di due variabili aleatorie X e Y, denotata da σ2XY o più comunemente Cov(X,Y), come segue:
**42. Correlation ― By noting σX,σY the standard deviations of X and Y, we define the correlation between the random variables X and Y, noted ρXY, as follows:** -⟶ +⟶ Correlazione ― Date σX,σY le deviazioni standard di X e Y, definiamo la correlazione tra le variabili aleatorie X e Y, denotata da ρXY, come segue:
**43. Remark 1: we note that for any random variables X,Y, we have ρXY∈[−1,1].** -⟶ +⟶ Osservazione 1: notiamo che per ogni variabile aleatoria X,Y, abbiamo che ρXY∈[−1,1].
**44. Remark 2: If X and Y are independent, then ρXY=0.** -⟶ +⟶ Osservazione 2: Se X e Y sono indipendenti, allora ρXY=0.
**45. Parameter estimation** -⟶ +⟶ Stima dei parametri
**46. Definitions** -⟶ +⟶ Definizioni
**47. Random sample ― A random sample is a collection of n random variables X1,...,Xn that are independent and identically distributed with X.** -⟶ +⟶ ―
**48. Estimator ― An estimator is a function of the data that is used to infer the value of an unknown parameter in a statistical model.** -⟶ +⟶ Stimatore ― Uno stimatore è una funzione dei dati usata per dedurre il valore di un parametro sconosciuto in un modello statistico.
**49. Bias ― The bias of an estimator ^θ is defined as being the difference between the expected value of the distribution of ^θ and the true value, i.e.:** -⟶ +⟶ Bias ―
**50. Remark: an estimator is said to be unbiased when we have E[^θ]=θ.** -⟶ +⟶ Osservazione: uno stimatore si dice ??? quando abbiamo E[^θ]=θ.
**51. Estimating the mean** -⟶ +⟶ Stima della media
**52. Sample mean ― The sample mean of a random sample is used to estimate the true mean μ of a distribution, is often noted ¯¯¯¯¯X and is defined as follows:** -⟶ +⟶ Media campionaria ―
**53. Remark: the sample mean is unbiased, i.e E[¯¯¯¯¯X]=μ.** -⟶ +⟶ Osservazione:
**54. Central Limit Theorem ― Let us have a random sample X1,...,Xn following a given distribution with mean μ and variance σ2, then we have:** -⟶ +⟶ Teorema del Limite Centrale ―
**55. Estimating the variance** -⟶ +⟶ Stima della varianza
**56. Sample variance ― The sample variance of a random sample is used to estimate the true variance σ2 of a distribution, is often noted s2 or ^σ2 and is defined as follows:** -⟶ +⟶ Varianza campionaria ―
**57. Remark: the sample variance is unbiased, i.e E[s2]=σ2.** -⟶ +⟶ Osservazione:
**58. Chi-Squared relation with sample variance ― Let s2 be the sample variance of a random sample. We have:** -⟶ +⟶ ―
From d9bba301286229b83fa7546a6a219c4c5a64f958 Mon Sep 17 00:00:00 2001 From: Nicola Dall'Asen Date: Sat, 22 Feb 2020 22:51:04 +0100 Subject: [PATCH 479/531] translate up to 51 --- it/cs-229-probability.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/it/cs-229-probability.md b/it/cs-229-probability.md index 7c7a58200..559cce734 100644 --- a/it/cs-229-probability.md +++ b/it/cs-229-probability.md @@ -292,13 +292,13 @@ **49. Bias ― The bias of an estimator ^θ is defined as being the difference between the expected value of the distribution of ^θ and the true value, i.e.:** -⟶ Bias ― +⟶ Bias (distorsione) ― La distorsione di uno stimatore ^θ è definita come la differenza tra il valore atteso della distribuzione di ^θ e il vero valore, quindi:
**50. Remark: an estimator is said to be unbiased when we have E[^θ]=θ.** -⟶ Osservazione: uno stimatore si dice ??? quando abbiamo E[^θ]=θ. +⟶ Osservazione: uno stimatore si dice non distorto quando abbiamo E[^θ]=θ.
From 1951fbccd6b643adecf6ef8b637d9c1903c88ff2 Mon Sep 17 00:00:00 2001 From: Nicola Dall'Asen Date: Sun, 23 Feb 2020 13:36:10 +0100 Subject: [PATCH 480/531] translate up to 64 --- it/cs-229-probability.md | 30 +++++++++++++++--------------- 1 file changed, 15 insertions(+), 15 deletions(-) diff --git a/it/cs-229-probability.md b/it/cs-229-probability.md index 559cce734..f92b0b698 100644 --- a/it/cs-229-probability.md +++ b/it/cs-229-probability.md @@ -238,7 +238,7 @@ **40. Independence ― Two random variables X and Y are said to be independent if we have:** -⟶ Indipendenza― Due variabili aleatorie X e Y si dicono indipendenti se: +⟶ Indipendenza ― Due variabili aleatorie X e Y si dicono indipendenti se:
@@ -280,7 +280,7 @@

**47. Random sample ― A random sample is a collection of n random variables X1,...,Xn that are independent and identically distributed with X.**

-⟶ ―
+⟶ Campione casuale ― Un campione casuale è un gruppo di n variabili aleatorie X1,...,Xn distribuite in modo indipendente e identico con X.

<br>
@@ -292,7 +292,7 @@ **49. Bias ― The bias of an estimator ^θ is defined as being the difference between the expected value of the distribution of ^θ and the true value, i.e.:** -⟶ Bias (distorsione) ― La distorsione di uno stimatore ^θ è definita come la differenza tra il valore atteso della distribuzione di ^θ e il vero valore, quindi: +⟶ Distorsione ― La distorsione di uno stimatore ^θ è definita come la differenza tra il valore atteso della distribuzione di ^θ e il vero valore, quindi:
@@ -310,19 +310,19 @@ **52. Sample mean ― The sample mean of a random sample is used to estimate the true mean μ of a distribution, is often noted ¯¯¯¯¯X and is defined as follows:** -⟶ Media campionaria ― +⟶ Media campionaria ― La media campionaria di un campione casuale è usata per stimare la vera media μ di una distribuzione, è spesso denotata da ¯¯¯¯¯X ed è definita come segue:
**53. Remark: the sample mean is unbiased, i.e E[¯¯¯¯¯X]=μ.** -⟶ Osservazione: +⟶ Osservazione: la media campionaria non è distorta, quindi E[¯¯¯¯¯X]=μ.
**54. Central Limit Theorem ― Let us have a random sample X1,...,Xn following a given distribution with mean μ and variance σ2, then we have:** -⟶ Teorema del Limite Centrale ― +⟶ Teorema del Limite Centrale ― Sia X1,...,Xn un campione casuale che segue una data distribuzione di media μ e varianza σ2, di conseguenza abbiamo che:
@@ -334,52 +334,52 @@ **56. Sample variance ― The sample variance of a random sample is used to estimate the true variance σ2 of a distribution, is often noted s2 or ^σ2 and is defined as follows:** -⟶ Varianza campionaria ― +⟶ Varianza campionaria ― La varianza campionaria di un campione casuale è usata per stimare il vero valore della varianza σ2 di una distribuzione, è spesso denotata da s2 o ^σ2 ed è definita come segue:
**57. Remark: the sample variance is unbiased, i.e E[s2]=σ2.** -⟶ Osservazione: +⟶ Osservazione: la varianza campionaria non è distorta, quindi E[s2]=σ2.
**58. Chi-Squared relation with sample variance ― Let s2 be the sample variance of a random sample. We have:** -⟶ ― +⟶ Relazione tra Chi-Quadro e la varianza campionaria ― Sia s2 la varianza campionaria di un campione casuale. Abbiamo che:
**59. [Introduction, Sample space, Event, Permutation]** -⟶ +⟶ [Introduzione, Spazio campionaria, Evento, Permutazione]
**60. [Conditional probability, Bayes' rule, Independence]** -⟶ +⟶ [Probabilità condizionata, Teorema di Bayes, Indipendenza]
**61. [Random variables, Definitions, Expectation, Variance]** -⟶ +⟶ [Variabile aleatoria, Definizioni, Valore atteso, Varianza]
**62. [Probability distributions, Chebyshev's inequality, Main distributions]** -⟶ +⟶ [Distribuzioni, Disuguaglianza di Chebyshev, Distribuzioni principali]
**63. [Jointly distributed random variables, Density, Covariance, Correlation]** -⟶ +⟶ [Distribuzione congiunta di variabili aleatorie, Densità, Covarianza, Correlazione]
**64. [Parameter estimation, Mean, Variance]** -⟶ +⟶ [Stima del parametro, Media, Varianza] From 56e2d25fcfd425c1f70535a33395104cf2cd7eb3 Mon Sep 17 00:00:00 2001 From: Nicola Dall'Asen Date: Sun, 23 Feb 2020 22:29:43 +0100 Subject: [PATCH 481/531] fix --- it/cs-229-probability.md | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/it/cs-229-probability.md b/it/cs-229-probability.md index f92b0b698..85ab60ee8 100644 --- a/it/cs-229-probability.md +++ b/it/cs-229-probability.md @@ -16,7 +16,7 @@ **3. Sample space ― The set of all possible outcomes of an experiment is known as the sample space of the experiment and is denoted by S.** -⟶ Spazio campionario ― L'insieme di tutti i risultati possibili di un esperimento è noto come spazio campionario dell'esperimento ed è chiamato S. +⟶ Spazio campionario ― L'insieme di tutti i risultati possibili di un esperimento è noto come spazio campionario dell'esperimento ed è denotato da S.
@@ -88,7 +88,7 @@ **15. Partition ― Let {Ai,i∈[[1,n]]} be such that for all i, Ai≠∅. We say that {Ai} is a partition if we have:** -⟶ Partizione ― Sia {Ai,i∈[[1,n]]} tale che for ogni i, Ai≠∅. Diciamo che {Ai} è una partizione se abbiamo che: +⟶ Partizione ― Sia {Ai,i∈[[1,n]]} tale che per ogni i, Ai≠∅. Diciamo che {Ai} è una partizione se abbiamo che:
@@ -124,7 +124,7 @@ **21. Random variable ― A random variable, often noted X, is a function that maps every element in a sample space to a real line.** -⟶ Variabile aleatoria ― Una variabile aleatoria, spesso chiamata X, è una funzione dagli elementi dello spazio campionario a un reale. +⟶ Variabile aleatoria ― Una variabile aleatoria, spesso chiamata X, è una funzione che associa ogni elemento dello spazio campionario a un reale.
@@ -202,7 +202,7 @@ **34. Main distributions ― Here are the main distributions to have in mind:** -⟶ Distribuzioni principali ― Sono presentati le distribuzioni principali da tenere a mente: +⟶ Distribuzioni principali ― Di seguito le distribuzioni principali da tenere a mente:
@@ -352,7 +352,7 @@ **59. [Introduction, Sample space, Event, Permutation]** -⟶ [Introduzione, Spazio campionaria, Evento, Permutazione] +⟶ [Introduzione, Spazio campionario, Evento, Permutazione]
@@ -370,7 +370,7 @@ **62. [Probability distributions, Chebyshev's inequality, Main distributions]** -⟶ [Distribuzioni, Disuguaglianza di Chebyshev, Distribuzioni principali] +⟶ [Distribuzioni di probabilità, Disuguaglianza di Chebyshev, Distribuzioni principali]
From f3193cc51fc0f8a0d5b1d830859bc1cff3d5e317 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?=C3=86=20Piotti?= <46027345+aepiotti@users.noreply.github.com> Date: Mon, 24 Feb 2020 10:47:35 +0100 Subject: [PATCH 482/531] [it] translation of linear algebra added contributor --- CONTRIBUTORS | 3 +++ 1 file changed, 3 insertions(+) diff --git a/CONTRIBUTORS b/CONTRIBUTORS index eff44e7e5..9cc9ca182 100644 --- a/CONTRIBUTORS +++ b/CONTRIBUTORS @@ -97,6 +97,9 @@ Prasetia Utama Putra (translation of convolutional neural networks) Gunawan Tri (review of convolutional neural networks) +--it + Alessandro Piotti (translation of linear algebra) + --ko Wooil Jeong (translation of machine learning tips and tricks) From 23816a232bbf49e0523a3b474fc5a00202c89121 Mon Sep 17 00:00:00 2001 From: Shervine Amidi Date: Fri, 28 Feb 2020 22:22:09 -0800 Subject: [PATCH 483/531] Update [it] progress --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index b97bd9fb6..7d0567337 100644 --- a/README.md +++ b/README.md @@ -65,7 +65,7 @@ Please make sure to propose the translation of **only one** cheatsheet per pull |**हिन्दी**|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/37)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/46)|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/40)|not started|not started| |**Magyar**|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/124)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/124)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/124)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/124)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/124)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/124)| |**Bahasa Indonesia**|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/154)|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/139)|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/151)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/150)| -|**Italiano**|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/78)|not started|not started|not started|not started|not started| +|**Italiano**|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/78)|not started|not started|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/201)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/202)| |**日本語**|done|done|done|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/99)|done|done| |**한국어**|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/80)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/90)|done|done|done|done| |**Polski**|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/8)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/8)|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/8)|not started|not started| From 520e34f42dbdf41afde593a90ef9d7c4597d8318 Mon Sep 17 00:00:00 2001 From: Renato Kano Date: Tue, 3 Mar 2020 16:46:14 -0300 Subject: [PATCH 484/531] Spelling updates (fix) --- pt/cs-229-deep-learning.md | 40 +++++++++++++++++++------------------- 1 file changed, 20 insertions(+), 20 deletions(-) diff --git 
a/pt/cs-229-deep-learning.md b/pt/cs-229-deep-learning.md index 2e3e63879..6d7c083f4 100644 --- a/pt/cs-229-deep-learning.md +++ b/pt/cs-229-deep-learning.md @@ -24,25 +24,25 @@ **5. [Input layer, hidden layer, output layer]** -⟶ [Camada de entrada, camada escondida, camada de saída] +⟶ [Camada de entrada, camada oculta, camada de saída]
**6. By noting i the ith layer of the network and j the jth hidden unit of the layer, we have:** -⟶ Dado que i é a i-ésima camada da rede e j a j-ésima unidade escondida da camada, nós temos: +⟶ Dado que i é a i-ésima camada da rede e j a j-ésima unidade oculta da camada, nós temos:
**7. where we note w, b, z the weight, bias and output respectively.**

-⟶ onde é definido que w, b, z, o peso, o viés e a saída respectivamente.
+⟶ onde é definido que w, b, z representam o peso, o viés e a saída, respectivamente.

<br>
**8. Activation function ― Activation functions are used at the end of a hidden unit to introduce non-linear complexities to the model. Here are the most common ones:** -⟶ Função de ativação - Funções de ativação são usadas no fim de uma unidade escondida para introduzir complexidades não lineares ao modelo. Aqui estão as mais comuns: +⟶ Função de ativação - Funções de ativação são usadas no fim de uma unidade oculta para introduzir complexidades não lineares ao modelo. Aqui estão as mais comuns:
@@ -108,7 +108,7 @@ **19. Dropout ― Dropout is a technique meant at preventing overfitting the training data by dropping out units in a neural network. In practice, neurons are either dropped with probability p or kept with probability 1−p** -⟶ Abandono (Dropout) - Abandono (Dropout) é uma técnica que pretende prevenir o sobreajuste dos dados de treinamente abandonando unidades na rede neural. Na prática, neurônios são ou abandonados com a propabilidade p ou mantidos com a propabilidade 1-p +⟶ Abandono (Dropout) - Abandono (Dropout) é uma técnica que pretende evitar o sobreajuste (overfitting) dos dados de treinamento abandonando unidades na rede neural. Na prática, os neurônios são, ou abandonados com a propabilidade p, ou mantidos com a propabilidade 1-p
@@ -120,7 +120,7 @@ **21. Convolutional layer requirement ― By noting W the input volume size, F the size of the convolutional layer neurons, P the amount of zero padding, then the number of neurons N that fit in a given volume is such that:** -⟶ Requisito de camada convolucional - Dado que W é o tamanho do volume de entrada, F o tamanho dos neurônios da camada convolucional, P a quantidade de preenchimento de zeros, então o número de neurônios N que cabem em um dado volume é tal que: +⟶ Requisito da camada convolucional - Dado que W é o tamanho do volume de entrada, F o tamanho dos neurônios da camada convolucional, P a quantidade de preenchimento de zeros, então o número de neurônios N que cabem em um dado volume é tal que:
@@ -132,7 +132,7 @@ **23. It is usually done after a fully connected/convolutional layer and before a non-linearity layer and aims at allowing higher learning rates and reducing the strong dependence on initialization.** -⟶ Isso é usualmente feito após de uma totalmente conectada/camada concolucional e antes de uma camada não linear e objetiva permitir maiores taxas de apredizado e reduzir a forte dependência na inicialização. +⟶ Isso é geralmente feito após uma camada convolucional totalmente conectada e antes de uma camada não-linear, e objetiva permitir maiores taxas de apredizado e reduzir a forte dependência na inicialização.
@@ -144,7 +144,7 @@ **25. Types of gates ― Here are the different types of gates that we encounter in a typical recurrent neural network:** -⟶ Tipos de portas (gates) - Aqui estão os diferentes tipos de portas (gates) que encontramos em uma rede neural recorrente típica: +⟶ Tipos de portas (gates) - Aqui estão os diferentes tipos de portas (gates) que encontramos em uma típica rede neural recorrente:
@@ -162,19 +162,19 @@ **28. LSTM ― A long short-term memory (LSTM) network is a type of RNN model that avoids the vanishing gradient problem by adding 'forget' gates.** -⟶ LSTM - Uma rede de memória de longo prazo (LSTM) é um tipo de modelo de rede neural recorretne (RNN) que evita o problema do desaparecimento da gradiente adicionando portas de 'esquecimento'. +⟶ LSTM - Uma rede de memória de longo prazo (LSTM) é um tipo de modelo de rede neural recorrente (RNN) que evita o problema do desaparecimento do gradiente adicionando portas de 'esquecimento' (forget gate).
**29. Reinforcement Learning and Control** -⟶ Aprendizado e Controle Reforçado +⟶ Controle e Aprendizado por Reforço
**30. The goal of reinforcement learning is for an agent to learn how to evolve in an environment.** -⟶ O objetivo do aprendizado reforçado é fazer um agente aprender como evoluir em um ambiente. +⟶ O objetivo do aprendizado por reforço é fazer um agente aprender como evoluir em um ambiente.
@@ -204,7 +204,7 @@ **35. {Psa} are the state transition probabilities for s∈S and a∈A** -⟶ Psa são as probabilidade de transição de estado para s∈S e a∈A +⟶ {Psa} são as probabilidade de transição de estado para s∈S e a∈A
@@ -222,7 +222,7 @@ **38. Policy ― A policy π is a function π:S⟶A that maps states to actions.** -⟶ Diretriz - Uma diretriz π é a função π:S⟶A que mapeia os estados a ações. +⟶ Diretriz - Uma diretriz π é a função π:S⟶A que mapeia os estados em ações.
@@ -240,13 +240,13 @@ **41. Bellman equation ― The optimal Bellman equations characterizes the value function Vπ∗ of the optimal policy π∗:** -⟶ Equação de Bellman - As equações de Bellman ótimas caracterizam a função de valor Vπ∗ para a ótima diretriz π∗: +⟶ Equação de Bellman - As equações de Bellman ótimas descrevem a função de valor Vπ∗ a partir da diretriz ótima π∗:
**42. Remark: we note that the optimal policy π∗ for a given state s is such that:** -⟶ Observação: definimos que a ótima diretriz π∗ para um dado estado s é tal que: +⟶ Observe: definimos que a diretriz ótima π∗ para um dado estado s é tal que:
@@ -270,7 +270,7 @@ **46. Maximum likelihood estimate ― The maximum likelihood estimates for the state transition probabilities are as follows:** -⟶ Máxima probabilidade estimada - A máxima probabildiade estima para o estado de transição de probabilidades como se segue: +⟶ Estimador de Máxima Verossimilhança - O estimador de máxima verossimilhança para as probabilidades de transição de estados são como segue:
@@ -288,7 +288,7 @@ **49. Q-learning ― Q-learning is a model-free estimation of Q, which is done as follows:** -⟶ Aprendizado Q - Aprendizado Q é um modelo livre de estimativa de Q, o qual é feito como se segue: +⟶ Aprendizado-Q (Q-learning) - Aprendizado-Q é um modelo livre de estimativa de Q, o qual é feito como se segue:
@@ -306,16 +306,16 @@ **52. [Convolutional Neural Networks, Convolutional layer, Batch normalization]** -⟶ [Redes Neurais Convolucionais, Camada convolucional, Normalização em lote] +⟶ [Redes Neurais Convolucionais (CNN), Camada Convolucional, Normalização em lote]
**53. [Recurrent Neural Networks, Gates, LSTM]** -⟶[Redes Nerais Recorrentes, Portas (Gates), LSTM] +⟶[Redes Neurais Recorrentes (RNN), Portas (Gates), LSTM]
**54. [Reinforcement learning, Markov decision processes, Value/policy iteration, Approximate dynamic programming, Policy search]** -⟶ [Aprendizado reforçado, Processo de decisão de Markov, Iteração de valor/diretriz, Programação dinâmica aproximada, Busca de diretriz] +⟶ [Aprendizado por Reforço, Processo de Decisão de Markov, Iteração de valor/diretriz, Programação dinâmica aproximada, Busca de diretriz] From f9faf22346e2cd06ab6b4bd17a6ebc1762bd302c Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?=C3=86=20Piotti?= <46027345+aepiotti@users.noreply.github.com> Date: Fri, 13 Mar 2020 11:06:22 +0100 Subject: [PATCH 485/531] Update it/cs-229-linear-algebra.md Co-Authored-By: Nicola Dall'Asen --- it/cs-229-linear-algebra.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/it/cs-229-linear-algebra.md b/it/cs-229-linear-algebra.md index 83aeedb31..869606b2d 100644 --- a/it/cs-229-linear-algebra.md +++ b/it/cs-229-linear-algebra.md @@ -1,6 +1,6 @@ **Linear Algebra and Calculus translation** [[webpage]](https://stanford.edu/~shervine/teaching/cs-229/refresher-algebra-calculus) -⟶ Algebra lineare e Calcolo numerico +⟶ Algebra lineare e Analisi
From 0b356eeb96cd21260f5df820030b356c13c39fbf Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?=C3=86=20Piotti?= <46027345+aepiotti@users.noreply.github.com> Date: Fri, 13 Mar 2020 11:06:30 +0100 Subject: [PATCH 486/531] Update it/cs-229-linear-algebra.md Co-Authored-By: Nicola Dall'Asen --- it/cs-229-linear-algebra.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/it/cs-229-linear-algebra.md b/it/cs-229-linear-algebra.md index 869606b2d..e86d65b77 100644 --- a/it/cs-229-linear-algebra.md +++ b/it/cs-229-linear-algebra.md @@ -6,7 +6,7 @@ **1. Linear Algebra and Calculus refresher** -⟶ Ripasso di Algebra lineare e Calcolo +⟶ Ripasso di Algebra lineare e Analisi
From 5fb84d71a7faed41d8249d37a4b48a700367ff0f Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?=C3=86=20Piotti?= <46027345+aepiotti@users.noreply.github.com> Date: Fri, 13 Mar 2020 11:06:43 +0100 Subject: [PATCH 487/531] Update it/cs-229-linear-algebra.md Co-Authored-By: Nicola Dall'Asen --- it/cs-229-linear-algebra.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/it/cs-229-linear-algebra.md b/it/cs-229-linear-algebra.md index e86d65b77..4c011b3ba 100644 --- a/it/cs-229-linear-algebra.md +++ b/it/cs-229-linear-algebra.md @@ -168,7 +168,7 @@ **28. Determinant ― The determinant of a square matrix A∈Rn×n, noted |A| or det(A) is expressed recursively in terms of A∖i,∖j, which is the matrix A without its ith row and jth column, as follows:** -⟶ Determinante — Il determinante di una matrice quadrata A∈Rn×n, indicata con |A| o det(A) è espresso ricorsivamente come A\i,\j, che è la matrice A senza l'i-esima riga e la j-esima colonna, come segue: +⟶ Determinante — Il determinante di una matrice quadrata A∈Rn×n, indicata con |A| o det(A) è espresso ricorsivamente rispetto a A\i,\j, che è la matrice A senza l'i-esima riga e la j-esima colonna, come segue:
From 2260ceb94af092acf9fa1aa248748a12d337539b Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?=C3=86=20Piotti?= <46027345+aepiotti@users.noreply.github.com> Date: Fri, 13 Mar 2020 11:07:08 +0100 Subject: [PATCH 488/531] Apply suggestions from code review Co-Authored-By: Nicola Dall'Asen --- it/cs-229-linear-algebra.md | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/it/cs-229-linear-algebra.md b/it/cs-229-linear-algebra.md index 4c011b3ba..4d05b1fc4 100644 --- a/it/cs-229-linear-algebra.md +++ b/it/cs-229-linear-algebra.md @@ -192,7 +192,7 @@ **32. Symmetric decomposition ― A given matrix A can be expressed in terms of its symmetric and antisymmetric parts as follows:** -⟶ Scomposizione simmetrica — Una matrice A può essere espressa tramite la sua componente smmetricaed antisimmetrica come segue: +⟶ Decomposizione simmetrica — Una matrice A può essere espressa tramite la sua componente simmetrica ed antisimmetrica come segue:
@@ -204,7 +204,7 @@ **34. Norm ― A norm is a function N:V⟶[0,+∞[ where V is a vector space, and such that for all x,y∈V, we have:** -⟶ Norma — La norma è una funzione N:V⟶[0,+∞[ dove V è uno spazioe vettoriale, tale che per +⟶ Norma — La norma è una funzione N:V⟶[0,+∞[ dove V è uno spazio vettoriale, tale che per x,y∈V, abbiamo che:
@@ -247,7 +247,7 @@ x,y∈V, abbiamo che: **41. Matrix rank ― The rank of a given matrix A is noted rank(A) and is the dimension of the vector space generated by its columns. This is equivalent to the maximum number of linearly independent columns of A.** -⟶ Rango di una matrice — Il rango di una data matrice A si indica rank(A) ed è della dimensione dello spazio vettoriale generato dalle sue colonne. Questo è equivalente al numero massimo di colonne linearmente dipendenti di A. +⟶ Rango di una matrice — Il rango di una data matrice A si indica rg(A) ed è la dimensione dello spazio vettoriale generato dalle sue colonne. Questo è equivalente al numero massimo di colonne linearmente indipendenti di A.
@@ -271,7 +271,7 @@ x,y∈V, abbiamo che: **45. Spectral theorem ― Let A∈Rn×n. If A is symmetric, then A is diagonalizable by a real orthogonal matrix U∈Rn×n. By noting Λ=diag(λ1,...,λn), we have:** -⟶ Teorema spettrale — Sia A∈Rn×n. Se A è simmetrico, allora A è diagonalizzabile da una matrice reale ortogonale U∈Rn×n. Osservando Λ=diag(λ1,...,λn), abbiamo che: +⟶ Teorema spettrale — Sia A∈Rn×n. Se A è simmetrico, allora A è diagonalizzabile da una matrice reale ortogonale U∈Rn×n. Chiamando Λ=diag(λ1,...,λn), abbiamo che:
@@ -283,7 +283,7 @@ x,y∈V, abbiamo che: **47. Singular-value decomposition ― For a given matrix A of dimensions m×n, the singular-value decomposition (SVD) is a factorization technique that guarantees the existence of U m×m unitary, Σ m×n diagonal and V n×n unitary matrices, such that:** -⟶ Decomposizione ai valori singolari — Per una data matrice A di dimensione m×n, la decomposizione ai valori singolari (SVD) è una tecnica di fattorizzazione che garantisce l'esistenza della matrice unitaria m×m, della matrice diagonale Σ m×n e della matrice unitaria V n×n, tale che: +⟶ Decomposizione ai valori singolari — Per una data matrice A di dimensione m×n, la decomposizione ai valori singolari (SVD) è una tecnica di fattorizzazione che garantisce l'esistenza della matrice unitaria U m×m, della matrice diagonale Σ m×n e della matrice unitaria V n×n, tale che:
@@ -307,7 +307,7 @@ x,y∈V, abbiamo che: **51. Hessian ― Let f:Rn→R be a function and x∈Rn be a vector. The hessian of f with respect to x is a n×n symmetric matrix, noted ∇2xf(x), such that:** -⟶ Matrice Hessiana — Sia f:Rn→R una funzione e un x∈Rn vettore. La matrice hessiana di f in funzione di x è una matrice non simmetrica, indicata con ∇2xf(x), tale che: +⟶ Matrice Hessiana — Sia f:Rn→R una funzione e x∈Rn un vettore. La matrice hessiana di f in funzione di x è una matrice simmetrica n×n, indicata con ∇2xf(x), tale che:
From 4201ca15aea4a8dc2c96882824b3c916cbcaaa08 Mon Sep 17 00:00:00 2001 From: Nicola Dall'Asen Date: Fri, 13 Mar 2020 14:07:38 +0100 Subject: [PATCH 489/531] fix --- it/cs-229-probability.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/it/cs-229-probability.md b/it/cs-229-probability.md index 85ab60ee8..7165a859e 100644 --- a/it/cs-229-probability.md +++ b/it/cs-229-probability.md @@ -286,7 +286,7 @@ **48. Estimator ― An estimator is a function of the data that is used to infer the value of an unknown parameter in a statistical model.** -⟶ Stimatore ― Uno stimatore è una funzione dei dati usata per dedurre il valore di un parametro sconosciuto in un modello statistico. +⟶ Stimatore ― Uno stimatore è una funzione dei dati usata per dedurre il valore di un parametro sconosciuto in un modello statistico.
From fd2d0236f0d664ae2a4a2dc9735bfa4b88c66c18 Mon Sep 17 00:00:00 2001 From: Nicola Dall'Asen Date: Fri, 13 Mar 2020 14:07:56 +0100 Subject: [PATCH 490/531] edit contributors --- CONTRIBUTORS | 3 +++ 1 file changed, 3 insertions(+) diff --git a/CONTRIBUTORS b/CONTRIBUTORS index eff44e7e5..056db18a6 100644 --- a/CONTRIBUTORS +++ b/CONTRIBUTORS @@ -97,6 +97,9 @@ Prasetia Utama Putra (translation of convolutional neural networks) Gunawan Tri (review of convolutional neural networks) +--it + Nicola Dall'Asen (translation of probabilities and statistics) + --ko Wooil Jeong (translation of machine learning tips and tricks) From dc0f3ebe502ec8e53be44b0ef8ac187f2dd37a5d Mon Sep 17 00:00:00 2001 From: Shervine Amidi Date: Sun, 22 Mar 2020 17:23:56 -0700 Subject: [PATCH 491/531] Update CONTRIBUTORS --- CONTRIBUTORS | 1 + 1 file changed, 1 insertion(+) diff --git a/CONTRIBUTORS b/CONTRIBUTORS index eff44e7e5..69a24b651 100644 --- a/CONTRIBUTORS +++ b/CONTRIBUTORS @@ -141,6 +141,7 @@ Gabriel Fonseca (translation of deep learning) Leticia Portella (review of deep learning) + Renato Kano (review of deep learning) Gabriel Fonseca (translation of linear algebra) Leticia Portella (review of linear algebra) From 8b825657fe5a3bef75126b4247200f5af33e28af Mon Sep 17 00:00:00 2001 From: Shervine Amidi Date: Sun, 22 Mar 2020 17:28:28 -0700 Subject: [PATCH 492/531] Update CONTRIBUTORS --- CONTRIBUTORS | 1 + 1 file changed, 1 insertion(+) diff --git a/CONTRIBUTORS b/CONTRIBUTORS index 056db18a6..5b1a2db93 100644 --- a/CONTRIBUTORS +++ b/CONTRIBUTORS @@ -99,6 +99,7 @@ --it Nicola Dall'Asen (translation of probabilities and statistics) + Æ Piotti (review of probabilities and statistics) --ko Wooil Jeong (translation of machine learning tips and tricks) From f270cc2b941b60356874c6cd6367ee9c9e76f2ae Mon Sep 17 00:00:00 2001 From: Shervine Amidi Date: Sun, 22 Mar 2020 17:30:36 -0700 Subject: [PATCH 493/531] Update [it] progress --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index 7d0567337..08e737907 100644 --- a/README.md +++ b/README.md @@ -65,7 +65,7 @@ Please make sure to propose the translation of **only one** cheatsheet per pull |**हिन्दी**|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/37)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/46)|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/40)|not started|not started| |**Magyar**|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/124)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/124)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/124)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/124)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/124)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/124)| |**Bahasa Indonesia**|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/154)|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/139)|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/151)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/150)| -|**Italiano**|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/78)|not started|not started|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/201)|[in 
progress](https://github.com/shervinea/cheatsheet-translation/pull/202)| +|**Italiano**|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/78)|not started|not started|not started|done|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/202)| |**日本語**|done|done|done|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/99)|done|done| |**한국어**|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/80)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/90)|done|done|done|done| |**Polski**|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/8)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/8)|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/8)|not started|not started| From 0963e8310a7f3ef0e6323d176c33c2a1844dc4b2 Mon Sep 17 00:00:00 2001 From: Shervine Amidi Date: Sun, 22 Mar 2020 17:32:42 -0700 Subject: [PATCH 494/531] Update CONTRIBUTORS --- CONTRIBUTORS | 1 + 1 file changed, 1 insertion(+) diff --git a/CONTRIBUTORS b/CONTRIBUTORS index 9cc9ca182..c4840daf6 100644 --- a/CONTRIBUTORS +++ b/CONTRIBUTORS @@ -99,6 +99,7 @@ --it Alessandro Piotti (translation of linear algebra) + Nicola Dall'Asen (review of linear algebra) --ko Wooil Jeong (translation of machine learning tips and tricks) From b34ce7ec58de48eaee6a41dd954c0264ae088b9e Mon Sep 17 00:00:00 2001 From: Shervine Amidi Date: Sun, 22 Mar 2020 17:35:22 -0700 Subject: [PATCH 495/531] Update progress --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index 08e737907..0c98fb520 100644 --- a/README.md +++ b/README.md @@ -65,7 +65,7 @@ Please make sure to propose the translation of **only one** cheatsheet per pull |**हिन्दी**|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/37)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/46)|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/40)|not started|not started| |**Magyar**|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/124)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/124)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/124)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/124)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/124)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/124)| |**Bahasa Indonesia**|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/154)|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/139)|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/151)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/150)| -|**Italiano**|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/78)|not started|not started|not started|done|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/202)| +|**Italiano**|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/78)|not started|not started|not started|done|done| |**日本語**|done|done|done|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/99)|done|done| |**한국어**|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/80)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/90)|done|done|done|done| |**Polski**|[in 
progress](https://github.com/shervinea/cheatsheet-translation/pull/8)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/8)|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/8)|not started|not started| From 2bc1d373fb0756deb115b0efe4b7cc6882f52fb5 Mon Sep 17 00:00:00 2001 From: Shervine Amidi Date: Sun, 22 Mar 2020 19:29:05 -0700 Subject: [PATCH 496/531] Add contributors --- CONTRIBUTORS | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/CONTRIBUTORS b/CONTRIBUTORS index db83fa3d0..ddd105875 100644 --- a/CONTRIBUTORS +++ b/CONTRIBUTORS @@ -182,6 +182,12 @@ Gregory Reshetniak (translation of probabilities and statistics) Denys (review of probabilities and statistics) +--vi + Tran Tuan Anh (translation of recurrent neural networks) + Dam Minh Tien (review of recurrent neural networks) + Hung Nguyễn (review of recurrent neural networks) + Nguyễn Trí Minh (review of recurrent neural networks) + --zh Wang Hongnian (translation of supervised learning) Xiaohu Zhu (朱小虎) (review of supervised learning) From 5f8f55deb76bb6547d9e696599e8885b2b65f6ae Mon Sep 17 00:00:00 2001 From: Shervine Amidi Date: Sun, 22 Mar 2020 19:30:07 -0700 Subject: [PATCH 497/531] Update progress --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index ba299d551..a99d1c9c3 100644 --- a/README.md +++ b/README.md @@ -99,7 +99,7 @@ Please make sure to propose the translation of **only one** cheatsheet per pull |**Русский**|not started|not started|not started| |**Türkçe**|done|done|done| |**Українська**|not started|not started|not started| -|**Tiếng Việt**|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/180)|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/178)| +|**Tiếng Việt**|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/180)|done|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/178)| |**简体中文**|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/181)|not started| |**繁體中文**|not started|not started|not started| From 00bf2424b040c889ef7e974c60fdd5d4ee3c3b5c Mon Sep 17 00:00:00 2001 From: Shervine Amidi Date: Sun, 22 Mar 2020 19:45:33 -0700 Subject: [PATCH 498/531] Update CONTRIBUTORS --- CONTRIBUTORS | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/CONTRIBUTORS b/CONTRIBUTORS index c2c4e3ae1..d28e62daf 100644 --- a/CONTRIBUTORS +++ b/CONTRIBUTORS @@ -178,6 +178,10 @@ Gregory Reshetniak (translation of probabilities and statistics) Denys (review of probabilities and statistics) +--vi + Phạm Hồng Vinh (translation of convolutional neural networks) + Dam Minh Tien (review of convolutional neural networks) + --zh Wang Hongnian (translation of supervised learning) Xiaohu Zhu (朱小虎) (review of supervised learning) From 3482a16eb75291e2edf92160b6a94dc1b1f01e2b Mon Sep 17 00:00:00 2001 From: Shervine Amidi Date: Sun, 22 Mar 2020 19:46:32 -0700 Subject: [PATCH 499/531] Update progress --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index 48c60d76d..e680b2a22 100644 --- a/README.md +++ b/README.md @@ -97,7 +97,7 @@ Please make sure to propose the translation of **only one** cheatsheet per pull |**Русский**|not started|not started|not started| |**Türkçe**|done|done|done| |**Українська**|not started|not started|not started| -|**Tiếng Việt**|not started|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/178)| 
+|**Tiếng Việt**|done|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/178)| |**中文**|not started|not started|not started| ## Acknowledgements From f91e8678f04a94ac6ec56209231041b32842ed8f Mon Sep 17 00:00:00 2001 From: Tran Tuan Anh Date: Tue, 31 Mar 2020 12:31:13 +0900 Subject: [PATCH 500/531] vi translate for ML tips and tricks --- ...tsheet-machine-learning-tips-and-tricks.md | 285 ++++++++++++++++++ 1 file changed, 285 insertions(+) create mode 100644 vi/cheatsheet-machine-learning-tips-and-tricks.md diff --git a/vi/cheatsheet-machine-learning-tips-and-tricks.md b/vi/cheatsheet-machine-learning-tips-and-tricks.md new file mode 100644 index 000000000..d08b7cd9a --- /dev/null +++ b/vi/cheatsheet-machine-learning-tips-and-tricks.md @@ -0,0 +1,285 @@ +**1. Machine Learning tips and tricks cheatsheet** + +⟶ Các mẹo và thủ thuật trong Machine Learning (Học máy) + +
+ +**2. Classification metrics** + +⟶ Độ đo phân loại + +
+ +**3. In a context of a binary classification, here are the main metrics that are important to track in order to assess the performance of the model.** + +⟶ Đối với phân loại nhị phân (binary classification) là các độ đo chính, chúng khá quan trọng để theo dõi (track), qua đó đánh giá hiệu năng của mô hình (model) + +
+ +**4. Confusion matrix ― The confusion matrix is used to have a more complete picture when assessing the performance of a model. It is defined as follows:** + +⟶ Ma trận nhầm lẫn (Confusion matrix) - Confusion matrix được sử dụng để có kết quả hoàn chỉnh hơn khi đánh giá hiệu năng của model. Nó được định nghĩa như sau: + +
+ +**5. [Predicted class, Actual class]** + +⟶ [Lớp dự đoán, lớp thực sự] + +
+ +**6. Main metrics ― The following metrics are commonly used to assess the performance of classification models:** + +⟶ Độ đo chính - Các độ đo sau thường được sử dụng để đánh giá hiệu năng của mô hình phân loại: + +
+ +**7. [Metric, Formula, Interpretation]** + +⟶ [Độ đo, Công thức, Diễn giải] + +
+ +**8. Overall performance of model** + +⟶ Hiệu năng tổng thể của mô hình + +
+ +**9. How accurate the positive predictions are** + +⟶ Độ chính xác của các dự đoán positive + +
+ +**10. Coverage of actual positive sample** + +⟶ Bao phủ các mẫu thử chính xác (positive) thực sự + +
+ +**11. Coverage of actual negative sample** + +⟶ Bao phủ các mẫu thử sai (negative) thực sự + +
+ +**12. Hybrid metric useful for unbalanced classes** + +⟶ Độ đo Hybrid hữu ích cho các lớp không cân bằng (unbalanced classes) + +
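The accuracy, precision, recall, specificity and F1 score summarized in entries 6-12 can be checked with a small sketch; the label vectors below are invented and the positive class is assumed to be 1.

```python
# Main binary-classification metrics from the confusion-matrix counts.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))

accuracy    = (tp + tn) / (tp + tn + fp + fn)        # overall performance of the model
precision   = tp / (tp + fp)                         # how accurate the positive predictions are
recall      = tp / (tp + fn)                         # coverage of actual positive samples (TPR)
specificity = tn / (tn + fp)                         # coverage of actual negative samples (TNR)
f1 = 2 * precision * recall / (precision + recall)   # hybrid metric for unbalanced classes

print(accuracy, precision, recall, specificity, f1)
```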
+ +**13. ROC ― The receiver operating curve, also noted ROC, is the plot of TPR versus FPR by varying the threshold. These metrics are are summed up in the table below:** + +⟶ ROC - Đường cong thao tác nhận, được kí hiệu là ROC, là minh hoạ của TPR với FPR bằng việc thay đổi ngưỡng (threshold). Các độ đo này được tổng kết ở bảng bên dưới: + +
+ +**14. [Metric, Formula, Equivalent]** + +⟶ [Độ đo, Công thức, Tương đương] + +
+ +**15. AUC ― The area under the receiving operating curve, also noted AUC or AUROC, is the area below the ROC as shown in the following figure:** + +⟶ AUC - Khu vực phía dưới đường cong thao tác nhận, còn được gọi tắt là AUC hoặc AUROC, là khu vực phía dưới ROC như hình minh hoạ phía dưới: + +
+ +**16. [Actual, Predicted]** + +⟶ [Thực sự, Dự đoán] + +
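Entries 13-16 can be made concrete by sweeping the decision threshold over the predicted scores; the scores and labels below are invented toy data, and the trapezoidal integration is an implementation choice rather than something prescribed by the cheatsheet.

```python
import numpy as np

# ROC points (FPR, TPR) at varying thresholds and the corresponding AUC.
y_true = np.array([0, 0, 1, 1, 0, 1, 1, 0])
scores = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.9, 0.65, 0.5])

tpr, fpr = [], []
for t in np.sort(np.unique(scores))[::-1]:      # thresholds from high to low
    pred = (scores >= t).astype(int)
    tp = np.sum((pred == 1) & (y_true == 1))
    fp = np.sum((pred == 1) & (y_true == 0))
    tpr.append(tp / np.sum(y_true == 1))        # true positive rate (recall)
    fpr.append(fp / np.sum(y_true == 0))        # false positive rate

fpr = np.concatenate(([0.0], fpr, [1.0]))       # add the (0,0) and (1,1) endpoints
tpr = np.concatenate(([0.0], tpr, [1.0]))
auc = np.sum((fpr[1:] - fpr[:-1]) * (tpr[1:] + tpr[:-1]) / 2)   # trapezoidal rule
print(auc)
```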
+ +**17. Basic metrics ― Given a regression model f, the following metrics are commonly used to assess the performance of the model:** + +⟶ Độ đo cơ bản - Cho trước mô hình hồi quy f, độ đo sau được sử dụng phổ biến để đánh giá hiệu năng của mô hình: + +
+ +**18. [Total sum of squares, Explained sum of squares, Residual sum of squares]** + +⟶ [Tổng của tổng các bình phương, Mô hình tổng bình phương, Tổng bình phương dư] + +
+ +**19. Coefficient of determination ― The coefficient of determination, often noted R2 or r2, provides a measure of how well the observed outcomes are replicated by the model and is defined as follows:** + +⟶ Hệ số quyết định - Hệ số quyết định, thường được kí hiệu là R2 hoặc r2, cung cấp độ đo mức độ tốt của kết quả quan sát đầu ra (được nhân rộng bởi mô hình), và được định nghĩa như sau: + +
+ +**20. Main metrics ― The following metrics are commonly used to assess the performance of regression models, by taking into account the number of variables n that they take into consideration:** + +⟶ Độ đo chính - Độ đo sau đây thường được sử dụng để đánh giá hiệu năng của mô hình hồi quy, bằng cách tính số lượng các biến n mà độ đo đó sẽ cân nhắc: + +
+ +**21. where L is the likelihood and ˆσ2 is an estimate of the variance associated with each response.** + +⟶ trong đó L là khả năng và ˆσ2 là giá trị ước tính của phương sai tương ứng với mỗi response (hồi đáp). + +
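The sums of squares of entry 18 and the coefficient of determination of entry 19 fit in a few lines; y and y_hat below stand for made-up targets and model predictions.

```python
import numpy as np

# Sum-of-squares decomposition and R^2 = 1 - SSres / SStot.
y     = np.array([3.0, 5.0, 7.0, 9.0])
y_hat = np.array([2.8, 5.3, 6.6, 9.1])

ss_tot = np.sum((y - y.mean()) ** 2)        # total sum of squares
ss_exp = np.sum((y_hat - y.mean()) ** 2)    # explained sum of squares
ss_res = np.sum((y - y_hat) ** 2)           # residual sum of squares
r2 = 1.0 - ss_res / ss_tot
print(ss_tot, ss_exp, ss_res, r2)
```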
+ +**22. Model selection** + +⟶ Lựa chọn model (mô hình) + +
+ +**23. Vocabulary ― When selecting a model, we distinguish 3 different parts of the data that we have as follows:** + +⟶ Vocabulary - Khi lựa chọn mô hình, chúng ta chia tập dữ liệu thành 3 tập con như sau: + +
+ +**24. [Training set, Validation set, Testing set]** + +⟶ [Tập huấn luyện, Tập xác thực, Tập kiểm tra (testing)] + +
+ +**25. [Model is trained, Model is assessed, Model gives predictions]** + +⟶ [Mô hình được huấn luyện, mô hình được xác thực, mô hình đưa ra dự đoán] + +
+ +**26. [Usually 80% of the dataset, Usually 20% of the dataset]** + +⟶ [Thường là 80% tập dữ liệu, Thường là 20% tập dữ liệu] + +
+ +**27. [Also called hold-out or development set, Unseen data]** + +⟶ [Cũng được gọi là hold-out hoặc development set (tập phát triển), Dữ liệu chưa được biết] + +
+ +**28. Once the model has been chosen, it is trained on the entire dataset and tested on the unseen test set. These are represented in the figure below:** + +⟶ Khi mô hình đã được chọn, nó sẽ được huấn luyện trên tập dữ liệu đầu vào và được test trên tập dữ liệu test hoàn toàn khác. Tất cả được minh hoạ ở hình bên dưới: + +
+ +**29. Cross-validation ― Cross-validation, also noted CV, is a method that is used to select a model that does not rely too much on the initial training set. The different types are summed up in the table below:** + +⟶ Cross-validation - Cross-validation, còn được gọi là CV, một phương thức được sử dụng để chọn ra một mô hình không dựa quá nhiều vào tập dữ liệu huấn luyện ban đầu. Các loại khác nhau được tổng kết ở bảng bên dưới: + +
+ +**30. [Training on k−1 folds and assessment on the remaining one, Training on n−p observations and assessment on the p remaining ones]** + +⟶ [Huấn luyện trên k-1 phần và đánh giá trên 1 phần còn lại, Huấn luyện trên n-p phần và đánh giá trên p phần còn lại] + +
+ +**31. [Generally k=5 or 10, Case p=1 is called leave-one-out]** + +⟶ [Thường thì k=5 hoặc 10, Trường hợp p=1 được gọi là leave-one-out] + +
+ +**32. The most commonly used method is called k-fold cross-validation and splits the training data into k folds to validate the model on one fold while training the model on the k−1 other folds, all of this k times. The error is then averaged over the k folds and is named cross-validation error.** + +⟶ Phương thức hay được sử dụng được gọi là k-fold cross-validation và chia dữ liệu huấn luyện thành k phần, đánh giá mô hình trên 1 phần trong khi huấn luyện mô hình trên k-1 phần còn lại, tất cả k lần. Lỗi sau đó được tính trung bình trên k phần và được đặt tên là cross-validation error. + +
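The k-fold procedure described in entry 32 can be sketched as follows; the least-squares fit/predict helpers and the synthetic data are assumptions made so the example runs on its own.

```python
import numpy as np

def fit(X, y):
    # plain least-squares fit of y ~ X w, standing in for an arbitrary model
    return np.linalg.lstsq(X, y, rcond=None)[0]

def predict(X, w):
    return X @ w

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=100)

k = 5
folds = np.array_split(rng.permutation(len(X)), k)

errors = []
for i in range(k):                           # validate on fold i, train on the other k-1
    val_idx = folds[i]
    train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
    w = fit(X[train_idx], y[train_idx])
    errors.append(np.mean((predict(X[val_idx], w) - y[val_idx]) ** 2))

print(np.mean(errors))                       # cross-validation error, averaged over the k folds
```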
+ +**33. Regularization ― The regularization procedure aims at avoiding the model to overfit the data and thus deals with high variance issues. The following table sums up the different types of commonly used regularization techniques:** + +⟶ Chuẩn hoá - Mục đích của thủ tục chuẩn hoá là tránh cho mô hình bị overfit với dữ liệu, do đó gặp phải vấn đề phương sai lớn. Bảng sau đây sẽ tổng kết các loại kĩ thuật chuẩn hoá khác nhau hay được sử dụng: + +
+ +**34. [Shrinks coefficients to 0, Good for variable selection, Makes coefficients smaller, Tradeoff between variable selection and small coefficients]** + +⟶ [Giảm hệ số xuống còn 0, Tốt cho việc lựa chọn biến, Làm cho hệ số nhỏ hơn, Thay đổi giữa chọn biến và hệ số nhỏ hơn] + +
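As one concrete instance of the regularization techniques listed above, here is a hedged sketch of ridge (L2) regression in closed form; the penalty strength and the synthetic data are arbitrary choices for illustration.

```python
import numpy as np

# Ridge regression: w = (X^T X + lambda I)^{-1} X^T y.
# Larger lambda makes the coefficients smaller; lambda -> 0 recovers ordinary least squares.
rng = np.random.default_rng(1)
X = rng.normal(size=(50, 4))
y = X @ np.array([2.0, 0.0, -1.0, 3.0]) + 0.05 * rng.normal(size=50)

lam = 1.0                                   # regularization strength (arbitrary)
w_ridge = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)
print(w_ridge)
```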
+ +**35. Diagnostics** + +⟶ Dự đoán (Diagnostics) + +
+ +**36. Bias ― The bias of a model is the difference between the expected prediction and the correct model that we try to predict for given data points.** + +⟶ Bias - Bias của mô hình là sai số giữa dự đoán mong đợi và dự đoán của mô hình trên các điểm dữ liệu cho trước. + +
+ +**37. Variance ― The variance of a model is the variability of the model prediction for given data points.** + +⟶ Phương sai - Phương sai của một mô hình là sự thay đổi dự đoán của mô hình trên các điểm dữ liệu cho trước. + +
+ +**38. Bias/variance tradeoff ― The simpler the model, the higher the bias, and the more complex the model, the higher the variance.** + +⟶ Thay đổi/ Thay thế Bias/phương sai - Mô hình càng đơn giản bias càng lớn, mô hình càng phức tạp phương sai càng cao. + +
+ +**39. [Symptoms, Regression illustration, classification illustration, deep learning illustration, possible remedies]** + +⟶ [Symptoms, Minh hoạ hồi quy, Minh hoạ phân loại, Minh hoạ deep learning (học sâu), Biện pháp khắc phục có thể dùng] + +
+ +**40. [High training error, Training error close to test error, High bias, Training error slightly lower than test error, Very low training error, Training error much lower than test error, High variance]** + +⟶ [Lỗi huấn luyện cao, Lỗi huấn luyện tiến gần tới lỗi test, Bias cao, Lỗi huấn luyện thấp hơn một chút so với lỗi test, Lỗi huấn luyện rất thấp, Lỗi huấn luyện thấp hơn lỗi test rất nhiều, Phương sai cao] + +
+ +**41. [Complexify model, Add more features, Train longer, Perform regularization, Get more data]** + +⟶ [Mô hình phức tạp, Thêm nhiều đặc trưng, Huấn luyện lâu hơn, Thực hiện chuẩn hóa, Lấy nhiều dữ liệu hơn] + +
+ +**42. Error analysis ― Error analysis is analyzing the root cause of the difference in performance between the current and the perfect models.** + +⟶ Phân tích lỗi - Phân tích lỗi là phân tích nguyên nhân của sự khác biệt trong hiệu năng giữa mô hình hiện tại và mô hình lí tưởng. + +
+ +**43. Ablative analysis ― Ablative analysis is analyzing the root cause of the difference in performance between the current and the baseline models.** + +⟶ Phân tích Ablative - Phân tích Ablative là phân tích nguyên nhân của sự khác biệt giữa hiệu năng của mô hình hiện tại và mô hình cơ sở. + +
+ +**44. Regression metrics** + +⟶ Độ đo hồi quy + +
+ +**45. [Classification metrics, confusion matrix, accuracy, precision, recall, F1 score, ROC]** + +⟶ [Độ đo phân loại, Ma trận nhầm lẫn, chính xác, dự đoán, recall, Điểm F1, ROC] + +
+ +**46. [Regression metrics, R squared, Mallow's CP, AIC, BIC]** + +⟶ [Độ đo hồi quy, Bình phương R, CP của Mallow, AIC, BIC] + +
+ +**47. [Model selection, cross-validation, regularization]** + +⟶ [Lựa chọn mô hình, cross-validation, Chuẩn hoá (regularization)] + +
+ +**48. [Diagnostics, Bias/variance tradeoff, error/ablative analysis]** + +⟶ [Dự đoán, Đánh đổi Bias/phương sai, Phân tích lỗi/ablative] From adb28d855059198e3ec31cb57a0e9758a1db8916 Mon Sep 17 00:00:00 2001 From: Tran Tuan Anh Date: Wed, 1 Apr 2020 14:39:33 +0900 Subject: [PATCH 501/531] fix vi translate unsupervised learning --- vi/cs-229-unsupervised-learning.md | 20 ++++++++++---------- 1 file changed, 10 insertions(+), 10 deletions(-) diff --git a/vi/cs-229-unsupervised-learning.md b/vi/cs-229-unsupervised-learning.md index e12309283..6da806cb9 100644 --- a/vi/cs-229-unsupervised-learning.md +++ b/vi/cs-229-unsupervised-learning.md @@ -16,7 +16,7 @@ **3. Motivation ― The goal of unsupervised learning is to find hidden patterns in unlabeled data {x(1),...,x(m)}.** -⟶ Động lực ― Mục tiêu của học không giám sát là tìm được mẫu ẩn (hidden pattern) trong tập dữ liệu không được gán nhãn {x(1),...,x(m)}. +⟶ Động lực ― Mục tiêu của học không giám sát là tìm được quy luật ẩn (hidden pattern) trong tập dữ liệu không được gán nhãn {x(1),...,x(m)}.
@@ -40,7 +40,7 @@ **7. Latent variables ― Latent variables are hidden/unobserved variables that make estimation problems difficult, and are often denoted z. Here are the most common settings where there are latent variables:** -⟶ Các biến Latent - Các biến Latent là các biến ẩn/ không thấy được khiến cho việc dự đoán trở nên khó khăn, và thường được kí hiệu là z. ĐÂy là các thiết lập phổ biến mà các biến latent thường có: +⟶ Các biến Latent - Các biến Latent là các biến ẩn/ không thấy được khiến cho việc dự đoán trở nên khó khăn, và thường được kí hiệu là z. Đây là các thiết lập phổ biến mà các biến latent thường có:
@@ -94,7 +94,7 @@ **16. Algorithm ― After randomly initializing the cluster centroids μ1,μ2,...,μk∈Rn, the k-means algorithm repeats the following step until convergence:** -⟶ Thuật toán - Sau khi khởi tạo ngẫu nhiên các tâm của cụm (centroids) μ1,μ2,...,μk∈Rn, thuật toán k-means lặp lại bước sau cho đến khi hội tụ: +⟶ Thuật toán - Sau khi khởi tạo ngẫu nhiên các tâm cụm (centroids) μ1,μ2,...,μk∈Rn, thuật toán k-means lặp lại bước sau cho đến khi hội tụ:
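The two alternating steps of the k-means entry in the hunk above read almost directly as code; the toy data and the choice k=2 are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(5, 1, (50, 2))])
k = 2
centroids = X[rng.choice(len(X), k, replace=False)]   # random initialization

for _ in range(100):
    # step 1: assign each point to its closest centroid
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    labels = dists.argmin(axis=1)
    # step 2: move each centroid to the mean of the points assigned to it
    new_centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])
    if np.allclose(new_centroids, centroids):          # stop at convergence
        break
    centroids = new_centroids

print(centroids)
```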
@@ -148,7 +148,7 @@ **25. In an unsupervised learning setting, it is often hard to assess the performance of a model since we don't have the ground truth labels as was the case in the supervised learning setting.** -⟶ Trong quá trình thiết lập học không giám sát, khá khó khăn để đánh giá hiệu năng của một mô hình vì chúng ta không có các nhãn đủ tin cậy như trong trường hợp của học có giám sát. +⟶ Trong quá trình thiết lập học không giám sát, sẽ khá khó khăn để đánh giá hiệu năng của một mô hình vì chúng ta không có các nhãn đủ tin cậy như trong trường hợp của học có giám sát.
@@ -166,7 +166,7 @@ **28. the Calinski-Harabaz index s(k) indicates how well a clustering model defines its clusters, such that the higher the score, the more dense and well separated the clusters are. It is defined as follows:** -⟶ Chỉ số Calinski-Harabaz s(k) cho biết khả năng phân cụm tốt đến đâu của một mô hình phân cụm, như là với score cao hơn, sẽ kém hơn và việc phân cụm tốt hơn. Nó được định nghĩa như sau: +⟶ Chỉ số Calinski-Harabaz s(k) cho biết khả năng phân cụm tốt đến đâu của một mô hình phân cụm, ví dụ như với score cao hơn thì sẽ dày đặc hơn và việc phân cụm tốt hơn. Nó được định nghĩa như sau:
@@ -184,13 +184,13 @@ **31. It is a dimension reduction technique that finds the variance maximizing directions onto which to project the data.** -⟶ Là một kĩ thuật giảm số chiều dữ liệu, kĩ thuật này sẽ tìm các hướng tối đa hoá phương sai để chiếu dữ liệu trên đó. +⟶ Là một kĩ thuật giảm số chiều dữ liệu, kĩ thuật này sẽ tìm các hướng tối đa hoá phương sai để chiếu dữ liệu lên trên đó.
**32. Eigenvalue, eigenvector ― Given a matrix A∈Rn×n, λ is said to be an eigenvalue of A if there exists a vector z∈Rn∖{0}, called eigenvector, such that we have:** -⟶ Giá trị riêng, vector riêng - Cho ma trận A∈Rn×n, λ là giá trị riêng của A nếu tồn tại một vector z∈Rn∖{0}, gọi là vector riêng, mà ta có như sau: +⟶ Giá trị riêng, vector riêng - Cho ma trận A∈Rn×n, λ là giá trị riêng của A nếu tồn tại một vector z∈Rn∖{0}, gọi là vector riêng, như vậy ta có:
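Entries 31 and 32 above connect as follows: PCA projects the centred data onto the top eigenvectors of the empirical covariance matrix. The data and the target dimension in this sketch are invented.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))

Xc = X - X.mean(axis=0)                       # centre the data
cov = Xc.T @ Xc / len(Xc)                     # empirical covariance matrix

eigvals, eigvecs = np.linalg.eigh(cov)        # symmetric matrix, eigenvalues ascending
top2 = eigvecs[:, ::-1][:, :2]                # eigenvectors of the two largest eigenvalues
X_reduced = Xc @ top2                         # projection onto the principal directions

# check the eigenvalue definition A z = lambda z on the leading eigenvector
z, lam = eigvecs[:, -1], eigvals[-1]
print(np.allclose(cov @ z, lam * z), X_reduced.shape)
```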
@@ -269,7 +269,7 @@ dimensions by maximizing the variance of the data as follows:** **45. Assumptions ― We assume that our data x has been generated by the n-dimensional source vector s=(s1,...,sn), where si are independent random variables, via a mixing and non-singular matrix A as follows:** -⟶ Giả định - Chúng ta giả sử rằng dữ liệu x của chúng ta được tạo ra bởi vector nguồn n-chiều s=(s1,...,sn), với si là các biến ngẫu nhiên độc lập, thông qua một ma trận mixing và non-singular A như sau: +⟶ Giả định - Chúng ta giả sử rằng dữ liệu x được tạo ra bởi vector nguồn n-chiều s=(s1,...,sn), với si là các biến ngẫu nhiên độc lập, thông qua một ma trận mixing và non-singular A như sau:
@@ -293,13 +293,13 @@ dimensions by maximizing the variance of the data as follows:** **49. Write the log likelihood given our training data {x(i),i∈[[1,m]]} and by noting g the sigmoid function as:** -⟶ Ghi log likelihood cho dữ liệu huấn luyện {x(i),i∈[[1,m]]} của chúng ta và bằng cách kí hiệu g là hàm sigmoid như là: +⟶ Ghi log likelihood cho dữ liệu huấn luyện {x(i),i∈[[1,m]]} và kí hiệu g là hàm sigmoid như sau:
**50. Therefore, the stochastic gradient ascent learning rule is such that for each training example x(i), we update W as follows:** -⟶ Vì thế, quy tắc học của stochastic gradient ascent là cho mỗi ví dụ huấn luyện x(i), chúng ta cập nhật W như sau: +⟶ Vì thế, quy tắc học của stochastic gradient ascent là với mỗi ví dụ huấn luyện x(i), chúng ta sẽ cập nhật W như sau:
From 919f417b16e374be3407702ed871fca228861488 Mon Sep 17 00:00:00 2001 From: Shervine Amidi Date: Sat, 11 Apr 2020 20:02:28 -0700 Subject: [PATCH 502/531] Rename cheatsheet-machine-learning-tips-and-tricks.md to cs-229-machine-learning-tips-and-tricks.md --- ...s-and-tricks.md => cs-229-machine-learning-tips-and-tricks.md} | 0 1 file changed, 0 insertions(+), 0 deletions(-) rename vi/{cheatsheet-machine-learning-tips-and-tricks.md => cs-229-machine-learning-tips-and-tricks.md} (100%) diff --git a/vi/cheatsheet-machine-learning-tips-and-tricks.md b/vi/cs-229-machine-learning-tips-and-tricks.md similarity index 100% rename from vi/cheatsheet-machine-learning-tips-and-tricks.md rename to vi/cs-229-machine-learning-tips-and-tricks.md From 2851ffa177ed19f3f9159100d90eece9ba8aa976 Mon Sep 17 00:00:00 2001 From: Shervine Amidi Date: Sat, 11 Apr 2020 20:12:29 -0700 Subject: [PATCH 503/531] Update [vi] progress --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index dd691e6ba..2153c44b0 100644 --- a/README.md +++ b/README.md @@ -73,7 +73,7 @@ Please make sure to propose the translation of **only one** cheatsheet per pull |**Русский**|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/21)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/21)|not started|not started|not started|not started| |**Türkçe**|done|done|done|done|done|done| |**Українська**|not started|not started|not started|not started|done|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/95)| -|**Tiếng Việt**|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/159)|done|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/199)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/160)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/175)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/176)| +|**Tiếng Việt**|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/159)|done|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/199)|done|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/175)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/176)| |**简体中文**|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/12)|done|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/48)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/7)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/73)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/72)| |**繁體中文**|done|done|done|done|done|done| From e07f657afaa708e1a39b19e6b40689da333686e0 Mon Sep 17 00:00:00 2001 From: Shervine Amidi Date: Sat, 11 Apr 2020 20:17:23 -0700 Subject: [PATCH 504/531] Update CONTRIBUTORS --- CONTRIBUTORS | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/CONTRIBUTORS b/CONTRIBUTORS index 42be371da..fc79c792f 100644 --- a/CONTRIBUTORS +++ b/CONTRIBUTORS @@ -216,6 +216,11 @@ Phạm Hồng Vinh (translation of convolutional neural networks) Dam Minh Tien (review of convolutional neural networks) + Tran Tuan Anh (translation of machine learning tips and tricks) + Nguyễn Trí Minh (review of machine learning tips and tricks) + Vinh Pham (review of machine learning tips and tricks) + Dam Minh Tien (review of machine learning tips and tricks) + Tran Tuan Anh (translation of recurrent neural networks) Dam Minh Tien 
(review of recurrent neural networks) Hung Nguyễn (review of recurrent neural networks) From a6918c3f4dcd4a684029b972369e2036996615c9 Mon Sep 17 00:00:00 2001 From: Shervine Amidi Date: Sat, 11 Apr 2020 21:35:13 -0700 Subject: [PATCH 505/531] Create cs-230-convolutional-neural-networks.md --- zh-tw/cs-230-convolutional-neural-networks.md | 715 ++++++++++++++++++ 1 file changed, 715 insertions(+) create mode 100644 zh-tw/cs-230-convolutional-neural-networks.md diff --git a/zh-tw/cs-230-convolutional-neural-networks.md b/zh-tw/cs-230-convolutional-neural-networks.md new file mode 100644 index 000000000..87e24704a --- /dev/null +++ b/zh-tw/cs-230-convolutional-neural-networks.md @@ -0,0 +1,715 @@ +**Convolutional Neural Networks translation** [[webpage]](https://stanford.edu/~shervine/teaching/cs-230/cheatsheet-convolutional-neural-networks) + +
+ +**1. Convolutional Neural Networks cheatsheet** + +⟶ 卷積神經網路 +
+ + +**2. CS 230 - Deep Learning** + +⟶ CS230 - 深度學習 + +
+ + +**3. [Overview, Architecture structure]** + +⟶ [概論, 架構結構] + +
+ + +**4. [Types of layer, Convolution, Pooling, Fully connected]** + +⟶ [層的種類, 卷積, 池化, 全連接] + +
+ + +**5. [Filter hyperparameters, Dimensions, Stride, Padding]** + +⟶ [卷積核超參數, 維度, 滑動間隔, 填充] + +
+ + +**6. [Tuning hyperparameters, Parameter compatibility, Model complexity, Receptive field]** + +⟶ [調整超參數, 參數相容性, 模型複雜度, 感知區域] + +
+ + +**7. [Activation functions, Rectified Linear Unit, Softmax]** + +⟶ [激活函數, 線性整流函數, 歸一化指數函數] + +
+ + +**8. [Object detection, Types of models, Detection, Intersection over Union, Non-max suppression, YOLO, R-CNN]** + +⟶ [物體偵測, 模型種類, 偵測, 交併比, 非最大值抑制, YOLO, 區域卷積神經網路] + +
+ + +**9. [Face verification/recognition, One shot learning, Siamese network, Triplet loss]** + +⟶ [人臉驗證/辨別, 單樣本學習, 孿生網路, 三重損失函數] + +
+ + +**10. [Neural style transfer, Activation, Style matrix, Style/content cost function]** + +⟶ [神經風格轉換, 激發, 風格矩陣/內容矩陣, 風格/內容成本函數] + +
+ + +**11. [Computational trick architectures, Generative Adversarial Net, ResNet, Inception Network]** + +⟶ [計算架構手法, 生成對抗網路, 殘差網路, inception 網路] + +
+ + +**12. Overview** + +⟶ 概論 + +
+ + +**13. Architecture of a traditional CNN ― Convolutional neural networks, also known as CNNs, are a specific type of neural networks that are generally composed of the following layers:** + +⟶ 傳統卷積神經網路架構 - 卷積神經網路, 簡稱為 CNNs, 是一種神經網路的變形,通常由下列的層組成: + +
+ + +**14. The convolution layer and the pooling layer can be fine-tuned with respect to hyperparameters that are described in the next sections.** + +⟶ 卷積層和池化層可利用超參數來優化,詳細內容由下個部分敘述。 + +
+ + +**15. Types of layer** + +⟶ 層的種類 + +
+ + +**16. Convolution layer (CONV) ― The convolution layer (CONV) uses filters that perform convolution operations as it is scanning the input I with respect to its dimensions. Its hyperparameters include the filter size F and stride S. The resulting output O is called feature map or activation map.** + +⟶ 卷積層 (CONV) - 卷積層利用卷積核沿著輸入數據的維度進行掃描。其超參數包含卷積核的尺寸 F 和滑動間隔 S。輸出 O 稱為特徵圖或激發圖。 + +
+ + +**17. Remark: the convolution step can be generalized to the 1D and 3D cases as well.** + +⟶ 備註:卷積之運算亦可推廣為一維或三維。 + +
+ + +**18. Pooling (POOL) ― The pooling layer (POOL) is a downsampling operation, typically applied after a convolution layer, which does some spatial invariance. In particular, max and average pooling are special kinds of pooling where the maximum and average value is taken, respectively.** + +⟶ 池化層 (POOL) - 池化層用於降低取樣頻率,通常用於卷積層之後以處理空間變異性。其中,最大池化與平均池化,分別選取池中之最大值與平均值,為特別的池化種類。 + +
+ + +**19. [Type, Purpose, Illustration, Comments]** + +⟶ [種類, 目的, 圖示, 註解] + +
+ + +**20. [Max pooling, Average pooling, Each pooling operation selects the maximum value of the current view, Each pooling operation averages the values of the current view]** + +⟶ [最大池化層, 平均池化層, 每個池化計算該池中之最大值, 每個池化計算該池中平均值] + +
+ + +**21. [Preserves detected features, Most commonly used, Downsamples feature map, Used in LeNet]** + +⟶ [保留偵測到之特徵, 最常使用, 降低特徵圖之採樣頻率,於 LeNet 中使用] + +
+ + +**22. Fully Connected (FC) ― The fully connected layer (FC) operates on a flattened input where each input is connected to all neurons. If present, FC layers are usually found towards the end of CNN architectures and can be used to optimize objectives such as class scores.** + +⟶ 全連接層 (FC) - 全連接層之運作需要扁平的輸入,其中,所有的輸入數值與所有的神經元是全連接的。 + +
+ + +**23. Filter hyperparameters** + +⟶ 卷積核的超參數 + +
+ + +**24. The convolution layer contains filters for which it is important to know the meaning behind its hyperparameters.** + +⟶ 卷積層有卷積核,而了解其中超參數的意義是重要的。 + +
+ + +**25. Dimensions of a filter ― A filter of size F×F applied to an input containing C channels is a F×F×C volume that performs convolutions on an input of size I×I×C and produces an output feature map (also called activation map) of size O×O×1.** + +⟶ 卷積核的維度 - 一個尺寸為 F×F 的卷積核,套用在有 C 個頻道的輸入,是一個維度為 F×F×C 的體,計算卷積於輸入維度為 I×I×C,輸出一個維度為 O×O×1 的特徵圖。 + +
+ + +**26. Filter** + +⟶ 卷積核 + +
+ + +**27. Remark: the application of K filters of size F×F results in an output feature map of size O×O×K.** + +⟶ 備註:應用 K 個維度為 F×F 的卷積核會得到維度為 O×O×K 的特徵圖。 + +
+ + +**28. Stride ― For a convolutional or a pooling operation, the stride S denotes the number of pixels by which the window moves after each operation.** + +⟶ 滑動間隔 - 對卷積或池化的運算,滑動間隔S表示每次運算結束後,視窗移動的像素數量。 + +
+ + +**29. Zero-padding ― Zero-padding denotes the process of adding P zeroes to each side of the boundaries of the input. This value can either be manually specified or automatically set through one of the three modes detailed below:** + +⟶ 零填充 - 零填充表示將 P 個 0 填充於輸入資料的邊緣。此數值可手動指定,或是透過以下三種模式自動設定。 + +
+ + +**30. [Mode, Value, Illustration, Purpose, Valid, Same, Full]** + +⟶ [模式, 數值, 圖示, 用途, Valid, Same, Full] + +
+ + +**31. [No padding, Drops last convolution if dimensions do not match, Padding such that feature map size has size ⌈IS⌉, Output size is mathematically convenient, Also called 'half' padding, Maximum padding such that end convolutions are applied on the limits of the input, Filter 'sees' the input end-to-end]** + +⟶ [無填充, 維度不相符則捨棄最後一個卷積, 填充使得特徵圖的維度為 ⌈IS⌉, 輸出維度是數學上方便的, 又稱為半填充, 最大的填充使終端的卷積運作於輸入之限度, 卷積核可端到端的「看到」整個輸入] + +
+ + +**32. Tuning hyperparameters** + +⟶ 優化超參數 + +
+ + +**33. Parameter compatibility in convolution layer ― By noting I the length of the input volume size, F the length of the filter, P the amount of zero padding, S the stride, then the output size O of the feature map along that dimension is given by:** + +⟶ 卷積層中的參數相容性 - 輸入資料維度 I,卷積核維度 F,零填充維度 P,滑動間隔 S,則輸出的特徵圖維度為 O。 + +
+ + +**34. [Input, Filter, Output]** + +⟶ [輸入, 卷積核, 輸出] + +
+ + +**35. Remark: often times, Pstart=Pend≜P, in which case we can replace Pstart+Pend by 2P in the formula above.** + +⟶ 備註:時常 Pstart=Pend≜P,則我們於上式中將 Pstart+Pend 以取 2P 代為。 + +
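The parameter-compatibility formula of entry 33 fits in one function; the example values below (a 32x32 input, a 5x5 filter, stride 1, symmetric padding 2) are arbitrary.

```python
# O = (I - F + P_start + P_end) / S + 1
def conv_output_size(I, F, S, P_start, P_end):
    return (I - F + P_start + P_end) // S + 1

print(conv_output_size(I=32, F=5, S=1, P_start=2, P_end=2))   # -> 32 ("same" padding)
```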
+ + +**36. Understanding the complexity of the model ― In order to assess the complexity of a model, it is often useful to determine the number of parameters that its architecture will have. In a given layer of a convolutional neural network, it is done as follows:** + +⟶ 了解模型複雜度 - 為了了解模型的複雜度,我們時常計算模型中含有的參數量。給定一卷積神經網路,定義為: + +
+ + +**37. [Illustration, Input size, Output size, Number of parameters, Remarks]** + +⟶ [圖示, 輸入維度, 輸出維度, 參數數量, 備註] + +
+
+**38. [One bias parameter per filter, In most cases, S<F, A common choice for K is 2C]**
+
+⟶
+
+<br>
+
+
+**39. [Pooling operation done channel-wise, In most cases, S=F]**
+
+⟶ [池化運算以頻道為單位, 大部分來說 S=F]
+
+<br>
+ + +**40. [Input is flattened, One bias parameter per neuron, The number of FC neurons is free of structural constraints]** + +⟶ [輸入需扁平化, 一個神經元一個偏差值, 全連接層中的神經元數量沒有結構限制] + +
+ + +**41. Receptive field ― The receptive field at layer k is the area denoted Rk×Rk of the input that each pixel of the k-th activation map can 'see'. By calling Fj the filter size of layer j and Si the stride value of layer i and with the convention S0=1, the receptive field at layer k can be computed with the formula:** + +⟶ 接受區 - 在第 k 層的接受區表示為 Rk×Rk,是輸入資料中,可被第 k 個激發圖所看見的像素。設 Fj 為第 j 層中卷積核的尺寸,Si 為第 i 層的滑動間隔,通常為 1;在第 k 層的接受區之運算為以下公式: + +
+ + +**42. In the example below, we have F1=F2=3 and S1=S2=1, which gives R2=1+2⋅1+2⋅1=5.** + +⟶ 以下範例中,F1=F2=3, S1=S2=1,因此 R2=1+2⋅1+2⋅1=5。 + +
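The receptive-field formula of entry 41 can be checked against the worked example above (F1=F2=3 and S1=S2=1 giving R2=5); the function name is an illustrative choice.

```python
# R_k = 1 + sum_{j=1..k} (F_j - 1) * prod_{i=0..j-1} S_i, with S_0 = 1
def receptive_field(filter_sizes, strides):
    R, jump = 1, 1                 # jump accumulates S_0 * S_1 * ... * S_{j-1}
    for F, S in zip(filter_sizes, strides):
        R += (F - 1) * jump
        jump *= S
    return R

print(receptive_field([3, 3], [1, 1]))   # -> 5, matching the example
```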
+ + +**43. Commonly used activation functions** + +⟶ 常用的激發函數。 + +
+ + +**44. Rectified Linear Unit ― The rectified linear unit layer (ReLU) is an activation function g that is used on all elements of the volume. It aims at introducing non-linearities to the network. Its variants are summarized in the table below:** + +⟶ 線性整流函數 - 線性整流函數(ReLU)是一激發函數,可應用於所有體中的元素。用於增加非線性的性質到網路中。線性整流函數的變形如下: + +
+ + +**45. [ReLU, Leaky ReLU, ELU, with]** + +⟶ 線性整流函數, 洩漏線性整流器,指數性線性函數, 其中 + +
+ + +**46. [Non-linearity complexities biologically interpretable, Addresses dying ReLU issue for negative values, Differentiable everywhere]** + +⟶ [非線性複雜度生物可解釋性, 處理線性整流函數抑制負數問題, 全區間可微分] + +
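The ReLU variants tabulated above can be sketched piecewise in NumPy; the epsilon and alpha constants are small positive values chosen only for illustration.

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def leaky_relu(z, epsilon=0.01):
    return np.where(z >= 0, z, epsilon * z)

def elu(z, alpha=0.1):
    return np.where(z >= 0, z, alpha * (np.exp(z) - 1.0))

z = np.array([-2.0, -0.5, 0.0, 1.5])
print(relu(z), leaky_relu(z), elu(z))
```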
+ + +**47. Softmax ― The softmax step can be seen as a generalized logistic function that takes as input a vector of scores x∈Rn and outputs a vector of output probability p∈Rn through a softmax function at the end of the architecture. It is defined as follows:** + +⟶ 歸一化指數函數 - 歸一化指數函數可被視為一廣義的邏輯函數,將一個分數的陣列 x∈Rn 輸出為一個機率的陣列 p∈Rn,用於網路架構的終端。定義為: + +
+ + +**48. where** + +⟶ 其中 + +
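A numerically stable sketch of the softmax step defined above; subtracting the maximum score before exponentiating is a standard stabilization trick and does not change the result.

```python
import numpy as np

# p_i = exp(x_i) / sum_j exp(x_j)
def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

print(softmax(np.array([2.0, 1.0, 0.1])))   # probabilities summing to 1
```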
+ + +**49. Object detection** + +⟶ 物體偵測 + +
+ + +**50. Types of models ― There are 3 main types of object recognition algorithms, for which the nature of what is predicted is different. They are described in the table below:** + +⟶ 模型種類 - 有三種主要的物體辨別演算法,差別在於預測的目的不同。敘述於以下表格: + +
+ + +**51. [Image classification, Classification w. localization, Detection]** + +⟶ [影像分類, 影像分類定位, 偵測] + +
+ + +**52. [Teddy bear, Book]** + +⟶ [泰迪熊, 書] + +
+ + +**53. [Classifies a picture, Predicts probability of object, Detects an object in a picture, Predicts probability of object and where it is located, Detects up to several objects in a picture, Predicts probabilities of objects and where they are located]** + +⟶ [分類一張圖, 預測可能為一物件的機率, 偵測一張圖中的物件, 預測可能為一物件的機率與物件的位置, 偵測一張圖中的數個物件, 預測可能為一物件的機率與物件的位置] + +
+ + +**54. [Traditional CNN, Simplified YOLO, R-CNN, YOLO, R-CNN]** + +⟶ [傳統的卷積神經網路. 簡化版 YOLO, 區域卷積神經網路, YOLO, 區域卷積神經網路] + +
+ + +**55. Detection ― In the context of object detection, different methods are used depending on whether we just want to locate the object or detect a more complex shape in the image. The two main ones are summed up in the table below:** + +⟶ 偵測 - 於物件之中,選擇不同方法取決於是否想要定位物體的位置,或是偵測更複雜的形狀。兩個主要的介紹如下表: + +
+ + +**56. [Bounding box detection, Landmark detection]** + +⟶ [定界框偵測, 特徵點偵測] + +
+ + +**57. [Detects the part of the image where the object is located, Detects a shape or characteristics of an object (e.g. eyes), More granular]** + +⟶ [偵測影像中有包含物件的部分, 偵測一物件之形狀或特性(如:眼睛), 更精準] + +
+ + +**58. [Box of center (bx,by), height bh and width bw, Reference points (l1x,l1y), ..., (lnx,lny)]** + +⟶ 框的中心 (bx,by), 高 bh 與寬 bw, 參考點 (l1x,l1y), ..., (lnx,lny)] + +
+ + +**59. Intersection over Union ― Intersection over Union, also known as IoU, is a function that quantifies how correctly positioned a predicted bounding box Bp is over the actual bounding box Ba. It is defined as:** + +⟶ 交併比 - 交併比,簡稱為 IoU,是一個用於評估定界框 Bp 預測位置與實際位置 Ba 比較正確性之函數。定義如下: + +
+ + +**60. Remark: we always have IoU∈[0,1]. By convention, a predicted bounding box Bp is considered as being reasonably good if IoU(Bp,Ba)⩾0.5.** + +⟶ 備註:交併比介於 0 到 1 之間。一般來說,一個好的定界框該有 IoU(Bp,Ba)⩾0.5。 + +
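The IoU of entry 59 in a few lines; the corner-based (x1, y1, x2, y2) box format is an assumption made for illustration.

```python
def iou(a, b):
    # intersection rectangle, clipped to zero when the boxes do not overlap
    xa, ya = max(a[0], b[0]), max(a[1], b[1])
    xb, yb = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, xb - xa) * max(0.0, yb - ya)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

print(iou((0, 0, 10, 10), (5, 5, 15, 15)))   # 25 / 175, well below the 0.5 rule of thumb
```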
+ + +**61. Anchor boxes ― Anchor boxing is a technique used to predict overlapping bounding boxes. In practice, the network is allowed to predict more than one box simultaneously, where each box prediction is constrained to have a given set of geometrical properties. For instance, the first prediction can potentially be a rectangular box of a given form, while the second will be another rectangular box of a different geometrical form.** + +⟶ 錨框 - 錨框是一個用於預測重疊定界框的技術。實務上,網路可以同時預測多個定界框,而每個定界框有限制的幾何性質。例如:第一個預測定界框可能是一個正方形,而第二個可能是另一個有不同幾何性質的正方形。 + +
+ + +**62. Non-max suppression ― The non-max suppression technique aims at removing duplicate overlapping bounding boxes of a same object by selecting the most representative ones. After having removed all boxes having a probability prediction lower than 0.6, the following steps are repeated while there are boxes remaining:** + +⟶ 非最大值抑制 - 非最大值抑制是一個用於移除重複、重疊選取同一物體定界框的方法,並選取最具代表性的。在去除預測機率小於 0.6 的定界框後,會重複以下的步驟: + +
+ + +**63. [For a given class, Step 1: Pick the box with the largest prediction probability., Step 2: Discard any box having an IoU⩾0.5 with the previous box.]** + +⟶ [給定一類別, 步驟一:選擇有最大機率的定界框, 步驟二:拋棄與前一步驟選取的定界框有 IoU⩾0.5 的定界框] + +
+ + +**64. [Box predictions, Box selection of maximum probability, Overlap removal of same class, Final bounding boxes]** + +⟶ [定界框預測, 選擇有最大機率的定界框, 移除同類別且重疊的定界框, 最終的定界框] + +
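The two steps of entries 62-63 as a sketch for a single class; the box format, helper names and toy boxes/scores are assumptions, and the IoU helper is repeated so the block runs on its own.

```python
def iou(a, b):
    xa, ya = max(a[0], b[0]), max(a[1], b[1])
    xb, yb = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, xb - xa) * max(0.0, yb - ya)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def non_max_suppression(boxes, scores, prob_threshold=0.6, iou_threshold=0.5):
    candidates = sorted([(s, b) for s, b in zip(scores, boxes) if s >= prob_threshold],
                        key=lambda sb: sb[0], reverse=True)
    kept = []
    while candidates:
        best = candidates.pop(0)                      # step 1: largest prediction probability
        kept.append(best)
        candidates = [(s, b) for s, b in candidates
                      if iou(b, best[1]) < iou_threshold]   # step 2: discard IoU >= 0.5
    return kept

boxes  = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
print(non_max_suppression(boxes, scores))             # keeps the first and the third box
```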
+ + +**65. YOLO ― You Only Look Once (YOLO) is an object detection algorithm that performs the following steps:** + +⟶ YOLO - YOLO 是一個物體偵測演算法, 流程如下: + +
+ + +**66. [Step 1: Divide the input image into a G×G grid., Step 2: For each grid cell, run a CNN that predicts y of the following form:, repeated k times]** + +⟶ [步驟一:把輸入影像切成 G×G 個格子, 步驟二:對於每一個格子, 分別進行 CNN 的運算來預測以下所表示的 y:, 重複 k 次] + +
+ + +**67. where pc is the probability of detecting an object, bx,by,bh,bw are the properties of the detected bouding box, c1,...,cp is a one-hot representation of which of the p classes were detected, and k is the number of anchor boxes.** + +⟶ 其中, pc 為預測物體之機率, bx,by,bh,bw 為定界框的屬性, c1,...,cp 為 p 個偵測類別的一位有效編碼, k 為錨框的數量。 + +
+ + +**68. Step 3: Run the non-max suppression algorithm to remove any potential duplicate overlapping bounding boxes.** + +⟶ 步驟三: 計算非最大值抑制演算法來移除可能是重複、重疊的定界框。 + +
+ + +**69. [Original image, Division in GxG grid, Bounding box prediction, Non-max suppression]** + +⟶[原始影像, GxG 的格子, 定界框的預測, 非最大值抑制] + +
+ + +**70. Remark: when pc=0, then the network does not detect any object. In that case, the corresponding predictions bx,...,cp have to be ignored.** + +⟶ 備註:當 pc=0,代表網路沒有預測到任何物件。在這種情況下,相關的預測 bx,...,cp可 忽略。 + +
+ + +**71. R-CNN ― Region with Convolutional Neural Networks (R-CNN) is an object detection algorithm that first segments the image to find potential relevant bounding boxes and then run the detection algorithm to find most probable objects in those bounding boxes.** + +⟶ 區域卷積神經網路 ― 區域卷積神經網路是一個物件偵測演算法, 先將一個影像分割以找尋可能的定界框, 再執行偵測的演算法來預測最可能出現在該定界框的物件。 + +
+ + +**72. [Original image, Segmentation, Bounding box prediction, Non-max suppression]** + +⟶ [原始圖片, 分割, 定界框預測, 非最大值抑制] + +
+ + +**73. Remark: although the original algorithm is computationally expensive and slow, newer architectures enabled the algorithm to run faster, such as Fast R-CNN and Faster R-CNN.** + +⟶ 備註:即使原始的演算法耗費很多計算資源且速度慢,新提出的架構提供更快的演算法,例如快速型區域卷積神經網路與更快速型區域卷積神經網路。 + +
+ + +**74. Face verification and recognition** + +⟶ 人臉驗證與辨別 + +
+ + +**75. Types of models ― Two main types of model are summed up in table below:** + +⟶ 模型的種類 - 有兩種主要的模型種類,如下表: + +
+ + +**76. [Face verification, Face recognition, Query, Reference, Database]** + +⟶ [人臉驗證, 人臉辨別, 查詢, 對照, 資料庫] + +
+ + +**77. [Is this the correct person?, One-to-one lookup, Is this one of the K persons in the database?, One-to-many lookup]** + +⟶ [是否是正確的人?, 一對一查詢, 是否是K個存在資料庫中的其中一人?, 一對多查詢] + +
+ + +**78. One Shot Learning ― One Shot Learning is a face verification algorithm that uses a limited training set to learn a similarity function that quantifies how different two given images are. The similarity function applied to two images is often noted d(image 1,image 2).** + +⟶ 單樣本學習 - 單樣本學習是一種人臉驗證演算法,使用有限的訓練資料集來學習一個相似度函數,用來量化兩影像之間的差異。應用於兩影像之間的相似度函數時常標示為 d (影像1、 影像2)。 + +
+
+
+**79. Siamese Network ― Siamese Networks aim at learning how to encode images to then quantify how different two images are. For a given input image x(i), the encoded output is often noted as f(x(i)).**
+
+⟶ 孿生網路 - 孿生網路之目的為學習如何將影像編碼,並用於後續量化兩影像之間的差異。給定一輸入影像 x(i), 編碼後的輸出標示為 f(x(i))。
+
+&#13;
+
+
+**80. Triplet loss ― The triplet loss ℓ is a loss function computed on the embedding representation of a triplet of images A (anchor), P (positive) and N (negative). The anchor and the positive example belong to a same class, while the negative example to another one. By calling α∈R+ the margin parameter, this loss is defined as follows:**
+
+⟶ 三重損失函數 - 三重損失函數 ℓ 是一個計算影像三元組 A(錨點)、P(正向樣本)與 N(負向樣本)之嵌入表徵的損失函數。錨點與正向樣本屬於同一類別,而負向樣本則屬於另一類別。令 α∈R+ 為間隔(margin)參數,此損失函數定義為:
+
+&#13;
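A NumPy sketch of this loss on embedding vectors f(A), f(P), f(N); the use of squared Euclidean distance and the value of α are assumptions made for the example:

```python
import numpy as np

def triplet_loss(f_a, f_p, f_n, alpha=0.2):
    """f_a, f_p, f_n: embedding vectors of the anchor, positive and negative images."""
    d_pos = np.sum((f_a - f_p) ** 2)          # distance anchor-positive
    d_neg = np.sum((f_a - f_n) ** 2)          # distance anchor-negative
    return max(d_pos - d_neg + alpha, 0.0)    # zero once the margin is respected

f_a, f_p, f_n = np.array([0.0, 1.0]), np.array([0.1, 0.9]), np.array([1.0, 0.0])
print(triplet_loss(f_a, f_p, f_n))            # 0.0: positive is close, negative is far
```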
+ + +**81. Neural style transfer** + +⟶ 神經風格轉換 + +
+ + +**82. Motivation ― The goal of neural style transfer is to generate an image G based on a given content C and a given style S.** + +⟶ 動機 - 神經風格轉換之目的為根據給定的內容 C 與風格 S,產生一張圖片 G。 + +
+ + +**83. [Content C, Style S, Generated image G]** + +⟶ [內容 C, 風格 S, 生成影像 G] + +
+ + +**84. Activation ― In a given layer l, the activation is noted a[l] and is of dimensions nH×nw×nc** + +⟶ 激發 - 給定一層 l, 它的激發可表示為 a[l], 其維度為 nH×nw×nc。 + +
+ + +**85. Content cost function ― The content cost function Jcontent(C,G) is used to determine how the generated image G differs from the original content image C. It is defined as follows:** + +⟶ 內容成本函數 - 內容成本函數 Jcontent(C,G) 用於計算生成影像 G 與內容影像 C 之間的差異。定義如下: + +
+ + +**86. Style matrix ― The style matrix G[l] of a given layer l is a Gram matrix where each of its elements G[l]kk′ quantifies how correlated the channels k and k′ are. It is defined with respect to activations a[l] as follows:** + +⟶ 風格矩陣 - 於第 l 層的風格矩陣 G[l] 是一個格拉姆矩陣,矩陣中的每個元素 G[l] kk′ 量化 k 與 k′ 頻道之間的相關程度。此矩陣透過激發函數 a[l] 定義如下: + +
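A NumPy sketch of the style matrix for one layer, assuming the activation a[l] is stored as an array of shape (nH, nW, nC):

```python
import numpy as np

def style_matrix(a):
    """Gram matrix G[l] of an activation volume a of shape (nH, nW, nC)."""
    n_h, n_w, n_c = a.shape
    a_flat = a.reshape(n_h * n_w, n_c)   # one row per spatial position
    return a_flat.T @ a_flat             # G[k, k'] sums a[..., k] * a[..., k'] over positions

a = np.random.rand(4, 4, 3)
print(style_matrix(a).shape)             # (3, 3): one entry per pair of channels
```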
+ + +**87. Remark: the style matrix for the style image and the generated image are noted G[l] (S) and G[l] (G) respectively.** + +⟶ 備註:風格影像 S 與生成影像 G 的風格矩陣分別表示為 G[l] (S) 與 G[l] (G)。 + +
+ + +**88. Style cost function ― The style cost function Jstyle(S,G) is used to determine how the generated image G differs from the style S. It is defined as follows:** + +⟶ 風格成本函數 - 風格成本函數 Jstyle(S,G) 用於評估生成影像 G 與風格 S 之差別。定義如下: + +
+ + +**89. Overall cost function ― The overall cost function is defined as being a combination of the content and style cost functions, weighted by parameters α,β, as follows:** + +⟶ 總體成本函數 - 總體成本函數定義為內容成本函數與風格成本函數之組合,權重為 α, β 。 + +
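A simplified NumPy sketch of how the content and style costs combine into the overall cost; normalization constants are omitted and single-layer activations are assumed, so this is illustrative rather than a faithful implementation:

```python
import numpy as np

def gram(a):
    f = a.reshape(-1, a.shape[-1])
    return f.T @ f

def content_cost(a_c, a_g):
    return 0.5 * np.sum((a_c - a_g) ** 2)            # distance to the content activations

def style_cost(a_s, a_g):
    return np.sum((gram(a_s) - gram(a_g)) ** 2)      # distance between Gram matrices

def overall_cost(a_c, a_s, a_g, alpha=10.0, beta=40.0):
    # larger alpha favors the content C, larger beta favors the style S
    return alpha * content_cost(a_c, a_g) + beta * style_cost(a_s, a_g)
```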
+ + +**90. Remark: a higher value of α will make the model care more about the content while a higher value of β will make it care more about the style.** + +⟶ 備註:越高的 α 值會使模型會較注重於內容,而較高的 β 值會使模型較注重風格。 + +
+
+
+**91. Architectures using computational tricks**
+
+⟶ 運用計算技巧的架構
+
+&#13;
+
+
+**92. Generative Adversarial Network ― Generative adversarial networks, also known as GANs, are composed of a generative and a discriminative model, where the generative model aims at generating the most truthful output that will be fed into the discriminative which aims at differentiating the generated and true image.**
+
+⟶ 生成對抗網路 - 生成對抗網路,簡稱為 GANs,是一個由生成網路與判別網路所組成的模型,其中生成網路的目的為生成最貼近真實的輸出,並當作判別網路之輸入,而判別網路之目的為分辨輸入資料為真實或偽造。
+
+&#13;
+
+
+**93. [Training, Noise, Real-world image, Generator, Discriminator, Real Fake]**
+
+⟶ [訓練, 雜訊, 真實影像, 生成網路, 判別網路, 真實 偽造]
+
+&#13;
+ + +**94. Remark: use cases using variants of GANs include text to image, music generation and synthesis.** + +⟶ 備註:生成對抗網路不同種類的用途包括:由文字生成影像、生成或合成音樂等。 + +
+
+
+**95. ResNet ― The Residual Network architecture (also called ResNet) uses residual blocks with a high number of layers meant to decrease the training error. The residual block has the following characterizing equation:**
+
+⟶ 殘差網路 - 殘差網路(ResNet)利用殘差區塊搭配大量的層數以減少訓練誤差。殘差區塊具有以下特徵方程式:
+
+&#13;
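A toy NumPy sketch of the characterizing equation a[l+2]=g(z[l+2]+a[l]), with ReLU as g and two fully-connected layers standing in for the block's transformations (both assumptions made for illustration):

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0)

def residual_block(a_l, w1, w2):
    """Returns a[l+2] = g(a[l] + z[l+2]) for a toy fully-connected block."""
    z1 = w1 @ a_l
    a1 = relu(z1)
    z2 = w2 @ a1
    return relu(a_l + z2)        # the skip connection adds the block input back

n = 4
a_l = np.random.randn(n)
w1, w2 = np.random.randn(n, n), np.random.randn(n, n)
print(residual_block(a_l, w1, w2).shape)   # (4,)
```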
+ + +**96. Inception Network ― This architecture uses inception modules and aims at giving a try at different convolutions in order to increase its performance through features diversification. In particular, it uses the 1×1 convolution trick to limit the computational burden.** + +⟶ Inception 網路 - 此架構利用 inception 模組, 目的為嘗試不同的卷積運算, 透過特徵多樣化來提高模型的效能。特別的是, 此架構利用 1×1 卷積技術來限制計算負擔。 + +
+ + +**97. The Deep Learning cheatsheets are now available in [target language].** + +⟶ 深度學習參考手冊目前已有[目標語言]版。 + +
+ + +**98. Original authors** + +⟶ 原始作者 + +
+ + +**99. Translated by X, Y and Z** + +⟶ 由 X, Y 與 Z 翻譯 + +
+ + +**100. Reviewed by X, Y and Z** + +⟶ 由 X, Y 與 Z 檢閱 + +
+ + +**101. View PDF version on GitHub** + +⟶ 在 GitHub 上閱讀 PDF 版 + +
+ + +**102. By X and Y** + +⟶ X, Y + +
From cda0d9bce21bf39cb3ddfc2c9446eece767aa0c8 Mon Sep 17 00:00:00 2001 From: Shervine Amidi Date: Sat, 11 Apr 2020 21:36:12 -0700 Subject: [PATCH 506/531] Update README.md --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index 2153c44b0..8c038a966 100644 --- a/README.md +++ b/README.md @@ -101,7 +101,7 @@ Please make sure to propose the translation of **only one** cheatsheet per pull |**Українська**|not started|not started|not started| |**Tiếng Việt**|done|done|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/178)| |**简体中文**|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/181)|not started| -|**繁體中文**|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/196)|not started|not started| +|**繁體中文**|done|not started|not started| ## Acknowledgements Thank you everyone for your help! Please do not forget to add your name to the `CONTRIBUTORS` file so that we can give you proper credit in the cheatsheets' [official website](https://stanford.edu/~shervine/teaching). From 6fe76af487e32f5272ecf35e7e83e484156fe9d6 Mon Sep 17 00:00:00 2001 From: Shervine Amidi Date: Sat, 11 Apr 2020 21:37:32 -0700 Subject: [PATCH 507/531] Update CONTRIBUTORS --- CONTRIBUTORS | 3 +++ 1 file changed, 3 insertions(+) diff --git a/CONTRIBUTORS b/CONTRIBUTORS index fc79c792f..5e77e8f28 100644 --- a/CONTRIBUTORS +++ b/CONTRIBUTORS @@ -237,6 +237,9 @@ Chaoying Xue (review of supervised learning) --zh-tw + kentropy (translation of convolutional neural networks) + kevingo (review of convolutional neural networks) + kevingo (translation of deep learning) TobyOoO (review of deep learning) From 8fead90809e0b620c789fcc2f0c5f8b2943b7bb4 Mon Sep 17 00:00:00 2001 From: Shervine Amidi Date: Sat, 11 Apr 2020 22:42:36 -0700 Subject: [PATCH 508/531] Rename cheatsheet-deep-learning.md to cs-229-deep-learning.md --- vi/{cheatsheet-deep-learning.md => cs-229-deep-learning.md} | 0 1 file changed, 0 insertions(+), 0 deletions(-) rename vi/{cheatsheet-deep-learning.md => cs-229-deep-learning.md} (100%) diff --git a/vi/cheatsheet-deep-learning.md b/vi/cs-229-deep-learning.md similarity index 100% rename from vi/cheatsheet-deep-learning.md rename to vi/cs-229-deep-learning.md From a106cf145935b85280a48b4f2f46c4b66ea63515 Mon Sep 17 00:00:00 2001 From: Shervine Amidi Date: Sat, 11 Apr 2020 22:51:01 -0700 Subject: [PATCH 509/531] Update README.md --- README.md | 126 +++++++++++++++++++++++++++++------------------------- 1 file changed, 68 insertions(+), 58 deletions(-) diff --git a/README.md b/README.md index dd151c4c8..24a88de72 100644 --- a/README.md +++ b/README.md @@ -1,6 +1,6 @@ # Translation of VIP Cheatsheets ## Goal -This repository aims at collaboratively translating our [Machine Learning](https://github.com/afshinea/stanford-cs-229-machine-learning) and [Deep Learning](https://github.com/afshinea/stanford-cs-230-deep-learning) cheatsheets into a ton of languages, so that this content can be enjoyed by anyone from any part of the world! +This repository aims at collaboratively translating our [Machine Learning](https://github.com/afshinea/stanford-cs-229-machine-learning), [Deep Learning](https://github.com/afshinea/stanford-cs-230-deep-learning) and [Artificial Intelligence](https://github.com/afshinea/stanford-cs-221-artificial-intelligence) cheatsheets into a ton of languages, so that this content can be enjoyed by anyone from any part of the world! 
## Contribution guidelines The translation process of each cheatsheet contains two steps: @@ -33,65 +33,75 @@ The translation process of each cheatsheet contains two steps: ### Important note Please make sure to propose the translation of **only one** cheatsheet per pull request -- it simplifies a lot the review process. - -## Progression for CS 230 (Deep Learning) -|Cheatsheet topic|Español|فارسی|Français|日本語|Português|中文| -|:---|:---:|:---:|:---:|:---:|:---:|:---:| -|Convolutional Neural Nets|not started|done|done|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/128)|not started| -|Recurrent Neural Nets|not started|done|done|not started|not started|not started| -|DL tips and tricks|not started|done|done|not started|not started|not started| - -|Cheatsheet topic|العَرَبِيَّة|עִבְרִית|हिन्दी|Türkçe|Русский|Italiano| +## Progression +### CS 221 (Artificial Intelligence) +| |[Reflex models](https://github.com/shervinea/cheatsheet-translation/blob/master/template/cs-221-reflex-models.md)|[States models](https://github.com/shervinea/cheatsheet-translation/blob/master/template/cs-221-states-models.md)|[Variables models](https://github.com/shervinea/cheatsheet-translation/blob/master/template/cs-221-variables-models.md)|[Logic models](https://github.com/shervinea/cheatsheet-translation/blob/master/template/cs-221-logic-models.md)| +|:---|:---:|:---:|:---:|:---:| +|**Deutsch**|not started|not started|not started|not started| +|**Español**|not started|not started|not started|not started| +|**فارسی**|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/200)|not started|not started|not started| +|**Français**|done|done|done|done| +|**עִבְרִית**|not started|not started|not started|not started| +|**Italiano**|not started|not started|not started|not started| +|**日本語**|not started|not started|not started|not started| +|**한국어**|not started|not started|not started|not started| +|**Português**|not started|not started|not started|not started| +|**Türkçe**|done|done|done|done| +|**Tiếng Việt**|not started|not started|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/179)| +|**简体中文**|not started|not started|not started|not started| +|**繁體中文**|not started|not started|not started|not started| + +### CS 229 (Machine Learning) +| |[Deep learning](https://github.com/shervinea/cheatsheet-translation/blob/master/template/cs-229-deep-learning.md)|[Supervised](https://github.com/shervinea/cheatsheet-translation/blob/master/template/cs-229-supervised-learning.md)|[Unsupervised](https://github.com/shervinea/cheatsheet-translation/blob/master/template/cs-229-unsupervised-learning.md)|[ML tips](https://github.com/shervinea/cheatsheet-translation/blob/master/template/cs-229-machine-learning-tips-and-tricks.md)|[Probabilities](https://github.com/shervinea/cheatsheet-translation/blob/master/template/cs-229-probability.md)|[Algebra](https://github.com/shervinea/cheatsheet-translation/blob/master/template/cs-229-linear-algebra.md)| |:---|:---:|:---:|:---:|:---:|:---:|:---:| -|Convolutional Neural Nets|not started|not started|not started|done|not started|not started| -|Recurrent Neural Nets|not started|not started|not started|done|not started|not started| -|DL tips and tricks|not started|not started|not started|done|not started|not started| - -|Cheatsheet topic|Polski|Suomi|Català|Українська|한국어| -|:---|:---:|:---:|:---:|:---:|:---:| -|Convolutional Neural Nets|not started|not started|not started|not started|[in 
progress](https://github.com/shervinea/cheatsheet-translation/pull/109)| -|Recurrent Neural Nets|not started|not started|not started|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/107)| -|DL tips and tricks|not started|not started|not started|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/108)| - -## Progression for CS 229 (Machine Learning) -|Cheatsheet topic|Español|فارسی|Français|日本語|Português|中文| -|:---|:---:|:---:|:---:|:---:|:---:|:---:| -|Deep learning|done|done|done|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/96)|done|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/12)| -|Supervised learning|done|done|done|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/144)|done|done| -|Unsupervised learning|done|done|done|not started|done|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/48)| -|ML tips and tricks|done|done|done|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/99)|done|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/7)| -|Probabilities and Statistics|done|done|done|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/142)|done|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/73)| -|Linear algebra|done|done|done|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/140)|done|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/72)| - -|Cheatsheet topic|العَرَبِيَّة|עִבְרִית|हिन्दी|Türkçe|Русский|Italiano| -|:---|:---:|:---:|:---:|:---:|:---:|:---:| -|Deep learning|done|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/37)|done|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/21)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/78)| -|Supervised learning|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/87)|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/46)|done|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/21)|not started| -|Unsupervised learning|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/88)|not started|not started|done|not started|not started| -|ML tips and tricks|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/83)|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/40)|done|not started|not started| -|Probabilities and Statistics|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/89)|not started|not started|done|not started|not started| -|Linear algebra|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/85)|not started|not started|done|not started|not started| - - -|Cheatsheet topic|Polski|Suomi|Català|Українська|한국어| -|:---|:---:|:---:|:---:|:---:|:---:| -|Deep learning|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/8)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/34)|not started|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/80)| -|Supervised learning|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/8)|not started|not started|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/90)| -|Unsupervised learning|not started|not started|not started|not started|done| -|ML tips and tricks|[in 
progress](https://github.com/shervinea/cheatsheet-translation/pull/8)|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/47)|not started|done| -|Probabilities and Statistics|not started|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/47)|done|done| -|Linear algebra|not started|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/47)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/95)|done| - - -|Cheatsheet topic|Magyar|Deutsch|Bahasa Indonesia| +|**العَرَبِيَّة**|done|done|done|done|done|done| +|**Català**|not started|not started|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/47)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/47)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/47)| +|**Deutsch**|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/106)|not started|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/135)|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/136)| +|**Español**|done|done|done|done|done|done| +|**فارسی**|done|done|done|done|done|done| +|**Suomi**|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/34)|not started|not started|not started|not started|not started| +|**Français**|done|done|done|done|done|done| +|**עִבְרִית**|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/156)|not started|not started|not started|not started|not started| +|**हिन्दी**|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/37)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/46)|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/40)|not started|not started| +|**Magyar**|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/124)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/124)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/124)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/124)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/124)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/124)| +|**Bahasa Indonesia**|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/154)|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/139)|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/151)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/150)| +|**Italiano**|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/78)|not started|not started|not started|done|done| +|**日本語**|done|done|done|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/99)|done|done| +|**한국어**|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/80)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/90)|done|done|done|done| +|**Polski**|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/8)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/8)|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/8)|not started|not started| +|**Português**|done|done|done|done|done|done| +|**Русский**|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/21)|[in 
progress](https://github.com/shervinea/cheatsheet-translation/pull/21)|not started|not started|not started|not started| +|**Türkçe**|done|done|done|done|done|done| +|**Українська**|not started|not started|not started|not started|done|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/95)| +|**Tiếng Việt**|done|done|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/199)|done|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/175)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/176)| +|**简体中文**|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/12)|done|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/48)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/7)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/73)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/72)| +|**繁體中文**|done|done|done|done|done|done| + +### CS 230 (Deep Learning) +| |[Convolutional Neural Networks](https://github.com/shervinea/cheatsheet-translation/blob/master/template/cs-230-convolutional-neural-networks.md)|[Recurrent Neural Networks](https://github.com/shervinea/cheatsheet-translation/blob/master/template/cs-230-recurrent-neural-networks.md)|[Deep Learning tips](https://github.com/shervinea/cheatsheet-translation/blob/master/template/cs-230-deep-learning-tips-and-tricks.md)| |:---|:---:|:---:|:---:| -|Deep learning|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/124/files#diff-5e2ba65ef08acd57024e82d0ae94b923)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/106)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/143)| -|Supervised learning|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/124/files#diff-5e2ba65ef08acd57024e82d0ae94b923)|not started|not started| -|Unsupervised learning|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/124/files#diff-5e2ba65ef08acd57024e82d0ae94b923)|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/139)| -|ML tips and tricks|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/124/files#diff-5e2ba65ef08acd57024e82d0ae94b923)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/135)|not started| -|Probabilities and Statistics|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/124/files#diff-5e2ba65ef08acd57024e82d0ae94b923)|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/143)| -|Linear algebra|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/124/files#diff-5e2ba65ef08acd57024e82d0ae94b923)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/136)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/143)| - +|**العَرَبِيَّة**|not started|not started|not started| +|**Català**|not started|not started|not started| +|**Deutsch**|not started|not started|not started| +|**Español**|not started|not started|not started| +|**فارسی**|done|done|done| +|**Suomi**|not started|not started|not started| +|**Français**|done|done|done| +|**עִבְרִית**|not started|not started|not started| +|**हिन्दी**|not started|not started|not started| +|**Magyar**|not started|not started|not started| +|**Bahasa Indonesia**|done|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/152)|[in 
progress](https://github.com/shervinea/cheatsheet-translation/pull/153)| +|**Italiano**|not started|not started|not started| +|**日本語**|done|done|done| +|**한국어**|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/109)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/107)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/108)| +|**Polski**|not started|not started|not started| +|**Português**|done|not started|not started| +|**Русский**|not started|not started|not started| +|**Türkçe**|done|done|done| +|**Українська**|not started|not started|not started| +|**Tiếng Việt**|done|done|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/178)| +|**简体中文**|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/181)|not started| +|**繁體中文**|done|not started|not started| ## Acknowledgements Thank you everyone for your help! Please do not forget to add your name to the `CONTRIBUTORS` file so that we can give you proper credit in the cheatsheets' [official website](https://stanford.edu/~shervine/teaching). From b4a9219a948dd2916f64f6d2b1e16bf1a1a36b97 Mon Sep 17 00:00:00 2001 From: Shervine Amidi Date: Sat, 11 Apr 2020 23:02:26 -0700 Subject: [PATCH 510/531] Add contributors + miscellaneous name corrections --- CONTRIBUTORS | 119 ++++++++++++++++++++++++++++++++++++++++++++++----- 1 file changed, 108 insertions(+), 11 deletions(-) diff --git a/CONTRIBUTORS b/CONTRIBUTORS index dc4167fc2..19ffde67f 100644 --- a/CONTRIBUTORS +++ b/CONTRIBUTORS @@ -1,14 +1,26 @@ --ar Amjad Khatabi (translation of deep learning) Zaid Alyafeai (review of deep learning) - + Zaid Alyafeai (translation of linear algebra) Amjad Khatabi (review of linear algebra) Mazen Melibari (review of linear algebra) + + Fares Al-Qunaieer (translation of machine learning tips and tricks) + Zaid Alyafeai (review of machine learning tips and tricks) + Mahmoud Aslan (translation of probabilities and statistics) + Fares Al-Qunaieer (review of probabilities and statistics) + + Fares Al-Qunaieer (translation of supervised learning) + Zaid Alyafeai (review of supervised learning) + + Redouane Lguensat (translation of unsupervised learning) + Fares Al-Qunaieer (review of unsupervised learning) + --de ---es +--es Erick Gabriel Mendoza Flores (translation of deep learning) Fernando Diaz (review of deep learning) Fernando González-Herrera (review of deep learning) @@ -17,12 +29,12 @@ Alonso Melgar López (review of deep learning) Gustavo Velasco-Hernández (review of deep learning) Juan Manuel Nava Zamudio (review of deep learning) - + Fernando González-Herrera (translation of linear algebra) Fernando Diaz (review of linear algebra) Gustavo Velasco-Hernández (review of linear algebra) Juan P. 
Chavat (review of linear algebra) - + David Jiménez Paredes (translation of machine learning tips and tricks) Fernando Diaz (translation of machine learning tips and tricks) Gustavo Velasco-Hernández (review of machine learning tips and tricks) @@ -40,7 +52,7 @@ Jaime Noel Alvarez Luna (translation of unsupervised learning) Alonso Melgar López (review of unsupervised learning) Fernando Diaz (review of unsupervised learning) - + --fa AlisterTA (translation of convolutional neural networks) Ehsan Kermani (translation of convolutional neural networks) @@ -55,7 +67,7 @@ Erfan Noury (translation of linear algebra) Mohammad Karimi (review of linear algebra) - + AlisterTA (translation of machine learning tips and tricks) Mohammad Reza (translation of machine learning tips and tricks) Erfan Noury (review of machine learning tips and tricks) @@ -70,10 +82,10 @@ Amirhosein Kazemnejad (translation of supervised learning) Erfan Noury (review of supervised learning) Mohammad Karimi (review of supervised learning) - + Erfan Noury (translation of unsupervised learning) Mohammad Karimi (review of unsupervised learning) - + --fr Original authors @@ -81,21 +93,62 @@ --hi +--id + Prasetia Utama Putra (translation of convolutional neural networks) + Gunawan Tri (review of convolutional neural networks) + +--it + Alessandro Piotti (translation of linear algebra) + Nicola Dall'Asen (review of linear algebra) + + Nicola Dall'Asen (translation of probabilities and statistics) + Alessandro Piotti (review of probabilities and statistics) + --ko Wooil Jeong (translation of machine learning tips and tricks) Wooil Jeong (translation of probabilities and statistics) - Kwang Hyeok Ahn (translation of Unsupervised Learning) + Kwang Hyeok Ahn (translation of unsupervised learning) --ja - + Tran Tuan Anh (translation of convolutional neural networks) + Yoshiyuki Nakai (review of convolutional neural networks) + Linh Dang (review of convolutional neural networks) + + Taichi Kato (translation of deep learning) + Dan Lillrank (review of deep learning) + Yoshiyuki Nakai (review of deep learning) + Yuki Tokyo (review of deep learning) + + Kamuela Lau (translation of deep learning tips and tricks) + Yoshiyuki Nakai (review of deep learning tips and tricks) + Hiroki Mori (review of deep learning tips and tricks) + + Robert Altena (translation of linear algebra) + Kamuela Lau (review of linear algebra) + + Takatoshi Nao (translation of probabilities and statistics) + Yuta Kanzawa (review of probabilities and statistics) + + H. 
Hamano (translation of recurrent neural networks) + Yoshiyuki Nakai (review of recurrent neural networks) + + Yuta Kanzawa (translation of supervised learning) + Tran Tuan Anh (review of supervised learning) + + Tran Tuan Anh (translation of unsupervised learning) + Yoshiyuki Nakai (review of unsupervised learning) + Yuta Kanzawa (review of unsupervised learning) + Dan Lillrank (review of unsupervised learning) + --pt Leticia Portella (translation of convolutional neural networks) Gabriel Aparecido Fonseca (review of convolutional neural networks) Gabriel Fonseca (translation of deep learning) Leticia Portella (review of deep learning) + Renato Kano (review of deep learning) Gabriel Fonseca (translation of linear algebra) Leticia Portella (review of linear algebra) @@ -110,7 +163,7 @@ Leticia Portella (translation of supervised learning) Gabriel Fonseca (review of supervised learning) Flavio Clesio (review of supervised learning) - + Gabriel Fonseca (translation of unsupervised learning) Tiago Danin (review of unsupervised learning) @@ -127,6 +180,9 @@ Kadir Tekeli (translation of linear algebra) Ekrem Çetinkaya (review of linear algebra) + Ayyüce Kızrak (translation of logic-based models) + Başak Buluz (review of logic-based models) + Seray Beşer (translation of machine learning tips and tricks) Ayyüce Kızrak (review of machine learning tips and tricks) Yavuz Kömeçoğlu (review of machine learning tips and tricks) @@ -137,22 +193,60 @@ Başak Buluz (translation of recurrent neural networks) Yavuz Kömeçoğlu (review of recurrent neural networks) + Yavuz Kömeçoğlu (translation of reflex-based models) + Ayyüce Kızrak (review of reflex-based models) + + Cemal Gurpinar (translation of states-based models) + Başak Buluz (review of states-based models) + Başak Buluz (translation of supervised learning) Ayyüce Kızrak (review of supervised learning) Yavuz Kömeçoğlu (translation of unsupervised learning) Başak Buluz (review of unsupervised learning) + Başak Buluz (translation of variables-based models) + Ayyüce Kızrak (review of variables-based models) + --uk Gregory Reshetniak (translation of probabilities and statistics) Denys (review of probabilities and statistics) +--vi + Phạm Hồng Vinh (translation of convolutional neural networks) + Đàm Minh Tiến (review of convolutional neural networks) + + Trần Tuấn Anh (translation of deep learning) + Phạm Hồng Vinh (review of deep learning) + Đàm Minh Tiến (review of deep learning) + Nguyễn Khánh Hưng (review of deep learning) + Hoàng Vũ Đạt (review of deep learning) + Nguyễn Trí Minh (review of deep learning) + + Trần Tuấn Anh (translation of machine learning tips and tricks) + Nguyễn Trí Minh (review of machine learning tips and tricks) + Vinh Pham (review of machine learning tips and tricks) + Đàm Minh Tiến (review of machine learning tips and tricks) + + Trần Tuấn Anh (translation of recurrent neural networks) + Đàm Minh Tiến (review of recurrent neural networks) + Hung Nguyễn (review of recurrent neural networks) + Nguyễn Trí Minh (review of recurrent neural networks) + + Trần Tuấn Anh (translation of supervised learning) + Đàm Minh Tiến (review of supervised learning) + Hung Nguyễn (review of supervised learning) + Nguyễn Trí Minh (review of supervised learning) + --zh Wang Hongnian (translation of supervised learning) Xiaohu Zhu (朱小虎) (review of supervised learning) Chaoying Xue (review of supervised learning) --zh-tw + kentropy (translation of convolutional neural networks) + kevingo (review of convolutional neural networks) + kevingo (translation 
of deep learning) TobyOoO (review of deep learning) @@ -168,3 +262,6 @@ kevingo (translation of unsupervised learning) imironhead (review of unsupervised learning) johnnychhsu (review of unsupervised learning) + + kevingo (translation of machine learning tips and tricks) + kentropy (review of machine learning tips and tricks) From 7a19a6cdfcc43febd1976afdf33f4320fb73656e Mon Sep 17 00:00:00 2001 From: Shervine Amidi Date: Wed, 15 Apr 2020 22:33:13 -0700 Subject: [PATCH 511/531] Update [it] progress --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index 24a88de72..4bb94dee9 100644 --- a/README.md +++ b/README.md @@ -65,7 +65,7 @@ Please make sure to propose the translation of **only one** cheatsheet per pull |**हिन्दी**|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/37)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/46)|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/40)|not started|not started| |**Magyar**|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/124)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/124)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/124)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/124)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/124)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/124)| |**Bahasa Indonesia**|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/154)|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/139)|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/151)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/150)| -|**Italiano**|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/78)|not started|not started|not started|done|done| +|**Italiano**|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/78)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/207)|not started|not started|done|done| |**日本語**|done|done|done|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/99)|done|done| |**한국어**|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/80)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/90)|done|done|done|done| |**Polski**|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/8)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/8)|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/8)|not started|not started| From 8a2ea1ff6dc58895ac81848005d16f79f64f2f33 Mon Sep 17 00:00:00 2001 From: Minh Tuan Date: Sun, 19 Apr 2020 11:57:27 +0700 Subject: [PATCH 512/531] Change better translation for word correlation --- vi/cs-229-probability.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/vi/cs-229-probability.md b/vi/cs-229-probability.md index d0784543e..82d784be6 100644 --- a/vi/cs-229-probability.md +++ b/vi/cs-229-probability.md @@ -250,7 +250,7 @@ **42. 
Correlation ― By noting σX,σY the standard deviations of X and Y, we define the correlation between the random variables X and Y, noted ρXY, as follows:** -⟶ Mối tương quan ― Kí hiệu σX,σY là độ lệch chuẩn của X và Y, chúng ta xác định mối tương quan giữa X và Y, kí hiệu ρXY, như sau: +⟶ Hệ số tương quan ― Kí hiệu σX,σY là độ lệch chuẩn của X và Y, chúng ta xác định hệ số tương quan giữa X và Y, kí hiệu ρXY, như sau:
@@ -376,7 +376,7 @@ **63. [Jointly distributed random variables, Density, Covariance, Correlation]** -⟶ [Các biến ngẫu nhiên đồng thời, Mật độ, Hiệp phương sai, Mối tương quan] +⟶ [Các biến ngẫu nhiên đồng thời, Mật độ, Hiệp phương sai, Hệ số tương quan]
From 239f47843a8c7538b48eb5c36a3465fa1ce6b44d Mon Sep 17 00:00:00 2001 From: Minh Tuan Date: Sun, 19 Apr 2020 12:13:30 +0700 Subject: [PATCH 513/531] Edit translate by suggestion of @damminhtien --- vi/cs-221-logic-models.md | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/vi/cs-221-logic-models.md b/vi/cs-221-logic-models.md index 045dc5851..010ad09f6 100644 --- a/vi/cs-221-logic-models.md +++ b/vi/cs-221-logic-models.md @@ -88,7 +88,7 @@ **13. Definition ― The knowledge base KB is the conjunction of all formulas that have been considered so far. The set of models of the knowledge base is the intersection of the set of models that satisfy each formula. In other words:** -⟶ Định nghĩa - Cơ sở tri thức KB là sự kết hợp của tất cả các công thức đã được xem xét cho đến nay. Tập hợp các mô hình của cơ sở tri thức là giao điểm của tập hợp các mô hình thỏa mãn từng công thức. Nói cách khác: +⟶ Định nghĩa - Cơ sở tri thức KB là sự kết hợp của tất cả các công thức đã được xem xét cho đến nay. Tập hợp các mô hình của cơ sở tri thức là tập giao của tập hợp các mô hình thỏa mãn từng công thức. Nói cách khác:
@@ -123,7 +123,7 @@ **18. Relation between formulas and knowledge base - We define the following properties between the knowledge base KB and a new formula f:** -⟶ Mối liên hệ giữa công thức và cơ sở tri thức - Chúng tôi xác định các thuộc tính sau giữa KB cơ sở tri thức và công thức mới f: +⟶ Mối liên hệ giữa công thức và cơ sở tri thức - Chúng tôi định nghĩa các thuộc tính sau giữa KB cơ sở tri thức và công thức mới f:
@@ -137,7 +137,7 @@ **20. [KB entails f, KB contradicts f, f contingent to KB]** -⟶ [KB đòi hỏi f, KB mâu thuẫn với f, f phụ thuộc vào KB] +⟶ [KB suy luận (kết thừa) từ f, KB mâu thuẫn với f, f phụ thuộc vào KB]
@@ -403,7 +403,7 @@ **58. [Knowledge base, Definition, Probabilistic interpretation, Satisfiability, Relationship with formulas, Forward inference, Rule properties]** -⟶ [Cơ sở tri thức, Định nghĩa, Giải thích xác suất, Hài lòng, Mối quan hệ với các công thức, Suy luận chuyển tiếp, Thuộc tính quy tắc] +⟶ [Cơ sở tri thức, Định nghĩa, Giải thích xác suất, Sự thỏa mãn, Mối quan hệ với các công thức, Suy luận chuyển tiếp, Thuộc tính quy tắc]
@@ -459,4 +459,4 @@ **66. The Artificial Intelligence cheatsheets are now available in [target language].** -⟶ Trí tuệ nhân tạo cheatsheats hiện đã có vơi ngôn ngữ [Tiếng Việt] +⟶ Trí tuệ nhân tạo cheatsheats hiện đã có với ngôn ngữ [Tiếng Việt] From 08367f553d81931c958c74339634f7fe7a194d0e Mon Sep 17 00:00:00 2001 From: Minh Tuan Date: Sun, 19 Apr 2020 12:33:45 +0700 Subject: [PATCH 514/531] Edit by suggestion of @damminhtien and @tuananhhedspibk --- vi/cs-230-deep-learning-tips-and-tricks.md | 70 +++++++++++----------- 1 file changed, 35 insertions(+), 35 deletions(-) diff --git a/vi/cs-230-deep-learning-tips-and-tricks.md b/vi/cs-230-deep-learning-tips-and-tricks.md index 6edf85a81..d07da7509 100644 --- a/vi/cs-230-deep-learning-tips-and-tricks.md +++ b/vi/cs-230-deep-learning-tips-and-tricks.md @@ -4,14 +4,14 @@ **1. Deep Learning Tips and Tricks cheatsheet** -⟶ Một số mẹo trong học sâu cheatsheet +⟶ Cheatsheet về một số thủ thuật trong Deep Learning
**2. CS 230 - Deep Learning** -⟶ CS 230 - Học sâu +⟶ CS 230 - Deep Learning
@@ -25,14 +25,14 @@ **4. [Data processing, Data augmentation, Batch normalization]** -⟶ [Xử lí dữ liệu, Thêm dữ liệu, Chuẩn hóa batch] +⟶ [Xử lí dữ liệu, Data augmentation, Batch normalization]
**5. [Training a neural network, Epoch, Mini-batch, Cross-entropy loss, Backpropagation, Gradient descent, Updating weights, Gradient checking]** -⟶ [Huấn luyện một mô hình nhân tạo, Epoch, Mini-batch, Cross-entropy loss, Lan truyền ngược, Gradient descent, Cập nhật trọng số, Kiểm tra gradient] +⟶ [Huấn luyện mạng neural, Epoch, Mini-batch, Cross-entropy loss, Lan truyền ngược, Gradient descent, Cập nhật trọng số, Gradient checking]
@@ -74,7 +74,7 @@ **11. Data augmentation ― Deep learning models usually need a lot of data to be properly trained. It is often useful to get more data from the existing ones using data augmentation techniques. The main ones are summed up in the table below. More precisely, given the following input image, here are the techniques that we can apply:** -⟶ Tăng cường dữ liệu - Các mô hình học sâu thường cần rất nhiều dữ liệu để có thể được huấn luyện đúng cách. Nó thường hữu ích để có được nhiều dữ liệu hơn từ những cái hiện có bằng cách sử dụng các kỹ thuật tăng dữ liệu. Những cái chính được tóm tắt trong bảng dưới đây. Chính xác hơn, với hình ảnh đầu vào sau đây, đây là những kỹ thuật mà chúng ta có thể áp dụng: +⟶ Data augmentation - Các mô hình Deep Learning thường cần rất nhiều dữ liệu để có thể được huấn luyện đúng cách. Việc sử dụng các kỹ thuật Data augmentation là khá hữu ích để có thêm nhiều dữ liệu hơn từ tập dữ liệu hiện thời. Những kĩ thuật chính được tóm tắt trong bảng dưới đây. Chính xác hơn, với hình ảnh đầu vào sau đây, đây là những kỹ thuật mà chúng ta có thể áp dụng:
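A NumPy sketch of a few of these transformations, assuming the image is an (H, W, 3) array with values in [0, 1]:

```python
import numpy as np

def augment(img, rng):
    """Returns a randomly flipped, noised and brightness-shifted copy of img."""
    out = img.copy()
    if rng.random() < 0.5:
        out = out[:, ::-1, :]                        # horizontal flip
    out = out + rng.normal(0, 0.02, out.shape)       # noise addition
    out = out * rng.uniform(0.8, 1.2)                # luminosity change
    return np.clip(out, 0.0, 1.0)

rng = np.random.default_rng(0)
img = rng.random((32, 32, 3))
print(augment(img, rng).shape)                       # (32, 32, 3)
```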
@@ -102,7 +102,7 @@ **15. [Nuances of RGB is slightly changed, Captures noise that can occur with light exposure, Addition of noise, More tolerance to quality variation of inputs, Parts of image ignored, Mimics potential loss of parts of image, Luminosity changes, Controls difference in exposition due to time of day]** -⟶ [Các sắc thái của RGB bị thay đổi một chút, Nhiễu nhiễu có thể xảy ra khi tiếp xúc với ánh sáng nhẹ, Bổ sung nhiễu, Chịu được sự thay đổi chất lượng của các yếu tố đầu vào, Các phần của hình ảnh bị bỏ qua, Bắt chước mất khả năng của các phần của hình ảnh, Thay đổi độ sáng, Kiểm soát sự khác biệt do phơi sáng do thời gian trong ngày] +⟶ [Các sắc thái của RGB bị thay đổi một chút, Captures noise có thể xảy ra khi tiếp xúc với ánh sáng nhẹ, Bổ sung nhiễu, Chịu được sự thay đổi chất lượng của các yếu tố đầu vào, Các phần của hình ảnh bị bỏ qua, Mô phỏng khả năng mất của các phần trong hình ảnh, Thay đổi độ sáng, Kiểm soát sự khác biệt do phơi sáng theo thời gian trong ngày]
@@ -130,7 +130,7 @@ **19. Training a neural network** -⟶ Huấn luyện một mô hình nhân tạo +⟶ Huấn luyện mạng neural
@@ -144,21 +144,21 @@ **21. Epoch ― In the context of training a model, epoch is a term used to refer to one iteration where the model sees the whole training set to update its weights.** -⟶ Vòng lặp ― Trong ngữ cảnh huấn luyện mô hình, vòng lặp là một từ chỉ một lần lặp qua toàn bộ dữ liệu huấn luyện để cập nhật tham số. +⟶ Epoch ― Trong ngữ cảnh huấn luyện mô hình, epoch là một thuật ngữ chỉ một vòng lặp mà mô hình sẽ duyệt toàn bộ tập dữ liệu huấn luyện để cập nhật trọng số của nó.
**22. Mini-batch gradient descent ― During the training phase, updating weights is usually not based on the whole training set at once due to computation complexities or one data point due to noise issues. Instead, the update step is done on mini-batches, where the number of data points in a batch is a hyperparameter that we can tune.** -⟶ Giảm độ dốc theo lô nhỏ - Trong giai đoạn đào tạo, việc cập nhật trọng số thường không dựa trên toàn bộ tập huấn cùng một lúc do độ phức tạp tính toán hoặc một điểm dữ liệu do vấn đề nhiễu. Thay vào đó, bước cập nhật được thực hiện trên các lô nhỏ, trong đó số lượng điểm dữ liệu trong một lô là một siêu tham số mà chúng ta có thể điều chỉnh. +⟶ Mini-batch gradient descent - Trong quá trình huấn luyện, việc cập nhật trọng số thường không dựa trên toàn bộ tập huấn luyện cùng một lúc do độ phức tạp tính toán hoặc một điểm dữ liệu nhiễu. Thay vào đó, bước cập nhật được thực hiện trên các lô nhỏ (mini-batch), trong đó số lượng điểm dữ liệu trong một lô (batch) là một siêu tham số (hyperparameter) mà chúng ta có thể điều chỉnh.
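A sketch of how a training set might be cut into shuffled mini-batches once per epoch, with the batch size as the tunable hyperparameter:

```python
import numpy as np

def minibatches(X, y, batch_size, rng):
    """Yields (X_batch, y_batch) pairs covering the whole training set once."""
    idx = rng.permutation(len(X))            # reshuffle between epochs
    for start in range(0, len(X), batch_size):
        take = idx[start:start + batch_size]
        yield X[take], y[take]

rng = np.random.default_rng(0)
X, y = rng.random((100, 3)), rng.integers(0, 2, 100)
for X_b, y_b in minibatches(X, y, batch_size=32, rng=rng):
    pass  # one weight update per mini-batch would happen here
```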
**23. Loss function ― In order to quantify how a given model performs, the loss function L is usually used to evaluate to what extent the actual outputs y are correctly predicted by the model outputs z.** -⟶ Hàm mất mát - Để định lượng cách thức một mô hình nhất định thực hiện, hàm mất L thường được sử dụng để đánh giá mức độ đầu ra thực tế y được dự đoán chính xác bởi mô hình đầu ra z. +⟶ Hàm mất mát - Để định lượng cách thức một mô hình nhất định thực hiện, hàm mất mát L thường được sử dụng để đánh giá mức độ đầu ra thực tế y được dự đoán chính xác bởi đầu ra của mô hình là z.
@@ -179,35 +179,35 @@ **26. Backpropagation ― Backpropagation is a method to update the weights in the neural network by taking into account the actual output and the desired output. The derivative with respect to each weight w is computed using the chain rule.** -⟶ Lan truyền ngược - Lan truyền ngược là một phương pháp để cập nhật các trọng số trong mạng nhân tạo bằng cách tính đến đầu ra thực tế và đầu ra mong muốn. Đạo hàm tương ứng với từng trọng số w được tính bằng quy tắc chuỗi. +⟶ Lan truyền ngược (Backpropagation) - Lan truyền ngược là một phương thức để cập nhật các trọng số trong mạng neural bằng cách tính toán đầu ra thực tế và đầu ra mong muốn. Đạo hàm tương ứng với từng trọng số w được tính bằng quy tắc chuỗi.
**27. Using this method, each weight is updated with the rule:** -⟶ Sử dụng mô hình này, mỗi trọng số được cập nhật theo quy luật: +⟶ Sử dụng phương thức này, mỗi trọng số được cập nhật theo quy luật:
**28. Updating weights ― In a neural network, weights are updated as follows:** -⟶ Cập nhật trọng số ― Trong một mô hình nhân tạo, trọng số được cập nhật như sau: +⟶ Cập nhật trọng số ― Trong một mạng neural, các trọng số được cập nhật như sau:
**29. [Step 1: Take a batch of training data and perform forward propagation to compute the loss, Step 2: Backpropagate the loss to get the gradient of the loss with respect to each weight, Step 3: Use the gradients to update the weights of the network.]** -⟶ [Bước 1: Lấy một loạt dữ liệu huấn luyện và thực hiện lan truyền thẳng để tính toán mất mát, Bước 2: Sao lưu lại mất mát để có được độ dốc của mất mát theo từng trọng số, Bước 3: Sử dụng độ dốc để cập nhật trọng số của mạng.] +⟶ [Bước 1: Lấy một loạt dữ liệu huấn luyện và thực hiện lan truyền xuôi (forward propagation) để tính toán mất mát, Bước 2: Lan truyền ngược mất mát để có được độ dốc (gradient) của mất mát theo từng trọng số, Bước 3: Sử dụng độ dốc để cập nhật trọng số của mạng.]
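A NumPy sketch of one such iteration for the simplest possible model (logistic regression with cross-entropy loss), where the gradient used in step 2 is written out by hand:

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def train_step(w, b, X, y, lr=0.1):
    """One forward pass, backpropagation and weight update on a mini-batch (X, y)."""
    z = sigmoid(X @ w + b)                          # Step 1: forward propagation
    loss = -np.mean(y * np.log(z) + (1 - y) * np.log(1 - z))
    dz = (z - y) / len(y)                           # Step 2: gradient of the loss
    dw, db = X.T @ dz, np.sum(dz)
    w, b = w - lr * dw, b - lr * db                 # Step 3: w <- w - lr * dL/dw
    return w, b, loss

rng = np.random.default_rng(0)
X = rng.normal(size=(64, 3))
y = (X[:, 0] > 0).astype(float)
w, b = np.zeros(3), 0.0
for _ in range(100):
    w, b, loss = train_step(w, b, X, y)
print(round(loss, 3))   # the loss decreases as the weights are updated
```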
**30. [Forward propagation, Backpropagation, Weights update]** -⟶ [Lan truyền thẳng, Lan truyền ngược, Cập nhật trọng số] +⟶ [Lan truyền xuôi, Lan truyền ngược, Cập nhật trọng số]
@@ -228,14 +228,14 @@ **33. Xavier initialization ― Instead of initializing the weights in a purely random manner, Xavier initialization enables to have initial weights that take into account characteristics that are unique to the architecture.** -⟶ Khởi tạo Xavier - Thay vì khởi tạo trọng số một cách ngẫu nhiên, khởi tạo Xavier cho chúng ta một cách khởi tạo tham số dựa trên một đặc tính độc nhất của mô hình. +⟶ Khởi tạo Xavier - Thay vì khởi tạo trọng số một cách ngẫu nhiên, khởi tạo Xavier cho chúng ta một cách khởi tạo trọng số dựa trên một đặc tính độc nhất của kiến trúc mô hình.
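A sketch of one common variant of this initialization for a fully-connected layer; the exact scaling factor differs between formulations, so the sqrt(2/(n_in+n_out)) choice below is just one option:

```python
import numpy as np

def xavier_init(n_in, n_out, rng):
    """Weights drawn with a variance scaled to the layer size instead of a fixed value."""
    scale = np.sqrt(2.0 / (n_in + n_out))     # one common choice of scaling
    return rng.normal(0.0, scale, size=(n_out, n_in))

rng = np.random.default_rng(0)
W = xavier_init(256, 128, rng)
print(W.std())   # close to sqrt(2 / (256 + 128))
```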
**34. Transfer learning ― Training a deep learning model requires a lot of data and more importantly a lot of time. It is often useful to take advantage of pre-trained weights on huge datasets that took days/weeks to train, and leverage it towards our use case. Depending on how much data we have at hand, here are the different ways to leverage this:** -⟶ Học chuyển tiếp - Huấn luyện một mô hình học tập sâu đòi hỏi nhiều dữ liệu và quan trọng hơn là rất nhiều thời gian. Sẽ rất hữu ích để tận dụng các trọng số được đào tạo trước trên các bộ dữ liệu khổng lồ mất vài ngày / tuần để đào tạo và tận dụng nó cho trường hợp sử dụng của chúng ta. Tùy thuộc vào lượng dữ liệu chúng ta có trong tay, đây là các cách khác nhau để tận dụng điều này: +⟶ Transfer learning - Huấn luyện một mô hình deep learning đòi hỏi nhiều dữ liệu và quan trọng hơn là rất nhiều thời gian. Sẽ rất hữu ích để tận dụng các trọng số đã được huyến luyện trước trên các bộ dữ liệu rất lớn mất vài ngày / tuần để huấn luyện và tận dụng nó cho trường hợp (use case) của chúng ta. Tùy thuộc vào lượng dữ liệu chúng ta có trong tay, đây là các cách khác nhau để tận dụng điều này:
@@ -256,35 +256,35 @@ **37. [Freezes all layers, trains weights on softmax, Freezes most layers, trains weights on last layers and softmax, Trains weights on layers and softmax by initializing weights on pre-trained ones]** -⟶ [Đông đặc các lớp, huấn luyện hàm softmax, Đông đặc hầu hết các lớp, huấn luyện trên lớp cuối và hàm softmax, Huấn luyện trọng số trên tầng và softmax với khởi tạo trọng số trên mô hình đã huấn luyện sẵn] +⟶ [Cố định các tầng, huấn luyện trọng số trên hàm softmax, Cố định hầu hết các tầng, huấn luyện trọng số trên tầng cuối và hàm softmax, Huấn luyện trọng số trên tầng và softmax bằng việc khởi tạo trọng số trên mô hình đã huấn luyện sẵn]
**38. Optimizing convergence** -⟶ Tối ưu hội tự +⟶ Tối ưu hội tụ
**39. Learning rate ― The learning rate, often noted α or sometimes η, indicates at which pace the weights get updated. It can be fixed or adaptively changed. The current most popular method is called Adam, which is a method that adapts the learning rate.** -⟶ Tốc độ học - Tốc độ học, thường được kí hiệu là α hoặc đôi khi là η, cho biết mức độ thay đổi của các trọng số sau mỗi lần cập nhật. Nó có thể được cố định hoặc thay đổi thích ứng. Phương pháp phổ biến nhất hiện nay được gọi là Adam, đây là phương pháp thích nghi với tốc độ học. +⟶ Tốc độ học - Tốc độ học, thường được kí hiệu là α hoặc đôi khi là η, cho biết mức độ thay đổi của các trọng số sau mỗi lần được cập nhật. Nó có thể được cố định hoặc thay đổi thích ứng. Phương thức phổ biến nhất hiện nay là Adam, đây là phương thức thích nghi với tốc độ học.
**40. Adaptive learning rates ― Letting the learning rate vary when training a model can reduce the training time and improve the numerical optimal solution. While Adam optimizer is the most commonly used technique, others can also be useful. They are summed up in the table below:** -⟶ Tốc độ học thích nghi - Để tốc độ học thay đổi khi huấn luyện một mô hình có thể giảm thời gian huấn luyện và cải thiện giải pháp tối ưu số. Trong khi Adam tối ưu hóa là kỹ thuật được sử dụng phổ biến nhất, những phương pháp khác cũng có thể hữu ích. Chúng được tóm tắt trong bảng dưới đây: +⟶ Tốc độ học thích nghi - Để cho tốc độ học thay đổi khi huấn luyện một mô hình có thể giảm thời gian huấn luyện và cải thiện giải pháp tối ưu số. Trong khi tối ưu hóa Adam (Adam optimizer) là kỹ thuật được sử dụng phổ biến nhất, nhưng những phương pháp khác cũng có thể hữu ích. Chúng được tổng kết trong bảng dưới đây:
**41. [Method, Explanation, Update of w, Update of b]** -⟶ [Phương pháp, Giải thích, Cập nhật của w, Cập nhật của b] +⟶ [Phương thức, Giải thích, Cập nhật của w, Cập nhật của b]
@@ -298,14 +298,14 @@ **43. [RMSprop, Root Mean Square propagation, Speeds up learning algorithm by controlling oscillations]** -⟶ [RMSprop, lan truyền Root Mean Square, Thuật toán tăng tốc bằng kiểm soát dao động] +⟶ [RMSprop, lan truyền Root Mean Square, Thuật toán tăng tốc độ học bằng kiểm soát dao động]
**44. [Adam, Adaptive Moment estimation, Most popular method, 4 parameters to tune]** -⟶ [Adam, Ước lượng Adam Moment, Các phương pháp phổ biến, 4 tham số để tinh chỉnh] +⟶ [Adam, Ước lượng Adaptive Moment, Các phương pháp phổ biến, 4 tham số để tinh chỉnh]
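A NumPy sketch of the Adam update for a single parameter array, showing its four tunable parameters α, β1, β2 and ε; the toy gradient in the usage example is arbitrary:

```python
import numpy as np

def adam_update(w, grad, m, v, t, alpha=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """Returns the updated weights and the updated first/second moment estimates."""
    m = beta1 * m + (1 - beta1) * grad            # moving average of the gradient
    v = beta2 * v + (1 - beta2) * grad ** 2       # moving average of its square
    m_hat = m / (1 - beta1 ** t)                  # bias correction
    v_hat = v / (1 - beta2 ** t)
    w = w - alpha * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v

w = np.ones(3)
m, v = np.zeros(3), np.zeros(3)
for t in range(1, 11):
    grad = 2 * w                                  # gradient of sum(w**2), just for the demo
    w, m, v = adam_update(w, grad, m, v, t)
print(w)                                          # slowly moves towards 0
```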
@@ -319,28 +319,28 @@ **46. Regularization** -⟶ Phạt mô hình +⟶ Regularization
**47. Dropout ― Dropout is a technique used in neural networks to prevent overfitting the training data by dropping out neurons with probability p>0. It forces the model to avoid relying too much on particular sets of features.** -⟶ Dropout - Dropout là một kỹ thuật được sử dụng trong các mạng nhân tạo để ngăn chặn hiện tượng quá khớp bằng cách loại bỏ các nơ-ron với xác suất p>0. Nó buộc mô hình tránh phụ thuộc quá nhiều vào một tập thuộc tính nào đó. +⟶ Dropout - Dropout là một kỹ thuật được sử dụng trong các mạng neural để tránh overfitting trên tập huấn luyện bằng cách loại bỏ các nơ-ron (neural) với xác suất p>0. Nó giúp mô hình không bị phụ thuộc quá nhiều vào một tập thuộc tính nào đó.
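A sketch of (inverted) dropout on one layer's activations, parametrized by the 'keep' probability 1−p:

```python
import numpy as np

def dropout(a, keep_prob, rng, training=True):
    """Drops each neuron with probability p = 1 - keep_prob during training."""
    if not training:
        return a                                   # no dropout at test time
    mask = rng.random(a.shape) < keep_prob
    return a * mask / keep_prob                    # rescale so the expected value is unchanged

rng = np.random.default_rng(0)
a = np.ones((4, 5))
print(dropout(a, keep_prob=0.8, rng=rng))
```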
**48. Remark: most deep learning frameworks parametrize dropout through the 'keep' parameter 1−p.** -⟶ Ghi chú: hầu hết các framework học máy có cài đặt dropout thông qua biến 'keep' với tham số 1-p. +⟶ Ghi chú: hầu hết các frameworks deep learning đều có thiết lập dropout thông qua biến tham số 'keep' 1-p.
**49. Weight regularization ― In order to make sure that the weights are not too large and that the model is not overfitting the training set, regularization techniques are usually performed on the model weights. The main ones are summed up in the table below:** -⟶ Phạt trọng số - Để đảm bảo rằng các trọng số không quá lớn và mô hình không vượt quá tập huấn luyện, các kỹ thuật chính quy thường được thực hiện trên các trọng số mô hình. Những cái chính được tóm tắt trong bảng dưới đây: +⟶ Weight regularization - Để đảm bảo rằng các trọng số không quá lớn và mô hình không bị overfitting trên tập huấn luyện, các kỹ thuật chính quy (regularization) thường được thực hiện trên các trọng số của mô hình. Những kĩ thuật chính được tổng kết trong bảng dưới đây:
@@ -353,20 +353,20 @@ **50 bis. Shrinks coefficients to 0, Good for variable selection, Makes coefficients smaller, Tradeoff between variable selection and small coefficients]** -⟶ bis. Thu hẹp hệ số về 0, Tốt cho lựa chọn biến, Làm cho hệ số nhỏ hơn, Trao đổi giữa lựa chọn biến và hệ số nhỏ] +⟶ bis. Giảm hệ số về 0, Tốt cho việc lựa chọn biến, Làm cho hệ số nhỏ hơn, Đánh đổi giữa việc lựa chọn biến và hệ số nhỏ]
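A sketch of how the penalties described in these rows (L1/LASSO, L2/ridge, and their combination) could be added to a base loss; the values of λ and the mixing coefficient are purely illustrative:

```python
import numpy as np

def regularized_loss(base_loss, w, lam=0.01, alpha=0.5):
    """Adds a weight penalty; alpha mixes the L1 (LASSO) and L2 (ridge) terms."""
    l1 = np.sum(np.abs(w))        # pushes coefficients to exactly 0
    l2 = np.sum(w ** 2)           # makes coefficients smaller
    return base_loss + lam * (alpha * l1 + (1 - alpha) * l2)

w = np.array([0.5, -1.0, 2.0])
print(regularized_loss(1.0, w, lam=0.1, alpha=0.0))   # ridge only
print(regularized_loss(1.0, w, lam=0.1, alpha=1.0))   # LASSO only
```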
**51. Early stopping ― This regularization technique stops the training process as soon as the validation loss reaches a plateau or starts to increase.** -⟶ Dừng sớm - Kĩ thuật regularization này sẽ dừng quá trình huấn luyện một khi mất mát trên tập thẩm định đạt đến một độ nào đó hoặc bắt đầu tăng +⟶ Dừng sớm - Kĩ thuật regularization này sẽ dừng quá trình huấn luyện một khi mất mát trên tập thẩm định (validation) đạt đến một ngưỡng nào đó hoặc bắt đầu tăng.
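A sketch of this stopping rule; the patience counter is an extra detail of the example rather than part of the cheatsheet's definition:

```python
def train_with_early_stopping(epochs, train_one_epoch, validate, patience=3):
    """Stops as soon as the validation loss has not improved for `patience` epochs."""
    best, waited = float("inf"), 0
    for epoch in range(epochs):
        train_one_epoch()
        val_loss = validate()
        if val_loss < best:
            best, waited = val_loss, 0
        else:
            waited += 1
            if waited >= patience:
                break                 # validation loss reached a plateau or increases
    return best

losses = iter([1.0, 0.8, 0.7, 0.71, 0.72, 0.73, 0.5])
print(train_with_early_stopping(7, lambda: None, lambda: next(losses)))  # stops at 0.7
```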
**52. [Error, Validation, Training, early stopping, Epochs]** -⟶ [Lỗi, Thẩm định, Huấn luyện, dừng sớm, Vòng] +⟶ [Lỗi, Thẩm định, Huấn luyện, dừng sớm, Vòng lặp]
@@ -380,14 +380,14 @@ **54. Overfitting small batch ― When debugging a model, it is often useful to make quick tests to see if there is any major issue with the architecture of the model itself. In particular, in order to make sure that the model can be properly trained, a mini-batch is passed inside the network to see if it can overfit on it. If it cannot, it means that the model is either too complex or not complex enough to even overfit on a small batch, let alone a normal-sized training set.** -⟶ Quá khớp batch nhỏ - Khi gỡ lỗi một mô hình, thường rất hữu ích khi thực hiện các thử nghiệm nhanh để xem liệu có bất kỳ vấn đề lớn nào với kiến ​​trúc của chính mô hình đó không. Đặc biệt, để đảm bảo rằng mô hình có thể được huấn luyện đúng cách, một batch nhỏ được truyền vào bên trong mạng để xem liệu nó có thể phù hợp với nó không. Nếu không thể, điều đó có nghĩa là mô hình quá phức tạp hoặc không đủ phức tạp để thậm chí vượt quá trên batch nhỏ, chứ đừng nói đến một tập huấn luyện có kích thước bình thường. +⟶ Overfitting small batch - Khi gỡ lỗi một mô hình, khá hữu ích khi thực hiện các kiểm tra (tests) nhanh để xem liệu có bất kỳ vấn đề lớn nào với kiến ​​trúc của mô hình đó không. Đặc biệt, để đảm bảo rằng mô hình có thể được huấn luyện đúng cách, một batch nhỏ (mini-batch) được truyền vào bên trong mạng để xem liệu nó có thể overfit không. Nếu không, điều đó có nghĩa là mô hình quá phức tạp hoặc không đủ phức tạp để thậm chí overfit trên batch nhỏ (mini-batch), chứ đừng nói đến một tập huấn luyện có kích thước bình thường.
**55. Gradient checking ― Gradient checking is a method used during the implementation of the backward pass of a neural network. It compares the value of the analytical gradient to the numerical gradient at given points and plays the role of a sanity-check for correctness.** -⟶ Kiểm tra gradient - Kiểm tra gradient là một phương pháp được sử dụng trong quá trình thực hiện đường truyền ngược của mạng thần kinh. Nó so sánh giá trị của gradient phân tích với gradient số tại các điểm đã cho và đóng vai trò kiểm tra độ chính xác. +⟶ Kiểm tra gradient - Kiểm tra gradient là một phương thức được sử dụng trong quá trình thực hiện lan truyền ngược của mạng neural. Nó so sánh giá trị của gradient phân tích (analytical gradient) với gradient số (numerical gradient) tại các điểm đã cho và đóng vai trò kiểm tra độ chính xác.
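A sketch of such a check on a toy cost function, comparing the hand-derived (analytical) gradient with a centered finite-difference estimate:

```python
import numpy as np

def numerical_gradient(f, w, eps=1e-6):
    """Centered-difference estimate of df/dw, one coordinate at a time."""
    grad = np.zeros_like(w)
    for i in range(w.size):
        step = np.zeros_like(w)
        step[i] = eps
        grad[i] = (f(w + step) - f(w - step)) / (2 * eps)
    return grad

f = lambda w: np.sum(w ** 2)            # toy cost function
w = np.array([1.0, -2.0, 3.0])
analytical = 2 * w                      # gradient derived by hand / backpropagation
numerical = numerical_gradient(f, w)
print(np.max(np.abs(analytical - numerical)))   # tiny if the backward pass is correct
```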
@@ -415,14 +415,14 @@ **59. ['Exact' result, Direct computation, Used in the final implementation]** -⟶ [Kết quả 'Chính xác', Tính toán trực tiếp, Được sử dụng trong quá trình thực hiện cuối cùng] +⟶ [Kết quả 'Chính xác', Tính toán trực tiếp, Được sử dụng trong quá trình triển khai cuối cùng]
**60. The Deep Learning cheatsheets are now available in [target language].** -⟶ Học sâu cheetsheets đã khả dụng trên [Tiếng Việt] +⟶ Deep Learning cheetsheets đã khả dụng trên [Tiếng Việt] **61. Original authors** From 6a956c6fa96c84b88f2b4cf3d2e93ad578c874c8 Mon Sep 17 00:00:00 2001 From: Minh Tuan Date: Sun, 19 Apr 2020 12:40:12 +0700 Subject: [PATCH 515/531] Edit by suggestion of @damminhtien and @tuananhhedspibk --- vi/cs-230-deep-learning-tips-and-tricks.md | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/vi/cs-230-deep-learning-tips-and-tricks.md b/vi/cs-230-deep-learning-tips-and-tricks.md index d07da7509..897fce9d5 100644 --- a/vi/cs-230-deep-learning-tips-and-tricks.md +++ b/vi/cs-230-deep-learning-tips-and-tricks.md @@ -46,14 +46,14 @@ **7. [Regularization, Dropout, Weight regularization, Early stopping]** -⟶ [Sự phạt mô hình, Dropout, Khởi tạo trọng số, Dừng sớm] +⟶ [Regularization, Dropout, Weight regularization, Kỹ thuật Dừng sớm]
**8. [Good practices, Overfitting small batch, Gradient checking]** -⟶ [Thói quen tốt, Quá khớp tập nhỏ, Kiểm tra đạo hàm] +⟶ [Good practices, Overfitting small batch, Gradient checking]
@@ -116,7 +116,7 @@ **17. Batch normalization ― It is a step of hyperparameter γ,β that normalizes the batch {xi}. By noting μB,σ2B the mean and variance of that we want to correct to the batch, it is done as follows:** -⟶ Chuẩn hóa batch ― Đây là một bước của siêu tham số γ,β chuẩn hóa tập dữ liệu {xi}. Kí hiệu μB,σ2B là trung bình và phương sai của tập dữ liệu ta muốn chuẩn hóa, tuân theo công thức sau: +⟶ Chuẩn hóa batch ― Đây là một bước của hyperparameter γ,β chuẩn hóa tập dữ liệu {xi}. Bằng việc kí hiệu μB,σ2B là trung bình và phương sai của tập dữ liệu ta muốn chuẩn hóa, nó được thực hiện như sau:
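For reference, the normalization step that item 17 alludes to with "it is done as follows" is usually written as below; ϵ is a small constant added for numerical stability and is an implementation detail, not part of the item's notation.

```latex
x_i \leftarrow \gamma \, \frac{x_i - \mu_B}{\sqrt{\sigma_B^2 + \epsilon}} + \beta
```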
@@ -165,7 +165,7 @@ **24. Cross-entropy loss ― In the context of binary classification in neural networks, the cross-entropy loss L(z,y) is commonly used and is defined as follows:** -⟶ Mất entropy chéo - Trong bối cảnh phân loại nhị phân trong các mạng thần kinh, tổn thất entropy chéo L(z,y) thường được sử dụng và được định nghĩa như sau: +⟶ Cross-entropy loss - Khi áp dụng phân loại nhị phân (binary classification) trong các mạng neural, cross-entropy loss L(z,y) thường được sử dụng và được định nghĩa như sau:
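In its usual formulation, the binary cross-entropy loss that item 24 refers to reads as follows, with z the predicted probability and y∈{0,1} the true label:

```latex
L(z, y) = -\big[\, y \log(z) + (1 - y) \log(1 - z) \,\big]
```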
From f415b012fb386933121755ba5e756fb3ad7067e5 Mon Sep 17 00:00:00 2001 From: Minh Tuan Date: Sun, 19 Apr 2020 12:41:43 +0700 Subject: [PATCH 516/531] Change better translation for word correlation --- vi/cs-230-deep-learning-tips-and-tricks.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/vi/cs-230-deep-learning-tips-and-tricks.md b/vi/cs-230-deep-learning-tips-and-tricks.md index 897fce9d5..88a821daa 100644 --- a/vi/cs-230-deep-learning-tips-and-tricks.md +++ b/vi/cs-230-deep-learning-tips-and-tricks.md @@ -39,7 +39,7 @@ **6. [Parameter tuning, Xavier initialization, Transfer learning, Learning rate, Adaptive learning rates]** -⟶ [Tinh chỉnh tham số, Khởi tạo Xavier, Học chuyển tiếp, Tốc độ học, Tốc độ học đáp ứng] +⟶ [Parameter tuning, Khởi tạo Xavier, Transfer learning, Tốc độ học, Tốc độ học đáp ứng]
From 0b1194233973ef0fc42824a9b733b64d6a9afe0d Mon Sep 17 00:00:00 2001 From: Minh Tuan Date: Sun, 19 Apr 2020 12:46:20 +0700 Subject: [PATCH 517/531] Edit translation by suggestion of @rootonchair --- vi/cs-229-linear-algebra.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/vi/cs-229-linear-algebra.md b/vi/cs-229-linear-algebra.md index 8d12bc89a..cdc97c845 100644 --- a/vi/cs-229-linear-algebra.md +++ b/vi/cs-229-linear-algebra.md @@ -130,7 +130,7 @@ **22. Transpose ― The transpose of a matrix A∈Rm×n, noted AT, is such that its entries are flipped:** -⟶ Chuyển vị ― Chuyển vị của một ma trận A∈Rm×n, kí hiệu AT, khi các phần tử hàng cột hoán vị trí cho nhau: +⟶ Chuyển vị ― Chuyển vị của một ma trận A∈Rm×n, kí hiệu AT, khi các phần tử hàng cột hoán đổi vị trí cho nhau:
@@ -166,7 +166,7 @@ **28. Determinant ― The determinant of a square matrix A∈Rn×n, noted |A| or det(A) is expressed recursively in terms of A∖i,∖j, which is the matrix A without its ith row and jth column, as follows:** -⟶ Định thức ― Định thức của một ma trận vuông A∈Rn×n, kí hiệu |A| hay det(A) được tính hồi quy với A∖i,∖j, ma trận A xóa đi hàng thứ i và cột thứ j: +⟶ Định thức ― Định thức của một ma trận vuông A∈Rn×n, kí hiệu |A| hay det(A) được tính đệ quy với A∖i,∖j, ma trận A xóa đi hàng thứ i và cột thứ j:
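The recursive (cofactor) expansion mentioned in item 28 can be written, for any fixed row i, as:

```latex
|A| = \sum_{j=1}^{n} (-1)^{i+j} \, A_{i,j} \, \big|A_{\setminus i, \setminus j}\big|
```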
From 3def10a7025854bcd7b0856de5773cdc536b4cf3 Mon Sep 17 00:00:00 2001 From: Tran Tuan Anh Date: Mon, 20 Apr 2020 15:46:01 +0900 Subject: [PATCH 518/531] fix unsupervised learning --- vi/cs-229-unsupervised-learning.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/vi/cs-229-unsupervised-learning.md b/vi/cs-229-unsupervised-learning.md index 6da806cb9..f0ded11f1 100644 --- a/vi/cs-229-unsupervised-learning.md +++ b/vi/cs-229-unsupervised-learning.md @@ -112,7 +112,7 @@ **19. Hierarchical clustering** -⟶ Hierarchical clustering +⟶ Phân cụm phân cấp
From 97528430f15a92835813ca242877738961c470f1 Mon Sep 17 00:00:00 2001 From: Shervine Amidi Date: Tue, 21 Apr 2020 23:11:31 -0700 Subject: [PATCH 519/531] Update [vi] progress --- README.md | 27 +++++++++++++++------------ 1 file changed, 15 insertions(+), 12 deletions(-) diff --git a/README.md b/README.md index 677c233aa..4e56821f3 100644 --- a/README.md +++ b/README.md @@ -39,7 +39,7 @@ Please make sure to propose the translation of **only one** cheatsheet per pull |:---|:---:|:---:|:---:|:---:| |**Deutsch**|not started|not started|not started|not started| |**Español**|not started|not started|not started|not started| -|**فارسی**|not started|not started|not started|not started| +|**فارسی**|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/200)|not started|not started|not started| |**Français**|done|done|done|done| |**עִבְרִית**|not started|not started|not started|not started| |**Italiano**|not started|not started|not started|not started| @@ -47,13 +47,14 @@ Please make sure to propose the translation of **only one** cheatsheet per pull |**한국어**|not started|not started|not started|not started| |**Português**|not started|not started|not started|not started| |**Türkçe**|done|done|done|done| -|**Tiếng Việt**|not started|not started|not started|not started| -|**中文**|not started|not started|not started|not started| +|**Tiếng Việt**|not started|not started|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/179)| +|**简体中文**|not started|not started|not started|not started| +|**繁體中文**|not started|not started|not started|not started| ### CS 229 (Machine Learning) | |[Deep learning](https://github.com/shervinea/cheatsheet-translation/blob/master/template/cs-229-deep-learning.md)|[Supervised](https://github.com/shervinea/cheatsheet-translation/blob/master/template/cs-229-supervised-learning.md)|[Unsupervised](https://github.com/shervinea/cheatsheet-translation/blob/master/template/cs-229-unsupervised-learning.md)|[ML tips](https://github.com/shervinea/cheatsheet-translation/blob/master/template/cs-229-machine-learning-tips-and-tricks.md)|[Probabilities](https://github.com/shervinea/cheatsheet-translation/blob/master/template/cs-229-probability.md)|[Algebra](https://github.com/shervinea/cheatsheet-translation/blob/master/template/cs-229-linear-algebra.md)| |:---|:---:|:---:|:---:|:---:|:---:|:---:| -|**العَرَبِيَّة**|done|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/87)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/88)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/83)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/89)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/85)| +|**العَرَبِيَّة**|done|done|done|done|done|done| |**Català**|not started|not started|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/47)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/47)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/47)| |**Deutsch**|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/106)|not started|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/135)|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/136)| |**Español**|done|done|done|done|done|done| @@ -64,16 +65,17 @@ Please make sure to propose the translation of **only one** cheatsheet per pull |**हिन्दी**|[in 
progress](https://github.com/shervinea/cheatsheet-translation/pull/37)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/46)|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/40)|not started|not started| |**Magyar**|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/124)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/124)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/124)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/124)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/124)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/124)| |**Bahasa Indonesia**|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/154)|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/139)|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/151)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/150)| -|**Italiano**|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/78)|not started|not started|not started|not started|not started| -|**日本語**|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/96)|done|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/173)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/99)|done|done| +|**Italiano**|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/78)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/207)|not started|not started|done|done| +|**日本語**|done|done|done|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/99)|done|done| |**한국어**|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/80)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/90)|done|done|done|done| |**Polski**|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/8)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/8)|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/8)|not started|not started| |**Português**|done|done|done|done|done|done| |**Русский**|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/21)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/21)|not started|not started|not started|not started| |**Türkçe**|done|done|done|done|done|done| |**Українська**|not started|not started|not started|not started|done|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/95)| -|**Tiếng Việt**|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/159)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/162)|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/160)|not started|not started| -|**中文**|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/12)|done|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/48)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/7)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/73)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/72)| +|**Tiếng Việt**|done|done|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/199)|done|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/175)|[in 
progress](https://github.com/shervinea/cheatsheet-translation/pull/176)| +|**简体中文**|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/12)|done|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/48)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/7)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/73)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/72)| +|**繁體中文**|done|done|done|done|done|done| ### CS 230 (Deep Learning) | |[Convolutional Neural Networks](https://github.com/shervinea/cheatsheet-translation/blob/master/template/cs-230-convolutional-neural-networks.md)|[Recurrent Neural Networks](https://github.com/shervinea/cheatsheet-translation/blob/master/template/cs-230-recurrent-neural-networks.md)|[Deep Learning tips](https://github.com/shervinea/cheatsheet-translation/blob/master/template/cs-230-deep-learning-tips-and-tricks.md)| @@ -88,17 +90,18 @@ Please make sure to propose the translation of **only one** cheatsheet per pull |**עִבְרִית**|not started|not started|not started| |**हिन्दी**|not started|not started|not started| |**Magyar**|not started|not started|not started| -|**Bahasa Indonesia**|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/155)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/152)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/153)| +|**Bahasa Indonesia**|done|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/152)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/153)| |**Italiano**|not started|not started|not started| -|**日本語**|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/145)|done|done| +|**日本語**|done|done|done| |**한국어**|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/109)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/107)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/108)| |**Polski**|not started|not started|not started| |**Português**|done|not started|not started| |**Русский**|not started|not started|not started| |**Türkçe**|done|done|done| |**Українська**|not started|not started|not started| -|**Tiếng Việt**|not started|not started|not started| -|**中文**|not started|not started|not started| +|**Tiếng Việt**|done|done|done| +|**简体中文**|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/181)|not started| +|**繁體中文**|done|not started|not started| ## Acknowledgements Thank you everyone for your help! Please do not forget to add your name to the `CONTRIBUTORS` file so that we can give you proper credit in the cheatsheets' [official website](https://stanford.edu/~shervine/teaching). 
From b19feac76e86f18ae6a166fb5fa4915f740133fa Mon Sep 17 00:00:00 2001 From: Shervine Amidi Date: Tue, 21 Apr 2020 23:15:25 -0700 Subject: [PATCH 520/531] Update CONTRIBUTORS --- CONTRIBUTORS | 96 ++++++++++++++++++++++++++++++++++++++++++++++------ 1 file changed, 85 insertions(+), 11 deletions(-) diff --git a/CONTRIBUTORS b/CONTRIBUTORS index addb9870f..b37e489b5 100644 --- a/CONTRIBUTORS +++ b/CONTRIBUTORS @@ -1,14 +1,26 @@ --ar Amjad Khatabi (translation of deep learning) Zaid Alyafeai (review of deep learning) - + Zaid Alyafeai (translation of linear algebra) Amjad Khatabi (review of linear algebra) Mazen Melibari (review of linear algebra) + + Fares Al-Qunaieer (translation of machine learning tips and tricks) + Zaid Alyafeai (review of machine learning tips and tricks) + Mahmoud Aslan (translation of probabilities and statistics) + Fares Al-Qunaieer (review of probabilities and statistics) + + Fares Al-Qunaieer (translation of supervised learning) + Zaid Alyafeai (review of supervised learning) + + Redouane Lguensat (translation of unsupervised learning) + Fares Al-Qunaieer (review of unsupervised learning) + --de ---es +--es Erick Gabriel Mendoza Flores (translation of deep learning) Fernando Diaz (review of deep learning) Fernando González-Herrera (review of deep learning) @@ -17,12 +29,12 @@ Alonso Melgar López (review of deep learning) Gustavo Velasco-Hernández (review of deep learning) Juan Manuel Nava Zamudio (review of deep learning) - + Fernando González-Herrera (translation of linear algebra) Fernando Diaz (review of linear algebra) Gustavo Velasco-Hernández (review of linear algebra) Juan P. Chavat (review of linear algebra) - + David Jiménez Paredes (translation of machine learning tips and tricks) Fernando Diaz (translation of machine learning tips and tricks) Gustavo Velasco-Hernández (review of machine learning tips and tricks) @@ -40,7 +52,7 @@ Jaime Noel Alvarez Luna (translation of unsupervised learning) Alonso Melgar López (review of unsupervised learning) Fernando Diaz (review of unsupervised learning) - + --fa AlisterTA (translation of convolutional neural networks) Ehsan Kermani (translation of convolutional neural networks) @@ -55,7 +67,7 @@ Erfan Noury (translation of linear algebra) Mohammad Karimi (review of linear algebra) - + AlisterTA (translation of machine learning tips and tricks) Mohammad Reza (translation of machine learning tips and tricks) Erfan Noury (review of machine learning tips and tricks) @@ -70,10 +82,10 @@ Amirhosein Kazemnejad (translation of supervised learning) Erfan Noury (review of supervised learning) Mohammad Karimi (review of supervised learning) - + Erfan Noury (translation of unsupervised learning) Mohammad Karimi (review of unsupervised learning) - + --fr Original authors @@ -81,14 +93,34 @@ --hi +--id + Prasetia Utama Putra (translation of convolutional neural networks) + Gunawan Tri (review of convolutional neural networks) + +--it + Alessandro Piotti (translation of linear algebra) + Nicola Dall'Asen (review of linear algebra) + + Nicola Dall'Asen (translation of probabilities and statistics) + Alessandro Piotti (review of probabilities and statistics) + --ko Wooil Jeong (translation of machine learning tips and tricks) Wooil Jeong (translation of probabilities and statistics) - Kwang Hyeok Ahn (translation of Unsupervised Learning) + Kwang Hyeok Ahn (translation of unsupervised learning) --ja + Tran Tuan Anh (translation of convolutional neural networks) + Yoshiyuki Nakai (review of convolutional neural networks) + Linh Dang 
(review of convolutional neural networks) + + Taichi Kato (translation of deep learning) + Dan Lillrank (review of deep learning) + Yoshiyuki Nakai (review of deep learning) + Yuki Tokyo (review of deep learning) + Kamuela Lau (translation of deep learning tips and tricks) Yoshiyuki Nakai (review of deep learning tips and tricks) Hiroki Mori (review of deep learning tips and tricks) @@ -104,13 +136,19 @@ Yuta Kanzawa (translation of supervised learning) Tran Tuan Anh (review of supervised learning) - + + Tran Tuan Anh (translation of unsupervised learning) + Yoshiyuki Nakai (review of unsupervised learning) + Yuta Kanzawa (review of unsupervised learning) + Dan Lillrank (review of unsupervised learning) + --pt Leticia Portella (translation of convolutional neural networks) Gabriel Aparecido Fonseca (review of convolutional neural networks) Gabriel Fonseca (translation of deep learning) Leticia Portella (review of deep learning) + Renato Kano (review of deep learning) Gabriel Fonseca (translation of linear algebra) Leticia Portella (review of linear algebra) @@ -125,7 +163,7 @@ Leticia Portella (translation of supervised learning) Gabriel Fonseca (review of supervised learning) Flavio Clesio (review of supervised learning) - + Gabriel Fonseca (translation of unsupervised learning) Tiago Danin (review of unsupervised learning) @@ -174,12 +212,45 @@ Gregory Reshetniak (translation of probabilities and statistics) Denys (review of probabilities and statistics) +--vi + Phạm Hồng Vinh (translation of convolutional neural networks) + Đàm Minh Tiến (review of convolutional neural networks) + + Trần Tuấn Anh (translation of deep learning) + Phạm Hồng Vinh (review of deep learning) + Đàm Minh Tiến (review of deep learning) + Nguyễn Khánh Hưng (review of deep learning) + Hoàng Vũ Đạt (review of deep learning) + Nguyễn Trí Minh (review of deep learning) + + Hoàng Minh Tuấn (translation of deep learning tips and tricks) + Trần Tuấn Anh (review of deep learning tips and tricks) + Đàm Minh Tiến (review of deep learning tips and tricks) + + Trần Tuấn Anh (translation of machine learning tips and tricks) + Nguyễn Trí Minh (review of machine learning tips and tricks) + Vinh Pham (review of machine learning tips and tricks) + Đàm Minh Tiến (review of machine learning tips and tricks) + + Trần Tuấn Anh (translation of recurrent neural networks) + Đàm Minh Tiến (review of recurrent neural networks) + Hung Nguyễn (review of recurrent neural networks) + Nguyễn Trí Minh (review of recurrent neural networks) + + Trần Tuấn Anh (translation of supervised learning) + Đàm Minh Tiến (review of supervised learning) + Hung Nguyễn (review of supervised learning) + Nguyễn Trí Minh (review of supervised learning) + --zh Wang Hongnian (translation of supervised learning) Xiaohu Zhu (朱小虎) (review of supervised learning) Chaoying Xue (review of supervised learning) --zh-tw + kentropy (translation of convolutional neural networks) + kevingo (review of convolutional neural networks) + kevingo (translation of deep learning) TobyOoO (review of deep learning) @@ -195,3 +266,6 @@ kevingo (translation of unsupervised learning) imironhead (review of unsupervised learning) johnnychhsu (review of unsupervised learning) + + kevingo (translation of machine learning tips and tricks) + kentropy (review of machine learning tips and tricks) From 62364e9a621f2d16bb55ee8ba1fc2fc5069548cd Mon Sep 17 00:00:00 2001 From: Shervine Amidi Date: Tue, 21 Apr 2020 23:22:51 -0700 Subject: [PATCH 521/531] Minor fix --- vi/cs-221-logic-models.md | 2 +- 1 
file changed, 1 insertion(+), 1 deletion(-) diff --git a/vi/cs-221-logic-models.md b/vi/cs-221-logic-models.md index 010ad09f6..94057d8d2 100644 --- a/vi/cs-221-logic-models.md +++ b/vi/cs-221-logic-models.md @@ -459,4 +459,4 @@ **66. The Artificial Intelligence cheatsheets are now available in [target language].** -⟶ Trí tuệ nhân tạo cheatsheats hiện đã có với ngôn ngữ [Tiếng Việt] +⟶ Trí tuệ nhân tạo cheatsheets hiện đã có với ngôn ngữ [Tiếng Việt]. From 8413fa68ebac4786bf37e361fbfd9a734fe63ee7 Mon Sep 17 00:00:00 2001 From: Shervine Amidi Date: Tue, 21 Apr 2020 23:24:17 -0700 Subject: [PATCH 522/531] Update [vi] progress --- README.md | 27 +++++++++++++++------------ 1 file changed, 15 insertions(+), 12 deletions(-) diff --git a/README.md b/README.md index 677c233aa..1b33113be 100644 --- a/README.md +++ b/README.md @@ -39,7 +39,7 @@ Please make sure to propose the translation of **only one** cheatsheet per pull |:---|:---:|:---:|:---:|:---:| |**Deutsch**|not started|not started|not started|not started| |**Español**|not started|not started|not started|not started| -|**فارسی**|not started|not started|not started|not started| +|**فارسی**|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/200)|not started|not started|not started| |**Français**|done|done|done|done| |**עִבְרִית**|not started|not started|not started|not started| |**Italiano**|not started|not started|not started|not started| @@ -47,13 +47,14 @@ Please make sure to propose the translation of **only one** cheatsheet per pull |**한국어**|not started|not started|not started|not started| |**Português**|not started|not started|not started|not started| |**Türkçe**|done|done|done|done| -|**Tiếng Việt**|not started|not started|not started|not started| -|**中文**|not started|not started|not started|not started| +|**Tiếng Việt**|not started|not started|not started|done| +|**简体中文**|not started|not started|not started|not started| +|**繁體中文**|not started|not started|not started|not started| ### CS 229 (Machine Learning) | |[Deep learning](https://github.com/shervinea/cheatsheet-translation/blob/master/template/cs-229-deep-learning.md)|[Supervised](https://github.com/shervinea/cheatsheet-translation/blob/master/template/cs-229-supervised-learning.md)|[Unsupervised](https://github.com/shervinea/cheatsheet-translation/blob/master/template/cs-229-unsupervised-learning.md)|[ML tips](https://github.com/shervinea/cheatsheet-translation/blob/master/template/cs-229-machine-learning-tips-and-tricks.md)|[Probabilities](https://github.com/shervinea/cheatsheet-translation/blob/master/template/cs-229-probability.md)|[Algebra](https://github.com/shervinea/cheatsheet-translation/blob/master/template/cs-229-linear-algebra.md)| |:---|:---:|:---:|:---:|:---:|:---:|:---:| -|**العَرَبِيَّة**|done|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/87)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/88)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/83)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/89)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/85)| +|**العَرَبِيَّة**|done|done|done|done|done|done| |**Català**|not started|not started|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/47)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/47)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/47)| |**Deutsch**|[in 
progress](https://github.com/shervinea/cheatsheet-translation/pull/106)|not started|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/135)|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/136)| |**Español**|done|done|done|done|done|done| @@ -64,16 +65,17 @@ Please make sure to propose the translation of **only one** cheatsheet per pull |**हिन्दी**|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/37)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/46)|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/40)|not started|not started| |**Magyar**|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/124)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/124)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/124)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/124)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/124)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/124)| |**Bahasa Indonesia**|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/154)|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/139)|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/151)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/150)| -|**Italiano**|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/78)|not started|not started|not started|not started|not started| -|**日本語**|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/96)|done|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/173)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/99)|done|done| +|**Italiano**|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/78)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/207)|not started|not started|done|done| +|**日本語**|done|done|done|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/99)|done|done| |**한국어**|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/80)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/90)|done|done|done|done| |**Polski**|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/8)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/8)|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/8)|not started|not started| |**Português**|done|done|done|done|done|done| |**Русский**|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/21)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/21)|not started|not started|not started|not started| |**Türkçe**|done|done|done|done|done|done| |**Українська**|not started|not started|not started|not started|done|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/95)| -|**Tiếng Việt**|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/159)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/162)|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/160)|not started|not started| -|**中文**|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/12)|done|[in 
progress](https://github.com/shervinea/cheatsheet-translation/pull/48)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/7)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/73)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/72)| +|**Tiếng Việt**|done|done|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/199)|done|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/175)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/176)| +|**简体中文**|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/12)|done|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/48)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/7)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/73)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/72)| +|**繁體中文**|done|done|done|done|done|done| ### CS 230 (Deep Learning) | |[Convolutional Neural Networks](https://github.com/shervinea/cheatsheet-translation/blob/master/template/cs-230-convolutional-neural-networks.md)|[Recurrent Neural Networks](https://github.com/shervinea/cheatsheet-translation/blob/master/template/cs-230-recurrent-neural-networks.md)|[Deep Learning tips](https://github.com/shervinea/cheatsheet-translation/blob/master/template/cs-230-deep-learning-tips-and-tricks.md)| @@ -88,17 +90,18 @@ Please make sure to propose the translation of **only one** cheatsheet per pull |**עִבְרִית**|not started|not started|not started| |**हिन्दी**|not started|not started|not started| |**Magyar**|not started|not started|not started| -|**Bahasa Indonesia**|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/155)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/152)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/153)| +|**Bahasa Indonesia**|done|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/152)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/153)| |**Italiano**|not started|not started|not started| -|**日本語**|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/145)|done|done| +|**日本語**|done|done|done| |**한국어**|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/109)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/107)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/108)| |**Polski**|not started|not started|not started| |**Português**|done|not started|not started| |**Русский**|not started|not started|not started| |**Türkçe**|done|done|done| |**Українська**|not started|not started|not started| -|**Tiếng Việt**|not started|not started|not started| -|**中文**|not started|not started|not started| +|**Tiếng Việt**|done|done|done| +|**简体中文**|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/181)|not started| +|**繁體中文**|done|not started|not started| ## Acknowledgements Thank you everyone for your help! Please do not forget to add your name to the `CONTRIBUTORS` file so that we can give you proper credit in the cheatsheets' [official website](https://stanford.edu/~shervine/teaching). 
From 8abb81d2c1139efac3969f69bb826e7def7a341b Mon Sep 17 00:00:00 2001 From: Shervine Amidi Date: Tue, 21 Apr 2020 23:27:00 -0700 Subject: [PATCH 523/531] Update CONTRIBUTORS --- CONTRIBUTORS | 99 ++++++++++++++++++++++++++++++++++++++++++++++------ 1 file changed, 88 insertions(+), 11 deletions(-) diff --git a/CONTRIBUTORS b/CONTRIBUTORS index addb9870f..0f624c356 100644 --- a/CONTRIBUTORS +++ b/CONTRIBUTORS @@ -1,14 +1,26 @@ --ar Amjad Khatabi (translation of deep learning) Zaid Alyafeai (review of deep learning) - + Zaid Alyafeai (translation of linear algebra) Amjad Khatabi (review of linear algebra) Mazen Melibari (review of linear algebra) + + Fares Al-Qunaieer (translation of machine learning tips and tricks) + Zaid Alyafeai (review of machine learning tips and tricks) + Mahmoud Aslan (translation of probabilities and statistics) + Fares Al-Qunaieer (review of probabilities and statistics) + + Fares Al-Qunaieer (translation of supervised learning) + Zaid Alyafeai (review of supervised learning) + + Redouane Lguensat (translation of unsupervised learning) + Fares Al-Qunaieer (review of unsupervised learning) + --de ---es +--es Erick Gabriel Mendoza Flores (translation of deep learning) Fernando Diaz (review of deep learning) Fernando González-Herrera (review of deep learning) @@ -17,12 +29,12 @@ Alonso Melgar López (review of deep learning) Gustavo Velasco-Hernández (review of deep learning) Juan Manuel Nava Zamudio (review of deep learning) - + Fernando González-Herrera (translation of linear algebra) Fernando Diaz (review of linear algebra) Gustavo Velasco-Hernández (review of linear algebra) Juan P. Chavat (review of linear algebra) - + David Jiménez Paredes (translation of machine learning tips and tricks) Fernando Diaz (translation of machine learning tips and tricks) Gustavo Velasco-Hernández (review of machine learning tips and tricks) @@ -40,7 +52,7 @@ Jaime Noel Alvarez Luna (translation of unsupervised learning) Alonso Melgar López (review of unsupervised learning) Fernando Diaz (review of unsupervised learning) - + --fa AlisterTA (translation of convolutional neural networks) Ehsan Kermani (translation of convolutional neural networks) @@ -55,7 +67,7 @@ Erfan Noury (translation of linear algebra) Mohammad Karimi (review of linear algebra) - + AlisterTA (translation of machine learning tips and tricks) Mohammad Reza (translation of machine learning tips and tricks) Erfan Noury (review of machine learning tips and tricks) @@ -70,10 +82,10 @@ Amirhosein Kazemnejad (translation of supervised learning) Erfan Noury (review of supervised learning) Mohammad Karimi (review of supervised learning) - + Erfan Noury (translation of unsupervised learning) Mohammad Karimi (review of unsupervised learning) - + --fr Original authors @@ -81,14 +93,34 @@ --hi +--id + Prasetia Utama Putra (translation of convolutional neural networks) + Gunawan Tri (review of convolutional neural networks) + +--it + Alessandro Piotti (translation of linear algebra) + Nicola Dall'Asen (review of linear algebra) + + Nicola Dall'Asen (translation of probabilities and statistics) + Alessandro Piotti (review of probabilities and statistics) + --ko Wooil Jeong (translation of machine learning tips and tricks) Wooil Jeong (translation of probabilities and statistics) - Kwang Hyeok Ahn (translation of Unsupervised Learning) + Kwang Hyeok Ahn (translation of unsupervised learning) --ja + Tran Tuan Anh (translation of convolutional neural networks) + Yoshiyuki Nakai (review of convolutional neural networks) + Linh Dang 
(review of convolutional neural networks) + + Taichi Kato (translation of deep learning) + Dan Lillrank (review of deep learning) + Yoshiyuki Nakai (review of deep learning) + Yuki Tokyo (review of deep learning) + Kamuela Lau (translation of deep learning tips and tricks) Yoshiyuki Nakai (review of deep learning tips and tricks) Hiroki Mori (review of deep learning tips and tricks) @@ -104,13 +136,19 @@ Yuta Kanzawa (translation of supervised learning) Tran Tuan Anh (review of supervised learning) - + + Tran Tuan Anh (translation of unsupervised learning) + Yoshiyuki Nakai (review of unsupervised learning) + Yuta Kanzawa (review of unsupervised learning) + Dan Lillrank (review of unsupervised learning) + --pt Leticia Portella (translation of convolutional neural networks) Gabriel Aparecido Fonseca (review of convolutional neural networks) Gabriel Fonseca (translation of deep learning) Leticia Portella (review of deep learning) + Renato Kano (review of deep learning) Gabriel Fonseca (translation of linear algebra) Leticia Portella (review of linear algebra) @@ -125,7 +163,7 @@ Leticia Portella (translation of supervised learning) Gabriel Fonseca (review of supervised learning) Flavio Clesio (review of supervised learning) - + Gabriel Fonseca (translation of unsupervised learning) Tiago Danin (review of unsupervised learning) @@ -174,12 +212,48 @@ Gregory Reshetniak (translation of probabilities and statistics) Denys (review of probabilities and statistics) +--vi + Phạm Hồng Vinh (translation of convolutional neural networks) + Đàm Minh Tiến (review of convolutional neural networks) + + Trần Tuấn Anh (translation of deep learning) + Phạm Hồng Vinh (review of deep learning) + Đàm Minh Tiến (review of deep learning) + Nguyễn Khánh Hưng (review of deep learning) + Hoàng Vũ Đạt (review of deep learning) + Nguyễn Trí Minh (review of deep learning) + + Hoàng Minh Tuấn (translation of deep learning tips and tricks) + Trần Tuấn Anh (review of deep learning tips and tricks) + Đàm Minh Tiến (review of deep learning tips and tricks) + + Hoàng Minh Tuấn (translation of logic-based models) + Đàm Minh Tiến (review of logic-based models) + + Trần Tuấn Anh (translation of machine learning tips and tricks) + Nguyễn Trí Minh (review of machine learning tips and tricks) + Vinh Pham (review of machine learning tips and tricks) + Đàm Minh Tiến (review of machine learning tips and tricks) + + Trần Tuấn Anh (translation of recurrent neural networks) + Đàm Minh Tiến (review of recurrent neural networks) + Hung Nguyễn (review of recurrent neural networks) + Nguyễn Trí Minh (review of recurrent neural networks) + + Trần Tuấn Anh (translation of supervised learning) + Đàm Minh Tiến (review of supervised learning) + Hung Nguyễn (review of supervised learning) + Nguyễn Trí Minh (review of supervised learning) + --zh Wang Hongnian (translation of supervised learning) Xiaohu Zhu (朱小虎) (review of supervised learning) Chaoying Xue (review of supervised learning) --zh-tw + kentropy (translation of convolutional neural networks) + kevingo (review of convolutional neural networks) + kevingo (translation of deep learning) TobyOoO (review of deep learning) @@ -195,3 +269,6 @@ kevingo (translation of unsupervised learning) imironhead (review of unsupervised learning) johnnychhsu (review of unsupervised learning) + + kevingo (translation of machine learning tips and tricks) + kentropy (review of machine learning tips and tricks) From 2374559d5c334dd90f6ecf015e0a4b92561de615 Mon Sep 17 00:00:00 2001 From: Shervine Amidi Date: Tue, 
21 Apr 2020 23:45:29 -0700 Subject: [PATCH 524/531] Update [vi] progress --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index 1b33113be..0ed768c82 100644 --- a/README.md +++ b/README.md @@ -73,7 +73,7 @@ Please make sure to propose the translation of **only one** cheatsheet per pull |**Русский**|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/21)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/21)|not started|not started|not started|not started| |**Türkçe**|done|done|done|done|done|done| |**Українська**|not started|not started|not started|not started|done|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/95)| -|**Tiếng Việt**|done|done|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/199)|done|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/175)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/176)| +|**Tiếng Việt**|done|done|done|done|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/175)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/176)| |**简体中文**|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/12)|done|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/48)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/7)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/73)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/72)| |**繁體中文**|done|done|done|done|done|done| From 9507da69b707389ecafab4111c3c7ceead786a50 Mon Sep 17 00:00:00 2001 From: Shervine Amidi Date: Tue, 21 Apr 2020 23:47:11 -0700 Subject: [PATCH 525/531] Update CONTRIBUTORS --- CONTRIBUTORS | 3 +++ 1 file changed, 3 insertions(+) diff --git a/CONTRIBUTORS b/CONTRIBUTORS index 0f624c356..631732229 100644 --- a/CONTRIBUTORS +++ b/CONTRIBUTORS @@ -245,6 +245,9 @@ Hung Nguyễn (review of supervised learning) Nguyễn Trí Minh (review of supervised learning) + Trần Tuấn Anh (translation of unsupervised learning) + Đàm Minh Tiến (review of unsupervised learning) + --zh Wang Hongnian (translation of supervised learning) Xiaohu Zhu (朱小虎) (review of supervised learning) From 8c6831958d735f7b8f188cfa1fc45e53cd9efc42 Mon Sep 17 00:00:00 2001 From: Shervine Amidi Date: Wed, 22 Apr 2020 22:33:47 -0700 Subject: [PATCH 526/531] Minor fixes --- vi/cs-229-probability.md | 20 ++++++++++---------- 1 file changed, 10 insertions(+), 10 deletions(-) diff --git a/vi/cs-229-probability.md b/vi/cs-229-probability.md index 82d784be6..4cedb5ed5 100644 --- a/vi/cs-229-probability.md +++ b/vi/cs-229-probability.md @@ -34,13 +34,13 @@ **6. Axiom 1 ― Every probability is between 0 and 1 included, i.e:** -⟶ Tiên đề 1 - Mọi xác suất bất kì đều nằm trong khoảng 0 đến 1. +⟶ Tiên đề 1 - Mọi xác suất bất kì đều nằm trong khoảng 0 đến 1:
**7. Axiom 2 ― The probability that at least one of the elementary events in the entire sample space will occur is 1, i.e:** -⟶ Tiên đề 2 - Xác suất xảy ra của ít nhất một phần tử trong toàn bộ không gian mẫu là 1. +⟶ Tiên đề 2 - Xác suất xảy ra của ít nhất một phần tử trong toàn bộ không gian mẫu là 1:
@@ -124,7 +124,7 @@ **21. Random variable ― A random variable, often noted X, is a function that maps every element in a sample space to a real line.** -⟶ Biến ngẫu nhiên - Một biến ngẫu nhiên, thường được kí hiệu là X, là một hàm nối mỗi phần tử trong một không gian mẫu thành một số thực +⟶ Biến ngẫu nhiên - Một biến ngẫu nhiên, thường được kí hiệu là X, là một hàm nối mỗi phần tử trong một không gian mẫu thành một số thực.
@@ -166,7 +166,7 @@ **28. Variance ― The variance of a random variable, often noted Var(X) or σ2, is a measure of the spread of its distribution function. It is determined as follows:** -⟶ Phương sai - Phương sai của một biến ngẫu nhiên, thường được kí hiệu là Var (X) hoặc σ2, là một độ đo mức độ phân tán của hàm phân phối. Nó được xác định như sau: +⟶ Phương sai - Phương sai của một biến ngẫu nhiên, thường được kí hiệu là Var(X) hoặc σ2, là một độ đo mức độ phân tán của hàm phân phối. Nó được xác định như sau:
@@ -184,7 +184,7 @@ **31. Leibniz integral rule ― Let g be a function of x and potentially c, and a,b boundaries that may depend on c. We have:** -⟶ Quy tắc tích phân Leibniz - Gọi g là hàm của x và có khả năng c, và a, b là các ranh giới có thể phụ thuộc vào c. Chúng ta có: +⟶ Quy tắc tích phân Leibniz - Gọi g là hàm của x và có khả năng c, và a,b là các ranh giới có thể phụ thuộc vào c. Chúng ta có:
@@ -286,19 +286,19 @@ **48. Estimator ― An estimator is a function of the data that is used to infer the value of an unknown parameter in a statistical model.** -⟶ Công cụ ước tính (Estimator) - Công cụ ước tính (Estimator) là một hàm của dữ liệu được sử dụng để suy ra giá trị của một tham số chưa biết trong mô hình thống kê. +⟶ Công cụ ước tính (estimator) - Công cụ ước tính (estimator) là một hàm của dữ liệu được sử dụng để suy ra giá trị của một tham số chưa biết trong mô hình thống kê.
**49. Bias ― The bias of an estimator ^θ is defined as being the difference between the expected value of the distribution of ^θ and the true value, i.e.:** -⟶ Thiên vị (Bias) - Thiên vị (Bias) của Estimator ^θ được định nghĩa là chênh lệch giữa giá trị kì vọng ​​của phân phối ^θ và giá trị thực, tức là +⟶ Thiên vị (bias) - Thiên vị (bias) của Estimator ^θ được định nghĩa là chênh lệch giữa giá trị kì vọng ​​của phân phối ^θ và giá trị thực, tức là
**50. Remark: an estimator is said to be unbiased when we have E[^θ]=θ.** -⟶ Ghi chú: một công cụ ước tính (estimator) được cho là không thiên vị (unbias) khi chúng ta có E[^θ]=θ. +⟶ Ghi chú: một công cụ ước tính (estimator) được cho là không thiên vị (unbiased) khi chúng ta có E[^θ]=θ.
@@ -316,7 +316,7 @@ **53. Remark: the sample mean is unbiased, i.e E[¯¯¯¯¯X]=μ.** -⟶ Ghi chú: Trung bình mẫu là không thiên vị (unbias), nghĩa là E[¯¯¯¯¯X]=μ. +⟶ Ghi chú: Trung bình mẫu là không thiên vị (unbiased), nghĩa là E[¯¯¯¯¯X]=μ.
@@ -340,7 +340,7 @@ **57. Remark: the sample variance is unbiased, i.e E[s2]=σ2.** -⟶ Ghi chú: phương sai mẫu không thiên vị (unbias), nghĩa là E[s2]=σ2. +⟶ Ghi chú: phương sai mẫu không thiên vị (unbiased), nghĩa là E[s2]=σ2.
From 6c6a5f7b0e1042f5531e5e377b1f470c60784fbb Mon Sep 17 00:00:00 2001 From: Shervine Amidi Date: Wed, 22 Apr 2020 22:36:23 -0700 Subject: [PATCH 527/531] Update [vi] progress --- README.md | 27 +++++++++++++++------------ 1 file changed, 15 insertions(+), 12 deletions(-) diff --git a/README.md b/README.md index 677c233aa..16086f631 100644 --- a/README.md +++ b/README.md @@ -39,7 +39,7 @@ Please make sure to propose the translation of **only one** cheatsheet per pull |:---|:---:|:---:|:---:|:---:| |**Deutsch**|not started|not started|not started|not started| |**Español**|not started|not started|not started|not started| -|**فارسی**|not started|not started|not started|not started| +|**فارسی**|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/200)|not started|not started|not started| |**Français**|done|done|done|done| |**עִבְרִית**|not started|not started|not started|not started| |**Italiano**|not started|not started|not started|not started| @@ -47,13 +47,14 @@ Please make sure to propose the translation of **only one** cheatsheet per pull |**한국어**|not started|not started|not started|not started| |**Português**|not started|not started|not started|not started| |**Türkçe**|done|done|done|done| -|**Tiếng Việt**|not started|not started|not started|not started| -|**中文**|not started|not started|not started|not started| +|**Tiếng Việt**|not started|not started|not started|done| +|**简体中文**|not started|not started|not started|not started| +|**繁體中文**|not started|not started|not started|not started| ### CS 229 (Machine Learning) | |[Deep learning](https://github.com/shervinea/cheatsheet-translation/blob/master/template/cs-229-deep-learning.md)|[Supervised](https://github.com/shervinea/cheatsheet-translation/blob/master/template/cs-229-supervised-learning.md)|[Unsupervised](https://github.com/shervinea/cheatsheet-translation/blob/master/template/cs-229-unsupervised-learning.md)|[ML tips](https://github.com/shervinea/cheatsheet-translation/blob/master/template/cs-229-machine-learning-tips-and-tricks.md)|[Probabilities](https://github.com/shervinea/cheatsheet-translation/blob/master/template/cs-229-probability.md)|[Algebra](https://github.com/shervinea/cheatsheet-translation/blob/master/template/cs-229-linear-algebra.md)| |:---|:---:|:---:|:---:|:---:|:---:|:---:| -|**العَرَبِيَّة**|done|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/87)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/88)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/83)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/89)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/85)| +|**العَرَبِيَّة**|done|done|done|done|done|done| |**Català**|not started|not started|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/47)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/47)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/47)| |**Deutsch**|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/106)|not started|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/135)|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/136)| |**Español**|done|done|done|done|done|done| @@ -64,16 +65,17 @@ Please make sure to propose the translation of **only one** cheatsheet per pull |**हिन्दी**|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/37)|[in 
progress](https://github.com/shervinea/cheatsheet-translation/pull/46)|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/40)|not started|not started| |**Magyar**|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/124)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/124)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/124)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/124)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/124)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/124)| |**Bahasa Indonesia**|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/154)|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/139)|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/151)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/150)| -|**Italiano**|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/78)|not started|not started|not started|not started|not started| -|**日本語**|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/96)|done|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/173)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/99)|done|done| +|**Italiano**|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/78)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/207)|not started|not started|done|done| +|**日本語**|done|done|done|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/99)|done|done| |**한국어**|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/80)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/90)|done|done|done|done| |**Polski**|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/8)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/8)|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/8)|not started|not started| |**Português**|done|done|done|done|done|done| |**Русский**|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/21)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/21)|not started|not started|not started|not started| |**Türkçe**|done|done|done|done|done|done| |**Українська**|not started|not started|not started|not started|done|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/95)| -|**Tiếng Việt**|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/159)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/162)|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/160)|not started|not started| -|**中文**|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/12)|done|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/48)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/7)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/73)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/72)| +|**Tiếng Việt**|done|done|done|done|done|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/176)| +|**简体中文**|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/12)|done|[in 
progress](https://github.com/shervinea/cheatsheet-translation/pull/48)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/7)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/73)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/72)| +|**繁體中文**|done|done|done|done|done|done| ### CS 230 (Deep Learning) | |[Convolutional Neural Networks](https://github.com/shervinea/cheatsheet-translation/blob/master/template/cs-230-convolutional-neural-networks.md)|[Recurrent Neural Networks](https://github.com/shervinea/cheatsheet-translation/blob/master/template/cs-230-recurrent-neural-networks.md)|[Deep Learning tips](https://github.com/shervinea/cheatsheet-translation/blob/master/template/cs-230-deep-learning-tips-and-tricks.md)| @@ -88,17 +90,18 @@ Please make sure to propose the translation of **only one** cheatsheet per pull |**עִבְרִית**|not started|not started|not started| |**हिन्दी**|not started|not started|not started| |**Magyar**|not started|not started|not started| -|**Bahasa Indonesia**|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/155)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/152)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/153)| +|**Bahasa Indonesia**|done|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/152)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/153)| |**Italiano**|not started|not started|not started| -|**日本語**|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/145)|done|done| +|**日本語**|done|done|done| |**한국어**|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/109)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/107)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/108)| |**Polski**|not started|not started|not started| |**Português**|done|not started|not started| |**Русский**|not started|not started|not started| |**Türkçe**|done|done|done| |**Українська**|not started|not started|not started| -|**Tiếng Việt**|not started|not started|not started| -|**中文**|not started|not started|not started| +|**Tiếng Việt**|done|done|done| +|**简体中文**|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/181)|not started| +|**繁體中文**|done|not started|not started| ## Acknowledgements Thank you everyone for your help! Please do not forget to add your name to the `CONTRIBUTORS` file so that we can give you proper credit in the cheatsheets' [official website](https://stanford.edu/~shervine/teaching). 
From 880d8b9cc8ed2b980b3543fe13ebb70e9873d0e2 Mon Sep 17 00:00:00 2001 From: Shervine Amidi Date: Wed, 22 Apr 2020 22:38:00 -0700 Subject: [PATCH 528/531] Update CONTRIBUTORS --- CONTRIBUTORS | 105 +++++++++++++++++++++++++++++++++++++++++++++------ 1 file changed, 94 insertions(+), 11 deletions(-) diff --git a/CONTRIBUTORS b/CONTRIBUTORS index addb9870f..86fdb26d1 100644 --- a/CONTRIBUTORS +++ b/CONTRIBUTORS @@ -1,14 +1,26 @@ --ar Amjad Khatabi (translation of deep learning) Zaid Alyafeai (review of deep learning) - + Zaid Alyafeai (translation of linear algebra) Amjad Khatabi (review of linear algebra) Mazen Melibari (review of linear algebra) + + Fares Al-Qunaieer (translation of machine learning tips and tricks) + Zaid Alyafeai (review of machine learning tips and tricks) + Mahmoud Aslan (translation of probabilities and statistics) + Fares Al-Qunaieer (review of probabilities and statistics) + + Fares Al-Qunaieer (translation of supervised learning) + Zaid Alyafeai (review of supervised learning) + + Redouane Lguensat (translation of unsupervised learning) + Fares Al-Qunaieer (review of unsupervised learning) + --de ---es +--es Erick Gabriel Mendoza Flores (translation of deep learning) Fernando Diaz (review of deep learning) Fernando González-Herrera (review of deep learning) @@ -17,12 +29,12 @@ Alonso Melgar López (review of deep learning) Gustavo Velasco-Hernández (review of deep learning) Juan Manuel Nava Zamudio (review of deep learning) - + Fernando González-Herrera (translation of linear algebra) Fernando Diaz (review of linear algebra) Gustavo Velasco-Hernández (review of linear algebra) Juan P. Chavat (review of linear algebra) - + David Jiménez Paredes (translation of machine learning tips and tricks) Fernando Diaz (translation of machine learning tips and tricks) Gustavo Velasco-Hernández (review of machine learning tips and tricks) @@ -40,7 +52,7 @@ Jaime Noel Alvarez Luna (translation of unsupervised learning) Alonso Melgar López (review of unsupervised learning) Fernando Diaz (review of unsupervised learning) - + --fa AlisterTA (translation of convolutional neural networks) Ehsan Kermani (translation of convolutional neural networks) @@ -55,7 +67,7 @@ Erfan Noury (translation of linear algebra) Mohammad Karimi (review of linear algebra) - + AlisterTA (translation of machine learning tips and tricks) Mohammad Reza (translation of machine learning tips and tricks) Erfan Noury (review of machine learning tips and tricks) @@ -70,10 +82,10 @@ Amirhosein Kazemnejad (translation of supervised learning) Erfan Noury (review of supervised learning) Mohammad Karimi (review of supervised learning) - + Erfan Noury (translation of unsupervised learning) Mohammad Karimi (review of unsupervised learning) - + --fr Original authors @@ -81,14 +93,34 @@ --hi +--id + Prasetia Utama Putra (translation of convolutional neural networks) + Gunawan Tri (review of convolutional neural networks) + +--it + Alessandro Piotti (translation of linear algebra) + Nicola Dall'Asen (review of linear algebra) + + Nicola Dall'Asen (translation of probabilities and statistics) + Alessandro Piotti (review of probabilities and statistics) + --ko Wooil Jeong (translation of machine learning tips and tricks) Wooil Jeong (translation of probabilities and statistics) - Kwang Hyeok Ahn (translation of Unsupervised Learning) + Kwang Hyeok Ahn (translation of unsupervised learning) --ja + Tran Tuan Anh (translation of convolutional neural networks) + Yoshiyuki Nakai (review of convolutional neural networks) + Linh Dang 
(review of convolutional neural networks) + + Taichi Kato (translation of deep learning) + Dan Lillrank (review of deep learning) + Yoshiyuki Nakai (review of deep learning) + Yuki Tokyo (review of deep learning) + Kamuela Lau (translation of deep learning tips and tricks) Yoshiyuki Nakai (review of deep learning tips and tricks) Hiroki Mori (review of deep learning tips and tricks) @@ -104,13 +136,19 @@ Yuta Kanzawa (translation of supervised learning) Tran Tuan Anh (review of supervised learning) - + + Tran Tuan Anh (translation of unsupervised learning) + Yoshiyuki Nakai (review of unsupervised learning) + Yuta Kanzawa (review of unsupervised learning) + Dan Lillrank (review of unsupervised learning) + --pt Leticia Portella (translation of convolutional neural networks) Gabriel Aparecido Fonseca (review of convolutional neural networks) Gabriel Fonseca (translation of deep learning) Leticia Portella (review of deep learning) + Renato Kano (review of deep learning) Gabriel Fonseca (translation of linear algebra) Leticia Portella (review of linear algebra) @@ -125,7 +163,7 @@ Leticia Portella (translation of supervised learning) Gabriel Fonseca (review of supervised learning) Flavio Clesio (review of supervised learning) - + Gabriel Fonseca (translation of unsupervised learning) Tiago Danin (review of unsupervised learning) @@ -174,12 +212,54 @@ Gregory Reshetniak (translation of probabilities and statistics) Denys (review of probabilities and statistics) +--vi + Phạm Hồng Vinh (translation of convolutional neural networks) + Đàm Minh Tiến (review of convolutional neural networks) + + Trần Tuấn Anh (translation of deep learning) + Phạm Hồng Vinh (review of deep learning) + Đàm Minh Tiến (review of deep learning) + Nguyễn Khánh Hưng (review of deep learning) + Hoàng Vũ Đạt (review of deep learning) + Nguyễn Trí Minh (review of deep learning) + + Hoàng Minh Tuấn (translation of deep learning tips and tricks) + Trần Tuấn Anh (review of deep learning tips and tricks) + Đàm Minh Tiến (review of deep learning tips and tricks) + + Hoàng Minh Tuấn (translation of logic-based models) + Đàm Minh Tiến (review of logic-based models) + + Trần Tuấn Anh (translation of machine learning tips and tricks) + Nguyễn Trí Minh (review of machine learning tips and tricks) + Vinh Pham (review of machine learning tips and tricks) + Đàm Minh Tiến (review of machine learning tips and tricks) + + Hoàng Minh Tuấn (translation of probabilities and statistics) + Hung Nguyễn (review of probabilities and statistics) + + Trần Tuấn Anh (translation of recurrent neural networks) + Đàm Minh Tiến (review of recurrent neural networks) + Hung Nguyễn (review of recurrent neural networks) + Nguyễn Trí Minh (review of recurrent neural networks) + + Trần Tuấn Anh (translation of supervised learning) + Đàm Minh Tiến (review of supervised learning) + Hung Nguyễn (review of supervised learning) + Nguyễn Trí Minh (review of supervised learning) + + Trần Tuấn Anh (translation of unsupervised learning) + Đàm Minh Tiến (review of unsupervised learning) + --zh Wang Hongnian (translation of supervised learning) Xiaohu Zhu (朱小虎) (review of supervised learning) Chaoying Xue (review of supervised learning) --zh-tw + kentropy (translation of convolutional neural networks) + kevingo (review of convolutional neural networks) + kevingo (translation of deep learning) TobyOoO (review of deep learning) @@ -195,3 +275,6 @@ kevingo (translation of unsupervised learning) imironhead (review of unsupervised learning) johnnychhsu (review of unsupervised 
learning) + + kevingo (translation of machine learning tips and tricks) + kentropy (review of machine learning tips and tricks) From b0342f061664f05d266f3788f02ed2b5c91efc98 Mon Sep 17 00:00:00 2001 From: Shervine Amidi Date: Wed, 22 Apr 2020 22:57:41 -0700 Subject: [PATCH 529/531] Minor fixes --- vi/cs-229-linear-algebra.md | 34 +++++++++++++++++----------------- 1 file changed, 17 insertions(+), 17 deletions(-) diff --git a/vi/cs-229-linear-algebra.md b/vi/cs-229-linear-algebra.md index cdc97c845..53d7f2ff4 100644 --- a/vi/cs-229-linear-algebra.md +++ b/vi/cs-229-linear-algebra.md @@ -22,19 +22,19 @@ **4. Vector ― We note x∈Rn a vector with n entries, where xi∈R is the ith entry:** -⟶ Vector - Chúng ta kí hiệu x∈Rn là một vector với n phần tử, với xi∈R là phần tử thứ i: +⟶ Vectơ ― Chúng ta kí hiệu x∈Rn là một vectơ với n phần tử, với xi∈R là phần tử thứ i:
**5. Matrix ― We note A∈Rm×n a matrix with m rows and n columns, where Ai,j∈R is the entry located in the ith row and jth column:** -⟶ Ma trận - Kí hiệu A∈Rm×n là một ma trận với m hàng và n cột, Ai,j∈R là phần tử nằm ở hàng thứ i, cột j: +⟶ Ma trận ― Kí hiệu A∈Rm×n là một ma trận với m hàng và n cột, Ai,j∈R là phần tử nằm ở hàng thứ i, cột j:
**6. Remark: the vector x defined above can be viewed as a n×1 matrix and is more particularly called a column-vector.** -⟶ Ghi chú: Vector x được xác định ở trên có thể coi như một ma trận nx1 và được gọi là vector cột. +⟶ Ghi chú: vectơ x được xác định ở trên có thể coi như một ma trận nx1 và được gọi là vectơ cột.
@@ -46,7 +46,7 @@ **8. Identity matrix ― The identity matrix I∈Rn×n is a square matrix with ones in its diagonal and zero everywhere else:** -⟶ Ma trận đơn vị - Ma trận đơn vị I∈Rn×n là một ma trận vuông với các phần tử trên đường chéo chính bằng 1 và các phần tử còn lại bằng 0: +⟶ Ma trận đơn vị ― Ma trận đơn vị I∈Rn×n là một ma trận vuông với các phần tử trên đường chéo chính bằng 1 và các phần tử còn lại bằng 0:
@@ -58,13 +58,13 @@ **10. Diagonal matrix ― A diagonal matrix D∈Rn×n is a square matrix with nonzero values in its diagonal and zero everywhere else:** -⟶ Ma trận đường chéo - Ma trận đường chéo D∈Rn×n là một ma trận vuông với các phần tử trên đường chéo chính khác 0 và các phần tử còn lại bằng 0: +⟶ Ma trận đường chéo ― Ma trận đường chéo D∈Rn×n là một ma trận vuông với các phần tử trên đường chéo chính khác 0 và các phần tử còn lại bằng 0:
**11. Remark: we also note D as diag(d1,...,dn).** -⟶ Ghi chú: Chúng ta kí hiệu D là diag(d1,...,dn). +⟶ Ghi chú: chúng ta kí hiệu D là diag(d1,...,dn).
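The formulas that entries 8–11 point to are not written out in the text itself; as a quick reference, here is a minimal LaTeX sketch of the standard objects being named (a 3×3 identity matrix and a diagonal matrix, reusing the d1,...,dn notation of entry 11; the size 3 is arbitrary):

```latex
% Sketch of the matrices named in entries 8-11 (standard definitions).
% Identity matrix I_3: ones on the diagonal, zeros everywhere else.
I_3 = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix},
\qquad
% Diagonal matrix D = diag(d_1, d_2, d_3): nonzero entries only on the diagonal.
D = \mathrm{diag}(d_1, d_2, d_3)
  = \begin{pmatrix} d_1 & 0 & 0 \\ 0 & d_2 & 0 \\ 0 & 0 & d_3 \end{pmatrix}
```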
@@ -82,43 +82,43 @@ **14. Vector-vector ― There are two types of vector-vector products:** -⟶ Vector-vector ― Có hai loại phép nhân vector-vector: +⟶ Vectơ/vectơ ― Có hai loại phép nhân vectơ/vectơ:
**15. inner product: for x,y∈Rn, we have:** -⟶ Phép nhân inner: với x,y∈Rn, ta có: +⟶ phép nhân inner: với x,y∈Rn, ta có:
**16. outer product: for x∈Rm,y∈Rn, we have:** -⟶ Phép nhân outer: với x∈Rm,y∈Rn, ta có: +⟶ phép nhân outer: với x∈Rm,y∈Rn, ta có:
**17. Matrix-vector ― The product of matrix A∈Rm×n and vector x∈Rn is a vector of size Rn, such that:** -⟶ Ma trận - Vector ― Phép nhân giữa ma trận A∈Rm×n và vector x∈Rn là một vector có kích thước Rn: +⟶ Ma trận/vectơ ― Phép nhân giữa ma trận A∈Rm×n và vectơ x∈Rn là một vectơ có kích thước Rn:
**18. where aTr,i are the vector rows and ac,j are the vector columns of A, and xi are the entries of x.** -⟶ với aTr,i là các vector hàng và ac,j là các vector cột của A, và xi là các phần tử của x. +⟶ với aTr,i là các vectơ hàng và ac,j là các vectơ cột của A, và xi là các phần tử của x.
**19. Matrix-matrix ― The product of matrices A∈Rm×n and B∈Rn×p is a matrix of size Rn×p, such that:** -⟶ Ma trận - ma trận ― Phép nhân giữa ma trận A∈Rm×n và B∈Rn×p là một ma trận kích thước Rn×p: +⟶ Ma trận/ma trận ― Phép nhân giữa ma trận A∈Rm×n và B∈Rn×p là một ma trận kích thước Rn×p:
**20. where aTr,i,bTr,i are the vector rows and ac,j,bc,j are the vector columns of A and B respectively** -⟶ với aTr,i,bTr,i là các vector hàng và ac,j,bc,j lần lượt là các vector cột của A and B. +⟶ với aTr,i,bTr,i là các vectơ hàng và ac,j,bc,j lần lượt là các vectơ cột của A và B.
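The product formulas referenced by entries 14–20 are likewise not spelled out in the text; for reference, a worked LaTeX sketch of the standard component-wise definitions, using the same symbols as the entries. Note that for A∈Rm×n and x∈Rn the product Ax has m entries, and for B∈Rn×p the product AB is of size m×p:

```latex
% Standard component-wise definitions of the products in entries 15-19
% (requires amsmath for align* and \text).
\begin{align*}
  x^\top y &= \sum_{i=1}^{n} x_i y_i \in \mathbb{R}
    && \text{inner product, } x, y \in \mathbb{R}^n \\
  (x y^\top)_{i,j} &= x_i y_j, \quad x y^\top \in \mathbb{R}^{m \times n}
    && \text{outer product, } x \in \mathbb{R}^m,\ y \in \mathbb{R}^n \\
  (Ax)_i &= \sum_{j=1}^{n} A_{i,j}\, x_j, \quad Ax \in \mathbb{R}^{m}
    && \text{matrix-vector, } A \in \mathbb{R}^{m \times n} \\
  (AB)_{i,k} &= \sum_{j=1}^{n} A_{i,j}\, B_{j,k}, \quad AB \in \mathbb{R}^{m \times p}
    && \text{matrix-matrix, } B \in \mathbb{R}^{n \times p}
\end{align*}
```

Writing the definitions this way makes the output dimensions of Ax and AB easy to read off from the summation index, which is the point entries 18 and 20 describe in terms of row and column vectors.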
@@ -202,7 +202,7 @@ **34. Norm ― A norm is a function N:V⟶[0,+∞[ where V is a vector space, and such that for all x,y∈V, we have:** -⟶ Chuẩn (norm) ― Một chuẩn (norm) là một hàm N:V⟶[0,+∞[ mà V là một không gian vector, và với mọi x,y∈V, ta có: +⟶ Chuẩn (norm) ― Một chuẩn (norm) là một hàm N:V⟶[0,+∞[ mà V là một không gian vectơ, và với mọi x,y∈V, ta có:
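Entry 34 defines a norm through properties that do not appear in the text; for reference, a brief LaTeX sketch of the usual axioms together with the common examples on Rn:

```latex
% Usual defining properties of a norm N on a vector space V (entry 34),
% followed by the standard examples on R^n.
\begin{align*}
  & N(x + y) \le N(x) + N(y)
      && \text{(triangle inequality)} \\
  & N(\lambda x) = |\lambda|\, N(x) \ \text{for all } \lambda \in \mathbb{R}
      && \text{(homogeneity)} \\
  & N(x) = 0 \implies x = 0
      && \text{(definiteness)} \\[6pt]
  & \|x\|_1 = \sum_{i=1}^{n} |x_i|, \qquad
    \|x\|_2 = \Big(\sum_{i=1}^{n} x_i^2\Big)^{1/2}, \qquad
    \|x\|_\infty = \max_i |x_i|
\end{align*}
```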
@@ -262,7 +262,7 @@ **44. Eigenvalue, eigenvector ― Given a matrix A∈Rn×n, λ is said to be an eigenvalue of A if there exists a vector z∈Rn∖{0}, called eigenvector, such that we have:** -⟶ Giá trị riêng, vector riêng - Cho ma trận A∈Rn×n, λ được gọi là giá trị riêng của A nếu tồn tại một vectơ z∈Rn∖{0}, được gọi là vector riêng, sao cho: +⟶ Giá trị riêng, vectơ riêng - Cho ma trận A∈Rn×n, λ được gọi là giá trị riêng của A nếu tồn tại một vectơ z∈Rn∖{0}, được gọi là vectơ riêng, sao cho:
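The defining relation of entry 44 is easy to state concretely; a small worked LaTeX sketch follows (the particular 2×2 matrix is chosen only for illustration):

```latex
% Entry 44 in symbols: z is an eigenvector of A with eigenvalue \lambda.
A z = \lambda z, \qquad z \in \mathbb{R}^n \setminus \{0\}
% Illustrative example: this upper-triangular matrix has eigenvalues 2 and 3,
% with eigenvectors (1,0)^T and (1,1)^T respectively.
\begin{pmatrix} 2 & 1 \\ 0 & 3 \end{pmatrix}
\begin{pmatrix} 1 \\ 0 \end{pmatrix}
= 2 \begin{pmatrix} 1 \\ 0 \end{pmatrix},
\qquad
\begin{pmatrix} 2 & 1 \\ 0 & 3 \end{pmatrix}
\begin{pmatrix} 1 \\ 1 \end{pmatrix}
= 3 \begin{pmatrix} 1 \\ 1 \end{pmatrix}
```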
@@ -306,7 +306,7 @@ **51. Hessian ― Let f:Rn→R be a function and x∈Rn be a vector. The hessian of f with respect to x is a n×n symmetric matrix, noted ∇2xf(x), such that:** -⟶ Hessian - Cho f:Rn→R là một hàm và x∈Rn là một vector. Hessian của f đối với x là một ma trận đối xứng n×n, ghi chú ∇2xf(x), sao cho: +⟶ Hessian ― Cho f:Rn→R là một hàm và x∈Rn là một vectơ. Hessian của f đối với x là một ma trận đối xứng n×n, ghi chú ∇2xf(x), sao cho:
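Entry 51's Hessian can also be written out directly; a short worked LaTeX example with an arbitrary two-variable function (chosen only for illustration):

```latex
% Hessian of f with respect to x (entry 51): the symmetric matrix of second
% partial derivatives, (\nabla^2_x f(x))_{i,j} = \partial^2 f / (\partial x_i \partial x_j).
% Illustrative example with f(x) = x_1^2 + 3 x_1 x_2 + x_2^3:
\nabla^2_x f(x) =
\begin{pmatrix}
  \frac{\partial^2 f}{\partial x_1^2} & \frac{\partial^2 f}{\partial x_1 \partial x_2} \\
  \frac{\partial^2 f}{\partial x_2 \partial x_1} & \frac{\partial^2 f}{\partial x_2^2}
\end{pmatrix}
=
\begin{pmatrix} 2 & 3 \\ 3 & 6 x_2 \end{pmatrix}
```

The equality of the two mixed partials is what makes the Hessian an n×n symmetric matrix, as stated in the entry.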
@@ -336,7 +336,7 @@ **56. [Matrix properties, Norm, Eigenvalue/Eigenvector, Singular-value decomposition]** -⟶ [Các thuộc tính ma trận, Chuẩn, Giá trị riêng/Vector riêng, Phân tích giá trị suy biến] +⟶ [Các thuộc tính ma trận, Chuẩn, Giá trị riêng/Vectơ riêng, Phân tích giá trị suy biến]
From cb21fc1fb36849617382cf6aa86570764d6f0b0f Mon Sep 17 00:00:00 2001 From: Shervine Amidi Date: Wed, 22 Apr 2020 22:58:34 -0700 Subject: [PATCH 530/531] Update [vi] progress --- README.md | 27 +++++++++++++++------------ 1 file changed, 15 insertions(+), 12 deletions(-) diff --git a/README.md b/README.md index 677c233aa..ecf7df6f9 100644 --- a/README.md +++ b/README.md @@ -39,7 +39,7 @@ Please make sure to propose the translation of **only one** cheatsheet per pull |:---|:---:|:---:|:---:|:---:| |**Deutsch**|not started|not started|not started|not started| |**Español**|not started|not started|not started|not started| -|**فارسی**|not started|not started|not started|not started| +|**فارسی**|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/200)|not started|not started|not started| |**Français**|done|done|done|done| |**עִבְרִית**|not started|not started|not started|not started| |**Italiano**|not started|not started|not started|not started| @@ -47,13 +47,14 @@ Please make sure to propose the translation of **only one** cheatsheet per pull |**한국어**|not started|not started|not started|not started| |**Português**|not started|not started|not started|not started| |**Türkçe**|done|done|done|done| -|**Tiếng Việt**|not started|not started|not started|not started| -|**中文**|not started|not started|not started|not started| +|**Tiếng Việt**|not started|not started|not started|done| +|**简体中文**|not started|not started|not started|not started| +|**繁體中文**|not started|not started|not started|not started| ### CS 229 (Machine Learning) | |[Deep learning](https://github.com/shervinea/cheatsheet-translation/blob/master/template/cs-229-deep-learning.md)|[Supervised](https://github.com/shervinea/cheatsheet-translation/blob/master/template/cs-229-supervised-learning.md)|[Unsupervised](https://github.com/shervinea/cheatsheet-translation/blob/master/template/cs-229-unsupervised-learning.md)|[ML tips](https://github.com/shervinea/cheatsheet-translation/blob/master/template/cs-229-machine-learning-tips-and-tricks.md)|[Probabilities](https://github.com/shervinea/cheatsheet-translation/blob/master/template/cs-229-probability.md)|[Algebra](https://github.com/shervinea/cheatsheet-translation/blob/master/template/cs-229-linear-algebra.md)| |:---|:---:|:---:|:---:|:---:|:---:|:---:| -|**العَرَبِيَّة**|done|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/87)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/88)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/83)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/89)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/85)| +|**العَرَبِيَّة**|done|done|done|done|done|done| |**Català**|not started|not started|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/47)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/47)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/47)| |**Deutsch**|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/106)|not started|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/135)|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/136)| |**Español**|done|done|done|done|done|done| @@ -64,16 +65,17 @@ Please make sure to propose the translation of **only one** cheatsheet per pull |**हिन्दी**|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/37)|[in 
progress](https://github.com/shervinea/cheatsheet-translation/pull/46)|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/40)|not started|not started| |**Magyar**|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/124)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/124)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/124)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/124)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/124)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/124)| |**Bahasa Indonesia**|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/154)|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/139)|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/151)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/150)| -|**Italiano**|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/78)|not started|not started|not started|not started|not started| -|**日本語**|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/96)|done|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/173)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/99)|done|done| +|**Italiano**|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/78)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/207)|not started|not started|done|done| +|**日本語**|done|done|done|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/99)|done|done| |**한국어**|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/80)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/90)|done|done|done|done| |**Polski**|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/8)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/8)|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/8)|not started|not started| |**Português**|done|done|done|done|done|done| |**Русский**|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/21)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/21)|not started|not started|not started|not started| |**Türkçe**|done|done|done|done|done|done| |**Українська**|not started|not started|not started|not started|done|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/95)| -|**Tiếng Việt**|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/159)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/162)|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/160)|not started|not started| -|**中文**|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/12)|done|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/48)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/7)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/73)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/72)| +|**Tiếng Việt**|done|done|done|done|done|done| +|**简体中文**|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/12)|done|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/48)|[in 
progress](https://github.com/shervinea/cheatsheet-translation/pull/7)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/73)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/72)| +|**繁體中文**|done|done|done|done|done|done| ### CS 230 (Deep Learning) | |[Convolutional Neural Networks](https://github.com/shervinea/cheatsheet-translation/blob/master/template/cs-230-convolutional-neural-networks.md)|[Recurrent Neural Networks](https://github.com/shervinea/cheatsheet-translation/blob/master/template/cs-230-recurrent-neural-networks.md)|[Deep Learning tips](https://github.com/shervinea/cheatsheet-translation/blob/master/template/cs-230-deep-learning-tips-and-tricks.md)| @@ -88,17 +90,18 @@ Please make sure to propose the translation of **only one** cheatsheet per pull |**עִבְרִית**|not started|not started|not started| |**हिन्दी**|not started|not started|not started| |**Magyar**|not started|not started|not started| -|**Bahasa Indonesia**|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/155)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/152)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/153)| +|**Bahasa Indonesia**|done|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/152)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/153)| |**Italiano**|not started|not started|not started| -|**日本語**|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/145)|done|done| +|**日本語**|done|done|done| |**한국어**|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/109)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/107)|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/108)| |**Polski**|not started|not started|not started| |**Português**|done|not started|not started| |**Русский**|not started|not started|not started| |**Türkçe**|done|done|done| |**Українська**|not started|not started|not started| -|**Tiếng Việt**|not started|not started|not started| -|**中文**|not started|not started|not started| +|**Tiếng Việt**|done|done|done| +|**简体中文**|not started|[in progress](https://github.com/shervinea/cheatsheet-translation/pull/181)|not started| +|**繁體中文**|done|not started|not started| ## Acknowledgements Thank you everyone for your help! Please do not forget to add your name to the `CONTRIBUTORS` file so that we can give you proper credit in the cheatsheets' [official website](https://stanford.edu/~shervine/teaching). 
From 46fd8f3d311ad25f162850ad47272d103973a45b Mon Sep 17 00:00:00 2001 From: Shervine Amidi Date: Wed, 22 Apr 2020 23:01:02 -0700 Subject: [PATCH 531/531] Update CONTRIBUTORS --- CONTRIBUTORS | 108 +++++++++++++++++++++++++++++++++++++++++++++------ 1 file changed, 97 insertions(+), 11 deletions(-) diff --git a/CONTRIBUTORS b/CONTRIBUTORS index addb9870f..acedc1fa1 100644 --- a/CONTRIBUTORS +++ b/CONTRIBUTORS @@ -1,14 +1,26 @@ --ar Amjad Khatabi (translation of deep learning) Zaid Alyafeai (review of deep learning) - + Zaid Alyafeai (translation of linear algebra) Amjad Khatabi (review of linear algebra) Mazen Melibari (review of linear algebra) + + Fares Al-Qunaieer (translation of machine learning tips and tricks) + Zaid Alyafeai (review of machine learning tips and tricks) + Mahmoud Aslan (translation of probabilities and statistics) + Fares Al-Qunaieer (review of probabilities and statistics) + + Fares Al-Qunaieer (translation of supervised learning) + Zaid Alyafeai (review of supervised learning) + + Redouane Lguensat (translation of unsupervised learning) + Fares Al-Qunaieer (review of unsupervised learning) + --de ---es +--es Erick Gabriel Mendoza Flores (translation of deep learning) Fernando Diaz (review of deep learning) Fernando González-Herrera (review of deep learning) @@ -17,12 +29,12 @@ Alonso Melgar López (review of deep learning) Gustavo Velasco-Hernández (review of deep learning) Juan Manuel Nava Zamudio (review of deep learning) - + Fernando González-Herrera (translation of linear algebra) Fernando Diaz (review of linear algebra) Gustavo Velasco-Hernández (review of linear algebra) Juan P. Chavat (review of linear algebra) - + David Jiménez Paredes (translation of machine learning tips and tricks) Fernando Diaz (translation of machine learning tips and tricks) Gustavo Velasco-Hernández (review of machine learning tips and tricks) @@ -40,7 +52,7 @@ Jaime Noel Alvarez Luna (translation of unsupervised learning) Alonso Melgar López (review of unsupervised learning) Fernando Diaz (review of unsupervised learning) - + --fa AlisterTA (translation of convolutional neural networks) Ehsan Kermani (translation of convolutional neural networks) @@ -55,7 +67,7 @@ Erfan Noury (translation of linear algebra) Mohammad Karimi (review of linear algebra) - + AlisterTA (translation of machine learning tips and tricks) Mohammad Reza (translation of machine learning tips and tricks) Erfan Noury (review of machine learning tips and tricks) @@ -70,10 +82,10 @@ Amirhosein Kazemnejad (translation of supervised learning) Erfan Noury (review of supervised learning) Mohammad Karimi (review of supervised learning) - + Erfan Noury (translation of unsupervised learning) Mohammad Karimi (review of unsupervised learning) - + --fr Original authors @@ -81,14 +93,34 @@ --hi +--id + Prasetia Utama Putra (translation of convolutional neural networks) + Gunawan Tri (review of convolutional neural networks) + +--it + Alessandro Piotti (translation of linear algebra) + Nicola Dall'Asen (review of linear algebra) + + Nicola Dall'Asen (translation of probabilities and statistics) + Alessandro Piotti (review of probabilities and statistics) + --ko Wooil Jeong (translation of machine learning tips and tricks) Wooil Jeong (translation of probabilities and statistics) - Kwang Hyeok Ahn (translation of Unsupervised Learning) + Kwang Hyeok Ahn (translation of unsupervised learning) --ja + Tran Tuan Anh (translation of convolutional neural networks) + Yoshiyuki Nakai (review of convolutional neural networks) + Linh Dang 
(review of convolutional neural networks) + + Taichi Kato (translation of deep learning) + Dan Lillrank (review of deep learning) + Yoshiyuki Nakai (review of deep learning) + Yuki Tokyo (review of deep learning) + Kamuela Lau (translation of deep learning tips and tricks) Yoshiyuki Nakai (review of deep learning tips and tricks) Hiroki Mori (review of deep learning tips and tricks) @@ -104,13 +136,19 @@ Yuta Kanzawa (translation of supervised learning) Tran Tuan Anh (review of supervised learning) - + + Tran Tuan Anh (translation of unsupervised learning) + Yoshiyuki Nakai (review of unsupervised learning) + Yuta Kanzawa (review of unsupervised learning) + Dan Lillrank (review of unsupervised learning) + --pt Leticia Portella (translation of convolutional neural networks) Gabriel Aparecido Fonseca (review of convolutional neural networks) Gabriel Fonseca (translation of deep learning) Leticia Portella (review of deep learning) + Renato Kano (review of deep learning) Gabriel Fonseca (translation of linear algebra) Leticia Portella (review of linear algebra) @@ -125,7 +163,7 @@ Leticia Portella (translation of supervised learning) Gabriel Fonseca (review of supervised learning) Flavio Clesio (review of supervised learning) - + Gabriel Fonseca (translation of unsupervised learning) Tiago Danin (review of unsupervised learning) @@ -174,12 +212,57 @@ Gregory Reshetniak (translation of probabilities and statistics) Denys (review of probabilities and statistics) +--vi + Phạm Hồng Vinh (translation of convolutional neural networks) + Đàm Minh Tiến (review of convolutional neural networks) + + Trần Tuấn Anh (translation of deep learning) + Phạm Hồng Vinh (review of deep learning) + Đàm Minh Tiến (review of deep learning) + Nguyễn Khánh Hưng (review of deep learning) + Hoàng Vũ Đạt (review of deep learning) + Nguyễn Trí Minh (review of deep learning) + + Hoàng Minh Tuấn (translation of deep learning tips and tricks) + Trần Tuấn Anh (review of deep learning tips and tricks) + Đàm Minh Tiến (review of deep learning tips and tricks) + + Hoàng Minh Tuấn (translation of linear algebra) + Phạm Hồng Vinh (review of linear algebra) + + Hoàng Minh Tuấn (translation of logic-based models) + Đàm Minh Tiến (review of logic-based models) + + Trần Tuấn Anh (translation of machine learning tips and tricks) + Nguyễn Trí Minh (review of machine learning tips and tricks) + Vinh Pham (review of machine learning tips and tricks) + Đàm Minh Tiến (review of machine learning tips and tricks) + + Hoàng Minh Tuấn (translation of probabilities and statistics) + Hung Nguyễn (review of probabilities and statistics) + + Trần Tuấn Anh (translation of recurrent neural networks) + Đàm Minh Tiến (review of recurrent neural networks) + Hung Nguyễn (review of recurrent neural networks) + Nguyễn Trí Minh (review of recurrent neural networks) + + Trần Tuấn Anh (translation of supervised learning) + Đàm Minh Tiến (review of supervised learning) + Hung Nguyễn (review of supervised learning) + Nguyễn Trí Minh (review of supervised learning) + + Trần Tuấn Anh (translation of unsupervised learning) + Đàm Minh Tiến (review of unsupervised learning) + --zh Wang Hongnian (translation of supervised learning) Xiaohu Zhu (朱小虎) (review of supervised learning) Chaoying Xue (review of supervised learning) --zh-tw + kentropy (translation of convolutional neural networks) + kevingo (review of convolutional neural networks) + kevingo (translation of deep learning) TobyOoO (review of deep learning) @@ -195,3 +278,6 @@ kevingo (translation of 
unsupervised learning) imironhead (review of unsupervised learning) johnnychhsu (review of unsupervised learning) + + kevingo (translation of machine learning tips and tricks) + kentropy (review of machine learning tips and tricks)