Fix style issues causing mvn install to fail. (#453)
- The style issues were caused by spacing problems and by the presence of the Unicode chi character (χ).
- The workflow step was also updated to run the style check. Previously, the style check was not executed, which allowed style errors to be committed to master.
   // at least two distinct categories are required to run the chi-square test for a categorical variable
   private val chisquareMinDimension: Int = 2
 
-  //for tables larger than 2 x 2: "No more than 20% of the expected counts are less than 5 and all individual expected counts are 1 or greater" (Yates, Moore & McCabe, 1999, The Practice of Statistics, p. 734)
+  // for tables larger than 2 x 2:
+  // "No more than 20% of the expected counts are less than 5 and all individual expected counts are 1 or greater"
+  // - Yates, Moore & McCabe, 1999, The Practice of Statistics, p. 734
   private val defaultAbsThresholdYates: Integer = 5
   private val defaultPercThresholdYates: Double = 0.2
 
-  // for 2x2 tables: all expected counts should be 10 or greater (Cochran, William G. "The χ2 test of goodness of fit." The Annals of mathematical statistics (1952): 315-345.)
+  // for 2x2 tables:
+  // all expected counts should be 10 or greater
+  // - Cochran, William G. "The (chi)**2 test of goodness of fit."
+  // The Annals of mathematical statistics (1952): 315-345.
   private val defaultAbsThresholdCochran: Integer = 10
 
-  // Default c(alpha) value corresponding to an alpha value of 0.003, Eq. (15) in Section 3.3.1 of Knuth, D.E., The Art of Computer Programming, Volume 2 (Seminumerical Algorithms), 3rd Edition, Addison Wesley, Reading Mass, 1998.
+  // Default c(alpha) value corresponding to an alpha value of 0.003,
+  // Eq. (15) in Section 3.3.1 of Knuth, D.E., The Art of Computer Programming, Volume 2 (Seminumerical Algorithms),
   /** Calculate distance of categorical profiles based on different distance methods
    *
    * Thresholds for chi-square method:
-   * - for 2x2 tables: all expected counts should be 10 or greater (Cochran, William G. "The χ2 test of goodness of fit." The Annals of mathematical statistics (1952): 315-345.)
-   * - for tables larger than 2 x 2: "No more than 20% of the expected counts are less than 5 and all individual expected counts are 1 or greater" (Yates, Moore & McCabe, 1999, The Practice of Statistics, p. 734)
+   * - for 2x2 tables:
+   * all expected counts should be 10 or greater
+   * - Cochran, William G. "The (chi)**2 test of goodness of fit."
+   * The Annals of mathematical statistics (1952): 315-345.
+   * - for tables larger than 2 x 2:
+   * "No more than 20% of the expected counts are less than 5 and all individual expected counts are 1 or greater"
+   * - (Yates, Moore & McCabe, 1999, The Practice of Statistics, p. 734)
    *
-   * @param sample1 the mapping between categories(keys) and counts(values) of the observed sample
-   * @param sample2 the mapping between categories(keys) and counts(values) of the expected baseline
+   * @param sample1 the mapping between categories(keys) and
+   * counts(values) of the observed sample
+   * @param sample2 the mapping between categories(keys) and
+   * counts(values) of the expected baseline
    * @param correctForLowNumberOfSamples if true returns chi-square statistics otherwise p-value
    * @param method Method to use: LInfinity or Chisquare
-   * @param absThresholdYates Yates absolute threshold for tables larger than 2x2
-   * @param percThresholdYates Yates percentage of categories that can be below threshold for tables larger than 2x2
-   * @param absThresholdCochran Cochran absolute threshold for 2x2 tables
-   * @return distance can be an absolute distance or a p-value based on the correctForLowNumberOfSamples argument
+   * @return distance can be an absolute distance or
+   * a p-value based on the correctForLowNumberOfSamples argument
   /** Calculate distance of categorical profiles based on Chisquare test or stats
    *
-   * for 2x2 tables: all expected counts should be 10 or greater (Cochran, William G. "The χ2 test of goodness of fit." The Annals of mathematical statistics (1952): 315-345.)
-   * for tables larger than 2 x 2: "No more than 20% of the expected counts are less than 5 and all individual expected counts are 1 or greater" (Yates, Moore & McCabe, 1999, The Practice of Statistics, p. 734)
+   * for 2x2 tables:
+   * all expected counts should be 10 or greater
+   * - Cochran, William G. "The (chi)**2 test of goodness of fit."
+   * The Annals of mathematical statistics (1952): 315-345.
+   * for tables larger than 2 x 2:
+   * "No more than 20% of the expected counts are less than 5 and all individual expected counts are 1 or greater"
+   * - Yates, Moore & McCabe, 1999, The Practice of Statistics, p. 734
    *
-   * @param sample the mapping between categories(keys) and counts(values) of the observed sample
-   * @param expected the mapping between categories(keys) and counts(values) of the expected baseline
+   * @param sample the mapping between categories(keys) and
+   * counts(values) of the observed sample
+   * @param expected the mapping between categories(keys) and
+   * counts(values) of the expected baseline
    * @param correctForLowNumberOfSamples if true returns chi-square statistics otherwise p-value
    * @param absThresholdYates Yates absolute threshold for tables larger than 2x2
-   * @param percThresholdYates Yates percentage of categories that can be below threshold for tables larger than 2x2
+   * @param percThresholdYates Yates percentage of categories that can be
+   * below threshold for tables larger than 2x2
    * @param absThresholdCochran Cochran absolute threshold for 2x2 tables
-   * @return distance can be an absolute distance or a p-value based on the correctForLowNumberOfSamples argument
+   * @return distance can be an absolute distance or
+   * a p-value based on the correctForLowNumberOfSamples argument
     // If less than 2 categories remain we cannot conduct the test
     if (regroupedSample.keySet.size < chisquareMinDimension) {
@@ -158,30 +168,39 @@ object Distance {
 
   /** Regroup categories with elements below threshold, required for chi-square test
    *
-   * for 2x2 tables: all expected counts should be 10 or greater (Cochran, William G. "The χ2 test of goodness of fit." The Annals of mathematical statistics (1952): 315-345.)
-   * for tables larger than 2 x 2: "No more than 20% of the expected counts are less than 5 and all individual expected counts are 1 or greater" (Yates, Moore & McCabe, 1999, The Practice of Statistics, p. 734)
+   * for 2x2 tables:
+   * all expected counts should be 10 or greater
+   * - Cochran, William G. "The (chi)**2 test of goodness of fit."
+   * The Annals of mathematical statistics (1952): 315-345.
+   * for tables larger than 2 x 2:
+   * "No more than 20% of the expected counts are less than 5 and all individual expected counts are 1 or greater"
+   * - Yates, Moore & McCabe, 1999, The Practice of Statistics, p. 734
    *
-   * @param sample the mapping between categories(keys) and counts(values) of the observed sample
-   * @param expected the mapping between categories(keys) and counts(values) of the expected baseline
+   * @param sample the mapping between categories(keys) and
+   * counts(values) of the observed sample
+   * @param expected the mapping between categories(keys) and
+   * counts(values) of the expected baseline
    * @param absThresholdYates Yates absolute threshold for tables larger than 2x2
-   * @param percThresholdYates Yates percentage of categories that can be below threshold for tables larger than 2x2
+   * @param percThresholdYates Yates percentage of categories that can be
+   * below threshold for tables larger than 2x2
    * @param absThresholdCochran Cochran absolute threshold for 2x2 tables
    * @return (sample, expected) returns the two regrouped mappings
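
For readers unfamiliar with the two threshold rules quoted in the comments above, here is a minimal sketch of how they could be checked against a map of expected counts. It is an illustration only, not the code in this commit: the object and method names (`ChiSquareThresholdSketch`, `satisfiesCochran`, `satisfiesYates`, `expectedCountsAreLargeEnough`) are hypothetical, the default values mirror the constants shown in the diff, and treating exactly two categories as the "2x2" case is an assumption. The real method regroups categories that fall below the thresholds; this sketch only reports whether the rules hold.

```scala
// Hypothetical illustration of the threshold rules; not the implementation in Distance.scala.
object ChiSquareThresholdSketch {

  // 2x2 rule (Cochran): all expected counts should be 10 or greater.
  def satisfiesCochran(
      expected: Map[String, Long],
      absThresholdCochran: Long = 10L): Boolean =
    expected.values.forall(_ >= absThresholdCochran)

  // Larger tables: no more than 20% of the expected counts are less than 5,
  // and all individual expected counts are 1 or greater.
  def satisfiesYates(
      expected: Map[String, Long],
      absThresholdYates: Long = 5L,
      percThresholdYates: Double = 0.2): Boolean = {
    val counts = expected.values.toSeq
    val belowThreshold = counts.count(_ < absThresholdYates)
    counts.nonEmpty &&
      counts.forall(_ >= 1L) &&
      belowThreshold.toDouble / counts.size <= percThresholdYates
  }

  // Assumption: two categories are treated as the 2x2 case, anything else as a larger table.
  def expectedCountsAreLargeEnough(expected: Map[String, Long]): Boolean =
    if (expected.size == 2) satisfiesCochran(expected) else satisfiesYates(expected)
}
```

With these definitions, a two-category baseline such as `Map("a" -> 12L, "b" -> 9L)` would fail the 2x2 rule because one expected count is below 10.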