I should probably have started keeping this changelog long ago, but better late than never. So here it is (for older entries see the commit history).
Forgotten maintenance-related commit. Updates to the inspectors and `train_data` interfaces, reworked `nnet::init4fixedBatchFprop()` and its callers.
- fixed both `_i_train_data` implementations to obey the `batchIndex` parameter of `on_next_batch()`/`next_subset()`.
- a placeholder `imemmgr` for a proper intermediate memory manager is introduced. To be done later.
- logarithmic activation is implemented: `A = log(A+1)` for `A > 0`, else `0`.
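  For illustration, a minimal standalone sketch of this element-wise activation (generic C++, not the actual `nntl` activation class):

  ```cpp
  #include <cmath>
  #include <vector>

  // Element-wise logarithmic activation: A = log(A + 1) for A > 0, 0 otherwise.
  // Plain illustration of the formula above; the real nntl implementation lives
  // in its activation classes and operates on smatrix data.
  template<typename RealT>
  void apply_log_activation(std::vector<RealT>& activations) {
      for (RealT& a : activations) {
          a = (a > RealT(0)) ? std::log(a + RealT(1)) : RealT(0);
      }
  }
  ```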
- bug fixes and minor upgrades
- bug fixes and some minor updates
- the weight initialization algorithm may now have a non-static implementation. Consequently, the weights creation function is renamed from `init()` to `make_weights()`.
- the layer interface `init()` is renamed to `layer_init()` and `deinit()` to `layer_deinit()` (it was a bad idea to give these functions such generic names - hard to grep 'em all). The same applies to `_i_grad_works<>::init/deinit` - they are now `gw_init/gw_deinit`.
- `on_batch_size_change()` has a changed signature: it now receives the new incoming batch size and returns the outgoing one (see the sketch below).
- a layer may now (technically) change the batch size (the row count of data matrices) during data propagation, which allows more flexibility in algorithms. The corresponding `common_nn_data<>` members no longer apply to the whole nnet, only to an input layer.

All tests are passing; however, due to the huge scale of the changes, it is possible that some very rarely used code not covered by tests remained unchanged (I just forgot about its existence) and therefore won't compile. The required changes are quite straightforward though, so they shouldn't be a problem for anyone who dares to use the library.
- cleanup of internal memory consumption.
- a layer's `is_activations_shared()` no longer relies on the `m_activations.bDontManageStorage()` flag, but uses its own flag instead. This allows substituting the activation storage in a derived class without breaking the layer packing code.
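  A rough sketch of that design choice; all names below are illustrative and are not the actual `nntl` members:

  ```cpp
  // Illustrative only: the layer tracks "shared activations" with its own flag
  // instead of asking the matrix whether it manages its storage. A derived class
  // may then substitute the activation storage without confusing packing code.
  struct layer_sketch {
      bool m_bActivationsShared = false; // own flag, set by the packing code

      bool is_activations_shared() const noexcept { return m_bActivationsShared; }
      void mark_activations_shared(bool b) noexcept { m_bActivationsShared = b; }
  };
  ```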
- a proper `m_layer_stops_bprop` layer marker is introduced to distinguish between a real `m_layer_input`-marked layer and a layer that just terminates a backpropagation chain (it was the job of the `m_layer_input` marker earlier).
- refactored the common code of `LFC` & `layer_output` for fully-connected forward propagation into `LFC_FProp`, which also became spawnable (in case one needs the `fprop()`-only part of `LFC` alone).
- Refactored and moved the dataset normalization code from `_train_data_simple<>` into an independent class `_impl::td_norm<>` and partly into the base class `_impl::_td_base<>`. The normalization code is finally abstract enough to be used with many various `train_data` implementations (provided they implement 4 additional special functions that do the actual data-transformation work).
- `transf_train_data<>` is introduced to easily generate training data on the fly.
- `LSUVExt` finally works with an object of the `_i_train_data<>` interface (that refactoring - totally coincidentally, of course! - fixed some old odd featurebug).
- Upgrades to the `math::smatrix<>` interface. `iMath_t::mTranspose()` and its variants now properly support `m_bBatchInRow` mode with all variations of bias/no_bias combinations.
- Tested with the latest boost library v.1.75.0
- Some minor updates
- slightly changed the filename format for `inspectors::dumper<>`
- the `m_gradientWorks` member variable of `LFC<>` and `layer_output<>` is no longer public. Just use the getter method `get_gradWorks()`.
- the `bool math::smatrix<>::m_bBatchesInRows` flag is introduced to give more flexibility to algorithms. Make sure you've read the comment for the flag, because it is a very experimental and unstable (in the sense of support by most of `nntl`) feature. The following classes were audited and should probably work fine with the feature (note - not extensively tested yet!):
  - all in `interface\math\smatrix.h`
  - `inmem_train_data<>` and all its base classes
  - `nnet<>` and `layers<>`
  - `LFC<>` and `layer_output<>` accept only the activations of the previous layer in `bBatchesInRows()` mode and don't support their own activations in this mode. There is no general issue with supporting it for `layer_output<>`; however, there is one for `LFC` because of stripping the row vector of biases from `m_activations` before creation of dL/dZ.
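  To give an idea of what the flag is about, here is a generic layout sketch (not `nntl`'s actual `smatrix` internals; which physical layout corresponds to the flag being set is an assumption here):

  ```cpp
  #include <cstddef>
  #include <vector>

  // Generic illustration only: the same [batchSize x features] dataset can be
  // stored with samples laid out along either dimension, and a flag can select
  // the addressing scheme so algorithms may pick whichever layout suits them.
  struct mtx_layout_sketch {
      std::vector<float> data;            // contiguous storage
      std::size_t batchSize{}, features{};
      bool bBatchesInRows{ false };       // meaning of the flag is assumed here

      float& at(std::size_t sample, std::size_t feature) {
          return bBatchesInRows
              ? data[feature * batchSize + sample]  // samples contiguous per feature
              : data[sample * features + feature];  // features contiguous per sample
      }
  };
  ```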
- corrected the semantics of `_grad_works::max_norm()`. The old function is now properly named `_grad_works::max_norm2()` to reflect the fact that its argument is treated as the square of the maximum norm. The new `_grad_works::max_norm()` treats its argument as the pure norm value.
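  A small generic sketch of the difference in semantics (this is not the actual `_grad_works` implementation; it only shows why a squared-norm threshold can be preferable - no square root is needed for the check):

  ```cpp
  #include <cmath>
  #include <vector>

  // Generic max-norm constraint on one neuron's weight vector.
  // max_norm2-style: the threshold is the *square* of the allowed norm.
  void clip_by_max_norm2(std::vector<double>& w, double maxNormSquared) {
      double n2 = 0;
      for (double v : w) n2 += v * v;           // squared L2 norm, no sqrt needed
      if (n2 > maxNormSquared) {
          const double scale = std::sqrt(maxNormSquared / n2);
          for (double& v : w) v *= scale;
      }
  }

  // max_norm-style: the threshold is the plain norm, i.e. it is equivalent to
  // calling the squared version with maxNorm*maxNorm.
  void clip_by_max_norm(std::vector<double>& w, double maxNorm) {
      clip_by_max_norm2(w, maxNorm * maxNorm);
  }
  ```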
- changed the `DeCov` regularizer implementation to normalize it by the columns/neurons count. The old implementation (as published in the paper) required fitting the regularizer scale to the width of a layer and changing it every time the layer width changed. The new implementation is width-stable, so once a decent regularizer scale is found (big enough to work well, but not so big as to destroy the signal in dL/dZ), it is much safer to experiment with the layer width.
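  For reference, a hedged sketch of what normalizing `DeCov` by the neurons count means. The loss definition follows the original paper; the exact normalization constant `nntl` uses is an assumption here:

  ```cpp
  #include <cstddef>
  #include <vector>

  // DeCov regularizer over activations A of shape [N x K] (N samples, K neurons),
  // stored row-major here for simplicity:
  //   L = 0.5 * (||C||_F^2 - ||diag(C)||^2), C = covariance of the K neurons.
  // The "width-stable" variant divides by a function of K so the regularizer
  // scale doesn't need re-tuning when K changes (exact divisor is assumed).
  double decov_normalized(const std::vector<double>& A, std::size_t N, std::size_t K) {
      std::vector<double> mu(K, 0.0);
      for (std::size_t n = 0; n < N; ++n)
          for (std::size_t k = 0; k < K; ++k) mu[k] += A[n * K + k];
      for (double& m : mu) m /= double(N);

      double frob2 = 0.0, diag2 = 0.0;
      for (std::size_t i = 0; i < K; ++i) {
          for (std::size_t j = 0; j < K; ++j) {
              double c = 0.0;
              for (std::size_t n = 0; n < N; ++n)
                  c += (A[n * K + i] - mu[i]) * (A[n * K + j] - mu[j]);
              c /= double(N);
              frob2 += c * c;
              if (i == j) diag2 += c * c;
          }
      }
      return 0.5 * (frob2 - diag2) / double(K); // normalization by neurons count (assumed form)
  }
  ```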
- renamed `activation::linear` -> `activation::identity`, `activation::linear_output` -> `activation::identity_custom_loss`, and other related stuff.
- small performance optimization for the identity activation in `LFC::bprop()`
- now you may use special loss functions that don't require exactly the same number of values as there are neurons in the output layer. Just remember to override `_i_train_data<>::isSuitableForOutputOf()` with a proper check and to supply a proper `nnet_evaluator` / `training_observer`, as well as a custom loss function.
- Y matrices of the training data interface are no longer required to have the same base data type as X matrices. Some non-core, general-purpose elements of the library built on this assumption still rely on it, but the core was upgraded.
- the `LsuvExt` implementation was significantly reworked and improved.
  - NOTE: `LsuvExt` still supports only mean/variance metrics.
- the data normalization to std/mean algorithm was extracted from `LsuvExt` and is now available as a generic in `utils\mtx2Normal.h`.
- a dataset normalization algorithm based on `mtx2Normal` was implemented as a part of `_train_data_simple` and is available to use as `inmem_train_data<>`.
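  A generic sketch of that kind of column-wise normalization to zero mean / unit standard deviation (plain C++, not the `mtx2Normal.h` API):

  ```cpp
  #include <cmath>
  #include <cstddef>
  #include <vector>

  // Normalize each feature column of a row-major [N x K] matrix to zero mean and
  // unit standard deviation. Purely illustrative of the "normalize to std/mean"
  // idea; utils\mtx2Normal.h exposes it generically for nntl's own matrix types.
  void normalize_columns(std::vector<double>& X, std::size_t N, std::size_t K) {
      for (std::size_t k = 0; k < K; ++k) {
          double mean = 0.0;
          for (std::size_t n = 0; n < N; ++n) mean += X[n * K + k];
          mean /= double(N);

          double var = 0.0;
          for (std::size_t n = 0; n < N; ++n) {
              const double d = X[n * K + k] - mean;
              var += d * d;
          }
          const double stdev = std::sqrt(var / double(N));

          for (std::size_t n = 0; n < N; ++n)
              X[n * K + k] = stdev > 0 ? (X[n * K + k] - mean) / stdev : 0.0;
      }
  }
  ```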
- Refactored the `LPT` class: removed the deprecated template parameter `bExpectSpecialDataX` and moved `neurons_count` and `K_tiles` from the class template to constructor parameters (the general pattern is sketched below).
- minor updates & improvements
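The general pattern of moving compile-time parameters into constructor parameters, schematically (this is not the real `LPT` declaration; the names are illustrative only):

```cpp
// Before: the configuration was baked into the type, e.g.
//   template<typename Interfaces, unsigned NeuronsCount, unsigned KTiles>
//   class LPT_old { /* ... */ };
//
// After (schematically): the same values become runtime constructor arguments,
// so one type covers all configurations.
class LPT_sketch {
public:
    LPT_sketch(unsigned neuronsCount, unsigned kTiles)
        : m_neuronsCount(neuronsCount), m_kTiles(kTiles) {}

private:
    unsigned m_neuronsCount;
    unsigned m_kTiles;
};
```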
Huge and important update of the whole framework.
- Refactored the idea of data feeding for the main `nnet::train()` function and therefore the whole training/testing data usage in the library. What previously was the fixed `train_data` class can now be any class, as long as it obeys the `i_train_data<>` interface. This makes many things possible, including the following important features:
  - on-the-fly training data augmentation
  - working with datasets that don't fit into the available RAM
  - simultaneous use of any number of different datasets (e.g. it is now possible to check a validation dataset during nnet performance evaluation) (though this feature was not completely tested yet)
- The training batchSize no longer has to be a multiple of the training set size. If it doesn't divide the training set size evenly, some random training samples will just be skipped during each epoch of training.
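  A quick worked example of how many samples get skipped per epoch under this rule (plain arithmetic, not library code):

  ```cpp
  #include <cstdio>

  // With 50000 training samples and batchSize = 512 there are 97 full batches
  // (97 * 512 = 49664), so 50000 - 49664 = 336 randomly chosen samples are
  // simply skipped in that epoch.
  int main() {
      const unsigned trainSetSize = 50000, batchSize = 512;
      const unsigned fullBatches = trainSetSize / batchSize;           // 97
      const unsigned skipped = trainSetSize - fullBatches * batchSize; // 336
      std::printf("batches=%u skipped=%u\n", fullBatches, skipped);
      return 0;
  }
  ```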
- Finally made an option to restrict the maximum batch size for inferencing (`fprop` mode), see the `nnet_train_opts::maxFpropSize()` function (and an example of use in `TEST(Simple, NesterovMomentumAndRMSPropOnlyFPBatch1000)` of the `examples` project, file `simple.cpp`). Before this option was introduced, each layer of a neural network allocated as much memory as it needed to do inference on the biggest dataset, which obviously led to enormous memory consumption for sophisticated nnet architectures and/or datasets. Now just use `nnet_train_opts::maxFpropSize(maxInferenceBatch)` to restrict the biggest possible batch size employed for inference.
  The last thing to note about mini-batch inferencing: the error value computed for a dataset in mini-batch mode doesn't match the error computed in full-batch mode (it is much bigger), and I don't get why that happens. It should be (almost) the same if I recall the algorithm properly... Everything else seems to work correctly though. Either I forgot about some algorithm details and that behaviour should be expected, or there's a bug I'm not yet aware of.
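  To show what capping the inference batch size buys, here is a generic chunking sketch (not `nntl` code): instead of allocating buffers for the whole dataset, inference walks it in pieces of at most `maxFpropSize` rows:

  ```cpp
  #include <algorithm>
  #include <cstdio>

  // Generic illustration of capped-batch inference: a dataset of `total` samples
  // is processed in chunks of at most `maxFpropSize`, so per-layer buffers only
  // ever need to hold `maxFpropSize` rows instead of `total` rows.
  void run_inference_in_chunks(unsigned total, unsigned maxFpropSize) {
      for (unsigned offset = 0; offset < total; ) {
          const unsigned cur = std::min(maxFpropSize, total - offset);
          // ... fprop() over rows [offset, offset + cur) would go here ...
          std::printf("fprop batch: offset=%u size=%u\n", offset, cur);
          offset += cur;
      }
  }

  int main() {
      run_inference_in_chunks(10000, 3000); // 3000, 3000, 3000, 1000
      return 0;
  }
  ```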
- Updated the format of the binary data files used to store datasets (see `bin_file.h` for details). Older `.bin` files are no longer usable; create new ones with the updated matlab scripts (see the `./_supp/matlab/` folder). The `MNIST` dataset archive was also updated.
- Note that many internal APIs were changed.
- And as always, some old bugs fixed (some new bugs introduced) :D