About the network I'm trying to learn itself: the inputs are 1039 in number. I use only one hidden layer and one output layer. I have two biases, one for the hidden layer and one for the output layer, and one other weight for the only layer I have. Thus I'm trying to learn 1042 weights. I use the tangent sigmoid as my hidden-layer transfer function and the log sigmoid as my output-layer transfer function. I have only two classes to classify into.
Scaled Conjugate Gradient - NN toolbox
Hi,
I have used MATLAB's 'trainscg' with 'mse' as the performance function and NETLAB's 'scg' with 'mse' as the performance function for the same training data set and still don't obtain the same generalisation on a set of other data files I have.
I have used the same Nguyen-Widrow initialisation method for the weights and biases, and the same 'dividerand' method to split the data into training, validation and testing sets.
I know the difference could be in the various parameters used. In the original paper (http://www.sciencedirect.com/science/article/pii/S0893608005800565), the lambda values are specified not as exact values but as inequalities. I have used values that don't violate the rules laid down by the author.
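For what it's worth, the two 'trainscg' parameters that correspond to the paper's sigma and lambda can be set explicitly, so both toolboxes can at least start from the same values. A minimal sketch (the values shown are MATLAB's documented defaults, the hidden layer size is illustrative, and x/t stand in for your own input/target matrices):

```matlab
% Sketch: configure trainscg so its Moller parameters match NETLAB's scg.
net = feedforwardnet(10, 'trainscg');  % hidden layer size is illustrative
net.trainParam.sigma  = 5.0e-5;  % weight-change step for second-derivative approx.
net.trainParam.lambda = 5.0e-7;  % parameter regulating Hessian indefiniteness
net.trainParam.epochs = 1000;    % raise so MATLAB isn't stopped early
net.trainParam.max_fail = 1000;  % effectively disable validation stopping
net.divideFcn = 'dividerand';    % same random split you describe
[net, tr] = train(net, x, t);    % x = inputs, t = targets
```

Disabling validation stopping (max_fail) is one way to test whether the 23-epoch stop comes from the validation subset rather than the algorithm itself.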
Also, one thing that seems a bit bizarre to me is that MATLAB stops the learning in just 23 epochs, while NETLAB runs until it exceeds the maximum number of iterations. I understand the stopping criteria may be different.
Has anyone here worked with both of these toolboxes and found a way of obtaining the same results from both of them? I would like some general ideas and tips for making NETLAB's SCG give results similar to MATLAB's TRAINSCG.
Any help or advice will be greatly appreciated.
Thank you. Pooja
Accepted Answer
Greg Heath
on 12 Aug 2014
Your description is incorrect and confusing.
[I N ] = size(input) % = ?
[ O N ] = size(target) % = ?
Ntrn = ? % Matlab default = N-2*round(0.15*N)
Ntrneq = Ntrn*O % Number of training equations
For an I-H-O net, the number of unknown weights to be estimated is
Nw = (I+1)*H+(H+1)*O % The "1s" are for biases
To prevent overfitting, choose H so that Ntrneq >= Nw.
To prevent non-robustness w.r.t. noise and interference, choose Ntrneq >> Nw.
Otherwise use regularization (trainbr or msereg) or validation subset stopping.
Nw can be lowered by removing input and/or hidden nodes.
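Plugging these formulas into a few lines of MATLAB makes the check concrete. A sketch, assuming input/target are already loaded and using an illustrative candidate H:

```matlab
% Sketch of the overfitting check above (input/target are your own matrices).
[I, N]  = size(input);                 % inputs  x cases
[O, ~]  = size(target);                % outputs x cases
Ntrn    = N - 2*round(0.15*N);         % default dividerand training count
Ntrneq  = Ntrn*O;                      % number of training equations
H       = 10;                          % candidate hidden layer size (illustrative)
Nw      = (I+1)*H + (H+1)*O;           % unknown weights; the "1s" are biases
Hub     = floor((Ntrneq - O)/(I+O+1)); % upper bound on H from Ntrneq >= Nw
fprintf('Nw = %d, Ntrneq = %d, Hub = %d\n', Nw, Ntrneq, Hub)
```

The bound Hub follows from solving Ntrneq >= H*(I+O+1) + O for H.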
I assume you mean you have 1039 input NODES. I doubt you need that many. You should probably use input variable reduction (e.g., help PLSregress) to obtain a more reasonable number.
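A minimal plsregress-based reduction might look like this (the number of retained components, here 20, is purely illustrative, and the transposes are needed because plsregress expects observations in rows while train() expects them in columns):

```matlab
% Sketch: reduce 1039 inputs to a handful of PLS components before training.
X = input';  Y = target';              % N x I and N x O
ncomp = 20;                            % illustrative number of components
[~, ~, XS] = plsregress(X, Y, ncomp);  % XS: N x ncomp predictor scores
reducedInput = XS';                    % back to ncomp x N for train()
```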
Need to know N, Ntrn and H. Need to reduce I.
Hope this helps.
Thank you for formally accepting my answer
Greg
More Answers (1)
saba momeni
on 1 Feb 2019
Hi everyone
I am training my feedforward neural network with scaled conjugate gradient.
I am not sure whether scaled conjugate gradient does its optimization in batch or mini-batch mode.
I only specify the lambda and the sigma for it, not a batch size.
I would appreciate your answer.
Cheers
S