System Identification with BP Neural Network
Ran Zhang 21210124

Abstract
This article introduces a method of using a BP (Back-Propagation) neural network to realize system identification. We study three systems with different system functions and analyze the effects of different parameters of the BP neural network.

Key words: MLP (Multi-Layered Perceptron), Neurons, Hidden Layer, BP Neural Network

Algorithm Introduction
Neurons (or neurodes) form the central nervous system in the brains of animals and human beings, and the networks in the human brain can carry out higher mental activities. An Artificial Neural Network, often just called a neural network, is a mathematical model inspired by biological neural networks. A neural network consists of an interconnected group of artificial neurons, and it processes information by using a connectionist approach to computation. In most cases a neural network is an adaptive system that changes its structure during a learning phase. Neural networks are used to model complex relationships between inputs and outputs or to find patterns in data.

The BP neural network is one of the basic artificial neural networks. It is based on the MLP architecture. By training with samples of the system, the algorithm can produce a neural network model that approximates the real system.

(1) MLP
The Multi-Layered Perceptron network is a supervised learning method; the architecture of the MLP is shown in Figure 1.

[Figure 1: The structure of the MLP, showing an input layer, hidden layers, and an output layer, with a weight matrix between each pair of adjacent layers]

The signal is transferred in one fixed direction: there is no connection between neurons in the same layer, the neurons of adjacent layers are fully connected, and each connection between adjacent layers has a weight. In each hidden (or output) layer, every neuron applies an activation function to the weighted sum of the outputs of the previous layer; after propagating through all the layers, the model generates a set of outputs. There are many choices for the activation function, such as a linear function, the Sigmoid function, and so on. Generally, we choose the Sigmoid function

$$f(s) = \frac{1}{1 + e^{-s}}$$

as the activation function, where $s$ is the weighted sum of the neuron's inputs minus its threshold $b$.
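The identity $f'(s) = f(s)\,(1 - f(s))$ for the Sigmoid is what the appendix code relies on when it writes the derivative as x1.*(1-x1). As a minimal Matlab sketch (the handle names here are illustrative, not taken from the appendix code):

% Sigmoid activation and its derivative
sigmoid  = @(s) 1./(1 + exp(-s));              % f(s)
dsigmoid = @(s) sigmoid(s).*(1 - sigmoid(s));  % f'(s) = f(s)*(1 - f(s))
s = -6:0.1:6;
plot(s, sigmoid(s), 'b-', s, dsigmoid(s), 'r--');
legend('f(s)', 'f''(s)');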

(2) BP Neural Network
Based on the MLP network, the BP algorithm adjusts the weights of each connection using the error fed back from the next layer; this is the error feedback method. The BP algorithm is deduced from the steepest gradient descent method. Referring to Figure 1, for the qth sample we define the error function as

$$E_q = \frac{1}{2}(d_q - y_q)^T(d_q - y_q) = \frac{1}{2}\sum_{j=1}^{n_3}(d_{qj} - y_{qj})^2$$

where $d_q$ is the desired output of the qth sample and $y_q$ is the real output of the network. According to the steepest gradient descent method, the adjustment of the weight of each connection is as follows:

$$w_{ji}^{(l)}(k+1) = w_{ji}^{(l)}(k) + \eta\,\delta_j^{(l)}\,x_i^{(l-1)}$$

For the output layer,

$$\delta_j^{(l)} = (d_{qj} - y_{qj})\,f'(s_j^{(l)})$$

For the hidden and input layers,

$$\delta_j^{(l)} = f'(s_j^{(l)})\sum_{k=1}^{n_{l+1}}\delta_k^{(l+1)}\,w_{kj}^{(l+1)}$$

In the formulas above, $f(\cdot)$ is the activation function, $f'(\cdot)$ is its derivative, $s$ is the difference between the weighted sum of the inputs and the threshold of each neuron, and $\eta$ is the learning rate. Turning to the threshold of each neuron, we can derive the similar formula

$$\theta_j^{(l)}(k+1) = \theta_j^{(l)}(k) - \eta\,\delta_j^{(l)}$$

When the network has been trained once with all the samples, the algorithm finishes one epoch. Then the performance index $E = \sum_{q=1}^{Q} E_q$ is calculated. If the index meets the accuracy requirement, the training ends; otherwise another training epoch starts.
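To make the origin of these update rules explicit, the output-layer case follows from one chain-rule step (a standard derivation, written in the notation above):

$$\frac{\partial E_q}{\partial w_{ji}^{(l)}} = \frac{\partial E_q}{\partial y_{qj}}\cdot\frac{\partial y_{qj}}{\partial s_j^{(l)}}\cdot\frac{\partial s_j^{(l)}}{\partial w_{ji}^{(l)}} = -(d_{qj} - y_{qj})\,f'(s_j^{(l)})\,x_i^{(l-1)} = -\delta_j^{(l)}\,x_i^{(l-1)}$$

so the steepest descent step $w_{ji}^{(l)}(k+1) = w_{ji}^{(l)}(k) - \eta\,\partial E_q/\partial w_{ji}^{(l)}$ is exactly the weight update given above. Differentiating with respect to the threshold instead gives $\partial s_j^{(l)}/\partial \theta_j^{(l)} = -1$, which yields the threshold update.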

Experiments and Analysis
Based on the algorithm introduced above, we choose three systems with different system functions, as follows:

(1) $f(x) = \sin x,\ x \in [0, 2\pi]$

We choose an MLP model with one hidden layer and apply different numbers of neurons in the hidden layer to study the effect of the number of neurons. We chose 9 sets of uniform data to train the network, and then tested the network with 361 sets of uniform data. Matlab is chosen as the simulation tool. The performance index requirement is set as $E \le 0.01$. The results are shown below.

Note: Due to the existence of zeros in the desired output, the relative error would be huge in the areas near those zeros, which makes the relative error useless for judging the performance of the network. As the desired output is the same in every configuration, we compute the absolute error to characterize the performance instead.
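For reference, the uniform training and test grids described above correspond to the following lines, matching the appendix code:

% 9 uniform training samples and 361 uniform test samples on [0, 2*pi]
x = 0:2*pi/8:2*pi;           % 9 training inputs, spacing pi/4
d = sin(x);                  % desired training outputs
x_check = 0:pi/180:2*pi;     % 361 test inputs, spacing pi/180
d_check = sin(x_check);      % desired test outputs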

a) 3 neurons in the hidden layer ($\eta = 0.2$)
Figure 2: Plots of training convergence and functions
Figure 3: Absolute error between network output and desired output

b) 5 neurons in the hidden layer ($\eta = 0.2$)
Figure 4: Plots of training convergence and functions
Figure 5: Absolute error between network output and desired output

c) 5 neurons in the hidden layer ($\eta = 0.5$)
Figure 6: Plots of training convergence and functions
Figure 7: Absolute error between network output and desired output

d) 7 neurons in the hidden layer ($\eta = 0.2$)
Figure 8: Plots of training convergence and functions
Figure 9: Absolute error between network output and desired output

The ranges of the axes are set to be the same so as to make visual comparison more convenient. From the results shown above, we can see that the network approximates the system well. Meanwhile, we can draw several conclusions:
i) Comparing a), b) and d), we find that as the number of neurons increases, the algorithm converges faster, while the absolute error first gets smaller and then larger.
ii) Comparing b) and c), we find that as the learning rate increases, the convergence rate gets faster, but oscillation may occur. When we set the learning rate to 1, the oscillation prevented the network from converging.

(2) $f(x) = |\sin x|,\ x \in [0, 2\pi]$

To model this system, we choose the same settings as for system (1): $\eta = 0.2$, the same performance index requirement, 4 neurons in the hidden layer, 9 sets of uniform data for training, and 361 sets of uniform data for testing. The results are as follows:

Figure 10: Plots of training convergence and functions
Figure 11: Absolute error between network output and desired output

From the two figures above, we can see that the network approximates the system well overall. However, near the corner point the network performs badly. This is explained by the theory behind the algorithm, which is derived from the steepest gradient descent method: the system function is not differentiable at that point, so the gradient does not exist there (at $x = \pi$ the left derivative of $|\sin x|$ is $-1$ while the right derivative is $+1$). As a result, the algorithm cannot approximate the system well in the area near the corner point.

(3) $f(x_1, x_2) = 100\,(x_2 - x_1^2)^2 + (1 - x_1)^2,\ x_1, x_2 \in [-5, 5]$

From the system function, we know the system has two inputs. As the range of $f(x_1, x_2)$ lies far outside $[0, 1]$, the algorithm can hardly converge if we simply proceed as before. So we need to normalize the values of $f(x_1, x_2)$ so that they lie in $[0, 1]$. In this situation, we studied the effect of the number of hidden layers.
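A minimal sketch of this normalization, consistent with the appendix code for system (3), which divides by 90036, the maximum of $f$ over $[-5,5]\times[-5,5]$ (attained at $x_1 = x_2 = -5$):

% Normalize the desired outputs of f(x1,x2) = 100*(x2-x1^2)^2 + (1-x1)^2 into [0,1]
f = @(x1,x2) 100*(x2 - x1.^2).^2 + (1 - x1).^2;
fmax = f(-5,-5);                 % = 90036, the maximum over [-5,5]x[-5,5]
x1 = -5:10/8:5; x2 = x1;         % 9 uniform samples in each dimension, as in the appendix
[X1, X2] = meshgrid(x1, x2);
d = f(X1, X2)/fmax;              % normalized desired outputs, all in [0,1]
% After training, the network output is de-normalized by multiplying by fmax.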

The results are as follows:

a) 1 hidden layer
Figure 12: Surfaces of the system and network functions
Figure 13: Absolute error between network output and desired output

b) 2 hidden layers
Figure 14: Surfaces of the system and network functions
Figure 15: Absolute error between network output and desired output

c) 4 hidden layers
Figure 16: Surfaces of the system and network functions
Figure 17: Absolute error between network output and desired output

In the figures of absolute error, we set the ranges of the axes to be the same, so that the performance of each network can be seen intuitively and compared conveniently. From the results above, we can easily see that the networks of a) and b) make a nice approximation to the real system.
i) Comparing a) with b), it is obvious that b) performs better, with a smaller error. This suggests that properly increasing the number of hidden layers can improve the performance of the network.
ii) Turning to c), it is easy to see that the BP neural network does not produce a good model of the system. This suggests that too many hidden layers may make the network over-learned and perform worse.

Conclusion
The results shown in the previous parts indicate that the BP neural network is an effective method for system identification. When applying a BP neural network to approximate a system function, many parameters should be taken into consideration, such as the learning rate, the number of hidden layers, normalization, and so on.

(1) A large learning rate can make the algorithm converge faster; however, an excessively large learning rate will cause oscillation, which may prevent the algorithm from converging.
(2) The accuracy requirement for the performance index should be appropriate. A higher accuracy requirement may improve the performance; on the other hand, an excessively high accuracy requirement can make the network over-learned, and the performance will degrade.
(3) When considering the number of hidden layers, we need to know the order of the system first, and then choose a number of hidden layers near that order (usually a number less than the order). Too many layers can easily make the network over-learned, just as when using a third-order curve to approximate a linear function.

In conclusion, there is no specialized guidance for choosing the parameters of the neural network. When dealing with practical situations, we can only act based on experience.

Appendix: Matlab codes

(1) $f(x) = \sin x,\ x \in [0, 2\pi]$

% 1 hidden layer, 5 neurons in the hidden layer
clc;
x = 0:2*pi/8:2*pi;            % 9 uniform training samples
d = sin(x);                   % desired outputs of the 9 samples
y = zeros(1,9);
n = 9;
wi = rand(5,1);               % weights of connections between input layer and hidden layer
wo = rand(5,1);               % weights of connections between hidden layer and output layer
theta1 = rand(5,1);           % thresholds of neurons in the hidden layer
theta2 = rand(1);             % threshold of the output neuron
s1 = zeros(5,1);
x1 = zeros(5,1);              % output of the hidden layer
u = 0.2;                      % learning rate
e = 10;                       % performance index of the current epoch (initialized large)
T = 0;                        % number of epochs
E = 0;                        % performance index history
while (e > 0.01)
    for i = 1:n
        s1 = x(i)*wi - theta1;
        x1 = 1./(1 + exp(-s1));
        y(i) = x1'*wo - theta2;
        delta2 = (d(i) - y(i))*1;               % error of the (linear) output layer, f' = 1
        delta1 = (delta2.*wo).*x1.*(1 - x1);    % error of the hidden layer
        wo = wo + u*delta2.*x1;                 % weight adjustments between hidden and output layer
        wi = wi + u*delta1.*x(i);               % weight adjustments between input and hidden layer
        theta2 = theta2 - u*delta2;             % threshold adjustment of the output neuron
        theta1 = theta1 - u*delta1;             % threshold adjustments of the hidden neurons
    end
    error = y - d;
    e = 1/2*sum(error.^2);
    E = [E, e];
    T = T + 1;
end
x_check = 0:pi/180:2*pi;      % 361 uniform test samples
d_check = sin(x_check);
y_bp = zeros(1,361);
for i = 1:361
    s_bp = x_check(i)*wi - theta1;
    x1_bp = 1./(1 + exp(-s_bp));
    y_bp(i) = x1_bp'*wo - theta2;
end
error_bp = d_check - y_bp;
subplot(2,1,1);
plot(E);
title('The change of E as the number of epochs increases (n=5)');
ylabel('E = 1/2*sum((d-y).^2)');
xlabel('Number of epochs');
subplot(2,1,2);
plot(x_check, d_check, 'r.', x_check, y_bp, 'b-');
title('Plots of the system and network functions (n=5)');
xlabel('x');
ylabel('y');
legend('desired output', 'network output');

(2) $f(x) = |\sin x|,\ x \in [0, 2\pi]$
It is similar to (1), so we do not paste the full code here.
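As a hedged sketch of the changes this would involve, assuming the same script as in (1) (these lines are an illustration, not code from the original appendix):

% Replace the target function by its absolute value:
d = abs(sin(x));               % desired training outputs for f(x) = |sin(x)|
d_check = abs(sin(x_check));   % desired test outputs
% and use 4 neurons in the hidden layer, per the settings stated above:
wi = rand(4,1); wo = rand(4,1);
theta1 = rand(4,1); s1 = zeros(4,1); x1 = zeros(4,1);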

(3) $f(x_1, x_2) = 100\,(x_2 - x_1^2)^2 + (1 - x_1)^2,\ x_1, x_2 \in [-5, 5]$

% 2 hidden layers, 4 neurons in each layer
clc;
wi = rand(4,2);               % weights between the input layer and the first hidden layer
wh = rand(4,4);               % weights between the first and second hidden layers
wo = rand(4,1);               % weights between the second hidden layer and the output layer
theta1 = rand(4,1);           % thresholds of the first hidden layer
theta2 = rand(4,1);           % thresholds of the second hidden layer
theta3 = rand(1);             % threshold of the output neuron
d = zeros(9,9);
y = zeros(9,9);
u = 0.2;                      % learning rate
x1 = -5:10/8:5;               % 9 uniform training samples in each dimension
x2 = x1;
T = 0; e = 1; E = 0;
while (e > 0.0001)
    for m = 1:9
        for n = 1:9
            % normalization: divide by the maximum 90036 so that d lies in [0,1]
            d(m,n) = (100*(x2(n) - x1(m)^2)^2 + (1 - x1(m))^2)/90036;
            s1 = x1(m)*wi(:,1) + x2(n)*wi(:,2) - theta1;
            temp1 = 1./(1 + exp(-s1));                 % output of the first hidden layer
            s2 = wh*temp1 - theta2;
            temp2 = 1./(1 + exp(-s2));                 % output of the second hidden layer
            y(m,n) = temp2'*wo - theta3;
            delta3 = (d(m,n) - y(m,n))*1;              % error of the (linear) output layer
            delta2 = delta3*wo.*temp2.*(1 - temp2);    % error of the second hidden layer
            delta1 = temp1.*(1 - temp1).*(wh'*delta2); % error of the first hidden layer
            wo = wo + u*delta3.*temp2;
            wh = wh + u*delta2*temp1';
            wi(:,1) = wi(:,1) + u*delta1*x1(m);
            wi(:,2) = wi(:,2) + u*delta1*x2(n);
            theta3 = theta3 - u*delta3;
            theta2 = theta2 - u*delta2;
            theta1 = theta1 - u*delta1;
        end
    end
    e = 1/81*1/2*sum(sum((d - y).^2));
    E = [E, e];
    T = T + 1;
end
x1_check = -5:10/360:5;       % 361 uniform test samples in each dimension
x2_check = x1_check;
d_ori = zeros(361,361);
y_bp = zeros(361,361);
for m = 1:361
    for n = 1:361
        d_ori(m,n) = 100*(x2_check(n) - x1_check(m)^2)^2 + (1 - x1_check(m))^2;
        s1 = x1_check(m)*wi(:,1) + x2_check(n)*wi(:,2) - theta1;
        temp1 = 1./(1 + exp(-s1));
        s2 = wh*temp1 - theta2;
        temp2 = 1./(1 + exp(-s2));
        y_bp(m,n) = (temp2'*wo - theta3)*90036;        % de-normalize the network output
    end
end
error = d_ori - y_bp;
subplot(2,1,1);
surf(x1_check, x2_check, d_ori);   % surface of the system function
subplot(2,1,2);
surf(x1_check, x2_check, y_bp);    % surface of the network function
