Back propagation neural network data input advice

Tags: algorithms, dataset, neural networks, validation

I am currently working on a project that involves pothole detection with neural networks. So far, I have an Android phone that reads the accelerometer and writes the X, Y, and Z axis values, as well as the amplitude and the current timestamp, to a CSV file. The data is then normalized using min-max normalization, and I use the Y-axis readings from the CSV file. My problem is deciding what data to feed to the back propagation neural network so that it can learn what a pothole looks like. Should I set a threshold and, when the Y axis reaches it, take the 5 previous points and the 5 following points and feed the network 11 inputs? I don't want to overtrain the network, nor feed it data that is aligned differently each time.

Training: I am also starting to gather the collected data into a training dataset. Should I include readings for normal roads, bumpy roads, and speed bumps as well as potholes? How large should a training set be? Or is "the more data, the better" actually true?

This is what the pothole data looks like.

http://i.stack.imgur.com/4cSzt.png

This is what the speed bump data looks like.

http://i.stack.imgur.com/7BLjq.png

A sample of the data collected:

X-Axis, Y-Axis, Z-Axis, Timestamp

-0.371827, 8.513097, 5.441484, 165401
-0.601749, 7.976613, 5.326523, 165601
-0.333506, 8.053253, 5.441484, 165801
-0.256866, 8.206534, 5.364844, 166001
0.049697, 8.398136, 5.364844, 166202
-0.371827, 8.436457, 5.211563, 166400
-0.256866, 8.551417, 5.709726, 166601
-0.256866, 8.513097, 5.403164, 166801
-0.333506, 8.474776, 5.709726, 167000
-0.563428, 8.628057, 5.594766, 167201
-0.563428, 7.401808, 4.713398, 167402
-1.981280, 5.447472, 4.406836, 167602    POTHOLE
-0.180225, 5.600753, 5.403164, 167800    POTHOLE
-0.984952, 8.053253, 4.445156, 168001
-1.214874, 8.666378, 5.671406, 168201
-0.525108, 7.210207, 3.870352, 168401
-1.138233, 7.286847, 5.824687, 168600
-0.601749, 10.045910, 5.288203, 168801
-0.180225, 8.206534, 5.173242, 169001
0.279619, 7.861651, 5.518125, 169200
0.202978, 8.934620, 5.824687, 169401
-0.065264, 8.321495, 5.364844, 169601
-0.065264, 8.628057, 5.709726, 169800
-0.716710, 8.014933, 5.748047, 170001
-0.141905, 8.513097, 5.441484, 170200
-0.026944, 8.206534, 5.594766, 170401
-0.601749, 8.168214, 5.058281, 170601

Algorithm

My proposed algorithm is to set a threshold on the Y axis, for example Y < 7. When a reading crosses it, as on line 12 of the sample data, I pass that point together with the 5 points before it and the 5 points after it to the NN.

Best Answer

Given your comment, I am guessing that what distinguishes a pothole from a speed bump is largely the vehicle's speed going into the event. I think your idea of looking for outlier Y readings and then passing a surrounding window of data is a great place to start. This means each known pattern in your training data set will consist of 11 values.
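The threshold-and-window idea can be sketched roughly as follows. This is only an illustration under some assumptions: the function name `extract_windows` and its parameters are made up for this example, the threshold of 7 is applied to the raw Y values from the question's sample (if you threshold after min-max normalization, the threshold has to be on the same scale), and events too close to the start or end of the recording are simply skipped.

```python
def extract_windows(y, threshold=7.0, before=5, after=5):
    """Return (index, window) pairs for samples where y drops below threshold.

    Each window holds `before` points, the triggering point, and `after`
    points, so the trigger always sits at the same position in the window.
    Hypothetical helper for illustration only.
    """
    windows = []
    i = before
    last = len(y) - after  # need `after` points following the trigger
    while i < last:
        if y[i] < threshold:
            windows.append((i, y[i - before:i + after + 1]))
            i += after + 1  # jump past this window so one event is not cut twice
        else:
            i += 1
    return windows


# Y-axis column of the sample data in the question (raw, not normalized).
sample_y = [
    8.513097, 7.976613, 8.053253, 8.206534, 8.398136, 8.436457, 8.551417,
    8.513097, 8.474776, 8.628057, 7.401808, 5.447472, 5.600753, 8.053253,
    8.666378, 7.210207, 7.286847, 10.045910, 8.206534, 7.861651, 8.934620,
    8.321495, 8.628057, 8.014933, 8.513097, 8.206534, 8.168214,
]

windows = extract_windows(sample_y)
for index, window in windows:
    print(index, len(window), window)
```

On the sample data this picks out one 11-value window centred on the first reading marked POTHOLE (zero-based index 11, i.e. line 12 of the sample). Because the trigger is always the sixth value, every training pattern is aligned the same way.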

If your neural network library has a softmax activation function for the output layer, you can perhaps use a single network to learn and identify potholes, speedbumps, and normal roads.
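To make the softmax idea concrete, here is a small library-independent sketch of what a softmax output layer computes for three classes. The class order in `CLASSES` and the helper `predict_label` are assumptions for this example; in practice your neural network library would apply softmax for you, and the raw scores would come from the trained network rather than being typed in by hand.

```python
import math

# Assumed class order; one output unit per class.
CLASSES = ["normal", "pothole", "speedbump"]


def softmax(scores):
    """Turn raw output scores into probabilities that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict_label(scores):
    """Return the most probable class name and the full probability list."""
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    return CLASSES[best], probs


# Made-up raw scores for one 11-value input window.
label, probs = predict_label([0.1, 2.0, -1.0])
print(label, [round(p, 3) for p in probs])
```

The training targets for such a network would be one-hot vectors in the same class order, for example `[0, 1, 0]` for a pothole pattern.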

Alternatively, you can train two separate networks: one would learn to distinguish good road from potholes, the other good road from speed bumps. Each new pattern would be given to both networks and you'd consider the output pairs (ideally [0,1] or [1,0]).

As for how large the training set needs to be, start small (e.g. 10 examples of each case) and assess out-of-sample accuracy, which means you'll need to reserve some additional known patterns as a hold-out set. Then iteratively improve by adding more training patterns and/or changing the network layout and optimization settings. If you happen to be coding this in Python, PyBrain has some nice example documentation.
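The hold-out procedure could look something like the sketch below. It is library-independent and makes a few assumptions: `split_holdout` and `accuracy` are hypothetical helper names, the 30% hold-out fraction is arbitrary, and the predictions passed to `accuracy` would come from whatever network you train on the training part.

```python
import random


def split_holdout(patterns, labels, holdout_fraction=0.3, seed=0):
    """Shuffle labelled patterns and reserve a fraction as a hold-out set."""
    indices = list(range(len(patterns)))
    random.Random(seed).shuffle(indices)  # fixed seed gives a repeatable split
    n_holdout = max(1, round(len(indices) * holdout_fraction))
    holdout_idx, train_idx = indices[:n_holdout], indices[n_holdout:]
    train = [(patterns[i], labels[i]) for i in train_idx]
    holdout = [(patterns[i], labels[i]) for i in holdout_idx]
    return train, holdout


def accuracy(predicted, actual):
    """Fraction of hold-out patterns whose predicted label is correct."""
    if not actual:
        raise ValueError("no hold-out patterns to score")
    correct = sum(1 for p, a in zip(predicted, actual) if p == a)
    return correct / len(actual)


# Placeholder data: 30 dummy 11-value windows, 10 per class.
patterns = [[float(i)] * 11 for i in range(30)]
labels = ["normal", "pothole", "speedbump"] * 10

train, holdout = split_holdout(patterns, labels)
print(len(train), len(holdout))
```

Train only on `train`, score only on `holdout`, and repeat as you add patterns or change the network. With so few examples, a plain random split may leave a class under-represented in the hold-out set, so splitting each class separately is worth considering.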
