The problem stems from the fact that SqueezeNet has many convolution, ReLU, and concatenation layers. Every time a layer is activated, at least two activations must be held in GPU memory, and this memory accumulates because GPUs do not allow individual variables to be cleared. There are a number of ways to address this issue:
1) One potential solution is to reset the GPU before any batch operation (in this case, the detection operation). After the detector is trained, reset the GPU using the reset method. Do this every time a new image is used to detect objects:
>> d = gpuDevice;
>> reset(d);
>> testsim = imread('/myimages/im1.jpg');
>> [bboxes, scores, labels] = detect(detector, testsim);
2) Another potential solution is to reduce the NumStrongestRegions parameter of the detect function from its default value of 2000 to a lower number. Remember to reset the GPU every time the detect method is called:
>> [bboxes, scores, labels] = detect(detector, testsim, 'NumStrongestRegions', 1000)
3) Use the MinSize and MaxSize name-value pairs of detect if you know the approximate size of the objects being detected.
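For example, if the objects are known to be roughly between 50-by-50 and 300-by-300 pixels (example values, adjust them to your data), restricting the search range reduces the number of regions the detector must process:
>> [bboxes, scores, labels] = detect(detector, testsim, 'MinSize', [50 50], 'MaxSize', [300 300]);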
4) Use a GPU device with more memory.
5) Create a custom Fast R-CNN network for trainFastRCNNObjectDetector, then run the detect method. The idea is to choose a feature extraction layer well before the output layer so that the number of computation layers is reduced.
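To pick a suitable feature extraction layer, you can list the network's layer names and inspect the activation sizes first (this assumes the Deep Learning Toolbox Model for SqueezeNet support package is installed; 'fire5-concat' below is only an example of an earlier layer, not a recommendation):
>> net = squeezenet;
>> lgraph = layerGraph(net);
>> disp({lgraph.Layers.Name}')  % list layer names, e.g. 'fire5-concat'
>> analyzeNetwork(net)          % interactive view of layers and activation sizes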