Hi all,
Here is the relevant part of the code:
XU = X'*U; % mnk or pk (p<<mn)
UU = U'*U; % mk^2
VUU = V*UU; % nk^2
V = V.*(XU./max(VUU,1e-10));
XV = X*V; % mnk or pk (p<<mn)
VV = V'*V; % nk^2
UVV = U*VV; % mk^2
U = U.*(XV./max(UVV,1e-10)); % 3mk
......
newobj = CalculateObj(X, U, V);
The original variables X, U and V are ordinary matrices, so everything is computed on the CPU. I convert X to a gpuArray with gpuArray, which causes all of the calculations above to run on the GPU. Everything goes well until the last line, which calculates the objective function of NMF. The code of CalculateObj is:
function [obj, dV] = CalculateObj(X, U, V, deltaVU, dVordU)
    if ~exist('deltaVU','var')
        deltaVU = 0;
    end
    if ~exist('dVordU','var')
        dVordU = 1;
    end
    dV = [];
    maxM = 62500000;
    [mFea, nSmp] = size(X);
    mn = numel(X);
    nBlock = floor(mn*3/maxM);

    if mn < maxM
        dX = U*V'-X;
        obj_NMF = sum(sum(dX.^2));
        if deltaVU
            if dVordU
                dV = dX'*U;
            else
                dV = dX*V;
            end
        end
    else
        obj_NMF = 0;
        if deltaVU
            if dVordU
                dV = zeros(size(V));
            else
                dV = zeros(size(U));
            end
        end
        for i = 1:ceil(nSmp/nBlock)
            if i == ceil(nSmp/nBlock)
                smpIdx = (i-1)*nBlock+1:nSmp;
            else
                smpIdx = (i-1)*nBlock+1:i*nBlock;
            end
            dX = U*V(smpIdx,:)'-X(:,smpIdx);
            obj_NMF = obj_NMF + sum(sum(dX.^2));
            if deltaVU
                if dVordU
                    dV(smpIdx,:) = dX'*U;
                else
                    dV = dU+dX*V(smpIdx,:);
                end
            end
        end
        if deltaVU
            if dVordU
                dV = dV ;
            end
        end
    end
    %obj_Lap = alpha*sum(sum((L*V).*V));
    obj = obj_NMF;
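To illustrate why converting only X moves everything onto the GPU, here is a minimal sketch. It assumes the Parallel Computing Toolbox and a supported GPU are available; the matrix sizes are arbitrary illustration values and are not taken from my real data.

```matlab
% Sketch only: sizes are made up for illustration.
X = gpuArray(rand(200, 100));   % data matrix moved to the GPU
U = rand(200, 5);               % factors start as ordinary CPU arrays
V = rand(100, 5);

XU  = X' * U;                   % one gpuArray operand => gpuArray result
UU  = U' * U;                   % both operands on the CPU => CPU result
VUU = V * UU;                   % still on the CPU
V   = V .* (XU ./ max(VUU, 1e-10));   % XU is a gpuArray, so V becomes one

disp(class(XU));                % gpuArray
disp(isa(V, 'gpuArray'));       % 1
```

After the first multiplicative update, V (and then U) are gpuArrays as well, so by the time CalculateObj is called, dX is computed on the GPU.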
I find that line 40, which is obj_NMF = obj_NMF + sum(sum(dX.^2)), takes a long time to execute. Because dX is a gpuArray but obj_NMF is an ordinary variable, it seems that the system has to wait for the GPU to finish before it can do the addition, and that wait is long. Moreover, even if I make obj_NMF a gpuArray, it still has to wait for the GPU to finish. I want to know:
- Why does the system need to wait for the GPU to finish?
- Why doesn't the GPU finish its work right after each line is executed?
- Is there any way to accelerate the process?
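For anyone who wants to reproduce the timing, here is a minimal sketch of how the individual steps could be measured. It assumes the Parallel Computing Toolbox and a supported GPU; the sizes and the names dev, tDiff, tSum and objGpu are made up for this illustration. GPU operations in MATLAB are generally queued asynchronously, so the sketch calls wait on the device before each toc, which charges the queued work to the line that launched it rather than to the first line that needs the result.

```matlab
% Sketch only: sizes and variable names are illustrative assumptions.
dev = gpuDevice;                        % handle to the current GPU
X = rand(2000, 3000, 'gpuArray');
U = rand(2000, 50,   'gpuArray');
V = rand(3000, 50,   'gpuArray');

tic;
dX = U*V' - X;                          % launched on the GPU
wait(dev);                              % block until dX is really finished
tDiff = toc;

tic;
objGpu = sum(dX(:).^2);                 % accumulate on the GPU
wait(dev);
tSum = toc;

obj = gather(objGpu);                   % one transfer back to the CPU at the end
fprintf('U*V''-X: %.4f s, sum of squares: %.4f s\n', tDiff, tSum);
```

If tDiff turns out to be much larger than tSum, the slow part would be the matrix product rather than the addition on line 40. gputimeit is another option for timing a single function handle on the GPU.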
Best Answer