You can overload * and + in a custom class and increment a counter inside each overloaded method. Inside the overloaded plus, for example:
C = builtin('plus', A, B);                   % perform the actual addition
Count = builtin('plus', Count, numel(C));    % count one addition per element
Now the additions are counted. This has to be done for each operation you want to examine.
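The same overloading idea can be sketched in Python, where __add__ and __mul__ play the role of the overloaded plus and mtimes. The class and counter names here (CountedArray, ADD_COUNT, MUL_COUNT) are purely illustrative:

```python
# Illustrative Python analogue of counting operations via operator
# overloading: a thin wrapper whose + and * bump global counters.

ADD_COUNT = 0
MUL_COUNT = 0

class CountedArray:
    def __init__(self, data):
        self.data = list(data)

    def __add__(self, other):
        global ADD_COUNT
        result = [a + b for a, b in zip(self.data, other.data)]
        ADD_COUNT += len(result)   # one counted addition per element
        return CountedArray(result)

    def __mul__(self, other):
        global MUL_COUNT
        result = [a * b for a, b in zip(self.data, other.data)]
        MUL_COUNT += len(result)   # one counted multiplication per element
        return CountedArray(result)

A = CountedArray([1.0, 2.0, 3.0])
B = CountedArray([4.0, 5.0, 6.0])
C = A + B
D = A * B
print(ADD_COUNT, MUL_COUNT)   # prints: 3 3
```

As in the MATLAB version, only operations routed through the wrapper are counted; anything the interpreter does internally stays invisible.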
But what do you expect? Consider the hidden work behind even a simple operation: the indices, stored as doubles, already require some additions to create both index vectors. Then some multiplications are needed internally, e.g. by sizeof(double), to calculate the required amount of memory. Finally it matters whether the processed blocks fit into the processor cache or not; otherwise the CPU needs wait cycles for the slow RAM access, although perhaps it can use this time to process another thread.
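A back-of-the-envelope sketch of that invisible bookkeeping for extracting n elements, assuming 8-byte doubles; the function and its cost model are illustrative assumptions, not measured values:

```python
# Rough accounting of the "hidden" work behind indexing n elements:
# building the index vector costs about n additions, and sizing the
# output buffer needs a multiplication by sizeof(double).
# All constants here are assumptions for illustration only.

SIZEOF_DOUBLE = 8  # bytes per double

def hidden_cost_of_slice(n):
    index_additions = n       # generating the indices 1, 2, ..., n
    size_multiplications = 1  # n * sizeof(double) for the allocation
    bytes_needed = n * SIZEOF_DOUBLE
    return index_additions, size_multiplications, bytes_needed

print(hidden_cost_of_slice(1000))   # prints: (1000, 1, 8000)
```

None of these operations appear in any user-level count, yet they are part of the real runtime.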
Some trigonometric functions might be calculated directly by the CPU under Matlab, but by a software library on the embedded machine. This difference might even dominate the total processing time.
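To see why a software sin() can be so much more expensive than a hardware instruction, here is a sketch of one, using a truncated Taylor series (a common textbook approach, assumed here for illustration; real libm implementations are more sophisticated). Even this simple version spends a dozen or more multiplications and additions per call:

```python
import math

# Sketch of a software sin() via a truncated Taylor series:
# sin(x) = x - x^3/3! + x^5/5! - ...
# Each loop iteration costs several multiplications and an addition,
# none of which an operation-counting wrapper would ever see.

def taylor_sin(x, terms=8):
    result = 0.0
    term = x
    for k in range(terms):
        result += term
        term *= -x * x / ((2 * k + 2) * (2 * k + 3))
    return result

print(taylor_sin(0.5), math.sin(0.5))
```

For small arguments the two agree to machine precision, but the per-call cost differs drastically between a hardware instruction and this kind of loop.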
My conclusion: there is no reliable way to estimate the performance based on counting "operations" of any kind. You can guess that code which uses millions of operations, counted by the overloading method above, will take more time than a version which needs only a few dozen. But such an estimate is neither robust nor accurate.