MATLAB: Profiling mex in Linux

linux, mex, profile

What are my options for profiling a MEX file in Linux?
I have a MEX function which runs a whole lot of C code, and the code isn't coming even vaguely close to living up to performance expectations (the same algorithm implemented in MATLAB is 10x faster). So I want to profile the C code to find where the bottlenecks are, preferably through the MEX file, since it is convenient to use MATLAB to set up all of the input data.
Specifically I am using MATLAB R2016a on 64-bit Ubuntu (either 12.04 or 14.04) and gcc as the compiler.
I have seen some decent discussion of how to profile MEX files in Windows using Visual Studio, such as: http://www.mathworks.com/matlabcentral/answers/277171-profiling-mex-in-windows
I tried using MATLAB Engine within a separate C test runner file along with engEvalString() to call the MEX function directly from within MATLAB. This worked fine for running the MEX function from the command line, but it didn't work very well for profiling, since MATLAB Engine runs MATLAB in a separate process.
I saw this post on StackOverflow about how to profile using Valgrind/Cachegrind and a test executable which uses MATLAB Engine to load data, but actually loads the MEX function dynamic library directly using dlopen()/dlsym(): http://stackoverflow.com/questions/11220250/how-do-i-profile-a-mex-function-in-matlab
With a few modifications I was almost able to get that approach to work, but I get a segfault when it actually calls the mexFunction. The main modification I had to make was that I used this to compile my testMex.c file:
mex -g -client engine testMex.c -ldl
So I'm looking for any advice on how to effectively profile a MEX file on Linux.
I attached the code I am trying to use as profileMex.zip (which contains profileMex.c). For the mex file itself I'm using a really trivial "Hello World" type mex which just prints out the number of inputs and number of outputs.

Best Answer

Given the lack of answers I thought I would let people know what ended up working very well for me.
I made the mexFunction() itself do almost nothing: just some basic validation of the inputs and outputs. I put all of the C functionality I cared about in separate myAlgorithm.h / myAlgorithm.c files, with one public function declared in the *.h file, with a prototype similar to:
mxArray* my_algorithm(mxArray *arg1, mxArray *arg2);
Then after validating the arguments, the mexFunction() makes a call to my_algorithm().
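The split between a thin gateway and a separately testable core might look roughly like the sketch below. It is a simplified variation, not the code from this answer: the core here takes a plain C array instead of mxArray pointers, and the names sum_of_squares and sum_sq.c are hypothetical. It assumes the mex build defines MATLAB_MEX_FILE, so the gateway is compiled only when building the MEX file and the same source can be linked into a standalone test program.

```c
/* sum_sq.c: hypothetical example; names are illustrative, not from the post. */
#include <assert.h>
#include <stddef.h>

/* Core routine: plain C with no MATLAB types, so it can be linked into a
 * standalone test program and run under gdb or valgrind. */
double sum_of_squares(const double *x, size_t n)
{
    double acc = 0.0;
    size_t i;

    assert(x != NULL || n == 0);
    for (i = 0; i < n; ++i) {
        acc += x[i] * x[i];
    }
    return acc;
}

#ifdef MATLAB_MEX_FILE /* assumed to be defined by the mex build */
#include "mex.h"

/* Thin gateway: validate the arguments, unpack them, call the core,
 * and wrap the result. No algorithmic work happens here. */
void mexFunction(int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[])
{
    const double *x;
    size_t n;

    if (nrhs != 1 || nlhs > 1) {
        mexErrMsgIdAndTxt("sumsq:usage", "Usage: y = sum_sq(x)");
    }
    if (!mxIsDouble(prhs[0]) || mxIsComplex(prhs[0]) || mxIsSparse(prhs[0])) {
        mexErrMsgIdAndTxt("sumsq:type", "x must be a real, full double array.");
    }
    x = mxGetPr(prhs[0]);
    n = mxGetNumberOfElements(prhs[0]);
    plhs[0] = mxCreateDoubleScalar(sum_of_squares(x, n));
}
#endif
```

Compiled with mex, this yields a MEX function. Compiled with plain gcc alongside a small main() that calls sum_of_squares(), it yields an ordinary executable with no MATLAB dependency, which is the kind of binary that profilers handle well.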
For profiling I created a completely separate testMyAlgorithm.c file which uses MATLAB Engine to load a *.mat file containing input data and then uses MATLAB Engine to copy that data as mxArray pointers into the testMyAlgorithm process. Then the my_algorithm() function containing the common code is called. This test program, which can be profiled like any other executable, is compiled with:
mex -g -client engine testMyAlgorithm.c myAlgorithm.c
Then to profile the code I used something similar to:
LD_LIBRARY_PATH=$LD_LIBRARY_PATH:~/MATLAB/R2016a/bin/glnxa64/ valgrind --tool=callgrind ./testMyAlgorithm
This approach worked to run the code by itself as well as with gdb, valgrind, callgrind, cachegrind, etc.
Essentially it is really just C code. MATLAB Engine and the mx library are used only for convenience, so that common code can be shared between a MEX function and pure C code, and so that there is a common way of reading input test data from MATLAB.
If anyone else has an easier way of doing this I would appreciate learning about it.