MATLAB: Code formatting in the forum


Although this forum is now in its third year online and thousands of examples can be found, asking beginners to format their code is still a tedious task. The experienced contributors have explained the procedure thousands of times, and fewer than a handful of the beginners found the time to thank them for it.
The problem has already been mentioned exhaustively in the wish-list. It shouldn't be complicated to solve it by showing explicit instructions the first 5 times a user posts a question. Obviously neither the "{} code" nor the "? Help" button encourages people to learn the basics of the forum. But I'd hope that they would spend the time to read a text instruction like:
Formatted code is a core feature of this forum. Insert a blank line before and after the code and start each line with at least 2 spaces.
Follow the "? Help" button to learn more.
And until this message disappears after the 5th posting, it could even get a red background and some flashing effects.
This would be much more efficient than letting the editors and other diligent users do this thankless job.
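The rule in the proposed message is mechanical enough to be illustrated in a few lines of MATLAB. This is only a sketch: formatForForum and indentLines are hypothetical names (not a forum feature), and the result is returned as a cell array of lines, with an empty line before and after the indented code.

```matlab
% Hypothetical helpers (not part of the forum): prefix every code line
% with two spaces and add an empty line before and after the block.
indentLines    = @(codeLines) cellfun(@(s) ['  ' s], codeLines, ...
                                      'UniformOutput', false) ;
formatForForum = @(code) [{''}, indentLines(regexp(code, '\n', 'split')), {''}] ;

snippet = sprintf('x = 1:10 ;\ny = x.^2 ;') ;   % two lines of code
posted  = formatForForum(snippet) ;
fprintf('%s\n', posted{:}) ;                    % print the result line by line
```

Splitting on '\n' with regexp (rather than strsplit with its default delimiter collapsing) keeps blank lines inside the code.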

Best Answer

EDIT @ 4:30pm EST: strfind -> regexp with negative look-behind to avoid matching &nbsp;.
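The difference described in this edit can be seen on two made-up HTML fragments (assumptions, not real forum content): a paragraph ending with a code statement, and an empty paragraph that only contains a non-breaking space.

```matlab
% Made-up fragments: one ends with a code statement, one with "&nbsp;".
codeLike  = '<p>x = linspace(0, 1, 5) ;</p>' ;
blankLike = '<p>&nbsp;</p>' ;

% Plain substring search flags both fragments.
hitsStrfind = [~isempty(strfind(codeLike, ';</p>')), ...
               ~isempty(strfind(blankLike, ';</p>'))]

% The negative look-behind rejects ";</p>" when it is preceded by "nbsp".
pattern    = '(?<!nbsp);</p>' ;
hitsRegexp = [~isempty(regexp(codeLike, pattern, 'once')), ...
              ~isempty(regexp(blankLike, pattern, 'once'))]
```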
Here is a simple crawler. It does not implement my original idea, which was a mechanism at the MathWorks level rather than at the level of a user (one of us). The criteria I implemented differ from those listed above, because the crawler has to work with content that was already parsed and "preformatted" by the forum.
The implemented criteria should be improved. In particular, the detection of function calls/definitions is too simple and generates false positives when users write function names followed by parentheses in normal text.
Anyhow, this is just a simple demo.
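The false positives mentioned above are easy to reproduce. The sketch below applies the same patterns as the crawler ('\w\(' for calls/definitions, '\w:\w' for ranges) to made-up fragments (assumptions, not real posts); stripPre is a hypothetical helper that removes <pre> blocks with regexprep, which gives the same result as splitting on the pattern and concatenating the pieces.

```matlab
% Made-up HTML fragments (assumptions, not real forum content).
prose    = '<p>I tried the max() function, but it did not help.</p>' ;
inPre    = '<p>My code:</p><pre>y = max(x) ;</pre>' ;
timeText = '<p>Posted at 4:30pm.</p>' ;

% Hypothetical helper: drop <pre>..</pre> blocks before checking.
stripPre = @(s) regexprep(s, '<pre.*?</pre>', '') ;

% Call/def indicator: a word character directly followed by "(".
flagCall = ~isempty(regexp(stripPre(prose), '\w\(', 'once'))
flagPre  = ~isempty(regexp(stripPre(inPre), '\w\(', 'once'))
% Range indicator: a colon between two word characters.
flagTime = ~isempty(regexp(stripPre(timeText), '\w:\w', 'once'))
```

flagCall and flagTime come out true although both fragments are plain prose, while the call inside <pre> is ignored as intended.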
The whole code below (both functions) should be saved in forumCrawler.m, and you can set pageDepth to control how many forum pages you want to process.
----------------------------------------------------------------------------------------------------------------
function forumCrawler
  % Number of Answers index pages to scan.
  pageDepth = 1 ;
  baseURL = 'http://www.mathworks.com' ;
  for pageId = 1 : pageDepth
    fprintf('\n=== Processing page %d..\n', pageId) ;
    url = sprintf('%s/matlabcentral/answers/?page=%d', baseURL, pageId) ;
    % Relative thread links, taken from the <h3><a href="..."> titles.
    % (The patterns assume the page markup at the time of writing;
    % newer MATLAB releases recommend webread over urlread.)
    thread = regexp(urlread(url), '(?<=<h3><a href=").*?(?=")', 'match') ;
    nThread = length(thread) ;
    for tId = 1 : nThread
      fprintf(' - Analyzing thread %d/%d..\n', tId, nThread) ;
      url = sprintf('%s%s', baseURL, thread{tId}) ;
      htmlBuffer = urlread(url) ;
      % - Scan question (skip if the pattern finds nothing).
      question = regexp(htmlBuffer, ...
        '(?<=class="question-body ).*?(?=</div>)', 'match') ;
      if ~isempty(question)
        [tf, msg] = isLikelyUnformatted(question{1}) ;
        if tf
          fprintf('   [<a href="%s">question</a>] %s.\n', url, msg) ;
        end
      end
      % - Scan answers: token 1 is the div id, token 2 the content.
      answer = regexp(htmlBuffer, ...
        '<div id="([^"]+)" class="answer-body">(.*?)</div>', 'tokens') ;
      for cId = 1 : length(answer)
        [tf, msg] = isLikelyUnformatted(answer{cId}{2}) ;
        if tf
          answerUrl = sprintf('%s#%s', url, answer{cId}{1}) ;
          fprintf('   [<a href="%s">answer</a>] %s.\n', answerUrl, msg) ;
        end
      end
      % - Scan comments (same token layout as answers).
      comment = regexp(htmlBuffer, ...
        '<div id="([^"]+)" class="comment-body">(.*?)</div>', 'tokens') ;
      for cId = 1 : length(comment)
        [tf, msg] = isLikelyUnformatted(comment{cId}{2}) ;
        if tf
          commentUrl = sprintf('%s#%s', url, comment{cId}{1}) ;
          fprintf('   [<a href="%s">comment</a>] %s.\n', commentUrl, msg) ;
        end
      end
    end
  end
end
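function [tf, msg] = isLikelyUnformatted(content)
  tf = true ;
  % Eliminate content within <pre>.. and |..| tags (the latter are
  % rendered as <tt>..</tt>), so we work on what is meant to be text.
  buffer = regexp(content, '<pre.*?</pre>', 'split') ;
  content = [buffer{:}] ;
  buffer = regexp(content, '<tt.*?</tt>', 'split') ;
  content = [buffer{:}] ;
  % Check for a few indicators.
  % Colon between word characters, e.g. "1:10" (range definition).
  if ~isempty(regexp(content, '\w:\w', 'once'))
    msg = 'range def. found' ; return ; end
  % Word character followed by "(", e.g. "zeros(3)"; the parenthesis is
  % escaped so that it is matched literally instead of opening a group.
  if ~isempty(regexp(content, '\w\(', 'once'))
    msg = 'function call(s)/def(s) found' ; return ; end
  % Paragraph ending with ";" (code statement), but not "&nbsp;</p>".
  if ~isempty(regexp(content, '(?<!nbsp);</p>', 'once'))
    msg = '";</p>" found' ; return ; end
  tf = false ;
  msg = '' ;
end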
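To see what answer{cId}{1} and answer{cId}{2} hold in forumCrawler, the answer pattern can be run on a made-up page fragment. The ids and content below are hypothetical; the only assumption is that each answer is a <div> with an id attribute followed by class "answer-body", as the pattern expects.

```matlab
% Made-up page fragment (hypothetical ids and content).
html = ['<div id="answer_1" class="answer-body"><p>Use zeros(3).</p></div>', ...
        '<div id="comment_7" class="comment-body"><p>Thanks!</p></div>'] ;

answer = regexp(html, ...
    '<div id="([^"]+)" class="answer-body">(.*?)</div>', 'tokens') ;

nAnswer = numel(answer)   % one cell per matching answer div
anchor  = answer{1}{1}    % token 1: the div id, appended to the URL after "#"
body    = answer{1}{2}    % token 2: the HTML content passed to the checks
```

Because .*? is lazy, each match stops at the first </div>, so an answer whose body contains a nested <div> would be cut short; this is another spot where the demo could be made more robust.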