MATLAB: How to count every word with three or more syllables in each group of sentences, even if the same word appears more than once

count vowels syllables

Count every word with three or more syllables in each group of sentences read from a text file, even if the same word appears more than once.
A syllable is a unit of pronunciation having one vowel sound, with or without surrounding consonants, forming the whole or a part of a word.

Best Answer

The only reliable way to do this is to look each word up in a word-to-syllable database. Here is one approach that uses www.dictionary.com as the database.
FullText = 'This is a really arbitrary sentence.';
% 'really' could be pronounced 'ree-uh-lee' (3 syllables)
TextCell = regexp(FullText, '\w+', 'match');   % split the text into words
TextSyl = cellfun(@getSyllable, TextCell)      % syllable count for each word
TextSyl =
     1     1     1     3     4     2
OneSylWord = sum(TextSyl == 1)                 % number of one-syllable words
OneSylWord =
     3
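The example above counts one-syllable words, while the question asks for words with three or more syllables. A minimal sketch of that variation, assuming the syllable counts are the ones shown for the example sentence (the variable name ThreePlusSylWord is made up for illustration):

```matlab
% Syllable counts for 'This is a really arbitrary sentence.' as listed above.
TextSyl = [1 1 1 3 4 2];

% Count every word whose syllable count is 3 or more.
% Repeated words are counted each time because TextSyl has one entry per occurrence.
ThreePlusSylWord = sum(TextSyl >= 3);   % 2 here: 'really' and 'arbitrary'
```

Because the comparison runs over every entry of TextSyl, a word that appears twice in the text contributes twice to the total.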
The getSyllable function is defined as follows:
function Syl = getSyllable(Word)
% GETSYLLABLE  Number of syllables in Word, based on dictionary.com's phonetic respelling.
    if nargin == 0
        Word = input('What word do you want? ', 's');
    end
    if isempty(Word)
        Syl = 0;
        return
    end
    Word = strrep(Word, ' ', '');

    % Use dictionary.com to get the phonetic transcription of a word.
    % Ex: arbitrary -> [ahr-bi-trer-ee]
    % WARNING: will not work for some words if dictionary.com does not list
    % them as the main entry. Example: 'awesomeness' returns 2 because
    % 'awesome' is the main word on the site.
    try
        % urlread is not recommended in newer MATLAB releases; webread is its replacement.
        SiteTxt = urlread(sprintf('http://www.dictionary.com/browse/%s?s=t', Word));
    catch
        warning('Could not determine syllable for "%s". Returning 0.', Word);
        Syl = 0;
        return
    end

    % Grab the markup inside the square brackets of the "pron spellpron" element.
    CodeSrch1 = '"pron spellpron"[\s\w\d\>]+\[\s*(?<InnerCode>[^\]]+)';
    InnerCode = regexp(SiteTxt, CodeSrch1, 'tokens');
    if isempty(InnerCode)
        % No pronunciation block found; avoid indexing into an empty cell.
        warning('Could not determine syllable for "%s". Returning 0.', Word);
        Syl = 0;
        return
    end
    InnerCode = InnerCode{1};

    % Keep only the text between tags, i.e. the phonetic pieces.
    CodeSrch2 = '>(?<Phonetics>[^\<]+)';
    Phonetics = regexp(InnerCode, CodeSrch2, 'tokens');
    Phonetics = [Phonetics{1}{:}];
    if isempty(Phonetics)
        Syl = 1;
    else
        Phonetics = cat(2, Phonetics{:});
        MultWord = regexp(Phonetics, ',', 'split'); % sometimes several pronunciations are listed; take the first
        Syl = sum(MultWord{1} == '-') + 1;          % hyphens separate syllables
    end
end
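To apply this per group of sentences, one possible sketch is shown below. It assumes that each sentence is one group, that sentences end with '.', '!' or '?', and that the script runs on R2016b or later (local functions in scripts). The names countLongWords, sylFun and estSyl are hypothetical. So that it runs without a network connection, the usage lines pass a rough vowel-group estimator instead of getSyllable; that estimator is only an approximation and will miscount some words.

```matlab
% Example text; with a file, FullText = fileread('myfile.txt') could be used instead
% (the file name is a placeholder).
FullText = 'This is a really arbitrary sentence. Another arbitrary example follows.';

% Offline stand-in for getSyllable: counts runs of vowels (rough approximation only).
estSyl = @(w) max(1, numel(regexp(lower(w), '[aeiouy]+', 'match')));

% One count per sentence; pass @getSyllable instead of estSyl for dictionary lookups.
Counts = countLongWords(FullText, estSyl);

function Counts = countLongWords(FullText, sylFun)
    % Split on sentence-ending punctuation and drop empty pieces.
    Sentences = regexp(FullText, '[.!?]+', 'split');
    Sentences = Sentences(~cellfun(@isempty, strtrim(Sentences)));

    % Cache syllable counts so each distinct word is looked up only once.
    Cache = containers.Map('KeyType', 'char', 'ValueType', 'double');
    Counts = zeros(1, numel(Sentences));
    for k = 1:numel(Sentences)
        Words = lower(regexp(Sentences{k}, '\w+', 'match'));
        for w = 1:numel(Words)
            if ~isKey(Cache, Words{w})
                Cache(Words{w}) = sylFun(Words{w});
            end
            % Every occurrence counts, even if the word repeats.
            Counts(k) = Counts(k) + (Cache(Words{w}) >= 3);
        end
    end
end
```

The cache matters mostly when sylFun is getSyllable, since each lookup is a web request and repeated words would otherwise be fetched again.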