MATLAB: Regexp with multiple lines

regexp url read

Hello, I would like to use urlread and regexp to extract a specific number from a web page. Part of the page content is:
<td class="text-right">1</td>
<td data-sort="Ethereum"><img src="https://s2.coinmarketcap.com/static/img/coins/16x16/1027.png" class="logo" alt="Ethereum"><a href="/currencies/ethereum/" class="market-name">Ethereum</a></td>
<td data-sort="ETH/USD"><a href="https://www.kraken.com" target="_blank">ETH/USD</a></td>
<td class="text-right" data-sort="41450600.0">
<span class="volume" data-usd="41450600.0" data-btc="4441.21" data-native="54539.6">
$41,450,600
</span>
</td>
<td class="text-right" data-sort="760.01">
<span class="price" data-usd="760.01" data-btc="0.0814309" data-native="760.01">
$760.01
</span>
</td>
<td class="text-right" data-sort="20.679496448"><span data-format-percentage data-format-value="20.679496448">20.68</span>%</td>
<td class="text-right ">Recently</td>
</tr>
<tr>
<td class="text-right">2</td>
<td data-sort="Bitcoin"><img src="https://s2.coinmarketcap.com/static/img/coins/16x16/1.png" class="logo" alt="Bitcoin"><a href="/currencies/bitcoin/" class="market-name">Bitcoin</a></td>
<td data-sort="BTC/EUR"><a href="https://www.kraken.com" target="_blank">BTC/EUR</a></td>
<td class="text-right" data-sort="36350300.0">
<span class="volume" data-usd="36350300.0" data-btc="3894.74" data-native="3905.16">
$36,350,300
</span>
</td>
<td class="text-right" data-sort="9308.28">
<span class="price" data-usd="9308.28" data-btc="0.997331" data-native="7839.6">
$9308.28
</span>
I would like to extract the part that starts with the word Ethereum and ends with the number 760.01:
Ethereum</a></td>
<td data-sort="ETH/USD"><a href="https://www.kraken.com" target="_blank">ETH/USD</a></td>
<td class="text-right" data-sort="41450600.0">
<span class="volume" data-usd="41450600.0" data-btc="4441.21" data-native="54539.6">
$41,450,600
</span>
</td>
<td class="text-right" data-sort="760.01">
<span class="price" data-usd="760.01" data-btc="0.0814309" data-native="760.01">
$760.01
I'm trying to use this code, but I don't know what expression to use:
urlKraken='https://coinmarketcap.com/exchanges/kraken/';
strC=urlread(urlKraken);
expression='';
[startIndex,endIndex] = regexp(strC,expression);
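If the goal is only the number, one option is to capture it with a token instead of matching the whole snippet. The sketch below runs on a hard-coded excerpt of the page from the question, not the live page. The sample string, the variable names and the exact pattern are illustrative assumptions. By default, MATLAB's regexp lets . match newline characters, so .*? can span several lines. The lazy *? stops at the first price span after "Ethereum</a></td>".

```matlab
% Hypothetical excerpt of the page (from the question); in practice strC
% would come from urlread. char(10) is the newline character.
nl = char(10);
strC = ['<td data-sort="Ethereum"><a href="/currencies/ethereum/" class="market-name">Ethereum</a></td>' nl ...
    '<td data-sort="ETH/USD"><a href="https://www.kraken.com" target="_blank">ETH/USD</a></td>' nl ...
    '<td class="text-right" data-sort="41450600.0">' nl ...
    '<span class="volume" data-usd="41450600.0" data-btc="4441.21" data-native="54539.6">' nl ...
    '$41,450,600' nl '</span>' nl '</td>' nl ...
    '<td class="text-right" data-sort="760.01">' nl ...
    '<span class="price" data-usd="760.01" data-btc="0.0814309" data-native="760.01">' nl ...
    '$760.01' nl '</span>' nl '</td>'];

% '.*?' crosses the newlines lazily up to the first class="price" span;
% '[^>]*>' skips the rest of that tag, '\s*' skips the newline, and the
% parentheses capture the digits, dots and commas after the '$'.
expression = 'Ethereum</a></td>.*?class="price"[^>]*>\s*\$([0-9.,]+)';
tok = regexp(strC, expression, 'tokens', 'once');
if isempty(tok)
    error('Price pattern not found in the page text.');
end
price = tok{1};                  % character vector with the captured number
priceValue = str2double(price);  % numeric value
```

This depends on the page keeping the class="price" markup, so it is as fragile as any HTML scraping with regular expressions.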

Best Answer

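The value is highly variable, so you first need to extract it from the page. Only then can you compose the appropriate expression, for example Ethereum.*\$760.01 (in MATLAB's regexp, . also matches newline characters by default, so .* can span several lines).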
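Once the value is inserted into the expression, its characters are treated as regular-expression syntax. In particular, the . in 760.01 matches any character. A short sketch of the difference, using regexptranslate to escape the value (the test strings are made up for illustration):

```matlab
% '.' is a metacharacter: an unescaped value such as '760.01' also
% matches text like '760X01'. regexptranslate('escape', ...) adds the
% backslashes needed so that the value is matched literally.
val = '760.01';
unescaped  = regexp('$760X01', ['\$' val], 'once');               % matches at index 1
escapedVal = regexptranslate('escape', val);                       % '760\.01'
escaped    = regexp('$760X01', ['\$' escapedVal], 'once');         % no match: []
literalHit = regexp('price: $760.01', ['\$' escapedVal], 'once');  % matches at index 8
```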
%load the data
urlKraken='https://coinmarketcap.com/exchanges/kraken/';
strC=urlread(urlKraken);%#ok<URLRD> apparently you're on an old release
%find a few relevant markers
ind1=strfind(strC,'ETH/USD');
ind2=strfind(strC,'$');
ind3=strfind(strC,char(10));%#ok<CHARTEN> old releases don't have the newline function
%The value you're looking for runs from just after the second dollar sign
%following the first mention of 'ETH/USD' up to the next newline character.
ind1=ind1(1);
ind2(ind2<ind1)=[];ind2=ind2(2);
ind3(ind3<ind2)=[];
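val=strC((ind2+1):(ind3(1)-1));
%escape the value so that e.g. the '.' in 760.01 is matched literally, and
%use the lazy '.*?' so the match stops at the first occurrence of the price
expression=['Ethereum.*?\$' regexptranslate('escape',val)];
[startIndex,endIndex] = regexp(strC,expression);
%if you only want the snippet you show in your question, use this:
expression=['Ethereum</a></td>.*?\$' regexptranslate('escape',val)];
[startIndex,endIndex] = regexp(strC,expression);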
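The call above returns only the start and end indices. To get the snippet text itself, you can index into strC or ask regexp for the match directly. A sketch on a shortened, hard-coded stand-in for the page (the sample string is an assumption, and val is set by hand here instead of by the marker search):

```matlab
% Hypothetical stand-in for the downloaded page text.
nl = char(10);
strC = ['<td data-sort="Ethereum"><a href="/currencies/ethereum/" class="market-name">Ethereum</a></td>' nl ...
    '<td class="text-right" data-sort="760.01">' nl ...
    '<span class="price" data-usd="760.01" data-btc="0.0814309" data-native="760.01">' nl ...
    '$760.01' nl '</span>' nl '</td>'];
val = '760.01';   % normally found with the strfind markers
expression = ['Ethereum</a></td>.*?\$' regexptranslate('escape', val)];

% Index form: slice the text between the first match's start and end.
[startIndex, endIndex] = regexp(strC, expression, 'once');
snippet1 = strC(startIndex:endIndex);

% Equivalent shortcut: request the matched text itself.
snippet2 = regexp(strC, expression, 'match', 'once');
```

Both snippets span several lines, from "Ethereum</a></td>" to "$760.01".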