MATLAB: Is there a way to pull a specific link after using webread() to get the content from a page

webread

Essentially, I'm using webread() to obtain the contents of a Google search. If there's a Wikipedia link in the contents, I want to extract it. I've been using regexp(content,exp,'match'), but I'm unsure how to write an expression that matches the Wikipedia link. I know that doing something such as:
regexp(content,'https?://en\.\w+\.\w+','match')
will get me the 'https://en.wikipedia.org' portion of the link, but this expression already seems unnecessarily complicated just for that part. I could keep extending it to cover the whole link, but the number of words in the Wikipedia link varies, so I'm unsure how to capture just the link without accidentally taking the text that follows it.
(e.g., https://en.wikipedia.org/wiki/List_of_landmark_court_decisions_in_the_United_States or https://en.wikipedia.org/wiki/Banana)
In the text that is read, the link appears to be followed by &amp;. Perhaps I can take all the characters from http up to &amp;, but it would be nice to get some tips on how to write an expression for that!
Thanks for the help!

Best Answer

I think I've solved it by putting '\S+' in the expression, followed by the lookahead '(?=&amp;sa)'. That way the expression matches all the characters following 'https?://en.' but stops at the right point.
regexp(content,'https?://en\.\S+(?=&amp;sa)','match')
This will find everything up to, but not including, the '&amp;sa'! If there's a more efficient way of doing this, let me know!
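One possible variation, in case it helps: because \S+ is greedy, it runs to the last '&amp;sa' in any stretch of text without whitespace, so two links packed together could come back as one match. The sketch below instead stops at the first character that cannot belong to the link. The sample string is made up to stand in for the webread() output, and the assumption that the link ends at an &, a quote, an angle bracket, or whitespace is mine, so check it against the real page text.

```matlab
% Made-up sample standing in for the text returned by webread().
content = ['<a href="/url?q=https://en.wikipedia.org/wiki/Banana&amp;sa=U&amp;ved=abc">' ...
           'Banana - Wikipedia</a> <a href="/url?q=https://example.com/&amp;sa=U">Other</a>'];

% Escape the dots so they match literal periods, then take every character
% up to (but not including) the first &, quote, angle bracket, or whitespace.
% Assumption: none of those characters appear inside the link itself.
expr = 'https?://en\.wikipedia\.org/wiki/[^&"<>\s]+';

% All matches, returned as a cell array of character vectors.
links = regexp(content, expr, 'match');

% Only the first match, returned as a character vector (empty if none).
firstLink = regexp(content, expr, 'match', 'once');

disp(firstLink)
```

The 'once' option is handy when only the first Wikipedia result matters, since it returns the text directly rather than a cell array.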