Hello userbase,
I'm new to regexes. I'm working with some transistor test data and trying to extract information from .csv file names for sorting prior to further probing.
They often have a format such as this:
target = Some Test Performed [12345678987_HS1 (further info including dates and temperatures)].csv
target = Some Other Test [123456_LS (further info including dates and temperatures)].csv
I want to extract the entire string up to and including the HS/LS variant, with the optional number that follows it, as this represents the device and test. The further info relates to parameters.
The Some Test Performed section can be a single word or multiple words and can contain special characters (&-_).
I'm looking for HS, LS, HS1, HS2, HS3, LS1, LS2, LS3.
I've tried lookbehind assertions, but it feels kludgy and I've guessed a bit:
pattern = '(?<=((HS)|(HS)\d|(LS)|(LS)\d))\s'
How can I improve this?
What does the ? normally do? (I see that there is a special case for the lookaround.)
My desired regexp(target, pattern, 'match') output would be:
match = Some Test Performed [12345678987_HS1
match = Some Other Test [123456_LS
Or at least the index of the final character, so I could use target{1:match} to extract my string. Is there some useful 'from start of target until match' metacharacter?
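To make the desired behaviour concrete, here is a minimal sketch of one possible pattern, not a definitive solution. It assumes the device ID is always a run of digits directly after the "[" and is followed by an underscore; the variable names (targets, matches, lastIdx) are made up for illustration. The ^ anchor plays the role of "from start of target", and outside a lookaround the ? means "zero or one of the preceding item" (or, after a quantifier such as .*, "match as little as possible").

```matlab
% File names modelled on the two examples above.
targets = { ...
    'Some Test Performed [12345678987_HS1 (further info including dates and temperatures)].csv', ...
    'Some Other Test [123456_LS (further info including dates and temperatures)].csv'};

% ^       anchor at the start of the name
% .*?     lazily consume the test description
% \[\d+_  opening bracket, device ID digits, underscore (assumed layout)
% [HL]S   HS or LS
% \d?     optional single digit after the variant
pattern = '^.*?\[\d+_[HL]S\d?';

% 'once' returns the first match per name instead of a nested cell array.
matches = regexp(targets, pattern, 'match', 'once');

% 'end' returns the index of the last matched character in each name.
lastIdx = regexp(targets, pattern, 'end', 'once');

disp(matches{1})   % Some Test Performed [12345678987_HS1
disp(matches{2})   % Some Other Test [123456_LS
```

With the index output, targets{1}(1:lastIdx{1}) should give the same string as matches{1}. If some names lack the bracket or the digits, the pattern would need loosening.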
Best regards and thanks for reading, Marshall
Best Answer