Welcome to Vigges Developer Community - Open, Learning, Share
Welcome To Ask or Share your Answers For Others


python - Extracting the elements between re.match positions in a list of strings using enumerate

I am trying to extract separate groups of elements from a long list of strings. There are multiple groups, but the elements inside them have nothing unique to match on; only the delimiter elements around each group can be matched.

import re
my_list = ['this is a test element 1', 'I need to capture **after** this element', 'capture1', 'capture2', 'capture3', '.........', 'I need to capture **before** this element and separately after this element', 'captureA', 'captureB', 'captureC', 'last capture ends before this element']
my_reg = re.compile(r'.*this element.*')

My code is as follows:

match_indices = [i for i, s in enumerate(my_list) if my_reg.match(s)]
captured_text = my_list[min(match_indices)+1 : max(match_indices)]

match_indices gives me the list position of each matching element, and captured_text gets the elements that lie between the first and the last match.
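To see what the enumerate/re.match comprehension actually collects, here is a small sketch on a shortened, hypothetical list. re.match only matches at the start of a string, which is why the pattern needs the leading .*; re.search scans the whole string, so with it the plain phrase is enough. The names items, via_match and via_search are illustrative only.

```python
import re

# Shortened, hypothetical sample of the question's list
items = ['this is a test element 1',
         'I need to capture **after** this element',
         'captureA']

# re.match anchors at the start of the string, so '.*' must absorb the prefix
my_reg = re.compile(r'.*this element.*')
via_match = [i for i, s in enumerate(items) if my_reg.match(s)]

# re.search looks anywhere in the string, so no '.*' is needed
via_search = [i for i, s in enumerate(items) if re.search(r'this element', s)]

print(via_match)   # [1]
print(via_search)  # [1]
```

Note that 'this is a test element 1' is not a match: it contains "element" but not the phrase "this element".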

I am having trouble getting captured_text to hold a separate group for each pair of consecutive match positions.

For example, I want output like:

Group1 = capture1capture2capture3

Group2 = captureAcaptureBcaptureC

rather than capture1capture2capture3captureAcaptureBcaptureC. Any guidance? Thanks.



1 Answer


Your code returns:

match_indices = [1, 6, 10]
captured_text = ['capture1', 'capture2', 'capture3', '.........', 'I need to capture **before** this element and separately after this element', 'captureA', 'captureB', 'captureC'] # all between 1 and 10

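To capture a separate group between each pair of matches, you cannot use min() and max(), because they only give you the outermost bounds. Instead, iterate over each pair of adjacent indices in match_indices and slice the list between them. captured_text then holds a list of lists, one per group.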

my_list = ['this is a test element 1', 
    'I need to capture **after** this element',
    'capture1',
    'capture2',
    'capture3',
    '.........', 
    'I need to capture **before** this element and separately after this element',
    'captureA',
    'captureB',
    'captureC',
    'last capture ends before this element']

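# positions of the delimiter elements
match_indices = [i for i, s in enumerate(my_list) if 'this element' in s]
captured_text = []
# slice out the elements strictly between each pair of consecutive matches
for i in range(1, len(match_indices)):
    start = match_indices[i-1] + 1  # first element after the previous match
    end = match_indices[i]          # the slice stops just before this match
    captured_text.append(my_list[start:end])
print(captured_text)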
# captured_text = [
#    ['capture1', 'capture2', 'capture3', '.........'],
#    ['captureA', 'captureB', 'captureC']
# ]
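If you want each group as a single string, as in the Group1/Group2 output from the question, a minimal sketch is to join each slice with ''.join and build the adjacent pairs with zip instead of indexing. It assumes the '.........' entry in the question only stands for "more elements", so it is left out of the sample list here; groups is a hypothetical name.

```python
import re

# Sample data from the question, without the '.........' placeholder
my_list = ['this is a test element 1',
           'I need to capture **after** this element',
           'capture1', 'capture2', 'capture3',
           'I need to capture **before** this element and separately after this element',
           'captureA', 'captureB', 'captureC',
           'last capture ends before this element']
my_reg = re.compile(r'.*this element.*')

# Positions of the delimiter elements, as in the question: [1, 5, 9]
match_indices = [i for i, s in enumerate(my_list) if my_reg.match(s)]

# zip pairs each index with the next one: (1, 5), (5, 9);
# each slice excludes both delimiters, then the elements are concatenated
groups = [''.join(my_list[start + 1:end])
          for start, end in zip(match_indices, match_indices[1:])]

for n, text in enumerate(groups, start=1):
    print(f'Group{n} = {text}')
# Group1 = capture1capture2capture3
# Group2 = captureAcaptureBcaptureC
```

On Python 3.10 or later, itertools.pairwise(match_indices) yields the same adjacent pairs as the zip call.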
