2 bedroom apartment for sale

Swan Court, Mistley, Manningtree, Essex, CO11
£125,000

Price History

Initial price £130,000
11/06/24 £125,000
Price Change -3.85%

Description

```

I've tried to use a regex to extract the paragraph from the HTML, but I'm struggling to get it right. The closest I've got is:

```
import re

html = """
<>
Situated on a popular complex for the over 55's, and offered with no onward chain, this first floor apartment offers accommodation comprising living / dining room - with gently bowed window, kitchen, two bedrooms and bathroom. Externally there is a communal garden and allocated parking.
<>
"""

pattern = r"<>.*?<>"
extracted_text = re.search(pattern, html).group(0)
print(extracted_text)
```

This works fine when the `<>` and `<>` tags are at the beginning and end of the string, but when they are within the string, as in the second example, it doesn't capture the content correctly. How can I modify the regex to capture the content within these tags regardless of where they appear in the string?

## Answer (2)

You can use a non-greedy regex to match the content between `<>` and `<>`, including the tags themselves. Here's how you can do it:

```
import re

html = """
<>
Situated on a popular complex for the over 55's, and offered with no onward chain, this first floor apartment offers accommodation comprising living / dining room - with gently bowed window, kitchen, two bedrooms and bathroom. Externally there is a communal garden and allocated parking.
<>
"""

# The pattern looks for <>, followed by any characters (non-greedily), and ends with <>
pattern = r"<>.*?<>"

# re.DOTALL makes . (dot) match newline characters as well
extracted_text = re.search(pattern, html, re.DOTALL).group(0)
print(extracted_text)
```

Note that I've added `re.DOTALL` to the `re.search` function to