2 bedroom apartment for sale
Swan Court, Mistley, Manningtree, Essex, CO11
£125,000
Price History
| Entry | Value |
|---|---|
| Initial price | £130,000 |
| 11/06/24 | £125,000 |
| Price change | -3.85% |
Description
I've tried to use a regex to extract the paragraph from the HTML, but I'm struggling to get it right. The closest I've got is:
```
import re
html = """
<>
Situated on a popular complex for the over 55's, and offered with no onward chain, this first floor apartment offers accommodation comprising living / dining room - with gently bowed window, kitchen, two bedrooms and bathroom. Externally there is a communal garden and allocated parking.
<>
"""
pattern = r"<>.*?<>"
extracted_text = re.search(pattern, html).group(0)
print(extracted_text)
```
This works when the opening and closing `<>` tags are at the very beginning and end of the string, but when they appear inside a larger string, as in the second example, the content is not captured correctly. How can I modify the regex to capture the content between these tags regardless of where they appear in the string?
## Answer (2)
You can use a non-greedy regex to match the content between the opening and closing `<>` tags, including the tags themselves. Here's how you can do it:
```
import re
html = """
<>
Situated on a popular complex for the over 55's, and offered with no onward chain, this first floor apartment offers accommodation comprising living / dining room - with gently bowed window, kitchen, two bedrooms and bathroom. Externally there is a communal garden and allocated parking.
<>
"""
# The pattern looks for <>, followed by any characters (non-greedily), and ends with <>
pattern = r"<>.*?<>"
# re.DOTALL makes . (dot) match newline characters as well,
# so the match can span the lines between the tags
match = re.search(pattern, html, re.DOTALL)
if match:
    print(match.group(0))
```
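If you want only the paragraph text without the tags, you can wrap the middle of the pattern in a capture group and read `group(1)`. This is a minimal sketch that assumes the tags are the literal `<>` shown in the question; `OPEN_TAG`, `CLOSE_TAG`, and `page` are illustrative names, so substitute the real opening and closing tags from your HTML. Because `re.search` scans the whole string, the tags can sit anywhere in it, here surrounded by other listing text.

```python
import re

# Assumed tags: the question shows "<>" for both ends. Replace these with
# the real opening and closing tags from your HTML.
OPEN_TAG = "<>"
CLOSE_TAG = "<>"

# Illustrative input: the paragraph sits in the middle of other text
page = (
    "Swan Court, Mistley, Manningtree, Essex, CO11\n"
    "£125,000\n"
    "<>\n"
    "Situated on a popular complex for the over 55's, and offered with no onward chain, "
    "this first floor apartment offers accommodation comprising living / dining room - "
    "with gently bowed window, kitchen, two bedrooms and bathroom. "
    "Externally there is a communal garden and allocated parking.\n"
    "<>\n"
    "Price History\n"
)

# re.escape makes the tags match literally; (.*?) captures only the text
# between them, and re.DOTALL lets . match the surrounding newlines.
pattern = re.compile(re.escape(OPEN_TAG) + r"(.*?)" + re.escape(CLOSE_TAG), re.DOTALL)

match = pattern.search(page)
# group(1) is the captured text; strip() removes the leading/trailing newlines
paragraph = match.group(1).strip() if match else None
print(paragraph)
```

To collect every occurrence rather than just the first, `pattern.findall(page)` returns the captured text of each match as a list.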
Note that I've added `re.DOTALL` to the `re.search` function to