2 bedroom maisonette for sale
Harrow, Middlesex, HA3
£325,000
Price History

| Date / event | Price |
|---|---|
| Initial price | £350,000 |
| 23/06/24 | £325,000 |
| Price change | -7.14% |
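The price-change figure follows from the two prices in the table: (325,000 - 350,000) / 350,000 ≈ -7.14%. Below is a minimal check of that arithmetic. `percent_change` is a hypothetical helper name, not part of the listing.

```python
def percent_change(old: float, new: float) -> float:
    """Percentage change from an old price to a new one."""
    if old == 0:
        raise ValueError("old price must be non-zero")
    return (new - old) / old * 100


change = percent_change(350_000, 325_000)
print(f"{change:.2f}%")  # prints -7.14%
```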
Description
I've tried the following code to summarize the property description:
```python
from transformers import pipeline

summarizer = pipeline("summarization")
# property_description is a str holding the listing text above
summary = summarizer(property_description, max_length=130, truncation=True)
print(summary[0]['summary_text'])
```
However, the output I get is a summary of the entire list of features, rather than a single paragraph that captures the essence of the property without listing each feature individually.
How can I modify the code or approach to get a single paragraph summary that captures the essence of the property description without using lists or bullet points?
## Answer (1)
The issue is that summarization models such as BART have no notion of your text's layout. (GPT-3 is not available in Hugging Face's Transformers library; by default, `pipeline("summarization")` loads a DistilBART model fine-tuned on CNN/DailyMail news articles.) These models generate a summary from patterns learned during training, and models fine-tuned on news often copy phrases straight from the input. When the input is a list of features, the summary therefore tends to come back as a shorter list of those same features.
To get a better summary, one option is to preprocess the text and remove the list structure before feeding it to the summarization model. Here's a step-by-step approach:
1. Remove the list markers (the HTML-style list tags and the bullet points).
2. Concatenate the remaining text into a single paragraph.
3. Feed the new text to the summarization model.
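The three steps can be sketched without any model dependency. This is a minimal sketch, not the answerer's code. `flatten_listing` and `summarize_listing` are hypothetical helper names, and the sketch assumes the listing uses HTML-style tags, `-`/`*`/`•` bullets, or `1.`/`1)` numbering. Each remaining line becomes its own sentence, so the model sees prose instead of a list. The summarizer is passed in as a plain callable, which lets you test the preprocessing on its own.

```python
import re
from typing import Callable


def flatten_listing(text: str) -> str:
    """Turn list-formatted listing text into a single paragraph of sentences."""
    # Step 1: replace HTML-style tags with line breaks and drop markdown bold markers.
    text = re.sub(r"<[^>]+>", "\n", text)
    text = text.replace("**", "")
    # Remove bullets ("-", "*", "•") and numbering ("1." or "1)") at the start of lines only,
    # so hyphens inside words like "open-plan" survive.
    text = re.sub(r"^[ \t]*(?:[-*\u2022][ \t]*|\d+[.)][ \t]+)", "", text, flags=re.MULTILINE)
    # Step 2: treat each non-empty line as a sentence and join them with spaces.
    sentences = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if not line.endswith((".", "!", "?")):
            line += "."
        sentences.append(line)
    return " ".join(sentences)


def summarize_listing(text: str, summarize: Callable[[str], str]) -> str:
    """Step 3: hand the flattened paragraph to any text -> summary callable."""
    return summarize(flatten_listing(text))


sample = (
    "<ul>\n<li>Two double bedrooms</li>\n<li>Private garden</li>\n</ul>\n"
    "- Close to station\n1) Chain free"
)
print(flatten_listing(sample))
# prints: Two double bedrooms. Private garden. Close to station. Chain free.
```

With Transformers installed, the callable could wrap the pipeline from the question, for example `lambda t: summarizer(t, max_length=130, min_length=30, do_sample=False, truncation=True)[0]["summary_text"]`. Turning features into sentences is an assumption about what helps. A model fine-tuned on news may still return a run of short, feature-like sentences.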
Here's how you can implement this:
```python
import re
from transformers import pipeline
# Function to preprocess the property description
def preprocess_description(description):
    # Replace HTML-style list tags with line breaks
    description = re.sub(r'<[^>]+>', '\n', description)
    description = re.sub(r'\n{2,}', '\n', description)  # Remove extra newlines
    # Remove markdown bold markers, then line-leading bullet points and numbers
    description = re.sub(r'\*\*', '', description)
    description = re.sub(r'^[ \t]*(?:[-*•][ \t]*|\d+[.)][ \t]+)', '', description, flags=re.MULTILINE)
    description = re