# News QnA Pipeline

Multi-model NLP system: retrieval → summarization → Q&A

| Live Demo | GitHub Repository |
## Overview

An end-to-end NLP pipeline that fetches real-time news, summarizes articles using transformer models, and answers questions about the content. Three separate ML models work in concert to provide instant news insights.

**Key Innovation:** Combines API integration, summarization, and question answering in a single unified workflow with intelligent model management.
## Key Features

### Real-Time News Retrieval

- NewsAPI integration for live article fetching
- Search by keywords and date ranges
- Filters for language and source reliability

### Abstractive Summarization

- BART model for high-quality summaries
- Condenses long articles to their key points
- Preserves the original meaning and context

### Question Answering

- DistilBERT for extractive Q&A
- Finds exact answers within articles
- Confidence scores for each answer

### Production Architecture

- Singleton pattern for efficient model loading
- Multi-tab Gradio interface for a seamless workflow
- Comprehensive error handling
- Graceful degradation (works even if the API fails)
## Architecture

```text
User Input (keyword)
        ↓
[LAYER 1: Data] NewsAPI Integration
  - Fetch articles by keyword
  - Filter by date/language/source
  - Parse JSON response
        ↓
Article Text
        ↓
[LAYER 2: Model] BART Summarization
  - Tokenize article
  - Generate abstractive summary
  - Post-process output
        ↓
Summary + Original Article
        ↓
User Question
        ↓
[LAYER 3: Model] DistilBERT Q&A
  - Encode question + context
  - Extract answer span
  - Calculate confidence
        ↓
Answer + Confidence Score
```
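Assuming each layer is exposed as a callable, the flow above can be sketched as a single orchestration function. The names `fetch`, `summarize`, and `answer` are illustrative, not the repo's actual API; injecting them keeps each layer swappable and testable:

```python
def answer_from_news(keyword, question, fetch, summarize, answer):
    """Run the three-layer pipeline: fetch -> summarize -> Q&A.

    The model callables are injected so each layer can be swapped or
    stubbed independently (names are illustrative, not the repo's API).
    """
    articles = fetch(keyword)           # Layer 1: NewsAPI
    if not articles:
        return None                     # graceful degradation: nothing to answer
    text = articles[0]["content"]
    summary = summarize(text)           # Layer 2: BART summary for display
    result = answer(question, text)     # Layer 3: extractive Q&A on the full article
    return {"summary": summary, "answer": result["answer"], "score": result["score"]}
```

Note that Q&A runs against the original article, not the summary, matching the "Summary + Original Article" step in the diagram: extractive models can only return spans that exist in their context.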
## Technical Implementation

### Three-Layer Architecture

**Layer 1: Data Integration**
```python
from newsapi import NewsApiClient

# NewsAPI client for article retrieval
newsapi = NewsApiClient(api_key=API_KEY)

def fetch_news(keyword, from_date, to_date):
    articles = newsapi.get_everything(
        q=keyword,
        from_param=from_date,
        to=to_date,
        language='en',
        sort_by='relevancy'
    )
    return articles['articles']
```
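NewsAPI calls can fail (rate limits, network errors), and the graceful-degradation behavior described later in this README can be sketched as a thin wrapper that serves the last successful response. The cache and the injectable `fetch` parameter are hypothetical, chosen so the fallback logic is testable without network access:

```python
_article_cache = {}  # keyword -> last successful article list (illustrative)

def fetch_with_fallback(keyword, fetch):
    """Fetch articles, falling back to the last good result on failure.

    `fetch` stands in for the NewsAPI call above; any exception (rate
    limit, timeout) degrades gracefully to cached articles instead of
    surfacing a broken experience to the user.
    """
    try:
        articles = fetch(keyword)
        _article_cache[keyword] = articles  # remember the last good response
        return articles
    except Exception:
        return _article_cache.get(keyword, [])
```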
**Layer 2: Summarization Model**

```python
from transformers import pipeline

# BART for abstractive summarization
summarizer = pipeline(
    "summarization",
    model="facebook/bart-large-cnn",
    device=-1  # CPU
)

summary = summarizer(
    article_text,
    max_length=130,
    min_length=30,
    do_sample=False  # Deterministic
)
```
**Layer 3: Question Answering Model**

```python
from transformers import pipeline

# DistilBERT for extractive Q&A
qa_pipeline = pipeline(
    "question-answering",
    model="distilbert-base-cased-distilled-squad",
    device=-1  # CPU
)

answer = qa_pipeline(
    question=user_question,
    context=article_text
)
```
### Singleton Pattern for Model Management

```python
class ModelManager:
    _instance = None
    _models = {}

    @classmethod
    def get_instance(cls):
        if cls._instance is None:
            cls._instance = cls()
            cls._load_models()  # load models once, reuse for every request
        return cls._instance

    @classmethod
    def _load_models(cls):
        from transformers import pipeline
        cls._models["summarizer"] = pipeline(
            "summarization", model="facebook/bart-large-cnn", device=-1)
        cls._models["qa"] = pipeline(
            "question-answering",
            model="distilbert-base-cased-distilled-squad", device=-1)
```
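A lighter-weight alternative to a singleton class is `functools.lru_cache`. The sketch below takes the pipeline factory as a parameter (standing in for `transformers.pipeline`) so the load-once behavior can be verified without loading real models; `make_loader` is a hypothetical helper, not part of the repo:

```python
from functools import lru_cache

def make_loader(factory):
    """Wrap a pipeline factory so each (task, model) pair is built only once.

    `factory` stands in for `transformers.pipeline`; injecting it lets the
    caching behavior be tested with a stub instead of multi-GB models.
    """
    @lru_cache(maxsize=None)
    def get_pipeline(task, model):
        return factory(task, model)
    return get_pipeline
```

In production this would be instantiated once, e.g. `get_pipeline = make_loader(lambda task, model: pipeline(task, model=model, device=-1))`, after which repeated calls return the already-loaded pipeline.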
## Tech Stack

- HuggingFace Transformers: Model hub and pipelines
- BART: Facebook's Bidirectional and Auto-Regressive Transformers model for summarization
- DistilBERT: Distilled BERT (40% smaller, 60% faster) for Q&A
- NewsAPI: Real-time news aggregation API
- Gradio: Multi-tab web interface
- Python: Core implementation
- requests: HTTP client for API calls
## Performance Metrics

**Model Performance**

BART Summarization:
- ROUGE-1: 44.16 (F1 score on CNN/DailyMail)
- Parameters: 406M
- Summary Quality: High coherence and fluency

DistilBERT Q&A:
- SQuAD F1: 87.1 (compared to BERT-large's 90.9)
- Parameters: 66M (vs BERT base's 110M)
- Speed: 60% faster than BERT base

**Pipeline Performance**
- News Fetch: ~1-2 seconds
- Summarization: ~3-5 seconds per article
- Question Answering: ~0.5-1 second per question
- Total Workflow: ~5-8 seconds from search to answer

**Resource Usage**
- Memory: ~2GB (both models loaded)
- With singleton pattern: models load once (~10s), then every request starts instantly
- Without singleton: models reload on every request (~10s per request!)
## Key Learnings

1. **Model Management is Critical**
   - The initial version reloaded models on every request
   - The singleton pattern cut initialization from ~10s to effectively instant
   - Critical for a production user experience
2. **Error Handling Saves the Day**
   - NewsAPI can fail (rate limits, network issues)
   - Graceful degradation keeps the system working with cached articles
   - The user never sees a broken experience
3. **Multi-Model Coordination Requires Care**
   - Different models need different preprocessing
   - Shared memory management is crucial
   - Careful tokenization prevents truncation issues
4. **The DistilBERT vs BERT Trade-off is Worth It**
   - 40% smaller, 60% faster
   - Only a 3-4% accuracy drop
   - Much better user experience
5. **Abstractive vs Extractive Summarization**
   - BART (abstractive): generates new sentences, reads more naturally
   - Extractive: only selects existing sentences, less flexible
   - Abstractive is the better fit for this use case
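To make the abstractive-vs-extractive contrast concrete, here is a toy extractive baseline: it can only re-use sentences already in the article, scored by word frequency, which is exactly the flexibility an abstractive model like BART adds. The function name and scoring scheme are illustrative, not from the repo:

```python
import re
from collections import Counter

def extractive_summary(text, k=2):
    """Naive extractive baseline: score each sentence by the document-wide
    frequency of its words and keep the top-k sentences in original order.
    No new sentences are ever generated."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    freq = Counter(w.lower() for w in re.findall(r"\w+", text))
    ranked = sorted(
        range(len(sentences)),
        key=lambda i: -sum(freq[w.lower()] for w in re.findall(r"\w+", sentences[i])),
    )
    keep = sorted(ranked[:k])  # restore document order
    return " ".join(sentences[i] for i in keep)
```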
## Future Enhancements
- Multi-article summarization (summarize multiple articles at once)
- Sentiment analysis (add emotion detection to articles)
- Entity recognition (extract people, companies, locations)
- Trend analysis (identify trending topics over time)
- Multi-language support (expand beyond English)
- Custom news sources (add RSS feeds)
- Export summaries (PDF/CSV download)
- Conversation memory (follow-up questions)
## Screenshots
News Search Interface:
Summarization Results:
Q&A Interface:
## Technical Deep Dive

### BART (Bidirectional and Auto-Regressive Transformers)
Architecture:
- Encoder-Decoder transformer
- Trained with denoising objective
- Pre-trained on 160GB of text
Why BART for Summarization?
- Generates new sentences (not just extraction)
- Maintains coherence and flow
- Handles long documents well
Training:

```text
Original Text → [Add Noise] → Noisy Text → [BART] → Reconstruct Original
```
### DistilBERT

Why DistilBERT over BERT?
- 40% smaller (66M vs 110M parameters)
- 60% faster inference
- 97% of BERT's performance retained
- Perfect for production deployment
Distillation Process:

```text
Teacher (BERT) → [Knowledge Distillation] → Student (DistilBERT)
```
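The core of that distillation step is training the student on the teacher's temperature-softened output distribution. Below is a minimal sketch of the soft-target loss; DistilBERT's full objective also adds a hard-label masked-LM term and a cosine embedding term, which are omitted here:

```python
import math

def softmax(logits, temperature=1.0):
    """Softmax with a temperature: T > 1 softens the distribution so
    small logit differences still carry signal for the student."""
    exps = [math.exp(z / temperature) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Cross-entropy of the student against the teacher's softened
    targets; minimized when the student matches the teacher exactly."""
    teacher = softmax(teacher_logits, temperature)
    student = softmax(student_logits, temperature)
    return -sum(t * math.log(s) for t, s in zip(teacher, student))
```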
### NewsAPI Integration

Features Used:
- `get_everything()`: search across all sources
- Filters: date range, language, keywords
- Sorting: relevancy vs recency
- Rate Limits: 100 requests/day (free tier)
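As an alternative to the client library, the same endpoint can be called with `requests`. The parameter names below follow NewsAPI's public `/v2/everything` documentation (`q`, `from`, `to`, `language`, `sortBy`); the helper itself is illustrative, not part of the repo:

```python
def build_everything_params(keyword, from_date, to_date, language="en"):
    """Build the query parameters for NewsAPI's /v2/everything endpoint.

    Parameter names follow NewsAPI's public documentation; splitting
    this out keeps the request construction testable without network.
    """
    return {
        "q": keyword,
        "from": from_date,
        "to": to_date,
        "language": language,
        "sortBy": "relevancy",
    }
```

These params would then be sent as e.g. `requests.get("https://newsapi.org/v2/everything", params=build_everything_params(...), headers={"X-Api-Key": API_KEY})`, with the API key passed in the `X-Api-Key` header.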
## Links

- Try Live Demo
- View Source Code
- Read Documentation
- BART Paper
- DistilBERT Paper