
πŸ“° News QnA Pipeline

Multi-model NLP system: retrieval β†’ summarization β†’ Q&A

πŸš€ Live Demo GitHub Repository

πŸ“ Overview

An end-to-end NLP pipeline that fetches real-time news, summarizes articles using transformer models, and answers questions about the content. A news API and two transformer models working in concert to provide instant news insights.

Key Innovation: Combines API integration, summarization, and question answering in a single unified workflow with intelligent model management.


🎯 Key Features

βœ… Real-Time News Retrieval

βœ… Abstractive Summarization

βœ… Question Answering

βœ… Production Architecture


πŸ—οΈ Architecture

User Input (keyword)
    ↓
[LAYER 1: Data] NewsAPI Integration
    - Fetch articles by keyword
    - Filter by date/language/source
    - Parse JSON response
    ↓
Article Text
    ↓
[LAYER 2: Model] BART Summarization
    - Tokenize article
    - Generate abstractive summary
    - Post-process output
    ↓
Summary + Original Article
    ↓
User Question
    ↓
[LAYER 3: Model] DistilBERT Q&A
    - Encode question + context
    - Extract answer span
    - Calculate confidence
    ↓
Answer + Confidence Score
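The three layers above can be glued together in one orchestration function. A minimal sketch with each layer passed in as a callable (`fetch_fn`, `summarize_fn`, and `qa_fn` are hypothetical names for illustration, not the project's actual API):

```python
def answer_from_news(keyword, question, fetch_fn, summarize_fn, qa_fn):
    """Run the three-layer pipeline: fetch -> summarize -> Q&A."""
    articles = fetch_fn(keyword)
    if not articles:
        return None  # nothing retrieved for this keyword
    context = articles[0]["content"]                     # Layer 1: top article text
    summary = summarize_fn(context)                      # Layer 2: abstractive summary
    result = qa_fn(question=question, context=context)   # Layer 3: answer span + score
    return {"summary": summary,
            "answer": result["answer"],
            "confidence": result["score"]}
```

Injecting the layers as callables keeps the orchestration testable without loading any models.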

πŸ’» Technical Implementation

Three-Layer Architecture

Layer 1: Data Integration

# NewsAPI client for article retrieval
from newsapi import NewsApiClient

newsapi = NewsApiClient(api_key=API_KEY)

def fetch_news(keyword, from_date, to_date):
    articles = newsapi.get_everything(
        q=keyword,
        from_param=from_date,  # 'from' clashes with the Python keyword
        to=to_date,
        language='en',
        sort_by='relevancy'
    )
    return articles['articles']
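NewsAPI's `from_param`/`to` fields expect ISO `YYYY-MM-DD` strings. A small helper for the common "last seven days" window (a hypothetical convenience, not part of the project's code):

```python
from datetime import date, timedelta

def last_week_window(today=None):
    """Return (from_date, to_date) as the ISO YYYY-MM-DD strings
    NewsAPI expects, covering the past seven days."""
    today = today or date.today()
    return ((today - timedelta(days=7)).isoformat(), today.isoformat())
```

This pair can then be passed straight into `fetch_news(keyword, *last_week_window())`.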

Layer 2: Summarization Model

# BART for abstractive summarization
from transformers import pipeline

summarizer = pipeline(
    "summarization",
    model="facebook/bart-large-cnn",
    device=-1  # -1 = CPU
)

summary = summarizer(
    article_text,
    max_length=130,
    min_length=30,
    do_sample=False  # Deterministic decoding
)[0]['summary_text']  # pipeline returns a list of dicts
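`facebook/bart-large-cnn` accepts roughly 1,024 input tokens, so long articles must be truncated or chunked before summarization. A rough word-level chunker as a sketch (production code would count tokenizer tokens, not whitespace-split words; the window sizes are illustrative):

```python
def chunk_text(text, max_words=400, overlap=50):
    """Split text into overlapping word windows so each chunk fits
    comfortably under BART's ~1024-token input limit."""
    words = text.split()
    if len(words) <= max_words:
        return [text]
    chunks, start = [], 0
    while start < len(words):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break
        start += max_words - overlap  # overlap preserves cross-chunk context
    return chunks
```

Each chunk is summarized independently and the partial summaries are concatenated (or summarized again).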

Layer 3: Question Answering Model

# DistilBERT for extractive Q&A
qa_pipeline = pipeline(
    "question-answering",
    model="distilbert-base-cased-distilled-squad",
    device=-1
)

answer = qa_pipeline(
    question=user_question,
    context=article_text
)
# answer is a dict: {'answer': str, 'score': float, 'start': int, 'end': int}
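The extractive answer comes with a `score`, and low-confidence spans are often wrong, so gating on a threshold is a cheap quality filter. A sketch (the threshold value is illustrative, not tuned by the project):

```python
def filter_answer(result, threshold=0.3):
    """Return the answer text only when the Q&A confidence score
    clears the threshold; otherwise admit we don't know."""
    if result["score"] >= threshold:
        return result["answer"]
    return "No confident answer found in this article."
```

Surfacing "no confident answer" is usually better UX than showing a near-random span.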

Singleton Pattern for Model Management

class ModelManager:
    """Singleton: load each model once, reuse it for every request."""
    _instance = None
    _models = {}

    @classmethod
    def get_instance(cls):
        if cls._instance is None:
            cls._instance = cls()
            cls._load_models()
        return cls._instance

    @classmethod
    def _load_models(cls):
        # Load models once, reuse forever!
        cls._models['summarizer'] = pipeline(
            "summarization", model="facebook/bart-large-cnn", device=-1)
        cls._models['qa'] = pipeline(
            "question-answering",
            model="distilbert-base-cased-distilled-squad", device=-1)
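The load-once behavior is easy to verify with a lightweight stand-in that counts loader calls (`LazyRegistry` is a hypothetical illustration, not the project's class):

```python
class LazyRegistry:
    """Caches models by name; the (expensive) loader runs at most once per key."""
    def __init__(self):
        self._models = {}
        self.load_count = 0

    def get(self, name, loader):
        if name not in self._models:
            self._models[name] = loader()  # expensive: only on first request
            self.load_count += 1
        return self._models[name]
```

Requesting the same model twice returns the identical cached object, so multi-second pipeline loads are paid exactly once per process.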

πŸ› οΈ Tech Stack


πŸ“Š Performance Metrics

Model Performance:

BART Summarization:

DistilBERT Q&A:

Pipeline Performance:

Resource Usage:


πŸŽ“ Key Learnings

1. Model Management is Critical

2. Error Handling Saves the Day

3. Multi-Model Coordination Requires Care

4. DistilBERT vs BERT Trade-off is Worth It

5. Abstractive vs Extractive Summarization


πŸš€ Future Enhancements


πŸ“Έ Screenshots

News Search Interface (screenshot)

Summarization Results (screenshot)

Q&A Interface (screenshot)


πŸ”¬ Technical Deep Dive

BART (Bidirectional and Auto-Regressive Transformers)

Architecture:

Why BART for Summarization?

Training:

Original Text β†’ [Add Noise] β†’ Noisy Text β†’ [BART] β†’ Reconstruct Original

DistilBERT

Why DistilBERT over BERT?

Distillation Process:

Teacher (BERT) β†’ [Knowledge Distillation] β†’ Student (DistilBERT)
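In distillation, the student is trained to match the teacher's temperature-softened output distribution. The core soft-target loss can be sketched in plain Python (the temperature is illustrative; real training also adds the hard-label cross-entropy term and scales the soft loss by T²):

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; higher T yields a softer distribution."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Cross-entropy between the teacher's and student's softened
    distributions -- the knowledge-distillation signal."""
    teacher_probs = softmax(teacher_logits, temperature)
    student_probs = softmax(student_logits, temperature)
    return -sum(t * math.log(s) for t, s in zip(teacher_probs, student_probs))
```

The loss is minimized when the student's distribution matches the teacher's, which is how DistilBERT retains most of BERT's behavior at a fraction of the size.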

NewsAPI Integration

Features Used:


