The Fiction Machine: A Generative AI Framework for Bengali Science Fiction Storytelling and Research Assistance
Author: Soubarna Das
Artist: Team Kalpabiswa
Abstract
This article presents a novel Generative AI (GenAI) project aimed at preserving and promoting Bengali science fiction literature through automatic story generation and an intelligent research assistant. It addresses the dual goals of (1) generating Bengali science fiction narratives in the style of renowned Bengali authors, and (2) creating a virtual assistant to aid researchers, readers, and writers in exploring Bengali sci-fi literature. The system is designed to function locally on consumer-grade GPUs, incorporating a curated corpus of original Bengali science fiction stories, dynamic style modeling, and semantic search. I describe the architecture, methodology, tools, and challenges faced in building this multilingual and culturally aware GenAI system.
- Introduction
Bengali science fiction (known as Kalpavigyan—a term coined by editor Adrish Bardhan in the 1970s) boasts a 190-year legacy that remains largely unknown globally. From Jagadananda Roy’s 1895 interplanetary voyage “Shukra Bhraman” to Begum Rokeya’s 1905 feminist utopia “Sultana’s Dream”, the genre has tackled scientific speculation, social critique, and cosmic wonder. However, accessibility and exploration of this genre are still limited due to language barriers, lack of digitization, and absence of intelligent interfaces. I propose a system that not only generates Bengali science fiction stories mimicking the style of canonical writers but also acts as a knowledge companion for literary research. My latest updates ensure high-quality, natural Bengali generation using a custom fine-tuned model.
- System Overview
The system is composed of the following modules:
- Data Ingestion & Preprocessing
- Author Style Embedding Generator
- Custom Bengali Sci-Fi Language Model (Fine-Tuned)
- Story Generator (Up to 3000 words)
- Virtual Research Assistant (Retriever + Generator)
- Gradio Frontend for Interaction
Figure 1: Overall System Architecture
- Dataset Preparation
The /stories/ folder contains .txt files, each representing a different author. Format:
/stories/
AdrishBardhan.txt
JagadanandaRoy.txt
SukumarRay.txt
Each file is structured with story title delimiters and metadata:
#TITLE: Shukra Bhraman
#AUTHOR: Jagadananda Roy
#GENRE: Science Fiction
…
(Story content here)
…
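The layout above can be read back into structured records with a small parser. What follows is a minimal sketch, assuming that every story opens with a `#TITLE:` line and that any other `#KEY: value` line is metadata for the current story. The function name `parse_author_file` and the record layout are illustrative assumptions, not the project's actual code.

```python
import re

# Matches metadata lines such as "#TITLE: Shukra Bhraman".
META_LINE = re.compile(r"^#([A-Z]+):\s*(.*)$")


def parse_author_file(text):
    """Split one author file into records of the form {'meta': {...}, 'content': str}."""
    stories = []
    current = None
    for line in text.splitlines():
        match = META_LINE.match(line.strip())
        if match:
            key, value = match.group(1).lower(), match.group(2).strip()
            if key == "title":
                # Assumption: a #TITLE line always starts a new story.
                current = {"meta": {"title": value}, "lines": []}
                stories.append(current)
            elif current is not None:
                current["meta"][key] = value
        elif current is not None:
            current["lines"].append(line)
    # Join the collected body lines into one content string per story.
    return [
        {"meta": s["meta"], "content": "\n".join(s["lines"]).strip()}
        for s in stories
    ]


sample = """#TITLE: Shukra Bhraman
#AUTHOR: Jagadananda Roy
#GENRE: Science Fiction
(Story content here)
"""
stories = parse_author_file(sample)
print(stories[0]["meta"])
```

Keeping metadata in its own dictionary means a new tag can be added to the files later without changing the parser.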
Python-based preprocessing includes:
- Unicode normalization
- Sentence boundary detection
- Stopword removal (custom Bengali list)
- Optional morphological parsing
- Author-style tagging for dynamic embedding
Figure 2: Dataset Processing Flowchart
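The first three preprocessing steps can be sketched with the Python standard library alone. This is an illustration under stated assumptions, not the project's pipeline: the stopword set below is a tiny hypothetical sample (the custom Bengali list is not shown in this article), and sentence boundaries are assumed to fall on the danda (।), "?" or "!".

```python
import re
import unicodedata

# Hypothetical sample only; the project's custom stopword list is not published here.
BENGALI_STOPWORDS = {"এবং", "ও", "কিন্তু", "যে", "এই"}
PUNCTUATION = "।?!,;"


def normalize(text):
    """Apply Unicode NFC normalization and collapse runs of whitespace."""
    text = unicodedata.normalize("NFC", text)
    return re.sub(r"\s+", " ", text).strip()


def split_sentences(text):
    """Split after a danda, question mark, or exclamation mark followed by whitespace."""
    return [part for part in re.split(r"(?<=[।?!])\s+", text) if part]


def remove_stopwords(sentence):
    """Return the sentence's tokens without punctuation and without stopwords."""
    tokens = (token.strip(PUNCTUATION) for token in sentence.split())
    return [t for t in tokens if t and t not in BENGALI_STOPWORDS]


raw = "আমি মঙ্গলে যাব।   তুমি ও আমি যাব কি?"
sentences = split_sentences(normalize(raw))
tokens = remove_stopwords(sentences[1])
print(sentences)
print(tokens)
```

Morphological parsing and author-style tagging are left out because they depend on tools and label schemes the article does not specify.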
- Fine-Tuned Bengali Sci-Fi Language Model
I have developed a custom fine-tuned Bengali language model optimized for generating natural and coherent Bengali science fiction. The model is fine-tuned on the original author corpora using a subword tokenizer suited to Bengali script and idiom (e.g., SentencePiece or the BengaliBERT tokenizer).
Key updates:
- Outputs up to 3000 words for story generation.
- Learned multiple authorial styles from structured corpora.
- Produces publication-quality Bengali.
- Dynamic Style Selection
The app supports automatic style detection from the /stories/ folder. Adding a new .txt file immediately populates the author dropdown in the UI without retraining.
import os

def get_author_styles():
    # Each .txt file in stories/ is one author; the filename without ".txt" is the style name.
    return sorted(os.path.splitext(f)[0] for f in os.listdir("stories") if f.endswith(".txt"))
Figure 3: Dynamic Author Detection Logic
/stories/ folder -> [file scan] -> Extract filename -> Remove ".txt" -> List authors
- Bengali Sentence Quality Enhancement
The updated model already generates high-quality Bengali. However, I use post-processing tools for further refinement:
- Indic NLP-based normalization
- POS tag adjustments for formality
- Rule-based idiomatic correction for fluency
from indicnlp.normalize.indic_normalize import IndicNormalizerFactory

# "bn" is the language code for Bengali; generated_text is the model's raw output.
normalizer = IndicNormalizerFactory().get_normalizer("bn")
normalized_text = normalizer.normalize(generated_text)
- Virtual Assistant (Research Mode)
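A semantic search system indexes the stories and metadata in a vector index such as FAISS, with a local model runtime such as GPT4All available for embeddings and answer generation. It enables: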
- Author-specific or theme-based retrieval
- Fact-based Q&A for Bengali sci-fi research
- Paragraph-level citation with passage references
Figure 4: Research Assistant Workflow
Query -> Embedding -> FAISS Search -> Top Matches -> GPT Answer Generator -> Output
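The retrieval half of this workflow (query, embedding, search, top matches with passage references) can be illustrated without any external dependency. The sketch below is a deliberately simplified stand-in: it replaces the neural encoder with character trigram counts and replaces FAISS with a brute-force cosine-similarity scan. The class `PassageIndex`, the placeholder authors, and the sample passages are all hypothetical.

```python
import math
from collections import Counter


def embed(text, n=3):
    """Toy embedding: counts of character n-grams (stand-in for a real encoder)."""
    text = " ".join(text.lower().split())
    return Counter(text[i:i + n] for i in range(max(len(text) - n + 1, 1)))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class PassageIndex:
    """Brute-force index over paragraphs; FAISS would replace the linear scan."""

    def __init__(self):
        self.entries = []  # (author, title, paragraph_no, text, vector)

    def add(self, author, title, paragraph_no, text):
        self.entries.append((author, title, paragraph_no, text, embed(text)))

    def search(self, query, k=3, author=None):
        """Return up to k (score, author, title, paragraph_no, text) tuples, best first."""
        q = embed(query)
        scored = [
            (cosine(q, vec), a, t, p, text)
            for a, t, p, text, vec in self.entries
            if author is None or a == author
        ]
        scored.sort(key=lambda row: row[0], reverse=True)
        return scored[:k]


index = PassageIndex()
# Placeholder passages, not quotations from any real story.
index.add("AuthorA", "Story 1", 1, "The ship landed on Venus at dawn.")
index.add("AuthorA", "Story 1", 2, "The robot cooked rice in the kitchen.")
index.add("AuthorB", "Story 2", 1, "A comet crossed the night sky over the river.")

top = index.search("landing on Venus", k=1)
print(top[0][1:4])
```

Each hit carries its author, title, and paragraph number, which is what paragraph-level citation needs; the retrieved passages would then be passed to the answer generator as context.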
- Interface with Gradio
The Gradio UI has been updated with longer text support and dynamic author listing. It consists of:
- Textbox for prompt or question
- Dropdown for author style (auto-refreshed)
- Output window supporting up to 3000-word responses
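import gradio as gr

gr.Interface(
    fn=generate_story,  # defined elsewhere; receives the prompt and the selected author
    inputs=["text", gr.Dropdown(choices=get_author_styles(), label="Author style")],
    outputs="text",
).launch()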
Figure 5: GUI Layout (Gradio Frontend)
- Challenges and Limitations
- Bengali pretraining data is still limited
- Longer output generation demands higher memory
- Style detection for modern authors may require more data
- Future Work
- Deploy an offline-ready mobile app version
- Build a TTS module for audio storytelling
- Expand author dataset via Kalpabiswa archive licensing
- Train Bengali sci-fi LLM from scratch with >1M tokens
- Add citation and reference generation for research mode
- Experimental Results
The Bengali sci-fi story generator was evaluated on five metrics:
| Metric | Value | Description |
| Average BLEU Score | 21.6 | Compared with a manually written reference |
| Human Fluency Score (5 max) | 4.3 | Based on native speaker evaluation |
| Story Length Variability | 1200 – 2800 words | Range of output lengths over 30 prompts |
| Author Style Accuracy | 87% | Style-match detected by human raters |
| Semantic Coherence (5 max) | 4.0 | Rated on logical flow and scientific themes |
Evaluation involved both automatic metrics and human annotators (3 native speakers). Prompts included genre-specific keywords and famous characters.
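For readers unfamiliar with BLEU, the sketch below shows how a sentence-level score on the 0 to 100 scale can be computed against a single reference. It is an illustration only: the article does not say which BLEU implementation, tokenization, or smoothing produced the reported 21.6, so whitespace tokenization and add-one smoothing are assumptions here.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Count every contiguous n-gram in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(candidate, reference, max_n=4):
    """Sentence-level BLEU (0-100) with add-one smoothing and a brevity penalty."""
    cand, ref = candidate.split(), reference.split()
    if not cand:
        return 0.0
    log_precision_sum = 0.0
    for n in range(1, max_n + 1):
        cand_counts = ngram_counts(cand, n)
        ref_counts = ngram_counts(ref, n)
        # Clipped overlap: a candidate n-gram counts at most as often as it occurs in the reference.
        overlap = sum(min(count, ref_counts[gram]) for gram, count in cand_counts.items())
        total = sum(cand_counts.values())
        log_precision_sum += math.log((overlap + 1) / (total + 1))
    # The brevity penalty lowers the score of candidates shorter than the reference.
    brevity = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / len(cand))
    return 100.0 * brevity * math.exp(log_precision_sum / max_n)


reference = "the ship reached the red planet at dawn"
print(round(bleu(reference, reference), 1))
print(round(bleu("the ship reached mars at night", reference), 1))
```

Because creative text has many valid wordings, a modest BLEU score against one reference is expected, which is why the evaluation pairs it with human ratings.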
- Vision: Why Kalpavigyan Matters
A study across 20 universities in West Bengal and Bangladesh found:
- 0 dedicated academic positions for Bengali sci-fi studies
- 92% of literature departments teach Western sci-fi
- Only 35% offer any Bengali speculative fiction content
- 9/10 postgraduate theses focus exclusively on Anglo-American sci-fi
This project addresses core needs through technological innovation:
| Challenge | Solution | Technology Stack |
| Preservation | Digital Archive | OCR + Semantic Search |
| Creation | Style-Faithful Generation | Fine-tuned LLMs |
| Exploration | Research Assistant | Vector Databases |
Bengali SF offers more than exotic tales—it reframes futurism through South Asian epistemologies. Where Western SF often glorifies conquest, Kalpavigyan probes collaborative survival, whether in climate-ruined Sundarbans or AI-governed Kolkata. As Dip Ghosh (editor, Kalpabiswa) asserts:
“Kalpavigyan is where quantum physics meets Rabindra Sangeet. Our stories ask: Can technology nurture without erasing identity?”
- Conclusion: Toward a Multiverse of Stories
I am not just building tools; I am cultivating ecosystems. The “multiverse” I envision is one where every Bengali sci-fi story ever written remains accessible, every author’s stylistic signature remains reproducible, and every new idea finds fertile ground to grow. This is how traditions evolve without losing their soul: by embracing technology as the torchbearer of culture rather than its competitor.
References
- Saha, S. et al. (2020). “Bengali NLP: Resources and Techniques.”
- HuggingFace Transformers: https://huggingface.co
- IndicNLP Library: https://anoopkunchukuttan.github.io/indic_nlp_library/
- Kalpabiswa: https://www.kalpabiswa.in
- FAISS: https://github.com/facebookresearch/faiss
- Gradio: https://www.gradio.app
- Kalpavigyan, or Bengali SFF: An Interview with Dip Ghosh: http://strangehorizons.com/wordpress/non-fiction/kalpavigyan-or-bengali-sff-an-interview-with-dip-ghosh/
- From Satyajit Ray To Adrish Bardhan – My Foray Into Bengali Science Fiction: https://homegrown.co.in/homegrown-explore/from-satyajit-ray-to-adrish-bardhan-my-foray-into-bengali-science-fiction
- Across the Continents: A Talk with Dip Ghosh on Indian and Bengali Science Fiction: https://theliberum.com/across-the-continents-a-talk-with-dip-ghosh-on-indian-and-bengali-science-fiction/
Tags: English Section, Kalpabiswa, Soubarna Das, Tenth Year First Issue

