Our objective is to apply various traditional and modern NLP methods to gain interesting insights into the show and its characters by looking only at "what the data says". More specifically, we analyze characters, relationships, sentiments, and topics to identify speaking styles and developments. We want to provide additional insights both for fans and for people who have not watched the show.
The data we used can be found here.
This repository also contains scripts to train models to generate scenes (such as the scene above) and to classify the speaker of a line.
We uploaded the fine-tuned models to HuggingFace to make them easily accessible to everyone. There you can find the Speaker Classification and Scene Generation models and test them directly via the Inference API.
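As a rough illustration of how the Speaker Classification model could be used locally instead of through the Inference API, here is a minimal sketch based on the `transformers` text-classification pipeline. The `MODEL_ID` value is a placeholder, not the real model name, and the helper names `top_speaker` and `classify_speaker` are hypothetical; look up the actual model ID on the project's HuggingFace page before running it.

```python
from typing import Dict, List

# Placeholder only (assumption): replace with the actual model ID
# listed on the project's HuggingFace page.
MODEL_ID = "<user>/<speaker-classification-model>"


def top_speaker(predictions: List[Dict]) -> str:
    """Return the label with the highest score from pipeline-style output."""
    return max(predictions, key=lambda p: p["score"])["label"]


def classify_speaker(line: str, model_id: str = MODEL_ID) -> str:
    """Download the model (network access required) and classify one line."""
    # Imported lazily so top_speaker works without transformers installed.
    from transformers import pipeline

    classifier = pipeline("text-classification", model=model_id)
    # For a single string, the pipeline returns a list of
    # {"label": ..., "score": ...} dictionaries.
    return top_speaker(classifier(line))
```

Calling `classify_speaker("That's what she said.")` with a valid model ID would return the predicted speaker label. The label names depend on how the model was trained, so treat the output format as an assumption to verify against the model card.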
- That’s what the data said (Part I): Analyzing Script Lines from the US TV-Show “The Office”
- That’s what who said (Part II): “The Office” Speaker Classification (DistilBERT) and Scene Generation (GPT)
This project was carried out as part of the lecture "Intelligent Text Analysis" at Ravensburg Cooperative State University (DHBW). The paper we wrote on our results can also be found in this repository.