

Ten years ago, television executives relied almost exclusively on ratings to determine whether a show was successful. Today, streaming has changed the rules. Success no longer depends solely on overnight ratings, but on sustaining subscriptions, retaining fickle viewers and competing in a crowded, fragmented marketplace.
My latest research asks a provocative question: What if executives could predict whether a show will hold its audience before it ever airs — simply by analyzing the script?
In a recent study published in the Journal of Marketing Analytics, I examined more than 25,000 episodes from 228 television series. Drawing dialogue from a television subtitle database, along with metadata such as Nielsen ratings, year, season, genre, number of episodes and parent channel, I applied natural language processing (NLP) and machine learning to the scripts.
By measuring features like words per sentence, emotional tone, and character confidence, I evaluated whether scripts carry signals about audience engagement. The findings suggest they do: in some cases, dialogue features alone explained as much as 50% of variation in viewership.
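Features like these can be sketched in a few lines of Python. The sentence splitting and the tiny emotion lexicon below are illustrative stand-ins, not the study's actual feature set, which relied on far richer lexicons and models.

```python
import re
from statistics import mean

# Toy emotion lexicon: a placeholder for the larger word lists
# typically used in NLP research (all words here are invented examples).
EMOTION_WORDS = {
    "joy": {"great", "love", "happy", "wonderful"},
    "anger": {"hate", "furious", "angry", "damn"},
}

def dialogue_features(script: str) -> dict:
    """Compute simple script-level features: average words per
    sentence and the rate of emotion-lexicon words in the dialogue."""
    sentences = [s for s in re.split(r"[.!?]+", script) if s.strip()]
    words_per_sentence = [len(s.split()) for s in sentences]
    tokens = [w.lower().strip(",.!?") for w in script.split()]
    n = max(len(tokens), 1)
    features = {"avg_words_per_sentence": mean(words_per_sentence)}
    for emotion, vocab in EMOTION_WORDS.items():
        features[f"{emotion}_rate"] = sum(t in vocab for t in tokens) / n
    return features

feats = dialogue_features("I love this plan. It is wonderful! You make me furious.")
print(feats)
```

In a full pipeline, features like these would be computed per act or per episode and fed into the predictive models described below.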
From Black Box to Data Asset
For decades, executives have relied on ratings as the measure of success, but ratings are lagging indicators. They only tell you what happened after a show aired. Scripts, on the other hand, are available before filming. If we can analyze them systematically, they become leading indicators of audience engagement.
The study draws on several established theories of how people process stories, including the peak-end rule and engagement theory around seriality, operationalized through NLP and machine learning models.
The peak-end rule, proposed by psychologist Barbara Fredrickson and her colleague Daniel Kahneman, shows that people remember experiences disproportionately by their emotional peaks and by how they end. Engagement theory, meanwhile, emphasizes how serialized dialogue helps sustain immersion and transportation across episodes. And with advances in NLP and machine learning, researchers can now measure linguistic complexity and emotional dynamics at scale.
The result is a shift in how studios can think about scripts.
Importantly, scripts are not just creative documents; they are strategic assets. They carry measurable signals that can inform decisions about greenlighting, promotion, and even retooling shows that aren't resonating with audiences.
What Drives Viewership
To test predictability, I first used a random forest, a machine learning algorithm that combines many decision trees to improve prediction, to isolate the top ten emotions across acts 1–3. I then built models using both classical methods, such as ordinary least squares regression, and advanced machine learning techniques, including Elastic Net, Gradient Boosting, and XGBoost. After refining the features and validating accuracy, I trained models at two levels: individual series and broader genre categories.
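As a sketch of the classical end of that toolkit, here is ordinary least squares fit in closed form on a single made-up script feature. The data, the feature choice, and the resulting coefficients are all invented for illustration; the study used many features and more sophisticated estimators.

```python
# Minimal OLS sketch: regress viewership on one synthetic script
# feature (average words per sentence). All numbers are invented.

def ols_fit(x, y):
    """Fit y = b0 + b1 * x by ordinary least squares (closed form)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b1 = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / \
         sum((xi - mx) ** 2 for xi in x)
    b0 = my - b1 * mx
    return b0, b1

# Synthetic episodes: (avg words per sentence, viewers in millions)
x = [8.0, 10.0, 12.0, 14.0, 16.0]
y = [5.1, 4.8, 4.2, 3.9, 3.5]

b0, b1 = ols_fit(x, y)
print(round(b0, 3), round(b1, 3))
```

The negative slope in this toy example mirrors the article's finding that overly long sentences tend to lose viewers, but the magnitude is meaningless outside the invented data.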
The results were striking. At the genre level, the models explained up to 50% of viewership variation. At the series level, they explained about 41%.
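"Variation explained" here refers to the coefficient of determination, R-squared: the share of the variance in viewership that the model's predictions capture. A minimal computation, on invented numbers:

```python
def r_squared(actual, predicted):
    """R^2 = 1 - SS_res / SS_tot, the fraction of variance explained."""
    mean_y = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_y) ** 2 for a in actual)
    return 1 - ss_res / ss_tot

# Hypothetical viewership (millions) vs. model predictions
actual = [5.0, 4.0, 6.0, 3.0]
predicted = [4.8, 4.3, 5.6, 3.3]
print(round(r_squared(actual, predicted), 3))
```

An R-squared of 0.50, as at the genre level, means half the episode-to-episode variation in viewership is accounted for by script features alone.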
In media research, where human attention is notoriously difficult to model, these levels of accuracy are significant.
The analysis uncovered patterns about what drives audience retention. Scripts with overly long or overly short sentences tended to lose viewers, while episodes ending with confident, dominant language kept them engaged. Emotional highs in the middle of an episode, such as anger or joy, also predicted whether audiences would return.
Certain shows stood out. Breaking Bad, Madam Secretary and Young Sheldon performed particularly well, with models explaining between 86% and 89% of the variation in their viewership. These shows shared tightly structured scripts, consistent pacing and clear narrative arcs that built and resolved tension.
Procedural dramas such as CSI and Cold Case also ranked high thanks to their formulaic rhythms and recurring dialogue patterns. By contrast, comedies such as Modern Family and 30 Rock, which rely less on act-based structure and more on humor or improvisation, proved harder to forecast.
Serialized dramas and procedurals provide the strongest predictive signals. Episodic comedies and action-heavy shows are less amenable to NLP forecasting because they lack the same continuity.
Why It Matters for Executives
The implications extend far beyond academic curiosity. For executives, predictive dialogue models create opportunities to reduce risk and make smarter investments. Scripts can be evaluated before production begins, helping studios decide which projects to greenlight and which to set aside. The same models can help reposition underperforming shows, by identifying weaknesses in pacing, dialogue arcs, or emotional tone before audiences tune out.
For marketers, the findings highlight when and how to emphasize emotional highs in campaigns. Trailers can spotlight peak tension moments, while targeted promotions can align with episodes that carry the strongest emotional resonance. Platforms can also schedule releases and ad placements around those episodes, stretching marketing dollars further while increasing engagement.
Dialogue itself becomes measurable. That’s a powerful shift. It allows executives to manage uncertainty with data-driven foresight.
The approach also supports benchmarking across genres, surfacing storytelling patterns and evolving audience preferences. Over time, such benchmarking could help executives understand not only what works for one show, but also what audiences expect across an entire category.
Looking Ahead
While the findings are notable, data doesn't replace creativity. Writers still must tell compelling stories. But data can complement instinct, giving studios foresight into how audiences might respond. This kind of preliminary analysis could help elevate TV episodes (and perhaps other forms of art) and, at the very least, guide creatives toward expression that stabilizes their audience bases.
I see a future where streaming platforms routinely run scripts through NLP models to forecast retention curves. Imagine knowing before production where viewers are most likely to stay hooked or drop off. That’s where entertainment analytics is heading, to the intersection of storytelling and data science.
Professor Anthony Palomba is the author of “Advancing Predictive Content Analysis: A Natural Language Processing and Machine Learning Approach to Television Script Data,” (2025) published in the Journal of Marketing Analytics.
Anthony Palomba teaches leadership communication and data visualization in the MBA program as well as management communication in the MSBA program. His teaching interests are focused on how business professionals can present data results and actionable insights to key stakeholders through storytelling. In his courses, he sheds light on the way leadership communication intersects with persuasion and data-driven decision-making that lead co-workers to take actions toward reaching a shared vision or accomplishing a set of business goals.
Intellectually, Palomba is fascinated by media and entertainment companies and the way they market their products in a dynamically changing competitive landscape. As a media management scholar, Palomba's research focuses on consumer behavior, branding, and marketing behind video games, television and film. His research explores how and why audiences consume entertainment and strives to understand how consumer behavior models can be built to predict consumption patterns. Additionally, he studies how technology innovations influence competition among entertainment and media firms.
B.A., Manhattanville College; M.A., Syracuse University; Ph.D., University of Florida