This post is part of an ongoing habit of trying small, contained experiments rather than grand declarations about what AI can or cannot do. In this case, the experiment was simple: could a short piece of writing be performed by an AI voice in a way that felt deliberate rather than mechanical?
I prompted ChatGPT to write a brief dystopian monologue and asked it to include emotional cues directly in the script. I then copied and pasted the text into ElevenLabs and generated the audio using a cloned version of my own voice. No editing, no post-production, no technical tinkering. Just text, instructions, and a voice.
What interested me wasn’t realism or polish, but interpretation.
The Audio – ‘Where Silence Is Suspicious’
The Script – ‘Where Silence Is Suspicious’
[quiet] They tell us the system listens for our benefit. For safety. For harmony. Funny word, harmony. It used to mean singing. Now it means not asking questions.
Every morning I stand on my mark. Yellow square. Scuffed, like the rest of us. The screen asks how I’m feeling. I choose “content”. Always “content”. [small laugh] Anything else triggers a review.
I remember when thoughts were private. Not good ones, mind. Messy, contradictory things. But they were ours. Now even silence is suspicious. Too long without speaking and the wall gently reminds you to contribute.
[whispers] I’ve learned to pace my breathing. In for four. Out for four. The sensors like rhythm. It shows cooperation.
They say the disappearances are voluntary. Relocation. A fresh start. [sighs] Funny how fresh starts never come back to collect their things.
At night, I practise my smile in the dark window. Teeth visible. Eyes calm. Hopeful. The neighbour upstairs forgot once. Just once. The building sighed and went quiet.
[steady] Tomorrow I’ll stand on the yellow square again. I’ll say I’m content.
And this time, I might even mean it.
A Brief Word On ElevenLabs
ElevenLabs is an AI text-to-speech tool that converts written text into spoken audio. Unlike older text-to-speech systems, which tended to sound flat and uniform, ElevenLabs is designed to respond to rhythm, punctuation and context. It doesn’t simply read words aloud; it attempts to perform them.
One of its more striking features is the ability to clone a voice from a short audio sample. In this case, the voice you hear is a digital version of my own. Creating it was straightforward and required no specialist equipment. That ease is part of what makes the experiment both interesting and, if I’m honest, slightly unsettling.
Emotion Tags and Performance Cues
The small bracketed cues you’ll see in the script above are known as emotion tags. In ElevenLabs, these are written directly into the text, inside square brackets, and act like stage directions for the voice. They don’t add words or change meaning. They simply suggest how something might be said. A pause. A sigh. A quieter tone. A steadier one.
What fascinated me was that these cues live in the writing itself. They sit somewhere between prose and performance, shaping delivery without guaranteeing an outcome. The AI interprets them, just as a human reader would, which means the result is not entirely predictable. That interpretive gap is where things get interesting.
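Because the cues are just bracketed text, they are easy to work with programmatically. As a minimal sketch (this helper is my own illustration, not part of ElevenLabs), here is how you might separate the emotion tags from the prose of a script line:

```python
import re

# Matches bracketed cues like [quiet], [small laugh], [whispers].
TAG_PATTERN = re.compile(r"\[(\w[\w ]*)\]")

def split_cues(line: str) -> tuple[list[str], str]:
    """Return the emotion tags on a line and the line with tags stripped."""
    tags = TAG_PATTERN.findall(line)
    prose = TAG_PATTERN.sub("", line).strip()
    return tags, prose

tags, prose = split_cues("[whispers] I've learned to pace my breathing.")
print(tags)   # ['whispers']
print(prose)  # I've learned to pace my breathing.
```

Nothing about this changes what ElevenLabs does with the tags, of course; it simply makes visible the point above, that the performance cues live inside the writing itself rather than in a separate settings panel.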
Why I’m Doing This
I’m not trying to prove that AI can replace writers, actors, or readers. I’m much more interested in what happens when we treat AI as a collaborator that responds to structure, intention and constraint.
For me, this experiment sits alongside others on this blog: small acts of curiosity, designed to explore how writing, voice and technology intersect when you don’t rush to conclusions. The fact that this voice happens to sound like me only sharpens the question rather than answering it.
Call to Thought
As you listen, and perhaps read the script alongside it, consider where the unease comes from. Is it the story? The delivery? The knowledge that this performance emerges from written cues rather than lived experience? Or the fact that it was all generated by AI?
If a human actor had read the same text, guided by the same instructions, would it feel fundamentally different? Or are we reacting less to the voice itself and more to our assumptions about who, or what, is allowed to perform meaning?
I don’t think this experiment offers answers. But it does offer something I value more: a reason to pause and pay attention. I’d love to know what you think.
About The AI Grandad
Find out more about The AI Grandad on:
YouTube – The AI Grandad
X – The AI Grandad
What do you think AI creativity tells us about ourselves?
Share your thoughts in the comments; I love hearing from curious minds!