Code is Poetry. This is part of the WordPress philosophy. As a coder and a poet, I have always loved this phrase. I decided to turn the phrase around and ask: can I make poetry with code? Could I build a bot that writes original poetry? I created an experiment to find out.
First off, I knew that if my bot was to learn to write poetry, it first had to read poetry. In 2017, authors used WordPress to publish over half a million posts tagged as poetry. I reached out to some prolific poets sharing their work with WordPress and asked if they’d be willing to collaborate with me on a fun experiment: would they allow my bot to read their work so that it could learn about poetic form and structure, so that it might learn to write its own poetry? Special thanks to these intrepid writers who collaborated for the sake of science!
- O at the Edges – Robert Okaji
- Wolff Poetry – Linda J. Wolff
- Poetry, Short Prose and Walking – Frank Hubeny
- Perspectives on Life, the Universe and Everything – Aurangzeb Bozdar
What is an LSTM and how does it generate text?
I built my bot using a type of neural network called an LSTM, or Long Short-Term Memory network.
A neural network uses layers to break down a problem into a series of smaller problems. For instance, suppose you were training a neural network to recognize a square. One layer might be responsible for identifying right angles, and another might recognize parallel lines. Both of these qualities must be present in order for the image to be a square. The neural network learns the necessary qualities by training on millions of images of squares. It learns which aspects of an image matter for recognizing a square, and which don’t.
Now suppose you are using a neural network to predict the next letter in this sequence: th_
For a human, this task is pretty straightforward. Chances are, you guessed e, but I’ll bet that, if you are an English speaker, you didn’t guess q. That’s because you’ve learned that th is not followed by q in the English language. The preceding letters are extremely relevant to predicting what comes next. An LSTM can “remember” its previous state to inform its current decision. For a more in-depth explanation of how an LSTM works, check out this excellent post by Chris Olah, of Google Brain.
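To make the intuition concrete, here is a tiny sketch that counts which letters follow a given context in a sample of text. This is plain frequency counting, not an LSTM, and the function name and the sample sentence are made up for illustration.

```python
from collections import Counter


def next_char_counts(text, context):
    """Count the characters that immediately follow each occurrence of context."""
    counts = Counter()
    width = len(context)
    # Stop early enough that there is always a character after the context.
    for i in range(len(text) - width):
        if text[i:i + width] == context:
            counts[text[i + width]] += 1
    return counts


# Hypothetical sample text, chosen only to illustrate the idea.
sample = "the thin thread then the thaw"
counts = next_char_counts(sample, "th")
print(counts.most_common())
```

In this sample, e follows th more often than any other letter, and q never does. An LSTM learns a much richer version of this kind of context dependence, over far longer histories than two characters.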
Like many LSTM text-generation examples, my bot generates text by producing one character at a time. So to put words together in any meaningful fashion, it must first learn how to make words. To achieve this, it needs millions of example sequences that contain valid words. It’s a good thing WordPress.com has plenty of poetry!
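Training examples for a character-level model are typically fixed-length windows of text, each paired with the character that comes next. The sketch below follows that general pattern. The names `make_sequences` and `one_hot`, and the small `maxlen` and `step` values, are illustrative assumptions rather than the settings used for the bot.

```python
def make_sequences(text, maxlen, step):
    """Slice text into overlapping windows and the character that follows each one."""
    windows, next_chars = [], []
    for i in range(0, len(text) - maxlen, step):
        windows.append(text[i:i + maxlen])
        next_chars.append(text[i + maxlen])
    return windows, next_chars


def one_hot(char, char_to_index):
    """Encode a single character as a list with a 1 at its vocabulary index."""
    vector = [0] * len(char_to_index)
    vector[char_to_index[char]] = 1
    return vector


text = "the sun and the moon"
chars = sorted(set(text))                      # character vocabulary
char_to_index = {c: i for i, c in enumerate(chars)}

windows, next_chars = make_sequences(text, maxlen=5, step=1)
print(windows[0], "->", next_chars[0])
print(one_hot(next_chars[0], char_to_index))
```

Each window becomes one input to the network, and the one-hot vector for the following character is the target it learns to predict.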
Preparing the data set
I started by pulling all of the poetry from the sites listed above from our Elasticsearch index. I stripped out everything but the text of the poem, using a very simple rule, based on the number of words per \n character. If a block of text contains many words but few \n characters, this is likely to be a collection of one or more paragraphs. However, a block of text with words spread across many lines is more likely to be a poem. This is a simplistic method, for sure, and I can think of plenty of great poems that would fail this test! But for the purposes of this experiment I was specifically interested in whether the LSTM could learn about structure, like line breaks and stanzas, and other poetic devices such as rhyme, assonance, consonance, and alliteration. So, restricting the training data to fairly structured poetry made sense.
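A rule like this can be sketched in a few lines. The version below is only an approximation of the idea: the function name and both thresholds (`max_words_per_line`, `min_lines`) are assumptions chosen for illustration, not the values used in the experiment.

```python
def looks_like_poem(block, max_words_per_line=10, min_lines=3):
    """Guess whether a block of text is a poem from its words-per-line ratio."""
    # Ignore blank lines so stanza breaks do not skew the ratio.
    lines = [line for line in block.split("\n") if line.strip()]
    if len(lines) < min_lines:
        return False
    word_count = sum(len(line.split()) for line in lines)
    return word_count / len(lines) <= max_words_per_line


poem = "the wind is only for me\nI am the room of the words.\nI have seen to see"
prose = " ".join(["word"] * 60)  # sixty words with no line breaks

print(looks_like_poem(poem))
print(looks_like_poem(prose))
```

Short lines spread over several line breaks pass the check, while a long unbroken paragraph does not. A prose poem or a poem with very long lines would be rejected, which matches the limitation described above.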
Once a block of text was determined to be a poem, I wrote it out to a text file, prefixing it with ++++\n to indicate the start of a new poem. This resulted in about 500 KB of training data. Usually, I try to use at least 1 MB of text to train an LSTM, so I needed to find more poetry! To supplement the featured poets, I used a random sample of public posts tagged poetry published in the last year. This is similar to what you might discover if you follow the poetry tag in the WordPress.com Reader. I limited the random poetry to one post per author.
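The two steps in this paragraph, marking each poem with a delimiter and keeping one random post per author, might look something like the sketch below. The `(author, text)` tuple format and the helper names are hypothetical; only the `++++\n` delimiter comes from the experiment itself.

```python
import random

POEM_START = "++++\n"  # delimiter marking the start of each poem


def one_post_per_author(posts, seed=0):
    """Pick one random post for each author from (author, text) pairs."""
    rng = random.Random(seed)
    by_author = {}
    for author, text in posts:
        by_author.setdefault(author, []).append(text)
    return [rng.choice(texts) for texts in by_author.values()]


def build_corpus(poems):
    """Concatenate poems into one training string, each prefixed by the delimiter."""
    return "".join(POEM_START + poem.strip() + "\n" for poem in poems)


# Hypothetical posts, for illustration only.
posts = [("ann", "first poem"), ("ann", "second poem"), ("bo", "third poem")]
sample = one_post_per_author(posts)
corpus = build_corpus(sample)
print(corpus)
```

Limiting the sample to one post per author keeps any single prolific writer from dominating the supplementary data.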
Training the LSTM network
Once I had over 1 MB of poetry, I began building an LSTM network. I use the Python library keras for all my neural network needs, and the keras GitHub repo has dozens of example scripts to help you learn to work with several different types of neural nets, including one for text generation using LSTMs. I modelled my code after this example, and began experimenting with different model configurations.
The goal of the model is to produce original poetry. In this case, overfitting (learning the training data so specifically that the model doesn’t generalize) could result in generated text that too closely resembles the input text. (Which would be like plagiarism, and no poet likes that!) One way to reduce overfitting is to add dropout to your network. During training, dropout sets a random subset of a layer’s activations to zero at each step. This is kind of like forcing the network to “forget” some of what it just learned. (I also added extra post-processing checks to prevent poets’ work from being reproduced by the bot.)
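For readers who have not seen dropout before, here is a minimal sketch of the idea on a plain list of numbers. It uses the common "inverted dropout" formulation, where surviving values are scaled up so the expected total stays the same. This is an illustration only and does not reproduce how keras applies dropout inside an LSTM layer.

```python
import random


def dropout(values, rate, rng):
    """Zero each value with probability rate and rescale the survivors."""
    keep = 1.0 - rate
    return [v / keep if rng.random() < keep else 0.0 for v in values]


rng = random.Random(0)          # fixed seed so the sketch is repeatable
activations = [1.0] * 1000      # stand-in for one layer's outputs
dropped = dropout(activations, rate=0.2, rng=rng)

print(dropped.count(0.0), "of", len(dropped), "values were zeroed")
```

With a rate of 0.2, roughly a fifth of the values are zeroed on each call, and a different random subset is chosen every time. Dropout is only applied during training; at generation time the full network is used.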
I used FloydHub’s GPUs to do the heavy lifting of training my network. This allowed me to train the network about 10 times faster than on my laptop. My first network featured a single LSTM layer followed by a dropout layer. This resulted in text that certainly looked like poetry! It had line breaks and stanzas, and nearly all of the character combinations were actual words. Occasionally entire lines were slightly coherent. In fact, that first iteration produced this gem:
I added LSTM layers, experimenting with the dropout level within each layer until settling on the final model below. I chose to stop at three LSTM layers, because at this point the training time started to become unreasonable and the results were pretty decent.
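    from keras.models import Sequential
    from keras.layers import LSTM, Dropout, Dense, Activation

    # maxlen (sequence length) and chars (character vocabulary) are assumed to be defined elsewhere
    model = Sequential()
    # The first two LSTM layers return full sequences so the next LSTM layer can consume them
    model.add(LSTM(300, input_shape=(maxlen, len(chars)), return_sequences=True,
                   dropout=.20, recurrent_dropout=.20))
    model.add(LSTM(300, return_sequences=True, dropout=.20, recurrent_dropout=.20))
    model.add(LSTM(300, dropout=.20, recurrent_dropout=.20))
    model.add(Dropout(.20))
    # Softmax over the vocabulary gives a probability for each possible next character
    model.add(Dense(len(chars)))
    model.add(Activation('softmax'))
    model.compile(loss='categorical_crossentropy', optimizer='adam')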
Here’s a plot comparing the loss curves for the models as additional LSTM layers were added.
Yikes! Spikes! What’s that all about?? Turns out that this happens pretty commonly when using the adam optimizer. Notice that as I added LSTM layers to the network, the validation loss for the model continued to decrease overall, and at a faster rate. This means viable results in fewer epochs, but the additional layers increase the training time per epoch. Training with a single LSTM layer took about 600 seconds per epoch, finishing overnight. However, three LSTM layers needed 7000 seconds per epoch, which took several days to complete training. So, the faster drop in validation loss doesn’t actually mean faster results. Even though it took longer to train, the poetry generated by the three LSTM layer network was, in my completely subjective opinion, better.
To produce wholly original text, I also needed to change the way the text was generated. In the keras library example, the script selects a random sequence of characters from the training data and feeds it to the trained network as a seed. I wanted a bot that writes its own poetry, not one that finishes other poets’ prompts! So, I experimented with different seedings of the text generation step. Since I had begun each poem in the training set with ++++\n, I thought this would suffice to create wholly original output. But the results were a nonsensical combination of characters such as &. After some trial and error, I figured out that the seed sequence needed to have the same number of characters as the training sequences. It seems obvious in hindsight! Ultimately, I used a sequence of 300 characters, so I seeded the generation step by repeating ++++\n for exactly 300 characters. The bot was able to generate several poems per round by occasionally separating the text with ++++\n.
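The seeding trick can be sketched as follows. The 300-character length and the `++++\n` delimiter come from the experiment. The helper names are made up, and `next_char` is a hypothetical stand-in for whatever function asks the trained network for its next character.

```python
POEM_START = "++++\n"


def make_seed(maxlen=300):
    """Repeat the poem delimiter until the seed is exactly maxlen characters."""
    repeats = maxlen // len(POEM_START) + 1
    # Slice from the end so the seed always finishes on a complete delimiter.
    return (POEM_START * repeats)[-maxlen:]


def generate(next_char, seed, length):
    """Generate text one character at a time, sliding the window forward."""
    window, output = seed, []
    for _ in range(length):
        char = next_char(window)        # hypothetical call into the trained model
        output.append(char)
        window = window[1:] + char      # keep the window at a fixed length
    return "".join(output)


def split_poems(generated):
    """Split generated text into individual poems at each delimiter."""
    return [part.strip() for part in generated.split(POEM_START) if part.strip()]


seed = make_seed()
print(len(seed))
print(split_poems("++++\nFrom your heart\nof the darkness\n++++\nnext poem\n"))
```

Because the seed contains nothing but delimiters, the network starts from "the beginning of a new poem" rather than from another poet's words.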
After the script generated a new round of poetry, I performed the final plagiarism check. To do this, I first created the set of all unique 4-grams (phrases containing four words) within the training set, and did the same for my bot’s poetry. Then I calculated the intersection of these two sets. For the purposes of this experiment, I manually inspected the 4-grams to make sure that the phrases that appeared in both sets were inane. Typically, this intersection consisted of things like:
- i do n’t want
- i can not be
- i want to be
- the sound of the
Then I repeated this process using 5- and 6-grams for good measure. If I were to automate this process, I would probably take a frequency-based approach, and exclude n-grams commonly found across multiple authors from being considered plagiarized.
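The n-gram comparison, along with one possible take on the frequency-based automation, can be sketched as below. Tokenizing by lowercasing and splitting on whitespace is a simplifying assumption, as are the function names and the `min_authors` threshold.

```python
from collections import Counter


def ngrams(text, n):
    """Return the set of unique word n-grams in text."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def shared_ngrams(training_text, generated_text, n=4):
    """Find n-grams that appear in both the training data and the bot's output."""
    return ngrams(training_text, n) & ngrams(generated_text, n)


def common_ngrams(texts_by_author, n=4, min_authors=2):
    """Find n-grams used by at least min_authors different authors."""
    author_counts = Counter()
    for text in texts_by_author.values():
        author_counts.update(ngrams(text, n))   # each author counts once per n-gram
    return {gram for gram, count in author_counts.items() if count >= min_authors}


# Hypothetical texts, for illustration only.
training = "I want to be the sound of the sea"
generated = "the wind said I want to be alone"
by_author = {"ann": "i want to be free", "bo": "now i want to be here"}

overlap = shared_ngrams(training, generated, n=4)
suspicious = overlap - common_ngrams(by_author, n=4)
print(overlap)
print(suspicious)
```

Phrases that several unrelated authors use are treated as ordinary language rather than as copying, so only the remaining overlap would need a manual look.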
Outputting the model weights after each epoch means we can load a snapshot of the model at several points during its training. Looking into the early epochs of the final model, it’s clear that the bot picked up on line breaks right away. I expected this, given that by design, the most prominent characteristic of the training data is few characters per line. Here’s an example of a poem generated after one epoch of training:
The pare of frowning the wand the sallt a merien
You will we mece wore and and the bite
in is what to call stor the mathing all your bray
It’s already learned some actual words, and mimics the common practice of empty space between each line. From far away, if you don’t squint too hard, that looks like a poem! After the single LSTM model’s loss converged, the model had learned stanzas as well as line breaks, and it even showed some use of repetition, a common poetic device.
The and the beautiful specting
The flight of how the way
I am the room of the words.
I have seen to see
But your heart will see the face
The strong suit of the single LSTM model was definitely in individual lines. Aside from the title line, another one of my favorites is:
the wind is only for me
Whereas the single LSTM model didn’t quite master theme within a poem, it did seem to have a common thread throughout the full body of work that it produced. Here’s a word cloud made from all of the poetry generated by the single LSTM model.
This wouldn’t be surprising if the sun were the most prevalent topic in the training data, but it’s not! Here’s a word cloud generated from the training data.
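A word cloud is essentially a picture of word frequencies, so the same comparison can be made with plain counts. Here is a minimal sketch. The stopword list and the sample sentence are placeholders, and a real comparison would run this once on the generated poetry and once on the training data.

```python
import re
from collections import Counter

# A tiny, hypothetical stopword list; a real one would be much longer.
STOPWORDS = {"the", "and", "of", "a", "to", "is", "in"}


def top_words(text, k, stopwords=STOPWORDS):
    """Return the k most frequent words in text, ignoring stopwords."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(word for word in words if word not in stopwords)
    return counts.most_common(k)


sample = "the sun and the moon and the sun"
print(top_words(sample, 2))
```

Comparing the top words from the two corpora side by side shows whether a favorite theme in the output actually reflects the training data.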
Emily Dickinson wrote poems about nature and death. My bot writes poems about celestial bodies. To each their own!
After adding the second LSTM layer, we start to see other poetic devices like alliteration and rhyme.
Seen will be found
Mere seeds smiles
I red my the day
the day of the way
the kind be the end
It also starts to produce some pretty poetic-sounding phrases. These are similar to the occasional brilliant single lines that the previous model produced, but they sometimes span more than one line. For example,
there’s a part of the world
between the darkness
the shadows stay the day
Whoa, that’s deep.
So far, we’ve seen lines, stanzas, rhyme (both internal and at the end of lines), repetition, and alliteration. Not bad! But, occasional dramatic flair aside, the poems the bot produces at this point are generally incoherent collections of words. For the most part, there isn’t even grammatical structure to its nonsense.
This begins to change, though, with the addition of a third LSTM layer. This model is much more likely to produce lines that are grammatically sound, even if still nonsensical. For instance:
The father of the light is not a fist of the bones
This sentence doesn’t make sense, but it has properly placed the parts of speech. It also has alliteration, and a general poetic feel to the noun clauses. The triple-layer LSTM model also produced these lines, which I think are pretty solid, poetically speaking:
The world was a butterfly land
I feel the mistake alone
But the crowning achievement of the triple-layer LSTM model is this complete poem.
From your heart of the darkness
and struggle on the soul
This is not an excerpt from a larger chunk of text. These lines were positioned firmly between two ++++\n delimiters.
Well folks, humanity’s been fun. The singularity is upon us!
Special thanks to my poet collaborators for helping me with this fun experiment! Be sure to visit their sites and read their work.