This post is going to be a strange one. There are a few things that feed into the following information, all of which seem disparate but, hopefully, will come together to make some sort of sense.
As a disclaimer, I will state openly that I am new to the use of neural networks in general. On the other hand, I have nearly a decade's worth of intimate connection to Shakespeare. With all that in mind, I want to show you what I've made. Let's begin by examining one of my favorite sonnets:
That time of year thou mayst in me behold
When yellow leaves, or none, or few, do hang
Upon those boughs which shake against the cold,
Bare ruin'd choirs, where late the sweet birds sang.
In me thou see'st the twilight of such day
As after sunset fadeth in the west,
Which by and by black night doth take away,
Death's second self, that seals up all in rest.
In me thou see'st the glowing of such fire
That on the ashes of his youth doth lie,
As the death-bed whereon it must expire
Consumed with that which it was nourish'd by.
This thou perceivest, which makes thy love more strong,
To love that well which thou must leave ere long.
It's beautiful. There's no question about that. I'm not here to analyze the poetry itself, though that would be fun and might be done in future posts. No - the mission today is to try to make a computer generate things that might, plausibly, be Shakespearean.
Collecting and Cleaning the Data
Data collection is important for any project. Thankfully, Shakespeare has been dead for a long time and his lawyers let his copyright lapse some time ago. Project Gutenberg has a full collection of his sonnets available in text format, which is what I'll be using to train my neural network. It'll also be used for the Markov text generation, which is a separate but interesting project alongside the neural network and will be covered later.
If you'd like a copy of the cleaned text file that I created (this basically just means removing the Sonnet headings), feel free to check out this link to my GitHub project.
Going forward, I'll try to be as transparent as possible about where my data comes from. As this was a fairly simple project, this section is pretty light.
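As a rough sketch of what "removing the Sonnet headings" could look like in Python: this assumes each heading sits on its own line as a bare Roman numeral, which is how the numbers show up in the generated text later on. The function name `strip_sonnet_headings` is made up for illustration, and this is not necessarily how the cleaned file in the repo was produced.

```python
import re

# Assumption: every sonnet heading is a line holding only a Roman numeral (I to CLIV).
ROMAN_HEADING = re.compile(r"^\s*[IVXLC]+\s*$")


def strip_sonnet_headings(raw_text):
    """Drop Roman-numeral heading lines and collapse the blank lines they leave behind."""
    kept = []
    for line in raw_text.splitlines():
        if ROMAN_HEADING.match(line):
            continue  # skip the heading itself
        # Skip a blank line if it would open the file or follow another blank line.
        if not line.strip() and (not kept or not kept[-1].strip()):
            continue
        kept.append(line.rstrip())
    return "\n".join(kept).strip() + "\n"


sample = (
    "I\n\n"
    "From fairest creatures we desire increase,\n"
    "That thereby beauty's rose might never die,\n\n"
    "II\n\n"
    "When forty winters shall besiege thy brow,\n"
)
print(strip_sonnet_headings(sample))
```

A real Project Gutenberg file also carries a licence header and footer, which this sketch does not touch.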
Neural Networks and Learning
Neural networks are weird. I've watched a few dozen hours of videos on how, exactly, neural networks work. They're fascinating and alien in nature. They're a way for people, using algorithms, to train a computer to recognize something or process data without being given really specific instructions. The "training" is based on scoring the network using training data, so that each time it makes a guess it is told whether it was right or how badly it was wrong. To steal a graphic from Wikipedia for demonstration purposes:
The astute observer will see that a large section of this graphic says "hidden". This does not mean that neural networks are magic, merely that those layers sit between the input and the output, so you never see their values directly while the network learns.
All of this sounds neat, though it's both unlikely to be 100% accurate and also nowhere near comprehensive on the subject. As I said, I'm not an expert and likely won't be for some time (if ever). The reason I started this project was to learn about neural networks and machine learning more generally, and this is merely the first step in a long process towards doing that.
What I Did
I'm using Python (as I always do) alongside some specific packages or tutorials that I found pretty easy to get used to as a beginner. The specific process for getting everything together and running was as follows:
- Install Ubuntu on my personal laptop.
- Install Anaconda on the laptop.
- Install TensorFlow GPU to a conda environment.
- Find tutorials on Neural Network text generation through training.
Installing Ubuntu
This was easy. All you have to do is load Ubuntu onto a thumb drive and install from the boot menu. I used Rufus. I chose Ubuntu because I've used it before, though I was told that Arch Linux is actually better suited to easily installing TensorFlow GPU.
Installing Anaconda
This was much harder for me to do on Linux than it was on Windows. For one, I'm not super familiar with Linux environments, so installing anything is more convoluted merely from a lack of familiarity. Now that I've become more familiar with installing and uninstalling programs and Python packages, I've actually found it to be as easy as, if not easier than, on Windows.
Install TensorFlow GPU in a Conda Environment
Pretty easy to do, given the right instructions. Testing whether it's running properly is as easy as typing "nvidia-smi" into a terminal window while a model is training and seeing whether your GPU usage has gone up.
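If you'd rather do that check from Python than eyeball the terminal, something like the sketch below should work. It shells out to the nvidia-smi command-line tool and asks only for the utilization figure. The helper name `gpu_utilization` is hypothetical, and it simply returns None on a machine without nvidia-smi or a working NVIDIA driver.

```python
import shutil
import subprocess


def gpu_utilization():
    """Return a list of per-GPU utilization percentages, or None if nvidia-smi is unusable."""
    if shutil.which("nvidia-smi") is None:
        return None  # tool not installed or not on PATH
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=utilization.gpu", "--format=csv,noheader,nounits"],
        capture_output=True,
        text=True,
        check=False,
    )
    if result.returncode != 0:
        return None  # e.g. the driver is not loaded
    # One number per GPU; skip anything non-numeric such as "[N/A]".
    return [int(token) for token in result.stdout.split() if token.isdigit()]


print(gpu_utilization())
```

Run it once before training starts and once while training is in progress; if the numbers don't move, TensorFlow is probably running on the CPU.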
Find tutorials on Neural Network Text Generation
This was marginally more difficult. I perused the net for a while before settling on a tutorial I liked, though it ended up producing results I was unhappy with. Eventually, I switched packages to something simpler that worked about the same.
Results
Here's where the fun comes in. Let's see what some of the resulting stuff looked like. I'll comment on the process as we go and try to give some insight into what is happening. We'll first look at my initial pass on neural network text generation and see the results from that.
First Pass using Online Tutorial
The results from this were less than stellar for a few reasons which I'll get into. If you'd like to see "my code" (which is just the code from the tutorial), check out the "training.ipynb" file on the GitHub repo linked above.
Before we start, I should explain a core concept in neural networks: epochs. An epoch is one complete pass of the training dataset through the model being trained. In general, a tiny number of epochs (for example, one or two) will not be enough to train a neural network to perform how you'd like it to. You'd think that a huge number of epochs would train the model to be amazing at doing the thing you want, but this is not necessarily the case. When training my model, I found that the loss began increasing after a certain point if I ran it for enough epochs.
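One common response to a loss that starts climbing again is "early stopping": keep track of the best epoch so far and give up once the loss has failed to improve for a few epochs in a row. Here's a minimal sketch of that idea in plain Python. The loss values are invented for illustration (they are not my actual training numbers), `best_epoch` and `patience` are names I've picked for this example, and in practice the check is usually applied to loss on held-out data rather than the training loss.

```python
def best_epoch(losses, patience=3):
    """Return the 1-based epoch with the lowest loss seen before patience ran out."""
    best_index = 0
    for index, loss in enumerate(losses):
        if loss < losses[best_index]:
            best_index = index  # new best epoch
        elif index - best_index >= patience:
            break  # no improvement for `patience` epochs: stop looking
    return best_index + 1


# Made-up per-epoch losses: improving, then drifting upward.
example_losses = [2.4, 1.9, 1.5, 1.2, 1.3, 1.4, 1.6, 1.8]
print(best_epoch(example_losses, patience=3))
```

Keras ships a ready-made version of this as the `EarlyStopping` callback, so with TensorFlow you wouldn't normally write it by hand.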
A loss function is, in essence, a measure of how far your neural network's output is from the expected answer. It's important to know that a neural network requires training data where the right answers are already known, so the network's guesses can be checked against them - the loss summarizes how wrong those guesses were. It's never negative, and the closer it is to zero the "better" the neural network is performing on that data.
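To make that a bit more concrete, here's a small sketch of cross-entropy, which is the usual loss for models that predict the next character. I'm assuming that's the kind of loss the tutorial uses, and the function names and probabilities below are invented for illustration. The model assigns a probability to every possible next character, and the loss is the negative log of the probability it gave to the character that actually came next.

```python
import math


def cross_entropy(predicted_probs, target_char):
    """Loss for one guess: negative log of the probability given to the correct character."""
    return -math.log(predicted_probs[target_char])


def mean_loss(predictions, targets):
    """Average the per-character losses over a batch of guesses."""
    total = sum(cross_entropy(probs, target) for probs, target in zip(predictions, targets))
    return total / len(targets)


# Hypothetical model outputs for two positions where the right answers are "e" and "t".
confident = [{"e": 0.9, "t": 0.1}, {"e": 0.2, "t": 0.8}]
clueless = [{"e": 0.1, "t": 0.9}, {"e": 0.9, "t": 0.1}]
print(mean_loss(confident, ["e", "t"]))
print(mean_loss(clueless, ["e", "t"]))
```

If the loss really is this kind of cross-entropy measured with natural logs (again, an assumption), then a loss near 2.3 means the model is giving the correct character roughly a 10% probability on average, since e^-2.3 is about 0.1.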
When I first ran the training model, I was pretty disappointed. I gave the model three training epochs to begin with - as per the tutorial - and got very bad results. Here are a few examples, which I've rerun just for this post.
God tinde of wu feld merend his dy iwhyich tove, Ther welze af ogerume sferak CU thove pan, Bert therI th stp thvengnond, Wos challf aly iw than hed mey than be shre, phathe fry sadd, Nite mowjer solsyo nw par s thy IlrI woy te thais ing no ererere mee horis poraneg, by halg or mseens I dessco ghaten! pare; WThr erisud thinind wopelir bpilh susry ard mond Tlow hiis ho chongerbe, Toj pilly A the ou to byy vis ind ow tocm ka thaigr file, Theg, fone, And wfist lene wian thos buee, fous mantof thes the malat, AngAn s in shor sy witu sive moif told oude thensrenang? CRtor asiesmr whapurer high than? X X)FAn fyo slels thy int ofarn see, maperee H rowk sure, fif thauth dheurt; Set con bt my thind be whow wyor forpanded the allen e, sigl's bu: s alisneung yavif whand of in mo staty inowas trals thtind paclllft asthein mare. SaAnd mime fine co rsanthes silt mece hoy ;o f, F zo bebe, ghou imy peat ind swrang for feand fouf beres sidw, Shin in coud I Whuld at be
As you can see, my neural network was close to Shakespeare already. More seriously, my first attempted "solution" was to increase the number of epochs that my model ran through. At this point, the loss function was sitting at around 2.3, which is bad for this sort of thing in my experience. Here's an example of the model generating text with ten epochs.
God? Tikeds ip cwats uppalftuct the stiy thy say keec My be is hew, of nights that neied by me farm it rid dith my arontores, dear mait, And batt; sond shaver rinds bept ot'ar'd stownich priend; Indefrawith pliesuscige bay tot thy lesss and, For that preaugy bations gostion, that your frows will with wight. I vosting, and gater sill cansed wor ast songte Co live so sham well, For the cond tither to jeight, But fir tham thou wepth tur you what thy abmexp hanst is then mortaich courls tight thow the onrers's farme's rens, Whou aim blong shile, Then bestyoun beauty on messight, and miching eyest, And sall lenome, thene ming, reauts of thoug truppine: Be; And thy heart weal acain you destrecs, Bus braws not sum ree Of horsh abine eut stail hilds askion I pair, Leste to see wetise love, ard dis colls, Thing, awing fall wieth dishads will faserest, Prenadien'd that all the solf, atifse inore. XXXIII Ag that de sunds one wist till dingrerl'? Sall thil our farus f
Ignoring the fact that this looks like complete gibberish when you read it, you can see that two very important things have changed: first, the sentence length is marginally more coherent, which means it has slightly more of a sense of what it's meant to be doing. Second, it included a Sonnet Number, which I found funny. I decided to leave the sonnet numbers in at this point just for the laughs. If anything, it probably hurts my model, but I like to imagine that it's reading off its best attempt at a thirty-third sonnet.
The loss function at this point was closer to 1.7, which is an improvement over the prior model which only had three epochs. Now let's go ahead and cook with gas - I'm going to show the output at thirty epochs.
Godd make you were by. Life were not so mine own husbandry? Should may I despeceive thy cruel hand deface, In sometime lofty towers I see: Sance me divinedet infatted, Which lays it hath the time and to this, As unone doth pottering days: When that I have sworn thee fair, and you would despise, When in eternal looks tre said to chase, Or at your life on thy hand: But not He so, now to decay, O'lXXV O! how when not did I not well-toofemble defent, Yourself forgoenceivion it larges place, For their image in soolly touch me, LoXVII Thy heart that fear inst the time, My beauty's sequent and inly pride. O! calage noRet retthy umour, Though in my all-the beauty thou art as travel's sink, Hath putther'd. Thee off two loves hant of glody, Which barren quill do slave, that I an all awards; Love's being mad, to make of many not appequity: But if they sleep a see ninger his gurds and pent: Sa, what merit in thy and chief do themselves fare; But now comme thoughts as
Hey, that doesn't look so bad! The problem now is that it's plagiarizing. Bits and pieces are obviously parts of the sonnets, ripped completely from the text. This becomes even more obvious if you increase the epochs further - say, to 60. Here's some of the text generated at that point:
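I spotted the plagiarism by eye, but it could also be put into a number. The sketch below counts what fraction of the four-word runs in a generated passage also appear word-for-word in the source text. This is a quick check I'm suggesting here, not something from the tutorial, and the names `ngrams` and `copied_fraction` are made up for this example.

```python
def ngrams(text, n):
    """Return every run of n consecutive words in the text, lowercased."""
    words = text.lower().split()
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]


def copied_fraction(generated, corpus, n=4):
    """Fraction of n-word runs in `generated` that appear verbatim in `corpus`."""
    corpus_ngrams = set(ngrams(corpus, n))
    generated_ngrams = ngrams(generated, n)
    if not generated_ngrams:
        return 0.0  # text shorter than n words
    copied = sum(1 for gram in generated_ngrams if gram in corpus_ngrams)
    return copied / len(generated_ngrams)


source_line = "The bloody spur cannot provoke him on"
print(copied_fraction("The bloody spur cannot sing", source_line))
```

A score near 1.0 would mean the model is mostly stitching together lines it memorized, while a score near 0.0 would mean it's producing word sequences that never occur in the sonnets.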
Death to me subscribes, Since, spite of him, I'll live in this poor rime, Whilst I, whom fortune wild with all hearts, Which I by lacking have supposed dead. CVII Yet which alters when it alteration find Than this give life that himade from thee: The bloody spur cannot provoke him on, That sometimes of place, Beauty no vend of praise away, dear friend, and I assure ye, Even that thou not farther than my thoughts canst to score, The dost sweet in some in their rooter thanough not so bright As those gold candles fix'd in heaven's ase, If thou shy self this book, this lest in love call; All mine was thine, before thou hadst this more. Then, if for my love, though mounted on the wind, In wing world beisaye me gof love strange; They are but dressings of a former child! O!
Okay, this is pretty interesting, but the lines are ripped straight from the text and then reassembled. Why is that? Well, I think it has to do with how long my "chunks" are - that is to say, what the model considers a learnable piece. My model isn't word-based, it's character-based. To fix this, I'll need to tinker with how many characters go into each chunk to avoid plagiarism. That said, by this point we have a loss of 0.15, which gives us something that is genuinely readable for the vast majority of the generated text.
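For anyone wondering what a "chunk" actually is here, this is a minimal sketch of how a character-level model's training examples are typically cut up: the text is sliced into fixed-length pieces, and for each piece the target is the same piece shifted one character to the right. The function name `make_training_pairs` and the parameter `seq_length` are names I've picked for illustration, and the tutorial's own code may differ in the details.

```python
def make_training_pairs(text, seq_length):
    """Split text into (input, target) chunks; each target is its input shifted by one character."""
    pairs = []
    for start in range(0, len(text) - seq_length, seq_length):
        chunk = text[start:start + seq_length + 1]  # one extra character for the shift
        pairs.append((chunk[:-1], chunk[1:]))
    return pairs


for inputs, targets in make_training_pairs("That time of year", seq_length=4):
    print(repr(inputs), "->", repr(targets))
```

With a long chunk, the model sees a big stretch of a sonnet at once and is rewarded for reproducing it exactly, which is my guess as to why the output started to look copied.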
I stepped down the "sequence length", and the results were as follows. For this test, I ran the same 60 epochs; the "sequence length" (the number of characters per chunk) was the only variable changed.
Then, let your self in your decay With means more blessed than my the sun, Strikes each in each by mutual ordar not beauty's name; But now converted at first- Where I may not him grace the day? Whence hast thou this becoming of things in in that bosom sits That on himself so clear, Yet then him with it thee again. He lends thee virtue, and he st love, Thou canst not to give away. LXXIV Why is my verse stanst even by the self-same sky, Vaunt in their youthful sap, at pleasure be it ill or thy Muse, And therefore are featts on them fir stol'n thy hair; The region cloud hath mask'd his beauty should look so. XCII But wherefore do not you a might. XXVIII O thou, my lovely kning, Or ssain what ston'd born of love? Then, gentle cheater, urge not my amiss, Les men'
Yeah, okay - that's not great, but it's plagiarizing slightly less. At this point, I'm about ready to give up on neural network text generation for a bit, given that I feel I've hit a roadblock on what it can and cannot do.
Conclusions
This is a really interesting project for me to work on. Even with my very basic understanding of neural networks, I am enrapt by the idea of building a thing that can spit out new poetry, prose, or limericks. It'll take more time for me to fully engage with this medium, but I've enjoyed it thus far.
In reality, I wanted to put out a new blog post. I put this together over an afternoon, and might come back in the near future to update it or simply create a follow-up post showing where I've gone with this. At this time, my project has morphed into using Markov chains for text generation. It's both easier (on my computer) and more sensible for text generation.
I'll be making a few more blog posts in the coming weeks/months, and each of them will have a corresponding GitHub link for those who want to see how I do what I do. Thanks for reading!