Interest in machine learning, a form of AI that allows computers to autonomously learn from data and make predictions, has skyrocketed within the world of digital marketing – and for good reason.
Machine learning tools are powerful: they can unlock hidden potential and improve a wide range of marketing activities and tasks. This includes generating high-quality content for websites (at scale, too), which is the focus of this article. Here’s everything you need to know.
How machine learning generates content
Machine learning can be used to analyse a website’s existing content and identify patterns in it. Those patterns can then be used to generate new content in the same tone and style.
This new content can then be used to enhance your website’s overall performance, whether you’re looking to generate more leads or improve how your users understand and interact with key areas on your site.
That doesn’t mean it’s good to go once it has been created. You will still need to go through the usual quality control measures to ensure it’s on brief, that it’s tightly written and that it has been edited and proofed to the highest editorial standards.
There are a number of different machine learning algorithms that can be used for this task. The most common is called a “neural network”.
Neural networks are designed to mimic the way the human brain learns. They are particularly good at identifying patterns in data.
How this works in practice
Let’s say you have a website about dogs. A neural network could be used to analyse all of the content on your website, including blogs and product descriptions.
From this data, the neural network then identifies patterns related to dogs, such as common breed names and adjectives used to describe dogs.
Armed with this information, the machine learning algorithm is then able to generate new content that replicates the style and tone of your existing content.
This can strengthen the performance of your website by giving you more optimised content to rank with – the kind that search engines crawl to better understand the context and hierarchy of important pages.
This is just one simple example – there are many different ways that machine learning can be used to generate optimised copy for websites.
How do these algorithms work?
While many approaches to creating machine-generated content exist, traditional models leverage neural networks. More specifically, these are a type of neural network known as a recurrent neural network (RNN).
RNNs are designed to process language as a sequence, one element at a time. That is, they take in a sequence of words or characters, and output a prediction for the next word or character in the sequence. They do this by internally representing the sequence as a set of relationships between the words or characters.
For example, consider this sentence:
The cat sat on the mat
An RNN would internally represent this as:
The -> cat
cat -> sat
sat -> on
on -> the
the -> mat
It would then use this representation to predict the next word or character in the sequence. In this case, it would be a punctuation mark – a full stop.
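The word-to-word view above can be sketched in a few lines of Python. This is only an illustration of the idea: a real RNN stores these relationships as learned numeric weights rather than an explicit list of pairs, and the helper name `word_pairs` is ours, not part of any library.

```python
def word_pairs(sentence):
    """Return each word paired with the word that follows it."""
    words = sentence.lower().split()
    # zip the word list against itself shifted by one position
    return list(zip(words, words[1:]))


pairs = word_pairs("The cat sat on the mat")
for current_word, next_word in pairs:
    print(f"{current_word} -> {next_word}")
```

Note that “the” appears twice with two different followers (“cat” and “mat”), which is exactly why a model needs more context than a single previous word to predict well.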
To train an RNN to generate text, we need a large corpus of text to feed into it. The RNN will then learn the relationships between the words or characters in the text and use these relationships to generate new text that mimics the relationships between words seen in the training data.
For example, a network trained on baking blogs might predict that the word that comes after “raspberry” will be “pie”.
Meanwhile, a network that has been trained on tech blogs might predict that the word that follows “raspberry” will be “pi”.
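To see how the training text shapes the prediction, here is a deliberately simplified sketch. It is not an RNN: it just counts which word most often follows another in a corpus. The two tiny corpora and the function names are made up for illustration, but the effect is the same one described above.

```python
from collections import Counter, defaultdict


def train_next_word_counts(corpus):
    """Count, for every word, which words follow it and how often."""
    counts = defaultdict(Counter)
    words = corpus.lower().split()
    for current_word, next_word in zip(words, words[1:]):
        counts[current_word][next_word] += 1
    return counts


def predict_next(counts, word):
    """Return the most frequent follower of `word`, or None if unseen."""
    followers = counts.get(word)
    if not followers:
        return None
    return followers.most_common(1)[0][0]


# Hypothetical miniature corpora standing in for baking and tech blogs
baking_corpus = "bake a raspberry pie today then serve the raspberry pie warm"
tech_corpus = "flash the raspberry pi image then boot the raspberry pi board"

baking_model = train_next_word_counts(baking_corpus)
tech_model = train_next_word_counts(tech_corpus)

print(predict_next(baking_model, "raspberry"))
print(predict_next(tech_model, "raspberry"))
```

The same input word yields a different prediction purely because of the text each model was trained on.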
Challenges with recurrent neural networks
It’s important to note that RNNs do come with some inherent shortfalls. In particular, they often struggle to generate long sequences of text without making errors.
This is because RNNs have to maintain a complex representation of the current output in memory.
As the sequence gets longer, this internal state becomes increasingly complex and difficult to maintain.
Imagine the network as an author writing a book. At the beginning, it can write freely and come up with new ideas. But as the plot progresses, the new writing must remain consistent with everything that has come before to avoid contradictions or plot holes.
Humans are generally quite good at this sort of memory problem, but representing this kind of memory mathematically is challenging. This means RNNs tend to be better with shorter sequences, where this isn’t such an issue.
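A rough NumPy sketch of a basic RNN makes the memory problem more concrete. The weights here are random and untrained, and the sizes are arbitrary assumptions. The point is only that everything the network has read so far must be squeezed into one fixed-size vector, however long the sequence gets.

```python
import numpy as np

rng = np.random.default_rng(seed=0)
vocab_size, hidden_size = 5, 8  # arbitrary sizes chosen for illustration

# Randomly initialised (untrained) weights, for illustration only
W_xh = rng.normal(scale=0.1, size=(hidden_size, vocab_size))
W_hh = rng.normal(scale=0.1, size=(hidden_size, hidden_size))
b_h = np.zeros(hidden_size)


def rnn_step(x_onehot, h_prev):
    """One step: mix the new input with the previous hidden state."""
    return np.tanh(W_xh @ x_onehot + W_hh @ h_prev + b_h)


def run_rnn(token_ids):
    """Feed a sequence of token ids through the RNN, one at a time."""
    h = np.zeros(hidden_size)
    for token_id in token_ids:
        x = np.zeros(vocab_size)
        x[token_id] = 1.0  # one-hot encode the current token
        h = rnn_step(x, h)
    return h


short_state = run_rnn([0, 1, 2])
long_state = run_rnn([0, 1, 2] * 100)

# Both summaries have the same size, whatever the sequence length
print(short_state.shape, long_state.shape)
```

A three-token sequence and a three-hundred-token sequence both end up summarised in the same eight numbers, so earlier details are progressively overwritten as the text grows.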
Overcoming barriers with long short-term memory networks
One way to address the issue of memory is to use a different type of RNN known as a long short-term memory network (LSTM).
LSTMs are designed specifically to deal with the problem of long-term dependencies in sequences. They do this by having an internal memory cell that can retain information for long periods of time.
LSTMs have proven to be much more effective at generating long sequences of text than traditional RNNs. However, they are also significantly more difficult to train. That’s why they’re not yet as widely used as standard RNNs.
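For the curious, the memory cell idea can be sketched in NumPy as follows. This is a minimal, untrained version of the standard LSTM update, with random weights and arbitrary sizes assumed for illustration. Libraries such as Keras implement the same idea with many refinements.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM step. W, U and b hold the four gates stacked together."""
    n = h_prev.shape[0]
    gates = W @ x + U @ h_prev + b        # shape (4 * n,)
    f = sigmoid(gates[0:n])               # forget gate: how much memory to keep
    i = sigmoid(gates[n:2 * n])           # input gate: how much new info to store
    o = sigmoid(gates[2 * n:3 * n])       # output gate: how much memory to expose
    g = np.tanh(gates[3 * n:4 * n])       # candidate values to write to memory
    c = f * c_prev + i * g                # updated memory cell
    h = o * np.tanh(c)                    # updated hidden state
    return h, c


rng = np.random.default_rng(seed=0)
input_size, hidden_size = 4, 3  # arbitrary sizes chosen for illustration
W = rng.normal(scale=0.1, size=(4 * hidden_size, input_size))
U = rng.normal(scale=0.1, size=(4 * hidden_size, hidden_size))
b = np.zeros(4 * hidden_size)

h = np.zeros(hidden_size)
c = np.zeros(hidden_size)
for _ in range(10):
    x = rng.normal(size=input_size)  # stand-in for an encoded character
    h, c = lstm_step(x, h, c, W, U, b)

print(h.shape, c.shape)
```

The key line is `c = f * c_prev + i * g`: when the forget gate is close to 1 and the input gate close to 0, the cell carries its contents forward almost unchanged, which is how information can survive across long stretches of text.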
At the bottom of this article you’ll find a Python code snippet in which we train an LSTM network to write speeches in the unique style of former US president Donald Trump.
It shows that, with relatively few lines of Python code, we can create a reasonable text generator given a substantial enough body of text.
You could try swapping out the Trump speeches for a different style of text, such as the entire catalogue of Shakespeare’s plays or Tarantino’s movies, to see how it performs.
Should I employ machine-generated content on my site?
Machine-generated content won’t be right for every business – at least for now. Before deciding on whether it’s currently worth your time and money, ask yourself the following questions.
How much content do I need?
If you have relatively few pages, the overhead of developing tailored machine learning models to fill them with content is unlikely to be as cost-effective as hiring a traditional copywriter.
How much content do I have?
If you’re relatively sparse on content from the outset, it will be hard for a machine learning model to pick up on your brand’s specific voice and tone.
To get to that point, create more content, and do so regularly. Then you can use machine learning to scale up production.
What is the quality of my existing content?
If you have a lot of low-quality, duplicate or thin content already on your site, it will be hard for a machine learning model to improve upon it.
In this case, it’s better to focus on improving your existing content before investing in machine-generated content.
Is my content time-sensitive?
If your content has a short shelf-life then it’s presently not worth your time and investment to develop models that will quickly become outdated.
Is my content niche?
If your content is very specific to your industry or product, it’ll be challenging to find enough similar articles for a machine learning model to learn from.
In this case, it is more effective to hire a traditional copywriter – especially one with expertise in your field.
How many different types of content do I need?
If you need a lot of different types of content – e.g. product descriptions, blogs, social media posts – it will be difficult to find one machine learning model that can generate all of them equally well.
You may need to look at different models for different types of content. This can be more time-consuming to develop and manage.
How important is quality control?
If you’re not comfortable with the idea of your website publishing content without a human reviewing it, you will need to identify internal or external resources to provide manual validation. This is highly recommended.
Key takeaways
Machine learning can be a powerful tool for generating high-quality copy for websites – for some businesses.
Whether or not it’s currently the right solution for you ultimately depends on your particular needs.
If, for example, you have a large amount of time-sensitive or niche content to create, it’s more effective to hire a traditional copywriter (and especially one who is a subject matter expert).
If you have a lot of content that needs to be generated quickly, well, machine-generated content could be a very good fit indeed.
It’s an effective way of helping your content team produce briefs and draft pages efficiently for editing (delivering significant time and cost savings).
Remember, all content needs reviewing, regardless of whether it has been created by a machine or a human. Editing and proofing copy is a fundamental part of the production process.
So don’t take shortcuts. Google, for example, has no time for them. The search giant will penalise websites that don’t have the appropriate quality control measures in place. And it will come down especially hard on those who use machine-generated content to “manipulate search rankings”.
It should be used in a positive, user-centric way. Do that, and the opportunities will be endless.
Try at home: Trump speech generator
To run this code yourself, you’ll need a Python code editor and the libraries NumPy, pandas and Keras installed in your local environment.
Alternatively, load up Google’s free online editor Colab to run this code without any setup.
You will also need the training corpus, which you can download here. Place this in the same directory as your Python file.
# Import required libraries
import numpy as np
from keras.models import Sequential
from keras.layers import Dense, Dropout, LSTM
from keras.utils import to_categorical

# Load the Trump speeches and convert them to lowercase
speeches = "trump_3.6.txt"
with open(speeches) as f:
    text = f.read().lower()

# Map every unique character to an integer, and back again
unique_chars = sorted(set(text))
char_to_int = {c: i for i, c in enumerate(unique_chars)}
int_to_char = {i: c for i, c in enumerate(unique_chars)}

# Build training pairs: 50 characters in, the next character out
seq_length = 50
X = []
Y = []
for i in range(len(text) - seq_length):
    sequence = text[i:i + seq_length]
    label = text[i + seq_length]
    X.append([char_to_int[char] for char in sequence])
    Y.append(char_to_int[label])

# Reshape to (samples, time steps, features) and scale to the 0-1 range
X_modified = np.reshape(X, (len(X), seq_length, 1))
X_modified = X_modified / float(len(unique_chars))
# One-hot encode the target characters
Y_modified = to_categorical(Y)

# Define the LSTM model
model = Sequential()
model.add(LSTM(300, input_shape=(X_modified.shape[1], X_modified.shape[2]),
               return_sequences=True))  # first LSTM layer
model.add(Dropout(0.2))
model.add(LSTM(300))  # second LSTM layer
model.add(Dropout(0.2))
model.add(Dense(Y_modified.shape[1], activation='softmax'))  # output layer
model.compile(loss='categorical_crossentropy', optimizer='adam')

# Fit the model
model.fit(X_modified, Y_modified, epochs=5, batch_size=1028)

# Pick a random seed sequence (copied so the training data is not modified)
start_index = np.random.randint(0, len(X) - 1)
new_string = list(X[start_index])
print("".join(int_to_char[value] for value in new_string))

# Generate 1,000 characters, one at a time
out_string = ""
for i in range(1000):
    x = np.reshape(new_string, (1, len(new_string), 1))
    x = x / float(len(unique_chars))
    # Predict the most likely next character
    pred_index = int(np.argmax(model.predict(x, verbose=0)))
    out_string = out_string + int_to_char[pred_index]
    # Slide the window: add the prediction and drop the oldest character
    new_string.append(pred_index)
    new_string = new_string[1:]

print(out_string)
Run the code above and share the speeches you generate with us.
Interested to see how machine-generated content could fit snugly into your business operations and support your marketing efforts? Get in touch today to set up a call.