When we first talked about large-scale keyword research a few years ago, we were thinking in the hundreds of thousands of keywords. Now that we've refined our process, that scale keeps growing with each project. The Melt SEO team has just finished a project outputting over 2 million keywords, close to double the scale of our last one. But enough about scale. These new developments got us thinking about how we got here, and we thought we'd share a few quick lessons we've learned along the way.

Less-than-perfection in = garbage out

As any product manager will tell you, cleanliness is well above godliness when it comes to data inputs. But when we started developing our process, we figured there must be some margin for error in the input data for keyword research at huge scale. Not so. After a few initial attempts, we found that one of the most important parts of the process is defining a watertight seed keyword list that won't let any noise (i.e. irrelevant keywords) creep in. That's why close consultation at the outset has become a key part of our process.
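To make the idea concrete, here is a toy sketch of seed-based filtering. The function name, keywords and rules are ours for the example only, not our production logic, which is considerably more involved:

```python
# Minimal sketch of seed-based filtering: keep only candidate keywords
# that contain at least one seed phrase. Illustrative only.

def filter_by_seeds(candidates, seeds):
    """Return candidates that contain any seed phrase as a substring."""
    seeds = [s.lower().strip() for s in seeds]
    kept = []
    for kw in candidates:
        kw_norm = kw.lower().strip()
        if any(seed in kw_norm for seed in seeds):
            kept.append(kw)
    return kept

candidates = [
    "best running shoes for flat feet",
    "running shoes sale",
    "cheap flights to paris",   # noise: unrelated vertical
]
seeds = ["running shoes"]
print(filter_by_seeds(candidates, seeds))
# -> ['best running shoes for flat feet', 'running shoes sale']
```

The point is less the code than the principle: the filter is only as good as the seed list, which is why that list gets agreed up front.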

Python really is your friend

That heading was initially going to read "If you're doing SEO without Python, you're doing it wrong". But we realised that was probably going a bit far. There are plenty of SEO tasks and, sometimes, whole projects that you can carry out manually. But we've increasingly found that the day-to-day of SEO just isn't feasible without the massive time savings we get from automating tasks with Python scripts. There'll be more from us on this soon, but Search Engine Journal has a good article to get you started.
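As a taste of the kind of chore that scripting takes off your plate, here is a hypothetical example (the file format and function names are assumptions for illustration): merging keyword exports from several tools, normalising case and whitespace, and de-duplicating.

```python
# Illustrative automation: merge keyword exports from multiple CSV
# files, normalise them, and de-duplicate. Tedious by hand at scale,
# trivial in a script. File layout (keyword in first column) is assumed.
import csv

def load_keywords(paths):
    """Yield raw keyword strings from the first column of each CSV."""
    for path in paths:
        with open(path, newline="", encoding="utf-8") as f:
            for row in csv.reader(f):
                if row:
                    yield row[0]

def normalise(keyword):
    """Lowercase and collapse internal whitespace."""
    return " ".join(keyword.lower().split())

def dedupe(keywords):
    """Return unique normalised keywords, preserving first-seen order."""
    seen = set()
    out = []
    for kw in map(normalise, keywords):
        if kw and kw not in seen:
            seen.add(kw)
            out.append(kw)
    return out
```

Run over a couple of million rows, a script like this finishes in seconds; the manual equivalent doesn't bear thinking about.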

Not all keyword tools are created equal

Pretty much since keyword research was invented, Google has been placing more and more restrictions on the data available to SEOs through its own tools (like Keyword Planner). Accordingly, the industry has become increasingly reliant on alternative keyword tools to return keywords along with their search volumes and other associated data. And, like most SEOs (the good ones, anyway), we've found that there are keyword tools and there are keyword tools. Being able to tell the good ones from the rest is make or break for keyword research that delivers valuable insight cost-effectively for clients.

There are so many unknown unknowns

You know the Donald Rumsfeld quotation. It's often ridiculed, but it's just as often recognised as having a kernel of truth. And that's true of SEO. We tend to think we know a query space after a couple of keyword research projects in it. The truth is that those projects (your average traditional keyword research project covers a few thousand keywords) are heavily sampled and capture only a sliver of the broadest possible context of a search topic. When you're getting a million keywords from a project, on the other hand, you see patterns and whole behaviours you'd never seen a hint of before.

NLP is great – for certain things

Natural language processing (NLP) is a fundamental part of our process, and it’s completely revolutionised the way we do keyword research. But it’s not a silver bullet. For a while a few years ago, we thought it might be. Then we started working with it and we quickly understood the reasons for the hype – but also that, like most hyped things, the reality was different. When we started out developing this process, our hope was to create a machine learning tool that we could give a few inputs to, hit a button and get a ton of useful stuff back. In fact, we’ve found what lots of agencies and consultancies working with machine learning have found – that the real legwork is in defining the inputs and training machine learning models. But after a lot of that legwork we’ve got it working.
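To give a flavour of what one NLP-style step looks like, here is a deliberately toy version of keyword grouping: bucketing queries by a heuristic "head" token so related queries cluster together. Our actual pipeline uses trained models rather than this stopword heuristic; everything below is an illustration of the intuition, not our method.

```python
# Toy sketch of NLP-flavoured keyword grouping: cluster keywords on a
# heuristic head token. Real pipelines use trained models (embeddings,
# POS tagging); the stopword list and heuristic here are assumptions.
from collections import defaultdict

STOPWORDS = {"for", "the", "a", "to", "best", "cheap", "near", "me"}

def head_token(keyword):
    """Heuristic: last non-stopword token stands in for the topic head."""
    tokens = [t for t in keyword.lower().split() if t not in STOPWORDS]
    return tokens[-1] if tokens else ""

def cluster(keywords):
    """Group keywords by shared head token."""
    groups = defaultdict(list)
    for kw in keywords:
        groups[head_token(kw)].append(kw)
    return dict(groups)
```

Even this crude version hints at where the legwork goes: choosing the features and rules (or training data) that decide what counts as "the same topic" is the hard part, not pressing the button.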

Find out more about what you can actually do with our process in our recent case studies. Or get in touch with us directly.