John Mueller’s Site No Longer In Google Search

Everyone in the SEO community is talking about how John Mueller’s website is no longer listed in the Google Search results. It seems it was deindexed and removed from Google completely. The question is why? John Mueller is one of the most recognized Google Search spokespeople on the internet, so when his site, johnmu.com, which covers a ton of geeky topics, stops showing up in the Google Search results, SEOs wonder…

If you do a site command for [site:johnmu.com] – no results come up:

[Image: Johnmu Site Google Deindexed]

Of course, the first thing every SEO does is inspect johnmu.com/robots.txt – so have fun going through that. Then you look for meta tags that might prevent the site from being indexed.

We do not have access to John’s Google Search Console to see if there was a manual action, like so many other sites received last week, but I doubt his site was hit by one…

I spotted this via:

Here is how John has responded to the reaction so far:

Here is Fabrice Canel from the Bing team – how funny:

What do you all think is going on? It seems John is having fun with this one…

Be nice please.

Forum discussion at X.

Update: John posted more details on LinkedIn. He wrote:

My site’s robots.txt file was making the rounds. It’s awkward – isn’t a robots.txt file a bit like a website’s underwear? I would have put on a clean file if I had known.

But, what’s up with the file? And why is your site deindexed?

Someone suggested it might be because of the links to Google+. It’s possible. And back to the robots.txt… it’s fine – I mean, it’s how I want it, and crawlers can deal with it. Or, they should be able to, if they follow RFC9309.

The comment on top – this is of course for you, and a way of catching a hard-to-spot mistake: a double UTF BOM. Specific text file types have a special starting character. Having one is fine, you usually don’t need it. Most systems (browsers, editors) hide it. For robots.txt if you have a directive on top, and you have an accidental *second* BOM, then that will be seen as part of the directive, and the directive won’t be processed. Having a comment on top means that in the worst case, the comment will be ignored. That’s fine. You could also just have a blank line on top. Or make a clean robots.txt file. Anyway, this is a post, not a cop.

“disallow: /robots.txt” – does this make robots spin in circles? Does this deindex your site? No. My robots.txt file just has a lot of stuff in it, and it’s cleaner if it doesn’t get indexed with its content. This purely blocks the robots.txt file from being crawled for indexing purposes. I could also use the x-robots-tag HTTP header with noindex, but this way I have it in the robots.txt file too.

The length. JOHN. WHAT’S UP WITH THE SIZE OF THIS FILE? I am purposely refraining from making any jokes, do not think them in your head. This is Linkedin, we’re here for srs bzns, folks. NO JOKES. The size comes from tests of the various robots.txt testing tools that my team & I have worked on. The RFC says a crawler should parse at least 500 kibibytes (bonus likes to the first person who explains what kind of snack that is). You have to stop somewhere, you could make pages that are infinitely long (and I have, and many people have, some even on purpose). In practice what happens is that the system that checks the robots.txt file (the parser) will make a cut somewhere. I added a “disallow: /” on top of that section, so hopefully that gets picked up as a blanket disallow. It’s possible that the parser will cut off in an awkward place, like a line that has “allow: /cheeseisbest” and it stops right at the “/”, which would put the parser at an impasse (and, trivia! the allow rule will override if you have both “allow: /” and “disallow: /”). This seems very unlikely though. In practice, parsers that have to go through this will send me lightning bolts with their robot eyes. And stop crawling, if they’re polite. There are a lot of crawlers that are impolite or that put on masks when they crawl, that’s a topic for another day though.

There you have it – some robots.txt quirks – now leave my robots.txt alone 🙂

And, what’s your favorite web quirk?
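
To make the double-BOM point from John's post concrete, here is a minimal Python sketch. It assumes a parser that strips a single leading UTF-8 BOM and nothing more, which is what Python's `utf-8-sig` codec does. The file contents and helper names are made up for illustration and are not taken from johnmu.com or from any real crawler.

```python
import codecs

BOM = codecs.BOM_UTF8  # the three bytes EF BB BF


def count_leading_boms(raw: bytes) -> int:
    """Count consecutive UTF-8 BOMs at the very start of the file."""
    count = 0
    while raw.startswith(BOM, count * len(BOM)):
        count += 1
    return count


def first_line_after_one_bom(raw: bytes) -> str:
    """Return the first line as seen by a parser that strips only one BOM."""
    text = raw.decode("utf-8-sig")  # removes at most one leading BOM
    lines = text.splitlines()
    return lines[0] if lines else ""


# Hypothetical file: an accidental second BOM sits right before a directive.
double_bom = BOM + BOM + b"user-agent: *\ndisallow: /private\n"

# Same accident, but with a comment as the first line.
with_comment = BOM + BOM + b"# hello\nuser-agent: *\ndisallow: /private\n"

print(count_leading_boms(double_bom))
print(repr(first_line_after_one_bom(double_bom)))
print(repr(first_line_after_one_bom(with_comment)))
```

In the first file the leftover BOM character is glued to `user-agent`, so a strict parser would no longer recognize that line as a directive. In the second file the leftover character only damages the comment line, and the directives below it stay intact.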

John implied the site should come back quickly. He wrote, “I used the Search Console tool to try something out. I might make a fast recovery if I hit the right button :-).” So the pages are still in the index but hidden, as when you use the URL removal tool.
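
Going back to the size and precedence quirks in John's post, the sketch below shows how a cut-off file can change the outcome. It is a simplified illustration, not Google's parser: it ignores user-agent groups and wildcards, and the function names are invented. It uses only two rules from RFC 9309: the longest matching path wins, and on a tie between allow and disallow, allow is used.

```python
MAX_BYTES = 500 * 1024  # 500 kibibytes, the minimum parsing limit in RFC 9309


def truncate(raw: bytes, limit: int = MAX_BYTES) -> bytes:
    """Keep only the first `limit` bytes, as a size-limited parser might."""
    return raw[:limit]


def parse_rules(raw: bytes) -> list:
    """Collect (kind, path) pairs; user-agent groups are ignored in this sketch."""
    rules = []
    for line in raw.decode("utf-8", errors="replace").splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        key, sep, value = line.partition(":")
        key, value = key.strip().lower(), value.strip()
        if sep and key in ("allow", "disallow") and value:
            rules.append((key, value))
    return rules


def is_allowed(path: str, rules: list) -> bool:
    """Longest matching prefix wins; on equal length, allow wins."""
    best_len, allowed = -1, True  # no matching rule means allowed
    for kind, prefix in rules:
        if not path.startswith(prefix):
            continue
        if len(prefix) > best_len or (len(prefix) == best_len and kind == "allow"):
            best_len, allowed = len(prefix), kind == "allow"
    return allowed


# Hypothetical file, cut right after the "/" of the allow line.
full_raw = b"disallow: /\nallow: /cheeseisbest\n"
cut_raw = truncate(full_raw, full_raw.index(b"/cheeseisbest") + 1)

print(parse_rules(cut_raw))
print(is_allowed("/page", parse_rules(full_raw)))
print(is_allowed("/page", parse_rules(cut_raw)))
```

With the full file, `/page` is blocked by the blanket disallow. With the cut-off file, the last line reads `allow: /`, which ties with `disallow: /` and wins, so `/page` becomes allowed. That is the awkward case John describes, and a real cut at 500 KiB would need to land on exactly that byte for it to happen.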
