On occasion, something may end up on your site that you later decide shouldn’t be there. Even if you delete or replace that asset (whether it be an image or an entire page), a copy of it may live on in Google's index for a long time.
If you want to make sure that an image or text from your site isn’t visible in Google, there are two steps to take to get that asset or snippet cleared from the search engine’s cache: 1) remove the asset from the site and 2) submit a URL removal request in Google Webmaster Tools.
You can’t simply tell Google to take anything you don’t like out of the index. Before it acts on a request to delete something, Google needs to know that the owner of the site really doesn’t want the content to be visible to search engines and has taken steps to prevent it from surfacing.
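There are three different ways you can meet this requirement: 1) remove the asset so that its URL returns a 404 (Not Found) or 410 (Gone) status, 2) block the URL in your robots.txt file, or 3) add a noindex robots meta tag to the page.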
Only the first option also keeps human visitors from seeing the content. The other two options are requests to robots not to include the files in their index. Not all robots honor those requests, and neither option prevents human visitors from seeing the content if they know the URL. Also, anyone can look inside a robots.txt file to see all the directories and files you don’t want to expose, so that’s not a great way to hide something you really don’t want anyone to see.
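To make the robots.txt point concrete, here is a minimal sketch that uses Python's standard urllib.robotparser module to check whether a set of rules blocks a crawler from a URL. The rules, paths, and example.org addresses are placeholders invented for illustration, not taken from a real site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt rules; the paths are placeholders for illustration.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Disallow: /images/old-logo.png
"""


def is_blocked(robots_txt: str, url: str, user_agent: str = "Googlebot") -> bool:
    """Return True if the robots.txt text asks user_agent not to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in (
        "http://www.example.org/images/old-logo.png",
        "http://www.example.org/private/draft.html",
        "http://www.example.org/about.html",
    ):
        state = "blocked" if is_blocked(ROBOTS_TXT, url) else "crawlable"
        print(url, state)
```

Note that the rules themselves spell out exactly which paths you would rather keep quiet, and they only bind crawlers that choose to obey them.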
If you don’t want users or search engines to be able to see or find a particular file on your site, then the best thing is to delete it outright and make sure that the server tells the browser the asset no longer exists (that is, it responds with a 404 or 410 status code).
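One way to confirm that the server really reports the asset as gone is to look at the HTTP status code it returns. The sketch below is an illustration using only Python's standard library; the function names are hypothetical, and it assumes the server answers HEAD requests the same way it answers normal page views.

```python
import urllib.error
import urllib.request

# 404 is "Not Found" and 410 is "Gone".
GONE_STATUSES = {404, 410}


def fetch_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status code the server sends for url (HEAD request)."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # urllib raises an exception for 4xx/5xx responses; the status code
        # on the exception is exactly the value we want to inspect.
        return error.code


def is_removed(status: int) -> bool:
    """True if the status says the asset no longer exists."""
    return status in GONE_STATUSES


# Example call (placeholder URL; substitute the asset you deleted):
#   is_removed(fetch_status("http://www.example.org/images/old-logo.png"))
```

A custom "page not found" page that is served with a 200 status would fail this check, even though it looks like an error page to a human visitor.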
Once you’ve deleted the asset from your site, you can wait for Googlebot to recrawl your site and discover the asset is gone. Eventually it’ll disappear from the cache.
But what if you can’t wait that long?
Luckily, you can speed things along by logging into your Google Webmaster Tools (GWT) account and filing a removal request. In our experience, you can get pages removed from Google’s cache in a matter of hours, rather than the days or weeks it could take if you simply waited. This tool is just one of the many reasons why I highly recommend that every site owner register a GWT account.
To initiate a removal request:
If you’re trying to get all references to a single undesirable asset removed, pick the first option.
Your URL removal request will now show up in the “pending” list until the request has been processed. Check back in a couple of hours to see if the request was approved, and then go to Google to make sure the page or image is indeed gone. (It’s important to understand that this is an automated process, not a manual review, so be sure to select the appropriate options and follow the instructions.)
Diagram: Google’s Webmaster Tools console
(1) Site Configuration Tools: Crawler Access
(2) Remove URL tab
(3) “New removal request” button
As I mentioned before, this process is not for requesting that Google remove something that’s actually on someone else’s site (unless that asset already meets the requirement about returning a 404/410 or is blocked from crawlers).
If you’re still seeing your images in Google’s search index after you’ve successfully followed all the steps above, you may actually be seeing a copy on a different site, or you may not have submitted all the right URLs. With web pages, it’s easy to see where Google is pulling the text from (just check the URL below the snippet in the main search results); make sure that what you’re looking at is on your site. To see the originating location for an image that’s cached in Google Image search, click the thumbnail to open the cached image. Then, on the right side of the pane, click “full size image” to see the full URL and file name of that asset. If that’s your file, the process outlined above should work.
If the image is hosted not on your site but someplace else, then you’ll need to work with the owner of that site to have them remove it.
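If you have a batch of image URLs to sort through, a small script can separate the ones on your own host from the ones hosted elsewhere. This is only a sketch: the function name and the example.org and example.net hosts are hypothetical, and it assumes that subdomains of your host count as your site.

```python
from urllib.parse import urlparse


def is_hosted_on(url: str, site_host: str) -> bool:
    """Return True if url's hostname is site_host or one of its subdomains."""
    host = (urlparse(url).hostname or "").lower()
    site_host = site_host.lower()
    return host == site_host or host.endswith("." + site_host)


if __name__ == "__main__":
    # Placeholder URLs, as you might copy them from the "full size image" link.
    for url in (
        "http://www.example.org/images/old-logo.png",
        "http://images.example.net/cache/old-logo.png",
    ):
        owner = "your site" if is_hosted_on(url, "example.org") else "someone else"
        print(url, "->", owner)
```

Only the URLs that land on your own host are candidates for the removal request described above.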
One common case is a site that generates a feed that goes directly into Merlin. When you publish assets for consumption by Merlin, the system does not copy the entire page, but it does copy your images and store them on a server so that it can resize them and make them available in places like the PBS homepage and topics pages.
If you need to change/remove an image from the Google cache that is coming from a Merlin-fed page, here are the steps you take:
If you change your mind later, you can always republish the URL or image and then click the “reinclude” link in GWT to request a recrawl and reinclusion of that specific asset.
Let us know if you have any questions about these instructions or if they don’t work for you.