Removing my own content from Google
To remove content or prevent search engines from crawling content on your site, you will need to use one of the following:
* A robots.txt file. A robots.txt file restricts access to your site by search engine robots that crawl the web. (Note, however, that while Google won't crawl or index the content of pages blocked by robots.txt, we may still index the URLs if we find them on other pages on the web.) To use a robots.txt file, you'll need access to the root directory of your server, because the file must be served from the top level of your site. More information about creating a robots.txt file.
* A noindex meta tag. When we see a noindex meta tag on a page, Google will completely drop the page from our search results, even if other pages link to it. If the content is currently in our index, we will remove it the next time we crawl it. The meta tag allows you to control access on a page-by-page basis, which is useful if you don't have access to the root directory of your server. (You'll need to be able to edit the source HTML of your page.)
If you do not control the content you want removed, see Removing someone else's content from search results.
What do you want to remove?
My entire site or directory
To prevent robots from crawling your site, add the following directive to your robots.txt file:
User-agent: *
Disallow: /
To prevent just Googlebot from crawling your site in the future, use the following directive:
User-agent: Googlebot
Disallow: /
Each port must have its own robots.txt file. In particular, if you serve content via both http and https, you'll need a separate robots.txt file for each of these protocols. For example, to allow all robots to crawl all http pages but no https pages, you'd use the robots.txt directives below.
For the http protocol (http://yourserver.com/robots.txt):
User-agent: *
Allow: /
For the https protocol (https://yourserver.com/robots.txt):
User-agent: *
Disallow: /
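The effect of the directives above can be checked locally before you deploy them. The following is a minimal sketch, assuming Python's standard urllib.robotparser module as a stand-in for a crawler. It is not how Googlebot itself evaluates rules, and the bot name OtherBot and the page URL are hypothetical.

```python
from urllib.robotparser import RobotFileParser


def allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# The three rule sets shown in this section.
BLOCK_ALL = "User-agent: *\nDisallow: /\n"
BLOCK_GOOGLEBOT = "User-agent: Googlebot\nDisallow: /\n"
ALLOW_ALL = "User-agent: *\nAllow: /\n"

# Hypothetical URL used only for illustration.
page = "http://yourserver.com/index.html"

print(allowed(BLOCK_ALL, "Googlebot", page))        # every robot is blocked
print(allowed(BLOCK_GOOGLEBOT, "Googlebot", page))  # Googlebot is blocked
print(allowed(BLOCK_GOOGLEBOT, "OtherBot", page))   # other robots are not
print(allowed(ALLOW_ALL, "Googlebot", page))        # nothing is blocked
```

Because each protocol and port serves its own robots.txt file, a check like this evaluates one file at a time; run it once against the http file and once against the https file.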
A web page
To prevent all robots from indexing a page on your site, use a noindex meta tag. Place the following into the <head> section of your page:
<meta name="robots" content="noindex">
To allow other robots to index the page on your site, preventing only Google's robots from indexing the page:
<meta name="googlebot" content="noindex">
Note that because we have to crawl your page in order to see the noindex meta tag, there's a small chance that Googlebot won't see and respect the noindex meta tag. If your page is still appearing in results, it's probably because we haven't crawled your site since you added the tag. (Also, if you've used your robots.txt file to block this page, we won't be able to access this page and see the tag.)
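If a page keeps appearing in results, it can help to confirm that the tag is actually present in the HTML that the server returns. The following is a rough sketch using Python's standard html.parser module; the class and function names are illustrative, and the parsing is deliberately simple rather than a model of how any crawler reads the page.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect directives from robots-style meta tags, keyed by crawler name."""

    def __init__(self):
        super().__init__()
        self.directives = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {key.lower(): (value or "") for key, value in attrs}
        name = attr.get("name", "").strip().lower()
        if name in ("robots", "googlebot"):
            # A content attribute may hold several comma-separated directives.
            values = {v.strip().lower() for v in attr.get("content", "").split(",")}
            values.discard("")
            self.directives.setdefault(name, set()).update(values)


def has_directive(html: str, directive: str, crawler: str = "robots") -> bool:
    """Return True if the page carries the directive for the given crawler name."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return directive in parser.directives.get(crawler, set())


# Hypothetical page source used only for illustration.
sample = '<html><head><meta name="robots" content="noindex"></head><body>Hi</body></html>'
print(has_directive(sample, "noindex"))
print(has_directive(sample, "noindex", crawler="googlebot"))
```

The same check works for other directives carried in these meta tags, such as noarchive.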
An image
To remove an image from Google's image index, add a directive to your robots.txt file. For example, if you want Google to exclude the dogs.jpg image that appears on your site at www.example.com/images/dogs.jpg, add the following:
User-agent: Googlebot-Image
Disallow: /images/dogs.jpg
To remove all the images on your site from our index, add the following directive to your robots.txt file:
User-agent: Googlebot-Image
Disallow: /
Disallow patterns may include "*" to match any sequence of characters, and patterns may end in "$" to indicate the end of the URL. To remove all files of a specific file type (for example, to include .jpg but not .gif images), use the following robots.txt entry:
User-agent: Googlebot-Image
Disallow: /*.gif$
By specifying Googlebot-Image as the User-agent, the images will be excluded from Google Image Search. If you would like to exclude the images from all Google searches (including Google web search and Google Images), specify User-agent Googlebot.
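To see how the "*" and "$" pattern rules described above behave, the sketch below translates a Disallow pattern into a regular expression and tests a few paths against it. This is an illustration of the matching rules as stated in this section, not Google's implementation; the function names and sample paths are assumptions.

```python
import re


def pattern_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate a Disallow pattern into a compiled regular expression.

    "*" matches any sequence of characters; a trailing "$" anchors the
    pattern to the end of the path. Without "$", the pattern is a prefix match.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal pieces and join them with ".*" where "*" appeared.
    body = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    return re.compile(body + (r"\Z" if anchored else ""))


def is_disallowed(pattern: str, path: str) -> bool:
    """Return True if path falls under the Disallow pattern."""
    # re.match anchors at the start of the path, as robots.txt rules do.
    return pattern_to_regex(pattern).match(path) is not None


print(is_disallowed("/*.gif$", "/images/dogs.gif"))
print(is_disallowed("/*.gif$", "/images/dogs.jpg"))
print(is_disallowed("/images/dogs.jpg", "/images/dogs.jpg"))
```

Under these rules, a path with anything after the extension, such as a query string, is not matched by a pattern that ends in "$".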
A cached page
Google automatically takes a "snapshot" of each page it crawls and archives it. This "cached" version allows a webpage to be retrieved for your end users if the original page is ever unavailable (due to temporary failure of the page's web server). The cached page appears to users exactly as it looked when Google last crawled it, and we display a message at the top of the page to indicate that it's a cached version. Users can access the cached version by choosing the "Cached" link on the search results page.
Before you begin, the page owner must have done one of the following:
* To update the cached version of a page, change the content of the page. The next time Google crawls the page, we'll update the cached version.
* To remove cached versions of a page from Google's index and prevent Google from caching the page in the future, you must add a noarchive meta tag to that page. The next time we crawl that page, we'll see the tag and remove the cached version.
Once this is complete, you can use the URL removal tool in Webmaster Tools to request expedited removal of the current cached content until Google crawls and caches the new version of the page.
In the URL removal tool, you may be asked to specify the search query that returns the cached page you want removed. None of the words in the search query should appear anywhere on the live page. (You don't need to include common words such as "and", "the", etc.)
For example, if you want to remove a cached page containing the words "Susan's cats are ugly hairballs", and the page still contains the words "Susan's cats are beautiful puffballs", a cache removal request for "Susan's cats are ugly" will be unsuccessful (because the terms "Susan's cats are" remain on the page).
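The rule above can be checked before you submit a request. The following is a minimal sketch: it lists the query words that still appear on the live page. The tokenization and the COMMON_WORDS set are assumptions made for illustration (the only common words named here are "and" and "the"), so the removal tool's own matching may differ.

```python
import re

# Assumed set of ignorable common words; only "and" and "the" are named above.
COMMON_WORDS = {"and", "the"}


def words(text: str) -> set:
    """Split text into lowercase words, dropping the assumed common words."""
    return set(re.findall(r"[a-z0-9']+", text.lower())) - COMMON_WORDS


def words_still_on_page(query: str, live_page_text: str) -> set:
    """Return the query words that also appear on the live page."""
    return words(query) & words(live_page_text)


live_page = "Susan's cats are beautiful puffballs"

# Non-empty result: these words remain on the page, so the request would fail.
print(words_still_on_page("Susan's cats are ugly", live_page))

# Empty result: no query word remains on the live page.
print(words_still_on_page("ugly hairballs", live_page))
```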
To prevent all search engines from showing a "Cached" link for your site, place this tag in the <head> section of your page:
<meta name="robots" content="noarchive">
To prevent only Google from displaying one, use the following tag:
<meta name="googlebot" content="noarchive">
Note: Using a noarchive meta tag removes only the "Cached" link for the page. Google will continue to index the page and display a snippet.