
Search Engines Index

Ever wondered how a search engine's index identifies your eBusiness website as suitable for inclusion in its database? Here is a brief explanation. Study it; it will help you formulate the right kind of content for your web pages.

When you sit at your computer and search on Google (as an example), you're presented with a list of results from all over the web. How does Google use its search index to find web pages matching your query, and how does it determine the order of the search results?

In simple terms, you can compare searching the web to looking in a very large book with an index that tells you where everything is located. When you search Google, their programs check their index to determine the most relevant search results to be returned ("served") to you.

How Search Works

The three processes for delivering search results:
  • Crawling: Does Google know about your eBusiness site? Can they find it?
  • Indexing: Can search engines index your eBusiness site?
  • Serving: Does the site have good, useful content that is relevant to the user's search?

Crawling is the process by which Googlebot discovers new and updated ebusiness pages to be added to the Google index.

Google use a set of computers to fetch (or "crawl") billions of pages on the web. The program that does the fetching is called Googlebot (also known as a robot, bot, or spider). Googlebot uses an algorithmic process: computer programs determine which eBusiness sites to crawl, how often, and how many pages to fetch from each site.

Google's crawl process begins with a list of web page URLs, generated from previous crawl processes and augmented with Sitemap data provided by webmasters. As Googlebot visits each of these eBusiness websites, it detects the links on each page and adds them to its list of pages to crawl. New sites, changes to existing sites, and dead links are noted and used to update the search engine's index.
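To make the crawl loop described above concrete, here is a minimal Python sketch. It is not Googlebot's actual code; it only illustrates starting from a list of known URLs, following the links found on each page, and noting dead links. The `crawl` function, the `LinkExtractor` class, and the `FAKE_WEB` pages on example.test are hypothetical names made up for this illustration, and the "web" is an in-memory dictionary so the example runs without a network connection.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(seed_urls, fetch, max_pages=100):
    """Breadth-first crawl.

    `fetch(url)` is assumed to return the page's HTML as a string,
    or None when the link is dead.
    """
    frontier = deque(seed_urls)   # URLs waiting to be visited
    seen = set(seed_urls)         # URLs already queued, to avoid repeats
    pages, dead = {}, []
    while frontier and len(pages) + len(dead) < max_pages:
        url = frontier.popleft()
        html = fetch(url)
        if html is None:
            dead.append(url)      # note the dead link and move on
            continue
        pages[url] = html
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            link = urljoin(url, href)   # resolve relative links
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return pages, dead


# Hypothetical three-page site standing in for the live web.
FAKE_WEB = {
    "http://example.test/": '<a href="/a">A</a> <a href="/b">B</a>',
    "http://example.test/a": '<a href="/">home</a> <a href="/gone">old</a>',
    "http://example.test/b": "<p>No links here.</p>",
}

pages, dead = crawl(["http://example.test/"], FAKE_WEB.get)
print(sorted(pages))
print(dead)
```

In this toy run the crawler starts from a single seed URL, discovers the other two pages through links, and records the one link that leads nowhere. Deciding how often to revisit a site and how many pages to fetch from it, as described above, is left out of the sketch.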

Google doesn't accept payment to crawl a site more frequently, and they keep the search side of their business separate from their revenue-generating AdWords service.

Googlebot processes each of the pages it crawls in order to compile a massive index of all the words it sees and their location on each page. In addition, it processes information included in key content tags and attributes, such as Title tags and ALT attributes. Googlebot can process many, but not all, content types. For example, when search engines index eBusiness websites, they cannot process the content of some rich media files or dynamic pages.
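An index of "all the words it sees and their location on each page" is commonly called an inverted index. The sketch below shows the idea in Python under simplifying assumptions: pages are plain text rather than HTML, words are split on anything that is not a letter or digit, and the names `build_index`, `search`, and the two sample pages are invented for this example. It says nothing about how Google's real index is stored.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lower-case the text and split it into simple alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Map each word to {url: [positions of that word on the page]}."""
    index = defaultdict(lambda: defaultdict(list))
    for url, text in pages.items():
        for position, word in enumerate(tokenize(text)):
            index[word][url].append(position)
    return index


def search(index, query):
    """Return the URLs that contain every word of the query."""
    words = tokenize(query)
    if not words:
        return set()
    # index.get avoids creating empty entries for unknown words.
    matches = [set(index.get(word, {})) for word in words]
    return set.intersection(*matches)


# Hypothetical pages, already reduced to their visible text.
sample_pages = {
    "/shoes": "Red running shoes for sale",
    "/hats": "Red hats and blue hats",
}

index = build_index(sample_pages)
print(dict(index["hats"]))
print(search(index, "red hats"))
```

Looking up a word is then a dictionary access instead of a scan over every page, which is the whole point of building the index ahead of time.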

When a user enters a query, Google's machines search the index for matching pages and return the results they believe are the most relevant to the user. Relevancy is determined by over 200 factors, one of which is the PageRank of a given page. PageRank is a measure of the importance of a page based on the incoming links from other pages. In simple terms, each link to a page on your site from another site adds to your site's PageRank. Not all links are equal: Google works hard to improve the user experience by identifying spam links and other practices that negatively impact search results. The best types of links are those that are given based on the quality of your content.
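The idea that incoming links add to a page's importance can be illustrated with the simplified textbook form of PageRank, computed by repeated passes over a link graph. This is only a sketch: the damping factor of 0.85, the handling of pages with no outgoing links, and the tiny three-page graph are assumptions taken from the usual textbook treatment, not from this article, and Google's production ranking combines many more signals.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank by power iteration.

    `links` maps each page to the list of pages it links to.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}   # start with equal rank
    for _ in range(iterations):
        # Every page gets a small baseline share.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its rank evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank over all pages.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical graph: "a" and "b" both link to "c"; "c" links back to "a".
graph = {"a": ["c"], "b": ["c"], "c": ["a"]}
rank = pagerank(graph)
for page in sorted(rank, key=rank.get, reverse=True):
    print(page, round(rank[page], 3))
```

In this graph, page "c" ends up with the highest score because two pages link to it, while "b", which nothing links to, ends up with the lowest.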

Google's Related Searches, Spelling Suggestions, and Google Suggest features are designed to help users save time by displaying related terms, common misspellings, and popular queries. Like the google.com search results, the ebusiness search keywords used by these features are automatically generated by their web crawlers and search algorithms. They display these suggestions only when they think they might save the user time. If an ebusiness site ranks well for a keyword, it is because they have algorithmically determined that its content is more relevant to the user's query.

So when search engine spiders come looking for your site, make sure you are ready to oblige them so you can be placed in their index.


Google's New Search Index: Caffeine

Google have a new web indexing system called Caffeine.

Caffeine provides 50 percent fresher results for web searches than their last index, and it's the largest collection of web content they have ever offered. Whether it's a news story, a blog or a forum post, you can now find links to relevant content much sooner after it is published than was ever possible before.

When you search Google, you're not searching the live web. Instead you're searching Google's index of the web which, like the list in the back of a book, helps you pinpoint exactly the information you need. So why did Google build a new search indexing system? Content on the web is blossoming. It's growing not just in size and numbers: with the advent of video, images, news and real-time updates, the average webpage is richer and more complex. In addition, people's expectations for search are higher than they used to be.

Searchers want to find the latest relevant content and publishers expect to be found the instant they publish. To keep up with the evolution of the web and to meet rising user expectations, Google have built Caffeine. Google's old index had several layers, some of which were refreshed at a faster rate than others; the main layer would update every couple of weeks.

To refresh a layer of the old index, they would analyze the entire web, which meant there was a significant delay between when they found a page and when they made it available to you. With Caffeine, Google analyze the web in small portions and update their search index on a continuous basis, globally.

As they find new pages, or new information on existing pages, they add these straight to the index. That means you can find fresher information than ever before, no matter when or where it was published.
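The difference between rebuilding a whole layer and adding pages "straight to the index" can be sketched in a few lines of Python. The `IncrementalIndex` class below is a hypothetical illustration, not a description of how Caffeine is built: it updates the entries for one page at a time, so a changed page becomes searchable without reprocessing anything else.

```python
import re
from collections import defaultdict


class IncrementalIndex:
    """Word -> URLs index that is updated one page at a time."""

    def __init__(self):
        self.postings = defaultdict(set)   # word -> set of URLs
        self.words_by_url = {}             # url -> words currently indexed

    def update(self, url, text):
        """Index a new page, or re-index a page whose content changed."""
        new_words = set(re.findall(r"[a-z0-9]+", text.lower()))
        old_words = self.words_by_url.get(url, set())
        # Drop words that are no longer on the page.
        for word in old_words - new_words:
            self.postings[word].discard(url)
            if not self.postings[word]:
                del self.postings[word]
        # Add words that are new to the page.
        for word in new_words - old_words:
            self.postings[word].add(url)
        self.words_by_url[url] = new_words

    def lookup(self, word):
        """Return a copy of the set of URLs containing the word."""
        return set(self.postings.get(word.lower(), set()))


idx = IncrementalIndex()
idx.update("/news", "Storm warning issued")
idx.update("/blog", "Storm photos")
idx.update("/news", "Storm warning lifted")   # only this page is re-indexed

print(idx.lookup("lifted"))
print(idx.lookup("issued"))
print(sorted(idx.lookup("storm")))
```

Only the words that changed on the updated page are touched; the entry for the other page is left alone, which is what makes continuous updates cheap compared with a full rebuild.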

Caffeine lets Google index web pages on an enormous scale.

In fact, every second Caffeine processes hundreds of thousands of pages in parallel. If this were a pile of paper it would grow three miles taller every second. Caffeine takes up nearly 100 million gigabytes of storage in one database and adds new information at a rate of hundreds of thousands of gigabytes per day. You would need 625,000 of the largest iPods to store that much information; if these were stacked end-to-end they would go for more than 40 miles.
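As a quick sanity check on the iPod comparison, the figures quoted above can be divided out in Python. The only number not taken from the text is the standard conversion of 1,609.344 metres to the mile; the variable names are made up for this example.

```python
TOTAL_GB = 100_000_000       # "nearly 100 million gigabytes"
IPODS = 625_000              # number of iPods quoted above
STACK_MILES = 40             # "more than 40 miles"
METRES_PER_MILE = 1609.344   # standard conversion factor

gb_per_ipod = TOTAL_GB / IPODS
cm_per_ipod = STACK_MILES * METRES_PER_MILE * 100 / IPODS

print(gb_per_ipod)            # 160.0
print(round(cm_per_ipod, 1))  # 10.3
```

So the comparison implies a player holding 160 GB and measuring a little over 10 cm end to end.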

Google have built Caffeine with the future in mind. Not only is it fresher, it's a robust foundation that makes it possible for them to build an even faster and more comprehensive search engine that scales with the growth of information online and delivers even more relevant search results to you.

Posted by Carrie Grimes, Software Engineer
