Search Engines Index
Ever wondered how
a search engine's index can identify your
eBusiness website as being suitable for inclusion in its database?
Well here is a brief explanation. Study it; it will help you formulate
the right kind of content for your web pages.
When you sit at your
computer and search (as an example) on Google, you're presented with a
list of results from all over the web. How does Google use its search
engine index to find web pages matching your query, and determine the order
of search results?
In simple terms, you can
compare the task of searching the web to looking in a large
book with an index that tells you where everything is located. When you
search Google, their programs check their index to determine the most
relevant search results to be returned ("served") to you.
How Search Works
The three processes
for delivering search results:
Crawling: Does Google know
about your eBusiness site? Can they find it?
Indexing: Can the search engine
index your eBusiness site?
Serving: Does the site have
good, useful content that is relevant to the user's query?
Crawling is the process
by which Googlebot discovers new and updated ebusiness pages to be
added to the Google index.
Google uses a set of
computers to fetch (or "crawl") billions of pages on the web. The
program that does the fetching is called Googlebot (also known as a
robot, bot, or spider). Googlebot uses an algorithmic process: computer
programs determine which eBusiness sites to crawl, how often, and how
many pages to fetch from each site.
Google's crawl process
begins with a list of web page URLs, generated from previous crawl
processes, and augmented with Sitemap data provided by webmasters. As
Googlebot visits each of these eBusiness websites it detects links on
each page and adds them to its list of pages to crawl. New eBusiness
sites, changes to existing eBusiness sites, and dead links are
noted and used to update the search engine's index.
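The crawl process described above can be sketched as a simple loop over a frontier of URLs: fetch a page, detect its links, and queue any pages not yet seen. The tiny in-memory "web" below is invented for illustration; a real crawler fetches over HTTP and also handles robots.txt, crawl budgets, and politeness delays.

```python
from collections import deque

# Hypothetical site for illustration: URL -> links found on that page.
WEB = {
    "https://example.com/":         ["https://example.com/products",
                                     "https://example.com/about"],
    "https://example.com/products": ["https://example.com/products/widget"],
    "https://example.com/about":    ["https://example.com/"],
    "https://example.com/products/widget": [],
}

def crawl(seed_urls):
    frontier = deque(seed_urls)   # pages waiting to be crawled
    crawled = set()               # pages already fetched
    while frontier:
        url = frontier.popleft()
        if url in crawled:
            continue
        crawled.add(url)          # "fetch" the page
        # Detect links on the page and queue unseen ones.
        for link in WEB.get(url, []):
            if link not in crawled:
                frontier.append(link)
    return crawled

print(len(crawl(["https://example.com/"])))  # 4 — all pages reachable from the seed
```

This is why internal linking matters: a page no other page links to, and which is not in your Sitemap, never enters the frontier.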
Google doesn't accept
payment to crawl a site more frequently, and they keep the search side
of their business separate from their revenue-generating AdWords program.
Googlebot processes each of the
pages it crawls in order to compile a massive index of all
the words it sees and their location on each page. In addition, they
process information included in key content tags and attributes, such
as Title tags and ALT attributes. Googlebot can process many, but not
all, content types. For example, when indexing
eBusiness websites they cannot process the content of some rich media
files or dynamic pages.
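The "massive index of all the words it sees and their location on each page" is what is usually called an inverted index. A minimal sketch, with invented pages and text:

```python
from collections import defaultdict

def build_index(pages):
    """Map each word to a list of (url, position) pairs."""
    index = defaultdict(list)
    for url, text in pages.items():
        for pos, word in enumerate(text.lower().split()):
            index[word].append((url, pos))
    return index

# Hypothetical pages for illustration.
pages = {
    "site.com/a": "buy handmade widgets online",
    "site.com/b": "widgets and gadgets reviewed",
}
index = build_index(pages)
print(index["widgets"])  # [('site.com/a', 2), ('site.com/b', 0)]
```

Answering a query then means looking up the query words in this map rather than scanning every page, which is what makes the book-index analogy apt.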
When a user enters a query,
Google's machines search the index for matching pages and
return the results they believe are the most relevant to the user.
Relevancy is determined by over 200 factors, one of which is the
PageRank for a given page. PageRank is a measure of the importance of a
page based on the incoming links from other pages. In simple terms,
each link to a page on your site from another site adds to your site's
PageRank. Not all links are equal: Google works hard to improve the
user experience by identifying spam links and other practices that
negatively impact search results. The best types of links are those
that are given based on the quality of your content.
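The idea that incoming links raise a page's score can be shown with a stripped-down PageRank iteration. The link graph below is invented; the damping factor of 0.85 is the commonly published value. This is only a sketch of the classic algorithm, not Google's actual ranking code.

```python
def pagerank(links, damping=0.85, iterations=50):
    """links: page -> list of pages it links to."""
    pages = list(links)
    rank = {p: 1.0 / len(pages) for p in pages}
    for _ in range(iterations):
        # Every page keeps a small base share, plus shares passed
        # along the links pointing at it.
        new_rank = {p: (1 - damping) / len(pages) for p in pages}
        for page, outlinks in links.items():
            if not outlinks:
                continue
            share = damping * rank[page] / len(outlinks)
            for target in outlinks:
                new_rank[target] += share
        rank = new_rank
    return rank

links = {  # A and C both link to B; B links back to A
    "A": ["B"],
    "B": ["A"],
    "C": ["B"],
}
rank = pagerank(links)
print(max(rank, key=rank.get))  # B — most incoming links, highest rank
```

Note that B's rank comes entirely from pages choosing to link to it, which is the sense in which "the best types of links are those that are given based on the quality of your content."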
Google's Related Searches,
Spelling Suggestions, and Google Suggest features are designed to help
users save time by displaying related terms, common misspellings, and
popular queries. Like the google.com search results, the
suggestions used by these features are automatically generated by
their web crawlers and search algorithms. They display these
suggestions only when they think they might save the user time. If an
eBusiness site ranks well for a keyword, it is because they have
algorithmically determined that its content is more relevant to the user's query.
So when search engine
spiders come looking for your site, just make sure
you are ready to oblige them so you can be placed in their search index.
Google's new search index:
Google has a new web indexing system called Caffeine.
Caffeine provides 50 percent fresher results for web searches than their last index, and it's the largest collection of web content they have ever offered. Whether it's a news story, a blog or a forum post, you can now find links to relevant content much sooner after it is published than was ever possible before.
When you search Google, you're not searching the live web.
Instead you're searching Google's index of the web which, like the list in the back of a book, helps you pinpoint exactly the information you need.
So why did Google build a new search indexing system? Content on the web is blossoming. It's growing not just in size and numbers: with the advent of video, images, news and real-time updates, the average webpage is richer and more complex. In addition, people's expectations for search are higher than they used to be.
Searchers want to find the latest relevant content and publishers expect to be found the instant they publish.
To keep up with the evolution of the web and to meet rising user expectations, Google built Caffeine.
Google's old index had several layers, some of which were refreshed at a faster rate than others; the main layer would update every couple of weeks.
To refresh a layer of the old index, they would analyze the entire web, which meant there was a significant delay between when they found a page and made it available to you.
With Caffeine, Google analyzes the web in small portions and updates its search index on a continuous basis, globally.
As they find new pages, or new information on existing pages, they add these straight to the index. That means you can find fresher information than ever before—no matter when or where it was published.
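The shift from batch rebuilds to continuous updating can be sketched as an index that folds each new or changed page in the moment it is processed, rather than waiting for a full re-analysis of the web. The class and page names below are invented for illustration.

```python
from collections import defaultdict

class IncrementalIndex:
    def __init__(self):
        self.index = defaultdict(set)   # word -> set of page URLs
        self.pages = {}                 # URL -> text last indexed

    def update_page(self, url, text):
        # Remove the page's previously indexed words, then add the new
        # ones, so a new or changed page is searchable immediately.
        for word in self.pages.get(url, "").lower().split():
            self.index[word].discard(url)
        self.pages[url] = text
        for word in text.lower().split():
            self.index[word].add(url)

idx = IncrementalIndex()
idx.update_page("news.example/post1", "caffeine launches today")
idx.update_page("news.example/post1", "caffeine launched yesterday")
print(sorted(idx.index["caffeine"]))   # ['news.example/post1']
print(sorted(idx.index["launches"]))   # [] — stale word removed on update
```

Under the old layered approach, both the new words and the removal of the stale ones would have waited for the next full rebuild of that layer.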
Caffeine lets Google index web pages on an enormous scale.
In fact, every second Caffeine processes hundreds of thousands of pages in parallel. If this were a pile of paper it would grow three miles taller every second. Caffeine takes up nearly 100 million gigabytes of storage in one database and adds new information at a rate of hundreds of thousands of gigabytes per day. You would need 625,000 of the largest iPods to store that much information; if these were stacked end-to-end they would go for more than 40 miles.
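The iPod comparison can be sanity-checked with some quick arithmetic, under two assumptions of mine (not stated in the post): the "largest iPod" of the day was the 160 GB iPod classic, roughly 10.4 cm long.

```python
# Sanity-check the storage figures quoted above.
storage_gb = 100_000_000   # ~100 million gigabytes in the index
ipod_gb = 160              # assumed capacity of the largest iPod (iPod classic)
ipod_len_m = 0.104         # assumed length of one iPod classic, in metres

ipods_needed = storage_gb / ipod_gb
stack_miles = ipods_needed * ipod_len_m / 1609.34  # metres per mile

print(f"{ipods_needed:,.0f} iPods")   # 625,000 iPods
print(f"{stack_miles:.0f} miles")     # 40 miles
```

Both numbers line up with the figures in the post, which suggests a 160 GB device was indeed the baseline.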
Google built Caffeine with the future in mind. Not only is it fresher, it's a robust foundation that makes it possible for them to build an even faster and more comprehensive search engine that scales with the growth of information online, and delivers even more relevant search results to you.
Posted by Carrie Grimes, Software Engineer