How search engines read a web page, and how spiders read a website page.

How search engines read a webpage

Search engines collect data about a website by sending an electronic spider to visit the site and copy its content, which is then stored in the search engine’s database.

When you type the words you’re looking for into a search box, the engine tries to match your words with the words from webpages it has analyzed, and it then delivers a list of matches. The engine organizes that list from best to worst, ranking the results according to a variety of criteria.
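To make the match-and-rank step concrete, here is a minimal sketch in Python. It assumes a single toy criterion, how often the query words appear on a page, whereas real engines combine many criteria. The page names and the `rank` function are made up for illustration.

```python
import re


def tokenize(text):
    """Split text into lowercase words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def rank(pages, query):
    """Return the URLs of matching pages, best match first.

    The score is simply the number of query-word occurrences on the page,
    which is an illustrative assumption, not how any real engine ranks.
    """
    query_words = set(tokenize(query))
    scored = []
    for url, text in pages.items():
        score = sum(1 for word in tokenize(text) if word in query_words)
        if score:  # pages with no matching words are left out
            scored.append((score, url))
    # Highest score first; ties are broken alphabetically by URL.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [url for score, url in scored]


# Hypothetical pages the engine has already analyzed.
pages = {
    "a.html": "Fresh bread and more bread",
    "b.html": "Bread recipes",
    "c.html": "Garden tools",
}

print(rank(pages, "bread"))  # ['a.html', 'b.html']
```

The page about garden tools never appears in the list because none of its words match the query.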

Search engines do not see Web pages the way you do. They cannot process images and translate them into content. Instead, a search engine crawls your website by reading its code: the HTML that reaches the browser, whether written by hand or generated by ASP, PHP and other languages. A page made up mostly of images looks mostly blank to a search engine.
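The sketch below shows roughly what "mostly blank" means. It uses Python's standard `html.parser` module to keep only the text a simple crawler could read. Treating an image's `alt` attribute as readable text is an assumption added here, and both sample pages are invented.

```python
from html.parser import HTMLParser


class TextOnlyParser(HTMLParser):
    """Collects the text a simple crawler could read from an HTML page."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0  # greater than 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "img":
            # An image contributes only its alt text, if it has any.
            alt = dict(attrs).get("alt")
            if alt:
                self.chunks.append(alt.strip())

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def readable_text(html):
    """Return the readable words of a page as one string."""
    parser = TextOnlyParser()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


# Two made-up pages: one built from images only, one with real text.
image_page = '<html><body><img src="banner.jpg"><img src="menu.png"></body></html>'
text_page = (
    "<html><body><h1>Fresh bread</h1><p>Baked daily.</p>"
    '<img src="loaf.jpg" alt="A sourdough loaf"></body></html>'
)

print(repr(readable_text(image_page)))  # ''
print(readable_text(text_page))         # Fresh bread Baked daily. A sourdough loaf
```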

Sometimes, what you see as text on a page isn’t really text. Some people create Web page designs in an image editor and, instead of recreating the design as code, simply post the image so that it looks like an HTML page. A search engine cannot read the words inside such an image. Having a page on your site with lots of images, or with a lot of Flash animations, can be fine depending on your target customer.

When the search engine arrives at the website, it looks in the root (main) folder of the site for a file called robots.txt. In the robots.txt file it looks for which directories and files it is allowed to look at and index.
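As a rough illustration, Python's standard `urllib.robotparser` module can read robots.txt rules and answer "may I fetch this page?". The rules, the spider name `ExampleSpider` and the `www.example.com` address below are all made up for the example.

```python
from urllib.robotparser import RobotFileParser

# A made-up robots.txt that blocks one directory and one file for all spiders.
robots_txt = """\
User-agent: *
Disallow: /private/
Disallow: /tmp/draft.html
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())


def allowed(path, agent="ExampleSpider"):
    """Return True if the (hypothetical) spider may fetch the given path."""
    return parser.can_fetch(agent, "https://www.example.com" + path)


print(allowed("/index.html"))           # True
print(allowed("/private/report.html"))  # False
print(allowed("/tmp/draft.html"))       # False
```

A real spider would download the file from the site's root folder first. The rules are passed in as text here so the example runs without a network connection.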

Once the search engine spider (also called a robot or web crawler) finds a web page, it looks at the head section of the page (the tags between <head> and </head>) for:

The title of the page
The keyword and description meta tags
The robots meta tag
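The three items above can be pulled out of a page with a few lines of Python. This is only a sketch using the standard `html.parser` module. The sample page and the `read_head` helper are invented for illustration, and real spiders are far more tolerant of messy markup.

```python
from html.parser import HTMLParser


class HeadParser(HTMLParser):
    """Reads the title and selected meta tags from the head section."""

    def __init__(self):
        super().__init__()
        self.in_head = False
        self.in_title = False
        self.title = ""
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag == "head":
            self.in_head = True
        elif self.in_head and tag == "title":
            self.in_title = True
        elif self.in_head and tag == "meta":
            attributes = dict(attrs)
            name = (attributes.get("name") or "").lower()
            if name in ("keywords", "description", "robots"):
                self.meta[name] = attributes.get("content") or ""

    def handle_endtag(self, tag):
        if tag == "head":
            self.in_head = False
        elif tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.title += data


def read_head(html):
    """Return (title, meta) where meta maps tag names to their content."""
    parser = HeadParser()
    parser.feed(html)
    parser.close()
    return parser.title.strip(), parser.meta


# A made-up page used only for this example.
sample_page = (
    "<html><head><title>Job interview tips</title>"
    '<meta name="description" content="How to prepare for a job interview.">'
    '<meta name="keywords" content="job interview, interview tips">'
    '<meta name="robots" content="index, follow">'
    "</head><body><h1>Tips</h1></body></html>"
)

title, meta = read_head(sample_page)
print(title)           # Job interview tips
print(meta["robots"])  # index, follow
```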

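Some site owners use the robots meta tag to give search engine spiders (robots, web crawlers) page-level instructions that add to those in the robots.txt file, or, if they cannot create a robots.txt file, they place all their instructions for the spiders here. A spider can only read the tag on pages that robots.txt lets it fetch, so the meta tag cannot reopen a page that robots.txt blocks.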

If there is no robots.txt file in place and no robots meta tag in the pages the spider finds, it will follow and index all the links it finds.
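Here is a small sketch of that default. It reads the content of a robots meta tag and decides whether the page may be indexed and its links followed. The values `noindex`, `nofollow` and `none` are the commonly used ones and are not named in this article. The `robots_directives` function is hypothetical.

```python
def robots_directives(meta_content):
    """Return (may_index, may_follow) for the content of a robots meta tag.

    Pass None or an empty string when the page has no robots meta tag.
    In that case everything is allowed, which matches the default above.
    """
    tokens = {
        token.strip().lower()
        for token in (meta_content or "").split(",")
        if token.strip()
    }
    may_index = not ({"noindex", "none"} & tokens)
    may_follow = not ({"nofollow", "none"} & tokens)
    return may_index, may_follow


print(robots_directives(None))               # (True, True)
print(robots_directives("noindex, follow"))  # (False, True)
print(robots_directives("none"))             # (False, False)
```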

Search engines like Google read through all this information and create an index of it. The index allows a search engine to take a query from users and show all the pages on the web that match it.
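One common way to build such an index is an inverted index: a table that maps each word to the pages containing it. The sketch below is a toy version under that assumption. The page names and text are invented, and nothing here claims to be how Google stores its index.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Split text into lowercase words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Map every word to the set of page URLs that contain it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index


def search(index, query):
    """Return the pages that contain every word of the query."""
    words = tokenize(query)
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


# Hypothetical crawled pages.
pages = {
    "tips.html": "Job interview tips for your first interview",
    "bread.html": "Fresh bread baked daily",
    "jobs.html": "Find a job near you",
}
index = build_index(pages)

print(sorted(search(index, "job")))            # ['jobs.html', 'tips.html']
print(sorted(search(index, "job interview")))  # ['tips.html']
```

Looking a word up in the table is much faster than rereading every page for each query, which is why the index is built ahead of time.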

Even though people and search engines scan webpages differently, there are some similarities:

• Page title. Both people and search engines need to know at a glance what a page is about. The page title, sometimes called the <title> tag, is inserted in the code of a webpage. You’ll see it in the top bar of a Web browser.

• Headlines, emphasized words, and lists. Both people and search engines know that anything called out in headlines or subheadings, in boldface or italics, or in bulleted lists is likely to be important. Make sure headings, links, and lists in your Web copy are called out with HTML tags.

• Introduction and conclusion. Readers will scan your opening paragraph or your summary for quick information. And search engines, to understand what the subject of a page is, look for keywords throughout that page, including at the top (the introduction) and the bottom (the conclusion). But don’t just shove keywords into the top or the bottom of your page; distribute them evenly throughout.

• Related links. Humans appreciate options for more information. Search engines, too, like to see that you’ve linked to other websites and that other websites have linked to yours.
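To see which parts of a page are "called out with HTML tags", a short Python sketch can list the text found inside heading, emphasis, list and link tags. The set of tags and the sample markup are assumptions chosen for illustration.

```python
from html.parser import HTMLParser

# Tags treated as "calling out" text in this sketch (an illustrative choice).
CALLOUT_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6",
                "b", "strong", "i", "em", "li", "a"}


class CalloutParser(HTMLParser):
    """Collects (tag, text) pairs for text inside call-out tags."""

    def __init__(self):
        super().__init__()
        self.open_tags = []
        self.called_out = []

    def handle_starttag(self, tag, attrs):
        if tag in CALLOUT_TAGS:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag in self.open_tags:
            # Close the most recently opened tag of this name.
            position = len(self.open_tags) - 1 - self.open_tags[::-1].index(tag)
            del self.open_tags[position]

    def handle_data(self, data):
        text = data.strip()
        if text and self.open_tags:
            self.called_out.append((self.open_tags[-1], text))


def called_out_text(html):
    parser = CalloutParser()
    parser.feed(html)
    parser.close()
    return parser.called_out


sample = (
    "<h1>Job interview tips</h1>"
    "<p>Arrive <strong>early</strong> and relax.</p>"
    "<ul><li>Bring a resume</li></ul>"
)

for tag, text in called_out_text(sample):
    print(tag, "->", text)
```

Plain paragraph text such as "Arrive" and "and relax." is not listed, because no call-out tag surrounds it.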

Search engines and people both like:

• Verbosity. In the search engine world, verbosity means substantial, relevant, original content. Do fill your page with words, but write succinctly: Make sure that every word you write is relevant to your audience and to the topic you’re addressing.

• Good writing. To a search engine, good writing means using variations of your keywords, including those with different endings. For example, if you are targeting the phrase job interview, use the singular, plural, -ing, and -ed forms, such as job interviews and job interviewing.
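A quick way to check your own copy for keyword variations is sketched below. It builds the variants by naively adding endings to the last word of the phrase, which is an assumption that works for "job interview" but not for every English word. Both function names are made up.

```python
import re


def keyword_variants(phrase, suffixes=("", "s", "ing", "ed")):
    """Build variants by adding each ending to the last word of the phrase."""
    head, _, last = phrase.rpartition(" ")
    prefix = head + " " if head else ""
    return [prefix + last + suffix for suffix in suffixes]


def variants_used(copy, phrase):
    """Return the variants of the phrase that appear as whole words in the copy."""
    text = copy.lower()
    return [
        variant
        for variant in keyword_variants(phrase.lower())
        if re.search(r"\b" + re.escape(variant) + r"\b", text)
    ]


copy = "Job interviewing tips: prepare for job interviews early."

print(keyword_variants("job interview"))
# ['job interview', 'job interviews', 'job interviewing', 'job interviewed']
print(variants_used(copy, "job interview"))
# ['job interviews', 'job interviewing']
```

The sample sentence uses two of the four forms. The singular "job interview" is not counted, because it only appears as part of the longer words.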

Search engines and people both dislike:

• Bad writing. Search engines are more likely to penalize your website when you stuff your copy with unrelated keywords, strand a list of keywords at the bottom of your page, and rely too much on headlines and links. Your entire page should be relevant: Like a muffin with the right amount of blueberries, it should have juicy keywords distributed evenly throughout, but not so many that they overwhelm the whole.

• Broken links. Search engines want to provide a great experience for their customers by directing them to a useful and informative website that works properly. Broken links tell people and search engines that a site is poorly maintained and will give people a bad experience.
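The muffin idea can be checked with two simple measurements: how large a share of the words is the keyword, and how the keyword is spread across the page. The sketch below is illustrative only. Splitting the page into four sections is an arbitrary assumption, and no threshold here reflects an actual search engine rule.

```python
import re


def keyword_density(copy, keyword):
    """Return the share of words in the copy that are the keyword (0.0 to 1.0)."""
    words = re.findall(r"[a-z0-9]+", copy.lower())
    if not words:
        return 0.0
    return words.count(keyword.lower()) / len(words)


def keyword_spread(copy, keyword, sections=4):
    """Count keyword occurrences in each equal-sized section of the copy."""
    words = re.findall(r"[a-z0-9]+", copy.lower())
    counts = [0] * sections
    for position, word in enumerate(words):
        if word == keyword.lower():
            counts[position * sections // len(words)] += 1
    return counts


even_copy = "Bread starts with flour water salt and patient bread"
stuffed_copy = "We sell many things online today bread bread bread"

print(keyword_spread(even_copy, "bread"))     # [1, 0, 0, 1]
print(keyword_spread(stuffed_copy, "bread"))  # [0, 0, 1, 2]
```

The second sample strands its keywords at the bottom, which shows up as counts piled into the last sections.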
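Finding broken links on your own pages can be automated. The sketch below collects the links from a page and reports those that answer with an error status or do not answer at all. The function names and the `www.example.com` address are invented, and treating any status of 400 or higher as broken is a simplifying assumption.

```python
from html.parser import HTMLParser
from urllib.error import HTTPError, URLError
from urllib.parse import urljoin
from urllib.request import Request, urlopen


class LinkCollector(HTMLParser):
    """Collects the absolute URLs of all <a href="..."> links on a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(urljoin(self.base_url, href))


def http_status(url, timeout=10):
    """Return the HTTP status code for url, or None if it cannot be reached."""
    try:
        with urlopen(Request(url, method="HEAD"), timeout=timeout) as response:
            return response.status
    except HTTPError as error:
        return error.code
    except URLError:
        return None


def broken_links(html, base_url, get_status=http_status):
    """Return the links whose status is missing or 400 and above."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    collector.close()
    broken = []
    for link in collector.links:
        status = get_status(link)
        if status is None or status >= 400:
            broken.append(link)
    return broken


# Offline demonstration: a made-up page and made-up status codes.
page = '<a href="/about.html">About</a> <a href="old-page.html">Old page</a>'
fake_statuses = {
    "https://www.example.com/about.html": 200,
    "https://www.example.com/old-page.html": 404,
}

print(broken_links(page, "https://www.example.com/", fake_statuses.get))
# ['https://www.example.com/old-page.html']
```

Leaving out the third argument makes `broken_links` check the live links over the network with `http_status`.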

Crawler-based search engines now have plenty of experience with webmasters who constantly rewrite their web pages in an attempt to gain better rankings.
