Education Resources from WritersServices.com

Vast areas of the web are not accessible to most browsers. Estimates of the scale of the invisible web range as high as 100 to 1: for every page that you can see, there may be 100 you can’t find. So what are these hidden resources, and where are they?

This invisibility is not due to a conspiracy of silence or a plot to keep you in the dark. Many sites are run by businesses, universities and government departments and are designed for internal use. Their owners go to great lengths to exclude those who are not insiders, while hackers take great pride in penetrating the electronic defences.

A decade ago, hacking into these computers was the stuff of Hollywood films. The films were simplistic but they did highlight a serious issue. Important sites have since got much better at protecting themselves: now we have firewalls and secure servers. It was so much easier and, I have to confess, quite a lot of fun in those old days. But I digress.

Much of the information in these sites is very boring. You will not get to read the page containing Buckingham Palace’s laundry account unless you are given access or you get an inside job with false references. You are not going to gain access to closed systems without the proper authority.

Next, there are many companies which make their living from selling information. Bloomberg sells financial data, as does Reuters, which also runs various news services. To obtain access to a terminal you need a load of money and the necessary wiring.

There are many businesses that make their living by providing health, lifestyle, legal and practical advice, often as a perk to firms’ employees. The clients pay substantial subscriptions for the privilege of accessing this quality information, so you won’t get to see it as you search the web.

So, as an ordinary web user, you are excluded from confidential internal material as well as some quality paid-for material. But there is also a great deal of material in the public domain which you are unlikely to find.

The web is based on HyperText Markup Language (HTML), but many articles are published as PDFs (Portable Document Format). The software that compiles the search engines’ indexes does not always penetrate these PDF documents, and the same applies to most other file formats. Academic articles, for example, are supplied in PDF, and their users find them by using keyword searches or catalogues. A new search tool is being tested to get round this issue and might come into use during 2004/05. Google offers an ‘advanced’ option for locating pages in a specified format, and it can look inside many popular formats including PowerPoint® and Word®.
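As a rough illustration of the format-specific search described above, the short Python sketch below builds a Google search address restricted to one file format. It assumes Google’s `filetype:` search operator; the function name `build_filetype_query` and the example keywords are invented for this illustration and are not part of any Google tool.

```python
from urllib.parse import urlencode


def build_filetype_query(keywords: str, file_format: str) -> str:
    """Return a Google search URL limited to one file format.

    Assumption: the search engine honours the 'filetype:' operator.
    The function name and parameters are hypothetical.
    """
    # Append the format restriction to the ordinary keywords.
    query = f"{keywords} filetype:{file_format}"
    # urlencode escapes spaces and punctuation so the URL is valid.
    return "https://www.google.com/search?" + urlencode({"q": query})


if __name__ == "__main__":
    # Example: look for PDF documents about the invisible web.
    print(build_filetype_query("invisible web", "pdf"))
```

Pasting the printed address into a browser should run the same search as typing the keywords followed by `filetype:pdf` into the search box.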

Search engines can only check static pages. An increasing number of web pages are generated from data held in a database. If the web address ends in ASP, PHP or CGI (instead of HTM), then the page has very probably been composed from a database. The page has been composed in response to your enquiry, which is why you may get an ‘error’ if you try to store it as a ‘favourite’. The data itself cannot be accessed by search engines and so they will not find any of the information. This means that a list of agents or authors held in a database will not be found by a search engine.
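The rule of thumb above, judging a page by the ending of its web address, can be sketched in a few lines of Python. This is only a heuristic under the assumptions stated in the text: the ending is a hint, not proof, and the function name `guess_page_type` and the example addresses are made up for illustration.

```python
import posixpath
from urllib.parse import urlparse

# Endings named in the text as signs of a database-driven page.
DYNAMIC_EXTENSIONS = {".asp", ".php", ".cgi"}
# Endings that usually indicate an ordinary static page.
STATIC_EXTENSIONS = {".htm", ".html"}


def guess_page_type(url: str) -> str:
    """Guess whether a page is 'dynamic', 'static' or 'unknown'.

    Hypothetical helper: it looks only at the file ending of the
    address, ignoring anything after a '?'.
    """
    path = urlparse(url).path  # drops the query string and fragment
    extension = posixpath.splitext(path)[1].lower()
    if extension in DYNAMIC_EXTENSIONS:
        return "dynamic"
    if extension in STATIC_EXTENSIONS:
        return "static"
    return "unknown"


if __name__ == "__main__":
    for address in (
        "http://example.com/agents.asp?id=7",
        "http://example.com/index.htm",
        "http://example.com/",
    ):
        print(address, "->", guess_page_type(address))
```

An address with no recognisable ending is reported as unknown rather than guessed at, since many sites hide the technology they use.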

The English language dominates the web, but this is changing. At the moment all the material written in the other great languages is likely to be invisible. This problem of access is being addressed by search engines which will translate the content. Those who produce quality sites in other languages will need to embed some words in English in the ‘meta-tags’ to attract the search engines.

So, just as the volume of private correspondence dwarfs the word-count in all the newspapers of the world, there is an invisible web that dwarfs the visible one. The invisible web will be of concern to you when material you want is in an inaccessible or hard-to-find form. However, it is possible to track down even these articles and data if you use the appropriate search strategy.

© Charles Jones 2003  

These articles are provided by writersservices.com for use as course material.

©WritersServices.com 2000-7