3.6 UNIFORM RESOURCE LOCATOR

A Uniform Resource Locator, or URL for short, is a reference to a resource available on the internet. The term is very common in internet terminology. A URL is used to reach a specific HTML page or a similar resource on the web. URLs are further categorized as absolute or relative; both kinds are widely used when developing web pages.
In other words, a URL is an accessible name given to a web resource based on where it resides. Most URLs refer to a file stored either on the same computer system or on a different machine. Such an external URL may be accessed over any kind of network. However, remember that URLs can also point to other resources on the network, such as a database query, or to audio, video, or other documents.
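
The difference between absolute and relative URLs mentioned above can be sketched with Python's standard urllib.parse module. The base page address and the file names below are assumptions chosen only for illustration; they do not come from any real site.

```python
from urllib.parse import urljoin

# Hypothetical address of the page that contains the links.
base = "http://mydomainname.com/docs/index.html"

# A relative URL is resolved against the address of the containing page.
relative = urljoin(base, "images/logo.png")

# An absolute URL carries its own protocol and host, so the base is ignored.
absolute = urljoin(base, "http://example.org/about.html")

print(relative)  # http://mydomainname.com/docs/images/logo.png
print(absolute)  # http://example.org/about.html
```

A relative URL stays valid when a whole site moves to another host, which is one reason it is often preferred for links within the same site.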

It is necessary to understand the components of a URL. A standard URL has two components: the protocol identifier and the name of the resource to which the URL points. The protocol identifier is the name of the protocol, which is generally http, the protocol commonly used by browsers.

Consider the following URL:
“http://mydomainname.com”
In the above URL, http is the protocol identifier while “mydomainname.com” is the resource name. Note that http and mydomainname.com are separated by a colon (:) followed by two forward slashes (//). Here, the protocol identifier (http) indicates the name of the protocol to be used to fetch the resource.
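
As a minimal sketch, the two components of the example URL can be separated in Python by splitting at the "://" separator. The variable names are illustrative assumptions, and real code would normally use a proper URL parser rather than plain string splitting.

```python
# The example URL from the text.
url = "http://mydomainname.com"

# partition() splits the string at the first occurrence of the separator.
protocol, separator, resource_name = url.partition("://")

print(protocol)       # http
print(resource_name)  # mydomainname.com
```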
HTTP usually serves hypertext documents on the web. Other protocols, such as ftp and gopher, can also appear as protocol identifiers. The resource name, on the other hand, is the complete address of the resource that is to be accessed using the protocol. The format of the resource name depends entirely on which protocol is being used. For most protocols, the resource name contains one or more of the following components:

• Host name
• File name
• Port number
• Reference

The host name is the name of the computer system on which the resource resides. The file name is the pathname of the file on that machine. The port number is the port to connect to. Finally, the reference, which is optional, refers to a named anchor that usually identifies a specific location within a file.
For most protocols, the host name and the file name are required, whereas the port number and the reference are optional. For instance, the resource name for an HTTP URL must specify a server on the network (host name) and the path to the document on that machine (file name). It can also specify a port number and a reference, but, as noted above, these are not mandatory.
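
The components described above can be read out of a URL with Python's standard urllib.parse module. This is only a sketch: the port 8080, the path, and the anchor name in the sample URL are assumptions made up for illustration.

```python
from urllib.parse import urlsplit

# Hypothetical URL that contains all four components of the resource name.
url = "http://mydomainname.com:8080/docs/index.html#section2"
parts = urlsplit(url)

protocol = parts.scheme      # protocol identifier
host_name = parts.hostname   # machine on which the resource resides
port = parts.port            # port number as an integer
file_name = parts.path       # pathname of the file on that machine
reference = parts.fragment   # named anchor inside the file

print(protocol, host_name, port, file_name, reference)
# http mydomainname.com 8080 /docs/index.html section2

# The optional components are simply missing when the URL omits them.
minimal = urlsplit("http://mydomainname.com/docs/index.html")
print(minimal.port)      # None
print(minimal.fragment)  # prints an empty line
```

When the port is omitted, the client falls back to the default port of the protocol, which is 80 for http.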

Characters not allowed in a URL-
A common question is whether spaces are allowed within a URL. They are not: RFC 1738 states that, apart from reserved characters used for their reserved purposes, a URL may contain unencoded only alphanumeric characters and the special characters $ - _ . + ! * ' ( ) and the comma. In other words, any other character needed in the URL must be encoded. Here, encoding means replacing the character with a percent sign (%) followed by its two-digit hexadecimal code; a space, for example, is written as %20.
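
The encoding step can be sketched with Python's urllib.parse.quote. The helper name encode_segment and the sample file name are hypothetical. Note that quote by default follows the newer RFC 3986 and encodes more characters than RFC 1738 requires, so the RFC 1738 special characters are passed explicitly through its safe parameter here.

```python
from urllib.parse import quote, unquote

# Special characters that RFC 1738 allows unencoded, besides letters and digits.
RFC1738_SAFE = "$-_.+!*'(),"

def encode_segment(text: str) -> str:
    """Percent-encode one path segment (hypothetical helper for illustration)."""
    return quote(text, safe=RFC1738_SAFE)

encoded = encode_segment("my report (final).html")
decoded = unquote(encoded)

print(encoded)  # my%20report%20(final).html
print(decoded)  # my report (final).html
```

Only the two spaces are replaced by %20 here; the parentheses and the dot are in the allowed set and pass through unchanged.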