Network Programming with Java: The basics

What Is Network Programming?

A network is a group of two or more computers or other electronic devices, such as printers, that are linked together to share information.

Each device linked to a network is called a node. A computer that is linked to a network is called a host.

Network programming in Java involves writing Java programs that
facilitate the exchange of information between processes running on different computers on the network.

Java makes it easy to write network programs.

Sending a message to a process running on another computer is as simple as writing data to a local file system.
Similarly, receiving a message that was sent from a process running on another computer is as simple as reading data from a local file system.

Most of the programs we deal with involve reading and writing data over the network, and they are similar to file I/O. You have to learn about a few new classes that facilitate the communication between two computers on a network.
You do not need to have advanced level knowledge of networking technologies to understand or write Java programs. Here we cover high-level details of a few concepts that are involved in network communication.
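To make the file I/O comparison concrete, here is a minimal sketch that sends one line of text between two threads on the same machine over the loopback address. The class name LoopbackMessage and the method sendAndReceive are made up for this illustration. Socket and ServerSocket are standard java.net classes that have not been introduced yet, so treat this as a preview under those assumptions rather than a recommended design.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

class LoopbackMessage {

    // Sends one line of text to a listener on the same machine
    // and returns what the listener read (null if reading failed).
    static String sendAndReceive(String message) throws IOException, InterruptedException {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        // Port 0 asks the operating system for any free port.
        try (ServerSocket server = new ServerSocket(0, 1, loopback)) {
            String[] received = new String[1];
            Thread receiver = new Thread(() -> {
                try (Socket peer = server.accept();
                     BufferedReader in = new BufferedReader(new InputStreamReader(
                             peer.getInputStream(), StandardCharsets.UTF_8))) {
                    // The same readLine() call you would use on a file reader.
                    received[0] = in.readLine();
                } catch (IOException e) {
                    received[0] = null;
                }
            });
            receiver.start();

            try (Socket socket = new Socket(loopback, server.getLocalPort());
                 PrintWriter out = new PrintWriter(new OutputStreamWriter(
                         socket.getOutputStream(), StandardCharsets.UTF_8), true)) {
                // The same println() call you would use on a file writer.
                out.println(message);
            }
            receiver.join();
            return received[0];
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(sendAndReceive("hello over the network"));
    }
}
```

The only network-specific parts are the two socket constructors. Once the streams are open, the reading and writing code looks the same as it would for a file.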

A network can be categorized based on different criteria.
Based on the geographical area that a network is spread over, it is categorized as follows:

Local Area Network (LAN): It covers a small area such as a building or a block of buildings.
Campus Area Network (CAN): It covers a campus such as a university campus, interconnecting multiple LANs within that campus.
Metropolitan Area Network (MAN): It covers more geographical area than a LAN. Usually, it covers a city.
Wide Area Network (WAN): It covers a larger geographical area such as a region of a country or multiple regions in different countries in the world.

When two or more networks are connected using routers (also known as gateways), it is called internetworking, and the resulting combined network is called an internetwork, in short, internet.
The global internetwork, which encompasses all networks in the world connected together, is referred to as the Internet.

Based on the topology (the arrangement of nodes in a network), a network may be categorized as

star,
tree,
ring,
bus,
hybrid, etc.

Based on the technology a network uses to transmit the data, it can be categorized as

Ethernet,
LocalTalk,
Fiber Distributed Data Interface (FDDI),
Token Ring,
Asynchronous Transfer Mode (ATM), etc.

For details about the different kinds of networks and network technologies, refer to any standard textbook on networking.

Communication between two processes on the same computer is simple; it is achieved using interprocess communication (IPC) as defined by the operating system.
It is a much more tedious task when two processes running on two different computers on an internet need to communicate. You need to consider many aspects of the communication before two such processes can start communicating. Some of the points that you need to consider are as follows:

The two computers may be using different technologies such as different operating systems, different hardware, etc.
They may be on two different networks that use different network technologies.
They may be separated by many other networks, which may be using different technologies. That is, the two computers may not be on two networks that are interconnected directly. You need to consider not just two networks, but all networks that the data from one computer must pass through to reach the other computer.
They may be a few miles apart or on other sides of the globe. How do you transmit the information efficiently without worrying about the distance between the two computers?
One computer may not understand the information sent by the other computer.
The information sent over a network may be duplicated, delayed, or lost. How should the receiver and the sender handle these abnormal situations?

Simply put,

two computers on a network communicate using messages (sequences of 0s and 1s).
There must be well-defined rules to handle the previously mentioned issues (and many more). The set of rules to handle a specific task is known as a protocol. Many types of tasks are involved in handling network communication. There is a protocol defined to handle each specific task. There is a stack of protocols (also called protocol suite) that are used together to handle a network communication.

Network Protocol Suite

Modern networks are called packet switching networks because they transmit data in chunks called packets.

Each packet is transmitted independent of other packets. This makes it easy to transmit the packets from the same computer to the same destination using different routes.

However, it may become a problem if a computer sends two packets to a remote computer and the second packet arrives before the first one.

For this reason, each packet also has a packet number along with its destination address.

There are rules to rearrange the out-of-order arrival of the packets at the destination computer. The following discussion attempts to explain some of the mechanisms that are used to handle packets in a network communication.
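As a rough illustration of how packet numbers allow the receiver to restore the original order, the sketch below buffers packets by number and reads them back sorted. The class PacketReassembler and its methods are hypothetical names for this example. Real protocols such as TCP use more elaborate sequence numbers and acknowledgments.

```java
import java.util.Map;
import java.util.TreeMap;

class PacketReassembler {

    // Packets buffered by packet number; a TreeMap keeps its keys sorted.
    private final Map<Integer, String> buffer = new TreeMap<>();

    // Stores a packet; a duplicate of an already received packet is ignored.
    void receive(int packetNumber, String payload) {
        buffer.putIfAbsent(packetNumber, payload);
    }

    // Joins the payloads in packet-number order.
    String assemble() {
        return String.join("", buffer.values());
    }

    public static void main(String[] args) {
        PacketReassembler reassembler = new PacketReassembler();
        reassembler.receive(2, "lo, ");   // the second packet arrives first
        reassembler.receive(1, "Hel");
        reassembler.receive(3, "world");
        reassembler.receive(2, "lo, ");   // a duplicate
        System.out.println(reassembler.assemble()); // prints "Hello, world"
    }
}
```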

Figure 4-1 shows a layered protocol suite called the Internet Reference Model or TCP/IP Layering Model.
This is the most widely used protocol suite. Each layer in the model performs a well-defined task.

The main advantage of having a layered protocol model is that any layer can be changed without affecting others. A new protocol can be added to any layer without changing other layers.

Each layer knows about only the layer immediately above and below it. Each layer has two interfaces—one for the layer above it and one for the layer below it.

For example, the transport layer has interfaces to the application layer and internet layer. That is, the transport layer knows how to communicate only with the application layer and the internet layer. It knows nothing about the network interface layer or the physical layer.

A user application such as a Java program uses the application layer to communicate to a remote application.

The user application has to specify the protocol that it wants to use to communicate with the remote application.

A protocol in an application layer defines the rules for formatting messages and for associating meaning with the information contained in the messages, such as the message type, which describes whether it is a request or a response.
After the application layer formats the message, it hands over the message to the transport layer.
The examples of protocols in an application layer are

Hypertext Transfer Protocol (HTTP),
File Transfer Protocol (FTP),
Gopher,
Telecommunication Network (Telnet),
Simple Mail Transfer Protocol (SMTP), and
Network News Transfer Protocol (NNTP).

The transport layer protocol handles the ways messages are transported from one application on one computer to another application on the remote computer.
It controls

the data flow,
error handling during data transmission, and
connections between two applications.

For example,

a user application may hand over a very large chunk of data to the transport layer to transmit to a remote application.
The remote computer may not be able to handle that large amount of data at once. It is the responsibility of the transport layer to pass a suitable amount of data at a time to the remote computer, so the remote application can handle the data according to its capacity.
The data passed to the remote computer over a network may be lost on its way due to various reasons.
It is the responsibility of the transport layer to re-transmit the lost data. Note that the application layer passes data to be transmitted to the transport layer only once. It is the transport layer (not the application layer) that keeps track of the delivered and the lost data during a transmission.
There may be multiple applications running, all of which use different protocols and exchange information with different remote applications.
It is the responsibility of the transport layer to hand over messages sent to a remote application correctly. For example, you may be browsing the Internet using the HTTP protocol from one remote web server and downloading a file using the FTP protocol from another FTP server. Your computer is receiving messages from two remote computers and they are meant for two different applications running on your computer—one web browser to receive HTTP data and one FTP application to receive FTP data.
It is the responsibility of the transport layer to pass the incoming data to the appropriate application.
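The last responsibility can be pictured as a lookup table. TCP and UDP identify the receiving application by a port number, a concept not yet introduced at this point. The sketch below is a toy model of that idea, not how an operating system implements it. The class PortDemultiplexer and the port numbers 50001 and 50002 are invented for the example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class PortDemultiplexer {

    // One inbox per local port; each application registers the port it uses.
    private final Map<Integer, List<String>> inboxes = new HashMap<>();

    void register(int port) {
        inboxes.putIfAbsent(port, new ArrayList<>());
    }

    // Hands incoming data to the application on that port.
    // Returns false if no application is registered there.
    boolean deliver(int destinationPort, String data) {
        List<String> inbox = inboxes.get(destinationPort);
        if (inbox == null) {
            return false;
        }
        inbox.add(data);
        return true;
    }

    List<String> inbox(int port) {
        return inboxes.getOrDefault(port, Collections.emptyList());
    }

    public static void main(String[] args) {
        PortDemultiplexer transport = new PortDemultiplexer();
        transport.register(50001); // hypothetical port used by the web browser
        transport.register(50002); // hypothetical port used by the FTP application
        transport.deliver(50001, "HTTP data");
        transport.deliver(50002, "FTP data");
        System.out.println(transport.inbox(50001)); // prints "[HTTP data]"
        System.out.println(transport.inbox(50002)); // prints "[FTP data]"
    }
}
```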

You can see how different layers of the protocol suite play different roles in data transmission over the network. Depending on the transport layer protocol being used, the transport layer adds relevant information to the message and passes it to the next layer, which is the internet layer.
The examples of protocols used in the transport layer are

Transmission Control Protocol (TCP),
User Datagram Protocol (UDP), and
Stream Control Transmission Protocol (SCTP).

The internet layer accepts the messages from the transport layer and prepares a packet suitable for sending over the internet. It includes the Internet Protocol (IP).
The packet prepared by the IP is also known as an IP datagram.
It consists of a header and a data area, apart from other pieces of information.
The header contains

the sender’s IP address,
destination IP address,
time to live (TTL, which is an integer),
a header checksum, and many other pieces of information specified in the protocol.
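For the curious, the header checksum can be computed in a few lines. The sketch below follows the one's-complement method described in RFC 1071 and assumes the standard IPv4 header layout from RFC 791, where the TTL is byte 8 and the checksum occupies bytes 10 and 11. The class name and the sample header bytes are chosen for this example only.

```java
class Ipv4HeaderChecksum {

    // One's-complement sum of the header's 16-bit words, then inverted.
    // The checksum field (bytes 10 and 11) must be zero when computing it.
    static int checksum(byte[] header) {
        int sum = 0;
        for (int i = 0; i + 1 < header.length; i += 2) {
            int word = ((header[i] & 0xFF) << 8) | (header[i + 1] & 0xFF);
            sum += word;
        }
        // Fold any carry out of the low 16 bits back into the sum.
        while ((sum >>> 16) != 0) {
            sum = (sum & 0xFFFF) + (sum >>> 16);
        }
        return ~sum & 0xFFFF;
    }

    // The TTL is stored in byte 8 of the IPv4 header.
    static int ttl(byte[] header) {
        return header[8] & 0xFF;
    }

    public static void main(String[] args) {
        // A sample 20-byte header from 192.168.0.1 to 192.168.0.199
        // with the checksum field zeroed.
        byte[] header = {
            0x45, 0x00, 0x00, 0x73, 0x00, 0x00, 0x40, 0x00,
            0x40, 0x11, 0x00, 0x00,
            (byte) 0xC0, (byte) 0xA8, 0x00, 0x01,
            (byte) 0xC0, (byte) 0xA8, 0x00, (byte) 0xC7
        };
        System.out.println("TTL = " + ttl(header));                          // prints "TTL = 64"
        System.out.printf("checksum = 0x%04X%n", checksum(header));          // prints "checksum = 0xB861"
    }
}
```

A receiver can run the same sum over the header with the checksum filled in; a result of zero means the header arrived intact.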

The IP prepares the message into datagrams, which are ready to be transmitted over the internet.

The TTL in the IP datagram header specifies how long, in terms of the number of routers, an IP datagram can keep traveling before it needs to be discarded. Its size is one byte and its value can be between 1 and 255. When an IP datagram reaches a router on its route to the destination, the router decrements the TTL value by 1. If the decremented value is zero, the router discards the datagram and sends an error message back to the sender using Internet Control Message Protocol (ICMP). If the TTL value is still a positive number, the router forwards the datagram to the next router.
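The TTL rule is easy to simulate. The following sketch walks a datagram through a number of routers and applies the decrement-and-check step at each one. The class TtlDemo and the method route are hypothetical names, and the return value of -1 for a discarded datagram is just a convention of this example.

```java
class TtlDemo {

    // Returns the TTL left when the datagram has passed all routers,
    // or -1 if some router discarded it because the TTL reached zero.
    static int route(int initialTtl, int routersOnPath) {
        if (initialTtl < 1 || initialTtl > 255) {
            throw new IllegalArgumentException("TTL must be between 1 and 255");
        }
        int ttl = initialTtl;
        for (int router = 1; router <= routersOnPath; router++) {
            ttl--; // each router decrements the TTL by 1
            if (ttl == 0) {
                // A real router would also send an ICMP error to the sender here.
                return -1;
            }
        }
        return ttl;
    }

    public static void main(String[] args) {
        System.out.println(route(64, 10)); // prints "54"
        System.out.println(route(4, 3));   // prints "1"
        System.out.println(route(3, 3));   // prints "-1" (discarded by the third router)
    }
}
```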

The IP uses an address scheme, which assigns a unique address to each computer. The address is called an IP address. The internet layer hands over the IP datagram to the next layer, which is the network interface layer. The examples of protocols in an internet layer are

Internet Protocol (IP),
Internet Control Message Protocol (ICMP),
Internet Group Management Protocol (IGMP), and
Internet Protocol Security (IPsec).

The network interface layer prepares a packet to be transmitted on the network. The packet is called a frame. The network interface layer sits just on top of the physical layer, which involves the hardware.

Note that the IP layer uses the IP address to identify the destination on a network.

An IP address is a virtual address, which is completely maintained in software.

The hardware is unaware of the IP address and it does not know how to transmit a frame using an IP address. The hardware must be given the hardware address, also called Media Access Control (MAC) address, of the destination that it needs to transmit the frame to.
This layer resolves the destination hardware address from the IP address and places it in the frame header. It hands over the frame to the physical layer.
The examples of protocols in a network interface layer are

Open Shortest Path First (OSPF),
Point-to-Point Protocol (PPP),
Point-to-Point Tunneling Protocol (PPTP), and
Layer 2 Tunneling Protocol (L2TP).

The physical layer consists of the hardware. It is responsible for converting the bits of information into signals and transmitting the signal over the wire.

■Tip
Packet is a generic term that is used to mean an independent chunk of data in network programming.
Each layer of protocol also uses a specific term to mean the packet it deals with. For example,

a packet is called a segment in the TCP layer;
it is called a datagram in the IP layer;
it is called a frame in the network interface and physical layers.

Each layer adds a header (sometimes also a trailer) to the packet it receives from the layer before it, while preparing the packet to be transmitted over the network.
Each layer performs the reverse action when it receives a packet from the layer below it. It removes the header from the packet; performs some actions, if needed; and hands over the packet to the layer above it.
When a packet sent by an application reaches the remote computer, it has to pass through the same layers of protocols in the reverse order. Each layer will remove its header, perform some actions, and pass the packet to the layer immediately above it. Finally, the packet reaches the remote application in the same format in which it left the application on the sender’s computer.
The figure below shows the transmission of packets from the sender to the receiver computer. P1, P2, P3, and P4 are the packets in different formats of the same data. Each protocol layer at the destination receives, from the layer immediately below it, the same packet that its counterpart layer on the sender’s computer passed down.
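The wrapping and unwrapping can be mimicked with plain strings. In the sketch below, each layer on the sending side prepends a header and each layer on the receiving side strips its own header in the reverse order. The bracketed header texts and the class name Encapsulation are placeholders invented for this example; real headers are binary structures.

```java
class Encapsulation {

    // Placeholder headers, listed from the transport layer down to the network interface layer.
    private static final String[] HEADERS = {"[TCP]", "[IP]", "[FRAME]"};

    // Sender side: every layer prepends its own header to what it got from above.
    static String send(String applicationData) {
        String packet = applicationData;
        for (String header : HEADERS) {
            packet = header + packet;
        }
        return packet;
    }

    // Receiver side: the layers run in reverse order and each removes its own header.
    static String receive(String frame) {
        String packet = frame;
        for (int i = HEADERS.length - 1; i >= 0; i--) {
            String header = HEADERS[i];
            if (!packet.startsWith(header)) {
                throw new IllegalArgumentException("Missing header " + header);
            }
            packet = packet.substring(header.length());
        }
        return packet;
    }

    public static void main(String[] args) {
        String onTheWire = send("GET /");
        System.out.println(onTheWire);          // prints "[FRAME][IP][TCP]GET /"
        System.out.println(receive(onTheWire)); // prints "GET /"
    }
}
```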

IP Addressing Scheme

IP uses a unique address, called an IP address, to route an IP datagram to the destination.

An IP address uniquely identifies a connection between a computer and a router.

Normally, it is understood that an IP address identifies a computer. However, it should be emphasized that it identifies a connection between a computer and a router, not just a computer. A router is also assigned an IP address.

A computer can be connected to multiple networks using multiple routers and each connection between the computer and the router will have a unique IP address. In such cases, the computer will be assigned multiple IP addresses and the computer is known as multi-homed.

Multi-homing increases the availability of the network connection to a computer. If one network connection fails, the computer can use other available network connections.
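You can see which addresses your own computer holds with the standard java.net.NetworkInterface class. The sketch below lists every address of every network interface; the class name LocalAddresses is made up, and the output depends entirely on the machine it runs on, so no sample output is shown.

```java
import java.net.InetAddress;
import java.net.NetworkInterface;
import java.net.SocketException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Enumeration;
import java.util.List;

class LocalAddresses {

    // Returns one "interface -> address" entry per address found on this machine.
    static List<String> list() throws SocketException {
        List<String> result = new ArrayList<>();
        Enumeration<NetworkInterface> interfaces = NetworkInterface.getNetworkInterfaces();
        if (interfaces == null) {
            return result; // no interfaces visible
        }
        for (NetworkInterface networkInterface : Collections.list(interfaces)) {
            for (InetAddress address : Collections.list(networkInterface.getInetAddresses())) {
                result.add(networkInterface.getName() + " -> " + address.getHostAddress());
            }
        }
        return result;
    }

    public static void main(String[] args) throws SocketException {
        for (String entry : list()) {
            System.out.println(entry);
        }
    }
}
```

On a multi-homed computer, the list contains addresses from more than one non-loopback interface.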

An IP address contains two parts

a network identifier (prefix) and
a host identifier (suffix).

The prefix identifies a network on the Internet uniquely; the suffix identifies a host uniquely within that network. It is possible for two hosts to have IP addresses with the same suffix as long as they have a different prefix.

There are two versions of the Internet Protocol: version 4 (IPv4, or simply IP) and version 6 (IPv6).
IPv6 is also known as Internet Protocol next generation (IPng).

Note that there is no IPv5. IP was at version 4 when it became widely popular. By the time IPng needed a version number, version 5 had already been assigned to another protocol called Internet Stream Protocol (ST), so IPng was assigned version 6.
Both IPv4 and IPv6 use an IP address to identify a host on a network. However, the addressing schemes in the two versions differ significantly.
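In Java, both kinds of addresses are represented by java.net.InetAddress, with the subclasses Inet4Address and Inet6Address. The sketch below parses a literal address and reports its version and size; an IPv4 address has 32 bits and an IPv6 address has 128 bits. The class name IpVersions and the sample addresses are chosen for this example. Passing a literal address to InetAddress.getByName only checks its format and does not contact a name server.

```java
import java.net.Inet6Address;
import java.net.InetAddress;
import java.net.UnknownHostException;

class IpVersions {

    // Describes a literal IP address, for example "IPv4, 32 bits".
    static String describe(String literal) throws UnknownHostException {
        InetAddress address = InetAddress.getByName(literal);
        int bits = address.getAddress().length * 8; // getAddress() returns the raw bytes
        String version = (address instanceof Inet6Address) ? "IPv6" : "IPv4";
        return version + ", " + bits + " bits";
    }

    public static void main(String[] args) throws UnknownHostException {
        System.out.println(describe("192.168.1.231")); // prints "IPv4, 32 bits"
        System.out.println(describe("2001:db8::1"));   // prints "IPv6, 128 bits"
    }
}
```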

Since an IP address must be unique, its assignment is controlled by an organization called Internet Assigned Numbers Authority (IANA).

IANA assigns a unique address to each network that belongs to an organization.
The organization uses the network address and a unique number to form a unique IP address for each host on the network.
IANA divides the IP address allocations among five Regional Internet Registry (RIR) organizations, which allocate IP addresses in specific regions as listed in Table 4-1. You can find more information on how to get a network address in your area from IANA at www.iana.org.

IPv4 Addressing Scheme

IPv4 (or simply IP) uses a 32-bit number to represent an IP address. An IP address contains two parts—a prefix and a suffix. The prefix identifies a network and the suffix identifies a host on the network, as shown in Figure.

It is not easy for humans to remember a 32-bit number in binary format. IPv4 allows you to work with an alternate form using four decimal numbers. Each decimal number is in the range from 0 to 255. The decimal number format of IPv4 is called dotted decimal format because a dot is used to separate two decimal numbers. Each decimal number represents the value contained in 8 bits of the 32-bit number. For example, an IPv4 address of

11000000 10101000 00000001 11100111

in the binary format can be represented as 192.168.1.231 in the dotted decimal format (see https://en.wikipedia.org/wiki/Dot-decimal_notation).
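The conversion from binary to dotted decimal takes only a few lines of Java. This is a minimal sketch; the class DottedDecimal and the method fromBinary are names invented for the example.

```java
class DottedDecimal {

    // Converts 32 binary digits (spaces allowed) to dotted decimal format.
    static String fromBinary(String bits) {
        String compact = bits.replace(" ", "");
        if (compact.length() != 32) {
            throw new IllegalArgumentException("Expected 32 bits but got " + compact.length());
        }
        StringBuilder result = new StringBuilder();
        for (int i = 0; i < 32; i += 8) {
            // Each group of 8 bits becomes one decimal number from 0 to 255.
            int octet = Integer.parseInt(compact.substring(i, i + 8), 2);
            if (i > 0) {
                result.append('.');
            }
            result.append(octet);
        }
        return result.toString();
    }

    public static void main(String[] args) {
        // prints "192.168.1.231"
        System.out.println(fromBinary("11000000 10101000 00000001 11100111"));
    }
}
```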

How do you know that 192.168.1 represents a prefix in an IPv4 address 192.168.1.231? A rule governs the value of a prefix and a suffix in an IPv4.
How does an IPv4 address divide its 32 bits between a prefix and a suffix? The IPv4 address space is divided into five categories called network classes, named A, B, C, D, and E.
A class type defines how many bits of the 32 bits will be used to represent the network address part of an IP address.

The leading bit (or bits) in the prefix defines the class of the IP address. This is also known as a self-identifying or classful IP address because you can tell which class it belongs to by looking at the IP address.

The table lists the five network classes and their characteristics in IPv4. The leading bits in an IP address identify the class of the network.
For example,

if an IP address looks like 0XXX, where XXX is the last 31 bits of the 32 bits, it belongs to a class A network. There can be only 128 (2^7) networks of class A type, and each network can have 16777214 ((2^31/128) - 2, that is, 2^24 - 2) hosts. The number of hosts that a class A network can have is very big, and it is very unlikely that a network will have that many hosts.
if an IP address looks like 110XXX, where XXX is the last 29 bits of the 32 bits, it belongs to a class C network. In a class C network, the maximum number of hosts is limited to 254 (2^8 - 2).
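Because the class is encoded in the leading bits, a program can read it from the first decimal number alone. The sketch below checks the leading bits of the first byte, assuming the standard patterns 0 for class A, 10 for B, 110 for C, 1110 for D, and 1111 for E, and returns the host limits for classes A, B, and C. The class ClassfulAddress and its method names are hypothetical.

```java
class ClassfulAddress {

    // Determines the network class from the leading bits of the first byte.
    static char networkClass(String dottedDecimal) {
        int first = Integer.parseInt(dottedDecimal.split("\\.")[0]);
        if (first < 0 || first > 255) {
            throw new IllegalArgumentException("Not an IPv4 address: " + dottedDecimal);
        }
        if ((first & 0b1000_0000) == 0) return 'A';            // 0xxxxxxx
        if ((first & 0b1100_0000) == 0b1000_0000) return 'B';  // 10xxxxxx
        if ((first & 0b1110_0000) == 0b1100_0000) return 'C';  // 110xxxxx
        if ((first & 0b1111_0000) == 0b1110_0000) return 'D';  // 1110xxxx
        return 'E';                                            // 1111xxxx
    }

    // Maximum number of hosts in one network of class A, B, or C.
    static long maxHosts(char networkClass) {
        switch (networkClass) {
            case 'A': return (1L << 24) - 2;
            case 'B': return (1L << 16) - 2;
            case 'C': return (1L << 8) - 2;
            default:
                throw new IllegalArgumentException("No host limit defined here for class " + networkClass);
        }
    }

    public static void main(String[] args) {
        char c = networkClass("192.168.1.231");
        System.out.println(c);              // prints "C"
        System.out.println(maxHosts(c));    // prints "254"
        System.out.println(maxHosts('A'));  // prints "16777214"
    }
}
```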

[to be continued...]
