What happens when you type google.com in your browser and press Enter

Inputting any website domain name into your browser accepts a set of standards that apply globally. When you put google.com into your browser and hit Enter, what happens? It will be broken down step by step, but first, let’s start with a more straightforward illustration.

Suppose you were to type google.com into a browser. This is easier because, in the case of high-traffic sites like google.com or facebook.com, your browser already knows popular sites’ Internet Protocol (IP) addresses.

The IP address is like your home address; the specific combination of numbers and letters points to your house. Every website has an IP address, and the domain name is a memorable name for the IP address. For example, typing 35.226.21.195 into your browser is the same as typing google.com`, but it is easier to remember google.com than a string of digits.

The domain name is linked to your IP address via an A Record, which I will cover when I discuss DNS. For now, understanding that 35.226.21.195 is the same as google.com is enough.

Typically, to query a domain name’s IP Address, your Computer must access the Internet to find a database filled with the site’s domain name matched to their IP addresses.

The browser doesn’t care what the domain name is, so long as it has an IP address matched to a given domain.

Given how popular google.com is, it would not make sense for your browser to query the record of the A Record matching google.com IP address. So your browser comes equipped with domain name to IP address matches for all the famous sites such as Google, Facebook, Youtube, NYTimes.com, etc. This record is kept in the browser cache memory. Cache memory is simply the Computer’s method which makes retrieving data from the Computer’s memory more efficient.

The browser’s cache memory holds popular sites, as does the Computer’s Operating System (OS). If the browser does not have the IP address of the domain name in the cache, it will query the OS to see if the IP address is on the OS’s cache memory.

Domain Name System

What happens if neither the OS nor the browser holds the IP address of a domain name in cache memory? Enter the Domain Name System.

Let’s use google.com as an example. It is a site used by hundreds of applicants to Holberton School every year and is probably visited a few thousand times a year. Given this, it is unlikely that your browser or OS will have the IP address that matches google.com in cache memory.

The act of pressing Enter with google.com now begins a complex set of communications that span the globe and involve dozens of stops along the way. This all takes place within the blink of an eye.

The process I am about to explain occurs over a few hundred milliseconds.

Once you’ve pressed Enter on google.com and the IP address is not in the Computer’s cache memory, the browser connects to the Domain Name System (DNS) through the Internet.

The Domain Name System is, in simple terms, the Internet’s phonebook. DNS is not one thing but a system of bookkeeping which allows browsers to match the IP address of any domain name. That’s a lot of domain names to keep track of, so the DNS works in steps.

Step 1: Recursive DNS resolver

Though your OS or browser may not have the IP address of google.com in its cache, the DNS Resolver may have it in its Database. Put; the Recursive DNS resolver is a database run by your Internet Service Provider (ISP). This could be a system run by Comcast or Verizon, but it is only the first stop in the DNS.

Let’s assume that the Recursive DNS resolver does not have the IP address for google.com. We move on to Step two.

Note that some domain names have more than one IP address. Though the IP address may change, the domain never will not. The Domain Name System is an up-to-date index of those IP addresses.

Step 2: DNS Root Name server

The DNS Root Name server is stepped one if the ISP Resolver doesn’t have the IP address in storage. The Root is the .com or .gov, for example, but a site can have dozens of Root names. google.com uses the most popular Root name, .com. Root names are also referred to as Top Level Domain (TLD) because it is the first part of the lookup process using DNS.

There are 13 Root Name servers worldwide. 13 physical locations of servers with databases connect the domain name of whatever root name is in the domain name to its IP address.

Whatever the TLD is, your browser has been directed (or referred) to a TLD Name server. Now in step 3, we move to the Authoritative name server.

Step 3: Authoritative Name Server

Once the Root name is known, the browser is directed to the Authoritative Name Server (ANS). This is the last stop.

When the good people at Holberton School created google.com, they registered the name Holberton school with a domain hosting company. This essentially took an unused IP address and mapped it to the name Holberton school with a Top Level Domain of .com. This mapping is called an A Record.

The A Record is the primary domain name that points to the IP address used. You can put a www in front of the TLD, but www is a convention. Indeed, the Third Level Domain can be anything you like.

When the name was registered and given an A Record, it now mapped to the correct IP address, which the hosting company in a database record. The Root level name took us to the correct ANS, and the ANS has the IP address for google mapped to its IP address.

Now that IP and domain name are mapped, the results are returned to the browser. The browser substitutes the domain name for the IP address, and we are shown the contents of the static web page. This web server hosts the contents in a server with the IP Address of google.com.

Now that your browser knows the correct IP address, it puts google.com = IP Address in its cached memory, so the next time you type google.com into your browser, it checks if it knows the IP address of google.com. Now that it does type google.com in your browser and press Enter to query the browser’s cached memory and goes to the affiliated IP address.

A simplified diagram showing the DNS and connecting to the Server via the IP address.

Servers, how do they work?

A Rack of Servers

On the other end of the Internet from the user (often referred to as the Client side) is a physical server at the IP address for google.com. The Server can be at the physical address of the site’s administrator. Still, it is most likely on a server farm along with thousands of other servers for smaller organizations.

A server has several components, which I will cover one at a time. For an example of how a server is broken down, let’s look at Amazon.com.

Amazon sells thousands of products. The template for any product page includes a product description, reviews, and the ability to conduct a transaction over the Internet.

Static Amazon site filled in with Codebase and Database hosted content

The parts page that never changes, such as the header, footer, and look and feel of a page, are served by the Web Server as the static website.

It is essential to understand that the converse of this would be that a developer would draw out each Amazon page at a time. This would be an incredibly inefficient system, as each aspect of the site’s pages would be a copy of the main template with the product-specific information and reviews added manually.

Inside Server. The three parts of the Server are virtual, so they are physically running on the OS.

Of course, Amazon knows that this wouldn’t be feasible, so they employ a Server model that joins the static web template with code functionality and Database drove content creation.

Web Server

The web server is a distinct part of a server setup. Often used interchangeably with Server, the Web Server is responsible for displaying the markup language, HTML and CSS, as the static website. Since every browser automatically runs HTML and CSS, this server part displays the website’s look but rarely includes any functionality. That comes with the Application Server.

Application Server

The Application Server is responsible for compiling the site’s Codebase, any non-HTM, CSS, or Javascript code that the site needs to run its non-static components. The reason that Javascript is omitted is that it is standard for every browser to run Javascript without any 3rd party tools.

The Codebase is often PHP, but Java Applets were run through the application server in the past. Technically, any language could be run through the App Server, even C, but in practice, most sites use the newest languages for their Codebase.

In our Amazon example, the Codebase is seen in the e-commerce functionality, and how reviews are handled and pictures are displayed.

That brings us to the Database.

Database

The Database (DB) is the last component of a server. Simply put, a DB is a structured set of data held in a computer, especially one that is accessible in various ways.

Most often, the DB holds data needed to make a site functional. For Amazon, the product description, images, and reviews are kept in a large database. When a particular product is chosen, the associated information and reviews are queried in the Database by the Application Server.

Together the DNS tells the browser what the IP address is, and the Client is connected to a Server via its IP address. Then the different server components work together to present a live, functional website.

Scalability

So far, we have covered the Domain Name System and Servers, but there are additional elements on the server end. These elements include the load balancer, SSL Certificate, and Firewall.

Once a site becomes more highly trafficked, it is time to add additional power to your site. This is done by adding more servers to the site. This helps increase the number of visits that the site can accommodate and helps keep downtime to a minimum when updating the Codebase (When one Server is done, a backup server is still hosting the site).

When a website is scaled up with more servers, additional concerns become balancing the workload between servers. Also, increased traffic means increased chances of infiltration by hackers, so security becomes paramount.

Load Balancer

A load balancer is used when one website (domain name or IP address) uses more than one Server. The load balancer is the front-facing program that acts as a filter between the Client, the Internet and the Server.

A load balancer acts as the “traffic cop” sitting in front of your servers and routing client requests across all servers, capable of fulfilling those requests in a manner that maximizes speed and capacity utilization and ensures that no one server is overworked, which could degrade performance. If a single server goes down, the load balancer redirects traffic to the remaining online servers. When a new server is added to the server group, the load balancer automatically starts to send requests to it.

Typical load balancer setup

Load balancers use special algorithms to direct traffic between servers. These are some of the algorithm designs used in load balancing:

Round Robin (sometimes called “Next in Loop”). Weighted Round Robin — as Round Robin, but some servers get a larger share of the overall traffic. Random. Source IP hash. Connections are distributed to backend servers based on the source IP address. Suppose a Webnode fails and is taken out of service; the distribution changes. As long as all servers are running, a given client IP address will always go to the same web server. URL hash. Much like source IP hash, except hashing is done on the URL of the request. Useful when load balancing in front of proxy caches, as requests for a given object will always go to just one backend cache. This avoids cache duplication, having the same object stored in several / all caches, and increases the effective capacity of the backend caches. Least connections, weighted least connections. The load balancer monitors the number of open connections for each Server and sends it to the least busy Server. The least traffic weighted the least traffic. The load balancer monitors the bitrate from each Server and sends it to the Server that has the least outgoing traffic. Least latency. Perlbal makes a quick HTTP OPTIONS request to backend servers and sends the request to the first Server to answer. Whichever algorithm is used, the load balancer’s job is the same: schedule the workload of servers.

Sources

https://www.cloudflare.com/learning/dns/what-is-dns/

What Is Load Balancing? How Load Balancers Work

Load balancing refers to efficiently distributing incoming network traffic across a group of backend servers, also…

www.nginx.com

Database

A database is an organized collection of data, generally stored and accessed electronically from a computer system…

en.wikipedia.org

Server (computing)

In computing, a server is a computer program or a device that provides functionality for other programs or devices…

en.wikipedia.org

What is a DNS Resolver?

A DNS resolver, also known as a resolver, is a server on the Internet that converts domain names into IP addresses…

www.computerhope.com

What kind of load-balancing algorithms are there

The most common load balancing algorithms for HTTP load balancers are IMHO: Round Robin (sometimes called “Next in…

serverfault.com