From the course: AWS Certified DevOps Engineer Professional (DOP-C02) Cert Prep
CloudFront overview
- [Instructor] In this lecture, we'll talk about Amazon CloudFront and the core concepts behind content delivery networks and edge locations. We will also cover its different components and other features. Basically, Amazon CloudFront is a content delivery network service. A content delivery network, also known as a CDN, is technically a network of servers and data centers scattered around the world that temporarily store the pictures, videos, documents, and other static files of your website. The process of temporarily storing these static files is called caching, which improves the performance of web applications that are accessed by users globally. CloudFront can significantly decrease the latency of delivering your web content to your end users. In other words, it reduces the lag time of loading your website. Before we dig deeper into Amazon CloudFront, let's discover why we need a CDN in the first place and how it works. This will give us a strong foundation in the underlying technology used by CloudFront. So what really is a content delivery network? Simply put, a CDN is a network of data centers that delivers content to the viewers or users of your website. Imagine it this way. Suppose you have a server running somewhere on the West Coast, specifically in Los Angeles, California. Your server in LA hosts a website that shares travel photos, videos, and other high-resolution static media files with millions of users in Asia. The photos and videos are viewed and downloaded by your customers in the Philippines, India, Taiwan, Singapore, Indonesia, Malaysia, and other Asian countries. The question is, how quickly can these high-resolution travel photos be transmitted and viewed by your users? We have to consider the size of these files and the distance between the source server and its viewers. For example, there's a user named Stacy who is located in Manila, Philippines. She visits your website by typing the domain name in her web browser.
At this point, the journey of her HTTP request to your server begins. The request will start from her laptop, then go to her home wifi router, and then pass through the different routers of her local internet service provider, or ISP. Now, since your server is located in California, how will her request reach the United States if it originated from the Philippines? There are roughly two ways for her HTTP request to reach your server in LA: either by satellite or by sea. If Stacy has satellite-based internet access like Starlink, then her traffic will be beamed from her antenna dish to a SpaceX satellite that's orbiting the earth. The traffic will then be routed to the US network, and it'll be forwarded continuously until it reaches your host server in Los Angeles. If Stacy is using a traditional internet service provider, then her traffic may traverse different routers and networking components across the Pacific Ocean via undersea internet cables, which are also known as submarine cables. There's a series of undersea cables that run through the Pacific Ocean, called Transpac, or trans-Pacific cables, linking Asia to the mainland United States. You have the Asia-America Gateway, or AAG, the Pacific Light Cable Network, or PLCN, the Tata TGN-Intra Asia Cable System, or TGN-IA, the Southern Cross Cable, and many others. These submarine cables span tens of thousands of kilometers in length. So from the Philippines, the request might travel to Guam, then to Hawaii, then to the United States West Coast until it reaches your server in LA. That distance is over 10,000 kilometers, or about 7,000 miles. This may cause several seconds of delay in fetching the content from your LA server back to your user in Manila. For photos, the lag time could be a few seconds, but for high-quality videos, it may take several minutes to load before Stacy can view the content. This poor level of performance is definitely not desirable and may cause you to lose clients or high-value online transactions.
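The delay described above can be sanity-checked with a quick back-of-the-envelope calculation. This sketch assumes light travels through optical fiber at roughly 200,000 km/s (about two-thirds of its speed in a vacuum), and the request count is purely illustrative; the point is that even the raw propagation delay adds up once a page load requires many sequential round trips.

```python
# Rough propagation delay over ~10,000 km of undersea fiber.
# Assumption: signals in fiber travel at about 200,000 km/s.
distance_km = 10_000
fiber_speed_km_per_s = 200_000

one_way_ms = distance_km / fiber_speed_km_per_s * 1000   # 50.0 ms one way
round_trip_ms = 2 * one_way_ms                           # 100.0 ms round trip

# A page load is many sequential round trips (TLS handshake, HTML,
# then each photo). 40 is an illustrative number of requests.
sequential_requests = 40
min_total_seconds = sequential_requests * round_trip_ms / 1000  # 4.0 s

print(one_way_ms, round_trip_ms, min_total_seconds)
```

A single round trip is only about a tenth of a second, but dozens of sequential requests multiply it, which is how whole seconds of lag accumulate even before file sizes and bandwidth limits are considered.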
The same thing goes if you have a user in Europe who tries to access your website. The HTTP request will have to travel a long way to deliver the content to the user. There is a series of trans-Atlantic submarine cables, too, running through the Atlantic Ocean and connecting the United States to Ireland, the UK, France, Germany, and other European countries. The traffic may traverse the TAT fiber optic cable routes, which were built by joint ventures between a number of telecommunication companies, or private cable routes like MAREA, which is owned by Facebook, Microsoft, and Telefonica. Of course, Amazon and Google have their own undersea cables too. From your server in LA, the response traffic will have to travel to the Midwest, to New York, and all the way across the Atlantic before the media content is distributed to your European users who just want to view your travel photos and videos. This long geographic distance between your origin server and your users is a common root cause of why it takes a lot of time for your content to load and why your website is too slow. This is the primary reason why you need a content delivery network. With a CDN in place, user requests won't be traveling thousands of miles across the Pacific or the Atlantic Ocean just to fetch your data from your origin server. Your data can be stored, or cached, in a vast network of data centers and co-location facilities located in close proximity to your end users. These data centers are also called Points of Presence, or PoPs, which consist of edge locations and regional edge caches. A PoP is a physical location that acts as a demarcation point at which two or more networks share a connection. Going back to our example, let's suppose you set your travel website as the origin of an Amazon CloudFront distribution. Your photos and videos will eventually be stored in different edge locations worldwide, and not just in Los Angeles, California.
Your content could be cached in Indonesia, Malaysia, and even in the Philippines. In this way, users like Stacy will be able to view the media files in less than a second since the data is already available nearby. Essentially, an edge location is simply a site that Amazon CloudFront uses to cache or store copies of your content for faster delivery. Remember that this involves different internet service providers from various countries and territories. Each of these ISPs or private networks has an edge, or a boundary, where other networks can connect to it. An edge location in AWS is powered by a physical edge router located at a network boundary that enables an internal network to connect to external networks. The connection to other networks is done at an internet exchange point, or IXP. There are also edge servers that cache content and perform computations in close proximity to the users. An edge server is just a standard server or device that provides an entry point for a particular network to connect to a different network. These edge servers are used to cache static content, which effectively reduces the latency of your website. They can also cache dynamic content, such as the response data of API calls as well as WebSocket connections. Once again, Amazon CloudFront is a content delivery network service in AWS. You might get overwhelmed by the advanced networking concepts that we mentioned here, but don't worry. Just keep in mind that a CDN is primarily used to deliver the static and dynamic content of our websites faster. CloudFront is a network of data centers, points of presence, and edge locations that distributes content to the viewers of your website. Amazon CloudFront has three basic components, which are the origin, the distribution, and the viewer. Basically, an origin is where the content originates or comes from.
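The caching behavior described above can be illustrated with a toy model. This is not CloudFront's actual implementation, just a minimal sketch of the idea: the first request for an object goes back to the origin, and repeated requests within the cache's time-to-live (TTL) are served locally from the edge.

```python
class EdgeCache:
    """Toy in-memory cache illustrating how an edge server serves
    repeated requests locally instead of returning to the origin."""

    def __init__(self, ttl_seconds=60):
        self.ttl = ttl_seconds
        self.store = {}           # path -> (content, expiry timestamp)
        self.origin_fetches = 0   # counts slow trips back to the origin

    def fetch_from_origin(self, path):
        # Stand-in for a slow cross-ocean request to the origin server.
        self.origin_fetches += 1
        return f"content of {path}"

    def get(self, path, now):
        entry = self.store.get(path)
        if entry and entry[1] > now:
            return entry[0]                     # cache hit: served at the edge
        content = self.fetch_from_origin(path)  # cache miss: go to the origin
        self.store[path] = (content, now + self.ttl)
        return content

cache = EdgeCache(ttl_seconds=60)
cache.get("/photos/boracay.jpg", now=0)   # miss: fetched from the origin
cache.get("/photos/boracay.jpg", now=30)  # hit: served from the edge cache
```

After the first fetch, every request inside the TTL window is answered locally, which is exactly why Stacy's second page view loads in well under a second.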
The distribution is the actual AWS resource that you can launch and configure to control how your content is distributed to your users. Lastly, the viewers are the actual website visitors or users who view your content. In your CloudFront distribution, you can set an Amazon S3 bucket, an Elastic Load Balancer, an AWS Elemental service, or an HTTP server running in an Amazon EC2 instance or another external host as the origin. You can also set your on-premises server as the custom origin of your distribution. Amazon CloudFront has a lot of caching and security features that you can use to distribute your content effectively and securely. For example, if your origin is an Amazon S3 bucket, you can restrict access to the S3 content by using an origin access identity, or OAI. You can also restrict access to your content based on the country or geographic location of your viewers. This can be done with its geo restriction feature, which lets you control the distribution of your content at the country level. Amazon CloudFront has a feature called Lambda@Edge, which lets you do certain computations in proximity to your users and not just deliver static content. Lambda@Edge allows you to run custom code closer to the end users of your application to improve your application's performance and reduce latency. Since the traffic doesn't go to the underlying origin server, this saves you from high data transfer costs as well. You can also improve the availability of your web applications with CloudFront by adding two origins instead of one. You can set up an origin failover by creating an origin group that contains your primary and secondary origins. If your primary origin is unavailable, the traffic will fail over to the secondary origin. Serving private content to specific users is also possible with its signed URLs and signed cookies features. You can distribute your content securely via HTTPS by adding an SNI custom SSL certificate or a dedicated-IP custom SSL certificate.
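To make these pieces concrete, here is a trimmed sketch of what an origin failover setup might look like when creating a distribution with the AWS SDK for Python (boto3). The bucket names, IDs, and OAI path are hypothetical placeholders, and a real `create_distribution` call requires additional fields (a unique caller reference, cache policy settings, and so on); this only shows the shape of the origin group and geo restriction pieces discussed above.

```python
# Hypothetical names and IDs; the dict mirrors the shape of the
# DistributionConfig parameter of CloudFront's create_distribution API,
# trimmed down to the failover and geo restriction pieces.
distribution_config = {
    "CallerReference": "travel-site-failover-demo",
    "Comment": "Travel photo site with an origin group for failover",
    "Enabled": True,
    "Origins": {
        "Quantity": 2,
        "Items": [
            {
                "Id": "primary-s3",
                "DomainName": "travel-photos-primary.s3.amazonaws.com",
                # An OAI path here restricts direct S3 access (placeholder ID).
                "S3OriginConfig": {
                    "OriginAccessIdentity": "origin-access-identity/cloudfront/EXAMPLEID"
                },
            },
            {
                "Id": "secondary-s3",
                "DomainName": "travel-photos-backup.s3.amazonaws.com",
                "S3OriginConfig": {
                    "OriginAccessIdentity": "origin-access-identity/cloudfront/EXAMPLEID"
                },
            },
        ],
    },
    "OriginGroups": {
        "Quantity": 1,
        "Items": [
            {
                "Id": "failover-group",
                # Fail over to the secondary origin on these status codes.
                "FailoverCriteria": {
                    "StatusCodes": {"Quantity": 2, "Items": [500, 502]}
                },
                "Members": {
                    "Quantity": 2,
                    "Items": [
                        {"OriginId": "primary-s3"},
                        {"OriginId": "secondary-s3"},
                    ],
                },
            }
        ],
    },
    "DefaultCacheBehavior": {
        # Route the default behavior to the origin group, not a single origin.
        "TargetOriginId": "failover-group",
        "ViewerProtocolPolicy": "redirect-to-https",
    },
    # Country-level geo restriction: only viewers in these countries
    # (illustrative codes) can fetch the content.
    "Restrictions": {
        "GeoRestriction": {
            "RestrictionType": "whitelist",
            "Quantity": 2,
            "Items": ["PH", "SG"],
        }
    },
}
```

In a real script, you would pass this to `boto3.client("cloudfront").create_distribution(DistributionConfig=distribution_config)` after filling in the remaining required fields.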
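Similarly, Lambda@Edge functions are short handlers that CloudFront runs at the edge. This hypothetical viewer-request handler simply stamps a custom header onto each incoming request; real handlers commonly rewrite URLs, add security headers, or redirect viewers without ever reaching the origin. The event shape follows the Lambda@Edge viewer-request structure, though the sample event is heavily trimmed.

```python
def handler(event, context):
    # Lambda@Edge viewer-request events carry the request under
    # event["Records"][0]["cf"]["request"].
    request = event["Records"][0]["cf"]["request"]

    # Add a hypothetical custom header; CloudFront headers are lowercase
    # keys mapping to a list of {"key", "value"} pairs.
    request["headers"]["x-edge-processed"] = [
        {"key": "X-Edge-Processed", "value": "true"}
    ]

    # Returning the request tells CloudFront to continue processing it.
    return request

# Heavily trimmed sample viewer-request event for local experimentation.
sample_event = {
    "Records": [
        {"cf": {"request": {"uri": "/photos/boracay.jpg", "headers": {}}}}
    ]
}
result = handler(sample_event, None)
```

Because the function runs at the edge location nearest the viewer, this logic executes without a round trip to the origin server.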
AWS WAF can also be integrated with your CloudFront distribution to safeguard your application from common web vulnerabilities. CloudFront has a lot of other features that you can use for your workloads. We'll discuss these in detail in our succeeding lectures.