How to Inspect the HTML DOM and View the Source Code of Any Website

2020-10-21 19:10:24 | #programming #python #automation | Part 2 of 3

Tested On

  • Linux Ubuntu 20.04
  • Windows 10
  • macOS Catalina

Websites are composed of HTML (HyperText Markup Language; data), CSS (Cascading Style Sheets; styling), and JS (JavaScript; dynamic code). This means that most of what you see rendered in the browser can be extracted quite easily.

Understanding the structure of a web page's HTML code gives you the ability to view and extract website content. This is the basic idea behind web scraping, where you write a program that navigates to pages of a website and targets specific elements containing useful data.

In its most basic form, a web page is just an HTML file that contains static data and metadata, arranged in the following tree-like structure of nodes:

<html>
  <head>
    <title>Web Page Title</title>
  </head>
  <body>
    <section class="container">
      <h1 id="main-title">Title</h1>
      <h2>Subtitle</h2>
      <p>Hello world!</p>
    </section>
  </body>
</html>

If you saved this to a file called index.html on a public server with an IP address or domain name (www.somesite.com), you'd have a website up and running. It would be a very basic looking web page, but HTML gets the job done because it contains enough information for the browser to render something to the screen.

<html> is the top-level DOM (Document Object Model) node/tag/element that contains all other child nodes. "Document" refers to the web page, and "Object Model" refers to the nodes (a data model of nodes/objects). At minimum, a web page requires an <html> tag, a <head> tag, and a <body> tag. Everything in the <head> tag is processed before the page is displayed on the screen, and everything in the <body> tag is processed during/after.
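The parent/child nesting described above can be made visible with a few lines of Python. This is a minimal sketch that uses only the standard library's html.parser module; the TreePrinter class, the outline function, and the SAMPLE_PAGE constant are names made up for this illustration, and the sketch assumes every tag is explicitly closed, as in the sample page.

```python
from html.parser import HTMLParser

# The sample page from earlier in this tutorial.
SAMPLE_PAGE = """<html>
  <head>
    <title>Web Page Title</title>
  </head>
  <body>
    <section class="container">
      <h1 id="main-title">Title</h1>
      <h2>Subtitle</h2>
      <p>Hello world!</p>
    </section>
  </body>
</html>"""


class TreePrinter(HTMLParser):
    """Collects one indented line per opening tag to show nesting depth."""

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.lines = []

    def handle_starttag(self, tag, attrs):
        # Indent by two spaces per level of nesting.
        self.lines.append("  " * self.depth + tag)
        self.depth += 1

    def handle_endtag(self, tag):
        # Assumes each closing tag matches an earlier opening tag.
        self.depth -= 1


def outline(html_text):
    """Return the indented tag outline of an HTML string."""
    parser = TreePrinter()
    parser.feed(html_text)
    parser.close()
    return parser.lines


tree_lines = outline(SAMPLE_PAGE)
print("\n".join(tree_lines))
```

Running it prints html at the left margin, with head and body one level in and their children further in again. Real pages often contain unclosed tags such as <br> or <img>, which this depth counter does not account for.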

Each tag has its own set of standards for what can be added as an attribute or nested within. Here are some examples:

  • The <title> tag can only contain text, and gets rendered to the web page's browser tab and in search results.
  • Tags in the <body> section can be given attributes, such as id="main-title" in the <h1> tag, and class="container" in the <section> tag.
  • <section> and <div> tags can contain child tags, such as <h1>, <h2>, <p>, <ul>, etc.
  • A <p> tag cannot contain a <ul> tag.

There are many more rules in the HTML Living Standard, if you'd like to review them. For this tutorial, we just need to know enough to read and target DOM elements, not write them.
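As a rough illustration of what "targeting" an element means, the sample page can be loaded into a tree and searched by its id and class attributes. This sketch leans on the fact that the sample page happens to be well-formed XML, so the standard library's xml.etree.ElementTree can parse it; most real-world pages are not, and would need a dedicated HTML parser instead. The variable names are chosen for this example only.

```python
import xml.etree.ElementTree as ET

# The sample page from earlier in this tutorial (well-formed, so XML parsing works).
SAMPLE_PAGE = """<html>
  <head>
    <title>Web Page Title</title>
  </head>
  <body>
    <section class="container">
      <h1 id="main-title">Title</h1>
      <h2>Subtitle</h2>
      <p>Hello world!</p>
    </section>
  </body>
</html>"""

root = ET.fromstring(SAMPLE_PAGE)

# Target a single element by its unique id attribute.
title = root.find(".//*[@id='main-title']")

# Target an element by class (this matches the exact attribute value only).
container = root.find(".//*[@class='container']")

# List the tags nested directly inside the container.
child_tags = [child.tag for child in container]

print(title.tag, title.text)
print(child_tags)
```

The same two questions (which element has this id, and what does this container hold) are what a scraper asks of a page, whichever parsing library it uses.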

Viewing a Website's Source Code

Web page with the Chrome browser inspector enabled, showing the HTML element that matches the element in the source code

A good way to get familiar with HTML is to open any website and view its source code. Instructions for each browser are listed below.

Chrome

  1. Right-click a blank part of the web page
  2. Click "View page source"

Firefox

  1. Right-click a blank part of the web page
  2. Click "View Page Source"

Safari

  1. Select "Safari" in the macOS menu bar
  2. Select "Preferences"
  3. Under "Advanced", select "Show Develop menu in menu bar"
  4. Under the now-visible "Develop" menu, select "Show Page Source"

Edge

  1. Right-click a blank part of the web page
  2. Click "View source"
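The same source text can also be retrieved from a script. The sketch below uses urllib.request from Python's standard library; fetch_source is a hypothetical helper written for this example. To keep the demo runnable without network access, it saves a tiny page to a temporary file and reads it back through a file:// URL, on the assumption that an http:// or https:// address would be passed in the same way.

```python
import tempfile
from pathlib import Path
from urllib.request import urlopen


def fetch_source(url):
    """Return the raw text that a server (or local file) sends back for a URL."""
    with urlopen(url) as response:
        return response.read().decode("utf-8", errors="replace")


# Demo without network access: write a small page to disk, then read it via file://.
with tempfile.TemporaryDirectory() as tmp_dir:
    page_path = Path(tmp_dir) / "index.html"
    page_path.write_text(
        "<html><head><title>Web Page Title</title></head><body></body></html>",
        encoding="utf-8",
    )
    source = fetch_source(page_path.as_uri())

print(source)
```

Like "View page source", this returns the HTML as delivered, before any JavaScript on the page has had a chance to modify it.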

HTML Tags

You should see a white page with hundreds of HTML nodes. You're already familiar with <html>, <head>, and <body>; some of the other most common tags are explained below.

Tag | Description
<meta> | Metadata for search engines, social sharing, RSS readers, etc.
<script> | Script that either contains inline JavaScript code or imports it from an external file
<main> | Contains the main content of a document. Pages should only have one main tag.
<div> | Container element
<section> | Container element
<h1> - <h6> | Headings/titles of various sizes
<p> | Paragraph
<ul> | Unordered list (bullets)
<ol> | Ordered list (numbered)
<li> | List item
<table> | Table
<form> | Form that can submit form data to a back-end service/API
<input> | Input field that can accept text, numbers, emails, etc., depending on its type attribute
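One way to get a feel for which tags a page relies on is to count them. The following is a small sketch built on the standard library's html.parser and collections.Counter; the TagCounter class, the count_tags function, and the SAMPLE markup are invented for this example and are not taken from any particular site.

```python
from collections import Counter
from html.parser import HTMLParser

# A made-up fragment that uses several of the tags from the table above.
SAMPLE = """<main>
  <h1>Groceries</h1>
  <ul>
    <li>Milk</li>
    <li>Eggs</li>
  </ul>
  <form>
    <input type="text">
    <input type="email">
  </form>
</main>"""


class TagCounter(HTMLParser):
    """Counts how many times each opening tag appears."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        self.counts[tag] += 1


def count_tags(html_text):
    """Return a Counter mapping tag names to their number of occurrences."""
    parser = TagCounter()
    parser.feed(html_text)
    parser.close()
    return parser.counts


tag_counts = count_tags(SAMPLE)
for tag, count in tag_counts.most_common():
    print(tag, count)
```

Feeding the function the source of a real page, instead of SAMPLE, would give a quick summary of how that page is built.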

HTML Attributes

Attributes define metadata for an HTML tag. This metadata could be a unique identifier or a type, among other things.

Attribute | Description
id | The unique identifier of an HTML tag. No two HTML tags on a page should have matching IDs.
class | A category given to an HTML tag. Multiple HTML tags can belong to the same class.
src | Points to an external resource, such as an image or JavaScript file. (CSS files are linked with the href attribute instead.)
type | Defines an input tag's type (text, number, email, password, etc.)
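To see these attributes from a program's point of view, the sketch below walks a small fragment and records every id, class, src, and type it finds. It uses only html.parser from the standard library; AttributeCollector and the SAMPLE fragment (including the /images/logo.png path) are hypothetical and exist only for this illustration.

```python
from html.parser import HTMLParser

# A made-up fragment that uses each attribute from the table above.
SAMPLE = """<section class="container">
  <h1 id="main-title" class="heading">Title</h1>
  <img src="/images/logo.png">
  <input type="email">
  <input type="password">
</section>"""


class AttributeCollector(HTMLParser):
    """Records (tag, attribute, value) for a chosen set of attribute names."""

    def __init__(self, wanted):
        super().__init__()
        self.wanted = set(wanted)
        self.found = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the current tag.
        for name, value in attrs:
            if name in self.wanted:
                self.found.append((tag, name, value))


collector = AttributeCollector(["id", "class", "src", "type"])
collector.feed(SAMPLE)
collector.close()

for tag, name, value in collector.found:
    print(f"<{tag}> {name}={value!r}")
```

A scraper typically uses exactly these values, most often id and class, to decide which elements hold the data it is after.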

Targeting Individual Elements

Browsers also give you the power to hand-pick individual elements for analysis. Using Chrome, Safari, Firefox, or Edge, you can right-click an element on the page and select "Inspect" (labeled "Inspect Element" in some browsers), which opens the browser's developer tools with the target element highlighted in relation to all other elements. Chrome's browser inspector looks like the following:

Example source code for a web page
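The way the inspector places an element in relation to its ancestors can be approximated in Python. This is only a sketch of the idea, not how any browser implements it: it keeps a stack of currently open tags while parsing and records the stack when it reaches the element with the requested id. PathFinder and path_to are names invented for this example, and the sketch assumes every tag is explicitly closed, as in the sample page.

```python
from html.parser import HTMLParser

# The sample page from earlier in this tutorial.
SAMPLE_PAGE = """<html>
  <head>
    <title>Web Page Title</title>
  </head>
  <body>
    <section class="container">
      <h1 id="main-title">Title</h1>
      <h2>Subtitle</h2>
      <p>Hello world!</p>
    </section>
  </body>
</html>"""


class PathFinder(HTMLParser):
    """Tracks open tags on a stack and records the path to a target id."""

    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id
        self.stack = []
        self.path = None

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        # Build a CSS-like label, e.g. h1#main-title or section.container.
        label = tag
        if attributes.get("id"):
            label += "#" + attributes["id"]
        if attributes.get("class"):
            label += "." + ".".join(attributes["class"].split())
        self.stack.append(label)
        if attributes.get("id") == self.target_id:
            self.path = " > ".join(self.stack)

    def handle_endtag(self, tag):
        # Assumes each closing tag matches the most recently opened tag.
        if self.stack:
            self.stack.pop()


def path_to(html_text, target_id):
    """Return the ancestor path to the element with target_id, or None."""
    finder = PathFinder(target_id)
    finder.feed(html_text)
    finder.close()
    return finder.path


element_path = path_to(SAMPLE_PAGE, "main-title")
print(element_path)
```

For the sample page this prints html > body > section.container > h1#main-title, which is similar in spirit to the breadcrumb trail shown in a browser's inspector.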

Conclusion

Understanding how to navigate the HTML DOM tree is an invaluable skill for web designers, developers, QA, automation engineers, and other disciplines within IT. We hope this tutorial was useful. Subscribe to get notified of future tutorials, where you'll learn how to automate specific operations, such as clicks, keyboard input, form submissions, and so on.

If you're interested in programs that carry out your computer tasks for you, take our Automation the Easy Way with Python course. This course teaches CSV and Excel file generation, API requests, website scraping, email delivery, task scheduling, and browser click, mouse, and keyboard automation. Automate your daily tasks, free up time, and get ahead, today.
