Requests Tutorial



Introduction

Requests is an elegant and simple HTTP library for Python, built for human beings ("HTTP for Humans").

The Requests module in Python lets you make HTTP requests so that you can interact with any website or API directly from your Python app.

GET Request

  • To use the Requests Python module, we first have to install it:
pip install requests
  • To make a simple GET request to a website, use this code:
import requests

url = "https://www.example.com"
response = requests.get(url)
  • To show the result, add:
# it will show the response object with its HTTP status code, e.g. <Response [200]>
print(response)

HTTP Status Codes

HTTP status code ranges:

Code  Meaning
2XX   Success
3XX   Redirection
4XX   Client Errors
5XX   Server Errors
  • To get the status code attribute of the response object:
print(response.status_code)
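
As a quick aside (not strictly needed for this tutorial), the response object also exposes an ok attribute, which is True for any status code below 400. A minimal sketch:

import requests

response = requests.get("https://www.example.com")
# response.ok is True for 2XX/3XX status codes, False for 4XX/5XX
if response.ok:
    print("Request succeeded:", response.status_code)
else:
    print("Request failed:", response.status_code)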

Request Content

Once you know your request is successful, you will probably next want to access the actual content from the endpoint.

  • To show the response content, we use the content attribute:
import requests

response = requests.get("https://www.example.com")
print(response.content)
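
A related detail worth knowing: content gives you the body as raw bytes, while the text attribute gives the same body decoded to a string. A small sketch illustrating the difference:

import requests

response = requests.get("https://www.example.com")
# content is the raw bytes of the body, text is the decoded string
print(type(response.content))  # <class 'bytes'>
print(type(response.text))     # <class 'str'>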

POST Request

GET requests are useful if you just want to read some information on a website. But if you want to send data to a website, for example, to create a post or submit a blog entry, you need to use a POST request. Example:

data = {"name": "Salah", "message": "Hello!"}
url = "https://httpbin.org/post"

response = requests.post(url, json=data)

https://httpbin.org is a website that gives a bunch of endpoints that we can use for testing HTTP requests.

  • To view the JSON response data, use the json() method on the response object:
response_data = response.json()
# Shows the data as a dictionary
print(response_data)
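
Note that json=data sends the payload as a JSON body. If the endpoint expects classic form-encoded data instead, you can pass the dictionary through the data parameter; here is a small sketch against the same httpbin test endpoint:

import requests

# form-encoded POST; httpbin echoes form fields back under the "form" key
response = requests.post("https://httpbin.org/post", data={"name": "Salah"})
print(response.json()["form"])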

Handling Errors

When you work with HTTP requests, not everything will always go smoothly. It’s important to handle errors properly so that your app knows what to do next.

  • To check for error codes using the status code:
import requests

# here we use an endpoint that always gives a 404 status error
response = requests.get("https://httpbin.org/status/404")
# if status code is not 200 (successful response), then show error message
if response.status_code != 200:
    print(f"HTTP Error: {response.status_code}")

Setting a Timeout

An app making an HTTP request can get stuck because of a network issue if you haven’t set a timeout on that request. A timeout forces the request to fail if the server doesn’t respond within a given number of seconds. By default, the timeout value is set to None, which means the request will wait forever. This is not ideal, because most HTTP requests shouldn’t take longer than a few seconds to process.

  • To set a timeout for your request, you can pass the timeout parameter to the request method:
url = "https://httpbin.org/delay/10"

try:
    response = requests.get(url, timeout=5)
except requests.exceptions.Timeout as err:
    print(err)

If you change the delay to 2 seconds, the request will succeed. Most of the time, it is better to fail explicitly than to wait indefinitely without knowing what is going on with the request.
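
If you need finer control, the timeout parameter also accepts a (connect, read) tuple, so you can fail fast on connection problems while allowing a slower response. A short sketch (the delay endpoint and the values are just examples):

import requests

url = "https://httpbin.org/delay/2"

try:
    # wait at most 3 seconds to connect and 6 seconds for the server to respond
    response = requests.get(url, timeout=(3, 6))
    print(response.status_code)
except requests.exceptions.Timeout as err:
    print(err)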

HTTP Request Headers

Setting headers in a request allows you to include additional information in the HTTP request. Headers are typically used for authentication or for indicating what type of data will be sent or is expected in return.

  • To set headers, you can create a dictionary with the headers values you want to use:
auth_token = "XXXXXXXX"

# here we set the authorization header with the 'bearer token' for authentication purposes.
headers = {
    "Authorization": f"Bearer {auth_token}"
}

url = "https://httpbin.org/headers"
response = requests.get(url, headers=headers)
print(response.json())
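
If you make many requests with the same headers, a requests.Session lets you set them once and reuse them on every request made through the session. A sketch of that pattern (the token is a placeholder):

import requests

session = requests.Session()
# headers set on the session are sent with every request made through it
session.headers.update({"Authorization": "Bearer XXXXXXXX"})

response = session.get("https://httpbin.org/headers")
print(response.json())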

Web Scraping with BeautifulSoup

Now that you have learned the basics of making HTTP requests with the requests module, you can use it for web scraping.

Web scraping is the process of extracting data directly from websites. You can use it to scrape anything from financial data to job posts, e-commerce listings, and so on.

  • To scrape a website, you can start by using a GET request to retrieve the raw HTML content of the page:
import requests

url = "https://www.example.com"
# this returns the page's raw HTML (including any inline JavaScript and CSS)
response = requests.get(url)

But the problem is that the response is pretty hard to work with in its raw HTML form. For example, if you want to know the title of the page, its text content, or what links it contains and where they lead, you first need to parse this information into a data structure that is easier to work with.

  • To do that, we will use another module called BeautifulSoup:
pip install beautifulsoup4
  • Once installed, you can make a GET request to retrieve the HTML content, then turn it into a soup object, a data structure that makes searching and traversing the HTML much easier:
import requests
from bs4 import BeautifulSoup

url = "https://www.example.com"
response = requests.get(url)
soup = BeautifulSoup(response.content, "html.parser")
  • Here is a basic example of how you can get the title, the first paragraph, and the links available on the page:
title = soup.title.text
content = soup.find("p").text
links = [a["href"] for a in soup.find_all("a", href=True)]

print(title, content, links)

You can customize this code to scrape different websites and different types of data based on things like element IDs, tag types, class names, and so on.
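
For example, here is a small sketch of looking elements up by ID and by class name; the HTML snippet and the id/class names are made up purely for illustration:

from bs4 import BeautifulSoup

# a tiny made-up HTML document, just to demonstrate id and class lookups
html = '<div id="main"><p class="intro">Hello</p><p class="intro">World</p></div>'
soup = BeautifulSoup(html, "html.parser")

main_div = soup.find(id="main")                         # look up by element ID
intro_paragraphs = soup.find_all("p", class_="intro")   # look up by class name
print(main_div.name, [p.text for p in intro_paragraphs])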

Requests vs urllib

Python has a built-in module called urllib that you can also use to make HTTP requests. The main difference between it and the requests module is the level of abstraction they offer, which affects how easy they are to use.

  • Here’s an example post request in urllib:
import urllib.request
import urllib.parse

data = urllib.parse.urlencode({"key": "value"}).encode("utf-8")
req = urllib.request.Request("https://www.example.com", data=data, method="POST")
with urllib.request.urlopen(req) as response:
    html = response.read().decode("utf-8")
print(html)
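
For comparison, a sketch of the equivalent POST with requests (same placeholder URL and data) takes a single call:

import requests

# the same form-encoded POST, handled in one call by requests
response = requests.post("https://www.example.com", data={"key": "value"})
print(response.text)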
  • requests and urllib comparison:
Feature       requests  urllib
Ease of Use   🟢        🔴
Built-in      🔴        🟢

By Wahid Hamdi