Requests Tutorial
- Introduction
- GET Request
- HTTP Status Codes
- Request Content
- POST Request
- Handling Errors
- Setting a Timeout
- HTTP Request Headers
- Web Scraping with BeautifulSoup
- Requests vs urllib
Introduction
Requests is an elegant and simple HTTP library for Python, built for human beings ("HTTP for Humans").
The Requests module in Python lets you make HTTP requests so that you can interact with any website or API directly from your Python app.
GET Request
- To use the Requests module in Python, we first have to install it:
pip install requests
- To make a simple GET request to a website, use this code:
import requests
url = "https://www.example.com"
response = requests.get(url)
- To show the result, add:
# this prints the response object, which includes the HTTP status code
print(response)
HTTP Status Codes
HTTP status codes are grouped by their first digit:
| Code | Meaning |
|---|---|
| 2XX | Success |
| 3XX | Redirection |
| 4XX | Client Errors |
| 5XX | Server Errors |
- To get the status code, use the `status_code` attribute of the response object:
print(response.status_code)
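Since the first digit tells you the category, you can branch on the range rather than on an exact code. A minimal sketch:
import requests
response = requests.get("https://www.example.com")
# branch on the status code range rather than an exact code
if 200 <= response.status_code < 300:
    print("Success")
elif 400 <= response.status_code < 500:
    print("Client error")
elif response.status_code >= 500:
    print("Server error")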
Request Content
Once you know your request is successful, you will probably next want to access the actual content from the endpoint.
- To show the response content, we use the `content` attribute:
import requests
response = requests.get("https://www.example.com")
print(response.content)
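Note that `content` gives you the raw bytes of the response body. If you want the body decoded as a string instead, the response object also has a `text` attribute:
# .content is the raw bytes; .text is the body decoded to a string
print(type(response.content))  # <class 'bytes'>
print(type(response.text))     # <class 'str'>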
POST Request
GET requests are useful if you just want to read some information from a website. But if you want to send data to a website, for example to create a post or submit a form, you need to use a POST request. Example:
data = {"name": "Salah", "message": "Hello!"}
url = "https://httpbin.org/post"
response = requests.post(url, json=data)
https://httpbin.org is a website that provides a bunch of endpoints we can use for testing HTTP requests.
- To view the JSON response data, use the `json()` method on the response object:
response_data = response.json()
# Shows the data as a dictionary
print(response_data)
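Here `json=data` sends the dictionary as a JSON body. If the endpoint expects a traditional form submission instead, you can pass the same dictionary through the `data=` parameter, which form-encodes it:
import requests
data = {"name": "Salah", "message": "Hello!"}
# data= sends the dictionary form-encoded (application/x-www-form-urlencoded)
response = requests.post("https://httpbin.org/post", data=data)
# httpbin echoes form fields back under the "form" key
print(response.json()["form"])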
Handling Errors
When you work with HTTP requests, things won't always go smoothly. It's important to handle errors properly so that your app knows what to do next.
- To check for error codes using the status code:
import requests
# here we use an endpoint that always gives a 404 status error
response = requests.get("https://httpbin.org/status/404")
# if status code is not 200 (successful response), then show error message
if response.status_code != 200:
    print(f"HTTP Error: {response.status_code}")
Setting a Timeout
An app making an HTTP request can get stuck because of a network issue if you haven't set a timeout on that request. A timeout forces a request to fail if it doesn't get a response after a number of seconds. By default, the timeout value is set to None, which means the request will wait forever. This is not good, because most HTTP requests shouldn't take longer than a few seconds to process.
- To set a timeout for your request, pass the `timeout` parameter to the request method:
url = "https://httpbin.org/delay/10"
try:
response = requests.get(url, timeout=5)
except requests.exceptions.Timeout as err:
print(err)
If you change the delay to 2 seconds, the request will succeed. Most of the time, it is better to fail explicitly rather than wait indefinitely without knowing what's going on with the request.
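The `timeout` parameter also accepts a tuple if you want separate limits for establishing the connection and for reading the response:
import requests
# (connect timeout, read timeout) in seconds
response = requests.get("https://httpbin.org/delay/2", timeout=(3.05, 10))
print(response.status_code)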
HTTP Request Headers
Setting headers in a request allows you to include additional information in the HTTP header. Headers are typically used for authentication or for indicating what type of data will be sent or is expected in return.
- To set headers, create a dictionary with the header values you want to use:
auth_token = "XXXXXXXX"
# here we set the authorization header with the 'bearer token' for authentication purposes.
headers = {
"Authorization": f"Bearer {auth_token}"
}
url = "https://httpbin.org/headers"
response = requests.get(url, headers=headers)
print(response.json())
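The same dictionary can carry any header you need. For example, some websites reject the default client identifier, so a common pattern is to send a custom `User-Agent` (the value below is just an illustration):
import requests
# hypothetical User-Agent string, for illustration only
headers = {"User-Agent": "my-app/1.0"}
response = requests.get("https://httpbin.org/headers", headers=headers)
# httpbin echoes the request headers back, so we can verify what was sent
print(response.json()["headers"])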
Web Scraping with BeautifulSoup
Now that you have learned the basics of making HTTP requests with the `requests` module, you can use it for web scraping.
Web scraping is the process of extracting data directly from websites. You can use it to scrape anything from financial data to job posts to e-commerce listings, and so on.
- To scrape a website, start by making a GET request to retrieve the raw HTML content of the page:
import requests
url = "https://www.example.com"
# this will get all the HTML, JavaScript, and CSS code
response = requests.get(url)
The problem is that this content is pretty hard to read in its raw HTML form. For example, if you want to know the title of the page, its text content, or which links it contains and where they lead, you first need to parse all of this information into a data structure that is easier to work with.
- To do that, we will use another module called `BeautifulSoup`:
pip install beautifulsoup4
- Once installed, you can make a GET request to retrieve the HTML content, then turn it into a soup object, a data structure that makes searching and traversing the HTML content much easier:
import requests
from bs4 import BeautifulSoup
url = "https://www.example.com"
response = requests.get(url)
soup = BeautifulSoup(response.content, "html.parser")
- Here is a basic example of how you can get the title, the first paragraph of text, and the links available on the page:
title = soup.title.text
content = soup.find("p").text
links = [a["href"] for a in soup.find_all("a")]
print(title, content, links)
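You can also narrow your searches by tag attributes. For example, `find_all` accepts a `class_` keyword to match elements by CSS class; the class name below is hypothetical, so inspect the page you are scraping to find the real one:
# "article-title" is a hypothetical class name used for illustration
for heading in soup.find_all("h2", class_="article-title"):
    print(heading.text.strip())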
You can customize this code to scrape different websites and different types of data based on things like element IDs, tag types, class names, and so on.
Requests vs urllib
Python has a built-in module called `urllib` that you can also use to make requests. The main difference between it and the `requests` module is the level of abstraction they offer, which impacts how easy they are to use.
- Here's an example POST request in `urllib`:
import urllib.request
import urllib.parse
data = urllib.parse.urlencode({"key": "value"}).encode("utf-8")
req = urllib.request.Request("https://httpbin.org/post", data=data, method="POST")
with urllib.request.urlopen(req) as response:
    html = response.read().decode("utf-8")
    print(html)
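For comparison, here is roughly the equivalent request using the `requests` module; the higher level of abstraction handles the form encoding and decoding for you:
import requests
# requests handles the form encoding and decoding internally
response = requests.post("https://httpbin.org/post", data={"key": "value"})
print(response.text)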
- `requests` and `urllib` comparison:
| Feature | requests | urllib |
|---|---|---|
| Ease of Use | 🟢 | 🔴 |
| Built-in | 🔴 | 🟢 |
By Wahid Hamdi