Accessing password-protected pages
Sometimes a web page is not open to the public but is protected in some way. The simplest form of protection is basic HTTP authentication, which is supported by virtually every web server and implements a user/password scheme.
Getting ready
We can test this kind of authentication in https://httpbin.org.
It has a path, /basic-auth/{user}/{password}, which forces authentication using the user and password stated in the path. This is very handy for understanding how authentication works.
How to do it...
- Import requests:
  >>> import requests
- Make a GET request to the URL with the right credentials. Notice that we set the credentials on the URL to be user and psswd:
  >>> requests.get('https://httpbin.org/basic-auth/user/psswd', auth=('user', 'psswd'))
  <Response [200]>
- Use the wrong credentials to return a 401 status code (unauthorized):
  >>> requests.get('https://httpbin.org/basic-auth/user/psswd', auth=('user', 'wrong'))
  <Response [401]>
- The credentials can also be passed directly as part of the URL, separated by a colon and followed by an @ symbol before the server, like this:
  >>> requests.get('https://user:psswd@httpbin.org/basic-auth/user/psswd')
  <Response [200]>
  >>> requests.get('https://user:wrong@httpbin.org/basic-auth/user/psswd')
  <Response [401]>
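The URL form used in the last step follows the usual user:password@host layout. As a minimal offline sketch, Python's standard urllib.parse module (which this recipe does not otherwise use) can split such a URL to show which parts are the credentials and which are the server and path:

```python
from urllib.parse import urlsplit

# Same URL as in the last step; nothing is sent over the network here.
url = 'https://user:psswd@httpbin.org/basic-auth/user/psswd'
parts = urlsplit(url)

print(parts.username)  # user
print(parts.password)  # psswd
print(parts.hostname)  # httpbin.org
print(parts.path)      # /basic-auth/user/psswd
```

Note that a password containing characters such as @ or : would need to be percent-encoded before being placed in a URL, which is one reason the auth parameter is usually the more convenient option.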
How it works...
As HTTP basic authentication is supported everywhere, using it from requests is very easy.
Steps 2 and 4 in the How to do it… section show how to provide the proper password, through the auth parameter and through the URL, respectively. Step 3 shows what happens when the password is wrong.
Remember to always use HTTPS to ensure that the password is kept secret in transit. If you use HTTP, the password will be sent in the open over the internet, where it can be captured by anyone listening on the network.
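To see why HTTPS matters here, it helps to look at what basic authentication actually sends. The following sketch uses only the standard library and assumes the user and psswd credentials from this recipe. It builds the Authorization header by hand, in the way the basic scheme defines it, and then reverses it:

```python
import base64

user, password = 'user', 'psswd'

# Basic authentication sends "user:password" encoded in Base64.
token = base64.b64encode(f'{user}:{password}'.encode('utf-8')).decode('ascii')
header = f'Basic {token}'
print(header)  # Basic dXNlcjpwc3N3ZA==

# Base64 is an encoding, not encryption: anyone who sees the header
# can recover the credentials.
decoded = base64.b64decode(token).decode('utf-8')
print(decoded)  # user:psswd
```

Since the header can be decoded this easily, the only thing protecting the password in transit is the encrypted HTTPS connection.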
There's more...
Adding the user and password to the URL works in the browser as well. Try to access the page directly to see a box asking for the username and password:
Figure 3.9: User credentials page
When using a URL containing the user and password, https://user:psswd@httpbin.org/basic-auth/user/psswd, the dialog does not appear, and it authenticates automatically.
If you need to access multiple pages, you can create a session in requests and set the authentication parameters to avoid having to input them everywhere:
>>> s = requests.Session()
>>> s.auth = ('user', 'psswd')
>>> s.get('https://httpbin.org/basic-auth/user/psswd')
<Response [200]>
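As a rough offline check that the session attaches the credentials to each request, the sketch below prepares requests without sending them and inspects the resulting header. The second URL, with the /headers path, is only an illustrative assumption chosen to stand in for another page on the same server:

```python
import requests

s = requests.Session()
s.auth = ('user', 'psswd')

# Illustrative URLs on the same host; the second one is an assumption
# chosen only to show that the header is added to every request.
urls = [
    'https://httpbin.org/basic-auth/user/psswd',
    'https://httpbin.org/headers',
]

# prepare_request() builds the request without sending it, so this
# runs without network access.
auth_headers = []
for url in urls:
    prepared = s.prepare_request(requests.Request('GET', url))
    auth_headers.append(prepared.headers['Authorization'])
    print(url, '->', prepared.headers['Authorization'])
```

Because the same header is attached to every request prepared through the session, it is sensible to use such a session only for servers that are meant to receive those credentials.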
See also
- The Downloading web pages recipe, earlier in this chapter, to learn the basics of requesting web pages.
- The Accessing web APIs recipe, earlier in this chapter, to learn how to access APIs that are behind an authentication wall.