HTTPでWebのコンテンツやWebページを取得する

概要

PythonでHTTPを利用してWebページを取得する場合には、urllib urllib.request を利用します。

書式

アノテーションありの場合

req: urllib.request.Request = urllib.request.Request([取得するページのURL])
response: HTTPResponse = urllib.request.urlopen(req)
htmltext: str = resp.read().decode('utf-8')

アノテーションなしの場合

req = urllib.request.Request([取得するページのURL])
resp = urllib.request.urlopen(req)
htmltext = resp.read().decode('utf-8')

プログラム

コード例

下記のPythonコードを作成します。

import urllib.request
from http.client import HTTPResponse

url: str = 'https://www.ipentec.com'
req: urllib.request.Request = urllib.request.Request(url)
resp: HTTPResponse = urllib.request.urlopen(req)

htmltext: str = resp.read().decode('utf-8')
print(htmltext)

resp.close()

解説

importを利用してライブラリを参照します。HTTP接続を利用してWebページを取得するurllibを参照します。

import urllib.request
from http.client import HTTPResponse

取得するWebページのURLを用意し、urllib.request.Request メソッドを呼び出し Requestオブジェクトを取得します。

url: str = 'https://www.ipentec.com'
req: urllib.request.Request = urllib.request.Request(url)

urllib.request.urlopen を呼び出します。引数には先ほど取得したRequestオブジェクトを与えます。戻り値はWebサーバーからのレスポンスを格納する Response オブジェクトになります。

resp: HTTPResponse = urllib.request.urlopen(req)

Responseオブジェクトのread()メソッドを呼び出すとHTTPで取得したコンテンツを取得できます。今回はUTF-8でデコードするため、decodeメソッドも呼び出して文字をデコードします。デコードしたページの文字列をhtmltext変数に代入します。その後 print 文で取得したWebのページを画面に表示します。

htmltext: str = resp.read().decode('utf-8')
print(htmltext)

実行結果

プロジェクトを実行します。実行するとすぐに https://www.ipentec.com にアクセスしトップページのHTML情報をダウンロードし画面にコードを表示できています。
HTTPでWebのコンテンツやWebページを取得する:画像1

参考:アノテーションなしのコード

import urllib.request

url = 'https://www.ipentec.com'
req = urllib.request.Request(url)
resp = urllib.request.urlopen(req)

htmltext = resp.read().decode('utf-8')
print(htmltext)

resp.close()

著者

Penta

iPentecのメインプログラマー
C#, ASP.NET の開発がメイン、少し前まではDelphiを愛用

作成日: 2020-02-04

目次

HTTPでWebのコンテンツやWebページを取得する

目次

概要

書式

プログラム

コード例

解説

実行結果

参考:アノテーションなしのコード

関連するページ