A Basic Tutorial on Using requests
Using requests
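I. Basic usage
1. Preparation: install the requests library, either with pip (pip install requests) or from within PyCharm.
2. A first example: the methods in the requests library are clear and simple. Fetching a web page takes nothing more than a call to the get method:
Code: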
import requests
response = requests.get('http://www.baidu.com/')
print(type(response))
print(response.status_code)
print(type(response.text))
print(response.cookies)
Output:
<class 'requests.models.Response'>
200
<class 'str'>
<RequestsCookieJar[<Cookie BDORZ=27315 for .baidu.com/>]>
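The other request methods are just as simple to use in requests:
r = requests.post('http://httpbin.org/post')
r = requests.put('http://httpbin.org/put')
r = requests.delete('http://httpbin.org/delete')
r = requests.head('http://httpbin.org/get')
r = requests.options('http://httpbin.org/get')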
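3. GET requests
GET is the most common request type in HTTP, so we look at it in detail.
We start with a simple GET request to http://httpbin.org/get. The site http://httpbin.org/ is built specifically for testing: when it detects that the client sent a GET request, it returns information about that request: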
import requests
response = requests.get('http://httpbin.org/get')
print(response.text)
Output:
{
  "args": {},
  "headers": {
    "Accept": "*/*",
    "Accept-Encoding": "gzip, deflate",
    "Host": "httpbin.org",
    "User-Agent": "python-requests/2.25.1",
    "X-Amzn-Trace-Id": "Root=1-6076af01-5222d2ca535f9b6039280a85"
  },
  "origin": "18.167.102.111",
  "url": "http://httpbin.org/get"
}
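We can see that the request went through: the result echoes back our request headers, the URL, our IP address, and other details.
We can also pass parameters to a GET request, building a request that carries extra information.
Building the URL by hand:
r = requests.get('http://httpbin.org/get?name=germey&age=22')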
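The parameters start after the ? and are joined with &. Writing URLs this way quickly becomes cumbersome, so requests provides the params argument instead.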
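To see what the ? and & rules amount to, here is a minimal sketch that builds the same query string with urlencode from Python's standard urllib.parse module. This is not part of requests; it only illustrates the encoding, and the value 'a b&c' is a made-up example of input that is awkward to escape by hand.

```python
from urllib.parse import urlencode

base_url = "http://httpbin.org/get"
query = {"name": "germey", "age": "22"}

# urlencode joins key=value pairs with & and escapes unsafe characters.
url = base_url + "?" + urlencode(query)
print(url)  # http://httpbin.org/get?name=germey&age=22

# Hypothetical value containing a space and an ampersand: both must be
# escaped, which is easy to get wrong when the URL is typed by hand.
tricky = urlencode({"q": "a b&c"})
print(tricky)  # q=a+b%26c
```

Letting a library do this escaping is the main reason to prefer a dictionary of parameters over a hand-written query string.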
Example using params:
import requests
data = {
    'name': 'germey',
    'age': '22'
}
response = requests.get('http://httpbin.org/get', params=data)
print(response.text)
Output:
{
  "args": {
    "age": "22",
    "name": "germey"
  },
  "headers": {
    "Accept": "*/*",
    "Accept-Encoding": "gzip, deflate",
    "Host": "httpbin.org",
    "User-Agent": "python-requests/2.25.1",
    "X-Amzn-Trace-Id": "Root=1-6076b273-24b1eebd6542e66659560793"
  },
  "origin": "18.167.102.111",
  "url": "http://httpbin.org/get?name=germey&age=22"
}
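The console output shows that requests built the URL for us automatically. We can also call the json() method, which parses the JSON-formatted response string into a Python dictionary.
print(response.json())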
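As a rough illustration of what json() does, the sketch below parses a JSON string into a dictionary with the standard json module. The body string is a shortened, hand-written copy of the httpbin output above, used here as an assumption so that the example runs without a network connection.

```python
import json

# Shortened stand-in for response.text (assumption for illustration).
body = (
    '{"args": {"age": "22", "name": "germey"}, '
    '"url": "http://httpbin.org/get?name=germey&age=22"}'
)

# Parsing turns the JSON text into ordinary Python objects.
parsed = json.loads(body)

print(type(body))              # <class 'str'>
print(type(parsed))            # <class 'dict'>
print(parsed["args"]["name"])  # germey
```

With a real response, response.json() gives a comparable dictionary. The parentheses matter: response.json without them is the method itself, not the parsed data.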
Fetching a web page
As an example, let's fetch the Zhihu home page:
import requests
headers = {
    'cookie': 'paste your cookies here',
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.100 Safari/537.36'
}
r = requests.get('https://www.zhihu.com/', headers=headers)
print(r.text)
print(r.status_code)
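Fetching binary data
The Zhihu page we just fetched is an HTML document. What should we do if we want to fetch binary data such as images, video, or audio?
Let's use the Zhihu icon as an example: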
import requests
headers = {
    'cookie': 'paste your cookies here',
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.100 Safari/537.36'
}
r = requests.get('https://static.zhihu.com/heifetz/assets/apple-touch-icon-152.a53ae37b.png', headers=headers)
print(r.text)  # printing this directly shows garbled text, because binary data is being forced into string form
with open('zhihu.png', 'wb') as f:  # saved locally; the image appears in the same directory as the .py file
    f.write(r.content)
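The difference between r.text and r.content can be sketched without downloading anything. The example below uses the 8-byte PNG file signature as a stand-in for r.content (an assumption for illustration) and a temporary file in place of zhihu.png.

```python
import os
import tempfile

# The signature every PNG file starts with, standing in for r.content.
content = b"\x89PNG\r\n\x1a\n"

# Decoding the bytes as text, roughly what r.text attempts, replaces
# the bytes that are not valid text and so loses information.
text = content.decode("utf-8", errors="replace")
print(repr(text))

# Writing the raw bytes in binary mode ('wb') preserves them exactly.
path = os.path.join(tempfile.mkdtemp(), "sample.bin")
with open(path, "wb") as f:
    f.write(content)
with open(path, "rb") as f:
    round_trip = f.read()
print(round_trip == content)  # True
```

This is why the image is saved from r.content with mode 'wb' rather than from r.text.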
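4. POST requests
Now that we have covered the most basic request type, GET, we turn to POST, another common one. Here the test site simply returns the information we submit.
import requests
data = {'name': 'germey', 'age': '22'}
r = requests.post('http://httpbin.org/post', data=data)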
print(r.text)
Output:
{
  "args": {},
  "data": "",
  "files": {},
  "form": {
    "age": "22",
    "name": "germey"
  },
  "headers": {
    "Accept": "*/*",
    "Accept-Encoding": "gzip, deflate",
    "Content-Length": "18",
    "Content-Type": "application/x-www-form-urlencoded",
    "Host": "httpbin.org",
    "User-Agent": "python-requests/2.25.1",
    "X-Amzn-Trace-Id": "Root=1-6076f804-446c1b1d538e216f066e33ee"
  },
  "json": null,
  "origin": "111.17.194.60",
  "url": "http://httpbin.org/post"
}
The data we submitted comes back in the form field of the response.
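requests can encode a POST body in more than one way. As a sketch, the code below prepares two requests without sending them: one with data=, as in the example above, and one with json=, which the example above does not use. Request(...).prepare() appears here only so that the encoded body can be inspected offline.

```python
import requests

payload = {"name": "germey", "age": "22"}
url = "http://httpbin.org/post"

# prepare() builds the request without sending it over the network.
form_request = requests.Request("POST", url, data=payload).prepare()
print(form_request.headers["Content-Type"])  # application/x-www-form-urlencoded
print(form_request.body)                     # name=germey&age=22

# json= serializes the same dictionary as a JSON document instead.
json_request = requests.Request("POST", url, json=payload).prepare()
print(json_request.headers["Content-Type"])  # application/json
print(json_request.body)
```

Which one to use depends on what the server expects: HTML-form style endpoints usually take data=, while many JSON APIs take json=.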
5. Responses
import requests
headers = {
    'cookie': 'paste your cookies here',
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.100 Safari/537.36'
}
r = requests.get('https://www.zhihu.com/', headers=headers)
print('Status code:', r.status_code)
print('Response headers:', r.headers)
print('cookies:', r.cookies)
Output:
Status code: 200
Response headers: {'Server': 'CLOUD ELB 1.0.0', 'Date': 'Wed, 14 Apr 2021 14:18:07 GMT', 'Content-Type': 'text/html; charset=utf-8', 'Vary': 'Accept-Encoding', 'set-cookie': 'tst=; path=/; expires=Thu, 01 Jan 1970 00:00:00 GMT; httponly, KLBRSID=ed2ad9934af8a1f80db52dcb08d13344|1618409886|1618409886; Path=/', 'content-security-policy': "default-src * blob:; img-src * data: blob: resource: t.captcha.qq.com cstaticdun.126.net necaptcha.nosdn.127.net; connect-src * wss: blob: resource:; frame-src 'self' *.zhihu.com mailto: tel: weixin: *.vzuu.com mo.m.taobao.com getpocket.com note.youdao.com safari-extension://com.evernote.safari.clipper-Q79WDW8YH9 zhihujs: captcha.guard.qcloud.com pos.baidu.com dup.baidustatic.com openapi.baidu.com wappass.baidu.com passport.baidu.com *.cme.qcloud.com vs-cdn.tencent-cloud.com t.captcha.qq.com c.dun.163.com; script-src 'self' blob: *.zhihu.com g.alicdn.com qzonestyle.gtimg.cn res.wx.qq.com open.mobile.qq.com 'unsafe-eval' unpkg.zhimg.com unicom.zhimg.com resource: captcha.gtimg.com captcha.guard.qcloud.com pagead2.googlesyndication.com cpro.baidustatic.com pos.baidu.com dup.baidustatic.com i.hao61.net 'nonce-395681d5-2009-4f24-ba9b-2f4de9719d15' hm.baidu.com zz.bdstatic.com b.bdstatic.com imgcache.qq.com vs-cdn.tencent-cloud.com ssl.captcha.qq.com t.captcha.qq.com cstaticdun.126.net c.dun.163.com ac.dun.163.com/ acstatic-dun.126.net; style-src 'self' 'unsafe-inline' *.zhihu.com unicom.zhimg.com resource: captcha.gtimg.com ssl.captcha.qq.com t.captcha.qq.com cstaticdun.126.net c.dun.163.com ac.dun.163.com/ acstatic-dun.126.net", 'x-frame-options': 'SAMEORIGIN', 'strict-transport-security': 'max-age=15552000; includeSubDomains', 'surrogate-control': 'no-store', 'cache-control': 'no-cache, no-store, must-revalidate, private, max-age=0', 'pragma': 'no-cache', 'expires': '0', 'x-content-type-options': 'nosniff', 'x-xss-protection': '1; mode=block', 'X-Backend-Response': '0.435', 'Referrer-Policy': 'no-referrer-when-downgrade', 'X-SecNG-Response': '0.43999981880188', 'X-UDID': 'AFDupPWGbBCPTjG_3TfIUhnlEgcac_LzB2M=', 'x-lb-timing': '0.441', 'x-idc-id': '2', 'Content-Encoding': 'gzip', 'Transfer-Encoding': 'chunked', 'X-NWS-LOG-UUID': '7093856299592433989', 'Connection': 'keep-alive', 'X-Cache-Lookup': 'Cache Miss', 'x-edge-timing': '0.462', 'x-cdn-provider': 'tencent'}
cookies: <RequestsCookieJar[<Cookie KLBRSID=ed2ad9934af8a1f80db52dcb08d13344|1618409886|1618409886 for www.zhihu.com/>]>
The response carries plenty of other information as well, which we won't list here.
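Two conveniences for working with these values can be sketched offline. The headers object below is built by hand from two of the headers in the output above, and status_code is a stand-in for r.status_code, so nothing is sent over the network; both are assumptions made for illustration.

```python
import requests
from requests.structures import CaseInsensitiveDict

# r.headers is a CaseInsensitiveDict, so lookups ignore letter case.
headers = CaseInsensitiveDict({
    "Content-Type": "text/html; charset=utf-8",
    "x-frame-options": "SAMEORIGIN",
})
print(headers["content-type"])     # text/html; charset=utf-8
print(headers["X-Frame-Options"])  # SAMEORIGIN

# requests.codes maps readable names to numeric status codes.
status_code = 200  # stand-in for r.status_code
print(status_code == requests.codes.ok)  # True
```

Comparing against requests.codes.ok reads a little more clearly than a bare 200, and case-insensitive lookup means the mixed capitalization in the header output above is not a problem.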
These notes are based on the book 《python3网络爬虫开发实战》.