一个神奇的蜘蛛?项目,适用于数据采集任务,源码结构简洁明了。


index页面示例:
---
项目地址https://github.com/lixi5338619/magical_spider
使用指南1、配置settings.py,启动flask服务
2、参考demo文件夹中的代码进行测试,主要通过runflow.py运行。
代码语言:javascript代码运行次数:0运行复制```javascript import requestshost = 'http://127.0.0.1:5000'def magical_start(project_name,base_url = 'https://www.php.cn/link/b6f05a7baab2fe0eea07e59bd5b0b317'): # 1、创建浏览器并选择session_id result = requests.post(f'{host}/create',data={'name':project_name,'url':base_url}).json() session_id,process_url = result['session_id'],result['process_url'] return session_id,process_urldef magical_request(session_id,process_url,request_url): # 2、请求浏览器_xhr data = {'session_id':session_id,'process_url':process_url, 'request_url':request_url,'request_type':'get'} result = requests.post(f'{host}/xhr',data=data).json() return result['result']def magical_close(session_id,process_url,process_name): # 4、关闭浏览器 close_data = {'session_id':session_id,'process_url':process_url,'process_name':process_name} requests.post(f'{host}/close',data=close_data).json()
3、测试代码GET请求
代码语言:javascript代码运行次数:0运行复制
javascript from demo.runflow import magical_start,magical_request,magical_closeproject_name = 'cnipa'base_url = 'https://www.cnipa.gov.cn'session_id,process_url = magical_start(project_name,base_url)print(len(magical_request(session_id, process_url,'https://www.cnipa.gov.cn/col/col57/index.html')))magical_close(session_id,process_url,project_name)
POST请求
代码语言:javascript代码运行次数:0运行复制javascript from demo.runflow import magical_start,magical_request,magical_closeimport jsonproject_name = 'chinadrugtrials'base_url = 'http://www.chinadrugtrials.org.cn'session_id,process_url = magical_start(project_name,base_url)data = {"id": "","ckm_index": "","sort": "desc","sort2": "","rule": "CTR","secondLevel": "0","currentpage": "2","keywords": "","reg_no": "","indication": "","case_no": "","drugs_name": "","drugs_type": "","appliers": "","communities": "","researchers": "","agencies": "","state": ""}formdata = json.dumps(data)print(magical_request(session_id=session_id, process_url=process_url, request_url='https://www.php.cn/link/44f95c37a24495521a98b63f0bbf4268', request_type='post',formdata=formdata ))magical_close(session_id,process_url,project_name)
4、index页面可以查看和管理当前运行中的任务,同时可以查看系统的内存和磁盘使用情况。
5、demo文件夹包含任务流程汇总runflow.py,以及抖音和药监局的案例,提供了单任务和多任务的示例。
BJXSHOP网上购物系统 - 书店版
BJXSHOP购物管理系统是一个功能完善、展示信息丰富的电子商店销售平台;针对企业与个人的网上销售系统;开放式远程商店管理;完善的订单管理、销售统计、结算系统;强力搜索引擎支持;提供网上多种在线支付方式解决方案;强大的技术应用能力和网络安全系统 BJXSHOP网上购物系统 - 书店版,它具备其他通用购物系统不同的功能,有针对图书销售而进行开发的一个电子商店销售平台,如图书ISBN,图书目录
下载
linux部署1.安装chrome(选择安装位置) yum install https://www.php.cn/link/6b17d006a2ed6f12f07c7ea60b8002b5
2.检查chrome版本 google-chrome --version
3.安装对应版本的chromedriver_linux64 例如,我的chrome版本是104.0.5112.79 wget https://www.php.cn/link/5bb5f8d030f5e6e4b6d137c6990f0772
4.解压chromedriver unzip chromedriver_linux64
5.授权chromedriver chmod 777 chromedriver
6.修改项目代码settings.py中的chromedriver路径
7.安装python依赖并启动flask项目
Python依赖:flask、sqlite3、selenium、websockets、opencv-python、numpyflask启动方式:python3 server.py8.开启服务器端口访问权限
9.运行项目测试









