
Simulating Login to the New Zhengfang Academic Affairs System with JSoup (Intranet, then Academic System): A Step-by-Step Walkthrough

Date: 2022-06-29 22:39:04


Login page of the new Zhengfang academic affairs system:

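Contents

1. Requirements
2. Simulating the intranet login
3. Simulating the academic system login
4. Fetching grades and timetable data
References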

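1. Requirements

The goal is to access the academic affairs system, scrape data such as the timetable and grades, and display it in an app of my own. Because the academic affairs system is only reachable from the campus network, the scrape uses a two-level "intranet, then academic system" strategy: first simulate a login to the campus intranet, then log in to the academic system while carrying the intranet cookies, and finally fetch the data.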

2. Simulating the intranet login

Intranet login page:

URL: https://webvpn./users/sign_in

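Main steps:

Fill in the username and password, press F12, and search for "action" in the Elements panel:

As you can see, the form data we enter is ultimately submitted to "/users/sign_in".

Click the login button and find sign_in in the Network panel. It shows everything the simulated login needs:

Now for the code.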

Step one is to fetch the form fields and the cookies. On some sites the form data has to be requested from the live page at this point; for the reason, see: Exception in thread "main" org.jsoup.HttpStatusException: HTTP error fetching URL. Status=422, URL=

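    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;
    import org.jsoup.Connection;
    import org.jsoup.Connection.Method;
    import org.jsoup.Connection.Response;
    import org.jsoup.Jsoup;
    import org.jsoup.nodes.Document;
    import org.jsoup.nodes.Element;

    // URL is the sign-in address shown above
    Connection connection = Jsoup.connect(URL);
    connection.header("User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.61 Safari/537.36");
    Response res = connection.execute(); // keep res.cookies(); they are needed later
    Document d = Jsoup.parse(res.body());
    List<Element> elements = d.select("form");
    Map<String, String> datas = new HashMap<>();
    // Walk every element of the first form and collect its name/value pairs
    for (Element element : elements.get(0).getAllElements()) {
        if (element.attr("name").equals("user[login]")) {
            element.attr("value", "************"); // your username
        }
        if (element.attr("name").equals("user[password]")) {
            element.attr("value", "******"); // your password
        }
        if (element.attr("name").length() > 0) {
            datas.put(element.attr("name"), element.attr("value"));
        }
    }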

The User-Agent and the other header values can all be found here:

We can print datas to check it:

    {user[dymatice_code]=unknown, utf8=?, commit=登录 Login, user[login]=(redacted), user[password]=(redacted), authenticity_token=+BD3FgRXj+LsvgUpS81EKyU7SOF1B6eshSzfo3aMOSHD3LoMsx8ZP85vWNbm1PbPJGbgJqHVbFkTvHuSzDwI8A==}
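The printed map also shows why the request body needs encoding: the field names contain brackets, and authenticity_token contains "+", "/" and "=". jsoup handles this when the map is passed to data(...). As a rough illustration using only the JDK, the sketch below builds an application/x-www-form-urlencoded body by hand. The class name FormBodySketch and the sample values are made up for this example and are not part of jsoup or of the original code.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Illustrative helper (hypothetical, not part of jsoup): builds an
// application/x-www-form-urlencoded body from a map of form fields.
class FormBodySketch {

    static String encode(Map<String, String> fields) {
        StringJoiner body = new StringJoiner("&");
        for (Map.Entry<String, String> field : fields.entrySet()) {
            body.add(enc(field.getKey()) + "=" + enc(field.getValue()));
        }
        return body.toString();
    }

    private static String enc(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException ex) {
            // UTF-8 is always available on a conforming JVM
            throw new IllegalStateException(ex);
        }
    }

    public static void main(String[] args) {
        Map<String, String> datas = new LinkedHashMap<>();
        datas.put("user[login]", "student01");          // made-up value
        datas.put("authenticity_token", "+BD3/abc==");  // shortened, made-up token
        // prints user%5Blogin%5D=student01&authenticity_token=%2BBD3%2Fabc%3D%3D
        System.out.println(encode(datas));
    }
}
```

If a token like this were sent unencoded, the server would likely read "+" as a space and reject the token, which is one plausible source of a 422 response.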

Step two is to submit the form data together with the cookies to perform the simulated login:

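    // USER_AGENT and USER_AGENT_VALUE are constants not shown here; presumably the
    // header name "User-Agent" and the browser string used in step one.
    Connection connection2 = Jsoup.connect("https://webvpn./users/sign_in");
    connection2.header(USER_AGENT, USER_AGENT_VALUE);
    Response response = connection2.ignoreContentType(true)
            .followRedirects(true)
            .method(Method.POST)
            .data(datas)              // form fields collected in step one
            .cookies(res.cookies())   // cookies from the first request
            .execute();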

Last step: print the returned HTML and the cookies we received:

    System.out.println(response.body());
    Map<String, String> map = response.cookies();
    for (String s : map.keySet()) {
        System.out.println(s + " : " + map.get(s));
    }

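3. Simulating the academic system login

The simulated login has taken us to the intranet page:

Now we need to simulate a login to the new academic affairs system by going to its login page, the one shown at the start of this article:

Main steps:

Work out which form data has to be submitted in the same way as for the campus intranet login. I will not repeat the demonstration here; the code is: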

    // Log in. connection, response, document, url, cookies, cookies_innet,
    // csrftoken, stuNum and password are fields of the enclosing class.
    public boolean beginLogin() throws Exception {
        connection = Jsoup.connect(url + "/jwglxt/xtgl/login_slogin.html").cookies(cookies_innet);
        connection.header("Content-Type", "application/x-www-form-urlencoded;charset=utf-8");
        connection.header("User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.61 Safari/537.36");
        connection.data("csrftoken", csrftoken);
        connection.data("yhm", stuNum);       // student number
        connection.data("mm", password);      // RSA-encrypted password
        connection.data("mm", password);      // added twice, so the request carries two mm fields
        // Sends the login POST; the next line sends it again and keeps that response
        connection.cookies(cookies).ignoreContentType(true).method(Connection.Method.POST).execute();
        response = connection.followRedirects(true).execute();
        document = Jsoup.parse(response.body());
        //System.out.println(document);
        // The login succeeded when the page has no element with id "tips"
        if (document.getElementById("tips") == null) {
            System.out.println("Welcome");
            System.out.println(response.cookies());
            return true;
        } else {
            // The "tips" element holds the error message
            System.out.println(document.getElementById("tips").text());
            System.out.println(response.cookies());
            return false;
        }
    }
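The login code above attaches two cookie maps to one request, first cookies_innet and then cookies. My understanding is that repeated cookies(...) calls on a jsoup Connection accumulate and that a later value replaces an earlier one with the same name, but that is worth verifying against the jsoup version in use. The JDK-only sketch below shows that merge rule and the resulting Cookie header; the class name CookieMergeSketch and the cookie names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper illustrating how two cookie maps can be combined.
class CookieMergeSketch {

    // Merge cookie maps in order; on a name clash the later map wins.
    @SafeVarargs
    static Map<String, String> merge(Map<String, String>... cookieMaps) {
        Map<String, String> merged = new LinkedHashMap<>();
        for (Map<String, String> cookies : cookieMaps) {
            merged.putAll(cookies);
        }
        return merged;
    }

    // Render the map as the value of a Cookie request header.
    static String toHeader(Map<String, String> cookies) {
        StringJoiner header = new StringJoiner("; ");
        cookies.forEach((name, value) -> header.add(name + "=" + value));
        return header.toString();
    }

    public static void main(String[] args) {
        Map<String, String> intranet = new LinkedHashMap<>();
        intranet.put("vpn_session", "a1");   // made-up intranet cookie
        intranet.put("JSESSIONID", "old");
        Map<String, String> academic = new LinkedHashMap<>();
        academic.put("JSESSIONID", "new");   // made-up academic system cookie
        // prints vpn_session=a1; JSESSIONID=new
        System.out.println(toHeader(merge(intranet, academic)));
    }
}
```

Merging the maps once and reusing the result is one way to keep every later request consistent, instead of repeating both cookies(...) calls each time.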

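In beginLogin, cookies_innet holds the cookies obtained from the simulated intranet login. The csrftoken has to be fetched separately. The password is also sent encrypted, so we need the encrypted form of the password the user typed. The code for both is: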

    // Fetch the csrftoken and the cookies (this part works without errors)
    private void getCsrftoken() {
        try {
            connection = Jsoup.connect(url + "/jwglxt/xtgl/login_slogin.html?language=zh_CN&_t=" + new Date().getTime()).cookies(cookies_innet);
            connection.header("User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.61 Safari/537.36");
            response = connection.followRedirects(true).execute();
            cookies = response.cookies();
            // Save the csrftoken from the hidden input on the login page
            document = Jsoup.parse(response.body());
            csrftoken = document.getElementById("csrftoken").val();
        } catch (Exception ex) {
            ex.printStackTrace();
        }
    }

    // Fetch the public key and encrypt the password.
    // JSON and JSONObject look like the fastjson classes (com.alibaba.fastjson).
    public void getRSApublickey() throws Exception {
        connection = Jsoup.connect(url + "/jwglxt/xtgl/login_getPublicKey.html?" + "time=" + new Date().getTime()).cookies(cookies_innet);
        connection.header("User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.61 Safari/537.36");
        response = connection.cookies(cookies).ignoreContentType(true).followRedirects(true).execute();
        JSONObject jsonObject = JSON.parseObject(response.body());
        modulus = jsonObject.getString("modulus");    // Base64-encoded RSA modulus
        exponent = jsonObject.getString("exponent");  // Base64-encoded public exponent
        // Encrypt with the key converted to hex, then convert the hex result back to Base64
        password = RSAEncoder.RSAEncrypt(password, B64.b64tohex(modulus), B64.b64tohex(exponent));
        password = B64.hex2b64(password);
    }

The supporting B64.java and RSAEncoder.java:

    import static java.lang.Integer.parseInt;

    public class B64 {
        public static String b64map = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
        private static char b64pad = '=';
        private static String hexCode = "0123456789abcdef";

        // Return the hex digit for a value in 0..15
        public static char int2char(int a) {
            return hexCode.charAt(a);
        }

        // Base64 to hex
        public static String b64tohex(String s) {
            String ret = "";
            int k = 0;     // position within the current group of four Base64 characters
            int slop = 0;  // bits left over from the previous character
            for (int i = 0; i < s.length(); ++i) {
                if (s.charAt(i) == b64pad) break;
                int v = b64map.indexOf(s.charAt(i));
                if (v < 0) continue;  // skip characters outside the Base64 alphabet
                if (k == 0) {
                    ret += int2char(v >> 2);
                    slop = v & 3;
                    k = 1;
                } else if (k == 1) {
                    ret += int2char((slop << 2) | (v >> 4));
                    slop = v & 0xf;
                    k = 2;
                } else if (k == 2) {
                    ret += int2char(slop);
                    ret += int2char(v >> 2);
                    slop = v & 3;
                    k = 3;
                } else {
                    ret += int2char((slop << 2) | (v >> 4));
                    ret += int2char(v & 0xf);
                    k = 0;
                }
            }
            if (k == 1) ret += int2char(slop << 2);
            return ret;
        }

        // Hex to Base64: every three hex digits (12 bits) become two Base64 characters
        public static String hex2b64(String h) {
            int i, c;
            StringBuilder ret = new StringBuilder();
            for (i = 0; i + 3 <= h.length(); i += 3) {
                c = parseInt(h.substring(i, i + 3), 16);
                ret.append(b64map.charAt(c >> 6));
                ret.append(b64map.charAt(c & 63));
            }
            if (i + 1 == h.length()) {
                c = parseInt(h.substring(i, i + 1), 16);
                ret.append(b64map.charAt(c << 2));
            } else if (i + 2 == h.length()) {
                c = parseInt(h.substring(i, i + 2), 16);
                ret.append(b64map.charAt(c >> 2));
                ret.append(b64map.charAt((c & 3) << 4));
            }
            // Pad to a multiple of four characters
            while ((ret.length() & 3) > 0) ret.append(b64pad);
            return ret.toString();
        }
    }

    import java.math.BigInteger;
    import java.util.Random;

    public class RSAEncoder {
        private static BigInteger n = null;  // modulus
        private static BigInteger e = null;  // public exponent

        // Encrypt pwd with the public key given as hex strings; returns an even-length hex string
        public static String RSAEncrypt(String pwd, String nStr, String eStr) {
            n = new BigInteger(nStr, 16);
            e = new BigInteger(eStr, 16);
            BigInteger r = RSADoPublic(pkcs1pad2(pwd, (n.bitLength() + 7) >> 3));
            String sp = r.toString(16);
            if ((sp.length() & 1) != 0) sp = "0" + sp;
            return sp;
        }

        // Raw RSA public operation: x^e mod n
        private static BigInteger RSADoPublic(BigInteger x) {
            return x.modPow(e, n);
        }

        // PKCS#1 type 2 padding: 00 02 <non-zero random bytes> 00 <message>
        private static BigInteger pkcs1pad2(String s, int n) {
            if (n < s.length() + 11) { // TODO: fix for utf-8
                throw new IllegalArgumentException("Message too long for RSAEncoder");
            }
            byte[] ba = new byte[n];
            int i = s.length() - 1;
            // Write the message at the end of the buffer, encoded as UTF-8.
            // Casting keeps the low 8 bits, so values above 127 become the intended negative bytes.
            while (i >= 0 && n > 0) {
                int c = s.codePointAt(i--);
                if (c < 128) {
                    ba[--n] = (byte) c;
                } else if ((c > 127) && (c < 2048)) {
                    ba[--n] = (byte) ((c & 63) | 128);
                    ba[--n] = (byte) ((c >> 6) | 192);
                } else {
                    ba[--n] = (byte) ((c & 63) | 128);
                    ba[--n] = (byte) (((c >> 6) & 63) | 128);
                    ba[--n] = (byte) ((c >> 12) | 224);
                }
            }
            ba[--n] = 0;
            byte[] temp = new byte[1];
            // The fixed seed makes the padding bytes predictable;
            // java.security.SecureRandom is the safer choice.
            Random rdm = new Random(47L);
            while (n > 2) { // random non-zero pad
                temp[0] = 0;
                while (temp[0] == 0) rdm.nextBytes(temp);
                ba[--n] = temp[0];
            }
            ba[--n] = 2;
            ba[--n] = 0;
            return new BigInteger(ba);
        }
    }
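The two helper classes above implement Base64 conversion and RSA with PKCS#1 type 2 padding by hand. For comparison, here is a minimal sketch of the same idea using only JDK classes (java.util.Base64 and javax.crypto.Cipher). It assumes the modulus and exponent arrive as Base64 strings, as in getRSApublickey. The class name JdkRsaSketch and its methods are invented for this example. The padding is random, so the output will not be byte-identical to RSAEncoder's, and I have not checked it against a real Zhengfang server.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.KeyFactory;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.PublicKey;
import java.security.interfaces.RSAPublicKey;
import java.security.spec.RSAPublicKeySpec;
import java.util.Base64;
import javax.crypto.Cipher;

// Hypothetical JDK-only counterpart to B64 + RSAEncoder.
class JdkRsaSketch {

    // Encrypts the password with a public key given as Base64 modulus and exponent,
    // and returns the ciphertext as Base64.
    static String encryptPassword(String password, String modulusB64, String exponentB64) {
        try {
            // Signum 1 forces a non-negative value, matching new BigInteger(hex, 16)
            BigInteger n = new BigInteger(1, Base64.getDecoder().decode(modulusB64));
            BigInteger e = new BigInteger(1, Base64.getDecoder().decode(exponentB64));
            PublicKey key = KeyFactory.getInstance("RSA").generatePublic(new RSAPublicKeySpec(n, e));
            Cipher cipher = Cipher.getInstance("RSA/ECB/PKCS1Padding");
            cipher.init(Cipher.ENCRYPT_MODE, key);
            byte[] encrypted = cipher.doFinal(password.getBytes(StandardCharsets.UTF_8));
            return Base64.getEncoder().encodeToString(encrypted);
        } catch (GeneralSecurityException ex) {
            throw new IllegalStateException("RSA encryption failed", ex);
        }
    }

    // Local check with a throwaway key pair: encrypt, decrypt, compare.
    static boolean roundTripWorks(String password) {
        try {
            KeyPairGenerator generator = KeyPairGenerator.getInstance("RSA");
            generator.initialize(2048);
            KeyPair pair = generator.generateKeyPair();
            RSAPublicKey pub = (RSAPublicKey) pair.getPublic();
            String nB64 = Base64.getEncoder().encodeToString(pub.getModulus().toByteArray());
            String eB64 = Base64.getEncoder().encodeToString(pub.getPublicExponent().toByteArray());
            String encrypted = encryptPassword(password, nB64, eB64);
            Cipher cipher = Cipher.getInstance("RSA/ECB/PKCS1Padding");
            cipher.init(Cipher.DECRYPT_MODE, pair.getPrivate());
            byte[] plain = cipher.doFinal(Base64.getDecoder().decode(encrypted));
            return password.equals(new String(plain, StandardCharsets.UTF_8));
        } catch (GeneralSecurityException ex) {
            throw new IllegalStateException("round trip failed", ex);
        }
    }
}
```

One difference to keep in mind: Cipher always returns a ciphertext as long as the modulus, while RSAEncoder drops leading zero bytes, so the two Base64 strings can occasionally differ in length.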

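4. Fetching grades and timetable data

We are finally inside the academic affairs system. Next comes fetching the grades and timetable and displaying them in the app. The result looks like this:

But we still have to take it one step at a time:

Fetch the grades. As before, form data has to be submitted, and the process is exactly the same. For which fields to submit, see this post: How do you find out which form data to submit when scraping?

Here is the code: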

    // Fetch the grades for one academic year and term
    public void getStudentGrade(int year, int term) throws Exception {
        Map<String, String> datas = new HashMap<>();
        datas.put("xnm", String.valueOf(year));             // academic year
        datas.put("xqm", String.valueOf(term * term * 3));  // term code: term 1 -> 3, term 2 -> 12
        datas.put("_search", "false");
        datas.put("nd", String.valueOf(new Date().getTime()));
        datas.put("queryModel.showCount", "80");            // rows per page
        datas.put("queryModel.currentPage", "1");
        datas.put("queryModel.sortName", "");
        datas.put("queryModel.sortOrder", "asc");
        datas.put("time", "0");
        System.out.println(datas);

        // First request: open the grade query page
        connection = Jsoup.connect(url + "/jwglxt/cjcx/cjcx_cxDgXscj.html?gnmkdm=N305005&layout=default&su=" + stuNum);
        connection.header("User-Agent", "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:29.0) Gecko/0101 Firefox/29.0");
        response = connection.cookies(cookies_innet).cookies(cookies).method(Connection.Method.POST).data(datas).ignoreContentType(true).execute();

        // Second request: run the query, which returns JSON
        connection = Jsoup.connect(url + "/jwglxt/cjcx/cjcx_cxDgXscj.html?doType=query&gnmkdm=N305005");
        connection.header("User-Agent", "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:29.0) Gecko/0101 Firefox/29.0");
        response = connection.cookies(cookies_innet).cookies(cookies).method(Connection.Method.POST).data(datas).ignoreContentType(true).execute();
        System.out.println(response.body());

        JSONObject jsonObject = JSON.parseObject(response.body());
        //System.out.println(jsonObject);
        JSONArray gradeTable = JSON.parseArray(jsonObject.getString("items"));
        //System.out.println(gradeTable);
        // Field names are pinyin abbreviations; presumably kcmc = course name, jsxm = teacher,
        // bfzcj = percentage score, jd = grade point, kcxzmc = course type
        for (Object item : gradeTable) {
            JSONObject lesson = (JSONObject) item;
            System.out.println(lesson.getString("kcmc") + " "
                    + lesson.getString("jsxm") + " "
                    + lesson.getString("bfzcj") + " "
                    + lesson.getString("jd") + " "
                    + lesson.getString("kcxzmc"));
        }
    }

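One thing to note: make showCount in the submitted parameters reasonably large. The code requests only the first page, so all the grades have to fit on that page to be fetched in one go.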
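If a single large page is not an option, another approach is to request page after page until a page comes back short. The sketch below shows only that loop, using the JDK alone. The fetchPage callback is a hypothetical stand-in for a request that sets queryModel.currentPage and parses "items", and the names PagingSketch and fetchAll are invented for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntFunction;

// Hypothetical paging loop; the HTTP request itself is left to the callback.
class PagingSketch {

    // fetchPage: returns the items of the given 1-based page
    // showCount: the page size that was requested
    // maxPages:  safety limit so a misbehaving server cannot cause an endless loop
    static <T> List<T> fetchAll(IntFunction<List<T>> fetchPage, int showCount, int maxPages) {
        List<T> all = new ArrayList<>();
        for (int page = 1; page <= maxPages; page++) {
            List<T> items = fetchPage.apply(page);
            all.addAll(items);
            if (items.size() < showCount) {
                break; // a short or empty page means nothing is left
            }
        }
        return all;
    }
}
```

This stop rule does not depend on any total-count field in the response, at the cost of one extra request when the number of rows is an exact multiple of showCount.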

Display the grades

See: A detailed tutorial on Android's ExpandableListView (with a code walkthrough)

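References

I wrote all of these articles myself; together they sum up the scattered points covered above:

Simulating a website login with JSoup (campus intranet example)
Using cookies obtained with JSoup to visit other links on the same site
How do you find out which form data to submit when scraping?
Carrying cookies across consecutive logins to multiple pages with JSoup
A simple way to check whether a simulated login succeeded in a Java scraper (JSoup example)
A detailed tutorial on Android's ExpandableListView (with a code walkthrough)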
