
Breaking Slider CAPTCHAs with a Java Crawler

Date: 2018-11-15 19:11:35


Tech stack: Java + Selenium

Some background first:

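Wherever there are crawlers, there are anti-crawler measures. Like viruses and antivirus software, attack and defense keep pushing each other forward. The most widespread anti-crawler technique today is the CAPTCHA: to stop crawlers from registering automatically and mass-producing junk accounts, almost every website puts one on its sign-up page. CAPTCHA stands for Completely Automated Public Turing test to tell Computers and Humans Apart. It is a test that tells whether the user is a computer or a human, and any user who passes it is treated as human. It follows that the key to breaking a slider CAPTCHA is making the computer imitate human behavior as closely as possible.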

Breaking a slider without a gap

The gapless slider looks like the figure below:

Source of the slider page:

<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<meta http-equiv="Cache-Control" content="no-cache, no-store, must-revalidate">
<meta http-equiv="Pragma" content="no-cache">
<meta http-equiv="Expires" content="0">
<meta http-equiv="X-UA-Compatible" content="IE=edge,chrome=1">
<meta name="viewport" content="width=device-width,initial-scale=1,maximum-scale=1,minimum-scale=1,user-scalable=no">
<meta content="yes" name="apple-mobile-web-app-capable">
<meta content="black" name="apple-mobile-web-app-status-bar-style">
<meta content="telephone=no" name="format-detection">
<meta content="email=no" name="format-detection">
<title>Drag the slider to verify</title>
<meta name="description" content="">
<meta name="keywords" content="">
<link rel="stylesheet" type="text/css" href="">
<style>
* { margin: 0; padding: 0; }
body { font: 12px/1.125 Microsoft YaHei; background: #fff; }
ul, li { list-style: none; }
a { text-decoration: none; }
.ani { transition: all .3s; }
.wrap { width: 300px; height: 350px; text-align: center; margin: 150px auto; }
.inner { padding: 15px; }
.clearfix { overflow: hidden; _zoom: 1; }
.none { display: none; }
#slider { position: relative; background-color: #e8e8e8; width: 300px; height: 34px; line-height: 34px; text-align: center; }
#slider .handler { position: absolute; top: 0px; left: 0px; width: 40px; height: 32px; border: 1px solid #ccc; cursor: move; }
.handler_bg { background: #fff url("") no-repeat center; }
.handler_ok_bg { background: #fff url("") no-repeat center; }
#slider .drag_bg { background-color: #7ac23c; height: 34px; width: 0px; }
#slider .drag_text { position: absolute; top: 0px; width: 300px; -moz-user-select: none; -webkit-user-select: none; user-select: none; -o-user-select: none; -ms-user-select: none; }
.unselect { -moz-user-select: none; -webkit-user-select: none; -ms-user-select: none; }
.slide_ok { color: #fff; }
</style>
</head>
<body>
<div class="wrap">
  <div id="slider">
    <div class="drag_bg"></div>
    <div class="drag_text" onselectstart="return false;" unselectable="on">Drag the slider to verify</div>
    <div class="handler handler_bg"></div>
  </div>
</div>
<script>
(function (window, document, undefined) {
  // Declare a namespace (i.e. a plain object) holding small DOM helpers
  var dog = {
    $: function (id) { return document.querySelector(id); },
    on: function (el, type, handler) { el.addEventListener(type, handler, false); },
    off: function (el, type, handler) { el.removeEventListener(type, handler, false); }
  };

  // Encapsulate a slider "class"
  function Slider() {
    var args = arguments[0];
    for (var i in args) {
      this[i] = args[i]; // shortcut: copy each config option onto the instance
    }
    // Initialize right away: creating an instance runs init()
    this.init();
  }

  Slider.prototype = {
    constructor: Slider,
    init: function () {
      this.getDom();
      this.dragBar(this.handler);
    },
    getDom: function () {
      this.slider = dog.$('#' + this.id);
      this.handler = dog.$('.handler');
      this.bg = dog.$('.drag_bg');
    },
    dragBar: function (handler) {
      var that = this,
          startX = 0,
          lastX = 0,
          doc = document,
          width = this.slider.offsetWidth,
          max = width - handler.offsetWidth,
          drag = {
            down: function (e) {
              var e = e || window.event;
              that.slider.classList.add('unselect');
              startX = e.clientX - handler.offsetLeft;
              console.log('startX: ' + startX + ' px');
              dog.on(doc, 'mousemove', drag.move);
              dog.on(doc, 'mouseup', drag.up);
              return false;
            },
            move: function (e) {
              var e = e || window.event;
              lastX = e.clientX - startX;
              lastX = Math.max(0, Math.min(max, lastX)); // clamp lastX to the range [0, max]
              console.log('lastX: ' + lastX + ' px');
              if (lastX >= max) {
                handler.classList.add('handler_ok_bg');
                that.slider.classList.add('slide_ok');
                dog.off(handler, 'mousedown', drag.down);
                drag.up();
              }
              that.bg.style.width = lastX + 'px';
              handler.style.left = lastX + 'px';
            },
            up: function (e) {
              var e = e || window.event;
              that.slider.classList.remove('unselect');
              if (lastX < width) {
                that.bg.classList.add('ani');
                handler.classList.add('ani');
                that.bg.style.width = 0;
                handler.style.left = 0;
                setTimeout(function () {
                  that.bg.classList.remove('ani');
                  handler.classList.remove('ani');
                }, 300);
              }
              dog.off(doc, 'mousemove', drag.move);
              dog.off(doc, 'mouseup', drag.up);
            }
          };
      dog.on(handler, 'mousedown', drag.down);
    }
  };

  window.S = window.Slider = Slider;
})(window, document);

var defaults = { id: 'slider' };
new S(defaults);
</script>
</body>
</html>

Analysis

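1. Check the size of the slider handle

2. Check the size of the slider track

From the two figures above, the drag distance is (300 - 40) = 260 px. Strictly speaking, the handle's 1px border makes its offsetWidth 42px, so the script's max is 300 - 42 = 258px; because move() clamps the position to [0, max], a 260px drag still reaches the end and triggers success.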
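The arithmetic above can be checked with a small sketch that mirrors the page script's `max = width - handler.offsetWidth` and its clamp. The class and method names are hypothetical; the sizes (300px track, 40px handle with a 1px border) come from the demo's CSS, assuming the default content-box sizing.

```java
/**
 * Hypothetical helper mirroring the drag arithmetic in the demo page's dragBar().
 * Assumes content-box sizing, so offsetWidth = CSS width + left border + right border.
 */
public final class DragMath {
    private DragMath() {}

    /** offsetWidth of a content-box element with equal left/right borders. */
    public static int offsetWidth(int cssWidth, int borderWidth) {
        return cssWidth + 2 * borderWidth;
    }

    /** Largest left offset the handle can reach (the page script's `max`). */
    public static int maxOffset(int trackWidth, int handleOffsetWidth) {
        return trackWidth - handleOffsetWidth;
    }

    /** Same clamp as Math.max(0, Math.min(max, lastX)) in the page script. */
    public static int clamp(int lastX, int max) {
        return Math.max(0, Math.min(max, lastX));
    }

    public static void main(String[] args) {
        int handle = offsetWidth(40, 1);   // 40px content + 2 x 1px border
        int max = maxOffset(300, handle);  // track width minus handle width
        int reached = clamp(260, max);     // the crawler drags 260px
        // prints "42 258 258 success=true"
        System.out.println(handle + " " + max + " " + reached + " success=" + (reached >= max));
    }
}
```

Because of the clamp, any drag of at least 258px lands on the same final position, which is why the rounder figure of 260px used below works.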

Crawler code

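import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebElement;
import org.openqa.selenium.chrome.ChromeDriver;
import org.openqa.selenium.interactions.Actions;

// The class name is arbitrary; the post only shows main()
public class SliderDemo {
    public static void main(String[] args) throws Exception {
        System.setProperty("webdriver.chrome.driver", "D:\\demo\\selenumDemo\\src\\main\\resources\\chromedriver.exe");
        WebDriver driver = new ChromeDriver();
        try {
            driver.get("file:///C:/Users/Administrator/Desktop/index.html");
            // Locate the slider handle
            WebElement slider = driver.findElement(By.cssSelector(".handler.handler_bg"));
            Thread.sleep(2000L);
            // Instantiate Actions for mouse operations
            Actions action = new Actions(driver);
            // Drag the handle 260px to the right
            action.dragAndDropBy(slider, 260, 0).perform();
            Thread.sleep(5000L);
        } catch (InterruptedException e) {
            e.printStackTrace();
        } finally {
            // driver.close(); // would only close the current window
            driver.quit(); // close the browser and release driver resources
        }
    }
}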

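Note: on some sites the drag passes verification after this, on others it fails. If it fails, don't panic: the site has detected that the browser is being driven by a crawler. I have a trick for that, read on!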

First, a bit of analysis. 1. Open the browser through the driver:

// chromeDriver is a static ChromeDriver field of the enclosing class
public static void openChrome() {
    System.setProperty("webdriver.chrome.driver", "D:\\demo\\selenumDemo\\src\\main\\resources\\chromedriver.exe");
    // 1. Open Chrome
    chromeDriver = new ChromeDriver();
    chromeDriver.get("url...");
}

2. Then press F12 to open the console and enter: window.navigator.webdriver

It returns true, whereas in a browser opened normally by hand it is false or undefined, as the figure below shows.

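So a site can read this property from JavaScript: undefined or false means an ordinary browser, while true means a Selenium-driven one. The fix therefore has to happen on the driver side: hide the flag before ChromeDriver launches the browser.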

// chromeDriver is a static ChromeDriver field; needs java.util.Collections and org.openqa.selenium.chrome.ChromeOptions
public static void openChrome() {
    // Hide window.navigator.webdriver
    ChromeOptions option = new ChromeOptions();
    option.setExperimentalOption("useAutomationExtension", false);
    option.setExperimentalOption("excludeSwitches", Collections.singletonList("enable-automation"));
    option.addArguments("--disable-blink-features=AutomationControlled"); // this line is the key one
    System.setProperty("webdriver.chrome.driver", "D:\\demo\\selenumDemo\\src\\main\\resources\\chromedriver.exe");
    // 1. Open Chrome with these options
    chromeDriver = new ChromeDriver(option);
    chromeDriver.get("URL...");
}

Launch it again and check: the property now reads false.

Breaking a slider with a gap

The gap slider looks like the figure below:

Analysis

I'll use one website's slider source as the example. As the figure below shows, the gap slider images are drawn on canvas elements.

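1. What we need is the X coordinate of the gap, so we need both the complete image and the gap image to compute it. Normally only one image is visible, the gap image with the puzzle piece sitting on it, but adding style="display:none" to the puzzle-piece canvas hides the piece.

Looking again, the gap image now appears with no puzzle piece covering it.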

2. Next, set style="display:block" on the canvas below it (the full-background canvas), and the complete image appears, as in the figure below.

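3. Then use Selenium's screenshot method to save both the complete image and the gap image, and compare them pixel by pixel to work out the distance between the handle and the gap's X coordinate.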
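The post calls getMoveDistance() but never shows it. Below is a minimal sketch of the pixel comparison described in step 3, assuming the two crops have the same size; the class GapFinder, the per-channel threshold and the optional startX skip are hypothetical choices, not the author's code. It returns the first column, scanning left to right, where any pixel differs noticeably between the two images.

```java
import java.awt.image.BufferedImage;

/** Hypothetical helper: locate the gap by comparing the complete image with the gap image. */
public final class GapFinder {
    private GapFinder() {}

    /**
     * Returns the first x (starting at startX) where some pixel's R, G or B channel
     * differs by more than threshold between the two images, or -1 if none does.
     */
    public static int firstDifferentColumn(BufferedImage full, BufferedImage gap, int startX, int threshold) {
        if (full.getWidth() != gap.getWidth() || full.getHeight() != gap.getHeight()) {
            throw new IllegalArgumentException("images must have the same size");
        }
        for (int x = Math.max(0, startX); x < full.getWidth(); x++) {
            for (int y = 0; y < full.getHeight(); y++) {
                if (differs(full.getRGB(x, y), gap.getRGB(x, y), threshold)) {
                    return x;
                }
            }
        }
        return -1;
    }

    /** Compares the blue, green and red bytes of two packed ARGB ints (alpha is ignored). */
    private static boolean differs(int rgb1, int rgb2, int threshold) {
        for (int shift = 0; shift <= 16; shift += 8) {
            int c1 = (rgb1 >> shift) & 0xFF;
            int c2 = (rgb2 >> shift) & 0xFF;
            if (Math.abs(c1 - c2) > threshold) {
                return true;
            }
        }
        return false;
    }

    /** Synthetic 100x40 images that differ only in a 10x10 block at x = 63..72. */
    public static int demo() {
        BufferedImage full = new BufferedImage(100, 40, BufferedImage.TYPE_INT_RGB);
        BufferedImage gap = new BufferedImage(100, 40, BufferedImage.TYPE_INT_RGB);
        for (int x = 0; x < 100; x++) {
            for (int y = 0; y < 40; y++) {
                full.setRGB(x, y, 0xC8C8C8);
                boolean inGap = x >= 63 && x < 73 && y >= 15 && y < 25;
                gap.setRGB(x, y, inGap ? 0x505050 : 0xC8C8C8);
            }
        }
        return firstDifferentColumn(full, gap, 0, 60);
    }

    public static void main(String[] args) {
        System.out.println("gap x = " + demo()); // prints "gap x = 63"
    }
}
```

The column found this way is in screenshot pixels; turning it into a drag distance (the code below subtracts a 5px offset) still depends on the page layout and on any screenshot scaling, so treat the result as a starting point to tune.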

Crawler code

import org.apache.commons.io.FileUtils;
import org.openqa.selenium.By;
import org.openqa.selenium.JavascriptExecutor;
import org.openqa.selenium.OutputType;
import org.openqa.selenium.Point;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebElement;
import org.openqa.selenium.chrome.ChromeDriver;
import org.openqa.selenium.chrome.ChromeOptions;
import org.openqa.selenium.interactions.Actions;
import org.openqa.selenium.support.ui.ExpectedConditions;
import org.openqa.selenium.support.ui.WebDriverWait;

import javax.imageio.ImageIO;
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import java.util.Collections;
import java.util.Random;

public class ElementLocate {
    private static ChromeDriver chromeDriver;

    public static void main(String[] args) throws InterruptedException, IOException {
        openChrome(); // open the browser and load the page
        try {
            chromeDriver.manage().window().maximize(); // maximize the window
            // Wait for the verify button to load (Selenium 3 constructor; Selenium 4 takes Duration.ofSeconds(5))
            new WebDriverWait(chromeDriver, 5).until(ExpectedConditions.visibilityOfElementLocated(
                    By.xpath("//div[@aria-label='点击按钮进行验证']")));
            // Click it to open the slider dialog
            chromeDriver.findElement(By.xpath("//div[@aria-label='点击按钮进行验证']")).click();
            operateSlider(); // drive the slider
        } finally {
            chromeDriver.quit(); // always quit after a run, or leftover browsers bog the machine down
        }
    }

    private static void openChrome() {
        // Browser options
        ChromeOptions option = new ChromeOptions();
        option.setExperimentalOption("useAutomationExtension", false);
        option.setExperimentalOption("excludeSwitches", Collections.singletonList("enable-automation"));
        // The key line: keeps the site's JS from detecting the crawler via navigator.webdriver
        option.addArguments("--disable-blink-features=AutomationControlled");
        // Path to the browser driver
        System.setProperty("webdriver.chrome.driver", "D:\\demo\\selenumDemo\\src\\main\\resources\\chromedriver.exe");
        // Open Chrome
        chromeDriver = new ChromeDriver(option);
        // Visit the login page (the URL is truncated in the original post)
        chromeDriver.get("/login?lgtype=1&waytype=603&fromurl=https%3A%2F%%2F");
    }

    // Set an element attribute (name and value are passed as script arguments, not spliced into the JS)
    private static void setAttribute(WebDriver driver, WebElement element, String attributeName, String value) {
        JavascriptExecutor js = (JavascriptExecutor) driver;
        js.executeScript("arguments[0].setAttribute(arguments[1], arguments[2]);", element, attributeName, value);
    }

    // Remove an element attribute
    private static void removeAttribute(WebDriver driver, WebElement element, String attributeName) {
        JavascriptExecutor js = (JavascriptExecutor) driver;
        js.executeScript("arguments[0].removeAttribute(arguments[1]);", element, attributeName);
    }

    // Crop a full-page screenshot down to the given element (overwrites the screenshot file)
    private static File captureElement(File screenshot, WebElement element) {
        try {
            BufferedImage img = ImageIO.read(screenshot);
            int width = element.getSize().getWidth();
            int height = element.getSize().getHeight();
            // Top-left corner of the element
            Point point = element.getLocation();
            // Crop width x height pixels starting at that corner
            BufferedImage dest = img.getSubimage(point.getX(), point.getY(), width, height);
            ImageIO.write(dest, "png", screenshot);
        } catch (IOException e) {
            e.printStackTrace();
        }
        return screenshot;
    }

    // Drive the slider
    private static void operateSlider() throws InterruptedException, IOException {
        Thread.sleep(1000); // sleep before looking elements up again, otherwise the lookup fails
        // Hide the puzzle-piece canvas to reveal the gap image; the image must have loaded, or this fails on a slow network
        WebElement que1 = chromeDriver.findElement(By.xpath("//div[@class='geetest_slicebg geetest_absolute']/canvas[@class='geetest_canvas_slice geetest_absolute']"));
        setAttribute(chromeDriver, que1, "style", "display:none");
        // Screenshot the gap image
        WebElement quekou = chromeDriver.findElement(By.xpath("//canvas[@class='geetest_canvas_bg geetest_absolute']"));
        File src = chromeDriver.getScreenshotAs(OutputType.FILE);
        FileUtils.copyFile(src, new File("D:\\result.png"));
        FileUtils.copyFile(captureElement(src, quekou), new File("D:\\test.png"));
        // Show the full-background canvas to reveal the complete image
        WebElement que2 = chromeDriver.findElement(By.xpath("//canvas[@class='geetest_canvas_fullbg geetest_fade geetest_absolute']"));
        setAttribute(chromeDriver, que2, "style", "display:block");
        // Screenshot the complete image
        WebElement wanzheng = chromeDriver.findElement(By.xpath("//canvas[@class='geetest_canvas_bg geetest_absolute']"));
        File src2 = chromeDriver.getScreenshotAs(OutputType.FILE);
        FileUtils.copyFile(src2, new File("D:\\result1.png"));
        FileUtils.copyFile(captureElement(src2, wanzheng), new File("D:\\test1.png"));
        // Restore the slider
        WebElement huanyuan1 = chromeDriver.findElement(By.xpath("//canvas[@class='geetest_canvas_fullbg geetest_fade geetest_absolute']"));
        setAttribute(chromeDriver, huanyuan1, "style", "display:none");
        WebElement huanyuan2 = chromeDriver.findElement(By.xpath("//canvas[@class='geetest_canvas_slice geetest_absolute']"));
        setAttribute(chromeDriver, huanyuan2, "style", "display:block");
        // Distance between the gap image and the complete image; 5 is the 5px offset between
        // the handle and the image's left edge. getMoveDistance() is not included in the original post.
        int moveDistance = getMoveDistance() - 5;
        // The slider handle
        WebElement btn = chromeDriver.findElement(By.xpath("//div[@class='geetest_slider_button']"));
        // Mouse operations
        Actions actions = new Actions(chromeDriver);
        // Split the handle-to-gap distance into several pieces
        int[] nums = split(moveDistance);
        // Move the handle piece by piece
        Random random = new Random();
        for (int num : nums) {
            actions.clickAndHold(btn).moveByOffset(num, 0).build().perform();
            Thread.sleep(350 + random.nextInt(10)); // pause 350-359 ms between pieces
        }
        // Imitate a human: back off 1px, then release
        actions.clickAndHold(btn).moveByOffset(-1, 0).release().build().perform();
        Thread.sleep(3000); // wait 3 seconds, then check whether verification passed
        // Did the slide succeed?
        String attribute = chromeDriver.findElement(By.xpath("//div[@class='geetest_radar_tip']")).getAttribute("aria-label");
        System.out.println("attribute = " + attribute);
        if ("网络不给力".equals(attribute)) { // "network is poor": the attempt was rejected
            chromeDriver.findElement(By.xpath("//div[@class='geetest_radar_tip']")).click();
            operateSlider(); // slide again
        }
    }

    // Split num into 5 random parts that sum to num
    private static int[] split(int num) {
        int[] nums = new int[5];
        Random rand = new Random();
        for (int i = 0; i < nums.length - 1; i++) {
            nums[i] = num > 0 ? rand.nextInt(num) : 0; // nextInt(bound) requires bound > 0
            num -= nums[i];
        }
        nums[nums.length - 1] = num;
        return nums;
    }
}

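Note: when the handle reaches the target area, the puzzle piece may get "eaten". That means the attempt was judged to be a machine, so imitate human speed as closely as possible: slide a certain distance, stop for n milliseconds, and repeat. After a lot of tuning, I found this lowers the chance of being misjudged; in my tests the success rate is about 80%.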

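This is a small demo I put together while learning and developing. It surely has shortcomings, and I welcome your understanding and suggestions. If anything here infringes your rights, please contact me!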
