Directory Scanning Script

Learning

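I have wanted to build a batch scanning tool for a long time. Recently I tried my hand at scanning for struts2 and decided to write the scanner myself, and this demo is the result.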

My demo

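This time I dropped urllib2, because the way it automatically follows 301 and 302 redirects was a pain -、-, and switched to requests. It was my first time using it and I have not even finished learning the syntax, so I ended up coding until past 2 a.m. last night. Custom 404 pages are really hard to detect, so I looked up some material. The rough idea is to request a path that cannot possibly exist and compare its response with the response for the path being scanned; if the similarity is 80% or higher, the result is discarded.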

if r.status_code == 200 and l1 != l2 and 'type' in r.text and difflib.SequenceMatcher(None, r2.text, r.text).ratio() < 0.8:

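I combined several conditions: first the status code, then the response length (it must differ from the length of the response for the nonexistent path), then `'type' in r.text` as a rough check that the response is HTML, and finally the text similarity.  
The complete module code is as follows: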


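import difflib

import requests


def action_scan(url):
    # Common struts2 entry points to probe on port 8080.
    dirs = ["index.action", "home.action", "login.action", "main.action", "homepage.action"]
    for d in dirs:
        furl = 'http://' + url + ':8080/' + d
        try:
            # Do not follow 301/302 redirects; only a direct 200 counts.
            r = requests.get(furl, allow_redirects=False, timeout=3)
            l1 = len(r.text)
            # Baseline: a path that cannot exist, to capture the custom 404 page.
            r2 = requests.get(furl + '233323332333', allow_redirects=False, timeout=3)
            l2 = len(r2.text)
            if (r.status_code == 200 and l1 != l2 and 'type' in r.text
                    and difflib.SequenceMatcher(None, r2.text, r.text).ratio() < 0.8):
                print(furl + '       is ok')
                # Append the hit to ok.txt.
                with open('ok.txt', 'a') as file_object:
                    file_object.write(furl + '\n')
            else:
                print(furl + '                                       is not ok')
        except requests.RequestException:
            # Skip hosts that time out or refuse the connection.
            pass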
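
The custom 404 check can also be pulled out into a small helper that is easy to try without touching the network. This is only a sketch: the name `looks_like_real_page`, its parameters, and the sample page bodies are hypothetical and not part of the demo above; the 0.8 threshold and the `'type'` check are taken from the condition in the demo.

```python
import difflib


def looks_like_real_page(status_code, body, baseline_body, threshold=0.8):
    """Return True when a response is unlikely to be the site's custom 404 page.

    Hypothetical helper: baseline_body is the body returned for a path
    that cannot exist on the target host.
    """
    if status_code != 200:
        return False
    # Identical length to the baseline is treated as the same page.
    if len(body) == len(baseline_body):
        return False
    # Rough HTML check, same as in the demo.
    if 'type' not in body:
        return False
    similarity = difflib.SequenceMatcher(None, baseline_body, body).ratio()
    return similarity < threshold


# Made-up page bodies, for illustration only.
not_found = '<html><head><meta type="x"></head><body>Sorry, page not found</body></html>'
login = '<form action="login.action"><input type="text" name="user"><input type="password" name="pw"></form>'

print(looks_like_real_page(200, login, not_found))
print(looks_like_real_page(200, not_found + ' ', not_found))
```

The second call differs from the baseline by a single character, so the length check passes but the similarity check rejects it, which is exactly the case the length comparison alone would miss.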

Reflection

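Honestly, if I only want to scan for struts2, a single reliable fingerprint would be enough, so making it this complicated was overkill. Still, with a few small changes this demo could become a general web directory scanner, and at least its accuracy for struts2 is quite high.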

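Next I plan to learn threading and fill in the syntax I skipped along the way; otherwise my foundation is too shaky and will collapse easily.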
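
As a rough idea of where threading could fit, the per-host scan can be handed to a thread pool. This sketch assumes Python 3 and its standard `concurrent.futures` module; `scan_hosts`, `fake_scan`, and the host list are hypothetical stand-ins so that the example runs without network access. In practice a function like `action_scan` would take the place of `fake_scan`.

```python
from concurrent.futures import ThreadPoolExecutor


def scan_hosts(hosts, scan_func, max_workers=10):
    """Run scan_func on every host using a pool of worker threads.

    Results come back in the same order as hosts.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(scan_func, hosts))


def fake_scan(host):
    # Placeholder for a real scan such as action_scan(host).
    return host + ' scanned'


results = scan_hosts(['192.0.2.1', '192.0.2.2'], fake_scan)
print(results)
```

Because the scan spends most of its time waiting on the network, threads are a reasonable fit here; writing hits to a shared file from several threads would need extra care.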