Unpacking the Chunzhen (纯真) IP Database with innoextract for Automatic Updates and Replacement

聿聿
python
2024-02-05

I. Preface

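I run my own IP-lookup API so that my projects can call it easily. The lookups use the offline Chunzhen IP database. Unfortunately, it is now released only as an exe installer, so every update means unpacking the installer and pulling out the new database by hand. I searched for articles on automating the update, but they turned out to be outdated: Chunzhen no longer provides a direct download link for the database.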

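Well, guess what: today GitHub happened to load without a proxy, so I browsed a bit longer than usual and came across two repositories that pointed me in exactly the right direction. Starred both on the spot. Project links: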

FW27623/qqwry: Chunzhen IP database, updated daily by automatically fetching the latest link published in the WeChat official account's posts. (github.com)

dscharrer/innoextract: A tool to unpack installers created by Inno Setup (github.com)

Tips: implemented on CentOS 7.9.2009 x86_64 (Python 3.7.9)

II. Approach

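The FW27623/qqwry project uses GitHub Actions to automatically collect the URL of the latest IP database published by the Chunzhen IP Lab WeChat official account, and then downloads it automatically.

Fetching the data from the WeChat official account is nothing special, just a Python script. What taught me something new was how it unpacks the downloaded exe file.

As you can see, it uses dscharrer/innoextract: A tool to unpack installers created by Inno Setup (github.com)

That project unpacks installers created by Inno Setup, which directly solved a problem that had bothered me for a long time.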

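1. Implementation

Next, let's see how this tool is called to do the unpacking. Find the GitHub Actions workflow file .github/workflows/qqwry.yml

What? Too much to read! Too lazy to work through it (honestly, I'm just not good enough)? No problem, just ask ChatGPT.

According to GPT's answer, the step we need is step 9:

2. Downloading and calling innoextract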

      - name: Download innoextract
        if: steps.cache-innoextract.outputs.cache-hit != 'true'
        run: |
          wget https://github.com/dscharrer/innoextract/releases/download/${{ env.innoextract_version }}/innoextract-${{ env.innoextract_version }}-linux.tar.xz
          tar -xvf innoextract-${{ env.innoextract_version }}-linux.tar.xz innoextract-${{ env.innoextract_version }}-linux/bin/${{ env.arch }}/innoextract --strip-components

Just go to the project page and download the matching package:

Release innoextract 1.9 · dscharrer/innoextract (github.com)

After the download finishes, run tar -xvf innoextract-1.9-linux.tar.xz
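
The same download-and-unpack step could also be scripted in Python. The following is only a sketch: the helper names `release_url`, `unpack`, and `download_and_unpack` are made up for illustration, and the URL pattern is simply the one from the workflow step above with the version filled in.

```python
import tarfile
import urllib.request
from pathlib import Path


def release_url(version: str) -> str:
    """Build the release tarball URL, following the pattern in the workflow."""
    base = "https://github.com/dscharrer/innoextract/releases/download"
    return f"{base}/{version}/innoextract-{version}-linux.tar.xz"


def unpack(tarball: Path, dest: Path) -> None:
    """Extract a .tar.xz archive into dest, much like `tar -xvf` would."""
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(tarball, "r:xz") as tar:
        tar.extractall(dest)


def download_and_unpack(version: str, dest: Path) -> None:
    """Download the release tarball for `version` and unpack it under dest."""
    dest.mkdir(parents=True, exist_ok=True)
    tarball = dest / f"innoextract-{version}-linux.tar.xz"
    urllib.request.urlretrieve(release_url(version), tarball)
    unpack(tarball, dest)
```

Calling `download_and_unpack("1.9", Path("/www/wwwroot/innoextract"))` would leave an innoextract-1.9-linux directory there, matching the paths used later in this post.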

Calling it is even simpler:

      - name: Get qqwry.dat
        id: dat_date
        shell: pwsh
        run: |
          $zip = (Get-ChildItem Downloads\*.zip)[0].FullName
          $filename = (Get-ChildItem Downloads\*.zip)[0].Name
          $name = $filename.Substring($filename.IndexOf('-') + 1 , $filename.length - $filename.IndexOf('-') - 5)
          echo "dat_date=$name" | Out-File -FilePath $env:GITHUB_ENV
          7z e -y $zip setup.exe
          ./innoextract setup.exe -I qqwry.dat

./innoextract setup.exe -I qqwry.dat

That single line is all it takes to call it.

And with that, the core of the workflow has been lifted and made my own.
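
For readers who don't use PowerShell, here is a rough Python equivalent of the first half of that step: deriving the date from the zip file name and pulling setup.exe out of the archive. The function names are hypothetical, and the sketch assumes setup.exe sits at the root of the zip, as the `7z e` call suggests.

```python
import zipfile
from pathlib import Path


def date_from_zip_name(filename: str) -> str:
    """Return the text between the first '-' and the '.zip' suffix,
    the same slice the PowerShell Substring call takes."""
    start = filename.index("-") + 1
    return filename[start:-len(".zip")]


def extract_setup(zip_path: Path, dest: Path) -> Path:
    """Extract only setup.exe from the archive into dest.

    Assumes the member is stored at the archive root; a nested member
    would raise KeyError here.
    """
    with zipfile.ZipFile(zip_path) as archive:
        return Path(archive.extract("setup.exe", path=dest))
```

With a made-up file name such as `ip-2024-01-31.zip`, `date_from_zip_name` returns `2024-01-31`.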

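3. Writing the code

To get the IP database download link published in the official account's posts, just use what the project author already wrote.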
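
The link extraction in that script boils down to a non-greedy regular expression over the article text. A tiny example on made-up text (the sample string and the example.com URL are placeholders, not real article content):

```python
import re

# Hypothetical article text; the real content comes from the WeChat post body
sample = "Download: https://example.com/files/setup-2024-01-31.zip (updated regularly)"

# Non-greedy match: stop at the first ".zip" after each "https://"
links = re.findall(r"https://.*?\.zip", sample)
print(links)  # ['https://example.com/files/setup-2024-01-31.zip']
```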

All we need to implement ourselves is calling innoextract from Python to do the unpacking.

subprocess.run(['./innoextract', 'setup.exe', '-I', 'qqwry.dat'], check=True)

This is the key line.
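
If you prefer not to rely on the current working directory, that call can be wrapped in a small helper. This is a sketch under assumptions: `build_command` and `extract_dat` are names invented here, only the `-I` option from the workflow is used, and the extracted file is located by a recursive search rather than by assuming where innoextract writes it.

```python
import subprocess
from pathlib import Path
from typing import List, Optional


def build_command(innoextract: str, installer: str, include: str) -> List[str]:
    """Mirror the command line from the workflow: innoextract <installer> -I <file>."""
    return [innoextract, installer, "-I", include]


def extract_dat(workdir: Path, installer: str = "setup.exe",
                include: str = "qqwry.dat") -> Optional[Path]:
    """Run innoextract inside workdir and return the extracted file, if found."""
    binary = (workdir / "innoextract").resolve()
    cmd = build_command(str(binary), installer, include)
    # check=True raises CalledProcessError when innoextract exits non-zero
    subprocess.run(cmd, cwd=workdir, check=True)
    # Search recursively instead of assuming the output location
    return next(workdir.rglob(include), None)
```

`extract_dat(Path("/www/wwwroot/innoextract/innoextract-1.9-linux"))` would then return the path of the extracted qqwry.dat, or None if nothing matched.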

The full code:

import re
import json
import requests
from bs4 import BeautifulSoup
import zipfile
import subprocess
import os
import hashlib
import shutil

def get_link(url):
    headers = {
        'Accept-Language': 'zh-CN,zh;q=0.9,en-CN;q=0.8,en;q=0.7,zh-TW;q=0.6',
        'Cookie': 'rewardsn=; wxtokenkey=777',
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/117.0.0.0 Safari/537.36'
    }

    # Request the album URL and pull the WeChat article link out of the JSON
    response = requests.get(url, headers=headers)
    data = json.loads(response.text)
    link = data['getalbum_resp']['article_list'][0]['url']
    return link

def get_zip_url(link):
    # Fetch the WeChat article and parse the page
    response = requests.get(link)
    soup = BeautifulSoup(response.text, 'html.parser')

    # Extract zip links from the text: the regex matches links that start with https:// and end with .zip
    content = soup.find('div', {'id': 'js_content'}).get_text()
    zip_url = re.findall(r'https://.*?\.zip', content)

    return zip_url

def compare_and_replace(file1, file2):
    hash1 = hashlib.md5()
    hash2 = hashlib.md5()

    with open(file1, 'rb') as f1, open(file2, 'rb') as f2:
        for chunk in iter(lambda: f1.read(4096), b''):
            hash1.update(chunk)
        for chunk in iter(lambda: f2.read(4096), b''):
            hash2.update(chunk)

    if hash1.hexdigest() != hash2.hexdigest():
        shutil.copy2(file1, file2)
        print("File replaced")
    else:
        print("File hashes match, no replacement needed")

if __name__ == '__main__':
    # Get the link to the latest IP database release article from the WeChat album JSON
    url = 'https://mp.weixin.qq.com/mp/appmsgalbum?__biz=Mzg3Mzc0NTA3NA==&action=getalbum&album_id=2329805780276838401&f=json'

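    zip_url = []  # stays empty if the lookup below fails
    try:
        link = get_link(url)
        if link:
            zip_url = get_zip_url(link)
            if zip_url:
                print(zip_url[0])
            else:
                print("No zip link found")
        else:
            print("No WeChat article link found")
    except Exception as e:
        print("Error:", e)
    if not zip_url:
        # No download link, so stop here instead of failing later on zip_url[0]
        raise SystemExit(1)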
    try:
        save_path = "/www/wwwroot/innoextract/cz/setup.zip"
        response = requests.get(zip_url[0], timeout=60)
        response.raise_for_status()  # do not save an error page as the zip
        with open(save_path, "wb") as file:
            file.write(response.content)
        extract_path = "/www/wwwroot/innoextract/innoextract-1.9-linux/"  # target path for the unzipped files
        with zipfile.ZipFile(save_path, 'r') as zip_ref:
            zip_ref.extractall(extract_path)
        # Switch to the target directory
        os.chdir('/www/wwwroot/innoextract/innoextract-1.9-linux')

        # Check that the innoextract executable exists
        if not os.path.exists('./innoextract'):
            print("Error: './innoextract' file or directory not found.")
        else:
            # Run the command
            subprocess.run(['./innoextract', 'setup.exe', '-I', 'qqwry.dat'], check=True)
    except Exception as e:
        print("Error:", e)
    try:
        file1 = "xxx"
        file2 = "xxx"
        file3 = "xxxx"
        compare_and_replace(file1, file2)
        compare_and_replace(file1, file3)
    except Exception as e:
        print("Error:", e)
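
One possible refinement of compare_and_replace, offered only as a sketch: treat a missing destination as "needs replacing", and swap the new file in with a rename so a reader never sees a half-written database. The names `file_md5` and `replace_if_changed` are hypothetical and are not part of the script above.

```python
import hashlib
import os
import shutil
from pathlib import Path


def file_md5(path: Path, chunk_size: int = 4096) -> str:
    """Hash a file in chunks so the whole database is never held in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def replace_if_changed(src: Path, dst: Path) -> bool:
    """Copy src over dst when dst is missing or differs; return True if replaced."""
    if dst.exists() and file_md5(src) == file_md5(dst):
        return False
    tmp = dst.with_name(dst.name + ".tmp")
    shutil.copy2(src, tmp)  # write the new copy next to the target first
    os.replace(tmp, dst)    # then swap it in with a single rename
    return True
```

The temporary file lives in the same directory as the target so that the final rename stays on one filesystem.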

III. Final result

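Hooked up to a scheduled task in the BaoTa panel (宝塔面板), the script runs automatically at 18:00 every day, so the update is fully automatic.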

Calling the API shows that the IP database was replaced successfully. Done!