Uptime Monitoring with GitHub Actions

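Motivation

A while ago, one of my Oracle Cloud accounts (San Jose region) got banned. I had a 4-core, 24 GB ARM instance on it running a lot of things; almost every service under a domain ending in dijk.eu.org lived there.

That, of course, included the image server for this blog.

So every image on the blog went down at once.

I then pointed the blog's image server at a machine at home. The upside is fast access from within China; the downside is that my home server is not very reliable.

The natural next step is to monitor the home server and see whether it is online.

But if the machine doing the monitoring is itself unreliable, the monitoring is pointless. (Yes, I am talking about servers I got for free.)

The only service I know well that is both free and reasonably stable is GitHub Actions.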

What it looks like

https://wang-guangxin.github.io/sites

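Code

To use GitHub Actions without a minutes quota, you need a public repository.

So I had no choice but to make this setup public, even though it is frankly hideous.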

https://github.com/WANG-Guangxin/wang-guangxin.github.io

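Overall structure

The structure diagram already makes this fairly clear: GitHub Actions runs the scheduled job, a shell script runs the hexo commands that generate the static pages, and a Python script checks whether each site is online and writes out the results, which are then turned into a Markdown file for Hexo to render.

The three parts are covered one by one below.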

GitHub Actions

https://github.com/WANG-Guangxin/wang-guangxin.github.io/blob/master/.github/workflows/main.yml


name: Deploy Hexo Site

on:
  push:
    branches:
      - master # Set a branch to trigger deployment
  schedule:
    - cron: '*/15 * * * *' # Run every 15 minutes
jobs:
  build-and-deploy:
    runs-on: ubuntu-latest
    steps:
    - name: Checkout
      uses: actions/checkout@v2
      with:
        # This depth parameter is optional - fetching the full history can improve the accuracy of the GH Pages deployment action's change detection.
        # However, it can also increase the time required for the checkout step to complete.
        fetch-depth: 0

    - name: Use Node.js
      uses: actions/setup-node@v2
      with:
        node-version: '18' # Specify your Node.js version here

    - name: Install Dependencies
      run: npm install

    - name: Install Hexo
      run: npm install -g hexo-cli

    - name: Build-0
      run: bash ./build.sh
      env:
        notice_host_server: ${{ secrets.notice_host_server }}
        notice_user: ${{ secrets.notice_user }}
        notice_pwd: ${{ secrets.notice_pwd }}
        notice_mail: ${{ secrets.notice_mail }}
        notice_receiver: ${{ secrets.notice_receiver }}

    - name: Deploying-0
      uses: peaceiris/actions-gh-pages@v3
      with:
        github_token: ${{ secrets.GITHUB_TOKEN }}
        publish_dir: ./public

    - name: Build-1
      run: sleep 300 && bash ./build.sh
      env:
        notice_host_server: ${{ secrets.notice_host_server }}
        notice_user: ${{ secrets.notice_user }}
        notice_pwd: ${{ secrets.notice_pwd }}
        notice_mail: ${{ secrets.notice_mail }}
        notice_receiver: ${{ secrets.notice_receiver }}

    - name: Deploying-1
      uses: peaceiris/actions-gh-pages@v3
      with:
        github_token: ${{ secrets.GITHUB_TOKEN }}
        publish_dir: ./public

    - name: Build-2
      run: sleep 300 && bash ./build.sh
      env:
        notice_host_server: ${{ secrets.notice_host_server }}
        notice_user: ${{ secrets.notice_user }}
        notice_pwd: ${{ secrets.notice_pwd }}
        notice_mail: ${{ secrets.notice_mail }}
        notice_receiver: ${{ secrets.notice_receiver }}

    - name: Deploying-2
      uses: peaceiris/actions-gh-pages@v3
      with:
        github_token: ${{ secrets.GITHUB_TOKEN }}
        publish_dir: ./public

    - name: Commit changes
      uses: stefanzweifel/git-auto-commit-action@v4
      with:
        commit_message: Auto Commit
        branch: ${{ github.head_ref }}
        file_pattern: |
          data.csv
          siteenv

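A quick note on why the schedule is every 15 minutes rather than every 5.

GitHub Actions does not start scheduled jobs on time; there is almost always a delay. With a 5-minute schedule, many runs would actually start 5 or even 10 minutes late.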

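Also, as I read the Actions terms, a single run is limited to 30 minutes. (GitHub's usage-limits documentation lists a 6-hour cap per job on GitHub-hosted runners, so 30 minutes is a conservative reading.)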

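To stay within the terms while keeping the monitoring steadier, I run the workflow every 15 minutes, and each run builds and deploys the Hexo site 3 times, 5 minutes apart. That gives a higher effective check frequency than the schedule alone.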
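The arithmetic behind this can be sketched in a few lines of Python. This is an idealized illustration only: it assumes every run starts exactly on schedule and that the build and deploy steps take no time, neither of which holds in practice. The function name check_times and the dates are made up for the example.

```python
from datetime import datetime, timedelta


def check_times(run_start, checks_per_run=3, gap_minutes=5):
    """Return the moments at which one workflow run performs its checks."""
    return [run_start + timedelta(minutes=i * gap_minutes)
            for i in range(checks_per_run)]


# Two consecutive runs, 15 minutes apart (idealized: no scheduling delay).
first_run = datetime(2024, 1, 1, 0, 0)
second_run = first_run + timedelta(minutes=15)

timeline = check_times(first_run) + check_times(second_run)
gaps = [(later - earlier).total_seconds() / 60
        for earlier, later in zip(timeline, timeline[1:])]
print(gaps)  # [5.0, 5.0, 5.0, 5.0, 5.0]
```

In the real workflow each gap is 5 minutes plus however long the build and deploy take, and the start of each run drifts with GitHub's scheduling delay.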

The script that generates the Hexo site is build.sh. The step below passes it 5 environment variables, which configure the email notifications.

- name: Build-0
  run: bash ./build.sh
  env:
    notice_host_server: ${{ secrets.notice_host_server }}
    notice_user: ${{ secrets.notice_user }}
    notice_pwd: ${{ secrets.notice_pwd }}
    notice_mail: ${{ secrets.notice_mail }}
    notice_receiver: ${{ secrets.notice_receiver }}
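As a rough sketch of how a script could consume these five variables, here is one way to build and send a notification with Python's standard library. The variable names come from the workflow above, but what each one means is my assumption (SMTP host, login user, password, sender address, recipient address), as are the function names, the use of SMTP over SSL, and port 465. The real uptime.py may do this differently.

```python
import os
import smtplib
from email.message import EmailMessage


def build_notice(subject, body, env=os.environ):
    """Build the notification email from the (assumed) variable meanings."""
    msg = EmailMessage()
    msg["From"] = env["notice_mail"]        # assumed: sender address
    msg["To"] = env["notice_receiver"]      # assumed: recipient address
    msg["Subject"] = subject
    msg.set_content(body)
    return msg


def send_notice(msg, env=os.environ, port=465):
    """Send the message over SMTP with SSL (port 465 is an assumption)."""
    with smtplib.SMTP_SSL(env["notice_host_server"], port) as smtp:
        smtp.login(env["notice_user"], env["notice_pwd"])
        smtp.send_message(msg)
```

Keeping message construction separate from sending makes it possible to inspect the message without touching a mail server.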

Shell script

https://github.com/WANG-Guangxin/wang-guangxin.github.io/blob/master/build.sh

hexo clean # Clear the Hexo cache
python3 -m pip install --upgrade pip # Upgrade pip
pip install -r requirements.txt # Install the Python dependencies
python3 uptime.py # Run the Python script: check whether the sites are online and generate the siteenv file
source siteenv # Load the siteenv file
cat siteenv # Print the siteenv file
envsubst < "./template_index.md" > "./source/sites/index.md" # Substitute the variables in the template
cat ./source/sites/index.md # Print the generated Markdown file
hexo generate # Generate the Hexo site

A word about envsubst: it reads its input and replaces references to environment variables (such as $VAR or ${VAR}) with their values.

https://github.com/WANG-Guangxin/wang-guangxin.github.io/blob/master/template_index.md

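The template file template_index.md contains many shell variables, all of which are defined in the siteenv file.

Running envsubst < "./template_index.md" > "./source/sites/index.md" replaces each variable in template_index.md with the value defined in siteenv and writes the result to index.md.

The contents of siteenv are generated by the Python script.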

https://github.com/WANG-Guangxin/wang-guangxin.github.io/blob/master/siteenv
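To make the substitution step concrete, here is a small Python approximation of what envsubst does, using string.Template from the standard library. The template line and the values are invented for the example; only the variable names are borrowed from this post.

```python
from string import Template


def render(template_text, variables):
    """Replace $NAME and ${NAME} placeholders with values from a dict."""
    # safe_substitute leaves unknown placeholders untouched instead of raising.
    return Template(template_text).safe_substitute(variables)


# Hypothetical template line and values, for illustration only.
template = "wgxls.site: $STATUS_WGXLS_SITE (24h: ${WGXLS_SITE_UP_24})"
values = {"STATUS_WGXLS_SITE": "UP", "WGXLS_SITE_UP_24": "99.5%"}

print(render(template, values))  # wgxls.site: UP (24h: 99.5%)
```

One difference worth knowing: envsubst replaces an unset variable with an empty string, whereas safe_substitute leaves the placeholder in place.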

Python script

https://github.com/WANG-Guangxin/wang-guangxin.github.io/blob/master/uptime.py

This is the core of the whole uptime monitor.

There are four global variables: g_config, g_data_file, g_data_list, and g_notice_enable.


g_config = {
    "https://wgxls.site": 
    {
        "status": "STATUS_WGXLS_SITE='",
        "uptime7d": "WGXLS_SITE_UP_7='",
        "uptime24h": "WGXLS_SITE_UP_24='",
        "ssl": "WGXLS_SITE_SSL='",
    },
    "https://opengrok.dijk.eu.org":
    {
        "status": "STATUS_OPENGROK_DIJK_EU_ORG='",
        "uptime7d": "OPENGROK_DIJK_EU_ORG_UP_7='",
        "uptime24h": "OPENGROK_DIJK_EU_ORG_UP_24='",
        "ssl": "OPENGROK_DIJK_EU_ORG_SSL='"
    }
}

g_data_file = 'data.csv'
g_data_list = []
g_data_list.append([])
g_notice_enable = True

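g_config is a dictionary holding the sites to monitor. Each key is a site URL, and each value is a dictionary with four entries: the site's status, its uptime over the last 7 days, its uptime over the last 24 hours, and whether SSL is enabled.

The values of those inner dictionaries are strings. Each one is the opening of a shell variable assignment (name, equals sign, opening quote), so that the script can produce a file a Linux shell can load with source.

The variable names in g_config have to match the variables used in template_index.md.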
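The idea behind these prefix strings can be sketched as follows. This is not the author's write_env; it is a guess at the technique, with a made-up function name and a separate results dictionary (the comments in main say the real script updates g_config in place).

```python
def render_env_lines(config, results):
    """Turn prefix strings plus measured values into shell assignments."""
    lines = []
    for url, prefixes in config.items():
        for field, prefix in prefixes.items():
            # Strip single quotes so the value cannot end the quoting early.
            value = str(results[url][field]).replace("'", "")
            lines.append(prefix + value + "'")  # e.g. NAME=' + UP + '
    return "\n".join(lines) + "\n"


# A trimmed-down config and invented measurements, for illustration only.
config = {
    "https://wgxls.site": {
        "status": "STATUS_WGXLS_SITE='",
        "uptime24h": "WGXLS_SITE_UP_24='",
    }
}
results = {"https://wgxls.site": {"status": "UP", "uptime24h": "99.5%"}}

print(render_env_lines(config, results), end="")
# STATUS_WGXLS_SITE='UP'
# WGXLS_SITE_UP_24='99.5%'
```

Note that envsubst only sees exported variables, so a file like this needs either set -a before source or an export in front of each line. How the actual repository handles that is not shown in this post.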

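g_data_file is a file name used to persist the monitoring data to disk. Each time the Python script runs, it reads the earlier records from g_data_file and uses them to compute the 7-day and 24-hour uptime.

g_data_list is a list that holds the monitoring data in memory.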
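A minimal sketch of the pruning and uptime calculation over such in-memory records might look like this. The record layout (timestamp, url, is_up) and both function names are assumptions made for the example; the post does not show the actual columns of data.csv.

```python
from datetime import datetime, timedelta


def prune(rows, now, keep=timedelta(days=7)):
    """Drop records older than the retention period."""
    return [row for row in rows if now - row[0] <= keep]


def uptime_percent(rows, url, now, window):
    """Percentage of successful checks for one URL within the window."""
    hits = [is_up for ts, u, is_up in rows if u == url and now - ts <= window]
    if not hits:
        return None  # no data in this window
    return 100.0 * sum(hits) / len(hits)


now = datetime(2024, 1, 8, 12, 0)
url = "https://wgxls.site"
rows = [
    (now - timedelta(days=8), url, True),    # too old, will be pruned
    (now - timedelta(days=2), url, False),
    (now - timedelta(hours=3), url, True),
    (now - timedelta(hours=1), url, True),
]

rows = prune(rows, now)
print(len(rows))                                               # 3
print(uptime_percent(rows, url, now, timedelta(hours=24)))     # 100.0
print(round(uptime_percent(rows, url, now, timedelta(days=7)), 1))  # 66.7
```

Measured this way, uptime is the share of successful checks rather than wall-clock time, so irregular check intervals skew the figure slightly.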

g_notice_enable is a boolean that controls whether email notifications are sent.

I wrote the monitoring logic in a purely procedural style.

def main():
    read_csv_to_list() # Load data.csv into g_data_list
    remove_data_before_seven_days() # Drop records older than 7 days
    for key, value in g_config.items(): # Iterate over the monitored sites
        check_url(key) # Check whether the site is online; store the result in g_data_list and update g_config
    calc_uptime() # Compute the 7-day and 24-hour uptime from g_data_list and update g_config
    write_list_to_csv() # Write g_data_list back to data.csv
    write_env() # Generate the siteenv file from g_config
    if g_notice_enable: # Only when notifications are enabled
        do_notice() # Send the email notification
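The body of check_url is not shown here, so the following is only a sketch of what a single availability check could look like with the standard library. The function name is_online, the 10-second timeout, the rule that any 2xx or 3xx response counts as online, and the opener parameter (there so the function can be exercised without network access) are all assumptions.

```python
import urllib.error
import urllib.request


def is_online(url, timeout=10, opener=urllib.request.urlopen):
    """Return True if the URL answers with a 2xx or 3xx status in time."""
    try:
        with opener(url, timeout=timeout) as resp:
            return 200 <= resp.status < 400
    except (urllib.error.URLError, OSError):
        # Covers DNS failures, refused connections, timeouts and,
        # via HTTPError, 4xx/5xx responses.
        return False
```

For example, is_online("https://wgxls.site") returns True when the site responds normally and False when the request fails or times out.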