This is the first script I wrote after learning Bash scripting. It downloads a single file from GitHub.

Update: the code was refactored on 2016.11.02.


Background

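I wanted to download a single file from a repo on GitHub, but for a long time I could not find a way to do it. Stack Overflow has a related question, [Download single files from GitHub], but the answers point in every direction and left me confused.

I then found Downloads in the GitHub Developer documentation, but that API was deprecated long ago in favor of Releases, which left me just as confused.

Later I found the article GitHub 上下载单文件 - GitHub Mate for Chrome, which says that clicking the Raw button on the file page, or right-clicking that button and choosing Save link as, downloads the file. I tested this, and downloading through the browser does work.

What I wanted, though, was to download from the command line on GNU/Linux, so I studied the link behind the Raw button and found that it differs slightly from the original link.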

Take Books/Understanding.The.Linux.kernel.3rd.Edition.pdf in the linux_kernel_driver repo as an example.

Original URL: https://github.com/shihyu/linux_kernel_driver/blob/master/Books/Understanding.The.Linux.kernel.3rd.Edition.pdf

Raw URL: https://github.com/shihyu/linux_kernel_driver/raw/master/Books/Understanding.The.Linux.kernel.3rd.Edition.pdf

Comparing the two shows that blob in the original URL is replaced with raw. The file can then be downloaded with curl -O -L URL or curl -o /PATH/TO/SAVE_NAME -L URL. There is one problem: if the file name contains special symbols or Chinese characters, they are percent-encoded in the URL, so curl -O saves the file as Atom%E7%B7%A8%E8%BC%AF%E5%99%A8%E5%AE%89%E8%A3%9D%26%E9%85%8D%E7%BD%AE.pdf rather than Atom編輯器安裝&配置.pdf.
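A minimal sketch of the blob-to-raw rewrite described above, using Bash parameter expansion. The function name blob_to_raw is hypothetical and is not part of the original script; it only illustrates the string substitution and does not touch the network.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not in the original script): turn a GitHub "blob"
# page URL into the matching "raw" URL.
blob_to_raw() {
    local url=$1 from='/blob/' to='/raw/'
    local result
    # Replace only the first "/blob/" path segment; the quoted pattern is
    # matched literally.
    result=${url/"$from"/$to}
    printf '%s\n' "$result"
}

page_url='https://github.com/shihyu/linux_kernel_driver/blob/master/Books/Understanding.The.Linux.kernel.3rd.Edition.pdf'
raw_url=$(blob_to_raw "$page_url")
printf '%s\n' "$raw_url"

# curl -O names the saved file after the last path segment of the URL,
# which stays percent-encoded whenever the URL is.
printf '%s\n' "${raw_url##*/}"
```

Replacing only the first match keeps a later path component that happens to be named blob intact.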

So I set out to write a Bash shell script that downloads a file from its original URL and saves it under its unescaped file name.
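One way to get an unescaped name without reading the page at all is to decode the percent-encoded last segment of the URL. This is an alternative technique, not what the script in this post does, and the helper name urldecode is hypothetical. A rough sketch, assuming Bash and a name that contains no literal backslashes:

```bash
#!/usr/bin/env bash
# Hypothetical helper (not in the original script): decode %XX escapes.
# Assumption: the input has no literal backslashes, since printf's %b
# would interpret them as escape sequences too.
urldecode() {
    local encoded=$1
    # Turn every "%" into "\x" so that %b expands the hex escapes.
    printf '%b\n' "${encoded//%/\\x}"
}

encoded_name='Atom%E7%B7%A8%E8%BC%AF%E5%99%A8%E5%AE%89%E8%A3%9D%26%E9%85%8D%E7%BD%AE.pdf'
urldecode "$encoded_name"
```

The decoded bytes are UTF-8, so the name displays correctly only in a UTF-8 terminal.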

Personal Requests

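  • Download the file from the GNU/Linux command line using its original URL (not the raw URL);
  • Keep the file name unescaped as far as possible;
  • Save to the home directory of the user running the script by default; if the home directory does not exist, save to the directory that contains the script.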
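The third requirement can be sketched as a small fallback check. The helper name pick_save_dir is hypothetical, and the sketch reads $HOME for brevity, whereas the script further down looks the home directory up in /etc/passwd.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not in the original script): print the directory
# to save into. $1 = candidate home directory, $2 = fallback directory.
pick_save_dir() {
    local home_dir=$1 fallback_dir=$2
    if [[ -n "$home_dir" && -d "$home_dir" ]]; then
        printf '%s\n' "$home_dir"
    else
        printf '%s\n' "$fallback_dir"
    fi
}

# Directory containing this script (readlink -f is the GNU coreutils form).
script_dir=$(dirname "$(readlink -f "$0")")
pick_save_dir "$HOME" "$script_dir"
```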

Thinking

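  • Get the name of the user running the script from the variable $USER and look up that user's home directory in /etc/passwd
    • If the home directory does not exist, use $(dirname "$(readlink -f "$0")") to get the absolute path of the script's directory as SAVE_DIR
  • Convert the original file URL into the raw URL with Bash string manipulation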
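  • Fetch the file's page with curl -s ORIGIN_URL (the original page URL, since the raw URL returns the file itself) and pipe it through grep/sed to extract the unescaped file name ORIGIN_NAME from the HTML tag that holds it

```bash
# tag pattern as used by the sed expression in the script below
curl -s ORIGIN_URL | grep -E -o '<strong class="final-path">.*</strong>' | sed -r 's@<strong class="final-path">@@g;s@</strong>@@g'

# optimized command
curl -s ORIGIN_URL | sed -r -n 's@.*<strong class="final-path">(.*)</strong>.*@\1@p'
```

  • Use `curl -o /SAVE_DIR/ORIGIN_NAME -L RAW_URL` to save the file into the target directory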
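The name-extraction step can be tried offline on a made-up HTML fragment. In this sketch the function name extract_name and the sample markup are hypothetical; only the tag pattern comes from the sed expression used in the script below. Whether GitHub still serves this markup is not something the sketch checks.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read HTML on stdin and print the text inside
# <strong class="final-path">...</strong>, if such a tag is present.
extract_name() {
    sed -r -n 's@.*<strong class="final-path">(.*)</strong>.*@\1@p'
}

# Made-up sample of the markup the pattern looks for; a real page may differ.
sample_html='<div><a href="/u/r">r</a>/<strong class="final-path">My Notes.md</strong></div>'
printf '%s\n' "$sample_html" | extract_name
```

Because sed runs with -n and the p flag, lines without the tag print nothing, so an empty result signals that the name was not found.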

---
## Implementations

Script download: [GitHub](https://github.com/LempStacker/personalShellScriptCollection/blob/master/shellScripts/saveSingleFileFromGitHub.sh)

```bash
#!/bin/bash
# target: save file from GitHub with origin url by bash shell
# author: lempstacker
# date: 2016.01.12 01:26 Tue Asia/Shanghai
# update date:
# 2016.09.23 11:51 Fri Asia/Shanghai
# 2016.11.02 12:51 Wed Asia/Shanghai
# 2016.12.21 15:45 Wed Asia/Shanghai
# blog: https://lempstacker.com/

# font color
c_red='\e[31;1m'
c_blue='\e[34m'
c_end='\e[0m'

# 1.Check Internet Connection
! ping -q -w 1 -c 1 `ip route | sed -r -n '/default/s@.*via (.*) dev.*@\1@p'` &> /dev/null && printf "%s\n" "Error: No Internet Connection, Please Check It!" && exit 1

# 2. Receive File GitHub URL Link, Parameter $1 Is The 1st Parameter
if [[ -z "$1" ]]; then
    printf $c_red"Please Input A Specific File Url From GitHub, Bye!$c_end\n"
    exit 2
else
    originUrl="$1"
fi

# 3. Check URL Legal Or Not
# http://stackoverflow.com/questions/229551/string-contains-in-bash
if [[ "$originUrl" != "https://github.com/"* ]]; then   # glob prefix match; the former =~ test was unanchored
    printf $c_red"Please Input A Leagl GitHub Url, Bye!$c_end\n"
    exit 3
fi


# 4. Check Command CURL / WGET Exists Or Not
#Download Tool (curl, wget) Check
if command -v curl &> /dev/null; then
    flag='curl'
elif command -v wget &> /dev/null; then
    flag='wget'
else
    printf "Sorry: This Script Need $c_red%s$c_end OR $c_red%s$c_end!\n" "curl" "wget"
    unset c_red
    unset c_end
    exit 4
fi

# 4. Variables Setting
retryCount=6    # curl parameter
retryDelayTime=1    # curl parameter


# 5. Get Script Runner Home Dir, If Not Exists, Get The Script Absolute Path
#Prefer $SUDO_USER (the invoking user under sudo), otherwise use $USER
nowUser="${SUDO_USER:-$USER}"
#Extract Script Runner's Home Dir (exact match on the user name field)
userHomeDir=$(awk -F: -v user="$nowUser" '$1 == user {print $(NF-1)}' /etc/passwd)
#File Save Path
[[ -n "$userHomeDir" ]] && saveDir="$userHomeDir" || saveDir=$(dirname "$(readlink -f "$0")")

printf "Hi $c_blue$nowUser$c_end, Thank For Using This Script!\n"


# 6. Extract File's Complete Name Via Origin URL
#File URL Transform
targetUrl=${originUrl//blob\/}
targetUrl=${targetUrl/github.com/raw.githubusercontent.com}

tempFile=`mktemp -t XXXXXX.txt`
printf "Begin To Extract File Complete Origin Name, Just Be Patient!\n"

extractFileNameFunc() {
    local attempt tempName

    # Retry a bounded number of times instead of recursing without limit
    for (( attempt = 1; attempt <= retryCount; attempt++ )); do
        case "$flag" in
            curl )
                curl -s --retry "$retryCount" --retry-delay "$retryDelayTime" "$originUrl" -o "$tempFile"
                ;;
            wget )
                wget -q -t "$retryCount" --waitretry "$retryDelayTime" "$originUrl" -O "$tempFile"
                ;;
        esac

        tempName=$(sed -r -n 's@.*<strong class="final-path">(.*)</strong>.*@\1@p' "$tempFile")
        if [[ -n "$tempName" ]]; then
            echo "$tempName"
            return 0
        fi
    done

    return 1
}

originName=$(extractFileNameFunc)
rm -f "$tempFile"   # the temp file is only needed while extracting the name

if [[ -z "$originName" ]]; then
    printf "${c_red}Fail To Extract File Name, Bye!${c_end}\n"
    exit 5
fi

printf "File Name You Wanna Download Is ${c_red}%s${c_end}.\n" "$originName"


# 7. Download Target File
printf "Begin To Download File, Just be patient!\n"

fileSavePath="$saveDir/$originName"
[[ -f "$fileSavePath" ]] && rm -f "$fileSavePath"   #remove existing file with same name

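downloadFileFunc() {
    local attempt

    # Bounded retries; use whichever download tool was detected in step 4
    for (( attempt = 1; attempt <= retryCount; attempt++ )); do
        case "$flag" in
            curl ) curl -# --retry "$retryCount" --retry-delay "$retryDelayTime" -o "$fileSavePath" -L "$targetUrl" ;;
            wget ) wget -t "$retryCount" --waitretry "$retryDelayTime" -O "$fileSavePath" "$targetUrl" ;;
        esac

        # -s: the file exists and is not empty
        if [[ -s "$fileSavePath" ]]; then
            printf "Congratulations, File ${c_red}%s${c_end} Has Been Saved Under Directory ${c_red}%s!${c_end}\n" "$originName" "$saveDir"
            return 0
        fi
        printf "${c_red}Sorry, Fail To Download, Try Again!${c_end}\n"
    done

    return 1
}

downloadFileFunc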


# 8. Unset Variables
unset originUrl
unset retryCount
unset retryDelayTime
unset nowUser
unset userHomeDir
unset saveDir
unset targetUrl
unset originName
unset fileSavePath

rm -f "$tempFile"

# Script End
```

Usage

Command format: bash BASH_NAME URL. If the URL contains spaces, wrap it in double quotes "".

Example1

[flying@lempstacker ~]$ bash /tmp/scripts.sh
Please Input A Specific File Url From GitHub, Bye!
[flying@lempstacker ~]$ bash /tmp/scripts.sh https://lempstacker.com/
Please Input A Leagl GitHub Url, Bye!
[flying@lempstacker ~]$

Example2

[flying@lempstacker ~]$ bash /tmp/scripts.sh "https://github.com/LempStacker/Qingtianjiedu-Blog-Backup/blob/master/origin/MariaDB%E5%A4%9A%E5%80%8B%E5%A4%96%E9%8D%B5%E9%80%B2%E8%A1%8C%E7%B4%9A%E8%81%AF%E6%9B%B4%E6%96%B0%E7%B4%9A%E8%81%AF%E5%88%AA%E9%99%A4.md"
Hi flying, Thank For Using This Script!
Begin To Extract File Complete Origin Name, Just Be Patient!
File Name You Wanna Download Is MariaDB多個外鍵進行級聯更新級聯刪除.md.
Begin To Download File, Just be patient!
######################################################################## 100.0%
Congratulations, File MariaDB多個外鍵進行級聯更新級聯刪除.md Has Been Saved Under Directory /home/flying!
[flying@lempstacker ~]$ ls -lh /home/flying/MariaDB多個外鍵進行級聯更新級聯刪除.md
-rw-rw-r-- 1 flying flying 19K Nov  2 13:58 /home/flying/MariaDB多個外鍵進行級聯更新級聯刪除.md
[flying@lempstacker ~]$

Example3

[flying@lempstacker ~]$ bash /tmp/scripts.sh "https://github.com/LempStacker/DatabaseRelated/blob/master/MariaDB/Backup/DocumentsBackup/PDF_Version/%E4%BD%BF%E7%94%A8sysbench%E5%B0%8DAWS%E4%B8%8BMariaDB%E9%80%B2%E8%A1%8C%E5%A3%93%E5%8A%9B%E6%B8%AC%E8%A9%A6.pdf"
Hi flying, Thank For Using This Script!
Begin To Extract File Complete Origin Name, Just Be Patient!
File Name You Wanna Download Is 使用sysbench對AWS下MariaDB進行壓力測試.pdf.
Begin To Download File, Just be patient!
######################################################################## 100.0%
Congratulations, File 使用sysbench對AWS下MariaDB進行壓力測試.pdf Has Been Saved Under Directory /home/flying!
[flying@lempstacker ~]$ ls -lh /home/flying/*sysbench*.pdf
-rw-rw-r-- 1 flying flying 517K Nov  2 14:00 /home/flying/使用sysbench對AWS下MariaDB進行壓力測試.pdf
[flying@lempstacker ~]$

Example4

Here the script is run as root after switching user with sudo / su

[flying@lempstacker ~]$ sudo -i
[root@lempstacker ~]# who
flying   :0           2016-11-02 08:59 (:0)
flying   pts/0        2016-11-02 09:01 (:0)
[root@lempstacker ~]# whoami
root
[root@lempstacker ~]# bash /tmp/scripts.sh "https://github.com/shihyu/linux_kernel_driver/blob/master/Books/Linux%E5%86%85%E6%A0%B8%E6%BA%90%E4%BB%A3%E7%A0%81%E6%83%85%E6%99%AF%E5%88%86%E6%9E%90(%E5%85%A8%E5%86%8C%E9%AB%98%E6%B8%85%E5%B8%A6%E4%B9%A6%E7%AD%BE).pdf"
Hi flying, Thank For Using This Script!
Begin To Extract File Complete Origin Name, Just Be Patient!
File Name You Wanna Download Is Linux内核源代码情景分析(全册高清带书签).pdf.
Begin To Download File, Just be patient!
######################################################################## 100.0%
Congratulations, File Linux内核源代码情景分析(全册高清带书签).pdf Has Been Saved Under Directory /home/flying!
[root@lempstacker ~]# ls -lh /home/flying/Linux*.pdf
-rw-r--r-- 1 root root 6.3M Nov  2 14:04 /home/flying/Linux内核源代码情景分析(全册高清带书签).pdf
[root@lempstacker ~]#

Expansion Problem

How to download a single directory is still open; for lack of time, I will look into it when I get the chance.

Change Log

  • 2016.01.12 01:26 Tue Asia/Beijing
    • First draft completed
  • 2016.01.25 20:30 Mon Asia/Beijing
    • Corrections; uploaded to the lempstacker blog
  • 2016.09.23 11:40 Fri Asia/Shanghai
    • Optimized the sed part of the script
  • 2016.11.02 13:52 Wed Asia/Shanghai
    • Refactored the shell script
  • 2016.12.21 15:46 Wed Asia/Shanghai
    • Optimized the shell script: added a network connectivity check; the download tool can now be either curl or wget

  • Note Time: 2016.01.12 01:26 Tue
  • Note Location: Asia/Beijing
  • Writer: lempstacker