Information Gathering

Posted on 2025-04-23 Edited on 2025-08-19 In Learning

Common Tools

whois

whois looks up basic information such as domain names, IP addresses, and details about the people and organizations involved:

$ whois <domain_name/ip_address>

   Domain Name: FACEBOOK.COM
   Registry Domain ID: 2320948_DOMAIN_COM-VRSN
   Registrar WHOIS Server: whois.registrarsafe.com
   Registrar URL: http://www.registrarsafe.com
   Updated Date: 2024-04-24T19:06:12Z
   Creation Date: 1997-03-29T05:00:00Z
   Registry Expiry Date: 2033-03-30T04:00:00Z
   Registrar: RegistrarSafe, LLC
   Registrar IANA ID: 3237
   Registrar Abuse Contact Email: abusecomplaints@registrarsafe.com
   Registrar Abuse Contact Phone: +1-650-308-7004
   Domain Status: clientDeleteProhibited https://icann.org/epp#clientDeleteProhibited
   Domain Status: clientTransferProhibited https://icann.org/epp#clientTransferProhibited
   Domain Status: clientUpdateProhibited https://icann.org/epp#clientUpdateProhibited
   Domain Status: serverDeleteProhibited https://icann.org/epp#serverDeleteProhibited
   Domain Status: serverTransferProhibited https://icann.org/epp#serverTransferProhibited
   Domain Status: serverUpdateProhibited https://icann.org/epp#serverUpdateProhibited
   Name Server: A.NS.FACEBOOK.COM
   Name Server: B.NS.FACEBOOK.COM
   Name Server: C.NS.FACEBOOK.COM
   Name Server: D.NS.FACEBOOK.COM
   DNSSEC: unsigned
   URL of the ICANN Whois Inaccuracy Complaint Form: https://www.icann.org/wicf/
>>> Last update of whois database: 2024-06-01T11:24:10Z <<<

[...]
Registry Registrant ID:
Registrant Name: Domain Admin
Registrant Organization: Meta Platforms, Inc.
[...]
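
When only one or two fields of the whois output matter, the text can be filtered in a script. Below is a minimal Python sketch, assuming the output has already been captured as a string and follows the `Key: Value` layout shown above. The function name `whois_field` and the trimmed sample are my own illustration, not part of any whois tooling.

```python
def whois_field(whois_text: str, key: str) -> list[str]:
    """Return every value recorded for `key` in whois-style 'Key: Value' text."""
    values = []
    for line in whois_text.splitlines():
        # Split on the first colon only, so URLs in the value stay intact.
        name, sep, value = line.strip().partition(":")
        if sep and name.strip().lower() == key.lower():
            values.append(value.strip())
    return values


# Trimmed sample taken from the facebook.com output above.
sample = """\
   Registrar: RegistrarSafe, LLC
   Registrar IANA ID: 3237
   Name Server: A.NS.FACEBOOK.COM
   Name Server: B.NS.FACEBOOK.COM
"""

print(whois_field(sample, "Registrar IANA ID"))  # ['3237']
print(whois_field(sample, "Name Server"))
```

A key that appears several times, like Name Server, comes back as a list with one entry per line.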

dig

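dig is used to query DNS information about a domain:

$ dig domain.com <dns record type>

+trace shows the full path of the DNS resolution
-x performs a reverse lookup, resolving an IP address back to a hostname (it is not the inverse of +trace)
+short gives a terse version of the output
+noall +answer shows only the answer section of the query output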

$ dig google.com

; <<>> DiG 9.18.24-0ubuntu0.22.04.1-Ubuntu <<>> google.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 16449
;; flags: qr rd ad; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0
;; WARNING: recursion requested but not available

;; QUESTION SECTION:
;google.com.                    IN      A

;; ANSWER SECTION:
google.com.             0       IN      A       142.251.47.142

;; Query time: 0 msec
;; SERVER: 172.23.176.1#53(172.23.176.1) (UDP)
;; WHEN: Thu Jun 13 10:45:58 SAST 2024
;; MSG SIZE  rcvd: 54

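qr means the message is a response (Query Response flag)
rd means recursion was requested (Recursion Desired flag)
ad means the resolver considers the data authentic (Authentic Data flag)
In the answer section, the number is the TTL, and the IP address after it is the address associated with the domain, i.e. the answer to the question section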
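
To make the flag line easier to read, here is a small Python sketch that pulls the flags out of a dig header line and prints what each one means. The helper name `parse_dig_flags` and the `FLAG_MEANINGS` table are my own; the sketch only handles the `;; flags: ...;` layout shown in the output above.

```python
import re

# Meanings of the DNS header flags as dig prints them.
FLAG_MEANINGS = {
    "qr": "Query Response: this message is a response",
    "aa": "Authoritative Answer",
    "tc": "TrunCated",
    "rd": "Recursion Desired: the client asked for recursion",
    "ra": "Recursion Available",
    "ad": "Authentic Data: the resolver considers the data authentic",
    "cd": "Checking Disabled",
}


def parse_dig_flags(header_line: str) -> list[str]:
    """Extract the flag names from a dig ';; flags: ...;' header line."""
    match = re.search(r"flags:\s*([a-z ]*);", header_line)
    return match.group(1).split() if match else []


line = ";; flags: qr rd ad; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0"
for flag in parse_dig_flags(line):
    print(flag, "=", FLAG_MEANINGS.get(flag, "unknown"))
```

The absence of ra in this line matches the warning in the output above, "recursion requested but not available".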

dnsenum

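dnsenum enumerates subdomains and can gather fairly comprehensive information:

$ dnsenum --enum inlanefreight.com -f /usr/share/seclists/Discovery/DNS/subdomains-top1million-110000.txt -r

--enum is a shortcut for a set of tuning options; the target domain to enumerate follows it
-f specifies the wordlist used for brute forcing
-r enables recursive subdomain brute forcing: if dnsenum finds a subdomain, it then tries to enumerate that subdomain's subdomains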

$ dnsenum --enum inlanefreight.com -f  /usr/share/seclists/Discovery/DNS/subdomains-top1million-20000.txt 

dnsenum VERSION:1.2.6

-----   inlanefreight.com   -----


Host's addresses:
__________________

inlanefreight.com.                       300      IN    A        134.209.24.248

[...]

Brute forcing with /usr/share/seclists/Discovery/DNS/subdomains-top1million-20000.txt:
_______________________________________________________________________________________

www.inlanefreight.com.                   300      IN    A        134.209.24.248
support.inlanefreight.com.               300      IN    A        134.209.24.248
[...]


done.

gobuster

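gobuster is usually used for brute forcing directories and files; it can also enumerate virtual hosts:

$ gobuster vhost -u http://<target_IP_address> -w <wordlist_file> --append-domain

-u specifies the target URL
-w specifies the wordlist
--append-domain appends the base domain to every word in the wordlist (usually needed)
-t sets the number of threads to speed up the scan
-k ignores SSL/TLS certificate errors
-o specifies a file to save the output to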

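For some reason my gobuster output always contains a lot of errors. They are not written to the output file and only clutter the console, so I always write the results to a file and analyze that.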
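
To make the effect of --append-domain concrete: as far as I understand it, each word from the wordlist is joined to the base domain, and the result is tried as a hostname against the target. The Python sketch below only shows that joining step. It is my own illustration, not gobuster's code, and the function name `vhost_candidates` and the wordlist entries are hypothetical.

```python
from typing import Iterable, Iterator


def vhost_candidates(words: Iterable[str], base_domain: str) -> Iterator[str]:
    """Yield one candidate hostname per non-empty wordlist entry."""
    for word in words:
        word = word.strip()
        if word:
            # "dev" + "inlanefreight.htb" -> "dev.inlanefreight.htb"
            yield f"{word}.{base_domain}"


wordlist = ["www", "dev", "web1337", ""]  # hypothetical wordlist entries
for host in vhost_candidates(wordlist, "inlanefreight.htb"):
    print(host)
```

Without the appended domain, a bare word such as dev would be tried as the hostname on its own, which is presumably why the option is usually needed.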

crt.sh

crt.sh searches Certificate Transparency (CT) logs. It has a web interface and can also be queried directly from the terminal through its API:

$ curl -s "https://crt.sh/?q=facebook.com&output=json" | jq -r '.[] | select(.name_value | contains("dev")) | .name_value' | sort -u
 
*.dev.facebook.com
*.newdev.facebook.com
*.secure.dev.facebook.com
dev.facebook.com
devvm1958.ftw3.facebook.com
facebook-amex-dev.facebook.com
facebook-amex-sign-enc-dev.facebook.com
newdev.facebook.com
secure.dev.facebook.com

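curl -s "https://crt.sh/?q=facebook.com&output=json": fetches the JSON output for certificates matching the domain facebook.com from crt.sh
jq -r '.[] | select(.name_value | contains("dev")) | .name_value': filters the JSON results, keeping only entries whose name_value field (which holds the domain or subdomain) contains the string dev; the -r flag tells jq to output raw strings
sort -u: sorts the results alphabetically and removes duplicates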
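
The same filtering can be done in Python once the JSON body has been downloaded. This is a minimal sketch under a few assumptions: the response is a JSON array of objects with a name_value field, as the jq filter above implies, and a single name_value may hold several names separated by newlines. The function name `names_containing` and the three-entry sample are made up for illustration.

```python
import json


def names_containing(crtsh_json: str, needle: str) -> list[str]:
    """Mimic the jq filter plus sort -u on a crt.sh JSON response body."""
    names = set()
    for entry in json.loads(crtsh_json):
        name_value = entry.get("name_value", "")
        if needle in name_value:
            # Assumed: one entry may list several names, one per line.
            names.update(name_value.splitlines())
    return sorted(names)


# Made-up sample in the assumed response shape.
sample = json.dumps([
    {"name_value": "dev.facebook.com"},
    {"name_value": "*.dev.facebook.com\ndev.facebook.com"},
    {"name_value": "www.facebook.com"},
])
print(names_containing(sample, "dev"))
```

Python's sorted orders by code point, so the ordering may differ slightly from what sort -u produces under some locales.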

whatweb

whatweb is a quick and convenient way to identify the web technologies a site uses:

$ whatweb http://inlanefreight.com

https://www.inlanefreight.com/ [200 OK] Apache[2.4.41], Bootstrap[5.6.14], Country[UNITED STATES][US], Email[info@inlanefreight.com,info@themeansar.com], HTML5, HTTPServer[Ubuntu Linux][Apache/2.4.41 (Ubuntu)], IP[134.209.24.248], JQuery[3.5.1], MetaGenerator[WordPress 5.6.14], Script[text/javascript], Title[Inlanefreight &#8211; Protected by Wordfence], UncommonHeaders[link], WordPress[5.6.14]

wafw00f

wafw00f detects web application firewalls (WAFs):

$ wafw00f inlanefreight.com

                ______
               /      \
              (  W00f! )
               \  ____/
               ,,    __            404 Hack Not Found
           |`-.__   / /                      __     __
           /"  _/  /_/                       \ \   / /
          *===*    /                          \ \_/ /  405 Not Allowed
         /     )__//                           \   /
    /|  /     /---`                        403 Forbidden
    \\/`   \ |                                 / _ \
    `\    /_\\_              502 Bad Gateway  / / \ \  500 Internal Error
      `_____``-`                             /_/   \_\

                        ~ WAFW00F : v2.2.0 ~
        The Web Application Firewall Fingerprinting Toolkit
    
[*] Checking https://inlanefreight.com
[+] The site https://inlanefreight.com is behind Wordfence (Defiant) WAF.
[~] Number of requests: 2

ReconSpider

ReconSpider is used to crawl a site:

$ python3 ReconSpider.py http://inlanefreight.com

Its output is saved in results.json:

$ cat results.json

{
    "emails": [
        "lily.floid@inlanefreight.com",
        "cvs@inlanefreight.com",
        ...
    ],
    "links": [
        "https://www.themeansar.com",
        "https://www.inlanefreight.com/index.php/offices/",
        ...
    ],
    "external_files": [
        "https://www.inlanefreight.com/wp-content/uploads/2020/09/goals.pdf",
        ...
    ],
    "js_files": [
        "https://www.inlanefreight.com/wp-includes/js/jquery/jquery-migrate.min.js?ver=3.3.2",
        ...
    ],
    "form_fields": [],
    "images": [
        "https://www.inlanefreight.com/wp-content/uploads/2021/03/AboutUs_01-1024x810.png",
        ...
    ],
    "videos": [],
    "audio": [],
    "comments": [
        "<!-- #masthead -->",
        ...
    ]
}

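Other Tools

Search engines: advanced search techniques, for example with Google, can turn up a lot of open source intelligence (OSINT), and they are both legal and effective.

Web archives: the Wayback Machine lets us go back to earlier versions of a website and see its content and file structure at the time, which helps us discover potential directories and vulnerabilities.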

Exercises

We are first given the IP address and port 83.136.255.10:58739 and then the virtual host inlanefreight.htb, which we add straight to /etc/hosts.

$ sudo vim /etc/hosts

83.136.255.10   inlanefreight.htb # appending it to the end of the file is enough

Task1

What is the IANA ID of the registrar of the inlanefreight.com domain?

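The first question asks for the IANA ID of inlanefreight.com. I did not read it carefully at first and assumed it was about the virtual host inlanefreight.htb mentioned at the start, so I spent ages with whois and other tools and got nowhere. After rereading the question I realized it was about inlanefreight.com, and a plain whois solved it.

$ whois inlanefreight.com | grep IANA
   Registrar IANA ID: 468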

Task2

What http server software is powering the inlanefreight.htb site on the target system? Respond with the name of the software, not the version, e.g., Apache.

We can simply run whatweb, although there are many other ways to answer this.

$ whatweb http://inlanefreight.htb:58739
http://inlanefreight.htb:58739 [200 OK] Country[FINLAND][FI], HTML5, HTTPServer[nginx/1.26.1], IP[83.136.255.10], Title[inlanefreight], nginx[1.26.1]

The answer is nginx.

Task3

What is the API key in the hidden admin directory that you have discovered on the target system?

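This one had me stuck for a very long time. Asking GPT did not get me far either, so in the end I gave up, dug through all sorts of resources and forums, and solved it with the help of a writeup.

The question asks for the API key in a hidden admin directory. My first idea was gobuster, so I asked GPT how to proceed. Because the question says "directory", GPT told me to scan with gobuster dir, but that found nothing, which frustrated me for a long time. Only after searching did I learn that you have to scan for vhosts first and, once a subdomain turns up, scan it with dir for files.

First, scan for vhosts: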

$ gobuster vhost -u http://inlanefreight.htb:58739 -w /usr/share/seclists/Discovery/DNS/subdomains-top1million-110000.txt -t 200 -o vhosts.txt --append-domain
$ cat vhosts.txt
Found: web1337.inlanefreight.htb:58739 Status: 200 [Size: 104]

Then add the discovered subdomain to /etc/hosts as well:

$ sudo vim /etc/hosts

83.136.255.10   inlanefreight.htb
83.136.255.10   web1337.inlanefreight.htb # again, just append it to the end of the file

Then scan the subdomain we found:

$ gobuster dir -u http://web1337.inlanefreight.htb:58739 -w /usr/share/seclists/Discovery/Web-Content/common.txt -o dirs.txt -t 200
$ cat dirs.txt
/index.html           (Status: 200) [Size: 104]
/robots.txt           (Status: 200) [Size: 99]

Here we find /robots.txt; let's read it with curl:

$ curl http://web1337.inlanefreight.htb:58739/robots.txt
User-agent: *
Allow: /index.html
Allow: /index-2.html
Allow: /index-3.html
Disallow: /admin_h1dd3n
* Connection #0 to host web1337.inlanefreight.htb left intact
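
Disallow entries are often the interesting part of a robots.txt during recon, so it can help to list them automatically. Here is a minimal Python sketch that works on the text of a robots.txt like the one above. The function name `disallowed_paths` is my own, and the parsing is deliberately simple (no per-user-agent grouping).

```python
def disallowed_paths(robots_txt: str) -> list[str]:
    """Return the path of every non-empty Disallow rule in a robots.txt body."""
    paths = []
    for line in robots_txt.splitlines():
        # Drop trailing comments, then split "Field: value" on the first colon.
        line = line.split("#", 1)[0].strip()
        field, sep, value = line.partition(":")
        if sep and field.strip().lower() == "disallow" and value.strip():
            paths.append(value.strip())
    return paths


robots = """\
User-agent: *
Allow: /index.html
Allow: /index-2.html
Allow: /index-3.html
Disallow: /admin_h1dd3n
"""
print(disallowed_paths(robots))  # ['/admin_h1dd3n']
```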

The last entry is Disallow: /admin_h1dd3n. The name alone looks suspicious, so let's request it directly:

$ curl http://web1337.inlanefreight.htb:58739/admin_h1dd3n
<html>
<head><title>301 Moved Permanently</title></head>
<body>
<center><h1>301 Moved Permanently</h1></center>
<hr><center>nginx/1.26.1</center>
</body>
</html>

Nothing useful came back, so I started fiddling with curl: I added -v and -i, and also tried -L to follow the redirect, but none of it helped.

$ curl -v http://web1337.inlanefreight.htb:58739/admin_h1dd3n
* Host web1337.inlanefreight.htb:58739 was resolved.
* IPv6: (none)
* IPv4: 83.136.255.10
*   Trying 83.136.255.10:58739...
* Connected to web1337.inlanefreight.htb (83.136.255.10) port 58739
* using HTTP/1.x
> GET /admin_h1dd3n HTTP/1.1
> Host: web1337.inlanefreight.htb:58739
> User-Agent: curl/8.13.0
> Accept: */*
>
* Request completely sent off
< HTTP/1.1 301 Moved Permanently
< Server: nginx/1.26.1
< Date: Tue, 22 Apr 2025 11:00:48 GMT
< Content-Type: text/html
< Content-Length: 169
< Location: http://web1337.inlanefreight.htb/admin_h1dd3n/
< Connection: keep-alive
<
<html>
<head><title>301 Moved Permanently</title></head>
<body>
<center><h1>301 Moved Permanently</h1></center>
<hr><center>nginx/1.26.1</center>
</body>
</html>
* Connection #0 to host web1337.inlanefreight.htb left intact

$ curl -v -L http://web1337.inlanefreight.htb:58739/admin_h1dd3n
* Host web1337.inlanefreight.htb:58739 was resolved.
* IPv6: (none)
* IPv4: 83.136.255.10
*   Trying 83.136.255.10:58739...
* Connected to web1337.inlanefreight.htb (83.136.255.10) port 58739
* using HTTP/1.x
> GET /admin_h1dd3n HTTP/1.1
> Host: web1337.inlanefreight.htb:58739
> User-Agent: curl/8.13.0
> Accept: */*
>
* Request completely sent off
< HTTP/1.1 301 Moved Permanently
< Server: nginx/1.26.1
< Date: Tue, 22 Apr 2025 11:03:38 GMT
< Content-Type: text/html
< Content-Length: 169
< Location: http://web1337.inlanefreight.htb/admin_h1dd3n/
< Connection: keep-alive
* Ignoring the response-body
* setting size while ignoring
<
* Connection #0 to host web1337.inlanefreight.htb left intact
* Clear auth, redirects to port from 58739 to 80
* Issue another request to this URL: 'http://web1337.inlanefreight.htb/admin_h1dd3n/'
* Host web1337.inlanefreight.htb:80 was resolved.
* IPv6: (none)
* IPv4: 83.136.255.10
*   Trying 83.136.255.10:80...
* connect to 83.136.255.10 port 80 from 172.22.246.215 port 40132 failed: Connection refused
* Failed to connect to web1337.inlanefreight.htb port 80 after 278 ms: Could not connect to server
* closing connection #1
curl: (7) Failed to connect to web1337.inlanefreight.htb port 80 after 278 ms: Could not connect to server

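PS: the Location header here was already telling me to add a trailing /, and the line further down states explicitly that curl was redirected to that URL. -L still failed only because the Location URL drops the port, so curl retried on port 80, where the connection was refused. At the time it never occurred to me that the trailing / was the issue.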

< Location: http://web1337.inlanefreight.htb/admin_h1dd3n/ # note the trailing '/'

* Issue another request to this URL: 'http://web1337.inlanefreight.htb/admin_h1dd3n/' # trailing '/' here too
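
The redirect in the curl output points at the right path but loses the port (58739 becomes 80). One way to handle this in a script is to take the path from the Location header and keep the port of the original request. The Python sketch below does only that URL rewriting; the function name `keep_original_port` is hypothetical, and it makes no request itself.

```python
from urllib.parse import urlsplit, urlunsplit


def keep_original_port(original_url: str, location: str) -> str:
    """Follow a redirect's path but keep the original port if Location dropped it."""
    orig = urlsplit(original_url)
    target = urlsplit(location)
    # Only rewrite when the redirect stays on the same host and names no port.
    if target.hostname == orig.hostname and target.port is None and orig.port:
        target = target._replace(netloc=orig.netloc)
    return urlunsplit(target)


print(keep_original_port(
    "http://web1337.inlanefreight.htb:58739/admin_h1dd3n",
    "http://web1337.inlanefreight.htb/admin_h1dd3n/",
))
```

For the two URLs from the curl output this yields the address with both the port and the trailing /, which is the URL that works in the next step.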

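Then I got frustrated again and dawdled for a good while before going back to the writeup. Reading it very carefully, I noticed that the path needs a trailing /, i.e. admin_h1dd3n/, and that solved it: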

$ curl -i http://web1337.inlanefreight.htb:58739/admin_h1dd3n/
<!DOCTYPE html><html><head><title>web1337 admin</title></head><body><h1>Welcome to web1337 admin site</h1><h2>The admin panel is currently under maintenance, but the API is still accessible with the key e963d863ee0e82ba7080fbf558ca0d3f</h2></body></html>

The answer is e963d863ee0e82ba7080fbf558ca0d3f.

Task4

After crawling the inlanefreight.htb domain on the target system, what is the email address you have found? Respond with the full email, e.g., mail@inlanefreight.htb.

OK, the question mentions an email address, so my first thought was crawling, and I ran ReconSpider right away:

$ python3 ReconSpider.py http://web1337.inlanefreight.htb:58739
$ cat results.json
{
    "emails": [],
    "links": [],
    "external_files": [],
    "js_files": [],
    "form_fields": [],
    "images": [],
    "videos": [],
    "audio": [],
    "comments": []
}

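To my surprise it crawled nothing at all. I even started to suspect my script was broken and tested it against a site it had crawled fine before. The script turned out to be fine; there really was nothing to find.

I was frustrated for a while again, then went to read the writeup.

Ugh. It turns out we need to run another vhost scan against the subdomain we found earlier: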

$ gobuster vhost -u http://web1337.inlanefreight.htb:58739 -w /usr/share/seclists/Discovery/DNS/subdomains-top1million-110000.txt -t 200 -o vhosts.txt --append-domain
$ cat vhosts.txt
Found: dev.web1337.inlanefreight.htb:58739 Status: 200 [Size: 123]

Another hit, and the hostname keeps getting longer.

Add it to /etc/hosts:

83.136.255.10   inlanefreight.htb
83.136.255.10   web1337.inlanefreight.htb
83.136.255.10   dev.web1337.inlanefreight.htb

Then crawl it:

$ python3 ReconSpider.py http://dev.web1337.inlanefreight.htb:58739
$ cat results.json | jq '.emails'
[
  "1337testing@inlanefreight.htb"
]

Problem ~~happily~~ solved.

Task5

What is the API key the inlanefreight.htb developers will be changing too?

This one was actually solved along the way in the previous task; the key is in the comments we crawled:

$ cat results.json | jq '.comments'
[
  "<!-- Remember to change the API key to ba988b835be4aa97d068941dc852ff33 -->"
]

So it pays to skim the crawled data first, then use jq to filter it when looking for a specific piece of data.
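
For that first skim, a quick summary of which sections actually contain something can save time. Here is a minimal Python sketch, assuming results.json is a flat object whose values are lists, as in the ReconSpider output shown earlier. The function name `non_empty_sections` is my own, and the sample is a trimmed, partly made-up version of the crawl result.

```python
import json


def non_empty_sections(results_json: str) -> dict[str, int]:
    """Map each non-empty key in ReconSpider-style results to its item count."""
    data = json.loads(results_json)
    return {key: len(items) for key, items in data.items() if items}


# Trimmed sample; the empty "links" list is an assumption for illustration.
sample = json.dumps({
    "emails": ["1337testing@inlanefreight.htb"],
    "links": [],
    "comments": [
        "<!-- Remember to change the API key to ba988b835be4aa97d068941dc852ff33 -->"
    ],
})
print(non_empty_sections(sample))
```

With the real file, the string passed in would be the contents of results.json read from disk.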