 6. Crawl custom domains
 7. Check if the link is live
 8. Built-in Updater
-9. Build visual tree of link relationship that can be quickly viewed or saved to an image file
+9. Build visual tree of link relationship that can be quickly viewed or saved to a file

 ...(will be updated)

 ### Dependencies
-- Tor
+- Tor (Optional)
 - Python ^3.9
-- Golang 1.19
 - Poetry

 ### Python Dependencies

-(see requirements.txt for more details)
-
-### Golang Dependencies
-- https://github.com/KingAkeem/gotor (This service needs to be run in tandem with TorBot)
+(see pyproject.toml or requirements.txt for more details)

 ## Installation

-### Gotor
-gotor is needed to run this module.
-Note: If the `gotor` directory is empty, you may need to run `git submodule update --init --recursive` to initialize the submodule.
-
-#### Using local Tor service
-* Run the tor service:
-```sh
-sudo service tor start
-```
-* Make sure that your torrc is configured to SOCKS_PORT localhost:9050
-
-* Open a new terminal and start `gotor`, this can be done using `docker` or `go`
-- using go:
-```sh
-cd gotor && go run cmd/main/main.go -server
-```
-
-#### Using tor and gotor docker containers
-- using docker (multi-stage image, builds tor and gotor container):
-```sh
-cd gotor && ./build.sh
-```
-
 ### TorBot
 * TorBot dependencies are managed using `poetry`; you can find the installation commands below:
 ```sh
 poetry install # to install dependencies
-poetry run python run.py -u https://www.example.com --depth 2 -v # example of running command with poetry
-poetry run python run.py -h # for help
-```
-
-### Full Installation
-There is a shell script that will attempt to install both `torbot` and `gotor` as global modules.
-The script `install.sh` will first install the latest version of `torbot` found in `PyPI`,
-then it will attempt to install `gotor` to the `GOBIN` path after making the path globally accessible.
-```sh
-source install.sh # execute script
-```
-
-You can now run
-```sh
-gotor -server
-```
-and crawl using
-```sh
-python -m torbot -u https://www.example.com
+poetry run python torbot/main.py -u https://www.example.com --depth 2 --visualize tree --save json # example of running command with poetry
+poetry run python torbot/main.py -h # for help
 ```

 ### Options
 <pre>
 usage: Gather and analyze data from Tor sites.

 optional arguments:
-  -h, --help            show this help message and exit
-  --version             Show current version of TorBot.
-  --update              Update TorBot to the latest stable version
-  -q, --quiet
   -u URL, --url URL     Specify a website link to crawl
-  -s, --save            Save results in a file
-  -m, --mail            Get e-mail addresses from the crawled sites
-  -p, --phone           Get phone numbers from the crawled sites
   --depth DEPTH         Specify max depth of crawler (default 1)
-  --gather              Gather data for analysis
-  -v, --visualize       Visualizes tree of data gathered.
-  -d, --download        Downloads tree of data gathered.
-  -e EXTENSION, --extension EXTENSION
-                        Specify additional website extensions to the list(.com , .org, .etc)
-  -c, --classify        Classify the webpage using NLP module
-  -cAll, --classifyAll  Classify all the obtained webpages using NLP module
-  -i, --info            Info displays basic info of the scanned site </pre>
+  -h, --help            Show this help message and exit
+  -v                    Displays DEBUG level logging, default is INFO
+  --version             Show current version of TorBot.
+  --update              Update TorBot to the latest stable version
+  -q, --quiet           Prevents display of header and IP address
+  --save FORMAT         Save results in a file. (tree, json)
+  --visualize FORMAT    Visualizes tree of data gathered. (tree, json, table)
+  -i, --info            Info displays basic info of the scanned site
+  --disable-socks5      Executes HTTP requests without using SOCKS5 proxy</pre>

 * NOTE: -u is mandatory for crawling

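For reviewers who want to see how the revised option list fits together, here is a minimal `argparse` sketch that mirrors the new help text. It is an illustration only, not TorBot's actual source: the function names `build_parser` and `parse`, the `prog` value, and the rule that `-u` may be omitted when `--version` or `--update` is given are all assumptions made for this example.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the option list in the updated README."""
    parser = argparse.ArgumentParser(
        prog="torbot",  # assumed program name
        usage="Gather and analyze data from Tor sites.",
    )
    parser.add_argument("-u", "--url", type=str,
                        help="Specify a website link to crawl")
    parser.add_argument("--depth", type=int, default=1,
                        help="Specify max depth of crawler (default 1)")
    parser.add_argument("-v", action="store_true",
                        help="Displays DEBUG level logging, default is INFO")
    parser.add_argument("--version", action="store_true",
                        help="Show current version of TorBot.")
    parser.add_argument("--update", action="store_true",
                        help="Update TorBot to the latest stable version")
    parser.add_argument("-q", "--quiet", action="store_true",
                        help="Prevents display of header and IP address")
    # --save and --visualize now take a format name instead of acting as flags.
    parser.add_argument("--save", metavar="FORMAT", choices=["tree", "json"],
                        help="Save results in a file. (tree, json)")
    parser.add_argument("--visualize", metavar="FORMAT",
                        choices=["tree", "json", "table"],
                        help="Visualizes tree of data gathered. (tree, json, table)")
    parser.add_argument("-i", "--info", action="store_true",
                        help="Info displays basic info of the scanned site")
    parser.add_argument("--disable-socks5", action="store_true",
                        help="Executes HTTP requests without using SOCKS5 proxy")
    return parser


def parse(argv):
    """Parse argv and enforce that -u is present when crawling (assumed rule)."""
    parser = build_parser()
    args = parser.parse_args(argv)
    if not (args.url or args.version or args.update):
        parser.error("-u/--url is mandatory for crawling")
    return args


if __name__ == "__main__":
    # Same flags as the README's example command.
    args = parse(["-u", "https://www.example.com", "--depth", "2",
                  "--visualize", "tree", "--save", "json"])
    print(args.url, args.depth, args.visualize, args.save)
```

Using `choices` makes an unsupported format such as `--save png` fail at parse time, which matches the format lists shown in the help text.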