Document disk space requirements #186
Comments
At 884K indexed, my Postgres data/base directory is at 6.7G.
One important thing to note is that the size is heavily influenced by the amount of file data you store.
Currently on the FAQ page we have:
I agree better documentation on this would be good, but at the moment things are changing at a rapid pace in ways that will affect disk space usage, and we're just getting to the stage where people have had it running long enough to get some better numbers about the current implementation. The next thing will be rule-based workflows that can auto-delete and do other things that will affect this, so maybe we should come back to this in a few months and aim to write some better docs when things are more stable?
That is correct, but an estimate with the default settings would suffice to start with, and it would give you an idea of what hardware you would need to at least test and play around with it.
This puts the average torrent size at 6.7 × 1024 × 1024 ÷ 228,000 ≈ 30 KiB vs my estimated 18 KiB.
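If it helps gather more samples, a single query along these lines should produce comparable numbers from any instance. This is only a rough sketch, assuming the database is named 'bitmagnet' (as in the psql session further down) and that you run it while connected to that database:

-- average on-disk cost per indexed torrent: whole database size divided by torrent count
SELECT pg_size_pretty(pg_database_size('bitmagnet'))            AS db_size,
       count(*)                                                  AS torrents,
       pg_size_pretty(pg_database_size('bitmagnet') / count(*))  AS avg_per_torrent
FROM torrents;

Note this divides the size of the whole database (including file and TMDB metadata) by the torrent count, so it measures the total cost per torrent rather than the size of the torrents table alone.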
Yes, hence the need to use averages, which are useful for estimation. People with a higher number of indexed torrents should have averages closer to reality / less biased. It would be interesting to compare between databases with about the same number of torrents.
That was my guess, hence #187
I did not see this section, it was right under my nose /facepalm. However, this puts the average torrent size at 5.2 KiB... why so much difference between our 3 measurements? I think more samples are needed.
I've checked just now and am on 67GB for 13.5 million torrents. A couple of things to bear in mind:
For me:
Meaning:
Though I feel that disk IO throughput is more of a limiting factor than disk size when you use HDDs. I had a much bigger DB and was struggling to keep up with writes.

bitmagnet=# select pg_size_pretty(pg_database_size('bitmagnet'));
pg_size_pretty
----------------
4145 MB
(1 row)
bitmagnet=# select count(*) from torrents;
count
--------
528570
(1 row)
bitmagnet=# select count(*) from torrent_files;
count
---------
7086417
(1 row)
I'm at 78 GB for 7,059,136 torrents.
I did not think about that; there is also some database space used for TMDB data.
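Since file metadata and TMDB/content data both contribute on top of the torrent rows themselves, a per-table breakdown makes it easier to see where the space actually goes before comparing instances. A minimal sketch using standard PostgreSQL statistics views, run while connected to the bitmagnet database:

-- ten largest tables, including their indexes and TOAST data
SELECT relname,
       pg_size_pretty(pg_total_relation_size(relid)) AS total_size
FROM pg_stat_user_tables
ORDER BY pg_total_relation_size(relid) DESC
LIMIT 10;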
I was relying on netdata's PostgreSQL db size monitoring, but it's consistent with the results I get from pg_database_size().

Thanks everyone for the metrics, I will start a table below and update it every time someone posts their db stats. After a while it could be added to the documentation, hopefully.
To add another data point, I have 9,228,000 torrents with a total of 291,283,000 files, stored in 145 GB, using the config option DHT_CRAWLER_SAVE_FILES_THRESHOLD=500000 (to ensure file information is stored even on excessively large torrents; the default cutoff is to store at most 100 files per torrent). This means I have 31 files per torrent on average, over twice what kde99 got above. The largest torrent in my database contains 10870 files. 4.5% of torrents exceed the default DHT_CRAWLER_SAVE_FILES_THRESHOLD of 100 files. The average size per torrent is correspondingly a bit larger at 16 KB/torrent, or 535 bytes per file.

I agree that disk throughput is a much bigger factor. If you are using cheap consumer SSDs you also really feel the wear Bitmagnet puts on the disk. If I'm interpreting my disk stats correctly, Bitmagnet has written a total of about 180 TB in service of creating this 145 GB database.
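For anyone who wants to see how their own instance compares on files per torrent and on how many torrents exceed the default 100-file cutoff, a query roughly like this should do it. Sketch only: I'm assuming torrent_files references its parent torrent via an info_hash column, so adjust the join key to whatever your schema actually uses:

-- files-per-torrent statistics and share of torrents above the default 100-file cutoff
WITH per_torrent AS (
    SELECT info_hash, count(*) AS n_files
    FROM torrent_files
    GROUP BY info_hash
)
SELECT round(avg(n_files), 1)                                             AS avg_files_per_torrent,
       max(n_files)                                                       AS max_files_in_one_torrent,
       round(count(*) FILTER (WHERE n_files > 100) * 100.0 / count(*), 1) AS pct_over_100_files
FROM per_torrent;

Keep in mind this measures what is stored: torrents saved with fewer files than they actually contain (because of the cutoff) will pull these numbers down compared to the true file counts on the DHT.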
I have 985,847 torrents with a total of 29,345,871 files, stored in 15,736,869,347 bytes (≈14.7 GiB), using the config option
@danpalmer output.txt.gz, |
Currently sitting at 54M files in 12M torrents. The graph looks approximately like this (not a perfect graph, but you get the gist). Just beyond your graph there are some interesting patterns, with some sizes being more common. But generally it looks like a normal power-law distribution that has spikes at round or notable numbers (e.g. 1000 is 10 times more common than 998, and 5000 is 10 times more common than 4971). The biggest torrent in my database has ~160k files. Typical ultra-high-filecount (>10k files) torrents you might encounter are:
None of them seem like abuses or trolling. As a method of shipping software I find it a bit weird, but I generally prefer high-filecount torrents over torrents that contain a simple zip file, or worse: a collection of zip files. Having the files right there means I don't have to unpack anything, I can access parts of the torrent while the rest is still downloading, and I can actually search for filenames using bitmagnet. (@nodiscc I believe you copy-pasted the wrong average per torrent for my entry in your table above, and the wrong db size for orzFly.)
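If anyone wants to reproduce this kind of distribution (and see the spikes at round file counts) on their own instance, a grouped count is enough. Same caveat as the earlier sketch: I'm assuming an info_hash column on torrent_files, and this scans the whole table, so it can take a while on a large database:

-- how many torrents have exactly N files, to plot the file-count distribution
SELECT n_files, count(*) AS torrents
FROM (
    SELECT info_hash, count(*) AS n_files
    FROM torrent_files
    GROUP BY info_hash
) AS per_torrent
GROUP BY n_files
ORDER BY n_files;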
Is your feature request related to a problem? Please describe
The documentation should mention the expected disk space requirements when the DHT crawler is enabled, relative to the number of torrents indexed, since this is by far the most demanding system requirement and it is dictated by factors outside the user's control (the total size of the BitTorrent DHT... is there an estimate for this somewhere?).
Describe the solution you'd like
Document on https://bitmagnet.io/faq.html a few examples of DB sizes relative to the number of torrents, for example:
The value of 2.5GB for 143k torrents is from a measurement on my test instance. This puts the average size of a torrent at ~18KB. It would be interesting to see the numbers from instances with a lower/higher number of indexed torrents, and use that as an estimate.
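To make the request concrete, the FAQ entry could start from a simple linear extrapolation of that figure. This is only a rough sketch: the ~18 KB/torrent average comes from my single measurement, and other instances in this thread land anywhere from roughly 5 KiB to 30 KiB per torrent, so treat the results as order-of-magnitude numbers:

-- projected database size at various torrent counts, assuming ~18 KB per torrent
SELECT n AS torrents,
       pg_size_pretty(n::bigint * 18 * 1024) AS projected_db_size
FROM unnest(ARRAY[143000, 1000000, 10000000]) AS t(n);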
I will report a separate issue about a potential setting to hard-limit the DB size.
Describe alternatives you've considered
Documenting the expected disk space requirements related to total run time, since the number of indexed torrents depends on the total time spent crawling.
Additional context
Somewhat related to #70, which would help keep the database size in check.