Back in 2021 I did a web crawl of the DtV Library, grabbing a copy of all the stories. I am more than happy to share them, but I'm a very infrequent contributor to this forum, and currently blocked from posting links.
@FfejL If you upload it to mega.nz or something, you should be able to just post the link as text with it broken up a bit, like:
mega . nz/file/9v0kFbAQ#jW7Okc2XNQZYvVu-1-R8kRiVIvexKdSjr6LcdxqxEYA
Also, on a side note, in researching which of those old sites went away I was pleased to find that a female muscle portal that's been around about as long as DtV is still alive and kicking: Muscles of Dee Kay
It even still has the same "hot woman casually bulging her bicep" gif on the front page that's an old favorite of mine!
Damn. Really? That was definitely my first foray down the rabbit hole that is female bodybuilding. Been hooked ever since.
The mega URL got taken down by a copyright notice from awefilms. Not sure what their problem is, since it's content that is still publicly available on the internet, but it is what it is.
A post on another girlpower forum informed me about the effort going on here to recreate Diana's library, and the collection on Mega.
First off, there are 741 files in the collection that are just saves of Diana's 404 redirect, as you can check for yourself (since you, simoncop73, mentioned scripting, I assume you will understand this):
grep -rl ">That page can't be found<" *
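For anyone on a system without grep, here is a rough Python equivalent. It's only a sketch: it assumes every 404 save contains the same marker text the grep looks for, and it also splits the hits by extension, treating .htm/.html/.txt as stories and everything else as media (that split is my assumption about the collection's layout).

```python
from collections import Counter
from pathlib import Path

# Text from Diana's 404 page, the same string the grep command searches for.
MARKER = b">That page can't be found<"
# Assumption: these extensions are stories; everything else counts as media.
STORY_EXTS = {".htm", ".html", ".txt"}


def classify_stub(name: str, data: bytes):
    """Return 'story' or 'media' if data is a saved 404 page, otherwise None."""
    if MARKER not in data:
        return None
    return "story" if Path(name).suffix.lower() in STORY_EXTS else "media"


def count_stubs(root: str) -> Counter:
    """Walk root recursively and count the 404 saves by kind."""
    counts = Counter()
    for path in Path(root).rglob("*"):
        if path.is_file():
            kind = classify_stub(path.name, path.read_bytes())
            if kind:
                counts[kind] += 1
    return counts
```

Running count_stubs on the top folder of the collection should give the story/media split in one pass.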
Of those, I have 70 of the real files saved from Diana's site, and another I downloaded from the Wayback Machine some years back.
In total, I have about 320 files that are not matched at the same file location in your collection (270+ from Diana, the rest from Wayback). I say "location" because the newstory/ directory includes files that are not properly categorized.
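A minimal sketch of that kind of location-based comparison, assuming both collections are unpacked into local folders (the folder and file names here are hypothetical). Because it only compares relative paths, a file sitting in newstory/ counts as unmatched even if an identical copy exists in the proper bookshelf, which is the caveat above.

```python
from pathlib import Path


def relative_paths(root: str) -> set:
    """Every file under root, as a '/'-separated path relative to root."""
    base = Path(root)
    return {p.relative_to(base).as_posix() for p in base.rglob("*") if p.is_file()}


def unmatched(mine: set, theirs: set) -> list:
    """Paths in my collection with no file at the same location in theirs."""
    return sorted(mine - theirs)
```

Usage would be something like unmatched(relative_paths("my_dtv"), relative_paths("mega_dtv")).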
The program I had used for saving Diana's stories is wget. Frankly, it seems to be more capable than the tools you guys were using:
- It timestamps my files, using the Last-Modified time sent by the server to set the modification date of the local saved file. If your tools have that feature, it wasn't turned on.
- It preserves the site's directory structure well.
- Although it has some irritating quirks (which probably led me to miss a few files I assumed I'd downloaded), overall it seems to be more thorough.
Unfortunately, while I used to save everything in Diana's library (just about, maybe not the foreign-language bookshelves), including stories that would later be deleted, my old collections have been lost to drive crashes and thieves. (If anything is recoverable, I don't expect that to happen in the near future.) When I rebuilt my collection on my current computer, I skipped most author bookshelves, limiting myself to authors whose tastes were (roughly) compatible with mine. And my ability to recover already-deleted files was limited to what was preserved by Wayback.
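For tools that don't do this, the timestamping can be approximated afterwards, provided the Last-Modified value was recorded for each file. A rough Python sketch; the function names are mine, not part of wget or any of the other tools mentioned.

```python
import os
from email.utils import parsedate_to_datetime


def http_date_to_epoch(value: str) -> float:
    """Convert an HTTP date such as 'Tue, 01 Mar 2005 00:00:00 GMT' to Unix time."""
    return parsedate_to_datetime(value).timestamp()


def apply_last_modified(path: str, last_modified: str) -> None:
    """Set a saved file's access and modification times from a Last-Modified value."""
    t = http_date_to_epoch(last_modified)
    os.utime(path, (t, t))
```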
Also, the HTML files you saved from the Wayback Machine have been modified by Wayback. I prefer to add the suffix "id_" to the timestamp part of the URL (the 14-digit directory after /web/) to get the site's raw HTML file. I still have Wayback-modified files, but I intend to replace them.
Out of those 741, only 30 were stories; the rest were broken photos containing the 404 page. As I mentioned, the photos were scraped through the Wayback Machine. I checked the stories for the "page is missing" content, but did not bother for the images. Photos were a low priority.
I could've gotten the date from the timestamp, but it was not relevant to me, and the date created was more useful for the scraping.
I would've encouraged you (and anyone) to post a version with your files merged and whatever fixes you consider worthwhile, but that might not be advisable now given the copyright notices. I fixed some of those myself in the meantime.
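For illustration, the "id_" rewrite can be done mechanically. This sketch assumes the usual web.archive.org/web/<timestamp>/<original URL> form and also swaps out other two-letter modifiers such as "im_"; example.com is a placeholder, not Diana's actual domain.

```python
import re

# Timestamp segment of a Wayback URL, e.g. /web/20160315000000/ or
# /web/20160315000000im_/ (the optional letters are Wayback's modifier).
_WAYBACK_TS = re.compile(r"(/web/\d{1,14})(?:[a-z]{2}_)?/")


def raw_wayback_url(url: str) -> str:
    """Rewrite a Wayback Machine URL so it serves the original, unmodified file."""
    return _WAYBACK_TS.sub(r"\1id_/", url, count=1)
```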
The mega URL got taken down by a copyright notice from awefilms. Not sure what their problem is, since it's content that is still publicly available on the internet, but it is what it is.
Now that is absolutely a damn shame considering this post from Steve & Rowena earlier in this very forum thread.
Quoted below for posterity.
Awefilms Apr 01, 2024
I am a relative stranger to GWM but David in our customer service dept. shared your forum post about DTV with me today. I heard a couple weeks ago about the passing of DTV and was sincerely saddened. A lot of time has flown by and things come and go but Rowena and I have to say that if not for us coming across DTV back in 1998 on our 28k dial-up modem running on AOL we would not have had the epiphany that made us start up Awefilms. Once we saw there was a world community of female muscle fans like us, no matter how small in comparison to other fetishes, we would have never started our company. It is impossible to measure the enormous impact that site had on us over the years. RIP Diana. Steve & Rowena Scibelli
To be fair, looking again, it apparently came from "awefilms@gmail.com". Not sure if it's really them or someone impersonating them since it's a gmail account and not their official domain.
Thanks for the tip, chipperpip! Here's the ZIP of everything in DtV's library circa 2021:
mega .nz/file/8KwDSayZ#fSK5la4nFvie5xN1czE8r8q3DfSz4cz4TEZ3Iuorkko
You got a strike too?
Yeah, probably someone trolling.
I can repost those if anyone wants, but I'm currently in the process of merging FfejL's and simoncop73's versions, since although FfejL's was mostly redundant, it does have a fair number of text files with fewer encoding issues that I'm swapping in.
I was working on figuring out what files simoncop73 missed on Wayback, also which files are redundant. However, now that I know others like FfejL also ripped the site, the urgency is gone. Whatever temporary roadblocks may exist for the moment, I can rest easy that the great bulk of the library will be preserved.
Unfortunately, I wasn't able to download FfejL's zip before it got taken down, but that's another reason to hold off until I see what he has. However, if these zips get too far over 600MB, maybe you guys could consider splitting them? Unlike most sites, Mega downloads into the browser process, and only then is the file decrypted and written to disk. The first night, I didn't have enough free memory to finish saving it to my drive. I ended up doing it the next day with my computer freshly started.
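In case it helps, a minimal Python sketch of splitting a big zip into parts that can be rejoined by straight concatenation before unzipping (join_parts below, or cat on Linux/macOS, or copy /b on Windows). The 500 MB part size and the .part001 naming are arbitrary choices; it copies 1 MiB at a time, so it doesn't need much memory itself.

```python
import shutil
from pathlib import Path

PART_SIZE = 500 * 1024 * 1024  # arbitrary; keeps each part well under ~600 MB
CHUNK = 1024 * 1024  # copy 1 MiB at a time so memory use stays small


def part_ranges(total: int, part_size: int = PART_SIZE) -> list:
    """(offset, length) pairs covering `total` bytes in pieces of at most part_size."""
    return [(off, min(part_size, total - off)) for off in range(0, total, part_size)]


def split_file(src: str, part_size: int = PART_SIZE) -> list:
    """Write src.part001, src.part002, ... next to src and return their paths."""
    src_path = Path(src)
    parts = []
    with src_path.open("rb") as f:
        ranges = part_ranges(src_path.stat().st_size, part_size)
        for i, (_, length) in enumerate(ranges, start=1):
            part = src_path.with_name(f"{src_path.name}.part{i:03d}")
            with part.open("wb") as out:
                remaining = length
                while remaining:
                    block = f.read(min(CHUNK, remaining))
                    if not block:  # file shrank while we were reading it
                        break
                    out.write(block)
                    remaining -= len(block)
            parts.append(part)
    return parts


def join_parts(parts: list, dest: str) -> None:
    """Concatenate the parts, in order, back into a single file."""
    with open(dest, "wb") as out:
        for part in parts:
            with open(part, "rb") as f:
                shutil.copyfileobj(f, out)
```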
chipperpip: From the sound of it, you are using the simon/mkr archive as your base, and filling in with FfejL's. Is that simply because it was first, or because it is better/more complete? I've already explained that while I'm grateful that the stories are being preserved, for my preferences, that archive has limitations: error-message files, redundant files, unorganized files, Wayback-modded files, no timestamps. Of course, I don't know how FfejL's archive compares.
@BereavedPaul
Because it was the most complete, and the one I started with. Whatever issues you had with the simoncop73/mkr version, FfejL's would have mostly shared them, as shown by the hash comparison I did to remove exact duplicates between the sets, which left only about 1,300 unique files out of his original 14,500.
I'm not really concerned about things like the Wayback Machine metadata and timestamps, as long as the content itself is viewable. I actually prefer to leave in stubs of things that couldn't be retrieved, since those make it more obvious what still needs to be recovered in the future (possibly from private collections, since I think the Wayback Machine will be mostly tapped out once I'm done).
I've finished restoring the HTML/Multimedia Stories folder as much as I was able, partially from my own decade-and-a-half-old downloads (it was one of my main concerns, since there's some pretty classic female muscle artwork in there).
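For anyone wanting to repeat that kind of comparison, a sketch in Python. I don't know which hash was actually used; SHA-256 here is an assumption, and any strong hash works for exact-duplicate detection. The folder names in the usage note are hypothetical.

```python
import hashlib
from pathlib import Path


def file_sha256(path: Path) -> str:
    """Hash a file's contents in 1 MiB blocks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for block in iter(lambda: f.read(1 << 20), b""):
            h.update(block)
    return h.hexdigest()


def index_by_hash(root: str) -> dict:
    """Map content hash -> relative paths of every file under root."""
    base = Path(root)
    index = {}
    for p in sorted(base.rglob("*")):
        if p.is_file():
            index.setdefault(file_sha256(p), []).append(p.relative_to(base).as_posix())
    return index


def unique_to(extra: dict, base: dict) -> dict:
    """Entries of extra whose content does not appear anywhere in base."""
    return {h: paths for h, paths in extra.items() if h not in base}
```

unique_to(index_by_hash("ffejl"), index_by_hash("simon_mkr")) would list the contents only the first set has, regardless of where the files sit in either tree.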
I also used wbm-dl to grab my own dump of the library, which seemed to get some things the previous attempts missed.
I might give these two downloaders a try to see if there's much difference, although they both seem like more of a pain to set up.
Actually, mentioning the HTML section gave me a good idea:
Here's a download for my restored version of just the "html/multimedia" section of illustrated stories:
https://www.mediafire.com/file/5rcy2qgaa3bq6v...
The password is "dianathevalkyrie", index.htm is the newer index page, index0.html is the older (and arguably more organized) one from around 2016.
If anyone has any of the missing pictures/gifs, let me know and I can add them in before uploading the full version. There are a couple of gifs missing for the "Big Betty" story linked from html/jpeg/various.htm, for instance.
@chipperpip
I'm not really concerned about things like the Wayback Machine metadata
Besides simply preferring the "real" original file, I find it a little creepy to have files on my computer that want to call Wayback scripts and other files when I view them. Of course, I would usually look at them in offline mode anyway, and Diana also had some absolute links pointing to her site. But if the latter concerns anyone, it's relatively easy to write a script to change them to relative links. As I noted, it's possible to get the raw file by using the "id_" suffix.
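As a rough sketch of that kind of script, in Python: SITE is a hypothetical placeholder for the library's real host name, and the regex only handles quoted href/src attributes, so unquoted attributes and links built inside scripts are left alone.

```python
import posixpath
import re

# Hypothetical placeholder: substitute the library's real host name.
SITE = "www.example.com"
_ABS_LINK = re.compile(
    r'(href|src)=(["\'])https?://' + re.escape(SITE) + r'(/[^"\']*)\2',
    re.IGNORECASE,
)


def relativize(html: str, page_path: str) -> str:
    """Rewrite absolute links to SITE relative to page_path (e.g. 'stories/a.htm')."""
    page_dir = posixpath.dirname("/" + page_path)

    def repl(m):
        rel = posixpath.relpath(m.group(3), page_dir)
        return f"{m.group(1)}={m.group(2)}{rel}{m.group(2)}"

    return _ABS_LINK.sub(repl, html)
```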
and timestamps, as long as the content itself is viewable.
I do care, but I have to keep things in perspective. Much better to have the actual stories than a list of filenames with dates!
I actually prefer to leave in stubs of things that couldn't be retrieved, since those make it more obvious what still needs to be recovered in the future
Besides replacing those files with a list, the stubs could be zeroed out, or replaced with much briefer content, e.g. "404".
I also used wbm-dl to grab my own dump of the library, which seemed to get some things the previous attempts missed.
It's true that there are files at Wayback that simoncop73 missed. I had saved a CDX search and was using it to compare Wayback's holdings against my collection and the Mega zip, but like I said, I've put that on hold until I see what all is in the consolidated collection.
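For anyone else wanting to try this, a minimal Python sketch of building such a CDX query and checking its results against a local folder. The endpoint and the url, matchType, output, fl and collapse parameters belong to the Wayback CDX server; mapping a URL's path straight onto a local relative path is an assumption about how the collection is laid out. It only builds the URL and parses the JSON, so fetch the results however you like.

```python
import json
from urllib.parse import urlencode, urlsplit

CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def cdx_query_url(prefix: str) -> str:
    """CDX query for every capture under prefix, collapsed on the digest field."""
    params = {
        "url": prefix,
        "matchType": "prefix",
        "output": "json",
        "fl": "timestamp,original,digest,statuscode",
        "collapse": "digest",
    }
    return f"{CDX_ENDPOINT}?{urlencode(params)}"


def parse_cdx_json(text: str) -> list:
    """Turn CDX JSON output (a header row, then data rows) into a list of dicts."""
    if not text.strip():
        return []
    rows = json.loads(text)
    if not rows:
        return []
    header, *records = rows
    return [dict(zip(header, row)) for row in records]


def missing_locally(records: list, local_paths: set) -> list:
    """Site paths of successful (200) captures with no file at that path locally."""
    wanted = {
        urlsplit(r["original"]).path.lstrip("/")
        for r in records
        if r.get("statuscode") == "200"
    }
    return sorted(wanted - set(local_paths))
```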
I also used wbm-dl to grab my own dump of the library,
I just looked up wbm-dl, and its documentation declares, "The files downloaded are the original ones not the Wayback Archive rewritten version," so hooray for that. It also talks about timestamps, but only seems to be talking about the WM save timestamps, not changing the time of the downloaded files.
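If the capture time is all you have, it could at least be applied to the file as a rough date. A sketch, assuming the standard 14-digit YYYYMMDDhhmmss Wayback timestamp (which is in UTC). Keep in mind that a capture time is only an upper bound on when the file last changed, not its real Last-Modified date.

```python
import os
from datetime import datetime, timezone


def wayback_ts_to_epoch(ts: str) -> float:
    """Convert a 14-digit Wayback timestamp (YYYYMMDDhhmmss, UTC) to Unix time."""
    dt = datetime.strptime(ts, "%Y%m%d%H%M%S").replace(tzinfo=timezone.utc)
    return dt.timestamp()


def stamp_with_capture_time(path: str, ts: str) -> None:
    """Set a downloaded file's access/modification times to its capture time."""
    t = wayback_ts_to_epoch(ts)
    os.utime(path, (t, t))
```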
@yotv
You can just copy and paste the text into Notepad and save it as a .txt file, and you don't need to worry about anything
That doesn't scale. Though I actually wrote a script a few years ago for stripping out the Wayback-added code, before I discovered the "id_" suffix.
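A script like that might look roughly like this sketch. The comment markers are an assumption based on what Wayback typically injects into saved pages, so check a few of your own files first. Scripts Wayback adds to the page head are not handled here, which is one more reason to prefer "id_" downloads.

```python
import re

# Assumption: the injected toolbar is wrapped in these comment markers.
_TOOLBAR = re.compile(
    r"<!-- BEGIN WAYBACK TOOLBAR INSERT -->.*?<!-- END WAYBACK TOOLBAR INSERT -->",
    re.DOTALL,
)
# Rewritten links look like [https://web.archive.org]/web/<timestamp>[xx_]/<original URL>.
_REWRITTEN = re.compile(r"(?:https?://web\.archive\.org)?/web/\d{1,14}(?:[a-z]{2}_)?/")


def strip_wayback(html: str) -> str:
    """Remove the injected toolbar and undo Wayback's link-prefix rewriting."""
    html = _TOOLBAR.sub("", html)
    return _REWRITTEN.sub("", html)
```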
@chipperpip
It's been a few days, and there hasn't been an updated zip or any news. Are you worried that the DMCA jackass is still lurking? Are you still in the process of trying out different Wayback rippers? Maybe you have not had the free time to sift through all the different sources, to figure out what's new and which version of files to use? Or maybe I misunderstood, and you're simply informing other people how they can do their own Wayback scrapes? If you could spare a few words to update us, I'd appreciate it.
I haven't had as much time as I'd like to go back to it after finishing the HTML folder; I'm hoping to finish the whole thing next week.
Speaking of which, I reuploaded my zip of the HTML/Multimedia stories folder in my previous post above. We'll see how that host works, or if the troll is still active.
EDIT: Just a note, the troll is still active. They're claiming to be this company, "Internet Privacy Limited" now: https://find-and-update.company-information.s...
But they also gave the nonexistent email address thevalkyrie@comic.com for their contact info, which kind of confirms it's just someone fooling around, and as always DMCA claims don't seem to need much verification. I'll worry about the hosting later once I finish putting the package together.
although FfejL's was mostly redundant, it does have a fair number of text files with fewer encoding issues that I'm swapping in.
As part of seeing what can be recovered from the Wayback Machine, I've been looking through the results of a CDX search of the /stories directory. I had set this search to only show one result per "digest," which is supposedly a base-32 SHA-1 hash of the file. Most stories only had one digest, as expected. However, I noticed that there was a pattern of some old stories getting a new digest version around 2016.
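To check a local copy against a CDX digest, here is a sketch of how that value is computed, assuming (as the post says) it's the base-32 encoded SHA-1 of the archived file's contents. If a local file's digest matches one in the CDX results, it should be the same version byte for byte.

```python
import base64
import hashlib


def wayback_digest(data: bytes) -> str:
    """Base-32 encoded SHA-1 of the bytes, the form CDX 'digest' values take."""
    return base64.b32encode(hashlib.sha1(data).digest()).decode("ascii")
```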
I picked out one of these stories and compared the old Internet Archive snapshot to the one with the new digest. Both files had the same number of bytes. But in the original, the author had started each paragraph with a tab; this had been converted to a space. Also, an apparently non-UTF-8 character got converted to a '#'.
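A quick way to find every such change between two same-size versions is a byte-by-byte comparison. A minimal sketch:

```python
from collections import Counter


def byte_differences(old: bytes, new: bytes, limit=20) -> list:
    """(offset, old_byte, new_byte) for each position where two same-size files differ."""
    if len(old) != len(new):
        raise ValueError("lengths differ, so offsets would not line up")
    diffs = [(i, a, b) for i, (a, b) in enumerate(zip(old, new)) if a != b]
    return diffs[:limit]  # limit=None returns every difference


def substitution_summary(old: bytes, new: bytes) -> Counter:
    """Count each (old_byte, new_byte) substitution, e.g. (9, 32) for tab -> space."""
    return Counter((a, b) for _, a, b in byte_differences(old, new, limit=None))
```

substitution_summary gives a count per (old byte, new byte) pair, so a tab-to-space conversion shows up as (9, 32) and a character replaced by '#' shows up as (that byte, 35).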
I don't know if this is related to the "encoding issues" you were talking about, and which version you would regard as having "fewer" issues. However, I rebuilt my current collection after the time of the change (seemingly late 2015), and it has the "new" versions. FfejL said he did his site-rip in 2021, so presumably he also downloaded changed files.
The story's directory had other files that were "updated," so I spidered their server headers and checked their "x-orig-last-modified" line. The changed versions did not have a modified date in 2015 or 2016. Some of them were dated to the same day as the original file, but some had a date of four days earlier than the original!
Also, as far as timestamps in general, many files used to report a timestamp of exactly midnight on a certain date, but in later years reported a time of midnight plus one second. I've been talking up timestamps, but it appears Diana's timestamps are more complicated than I realized.
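To tally the midnight vs. midnight-plus-one-second pattern across many files, something like this sketch could be run over the collected headers. The header name is the one reported above; judging "midnight" in UTC is an assumption, since Diana's server may have used local time, so the timezone is a parameter.

```python
from datetime import datetime, timezone, tzinfo
from email.utils import parsedate_to_datetime

HEADER = "x-orig-last-modified"  # header name as given in the post above


def orig_last_modified(headers: dict):
    """Return the header's date as a datetime, or None if it is absent."""
    for name, value in headers.items():
        if name.lower() == HEADER:
            return parsedate_to_datetime(value)
    return None


def midnight_kind(dt: datetime, tz: tzinfo = timezone.utc) -> str:
    """Classify dt (viewed in tz) as 'midnight', 'midnight+1s', or 'other'."""
    local = dt.astimezone(tz)
    if (local.hour, local.minute, local.microsecond) != (0, 0, 0):
        return "other"
    return {0: "midnight", 1: "midnight+1s"}.get(local.second, "other")
```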
Back in 2021 I did a web crawl of the DtV Library, grabbing a copy of all the stories. I am more than happy to share them, but I'm a very infrequent contributor to this forum, and currently blocked from posting links.
Perhaps those of us who both care about restoring this library and have the tech knowledge to do so might be able to collaborate?