If you save just the HTML of a web page, you lose images & CSS. If you screenshot it, you preserve images and formatting, but lose links and searchability. If you save it to PDF, formatting often gets mangled. Recursive wget grabs too much. Safari webarchives aren't viewable on other platforms. Evernote web clips often format poorly and don't export cleanly. And archiving any whole page will include ads and unrelated junk that pollutes search. Bookmarks rot.
How do you clip web content??
@stevenf It’s a tricky one. At the moment I’ve been bookmarking via Pinboard.in, which also creates an archived copy of the page/article. Besides that and services like archive.is, I’ve found wget to be the easiest way to scrape an entire website. It’ll grab the ads etc. too, but on a computer with no network connection that’s not really a concern. There are probably better methods that people like the Archive Team use, though.
@bkhl Hm, never seen that one before. Will check it out!
@stevenf can you save to WARC format (which the Internet Archive uses)? It looks like wget will save to it, and there are various tools to work with it: https://www.archiveteam.org/index.php?title=The_WARC_Ecosystem
(I haven't used this myself, but I've vaguely wanted to!)
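A rough sketch of a middle ground between "just the HTML" and "recursive wget grabs too much": fetch one page plus the images and CSS it needs, and record everything into a WARC at the same time. The option combination follows the GNU Wget manual's suggestion for downloading a single page with its requisites (`-E -H -k -p`), with `--warc-file` added. The helper name `build_wget_warc_command` is hypothetical. It only builds the argument list; whether this captures a given page cleanly (scripts, lazy-loaded images) is untested here. Wget normally appends `.warc.gz` to the name you give.

```python
import subprocess
import sys


def build_wget_warc_command(url: str, warc_name: str) -> list[str]:
    """Hypothetical helper: argv for grabbing one page and its requisites into a WARC."""
    return [
        "wget",
        "--page-requisites",   # also fetch images, CSS, etc. the page needs
        "--span-hosts",        # requisites often live on other hosts (CDNs)
        "--convert-links",     # rewrite links so the local copy is viewable offline
        "--adjust-extension",  # save HTML with a .html suffix
        f"--warc-file={warc_name}",  # also record raw requests/responses to a WARC
        url,                   # no --recursive, so linked pages are not followed
    ]


if __name__ == "__main__":
    # Usage: python clip.py https://example.com/post post
    subprocess.run(build_wget_warc_command(sys.argv[1], sys.argv[2]), check=True)
```

Leaving out `--recursive` is the design choice that keeps the crawl to a single page; the WARC is a side effect you can keep alongside the browsable local files.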
@npd I’m trying to remember why it didn’t work out for me. Vaguely recall it being harder than expected to turn it back into a viewable page. I should take another look probably, I didn’t spend long with it.
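On getting content back out of a WARC: at its core a WARC file is a sequence of records, each a `WARC/1.x` version line, `Name: value` headers including `Content-Length`, a blank line, then that many bytes of content. A minimal sketch of a reader using only the standard library follows. The function names are hypothetical, and it assumes that `response` records hold a raw HTTP response. It does not undo chunked transfer encoding or `Content-Encoding: gzip` in those responses, and it does not rebuild a browsable page. Dedicated replay tools from the WARC ecosystem handle those cases; this only shows the format is approachable.

```python
import gzip
import io
from typing import BinaryIO, Iterable, Iterator


def iter_warc_records(stream: BinaryIO) -> Iterator[tuple[dict[str, str], bytes]]:
    """Yield (headers, content block) for each record; header names are lowercased."""
    while True:
        line = stream.readline()
        if not line:
            return  # end of file
        if line.strip() == b"":
            continue  # blank separator lines between records
        if not line.startswith(b"WARC/"):
            raise ValueError(f"expected WARC version line, got {line!r}")
        headers: dict[str, str] = {}
        while True:
            line = stream.readline()
            if line in (b"\r\n", b"\n", b""):
                break  # blank line ends the header section
            name, _, value = line.decode("utf-8").partition(":")
            headers[name.strip().lower()] = value.strip()
        yield headers, stream.read(int(headers["content-length"]))


def open_warc(path: str) -> BinaryIO:
    # .warc.gz files are concatenated gzip members; gzip.open reads them in sequence.
    return gzip.open(path, "rb") if path.endswith(".gz") else open(path, "rb")


def iter_warc_bytes(data: bytes) -> Iterator[tuple[dict[str, str], bytes]]:
    """Convenience wrapper for WARC data already in memory."""
    return iter_warc_records(io.BytesIO(data))


def response_bodies(records: Iterable[tuple[dict[str, str], bytes]]) -> dict[str, bytes]:
    """Map target URI -> raw HTTP body for 'response' records (last one wins)."""
    bodies: dict[str, bytes] = {}
    for headers, block in records:
        uri = headers.get("warc-target-uri")
        if headers.get("warc-type") != "response" or uri is None:
            continue
        _, sep, body = block.partition(b"\r\n\r\n")  # split HTTP headers from body
        if sep:
            bodies[uri.strip("<>")] = body  # some writers wrap the URI in <>
    return bodies
```

For example, `response_bodies(iter_warc_records(open_warc("post.warc.gz")))` would give the raw bytes of every captured page, image and stylesheet, keyed by URL.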
@stevenf manually if using a computer (taking notes in markdown) or with a screenshot if on my phone
@stevenf fwiw some browsers (used to?) export stuff in RTF, which can sort of preserve at least some formatting
I wish Safari’s web archives would see broader use, such an under-appreciated feature.