You may be looking for one of the following things
Search Engine
Encyclopedia (not mobile-friendly!)
Website Explorer (improved)
Similar Website Finder
Server Status
My name is Viktor. I’m a Swedish software engineer and hypertext enjoyer. Marginalia is a website I’ve built. It’s really more like a bunch of websites on a common theme. If you find yourself clicking a link and ending up on a page that looks completely different, that’s just how things are.
Marginalia Search on GitHub
@MarginaliaNu on Twitter
@marginalia@mastodon.social
@ViktorLofgren on YouTube
kontakt@marginalia.nu on Email
Site Index
Name | Date |
---|---|
Weblog/ | |
Miscellaneous/ | |
Release Notes/ | |
Problems/ | |
Recipes/ | |
Server Status Log/ | |
Marginalia Search/ | |
Links/ | |
Weird AI Crap/ | |
Uses | |
Recent Updates
- 2025-06-17 Finding Dead Websites in log
- As some of the work planned for Marginalia Search this year has been progressing a bit faster than anticipated, there was time to implement an unplanned change. This post details the implementation of a system for detecting when servers are online, to avoid serving dead links and improve data quality, and for detecting when websites have significant changes including ownership transfers and parking. Availability detection is useful not just for filtering out dead links in the search results, but for informing the crawler that it should stop trying to reach a dead domain, as well as a host of other things.
- 2025-05-29 Profiling Websites in log
- The most recent change to the search engine is a system that profiles websites based on their rendered DOM. The goal is identifying advertisements, trackers, nuisance popovers, and similar elements. The search engine already tries to do this, but isn’t very good at it because it’s only looking at static code. It turns out to be somewhat difficult to determine what a website that has non-trivial JavaScript will look like based on its source code alone, as this would require us to, among other things, solve the halting problem.
- 2025-05-23 A 2030 morning routine in log
- You wake at 05:30 in the morning, feeling somewhat groggy. Instead of the alarm clock ringing like it normally does, a cheerful hologram appears: “Hi! I’m Kyle, your new alarm clock assistant!” You get dressed as Kyle explains all of the fantastic things he is capable of. You head over to the coffee machine. “Hey there! I’m Evan! Are you ready for AI in your coffee? But first - tell me about yourself!
- 2025-05-13 PDF to Text, a challenging problem in log
- The search engine has recently gained the ability to index the PDF file format. The change will deploy over a few months. Extracting text information from PDFs is a significantly bigger challenge than it might seem. The crux of the problem is that the file format isn’t a text format at all, but a graphical format. It doesn’t have text in the way you might think of it, but more of a mapping of glyphs to coordinates on “paper”.
- 2025-04-22 Debugging A Crawler Stall in log
- Some time ago, I migrated the crawler off the okhttp library, to use Java’s built-in HTTP client. This seemed like a good idea at the time, but has led to a fair number of headaches. Java’s HttpClient has one damning flaw, and that is that it doesn’t support socket timeouts. Its only supported timeout values are time to connect, and time until first byte of the response. This means the client can get stuck on a read call if a server stops responding, potentially for a very long time!
Tags
Name | Count |
---|---|
ai/ | 3 |
bots/ | 4 |
cooking/ | 6 |
memex/ | 2 |
moral-philosophy/ | 7 |
nlnet/ | 20 |
platforms/ | 9 |
programming/ | 25 |
satire/ | 6 |
search-engine/ | 72 |
server/ | 2 |
sleep/ | 2 |
web-design/ | 12 |