Bluesky April 2026 Outage Post-Mortem

(pckt.blog)

78 points | by jcalabro 2 hours ago

10 comments

mwkaufma 3 minutes ago
Tell us more about this buggy "new internal service" that's scraping batch data :P
threecheese 2 hours ago
> What I had missed is that we deployed a new internal service last week that sent less than three GetPostRecord requests per second, but it did sometimes send batches of 15-20 thousand URIs at a time. Typically, we'd probably be doing between 1-50 post lookups per request.
That’ll do it.
- 98codes 2 hours ago
  Ahh, the three relevant numbers in development: 0, 1, and infinity.
- bombcar 2 hours ago
  Zero, one, many, many thousands.
- htx80nerd 23 minutes ago
  Less than ideal, if I had to be frank.
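
For readers wondering what a guard against the quoted batch sizes might look like, here is a minimal sketch. It is not taken from the post-mortem: `MAX_URIS_PER_LOOKUP`, `chunked`, `lookup_posts`, and the `fetch_batch` callback are all hypothetical names, and the cap of 50 is simply the upper end of the "typical" range in the quote.

```python
from typing import Callable, Dict, Iterator, Sequence

# Assumption: 50 is the upper end of the "1-50 post lookups per request"
# range quoted above, not a limit documented by the service.
MAX_URIS_PER_LOOKUP = 50


def chunked(uris: Sequence[str], size: int = MAX_URIS_PER_LOOKUP) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of `uris`, each holding at most `size` items."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(uris), size):
        yield uris[start:start + size]


def lookup_posts(
    uris: Sequence[str],
    fetch_batch: Callable[[Sequence[str]], Dict[str, object]],
) -> Dict[str, object]:
    """Resolve many URIs without ever handing the backend one oversized batch.

    `fetch_batch` is a hypothetical stand-in for whatever call performs the
    real lookup; it receives at most MAX_URIS_PER_LOOKUP URIs per call.
    """
    results: Dict[str, object] = {}
    for batch in chunked(uris):
        results.update(fetch_batch(batch))
    return results
```

With this in place, a caller that passes 20,000 URIs triggers 400 bounded lookups instead of one very large one. Whether the lookups should then run sequentially or with limited parallelism is a separate decision.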
tapoxi 12 minutes ago
I don't really understand this architecture, but I thought Bluesky was distributed like Mastodon? How can it have an outage?
- pfraze 8 minutes ago
  This writeup is useful for backend engineers: https://atproto.com/articles/atproto-for-distsys-engineers
  The simple answer is that atproto works like the web & search engines, where the apps aggregate from the distributed accounts. So the proper analogy here would be like Yahoo going down in 1999.
  - isodev 3 minutes ago
    Google and MSN Search were already available at that time. Also, websites used to publish webrings, and there were IRC and forums to ask people about things.
- isodev 7 minutes ago
  It’s more of a concept of a plan for being distributed. I even went through the trouble of hosting my own PDS and still, I was unable to use the service during the outage.
- Retr0id 8 minutes ago
  Mastodon infra can have outages, too.
  - tapoxi 1 minute ago
    It's just confined to one instance if it goes down, not all of Mastodon.
goekjclo 1 hour ago
> The timing of these log spikes lined up with drops in user-facing traffic, which makes sense. Our data plane heavily uses memcached to keep load off our main Scylla database, and if we're exhausting ports, that's a huge problem.
I expect this is common.
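
One common way to keep a burst of lookups from opening too many cache connections at once is to cap the number of requests in flight. The sketch below illustrates that general technique only; the thread does not say what fix was actually applied. `MAX_IN_FLIGHT`, `fetch_all`, and the `fetch_one` callback are hypothetical names, and the limit of 32 is an arbitrary assumption.

```python
import asyncio
from typing import Awaitable, Callable, Iterable, List, TypeVar

K = TypeVar("K")
V = TypeVar("V")

# Assumption: an arbitrary illustrative cap, not a value from the post-mortem.
MAX_IN_FLIGHT = 32


async def fetch_all(
    keys: Iterable[K],
    fetch_one: Callable[[K], Awaitable[V]],
    limit: int = MAX_IN_FLIGHT,
) -> List[V]:
    """Run `fetch_one` for every key with at most `limit` calls active at once.

    `fetch_one` is a hypothetical stand-in for a single cache lookup. The
    semaphore bounds concurrency, so a very large key list cannot open an
    unbounded number of sockets simultaneously.
    """
    sem = asyncio.Semaphore(limit)

    async def guarded(key: K) -> V:
        async with sem:
            return await fetch_one(key)

    # gather returns results in the same order as the input keys.
    return await asyncio.gather(*(guarded(k) for k in keys))
```

A pooled client with a fixed number of reusable connections achieves a similar bound; the semaphore version is shown only because it is short and self-contained.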
electrondood 29 minutes ago
Great write-up... curious about the RCA. Thanks!
rvz 1 hour ago
Thank you for the post mortem on this outage.
templar_snow 1 hour ago
[flagged]
- lavela 1 hour ago
  Why?
jonstaab 50 minutes ago
nostr never goes down
- jandrese 0 minutes ago
  If nostr went down would people even notice?
- pfraze 48 minutes ago
  All support to other decentralizers, but nothing "never goes down."
  - jonstaab 24 minutes ago
    1000x redundancy makes it vanishingly unlikely. Although I know we're due for a pole shift so all bets are off I suppose.
jmclnx 1 hour ago
Light blue on a dark blue background. That is a new one; I have seen grey text on light grey, but blue on blue?
The article does work in lynx, at least I can read it.
gsibble 7 minutes ago
Did all 3 users notice?