🍩 Doughnut Reader 🍩

paulgb

82 comments

2 days ago

We built LegendKeeper using Yjs! (It's a mapping app as well, but fantasy). Ended up rolling our own sync server to handle large scale multiplexing, as we have D&D game-masters with 25,000+ documents to manage. (I don't know how they do it, tbh!)

We opt for the central server as a super-peer and use the Yjs differential update system to avoid loading docs in memory for too long. While there are many things about local-first that are a huge pain in the ass, the UX benefits are pretty huge. The DX can be nice too! Getting to focus on product and not on data transit (once you've got a robust sync system) is pretty sweet. The first 4 weeks after launching our Yjs-based system were rough though; lots of bugs that virally replicated between peers. It requires a really paranoid eye for defensive coding; after several years, we have multiple layers of self-healing and validation on the client.

braden-lk

2 days ago

I feel like we’re on the verge of seeing the start of a whole new wave of local-first apps (and "personal software" as mentioned here https://x.com/rauchg/status/1840293374059839726), but we’re really missing Rails-like frameworks that offer a complete package for development. Something that gives you all the tools you need—from syncing, conflict resolution, state management, authorization, background jobs in the context of local-first to deployment—without having to reinvent the wheel.

I built a simple SaaS [1] to get a sense of what's missing and while React Router + a syncing local first database [2] + $5/month Cloudflare gets you pretty far, I still found myself needing to think through a lot of pieces

[1] https://usequickcheck.com/ [2] https://fireproof.storage/

necrodome

2 days ago

Zero [0] is coming, from the people that made Replicache

[0] https://zerosync.dev/

halfcat

2 days ago

This sounds so fantastic. Thanks for sharing. I wonder how well it'll be able to incorporate into existing apps vs making new ones from scratch. I've been using Splitt.app a lot lately while traveling and it drives me nuts that it doesn't have better offline/low data support. I'd like to improve it but haven't dug into the site yet to see what it would take.

raybb

2 days ago

This seems to be a local cache, not a local-first data store?

ForHackernews

2 days ago

Yes, Replicache is more the local-first solution, if you’re going for offline-capable with eventual data sync. But requires some work to integrate your API, and thinking about how to do conflict-free updates with your API.

I haven’t gotten my hands on Zero yet, but the gist I get from those who have is that it gives you the same kind of experience with less work, where the client just operates from the local cache, and I’d assume there will be some way of lazy loading the entire data set into the cache which would give you offline capabilities if desired, but still functions as a traditional web app if the data set is too large.

halfcat

a day ago

I think a lot of people — including myself — would be very interested in a longer write-up of that system if you're ever interested in sharing more :)

jakelazaroff

2 days ago

Oh fascinating! I'd love to read more about what worked well & poorly when building LegendKeeper on top of Yjs.

What features do you wish Yjs had that would make your life easier?

josephg

2 days ago

It's funny how this newfangled "local first" thing is what us old fogies used to call just "applications."

I get that there is networking and integration that a modern application will typically need to do (depending on its core purpose), and syncing state to and from servers is a special concern (especially if conflict management is necessary) that native desktop applications rarely had to do in years past.

But at the end of the day, it sure does feel like we've come full circle. For a long time, every single application was "local first" by default. And now we're writing research papers and doing side POCs (I'm speaking generally, nothing to do with the author or their article) trying to figure out how to implement these things.

gspencley

2 days ago

I feel this way at times too, but the ability to sync state really is key. Traditional native desktop apps rarely had to consider this because users would only ever interact on one device. Now users are moving between a computer and a phone, if not more devices.

If I were to frame it succinctly I would say applications = local-only which is distinct from local-first.

matthiaswh

a day ago

Agreed but even data sync wasn't unheard of before everything became a "web application."

I mean not to get really low level or abstract, but there's a reason that operating systems have the concept of a virtual file system. Where and how your data is persisted is something that can be abstracted from the rest of the system. Add CRDT or another conflict resolution solution to that layer and, I don't want to pretend that it's simple, far from it ... but it's not new tech is all I'm really saying.

Distributed systems were a very hot topic in the 90s. We even went through one very awkward and short-lived fad where we toyed with the idea of having a single application distributed across a network of computers ... not to persist data across multiple systems per se but for computation performance. This entire concept got abandoned when people realized how unnecessarily complicated it was (all cost / very little reward). But it was a thing for a while.

Even RAID distributes data persistence across physical devices and needs to care about data integrity as a result.

It just seems like the longer I'm in the industry, the more I realize that there is very little that is actually new ... despite the fact that we have a large number of enthusiastic young software developers who are looking at these concepts with starry eyes and youthful ignorance because they weren't around decades ago when this stuff was being researched, developed and experimented with.

gspencley

16 hours ago

> Architecturally, Y-Sweet acts as a bus: clients connect to the Y-Sweet server rather than directly to each other. Whenever a client connects or makes changes, it syncs its local document with the Y-Sweet server. Y-Sweet merges the client’s document into its own copy, saves it to S3 and broadcasts updates to other clients. Since CRDTs are guaranteed to eventually converge on the same state, at the end of this process all clients have the same document.

I had thought that the advantage of CRDTs was you do not need a centralized server and that if you do have a central server Operational Transforms are easier. Am I missing why CRDTs are used here?

er4hn

2 days ago

Author here! A few thoughts on this:

- First and (maybe most importantly), WebRTC in browsers requires a central server for signaling. So unless web browsers loosen that constraint, a "true" P2P web app without a central server is unfortunately infeasible.

- My understanding is that with Operational Transforms, the server is "special" — it's responsible for re-ordering the clients' operations to prevent conflicts. I mention a little later in the article that Y-Sweet is just running plain Yjs under the hood. So it is a central server, but it's easily replaceable with any other instance of Y-Sweet; you could fork the code and run your own and it would work just as well.

- Peers will only sync their changes if they're online at the same time. That means that the longer peers go without being online simultaneously, the more their local documents will diverge. From a user experience point of view, that means people will tend to do a lot of work in a silo and then receive a big batch of others' changes all at once — not ideal! Having a "cloud peer" that's always online mitigates that (this is true for any algorithm).

jakelazaroff

2 days ago
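
To make the convergence claim in the quoted passage concrete, here is a minimal sketch of a state-based CRDT synced through a hub. It assumes a last-writer-wins map, which is a much simpler structure than the one Yjs actually uses; the names `LwwMap`, `merge`, `put` and `snapshot` are made up for illustration and are not part of Yjs or Y-Sweet.

```typescript
// A last-writer-wins (LWW) map: a deliberately simple state-based CRDT.
// This is NOT how Yjs works internally; it only illustrates convergence.
type Entry = { value: string; clock: number; clientId: string };
type LwwMap = Map<string, Entry>;

// True if `a` should win over `b`: higher clock wins, client ID breaks ties.
function wins(a: Entry, b: Entry): boolean {
  return a.clock !== b.clock ? a.clock > b.clock : a.clientId > b.clientId;
}

// Merge is commutative, associative and idempotent, so the order in which
// replicas exchange state does not change the final result.
function merge(local: LwwMap, remote: LwwMap): LwwMap {
  const out = new Map<string, Entry>(local);
  remote.forEach((entry, key) => {
    const current = out.get(key);
    if (current === undefined || wins(entry, current)) out.set(key, entry);
  });
  return out;
}

// A local write is just a merge of a one-entry map.
function put(doc: LwwMap, key: string, value: string, clock: number, clientId: string): LwwMap {
  return merge(doc, new Map<string, Entry>([[key, { value, clock, clientId }]]));
}

// Canonical string form, used to compare two replicas.
function snapshot(doc: LwwMap): string {
  const pairs: [string, string][] = [];
  doc.forEach((entry, key) => pairs.push([key, entry.value]));
  pairs.sort((a, b) => a[0].localeCompare(b[0]));
  return JSON.stringify(pairs);
}

// Two clients edit independently...
const alice = put(new Map<string, Entry>(), "day1", "Lisbon", 1, "alice");
const bob = put(put(new Map<string, Entry>(), "day1", "Porto", 2, "bob"), "day2", "Madrid", 1, "bob");

// ...and the hub receives their states in either order.
const hubAliceFirst = merge(merge(new Map<string, Entry>(), alice), bob);
const hubBobFirst = merge(merge(new Map<string, Entry>(), bob), alice);
```

Both hub states end up identical regardless of arrival order, which is the property that lets any server instance stand in as the "cloud peer".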

The ability to run your own server--or even not requiring a bespoke set of operations--isn't a special property of a CRDT; the same thing should be doable with something like ShareJS and its generic tree/JSON structures.

FWIW, though, the author of ShareJS had said some pretty strong things pro-CRDT in the past and even kind of lamenting his work on OT, so...

https://news.ycombinator.com/item?id=24194091

saurik

2 days ago

Hi, that’s me!

OT would work fine to make this collaboratively editable. It’s just not local first. (If that matters to you.)

With an OT based system like sharejs or google docs, the server is the hub, and clients are spokes connecting to that hub. Or to put it another way, the server acts as the central source of truth. If the server goes down, you’ve not only lost the ability to collaboratively edit. You've also usually lost your data too. (You can store a local copy, but sharejs is not designed to be able to restore the server’s data from whatever is cached on the clients).

With Yjs (and similar libraries), the entire data set is usually stored on each peer; the server is just one node you happen to connect to & use to relay messages. Because they’re using Yjs, the author of this travel app could easily set up a parallel webrtc channel with his wife’s computer (in the same house). Any edits made would be broadcast through all available pipes. Then even when they’re on the road and the internet goes down, their devices could still stay in sync. And if the server was somehow wiped, you could spin up another one. The first client that connects would automatically populate the server with all of the data.

But whether these local first characteristics matter to you is another question. They might be a hindrance - for commercial data, centralisation is often desirable. I can think of plenty of use cases where replicating your entire database to your customers’ computers (and replicating any changes they make back!) would be a disaster. It depends on your use case.

josephg

2 days ago
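
The "wiped server" scenario described above can be sketched with a toy replica. This assumes a grow-only set whose merge is plain set union, standing in for a real Yjs document; `Replica` and `sync` are hypothetical names, not Yjs APIs.

```typescript
// A grow-only set (G-Set): merging is set union.
// Hypothetical stand-in for a full document CRDT such as a Yjs doc.
class Replica {
  private items = new Set<string>();

  add(item: string): void {
    this.items.add(item);
  }

  // Merge another replica's state into this one.
  mergeFrom(other: Replica): void {
    other.items.forEach((item) => this.items.add(item));
  }

  toSortedArray(): string[] {
    return Array.from(this.items).sort();
  }
}

// Two-way sync: afterwards both sides hold the union of their states.
function sync(a: Replica, b: Replica): void {
  a.mergeFrom(b);
  b.mergeFrom(a);
}

let server = new Replica();
const laptopDoc = new Replica();
const phoneDoc = new Replica();

laptopDoc.add("Day 1: Lisbon");
sync(laptopDoc, server);
phoneDoc.add("Day 2: Porto");
sync(phoneDoc, server);
sync(laptopDoc, server); // the laptop picks up the phone's edit via the server

// The server loses all of its data and is replaced by an empty instance.
server = new Replica();

// The first client to reconnect repopulates it with the full data set.
sync(laptopDoc, server);

// A brand-new device can now bootstrap from the replacement server.
const freshDevice = new Replica();
sync(freshDevice, server);
```

Nothing about the replacement server is special here; it is simply another peer that happens to be reachable by everyone.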

(One of the authors of Y-Sweet)

You’re right, that is one of the advantages of CRDTs, but it turns out to be hard to realize on the web — aside from RTC (which has its own dragons), you still need a server in the mix.

The other thing an authoritative server solves is persisting the data. Because one server is the authority for a document at a time, you can use S3 or R2 for persistence without worrying about different servers with different versions of the document colliding and erasing each other’s changes.

paulgb

2 days ago

Oh, this is interesting. Can you elaborate (or link to some post) about what sort of issues you run into? I find the concept of CRDTs to be very interesting, but if you still need a centralized server I question the value of them over OTs. I'd love to understand more about if this is a connectivity issue, a CRDT issue, or what.

er4hn

2 days ago

The main issue you run into is that the web is designed around client-server communication; you can do peer-to-peer communication in theory but because of NAT and firewalls, many of those end up not being peer to peer in the end.

Plus, if you want the data to persist so that two people can collaborate even if they are never online at the same time, you need a server anyway.

CRDTs as data structures support peer-to-peer, it’s just that in many use cases that aspect of CRDTs is not needed.

paulgb

2 days ago

Okay, that all makes a lot of sense, thank you.

er4hn

2 days ago

I think that using a bus decomplicates the connectivity story - having a “cloud peer” removes quite a bit of coding and testing from implementing true peer to peer discovery and communications functionality.

Bonus points: you could potentially rip out the bus and replace it with something that involves peer to peer connectivity without changing client data structures.

com

2 days ago

I love how the map automatically updates based on the places typed in the editor. A great visual aid to a text-based workflow.

I got confused by this comment though:

> To determine when to re-render, “reactive” frameworks like Svelte and Solid track property access using Proxies, whereas “immutable” frameworks like React rely on object identity.

I thought React was just as reactive as all the other JS frameworks, and that the state/setState code would look similar.

wonger_

2 days ago

Author here! Maybe I could have worded that better — basically, when you call setState in React, it compares the identities of the new state and old state to determine whether to re-render. Svelte and Solid use “signals” to automatically determine when to re-render, with the drawback that you’re no longer interacting with the “raw” value but a Proxy. But neither way would be able to detect Yjs mutating its internal state; you’re correct that even in React you would need some sort of wrapper code “tricking” it into re-rendering when appropriate.

jakelazaroff

2 days ago

I’ve enjoyed using https://syncedstore.org/docs/ as such wrapper code for Yjs in React. They have a Svelte lib too, haven’t tried it.

Great article, congrats on releasing!

jayunit

2 days ago

I haven't worked frontend in a long time but just from reading that snippet I would assume React does a `obj===obj` (identity) comparison for state update under the hood, while Svelte/Solid are doing a Proxy trap on the accessors like `parent.obj =` would result in an intercept fn being called when you hit that 'obj' access. Proxy being a little more complicated to reason about and setup in totality but a lot more powerful and flexible.

IggleSniggle

2 days ago
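
The two change-detection styles discussed in this subthread can be sketched in plain TypeScript without either framework. The identity check below uses `Object.is`, which React's `useState` is documented to use when deciding whether to skip a re-render; the `observable` helper is a hypothetical, heavily simplified stand-in for what Proxy-based libraries do (real signal systems also track reads, not just writes).

```typescript
// Identity-based change detection: only a *different* value counts as a change.
function shouldRerender<T>(prev: T, next: T): boolean {
  return !Object.is(prev, next);
}

type Trip = { places: string[] };
const state: Trip = { places: ["Lisbon"] };

// Mutating in place keeps the same object identity, so nothing is detected.
state.places.push("Porto");
const mutationDetected = shouldRerender(state, state);

// Building a new object changes the identity, so the change is detected.
const nextState: Trip = { places: [...state.places, "Madrid"] };
const replacementDetected = shouldRerender(state, nextState);

// Proxy-based tracking: intercept property writes on a wrapped object.
function observable<T extends object>(target: T, onChange: (key: string) => void): T {
  return new Proxy(target, {
    set(obj, key, value, receiver) {
      const ok = Reflect.set(obj, key, value, receiver);
      onChange(String(key)); // notify after the write has been applied
      return ok;
    },
  });
}

const changedKeys: string[] = [];
const tracked = observable({ title: "Iberia" }, (key) => changedKeys.push(key));
tracked.title = "Iberia 2025"; // goes through the set trap
```

Neither mechanism sees a library that mutates its own internal state behind the scenes, which is why some wrapper or subscription layer is needed in both cases.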

For those interested in this stack, I have been working on an Obsidian.md plugin called Relay that makes it fully collaborative using yjs and y-sweet.

We also use a hub and spoke model, but we still rely on a central server (pocketbase) for management user flows like authorization and billing.

Obsidian is such a fantastic editor, and it fits so naturally with local-first collaboration.

dtkav

2 days ago

Inability to scale with sustained usage (1+ person*year of data) is the fatal problem with this category in existing approaches. Root of this is primarily the “partial sync problem” - when the dataset outgrows both the memory and compute resources available in the client device (which is unreliable, not under your control to make reliable, and resource constrained - and not everybody has the latest giga device), you have to somehow decide which working set to replicate. If the app structure is a graph and involves relational queries, this is undecidable without actually running the queries! If the app structure is constrained to topic documents, you still have to choose which topics to cache and it still falls over at document sizes easily seen in prolonged solo use, let alone use at an enterprise, a market that is necessary to justify VC investment. All this in an increasingly connected world where the cloud is 10ms away from city/enterprise users (the $$$) and any offline periods (subway etc) resolve quickly, so a degraded offline mode (letting you edit whatever you were last looking at w/o global relational query etc) is often acceptable. Oh, and does your app need AI?

Change my view!

dustingetz

2 days ago

Author here! A few thoughts:

1. That "in existing approaches" qualifier is important — local-first is still very much a nascent paradigm, and there are still a lot of common features that we don't really know how to implement yet (such as permissioning). You might be correct for the moment, but watch this space!

2. I think most apps that would benefit from a local-first architecture do not have the monotonically growing dataset you're describing here. Think word processors, image editors, etc.

3. That said, there are some apps that do have that problem, and local-first probably just isn't the right architecture for them! There are plenty of apps for which client-server is a fundamentally better architecture, and that's okay.

4. People love sorting things into binaries, but it doesn't have to be zero-sum. You can build local-first features (or, if you prefer, offline-first features) into a client-server app, and vice versa. For example, the core experience of Twitter is fundamentally client-server, but the bookmarking feature would benefit from being local-first so people can use it even when they're offline.

jakelazaroff

2 days ago

My claim is that the partial sync problem is intractable unless you can partition your dataset into small topic documents. If this claim is correct, it is not "momentarily correct", it is inevitably correct, i.e., "incapable of being avoided or evaded". If the claim is not correct, I welcome any of the many researchers in this space to correct me!

dustingetz

a day ago

I understand your claim! Mine is that even if it's correct, it's not "the fatal problem in this category", for the reasons I outlined.

jakelazaroff

a day ago

A lot of the newer local first systems, like Triplit (biased because I work on it), support partial replication so only the requested/queried data is sent and subscribed to on the client.

The other issue of relying on just the server to build these highly collaborative apps is you can't wait for a roundtrip to the server for each interaction if you want it to feel fast. Sure you can for those rare cases where your user is on a stable Wifi network, on a fast connection, and near their data; however, a lot of computing is now on mobile where pings are much much higher than 10ms and on top of that when you have two people collaborating from different regions, someone will be too far away to rely on round trips.

Ultimately, you're going to need a client caching component (at least optimistic mutations) so you can either lean into that or try to create a complicated mess where all of your logic needs to be duplicated in the frontend and backend (potentially in different programming languages!).

The best approach IMO (again biased to what Triplit does) is to have the same query engine on both client and server so the exact query you use to get data from your central server can also be used to resolve from the client's cache.

matlin

2 days ago

Just an idea: perhaps all of the end devices should have at least some high reliability storage. This would enable local applications that require high data durability and integrity.

Probably it'd require ECC RAM to prevent in memory bitrot, multiple copies of blocks (or even multiple physical block devices) with strong checksums.

Perhaps this data should somehow "automagically" sync between all locally available devices, again protected with strong checksums at every step.

(This idea requires some refining.)

vardump

2 days ago

We've thought a fair amount about this. Our approach is to use SQLite on-device. Think about it more as a partially replicated db instead of a cache.

Then locally available devices can compare changelogs and sync only the delta.

No need for a checksum, since you can use monotonically increasing version numbers and CRDTs!

tevon

2 days ago
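
The "compare changelogs and sync only the delta" idea above can be sketched as follows. This is an illustration under stated assumptions, not the commenter's implementation: each device numbers its own writes with a monotonically increasing version, and a peer asks only for versions it has not seen. `Device`, `Change` and `VersionVector` are hypothetical names, and resolving concurrent writes to the same key is left out (the comment delegates that to CRDTs).

```typescript
type Change = { origin: string; version: number; key: string; value: string };
// Highest version seen so far, per originating device.
type VersionVector = Record<string, number>;

class Device {
  private log: Change[] = [];
  private seen: VersionVector = {};
  readonly rows = new Map<string, string>();

  constructor(readonly id: string) {}

  // A local write gets this device's next version number.
  write(key: string, value: string): void {
    const version = (this.seen[this.id] ?? 0) + 1;
    this.apply({ origin: this.id, version, key, value });
  }

  vector(): VersionVector {
    return { ...this.seen };
  }

  // Only the changes the peer has not seen yet.
  deltaFor(peer: VersionVector): Change[] {
    return this.log.filter((c) => c.version > (peer[c.origin] ?? 0));
  }

  // Apply changes in order; duplicates and gaps are skipped.
  receive(changes: Change[]): number {
    let applied = 0;
    for (const c of changes) {
      if (c.version === (this.seen[c.origin] ?? 0) + 1) {
        this.apply(c);
        applied++;
      }
    }
    return applied;
  }

  private apply(c: Change): void {
    this.log.push(c);
    this.seen[c.origin] = c.version;
    this.rows.set(c.key, c.value); // conflict resolution is out of scope here
  }
}

const laptop = new Device("laptop");
const phone = new Device("phone");
laptop.write("note:1", "Pack passports");
laptop.write("note:2", "Book train");
phone.write("note:3", "Charge camera");

// Each side sends only what the other is missing.
const toLaptop = phone.deltaFor(laptop.vector());
const toPhone = laptop.deltaFor(phone.vector());
laptop.receive(toLaptop);
phone.receive(toPhone);

// A second round has nothing left to send.
const secondRound = laptop.deltaFor(phone.vector()).length + phone.deltaFor(laptop.vector()).length;
```

Version numbers tell a peer which changes it is missing; they do not by themselves detect corrupted bytes, which is the point the next reply raises.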

> No need for a checksum, since you can use monotonically increasing version numbers and CRDTs!

How does that help against random bit flips?

vardump

18 hours ago

I wonder if you could combine it with BitTorrent to get the distributed nature

hahn-kev

2 days ago

You definitely can! We're using kademlia for sync and discovery, which works quite well

tevon

2 days ago

Extremely curious what kinds of data your app would produce such that it outgrows the memory available on a single client device. I have about 2 person-decades' worth of writing, art, and coding that occupies less than 10 GB (and could probably be made smaller).

ForHackernews

2 days ago

I'm not talking about media content, rather database records. PKM apps host in-process relational queries, the records must fit in working memory and query engine must traverse the working set indexes to return answers "instantly"

dustingetz

2 days ago

Again, I guess I'm struggling to imagine what kind of database your app would need that doesn't fit in a 20 MB sqlite file? What are all these jillions of records? You're talking about full-text indexing?

ForHackernews

2 days ago

PKM apps are trees of strings! It's fast until it's not. Even if you can sync the global dataset to the device storage, the query engine needs data in process memory and still has to traverse it with device levels of compute not cloud compute, "instantly" i.e. without making the UI feel sluggish. If you feel otherwise, use Roam or Tana for a year, even in single-player mode. The entire category is bottlenecked on this scale problem. And now add team support, because you want to sell this to teams and make money, right? Designing for casual, personal-sized datasets is a viable architecture in very few apps. Google Maps is one shining counterpoint, because the content has a natural locality to it – you only need to sync content near where you are geographically!

dustingetz

2 days ago

I think you’re misunderstanding the overall architecture here. Instead of syncing the whole tree of strings, the way you would generally represent a PKM with Yjs is to make each logical document a Yjs document (especially given the assumption that offline periods are short.)

You could still build a server-side search index over those documents, which never needs to be sent to the client.

paulgb

2 days ago

this gives up relational query, i.e. your knowledge graph is no longer a graph. Notion, Roam and Tana all require relational query. What real world app category are you attempting to model that matches document structure? If the domain can be modeled as topic documents and the documents are small, like an individual google doc, sure this set of constraints may be useful. But that does not match PKM!

dustingetz

2 days ago

It doesn’t give it up, it just moves that aspect to the server. The server can still build an index over the documents in whatever way it likes, perform expensive queries, and only send the results to the client.

An example that matches that document structure is Figma; each document is individually small enough to be synced with the client, document metadata is indexed on the server, and queries over documents take place on the server.

paulgb

2 days ago

This (moving relational ops to server) does not match the definition of local first provided in the article and gives up most of the value prop the article enumerates in the conclusion. I agree that Figma is a candidate to be implemented primarily as a document CRDT, same as the google doc example i already provided.

dustingetz

2 days ago

Well, the problem an index over multiple documents solves for is also not present in the application presented by the article. The plan for an individual trip (or even a lifetime of trips for most people) is not going to exceed a size that can be handled and indexed on the client.

paulgb

2 days ago

You're right, the approach totally works for PKM apps with less than, say, 1 person*year of data, which is literally the first sentence I wrote at the top of this thread. (But – there are a lot of architectures that work with casual datasets! Like, store everything in a text file. Or fork bitcoin and run a full node on device, for that matter. What are we trying to accomplish here?)

dustingetz

2 days ago

Yes, but you described it as a fatal problem. My point is:

- this app didn’t need fancy graph querying, so didn’t have to implement it.

- if it did, there’s a natural way to extend this approach to support it.

paulgb

2 days ago

> And now add team support, because you want to sell this to teams and make money, right?

I mean, I don't, personally. I'm writing a couple small apps to scratch my own itches and I might sell them to anyone else who wants an individual copy for personal use.

Remember when you could just buy a copy of a program and use it on your own computer? And it would never get updated to remove functionality or break because some servers were shut down? That's the experience I'm seeking from local-first software.

I think designing for casual, personal-sized data is extremely easy if you give up the idea that every program needs to be some bloated Enterprise-Ready junkware.

ForHackernews

2 days ago

Ok, then this constraint (casual, personal-sized data) should be the headline as the entire architecture is downstream of it

dustingetz

2 days ago

Sure, if you are so committed to the quantified self that you are producing hundreds of megabytes of valuable data every day, then maybe it's impractical for you to keep it all on devices with mere terabytes of local storage and only 32 GB of RAM.

You are Google's dream user. :)

ForHackernews

2 days ago

Ink & Switch is like the Medici family of the Internet era. The Medicis funded the piano, and I&S is funding local-first. I love what they do.

xrd

2 days ago

Who is funding them?

ilrwbwrkhv

2 days ago

Lots of ex-twitter and heroku people. They are really thoughtful and community based and are not in silicon valley physically so they have broken out of the groupthink. I met quite a few at the last Strange Loop and it's a really terrific group.

xrd

2 days ago

Interesting. We definitely need more groups like them. This would be the right way of moving tech forward.

ilrwbwrkhv

a day ago

Would be perfect if somehow it could work without S3. Would be awesome if the internet could just work in p2p mode out of the box, just some JS and HTML and you have 2 computers talking to each other collaborating on a doc without the need of a server (or S3)

tkiolp4

2 days ago

You often still want a server somewhere to relay messages. Imagine I type something in my computer at home and turn the device off before leaving the house. Then on the road I pull up the document on my phone. I should see the latest version of the document on my phone. However, there were no moments where both devices were online at the same time.

For this to work, my home computer needs to upload the changes somewhere my phone can access them. For example, a home server or a dumb box in the cloud.

It’s very difficult to make this work without a server kicking around somewhere. So long as the server is fungible (it can easily be replaced for basically any other server), I don’t really see the problem with keeping a server around to relay messages.

josephg

2 days ago

This is great! I was quite excited to see Ink & Switch’s Embark and now this…

Jake makes creating a local-first multiplayer app seem so simple.

com

2 days ago

Is there some version of local-first that doesn't require a webserver, but does seamlessly sync state to a consumer cloud service like Google Drive? I'd love to write apps that have all the speed and portability of local apps, but the data isn't tied to a specific device. It seems like it would be feasible to have a large JSON blob background synced to a cloud file service after some threshold of accumulated change or time.

xnx

2 days ago
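
The "sync a JSON blob after some threshold of accumulated change or time" idea from the question above could look roughly like this. It is only a sketch: `BlobSyncer` is a hypothetical class, the `upload` callback stands in for whatever cloud file API is used, and timestamps are passed in explicitly so the behavior is deterministic. It overwrites the whole blob and does not handle two devices writing at once.

```typescript
// Hypothetical upload target; a real app would call a cloud file API here.
type Upload = (blob: string) => void;

class BlobSyncer {
  private pendingChanges = 0;
  private lastFlushAt: number;

  constructor(
    private state: Record<string, unknown>,
    private upload: Upload,
    private maxChanges: number, // flush after this many unsynced writes
    private maxAgeMs: number,   // or once unsynced writes are this old
    now: number,
  ) {
    this.lastFlushAt = now;
  }

  // Writes hit local state immediately; the upload happens only on a threshold.
  set(key: string, value: unknown, now: number): boolean {
    this.state[key] = value;
    this.pendingChanges++;
    return this.maybeFlush(now);
  }

  // Also meant to be called from a timer so idle changes eventually sync.
  maybeFlush(now: number): boolean {
    if (this.pendingChanges === 0) return false;
    const tooMany = this.pendingChanges >= this.maxChanges;
    const tooOld = now - this.lastFlushAt >= this.maxAgeMs;
    if (!tooMany && !tooOld) return false;
    this.upload(JSON.stringify(this.state));
    this.pendingChanges = 0;
    this.lastFlushAt = now;
    return true;
  }
}

const uploads: string[] = [];
const syncer = new BlobSyncer({}, (blob) => uploads.push(blob), 3, 60000, 0);

const first = syncer.set("a", 1, 1000);   // below both thresholds
const second = syncer.set("b", 2, 2000);  // still below
const third = syncer.set("c", 3, 3000);   // third change triggers an upload
const fourth = syncer.set("d", 4, 4000);  // counter was reset, no upload yet
const byAge = syncer.maybeFlush(70000);   // the pending change is now old enough
```

Because the app always reads and writes the in-memory state, the UI never waits on the upload.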

I built my habit tool app using Google Drive (app-specific directory). It can sync between devices, though it has a bug (in my code) where it sometimes cannot sync some data, so I have to resync everything from scratch. No server needed.

Currently android only, don't have ios dev subscription https://play.google.com/store/apps/details?id=com.yedev.habi...

yunusefendi52

2 days ago

https://remotestorage.io while it's a specific protocol for storing user data on a compatible server, their library also provides Google Drive integration

chris_pie

2 days ago

There's already been some talk of using consumer cloud providers. One shortcoming however is that to work well you have to duplicate the users' data for each client app, so you'll consume x times more storage space of the user. This is fine for apps with little data, but is impractical for other apps.

felipefar

2 days ago

Not a file storage but https://github.com/git-bug/git-bug push and sync with any git remote. There is a generic data structure you can use to build your conflict-free type.

michaelmure

2 days ago

On an unrelated note, I'm getting local-first case studies every time I leave a wifi space since my cell phone is still suffering from the Verizon outage (Los Angeles area). My conclusion has been I can read stuff like calendar invites (Google), I can save notes on stuff to do in Trello, but I cannot queue up IMs to send in GHC.

er4hn

2 days ago

I kinda doubt GHC stands for GitHub Comments but if it does I just gotta say thank goodness GitHub started saving comments in local cache so when you accidentally pull to refresh you don't delete your whole comment. Idk when they added this but must have been in the past 6 months or so. Same applies if you try to submit a comment and then are offline.

raybb

2 days ago

Awesome stuff. Reminds me of the offline-first movement from ~10 years ago.

I'm currently looking into TinyBase to make working with high latency decentralised services more bearable.

Would be cool if there were a better comparison of the different solutions for storage and sync, with features, good use-cases, etc.

k__

2 days ago

The future is local-first. If you haven't yet learned about it and how to break out of the expensive cloud/serverless cage, this is a good start: https://localfirstweb.dev/

mentalgear

2 days ago

I've been hanging around local-first circles but haven't made the switch to write anything with it yet. Is there a typical stack people recommend to get started? I'm not really sure where to start, especially in terms of backend.

NeutralForest

2 days ago

I’m biased as one of the authors of Y-Sweet, but I think Jake’s choice of Yjs + Y-Sweet is as good as any to start with. We’ve worked a lot at making the localhost environment easy to get started with, and Yjs is the de facto CRDT with the most written about it.

https://github.com/jamsocket/y-sweet

paulgb

2 days ago

I'd like to start with something that's more DB oriented like SQLite both locally and in the browser for example but I understand a document-oriented approach might make more sense.

NeutralForest

2 days ago

RealmDB is sort of the go-to. They were acquired by mongo which could be a negative signal but the team is solid and I haven't seen it result in much degradation.

EDIT: actually... it looks like mongo may have just announced the EOL for server-side component a couple weeks ago... bad timing!

JamesSwift

2 days ago

You'd probably enjoy Triplit then--especially if you're using Typescript.

https://triplit.dev

matlin

2 days ago

I'm excited by these explorations of dynamic components in rich text. Notion has popularized the idea of documents with rich blocks, where the blocks can provide dynamic behavior to traditional documents. And now we're seeing types of inline elements that also provide more structure to rich text. Those location routes seem something that I'd use myself.

felipefar

2 days ago

This is awesome! Since you’re using CRDTs, do you have any plans to make it collaborative? I would find it useful to build an itinerary with multiple people

RobCodeSlayer

2 days ago

It is collaborative! Start a document and send someone else the link :)

jakelazaroff

2 days ago

This is a gorgeous little app, and I'm really happy to see someone working on rich linking of map data and content - it feels like a really under-developed area of the whole modern maps ecosystem

swiftcoder

2 days ago

I've been pretty impressed with https://roadtrippers.com for this purpose. (Not local-first, I guess)

foobarbecue

2 days ago

Great write-up, I've been looking for a solution like this for adding syncing to my local-app!

I love that it's document stored in S3, and it's probably going to be way cheaper than if hosted elsewhere in a database. Can't wait to try it out soon

catchmeifyoucan

2 days ago

That’s a nice font

tobyhinloopen

2 days ago