Added simple dockerfile and compose #59

Open: wants to merge 39 commits into base: main

39 commits
50f764b
Config by environment vars (#5)
devinivy May 11, 2023
71c2ee0
Setup simple migration system (#7)
devinivy May 11, 2023
9eb7186
More convenient access to ops by record type (#6)
devinivy May 11, 2023
2a13e9e
tidy
dholms May 11, 2023
84420cc
add auth
dholms May 11, 2023
42eebc3
readme tweaks
dholms May 11, 2023
b6a588c
update app.bsky.feed.getFeedSkeleton link
May 11, 2023
5bc3a5f
fix small typos
May 11, 2023
ff09bc5
Merge pull request #8 from bluesky-social/tweaks
emilyliu7321 May 11, 2023
bca915f
live tail subscription
dholms May 11, 2023
eeeb032
rm log
dholms May 11, 2023
cacc046
comment out auth check & log firehose output
dholms May 11, 2023
fac68c3
tweak error type
dholms May 11, 2023
f9ccf11
fix dependencies (#9)
aliceisjustplaying May 11, 2023
050047f
add did-web example
dholms May 11, 2023
17ad1e9
add .env.example, additional env var (#10)
aliceisjustplaying May 12, 2023
ceae744
Update README with instructions for the default URL (#11)
aliceisjustplaying May 12, 2023
15ed3be
Update .gitignore (#12)
codegod100 May 12, 2023
c936d5a
moar gitignore
dholms May 12, 2023
bc75338
update lexicons & docs
dholms May 12, 2023
2bff86c
add at:// prefix
dholms May 15, 2023
d93c993
Use limit parameter in post query (#18)
simonft May 15, 2023
531aab4
Fix feed generation cursor split (#16)
Cloudhunter May 15, 2023
285ef14
add build script
dholms May 16, 2023
3606414
Add describeFeedGenerator route + multiple feeds (#19)
dholms May 19, 2023
745023c
Improve .env (#20)
dholms May 19, 2023
2f620bd
Publish script (#21)
dholms May 19, 2023
8d5c1b1
remove publishMany script
dholms May 19, 2023
9395b18
Add temp fix for blobref validation issue (#23)
devinivy May 19, 2023
1b7d6e3
Say what PDS stands for (#29)
joesondow May 24, 2023
51ca4d0
Fix some typos and clean up README. (#25)
alimony May 24, 2023
e849ac7
add listenhost option (#28)
benharri May 24, 2023
cbdac34
Docs and helpers (#38)
dholms May 31, 2023
f02b113
Update tsconfig.json (#37)
madrobby May 31, 2023
e70ea59
add link to community template
dholms Jun 1, 2023
3e4011a
Add MIT license file (#47)
mackuba Jun 12, 2023
f4b8159
fix: handle firehose subscription error reconnect (Close #44) (#46)
yuna0x0 Jun 15, 2023
040801a
added link to a Ruby implementation to the readme (#48)
mackuba Jun 19, 2023
7cb58c3
Added simple dockerfile and compose
jatocode Jul 31, 2023
25 changes: 25 additions & 0 deletions .env.example
@@ -0,0 +1,25 @@
# Whichever port you want to run this on
FEEDGEN_PORT=3000

# Change this to use a different bind address
FEEDGEN_LISTENHOST="localhost"

# Set to something like db.sqlite to store persistently
FEEDGEN_SQLITE_LOCATION=":memory:"

# Don't change unless you're working in a different environment than the primary Bluesky network
FEEDGEN_SUBSCRIPTION_ENDPOINT="wss://bsky.social"

# Set this to the hostname that you intend to run the service at
FEEDGEN_HOSTNAME="example.com"

# Set this to the DID of the account you'll use to publish the feed
# You can find your account's DID by going to
# https://bsky.social/xrpc/com.atproto.identity.resolveHandle?handle=${YOUR_HANDLE}
FEEDGEN_PUBLISHER_DID="did:plc:abcde...."

# Only use this if you want a service did different from did:web
# FEEDGEN_SERVICE_DID="did:plc:abcde..."

# Delay between reconnect attempts to the firehose subscription endpoint (in milliseconds)
FEEDGEN_SUBSCRIPTION_RECONNECT_DELAY=3000
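
As a rough illustration of how a service might consume the variables above, the sketch below parses them from a plain key/value map using the same defaults as `.env.example`. The `FeedgenConfig` type and the `loadConfig` and `intOr` helpers are hypothetical names, not part of this PR. The fallback to `did:web:<hostname>` is an assumption based on the comment above `FEEDGEN_SERVICE_DID`, and `did:example:alice` is only a placeholder.

```ts
// Hypothetical config shape mirroring the variables in .env.example.
type FeedgenConfig = {
  port: number
  listenhost: string
  sqliteLocation: string
  subscriptionEndpoint: string
  hostname: string
  publisherDid: string
  serviceDid: string
  subscriptionReconnectDelay: number
}

// A plain key/value view of the environment (for example, process.env in Node).
type EnvMap = Record<string, string | undefined>

// Parse a base-10 integer, falling back when the value is unset or invalid.
function intOr(value: string | undefined, fallback: number): number {
  if (value === undefined) return fallback
  const parsed = parseInt(value, 10)
  return isNaN(parsed) ? fallback : parsed
}

function loadConfig(env: EnvMap): FeedgenConfig {
  const hostname = env['FEEDGEN_HOSTNAME'] ?? 'example.com'
  return {
    port: intOr(env['FEEDGEN_PORT'], 3000),
    listenhost: env['FEEDGEN_LISTENHOST'] ?? 'localhost',
    sqliteLocation: env['FEEDGEN_SQLITE_LOCATION'] ?? ':memory:',
    subscriptionEndpoint: env['FEEDGEN_SUBSCRIPTION_ENDPOINT'] ?? 'wss://bsky.social',
    hostname,
    // Placeholder DID; a real deployment would require this to be set.
    publisherDid: env['FEEDGEN_PUBLISHER_DID'] ?? 'did:example:alice',
    // Assumption: default to did:web unless a service DID is given.
    serviceDid: env['FEEDGEN_SERVICE_DID'] ?? `did:web:${hostname}`,
    subscriptionReconnectDelay: intOr(env['FEEDGEN_SUBSCRIPTION_RECONNECT_DELAY'], 3000),
  }
}
```

Taking the environment as a parameter keeps the parser testable without touching the real process environment.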
131 changes: 130 additions & 1 deletion .gitignore
@@ -1 +1,130 @@
node_modules
# Logs
logs
*.log
npm-debug.log*
yarn-debug.log*
yarn-error.log*
lerna-debug.log*
.pnpm-debug.log*

# Diagnostic reports (https://nodejs.org/api/report.html)
report.[0-9]*.[0-9]*.[0-9]*.[0-9]*.json

# Runtime data
pids
*.pid
*.seed
*.pid.lock

# Directory for instrumented libs generated by jscoverage/JSCover
lib-cov

# Coverage directory used by tools like istanbul
coverage
*.lcov

# nyc test coverage
.nyc_output

# Grunt intermediate storage (https://gruntjs.com/creating-plugins#storing-task-files)
.grunt

# Bower dependency directory (https://bower.io/)
bower_components

# node-waf configuration
.lock-wscript

# Compiled binary addons (https://nodejs.org/api/addons.html)
build/Release

# Dependency directories
node_modules/
jspm_packages/

# Snowpack dependency directory (https://snowpack.dev/)
web_modules/

# TypeScript cache
*.tsbuildinfo

# Optional npm cache directory
.npm

# Optional eslint cache
.eslintcache

# Optional stylelint cache
.stylelintcache

# Microbundle cache
.rpt2_cache/
.rts2_cache_cjs/
.rts2_cache_es/
.rts2_cache_umd/

# Optional REPL history
.node_repl_history

# Output of 'npm pack'
*.tgz

# Yarn Integrity file
.yarn-integrity

# dotenv environment variable files
.env
.env.development.local
.env.test.local
.env.production.local
.env.local

# parcel-bundler cache (https://parceljs.org/)
.cache
.parcel-cache

# Next.js build output
.next
out

# Nuxt.js build / generate output
.nuxt
dist

# Gatsby files
.cache/
# Comment in the public line in if your project uses Gatsby and not Next.js
# https://nextjs.org/blog/next-9-1#public-directory-support
# public

# vuepress build output
.vuepress/dist

# vuepress v2.x temp and cache directory
.temp
.cache

# Docusaurus cache and generated files
.docusaurus

# Serverless directories
.serverless/

# FuseBox cache
.fusebox/

# DynamoDB Local files
.dynamodb/

# TernJS port file
.tern-port

# Stores VSCode versions used for testing VSCode extensions
.vscode-test

# yarn v2
.yarn/cache
.yarn/unplugged
.yarn/build-state.yml
.yarn/install-state.gz
.pnp.*
15 changes: 15 additions & 0 deletions Dockerfile
@@ -0,0 +1,15 @@
FROM node:20-alpine

WORKDIR /app
COPY package.json .


Suggested change
COPY package.json .
COPY package.json yarn.lock .


RUN yarn install


Suggested change
RUN yarn install
RUN --mount=type=cache,target=/usr/local/share/.cache/yarn/v6 yarn install

target=/root/.npm for npm

https://docs.docker.com/reference/dockerfile/#example-cache-go-packages


# Bare necessities
COPY src/ ./src
COPY .env .


Suggested change
COPY .env .

Some values in the .env file might be secrets, so you should mount them at runtime instead of putting them in the built and published image.

Using env_file in the compose file instead.


EXPOSE 3000

# Now let's run it
CMD ["yarn", "start"]
21 changes: 21 additions & 0 deletions LICENSE
@@ -0,0 +1,21 @@
MIT License

Copyright (c) 2023 Bluesky PBLLC

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
87 changes: 48 additions & 39 deletions README.md
@@ -1,56 +1,67 @@
# ATProto Feed Generator

🚧 Work in Progress 🚧

We are actively developing Feed Generator integration into the Bluesky PDS. Though we are reasonably confident about the general shape and interfaces laid out here, these interfaces and implementation details _are_ subject to change.

In the meantime, we've put together this starter kit for devs. It doesn't do everything, but it should be enough to get you familiar with the system & started building!
This is a starter kit for creating ATProto Feed Generators. It's not feature complete, but should give you a good starting ground off of which to build and deploy a feed.

## Overview

Feed Generators are services that provide custom algorithms to users through the AT protocol.
Feed Generators are services that provide custom algorithms to users through the AT Protocol.

They work very simply: the server receives a request from a user's server and returns a list of [post URIs](https://atproto.com/specs/at-uri-scheme) with some optional metadata attached. Those posts are then hydrated into full views by the requesting server and sent back to the client. This route is described in the [`com.atproto.feed.getFeedSkeleton` lexicon](https://github.com/bluesky-social/atproto/blob/custom-feeds/lexicons/app/bsky/feed/getFeedSkeleton.json).
They work very simply: the server receives a request from a user's server and returns a list of [post URIs](https://atproto.com/specs/at-uri-scheme) with some optional metadata attached. Those posts are then hydrated into full views by the requesting server and sent back to the client. This route is described in the [`app.bsky.feed.getFeedSkeleton` lexicon](https://atproto.com/lexicons/app-bsky-feed#appbskyfeedgetfeedskeleton).

A Feed Generator service can host one or more algorithms. The service itself is identified by DID, however each algorithm that it hosts is declared by a record in the repo of the account that created it. For instance feeds offered by Bluesky will likely be declared in `@bsky.app`'s repo. Therefore, a given algorithm is identified by the at-uri of the declaration record. This declaration record includes a pointer to the service's DID along with some profile information for the feed.
A Feed Generator service can host one or more algorithms. The service itself is identified by DID, while each algorithm that it hosts is declared by a record in the repo of the account that created it. For instance, feeds offered by Bluesky will likely be declared in `@bsky.app`'s repo. Therefore, a given algorithm is identified by the at-uri of the declaration record. This declaration record includes a pointer to the service's DID along with some profile information for the feed.
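
To make the at-uri of a declaration record concrete, here is a minimal sketch that splits one into its parts. The `FeedUri` type and `parseFeedUri` function are hypothetical names for illustration; the parser only handles the three-segment `at://<did>/<collection>/<rkey>` form and does not validate the DID.

```ts
// Parts of a feed declaration at-uri: at://<did>/<collection>/<rkey>
type FeedUri = { did: string; collection: string; rkey: string }

// Illustrative parser only: it accepts exactly three path segments and
// performs no DID or NSID validation.
function parseFeedUri(uri: string): FeedUri | null {
  const match = /^at:\/\/([^/]+)\/([^/]+)\/([^/]+)$/.exec(uri)
  if (!match) return null
  const did = match[1]
  const collection = match[2]
  const rkey = match[3]
  if (did === undefined || collection === undefined || rkey === undefined) return null
  return { did, collection, rkey }
}

const exampleFeed = parseFeedUri('at://did:example:alice/app.bsky.feed.generator/whats-alf')
```

Here `exampleFeed.did` identifies the account whose repo holds the declaration, and `exampleFeed.rkey` is the short name of the feed.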

The general flow of providing a custom algorithm to a user is as follows:
- A user requests a feed from their server (PDS) using the at-uri of the declared feed
- The PDS resolves the at-uri and finds the DID doc of the Feed Generator
- The PDS sends a `getFeedSkeleton` request to the service endpoint declared in the Feed Generator's DID doc
- This request is authenticated by a JWT signed by the user's repo signing key
- The Feed Generator returns a skeleton of the feed to the user's PDS
- The PDS hydrates the feed (user info, post contents, aggregates, etc)
- In the future, the PDS will hydrate the feed with the help of an App View, but for now the PDS handles hydration itself
- The PDS hydrates the feed (user info, post contents, aggregates, etc.)
- In the future, the PDS will hydrate the feed with the help of an App View, but for now, the PDS handles hydration itself
- The PDS returns the hydrated feed to the user

To the user this should feel like visiting a page in the app. Once they subscribe, it will appear in their home interface as one of their available feeds.
For users, this should feel like visiting a page in the app. Once they subscribe to a custom algorithm, it will appear in their home interface as one of their available feeds.

## Getting Started

We've setup this simple server with sqlite to store & query data. Feel free to switch this out for whichever database you prefer.
We've set up this simple server with SQLite to store and query data. Feel free to switch this out for whichever database you prefer.

Next you will need to do two things:
Next, you will need to do two things:

- Implement indexing logic in `src/subscription.ts`.
1. Implement indexing logic in `src/subscription.ts`.

This will subscribe to the repo subscription stream on startup, parse events and index them according to your provided logic.

This will subscribe to the repo subscription stream on startup, parse events & index them according to your provided logic.
2. Implement feed generation logic in `src/algos`

- Implement feed generation logic in `src/feed-generation.ts`
For inspiration, we've provided a very simple feed algorithm (`whats-alf`) that returns all posts related to the titular character of the TV show ALF.

The types are in place and you will just need to return something that satisfies the `SkeletonFeedPost[]` type

For inspiration, we've provided a very simple feed algorithm ("whats alf") that returns all posts related to the titular character of the TV show ALF.
You can either edit it or add another algorithm alongside it. The types are in place, and you will just need to return something that satisfies the `SkeletonFeedPost[]` type.
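
As a sketch of what such an algorithm could look like, the function below filters an in-memory list of indexed posts instead of querying SQLite. The `IndexedPost` and `AlfSkeletonPost` types and the `whatsAlf` function are assumptions made for this example, not the code in `src/algos`.

```ts
// Hypothetical shape of a post captured by the indexing step.
type IndexedPost = { uri: string; text: string; indexedAt: string }

// Simplest skeleton item: just the post URI.
type AlfSkeletonPost = { post: string }

// Return the newest posts that mention "alf", case-insensitively.
// Note that this naive match also catches words such as "half".
function whatsAlf(posts: IndexedPost[], limit: number): AlfSkeletonPost[] {
  return posts
    .filter((p) => /alf/i.test(p.text))
    .sort((a, b) => (a.indexedAt < b.indexedAt ? 1 : a.indexedAt > b.indexedAt ? -1 : 0))
    .slice(0, limit)
    .map((p) => ({ post: p.uri }))
}
```

Sorting on the ISO 8601 `indexedAt` string works because timestamps in that format order lexicographically.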

We've taken care of setting this server up with a did:web. However, you're free to switch this out for did:plc if you like - you may want to if you expect this Feed Generator to be long-standing and possibly migrating domains.

Once the custom algorithms feature launches, you'll be able to publish your feed in-app by providing the DID of your service.
### Deploying your feed
Your feed will need to be accessible at the value supplied to the `FEEDGEN_HOSTNAME` environment variable.

The service must be set up to respond to HTTPS queries over port 443.
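
With did:web, the DID document is resolved from `https://<hostname>/.well-known/did.json`, which is why the hostname has to be reachable over HTTPS. The sketch below builds such a document. The `didWebDocument` name is hypothetical, and the service `id` and `type` values are assumptions about what the PDS looks for; check them against the repository's did:web example before relying on them.

```ts
type DidService = { id: string; type: string; serviceEndpoint: string }
type DidDocument = { '@context': string[]; id: string; service: DidService[] }

// Build a minimal did:web document that points at the feed generator host.
function didWebDocument(hostname: string): DidDocument {
  return {
    '@context': ['https://www.w3.org/ns/did/v1'],
    id: `did:web:${hostname}`,
    service: [
      {
        id: '#bsky_fg', // assumed service id
        type: 'BskyFeedGenerator', // assumed service type
        serviceEndpoint: `https://${hostname}`,
      },
    ],
  }
}
```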

### Publishing your feed

To publish your feed, go to the script at `scripts/publishFeedGen.ts` and fill in the variables at the top. Examples are included, and some are optional. To publish your feed generator, simply run `yarn publishFeed`.

To update your feed's display data (name, avatar, description, etc.), just update the relevant variables and re-run the script.

After successfully running the script, you should be able to see your feed from within the app, as well as share it by embedding a link in a post (similar to a quote post).

## Running the Server

Install dependencies with `yarn` and then run the server with `yarn start`. This will start the server on port 3000, or what's defined in `.env`. You can then watch the firehose output in the console and access the output of the default custom ALF feed at [http://localhost:3000/xrpc/app.bsky.feed.getFeedSkeleton?feed=at://did:example:alice/app.bsky.feed.generator/whats-alf](http://localhost:3000/xrpc/app.bsky.feed.getFeedSkeleton?feed=at://did:example:alice/app.bsky.feed.generator/whats-alf).
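
For scripted checks against a local server, a small helper can assemble the request URL. This is a sketch with a hypothetical `feedSkeletonUrl` function; it percent-encodes the `feed` parameter, so its output looks different from the literal URL above while referring to the same feed.

```ts
// Optional query parameters accepted by getFeedSkeleton.
type SkeletonQuery = { limit?: number; cursor?: string }

// Build a getFeedSkeleton URL for the given server base and feed at-uri.
function feedSkeletonUrl(base: string, feed: string, query: SkeletonQuery = {}): string {
  const params = [`feed=${encodeURIComponent(feed)}`]
  if (query.limit !== undefined) params.push(`limit=${query.limit}`)
  if (query.cursor !== undefined) params.push(`cursor=${encodeURIComponent(query.cursor)}`)
  return `${base}/xrpc/app.bsky.feed.getFeedSkeleton?${params.join('&')}`
}

const localAlfUrl = feedSkeletonUrl(
  'http://localhost:3000',
  'at://did:example:alice/app.bsky.feed.generator/whats-alf',
  { limit: 30 },
)
```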

## Some Details

### Skeleton Metadata

The skeleton that a Feed Generator puts together is, in its simplest form, a list of post uris.
The skeleton that a Feed Generator puts together is, in its simplest form, a list of post URIs.

```ts
[
@@ -60,18 +71,12 @@ The skeleton that a Feed Generator puts together is, in its simplest form, a lis
]
```

However, we include two locations to attach some additional context. Here is the full schema:
However, we include an additional location to attach some context. Here is the full schema:

```ts
type SkeletonItem = {
post: string // post URI

// optional metadata about the thread that this post is in reply to
replyTo?: {
root: string, // reply root URI
parent: string, // reply parent URI
}


// optional reason for inclusion in the feed
// (generally to be displayed in client)
reason?: Reason
@@ -81,9 +86,8 @@ type SkeletonItem = {
type Reason = ReasonRepost

type ReasonRepost = {
$type: @TODO
by: string // the did of the reposting user
indexedAt: string // the time that the repost took place
$type: 'app.bsky.feed.defs#skeletonReasonRepost'
repost: string // repost URI
}
```

@@ -111,12 +115,12 @@ const payload = {
}
```

We provide utilities for verifying user JWTs in the `@atproto/xrpc-server` package.
We provide utilities for verifying user JWTs in the `@atproto/xrpc-server` package, and you can see them in action in `src/auth.ts`.

### Pagination
You'll notice that the `getFeedSkeleton` method returns a `cursor` in its response & takes a `cursor` param as input.
You'll notice that the `getFeedSkeleton` method returns a `cursor` in its response and takes a `cursor` param as input.

This cursor is treated as an opaque value & fully at the Feed Generator's discretion. It is simply pased through he PDS directly to & from the client.
This cursor is treated as an opaque value and fully at the Feed Generator's discretion. It is simply passed through the PDS directly to and from the client.

We strongly encourage that the cursor be _unique per feed item_ to prevent unexpected behavior in pagination.
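
A cursor that is unique per feed item can be built by pairing a sort key with a tiebreaker. The sketch below assumes a millisecond timestamp and a CID joined by `::`; the separator, the `FeedEntry` type, and the helper names are illustrative choices rather than anything confirmed by this diff.

```ts
type FeedEntry = { uri: string; cid: string; indexedAt: number }
type CursorParts = { timestamp: number; cid: string }
type FeedPage = { feed: { post: string }[]; cursor: string | undefined }

function encodeCursor(parts: CursorParts): string {
  return `${parts.timestamp}::${parts.cid}`
}

function decodeCursor(cursor: string): CursorParts {
  const sep = cursor.indexOf('::')
  if (sep === -1) throw new Error('malformed cursor')
  const timestamp = parseInt(cursor.slice(0, sep), 10)
  const cid = cursor.slice(sep + 2)
  if (isNaN(timestamp) || cid.length === 0) throw new Error('malformed cursor')
  return { timestamp, cid }
}

// Newest first; ties on timestamp are broken by CID so the order is total.
function compareEntries(a: FeedEntry, b: FeedEntry): number {
  if (a.indexedAt !== b.indexedAt) return b.indexedAt - a.indexedAt
  return a.cid < b.cid ? 1 : a.cid > b.cid ? -1 : 0
}

// Return up to `limit` entries that sort strictly after the cursor position.
function getPage(entries: FeedEntry[], limit: number, cursor?: string): FeedPage {
  let sorted = entries.slice().sort(compareEntries)
  if (cursor !== undefined) {
    const after = decodeCursor(cursor)
    sorted = sorted.filter(
      (e) =>
        e.indexedAt < after.timestamp ||
        (e.indexedAt === after.timestamp && e.cid < after.cid),
    )
  }
  const pageEntries = sorted.slice(0, limit)
  const last = pageEntries[pageEntries.length - 1]
  return {
    feed: pageEntries.map((e) => ({ post: e.uri })),
    cursor: last ? encodeCursor({ timestamp: last.indexedAt, cid: last.cid }) : undefined,
  }
}
```

With a timestamp-only cursor, two posts indexed in the same millisecond could be skipped or repeated across pages; the CID tiebreaker avoids that.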

@@ -127,17 +131,22 @@ We recommend, for instance, a compound cursor with a timestamp + a CID:

How a feed generator fulfills the `getFeedSkeleton` request is completely at their discretion. At the simplest end, a Feed Generator could supply a "feed" that only contains some hardcoded posts.

For most usecases, we recommend subscribing to the firehose at `com.atproto.sync.subscribeRepos`. This websocket will send you every record that is published on the network. Since Feed Generators do not need to provide hydrated posts, you can index as much or as little of the firehose as necessary.
For most use cases, we recommend subscribing to the firehose at `com.atproto.sync.subscribeRepos`. This websocket will send you every record that is published on the network. Since Feed Generators do not need to provide hydrated posts, you can index as much or as little of the firehose as necessary.

Depending on your algorithm, you likely do not need to keep posts around for long. Unless your algorithm is intended to provide "posts you missed" or something similar, you can likely garbage collect any data that is older than 48 hours.
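
A sketch of that kind of garbage collection over an in-memory list follows. The `StoredPost` type and `pruneOldPosts` function are hypothetical; a real service would more likely run an equivalent delete against its database on a timer.

```ts
// indexedAt is an ISO 8601 timestamp string.
type StoredPost = { uri: string; indexedAt: string }

const MAX_AGE_MS = 48 * 60 * 60 * 1000 // 48 hours

// Keep only posts indexed within `maxAgeMs` of `now`.
function pruneOldPosts(
  posts: StoredPost[],
  now: Date,
  maxAgeMs: number = MAX_AGE_MS,
): StoredPost[] {
  const cutoff = now.getTime() - maxAgeMs
  return posts.filter((p) => new Date(p.indexedAt).getTime() >= cutoff)
}
```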

Some examples:

### Reimplementing What's Hot
To reimplement "What's Hot", you may subscribe to the firehose & filter for all posts & likes (ignoring profiles/reposts/follows/etc). You would keep a running tally of likes per post & when a PDS requests a feed, you would send the most recent posts that pass some threshold of likes.
To reimplement "What's Hot", you may subscribe to the firehose and filter for all posts and likes (ignoring profiles/reposts/follows/etc.). You would keep a running tally of likes per post and when a PDS requests a feed, you would send the most recent posts that pass some threshold of likes.
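
The tally-and-threshold idea can be sketched as below. The inputs are simplified stand-ins (a list of indexed posts and a list of liked post URIs) rather than real firehose events, and `tallyLikes` and `whatsHot` are hypothetical names.

```ts
type PostRef = { uri: string; indexedAt: string }

// Count likes per post URI from a list of like subjects.
function tallyLikes(likedUris: string[]): Record<string, number> {
  const tally: Record<string, number> = {}
  for (const uri of likedUris) {
    tally[uri] = (tally[uri] ?? 0) + 1
  }
  return tally
}

// Newest posts whose like count meets the threshold.
function whatsHot(
  posts: PostRef[],
  likedUris: string[],
  threshold: number,
  limit: number,
): { post: string }[] {
  const tally = tallyLikes(likedUris)
  return posts
    .filter((p) => (tally[p.uri] ?? 0) >= threshold)
    .sort((a, b) => (a.indexedAt < b.indexedAt ? 1 : a.indexedAt > b.indexedAt ? -1 : 0))
    .slice(0, limit)
    .map((p) => ({ post: p.uri }))
}
```

A long-running service would update the tally incrementally as likes arrive instead of recounting on every request.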

### A Community Feed
You might create a feed for a given community by compiling a list of DIDs within that community & filtering the firehose for all posts from users within that list.
You might create a feed for a given community by compiling a list of DIDs within that community and filtering the firehose for all posts from users within that list.

### A Topical Feed
To implement a topical feed, you might filter the algorithm for posts and pass the post text through some filtering mechanism (an LLM, a keyword matcher, etc) that filters for the topic of your choice.
To implement a topical feed, you might filter the algorithm for posts and pass the post text through some filtering mechanism (an LLM, a keyword matcher, etc.) that filters for the topic of your choice.

## Community Feed Generator Templates

- [Python](https://github.com/MarshalX/bluesky-feed-generator) - [@MarshalX](https://github.com/MarshalX)
- [Ruby](https://github.com/mackuba/bluesky-feeds-rb) - [@mackuba](https://github.com/mackuba)
15 changes: 15 additions & 0 deletions docker-compose.yml
@@ -0,0 +1,15 @@
version: "3.2"
services:

  swefeed:
    build: .
    image: feed
    ports:
      - "3000:3000"
    restart: unless-stopped


Suggested change
restart: unless-stopped
restart: unless-stopped
env_file: .env

    # Uncomment if you want to use a persistent sqlite database
    # volumes:
    #   - type: bind
    #     source: ./db.sqlite
    #     target: /app/db.sqlite
