# Plan: Port Tachiyomi Extensions to Go

## Reference Projects

- **extensions-source**: `/Users/achmad/Documents/Belajar/Android/extensions-source`
  - `lib-multisrc/`: base source implementations (Phase 3)
  - `src/all/`, `src/en/`: standalone sources (Phase 4)
  - `core/src/`: CatalogueSource/HttpSource interface contracts (Phase 1)
- **Suwayomi-Server**: `/Users/achmad/Documents/Belajar/Web/Suwayomi-Server`
  - `server/src/main/kotlin/suwayomi/tachidesk/manga/model/table/`: DB schema reference (Phase 2)
  - `server/src/main/kotlin/suwayomi/tachidesk/server/database/migration/`: migration patterns (Phase 2)
  - `server/src/main/kotlin/suwayomi/tachidesk/manga/api/`: API route patterns (Phase 5)

---

## Context

Suwayomi-Server currently loads manga source extensions by downloading APKs, converting DEX bytecode to JARs, and instantiating Kotlin classes at runtime via reflection. The goal is a standalone Go service that reimplements all source logic, persists fetched data into PostgreSQL, and exposes a REST API for any consumer to query. The Go service is self-contained, with no dependency on the JVM project.

Two rendering modes:

- **Direct HTTP**: REST/JSON API sources, fetched with plain `net/http` and JSON unmarshalling
- **FlareSolverr**: Cloudflare-protected or JS-rendered sites; POST to FlareSolverr, get rendered HTML, parse with goquery

---

## New Project: `tachiyomi-go`

Separate Go module. Uses PostgreSQL (same DB the Docker Compose in this repo already sets up).

### Directory Structure

```
tachiyomi-go/
├── cmd/server/main.go
├── internal/
│   ├── source/
│   │   ├── types.go          # SManga, SChapter, Page, MangasPage, Filter
│   │   └── interfaces.go     # Source, CatalogueSource interfaces
│   ├── httpclient/
│   │   ├── client.go         # Base HTTP client, cookie jar, per-host rate limiter
│   │   ├── flaresolverr.go   # FlareSolverr integration
│   │   ├── graphql.go        # GraphQL POST helper
│   │   └── headers.go        # Common header builders
│   ├── parser/
│   │   └── html.go           # goquery helper wrappers
│   ├── db/
│   │   ├── db.go             # pgx pool init, migration runner
│   │   ├── queries/          # sqlc-generated query files
│   │   │   ├── manga.sql.go
│   │   │   ├── chapter.sql.go
│   │   │   ├── page.sql.go
│   │   │   └── source.sql.go
│   │   └── migrations/
│   │       ├── 001_init.sql
│   │       └── ...
│   └── registry/
│       └── registry.go       # Global source map, init-time registration
├── sources/
│   ├── base/                 # ~67 base source implementations
│   └── all/ + en/            # ~562 standalone source implementations
└── api/
    └── handler.go
```

---

## Phase 1: Core Framework

### 1.1 Data Types (`internal/source/types.go`)

```go
type SManga struct {
	URL          string
	Title        string
	Artist       string
	Author       string
	Description  string
	Genre        string // comma-separated
	Status       int    // 0=unknown, 1=ongoing, 2=completed, 3=licensed, 4=publishing finished, 5=cancelled, 6=on hiatus
	ThumbnailURL string
	Initialized  bool
}

type SChapter struct {
	URL           string
	Name          string
	DateUpload    int64 // unix milliseconds
	ChapterNumber float32
	Scanlator     string
}

type Page struct {
	Index    int
	URL      string
	ImageURL string
}

type MangasPage struct {
	Mangas      []SManga
	HasNextPage bool
}
```

Filter types: `SelectFilter`, `TextFilter`, `CheckboxFilter`, `TriStateFilter`, `GroupFilter`, `SortFilter`.

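
The plan only names the filter types, so the following is a minimal sketch of how they might be modelled in Go. The `Filter` interface, its `FilterName` method, every field name, and the `TriState` constants are assumptions made for illustration; the tri-state values follow the ignore/include/exclude convention used by Tachiyomi filters.

```go
package main

import "fmt"

// Filter is a hypothetical common interface so sources can accept []Filter.
type Filter interface {
	FilterName() string
}

// TriState models ignore/include/exclude selections (assumed values).
type TriState int

const (
	TriIgnore TriState = iota
	TriInclude
	TriExclude
)

type TextFilter struct {
	Name  string
	Value string
}

type CheckboxFilter struct {
	Name    string
	Checked bool
}

type SelectFilter struct {
	Name     string
	Options  []string
	Selected int // index into Options
}

type TriStateFilter struct {
	Name  string
	State TriState
}

type SortFilter struct {
	Name      string
	Options   []string
	Selected  int // index into Options
	Ascending bool
}

// GroupFilter nests other filters, e.g. a list of genre tri-states.
type GroupFilter struct {
	Name    string
	Filters []Filter
}

func (f TextFilter) FilterName() string     { return f.Name }
func (f CheckboxFilter) FilterName() string { return f.Name }
func (f SelectFilter) FilterName() string   { return f.Name }
func (f TriStateFilter) FilterName() string { return f.Name }
func (f SortFilter) FilterName() string     { return f.Name }
func (f GroupFilter) FilterName() string    { return f.Name }

func main() {
	filters := []Filter{
		TextFilter{Name: "Author"},
		SortFilter{Name: "Sort", Options: []string{"Popular", "Latest"}},
		GroupFilter{Name: "Genres", Filters: []Filter{
			TriStateFilter{Name: "Action", State: TriInclude},
			TriStateFilter{Name: "Horror", State: TriExclude},
		}},
	}
	for _, f := range filters {
		fmt.Println(f.FilterName())
	}
}
```

A source's `GetSearchManga` could then type-switch over the slice to translate each filter into query parameters.
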
### 1.2 Source Interfaces (`internal/source/interfaces.go`)

```go
type CatalogueSource interface {
	ID() int64
	Name() string
	Lang() string
	SupportsLatest() bool
	GetPopularManga(page int) (MangasPage, error)
	GetLatestUpdates(page int) (MangasPage, error)
	GetSearchManga(page int, query string, filters []Filter) (MangasPage, error)
	GetMangaDetails(manga SManga) (SManga, error)
	GetChapterList(manga SManga) ([]SChapter, error)
	GetPageList(chapter SChapter) ([]Page, error)
	GetImageURL(page Page) (string, error)
	GetFilterList() []Filter
}
```

#### ID Generation

Source IDs use the same formula as Tachiyomi/Suwayomi `HttpSource.generateId`:

```
key  = strings.ToLower(name) + "/" + lang + "/" + strconv.Itoa(versionId)
hash = MD5(key)                      // 16 bytes
id   = first 8 bytes as big-endian int64
id  &= math.MaxInt64                 // clear sign bit (Long.MAX_VALUE mask)
```

Default `versionId` is 1. The earlier note about Java's `String.hashCode()` in the original plan was incorrect; the authoritative source is `HttpSource.kt` in Suwayomi-Server.

### 1.3 HTTP Client (`internal/httpclient/client.go`)

- Per-host rate limiting with `golang.org/x/time/rate`
- Persistent cookie jar per source instance
- Configurable timeout, user-agent, referer
- Transparent retry on 429 (honor `Retry-After` header)

### 1.4 FlareSolverr (`internal/httpclient/flaresolverr.go`)

POST `{"cmd":"request.get","url":"...","maxTimeout":60000}` to `/v1`. Extract `solution.response` (HTML) and `solution.cookies`. After first clearance, reuse the cookies via the normal HTTP client; only re-invoke FlareSolverr on 403.

### 1.5 GraphQL Helper (`internal/httpclient/graphql.go`)

Only used internally to call upstream sources that expose GraphQL APIs (mangahub, senkuro, allanime, luscious, stashapp). Our own API is REST.

```go
type GraphQLRequest struct {
	Query     string `json:"query"`
	Variables any    `json:"variables"`
}

func Post(ctx context.Context, client *http.Client, url string, req GraphQLRequest, headers map[string]string) (*http.Response, error)
```

### 1.6 HTML Parser (`internal/parser/html.go`)

Thin wrappers over `github.com/PuerkitoBio/goquery` (Go equivalent of JSoup):

```go
func Parse(html string) (*goquery.Document, error)
func Select(doc *goquery.Document, css string) *goquery.Selection
func SelectFrom(sel *goquery.Selection, css string) *goquery.Selection
func Attr(sel *goquery.Selection, name string) string
func AbsURL(sel *goquery.Selection, attr string, baseURL string) string
func OwnText(sel *goquery.Selection) string
func TextTrim(sel *goquery.Selection) string
func First(sel *goquery.Selection) *goquery.Selection
```

### 1.7 Registry (`internal/registry/registry.go`)

```go
var mu sync.RWMutex
var sources = map[int64]source.CatalogueSource{}

func Register(s source.CatalogueSource)
func Get(id int64) (source.CatalogueSource, bool)
func All() []source.CatalogueSource
```

Each source package calls `registry.Register(NewMySource())` in its `init()` function. All source packages are blank-imported in `cmd/server/main.go` so their `init()` runs at startup.

---

## Phase 2: Database Layer

### 2.1 Decision: New Schema Compatible with Suwayomi-Server

Suwayomi-Server uses either H2 (embedded) or PostgreSQL via Exposed ORM. The existing table structure (MangaTable, ChapterTable, PageTable, SourceTable) is a good reference. We adapt it for Go/PostgreSQL with a **compatible schema** so data can be shared if both services point to the same DB.

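
Sharing rows with Suwayomi-Server only works if both services compute identical source IDs, so the formula from section 1.2 is worth pinning down in code. The following is a minimal sketch; the function name `GenerateSourceID` and the sample source names are assumptions for illustration.

```go
package main

import (
	"crypto/md5"
	"encoding/binary"
	"fmt"
	"math"
	"strconv"
	"strings"
)

// GenerateSourceID follows the plan's formula: MD5 over
// "lowercase(name)/lang/versionId", first 8 bytes read as a
// big-endian integer, with the sign bit cleared.
func GenerateSourceID(name, lang string, versionID int) int64 {
	key := strings.ToLower(name) + "/" + lang + "/" + strconv.Itoa(versionID)
	sum := md5.Sum([]byte(key)) // [16]byte
	id := int64(binary.BigEndian.Uint64(sum[:8]))
	return id & math.MaxInt64 // same effect as masking with Long.MAX_VALUE
}

func main() {
	// Sample inputs only; compare the output against the IDs stored by
	// Suwayomi-Server before relying on it.
	fmt.Println(GenerateSourceID("MangaDex", "en", 1))
	fmt.Println(GenerateSourceID("MangaDex", "id", 1))
}
```

Checking a handful of generated IDs against rows in an existing Suwayomi `SourceTable` would be a cheap way to confirm the port before any data is written.
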
Key differences from Suwayomi-Server schema:

- No `ExtensionTable` (sources are compiled in, not loaded from APKs)
- `SourceTable` has no `extension` FK; sources are identified by their built-in ID
- Add `fetched_at` timestamps on manga list results for cache invalidation
- Use `SERIAL` primary keys where Suwayomi uses `IntIdTable`

### 2.2 Schema (`internal/db/migrations/001_init.sql`)

```sql
CREATE TABLE sources (
    id      BIGINT PRIMARY KEY, -- derived from name, lang and versionId via the MD5 formula in section 1.2
    name    VARCHAR(128) NOT NULL,
    lang    VARCHAR(32) NOT NULL,
    is_nsfw BOOLEAN NOT NULL DEFAULT FALSE
);

CREATE TABLE manga (
    id                       SERIAL PRIMARY KEY,
    source_id                BIGINT NOT NULL REFERENCES sources(id),
    url                      VARCHAR(2048) NOT NULL,
    title                    VARCHAR(512) NOT NULL,
    initialized              BOOLEAN NOT NULL DEFAULT FALSE,
    artist                   TEXT,
    author                   TEXT,
    description              TEXT,
    genre                    TEXT, -- comma-separated
    status                   INTEGER NOT NULL DEFAULT 0,
    thumbnail_url            VARCHAR(2048),
    thumbnail_last_fetched   BIGINT NOT NULL DEFAULT 0,
    in_library               BOOLEAN NOT NULL DEFAULT FALSE,
    in_library_at            BIGINT NOT NULL DEFAULT 0,
    real_url                 VARCHAR(2048),
    last_fetched_at          BIGINT NOT NULL DEFAULT 0,
    chapters_last_fetched_at BIGINT NOT NULL DEFAULT 0,
    update_strategy          VARCHAR(64) NOT NULL DEFAULT 'ALWAYS_UPDATE',
    UNIQUE (source_id, url)
);

CREATE TABLE chapters (
    id             SERIAL PRIMARY KEY,
    manga_id       INTEGER NOT NULL REFERENCES manga(id) ON DELETE CASCADE,
    url            VARCHAR(2048) NOT NULL,
    name           VARCHAR(512) NOT NULL,
    date_upload    BIGINT NOT NULL DEFAULT 0,
    chapter_number REAL NOT NULL DEFAULT -1,
    scanlator      VARCHAR(256),
    source_order   INTEGER NOT NULL,
    is_read        BOOLEAN NOT NULL DEFAULT FALSE,
    is_bookmarked  BOOLEAN NOT NULL DEFAULT FALSE,
    last_page_read INTEGER NOT NULL DEFAULT 0,
    last_read_at   BIGINT NOT NULL DEFAULT 0,
    fetched_at     BIGINT NOT NULL DEFAULT 0,
    real_url       VARCHAR(2048),
    is_downloaded  BOOLEAN NOT NULL DEFAULT FALSE,
    page_count     INTEGER NOT NULL DEFAULT -1,
    UNIQUE (manga_id, url)
);

CREATE TABLE pages (
    id         SERIAL PRIMARY KEY,
    chapter_id INTEGER NOT NULL REFERENCES chapters(id) ON DELETE CASCADE,
    "index"    INTEGER NOT NULL,
    url        VARCHAR(2048) NOT NULL,
    image_url  TEXT
);

CREATE TABLE source_meta (
    source_id BIGINT NOT NULL REFERENCES sources(id),
    key       VARCHAR(256) NOT NULL,
    value     TEXT NOT NULL,
    PRIMARY KEY (source_id, key)
);
```

### 2.3 Tooling

- **Driver**: `github.com/jackc/pgx/v5` (pgx native, no database/sql overhead)
- **Query gen**: `sqlc`, which generates type-safe Go functions from SQL queries written in `.sql` files
- **Migrations**: `golang-migrate/migrate` with `pgx` driver, runs on startup
- **Connection pool**: `pgxpool.Pool` with configurable max conns

### 2.4 Data Flow

API call → source fetches data → **upsert into DB** → return from API.

**Manga list** (`GetPopularManga`): Upsert each SManga into `manga` (on conflict `source_id, url` update title/thumbnail/status). Update `last_fetched_at`. Return from DB.

**Manga detail** (`GetMangaDetails`): Fetch full detail from source, upsert all fields into `manga`, set `initialized=true`.

**Chapter list** (`GetChapterList`): Upsert each SChapter into `chapters` (on conflict `manga_id, url` update name/date/chapter_number). Update `chapters_last_fetched_at` on the manga row.

**Page list** (`GetPageList`): Insert pages into `pages` (skip if already present). If the source requires a second call to resolve image URLs (`GetImageURL`), store the resolved `image_url`.

**Cache**: If `last_fetched_at` is within the TTL (configurable, default 10 min for lists, 1h for details), serve from DB without hitting the source. The TTL is bypassed by `?refresh=true`.

---

## Phase 3: Base Source Implementations

### Group A — WordPress/CMS HTML Scrapers

| Base | Endpoint pattern | CF? |
|------|-----------------|-----|
| **madara** | POST `{base}/wp-admin/admin-ajax.php` (list), GET `{url}` (detail/chapters) | Yes |
| **mangathemesia** | GET `{base}/{dir}/?page={n}`, GET `{url}` | Yes |
| **madtheme** | GET `{base}/search?page={n}` (all list types) | Yes |
| **wpcomics** | GET `{base}/{popularPath}?page={n}` | Yes |
| **fmreader** | GET `{base}/{requestPath}?page={n}&sort=...` | Yes |
| **mmrcms** | GET `{base}/filterList?page={n}&sortBy=views`, POST `{base}/advSearchFilter` | No |
| **mangareader** | GET `{base}/?page={n}&type={t}` | Yes |
| **zmanga** | GET `{base}/advanced-search/page/{n}/?order=popular` | Yes |
| **mangaworld** | GET `{base}/archive?sort=most_read&page={n}` | Yes |
| **grouple** | GET `{base}/list?sortType=rate&offset={50*(n-1)}` | No |
| **foolslide** | GET `{base}/directory/{n}/` + JSON chapter API | No |
| **liliana** | GET `{base}/ranking/week/{n}` | Yes |
| **scanreader** | GET `{base}/bibliotheque/page/{n}/?sort=views` | No |
| **gigaviewer** | GET `{base}/series` (all at once, no pagination) | Yes |
| Others (mangawork, manga18, manhwaz, masonry, multichan, sinmh, etc.) | Various HTML GET | Most Yes |

All Group A bases use goquery selectors. Each has a config struct of overridable CSS selectors. FlareSolverr is used when CF=Yes.

### Group B — JSON REST API Sources

| Base | Key endpoints | CF? | Auth |
|------|--------------|-----|------|
| **heancms** | `GET {api}/series?page={n}`, `GET {api}/chapter/query?series_slug={s}` | No | None |
| **iken** | `GET {api}/comic?order=view&page={n}` | Yes | CF cookies |
| **hentaihand** | `GET {base}/api/comics?page={n}&order_by=...` | No | None |
| **pizzareader** | `GET {api}/comics`, `GET {api}/comics/{slug}` | Yes | None |
| **gmanga** | `GET {base}/api/releases?page={n}`, `POST {base}/api/mangas/search` | No | None |
| **spicytheme** | `GET {base}/api/...` | Yes | None |
| **zeistmanga** | Blogger Feeds JSON API | Yes | None |
| **mccms** | REST JSON | Yes | None |
| **kemono** | `GET {base}/api/v1/creators`, `GET {base}/api/v1/{service}/{creator}/posts` | Yes | None |
| **lectormoe** | REST JSON | Yes | Token |
| **libgroup** | `GET {api}/api/latest-updates`, `GET {api}/api/auth/me` | Yes | WebView token → use FlareSolverr to obtain |
| **mangabox** | REST JSON | No | None |
| **mangadventure** | REST JSON | No | None |
| **ezmanhwa** | REST JSON | No | None |
| **monochrome** | REST JSON | No | None |

### Group C — GraphQL Sources

| Base | Endpoint | Notes |
|------|----------|-------|
| **mangahub** | POST `{api}/graphql` | Cookie `mhub_access` acquired via intermediate GET to a random chapter URL |
| **senkuro** | POST `{api}/graphql` | API domain configurable via preferences |

### Group D — Special/Unique Sources

| Base | Pattern | Gotcha |
|------|---------|--------|
| **mangotheme** | JSON list + XOR/AES-encrypted page URLs | Implement page URL decryption in Go; key embedded in JS |
| **mmlook** | JSON + encrypted pages + CF | Page decryption + FlareSolverr |
| **guya** | `GET {base}/api/get_all_series/` (all manga at once) | No pagination; scanlation group filter in response |
| **bakkin** | Single JSON URL, no list/search | Enumerate from object keys |
| **gigaviewer** | All series in one page HTML | Client-side filter only; latest = same request |

---

## Phase 4: Standalone Sources — Notable Gotchas

### `all/mangadex`

- Complex filter system: tags (AND/OR modes), demographics, content rating, publication status, sort
- Cover art comes from a separate `covers` relationship; make a second API call or include via `includes[]=cover_art`
- Chapter language filtering: only fetch `translatedLanguage[]=en` (or user-configured)
- Rate limit: 5 req/s global, stricter for search
- `at-home` server URL for pages: `GET /at-home/server/{chapterId}` returns CDN base URL; pages = `{baseUrl}/{quality}/{hash}/{filename}`

### `all/nhentaicom`

- Cloudflare-protected
- Tag/artist/character search with prefix syntax: `tag:isekai artist:...`
- Pages come from a JSON blob embedded in the HTML (`JSON.parse(document.getElementById('...'))`)

### `all/komga`

- Self-hosted; user must configure base URL + credentials (Basic Auth)
- Series = manga, Books = chapters, Pages from book API
- Supports CBZ/PDF libraries; page URL is a direct book page endpoint

### `all/e621`

- Pools = manga (collections of posts), Posts = pages
- Basic Auth required for higher rate limits and adult content
- Nested tag exclusion (e.g. `rating:s`) needs proper encoding

### `all/kemono`

- Cloudflare-protected
- Service + creator = manga (e.g. Patreon/PixivFANBOX creator)
- Posts = chapters; attachments/files within a post = pages
- File URLs may be relative to `{base}/data`

### `all/danbooru`

- Tag-based search (up to 2 tags for free accounts)
- Gold/Platinum tier content only accessible with credentials
- Pools as manga, pool posts as pages

### `all/pixiv`

- Session cookie (`PHPSESSID`) auth; no public API key
- Illust series = manga; user illustrations = chapters
- Multi-tier image URLs: `thumb_mini`, `small`, `regular`, `original`; must use correct Referer header or get 403
- R18 content requires age-verified account

### `all/luscious` (GraphQL)

- GraphQL POST to `/graphql/playground`
- Albums = manga, Pictures = pages
- Adult content; account may be required for some content

### `all/mangaplus`

- Official Shueisha app API
- Uses protobuf OR a JSON endpoint (`/api/title_detail?title_id={id}`)
- Page URLs are encrypted/obfuscated: each image URL requires a key from the chapter response to XOR-decrypt the actual URL
- Viewer is web-only for some titles

### `all/stashapp`

- Self-hosted; configure base URL
- GraphQL API

### `en/allanime` (GraphQL)

- Complex query variables: `translationType`, `countryOrigin`, search payload
- Episode-based (chapters are episodes); `episodeString` used as chapter number
- CDN for pages uses multiple quality tiers

### `en/asurascans`

- Cloudflare-protected
- Discord-gated content warnings on some chapters (just HTML, parseable)
- Chapter pages embedded as `` in a protected div; selector varies by site layout version

### `en/mangafire`

- React/NextJS SSR; some data in JSON embedded in `