Plan: Port Tachiyomi Extensions to Go
Reference Projects
- extensions-source: /Users/achmad/Documents/Belajar/Android/extensions-source
  - lib-multisrc/: base source implementations (Phase 3)
  - src/all/, src/en/: standalone sources (Phase 4)
  - core/src/: CatalogueSource/HttpSource interface contracts (Phase 1)
- Suwayomi-Server: /Users/achmad/Documents/Belajar/Web/Suwayomi-Server
  - server/src/main/kotlin/suwayomi/tachidesk/manga/model/table/: DB schema reference (Phase 2)
  - server/src/main/kotlin/suwayomi/tachidesk/server/database/migration/: migration patterns (Phase 2)
  - server/src/main/kotlin/suwayomi/tachidesk/manga/api/: API route patterns (Phase 5)
Context
Suwayomi-Server currently loads manga source extensions by downloading APKs, converting DEX bytecode to JARs, and instantiating Kotlin classes at runtime via reflection. The goal is a standalone Go service that reimplements all source logic, persists fetched data into PostgreSQL, and exposes a REST API for any consumer to query. The Go service is self-contained — no dependency on the JVM project.
Two rendering modes:
- Direct HTTP: REST/JSON API sources, fetched with plain net/http + JSON unmarshal
- FlareSolverr: Cloudflare-protected or JS-rendered sites. POST to FlareSolverr, get rendered HTML, parse with goquery
New Project: tachiyomi-go
Separate Go module. Uses PostgreSQL (same DB the Docker Compose in this repo already sets up).
Directory Structure
tachiyomi-go/
├── cmd/server/main.go
├── internal/
│   ├── source/
│   │   ├── types.go          # SManga, SChapter, Page, MangasPage, Filter
│   │   └── interfaces.go     # Source, CatalogueSource interfaces
│   ├── httpclient/
│   │   ├── client.go         # Base HTTP client, cookie jar, per-host rate limiter
│   │   ├── flaresolverr.go   # FlareSolverr integration
│   │   ├── graphql.go        # GraphQL POST helper
│   │   └── headers.go        # Common header builders
│   ├── parser/
│   │   └── html.go           # goquery helper wrappers
│   ├── db/
│   │   ├── db.go             # pgx pool init, migration runner
│   │   ├── queries/          # sqlc-generated query files
│   │   │   ├── manga.sql.go
│   │   │   ├── chapter.sql.go
│   │   │   ├── page.sql.go
│   │   │   └── source.sql.go
│   │   └── migrations/
│   │       ├── 001_init.sql
│   │       └── ...
│   └── registry/
│       └── registry.go       # Global source map, init-time registration
├── sources/
│   ├── base/                 # ~67 base source implementations
│   └── all/ + en/            # ~562 standalone source implementations
└── api/
    └── handler.go
Phase 1: Core Framework
1.1 Data Types (internal/source/types.go)
type SManga struct {
	URL          string
	Title        string
	Artist       string
	Author       string
	Description  string
	Genre        string // comma-separated
	Status       int    // 0=unknown, 1=ongoing, 2=completed, 3=licensed, 4=publishing finished, 5=cancelled, 6=on hiatus
	ThumbnailURL string
	Initialized  bool
}
type SChapter struct {
	URL           string
	Name          string
	DateUpload    int64 // unix milliseconds
	ChapterNumber float32
	Scanlator     string
}
type Page struct {
	Index    int
	URL      string
	ImageURL string
}
type MangasPage struct {
	Mangas      []SManga
	HasNextPage bool
}
Filter types: SelectFilter, TextFilter, CheckboxFilter, TriStateFilter, GroupFilter, SortFilter.
1.2 Source Interfaces (internal/source/interfaces.go)
type CatalogueSource interface {
	ID() int64
	Name() string
	Lang() string
	SupportsLatest() bool
	GetPopularManga(page int) (MangasPage, error)
	GetLatestUpdates(page int) (MangasPage, error)
	GetSearchManga(page int, query string, filters []Filter) (MangasPage, error)
	GetMangaDetails(manga SManga) (SManga, error)
	GetChapterList(manga SManga) ([]SChapter, error)
	GetPageList(chapter SChapter) ([]Page, error)
	GetImageURL(page Page) (string, error)
	GetFilterList() []Filter
}
ID Generation
Source IDs use the same formula as Tachiyomi/Suwayomi HttpSource.generateId:
key = strings.ToLower(name) + "/" + lang + "/" + strconv.Itoa(versionId)
hash = MD5(key) // 16 bytes
id = first 8 bytes as big-endian int64
id &= math.MaxInt64 // clear sign bit (Long.MAX_VALUE mask)
Default versionId is 1. The earlier note about Java's String.hashCode() in the original plan was incorrect — the authoritative source is HttpSource.kt in Suwayomi-Server.
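The formula above translates almost line for line into Go. The following is a minimal sketch, not the project's final code: the function name GenerateID and the sample source name are illustrative assumptions. One caveat to keep in mind is that strings.ToLower and Kotlin's lowercase() may differ for some non-ASCII names, so IDs of such sources should be spot-checked against the JVM output.

```go
package main

import (
	"crypto/md5"
	"encoding/binary"
	"fmt"
	"math"
	"strconv"
	"strings"
)

// GenerateID follows the formula described in the plan: MD5 over
// "<lowercased name>/<lang>/<versionId>", the first 8 bytes read as a
// big-endian integer, with the sign bit cleared.
// The function name is an assumption made for this sketch.
func GenerateID(name, lang string, versionID int) int64 {
	key := strings.ToLower(name) + "/" + lang + "/" + strconv.Itoa(versionID)
	sum := md5.Sum([]byte(key)) // 16-byte digest
	return int64(binary.BigEndian.Uint64(sum[:8]) & math.MaxInt64)
}

func main() {
	// "Example Source" is a made-up name used only for illustration.
	fmt.Println(GenerateID("Example Source", "en", 1))
}
```

Because the name is lowercased before hashing, "Example Source" and "example source" map to the same ID, while changing the language or versionId yields a different one.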
1.3 HTTP Client (internal/httpclient/client.go)
- Per-host rate limiting with golang.org/x/time/rate
- Persistent cookie jar per source instance
- Configurable timeout, user-agent, referer
- Transparent retry on 429 (honor Retry-After header)
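The 429 handling can be sketched with the standard library alone. This is an illustrative sketch under assumptions: the helper names retryAfterSeconds and getWithRetry, the attempt cap, and the fallback delay are not part of the plan, and the per-host limiter from golang.org/x/time/rate is left out to keep the example dependency-free.

```go
package main

import (
	"fmt"
	"net/http"
	"strconv"
	"strings"
	"time"
)

// retryAfterSeconds interprets a Retry-After header value, which is either a
// whole number of seconds or an HTTP date. nowUnix is the current time in
// Unix seconds. ok is false when the value is empty or unparseable.
func retryAfterSeconds(value string, nowUnix int64) (seconds int64, ok bool) {
	value = strings.TrimSpace(value)
	if value == "" {
		return 0, false
	}
	if n, err := strconv.ParseInt(value, 10, 64); err == nil {
		if n < 0 {
			return 0, false
		}
		return n, true
	}
	t, err := http.ParseTime(value)
	if err != nil {
		return 0, false
	}
	if wait := t.Unix() - nowUnix; wait > 0 {
		return wait, true
	}
	return 0, true // the date has already passed: retry immediately
}

// getWithRetry issues a GET and retries while the server answers 429,
// sleeping for the advertised delay (or fallback) between attempts.
// maxAttempts and fallback are assumptions chosen for this sketch.
func getWithRetry(client *http.Client, url string, maxAttempts int, fallback time.Duration) (*http.Response, error) {
	for attempt := 1; ; attempt++ {
		resp, err := client.Get(url)
		if err != nil {
			return nil, err
		}
		if resp.StatusCode != http.StatusTooManyRequests || attempt >= maxAttempts {
			return resp, nil
		}
		delay := fallback
		if s, ok := retryAfterSeconds(resp.Header.Get("Retry-After"), time.Now().Unix()); ok {
			delay = time.Duration(s) * time.Second
		}
		resp.Body.Close()
		time.Sleep(delay)
	}
}

func main() {
	secs, ok := retryAfterSeconds("120", time.Now().Unix())
	fmt.Println(secs, ok)
}
```

Keeping the header parsing separate from the retry loop makes it testable without a network round trip.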
1.4 FlareSolverr (internal/httpclient/flaresolverr.go)
POST {"cmd":"request.get","url":"...","maxTimeout":60000} to /v1. Extract solution.response (HTML) and solution.cookies. After first clearance, reuse cookies via normal HTTP client — only re-invoke FlareSolverr on 403.
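The request and response bodies for this exchange might be modelled as below. This is a sketch under assumptions: the type and function names are invented here, and the status and userAgent fields come from FlareSolverr's documented response shape rather than from this plan. The user agent is worth keeping because Cloudflare clearance cookies are normally only accepted when replayed with the user agent that obtained them.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// flareRequest is the payload POSTed to FlareSolverr's /v1 endpoint.
type flareRequest struct {
	Cmd        string `json:"cmd"`
	URL        string `json:"url"`
	MaxTimeout int    `json:"maxTimeout"`
}

// flareCookie is one entry of solution.cookies.
type flareCookie struct {
	Name   string `json:"name"`
	Value  string `json:"value"`
	Domain string `json:"domain"`
	Path   string `json:"path"`
}

// flareSolution holds the parts of "solution" this plan relies on.
type flareSolution struct {
	Response  string        `json:"response"` // rendered HTML
	Cookies   []flareCookie `json:"cookies"`
	UserAgent string        `json:"userAgent"`
}

type flareResponse struct {
	Status   string        `json:"status"`
	Message  string        `json:"message"`
	Solution flareSolution `json:"solution"`
}

// buildFlareRequest encodes a request.get command with a 60 s timeout.
func buildFlareRequest(target string) ([]byte, error) {
	return json.Marshal(flareRequest{Cmd: "request.get", URL: target, MaxTimeout: 60000})
}

// parseFlareResponse decodes the reply and rejects non-"ok" statuses.
func parseFlareResponse(raw []byte) (*flareSolution, error) {
	var r flareResponse
	if err := json.Unmarshal(raw, &r); err != nil {
		return nil, err
	}
	if r.Status != "ok" {
		return nil, fmt.Errorf("flaresolverr: status %q: %s", r.Status, r.Message)
	}
	return &r.Solution, nil
}

func main() {
	body, err := buildFlareRequest("https://example.com/")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(body))
}
```

The returned cookies can then be copied into the source's cookie jar so that later requests go through the normal HTTP client.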
1.5 GraphQL Helper (internal/httpclient/graphql.go)
Only used internally to call upstream sources that expose GraphQL APIs (mangahub, senkuro, allanime, luscious, stashapp). Our own API is REST.
type GraphQLRequest struct {
Query string `json:"query"`
Variables any `json:"variables"`
}
func Post(ctx context.Context, client *http.Client, url string, req GraphQLRequest, headers map[string]string) (*http.Response, error)
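One possible body for the Post helper is sketched below. The split into an encodeGraphQL helper is an assumption made here so the JSON encoding can be checked in isolation; the Content-Type and Accept headers are the conventional ones for GraphQL over HTTP, and per-source headers override them.

```go
package main

import (
	"bytes"
	"context"
	"encoding/json"
	"fmt"
	"net/http"
)

type GraphQLRequest struct {
	Query     string `json:"query"`
	Variables any    `json:"variables"`
}

// encodeGraphQL serialises the request into the JSON body sent upstream.
// Hypothetical helper, not named in the plan.
func encodeGraphQL(req GraphQLRequest) ([]byte, error) {
	return json.Marshal(req)
}

// Post sends a GraphQL query as an HTTP POST. Entries in headers are applied
// last, so a source can override the defaults (for example to add a cookie).
func Post(ctx context.Context, client *http.Client, url string, req GraphQLRequest, headers map[string]string) (*http.Response, error) {
	body, err := encodeGraphQL(req)
	if err != nil {
		return nil, fmt.Errorf("graphql: encode request: %w", err)
	}
	httpReq, err := http.NewRequestWithContext(ctx, http.MethodPost, url, bytes.NewReader(body))
	if err != nil {
		return nil, fmt.Errorf("graphql: build request: %w", err)
	}
	httpReq.Header.Set("Content-Type", "application/json")
	httpReq.Header.Set("Accept", "application/json")
	for k, v := range headers {
		httpReq.Header.Set(k, v)
	}
	return client.Do(httpReq)
}

func main() {
	body, err := encodeGraphQL(GraphQLRequest{
		Query:     "query($id: Int!) { manga(id: $id) { title } }",
		Variables: map[string]any{"id": 1},
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(string(body))
}
```

The caller remains responsible for closing the response body and for decoding the data and errors fields of the GraphQL reply.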
1.6 HTML Parser (internal/parser/html.go)
Thin wrappers over github.com/PuerkitoBio/goquery (Go equivalent of JSoup):
func Parse(html string) (*goquery.Document, error)
func Select(doc *goquery.Document, css string) *goquery.Selection
func SelectFrom(sel *goquery.Selection, css string) *goquery.Selection
func Attr(sel *goquery.Selection, name string) string
func AbsURL(sel *goquery.Selection, attr string, baseURL string) string
func OwnText(sel *goquery.Selection) string
func TextTrim(sel *goquery.Selection) string
func First(sel *goquery.Selection) *goquery.Selection
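The URL resolution that AbsURL needs does not depend on goquery and can be sketched with net/url. The helper name resolveURL is an assumption; in the real wrapper the ref argument would come from the selection's attribute value.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// resolveURL turns an attribute value (href, src, data-src, ...) into an
// absolute URL, resolving it against the page's base URL the way a browser
// would. Absolute and protocol-relative references are handled as well.
func resolveURL(baseURL, ref string) (string, error) {
	base, err := url.Parse(baseURL)
	if err != nil {
		return "", fmt.Errorf("parse base %q: %w", baseURL, err)
	}
	rel, err := url.Parse(strings.TrimSpace(ref))
	if err != nil {
		return "", fmt.Errorf("parse ref %q: %w", ref, err)
	}
	return base.ResolveReference(rel).String(), nil
}

func main() {
	for _, ref := range []string{"/cover.jpg", "../two", "//cdn.example.net/a.png"} {
		abs, err := resolveURL("https://example.com/manga/one/", ref)
		if err != nil {
			panic(err)
		}
		fmt.Println(abs)
	}
}
```

This mirrors what JSoup's absUrl does for the Kotlin extensions, which matters because many sites emit relative or protocol-relative image links.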
1.7 Registry (internal/registry/registry.go)
var mu sync.RWMutex
var sources = map[int64]source.CatalogueSource{}
func Register(s source.CatalogueSource)
func Get(id int64) (source.CatalogueSource, bool)
func All() []source.CatalogueSource
Each source package calls registry.Register(NewMySource()) in its init() function. All source packages are blank-imported in cmd/server/main.go so their init() runs at startup.
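A self-contained sketch of the registry follows. It uses a trimmed-down CatalogueSource interface and a stub source so it compiles on its own. Panicking on a duplicate ID and returning All() sorted by ID are design assumptions of this sketch, not decisions stated in the plan.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// CatalogueSource is reduced to two methods here; the real interface lives
// in internal/source.
type CatalogueSource interface {
	ID() int64
	Name() string
}

var (
	mu      sync.RWMutex
	sources = map[int64]CatalogueSource{}
)

// Register adds a source. A duplicate ID panics so that a hash collision or
// a double import surfaces at startup rather than at request time.
func Register(s CatalogueSource) {
	mu.Lock()
	defer mu.Unlock()
	if _, dup := sources[s.ID()]; dup {
		panic(fmt.Sprintf("registry: duplicate source id %d (%s)", s.ID(), s.Name()))
	}
	sources[s.ID()] = s
}

// Get looks a source up by ID.
func Get(id int64) (CatalogueSource, bool) {
	mu.RLock()
	defer mu.RUnlock()
	s, ok := sources[id]
	return s, ok
}

// All returns every registered source, ordered by ID for stable output.
func All() []CatalogueSource {
	mu.RLock()
	defer mu.RUnlock()
	out := make([]CatalogueSource, 0, len(sources))
	for _, s := range sources {
		out = append(out, s)
	}
	sort.Slice(out, func(i, j int) bool { return out[i].ID() < out[j].ID() })
	return out
}

// stubSource stands in for a real source package.
type stubSource struct {
	id   int64
	name string
}

func (s stubSource) ID() int64    { return s.id }
func (s stubSource) Name() string { return s.name }

func main() {
	Register(stubSource{id: 2, name: "second"})
	Register(stubSource{id: 1, name: "first"})
	for _, s := range All() {
		fmt.Println(s.ID(), s.Name())
	}
}
```

In a real source package the Register call would sit in init(), and cmd/server/main.go would pull the package in with a blank import.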
Phase 2: Database Layer
2.1 Decision: New Schema Compatible with Suwayomi-Server
Suwayomi-Server uses either H2 (embedded) or PostgreSQL via Exposed ORM. The existing table structure (MangaTable, ChapterTable, PageTable, SourceTable) is a good reference. We adapt it for Go/PostgreSQL with a compatible schema so data can be shared if both services point to the same DB.
Key differences from Suwayomi-Server schema:
- No ExtensionTable (sources are compiled in, not loaded from APKs)
- SourceTable has no extension FK; sources are identified by their built-in ID
- Add fetched_at timestamps on manga list results for cache invalidation
- Use BIGSERIAL primary keys where Suwayomi uses IntIdTable
2.2 Schema (internal/db/migrations/001_init.sql)
CREATE TABLE sources (
    id BIGINT PRIMARY KEY, -- generated from name, lang, and versionId with the MD5-based formula in "ID Generation"
    name VARCHAR(128) NOT NULL,
    lang VARCHAR(32) NOT NULL,
    is_nsfw BOOLEAN NOT NULL DEFAULT FALSE
);
CREATE TABLE manga (
    id SERIAL PRIMARY KEY,
    source_id BIGINT NOT NULL REFERENCES sources(id),
    url VARCHAR(2048) NOT NULL,
    title VARCHAR(512) NOT NULL,
    initialized BOOLEAN NOT NULL DEFAULT FALSE,
    artist TEXT,
    author TEXT,
    description TEXT,
    genre TEXT, -- comma-separated
    status INTEGER NOT NULL DEFAULT 0,
    thumbnail_url VARCHAR(2048),
    thumbnail_last_fetched BIGINT NOT NULL DEFAULT 0,
    in_library BOOLEAN NOT NULL DEFAULT FALSE,
    in_library_at BIGINT NOT NULL DEFAULT 0,
    real_url VARCHAR(2048),
    last_fetched_at BIGINT NOT NULL DEFAULT 0,
    chapters_last_fetched_at BIGINT NOT NULL DEFAULT 0,
    update_strategy VARCHAR(64) NOT NULL DEFAULT 'ALWAYS_UPDATE',
    UNIQUE (source_id, url)
);
CREATE TABLE chapters (
    id SERIAL PRIMARY KEY,
    manga_id INTEGER NOT NULL REFERENCES manga(id) ON DELETE CASCADE,
    url VARCHAR(2048) NOT NULL,
    name VARCHAR(512) NOT NULL,
    date_upload BIGINT NOT NULL DEFAULT 0,
    chapter_number REAL NOT NULL DEFAULT -1,
    scanlator VARCHAR(256),
    source_order INTEGER NOT NULL,
    is_read BOOLEAN NOT NULL DEFAULT FALSE,
    is_bookmarked BOOLEAN NOT NULL DEFAULT FALSE,
    last_page_read INTEGER NOT NULL DEFAULT 0,
    last_read_at BIGINT NOT NULL DEFAULT 0,
    fetched_at BIGINT NOT NULL DEFAULT 0,
    real_url VARCHAR(2048),
    is_downloaded BOOLEAN NOT NULL DEFAULT FALSE,
    page_count INTEGER NOT NULL DEFAULT -1,
    UNIQUE (manga_id, url)
);
CREATE TABLE pages (
    id SERIAL PRIMARY KEY,
    chapter_id INTEGER NOT NULL REFERENCES chapters(id) ON DELETE CASCADE,
    "index" INTEGER NOT NULL,
    url VARCHAR(2048) NOT NULL,
    image_url TEXT
);
CREATE TABLE source_meta (
    source_id BIGINT NOT NULL REFERENCES sources(id),
    key VARCHAR(256) NOT NULL,
    value TEXT NOT NULL,
    PRIMARY KEY (source_id, key)
);
2.3 Tooling
- Driver: github.com/jackc/pgx/v5 (pgx native, no database/sql overhead)
- Query gen: sqlc (write SQL queries in .sql files, generate type-safe Go functions)
- Migrations: golang-migrate/migrate with pgx driver, runs on startup
- Connection pool: pgxpool.Pool with configurable max conns
2.4 Data Flow
API call → source fetches data → upsert into DB → return from API.
Manga list (GetPopularManga): Upsert each SManga into manga (on conflict source_id, url update title/thumbnail/status). Update last_fetched_at. Return from DB.
Manga detail (GetMangaDetails): Fetch full detail from source, upsert all fields into manga, set initialized=true.
Chapter list (GetChapterList): Upsert each SChapter into chapters (on conflict manga_id, url update name/date/chapter_number). Update chapters_last_fetched_at on manga row.
Page list (GetPageList): Insert pages into pages (skip if already present). If source requires a second call to resolve image URLs (GetImageURL), store resolved image_url.
Cache: If last_fetched_at is within TTL (configurable, default 10 min for lists, 1h for details), serve from DB without hitting the source. TTL bypassed by ?refresh=true.
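The freshness check behind this cache rule can be sketched as a pure function. It assumes last_fetched_at is stored in Unix milliseconds (the schema does not state the unit) and treats 0, the column default, as "never fetched". The names isFresh, listTTL and detailTTL are illustrative.

```go
package main

import (
	"fmt"
	"time"
)

// Default TTLs from the plan: 10 minutes for lists, 1 hour for details.
const (
	listTTL   = 10 * time.Minute
	detailTTL = time.Hour
)

// isFresh reports whether a row fetched at lastFetchedMs may be served from
// the DB at nowMs. Both timestamps are Unix milliseconds (an assumption of
// this sketch). refresh corresponds to ?refresh=true and always forces a
// re-fetch; a zero timestamp means the row was never fetched.
func isFresh(lastFetchedMs, nowMs int64, ttl time.Duration, refresh bool) bool {
	if refresh || lastFetchedMs <= 0 {
		return false
	}
	return nowMs-lastFetchedMs < ttl.Milliseconds()
}

func main() {
	now := time.Now().UnixMilli()
	fiveMinAgo := now - 5*60*1000
	fmt.Println(isFresh(fiveMinAgo, now, listTTL, false)) // within the list TTL
	fmt.Println(isFresh(fiveMinAgo, now, listTTL, true))  // refresh bypasses the TTL
}
```

Passing the current time in rather than calling time.Now() inside keeps the rule deterministic under test.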
Phase 3: Base Source Implementations
Group A — WordPress/CMS HTML Scrapers
| Base | Endpoint pattern | CF? |
|---|---|---|
| madara | POST {base}/wp-admin/admin-ajax.php (list), GET {url} (detail/chapters) | Yes |
| mangathemesia | GET {base}/{dir}/?page={n}, GET {url} | Yes |
| madtheme | GET {base}/search?page={n} (all list types) | Yes |
| wpcomics | GET {base}/{popularPath}?page={n} | Yes |
| fmreader | GET {base}/{requestPath}?page={n}&sort=... | Yes |
| mmrcms | GET {base}/filterList?page={n}&sortBy=views, POST {base}/advSearchFilter | No |
| mangareader | GET {base}/?page={n}&type={t} | Yes |
| zmanga | GET {base}/advanced-search/page/{n}/?order=popular | Yes |
| mangaworld | GET {base}/archive?sort=most_read&page={n} | Yes |
| grouple | GET {base}/list?sortType=rate&offset={50*(n-1)} | No |
| foolslide | GET {base}/directory/{n}/ + JSON chapter API | No |
| liliana | GET {base}/ranking/week/{n} | Yes |
| scanreader | GET {base}/bibliotheque/page/{n}/?sort=views | No |
| gigaviewer | GET {base}/series (all at once, no pagination) | Yes |
| Others (mangawork, manga18, manhwaz, masonry, multichan, sinmh, etc.) | Various HTML GET | Most Yes |
All Group A bases use goquery selectors. Each has a config struct of overridable CSS selectors. FlareSolverr used when CF=Yes.
Group B — JSON REST API Sources
| Base | Key endpoints | CF? | Auth |
|---|---|---|---|
| heancms | GET {api}/series?page={n}, GET {api}/chapter/query?series_slug={s} | No | None |
| iken | GET {api}/comic?order=view&page={n} | Yes | CF cookies |
| hentaihand | GET {base}/api/comics?page={n}&order_by=... | No | None |
| pizzareader | GET {api}/comics, GET {api}/comics/{slug} | Yes | None |
| gmanga | GET {base}/api/releases?page={n}, POST {base}/api/mangas/search | No | None |
| spicytheme | GET {base}/api/... | Yes | None |
| zeistmanga | Blogger Feeds JSON API | Yes | None |
| mccms | REST JSON | Yes | None |
| kemono | GET {base}/api/v1/creators, GET {base}/api/v1/{service}/{creator}/posts | Yes | None |
| lectormoe | REST JSON | Yes | Token |
| libgroup | GET {api}/api/latest-updates, GET {api}/api/auth/me | Yes | WebView token → use FlareSolverr to obtain |
| mangabox | REST JSON | No | None |
| mangadventure | REST JSON | No | None |
| ezmanhwa | REST JSON | No | None |
| monochrome | REST JSON | No | None |
Group C — GraphQL Sources
| Base | Endpoint | Notes |
|---|---|---|
| mangahub | POST {api}/graphql | Cookie mhub_access acquired via intermediate GET to a random chapter URL |
| senkuro | POST {api}/graphql | API domain configurable via preferences |
Group D — Special/Unique Sources
| Base | Pattern | Gotcha |
|---|---|---|
| mangotheme | JSON list + XOR/AES-encrypted page URLs | Implement page URL decryption in Go; key embedded in JS |
| mmlook | JSON + encrypted pages + CF | Page decryption + FlareSolverr |
| guya | GET {base}/api/get_all_series/ (all manga at once) | No pagination; scanlation group filter in response |
| bakkin | Single JSON URL, no list/search | Enumerate from object keys |
| gigaviewer | All series in one page HTML | Client-side filter only; latest = same request |
Phase 4: Standalone Sources — Notable Gotchas
all/mangadex
- Complex filter system: tags (AND/OR modes), demographics, content rating, publication status, sort
- Cover art comes from a separate covers relationship: make a second API call or include via includes[]=cover_art
- Chapter language filtering: only fetch translatedLanguage[]=en (or user-configured)
- Rate limit: 5 req/s global, stricter for search
- at-home server URL for pages: GET /at-home/server/{chapterId} returns CDN base URL; pages = {baseUrl}/{quality}/{hash}/{filename}
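The page URL assembly from the at-home response might look like the sketch below. The JSON field names (baseUrl, chapter.hash, chapter.data, chapter.dataSaver) and the two quality segments (data, data-saver) follow MangaDex's public API documentation as best recalled here and should be confirmed against it; the type and function names are assumptions.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// atHomeResponse models the part of GET /at-home/server/{chapterId} that is
// needed to build image URLs.
type atHomeResponse struct {
	BaseURL string `json:"baseUrl"`
	Chapter struct {
		Hash      string   `json:"hash"`
		Data      []string `json:"data"`      // original-quality filenames
		DataSaver []string `json:"dataSaver"` // compressed filenames
	} `json:"chapter"`
}

// pageURLs builds {baseUrl}/{quality}/{hash}/{filename} for every page.
func pageURLs(raw []byte, dataSaver bool) ([]string, error) {
	var r atHomeResponse
	if err := json.Unmarshal(raw, &r); err != nil {
		return nil, err
	}
	if r.BaseURL == "" || r.Chapter.Hash == "" {
		return nil, fmt.Errorf("at-home: missing baseUrl or chapter hash")
	}
	quality, files := "data", r.Chapter.Data
	if dataSaver {
		quality, files = "data-saver", r.Chapter.DataSaver
	}
	urls := make([]string, 0, len(files))
	for _, f := range files {
		urls = append(urls, r.BaseURL+"/"+quality+"/"+r.Chapter.Hash+"/"+f)
	}
	return urls, nil
}

func main() {
	// Hand-written sample payload, not a captured API response.
	sample := []byte(`{"baseUrl":"https://cdn.example.org","chapter":{"hash":"abc123","data":["1.png","2.png"],"dataSaver":["1.jpg","2.jpg"]}}`)
	urls, err := pageURLs(sample, false)
	if err != nil {
		panic(err)
	}
	for _, u := range urls {
		fmt.Println(u)
	}
}
```

Since the CDN base URL is handed out per request, these URLs are better resolved at read time than cached for long.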
all/nhentaicom
- Cloudflare-protected
- Tag/artist/character search with prefix syntax: tag:isekai artist:...
- Pages come from a JSON blob embedded in the HTML (JSON.parse(document.getElementById('...')))
all/komga
- Self-hosted; user must configure base URL + credentials (Basic Auth)
- Series = manga, Books = chapters, Pages from book API
- Supports CBZ/PDF libraries — page URL is a direct book page endpoint
all/e621
- Pools = manga (collections of posts), Posts = pages
- Basic Auth required for higher rate limits and adult content
- Nested tag exclusion (e.g. rating:s) needs proper encoding
all/kemono
- Cloudflare-protected
- Service + creator = manga (e.g. Patreon/PixivFANBOX creator)
- Posts = chapters; attachments/files within a post = pages
- File URLs may be relative to {base}/data
all/danbooru
- Tag-based search (up to 2 tags for free accounts)
- Gold/Platinum tier content only accessible with credentials
- Pools as manga, pool posts as pages
all/pixiv
- Session cookie (PHPSESSID) auth; no public API key
- Illust series = manga; user illustrations = chapters
- Multi-tier image URLs: thumb_mini, small, regular, original. Requests must send the correct Referer header or get 403
- R18 content requires age-verified account
all/luscious (GraphQL)
- GraphQL POST to /graphql/playground
- Albums = manga, Pictures = pages
- Adult content; account may be required for some content
all/mangaplus
- Official Shueisha app API
- Uses protobuf OR a JSON endpoint (/api/title_detail?title_id={id})
- Page images are encrypted/obfuscated: each page entry in the chapter response carries a key that is used to XOR-decrypt the downloaded image data
- Viewer is web-only for some titles
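The XOR step mentioned for MangaPlus can be sketched as a repeating-key XOR over a byte slice. This assumes the key arrives as a hex string and is applied cyclically; both points, along with the function name, are assumptions to verify against the Kotlin extension. The routine is the same whichever payload the key turns out to protect.

```go
package main

import (
	"encoding/hex"
	"errors"
	"fmt"
)

// xorDecode XORs data with a hex-encoded key, repeating the key as needed.
// XOR is its own inverse, so the same call both obfuscates and restores.
func xorDecode(data []byte, hexKey string) ([]byte, error) {
	key, err := hex.DecodeString(hexKey)
	if err != nil {
		return nil, fmt.Errorf("xor: bad hex key: %w", err)
	}
	if len(key) == 0 {
		return nil, errors.New("xor: empty key")
	}
	out := make([]byte, len(data))
	for i, b := range data {
		out[i] = b ^ key[i%len(key)]
	}
	return out, nil
}

func main() {
	plain := []byte("page bytes")
	obfuscated, err := xorDecode(plain, "01ff")
	if err != nil {
		panic(err)
	}
	restored, err := xorDecode(obfuscated, "01ff")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(restored))
}
```

Writing into a fresh slice rather than in place keeps the input reusable, at the cost of one extra allocation per image.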
all/stashapp
- Self-hosted; configure base URL
- GraphQL API
en/allanime (GraphQL)
- Complex query variables: translationType, countryOrigin, search payload
- Episode-based (chapters are episodes); episodeString used as chapter number
- CDN for pages uses multiple quality tiers
en/asurascans
- Cloudflare-protected
- Discord-gated content warnings on some chapters (just HTML, parseable)
- Chapter pages embedded as <img> in a protected div; selector varies by site layout version
en/mangafire
- React/NextJS SSR; some data in JSON embedded in <script id="__NEXT_DATA__"> tag
- Extract and parse the JSON blob rather than scraping DOM
en/webtoons
- Official API with Sec-Webtoon-Client-Data HMAC header: compute HMAC-SHA256 over {url}_{timestamp} with a fixed key baked into the app
- Mobile API differs from web API; use web API for broader access
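The signing step can be sketched with crypto/hmac. Several details are assumptions that the plan does not specify and that must be checked against the Kotlin source: the timestamp unit (milliseconds here), the output encoding (hex here), and the key, which is a placeholder and not the real one.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strconv"
)

// signRequest computes HMAC-SHA256 over "{url}_{timestamp}" and returns the
// digest as lowercase hex. Encoding and timestamp unit are assumptions.
func signRequest(rawURL string, timestampMs int64, key []byte) string {
	mac := hmac.New(sha256.New, key)
	mac.Write([]byte(rawURL + "_" + strconv.FormatInt(timestampMs, 10)))
	return hex.EncodeToString(mac.Sum(nil))
}

func main() {
	key := []byte("placeholder-key") // NOT the real key; illustration only
	sig := signRequest("https://example.com/api/list?page=1", 1700000000000, key)
	fmt.Println(len(sig), sig)
}
```

The signature changes with the timestamp, so it has to be recomputed per request rather than cached alongside the URL.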
en/mangadraft
- NextJS SSR; data in __NEXT_DATA__ JSON blob
- Authentication state affects available chapters
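Pulling the __NEXT_DATA__ blob out of a page, as described for en/mangafire and en/mangadraft, can be sketched with a regular expression and encoding/json. The function name and the sample HTML are assumptions; a goquery selector on script#__NEXT_DATA__ would work equally well once the parser package exists.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"regexp"
)

// nextDataRe captures the body of <script id="__NEXT_DATA__" ...>.
// (?s) lets . match newlines, since the blob may span several lines.
var nextDataRe = regexp.MustCompile(`(?s)<script[^>]*\bid="__NEXT_DATA__"[^>]*>(.*?)</script>`)

// extractNextData returns the decoded JSON object embedded by Next.js.
func extractNextData(html string) (map[string]any, error) {
	m := nextDataRe.FindStringSubmatch(html)
	if m == nil {
		return nil, errors.New("__NEXT_DATA__ script tag not found")
	}
	var data map[string]any
	if err := json.Unmarshal([]byte(m[1]), &data); err != nil {
		return nil, fmt.Errorf("decode __NEXT_DATA__: %w", err)
	}
	return data, nil
}

func main() {
	// Hand-written sample page, not a capture from a real site.
	page := `<html><body><script id="__NEXT_DATA__" type="application/json">{"props":{"pageProps":{"title":"Demo"}}}</script></body></html>`
	data, err := extractNextData(page)
	if err != nil {
		panic(err)
	}
	fmt.Println(data["props"])
}
```

In a real source the generic map would be replaced by a struct matching the site's pageProps layout, so a shape change fails loudly at decode time.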
en/globalcomix
- Paginated REST API
- Issue-based chapters; page images behind signed CDN URLs with short expiry — don't cache image URLs
en/bookwalker
- WebView/JS required for page rendering (DRM-protected)
- Only metadata (title, cover, description) is accessible without purchase; mark all pages as unavailable or skip GetPageList
Phase 5: HTTP API (REST)
The service exposes a plain REST API over HTTP. No GraphQL. (Note: some upstream manga sources internally use GraphQL — those are handled by the internal/httpclient/graphql.go helper when calling their APIs, but our API is always REST JSON.)
GET /api/sources → [{id, name, lang, supportsLatest, isNsfw}]
GET /api/sources/{id}/popular?page=1 → {mangas:[...], hasNextPage:bool} — from DB if cached
GET /api/sources/{id}/latest?page=1 → {mangas:[...], hasNextPage:bool}
GET /api/sources/{id}/search?q=&page=1 → {mangas:[...], hasNextPage:bool}
GET /api/sources/{id}/filters → [{type, name, values:[...]}]
GET /api/sources/{id}/manga?url={url} → {id, title, author, ...} — fetches detail + upserts
GET /api/sources/{id}/manga/chapters?url={url} → [{url, name, chapterNumber, ...}]
GET /api/sources/{id}/manga/chapters/pages?url={url}&chapterUrl={chapterUrl} → [{index, url, imageUrl}]
GET /api/manga/{id} → full manga row from DB
GET /api/manga/{id}/chapters → chapter list from DB
GET /api/chapters/{id} → chapter row from DB
GET /api/chapters/{id}/pages → page list from DB
GET /api/image?url={encoded}&source_id={id} → proxied image bytes, correct Content-Type header
Query param ?refresh=true bypasses TTL and forces a re-fetch from the source.
All errors: {"error": "message"} with appropriate HTTP status code.
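The shared request plumbing (page and refresh parsing plus the error envelope) can be sketched with the standard library. The handler below is a placeholder that only echoes the parsed parameters; defaulting page to 1, rejecting page values below 1, and the names used are assumptions of this sketch. Router wiring with chi is omitted.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
	"net/url"
	"strconv"
)

// listParams holds the query parameters shared by the list endpoints.
type listParams struct {
	Page    int
	Refresh bool
}

// parseListParams reads ?page= and ?refresh= from a raw query string.
func parseListParams(rawQuery string) (listParams, error) {
	p := listParams{Page: 1}
	q, err := url.ParseQuery(rawQuery)
	if err != nil {
		return p, err
	}
	if v := q.Get("page"); v != "" {
		n, err := strconv.Atoi(v)
		if err != nil || n < 1 {
			return p, fmt.Errorf("invalid page %q", v)
		}
		p.Page = n
	}
	p.Refresh = q.Get("refresh") == "true"
	return p, nil
}

// writeError emits the {"error": "message"} envelope with the given status.
func writeError(w http.ResponseWriter, status int, msg string) {
	w.Header().Set("Content-Type", "application/json")
	w.WriteHeader(status)
	_ = json.NewEncoder(w).Encode(map[string]string{"error": msg})
}

// popularHandler is a stand-in: a real handler would consult the cache and
// the source instead of echoing the parameters back.
func popularHandler(w http.ResponseWriter, r *http.Request) {
	p, err := parseListParams(r.URL.RawQuery)
	if err != nil {
		writeError(w, http.StatusBadRequest, err.Error())
		return
	}
	w.Header().Set("Content-Type", "application/json")
	_ = json.NewEncoder(w).Encode(map[string]any{"page": p.Page, "refresh": p.Refresh})
}

func main() {
	rec := httptest.NewRecorder()
	req := httptest.NewRequest(http.MethodGet, "/api/sources/1/popular?page=0", nil)
	popularHandler(rec, req)
	fmt.Println(rec.Code, rec.Body.String())
}
```

Centralising the error envelope in one function keeps every route's failures in the same shape, which is what the "All errors" rule asks for.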
Dependencies (Go)
github.com/PuerkitoBio/goquery # HTML parsing (JSoup equivalent)
golang.org/x/time/rate # Per-host rate limiting
github.com/go-chi/chi/v5 # HTTP router
github.com/jackc/pgx/v5 # PostgreSQL driver + pool
github.com/golang-migrate/migrate # DB migrations
github.com/sqlc-dev/sqlc # SQL → Go code generation (dev dependency)
encoding/json # stdlib
net/http # stdlib
Implementation Order
1. internal/source/: types + interfaces
2. internal/httpclient/: client, FlareSolverr, GraphQL helper
3. internal/parser/html.go
4. internal/db/: schema, migrations, pgx pool, sqlc queries
5. internal/registry/registry.go
6. api/handler.go + cmd/server/main.go
7. Bases (simple → complex):
   - heancms → iken → hentaihand → pizzareader → gmanga (JSON)
   - keyoapp → wpcomics → fmreader → madtheme (HTML)
   - madara → mangathemesia (complex AJAX)
   - mangahub + senkuro (GraphQL)
   - mangotheme + mmlook (encryption)
   - libgroup (WebView auth via FlareSolverr)
   - remaining bases alphabetically
8. Standalone all/: JSON-API first (mangadex, kemono, e621, komga), then CF/HTML
9. Standalone en/: JSON-API first, then CF/HTML
10. Wire DB upserts into all source call paths
Verification
1. GET /api/sources lists all registered sources
2. GET /api/sources/{heancms_id}/popular?page=1 returns ≥1 manga, data persisted in manga table
3. A second GET /api/sources/{heancms_id}/popular?page=1 call is served from DB (no HTTP call to source, verify via logs)
4. The source chapter-list route (Phase 5) returns chapters for a known manga URL, persisted in chapters table
5. The source page-list route (Phase 5) on a madara source resolves image URLs via FlareSolverr path
6. GET /api/image?url=... proxies correctly with right Content-Type
7. ?refresh=true forces re-fetch and updates DB records
8. Run psql against the DB and confirm rows in manga, chapters, pages tables after API calls