From 85d2ea6143ff4ee2fd08c4bad87362e0da3ab57d Mon Sep 17 00:00:00 2001
From: achmad
Date: Sun, 10 May 2026 21:23:24 +0700
Subject: [PATCH] =?UTF-8?q?feat:=20initial=20Phase=201=20implementation=20?=
 =?UTF-8?q?=E2=80=94=20core=20framework=20+=20Docker?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

- Data types (SManga, SChapter, Page, MangasPage, all Filter variants)
- Source interfaces (Source, CatalogueSource) with MD5-based ID generation matching Tachiyomi/Suwayomi
- HTTP client with per-host rate limiting, cookie jar, and 429 retry
- FlareSolverr v1 client (FLARESOLVERR_URL env)
- Generic GraphQL POST helper
- goquery HTML parser wrappers
- Source registry (panics on duplicate ID)
- Multi-stage Dockerfile (golang:1.26-alpine + distroless) and compose.yml (postgres, flaresolverr, app)
---
 .gitignore                          |  17 +
 Dockerfile                          |  17 +
 PLAN.md                             | 521 ++++++++++++++++++++++
 TODO.md                             |  12 +
 cmd/server/main.go                  |  22 +
 compose.yml                         |  40 ++
 docs/phase1-core-framework.md       | 170 +++++++
 docs/phase2-database.md             | 183 ++++++++
 docs/phase3-bases.md                | 261 +++++++++++
 docs/phase4-standalone.md           | 665 ++++++++++++++++++++++++++++
 docs/phase5-api.md                  | 180 ++++++++
 go.mod                              |  13 +
 go.sum                              |  73 +++
 internal/httpclient/client.go       | 147 ++++++
 internal/httpclient/flaresolverr.go |  95 ++++
 internal/httpclient/graphql.go      |  55 +++
 internal/httpclient/headers.go      |  42 ++
 internal/parser/html.go             |  67 +++
 internal/registry/registry.go       |  43 ++
 internal/registry/registry_test.go  |  39 ++
 internal/source/interfaces.go       |  59 +++
 internal/source/interfaces_test.go  |  28 ++
 internal/source/types.go            | 115 +++++
 23 files changed, 2864 insertions(+)
 create mode 100644 .gitignore
 create mode 100644 Dockerfile
 create mode 100644 PLAN.md
 create mode 100644 TODO.md
 create mode 100644 cmd/server/main.go
 create mode 100644 compose.yml
 create mode 100644 docs/phase1-core-framework.md
 create mode 100644 docs/phase2-database.md
 create mode 100644 docs/phase3-bases.md
 create mode 100644 docs/phase4-standalone.md
 create mode 100644 docs/phase5-api.md
 create mode 100644 go.mod
 create mode 100644 go.sum
 create mode 100644 internal/httpclient/client.go
 create mode 100644 internal/httpclient/flaresolverr.go
 create mode 100644 internal/httpclient/graphql.go
 create mode 100644 internal/httpclient/headers.go
 create mode 100644 internal/parser/html.go
 create mode 100644 internal/registry/registry.go
 create mode 100644 internal/registry/registry_test.go
 create mode 100644 internal/source/interfaces.go
 create mode 100644 internal/source/interfaces_test.go
 create mode 100644 internal/source/types.go

diff --git a/.gitignore b/.gitignore
new file mode 100644
index 0000000..e390285
--- /dev/null
+++ b/.gitignore
@@ -0,0 +1,17 @@
+# Binaries
+/goyomi
+/cmd/server/server
+
+# Environment
+.env
+.env.*
+
+# Go tooling
+*.test
+*.out
+/vendor/
+
+# IDE
+.idea/
+.vscode/
+*.swp
diff --git a/Dockerfile b/Dockerfile
new file mode 100644
index 0000000..cf34b3d
--- /dev/null
+++ b/Dockerfile
@@ -0,0 +1,17 @@
+FROM golang:1.26-alpine AS builder
+
+WORKDIR /app
+
+COPY go.mod go.sum ./
+RUN go mod download
+
+COPY . .
+RUN CGO_ENABLED=0 GOOS=linux go build -trimpath -ldflags="-s -w" -o /goyomi ./cmd/server
+
+FROM gcr.io/distroless/static-debian12
+
+COPY --from=builder /goyomi /goyomi
+
+EXPOSE 8080
+
+ENTRYPOINT ["/goyomi"]
diff --git a/PLAN.md b/PLAN.md
new file mode 100644
index 0000000..f64de30
--- /dev/null
+++ b/PLAN.md
@@ -0,0 +1,521 @@
+# Plan: Port Tachiyomi Extensions to Go
+
+## Reference Projects
+
+- **extensions-source**: `/Users/achmad/Documents/Belajar/Android/extensions-source`
+  - `lib-multisrc/` — base source implementations (Phase 3)
+  - `src/all/`, `src/en/` — standalone sources (Phase 4)
+  - `core/src/` — CatalogueSource/HttpSource interface contracts (Phase 1)
+- **Suwayomi-Server**: `/Users/achmad/Documents/Belajar/Web/Suwayomi-Server`
+  - `server/src/main/kotlin/suwayomi/tachidesk/manga/model/table/` — DB schema reference (Phase 2)
+  - `server/src/main/kotlin/suwayomi/tachidesk/server/database/migration/` — migration patterns (Phase 2)
+  - `server/src/main/kotlin/suwayomi/tachidesk/manga/api/` — API route patterns (Phase 5)
+
+---
+
+## Context
+
+Suwayomi-Server currently loads manga source extensions by downloading APKs, converting DEX bytecode to JARs, and instantiating Kotlin classes at runtime via reflection. The goal is a standalone Go service that reimplements all source logic, persists fetched data into PostgreSQL, and exposes a REST API for any consumer to query. The Go service is self-contained — no dependency on the JVM project.
+
+Two rendering modes:
+- **Direct HTTP**: REST/JSON API sources — plain `net/http` + JSON unmarshal
+- **FlareSolverr**: Cloudflare-protected or JS-rendered sites — POST to FlareSolverr, get rendered HTML, parse with goquery
+
+---
+
+## New Project: `tachiyomi-go`
+
+Separate Go module. Uses PostgreSQL (same DB the Docker Compose in this repo already sets up).
+
+### Directory Structure
+
+```
+tachiyomi-go/
+├── cmd/server/main.go
+├── internal/
+│   ├── source/
+│   │   ├── types.go          # SManga, SChapter, Page, MangasPage, Filter
+│   │   └── interfaces.go     # Source, CatalogueSource interfaces
+│   ├── httpclient/
+│   │   ├── client.go         # Base HTTP client, cookie jar, per-host rate limiter
+│   │   ├── flaresolverr.go   # FlareSolverr integration
+│   │   ├── graphql.go        # GraphQL POST helper
+│   │   └── headers.go        # Common header builders
+│   ├── parser/
+│   │   └── html.go           # goquery helper wrappers
+│   ├── db/
+│   │   ├── db.go             # pgx pool init, migration runner
+│   │   ├── queries/          # sqlc-generated query files
+│   │   │   ├── manga.sql.go
+│   │   │   ├── chapter.sql.go
+│   │   │   ├── page.sql.go
+│   │   │   └── source.sql.go
+│   │   └── migrations/
+│   │       ├── 001_init.sql
+│   │       └── ...
+│   └── registry/
+│       └── registry.go       # Global source map, init-time registration
+├── sources/
+│   ├── base/                 # ~67 base source implementations
+│   └── all/ + en/            # ~562 standalone source implementations
+└── api/
+    └── handler.go
+```
+
+---
+
+## Phase 1: Core Framework
+
+### 1.1 Data Types (`internal/source/types.go`)
+
+```go
+type SManga struct {
+    URL          string
+    Title        string
+    Artist       string
+    Author       string
+    Description  string
+    Genre        string // comma-separated
+    Status       int    // 0=unknown,1=ongoing,2=completed,3=licensed,5=hiatus,6=cancelled
+    ThumbnailURL string
+    Initialized  bool
+}
+
+type SChapter struct {
+    URL           string
+    Name          string
+    DateUpload    int64 // unix milliseconds
+    ChapterNumber float32
+    Scanlator     string
+}
+
+type Page struct {
+    Index    int
+    URL      string
+    ImageURL string
+}
+
+type MangasPage struct {
+    Mangas      []SManga
+    HasNextPage bool
+}
+```
+
+Filter types: `SelectFilter`, `TextFilter`, `CheckboxFilter`, `TriStateFilter`, `GroupFilter`, `SortFilter`.
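The status codes listed in the `SManga.Status` comment could be given names along the following lines. This is a minimal sketch and not part of the patch: the `Status` type, the constant names, and the `String` method are assumptions made for illustration; only the numeric values come from the plan.

```go
package main

import "fmt"

// Status is a hypothetical named type for the integer codes in SManga.Status.
// Only the numeric values are taken from the plan; the names are illustrative.
type Status int

const (
	StatusUnknown   Status = 0
	StatusOngoing   Status = 1
	StatusCompleted Status = 2
	StatusLicensed  Status = 3
	StatusHiatus    Status = 5
	StatusCancelled Status = 6
)

// String returns a readable label, falling back to the raw number for
// codes the plan does not list (for example 4).
func (s Status) String() string {
	switch s {
	case StatusUnknown:
		return "unknown"
	case StatusOngoing:
		return "ongoing"
	case StatusCompleted:
		return "completed"
	case StatusLicensed:
		return "licensed"
	case StatusHiatus:
		return "hiatus"
	case StatusCancelled:
		return "cancelled"
	default:
		return fmt.Sprintf("Status(%d)", int(s))
	}
}

func main() {
	fmt.Println(StatusOngoing, Status(4))
}
```

Keeping the field a plain `int` in `SManga`, as the plan does, stays closest to the upstream model; a named type like this would mainly help logging and API serialization.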
+
+### 1.2 Source Interfaces (`internal/source/interfaces.go`)
+
+```go
+type CatalogueSource interface {
+    ID() int64
+    Name() string
+    Lang() string
+    SupportsLatest() bool
+    GetPopularManga(page int) (MangasPage, error)
+    GetLatestUpdates(page int) (MangasPage, error)
+    GetSearchManga(page int, query string, filters []Filter) (MangasPage, error)
+    GetMangaDetails(manga SManga) (SManga, error)
+    GetChapterList(manga SManga) ([]SChapter, error)
+    GetPageList(chapter SChapter) ([]Page, error)
+    GetImageURL(page Page) (string, error)
+    GetFilterList() []Filter
+}
+```
+
+#### ID Generation
+
+Source IDs use the same formula as Tachiyomi/Suwayomi `HttpSource.generateId`:
+
+```
+key = strings.ToLower(name) + "/" + lang + "/" + strconv.Itoa(versionId)
+hash = MD5(key) // 16 bytes
+id = first 8 bytes as big-endian int64
+id &= math.MaxInt64 // clear sign bit (Long.MAX_VALUE mask)
+```
+
+Default `versionId` is 1. The earlier note about Java's `String.hashCode()` in the original plan was incorrect — the authoritative source is `HttpSource.kt` in Suwayomi-Server.
+
+### 1.3 HTTP Client (`internal/httpclient/client.go`)
+
+- Per-host rate limiting with `golang.org/x/time/rate`
+- Persistent cookie jar per source instance
+- Configurable timeout, user-agent, referer
+- Transparent retry on 429 (honor Retry-After header)
+
+### 1.4 FlareSolverr (`internal/httpclient/flaresolverr.go`)
+
+POST `{"cmd":"request.get","url":"...","maxTimeout":60000}` to `/v1`. Extract `solution.response` (HTML) and `solution.cookies`. After first clearance, reuse cookies via normal HTTP client — only re-invoke FlareSolverr on 403.
+
+### 1.5 GraphQL Helper (`internal/httpclient/graphql.go`)
+
+Only used internally to call upstream sources that expose GraphQL APIs (mangahub, senkuro, allanime, luscious, stashapp). Our own API is REST.
+
+```go
+type GraphQLRequest struct {
+    Query     string `json:"query"`
+    Variables any    `json:"variables"`
+}
+
+func Post(ctx context.Context, client *http.Client, url string, req GraphQLRequest, headers map[string]string) (*http.Response, error)
+```
+
+### 1.6 HTML Parser (`internal/parser/html.go`)
+
+Thin wrappers over `github.com/PuerkitoBio/goquery` (Go equivalent of JSoup):
+
+```go
+func Parse(html string) (*goquery.Document, error)
+func Select(doc *goquery.Document, css string) *goquery.Selection
+func SelectFrom(sel *goquery.Selection, css string) *goquery.Selection
+func Attr(sel *goquery.Selection, name string) string
+func AbsURL(sel *goquery.Selection, attr string, baseURL string) string
+func OwnText(sel *goquery.Selection) string
+func TextTrim(sel *goquery.Selection) string
+func First(sel *goquery.Selection) *goquery.Selection
+```
+
+### 1.7 Registry (`internal/registry/registry.go`)
+
+```go
+var mu sync.RWMutex
+var sources = map[int64]source.CatalogueSource{}
+
+func Register(s source.CatalogueSource)
+func Get(id int64) (source.CatalogueSource, bool)
+func All() []source.CatalogueSource
+```
+
+Each source package calls `registry.Register(NewMySource())` in its `init()` function. All source packages are blank-imported in `cmd/server/main.go` so their `init()` runs at startup.
+
+---
+
+## Phase 2: Database Layer
+
+### 2.1 Decision: New Schema Compatible with Suwayomi-Server
+
+Suwayomi-Server uses either H2 (embedded) or PostgreSQL via Exposed ORM. The existing table structure (MangaTable, ChapterTable, PageTable, SourceTable) is a good reference. We adapt it for Go/PostgreSQL with a **compatible schema** so data can be shared if both services point to the same DB.
+
+Key differences from Suwayomi-Server schema:
+- No `ExtensionTable` (sources are compiled in, not loaded from APKs)
+- `SourceTable` has no `extension` FK; sources are identified by their built-in ID
+- Add `fetched_at` timestamps on manga list results for cache invalidation
+- Use `BIGSERIAL` primary keys where Suwayomi uses `IntIdTable`
+
+### 2.2 Schema (`internal/db/migrations/001_init.sql`)
+
+```sql
+CREATE TABLE sources (
+    id BIGINT PRIMARY KEY, -- generated as in Tachiyomi: first 8 bytes of MD5(lower(name) + "/" + lang + "/1"), big-endian, sign bit cleared
+    name VARCHAR(128) NOT NULL,
+    lang VARCHAR(32) NOT NULL,
+    is_nsfw BOOLEAN NOT NULL DEFAULT FALSE
+);
+
+CREATE TABLE manga (
+    id SERIAL PRIMARY KEY,
+    source_id BIGINT NOT NULL REFERENCES sources(id),
+    url VARCHAR(2048) NOT NULL,
+    title VARCHAR(512) NOT NULL,
+    initialized BOOLEAN NOT NULL DEFAULT FALSE,
+    artist TEXT,
+    author TEXT,
+    description TEXT,
+    genre TEXT, -- comma-separated
+    status INTEGER NOT NULL DEFAULT 0,
+    thumbnail_url VARCHAR(2048),
+    thumbnail_last_fetched BIGINT NOT NULL DEFAULT 0,
+    in_library BOOLEAN NOT NULL DEFAULT FALSE,
+    in_library_at BIGINT NOT NULL DEFAULT 0,
+    real_url VARCHAR(2048),
+    last_fetched_at BIGINT NOT NULL DEFAULT 0,
+    chapters_last_fetched_at BIGINT NOT NULL DEFAULT 0,
+    update_strategy VARCHAR(64) NOT NULL DEFAULT 'ALWAYS_UPDATE',
+    UNIQUE (source_id, url)
+);
+
+CREATE TABLE chapters (
+    id SERIAL PRIMARY KEY,
+    manga_id INTEGER NOT NULL REFERENCES manga(id) ON DELETE CASCADE,
+    url VARCHAR(2048) NOT NULL,
+    name VARCHAR(512) NOT NULL,
+    date_upload BIGINT NOT NULL DEFAULT 0,
+    chapter_number REAL NOT NULL DEFAULT -1,
+    scanlator VARCHAR(256),
+    source_order INTEGER NOT NULL,
+    is_read BOOLEAN NOT NULL DEFAULT FALSE,
+    is_bookmarked BOOLEAN NOT NULL DEFAULT FALSE,
+    last_page_read INTEGER NOT NULL DEFAULT 0,
+    last_read_at BIGINT NOT NULL DEFAULT 0,
+    fetched_at BIGINT NOT NULL DEFAULT 0,
+    real_url VARCHAR(2048),
+    is_downloaded BOOLEAN NOT NULL DEFAULT FALSE,
+    page_count INTEGER NOT NULL DEFAULT -1,
+    UNIQUE (manga_id, url)
+);
+
+CREATE TABLE pages (
+    id SERIAL PRIMARY KEY,
+    chapter_id INTEGER NOT NULL REFERENCES chapters(id) ON DELETE CASCADE,
+    "index" INTEGER NOT NULL,
+    url VARCHAR(2048) NOT NULL,
+    image_url TEXT
+);
+
+CREATE TABLE source_meta (
+    source_id BIGINT NOT NULL REFERENCES sources(id),
+    key VARCHAR(256) NOT NULL,
+    value TEXT NOT NULL,
+    PRIMARY KEY (source_id, key)
+);
+```
+
+### 2.3 Tooling
+
+- **Driver**: `github.com/jackc/pgx/v5` (pgx native, no database/sql overhead)
+- **Query gen**: `sqlc` — write SQL queries in `.sql` files, generate type-safe Go functions
+- **Migrations**: `golang-migrate/migrate` with `pgx` driver, runs on startup
+- **Connection pool**: `pgxpool.Pool` with configurable max conns
+
+### 2.4 Data Flow
+
+API call → source fetches data → **upsert into DB** → return from API.
+
+**Manga list** (`GetPopularManga`): Upsert each SManga into `manga` (on conflict `source_id, url` update title/thumbnail/status). Update `last_fetched_at`. Return from DB.
+
+**Manga detail** (`GetMangaDetails`): Fetch full detail from source, upsert all fields into `manga`, set `initialized=true`.
+
+**Chapter list** (`GetChapterList`): Upsert each SChapter into `chapters` (on conflict `manga_id, url` update name/date/chapter_number). Update `chapters_last_fetched_at` on manga row.
+
+**Page list** (`GetPageList`): Insert pages into `pages` (skip if already present). If source requires a second call to resolve image URLs (`GetImageURL`), store resolved `image_url`.
+
+**Cache**: If `last_fetched_at` is within TTL (configurable, default 10 min for lists, 1h for details), serve from DB without hitting the source. TTL bypassed by `?refresh=true`.
+
+---
+
+## Phase 3: Base Source Implementations
+
+### Group A — WordPress/CMS HTML Scrapers
+
+| Base | Endpoint pattern | CF? |
+|------|-----------------|-----|
+| **madara** | POST `{base}/wp-admin/admin-ajax.php` (list), GET `{url}` (detail/chapters) | Yes |
+| **mangathemesia** | GET `{base}/{dir}/?page={n}`, GET `{url}` | Yes |
+| **madtheme** | GET `{base}/search?page={n}` (all list types) | Yes |
+| **wpcomics** | GET `{base}/{popularPath}?page={n}` | Yes |
+| **fmreader** | GET `{base}/{requestPath}?page={n}&sort=...` | Yes |
+| **mmrcms** | GET `{base}/filterList?page={n}&sortBy=views`, POST `{base}/advSearchFilter` | No |
+| **mangareader** | GET `{base}/?page={n}&type={t}` | Yes |
+| **zmanga** | GET `{base}/advanced-search/page/{n}/?order=popular` | Yes |
+| **mangaworld** | GET `{base}/archive?sort=most_read&page={n}` | Yes |
+| **grouple** | GET `{base}/list?sortType=rate&offset={50*(n-1)}` | No |
+| **foolslide** | GET `{base}/directory/{n}/` + JSON chapter API | No |
+| **liliana** | GET `{base}/ranking/week/{n}` | Yes |
+| **scanreader** | GET `{base}/bibliotheque/page/{n}/?sort=views` | No |
+| **gigaviewer** | GET `{base}/series` (all at once, no pagination) | Yes |
+| Others (mangawork, manga18, manhwaz, masonry, multichan, sinmh, etc.) | Various HTML GET | Most Yes |
+
+All Group A bases use goquery selectors. Each has a config struct of overridable CSS selectors. FlareSolverr used when CF=Yes.
+
+### Group B — JSON REST API Sources
+
+| Base | Key endpoints | CF? | Auth |
+|------|--------------|-----|------|
+| **heancms** | `GET {api}/series?page={n}`, `GET {api}/chapter/query?series_slug={s}` | No | None |
+| **iken** | `GET {api}/comic?order=view&page={n}` | Yes | CF cookies |
+| **hentaihand** | `GET {base}/api/comics?page={n}&order_by=...` | No | None |
+| **pizzareader** | `GET {api}/comics`, `GET {api}/comics/{slug}` | Yes | None |
+| **gmanga** | `GET {base}/api/releases?page={n}`, `POST {base}/api/mangas/search` | No | None |
+| **spicytheme** | `GET {base}/api/...` | Yes | None |
+| **zeistmanga** | Blogger Feeds JSON API | Yes | None |
+| **mccms** | REST JSON | Yes | None |
+| **kemono** | `GET {base}/api/v1/creators`, `GET {base}/api/v1/{service}/{creator}/posts` | Yes | None |
+| **lectormoe** | REST JSON | Yes | Token |
+| **libgroup** | `GET {api}/api/latest-updates`, `GET {api}/api/auth/me` | Yes | WebView token → use FlareSolverr to obtain |
+| **mangabox** | REST JSON | No | None |
+| **mangadventure** | REST JSON | No | None |
+| **ezmanhwa** | REST JSON | No | None |
+| **monochrome** | REST JSON | No | None |
+
+### Group C — GraphQL Sources
+
+| Base | Endpoint | Notes |
+|------|----------|-------|
+| **mangahub** | POST `{api}/graphql` | Cookie `mhub_access` acquired via intermediate GET to a random chapter URL |
+| **senkuro** | POST `{api}/graphql` | API domain configurable via preferences |
+
+### Group D — Special/Unique Sources
+
+| Base | Pattern | Gotcha |
+|------|---------|--------|
+| **mangotheme** | JSON list + XOR/AES-encrypted page URLs | Implement page URL decryption in Go; key embedded in JS |
+| **mmlook** | JSON + encrypted pages + CF | Page decryption + FlareSolverr |
+| **guya** | `GET {base}/api/get_all_series/` (all manga at once) | No pagination; scanlation group filter in response |
+| **bakkin** | Single JSON URL, no list/search | Enumerate from object keys |
+| **gigaviewer** | All series in one page HTML | Client-side filter only; latest = same request |
+
+---
+
+## Phase 4: Standalone Sources — Notable Gotchas
+
+### `all/mangadex`
+- Complex filter system: tags (AND/OR modes), demographics, content rating, publication status, sort
+- Cover art comes from a separate `covers` relationship — make a second API call or include via `includes[]=cover_art`
+- Chapter language filtering: only fetch `translatedLanguage[]=en` (or user-configured)
+- Rate limit: 5 req/s global, stricter for search
+- `at-home` server URL for pages: `GET /at-home/server/{chapterId}` returns CDN base URL; pages = `{baseUrl}/{quality}/{hash}/{filename}`
+
+### `all/nhentaicom`
+- Cloudflare-protected
+- Tag/artist/character search with prefix syntax: `tag:isekai artist:...`
+- Pages come from a JSON blob embedded in the HTML (`JSON.parse(document.getElementById('...'))`)
+
+### `all/komga`
+- Self-hosted; user must configure base URL + credentials (Basic Auth)
+- Series = manga, Books = chapters, Pages from book API
+- Supports CBZ/PDF libraries — page URL is a direct book page endpoint
+
+### `all/e621`
+- Pools = manga (collections of posts), Posts = pages
+- Basic Auth required for higher rate limits and adult content
+- Nested tag exclusion (e.g. `rating:s`) needs proper encoding
+
+### `all/kemono`
+- Cloudflare-protected
+- Service + creator = manga (e.g. Patreon/PixivFANBOX creator)
+- Posts = chapters; attachments/files within a post = pages
+- File URLs may be relative to `{base}/data`
+
+### `all/danbooru`
+- Tag-based search (up to 2 tags for free accounts)
+- Gold/Platinum tier content only accessible with credentials
+- Pools as manga, pool posts as pages
+
+### `all/pixiv`
+- Session cookie (`PHPSESSID`) auth — no public API key
+- Illust series = manga; user illustrations = chapters
+- Multi-tier image URLs: `thumb_mini`, `small`, `regular`, `original` — must use correct Referer header or get 403
+- R18 content requires age-verified account
+
+### `all/luscious` (GraphQL)
+- GraphQL POST to `/graphql/playground`
+- Albums = manga, Pictures = pages
+- Adult content; account may be required for some content
+
+### `all/mangaplus`
+- Official Shueisha app API
+- Uses protobuf OR a JSON endpoint (`/api/title_detail?title_id={id}`)
+- Page URLs are encrypted/obfuscated: each image URL requires a key from the chapter response to XOR-decrypt the actual URL
+- Viewer is web-only for some titles
+
+### `all/stashapp`
+- Self-hosted; configure base URL
+- GraphQL API
+
+### `en/allanime` (GraphQL)
+- Complex query variables: `translationType`, `countryOrigin`, search payload
+- Episode-based (chapters are episodes); `episodeString` used as chapter number
+- CDN for pages uses multiple quality tiers
+
+### `en/asurascans`
+- Cloudflare-protected
+- Discord-gated content warnings on some chapters (just HTML, parseable)
+- Chapter pages embedded as `` in a protected div — selector varies by site layout version
+
+### `en/mangafire`
+- React/NextJS SSR; some data in JSON embedded in `