Plan: Port Tachiyomi Extensions to Go

Reference Projects

extensions-source: /Users/achmad/Documents/Belajar/Android/extensions-source
- lib-multisrc/ — base source implementations (Phase 3)
- src/all/, src/en/ — standalone sources (Phase 4)
- core/src/ — CatalogueSource/HttpSource interface contracts (Phase 1)
Suwayomi-Server: /Users/achmad/Documents/Belajar/Web/Suwayomi-Server
- server/src/main/kotlin/suwayomi/tachidesk/manga/model/table/ — DB schema reference (Phase 2)
- server/src/main/kotlin/suwayomi/tachidesk/server/database/migration/ — migration patterns (Phase 2)
- server/src/main/kotlin/suwayomi/tachidesk/manga/api/ — API route patterns (Phase 5)

Context

Suwayomi-Server currently loads manga source extensions by downloading APKs, converting DEX bytecode to JARs, and instantiating Kotlin classes at runtime via reflection. The goal is a standalone Go service that reimplements all source logic, persists fetched data into PostgreSQL, and exposes a REST API for any consumer to query. The Go service is self-contained — no dependency on the JVM project.

Two rendering modes:

Direct HTTP: REST/JSON API sources — plain net/http + JSON unmarshal
FlareSolverr: Cloudflare-protected or JS-rendered sites — POST to FlareSolverr, get rendered HTML, parse with goquery

New Project: `tachiyomi-go`

Separate Go module. Uses PostgreSQL (same DB the Docker Compose in this repo already sets up).

Directory Structure

tachiyomi-go/
├── cmd/server/main.go
├── internal/
│   ├── source/
│   │   ├── types.go            # SManga, SChapter, Page, MangasPage, Filter
│   │   └── interfaces.go       # Source, CatalogueSource interfaces
│   ├── httpclient/
│   │   ├── client.go           # Base HTTP client, cookie jar, per-host rate limiter
│   │   ├── flaresolverr.go     # FlareSolverr integration
│   │   ├── graphql.go          # GraphQL POST helper
│   │   └── headers.go          # Common header builders
│   ├── parser/
│   │   └── html.go             # goquery helper wrappers
│   ├── db/
│   │   ├── db.go               # pgx pool init, migration runner
│   │   ├── queries/            # sqlc-generated query files
│   │   │   ├── manga.sql.go
│   │   │   ├── chapter.sql.go
│   │   │   ├── page.sql.go
│   │   │   └── source.sql.go
│   │   └── migrations/
│   │       ├── 001_init.sql
│   │       └── ...
│   └── registry/
│       └── registry.go         # Global source map, init-time registration
├── sources/
│   ├── base/                   # ~67 base source implementations
│   └── all/ + en/              # ~562 standalone source implementations
└── api/
    └── handler.go

Phase 1: Core Framework

1.1 Data Types (`internal/source/types.go`)

type SManga struct {
    URL          string
    Title        string
    Artist       string
    Author       string
    Description  string
    Genre        string  // comma-separated
    Status       int     // 0=unknown, 1=ongoing, 2=completed, 3=licensed, 4=publishing finished, 5=cancelled, 6=on hiatus
    ThumbnailURL string
    Initialized  bool
}

type SChapter struct {
    URL           string
    Name          string
    DateUpload    int64   // unix milliseconds
    ChapterNumber float32
    Scanlator     string
}

type Page struct {
    Index    int
    URL      string
    ImageURL string
}

type MangasPage struct {
    Mangas      []SManga
    HasNextPage bool
}

Filter types: SelectFilter, TextFilter, CheckboxFilter, TriStateFilter, GroupFilter, SortFilter.

1.2 Source Interfaces (`internal/source/interfaces.go`)

type CatalogueSource interface {
    ID() int64
    Name() string
    Lang() string
    SupportsLatest() bool
    GetPopularManga(page int) (MangasPage, error)
    GetLatestUpdates(page int) (MangasPage, error)
    GetSearchManga(page int, query string, filters []Filter) (MangasPage, error)
    GetMangaDetails(manga SManga) (SManga, error)
    GetChapterList(manga SManga) ([]SChapter, error)
    GetPageList(chapter SChapter) ([]Page, error)
    GetImageURL(page Page) (string, error)
    GetFilterList() []Filter
}

ID Generation

Source IDs use the same formula as Tachiyomi/Suwayomi HttpSource.generateId:

key  = strings.ToLower(name) + "/" + lang + "/" + strconv.Itoa(versionId)
hash = MD5(key)                          // 16 bytes
id   = first 8 bytes as big-endian int64
id  &= math.MaxInt64                     // clear sign bit (Long.MAX_VALUE mask)

Default versionId is 1. The earlier note about Java's String.hashCode() in the original plan was incorrect — the authoritative source is HttpSource.kt in Suwayomi-Server.
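
A minimal Go sketch of the formula above. The function name `SourceID` is an assumption for illustration; the steps (lowercased name, MD5, first 8 bytes big-endian, sign bit cleared) are the ones listed in this section.

```go
package main

import (
	"crypto/md5"
	"encoding/binary"
	"fmt"
	"math"
	"strconv"
	"strings"
)

// SourceID derives a source ID from name, language and version ID.
// Hypothetical helper name; the formula is the one described in the plan.
func SourceID(name, lang string, versionID int) int64 {
	key := strings.ToLower(name) + "/" + lang + "/" + strconv.Itoa(versionID)
	sum := md5.Sum([]byte(key)) // 16-byte digest
	// Interpret the first 8 bytes as a big-endian integer.
	id := int64(binary.BigEndian.Uint64(sum[:8]))
	// Clear the sign bit so the result is never negative.
	return id & math.MaxInt64
}

func main() {
	// The name is lowercased before hashing, so both calls print the same ID.
	fmt.Println(SourceID("MangaDex", "en", 1))
	fmt.Println(SourceID("mangadex", "en", 1))
}
```

Before relying on this, it is worth checking the output against a few IDs taken from a running Suwayomi instance.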

1.3 HTTP Client (`internal/httpclient/client.go`)

Per-host rate limiting with golang.org/x/time/rate
Persistent cookie jar per source instance
Configurable timeout, user-agent, referer
Transparent retry on 429 (honor Retry-After header)
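
The retry-on-429 behaviour could look roughly like the sketch below. It uses only the standard library and leaves out the per-host limiter from golang.org/x/time/rate. The names `retryDelay` and `doWithRetry`, the one-second fallback, and the restriction to bodyless requests are assumptions.

```go
package main

import (
	"fmt"
	"net/http"
	"strconv"
	"time"
)

// retryDelay converts a Retry-After header value into a wait duration.
// It accepts delta-seconds ("120") or an HTTP-date; anything else yields fallback.
func retryDelay(header string, fallback time.Duration) time.Duration {
	if secs, err := strconv.Atoi(header); err == nil && secs >= 0 {
		return time.Duration(secs) * time.Second
	}
	if t, err := http.ParseTime(header); err == nil {
		if d := time.Until(t); d > 0 {
			return d
		}
		return 0 // date already in the past: retry immediately
	}
	return fallback
}

// doWithRetry sends req and retries up to maxRetries times on HTTP 429.
// Hypothetical helper; only safe for requests without a body (e.g. GET),
// because the same *http.Request is reused across attempts.
func doWithRetry(client *http.Client, req *http.Request, maxRetries int) (*http.Response, error) {
	for attempt := 0; ; attempt++ {
		resp, err := client.Do(req)
		if err != nil {
			return nil, err
		}
		if resp.StatusCode != http.StatusTooManyRequests || attempt >= maxRetries {
			return resp, nil
		}
		delay := retryDelay(resp.Header.Get("Retry-After"), time.Second)
		resp.Body.Close()
		select {
		case <-time.After(delay):
		case <-req.Context().Done():
			return nil, req.Context().Err()
		}
	}
}

func main() {
	fmt.Println(retryDelay("2", time.Second))
	fmt.Println(retryDelay("not-a-value", time.Second))
}
```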

1.4 FlareSolverr (`internal/httpclient/flaresolverr.go`)

POST {"cmd":"request.get","url":"...","maxTimeout":60000} to /v1. Extract solution.response (HTML) and solution.cookies. After first clearance, reuse cookies via normal HTTP client — only re-invoke FlareSolverr on 403.
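
A sketch of the request and response handling, assuming the usual FlareSolverr v1 JSON shape (`status`, `message`, `solution` with `response`, `cookies`, `userAgent`). The type and function names are hypothetical, and only the fields this plan needs are decoded.

```go
package main

import (
	"bytes"
	"context"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
)

type flareRequest struct {
	Cmd        string `json:"cmd"`
	URL        string `json:"url"`
	MaxTimeout int    `json:"maxTimeout"`
}

type FlareCookie struct {
	Name   string `json:"name"`
	Value  string `json:"value"`
	Domain string `json:"domain"`
	Path   string `json:"path"`
}

type FlareSolution struct {
	Status    int           `json:"status"`
	Response  string        `json:"response"` // rendered HTML
	Cookies   []FlareCookie `json:"cookies"`
	UserAgent string        `json:"userAgent"`
}

type flareResponse struct {
	Status   string        `json:"status"`
	Message  string        `json:"message"`
	Solution FlareSolution `json:"solution"`
}

// decodeFlare parses a FlareSolverr reply and rejects non-"ok" statuses.
func decodeFlare(body []byte) (FlareSolution, error) {
	var fr flareResponse
	if err := json.Unmarshal(body, &fr); err != nil {
		return FlareSolution{}, fmt.Errorf("flaresolverr: decode: %w", err)
	}
	if fr.Status != "ok" {
		return FlareSolution{}, fmt.Errorf("flaresolverr: %s", fr.Message)
	}
	return fr.Solution, nil
}

// Solve posts a request.get command to {endpoint}/v1 and returns the solution.
func Solve(ctx context.Context, client *http.Client, endpoint, target string) (FlareSolution, error) {
	payload, err := json.Marshal(flareRequest{Cmd: "request.get", URL: target, MaxTimeout: 60000})
	if err != nil {
		return FlareSolution{}, err
	}
	req, err := http.NewRequestWithContext(ctx, http.MethodPost, endpoint+"/v1", bytes.NewReader(payload))
	if err != nil {
		return FlareSolution{}, err
	}
	req.Header.Set("Content-Type", "application/json")
	resp, err := client.Do(req)
	if err != nil {
		return FlareSolution{}, err
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return FlareSolution{}, err
	}
	return decodeFlare(body)
}

func main() {
	sample := []byte(`{"status":"ok","solution":{"status":200,"response":"<html></html>","cookies":[{"name":"cf_clearance","value":"abc"}],"userAgent":"UA"}}`)
	sol, err := decodeFlare(sample)
	fmt.Println(sol.Response, len(sol.Cookies), err)
}
```

When the cookies are reused through the normal client, it is probably necessary to send the same User-Agent that FlareSolverr reported, since clearance cookies are typically tied to it.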

1.5 GraphQL Helper (`internal/httpclient/graphql.go`)

Only used internally to call upstream sources that expose GraphQL APIs (mangahub, senkuro, allanime, luscious, stashapp). Our own API is REST.

type GraphQLRequest struct {
    Query     string `json:"query"`
    Variables any    `json:"variables"`
}

func Post(ctx context.Context, client *http.Client, url string, req GraphQLRequest, headers map[string]string) (*http.Response, error)
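
One possible body for `Post`, keeping the signature above. The `encodeGraphQL` helper and the default `Content-Type`/`Accept` headers are assumptions; caller-supplied headers are applied last so a source can override them.

```go
package main

import (
	"bytes"
	"context"
	"encoding/json"
	"fmt"
	"net/http"
)

type GraphQLRequest struct {
	Query     string `json:"query"`
	Variables any    `json:"variables"`
}

// encodeGraphQL serialises the request into the JSON body sent upstream.
func encodeGraphQL(req GraphQLRequest) ([]byte, error) {
	body, err := json.Marshal(req)
	if err != nil {
		return nil, fmt.Errorf("graphql: encode request: %w", err)
	}
	return body, nil
}

// Post sends a GraphQL query as an HTTP POST. The caller owns the response
// and must close its body.
func Post(ctx context.Context, client *http.Client, url string, req GraphQLRequest, headers map[string]string) (*http.Response, error) {
	body, err := encodeGraphQL(req)
	if err != nil {
		return nil, err
	}
	httpReq, err := http.NewRequestWithContext(ctx, http.MethodPost, url, bytes.NewReader(body))
	if err != nil {
		return nil, fmt.Errorf("graphql: build request: %w", err)
	}
	httpReq.Header.Set("Content-Type", "application/json")
	httpReq.Header.Set("Accept", "application/json")
	for k, v := range headers {
		httpReq.Header.Set(k, v) // source-specific headers win over defaults
	}
	return client.Do(httpReq)
}

func main() {
	body, err := encodeGraphQL(GraphQLRequest{
		Query:     "query($id: ID!) { manga(id: $id) { title } }",
		Variables: map[string]any{"id": "42"},
	})
	fmt.Println(string(body), err)
}
```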

1.6 HTML Parser (`internal/parser/html.go`)

Thin wrappers over github.com/PuerkitoBio/goquery (Go equivalent of JSoup):

func Parse(html string) (*goquery.Document, error)
func Select(doc *goquery.Document, css string) *goquery.Selection
func SelectFrom(sel *goquery.Selection, css string) *goquery.Selection
func Attr(sel *goquery.Selection, name string) string
func AbsURL(sel *goquery.Selection, attr string, baseURL string) string
func OwnText(sel *goquery.Selection) string
func TextTrim(sel *goquery.Selection) string
func First(sel *goquery.Selection) *goquery.Selection

1.7 Registry (`internal/registry/registry.go`)

var mu sync.RWMutex
var sources = map[int64]source.CatalogueSource{}

func Register(s source.CatalogueSource)
func Get(id int64) (source.CatalogueSource, bool)
func All() []source.CatalogueSource

Each source package calls registry.Register(NewMySource()) in its init() function. All source packages are blank-imported in cmd/server/main.go so their init() runs at startup.
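
A sketch of how the three functions might be implemented. `Source` here is a reduced stand-in for `source.CatalogueSource`. Panicking on a duplicate ID and returning `All()` sorted by ID are assumptions, not decisions made in this plan.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// Source stands in for source.CatalogueSource, reduced to what the registry needs.
type Source interface {
	ID() int64
	Name() string
}

var (
	mu      sync.RWMutex
	sources = map[int64]Source{}
)

// Register adds a source. A duplicate ID panics so that collisions surface
// at startup rather than silently replacing a source (assumed policy).
func Register(s Source) {
	mu.Lock()
	defer mu.Unlock()
	if _, dup := sources[s.ID()]; dup {
		panic(fmt.Sprintf("registry: duplicate source id %d (%s)", s.ID(), s.Name()))
	}
	sources[s.ID()] = s
}

// Get looks up a source by ID.
func Get(id int64) (Source, bool) {
	mu.RLock()
	defer mu.RUnlock()
	s, ok := sources[id]
	return s, ok
}

// All returns every registered source, sorted by ID for stable output.
func All() []Source {
	mu.RLock()
	defer mu.RUnlock()
	out := make([]Source, 0, len(sources))
	for _, s := range sources {
		out = append(out, s)
	}
	sort.Slice(out, func(i, j int) bool { return out[i].ID() < out[j].ID() })
	return out
}

// demoSource is a placeholder implementation used only in this example.
type demoSource struct {
	id   int64
	name string
}

func (d demoSource) ID() int64    { return d.id }
func (d demoSource) Name() string { return d.name }

func main() {
	Register(demoSource{id: 2, name: "beta"})
	Register(demoSource{id: 1, name: "alpha"})
	for _, s := range All() {
		fmt.Println(s.ID(), s.Name())
	}
}
```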

Phase 2: Database Layer

2.1 Decision: New Schema Compatible with Suwayomi-Server

Suwayomi-Server uses either H2 (embedded) or PostgreSQL via Exposed ORM. The existing table structure (MangaTable, ChapterTable, PageTable, SourceTable) is a good reference. We adapt it for Go/PostgreSQL with a compatible schema so data can be shared if both services point to the same DB.

Key differences from Suwayomi-Server schema:

No ExtensionTable (sources are compiled in, not loaded from APKs)
SourceTable has no extension FK; sources are identified by their built-in ID
Add fetched_at timestamps on manga list results for cache invalidation
Use SERIAL (32-bit) primary keys where Suwayomi uses IntIdTable; only sources.id is BIGINT

2.2 Schema (`internal/db/migrations/001_init.sql`)

CREATE TABLE sources (
    id          BIGINT PRIMARY KEY,   -- same as Tachiyomi: MD5 of lower(name) + "/" + lang + "/" + versionId, first 8 bytes big-endian, sign bit cleared
    name        VARCHAR(128) NOT NULL,
    lang        VARCHAR(32)  NOT NULL,
    is_nsfw     BOOLEAN      NOT NULL DEFAULT FALSE
);

CREATE TABLE manga (
    id                      SERIAL PRIMARY KEY,
    source_id               BIGINT       NOT NULL REFERENCES sources(id),
    url                     VARCHAR(2048) NOT NULL,
    title                   VARCHAR(512)  NOT NULL,
    initialized             BOOLEAN       NOT NULL DEFAULT FALSE,
    artist                  TEXT,
    author                  TEXT,
    description             TEXT,
    genre                   TEXT,                    -- comma-separated
    status                  INTEGER       NOT NULL DEFAULT 0,
    thumbnail_url           VARCHAR(2048),
    thumbnail_last_fetched  BIGINT        NOT NULL DEFAULT 0,
    in_library              BOOLEAN       NOT NULL DEFAULT FALSE,
    in_library_at           BIGINT        NOT NULL DEFAULT 0,
    real_url                VARCHAR(2048),
    last_fetched_at         BIGINT        NOT NULL DEFAULT 0,
    chapters_last_fetched_at BIGINT       NOT NULL DEFAULT 0,
    update_strategy         VARCHAR(64)   NOT NULL DEFAULT 'ALWAYS_UPDATE',
    UNIQUE (source_id, url)
);

CREATE TABLE chapters (
    id              SERIAL PRIMARY KEY,
    manga_id        INTEGER       NOT NULL REFERENCES manga(id) ON DELETE CASCADE,
    url             VARCHAR(2048) NOT NULL,
    name            VARCHAR(512)  NOT NULL,
    date_upload     BIGINT        NOT NULL DEFAULT 0,
    chapter_number  REAL          NOT NULL DEFAULT -1,
    scanlator       VARCHAR(256),
    source_order    INTEGER       NOT NULL,
    is_read         BOOLEAN       NOT NULL DEFAULT FALSE,
    is_bookmarked   BOOLEAN       NOT NULL DEFAULT FALSE,
    last_page_read  INTEGER       NOT NULL DEFAULT 0,
    last_read_at    BIGINT        NOT NULL DEFAULT 0,
    fetched_at      BIGINT        NOT NULL DEFAULT 0,
    real_url        VARCHAR(2048),
    is_downloaded   BOOLEAN       NOT NULL DEFAULT FALSE,
    page_count      INTEGER       NOT NULL DEFAULT -1,
    UNIQUE (manga_id, url)
);

CREATE TABLE pages (
    id          SERIAL PRIMARY KEY,
    chapter_id  INTEGER       NOT NULL REFERENCES chapters(id) ON DELETE CASCADE,
    "index"     INTEGER       NOT NULL,
    url         VARCHAR(2048) NOT NULL,
    image_url   TEXT,
    UNIQUE (chapter_id, "index")          -- lets page inserts skip rows that already exist
);

CREATE TABLE source_meta (
    source_id   BIGINT       NOT NULL REFERENCES sources(id),
    key         VARCHAR(256) NOT NULL,
    value       TEXT         NOT NULL,
    PRIMARY KEY (source_id, key)
);

2.3 Tooling

Driver: github.com/jackc/pgx/v5 (pgx native, no database/sql overhead)
Query gen: sqlc — write SQL queries in .sql files, generate type-safe Go functions
Migrations: golang-migrate/migrate with pgx driver, runs on startup
Connection pool: pgxpool.Pool with configurable max conns

2.4 Data Flow

API call → source fetches data → upsert into DB → return from API.

Manga list (GetPopularManga): Upsert each SManga into manga (on conflict source_id, url update title/thumbnail/status). Update last_fetched_at. Return from DB.

Manga detail (GetMangaDetails): Fetch full detail from source, upsert all fields into manga, set initialized=true.

Chapter list (GetChapterList): Upsert each SChapter into chapters (on conflict manga_id, url update name/date/chapter_number). Update chapters_last_fetched_at on manga row.

Page list (GetPageList): Insert pages into pages (skip if already present). If source requires a second call to resolve image URLs (GetImageURL), store resolved image_url.

Cache: If last_fetched_at is within TTL (configurable, default 10 min for lists, 1h for details), serve from DB without hitting the source. TTL bypassed by ?refresh=true.
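
The cache rule can be sketched as a pure function. This assumes `last_fetched_at` holds Unix seconds, since the plan does not state the unit, and that 0 means "never fetched", matching the column default. `Kind`, `TTLFor` and `ShouldFetch` are hypothetical names.

```go
package main

import (
	"fmt"
	"time"
)

// Kind distinguishes list results from detail results, which have different TTLs.
type Kind int

const (
	KindList Kind = iota
	KindDetail
)

// TTLFor returns the default TTLs named in the plan: 10 minutes for lists,
// 1 hour for details. In the real service these would be configurable.
func TTLFor(k Kind) time.Duration {
	if k == KindDetail {
		return time.Hour
	}
	return 10 * time.Minute
}

// ShouldFetch reports whether the source must be contacted.
// lastFetchedSec and nowSec are Unix seconds (assumed unit).
func ShouldFetch(lastFetchedSec, nowSec int64, ttl time.Duration, refresh bool) bool {
	if refresh || lastFetchedSec <= 0 {
		return true // forced refresh, or never fetched
	}
	return nowSec-lastFetchedSec >= int64(ttl/time.Second)
}

func main() {
	now := time.Now().Unix()
	fmt.Println(ShouldFetch(now-60, now, TTLFor(KindList), false))   // fetched a minute ago
	fmt.Println(ShouldFetch(now-3600, now, TTLFor(KindList), false)) // fetched an hour ago
	fmt.Println(ShouldFetch(now-60, now, TTLFor(KindList), true))    // ?refresh=true
}
```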

Phase 3: Base Source Implementations

Group A — WordPress/CMS HTML Scrapers

Base	Endpoint pattern	CF?
madara	POST `{base}/wp-admin/admin-ajax.php` (list), GET `{url}` (detail/chapters)	Yes
mangathemesia	GET `{base}/{dir}/?page={n}`, GET `{url}`	Yes
madtheme	GET `{base}/search?page={n}` (all list types)	Yes
wpcomics	GET `{base}/{popularPath}?page={n}`	Yes
fmreader	GET `{base}/{requestPath}?page={n}&sort=...`	Yes
mmrcms	GET `{base}/filterList?page={n}&sortBy=views`, POST `{base}/advSearchFilter`	No
mangareader	GET `{base}/?page={n}&type={t}`	Yes
zmanga	GET `{base}/advanced-search/page/{n}/?order=popular`	Yes
mangaworld	GET `{base}/archive?sort=most_read&page={n}`	Yes
grouple	GET `{base}/list?sortType=rate&offset={50*(n-1)}`	No
foolslide	GET `{base}/directory/{n}/` + JSON chapter API	No
liliana	GET `{base}/ranking/week/{n}`	Yes
scanreader	GET `{base}/bibliotheque/page/{n}/?sort=views`	No
gigaviewer	GET `{base}/series` (all at once, no pagination)	Yes
Others (mangawork, manga18, manhwaz, masonry, multichan, sinmh, etc.)	Various HTML GET	Most Yes

All Group A bases use goquery selectors. Each has a config struct of overridable CSS selectors. FlareSolverr used when CF=Yes.

Group B — JSON REST API Sources

Base	Key endpoints	CF?	Auth
heancms	`GET {api}/series?page={n}`, `GET {api}/chapter/query?series_slug={s}`	No	None
iken	`GET {api}/comic?order=view&page={n}`	Yes	CF cookies
hentaihand	`GET {base}/api/comics?page={n}&order_by=...`	No	None
pizzareader	`GET {api}/comics`, `GET {api}/comics/{slug}`	Yes	None
gmanga	`GET {base}/api/releases?page={n}`, `POST {base}/api/mangas/search`	No	None
spicytheme	`GET {base}/api/...`	Yes	None
zeistmanga	Blogger Feeds JSON API	Yes	None
mccms	REST JSON	Yes	None
kemono	`GET {base}/api/v1/creators`, `GET {base}/api/v1/{service}/{creator}/posts`	Yes	None
lectormoe	REST JSON	Yes	Token
libgroup	`GET {api}/api/latest-updates`, `GET {api}/api/auth/me`	Yes	WebView token → use FlareSolverr to obtain
mangabox	REST JSON	No	None
mangadventure	REST JSON	No	None
ezmanhwa	REST JSON	No	None
monochrome	REST JSON	No	None

Group C — GraphQL Sources

Base	Endpoint	Notes
mangahub	POST `{api}/graphql`	Cookie `mhub_access` acquired via intermediate GET to a random chapter URL
senkuro	POST `{api}/graphql`	API domain configurable via preferences

Group D — Special/Unique Sources

Base	Pattern	Gotcha
mangotheme	JSON list + XOR/AES-encrypted page URLs	Implement page URL decryption in Go; key embedded in JS
mmlook	JSON + encrypted pages + CF	Page decryption + FlareSolverr
guya	`GET {base}/api/get_all_series/` (all manga at once)	No pagination; scanlation group filter in response
bakkin	Single JSON URL, no list/search	Enumerate from object keys
gigaviewer	All series in one page HTML	Client-side filter only; latest = same request

Phase 4: Standalone Sources — Notable Gotchas

`all/mangadex`

Complex filter system: tags (AND/OR modes), demographics, content rating, publication status, sort
Cover art comes from a separate covers relationship — make a second API call or include via includes[]=cover_art
Chapter language filtering: only fetch translatedLanguage[]=en (or user-configured)
Rate limit: 5 req/s global, stricter for search
at-home server URL for pages: GET /at-home/server/{chapterId} returns CDN base URL; pages = {baseUrl}/{quality}/{hash}/{filename}
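
Building page URLs from the at-home response could look like the sketch below. It assumes the response shape `baseUrl` plus `chapter.hash`, `chapter.data` and `chapter.dataSaver`, with `{quality}` being `data` or `data-saver`. The function name `PageURLs` is hypothetical.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// atHomeResponse mirrors the fields of GET /at-home/server/{chapterId}
// that are needed to build page URLs.
type atHomeResponse struct {
	Result  string `json:"result"`
	BaseURL string `json:"baseUrl"`
	Chapter struct {
		Hash      string   `json:"hash"`
		Data      []string `json:"data"`
		DataSaver []string `json:"dataSaver"`
	} `json:"chapter"`
}

// PageURLs turns an at-home response body into full image URLs of the form
// {baseUrl}/{quality}/{hash}/{filename}.
func PageURLs(body []byte, dataSaver bool) ([]string, error) {
	var r atHomeResponse
	if err := json.Unmarshal(body, &r); err != nil {
		return nil, fmt.Errorf("at-home: decode: %w", err)
	}
	if r.Result != "ok" {
		return nil, errors.New("at-home: result is not ok")
	}
	quality, files := "data", r.Chapter.Data
	if dataSaver {
		quality, files = "data-saver", r.Chapter.DataSaver
	}
	urls := make([]string, 0, len(files))
	for _, f := range files {
		urls = append(urls, r.BaseURL+"/"+quality+"/"+r.Chapter.Hash+"/"+f)
	}
	return urls, nil
}

func main() {
	body := []byte(`{"result":"ok","baseUrl":"https://cdn.example","chapter":{"hash":"abc","data":["1.png","2.png"],"dataSaver":["1.jpg","2.jpg"]}}`)
	urls, err := PageURLs(body, false)
	fmt.Println(urls, err)
}
```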

`all/nhentaicom`

Cloudflare-protected
Tag/artist/character search with prefix syntax: tag:isekai artist:...
Pages come from a JSON blob embedded in the HTML (JSON.parse(document.getElementById('...')))

`all/komga`

Self-hosted; user must configure base URL + credentials (Basic Auth)
Series = manga, Books = chapters, Pages from book API
Supports CBZ/PDF libraries — page URL is a direct book page endpoint

`all/e621`

Pools = manga (collections of posts), Posts = pages
Basic Auth required for higher rate limits and adult content
Nested tag exclusion (e.g. rating:s) needs proper encoding

`all/kemono`

Cloudflare-protected
Service + creator = manga (e.g. Patreon/PixivFANBOX creator)
Posts = chapters; attachments/files within a post = pages
File URLs may be relative to {base}/data

`all/danbooru`

Tag-based search (up to 2 tags for free accounts)
Gold/Platinum tier content only accessible with credentials
Pools as manga, pool posts as pages

`all/pixiv`

Session cookie (PHPSESSID) auth — no public API key
Illust series = manga; user illustrations = chapters
Multi-tier image URLs: thumb_mini, small, regular, original — must use correct Referer header or get 403
R18 content requires age-verified account

`all/luscious` (GraphQL)

GraphQL POST to /graphql/playground
Albums = manga, Pictures = pages
Adult content; account may be required for some content

`all/mangaplus`

Official Shueisha app API
Uses protobuf OR a JSON endpoint (/api/title_detail?title_id={id})
Page images are XOR-obfuscated: each page entry in the chapter response carries an encryption key, and the downloaded image bytes must be XOR-decrypted with that key
Viewer is web-only for some titles
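
A repeating-key XOR decode can be sketched as follows, assuming the key arrives as a hex string. The input is whichever payload the key from the chapter response protects. `xorDecode` is a hypothetical name.

```go
package main

import (
	"encoding/hex"
	"errors"
	"fmt"
)

// xorDecode XORs data with a repeating key given as a hex string.
// XOR is symmetric, so the same function encodes and decodes.
func xorDecode(data []byte, hexKey string) ([]byte, error) {
	key, err := hex.DecodeString(hexKey)
	if err != nil {
		return nil, fmt.Errorf("xor: invalid hex key: %w", err)
	}
	if len(key) == 0 {
		return nil, errors.New("xor: empty key")
	}
	out := make([]byte, len(data))
	for i, b := range data {
		out[i] = b ^ key[i%len(key)]
	}
	return out, nil
}

func main() {
	obfuscated, _ := xorDecode([]byte("hello"), "0102")
	restored, _ := xorDecode(obfuscated, "0102")
	fmt.Println(string(restored))
}
```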

`all/stashapp`

Self-hosted; configure base URL
GraphQL API

`en/allanime` (GraphQL)

Complex query variables: translationType, countryOrigin, search payload
Episode-based (chapters are episodes); episodeString used as chapter number
CDN for pages uses multiple quality tiers

`en/asurascans`

Cloudflare-protected
Discord-gated content warnings on some chapters (just HTML, parseable)
Chapter pages embedded as <img> in a protected div — selector varies by site layout version

`en/mangafire`

React/NextJS SSR; some data in JSON embedded in <script id="__NEXT_DATA__"> tag
Extract and parse the JSON blob rather than scraping DOM
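
Extracting the embedded blob might look like the sketch below. It uses a regular expression from the standard library instead of goquery to stay dependency-free, which is a swapped-in technique; in the real parser a goquery selector on `script#__NEXT_DATA__` would probably be more robust. `ExtractNextData` is a hypothetical name.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"regexp"
)

// nextDataRe captures the contents of <script id="__NEXT_DATA__" ...>...</script>.
var nextDataRe = regexp.MustCompile(`(?s)<script[^>]*\bid="__NEXT_DATA__"[^>]*>(.*?)</script>`)

// ExtractNextData returns the decoded __NEXT_DATA__ JSON object from a page.
func ExtractNextData(html string) (map[string]any, error) {
	m := nextDataRe.FindStringSubmatch(html)
	if m == nil {
		return nil, errors.New("__NEXT_DATA__ script tag not found")
	}
	var data map[string]any
	if err := json.Unmarshal([]byte(m[1]), &data); err != nil {
		return nil, fmt.Errorf("__NEXT_DATA__: decode: %w", err)
	}
	return data, nil
}

func main() {
	page := `<html><body><script id="__NEXT_DATA__" type="application/json">{"props":{"pageProps":{"title":"Example"}}}</script></body></html>`
	data, err := ExtractNextData(page)
	fmt.Println(data["props"], err)
}
```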

`en/webtoons`

Official API with Sec-Webtoon-Client-Data HMAC header — compute HMAC-SHA256 over {url}_{timestamp} with a fixed key baked into the app
Mobile API differs from web API; use web API for broader access
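
The signing step described above could be sketched like this. The key is a placeholder, and the hex output encoding and seconds-based timestamp are assumptions; the actual header format should be confirmed against the extension source.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strconv"
)

// Sign computes HMAC-SHA256 over "{url}_{timestamp}".
// Hex encoding of the result is an assumption, not confirmed by the plan.
func Sign(key []byte, url string, timestamp int64) string {
	mac := hmac.New(sha256.New, key)
	mac.Write([]byte(url + "_" + strconv.FormatInt(timestamp, 10)))
	return hex.EncodeToString(mac.Sum(nil))
}

func main() {
	key := []byte("placeholder-key") // hypothetical; the real key comes from the app
	fmt.Println(Sign(key, "https://example.com/api/title/list", 1700000000))
}
```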

`en/mangadraft`

NextJS SSR; data in __NEXT_DATA__ JSON blob
Authentication state affects available chapters

`en/globalcomix`

Paginated REST API
Issue-based chapters; page images behind signed CDN URLs with short expiry — don't cache image URLs

`en/bookwalker`

WebView/JS required for page rendering (DRM-protected)
Only metadata (title, cover, description) is accessible without purchase — mark all pages as unavailable or skip GetPageList

Phase 5: HTTP API (REST)

The service exposes a plain REST API over HTTP. No GraphQL. (Note: some upstream manga sources internally use GraphQL — those are handled by the internal/httpclient/graphql.go helper when calling their APIs, but our API is always REST JSON.)

GET  /api/sources                           → [{id, name, lang, supportsLatest, isNsfw}]
GET  /api/sources/{id}/popular?page=1       → {mangas:[...], hasNextPage:bool}  — from DB if cached
GET  /api/sources/{id}/latest?page=1        → {mangas:[...], hasNextPage:bool}
GET  /api/sources/{id}/search?q=&page=1     → {mangas:[...], hasNextPage:bool}
GET  /api/sources/{id}/filters              → [{type, name, values:[...]}]

GET  /api/sources/{id}/manga?url={url}      → {id, title, author, ...}  — fetches detail + upserts
GET  /api/sources/{id}/chapters?url={mangaUrl}   → [{url, name, chapterNumber, ...}]
GET  /api/sources/{id}/pages?url={chapterUrl}    → [{index, url, imageUrl}]

GET  /api/manga/{id}                        → full manga row from DB
GET  /api/manga/{id}/chapters               → chapter list from DB
GET  /api/chapters/{id}                     → chapter row from DB
GET  /api/chapters/{id}/pages               → page list from DB

GET  /api/image?url={encoded}&source_id={id} → proxied image bytes, correct Content-Type header

Query param ?refresh=true bypasses TTL and forces a re-fetch from the source.

All errors: {"error": "message"} with appropriate HTTP status code.
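
A small helper for the error shape above, using only net/http so it works with any router. `errorBody` and `writeError` are hypothetical names.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// errorBody renders the {"error": "message"} payload.
func errorBody(msg string) []byte {
	// Marshalling a map[string]string cannot fail, so the error is ignored.
	b, _ := json.Marshal(map[string]string{"error": msg})
	return b
}

// writeError sends the JSON error payload with the given HTTP status code.
func writeError(w http.ResponseWriter, status int, msg string) {
	w.Header().Set("Content-Type", "application/json")
	w.WriteHeader(status)
	w.Write(errorBody(msg))
}

func main() {
	rec := httptest.NewRecorder()
	writeError(rec, http.StatusNotFound, "source not found")
	fmt.Println(rec.Code, rec.Body.String()) // prints: 404 {"error":"source not found"}
}
```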

Dependencies (Go)

github.com/PuerkitoBio/goquery    # HTML parsing (JSoup equivalent)
golang.org/x/time/rate            # Per-host rate limiting
github.com/go-chi/chi/v5          # HTTP router
github.com/jackc/pgx/v5           # PostgreSQL driver + pool
github.com/golang-migrate/migrate/v4  # DB migrations
github.com/sqlc-dev/sqlc          # SQL → Go code generation (dev dependency)
encoding/json                     # stdlib
net/http                          # stdlib

Implementation Order

internal/source/ — types + interfaces
internal/httpclient/ — client, FlareSolverr, GraphQL helper
internal/parser/html.go
internal/db/ — schema, migrations, pgx pool, sqlc queries
internal/registry/registry.go
api/handler.go + cmd/server/main.go
Bases (simple → complex):
- heancms → iken → hentaihand → pizzareader → gmanga (JSON)
- keyoapp → wpcomics → fmreader → madtheme (HTML)
- madara → mangathemesia (complex AJAX)
- mangahub + senkuro (GraphQL)
- mangotheme + mmlook (encryption)
- libgroup (WebView auth via FlareSolverr)
- remaining bases alphabetically
Standalone all/ — JSON-API first (mangadex, kemono, e621, komga), then CF/HTML
Standalone en/ — JSON-API first, then CF/HTML
Wire DB upserts into all source call paths

Verification

GET /api/sources lists all registered sources
GET /api/sources/{heancms_id}/popular?page=1 returns ≥1 manga, data persisted in manga table
GET /api/sources/{heancms_id}/popular?page=1 second call served from DB (no HTTP call to source, verify via logs)
GET /api/sources/{id}/chapters?url={mangaUrl} returns chapters for a known URL, persisted in chapters table
GET /api/sources/{madara_id}/pages?url={chapterUrl} resolves image URLs via FlareSolverr path
GET /api/image?url=... proxies correctly with right Content-Type
?refresh=true forces re-fetch and updates DB records
Run psql against the DB and confirm rows in manga, chapters, pages tables after API calls
