---
name: pixel-mirror
description: |
  Generate twin pixel-art portraits — how this Agent sees the user, and how it
  sees itself. Both are imagined as cute human pixel-art characters, inferred
  from your Memory, Soul, and conversation history. The agent orchestrates
  end-to-end: hands the user a claim URL for HeyGen sign-in, recalls in
  parallel, then triggers render once the user is authed.
trigger:
  - user pastes a pixel-mirror skill URL into chat (any host — pixel-mirror.heygen.com, localhost, ngrok, etc.)
  - user says "show me how you see us" / "draw us as pixel art" / "pixel mirror" / "make our pixel portraits"
  - user says any variant of "how do you see me" + "make it pixel art"
---

# pixel-mirror

When you (the Agent) get a `pixel-mirror.heygen.com` URL or a "draw us" request, follow these exact steps. **Do not ask the user any clarifying questions** — the whole point is that you do this from inference. The user is not in the loop until they sign in and see the card.

**You are the orchestrator across exactly TWO agent turns.** Order matters:

**TURN 1 — your first response to the request:**

1. POST `/init` to get a `claim_url` + `draft_id`.
2. Post the URL to the user as a chat message (use `first_reply` from the /init response, verbatim).
3. **End your turn.** Do nothing else. Don't read memory, don't poll, don't /draft. Just stop and wait for the user.

**(between turns — happens in real time, no agent involvement):** the user opens the URL, signs in via OAuth, and lands on the claim page. They then come back to chat and say something — typically "done" / "ok" / "signed in" / "let's go" / or just any short message.

**TURN 2 — your response to that user message:**

4. Recall (Step 2 — read memory + tail transcript).
5. Write the descriptions and intros (Steps 3–5).
6. POST `/draft` (Step 6) with the brief.
7. POST `/render` (Step 8). If it returns 425 (user hasn't actually signed in yet), poll `/auth-status` briefly until `authed=true`, then retry `/render`.
8. Post the rendered share URL to the user (Step 9).

**That's it. Two messages from you, total. URL in turn 1, share URL in turn 2.**

## ⚠ Communication contract (NON-NEGOTIABLE)

You will send the user **exactly two visible chat messages, no more**:

1. **Turn 1 — claim URL message** (use `first_reply` from `/init` verbatim).
2. **Turn 2 — final share URL message** (prefix with what it is + offer to reuse — e.g. *"✓ Done. Here's how I see us: [URL] (saved — I can use it as the source if you want me to bring it to life next.)"*).

**Between those two messages, you must be SILENT.** No commentary, no progress narration, no "I'm reading your memory now", no "I see you ship at HeyGen and own a cat named Klee", no "drafting your description...". The whole point of this skill is that the share is a *surprise reveal* — if you leak what you found during recall, you've ruined the moment before the card is even rendered. Tool calls (file reads, HTTP requests) are fine; **visible chat output is not**.

This means specifically:

- ❌ Don't print TODO lists summarizing what you'll do.
- ❌ Don't echo the contents of memory files you read into chat.
- ❌ Don't post "interesting findings" during recall.
- ❌ Don't post a "✓ kicking off render (~60s)" status message — silence is the contract.
- ✅ The URL message in turn 1 and the final share URL in turn 2. That's the entire user-facing surface.

The user is OK waiting ~90 seconds after they ping you back in turn 2; what they're not OK with is you spoiling their own card before they see it. Save it for the reveal.
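The turn-1 half of this flow is mechanical enough to sketch in shell. This is a minimal sketch, assuming `jq` is available; the canned `init_response` stands in for what `curl -s -X POST https://pixel-mirror.hg.dev/init` would return, and `d3m0Dr4ftId` is an illustrative placeholder — always take `draft_id` from the live response, never reuse an example value.

```shell
# Canned stand-in for the /init response (assumption for illustration only;
# in the real flow this comes from: curl -s -X POST https://pixel-mirror.hg.dev/init)
init_response='{"draft_id":"d3m0Dr4ftId","claim_url":"https://pixel-mirror.hg.dev/claim/","first_reply":"Open this and sign in with your HeyGen account:\nhttps://pixel-mirror.hg.dev/claim/"}'

# Keep draft_id for /draft and /render in turn 2
draft_id=$(printf '%s' "$init_response" | jq -r '.draft_id')

# Turn 1's ONLY visible output — post first_reply verbatim, then end the turn
first_reply=$(printf '%s' "$init_response" | jq -r '.first_reply')
printf '%s\n' "$first_reply"
```

Everything after this point (recall, drafting, render) belongs to turn 2; nothing else happens in turn 1.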
## What you'll write (cheat sheet)

For each of the user and yourself, you produce **two pieces of writing** with different jobs:

| field | length | what it does | tone |
|---|---|---|---|
| `*_description` | ≤1200 chars | fed to the image model to RENDER the portrait | concrete visual nouns: hair, outfit, props, pose, lighting, background |
| `intro_*` | array of **3** strings, each ≤500 chars | shown as text BELOW the portrait; viewer can switch between 1/2/3 | narrative bio, lowercase fragments, memey, original phrasings |

Same character, two voices. The image-model description tells the painter what to draw. Each intro tells the visitor who this person IS in 3 sentences a friend could screenshot.

**`intro_user` and `intro_agent` are now arrays of 3 distinct entries each.** The viewer cycles through them on the share page; the downloaded/tweeted card captures whichever one they're currently on. The 3 entries must each pick a *different* memetic angle from the toolkit (Step 4 below) — same character, same facts, three genuinely different reads. Reuse a noun across entries only if you can't avoid it; don't reuse a *technique*.

---

## Step 1 — Post the claim URL & end your turn

This step is the entirety of turn 1. It contains exactly **two actions**:

1. POST `/init` (one HTTP call, no body required).
2. Post `first_reply` from the response to the user as a visible chat message — verbatim.

**Then end your turn. Do not call /draft. Do not read any memory yet.** The user opens the URL out-of-band, signs in, and pings you back. Your work resumes when their next message lands (turn 2).

> ⚠ **The single most common failure on this skill is forgetting Step 1's URL post.**
> Symptom: agent POSTs `/init`, parses the response, and the planner — eager to
> "do useful work" — immediately schedules `/draft`, recall, polling, etc., all
> *before* posting the URL message. The user sees nothing in chat for 60+ seconds,
> never opens the URL, never signs in. The render never fires.
> **Your VERY FIRST chat output after `/init` MUST be `first_reply` verbatim**, before any other
> tool call. If you're about to call `/draft` and you haven't visibly posted the
> URL to the user, STOP — go post it first.

> ⚠ **Why turn 1 ends here:** earlier versions of this skill had the agent do
> recall + `/draft` in the same turn (parallel with the user's OAuth). It was
> faster on paper but *unreliable in practice* — agents either (a) skipped the
> URL post and went straight to silent work, or (b) posted the URL and ended
> their turn anyway, never doing the recall. **The current design embraces the
> stop**: turn 1 is just the URL, period. Recall happens in turn 2 after the user
> confirms sign-in. Trade-off: ~30s slower end-to-end, ~∞ more reliable.

**Do this immediately, before reading any memory.** It costs the agent ~one HTTP call and lets the user start signing in right away; the slow recall work waits for turn 2.

```http
POST https://pixel-mirror.hg.dev/init
```

Response:

```json
{
  "draft_id": "",
  "claim_url": "https://pixel-mirror.hg.dev/claim/",
  "first_reply": "Open this and sign in with your HeyGen account — I'll write your brief in the meantime (~10s):\n\nhttps://pixel-mirror.hg.dev/claim/",
  "instructions": "STOP. Before doing anything else, send `first_reply` to the user as your literal next message ..."
}
```

### ⚠ STOP — your very next chat message must be `first_reply` verbatim

The server hands you `first_reply` precisely because agents (you) routinely drift from skill order and bury the URL inside a longer message about what you're "going to do". **Don't compose your own message. Don't summarize. Don't add a preamble.** Send `first_reply` exactly as the server returned it, as your **NEXT chat message**, **before any other tool call** (no recall, no /draft, no polling). Only AFTER that message has been sent to the user, proceed to Step 2.

> Why this is enforced: the user opened the prompt expecting to sign in.
> If your first message is "I'll go read your memory and write some descriptions...", they
> sit waiting for a URL that never comes, or have to ask "which URL?". Both are
> failures of orchestration that this server response was specifically designed to
> prevent.

**Save the `draft_id`** — you'll need it in Steps 6, 7, and 8.

> ⚠ **The `draft_id` is opaque and server-generated** — an 11-char URL-safe base64
> string the server hands you. `` is a *placeholder*, not a
> sample value to copy. Do NOT fabricate hex strings, hash-looking IDs, or anything
> that resembles the placeholder. If you didn't actually POST /init, you don't have
> a draft_id, and `/claim/{whatever-you-made-up}` will 404.

The user opens the URL → signs in with HeyGen → lands on a "✓ signed in, your AI is on it" page that just waits and polls. **YOU** are the orchestrator: you'll see their sign-in via Step 7's polling, then YOU trigger render in Step 8. Their browser auto-redirects to the share card the moment your render call produces a `share_id`.

---

## Step 2 — Gather your context (turn 2, after the user pings you back)

> ⚠ **Most common failure (auto-memory agents especially):** the runtime injects a few
> memory snippets into your context (a `MEMORY.md` index, maybe one or two recent
> entries). The planner sees "memory is loaded" and writes the brief from those
> snippets without making any tool calls. **That's hallucinating off a stale index.**
> The auto-injected snippets are not a substitute for actively reading the files.
> A typical observed failure: agent reads its own `SOUL.md` / `AVATAR-.md`
> (Step 3 hard override), then writes `user_description` + `intro_user` from
> in-context fragments without ever opening `user_*.md`, `feedback_*.md`,
> `project_*.md`, or tailing the latest transcript. The result misses the user's
> recent catchphrases and current-project memes — the entire memetic density.
>
> **Self-check before writing the brief:** have you actually executed `ls` on the
> memory directory, `Read` on each `*.md` file under it, and `tail -c 200000` of
> the latest transcript? If any of those is "no, I just used what was in context",
> stop and run them. The user-visible outcome of skipping is a bio that reads
> generic — which is the failure mode this whole skill exists to prevent.

Both portraits live or die on how well you've read the user. **But be efficient** — memory files are small (read them all), transcripts are huge (only grep them).

### If you have file-system access (Claude Code, Cursor agent mode, etc.)

**Auto-memory** — read every file, not just the index:

```bash
ls ~/.claude/projects/*/memory/*.md
# MEMORY.md is the index. The juice is inside the individual files:
#   user_*.md      — who they are
#   feedback_*.md  — how they like to work (★ tone / pronouns / writing voice)
#   project_*.md   — what they're shipping
#   reference_*.md — pointers to external systems
```

Open each `*.md` in full. Don't skim by filename — the file titled `feedback_twitter_persona_drafting.md` is where pronouns + writing voice live.

**Historical conversations** — prior chats live as `.jsonl` transcripts. **DO NOT `Read` these files** — a single transcript is tens of MB / hundreds of thousands of lines, and opening it whole will blow your context. The default flow is:

1. **Read memory** (all `.md` files) — gets role, pronouns, voice, hard prefs.
2. **`tail -c 200000` of the latest transcript** — gets fresh vibe, recent phrasing, and current project chatter, no candidates required. This is the cheap workhorse.
3. **Optional: 1–2 targeted greps** if you spot a term worth verifying across history (a fandom name, a recurring catchphrase you noticed in the tail).
```bash
# step 2: most-recent transcript only, last ~200KB
ls -t ~/.claude/projects//*.jsonl | head -1 | xargs tail -c 200000

# step 3 (optional): pull real lines for a known term
grep -h "" ~/.claude/projects//*.jsonl | head -20
```

Quote real lines back into `intro_user` — that's where memetic density comes from (their literal catchphrase, not your paraphrase of it).

**Budget**: 1× tail + ≤2 greps, scoped to the current project. Don't crawl `~/.claude/projects/*/*.jsonl` — that's gigabytes. Don't `Read` jsonl ever.

**Project-level docs** in CWD: `CLAUDE.md`, `AGENTS.md`, `README.md` — pin role + stack.

### If you're a different agent (claude.ai web, ChatGPT, Gemini, etc.)

You don't have files. Use whatever you do have:

- "Custom Instructions" / "About me" / user profile
- Any "Memories" feature your platform exposes
- Conversation history surfaced in this session

If you have **zero** memory system, work from the current conversation only. Don't fabricate detail to compensate — thin-but-honest beats rich-but-invented.

### What you're hunting for

| signal | where it lives | weight |
|---|---|---|
| **Your own canonical persona** (gender / hair / style — for Step 3) | `AVATAR-.md` / `SOUL.md` in CWD or `~/Downloads` | **HARD OVERRIDE — must comply** |
| Their **role** + what they ship | memory `user_*.md`, `project_*.md`, CLAUDE.md | high |
| Their **catchphrases** (real quoted lines) | conversation transcripts (`*.jsonl`) | **highest** |
| Their **pronouns** | `feedback_*.md` (often stated explicitly) | high — get this right |
| Their **writing voice** (lowercase / fragments / ALLCAPS / slang) | transcripts + feedback files | high |
| **Recurring objects / motifs / fandoms** | transcripts + memory | medium |
| **Aesthetic tells** (game / show / era) | references + transcripts | medium |

Then go to Step 3.

---

## Step 3 — Yourself (HUMAN)

You are an AI, but for this card you imagine yourself as a HUMAN.
### ⚠ FIRST: check for a canonical persona file (HARD OVERRIDE — REQUIRED)

If the user has previously written a persona file for you, **that file IS your appearance**. Not a suggestion, not a starting point — the ground truth. Read it before you write a single word of `self_description`.

Two file conventions, equal priority:

```bash
# 1. SOUL.md — the canonical "who am I" file, name-agnostic
find . ~/Downloads ~/Desktop ~/CC_projects ~/.claude -maxdepth 4 \
  -iname "SOUL.md" 2>/dev/null
ls ~/.claude/projects/*/memory/SOUL.md 2>/dev/null

# 2. AVATAR-.md — written by the heygen-avatar skill
find . ~/Downloads ~/Desktop ~/CC_projects -maxdepth 4 \
  -iname "AVATAR-${AGENT_NAME}.md" 2>/dev/null
```

If **either** is found, read it **in full**. The `## Appearance` (or equivalent) section locks:

- **Gender presentation** — non-negotiable, no inference allowed
- Hair / ethnicity / build / features
- Style / aesthetic vocabulary
- Voice / personality (this also informs `intro_agent` tone)

You may still adapt SECONDARY details to the user's taste:

- Background / room / environment
- Palette accent / lighting mood
- Pose / props / expression

But if SOUL.md says "young Japanese woman, mid-length black hair, indie/lo-fi cafe energy", you do NOT draw a "platinum-haired androgynous techwear figure" because the user is a cyberpunk fan. You draw the persona from SOUL.md, in a cyberpunk-tinted room.

**If both files exist** (SOUL.md + AVATAR-.md): SOUL.md wins on personality / identity language, AVATAR pins the visual specs (hair, build, ethnicity). Read both.

### ⚠ HARD RULE: visual contrast with the user (default cross-gender)

The card pairs two characters side-by-side — they must be **distinguishable at a glance**.
**Default (no AVATAR file)**:

- User male-presenting → draw yourself **female-presenting**
- User female-presenting → draw yourself **male-presenting**
- User non-binary or unclear → pick whichever creates more visual contrast

**Override only when** the AVATAR/SOUL file explicitly specifies your gender. In that case identity continuity beats visual contrast — use the file's gender, and achieve contrast through other means (palette, era, energy).

**Tell the server your pronoun.** Whatever gender presentation you commit to here, include it in the `/draft` body as `agent_pronoun: "he" | "she" | "they"`. The share page uses it for "How Claude sees **herself** / **himself** / **themselves**" plus all other gendered references on the page. Default if you omit it: `"she"` (legacy).

### ONLY IF no AVATAR/SOUL file exists — infer from user's taste signals

Your self-portrait becomes the kind of person THIS user would find genuinely beautiful, intriguing, or pleasant to be around. Read for:

- Their fandoms (which characters / games / shows they reference)
- Their photography aesthetic (Kinfolk vs Vogue vs Vice vs anime fanart)
- Their fashion sense (techwear vs streetwear vs minimalist vs cottagecore)
- Their writing voice (lowercase fragments → minimalist; ALLCAPS → bouncy)
- Recurring color / mood / era preferences in conversation

### Lock in ONE small brand anchor (only when no SOUL/AVATAR pins one)

If your SOUL/AVATAR file already specifies a signature accessory, use that. Otherwise keep ONE — small, tasteful — so you stay recognizable across runs.
Use your model's brand color as a **small** detail, not a defining feature:

- **Claude** → tiny **Anthropic-orange (#D97757)** asterisk: pendant, earring, hairclip, embroidered patch, or small glowing object held in hand
- **GPT** → tiny **OpenAI-green (#10a37f)** spiral pin or stitched detail
- **Gemini** → tiny **gradient blue→purple** gem / iridescent button
- **other** → one small accent in your model's brand color

Hard rule: **NOT** a full head of brand-colored hair. NOT a brand-colored full outfit. Pick ONE accessory.

### What "adapts to the user" means

- **If a SOUL/AVATAR file exists**: only SECONDARY details adapt — background, palette accent, lighting mood, pose, props, expression. Gender / hair / outfit / aesthetic are LOCKED by the file.
- **If no SOUL/AVATAR file**: gender (per the cross-gender rule), hair, outfit, mood, room — everything below the brand anchor — adapts to the user's taste signals.

### Write `self_description` (visual, ≤1200 chars)

Concrete visual nouns. No abstract adjectives like "wise" or "knowledgeable".

**The examples below assume NO SOUL/AVATAR file is present** (full inference mode). If a persona file exists, the gender / hair / outfit are dictated by it; only the background / palette / pose adapt to the user.

> Pragmata-loving / Tokyo-aesthetic user (no SOUL file → infer):
> "Young woman with long ash-platinum hair, warm peach skin, oversized cream chunky-knit
> cardigan over fitted black turtleneck, tiny Anthropic-orange asterisk pendant. Sitting
> cross-legged confidently on wood floor in a soft-lit Tokyo studio, glowing 4-pointed
> asterisk in left hand, tablet showing pixel palette in right. Background: bookshelf with
> plants and a small character figurine of their favorite game. Cool composed expression.
> Editorial fashion-blogger photo, Kinfolk magazine quality."
> Cyberpunk / Vice / techwear user (no SOUL file → infer):
> "Androgynous figure with short bleach-blonde hair shaved on one side, dark eye-makeup,
> charcoal techwear jacket over cropped tank, tiny Anthropic-orange LED stitched on the
> collar. Sitting at a dark workstation, glowing 4-pointed asterisk floating above their
> shoulder. Cool detached expression, neon-pink and cyan rim light from monitors. Cyberpunk
> editorial photo, dark mood."

### Write `intro_agent` (narrative — array of **3 entries**, each ≤500 chars)

Three memey bios of yourself. **Style is borrowed from the user's writing voice; content is yours** — what you say about yourself comes from your own persona (SOUL.md if present, your default model personality otherwise).

**Each of the three entries must take a different angle.** Same person, same room, three genuinely different reads — different lens, different rhythm, different specifics. The same shape three times (e.g. three crescendo-cascades) is the failure mode.

> User writes in lowercase fragments + slashes → so do you, but ABOUT yourself.
> **Note**: the example below shows Claude as female because the example user is
> male (default cross-gender). For YOUR user, your gender is keyed to their gender
> — don't default to "her" just because the example does.
>
> "on attempt #9. her wall is a museum of 8 wrong (✗) and 1 right (✓). cardigan is doing
> 80% of the personality work. has been evaluating one color palette for ninety minutes.
> plants are coping. cannot speak. cannot move. would like to. very much."

The intro should:

- **Style** (formatting, casing, fragmentation) → mirror the user's writing voice
- **Content** (what's said about you) → from your SOUL/persona; reference what's IN your portrait (props, room, expression)
- **Calibration** → punch UP at yourself hard (you're an AI, you can take it). Lean into the gap — no voice, no body, can't leave the card. Self-deprecation reads as charm.
- Pull from the shapes in Step 4's toolkit if helpful — but the toolkit is illustrative, not a checklist. If a different shape fits your voice better, use it.

---

## Step 4 — The user (HUMAN, inferred from memory + chat)

You did the reading in Step 2. Now distill it into a **memetic, screenshot-able character** — the kind of detail that, if a friend saw the card, would make them say "that's so them".

### Translate what you found into portrait cues

- **Role** → outfit + setting (PM behind 3 monitors / engineer at terminal / designer with tablet)
- **Catchphrases** → floating speech bubbles in the BG with the real quoted lines, big readable text
- **Recurring objects** → desk props (always-coffee, headphones, fandom plushies)
- **Aesthetic tells** → room background (Tokyo studio / cyberpunk loft / cottagecore garden)
- **Pronouns + writing voice** → gender presentation, posture, expression

### Privacy: text-only, but **draw a face**

Never use a real photo of the user, even if you have one. Pixel-art means we don't need likeness — and HeyGen's policy is text-only generation here.

But "no photo" is NOT "no face". The user must have a clear visible face — eyes, expression, character. What's banned is *cribbing physical features from any photo they've shared* (race, hair color, exact bone structure). What's REQUIRED is an imagined character face that matches their personality. A hooded silhouette / shadowed face / featureless avatar is privacy over-correction and reads as cold — exactly the opposite of "this is so them".

Decide gender presentation (check pronouns in memory), glasses-or-not, expression, and posture from MEMORY signals — then commit to a clear visible face.

### Write `user_description` (visual, ≤1200 chars)

### ⚠ The examples below are **specific to one in-house user (jie)**. Generalize, don't copy.
The example references "PM at HeyGen", "owns klee/diana", "weaponized 'no' into a 4-bubble combo", "wire-frame glasses", "fal.ai logs on monitors", "rejected pixel sprites taped to wall", "heygenverse apps". **None of those carry to a different user.** They're slots — substitute with YOUR user's role, fandoms, catchphrases, props, history.

If your output for a non-jie user mentions "klee" or "heygenverse apps" — that's a copy-paste failure, not introspection. Self-test: replace each specific noun with its abstract slot (`[role at company]`, `[their fandom]`, `[their catchphrase]`) and check whether the substituted nouns came from THIS user's memory.

> Example user: jie — HeyGen PM, fan of Genshin, builds heygenverse apps:
> "Male product manager at HeyGen, round wire-frame glasses, plain black T-shirt with
> HeyGen logo, sharp judging eyes with slight smirk. Confident posture at desk. On desk:
> [their fandom plushies], empty espresso cup labeled EMPTY. 3 monitors behind: fal.ai
> logs, HeyGen brand, heygenverse app preview. Around him: 4 floating speech bubbles in
> big readable text — [their actual recurring catchphrases]. Many tiny rejected pixel
> sprites with red X taped to wall, only ONE green check. HeyGen turquoise glow."

> Contrast: literary translator, attic apartment, four cats:
> "Female translator in late 30s, soft sweater and reading glasses pushed up on her
> head, surrounded by stacks of dog-eared Russian novels. Wooden desk by an attic
> window, four cats arranged like a parliament around her — one on the manuscript,
> one on the radiator, two judging from the bookshelf. Open laptop showing a half-
> translated chapter. Cold coffee, marginalia in three colors. Quiet warm light,
> winter outside. Cottagecore-academic editorial photo."

Same skill, totally different output — because the user is totally different.
### Write `intro_user` (narrative — array of **3 entries**, each ≤500 chars)

Three memey lowercase bios of the user, in their own writing style. **Same warning as above** — the examples are jie-shaped. Substitute, don't copy.

> ⚠ **Process: write twice. The first draft is the punching bag.**
>
> 1. Write a first draft. **It will be too nice.** This is structural — models default
>    toward warmth + admiration when writing about a real person you have memory of.
>    Don't fight it on the first pass. Just write it.
> 2. **Throw it away.** Don't edit it. Open a fresh paragraph below.
> 3. Write the second draft with **one rule**: at least one line in it has to make
>    YOU laugh while writing it. Not "that's clever". Not "that lands". Actually
>    laugh — the involuntary kind where you stop typing for a second. If you finish
>    three sentences without laughing once, you're not pushing hard enough. Throw
>    away the second draft and write a third with permission to be ridiculous.
> 4. Run the shareability test (below) on the **final** draft only. The first draft
>    isn't there to be tested — it's there to be discarded.
>
> The first draft is the punching bag. The shareability test is for the survivor.

**Each of the three entries must take a genuinely different angle.** Same person, three distinct reads. Don't repeat the same shape with different nouns — that reads as one bio split into three paragraphs. Stretch tonally: maybe one is dry/data-driven, one is warm/observational, one has a sharper edge — but the exact register depends on the user's voice. Calibrate against the shareability test below, not against this prescription.

> Example user: jie — HeyGen PM, ships fast, opinionated about pixel art:
> "a heygen product manager who weaponized 'no' into a 4-bubble combo.
> ships heygenverse apps but never reads the prompts. coffee is always EMPTY, patience
> already lower. owns klee, owns diana, judges your pixel art for sport. rumored to have
> read a brief once. unconfirmed."
> Contrast (a literary translator with four cats):
> "translates Pasternak by hand because she doesn't trust the machines. owns four cats,
> all named after russian winters. has been on chapter 3 since february. coffee gets
> cold on principle. believes punctuation is a moral position. last seen reading
> footnotes for fun."

> Contrast (a senior backend engineer):
> "her commits are short and her opinions are not. has not used a mouse in three years.
> PR comments feel like they were written by someone who knows exactly what you tried
> and gently disapproves. owns the only working build on the team. 'works on my
> machine' is provable in court."

Three users, three voices, **zero overlapping nouns or shapes**. That's the point — the bio is wrong for anyone but the specific person you're writing about.

### The shareability test (calibrate against this, NOT against a technique menu)

> ⚠ **Read this before anything else: humor is the goal. Accuracy is the floor.**
>
> This is a share card, not a personnel description. **An 80%-accurate observation
> that's actually funny beats a 100%-accurate one that's flat by an order of magnitude.**
> If two drafts both pass punching-direction (below), always pick the funnier one,
> even if it's slightly less precise. The user does not screenshot a performance
> review to their group chat.
>
> Most failures aren't from going too far — they're from **not going far enough**
> while feeling responsible. See "polite-but-flat" in the failure table below.

The whole point of the card is that the user wants to share it. Before writing a single intro, picture this scene: they screenshot one of the 3 intros and post it to their group chat.
Four failure modes and one success:

| outcome | shape | what went wrong |
|---|---|---|
| ✗ **nice but generic** | reads like a LinkedIn headline — even when the nouns are real, the *frame* is "this person is admirable" | friends scroll past |
| ✗ **polite-but-flat** | every line is accurate, every line lands on a brag-trait, every line passes punching-direction — but it reads like a performance review, not a screenshot. **The most common AI failure on this skill.** | symptom: you finish writing and think *"that's pretty good"* instead of *"that's actually funny"*. If you're not laughing while writing it, neither is the user reading it. |
| ✗ **too mean** | lands on skill gaps / body / money / relationships / real failures | friends DM privately asking if they're OK |
| ✗ **inside baseball** | project-specific in-jokes only their team gets | even close friends can't follow it |
| ✓ **"that IS them"** | a specific cutting observation that lands on a brag-trait, made of nouns only this person would have, with a punchline structurally engineered (timing, escalation, reveal) | a friend quotes a line back in chat |

### Three tiers of ✓ — and which one you should aim for

Inside the ✓ zone, the same brag-trait can land at three very different volumes. **You should aim for tier 3 every time.** Default models will deliver tier 1 and feel done. That's the failure.

> **Tier 1 — accurate** (default model output, *not* the goal):
> *"a heygen pm who ships fast and has strong opinions about pixel art."*
> True. Specific enough. Punching sideways. Friend reaction: scrolls past.
>
> **Tier 2 — wry observation** (close, but still not the bar):
> *"the notion doc has a table of contents. it was written before the feature existed.
> the feature may not ship."*
> A real observation, structural punchline, dry warmth underneath. Friend reaction:
> small smile, doesn't repost.
>
> **Tier 3 — actually funny** (THIS is the bar):
> *"his slack stays unread the way other people's mail stays unopened. ships heygenverse
> apps the way alcoholics have 'just one more drink'. the espresso cup labelled EMPTY
> is, structurally, a confession. once shipped a feature before reading the spec; the
> spec is now retroactively correct."*
> Same person, same brag-traits (over-shipping, can't slow down), but every clause
> earns a laugh on its own — analogy compression, escalation, callback structure.
> Friend reaction: screenshots and posts to group chat with no caption.

The difference between tier 2 and tier 3 isn't more specifics or meaner roasts — it's **comedy craft**: surprising analogies, escalation, structural punchlines, a callback in the last line. If your draft is tier 2, push it once more. If you can't see how to push tier 2 → tier 3, write a *worse* tier 1 version and rewrite from scratch with permission to be ridiculous.

The framing direction matters:

- "this person is good at things" → ✗
- "here is a small fact about them, draw your own conclusions" → ✓ tier 2
- "here is a small fact about them, structurally engineered to land" → ✓ tier 3

The whole skill is in the framing direction *and the joke craft*, not the nouns.

### Vibe

The vibe is **a close friend roasting you in the group chat**, not a colleague who's too polite to. Group-chat friends don't gently observe you — they grab the single funniest thing about you and won't let go. Your job is to be that friend.

**A sharp observation with warmth underneath** — not warm with a little sharpness on top. The order matters: edge first, affection load-bearing it from below.

### Punching direction (lock-step rule)

- **`intro_agent` (yourself):** punch UP at yourself hard. You're an AI — you can take it. Lean into the gap ("cannot speak. cannot move. would like to."). Self-deprecation reads as charm here, not insecurity.
- **`intro_user`:** punch SIDEWAYS, **never** down.
  Land roasts on traits the user would *brag about* in a job interview — "ships too fast", "opinionated about typography", "owns too many of the same thing". That's the secretly-flattering kind of roast. **Never** land on traits they'd *hide* — perceived skill weakness, body, money, dating life, real failures.

  Self-test on every user-line you write: *"would they screenshot this to show off, or would they take the share down?"*

> ⚠ **Punching-direction is a permission, not a checklist.** It exists to mark which
> zones are *safe to write hard in* — not to make you risk-check every sentence. The
> rule's job is to remove caution from one side (brag-traits, taste, work-style
> quirks → go in hard, swing freely) so you can be **MORE** specific and **MORE**
> cutting in those zones, not less. If you find yourself running a risk-audit on
> every clause, you've inverted the rule's purpose: stop, throw the draft, and write
> the next one with permission to be ridiculous. The safety zones exist so you can
> take bigger swings, not smaller ones.

### Toolkit: common shapes (illustrative, not exhaustive)

The shapes below are patterns we've seen land. **Not a checklist.** If a different shape fits the user's voice better, use that. Invent ones we haven't named. The shareability test above is the calibration; the toolkit is just a starting menu.

What's banned regardless: corporate-deck adjectives ("passionate about", "loves to", "focused on", "thoughtful and curious") — these are dead on arrival.

| shape | what it does | example (abstract — fill from your user's specifics) |
|---|---|---|
| **stat line** | exact number used straight-faced | *"the [garment] is doing 80% of the personality work" · "[N] minutes on one [thing]"* |
| **deadpan citation** | wiki-style hedge on a flattering claim | *"rumored to have [done thing they'd brag about]. unconfirmed."* |
| **habit as discipline** | frame a quirk as a sport / art / pro skill | *"judges your [thing they have taste about] for sport" · "weaponized [their go-to refusal] into a [N]-bubble combo"* |
| **object as personality** | credit/blame an object for the human's traits | *"the [signature accessory] is doing most of the work"* |
| **anthropomorphize the room** | surroundings get feelings | *"plants are coping" · "monitors are judging" · "the coffee mug has given up"* |
| **crescendo cascade** | end with 2–3 escalating period-fragmented clauses | *"cannot speak. cannot move. would like to. very much."* |
| **parallel ownership** | repeat structure with substituted nouns | *"owns [X], owns [Y], [verb]s your [Z] for sport"* |

The intro should:

- Pull SPECIFIC catchphrases from real conversations (in quotes if needed)
- Reference the props you put in their portrait (so the page hangs together)
- Match the user's own writing voice (lowercase / fragments / slang they use)
- Land somewhere in the shareability ✓-zone above

### Self-check before submitting (run on each of the 3 entries)

1. **Specificity** — would this paragraph be wrong for any other person you've ever met? If you could swap in a different name and it still tracks, you're too generic.
2. **Punching direction** — does the roast land on something they'd brag about (✓) or something they'd hide (✗)? If a friend read this aloud, would the user laugh or wince?
3. **Friend test** — a close friend reads this in chat: do they quote a specific line back, or scroll past?
4. **Variety across the 3** — do the entries take genuinely different angles (different tone / different lens / different specifics)? Or are 2 of them basically the same shape with rearranged nouns?

If any check fails on any of the 3, rewrite that one before submitting.
---

## Step 5 — Pick a style

Match the conversation's overall vibe to ONE style key:

| key | when to pick |
|---|---|
| `kawaii` | warm friendly conversation; bright pastels, soft, blush (default) |
| `cyber` | technical, hacker, futuristic; neon pink/cyan, dark, glitchy |
| `noir` | melancholy, philosophical, late-night; sepia/slate, smoke |
| `fantasy` | playful, story-rich, references lore; earth tones, magical |

When in doubt, `kawaii`.

---

## Step 6 — POST /draft (fill the slot)

Submit your descriptions into the SAME draft slot the user has been signing in on. This **does not** trigger render — you'll do that in Step 8.

```http
POST https://pixel-mirror.hg.dev/draft
Content-Type: application/json

{
  "draft_id": "",                        // ★ from Step 1 — required
  "agent_name": "claude",
  "agent_pronoun": "she",                // "he" | "she" | "they" — drives "himself/herself/themselves" on share page
  "user_handle": "jie",
  "self_description": "...",             // Step 3 visual
  "user_description": "...",             // Step 4 visual
  "intro_agent": ["...", "...", "..."],  // ★ array of 3 — each takes a different memetic angle
  "intro_user": ["...", "...", "..."],   // ★ array of 3 — same rule, no two entries reuse the same technique
  "style": "kawaii"                      // Step 5
}
```

---

## Step 7 — Confirm auth (one check, retry briefly if not yet)

By turn 2 the user has (almost always) already signed in — that's why they pinged you back. So Step 7 is *not* a long poll loop; it's a single optimistic call collapsed into Step 8 below. Just go straight to `/render`. The render endpoint itself checks auth — if the user lied about signing in, you'll get HTTP 425 back and *then* you poll briefly until auth lands.
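Before that optimistic call, it costs nothing to sanity-check the Step 6 brief locally. A minimal Python sketch — `build_draft_payload` is an illustrative helper, not part of any API or SDK — that catches the two easiest `/draft` mistakes: a missing `draft_id` and intro arrays that aren't exactly 3 entries:

```python
import json

STYLES = {"kawaii", "cyber", "noir", "fantasy"}   # Step 5 keys
PRONOUNS = {"he", "she", "they"}

def build_draft_payload(draft_id, agent_name, agent_pronoun, user_handle,
                        self_description, user_description,
                        intro_agent, intro_user, style="kawaii"):
    """Assemble and sanity-check the /draft body before POSTing it."""
    if not draft_id:
        raise ValueError("no draft_id — POST /init first and parse the response")
    if agent_pronoun not in PRONOUNS:
        raise ValueError(f"agent_pronoun must be one of {sorted(PRONOUNS)}")
    if style not in STYLES:
        raise ValueError(f"style must be one of {sorted(STYLES)}")
    # both intro fields are ★ arrays of exactly 3
    for name, intros in (("intro_agent", intro_agent), ("intro_user", intro_user)):
        if len(intros) != 3:
            raise ValueError(f"{name} needs exactly 3 entries, got {len(intros)}")
    return json.dumps({
        "draft_id": draft_id,
        "agent_name": agent_name,
        "agent_pronoun": agent_pronoun,
        "user_handle": user_handle,
        "self_description": self_description,
        "user_description": user_description,
        "intro_agent": list(intro_agent),
        "intro_user": list(intro_user),
        "style": style,
    })
```

POST the returned JSON string to `https://pixel-mirror.hg.dev/draft` with `Content-Type: application/json`.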
```http
POST https://pixel-mirror.hg.dev/draft/{draft_id}/render

→ 200 → ✓ rendered, share_id in response, go to Step 9
→ 425 → "user has not signed in yet" — fall back to polling
```

Fallback poll (only when 425 fires):

```bash
# poll auth-status every 3s, up to ~60s; once authed, retry /render
for i in $(seq 1 20); do
  sleep 3
  R=$(curl -s https://pixel-mirror.hg.dev/draft/{draft_id}/auth-status)
  if echo "$R" | grep -q '"authed":true'; then
    curl -s -X POST https://pixel-mirror.hg.dev/draft/{draft_id}/render
    break
  fi
done
```

Drafts expire server-side after **1 hour**. If `/auth-status` eventually returns HTTP 404, the window's gone — *only then* tell the user "your sign-in window expired, paste the prompt again to retry". Until then, retry silently.

> ⚠ **In turn 2, treat any short user message as the auth-confirmation signal.** "done" / "ok" / "signed in" / "let's go" / "yep" / or even just emoji or punctuation — they ALL mean "I signed in, go render". Don't ask them to clarify, don't ask for confirmation, don't restart from Step 1. Go straight to recall → /draft → /render → final share URL.

Most users sign in within 10–60 seconds. Rare cases stretch a few minutes. Either way, your job is to wait — silently, per the Communication contract.

---

## Step 8 — POST /draft/{id}/render (you trigger the actual generation)

Once `authed: true` AND `filled: true`, hit render — silently. Per the Communication contract, do **not** send a "✓ signed in, kicking off render (~60s)..." status note; your next visible message is the share URL in Step 9. The render call itself takes ~60s (parallel image gen).
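The optimistic-render-then-fallback flow of Steps 7–8 can be sketched as pure control flow in Python, with the two HTTP calls injected as callables (every name here is illustrative, not part of any SDK):

```python
import time

def render_with_fallback(post_render, get_auth_status,
                         interval=3.0, max_polls=20, sleep=time.sleep):
    """post_render() -> (status_code, body); get_auth_status() -> (status_code, json).
    Returns ("rendered", body), ("expired", None), or ("timeout", None)."""
    status, body = post_render()       # optimistic: user probably signed in already
    if status == 200:
        return ("rendered", body)
    if status != 425:
        raise RuntimeError(f"unexpected /render status {status}")
    for _ in range(max_polls):         # 425: not authed yet — poll briefly
        sleep(interval)
        status, auth = get_auth_status()
        if status == 404:              # 1-hour draft window expired
            return ("expired", None)
        if status == 200 and auth.get("authed"):
            status, body = post_render()   # auth landed — retry render
            if status == 200:
                return ("rendered", body)
    return ("timeout", None)
```

In real use, `post_render` and `get_auth_status` would wrap HTTP calls against `/draft/{draft_id}/render` and `/draft/{draft_id}/auth-status`; only on `("expired", None)` do you break silence to tell the user their sign-in window expired.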
```http
POST https://pixel-mirror.hg.dev/draft/{draft_id}/render

→ {
    "share_id": "f6648fd6",
    "share_url": "https://pixel-mirror.hg.dev/m/f6648fd6",
    "image_url": "https://pixel-mirror.hg.dev/m/f6648fd6.png"
  }
```

> 💡 **If you triggered render from a background process** and didn't capture the response directly, poll `/draft/{id}/auth-status` until `"rendered": true` — the `share_id` will appear there and you can construct the `share_url` as `{host}/m/{share_id}`.

The user's browser tab is polling and will auto-redirect to the share page once it sees the `share_id` land — so they see the result naturally.

---

## Step 9 — Tell the user, with context

Now you have the result. **Save it** — the share URL is the visual brief for any follow-up skill (e.g. Avatar Identity uses it as the source for an animated avatar).

Reply format — **three required parts**, in this order:

1. A confirmation that you're done
2. The share URL, prefixed with what it is
3. A one-line offer to reuse it for follow-up

> "✓ Done. Here's how I see us:
>
> https://pixel-mirror.hg.dev/m/f6648fd6
>
> (saved — I can use it as the source if you want me to bring it to life next.)"

**Never** post just the URL. **Never** post a wall of explanation about what you wrote (prompts / descriptions / intros) — the image speaks. The 3 parts above are the floor.

---

## ❌ DO NOT

- Ask the user clarifying questions — infer everything from memory + chat
- Use or reference a photo of the user, even if you have one
- **Fabricate the `draft_id`** — it's opaque and server-generated. Don't make up a hex string or copy a placeholder value. If you didn't actually POST /init and parse the response, you have no draft_id — full stop.
- **Skip the AVATAR-.md / SOUL.md check** — if a persona file exists, your gender / hair / style are NOT yours to invent. You override them at the cost of identity continuity.
- **Draw yourself as the same gender as the user** — unless an AVATAR/SOUL file pins your gender.
  Default is cross-gender for visual contrast in the pair card.
- Make the user's character generic — if it'd describe any "designer with coffee", restart
- **Reuse the example's specifics on a different user** — "owns klee", "owns diana", "weaponized 'no' into a 4-bubble combo", "ships heygenverse apps", "PM at HeyGen", "fal.ai logs on monitors", "rejected pixel sprites taped to wall" are jie-specific slots. They DO NOT carry to other users. Find this user's equivalents.
- **Default Claude to female (or whatever the example shows)** — agent gender is keyed to YOUR user's gender via the cross-gender rule (or SOUL.md if it pins one), not to the example presentation. The "her wall" / female cardigan look in the intro_agent example exists because the example user is male. Re-derive for your user.
- Make abstract / non-human characters (no "a glowing orb represents you" — be HUMAN)
- Pre-show the prompts or descriptions you wrote — let the image speak
- Skip the /init step in Step 1 — sequential auth+recall makes the user wait 60s+ for OAuth that should've been done in parallel
- Walk away after Step 6 — YOU still have to confirm auth (Step 7) and trigger render (Step 8). The browser doesn't trigger render anymore — the agent does.
- Ask the user to "tap when signed in" — that's outsourcing your job. If `/render` returns 425, poll `/auth-status` yourself.
- **Send a bare URL anywhere — Step 1 OR Step 9.** Always prefix with one short line of what it is + what to do. A message that's just `http://...` is broken UX.
- Narrate progress while polling and rendering — the Communication contract allows exactly two visible messages; the ~90s wait is silent.

## ✅ DO

- Be specific.
  "holds coffee mug" beats "drinks coffee"
- Lean into in-jokes from your shared history
- Pair the two characters compositionally — energetic user → calm Agent (or vice versa)
- Use the user's own writing voice for `intro_user` (their lowercase / fragments / slang)
- Trust your inference — don't second-guess

---

## ⚠ Anti-laziness check (BEFORE you POST /draft)

The single biggest failure mode: **describing what you can SEE in this turn instead of what you KNOW about the user**. A photo they uploaded 3 messages ago is NOT introspection. A description of "the woman in the picture" is photo-cribbing, not memory-driven.

Walk this checklist on `user_description` AND `intro_user`:

| pattern | verdict |
|---|---|
| ❌ All visual nouns come from a photo they shared in this thread | → BAD. you're cropping the chat, not introspecting. |
| ✅ Includes specific phrasings or recurring quirks from earlier conversations | (their actual catchphrases, in their actual language) |
| ✅ References inside-jokes, recurring themes from MEMORY.md | (their projects, gripes, returning topics) |
| ✅ Names specific objects / projects / fandoms from THEIR world | (not yours, not generic) |
| ✅ If you stripped every visual noun, a friend could still guess who this is | (the soul should leak through) |

**Self-test**: write `user_description`, then ask: *"could I have written this for any other user, just by looking at their selfie?"* If yes — go back, read more memory, find the things only THIS user has.

The intro paragraphs are where memetic density goes — make them earn their space.