
Building Visual Neurons: image + video generation with Next.js, Replicate, and Amazon Bedrock


TL;DR: A vibe-coded Next.js 15 app (https://visualneurons.com/) that lets you create images, edit them with precision masks, and generate short videos. It runs on a small t3.xlarge EC2 box behind nginx with TLS, stores files on disk, tracks usage and costs, and authenticates with AWS Cognito. Models: Imagen 4 Ultra, Nova Canvas, Nano Banana, Qwen Image Edit+, SeedEdit 3.0, Seedream 4, Grounded SAM, Veo 3.1, and Claude 4.5 Sonnet.

And yes, the blog post featured image has been generated with Nano Banana on Visual Neurons 😎

Check out how the app works! I show how to create and edit images and how to generate videos. Precision mask inpainting has its own dedicated section below.

About the vibe-coding bloodshed

  • I’d never vibe-code anything that seriously has to work. If you are a professional, you must understand what every single line of code does. That is the opposite of vibe coding, so I’d never vibe-code anything production-related at work.
  • LLMs are getting better and better (I started with Claude 3.7 Sonnet + Claude Projects and ended up with Claude Sonnet 4.5 + Cursor) but they are still super dumb. They do all sorts of stupid things that not even a junior dev would do. They hallucinate APIs even after you feed them the most recent documentation and explicitly tell them to read it.
  • No matter how strong and refined your prompt is, an LLM will almost never implement an entire feature end-to-end bug-free. Planning is key (Cursor is great at this) to break the feature down into tasks. Make sure to review the plan very thoroughly. Ask follow-up questions. Challenge the LLM. Ask another (different) LLM to review the plan generated by the first.
  • The LLM must see what the human sees. And vice versa. Coding locally and then, separately, deploying the app to a platform (Vercel, Amplify) the LLM has no access to is a recipe for disaster. I hit remote build errors (when everything was ✅ locally) caused by random quirks of the platform in question. Feeding build logs to the LLM was more damaging than beneficial: instead of saying “Look dude, I have no clue tbh”, it started guessing what might have gone wrong and went into a refactoring/fixing spree that only made things worse. My delivery speed 10x-ed when I started developing everything locally, with build logs fed in real time to an LLM that knows the exact machine/env specs it is running on. Game-changer.

Now the full story. It might be shocking to admit, but this web app has been entirely vibe-coded. And when I say entirely I mean all of it. I did not write a single line of code. Cursor and VSCode + Claude Sonnet 4.5 did the magic. This alone blows my mind as I type it, and it’s a major success. I need to provide some context, though, to avoid falling into the “AI is stealing software engineering jobs” trap. Believe me, it’s not.

It took me at least 3 pivots, significantly changing (and dramatically simplifying) strategy and execution over the last 8 months (with long breaks every now and then) to ship this result. Everything started in February 2025 when I decided to build a complete end-to-end web app. Something I had never done before. Here’s the concise back story of what led me here.

AWS Amplify: the nightmare begins

No matter how long I have been working in tech (close to 10 years), I still have to fight the crazy, nonsensical, instinctive urge to overcomplicate every solution to a problem. Overcomplicating is what led me to the Amplify disaster. February 2025: I like doing stuff with images and I am frustrated that I have to create accounts on 10 different platforms, each with its own models. Create an image here. Then download it and upload it over there to edit it in a way the first product does not support. Then go back. So bad.

I want a Google AI Studio experience with all models at my fingertips. Chatting with images and models in a single place. Of course this comes at a cost. I am not going to train and host my own models. I just want to integrate existing providers (Gemini, OpenAI, Stability, Runway, you name it). This means opening accounts with all of them, adding my credit card, and creating API keys. It also means I need an effective billing system. I cannot open this product to external users and tell them to have fun with it. Each operation costs me money, so I need a way to control how much each user spends (yes, I am picturing an actual product here).

I immediately think big. I want auth, modern frontend, solid backend, storage, database integrations, billing and payments, multi-user support, strong analytics. All of it. From day 1. I look into the AWS ecosystem and Amplify stands out.

AWS Amplify is everything you need to build web and mobile apps. Easy to start, easy to scale […] Go from idea to app in hours. Deploy server-side rendered and static frontend apps, develop UI, add features like auth and storage, connect to real-time data sources, and scale to millions of users. No cloud expertise required.

I am sold. I know nothing about “modern” web development, so I start discussing with ChatGPT, explaining what I want to do and that I want to stay within the AWS ecosystem. It agrees Amplify is a good idea, so I ask it to come up with a plan and we start working. This is February 2025, so, believe it or not, I am doing this in Claude + ChatGPT Projects with models (Claude 3.7 Sonnet + o3-mini-high) whose performance is laughable compared to the LLMs we have now, after only 8 months. Most importantly, the tooling at my disposal was pretty rudimentary (Cursor already existed but I overlooked it completely). As I explain here, the workflow consisted essentially of a back-and-forth copy-paste between VSCode and LLM chats. Note how this violates one of my gotchas above: “The LLM must see what the human sees. And vice versa”. It looked like magic at the beginning, but I did not get far. In hindsight, the issues were very obvious: I went for complexity (like adding a ton of superfluous DynamoDB nonsense) that led to Claude wanting to refactor everything all the time. Plus I stubbornly kept trying to get Amplify to work.

I am praising Amplify here, but I was only at the beginning. Btw, when I thank “the Amplify service team for unblocking me on a UI issue”, I am referring to this one, which, to this day, has not been resolved. Essentially, when you start from the official Amplify boilerplate (a To-Do sample app), there is no easy way to get out of it (possible, but non-trivial). Because of some random bug, certain backend resources created by Amplify are not overridden by the infra changes you make to your new app. So annoying. For the record, the Amplify service team did help me (they were VERY helpful), but it took a call with a software engineer in Seattle to figure it out. That’s a privilege I have just because I am part of the AWS Hero community.

My Amplify journey lasted a couple of months. This is my last post providing a status update. Two things started happening that killed the project. The first is that builds on the Amplify console took between 12 and 15 minutes. This is crazy. How am I supposed to “iterate fast” if it takes that long to check whether things work (yes, I know there are sandboxes; I used them but they didn’t help)? The second is that builds failed for random, barely understandable reasons. Neither Claude nor ChatGPT managed to figure them out. Just a plain “error something, all red, armageddon” screen. This is when I gave up.

Note that I tried twice. First and second repo. Both went nowhere. The second time I simplified my requirements and tried to start from scratch instead of from the buggy To-Do sample app. But no. Same outcome. Everything starts OK, then turns into a wreck pretty fast. An important detail: by my second attempt, I had switched to Cursor and Claude 4. Better tool and model, same disaster. Nothing would build after a while.

Ditch Amplify for Vercel and Supabase: the nightmare continues

How it started…

To be fair, Vercel and Supabase were better than Amplify. Builds worked (at least at the beginning), but what really hit my nerves was that I had to create I-lost-count-of-how-many accounts. Each with its own pricing and weird quirks. There was a ton of clicking things I did not understand to do things I did not understand. Once again, all guided by a blindfolded LLM. Again violating the “The LLM must see what the human sees. And vice versa” gotcha. If for some reason I had forgotten to check a box on some random Supabase subpage, another random feature in the app would break, and no matter how smart the LLM was, that build error was close to impossible to debug without a bird’s-eye view of all the infra and, most importantly, of the latest UIs. E.g. when Claude told me “Navigate to Vercel page X and click Y”, half the time X and Y did not exist. Enough is enough.

… how it ended

Ditch Vercel and Supabase for EC2: the nightmare is over

This is when I realised I had gotten everything wrong from the beginning, and the moment I wrote down the LLM vibe-coding gotchas listed above. In the “make it work, make it right, make it fast” three-step development principle, I was not even “getting it to work”. I needed to go back to KISS, “Keep It Simple, Stupid”, i.e. “avoiding unnecessary complexity to create solutions that are straightforward, clear, and maintainable”. I was doing the opposite and LLMs were making it worse. The moment I switched to an entirely local web app (and careful planning), the project picked up steam and I started to truly vibe code. Claude Sonnet 4.5 (the best in town, hands down) saw the effect of code changes in real time, and I somewhat turned into an Engineering Manager watching efficient junior devs (Cursor AI agents) do exactly what I told them to do. Making stupid mistakes every now and then (basically screwing up every single external API integration despite the docs being available) but eventually converging on a product that worked. Massively simplified compared to my initial vision (no billing, a single user – me, AWS integrations reduced to a minimum) but completely functional. Visual Neurons was born.

What the app does

Visual Neurons is a small image/video studio born from frustration. I was tired of switching between platforms: Midjourney/Gemini for images, Runway for video, Stability for editing. Each with its own account, billing, and quirks.

I wanted one place with all the models. Here’s what it does:

  • Create: generate images from text
  • Edit: change parts of an image with plain English, optionally using a precision mask
  • Video: create short 8-second clips
  • Gallery: browse saved items
  • Usage: track estimated costs by model/provider

Architecture overview

  • Next.js 15 (App Router, React 19, TypeScript) serves both UI and API routes
  • Prisma + SQLite for data; files live on disk at /var/visualneurons/media
  • Replicate: gateway to Imagen 4, Nano Banana, Qwen, SeedEdit, Seedream, Veo 3.1, Grounded SAM
  • Amazon Bedrock: Nova Canvas + Claude 4.5 Sonnet (for prompt help and error debugging)
  • nginx: terminates TLS, forwards to Next.js on localhost:3000 (more below on deployment)

Repo layout
visual-llms/
├─ app/              # pages + API routes
├─ components/       # UI components (e.g., MaskGenerator)
├─ lib/              # providers, auth, pricing, storage helpers
├─ prisma/           # schema.prisma
├─ scripts/          # nginx + SSL helpers, Cognito helper
└─ /var/visualneurons/
   ├─ db.sqlite
   └─ media/{sessionId}/image_*.png, mask_*.png, video_*.mp4

What are Replicate and Bedrock?

Both are model hosting platforms. They run the actual AI models so I don’t have to.

Replicate is a service that lets you run models via API. I send a prompt, Replicate runs the model (Imagen 4, Veo 3.1, etc.) on their infrastructure, and returns the result. No GPUs needed on my end. I pay per generation. Think of it as “serverless for AI models.”

Amazon Bedrock is AWS’s managed AI service. Similar concept: I call an API, AWS runs the model (Nova Canvas, Claude 4.5) and returns results. The key difference is that Bedrock uses my EC2 instance’s IAM role for authentication. No API keys in .env files. It’s deeply integrated with AWS services.

Why use both? Replicate has a wider model selection (Imagen 4, Veo 3.1, Qwen, etc.). Bedrock has Amazon’s own models (Nova Canvas) plus Claude 4.5, which is perfect for the prompt improvement and error debugging features since it’s vision-capable and excellent at understanding context.

The exact models I use

🆒🔗🚨 Which image editing model should I use? Great Replicate blog post that gave me lots of ideas 🆒🔗🚨

| Model | Provider | Purpose | Pricing |
| --- | --- | --- | --- |
| Imagen 4 Ultra | Google via Replicate | Image creation | $0.06 per image (highest quality) |
| Amazon Nova Canvas | Amazon Bedrock | Image creation + editing: supports language-driven edits and mask-guided inpainting (it can do a lot more, but the quality is well below Google’s, so I did not add all the available options) | $0.08 per 2048×2048 premium image |
| Nano Banana | Google via Replicate | Image creation and language-driven image editing | ≈$0.039 per 1024×1024 image |
| Qwen Image Edit Plus | Qwen via Replicate | Image editing | $0.03 per output image |
| SeedEdit 3.0 | Bytedance via Replicate | Image editing | $0.03 per output image |
| Seedream 4 | Bytedance via Replicate | Image editing | $0.03 per output image |
| Grounded SAM | Replicate | Mask prompting based on Grounding DINO & Segment Anything | $0.0014 per mask generation |
| Veo 3.1 | Google via Replicate | Video generation in standard or reference (R2V) mode (more below); supports webhooks | $3.20 per 8-second video ($0.40/second, includes native audio) |
| Claude 4.5 Sonnet | Amazon Bedrock | Improves prompts and explains errors with suggestions | $3 per million input tokens, $15 per million output tokens |

Bottom line: Nano Banana is my workhorse at $0.039 per image. Veo 3.1 is expensive but worth it.

Data model (Prisma + SQLite)

The schema tracks four entities: sessions (one per user), media assets (images/videos), actions (create/edit logs), and predictions (async jobs).

Files live on disk under /var/visualneurons/media/{sessionId}/, and the MediaAsset.path field stores the relative path. Each user’s Cognito username maps to exactly one Session, so logging in from different devices/browsers shows exactly the same gallery and history per user.

Full data model here on Github.
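
To give a flavour of how that mapping is used, here is a minimal sketch of resolving a Cognito username to its Session with the Prisma client. Model and field names are assumptions based on the description above, not the actual schema.

// Hypothetical sketch — not the repo's actual code.
import { PrismaClient } from '@prisma/client';

const prisma = new PrismaClient();

// One Session per Cognito username, so an upsert keeps the mapping 1:1
// (assumes cognitoUsername is a unique field on Session).
export async function getOrCreateSession(cognitoUsername: string) {
  return prisma.session.upsert({
    where: { cognitoUsername },
    update: {},
    create: { cognitoUsername },
  });
}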

Backend API (Next.js App Router)

| Endpoint | Method | Purpose |
| --- | --- | --- |
| /api/images/create | POST | Generate an image |
| /api/images/edit | POST | Edit an image (optional mask) |
| /api/masks/generate | POST | Grounded SAM precision mask |
| /api/videos/create | POST | Generate a video |
| /api/webhooks/replicate | POST | Receive async video results |
| /api/gallery | GET | List saved assets |
| /api/media/[...path] | GET | Stream/serve a file |
| /api/auth/login | POST | Set httpOnly cookies |
| /api/auth/logout | POST | Clear cookies |
| /api/predictions/[id] | GET | Poll a prediction status |

The /api/media/[...path] route does the heavy lifting for serving files: it checks ownership (only your session can fetch your files), sets MIME types, enables CORS for canvas operations, and adds Accept-Ranges: bytes so you can jump to any point in a video without re-downloading it.
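
A rough sketch of what such a route handler can look like in the App Router. This is illustrative, not the repo’s actual code; the ownership helper is hypothetical.

// app/api/media/[...path]/route.ts — illustrative sketch of the behaviour described above.
import { NextRequest, NextResponse } from 'next/server';
import { promises as fs } from 'fs';
import path from 'path';

const MEDIA_ROOT = '/var/visualneurons/media';

export async function GET(
  req: NextRequest,
  ctx: { params: Promise<{ path: string[] }> }   // route params are async in Next.js 15
) {
  const { path: segments } = await ctx.params;
  const [sessionId, ...rest] = segments;

  // Ownership check: only the session that owns the folder may read from it.
  // (getSessionIdFromCookies is a hypothetical helper.)
  // if (await getSessionIdFromCookies(req) !== sessionId) {
  //   return new NextResponse('Forbidden', { status: 403 });
  // }

  const filePath = path.join(MEDIA_ROOT, sessionId, ...rest);
  const data = await fs.readFile(filePath);
  const mime = filePath.endsWith('.mp4') ? 'video/mp4' : 'image/png';

  return new NextResponse(new Uint8Array(data), {
    headers: {
      'Content-Type': mime,
      'Access-Control-Allow-Origin': '*',  // CORS so the mask editor's canvas can read pixels
      'Accept-Ranges': 'bytes',            // lets the video player seek without re-downloading
    },
  });
}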

End‑to‑end flows

1) Create image

When you press Create, the client calls /api/images/create with { prompt, model }.
On the server I route to one of three generators:

// app/api/images/create/route.ts (excerpt)
if (model === 'imagen4') {
  const r = await generateImageWithImagen4(prompt);           // Replicate
  modelName = 'imagen-4.0-ultra-generate-001';
  provider  = 'google-imagen4';
} else if (model === 'nova-canvas') {
  const r = await generateImageWithNovaCanvas(prompt);        // Bedrock
  modelName = 'amazon.nova-canvas-v1:0';
  provider  = 'aws-nova-canvas';
} else {
  const r = await generateImage(prompt);                      // Nano Banana
  modelName = 'gemini-2.5-flash-image';
  provider  = 'gemini-nano-banana';
}

Resulting images are saved to disk (timestamped filename), recorded as a MediaAsset, and an Action is logged. The API returns a URL like /api/media/<relativePath> for display in the chat.
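
The persistence step is conceptually simple; here is a hedged sketch. The paths follow the repo layout above, while helper and model names are assumptions.

// Sketch of the post-generation bookkeeping described above (names are assumptions).
import { promises as fs } from 'fs';
import path from 'path';

export async function persistImage(sessionId: string, base64: string) {
  const fileName = `image_${Date.now()}.png`;                    // timestamped filename
  const relativePath = path.join(sessionId, fileName);
  const absolutePath = path.join('/var/visualneurons/media', relativePath);

  await fs.mkdir(path.dirname(absolutePath), { recursive: true });
  await fs.writeFile(absolutePath, Buffer.from(base64, 'base64'));

  // MediaAsset.path stores the relative path; the API returns /api/media/<relativePath>.
  // await prisma.mediaAsset.create({ data: { sessionId, path: relativePath, type: 'image' } });
  return `/api/media/${relativePath}`;
}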

How Nova Canvas generation is called (Bedrock)

Nova Canvas (via Bedrock) takes a text prompt and returns base64 image data:

// lib/bedrock.ts (excerpt)
const command = new InvokeModelCommand({
  modelId: "amazon.nova-canvas-v1:0",
  contentType: "application/json",
  accept: "application/json",
  body: JSON.stringify({
    taskType: "TEXT_IMAGE",
    textToImageParams: { text: prompt },
    imageGenerationConfig: { numberOfImages: 1, height: 2048, width: 2048, quality: "premium", cfgScale: 8.0 }
  }),
});
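
For completeness, here is roughly how the response comes back. The client setup is illustrative; the images array follows Bedrock’s documented Nova Canvas output shape.

// Sketch of sending the command above and decoding the result.
import { BedrockRuntimeClient } from '@aws-sdk/client-bedrock-runtime';

const client = new BedrockRuntimeClient({ region: process.env.AWS_REGION }); // uses the EC2 IAM role

const response = await client.send(command);                                  // `command` is the InvokeModelCommand built above
const payload = JSON.parse(new TextDecoder().decode(response.body));          // body is a Uint8Array
const imageData: string = payload.images[0];                                  // base64-encoded image
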
How image generation is called (Replicate)

In production I call two Replicate image models:

  • Imagen 4 Ultra for highest-quality single images
  • Nano Banana as a fast, inexpensive “always-on” creator (IMO on par with, if not better than, Imagen)

Replicate models return URLs instead. I fetch and convert to base64:

// lib/replicate.ts (excerpt) — Imagen 4 Ultra
export async function generateImageWithImagen4(prompt: string) {
  const output = await replicate.run('google/imagen-4-ultra', {
    input: { prompt, aspect_ratio: '1:1', output_format: 'png' }
  }) as any;

  const url = extractUrlFromOutput(output); // handles array/object/string shapes
  if (!url) throw new Error('Unable to resolve image URL from Imagen 4 Ultra output');

  const res = await fetch(url);
  const buf = Buffer.from(await res.arrayBuffer());
  return { imageData: buf.toString('base64'), mimeType: 'image/png' };
}

// lib/replicate.ts (excerpt) — Nano Banana
export async function generateImage(prompt: string) {
  const output = await replicate.run('google/nano-banana', { input: { prompt } }) as any;

  const url = extractUrlFromOutput(output);
  if (!url) throw new Error('Unable to resolve image URL from Nano Banana output');

  const res = await fetch(url);
  const buf = Buffer.from(await res.arrayBuffer());
  return { imageData: buf.toString('base64'), mimeType: 'image/png' };
}

2) Edit image: prompt or precision mask

There are two ways to edit an image:

  • Language‑only edit (e.g., “make the hair blonde”, “add a red hat to the cat”, “make it in Van Gogh style”). This is easy. Nova Canvas, Nano Banana, Qwen Image Edit+, SeedEdit 3.0, and Seedream 4 support it. Nova Canvas sucks. Nano Banana is really good. It just “understands” the request and works well. The other models are top notch too, but I have a thing for Nano Banana. What can I say 🤷.
  • Mask‑guided edit (supported by Nova Canvas only), where you first select the object (cat, sky, shirt) and then apply the instruction. A precision mask is a black‑and‑white image that separates the region to change from the region to preserve. I generate masks with Grounded SAM, aka Grounding DINO & Segment Anything (which returns white = object, black = background), then optionally edit them manually within the UI. Before calling Nova Canvas, I invert the mask to match its inpainting semantics (black = area to change, white = area to preserve), then send both the image and the mask.
Notice how Grounded SAM spots the smaller flowers in the background too! These are included in the mask. I then manually refine the mask with a brush.

How does precision masking work behind the scenes?

Traditional segmentation has a chicken-and-egg problem: Segment Anything (SAM) can segment perfectly but needs you to tell it where to look. Object detectors can find things but only for trained classes (person, car, dog).

Grounded SAM solves this by combining Grounding DINO (finds objects from text prompts) with SAM (traces exact contours).

The two-stage process:

  1. Grounding DINO – “Find it with language”
  • Input: your text prompt (“person”)
  • Output: bounding boxes for every match + confidence scores
  • Why it works: vision-language model that “grounds” text to visual features (zero-shot)
  2. SAM – “Segment it precisely”
  • Input: bounding boxes from DINO
  • Output: pixel-perfect masks (white = object, black = background)
  • Why it works: analyzes content inside each box and traces exact contours

Concrete example: Photo of 5 people at a party, prompt “person”

  • DINO finds 5 bounding boxes: [(100,150,250,400), (300,120,420,390), ...]
  • SAM traces each person’s exact outline, one mask per box
  • Result: 5 masks combined into one

After Grounded SAM returns the mask, I invert it to match Nova Canvas semantics (black = change, white = preserve) and send both the image and mask to Nova Canvas for editing.
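
A minimal sketch of that inversion step, assuming a server-side image library like sharp; the actual implementation may differ (e.g. it could happen on a canvas instead).

// One way to invert a Grounded SAM mask (white = object) into Nova Canvas'
// convention (black = change, white = preserve). Using sharp is an assumption.
import sharp from 'sharp';

export async function invertMask(maskPng: Buffer): Promise<Buffer> {
  return sharp(maskPng)
    .negate({ alpha: false })   // flip black/white, leave any alpha channel untouched
    .png()
    .toBuffer();
}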

Grounded SAM Gotchas:
Getting this to work took three debugging sessions:

  1. Parameter names: the API expects mask_prompt, not prompt (subtle!)
  2. MIME types: I was hardcoding all images as PNG but uploaded a JPEG—silent failure
  3. Output selection: Grounded SAM returns 4 URLs (annotated, negative, mask, inverted). I was blindly taking the first one (the annotated visualization) instead of the pure mask.

Each bug was non-obvious and took Claude several attempts to figure out. Read the API docs carefully!
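
Putting the gotchas together, the call ends up looking roughly like this. The model slug and output index are assumptions; mask_prompt and the four-output shape come from the list above.

// Illustrative Grounded SAM call on Replicate — the slug is an assumption.
const output = await replicate.run('schananas/grounded_sam', {
  input: {
    image: imageDataUri,    // data URI with the correct MIME type — don't hardcode PNG (gotcha #2)
    mask_prompt: 'person',  // mask_prompt, NOT prompt (gotcha #1)
  },
}) as any[];

// Gotcha #3: the model returns several URLs (annotated, negative, mask, inverted).
// Don't take output[0] blindly (that's the annotated visualization); pick the plain mask.
const maskUrl = String(output[2]);  // index is illustrative — check the model's output docs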

3) Generate video (with optional webhooks)

For video I use Veo 3.1. Two modes:

  • Standard: optional first/last frame, custom duration.
  • Reference (R2V): pick 1–3 reference images → 8s, 16:9 locked, last frame ignored.

If WEBHOOK_BASE_URL is set, Replicate pings /api/webhooks/replicate when the video is ready. The webhook downloads the MP4, saves it, and creates a MediaAsset (saved=true). You can close the tab and come back later.
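
Here is a sketch of kicking off such a webhook-backed prediction with the Replicate JS client; the Veo model slug and input fields are illustrative.

// Illustrative sketch — slug and input are assumptions.
const prediction = await replicate.predictions.create({
  model: 'google/veo-3.1',                                            // assumed slug
  input: { prompt, duration: 8 },                                     // illustrative input
  webhook: `${process.env.WEBHOOK_BASE_URL}/api/webhooks/replicate`,
  webhook_events_filter: ['completed'],                               // only ping when the video is done
});
// Store prediction.id so /api/predictions/[id] can poll it as a fallback.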

One gotcha: my auth middleware initially blocked webhooks (no cookies = redirect to /login). This caused videos to generate successfully on Replicate but never appear in my gallery. The webhook requests were getting 307 redirects, so they never processed. Debugging this was frustrating—the Replicate dashboard showed “completed” but my gallery stayed empty. The fix: whitelist /api/webhooks in middleware.ts. I even had to write a recovery script to manually fetch videos that got stuck during debugging.
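
The fix itself is a one-concept change in middleware.ts; here is a sketch (cookie name and matcher are assumptions).

// middleware.ts — illustrative sketch of the cookie check + webhook whitelist.
import { NextRequest, NextResponse } from 'next/server';

const PUBLIC_PATHS = ['/login', '/api/auth/login', '/api/webhooks'];

export function middleware(req: NextRequest) {
  const { pathname } = req.nextUrl;

  // Whitelist: Replicate's webhook calls carry no cookies, so let them through.
  if (PUBLIC_PATHS.some((p) => pathname.startsWith(p))) return NextResponse.next();

  // Everything else needs the session cookie.
  if (!req.cookies.get('idToken')) {
    return NextResponse.redirect(new URL('/login', req.url));
  }
  return NextResponse.next();
}

export const config = { matcher: ['/((?!_next/static|_next/image|favicon.ico).*)'] };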

Veo 3.1 is truly amazing. See how I provided the first and last frames, then prompted it with the rather generic “The frame tragically evolves into the grim last frame”. Google does the rest.

Fun fact: both the first and the last frame are AI-generated. The first image I created is the dark one. I prompted Nano 🍌 with “A single comic book panel in a gritty, noir art style with high-contrast black and white inks. In the foreground, a detective in a trench coat stands under a flickering streetlamp, rain soaking his shoulders. In the background, the neon sign of a desolate bar reflects in a puddle. A caption box at the top reads ‘The city was a tough place to keep secrets.’ The lighting is harsh, creating a dramatic, somber mood. Landscape.”

Then I switched to Edit mode and prompted Nano 🍌 with “Change the theme of the image to make it like Asterix and Obelix”. Mind blown by the results.

Sound on! 🔉

4) Claude 4.5 via Bedrock: my “prompt coach” and “smart debugger”

I use Claude 4.5 Sonnet in two places:

1) Proactive prompt improvement (“Make my prompt better”)

When you type a prompt (create, edit, or video), a small ✨ Improve button appears. It sends your prompt plus context (mode, selected image, optional video frames) to Claude 4.5 Sonnet. The key: I use Bedrock’s Converse API with the vision-capable model eu.anthropic.claude-sonnet-4-5-20250929-v1:0. For edit and video modes, I attach images so Claude can “see” what I am working with:

const response = await client.send(new ConverseCommand({
  modelId: "eu.anthropic.claude-sonnet-4-5-20250929-v1:0",
  messages: [{ role: "user", content: contentArray }],
  inferenceConfig: { maxTokens: 400, temperature: 0.5 }
}));

For images, I add them to the content array:

contentArray.push({
  image: { format: "png", source: { bytes: new Uint8Array(binaryData) } }
});

The system instruction forces a clean output format that I can parse easily in the UI:
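
Something along these lines works; the exact wording and field names below are assumptions, chosen to match the parsing described next.

// Illustrative system instruction — wording is an assumption.
const SYSTEM_PROMPT = `You improve prompts for image and video generation.
Reply with exactly three lines and nothing else:
EXAMPLE1: <first improved version of the prompt>
EXAMPLE2: <second improved version of the prompt>
TIP: <one short tip to get better results>`;

// Parsing then becomes a trivial line-by-line match:
const field = (key: string, text: string) =>
  text.match(new RegExp(`^${key}:\\s*(.+)$`, 'm'))?.[1]?.trim() ?? '';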

I parse these three fields and surface them right under the original text. Click “Use” and EXAMPLE1 replaces the original prompt. Why this works: Claude sees prompt AND image/video frames, then replies with two better versions plus tips. Faster iterations, better outputs.

2) Reactive error help (“Why did my edit fail?”)

When an image edit fails (model rejected an invalid mask, vague instruction, backend error, etc.), I ask Claude to explain what went wrong and suggest a fix. Same Bedrock client, same model, different system prompt. This feeds the “what happened + try this instead” box in the UI.

Frontend notes

Pages & structure. The app uses Next.js App Router with one chat-style home, a gallery, a usage dashboard, and a login page. The API lives under app/api/*. (See the repo layout tree above.)

The key insight: Next.js App Router co-locates API routes with pages. Everything in one place, no separate “backend” folder.

Core UX decisions

Mode switcher: Create / Edit selected / Video toggle at the top. The chat auto-scrolls, shows media inline, and persists locally.

Prompt assist: The ✨ button asks Claude for better phrasing before you submit. If a request fails (e.g., “make hair blonde” on a landscape), Claude suggests fixes.

Image selection: Click “Edit This” on any image to select it (button turns green with “✓ Selected”). Click again to deselect. If nothing is selected, an “Upload Image” button appears.

Precision masking: Three-button flow under Edit mode: Regenerate, Edit Mask, Use This Mask. Type a word like “person” and Grounded SAM finds all matches and returns a black/white mask (white = the matched object, black = everything else). Under the hood, we invert it to Nova Canvas’ convention (black = change, white = preserve).

Manual mask editor: If SAM over/under-selects, click Edit Mask to open a canvas tool with brush, eraser, size slider, undo/redo, keyboard shortcuts (B, E, Ctrl+Z, Esc). The mask editor is a full-featured canvas app: dual-layer architecture (original image + editable overlay), 20-state undo/redo history, auto-scaling for performance, and smooth brush/eraser tools. All client-side using vanilla Canvas API—no external dependencies. This was one of the most complex features to build, but essential for pixel-perfect control.
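
For flavour, the bounded undo history is a classic snapshot stack; here is a sketch (the 20-state cap comes from the description above, the rest is illustrative).

// Minimal sketch of a bounded undo history over the mask canvas (names are assumptions).
const MAX_HISTORY = 20;
const history: ImageData[] = [];

// Call before each brush/eraser stroke to snapshot the current state.
function pushState(ctx: CanvasRenderingContext2D) {
  history.push(ctx.getImageData(0, 0, ctx.canvas.width, ctx.canvas.height));
  if (history.length > MAX_HISTORY) history.shift();   // drop the oldest snapshot
}

// Undo restores the most recent pre-stroke snapshot (redo omitted for brevity).
function undo(ctx: CanvasRenderingContext2D) {
  const prev = history.pop();
  if (prev) ctx.putImageData(prev, 0, 0);
}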

Media streaming: All images/videos stream through /api/media/[...path], which checks ownership, sets correct MIME types, enables CORS for canvas reads, and adds Accept-Ranges: bytes for video seeking.

Video polling: While webhooks are preferred, the frontend can poll /api/predictions/[id] to update status.

Frontend data flow (a couple of diagrams)

Frontend data flow (three key operations). Each operation follows a similar pattern:

  1. User action in browser
  2. POST to API route
  3. Provider call (Replicate or Bedrock)
  4. Save result to disk + database
  5. Display in chat, optionally save to gallery
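
On the client, that pattern boils down to one fetch per operation; a rough sketch follows (the response field and chat-state handling are illustrative).

// Illustrative client-side call covering steps 2–5 of the pattern above.
const res = await fetch('/api/images/create', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({ prompt, model: 'nano-banana' }),   // model id is illustrative
});
const { url } = await res.json();   // e.g. /api/media/<relativePath> (field name assumed)

// Append the result to the chat (setMessages is hypothetical React state).
setMessages((msgs) => [...msgs, { role: 'assistant', imageUrl: url }]);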

Here’s the detailed flow for each:

Authentication with AWS Cognito (ELI5)

Visual Neurons uses AWS Cognito for user management. You sign up, log in, and Cognito handles everything: password storage, session tokens, “remember me” functionality.

How it works

  1. You log in with email/password
  2. Cognito validates credentials and issues a JWT token
  3. Your browser stores the token in an httpOnly cookie
  4. Every API request includes this cookie for authentication
  5. The server maps your Cognito username to a Session in the database

Why the session mapping matters: Your gallery, usage stats, and generated media are tied to your Session, not just your cookies. This means your stuff persists across devices—log in from your phone and you’ll see the same gallery.

One gotcha: Visual Neurons is a single-page app (SPA) that runs entirely in the browser. SPAs can’t securely store secrets, so when configuring Cognito, you must create an “SPA” app client (without a client secret). If you accidentally create a client with a secret, Cognito will expect a SECRET_HASH that the browser can’t provide, and authentication will fail.
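
For the server side of that flow, here is a hedged sketch of what the login route can look like with the AWS SDK. It uses USER_PASSWORD_AUTH against an SPA app client (no client secret, so no SECRET_HASH); env var names, cookie name, and error handling are assumptions.

// app/api/auth/login/route.ts — illustrative sketch, not the repo's actual code.
import { NextRequest, NextResponse } from 'next/server';
import {
  CognitoIdentityProviderClient,
  InitiateAuthCommand,
} from '@aws-sdk/client-cognito-identity-provider';

const cognito = new CognitoIdentityProviderClient({ region: process.env.AWS_REGION });

export async function POST(req: NextRequest) {
  const { email, password } = await req.json();

  const auth = await cognito.send(new InitiateAuthCommand({
    AuthFlow: 'USER_PASSWORD_AUTH',
    ClientId: process.env.COGNITO_CLIENT_ID!,
    AuthParameters: { USERNAME: email, PASSWORD: password },
  }));

  const idToken = auth.AuthenticationResult?.IdToken;
  if (!idToken) return NextResponse.json({ error: 'Login failed' }, { status: 401 });

  // httpOnly cookie: invisible to client-side JS, sent with every API request.
  const res = NextResponse.json({ ok: true });
  res.cookies.set('idToken', idToken, { httpOnly: true, secure: true, sameSite: 'lax' });
  return res;
}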

The middleware checks for valid cookies on every request and redirects to `/login` if missing. Webhooks are whitelisted so Replicate can ping the server without auth.

Deployment: nginx, TLS, and process management

Quick definitions
  • nginx: the reception desk that terminates TLS, i.e. it handles HTTPS (certificates, encryption) and forwards plain requests to Next.js on localhost:3000
  • TLS: the protocol that puts the padlock in your browser (encrypts traffic)
  • Reverse proxy: nginx doesn’t serve the app; it forwards to Next.js on localhost:3000

What runs in prod
  • EC2 instance with nginx
  • Free TLS via Let’s Encrypt
  • Next.js server in a screen session: screen -dmS visualneurons npm run dev
  • CORS + streaming served by /api/media/...
  • Helper scripts in scripts/ (nginx install, SSL, Cognito setup)

From S3 static site → EC2 app (Route53 + nginx + TLS)

Goal: Point my existing domain visualneurons.com (which used to serve a static S3 website) to the EC2 instance running the Visual Neurons app behind nginx and HTTPS.
Why: I want a site that’s always on — I can open a browser any time, log in, and create/edit images or videos without starting local processes. The single EC2 + nginx approach gives me that.

1) Switch DNS in Route53 (A record → EC2 public IP)

I updated the A record to point to my EC2 public IP (TTL 300). Added a CNAME for www. Confirmed with dig visualneurons.com.

2) Put nginx in front (reverse proxy on ports 80/443)

Installed nginx using the helper script: sudo bash scripts/setup-nginx.sh

The nginx config (in scripts/nginx-config.conf) does three things: 1️⃣ Redirects HTTP → HTTPS 2️⃣ Proxies everything to Next.js on localhost:3000 3️⃣ Sets generous timeouts for long AI operations. Key settings:

upstream nextjs_backend {
  server 127.0.0.1:3000;
  keepalive 64;
}

# Proxy with long timeouts for AI jobs
location / {
  proxy_pass http://nextjs_backend;
  proxy_connect_timeout 600s;
  proxy_read_timeout 600s;
  proxy_buffering off;
}

Source: scripts/nginx-config.conf

Remember: nginx is the reception desk. It answers the internet on ports 80 and 443, handles the security certificate, and forwards the request to the app quietly running on localhost:3000 (not directly exposed). The user never touches port 3000.

3) Add HTTPS with Let’s Encrypt (certbot)

Once DNS pointed to the instance, I issued a free TLS certificate for both visualneurons.com and www.visualneurons.com:

sudo bash scripts/setup-ssl.sh
# this runs: certbot --nginx -d visualneurons.com -d www.visualneurons.com

The script verifies nginx, obtains the cert, configures the HTTPS server block, and tests auto‑renewal. If DNS isn’t pointing to the EC2 IP yet, it’ll remind you to fix that and try again.

4) Keep the app running (24/7)

I run the Next.js server in a screen session so it keeps running even when I log out of SSH:

# Start in background (stays up after you disconnect)
screen -dmS visualneurons npm run dev

# Check status / view logs
screen -ls
screen -r visualneurons

This simple process model — nginx in front, Next.js on localhost:3000, screen keeping it alive — means the app is accessible any time at https://visualneurons.com. No laptop needed, no tunnel, no local dev. Just open the site, log in with Cognito, and create.

Why I like this: a single EC2 instance with nginx gives me a “personal studio” that’s always on, with proper TLS and clean URLs. When inspiration strikes, I just navigate to the site and it’s ready. (Security group notes: only 80/443 are public; 22 is locked to my IP; 3000 stays closed or restricted.)

Why dev mode in production?
For a single-user app, dev mode is perfect: hot reload, full logs, no build step, instant iteration. Performance doesn’t matter when it’s just me. If I ever scale to multiple users, I can switch to npm start with zero code changes.

5) Quick checklist (for future me)
  • Route53 A record → EC2 IP (and www CNAME or A) → verify with dig.
  • sudo bash scripts/setup-nginx.sh → validates config and enables site.
  • sudo bash scripts/setup-ssl.sh → gets cert for apex + www.
  • screen -dmS visualneurons npm run dev → app stays up after SSH logout.
  • Security group: open 80/443 to world, 22 to my IP, 3000 closed/restricted.

How this fits the overall flow

That’s the entire path from the public web to the app you see. It’s simple, reliable, and it keeps the app available 24/7.

Usage & cost tracking

The Usage page shows counts and an estimated cost by provider/model. Pricing constants live in lib/pricing.ts.

// lib/pricing.ts (excerpt)
export const API_PRICING = {
  IMAGEN_4_ULTRA: 0.06,            // per image
  NANO_BANANA:    0.039,
  NOVA_CANVAS_PREMIUM_2K: 0.08,
  VEO_3_1_PER_VIDEO: 3.20,
  // ...
};
Usage & Costs dashboard with numbers aggregated by different time frames.
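
Conceptually, the dashboard numbers are just a group-by over the logged actions; here is a hypothetical sketch (model and field names are assumptions based on the data model section, not the repo’s actual query).

// Hypothetical aggregation sketch.
const rows = await prisma.action.groupBy({
  by: ['provider', 'modelName'],
  _count: { _all: true },
  where: { createdAt: { gte: since } },          // `since` bounds the selected time frame
});

const withCost = rows.map((row) => ({
  ...row,
  // PRICE_PER_CALL is a hypothetical map from model name to the constants in lib/pricing.ts.
  estimatedCost: row._count._all * (PRICE_PER_CALL[row.modelName] ?? 0),
}));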

It’s a wrap

Visual Neurons exists because I got tired of switching between platforms. Each with its own account, billing, and quirks. I wanted one studio with all the models. Three gotchas from building it:

  1. Complexity kills vibe-coding. Amplify and Vercel looked great on paper but became debugging nightmares. The moment I switched to local EC2 development, the project took off. Claude could finally see what I saw.
  2. Planning beats prompting. No matter how smart the LLM, it won’t build a feature end-to-end without careful task breakdowns. Cursor’s planning mode became essential.
  3. Simple scales. Prisma + SQLite + files on disk. No S3, no DynamoDB, no complexity. The whole thing runs on a $40/month t3.xlarge and handles everything I throw at it. The app has generated 128 operations costing $62 total. Nano Banana is my workhorse at $0.039 per image. Veo 3.1 is expensive at $3.20 per video but worth every penny.

Lines of code I wrote personally: zero. Time invested: 8 months (with long breaks). Was it worth it? Absolutely. Code on GitHub.

And yes, Nano Banana is still my favourite. Fight me. 🍌
