Why we moved AI workflows off the API server

A practical look at background jobs, reliability, and keeping the UI fast.

Eloovor Team · 8 min read

Early on, a user clicked "Analyze Company" and waited. The spinner kept spinning. Then we deployed a new version of the API and the job silently died. The user refreshed. The analysis was gone.

We should have seen it coming. AI workflows are heavy. They pull in context, run model calls, and store structured results. When all of that happens inside your API server, in the same process that handles user requests, things break in ways that are hard to debug and impossible to explain to users.

The problem with running AI in the request path

We hit the same failures over and over:

  • Memory would spike when multiple analyses ran at once, sometimes enough to slow down unrelated requests
  • Users would hit timeouts, or worse, the upstream model provider would
  • Deploys would kill running work mid-analysis with no way to resume
  • When something failed, we had almost no visibility into where or why

Even when the system technically worked, it felt slow. The UI was blocked waiting on the server. The server was blocked waiting on the model. The user was staring at a spinner, wondering if something had broken. Nothing about that inspired confidence, even when the analysis eventually came back fine.

Moving to background jobs

We pulled AI analysis out of the request path and into Trigger.dev.

The flow now works like this:

  1. The API receives a request and stores a new analysis record
  2. A Trigger.dev job picks up the work in the background
  3. Results get written back when the job completes

From the user's perspective, they still get their analysis. But the UI returns immediately instead of hanging. The server goes back to serving requests. And the heavy model work runs in its own isolated process where a timeout or crash doesn't take anything else down with it.
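
The hand-off above can be sketched as a simplified, in-memory model. To be clear, this is not the Trigger.dev API — the `analyses` map, the `queue` array, and the function names are all illustrative — but it captures the shape of the flow: the API's only job is to write a record and enqueue an id.

```typescript
type AnalysisStatus = "pending" | "running" | "completed" | "failed";

interface AnalysisRecord {
  id: string;
  status: AnalysisStatus;
  result?: string;
}

// Illustrative stand-ins for the database and the job queue.
const analyses = new Map<string, AnalysisRecord>();
const queue: string[] = [];

// Step 1: the API handler stores a record, enqueues the id, and returns
// immediately. The UI gets back an id and a status, not a blocked request.
function requestAnalysis(id: string): AnalysisRecord {
  const record: AnalysisRecord = { id, status: "pending" };
  analyses.set(id, record);
  queue.push(id); // hand-off point; the API's work ends here
  return record;
}

// Steps 2 and 3: the background worker picks up the work, runs it in its
// own process, and writes results back. A crash here never blocks the API.
function processNext(run: (id: string) => string): void {
  const id = queue.shift();
  if (id === undefined) return;
  const record = analyses.get(id)!;
  record.status = "running";
  try {
    record.result = run(id); // the heavy model work lives here
    record.status = "completed"; // flip the status so the UI picks it up
  } catch {
    record.status = "failed";
  }
}
```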

What a job actually does

A typical analysis job has a lot going on:

  • Gather context from the job description and the user's profile
  • Pull in company or market signals that might be relevant
  • Build a structured prompt with the right guardrails
  • Run model calls and validate that the output actually makes sense
  • Store the results and flip the status so the UI picks it up

Each of those steps can fail independently. The model might time out. The company data might come back empty. The output might fail validation. When this all lived inside an API request, any one of those failures would just surface as a vague 500 error to the user. Now each step can fail, retry, or report on its own without locking up the product.
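
One way to make those failures distinguishable is a small error type that records which step failed and whether a retry is worth attempting. This is a sketch under our own naming, not a Trigger.dev type; `StepError`, and the rule that validation failures should not retry, are illustrative assumptions.

```typescript
// Hypothetical error type: each step identifies itself, so a failure
// surfaces as "validate: empty summary" rather than a vague 500.
class StepError extends Error {
  constructor(
    public readonly step: string,
    public readonly retryable: boolean,
    message: string,
  ) {
    super(`${step}: ${message}`);
  }
}

// A model timeout is usually worth retrying; an output that fails
// validation usually is not, because the same input will fail again.
function validateOutput(output: { summary?: string }): string {
  if (!output.summary || output.summary.trim() === "") {
    throw new StepError("validate", false, "empty summary");
  }
  return output.summary;
}

function classify(err: unknown): "retry" | "fail" {
  return err instanceof StepError && err.retryable ? "retry" : "fail";
}
```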

Why Trigger.dev

We looked at a few options. Trigger.dev won because it gave us the three things we cared about most: scheduling, automatic retries, and actual observability into what jobs are doing.

Before this move, when an analysis failed, it just vanished. The user would see a spinner that never resolved. We'd have to dig through server logs and guess. Now we can open the Trigger.dev dashboard, find the specific job run, and see exactly which step failed and what the error was.

Here is what changed in practice:

  • The API stays responsive under load because it is not doing the heavy work anymore
  • Deploys don't kill running analyses since the jobs run in a separate process
  • Failed tasks retry automatically, so we don't have to write custom retry logic for every failure mode
  • We can actually see what's happening in the pipeline at any moment

The reliability improvement was the real payoff. When someone uses a job search tool, they need to trust that it will finish what it started. If you click "analyze" and your results disappear because we pushed a deploy, that trust is gone.
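
For context, the retry behavior we no longer hand-roll is essentially the following. Trigger.dev's own retry configuration is richer; this synchronous helper is just a sketch of the pattern, with the backoff sleep reduced to a comment.

```typescript
// Minimal retry sketch: try up to `maxAttempts` times, rethrow the last
// error if every attempt fails. In a real queue we would also sleep with
// exponential backoff (e.g. roughly 2^attempt seconds) between attempts.
function withRetries<T>(fn: (attempt: number) => T, maxAttempts = 3): T {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return fn(attempt);
    } catch (err) {
      lastError = err;
    }
  }
  throw lastError;
}
```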

Designing around background jobs

Moving work off the API changed how we think about building features. It's a different set of constraints.

Jobs need to be idempotent. If a job retries, it shouldn't create duplicate results or send duplicate notifications. We had to go back and make several of our early jobs safe for replay, which was not fun but was necessary.

Status updates need to be visible to the user. A spinner with no context is almost worse than no feedback at all. Users need to see that something is happening, where it is in the process, and roughly how long it might take.

Errors need enough context attached to them that we can actually fix things quickly. A generic "job failed" log entry is useless at 2am. We attach the input parameters, the step that failed, and the raw error.
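
In practice that means building a structured failure record instead of logging a bare message. The shape below is our illustration, not a Trigger.dev type.

```typescript
// Hypothetical failure record: enough context that a 2am log entry points
// straight at the failing step, the inputs, and the underlying error.
interface JobFailure {
  step: string;
  input: Record<string, unknown>;
  cause: string;
}

function describeFailure(
  step: string,
  input: Record<string, unknown>,
  err: unknown,
): JobFailure {
  return {
    step,
    input,
    cause: err instanceof Error ? err.message : String(err),
  };
}
```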

Long tasks should break into discrete steps. A single monolithic job that runs for 90 seconds is hard to observe and hard to debug. Breaking it into steps means we can see progress and isolate failures.
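
Breaking a job into steps can be as simple as a wrapper that runs named steps in order and records how far the job got. This is a sketch; the real pipeline reports progress through Trigger.dev rather than returning a log.

```typescript
type StepResult = { name: string; ok: boolean };

// Run named steps in order; stop at the first failure so later steps
// never run against broken state, and return a per-step record so we can
// see exactly where the job stopped.
function runSteps(steps: Array<[string, () => void]>): StepResult[] {
  const log: StepResult[] = [];
  for (const [name, fn] of steps) {
    try {
      fn();
      log.push({ name, ok: true });
    } catch {
      log.push({ name, ok: false });
      break;
    }
  }
  return log;
}
```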

Trigger.dev handles the infrastructure side of all this well, but we still had to do the product work of designing how the UI communicates what is happening behind the scenes.

The UX side

A background job the user can't see might as well not exist. If you move work to the background but the UI still just shows an indefinite spinner, you haven't really improved anything from the user's perspective.

We show progress states and status labels for each analysis. When a job finishes, the UI updates. When a job fails, we tell the user what happened in plain language and give them a retry button. We also handle the case where a user navigates away and comes back — their analysis is still there, still running or completed, with its status intact.

This layer of communication is a relatively small amount of code, but it completely changes how the product feels. The system goes from "I clicked a button and now I'm waiting and hoping" to "I can see this thing working."
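
The mapping behind those labels is small. The copy and the `canRetry` flag below are illustrative, not our production strings, but this is roughly the whole translation layer between job state and what the user reads.

```typescript
type UiStatus = "pending" | "running" | "completed" | "failed";

// Every job state gets a plain-language label; only failures get a retry
// affordance. The wording here is illustrative.
function statusLabel(status: UiStatus): { text: string; canRetry: boolean } {
  switch (status) {
    case "pending":
      return { text: "Queued, starting shortly", canRetry: false };
    case "running":
      return { text: "Analyzing your profile and the role", canRetry: false };
    case "completed":
      return { text: "Analysis ready", canRetry: false };
    case "failed":
      return { text: "Something went wrong with this analysis", canRetry: true };
  }
}
```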

When we use background jobs and when we don't

Not everything needs to be a background job. We've been deliberate about where the line is.

We move work to background jobs when it is slow (anything touching model inference), when it depends on external services that might be unreliable, or when it needs to survive retries. That covers AI analysis, multi-step data gathering, and anything that could run longer than a few seconds.

Fast operations stay synchronous. A quick database lookup or a simple CRUD operation doesn't need the overhead of a job queue. Adding that complexity where it isn't needed just makes the system harder to reason about.

What changed after the migration

The migration was a technical change, but it ended up changing how we design features. We now ask different questions early in the process. What happens if this takes 30 seconds? What does the user see while they wait? What happens if it fails halfway through? What does a retry look like?

We built better status tracking because we had to. We wrote better error messages because we could actually see the errors now. We started thinking about partial failure states — what if step 3 of 5 fails? Can the user see the results from steps 1 and 2?

The result is that users who run multiple analyses at once get a much more predictable experience. Each one has its own status, its own progress, and its own error handling. Before, running several analyses concurrently was basically a coin flip on whether the server would hold up.

Scaling

Background jobs also changed how we handle traffic spikes. Before, a burst of analysis requests would hit the API directly, and if enough came in at once, the server would slow down for everyone, including people just trying to log in or browse.

Now the API stays light. It writes records and returns. The analysis jobs queue up and process at whatever concurrency we've configured. If traffic spikes, the queue gets longer but the API doesn't buckle. Users might wait a bit longer for results, but the product doesn't degrade in that broken, unresponsive way.
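
The spike behavior can be modeled in a few lines: a burst of N requests becomes roughly N / concurrency rounds of processing, while the API's per-request cost stays constant. The function below is a toy model, not how Trigger.dev actually schedules work.

```typescript
// Toy model: a queued burst drains `concurrency` jobs at a time. A spike
// makes the queue longer, not the API slower.
function planBatches<T>(queued: T[], concurrency: number): T[][] {
  const batches: T[][] = [];
  for (let i = 0; i < queued.length; i += concurrency) {
    batches.push(queued.slice(i, i + concurrency));
  }
  return batches;
}
```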

We can also tune job concurrency independently from the API. If we need to throttle analysis jobs because the model provider is rate limiting us, we can do that without touching the API server. We can look at "is the API healthy" and "are analysis jobs backed up" as two separate questions, which makes operations a lot more straightforward. That separation has already saved us from a few incidents that would have been much worse under the old architecture.

Supercharge your job search with eloovor

Create your free account and run your full search in one place:

  • Smart job application tracking and follow-ups
  • ATS-optimized resumes and personalized cover letters
  • Smart Profile Analysis
  • One-click company research and hiring insights
  • Profile-based job fit analysis
  • Interview preparation and practice prompts
Engineering · Trigger.dev · Background Jobs · AI