How I use AI to modernize legacy projects without breaking anything
We've all inherited that project. The one nobody wants to touch. The one with 500-line functions, no tests, no documentation, and that somehow works in production.
Before, refactoring something like that was a gamble: you prayed not to break anything and crossed your fingers on every deploy. Now, with AI as a copilot, the process is different. Not magical, but safer and more methodical.
I've been using this approach on real projects for a while and I want to share the methodology that has worked for me.
The most common mistake: starting to refactor directly
The temptation is strong. You see that spaghetti code and you want to fix it now. But without understanding the business and without tests, you're walking blind.
I've seen (and made) this mistake: you refactor something that "clearly is wrong," only to discover that weird behavior was intentional and half the system depended on it.
AI amplifies both your successes and your mistakes. If you don't know what the code does, AI won't guess it either.
Step 1: Document the business first
Before touching a single line of code, you need to understand what the system does. Not the how (you see that in the code), but the what and the why.
If there's no documentation, create it. And here AI is tremendously useful.
How I do it
I don't start with the code. I start with the database.
The data model tells the business story better than any code. Tables, relationships, fields... all of that reflects business decisions someone made at some point.
I pass the database schema to AI and ask for reverse engineering:
Analyze this database schema and explain:
1. What business entities it represents
2. How they relate to each other
3. What business rules you can infer from constraints and fields
4. What likely business flows exist
5. Generate an entity-relationship diagram in Mermaid format
[tables/collections schema]
AI generates something like this:
erDiagram
USER ||--o{ ORDER : places
ORDER ||--|{ ORDER_ITEM : contains
ORDER ||--o{ ORDER_STATUS_HISTORY : has
ORDER }o--|| PAYMENT : "paid by"
PRODUCT ||--o{ ORDER_ITEM : "included in"
USER {
string id PK
string email
string name
datetime created_at
}
ORDER {
string id PK
string user_id FK
string status
datetime paid_at
datetime shipped_at
datetime cancelled_at
}
You include this diagram directly in your markdown documentation and it renders automatically. It's gold for understanding the system at a glance.
For example, if you see an orders table with fields like status, paid_at, shipped_at, cancelled_at, you already know there's an order status flow. If there's an order_status_history table, you know they need audit trails for status changes.
Once I understand the data model, I move to the code. I pass sections of code and ask it to explain the business logic:
Analyze this controller and explain:
1. What business process it represents
2. What business rules are implicit in the code
3. What edge cases it's handling
4. What external dependencies it has
[controller code]
The combination of data model + code gives you a much more complete picture than either one alone.
AI doesn't always get it 100% right, but it gives you a starting point to validate with the team or stakeholders.
The deliverable
A markdown document with:
- General module description
- Main business flows
- Identified business rules
- Dependencies and integrations
- Special cases or exceptions
This document has a dual purpose: it helps you now and helps future developers later.
Step 2: Create tests before refactoring
This is the golden rule: never refactor code without tests that validate the current behavior.
I'm not talking about perfect unit tests with mocks for everything. I'm talking about tests that capture the real system behavior, including the database.
My approach: integration tests with a real database
In a recent project with MongoDB without domain classes (direct queries everywhere), trying to mock was impossible. The logic was so coupled to the document structure that any mock would be a lie.
The solution was to create integration tests with a real test database:
// test/integration/users/get-user-by-id.e2e-spec.ts
describe('UserController - getUserById', () => {
let app: INestApplication
let testDb: TestDatabase
beforeAll(async () => {
testDb = await TestDatabase.create()
app = await createTestApp(testDb)
})
afterAll(async () => {
await testDb.cleanup()
await app.close()
})
beforeEach(async () => {
await testDb.clear()
})
it('should return user when exists', async () => {
// Arrange
const userId = await testDb.insertUser({
name: 'John Doe',
email: 'john@example.com',
status: 'active',
})
// Act
const response = await request(app.getHttpServer())
.get(`/users/${userId}`)
.expect(200)
// Assert
expect(response.body).toMatchObject({
name: 'John Doe',
email: 'john@example.com',
status: 'active',
})
})
it('should return 404 when user does not exist', async () => {
const fakeId = new ObjectId().toString()
await request(app.getHttpServer()).get(`/users/${fakeId}`).expect(404)
})
})
Tests per method, not per controller
An important tip: create one test file per controller method, not one giant file for the whole controller.
Why?
- Easier to debug: if something fails, you know exactly where
- Easier to create incrementally: you can add tests one by one
- Easier to maintain: each file has a clear scope
The structure looks like this:
test/
integration/
users/
get-user-by-id.e2e-spec.ts
create-user.e2e-spec.ts
update-user.e2e-spec.ts
delete-user.e2e-spec.ts
list-users.e2e-spec.ts
Use AI to generate the tests
This is where AI shines. I pass the method code and ask:
Generate integration tests for this method.
Include:
- Success case
- Error cases (not found, validation, etc.)
- Edge cases you see in the code
Use this format: [test example]
The database is MongoDB, use TestDatabase to insert test data.
Review what it generates, adjust for your context, and run it against the real system to validate.
Step 3: Refactor with the safety net
Now yes, with documentation and tests in place, you can start refactoring.
Order matters
Don't try to refactor everything at once. This is the order that has worked for me:
1. Extract the data layer first
Create well-defined repositories by responsibility:
// Before: direct query in the controller
const user = await this.db.collection('users').findOne({ _id: userId })
// After: dedicated repository
const user = await this.userRepository.findById(userId)
Each repository should have a single responsibility. If you have a UserRepository that also handles roles and permissions, split it.
2. Extract business logic to services
Once the data is encapsulated, extract the logic to specialized services:
// Before: everything in the controller
@Get(':id')
async getUser(@Param('id') id: string) {
const user = await this.db.collection('users').findOne({ _id: id })
if (!user) throw new NotFoundException()
const permissions = await this.db.collection('permissions')
.find({ userId: id }).toArray()
return {
...user,
permissions: permissions.map(p => p.name),
isAdmin: permissions.some(p => p.name === 'admin')
}
}
// After: separated into services
@Get(':id')
async getUser(@Param('id') id: string) {
return this.userFacade.getUserWithPermissions(id)
}
3. Convert the controller into a facade
The controller (or the main service if you use that pattern) becomes an orchestrator that delegates to specialized services:
// user.facade.ts
@Injectable()
export class UserFacade {
constructor(
private readonly userService: UserService,
private readonly permissionService: PermissionService,
private readonly notificationService: NotificationService
) {}
async getUserWithPermissions(id: string): Promise<UserWithPermissionsDto> {
const user = await this.userService.findById(id)
const permissions = await this.permissionService.getForUser(id)
return {
...user,
permissions: permissions.map((p) => p.name),
isAdmin: permissions.some((p) => p.name === 'admin'),
}
}
}
I use the Facade suffix to make it clear that it's just an orchestrator, not where the real logic lives.
Update tests as you add dependencies
Every time you extract a new dependency, you need to add it to the tests. A tip: create a helper that groups the test module configuration:
// test/helpers/create-test-module.ts
export async function createUserTestModule(testDb: TestDatabase) {
return Test.createTestingModule({
imports: [
/* common imports */
],
controllers: [UserController],
providers: [
UserFacade,
UserService,
PermissionService,
UserRepository,
PermissionRepository,
{ provide: 'DATABASE', useValue: testDb.getConnection() },
],
}).compile()
}
This way when you add a new dependency, you only add it in one place.
Step 4: Refactor the tests
Once the code is clean and everything works, it's time to clean up the tests.
During refactoring, tests probably ended up with duplicate code, improvised helpers, and inconsistent structures. Now is the time to organize them.
Important: refactor the tests without changing their functionality. Tests should continue to validate exactly the same things, just with cleaner code.
This includes:
- Extracting common helpers
- Unifying the arrange/act/assert structure
- Improving test names
- Removing duplicate code
Step 5: Document the results
The last step is to create a markdown file with the test coverage summary:
# Users Module Tests
## Coverage
| Endpoint | Cases covered | Status |
| ----------------- | ------------- | ------ |
| GET /users/:id | 5 | ✅ |
| POST /users | 8 | ✅ |
| PUT /users/:id | 6 | ✅ |
| DELETE /users/:id | 4 | ✅ |
## Validated business cases
- User creation with unique email validation
- Automatic permission assignment by role
- Soft delete with history preservation
- ...
## How to run
\`\`\`bash
npm run test:e2e -- --grep "UserController"
\`\`\`
This document serves as a living contract of the system's behavior.
Additional considerations
Don't blindly trust AI
AI is a copilot, not the pilot. Review everything it generates, especially:
- Business logic it might misinterpret
- Edge cases it doesn't detect
- Code that "works" but doesn't follow project conventions
Make small and frequent commits
Each step should be a commit:
- "docs: document user creation flow"
- "test: add integration tests for getUserById"
- "refactor: extract UserRepository"
- "refactor: extract PermissionService"
If something goes wrong, you can revert without losing days of work.
Keep the system working at all times
The most important rule: after each change, tests must pass and the system must work. If you break something, fix it before continuing.
This means you can deploy at any point in the process. You're not in a "broken state" for weeks.
Communicate progress
If you work on a team, keep everyone informed. A silent refactor generates merge conflicts and frustration.
A simple Slack message is enough:
"I'm refactoring the users module. I'm going to extract the repositories today. If you need to make changes there, let me know to coordinate."
Don't seek perfection
The goal is not perfect code, it's better code. If the legacy code was a 2/10 and you leave it at 7/10, that's a huge success.
You can always keep improving later, but now you have tests and documentation that make that much safer.
Process summary
- Document the business — Use AI to understand what the system does and create documentation
- Create integration tests — Capture real behavior before touching anything
- Refactor in layers — Data first, then logic, then facade
- Clean up the tests — Once everything works, organize the test code
- Document the results — Leave a summary of what's covered
AI accelerates each of these steps, but the methodology is what gives you safety. Without tests, you're guessing. Without documentation, the next developer (or you in 6 months) will start from scratch.
Legacy code doesn't have to be scary. With the right approach and the right tools, modernizing it is just a matter of time and discipline.
David Alfonso Pereira
24/12/2025