Introduction: The "Day 2" Problem
In the first volume of this journal, we addressed the immediate bleeding: the hour-long compile times and the CI pipelines that drained our budget. We implemented aggressive caching, governed our dependency graph, and optimized our Angular compiler settings. We successfully brought our build times down from hours to minutes.
But as any architect who has survived a major migration knows, "Day 1" is just the deployment. "Day 2" is when the reality of operations sets in.
Once the initial euphoria of fast builds wore off, we faced a new, more insidious set of challenges. We call this the "Day 2 Monorepo Hangover."
- The Silent Cache Decay: Why did the build take 2 minutes on Tuesday but 15 minutes on Friday, with no significant code changes?
- The "Works on My Machine" Schism: Why did a unit test pass on a developer’s M1 Mac but segfault on the Linux CI runner?
- The Coupling Trap: Why did a CSS change in the "Settings" page force a redeployment of the "checkout" flow?
Speed without stability is chaos. This continuation of the journal documents our journey from Optimization to Resilience. We are moving beyond just making things fast; we are focusing on making them observable, consistent, and scalable via Federation.
Part IV: The Observable Monorepo (Governance & Metrics)
If you cannot measure it, you cannot optimize it. In the early days, we relied on anecdotal evidence ("The build feels faster"). At enterprise scale, feelings are not metrics. We needed hard data to defend our architectural decisions and to catch regressions before they stalled the engineering team.
1. The Cache Hit Ratio (CHR) Dashboard
The remote cache (implemented in Part I via S3) is the lifeblood of the monorepo. However, it is fragile. A single developer accidentally committing a change to a global file (like .eslintrc.json or the root lockfile) can invalidate the cache for the entire organization, forcing a "rebuild the world" scenario.
We realized we were flying blind. We didn't know when the cache was missed, only that the pipeline was suddenly slow.
The Solution: Telemetry Extraction
Nx provides powerful cloud dashboards, but due to our strict data residency requirements, we needed to own the data. We implemented a custom telemetry pipeline.
We utilized the NX_VERBOSE_LOGGING=true flag and a custom Node.js script that parses the execution output in our CI pipeline. We extract:
- Total Tasks Scheduled: The size of the graph for this PR.
- Tasks Retrieved from Cache: The savings.
- Tasks Executed: The cost.
- Cache Miss Roots: Which specific projects triggered the misses.
We push this data to our internal Grafana instance via InfluxDB.
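For reference, writing a data point to InfluxDB requires nothing more than its v2 HTTP write endpoint and line protocol. The following is a minimal sketch rather than our production exporter; the environment variable names and the ci_build measurement are placeholders.
// tools/scripts/push-to-influx.ts (illustrative sketch)
// Writes one CHR data point via the InfluxDB v2 HTTP API.
// INFLUX_URL, INFLUX_ORG, INFLUX_BUCKET, and INFLUX_TOKEN are placeholders.
interface BuildMetrics {
  totalTasks: number;
  cacheHits: number;
}
export async function pushMetrics(metrics: BuildMetrics): Promise<void> {
  const { INFLUX_URL, INFLUX_ORG, INFLUX_BUCKET, INFLUX_TOKEN } = process.env;
  // Line protocol: measurement,tag=value field=value,field=value timestamp
  const line =
    `ci_build,repo=enterprise-monorepo ` +
    `total_tasks=${metrics.totalTasks}i,cache_hits=${metrics.cacheHits}i ` +
    `${Date.now()}`;
  const res = await fetch(
    `${INFLUX_URL}/api/v2/write?org=${INFLUX_ORG}&bucket=${INFLUX_BUCKET}&precision=ms`,
    {
      method: 'POST',
      headers: { Authorization: `Token ${INFLUX_TOKEN}` },
      body: line,
    }
  );
  if (!res.ok) throw new Error(`InfluxDB write failed: ${res.status}`);
}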
The Metric that Matters: The 7-Day Moving CHR
We established a Key Performance Indicator (KPI): the 7-Day Moving Cache Hit Ratio.
- Target: > 75%
- Critical Alert: < 50%
When the CHR dips below 50%, an alert fires in our #dev-ops Slack channel. This allows us to immediately investigate the commit that "broke the cache." Usually, it’s a benign mistake—a developer adding a comment to tsconfig.base.json—but catching it early prevents days of lost productivity.
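The alert itself is a plain threshold check over the logged samples. Below is a minimal sketch of the moving-window math and the Slack webhook call; the sample shape and the SLACK_WEBHOOK_URL variable are illustrative assumptions, not our production code.
// tools/scripts/chr-alert.ts (illustrative sketch)
interface Sample {
  timestamp: string; // ISO-8601 date of the CI run
  cacheHits: number;
  totalTasks: number;
}
const WEEK_MS = 7 * 24 * 60 * 60 * 1000;
// Aggregate hits and tasks over the trailing seven days.
export function movingChr(samples: Sample[], now = Date.now()): number {
  const recent = samples.filter((s) => now - Date.parse(s.timestamp) < WEEK_MS);
  const hits = recent.reduce((sum, s) => sum + s.cacheHits, 0);
  const total = recent.reduce((sum, s) => sum + s.totalTasks, 0);
  return total > 0 ? hits / total : 1;
}
export async function alertIfLow(samples: Sample[]): Promise<void> {
  const chr = movingChr(samples);
  if (chr >= 0.5) return; // critical threshold from the KPI above
  // Slack incoming webhooks accept a plain JSON POST.
  await fetch(process.env.SLACK_WEBHOOK_URL!, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      text: `7-day CHR dropped to ${(chr * 100).toFixed(1)}% (< 50%). Investigate recent cache-busting commits.`,
    }),
  });
}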
2. Combating Runtime Drift
As our team grew to 50+ engineers, the variance in local development environments became a massive liability.
- Dev A was running Node 18.10.
- Dev B was running Node 20.2 (LTS).
- The CI Server was running Node 16 (Legacy).
We encountered a "phantom bug" where a specific Angular schematic would fail only on CI. The root cause? A subtle difference in how npm handled peer dependencies between Node versions.
The Solution: Strict Environmental Determinism
We stopped treating the environment as a suggestion and started treating it as part of the codebase.
A. Engines and Corepack
We added strict engine locking to the root package.json.
{
"name": "enterprise-monorepo",
"engines": {
"node": "18.16.0",
"pnpm": "8.6.0"
},
"packageManager": "pnpm@8.6.0"
}
We then enabled Node.js Corepack in our setup scripts. Corepack ensures that when a developer runs pnpm install, it transparently downloads and uses the exact version pinned in the packageManager field, regardless of what is installed globally on their machine.
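Corepack pins the package manager, but not Node itself. As an extra guard, a small preinstall check can refuse to run under a mismatched Node version. The sketch below is illustrative (the script path and error copy are invented, and it assumes an exact version pin as in the engines block above):
// tools/scripts/check-node.ts (illustrative sketch)
// Abort early if the running Node version differs from the engines pin.
import { readFileSync } from 'fs';
import { join } from 'path';
const pkg = JSON.parse(readFileSync(join(process.cwd(), 'package.json'), 'utf-8'));
const required: string = pkg.engines?.node ?? '';
const actual = process.version.replace(/^v/, '');
if (required && actual !== required) {
  console.error(
    `Expected Node ${required} (see "engines" in package.json), got ${actual}. ` +
    `Switch versions or reopen the Dev Container.`
  );
  process.exit(1);
}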
B. The Hermetic Build (Dev Containers)
For onboarding, we adopted the Dev Container standard (.devcontainer). This allows a new hire to open the repository in VS Code, click "Reopen in Container," and instantly be inside a Dockerized environment that is a bit-perfect clone of our CI runner.
// .devcontainer/devcontainer.json
{
"name": "Enterprise Angular Monorepo",
"image": "mcr.microsoft.com/devcontainers/typescript-node:18",
"features": {
"ghcr.io/devcontainers/features/angular-cli:1": {},
"ghcr.io/devcontainers/features/docker-in-docker:2": {}
},
"customizations": {
"vscode": {
"extensions": [
"nrwl.angular-console",
"dbaeumer.vscode-eslint",
"esbenp.prettier-vscode"
]
}
},
"postCreateCommand": "pnpm install"
}
This killed "It works on my machine" permanently. If it works in the Dev Container, it works on CI.
Part V: Scaling the Developer (Distributed Task Execution)
In Part III, we parallelized our CI pipeline using matrix strategies. This solved the bottleneck for the build server, but it left our developers behind.
A developer running nx test on a large feature branch might trigger 400 unit tests. Even with caching, this could take 15 minutes on a local laptop, causing thermal throttling and a context-switching nightmare. We had reached the physical limits of the MacBook Pro.
We needed to decouple the request for computation from the execution of computation.
1. Distributed Task Execution (DTE) vs. Binning
Initially, we tried "Binning" (sharding). We wrote scripts to split our tests into 4 chunks and run them on 4 different CI agents.
- Problem: Binning is dumb. If Bin A has one slow E2E test and Bin B has 50 fast unit tests, Bin B finishes in 10 seconds while Bin A takes 10 minutes. The total build time is defined by the slowest bin.
We moved to Distributed Task Execution (DTE).
How DTE Changed Our Workflow
With DTE enabled (via Nx Agents), the workflow shifts:
- Local Command: The developer types nx test --affected.
- Graph Calculation: The local CLI calculates the task graph and hashes all inputs.
- Cloud Coordination: The CLI sends the graph to the Nx Cloud coordinator.
- Distribution: The coordinator spins up dynamic agents (we configured 15 agents for our pool). It assigns tasks to agents in real-time. As soon as an agent finishes a task, it grabs the next one.
- Streaming: The logs and artifacts stream back to the developer’s terminal instantly.
The Result: A test suite that took 20 minutes locally now finishes in 2 minutes, because it is running on 15 machines simultaneously.
2. The Flaky Test Dilemma
Parallelism exposes weakness. Tests that passed when run sequentially often fail when run in parallel due to shared resources (e.g., two tests trying to bind to port 4200 simultaneously or writing to the same temporary file).
We had to architect for Test Isolation:
- Port Zero: We reconfigured our Angular E2E setups (Cypress/Playwright) to use port 0 (a system-assigned random port) rather than a hardcoded 4200; a helper sketch follows this list.
- Database Teardowns: We enforced strict beforeEach and afterEach hooks to clean up mocked state.
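Where a first-class random-port option was not available, the OS can supply one: binding to port 0 asks the kernel for a free ephemeral port. A minimal helper sketch (the file path is illustrative):
// tools/testing/get-free-port.ts (illustrative sketch)
// Bind to port 0 so the OS assigns a free ephemeral port, then release it
// so the dev server booted by the E2E suite can claim it.
import { createServer, AddressInfo } from 'net';
export function getFreePort(): Promise<number> {
  return new Promise((resolve, reject) => {
    const srv = createServer();
    srv.once('error', reject);
    srv.listen(0, () => {
      const { port } = srv.address() as AddressInfo;
      srv.close((err) => (err ? reject(err) : resolve(port)));
    });
  });
}
// Usage: const port = await getFreePort(); then pass `--port ${port}` to the
// dev server. (A brief race window exists between close and reuse.)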
Handling Flakiness via Retries
Despite our best efforts, 0.1% of tests remained flaky in a distributed environment. In a monorepo with 10,000 tests, a 0.1% failure rate means every build fails. We configured the nx.json to handle this gracefully:
// nx.json
"targetDefaults": {
"test": {
"configurations": {
"ci": {
"ci": true,
"codeCoverage": true,
"retry": 2 // The crucial addition
}
}
}
}
This simple configuration allows the runner to retry a failed task twice before declaring a failure. This eradicated the noise from transient network blips on the agents.
Part VI: The Endgame — Architectural Independence via Module Federation
We had achieved speed (Caching), stability (Governance), and scale (DTE). But we still had a Deployment Coupling problem.
Our application was a monolithic Angular build. It had grown to 8MB of JavaScript.
- The Problem: The "Checkout" team wanted to deploy a critical hotfix. To do so, they had to build the entire application. This meant they were blocked if the "Inventory" team had a broken build, even though the two features were unrelated.
- The Risk: A deployment meant replacing the entire artifact. One bad line of code in a utility library could bring down the whole platform.
We needed to break the monolith. We chose Webpack Module Federation.
1. Shifting to Vertical Slicing
We restructured our Nx graph. We moved away from a single "App" importing "Libraries." We created a Shell and Remote architecture.
- The Shell (Host): A lightweight Angular application. It contains only the top-level navigation, the authentication guard, and the footer. It knows nothing about business logic.
- The Remotes (Micro-Frontends):
  - mfe-inventory
  - mfe-user-profile
  - mfe-checkout
  - mfe-dashboard
Each remote is a standalone Angular application. It has its own project.json, its own tests, and crucially, its own CI/CD pipeline.
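For concreteness, a remote's federation config is tiny. Here is a sketch for mfe-checkout following the shape the @nx/angular:module-federation generator produces; the exact entry.routes.ts path is an assumption:
// apps/mfe-checkout/module-federation.config.ts (illustrative sketch)
import { ModuleFederationConfig } from '@nx/webpack';
const config: ModuleFederationConfig = {
  name: 'mfe-checkout',
  exposes: {
    // The Shell lazy-loads these routes; nothing else leaks out of the remote.
    './Routes': 'apps/mfe-checkout/src/app/remote-entry/entry.routes.ts',
  },
};
export default config;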
2. The Configuration Strategy
The complexity of Module Federation lies in the configuration. If the Host and Remote disagree on shared dependencies, the application crashes at runtime with cryptic NullInjectorError messages.
We used the @nx/angular:module-federation executor to generate our configs, but we had to customize the Shared Mappings.
The Singleton Problem
Angular relies on singletons for services like the Router, HttpClient, and NgZone. If the Shell loads Angular v17.1 and the Remote loads Angular v17.2, Webpack might instantiate two versions of Angular. This breaks change detection and routing.
The Monorepo Advantage
This is where the Monorepo shines over a "Polyrepo" micro-frontend approach. Because all our MFEs live in the same repo, they share a single package.json. We guarantee that the Shell and Remotes are using the exact same version of Angular.
We created a shared configuration factory to enforce this:
// libs/shared/mfe/module-federation.config.ts
import { ModuleFederationConfig } from '@nx/webpack';
export const config: ModuleFederationConfig = {
shared: (libraryName, defaultConfig) => {
// Critical: Enforce singleton for core Angular packages
if (['@angular/core', '@angular/common', '@angular/router', '@ngrx/store'].includes(libraryName)) {
return {
singleton: true,
strictVersion: true,
requiredVersion: 'auto', // Reads from root package.json
eager: false
};
}
// Return default for everything else
return defaultConfig;
},
};
3. CI/CD for Federation
With this architecture, our CI pipeline evolved again. We implemented Affected Deployments.
When a developer changes code in libs/checkout-lib:
- Nx detects that only mfe-checkout is affected.
- The CI pipeline builds only mfe-checkout.
- The artifact is uploaded to a versioned folder in S3 (e.g., /mfe-checkout/v1.0.5/remoteEntry.js).
- We update a "Manifest JSON" file that the Shell reads at runtime to know which version of the checkout remote to load; a bootstrap sketch follows this list.
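At runtime, the Shell resolves remote locations from that manifest before Angular bootstraps. A sketch of the sequence, assuming the dynamic-federation helpers in @nx/angular/mf and a manifest that maps remote names to their deployed base URLs:
// apps/shell/src/main.ts (illustrative sketch)
// Example manifest: { "mfe-checkout": "https://cdn.example.com/mfe-checkout/v1.0.5" }
import { setRemoteDefinitions } from '@nx/angular/mf';
fetch('/assets/module-federation.manifest.json')
  .then((res) => res.json())
  .then((definitions) => setRemoteDefinitions(definitions))
  .then(() => import('./bootstrap'))
  .catch((err) => console.error('Shell bootstrap failed', err));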
The Result:
- Independent Deployments: The Checkout team can deploy 10 times a day without coordinating with the Inventory team.
- Fault Isolation: If mfe-checkout crashes, the Shell catches the error, displays a "Service Unavailable" banner for that specific route, and keeps the rest of the application (Dashboard, Profile) alive; a route-level sketch follows.
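The catch-and-degrade behavior is plain Angular routing: if fetching the remote bundle fails, the loadChildren promise rejects and a local fallback is substituted. A sketch, assuming Nx's loadRemoteModule helper, the generator's remoteRoutes export, and an invented fallback route file:
// apps/shell/src/app/app.routes.ts (illustrative sketch)
import { Route } from '@angular/router';
import { loadRemoteModule } from '@nx/angular/mf';
export const appRoutes: Route[] = [
  {
    path: 'checkout',
    loadChildren: () =>
      loadRemoteModule('mfe-checkout', './Routes')
        .then((m) => m.remoteRoutes)
        // If remoteEntry.js is unreachable, render a local fallback route
        // instead of taking the whole Shell down.
        .catch(() =>
          import('./service-unavailable.routes').then((m) => m.routes)
        ),
  },
];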
Conclusion: The Architect as a Gardener
Reflecting on this 18-month journey—from a sluggish, fragile monolith to a high-speed, federated fleet—the role of the Architect has clarified.
When you are a junior engineer, you build features. You plant the seeds. When you are a senior engineer, you design systems. You build the greenhouse. But when you are an Architect managing enterprise scale, you are a Gardener.
You cannot manually inspect every leaf (pull request). You cannot force the plants to grow faster than biology allows (compiler limits). Your job is to:
- Enforce Boundaries (Governance): Ensure the roots don't tangle (prevent circular dependencies).
- Provide Nutrients (Caching & DTE): Ensure the environment supports rapid growth.
- Monitor Health (Observability): Watch for the yellowing leaves (Cache drift) before the plant dies.
- Prune Aggressively (Refactoring): Cut back dead code to let new growth flourish.
The transition to Nx and Angular at scale is not just a tooling change; it is a cultural shift. It requires moving from "Hope-Driven Development" (hoping the build passes) to "Data-Driven Engineering."
We have built a system that is resilient, fast, and observable. But the landscape of frontend development changes every six months. The next challenge is already on the horizon: integrating Rust-based tooling (like Turbopack or Rspack) to replace Webpack entirely, and exploring Qwik City for hydration-free loading.
The journal remains open. The work is never done. But for now, the build is green, and it finished in 90 seconds.
Appendix: Implementation Cheatsheet
For those looking to implement the strategies discussed in Volume 2, here is a consolidated checklist of the high-impact actions we took.
A. Telemetry Script (Simplified)
Use this snippet to start gathering your own cache metrics if you aren't using Nx Cloud Enterprise.
// tools/scripts/track-metrics.js
const { execSync } = require('child_process');
const fs = require('fs');
try {
// Capture the build output (raise maxBuffer past Node's 1 MB default so long logs don't overflow)
const output = execSync('nx run-many --target=build --all', { encoding: 'utf-8', maxBuffer: 64 * 1024 * 1024 });
// Regex to find cache hits
const cacheHits = (output.match(/existing outputs match the cache/g) || []).length;
const totalTasks = (output.match(/Running target "build"/g) || []).length;
const data = {
timestamp: new Date().toISOString(),
cacheHits,
totalTasks,
ratio: totalTasks > 0 ? (cacheHits / totalTasks) : 0
};
// Append to a local JSON log (or push to API)
fs.appendFileSync('build-metrics.json', JSON.stringify(data) + '\n');
} catch (e) {
console.error("Metric tracking failed", e);
}
B. Recommended .npmrc for Enterprise
This configuration prevents "phantom dependencies" and speeds up installation in CI.
# .npmrc
# Enforce strict peer dependency resolution
strict-peer-dependencies=true
# Automatically hoist patterns (useful for Monorepos)
public-hoist-pattern[]=*eslint*
public-hoist-pattern[]=*prettier*
# Prefer offline installation if packages are in store
prefer-offline=true
# Lock the store location for easier caching in CI
store-dir=.pnpm-store
C. Module Federation Host Config
The configuration for the "Shell" application that loads remotes dynamically.
// apps/shell/module-federation.config.ts
import { ModuleFederationConfig } from '@nx/webpack';
const config: ModuleFederationConfig = {
name: 'shell',
remotes: [
// We do NOT hardcode URLs here. We load them from a manifest at runtime.
// This allows us to promote artifacts from Staging to Prod without rebuilding.
'mfe-inventory',
'mfe-checkout',
],
shared: (libraryName, defaultConfig) => {
if (libraryName === '@angular/core') {
return { ...defaultConfig, singleton: true, strictVersion: true };
}
return defaultConfig;
}
};
export default config;
