Sustaining reliability across a mature, enterprise-scale platform
Nulab has spent over 20 years building collaboration tools trusted by enterprise customers. Two of its prominent tools are Backlog, which serves 15,000+ paid organizations, and Cacoo, which supports 4M+ users.
At the core of Nulab’s ecosystem are two foundational platform services: Nulab Account, which handles unified authentication and billing across every Nulab product, and Nulab Pass, which enforces organization-wide security and access controls. Together they run on 50+ services, 1,500+ containers, 300+ hosts, and 35+ AWS accounts.
Despite this vast infrastructure, a small SRE team is responsible for the shared platform every Nulab service depends on. Hisatomo Futahashi, Principal Engineer at Nulab, handles monitoring, incident response, and platform health across the full stack. Given the team’s limited size relative to the platform’s scale, Nulab began looking for ways to multiply engineering capacity and extend what one person could do.
Accelerating investigations with AI-powered SRE
To address these challenges, Nulab adopted Bits AI SRE, which Futahashi integrated into Nulab’s monitoring workflow. When alerts come in, engineers trigger Bits investigations from Slack. Bits became a force multiplier almost immediately, enabling faster and more consistent investigations.
During a real-world incident, a late-night DDoS attack was investigated and understood in just four minutes, even while processing massive volumes of telemetry. “Investigations often finish with just Bits now,” says Futahashi. “I don’t even open AWS Console or Terminal anymore.”
Bits analyzes logs, traces, and metrics together, mirroring how experienced SREs troubleshoot systems. It identifies patterns, surfaces anomalies, and provides clear conclusions without requiring deep prior context. This allows Nulab’s small SRE team to move from manual triage to guided, AI-assisted investigations, reducing cognitive load.
Expanding SRE beyond incidents to everyday workflows
For a small team, every manual task carries real weight. Beyond critical incidents, Nulab uses Bits to handle the operational work that would otherwise consume an engineer’s day.
Many low-priority alerts that previously required at least 30 minutes on average to investigate can now be triaged with Bits in under 5 minutes. Latency investigations automatically incorporate historical context, helping catch recurring issues before they become regressions. “Bits gives us deep, precise insights without prior context,” says Futahashi. “Knowledge that used to exist only in engineers’ heads is now surfaced automatically.”
Engineers can now investigate logs and costs by chatting with Bits in natural language — no complex query writing, no custom dashboards. Work that once required deep specialist knowledge can now be offloaded to Bits, extending the team’s reach without adding headcount.
Building a new model for human and AI collaboration
For Nulab, adopting Bits AI SRE marks a fundamental shift in how incident response is approached. “Bits protects the ‘now’, I protect the ‘future’,” says Futahashi.
By offloading real-time investigation work to AI, the team can focus on improving systems, refining processes, and driving long-term reliability. At the same time, Bits continuously learns from data, context, and usage, creating a feedback loop that improves reliability over time.
Nulab treats Bits as a member of the team, investing in better telemetry, stronger context, and best practices to maximize its effectiveness.
Turning AI-driven SRE into a competitive advantage
Today, Nulab has transformed incident response from a reactive, manual process into a faster, more scalable, and more intelligent workflow. Investigations that once required deep expertise and significant time can now be completed in minutes. Engineers operate with greater confidence, reduced cognitive load, and improved visibility across complex systems. “Bits AI SRE is not just automation. It fundamentally changes how we approach incident response and reliability as a team,” says Futahashi.
By combining Datadog observability with AI-driven SRE, Nulab is building a more resilient platform while enabling teams to move faster and focus on what matters most: delivering reliable, high-quality experiences to its users.