France’s SNCF has long been recognized as a global leader in transportation, having introduced one of the world’s first high-speed railways, the Train à Grande Vitesse (TGV), in 1981. Since the launch of that pioneering public resource, SNCF has gone from strength to strength. The company has expanded beyond France and is now succeeding in the global arena, where it provides engineering and logistics expertise to assist in public transportation projects worldwide. SNCF is now also a thriving software company in addition to a public transportation provider, building countless internal and public-facing applications to support its operations, products, and services.
But just as SNCF has evolved with the times, the landscape for its core business is also changing. Notably, the European Union has gradually been deregulating national railways in recent years, subjecting SNCF to new rivalries on its home turf while also opening doors to new opportunities. As a result, SNCF must continue to operate and innovate at an elite level, in a more competitive environment than ever.
Against the background of these market changes, SNCF began a major digital transformation initiative in 2016, aimed at updating its IT infrastructure and improving its competitiveness. Through the multi-year project, SNCF would attempt to migrate 90% of its applications to the cloud and further improve the agility of its IT assets by embracing PaaS and containerization.
But as the cloud migrations were set in motion, SNCF discovered an issue that threatened the goals of the digital transformation project: the company had no coordinated approach to monitoring. Business units had been adopting monitoring solutions independently, which led to the company using a total of 11 different monitoring tools. Even when different teams used the same monitoring tool, they configured it in incompatible ways that prevented information sharing.
This lack of a single, standard monitoring tool severely restricted the scope of what each team monitored, making it difficult for different IT teams to cooperate on shared problems. As a result, every business unit was making IT decisions unilaterally, which was a clear impediment to the organization’s goal of improving its competitiveness and agility. Childéric Rouanet, an IT Project Manager at SNCF, summarized the problem simply: “When every business unit is using a different tool, there’s no shared base of information.”
Particularly problematic for SNCF was the lack of monitoring for containerized applications. As part of the company’s digital transformation, some teams were building new containerized apps on Azure Kubernetes Service (AKS). These containerized apps were intended to be a linchpin of SNCF’s modernization strategy. However, since the company’s older tools couldn’t properly monitor containerized environments, these apps went into production with only minimal monitoring and support.
A final significant challenge with SNCF’s existing monitoring tools was that they weren’t cloud-native. Consequently, their operation and maintenance seemed unwieldy in SNCF’s new cloud environment, which led to user friction and extra administrative overhead.
Need to standardize on a single monitoring system
To advance its efforts to improve competitiveness through digital transformation, SNCF determined that it was crucial for its teams to standardize on a single, centralized monitoring solution. To start its search, SNCF first identified broader business goals, such as improving collaboration, breaking down department silos, reducing time-to-market for new products and services, and improving the quality of its IT services. Having defined these higher-level goals, SNCF was then able to identify a number of more specific requirements for the monitoring solution:
The new tool had to centralize alerting, log collection and management, and application performance monitoring.
SNCF needed the tool to be a SaaS-based and cloud-native solution, so as to reduce friction in adoption and eliminate any maintenance requirements on the part of SNCF engineers.
The monitoring solution had to support the multiple cloud environments that SNCF was moving into, including AWS, Microsoft Azure, IBM Cloud, and Oracle Cloud.
The new tool had to support monitoring containerized applications in the cloud.
After a highly successful pilot program with Datadog’s Infrastructure Monitoring, Log Management, and APM products, SNCF decided to go all-in on deploying Datadog company-wide. Within eight months, SNCF migrated 4,800 servers (including 2,000 production servers), 655 applications, and 13,000 containers from older monitoring tools to Datadog. SNCF teams now use Datadog in many ways, such as for alerting, troubleshooting, reviewing application logs and performance metrics, and checking billing with cloud providers.
Since adopting Datadog, SNCF has seen a number of important benefits, not the least of which is a single source of truth for monitoring data across the entire organization. For starters, Datadog gives all technical roles at SNCF access to monitoring data, whereas before, only the administrators of the various solutions used any monitoring tool. And crucially, DevOps engineers, project managers, and admins across different business units are able to view the same data on the same monitoring platform. This shared point of reference helps ensure that teams are accountable for the applications that they build. By making its data available company-wide, Datadog has also helped SNCF improve communication and collaboration among teams. There’s no more siloing of monitoring data.
Datadog has also brought important benefits directly related to improved end-to-end, 24/7 monitoring. Each team now has direct and full visibility into the services for which it is responsible, including all associated containers and middleware. As a result, teams now have the real-time data they need either to anticipate and prevent incidents or to identify and resolve them quickly when they do occur. The production team, for example, has demonstrated an improved mean time to identify (MTTI) for its resources and can now quickly determine whether issues originate in an application or its infrastructure. As the IT Manager at SNCF who headed the transformation, Alain Charpy, commented: “Every engineer can now observe how well applications are functioning, which translates into a better service for customers who are using our trains.”
The fact that Datadog is a cloud-native service has been another boon to SNCF. Teams across the company now have nearly complete visibility into cloud-based services, from creation to production, via a monitoring platform built to support modern application architectures. This visibility is already translating into concrete returns on investment, such as an improvement in the quality of IT services. For example, the operations team was able to alert an internal client to a MongoDB issue in the cloud before the client even saw the problem. And the migration of a big application to the cloud was perfectly tracked, thanks to constant monitoring provided through Datadog. This line of sight into modern infrastructure empowers engineers, reassures customers, and strengthens the business overall.
Datadog has helped SNCF achieve its goals of modernizing its infrastructure, including migrating many of its apps to the cloud, and improving its organizational agility. Datadog has also supported SNCF’s overall shift to modern application development, and particularly its broad move to containerization. So far, SNCF has achieved these results by using Datadog Infrastructure Monitoring, Log Management, and APM. Building on that success, the company is now investigating additional Datadog products, such as RUM and Network Device Monitoring.
But most fundamentally, by giving SNCF engineers broad access to the same monitoring data in a single, central location, Datadog has helped SNCF improve collaboration and cooperation between teams. “Before Datadog, monitoring metrics was reserved for the 150 people who were most directly responsible for managing operations,” said Alain Charpy. “Today, those metrics are available to all 2,300 SNCF engineers and administrators, across all teams and environments.”