Captur is an AI-powered platform that enables real-time image verification and quality control for businesses. The company leverages AI to process images and automate tasks that previously required manual review.
Software
11–50 Employees
London
“Now, instead of spending half a day trying to figure out what caused a crash, we can look at it in Datadog and determine what caused it in seconds.”
Sumanas Sarma Chief Technology Officer, Captur
Why Datadog?
Dynamic session sampling without needing to publish new versions of their library
Cost control at scale
Precise crash detection
Proactive issue identification
Workflow Automation accelerates triage
Challenge
Captur's SDK is integrated directly into client apps, which makes it hard to identify the root cause of crashes. Crashes originating from the host app were often misattributed to Captur, and genuine SDK-related issues were difficult to isolate. This lack of visibility led to time-consuming investigations and frequent customer frustration.
Captur is an AI-powered platform that enables real-time image verification and quality control for businesses. The company uses AI to process images and automate tasks that previously required manual review. This enables its customers to improve efficiency, reduce costs, and enhance their customers’ experience.
The Captur SDK operates through a sophisticated two-stage process. First, when a client app initiates verification, the SDK dynamically downloads the appropriate ML model based on the specific asset and location being verified. Then the SDK activates the device camera and processes live video frames through the ML model directly on-device in real time. The entire process runs offline on the user’s device, with no frames sent to the cloud until a final “confirmed” image is uploaded. This design enhances user privacy but creates significant debugging challenges and unique technical constraints. “The SDK must be stable and frictionless for end users while continuously adapting to unpredictable environments,” says Sumanas Sarma, CTO at Captur. “Our ML models need to perform consistently whether a customer is verifying deliveries in Texas or scooter parking in European cities, with completely different lighting and architecture.”
Unlike traditional mobile applications, Captur’s observability requirements center on SDK versions and service facets rather than client app versions. The team tracks critical metadata that includes OS versions, device brands and models, and service tags to isolate errors and ensure crashes are attributed correctly to avoid false accountability.
Before finding the right observability solution, Captur faced operational challenges. Without proper stack traces, root causes were difficult to trace, and debugging an issue could take a day or more. The company’s previous observability platform flooded the team with unsampled session data and offered no dynamic sampling or filtering controls, which resulted in high costs. Most critically, namespace collisions broke observability for Captur’s clients and created visibility gaps across the platform. Captur needed an observability solution that minimized user friction during client app flows, reduced false positives, and delivered real-time feedback without slowing down the user experience.
Gaining comprehensive SDK visibility and real-time crash intelligence
When the team evaluated potential solutions, Datadog Real User Monitoring (RUM) emerged as the clear choice for its requirements. The Captur team now uses RUM to improve crash triage. The goal is to capture 100 percent of crashes while sampling normal sessions at just one to 10 percent. The team uses custom RUM events and filters to isolate crashes that explicitly reference the Captur SDK in stack traces. This proved invaluable when a client on Android experienced more than 100,000 crashes in a single week during internal testing. Using RUM, Captur quickly adjusted filters to reduce cost exposure in real time.
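The policy described above (keep every crash that references the SDK, keep only a small fraction of normal sessions) can be illustrated with a minimal Python sketch. This is not the Datadog RUM API and not Captur's implementation: the function names, the `com.captur.sdk` frame marker, and the sample stack-trace line are hypothetical, chosen only to show the decision logic.

```python
import random

# Hypothetical namespace marker; the case study does not give the real one.
SDK_FRAME_MARKER = "com.captur.sdk"


def crash_references_sdk(stack_trace: str, marker: str = SDK_FRAME_MARKER) -> bool:
    """Return True if any frame in the stack trace mentions the SDK namespace."""
    return any(marker in frame for frame in stack_trace.splitlines())


def should_keep_session(has_sdk_crash: bool,
                        normal_sample_rate: float,
                        rng=random.random) -> bool:
    """Keep all SDK crash sessions; keep normal sessions with the given probability.

    normal_sample_rate is a fraction between 0.0 and 1.0
    (0.01 to 0.10 corresponds to the one to 10 percent range in the text).
    """
    if has_sdk_crash:
        return True
    return rng() < normal_sample_rate


if __name__ == "__main__":
    trace = (
        "java.lang.IllegalStateException: camera closed\n"
        "    at com.captur.sdk.Camera.start(Camera.kt:42)\n"
        "    at com.example.hostapp.MainActivity.onCreate(MainActivity.kt:10)"
    )
    crashed_in_sdk = crash_references_sdk(trace)
    print(should_keep_session(crashed_in_sdk, normal_sample_rate=0.05))
```

Keeping the sample rate as a plain parameter mirrors the point made in the text: the rate can be changed without shipping a new SDK build.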
Datadog’s encapsulated SDK scope was crucial for avoiding the naming collisions that had broken observability for Captur clients with previous solutions. Meanwhile, the dynamic filters and sampling capabilities enabled Captur to adjust capture rates in real time without requiring SDK redeployments—a critical advantage given their clients’ long release cycles of three to four weeks. “RUM is really flexible and quick to respond,” says Justin Powell, software engineer at Captur.
There are times when we roll out a big change to a customer and RUM allows us to isolate that particular SDK version and temporarily get 100 percent of everything for a day or two and monitor how it progresses, then reduce sampling once stability is confirmed, which has been ideal.
Justin Powell Software Engineer, Captur
Using Datadog Workflow Automation, the team has also built a workflow that derives custom metrics from RUM events by extracting crash context, including SDK version, OS, device, location, and service. The workflow then matches each crash to specific lines of code in the repository and prioritizes crashes by recent volume and impact.
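As a rough illustration of that triage step, the sketch below groups crash events by SDK version and code reference, then ranks the groups. It is written in plain Python rather than as a Datadog workflow, and everything in it is an assumption: the `CrashEvent` fields, the `code_ref` format, and the choice to measure "impact" as the number of distinct device models affected.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class CrashEvent:
    """Hypothetical crash context, loosely following the fields named in the text."""
    sdk_version: str
    os: str
    device: str
    location: str
    service: str
    code_ref: str  # hypothetical "file:line" taken from the top SDK stack frame


def prioritize(events):
    """Rank (sdk_version, code_ref) groups by crash volume, then by distinct devices."""
    volume = Counter()
    devices = defaultdict(set)
    for event in events:
        key = (event.sdk_version, event.code_ref)
        volume[key] += 1
        devices[key].add(event.device)
    # Highest volume first; ties broken by how many device models are affected.
    return sorted(volume, key=lambda k: (volume[k], len(devices[k])), reverse=True)


if __name__ == "__main__":
    events = [
        CrashEvent("2.1.0", "Android 14", "Pixel 8", "London", "verify", "Camera.kt:42"),
        CrashEvent("2.1.0", "Android 13", "Galaxy S23", "Austin", "verify", "Camera.kt:42"),
        CrashEvent("2.0.3", "iOS 17", "iPhone 15", "Paris", "upload", "Uploader.swift:10"),
    ]
    for sdk_version, code_ref in prioritize(events):
        print(sdk_version, code_ref)
```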
Achieving proactive crash detection and cost-effective scaling
Today, Captur uses Datadog RUM daily to maintain full crash visibility from the SDK side. This has transformed their approach from reactive troubleshooting to proactive issue detection. Instead of hearing about crashes from clients after they occur, the team can now isolate and filter out sessions containing errors not caused by their SDK. They can also dynamically adjust sampling and retention settings to focus only on relevant crash data. “Before, clients would tell us we had a crash,” says Sarma.
That's gone 180 degrees now to the point where we reach out to clients proactively to let them know we've noticed a problem and this is how we're going to fix it. That gives them the confidence to increase traffic.
Sumanas Sarma Chief Technology Officer, Captur
Captur has reduced irrelevant session ingestion by over 75 percent, dropping from four million sessions per month to under one million. This has drastically improved prioritization and incident response capabilities for its small engineering team. “Before we added Datadog RUM, we had to devote a lot of time to investigating crashes,” says Sarma.
Now, instead of spending half a day trying to figure out what caused the crash, we can look at it in Datadog and determine what caused it in seconds.
Sumanas Sarma Chief Technology Officer, Captur
RUM has also proven valuable for addressing the challenges related to managing clients’ long deployment cycles. The team can now change sampling rates instantly via the UI to focus on traffic spikes or suppress noise. They run 100 percent sampling for crash sessions and low sampling for normal sessions, enabling precise monitoring without added cost. This allowed them to scale from 60,000 sessions per day to over 850,000 sessions per day with only a 2x increase in cost. Meanwhile, Workflow Automation has accelerated support capabilities by posting crash incident summaries to Slack in real time and tying sessions directly to client reports and SDK activities.
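To put the scaling figures in perspective, the short calculation below works out what they imply per session. It uses only the numbers quoted above and assumes the "2x" cost figure is exact; because the text says "over 850,000", the results are conservative estimates. Under those assumptions, volume grew roughly 14x and the cost per session fell by about 86 percent.

```python
# Figures quoted in the case study.
sessions_before = 60_000
sessions_after = 850_000  # "over 850,000", so this is a lower bound
cost_multiplier = 2.0     # assumed exact for the purpose of this estimate

# How much session volume grew.
volume_growth = sessions_after / sessions_before

# Cost per session after scaling, relative to before (1.0 means unchanged).
per_session_cost_ratio = cost_multiplier / volume_growth

# Fractional reduction in cost per session.
per_session_saving = 1 - per_session_cost_ratio

print(f"Volume growth: {volume_growth:.1f}x")
print(f"Relative cost per session: {per_session_cost_ratio:.2f}")
print(f"Per-session cost reduction: {per_session_saving:.0%}")
```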
Ultimately, Datadog serves as a scaling enabler for Captur, allowing them to grow session volume and expand their client base while maintaining tight control over errors and costs.
Everything that Datadog has done so far has enabled us to scale far more easily.
Imad Shatali Senior Platform Engineer, Captur
“We’re excited about applying the same functionality across a much larger number of clients.”