Top 4 Observability Risks

Discarding or sampling data is like throwing out 500 pieces of a 1,000-piece puzzle. You can make out a rough image, but you can't see the full picture. And you need that full picture to truly take advantage of observability data: to predict potential incidents before they happen and to ensure your applications and infrastructure are operating as they should. Otherwise, you risk dangerous consequences.
1. Faulty AI Models
To predict when end users will experience poor performance, especially with streaming video and web applications, companies may use AI for anomaly detection. However, to train those models, they need data. The models need historical data to learn what is normal and what is abnormal, and to be fine-tuned so they can detect performance issues based on past behavioral indicators. Companies need to predict when a performance issue will occur so they can reroute traffic, for example, to a different content delivery network (CDN). Without data that shows the warning signs, they can't make those predictions.
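As a loose illustration of why that history matters, here is a minimal sketch of anomaly detection on historical latency samples using a rolling z-score; the data shape, metric, window and threshold are all hypothetical assumptions, not any particular vendor's model.

```python
# Minimal sketch: flag abnormal CDN latency against a learned historical baseline.
# The data shape, window size and threshold are illustrative assumptions.
from statistics import mean, stdev

def detect_latency_anomalies(samples, window=60, threshold=3.0):
    """Return indexes where latency deviates sharply from its recent baseline."""
    anomalies = []
    for i in range(window, len(samples)):
        baseline = samples[i - window:i]
        mu, sigma = mean(baseline), stdev(baseline)
        if sigma > 0 and abs(samples[i] - mu) / sigma > threshold:
            anomalies.append(i)  # a candidate moment to reroute traffic to another CDN
    return anomalies

# The spike is only detectable because the preceding history was retained.
history = [40.0 + (i % 5) for i in range(120)] + [180.0]  # hypothetical latency in ms
print(detect_latency_anomalies(history))  # -> [120]
```

If the baseline samples had been discarded, the same logic would have nothing to compare the spike against.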
2. Unresolved Cyberthreats
To detect and shut down low-and-slow attacks, like advanced persistent threats (APTs), investigators must look back many months or even years to understand what happened, what the root cause was, who was affected, where it started and so on.
But when data is discarded, the chances grow that some of those details are simply gone. It's like throwing out the needle with the haystack: important information goes down the drain, leaving gaps in the attack chain that can't be filled. That means investigators may not find and shut down the root cause, increasing the risk of the attack persisting, or it may take them longer to connect the dots, leaving an open window for more attacks.
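To make that lookback concrete, here is a minimal sketch of tracing a single indicator of compromise, a suspicious IP address, across months of retained access logs; the file layout, field names and indicator are hypothetical assumptions.

```python
# Minimal sketch: trace an indicator of compromise across long-retention logs.
# Log layout, field names and the indicator itself are hypothetical assumptions.
import csv
import glob

SUSPICIOUS_IP = "203.0.113.42"  # example indicator of compromise (documentation range)

def trace_indicator(log_glob="access_logs/2023-*.csv"):
    """Return (timestamp, user, action) rows touched by the suspicious IP."""
    hits = []
    for path in sorted(glob.glob(log_glob)):  # one file per retained month
        with open(path, newline="") as f:
            for row in csv.DictReader(f):
                if row.get("client_ip") == SUSPICIOUS_IP:
                    hits.append((row["timestamp"], row["user"], row["action"]))
    return hits

# Every month of logs that was sampled away is a hole in this timeline.
for ts, user, action in trace_indicator():
    print(ts, user, action)
```

Any month that was discarded shows up here as a silent gap, which is exactly the kind of hole that keeps a root cause hidden.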
3. Compliance Issues
Compliance regulations include mandates to store log data for security, auditing and legal purposes. For example, the Sarbanes-Oxley Act (SOX) requires companies to maintain detailed logs for auditing and financial reporting. The Gramm-Leach-Bliley Act (GLBA) requires financial institutions to secure customer data, which involves log storage to monitor access and changes. The Payment Card Industry Data Security Standard (PCI DSS) requires storing logs for at least one year. The Health Insurance Portability and Accountability Act (HIPAA) requires health-care organizations to log and monitor access to electronic protected health information.
If companies do not meet the data retention requirements, they could face steep fines, among other penalties.
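As a rough sketch of how teams check themselves against such mandates, the snippet below compares the age of the oldest retained log against a retention floor; the one-year floor reflects the PCI DSS requirement above, while the directory layout and the use of file modification times are illustrative assumptions, not a compliance tool.

```python
# Minimal sketch: verify that retained logs cover a required retention window.
# The one-year floor comes from PCI DSS; paths and structure are illustrative.
from datetime import datetime, timedelta, timezone
from pathlib import Path

RETENTION_FLOORS = {"pci_dss": timedelta(days=365)}  # at least one year of logs

def oldest_log_age(log_dir):
    """Age of the oldest retained log file, based on modification time."""
    mtimes = [p.stat().st_mtime for p in Path(log_dir).glob("*.log")]
    if not mtimes:
        return timedelta(0)
    oldest = datetime.fromtimestamp(min(mtimes), tz=timezone.utc)
    return datetime.now(timezone.utc) - oldest

def check_retention(log_dir="payment_logs"):
    coverage = oldest_log_age(log_dir)
    for rule, floor in RETENTION_FLOORS.items():
        status = "OK" if coverage >= floor else "GAP"
        print(f"{rule}: need {floor.days} days, have {coverage.days} days -> {status}")

check_retention()
```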
4. Resource Inefficiencies
What if you are using 1,000 servers but only need 750? How would you know? By analyzing log data from your cloud services, you can see where you need to scale resources up or down and how well those servers are performing. Without that visibility, you may be running services on overprovisioned infrastructure.
Or, you may miss an issue with a service, such as a bug that is causing timeouts and repeated retries, resulting in excessive compute charges. You need access to log data to understand why failures are occurring and know where you may need more or fewer resources. In other words, you can’t spot the problem child without watching all the children.
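As a rough sketch of that kind of analysis, the snippet below flags servers whose logged CPU utilization suggests overprovisioning and surfaces services with unusually high retry counts; the record shapes and thresholds are hypothetical assumptions.

```python
# Minimal sketch: spot overprovisioned servers and retry-heavy services from logs.
# Record shapes and thresholds are hypothetical assumptions for illustration.
from collections import Counter

def underutilized_servers(utilization_records, threshold=20.0):
    """Servers whose average logged CPU utilization sits below the threshold (%)."""
    totals, counts = Counter(), Counter()
    for record in utilization_records:  # e.g. {"server": "web-042", "cpu_pct": 7.5}
        totals[record["server"]] += record["cpu_pct"]
        counts[record["server"]] += 1
    return [s for s in totals if totals[s] / counts[s] < threshold]

def retry_hotspots(request_logs, max_retries=100):
    """Services whose logged retry events exceed max_retries, hinting at a bug."""
    retries = Counter(log["service"] for log in request_logs if log.get("event") == "retry")
    return [svc for svc, n in retries.items() if n > max_retries]

# Example: one server averages 8.25% CPU, a likely candidate for scaling down.
print(underutilized_servers([
    {"server": "web-042", "cpu_pct": 7.5},
    {"server": "web-042", "cpu_pct": 9.0},
    {"server": "api-001", "cpu_pct": 85.0},
]))  # -> ['web-042']
```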
Next Steps
Avoid these consequences by keeping all your data (without breaking your budget).
Ingesting, retaining and analyzing all of your data is key to maintaining a healthy, functional and secure infrastructure. Hydrolix for AWS, a managed observability service that recently launched on the AWS Marketplace and is part of the AWS CloudFront Service Ready program, makes that doable.