OpenSearch Operator

Step-by-step instructions: how to deploy the OpenSearch Operator in a Kubernetes cluster (EKS)

Prerequisites

Install Helm, Helmfile, and kubectl

brew install helm
brew install helmfile 
brew install kubectl

Add Helm Plugins

helm plugin install https://github.com/databus23/helm-diff
helm plugin install https://github.com/hypnoglow/helm-s3.git

Add Helm Repositories (S3 Buckets)

### LAB ###
helm s3 init s3://devopscorner-helm-chart/lab
AWS_REGION=ap-southeast-1 helm repo add devopscorner-lab s3://devopscorner-helm-chart/lab

### STAGING ###
helm s3 init s3://devopscorner-helm-chart/staging
AWS_REGION=ap-southeast-1 helm repo add devopscorner-staging s3://devopscorner-helm-chart/staging

### PRODUCTION ###
helm s3 init s3://devopscorner-helm-chart/prod
AWS_REGION=ap-southeast-1 helm repo add devopscorner s3://devopscorner-helm-chart/prod
helm repo update

Update Repository  

helm repo add stable https://charts.helm.sh/stable
helm repo add opensearch-operator https://opster.github.io/opensearch-k8s-operator/
helm repo update
helm repo list 

NAME                URL
opensearch-operator https://opster.github.io/opensearch-k8s-operator/
stable              https://charts.helm.sh/stable

Create Namespace

kubectl create namespace observability

Install OpenSearch Operator

helm install opsearch opensearch-operator/opensearch-operator --create-namespace -n observability

NAME: opsearch
LAST DEPLOYED: Sat Nov  4 09:08:25 2023
NAMESPACE: observability
STATUS: deployed
REVISION: 1
TEST SUITE: None

Install the OpenSearch Cluster

Start from the default opensearch-cluster.yaml below and adjust it for your environment:

apiVersion: opensearch.opster.io/v1
kind: OpenSearchCluster
metadata:
  name: my-cluster
  namespace: default
spec:
  general:
    version: "1.3.0"
    httpPort: 9200
    vendor: opensearch
    serviceName: my-cluster
    monitoring:
      enable: true
    pluginsList: ["repository-s3"]
  dashboards:
    version: "1.3.0"
    enable: true
    replicas: 2
    resources:
      requests:
        memory: "1Gi"
        cpu: "500m"
      limits:
        memory: "1Gi"
        cpu: "500m"
  confMgmt:
    smartScaler: true
  nodePools:
    - component: masters
      replicas: 3
      diskSize: "30Gi"
      nodeSelector:
      resources:
        requests:
          memory: "2Gi"
          cpu: "500m"
        limits:
          memory: "2Gi"
          cpu: "500m"
      roles:
        - "master"
        - "data"
    - component: nodes
      replicas: 3
      diskSize: "30Gi"
      nodeSelector:
      resources:
        requests:
          memory: "2Gi"
          cpu: "500m"
        limits:
          memory: "2Gi"
          cpu: "500m"
      roles:
        - "data"
    - component: coordinators
      replicas: 3
      diskSize: "30Gi"
      nodeSelector:
      resources:
        requests:
          memory: "2Gi"
          cpu: "500m"
        limits:
          memory: "2Gi"
          cpu: "500m"
      roles:
        - "ingest"

LAB Configuration (Simple Cluster) 

apiVersion: opensearch.opster.io/v1
kind: OpenSearchCluster
metadata:
  name: opsearch
  namespace: observability
spec:
  general:
    version: "1.3.0"
    httpPort: 9200
    vendor: opensearch
    serviceName: opsearch
    monitoring:
      enable: true
    pluginsList: ["repository-s3"]
  dashboards:
    version: "1.3.0"
    enable: true
    replicas: 1
    nodeSelector:
      node: devopscorner-monitoring
    resources:
      requests:
        memory: "200Mi"
        cpu: "500m"
      limits:
        memory: "512Mi"
        cpu: "500m"
  confMgmt:
    smartScaler: true
  nodePools:
    - component: masters
      replicas: 2
      diskSize: "10Gi"
      nodeSelector:
        node: devopscorner-monitoring
      resources:
        requests:
          memory: "2Gi"
          cpu: "500m"
        limits:
          memory: "2Gi"
          cpu: "500m"
      roles:
        - "master"
        - "data"

LAB Configuration (HA Cluster) 

apiVersion: opensearch.opster.io/v1
kind: OpenSearchCluster
metadata:
  name: opsearch
  namespace: observability
spec:
  general:
    version: "1.3.0"
    httpPort: 9200
    vendor: opensearch
    serviceName: opsearch
    monitoring:
      enable: true
    pluginsList: ["repository-s3"]
  dashboards:
    version: "1.3.0"
    enable: true
    replicas: 1
    nodeSelector:
      node: devopscorner-monitoring
    resources:
      requests:
        memory: "200Mi"
        cpu: "500m"
      limits:
        memory: "512Mi"
        cpu: "500m"
  confMgmt:
    smartScaler: true
  nodePools:
    - component: masters
      replicas: 2
      diskSize: "30Gi"
      nodeSelector:
        node: devopscorner-monitoring
      resources:
        requests:
          memory: "2Gi"
          cpu: "500m"
        limits:
          memory: "2Gi"
          cpu: "500m"
      roles:
        - "master"
        - "data"
    - component: nodes
      replicas: 2
      diskSize: "30Gi"
      nodeSelector:
        node: devopscorner-monitoring
      resources:
        requests:
          memory: "2Gi"
          cpu: "500m"
        limits:
          memory: "2Gi"
          cpu: "500m"
      roles:
        - "data"
    - component: coordinators
      replicas: 2
      diskSize: "30Gi"
      nodeSelector:
        node: devopscorner-monitoring
      resources:
        requests:
          memory: "2Gi"
          cpu: "500m"
        limits:
          memory: "2Gi"
          cpu: "500m"
      roles:
        - "ingest"
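Both LAB manifests declare two replicas in the masters pool. As a rough check, and assuming the cluster manager is elected by a simple majority of master-eligible nodes, the arithmetic below sketches how many of those nodes can fail before no majority is left. The function names are hypothetical helpers, not part of the operator.

```python
def quorum(master_eligible: int) -> int:
    """Smallest majority of the master-eligible nodes."""
    return master_eligible // 2 + 1


def tolerated_failures(master_eligible: int) -> int:
    """Master-eligible nodes that can fail while a majority remains."""
    return master_eligible - quorum(master_eligible)


if __name__ == "__main__":
    # Compare the LAB value (2) with the production value (3).
    for replicas in (2, 3):
        print(replicas, quorum(replicas), tolerated_failures(replicas))
```

Under that assumption, two master replicas leave no room for a failure, while three tolerate the loss of one, which matches the production manifest.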

Production Configuration (HA Cluster) 

apiVersion: opensearch.opster.io/v1
kind: OpenSearchCluster
metadata:
  name: opsearch
  namespace: observability
spec:
  general:
    version: "1.3.0"
    httpPort: 9200
    vendor: opensearch
    serviceName: opsearch
    monitoring:
      enable: true
    pluginsList: ["repository-s3"]
  dashboards:
    version: "1.3.0"
    enable: true
    replicas: 2
    nodeSelector:
      node: devopscorner-monitoring
    resources:
      requests:
        memory: "200Mi"
        cpu: "500m"
      limits:
        memory: "1Gi"
        cpu: "500m"
  confMgmt:
    smartScaler: true
  nodePools:
    - component: masters
      replicas: 3
      diskSize: "30Gi"
      nodeSelector:
        node: devopscorner-monitoring
      resources:
        requests:
          memory: "2Gi"
          cpu: "500m"
        limits:
          memory: "2Gi"
          cpu: "500m"
      roles:
        - "master"
        - "data"
    - component: nodes
      replicas: 3
      diskSize: "30Gi"
      nodeSelector:
        node: devopscorner-monitoring
      resources:
        requests:
          memory: "2Gi"
          cpu: "500m"
        limits:
          memory: "2Gi"
          cpu: "500m"
      roles:
        - "data"
    - component: coordinators
      replicas: 3
      diskSize: "30Gi"
      nodeSelector:
        node: devopscorner-monitoring
      resources:
        requests:
          memory: "2Gi"
          cpu: "500m"
        limits:
          memory: "2Gi"
          cpu: "500m"
      roles:
        - "ingest"
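Because every pool in the production manifest is pinned to nodes labelled node: devopscorner-monitoring, it can help to total the requests before applying it. The sketch below is a minimal calculation using the replica counts and requests copied from the manifest above; the helper names and the tuple layout are assumptions made for illustration.

```python
def to_mebibytes(quantity: str) -> int:
    """Convert a quantity such as "2Gi" or "200Mi" to Mi."""
    if quantity.endswith("Gi"):
        return int(quantity[:-2]) * 1024
    if quantity.endswith("Mi"):
        return int(quantity[:-2])
    raise ValueError(f"unsupported quantity: {quantity}")


def to_millicores(quantity: str) -> int:
    """Convert a CPU quantity such as "500m" or "2" to millicores."""
    if quantity.endswith("m"):
        return int(quantity[:-1])
    return int(quantity) * 1000


# (replicas, cpu request, memory request, disk size) per workload.
WORKLOADS = {
    "masters": (3, "500m", "2Gi", "30Gi"),
    "nodes": (3, "500m", "2Gi", "30Gi"),
    "coordinators": (3, "500m", "2Gi", "30Gi"),
    "dashboards": (2, "500m", "200Mi", None),  # dashboards declare no disk
}


def totals(workloads):
    """Return total (millicores, memory in Mi, disk in Mi) requested."""
    cpu = sum(n * to_millicores(c) for n, c, _, _ in workloads.values())
    memory = sum(n * to_mebibytes(m) for n, _, m, _ in workloads.values())
    disk = sum(n * to_mebibytes(d) for n, _, _, d in workloads.values() if d)
    return cpu, memory, disk


if __name__ == "__main__":
    cpu, memory, disk = totals(WORKLOADS)
    print(f"cpu={cpu}m memory={memory}Mi disk={disk // 1024}Gi")
```

The labelled nodes need at least this much allocatable CPU and memory in total, plus whatever the operator and other workloads request.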

Apply manifest 

kubectl create -f opensearch-cluster.yaml -n observability

Access via Port Forward

kubectl get po -n observability 


NAME READY STATUS RESTARTS AGE
alertmanager-prometheus-kube-prometheus-alertmanager-0 2/2 Running 0 40m
opsearch-coordinators-0 1/1 Running 0 9m57s
opsearch-coordinators-1 1/1 Running 0 10m
opsearch-dashboards-7fcc5595c7-fhf28 1/1 Running 0 22m
opsearch-masters-0 1/1 Running 0 10m
opsearch-masters-1 1/1 Running 0 13m
opsearch-nodes-0 1/1 Running 0 9m59s
opsearch-nodes-1 1/1 Running 0 13m
opsearch-opensearch-operator-controller-manager-7cc6dd6fd8qx5xd 2/2 Running 0 37m

kubectl get po opsearch-dashboards-7fcc5595c7-fhf28 -n observability

NAME READY STATUS RESTARTS AGE
opsearch-dashboards-7fcc5595c7-fhf28 1/1 Running 0 23m

kubectl describe po opsearch-dashboards-7fcc5595c7-fhf28 -n observability

...
Containers:
dashboards:

Container ID: containerd://02966406e8e5d2c9cef7c1e139b74887537386111374e29cf5c50ab3cbda19ae
Image: docker.io/opensearchproject/opensearch-dashboards:1.3.0
Image ID: docker.io/opensearchproject/opensearch-dashboards@sha256:7dcc706ab6c71ab00013e341246e7a701c11c61a7668e4dbecd298d6d7aef758
Port: 5601/TCP
Host Port: 0/TCP
...

kubectl port-forward opsearch-dashboards-7fcc5595c7-fhf28 5601:5601 -n observability

Forwarding from 127.0.0.1:5601 -> 5601
Forwarding from [::1]:5601 -> 5601

UserName: admin
Password: admin

Set Up Index Pattern

Go to Stack Management

Create Index Pattern

Select Index Pattern

Set the Timestamp Field for Sorting the Index Pattern

Discover Logs

Using Network LoadBalancer (NLB) via NGINX Controller

Get Deployment Manifest

wget https://raw.githubusercontent.com/kubernetes/ingress-nginx/main/deploy/static/provider/aws/deploy.yaml

Change Manifest

sed  -i 's/externalTrafficPolicy: Local/externalTrafficPolicy: Cluster/g' deploy.yaml

Deploy Manifest

kubectl create -f deploy.yaml

Expose LB for OpenSearch Dashboard

kubectl expose deployment opsearch-dashboards --name=opensearch-lb --type=LoadBalancer --port=80 --target-port=5601 --protocol=TCP --namespace=observability

Helm Release Version

helm list --all-namespaces
helm list --namespace=observability

Prometheus Operator

Step-by-step instructions: how to deploy the Prometheus Operator in a Kubernetes cluster (EKS)

Prerequisites

Install Helm, Helmfile, and kubectl

brew install helm
brew install helmfile 
brew install kubectl

Add Helm Plugins

helm plugin install https://github.com/databus23/helm-diff
helm plugin install https://github.com/hypnoglow/helm-s3.git

Add Helm Repositories (S3 Buckets)

### LAB ###
helm s3 init s3://devopscorner-helm-chart/lab
AWS_REGION=ap-southeast-1 helm repo add devopscorner-lab s3://devopscorner-helm-chart/lab

### STAGING ###
helm s3 init s3://devopscorner-helm-chart/staging
AWS_REGION=ap-southeast-1 helm repo add devopscorner-staging s3://devopscorner-helm-chart/staging

### PRODUCTION ###
helm s3 init s3://devopscorner-helm-chart/prod
AWS_REGION=ap-southeast-1 helm repo add devopscorner s3://devopscorner-helm-chart/prod
helm repo update

Update Repository  

helm repo add stable https://charts.helm.sh/stable
helm repo add grafana https://grafana.github.io/helm-charts
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
helm repo list 

NAME                URL
prometheus-community https://prometheus-community.github.io/helm-charts
grafana             https://grafana.github.io/helm-charts
stable              https://charts.helm.sh/stable

Create Namespace

kubectl create namespace observability

Install Prometheus Operator

helm install prometheus-operator prometheus-community/kube-prometheus-stack --create-namespace -n observability

NAME: prometheus-operator
LAST DEPLOYED: Sun Nov  5 02:03:06 2023
NAMESPACE: observability
STATUS: deployed
REVISION: 1
NOTES:
kube-prometheus-stack has been installed. Check its status by running:
kubectl --namespace observability get pods -l "release=prometheus-operator"

kubectl get po -n observability

NAME                                                     READY   STATUS    RESTARTS   AGE
alertmanager-prometheus-kube-prometheus-alertmanager-0   2/2     Running   0          32m
prometheus-grafana-55fb596bf5-5257r                      3/3     Running   0          32m
prometheus-kube-prometheus-operator-757f8788d4-v6tk5     1/1     Running   0          32m
prometheus-kube-state-metrics-898dd9b88-98qlj            1/1     Running   0          32m
prometheus-prometheus-kube-prometheus-prometheus-0       2/2     Running   0          32m
prometheus-prometheus-node-exporter-llfn2                1/1     Running   0          32m
prometheus-prometheus-node-exporter-nrpkq                1/1     Running   0          32m

kubectl --namespace observability get pods -l "release=prometheus-operator"

NAME                                                      READY   STATUS    RESTARTS   AGE
prometheus-operator-kube-p-operator-7cc49d6ffb-ktjnv      1/1     Running   0          2m16s
prometheus-operator-kube-state-metrics-797d9866bd-s4xhb   1/1     Running   0          2m16s
prometheus-operator-prometheus-node-exporter-vg42d        1/1     Running   0          2m16s

kubectl get svc -n observability

NAME                                           TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)                      AGE
alertmanager-operated                          ClusterIP   None             <none>        9093/TCP,9094/TCP,9094/UDP   11m
prometheus-operated                            ClusterIP   None             <none>        9090/TCP                     11m
prometheus-operator-grafana                    ClusterIP   172.20.33.136    <none>        80/TCP                       11m
prometheus-operator-kube-p-alertmanager        ClusterIP   172.20.136.150   <none>        9093/TCP,8080/TCP            11m
prometheus-operator-kube-p-operator            ClusterIP   172.20.219.78    <none>        443/TCP                      11m
prometheus-operator-kube-p-prometheus          ClusterIP   172.20.195.49    <none>        9090/TCP,8080/TCP            11m
prometheus-operator-kube-state-metrics         ClusterIP   172.20.214.227   <none>        8080/TCP                     11m
prometheus-operator-prometheus-node-exporter   ClusterIP   172.20.230.46    <none>        9100/TCP                     11m

Edit Prometheus Service

Change the service type from ClusterIP to LoadBalancer:

kubectl edit svc prometheus-operator-kube-p-prometheus -n observability

Edit Grafana Service

Change the service type from ClusterIP to LoadBalancer:

kubectl edit svc prometheus-operator-grafana -n observability
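For a repeatable, non-interactive alternative to editing each service by hand, the same change can be expressed with kubectl patch. The sketch below only builds the command lines; the service names are taken from the kubectl get svc listing above, and the helper function is a hypothetical convenience, not part of any chart.

```python
import json
import shlex


def loadbalancer_patch_command(service: str, namespace: str) -> list:
    """Build a kubectl patch command that switches a Service to LoadBalancer."""
    patch = json.dumps({"spec": {"type": "LoadBalancer"}})
    return ["kubectl", "patch", "svc", service, "-n", namespace, "-p", patch]


if __name__ == "__main__":
    services = (
        "prometheus-operator-kube-p-prometheus",
        "prometheus-operator-grafana",
    )
    for name in services:
        # Print a shell-safe command line for review before running it.
        print(shlex.join(loadbalancer_patch_command(name, "observability")))
```

Printing the commands first makes it easy to review them or keep them in a runbook before applying anything to the cluster.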

Access Load Balancer

Grafana Access

http://a7754204d8c2e41969dfa8134d4a3d78-2128039805.ap-southeast-1.elb.amazonaws.com

UserName: admin
Password: prom-operator

Change Grafana Credentials

kubectl get secret -n observability

NAME                                                                TYPE                 DATA   AGE
alertmanager-prometheus-operator-kube-p-alertmanager                Opaque               1      40h
alertmanager-prometheus-operator-kube-p-alertmanager-generated      Opaque               1      40h
alertmanager-prometheus-operator-kube-p-alertmanager-tls-assets-0   Opaque               0      40h
alertmanager-prometheus-operator-kube-p-alertmanager-web-config     Opaque               1      40h
prometheus-kube-prometheus-admission                                Opaque               3      4d12h
prometheus-operator-grafana                                         Opaque               3      40h
prometheus-operator-kube-p-admission                                Opaque               3      40h
prometheus-operator-kube-p-prometheus                               Opaque               0      40h
prometheus-prometheus-operator-kube-p-prometheus                    Opaque               1      40h
prometheus-prometheus-operator-kube-p-prometheus-tls-assets-0       Opaque               1      40h
prometheus-prometheus-operator-kube-p-prometheus-web-config         Opaque               1      40h

Change Base64 Credentials

  • Encode the new user name (admin-user)

printf '%s' "devopscorner-admin" | base64
ZGV2b3BzY29ybmVyLWFkbWlu

  • Encode the new password (admin-password)

printf '%s' "devopscorner-secret" | base64
ZGV2b3BzY29ybmVyLXNlY3JldA==

  • Replace the admin-user and admin-password values with the Base64-encoded strings

kubectl edit secret prometheus-operator-grafana -n observability
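Before pasting a value into the secret, it is worth decoding it once to confirm that it contains exactly the intended text. A common pitfall is that typographic quotes or the trailing newline added by echo end up inside the encoded value. The sketch below uses only the Python standard library; the function names are illustrative.

```python
import base64


def encode_secret_value(value: str) -> str:
    """Base64-encode a value exactly as given, with no trailing newline."""
    return base64.b64encode(value.encode("utf-8")).decode("ascii")


def decode_secret_value(encoded: str) -> str:
    """Decode a Base64 string back to text for verification."""
    return base64.b64decode(encoded).decode("utf-8")


if __name__ == "__main__":
    print(encode_secret_value("devopscorner-admin"))
    # A value encoded together with curly quotes and a newline decodes to
    # more than the intended user name:
    print(repr(decode_secret_value("4oCcZGV2b3BzY29ybmVyLWFkbWlu4oCdCg==")))
```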

Prometheus Access

http://af58dac1154de4b57be3d0d63d60936b-925795512.ap-southeast-1.elb.amazonaws.com:9090

Using a Jump Pod

References:
https://github.com/devopscorner/devopscorner-helm/tree/master/helmfile/jumppod

Test with Curl (Inside the Jump Pod)

curl prometheus-operator-kube-p-prometheus.observability.svc.cluster.local:9090

curl prometheus-operator-grafana.observability.svc.cluster.local

Using Network LoadBalancer (NLB) via NGINX Controller

Get Deployment Manifest

wget https://raw.githubusercontent.com/kubernetes/ingress-nginx/main/deploy/static/provider/aws/deploy.yaml

Change Manifest

sed  -i 's/externalTrafficPolicy: Local/externalTrafficPolicy: Cluster/g' deploy.yaml

Deploy Manifest

kubectl create -f deploy.yaml

Expose LB for Grafana Dashboard

kubectl expose deployment prometheus-operator-grafana --name=grafana-lb --type=LoadBalancer --port=80 --target-port=3000 --protocol=TCP --namespace=observability

Helm Release Version

helm list --all-namespaces
helm list --namespace=observability

Infrastructure Kubernetes (EKS) Cost Monitoring & Optimization

Cost monitoring and optimization is an integral part of DevOps culture and a key element of managing infrastructure usage in today’s cloud computing era. This event covers strategies for monitoring and optimizing the cost of Kubernetes (EKS) infrastructure on AWS.

The session covers cost estimation at provisioning time, autoscaling, scheduled downscaling, and alerting on cost usage against budget limits.

Don’t miss ZX Talk – Infrastructure Kubernetes (EKS) Cost Monitoring & Optimization which will be held on:

Date: Thursday, 23 June 2022
Time: 14:00 – 15:30 (2:00 – 3:30 pm) Jakarta time
Place: Virtual Meet

Registration Links:
https://bit.ly/3Qy7Ejx

[RFC] Rate Limits API Requests

This document covers the rate limits of the AWS and Datadog APIs.

A. AWS Throttle API Requests

References:
https://docs.aws.amazon.com/apigateway/latest/developerguide/api-gateway-request-throttling.html

Account-level throttling per Region

By default, API Gateway limits the steady-state requests per second (rps) across all APIs within an AWS account, per Region. It also limits the burst (that is, the maximum bucket size) across all APIs within an AWS account, per Region. In API Gateway, the burst limit corresponds to the maximum number of concurrent request submissions that API Gateway can fulfill at any moment without returning 429 Too Many Requests error responses. For more information on throttling quotas, see Amazon API Gateway quotas and important notes.

To help understand these throttling limits, here are a few examples, given a burst limit of 5,000 and an account-level rate limit of 10,000 requests per second in the Region:

  • If a caller submits 10,000 requests in a one-second period evenly (for example, 10 requests every millisecond), API Gateway processes all requests without dropping any.
  • If the caller sends 10,000 requests in the first millisecond, API Gateway serves 5,000 of those requests and throttles the rest in the one-second period.
  • If the caller submits 5,000 requests in the first millisecond and then evenly spreads another 5,000 requests through the remaining 999 milliseconds (for example, about 5 requests every millisecond), API Gateway processes all 10,000 requests in the one-second period without returning 429 Too Many Requests error responses.
  • If the caller submits 5,000 requests in the first millisecond and waits until the 101st millisecond to submit another 5,000 requests, API Gateway processes 6,000 requests and throttles the rest in the one-second period. This is because at the rate of 10,000 rps, API Gateway has served 1,000 requests after the first 100 milliseconds and thus emptied the bucket by the same amount. Of the next spike of 5,000 requests, 1,000 fill the bucket and are queued to be processed. The other 4,000 exceed the bucket capacity and are discarded.
  • If the caller submits 5,000 requests in the first millisecond, submits 1,000 requests at the 101st millisecond, and then evenly spreads another 4,000 requests through the remaining 899 milliseconds, API Gateway processes all 10,000 requests in the one-second period without throttling.
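The examples above can be reproduced with a small token bucket model. This is a minimal sketch, assuming a bucket that starts full, refills continuously at the account-level rate, and is checked on an integer millisecond clock; the class and function names are hypothetical, and this is not API Gateway's actual implementation.

```python
class TokenBucket:
    """Token bucket with a steady refill rate and a maximum burst size."""

    def __init__(self, rate_per_second: int, burst: int):
        self.tokens_per_ms = rate_per_second / 1000
        self.capacity = burst
        self.tokens = float(burst)  # assumption: the bucket starts full
        self.last_ms = 0

    def submit(self, now_ms: int, requests: int) -> int:
        """Return how many of the requests arriving at now_ms are accepted."""
        elapsed = now_ms - self.last_ms
        refill = elapsed * self.tokens_per_ms
        self.tokens = min(self.capacity, self.tokens + refill)
        self.last_ms = now_ms
        accepted = min(requests, int(self.tokens))
        self.tokens -= accepted
        return accepted


def accepted_total(schedule, rate_per_second=10_000, burst=5_000):
    """Sum accepted requests for a list of (millisecond, request count)."""
    bucket = TokenBucket(rate_per_second, burst)
    return sum(bucket.submit(ms, count) for ms, count in schedule)


if __name__ == "__main__":
    # 5,000 requests at the start, 5,000 more at the 101st millisecond.
    print(accepted_total([(0, 5_000), (100, 5_000)]))
```

With the burst limit of 5,000 and the rate of 10,000 rps used in the examples, the second spike finds only the 1,000 tokens refilled during the first 100 milliseconds, so the rest of that spike is throttled.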

B. Datadog API

References:
https://docs.datadoghq.com/api/latest/rate-limits/

Rate Limits

All of the API endpoints are rate limited. Once you exceed a certain number of requests in a specific period, Datadog returns an error.

If you are rate limited, the response carries a 429 status code. Datadog recommends either waiting the number of seconds given by X-RateLimit-Reset before making calls again, or making calls at a rate slightly below X-RateLimit-Limit / X-RateLimit-Period.

Rate limits can be increased from the defaults by contacting the Datadog support team.

Regarding the API rate limit policy:

  • Datadog does not rate limit data point/metric submission (see the metrics section for more information on how the metric submission rate is handled). The limits you encounter depend on the quantity of custom metrics in your agreement.
  • The rate limit for metric retrieval is 100 per hour per organization.
  • The rate limit for event submission is 500,000 events per hour per organization.
  • The rate limit for event aggregation is 1000 per aggregate per day per organization. An aggregate is a group of similar events.
  • The rate limit for the Query a Timeseries API call is 1600 per hour per organization. This can be extended on demand.
  • The rate limit for the Log Query API call is 300 per hour per organization. This can be extended on demand.
  • The rate limit for the Graph a Snapshot API call is 60 per hour per organization. This can be extended on demand.
  • The rate limit for the Log Configuration API is 6000 per minute per organization. This can be extended on demand.

Rate Limit Headers      Description
X-RateLimit-Limit       number of requests allowed in a time period.
X-RateLimit-Period      length of time in seconds for resets (calendar aligned).
X-RateLimit-Remaining   number of allowed requests left in the current time period.
X-RateLimit-Reset       time in seconds until next reset.
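
A client can turn these headers into a simple backoff rule. The sketch below assumes the headers are available as a plain dictionary with exactly these names; real HTTP libraries usually expose them case-insensitively. The function names and the sample values are illustrative only.

```python
def seconds_until_retry(status_code: int, headers: dict) -> int:
    """Seconds to wait before retrying; 0 when the call was not limited."""
    if status_code != 429:
        return 0
    return int(headers.get("X-RateLimit-Reset", 0))


def min_interval_seconds(headers: dict) -> float:
    """Average spacing between calls that stays within the limit."""
    limit = int(headers["X-RateLimit-Limit"])
    period = int(headers["X-RateLimit-Period"])
    return period / limit


if __name__ == "__main__":
    # Hypothetical header values for an endpoint limited to 300 calls per hour.
    sample = {
        "X-RateLimit-Limit": "300",
        "X-RateLimit-Period": "3600",
        "X-RateLimit-Reset": "17",
    }
    print(seconds_until_retry(429, sample), min_interval_seconds(sample))
```

Spacing calls slightly further apart than min_interval_seconds keeps a steady caller under the limit without ever needing the retry path.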

[RFC] Postmortem Report

This is a sample postmortem report template for reviewing the chronology of an incident, documenting its mitigation, and recording how the problem was resolved.

Title

  • YYYY-MM-DD Issue Name.
    eg:
    2020-09-01 Failed to Replicate Database Slave in Node-2.

Issue Summary

  • Summary of the issue, describing the full chronology.
    eg:
    Replication failed on the slave database server in node-2. The issue started at 07:00 because the slave server could not connect to the DNS address of the master server. As a result, some microservices that point their read queries at the slave server were unable to connect to the database.

    List of microservices impacted:
    • Microservices 1: Auth
    • Microservices 2: OTP

Impact

  • List of microservices or other infrastructure resources impacted by this issue.
    eg:
    Impacted microservices:
    • Microservices 1: Auth
    • Microservices 2: OTP

    Impacted infra:
    • DNS slave

Trigger

  • List of events that triggered the issue.
    eg:
    • The cloud provider ran maintenance from 2020-09-01 02:00 to 2020-09-01 03:00 (GMT+7).
    • Some DNS records changed as a side effect of the maintenance.

Detection

  • List of how the issue was detected.
    eg:
    • Detected in metrics for failed replication (with screenshot)
    • Detected in logs for DNS changes (with screenshot)

Root Cause

  • List of root causes of the issue.
    eg:
    • The slave database server in node-2 could not run because it could not connect to the DNS address of the master server.
    • The DNS record of the master server had been pointed to a different address during the cloud provider maintenance.

Timeline

  • Timeline of the issue from the beginning until it was resolved.
    eg:
    2020-09-01 07:00 Metrics show failed replication on the slave database server in node-2
    2020-09-01 07:10 Alert raised at P3 escalation
    2020-09-01 07:12 On-call engineer acknowledged the issue
    2020-09-01 07:15 Manual replication of the slave server started
    2020-09-01 07:30 All replication restored
    2020-09-01 07:35 Replication monitoring phase (about 10-15 minutes)
    2020-09-01 08:00 Slave database server in node-2 back to normal operation
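
A timeline in this format also makes it easy to derive response durations for the report. The sketch below is a minimal calculation, assuming the "YYYY-MM-DD HH:MM" timestamps of the example above; the function name is hypothetical.

```python
from datetime import datetime

TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M"


def minutes_between(start: str, end: str) -> int:
    """Whole minutes elapsed between two timeline timestamps."""
    started = datetime.strptime(start, TIMESTAMP_FORMAT)
    ended = datetime.strptime(end, TIMESTAMP_FORMAT)
    return int((ended - started).total_seconds() // 60)


if __name__ == "__main__":
    # Time from the alert to the acknowledgement, and from the first
    # failing metric to normal operation.
    print(minutes_between("2020-09-01 07:10", "2020-09-01 07:12"))
    print(minutes_between("2020-09-01 07:00", "2020-09-01 08:00"))
```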

Resolution & Recovery

  • List of resolution & recovery actions
    eg:
    • Manually replicated the slave server
    • Repointed the DNS of slave node-2 to the new master DNS

Corrective and Preventive Measures

  • List of action items / procedures for correction & prevention (as mitigation)
    eg:
    • Update the alerting metric thresholds and raise the escalation level to P2.
    • Open a ticket with the cloud provider about the impact of the DNS change.

Financial Impact

Product Impacted   Start DateTime – End DateTime   Impact Type (Outage, Error Rates, Latency Spike)   Monitoring Links   Log Links

  • Detail of Financial Impact

Division / Team Name

List of divisions / teams impacted by this postmortem

Related documentation for this issue (JIRA / Confluence)

[RFC] Performance Testing K6

Monitoring Dashboard

  • Monitoring Dashboard URL

Logging

  • Logging Dashboard URL

Operations (Executors)

PIC Name                 Department
DevOps Engineer – 1      DevOps
DevOps Engineer – 2      DevOps
QA Engineer – 1          QA
QA Engineer – 2          QA
Software Engineer – 1    Engineering
Software Engineer – 2    Engineering

Supervisors

Supervisor Name       Department    Remark
@zeroc0d3-devops      DevOps
@zeroc0d3-engineer    Engineering
@zeroc0d3-iot         IoT
@zeroc0d3-data        Data

HelmChart

Deployments   Request                 Limit
              CPU (mi)   Mem (mb)     CPU (mi)   Mem (mb)

Performance Test Report

Columns:
  • Cycle
  • Virtual User (vus)
  • Duration (seconds)
  • Date / Time: Start, End
  • Service Component
  • Before: CPU (mi), Mem (mb)
  • In Progress: CPU (mi), Mem (mb)
  • After: CPU (mi), Mem (mb)
  • Jenkins Link
  • Monitoring Link: Performance Test Process, After Performance Test Process
  • Remark (Logs)

Rows:
1         EKS <deployment-name>
          RDS <rds-name>/<db-name>             N/A     N/A

References:

K6 Website:    https://k6.io/
K6 SourceCode: https://github.com/grafana/k6

[RFC] Logging

A. Concepts

Standardize the exported log path and file name

eg:
----
/var/log/[microservice-name]/[microservice-name]-error.log   # error only
/var/log/[microservice-name]/[microservice-name].log         # info, warning & debug
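
For a Python service, one way to follow this split is to attach two file handlers to the same logger. The sketch below is only an illustration using the standard logging module; the function name and the log directory argument are assumptions, and a real service would pass /var/log/[microservice-name] as the directory.

```python
import logging
import os


def build_logger(name: str, log_dir: str) -> logging.Logger:
    """Send errors to <name>-error.log and everything else to <name>.log."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.DEBUG)
    logger.propagate = False  # keep records out of the root logger

    general = logging.FileHandler(os.path.join(log_dir, f"{name}.log"))
    # Only debug, info and warning records pass this filter.
    general.addFilter(lambda record: record.levelno < logging.ERROR)

    errors = logging.FileHandler(os.path.join(log_dir, f"{name}-error.log"))
    errors.setLevel(logging.ERROR)

    for handler in (general, errors):
        handler.setFormatter(logging.Formatter("%(levelname)s %(message)s"))
        logger.addHandler(handler)
    return logger
```

The filter on the general handler keeps error records out of the main file, so each record lands in exactly one of the two files.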

Log in JSON format

Severity logs & formatting logs

eg: INFO
---
{
  "datetime": "2020-10-10 20:01:59TZ+0700",
  "severity": "info",
  "message": "yes, this is info"
}

eg: WARNING
---
{
  "datetime": "2020-10-10 20:01:59TZ+0700",
  "severity": "warning",
  "message": "this is warning"
}

eg: ERROR
---
{
  "datetime": "2020-10-10 20:01:59TZ+0700",
  "severity": "error",
  "code": 404,
  "message": "not found"
}

eg: DEBUG (optional)
---
{
  "datetime": "2020-10-10 20:01:59TZ+0700",
  "severity": "debug",
  "code": 100,
  "message": "describe debug information (criteria by number)"
}
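
In Python, entries with these fields can be produced by a custom formatter. The sketch below is one possible shape, not a prescribed implementation: the JsonFormatter class is hypothetical, the optional code value is read from a record attribute named code, and the timestamp is written in ISO 8601 (UTC) rather than the exact datetime layout shown above.

```python
import json
import logging
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Render each log record as a one-line JSON object."""

    def format(self, record: logging.LogRecord) -> str:
        created = datetime.fromtimestamp(record.created, tz=timezone.utc)
        entry = {
            "datetime": created.isoformat(),
            "severity": record.levelname.lower(),
            "message": record.getMessage(),
        }
        # "code" is optional, matching the error and debug examples.
        code = getattr(record, "code", None)
        if code is not None:
            entry["code"] = code
        return json.dumps(entry)


def format_example() -> str:
    """Format a sample error record without touching any log file."""
    record = logging.LogRecord(
        "demo", logging.ERROR, "demo.py", 0, "not found", None, None
    )
    record.code = 404
    return JsonFormatter().format(record)


if __name__ == "__main__":
    print(format_example())
```

When logging through a logger, the code value can be supplied with extra={"code": 404}, which sets the same record attribute.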

Logrotation & compression

# /etc/logrotate.d/[microservice-name]
---
/var/log/[microservice-name]/[microservice-name].log {
        rotate 12
        weekly
        missingok
        notifempty
        compress
        delaycompress
        size 50M
        sharedscripts
        postrotate
           /usr/bin/killall -HUP [microservice-name]
        endscript
}

/var/log/[microservice-name]/[microservice-name]-error.log {
        rotate 12
        weekly
        missingok
        notifempty
        compress
        delaycompress
        size 50M
        sharedscripts
        postrotate
           /usr/bin/killall -HUP [microservice-name]
        endscript
}

Log4j (Java)

# log4j.properties
---
log4j.rootLogger=INFO, fileLogger
log4j.appender.fileLogger=org.apache.log4j.RollingFileAppender
log4j.appender.fileLogger.layout=org.apache.log4j.PatternLayout
log4j.appender.fileLogger.layout.ConversionPattern=%d [%t] %-5p (%F:%L) - %m%n
log4j.appender.fileLogger.File=example.log
log4j.appender.fileLogger.MaxFileSize=50MB
log4j.appender.fileLogger.MaxBackupIndex=12
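
For Python services, the standard library offers a comparable size-based rotation. The sketch below mirrors the 50 MB file size and 12 backups of the properties above with logging.handlers.RotatingFileHandler; the function name and its parameters are assumptions for illustration, and the format string only approximates the Log4j conversion pattern.

```python
import logging
from logging.handlers import RotatingFileHandler

LINE_FORMAT = (
    "%(asctime)s [%(threadName)s] %(levelname)-5s "
    "(%(filename)s:%(lineno)d) - %(message)s"
)


def build_rotating_logger(name, path, max_bytes=50 * 1024 * 1024, backups=12):
    """Create a logger that rotates its file once it reaches max_bytes."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    handler = RotatingFileHandler(path, maxBytes=max_bytes, backupCount=backups)
    handler.setFormatter(logging.Formatter(LINE_FORMAT))
    logger.addHandler(handler)
    return logger


if __name__ == "__main__":
    log = build_rotating_logger("example", "example.log")
    log.info("rotation configured")
```

Rotated files are named example.log.1, example.log.2 and so on, and the oldest file is dropped once the backup count is exceeded.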

Schedule logging (log exporter)

  • Schedule with cron (crontab)
/etc/cron.d/[microservice-name]
  • Schedule with systemd
/etc/systemd/system/[microservice-name].service
/etc/systemd/system/[microservice-name].timer

B. Tools

  • Go

https://github.com/sirupsen/logrus

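  • Python

from datetime import datetime
import json
import logging

# Placeholder path: replace [microservice-name] with the real service name.
LOG_FILE = "/var/log/[microservice-name]/[microservice-name].log"


def main():
    print("--- Starting Log Exporter Agent ---")
    # Write only the message so that every line in the file is one JSON object.
    logging.basicConfig(level=logging.INFO, filename=LOG_FILE, format="%(message)s")
    logging.info(json.dumps({
        "datetime": datetime.now().astimezone().isoformat(),
        "severity": "info",
        "message": "log exporter agent started",
    }))


if __name__ == '__main__':
    main()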