Fix st2-self-check script reporting success on failed runs #5487

arm4b · 2021-12-08T20:26:53Z

When a test run contains several nested workflows, the script treats the run as succeeded if at least one sub-workflow succeeded. Instead, the check should look only at the parent workflow's "status: succeeded" line.

Example

id: 61b105a82f44ed586b2d24d0
action.ref: tests.test_inquiry_chain
parameters: 
  protocol: http
  token: bbeb3f7d29e240ac83ce3d53fff90745
status: failed 
result_task: get_workflow_details_1
result: 
  failed: true
  return_code: 2
  stderr: ""
  stdout: ''
  succeeded: false 
error: ""
traceback: None
failed_on: get_workflow_details_1
start_timestamp: Wed, 08 Dec 2021 19:21:12 UTC
end_timestamp: Wed, 08 Dec 2021 19:21:23 UTC
log: 
  - status: requested
    timestamp: '2021-12-08T19:21:12.277000Z'
  - status: scheduled
    timestamp: '2021-12-08T19:21:12.434000Z'
  - status: running
    timestamp: '2021-12-08T19:21:12.697000Z'
  - status: failed
    timestamp: '2021-12-08T19:21:23.334000Z'
+--------------------------+------------------------+--------------------------+------------+-------------------------------+
| id                       | status                 | task                     | action     | start_timestamp               |
+--------------------------+------------------------+--------------------------+------------+-------------------------------+
| 61b105a8b1a7886ef90b7b15 | succeeded (4s elapsed) | execute_inquiry_workflow | core.local | Wed, 08 Dec 2021 19:21:12 UTC |
| 61b105acb1a7886ef90b7b17 | succeeded (2s elapsed) | get_inquiry_trigger      | core.local | Wed, 08 Dec 2021 19:21:16 UTC |
| 61b105afb1a7886ef90b7b19 | succeeded (0s elapsed) | get_inquiry_id           | core.local | Wed, 08 Dec 2021 19:21:19 UTC |
| 61b105b0b1a7886ef90b7b1b | succeeded (1s elapsed) | get_workflow_id          | core.local | Wed, 08 Dec 2021 19:21:20 UTC |
| 61b105b2b1a7886ef90b7b1d | failed (1s elapsed)    | get_workflow_details_1   | core.local | Wed, 08 Dec 2021 19:21:22 UTC |
+--------------------------+------------------------+--------------------------+------------+-------------------------------+


st2common/bin/st2-self-check

nzlosh · 2021-12-09T14:30:14Z

Other than a small restriction on the regex pattern, this looks good to me.

…ubstitution ``` echo ${OUTPUT} ``` results in the entire output being on a single line, which is why it breaks the grep rule in the first place. Fixes #5487, where the st2-self-check script reported success on failed runs.
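The single-line effect described above can be reproduced without a StackStorm instance. The following is a minimal sketch, assuming a two-line sample that only mimics the shape of the `st2 run` output shown earlier; the `UNQUOTED_RESULT` and `QUOTED_RESULT` variable names are made up for illustration and do not come from st2-self-check.

```shell
# Hypothetical sample: a failed parent status followed by a child row that succeeded.
OUTPUT='status: failed
| 61b105a8b1a7886ef90b7b15 | succeeded (4s elapsed) | execute_inquiry_workflow |'

# Unquoted expansion: word splitting replaces the newline with a space, so the
# line containing "status" also contains the child's "succeeded".
if echo ${OUTPUT} | grep "status" | grep -q "succeeded"; then
    UNQUOTED_RESULT="false positive"
else
    UNQUOTED_RESULT="failure detected"
fi

# Quoted expansion: the newline survives, so grep "status" selects only the
# parent "status: failed" line and the second grep finds no "succeeded".
if echo "${OUTPUT}" | grep "status" | grep -q "succeeded"; then
    QUOTED_RESULT="false positive"
else
    QUOTED_RESULT="failure detected"
fi

echo "unquoted: ${UNQUOTED_RESULT}"
echo "quoted:   ${QUOTED_RESULT}"
```

Quoting the expansion is what keeps the parent status line separate from the child task rows.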

st2common/bin/st2-self-check

nzlosh

LGTM

cognifloyd · 2021-12-09T21:53:57Z

Now that you've already figured it out, I'm somewhere I can look up how I've dealt with something similar.

First I get an execution id (I only did chatops.match_and_execute, but it should be generalizable):

match_and_execute() {
  # usage: match_and_execute "<alias>" <timeout>
  # returns the execution id on stdout
  # echoerr is a helper defined elsewhere (not shown here)
  echoerr "chatops> ${1}"
  st2 --cacert=true run chatops.match_and_execute text="${1}" source_channel=vela user="${VELA_BUILD_AUTHOR}" timeout="${2:-600}" --attr result.result --json | jq -r .result.result
}

Then I query an execution status from bash:

assert_status() {
  # usage: assert_status <exec_id> <expected_status>
  local exec_status
  exec_status=$(st2 --cacert=true execution get --detail --json "${1}" | jq -r .status)
  if [ "${2}" != "${exec_status}" ]; then
    echo failed
    echoerr "The alias execution should be ${2} but it is ${exec_status}!"
    return 4
  fi
  echo passed
  echo
}

Note how I used --detail --json. --detail makes it put only the current execution (no children) in a table, and then --json says "actually give me the table as json". Thus, you only get info about the current execution in json format instead of a series of json objects for the execution and its children.

This is what one of my tests using those functions looks like:

ALIAS_EXEC_ID=$(match_and_execute "tc list" 120)
echo ALIAS_EXEC_ID=${ALIAS_EXEC_ID}
assert_status "${ALIAS_EXEC_ID}" "succeeded"
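The snippets above call `echoerr`, which is not defined in the comment. A plausible reading, and only an assumption, is that it prints to stderr so the `chatops>` trace does not end up in the value captured by `$(...)`. The sketch below illustrates that idea with a stand-in function; `fake_match_and_execute` and the id it prints are hypothetical and make no st2 calls.

```shell
# Assumed helper: print all arguments to stderr instead of stdout.
echoerr() {
    printf '%s\n' "$*" >&2
}

# Hypothetical stand-in for match_and_execute: logs to stderr, returns an id on stdout.
fake_match_and_execute() {
    echoerr "chatops> ${1}"
    echo "hypothetical-exec-id"
}

# Command substitution captures stdout only, so the trace line is not part of the id.
CAPTURED=$(fake_match_and_execute "tc list" 2>/dev/null)
echo "CAPTURED=${CAPTURED}"
```

If the trace went to stdout instead, the captured value would contain two lines and could not be passed on as an execution id.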

cognifloyd · 2021-12-09T21:54:36Z

To be clear, I'm not recommending you change the approach for this PR. I just wanted to share it while I was thinking about it as a possible future enhancement.

arm4b · 2021-12-09T22:11:41Z

@cognifloyd That does multiple requests, adding smart code, complexity, and jq as a dependency.

From the dull code we have:

    OUTPUT=$(st2 run ${TEST} protocol=${PROTOCOL} token=${ST2_AUTH_TOKEN})
    echo "${OUTPUT}" | grep "status" | grep -q "succeeded"
    EXIT_CODE=$?

    if [ ${EXIT_CODE} -ne 0 ]; then
        echo "Test output: ${OUTPUT}"
        ((ERRORS++))
    fi

In this case, the check shows the original human-readable execution output as-is, just as someone running it by hand would see it.
That helps identify what went wrong more quickly in the logs when st2-self-check errors.
When the e2e tests have failed, someone is looking at the logs, and the instance is already destroyed, things should be as simple and obvious as possible.

Just a different approach depending on what matters more in a particular situation.
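For illustration only, the grep-based check could be narrowed further by anchoring the pattern to the start of the line. This is a sketch, not the code merged in this PR: it assumes the parent status is printed at column zero as `status: <value>`, as in the example output above, while log entries are indented and child rows sit inside the table. The `parent_succeeded` function and both sample outputs are hypothetical.

```shell
# Succeeds only when a line starting with "status:" reports "succeeded".
# Indented log entries and table rows do not match the anchored pattern.
parent_succeeded() {
    printf '%s\n' "$1" | grep -Eq '^status:[[:space:]]*succeeded'
}

# Hypothetical samples shaped like the example output above.
FAILED_OUTPUT='id: 61b105a82f44ed586b2d24d0
status: failed
log:
  - status: requested
| 61b105a8b1a7886ef90b7b15 | succeeded (4s elapsed) | execute_inquiry_workflow |'

SUCCEEDED_OUTPUT='id: hypothetical-id
status: succeeded
| hypothetical-child-id | succeeded (1s elapsed) | some_task |'

# Count failures the same way the self-check loop does, using POSIX arithmetic.
ERRORS=0
for SAMPLE in "${FAILED_OUTPUT}" "${SUCCEEDED_OUTPUT}"; do
    if ! parent_succeeded "${SAMPLE}"; then
        echo "Test output: ${SAMPLE}"
        ERRORS=$((ERRORS + 1))
    fi
done
echo "errors: ${ERRORS}"
```

As in the loop above, the full human-readable output is still printed for a failed sample, so the logs keep the context described in this comment.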

arm4b · 2021-12-09T22:20:52Z

@amanda11 @nzlosh @cognifloyd Thanks everyone for the reviews 😉 We found a better way!

Fix st2-self-check script reporting success on failed runs

b3c4491


arm4b added the bug label Dec 8, 2021

pull-request-size bot added the size/XS label (PR that changes 0-9 lines. Quick fix/merge.) Dec 8, 2021

Add the changelog for #5487

8e09d0e

arm4b modified the milestone: 3.7.0 Dec 8, 2021

arm4b requested a review from a team December 8, 2021 21:22

nzlosh reviewed Dec 9, 2021

arm4b requested a review from nzlosh December 9, 2021 15:16

arm4b commented Dec 9, 2021

nzlosh approved these changes Dec 9, 2021

arm4b merged commit 8420fef into master Dec 9, 2021

arm4b deleted the fix/st2-self-check branch December 9, 2021 22:11
