Skip to content

Conversation

@arm4b
Copy link
Member

@arm4b arm4b commented Dec 8, 2021

When the test run contains several nested workflows the script assumes it's succeeded if at least a single sub-workflow was successful. Instead of that, the check should refer to the parent workflow with "status: succeeded".

Example

id: 61b105a82f44ed586b2d24d0
action.ref: tests.test_inquiry_chain
parameters: 
  protocol: http
  token: bbeb3f7d29e240ac83ce3d53fff90745
status: failed 
result_task: get_workflow_details_1
result: 
  failed: true
  return_code: 2
  stderr: ""
  stdout: ''
  succeeded: false 
error: ""
traceback: None
failed_on: get_workflow_details_1
start_timestamp: Wed, 08 Dec 2021 19:21:12 UTC
end_timestamp: Wed, 08 Dec 2021 19:21:23 UTC
log: 
  - status: requested
    timestamp: '2021-12-08T19:21:12.277000Z'
  - status: scheduled
    timestamp: '2021-12-08T19:21:12.434000Z'
  - status: running
    timestamp: '2021-12-08T19:21:12.697000Z'
  - status: failed
    timestamp: '2021-12-08T19:21:23.334000Z'
+--------------------------+------------------------+--------------------------+------------+-------------------------------+
| id                       | status                 | task                     | action     | start_timestamp               |
+--------------------------+------------------------+--------------------------+------------+-------------------------------+
| 61b105a8b1a7886ef90b7b15 | succeeded (4s elapsed) | execute_inquiry_workflow | core.local | Wed, 08 Dec 2021 19:21:12 UTC |
| 61b105acb1a7886ef90b7b17 | succeeded (2s elapsed) | get_inquiry_trigger      | core.local | Wed, 08 Dec 2021 19:21:16 UTC |
| 61b105afb1a7886ef90b7b19 | succeeded (0s elapsed) | get_inquiry_id           | core.local | Wed, 08 Dec 2021 19:21:19 UTC |
| 61b105b0b1a7886ef90b7b1b | succeeded (1s elapsed) | get_workflow_id          | core.local | Wed, 08 Dec 2021 19:21:20 UTC |
| 61b105b2b1a7886ef90b7b1d | failed (1s elapsed)    | get_workflow_details_1   | core.local | Wed, 08 Dec 2021 19:21:22 UTC |
+--------------------------+------------------------+--------------------------+------------+-------------------------------+
When the test run contains several nested workflows the script assumes it's succeeded if at least single sub-workflow was successful.
Instead of that, the check should refer to the parent workflow "status: succeeded".
@arm4b arm4b added the bug label Dec 8, 2021
@pull-request-size pull-request-size bot added the size/XS PR that changes 0-9 lines. Quick fix/merge. label Dec 8, 2021
@arm4b arm4b modified the milestone: 3.7.0 Dec 8, 2021
@arm4b arm4b requested a review from a team December 8, 2021 21:22
@nzlosh
Copy link
Contributor

nzlosh commented Dec 9, 2021

Other than a small restriction on the regex pattern, this looks good to me.

@arm4b arm4b requested a review from nzlosh December 9, 2021 15:16
…ubstitution

```
echo ${OUTPUT}
```
results in entire output being on a single line, which is why it breaks the grep rule at the first place.

Fixes #5487 when st2-self-check script reported success on failed runs.
Copy link
Contributor

@nzlosh nzlosh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@cognifloyd
Copy link
Member

Now that you've already figured it out, I'm somewhere I can look up how I've dealt with something similar.

First I get an execution id (I only did chatops.match_and_execute, but it should be generalizable):

match_and_execute() {
  # usage: match_and_execute "<alias>" <timeout>
  # returns the execution id
  echoerr "chatops> ${1}"
  st2 --cacert=true run chatops.match_and_execute text="${1}" source_channel=vela user=${VELA_BUILD_AUTHOR} timeout=${2:-600} --attr result.result --json | jq -r .result.result
}

Then I query an execution status from bash:

assert_status() {
  # usage: assert_status <exec_id> <expected_status>
  local exec_status
  exec_status=$(st2 --cacert=true execution get --detail --json ${1} | jq -r .status)
  if [ "${2}" != "${exec_status}" ]; then
    echo failed
    echoerr "The alias execution should be ${2} but it is ${exec_status}!"
    return 4
  fi
  echo passed
  echo
}

Note how I used --detail --json. --detail makes it put only the current execution (no children) in a table, and then --json says "actually give me the table as json". Thus, you only get info about the current execution in json format instead of a series of json objects for the execution and its children.

This is what one of my tests using those functions looks like:

ALIAS_EXEC_ID=$(match_and_execute "tc list" 120)
echo ALIAS_EXEC_ID=${ALIAS_EXEC_ID}
assert_status "${ALIAS_EXEC_ID}" "succeeded"
@cognifloyd
Copy link
Member

To be clear, I'm not recommending you change the approach for this PR. I just wanted to share it while I was thinking about it as a possible future enhancement.

@arm4b
Copy link
Member Author

arm4b commented Dec 9, 2021

@cognifloyd That does multiple requests, adding smart code, complexity, and jq as a dependency.

Form the dull code we have:

    OUTPUT=$(st2 run ${TEST} protocol=${PROTOCOL} token=${ST2_AUTH_TOKEN})
    echo "${OUTPUT}" | grep "status" | grep -q "succeeded"
    EXIT_CODE=$?

    if [ ${EXIT_CODE} -ne 0 ]; then
        echo "Test output: ${OUTPUT}"
        ((ERRORS++))
    fi

In this case, the workflow would show the original human-readable execution output as-is, as someone would run it.
That's helpful in identifying more quickly in logs what's wrong when st2-self-check errored.
When the e2e tests failed, someone is looking at logs, and the instance is destroyed already, the things should be as simple and obvious as possible.

Just a different approach depending on what matters more in a particular situation.

@arm4b arm4b merged commit 8420fef into master Dec 9, 2021
@arm4b arm4b deleted the fix/st2-self-check branch December 9, 2021 22:11
@arm4b
Copy link
Member Author

arm4b commented Dec 9, 2021

@amanda11 @nzlosh @cognifloyd Thanks everyone for the reviews 😉 We found a better way!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug size/XS PR that changes 0-9 lines. Quick fix/merge.

5 participants