-
Notifications
You must be signed in to change notification settings - Fork 4.9k
refactor: separate validation result handling from coercer #69113
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
👋 Greetings, Airbyte Team Member!Here are some helpful tips and reminders for your convenience. Helpful Resources
PR Slash CommandsAirbyte Maintainers (that's you!) can execute the following slash commands on your PR:
|
|
Note Detected that there are differences in the Gradle dependencies. |
| @Singleton | ||
| class MetricTracker { | ||
|
|
||
| private val metrics: MutableMap<String, Double> = ConcurrentHashMap() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given that we're tracking a set of known metrics and we should always publish 0, should we just make this a class instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@gosusnp Maybe...hadn't gotten that far yet. We should either a) declare all of the metrics in the tracker/some other class that the tracker can access and initialize them to 0 on creation or b) make the caller initialize them. Seems like A makes more sense and roughly follows what we do in the platform the the metric registry.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@gosusnp I took a swing at adding a metric registry (name to be determined) that handles the initialization to 0 for each known metric.
| * value.nullify( | ||
| * AirbyteRecordMessageMetaChange.Reason.DESTINATION_FIELD_SIZE_LIMITATION | ||
| * ) | ||
| * ShouldNullify(AirbyteRecordMessageMetaChange.Reason.DESTINATION_FIELD_SIZE_LIMITATION) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The previous was doing nullify but shouldn't this be truncate instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@gosusnp Not sure why the previous was doing nullify. If we want to change this, we should do this as a separate change after we get whatever refactor we need in place to make it easy to track/revert the change.
| } | ||
|
|
||
| override fun validate(value: EnrichedAirbyteValue): EnrichedAirbyteValue { | ||
| override fun validate(value: EnrichedAirbyteValue): ValidationResult = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this the only place where we can mutate and introduce meta.changes? I thought this might happen in .map as well, example I am thinking about would be bad timestamps maybe.
If this is the case, shouldn't the ValidationResult be more a Envelop of value with the corresponding mutation tracked?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@gosusnp As far as I can tell, the answer is yes in the new "data flow" CDK. However, those methods (nullify and truncate) are in the EnrichedAirbyteValue so it may be that they are getting called from other places in the destinations that use the new data flow CDK when we get in there. That seems to be a mistake (over exposing those methods) so in a way, this change will encapsulate it to make it harder for code to call these methods outside of the coercer/validation framework.
|
@gosusnp @subodh1810 I have added a change to the |
What
How
validationmethod interface of theValueCoercerto return aValidationResultValidationResultto perform the "munging" in a separate singleton that is also responsible for tracking metrics related to the mungingN.B. This is a breaking interface change that will require modifications in each destination that uses the new data flow to match the new interface. It is not a breaking change from an end-user point of view. Also note that I did not remove the
nullifyandtruncatemethods from theEnrichedAirbyteValue, as those are used still in non-data flow CDK code. There is probably a missing abstraction/consolidation piece to avoid having the pipe the result ofcoercer.validate()intovalidationResultHandler.handle. If we like this approach, I can work on that next in this PR to introduce one layer that the converter can call to get the fully validated and munged result in one call.Review guide
ValueCoercer.ktValidationResultHandler.ktMetricTracker.ktJsonConverter.kt/ProtobufConverter.ktCan this PR be safely reverted and rolled back?