Fixed panics during large file parsing #86
Merged
This PR fixes the parsing panics on big ol' real-life GenBank flat files.
Note: I have no idea WHY exactly it fixed them all. Basically, all I did was add length checks in 4 places:
1. `quickMetaCheck`: if `len(line) == 0`, the flag is `false`.
2. `quickSubMetaCheck`: if `len(line) == 0`, the flag is `false`.
3. Check that `len(startEndSplit) > 1` before doing `end, _ = strconv.Atoi(partialRegex.ReplaceAllString(startEndSplit[1], ""))`.
4. Finally, check that there is still line left to parse when looking for parentheses in `switch expression[firstInnerParentheses+i]`. Apparently, sometimes there isn't?

I also commented out this line of code:

`attributeValue = strings.TrimSpace(attributeSplit[1])`

I am not sure why this line was added: it is completely untested, and if you actually reach it, the system panics.
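For reviewers, here is a minimal Go sketch of the same guard pattern: check the length before you index. Everything in it is hypothetical and illustrative only. `parseEnd`, `isTopLevelKeyword`, `findClosingParen` and `attributeValue` are not functions in this repo, the `[<>]` pattern for `partialRegex` is a guess at what it strips, and `isTopLevelKeyword` only assumes that `quickMetaCheck`/`quickSubMetaCheck` look at the first character(s) of a line. The last helper shows one plausible way the commented-out line could be kept safely: value-less qualifiers like `/pseudo` have no `=`, so the split has only one element.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
	"strings"
)

// partialRegex is an assumption: strip the "<" / ">" partial-location markers.
var partialRegex = regexp.MustCompile(`[<>]`)

// isTopLevelKeyword (hypothetical) mirrors guards 1 and 2: an empty line can't
// be a keyword line, and checking len first keeps line[0] from panicking.
func isTopLevelKeyword(line string) bool {
	if len(line) == 0 {
		return false
	}
	return line[0] != ' '
}

// parseEnd (hypothetical) mirrors guard 3: a single-base location like "467"
// has no "..", so startEndSplit[1] must not be read.
func parseEnd(location string) (int, bool) {
	startEndSplit := strings.Split(location, "..")
	if len(startEndSplit) < 2 {
		return 0, false
	}
	end, err := strconv.Atoi(partialRegex.ReplaceAllString(startEndSplit[1], ""))
	if err != nil {
		return 0, false
	}
	return end, true
}

// findClosingParen (hypothetical) mirrors guard 4: scan for the matching ")"
// but stop at the end of the string; returns -1 if the expression is unbalanced.
func findClosingParen(expression string, open int) int {
	depth := 0
	for i := open; i < len(expression); i++ {
		switch expression[i] {
		case '(':
			depth++
		case ')':
			depth--
			if depth == 0 {
				return i
			}
		}
	}
	return -1
}

// attributeValue (hypothetical) is a bounds-checked take on the commented-out
// line: qualifiers without "=value" yield a one-element split.
func attributeValue(qualifier string) string {
	attributeSplit := strings.SplitN(qualifier, "=", 2)
	if len(attributeSplit) < 2 {
		return ""
	}
	return strings.TrimSpace(attributeSplit[1])
}

func main() {
	fmt.Println(parseEnd("<1..>200"))                               // 200 true
	fmt.Println(parseEnd("467"))                                    // 0 false
	fmt.Println(isTopLevelKeyword(""))                              // false
	fmt.Println(findClosingParen("join(1..5,complement(8..9))", 4)) // 26
	fmt.Println(findClosingParen("join(1..5,complement(8..9)", 4))  // -1
	fmt.Printf("%q\n", attributeValue("/pseudo"))                   // ""
}
```

Returning a zero value plus an `ok` flag (or `-1`) lets the caller decide what to do with a weird line instead of the whole parse blowing up.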
In particular, I tested with the following GenBank file: https://siasky.net/AABxAejn_kkw0H5O8ECVdPhFuMXy1uib_Z8B5JE5DctSuQ
It is pretty large, so I didn't push it to the main branch. It also takes quite a few seconds to parse, so there is that too.