All iupac variants #106

tijeco · 2021-04-23T14:50:01Z

This is for #92, a port of all_iupac_variants from easy_dna.

IUPAC codes are stored as a map of rune slices. The Cartesian products are then generated with a retooled version of the function in github.com/schwarmco/go-cartesian-product, which uses recursion and goroutines, so it should scale nicely to large sequences.

TimothyStiles · 2021-04-23T15:21:10Z

sequence.go

+	}
+}
+
+func allVariantsIUPAC(seq string) []string {


Make public (capitalize allVariantsIUPAC -> AllVariantsIUPAC) and add an export comment starting with // AllVariantsIUPAC (insert description here).

@TimothyStiles Does the export comment go on the same line as the function declaration?
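For reference, Go's convention is that a doc comment goes on the line or lines directly above the declaration, with no blank line in between, and begins with the identifier's name; lowercase identifiers such as a helper stay unexported. A small sketch, where the function body and the `basesFor` helper are hypothetical and only handle a few codes:

```go
package main

import "fmt"

// AllVariantsIUPAC returns every sequence that seq can represent once its
// IUPAC ambiguity codes are expanded.
func AllVariantsIUPAC(seq string) []string {
	variants := []string{""}
	for _, code := range seq {
		var next []string
		for _, prefix := range variants {
			for _, base := range basesFor(code) {
				next = append(next, prefix+string(base))
			}
		}
		variants = next
	}
	return variants
}

// basesFor is lowercase, so it is unexported and stays out of the public API.
// Only a few codes are handled in this sketch.
func basesFor(code rune) string {
	switch code {
	case 'N':
		return "ACGT"
	case 'R':
		return "AG"
	case 'A', 'C', 'G', 'T':
		return string(code)
	}
	return "" // unknown codes yield no variants here
}

func main() {
	fmt.Println(AllVariantsIUPAC("ATN")) // [ATA ATC ATG ATT]
}
```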

TimothyStiles · 2021-04-23T15:21:33Z

sequence.go

+
+// the following functions Iter and iter, derive from github.com/schwarmco/go-cartesian-product
+// which uses interfaces, so I modified it to use runes
+func Iter(params ...[]rune) chan []rune {


make private.

TimothyStiles · 2021-04-23T15:43:18Z

This is awesome @tijeco! I've added a couple of things and will review more later today when I get the chance.

Commenting looks good. Variables could have more descriptive names. Are there any standard test cases we could use to test it?

tijeco · 2021-04-23T15:53:30Z

@TimothyStiles thanks! From easy_dna, the sample was as follows:

>>> all_iupac_variants('ATN')
['ATA', 'ATC', 'ATG', 'ATT']

The output from the port I wrote won't be sorted, and because it uses goroutines, I think the order could be different each time.

So a test would either need to sort the output, or the function would need to return sorted output.

I don't know which is preferred.

TimothyStiles · 2021-04-23T17:17:16Z

@tijeco In this case I think we can just sort the output for the test. We may be able to make the function return sorted output later, but we don't have to do that right now.
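A sketch of the sort-then-compare check, written as a plain program so it runs on its own; in sequence_test.go the same comparison would sit inside a test function and report failures with `t.Errorf`. The helper name `sameVariants` is hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// sameVariants reports whether got and want hold the same strings,
// ignoring order. It sorts copies so the caller's slices are untouched.
func sameVariants(got, want []string) bool {
	if len(got) != len(want) {
		return false
	}
	g := append([]string(nil), got...)
	w := append([]string(nil), want...)
	sort.Strings(g)
	sort.Strings(w)
	for i := range g {
		if g[i] != w[i] {
			return false
		}
	}
	return true
}

func main() {
	// Unordered output, as a concurrent generator might return it.
	got := []string{"ATG", "ATA", "ATT", "ATC"}
	want := []string{"ATA", "ATC", "ATG", "ATT"} // easy_dna's result for "ATN"
	fmt.Println(sameVariants(got, want))        // true
}
```

Sorting copies keeps the helper from reordering the caller's slices, so the same `got` can still be inspected afterwards.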

Do you think we could find a stronger test case? Something like "The quick brown fox jumps over the lazy dog", but for degenerate bases: N, B, V, etc.
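One possible "quick brown fox" for IUPAC codes, offered as an illustration rather than the test the PR ended up with: a sequence that uses every degenerate code exactly once, such as RYSWKMBDHVN. Its expected number of variants is the product of how many bases each code stands for, 2^6 · 3^4 · 4 = 20,736, so a test can check the count (and that no variant repeats) without listing every sequence. The table follows the standard IUPAC nucleotide codes, leaving out U; `expectedVariantCount` is a hypothetical helper.

```go
package main

import "fmt"

// iupacBases maps IUPAC nucleotide codes to the bases they represent.
var iupacBases = map[rune]string{
	'A': "A", 'C': "C", 'G': "G", 'T': "T",
	'R': "AG", 'Y': "CT", 'S': "CG", 'W': "AT", 'K': "GT", 'M': "AC",
	'B': "CGT", 'D': "AGT", 'H': "ACT", 'V': "ACG",
	'N': "ACGT",
}

// expectedVariantCount multiplies the number of bases each code can stand for.
// It returns an error for characters that are not IUPAC codes.
func expectedVariantCount(seq string) (int, error) {
	count := 1
	for _, code := range seq {
		bases, ok := iupacBases[code]
		if !ok {
			return 0, fmt.Errorf("unknown IUPAC code %q", code)
		}
		count *= len(bases)
	}
	return count, nil
}

func main() {
	// Every degenerate code appears exactly once.
	n, err := expectedVariantCount("RYSWKMBDHVN")
	if err != nil {
		panic(err)
	}
	fmt.Println(n) // 20736
}
```

A test could then assert that the generator returns exactly this many variants and that putting them into a map yields the same number of distinct keys.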

tijeco · 2021-04-23T18:55:39Z

@TimothyStiles Sounds good! I will work on a test set and put it in a separate pull request for sequence_test.go

TimothyStiles · 2021-04-23T18:57:06Z

@tijeco could you put it in this pull request? It would be easier for me to manage. All you have to do is push commits to the PR's branch and it will update automatically.

tijeco · 2021-04-23T18:58:27Z

@TimothyStiles Will do! Thanks!

I don't know if I'll have time to work on it today/this weekend, but I definitely will early next week.

Koeng101 · 2021-04-25T18:36:23Z

So here is a potential problem: RAM issues.

Each goroutine takes about 2 KB of memory: https://stackoverflow.com/questions/22326765/go-memory-consumption-with-many-goroutines

If you have 10 Ns, for example, this could spawn 4^10 = 1,048,576 goroutines, taking up about 2 GB of memory. An iterative function without goroutines could cut that RAM use substantially, because it would not spawn a new goroutine for every sequence branch. Since appends don't take much CPU, you might get equivalent performance from a much simpler approach that uses far less RAM.
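The arithmetic behind this estimate, as a rough sketch. It assumes the ~2 KB per goroutine figure cited above and a recursive design in which every finished variant waits in its own goroutine until it is received; for comparison, it also estimates the size of the finished []string on a 64-bit platform, ignoring allocator rounding. The helper names are hypothetical.

```go
package main

import "fmt"

// goroutineStackBytes is the rough per-goroutine cost quoted above (~2 KB).
// Treat it as an assumption; real stacks start small and can grow.
const goroutineStackBytes int64 = 2 << 10

// variantCount returns how many sequences n consecutive Ns expand to (4^n).
func variantCount(n int) int64 {
	count := int64(1)
	for i := 0; i < n; i++ {
		count *= 4 // N stands for A, C, G, or T
	}
	return count
}

// outputBytes estimates the finished []string on a 64-bit platform: a
// 16-byte string header plus seqLen bytes of data per variant.
func outputBytes(variants int64, seqLen int) int64 {
	return variants * (16 + int64(seqLen))
}

func main() {
	const ns = 10 // "NNNNNNNNNN"
	v := variantCount(ns)
	fmt.Println("variants:", v)                                         // 1048576
	fmt.Println("goroutine stacks (MiB):", (v*goroutineStackBytes)>>20) // 2048
	fmt.Println("output strings (MiB):", outputBytes(v, ns)>>20)        // 26
}
```

Under these assumptions the goroutines cost roughly 80 times more memory than the result they produce, which is the gap an iterative version would close.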

Could you try some harder benchmarks to see where the concurrent method breaks down?

tijeco · 2021-04-29T17:20:52Z

@Koeng101 thanks for pointing that out! I'll be honest, goroutines and concurrency are still new to me, so I'm glad to have this opportunity to work on this.

I'll try to make a non-concurrent version of the function as well and do some benchmarks to see how memory/cpu usage varies between the two.
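One way to run that comparison without `go test` is `testing.Benchmark`, which the standard library provides for custom benchmarks. This sketch pits a goroutine-per-branch generator against an iterative one on seven Ns; both functions and the partial `bases` table are hypothetical stand-ins, not the PR's code. Note that the B/op figure counts heap allocations only: goroutine stacks live in stack spans (`runtime.MemStats.StackInuse`), so the concurrent version's stack cost has to be measured separately.

```go
package main

import (
	"fmt"
	"sync"
	"testing"
)

// bases maps IUPAC codes to the bases they stand for (partial table).
var bases = map[rune][]rune{'A': {'A'}, 'C': {'C'}, 'G': {'G'}, 'T': {'T'}, 'N': {'A', 'C', 'G', 'T'}}

// concurrentVariants spawns one goroutine per branch of the recursion and
// collects results from a channel, so the output order is not fixed.
func concurrentVariants(seq []rune) []string {
	c := make(chan string)
	var wg sync.WaitGroup
	var walk func(prefix, rest []rune)
	walk = func(prefix, rest []rune) {
		defer wg.Done()
		if len(rest) == 0 {
			c <- string(prefix)
			return
		}
		for _, b := range bases[rest[0]] {
			next := append(append([]rune{}, prefix...), b)
			wg.Add(1)
			go walk(next, rest[1:])
		}
	}
	wg.Add(1)
	go walk(nil, seq)
	go func() { wg.Wait(); close(c) }()
	var out []string
	for v := range c {
		out = append(out, v)
	}
	return out
}

// iterativeVariants extends every prefix by every option, one position at a
// time, with no goroutines; the output comes back in a fixed order.
func iterativeVariants(seq []rune) []string {
	prefixes := []string{""}
	for _, code := range seq {
		next := make([]string, 0, len(prefixes)*len(bases[code]))
		for _, p := range prefixes {
			for _, b := range bases[code] {
				next = append(next, p+string(b))
			}
		}
		prefixes = next
	}
	return prefixes
}

func main() {
	seq := []rune("NNNNNNN") // 4^7 = 16,384 variants; raise to probe limits
	for name, f := range map[string]func([]rune) []string{
		"concurrent": concurrentVariants,
		"iterative":  iterativeVariants,
	} {
		r := testing.Benchmark(func(b *testing.B) {
			b.ReportAllocs()
			for i := 0; i < b.N; i++ {
				f(seq)
			}
		})
		fmt.Println(name, r.String(), r.MemString())
	}
}
```

Raising the number of Ns step by step, while watching total process memory, should show where the concurrent version starts to struggle.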

TimothyStiles · 2021-05-15T21:31:04Z

@tijeco can you pull down the latest from prime and merge it with your PR branch? Otherwise it'll delete some recent changes to the file. After that I should be able to run CI and do a proper review.


tijeco · 2021-05-15T22:45:50Z

@TimothyStiles I think I did the thing?


Koeng101 · 2021-05-15T23:02:19Z

sequence.go

+	allVariants := make([][]rune, possibleVariants)              // this is the 2D slice where all variants will be stored
+	variantHolders := make([]rune, possibleVariants*len(inList)) // this is an empty slice with a length totaling the size of all input characters
+	variantChoices := make([]int, len(inList))                   // these will be all the possible variants


Why are these allocations necessary? Can you run them without?

These allocations help reduce the total memory footprint, since the storage for every variant is allocated once up front. Full disclosure: this is a retool of the Cartesian product from Rosetta Code https://rosettacode.org/wiki/Cartesian_product_of_two_or_more_lists
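For context, a sketch of how those three buffers are typically used in the Rosetta Code style of Cartesian product, reusing the variable names from the diff above; everything else, including the name `cartesianRunes`, is an assumption. All variants share one flat rune buffer, each variant is a capacity-capped subslice of it, and an index slice advances like an odometer, so the whole product costs three allocations instead of one per variant.

```go
package main

import "fmt"

// cartesianRunes returns every combination that takes one rune from each
// element of inList, using three up-front allocations (odometer style).
func cartesianRunes(inList [][]rune) [][]rune {
	possibleVariants := 1
	for _, options := range inList {
		possibleVariants *= len(options)
	}
	if possibleVariants == 0 {
		return nil // some position has no options
	}
	width := len(inList)
	allVariants := make([][]rune, possibleVariants)       // one entry per variant
	variantHolders := make([]rune, possibleVariants*width) // flat storage for all variants
	variantChoices := make([]int, width)                   // current option index per position

	for i := range allVariants {
		// Carve variant i out of the shared buffer; the third index caps its
		// capacity so an append on one variant cannot overwrite the next.
		variant := variantHolders[i*width : (i+1)*width : (i+1)*width]
		for pos, choice := range variantChoices {
			variant[pos] = inList[pos][choice]
		}
		allVariants[i] = variant
		// Advance the odometer: bump the last position, carrying leftwards.
		for pos := width - 1; pos >= 0; pos-- {
			variantChoices[pos]++
			if variantChoices[pos] < len(inList[pos]) {
				break
			}
			variantChoices[pos] = 0
		}
	}
	return allVariants
}

func main() {
	inList := [][]rune{{'A'}, {'T'}, {'A', 'C', 'G', 'T'}} // "ATN"
	for _, v := range cartesianRunes(inList) {
		fmt.Println(string(v)) // ATA, ATC, ATG, ATT (one per line)
	}
}
```

Unlike the goroutine version, this produces variants in a fixed, sorted-by-input order.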


TimothyStiles · 2021-05-15T23:51:55Z

@tijeco I did some finagling, fixed some bugs, and fixed the initial test itself. Think you could add an example function that will be rendered in our docs, similar to this?

tijeco added 2 commits April 23, 2021 10:41

all_iupac_variants port from easy_dna

61cf852

allVariantsIUPAC bebop#92

599862f

TimothyStiles reviewed Apr 23, 2021


tijeco added 2 commits May 15, 2021 12:24

single thread version and test (bebop#92)

9ac589b

capitalize AllVariantsIUPAC (bebop#92)

555f3d6

Merge branch 'prime' of https://github.com/tijeco/poly into all_iupac_variants

194ff25

Koeng101 reviewed May 15, 2021


TimothyStiles added 3 commits May 15, 2021 16:37

merging upstream.

2b4c53a

fixed typo induced bug.

f924f50

fixed spacing in RandomExampleRandomProteinSequence.

c2b2dfa

tijeco added 3 commits May 15, 2021 20:31

upper case, error handling, and MENDEL example

529459b

catch error in test

950a49c

change to ExampleAllVariantsIUPAC

90ecc31

TimothyStiles merged commit 72be193 into bebop:prime May 21, 2021

3 participants