So I need to find an efficient way to iterate over the big list in python.
Given: array of integers and number(length of a sublist)
Constraints: array up to 100K elements, elements in range(1,2**31)
Task: For every sublist find difference between max and min number. Print out the biggest difference.
Ex: [4,6,3,4,8,1,9], number = 3
As far as I understand I have to go through every sublist:
[4,6,3] max - min = 6 - 3 = 3
[6,3,4] 3
[3,4,8] 5
[4,8,1] 7
[8,1,9] 8
final max = 8
So my solution is:
import time
def difference(arr, number):
maxDiff = 0
i = 0
while i+number != len(arr)+1:
diff = max(arr[i:i+number]) - min(arr[i:i+number])
if diff > maxDiff:
maxDiff = diff
i += 1
print maxDiff
length = 2**31
arr = random.sample(xrange(length),100000) #array wasn't given. My sample
t0 = time.clock()
difference(arr,3)
print 'It took :',time.clock() - t0
Answer:
2147101251
It took : 5.174262
I also did the same with for loops which gives worse time:
def difference(arr,d):
maxDiff = 0
if len(arr) == 0:
maxDiff = 0
elif len(arr) == 1:
maxDiff = arr[0]
else:
i = 0
while i + d != len(arr)+1:
array = []
for j in xrange(d):
array.append(arr[i + j])
diff = max(array) - min(array)
if diff > maxDiff:
maxDiff = diff
i += 1
print maxDiff
length = 2**31
arr = random.sample(xrange(length),100000) #array wasn't given. My sample
t0 = time.clock()
difference(arr,1000)
print 'It took :',time.clock() - t0
Answer:
2147331163
It took : 14.104639
My challenge was to reduce time to 2 sec.
What would be the most efficient way to do this???
Based on answer and comment of @rchang and @gknicker I was able to get improvement. I'm wondering if there is something else I can do?
def difference(arr,d):
window = arr[:d]
arrayLength = len(arr)
maxArrayDiff = max(arr) - min(arr)
maxDiff = 0
while d < arrayLength:
localMax = max(window)
if localMax > maxDiff:
diff = localMax - min(window)
if diff == maxArrayDiff:
return diff
break
elif diff > maxDiff:
maxDiff = diff
window.pop(0)
window.append(arr[d])
d += 1
return maxDiff
#arr = [3,4,6,15,7,2,14,8,1,6,1,2,3,10,1]
length = 2**31
arr = random.sample(xrange(length),100000)
t0 = time.clock()
print difference(arr,1000)
print 'It took :',time.clock() - t0
Answer:
2147274599
It took : 2.54171
Not bad. Any other suggestions?
numpywhich is a module specialized to handle (large) numerical data. It is usually quite a bit faster that pure python.numpy@Peter. I've not yet been able to get anumpyimplementation to outperform the answer I posted - it's possible this body of data is not sufficiently large to reap the benefits. Or there could be something jacked up about how I'm doing this withnumpy. I'll explore later today if I get free time.