
This script reads lines from a file of lottery numbers and is supposed to filter out the lines that contain errors. For example, in the following:

week 1;17,19,35,23,8,20,36
week 2;24,28,35,8,3,22
week x;23,29,38,1,35,18,25
week 4;21,2,22,14,4,28,38
week 5;5,37,20,15,3,14,9
week 6;6,29,7,14,16,18,1
week 7;24,31,14,23,4,3,29
week 8;32,21,26,1,15aa,14,17
week 9;8,13,25,12,33,34,35
week 10;29,27,30,13,7,38,26
week 11;34,3,7,24,16,20,38
week 12;15,28,2,29,16,10,8
week 13;32,22,13,14,21,28,26
week 14;37,4,20,3,1,33,10
week 1a5;17,8,38,18,9,32,25

Weeks 3 ("week x") and 15 ("week 1a5") should be removed because their week numbers are invalid.

I used an integer check to do this, and it worked for every week except week 3. I added a "print True" step; it isn't printed for week 3, but the line isn't removed either. I asked an AI, which said it was because the check didn't raise an error, but it seems to: I get the same error from the integer check for both 'x' and '1a5'.

def filter_incorrect():
    with open("lottery_numbers.csv") as sourcefile:
        entries = []
        for line in sourcefile:
            line = line.replace("\n","")
            line = line.replace(";",",")
            parts = line.split(",")
            entries.append(parts)
        for result in entries:
            
            try:
                int(result[0].lstrip("week ")) == result[0].lstrip("week ")
                print(result[0] + " True")
                pass
            except:
                entries.remove(result)
                print (len(entries))
            try:
                for i in range(1,len(result)):
                    int(result[i]) == result[i]
                    pass
            except:
                entries.remove(result)
            
            if len(result) != 8:
                entries.remove(result)
            try:
                for i in range(1,len(result)):
                    if int(result[i]) > 39 or int(result[i]) < 1:
                        entries.remove(result)
            except:
                pass
            for i in range(1,len(result)):
                for j in range(1,len(result)):
                    if result[j] == result[i] and i != j:
                        entries.remove(result)
        
        print(len(entries))
        with open("correct_numbers.csv","w") as resultsfile:
            for entry in entries:
                resultsfile.write(f"{entry[0]};{entry[1]},{entry[2]},{entry[3]},{entry[4]},{entry[5]},{entry[6]},{entry[7]}\n")

I included the whole function, but it's lines 12 to 18 (the week-number try/except) that I'm focused on now. I think the AI gave me a different way to do it, but I'd also like to know what's wrong with this one. It actually misses some other errors too, and I'm struggling to learn from my mistakes.

  • Looks like week 2 is also invalid, as it only has 6 numbers whereas 7 seem to be required. Commented Aug 13, 2025 at 11:26

2 Answers


You’re hitting two separate issues:

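  1. You’re removing from the list you’re iterating over. Doing for result in entries: and then entries.remove(result) shifts every later element one place to the left, so the loop skips the element right after each removed one. Week 2 has only six numbers, so your length check removes it; “week x” then slides into its slot and is never checked. That’s exactly why the bad “week x” line survives.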

  2. Your week parsing/check is brittle:

    • lstrip("week ") does not remove the exact prefix; it removes any of those characters until a different one appears. Use startswith and slice or split.

    • int(result[0].lstrip("week ")) == result[0].lstrip("week ") compares an int to a str and its result is ignored anyway. If int(...) succeeds, that’s all you need; otherwise it will raise ValueError.

    • Bare except: hides real bugs. Catch ValueError explicitly.
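A small, self-contained demonstration of both pitfalls, using a made-up list of week labels rather than the real file (str.removeprefix is shown as one alternative and needs Python 3.9+):

```python
# Pitfall 1: removing from the list being iterated skips the next element.
weeks = ["week 1", "week 2", "week x", "week 4"]  # made-up labels
visited_mutating = []
for w in weeks:
    visited_mutating.append(w)
    if w == "week 2":      # pretend week 2 fails a check (like its length)
        weeks.remove(w)    # "week x" shifts into index 1; the loop continues at index 2
print(visited_mutating)    # ['week 1', 'week 2', 'week 4']

# Iterating over a copy (weeks[:]) visits every element.
weeks = ["week 1", "week 2", "week x", "week 4"]
visited_copy = []
for w in weeks[:]:
    visited_copy.append(w)
    if w == "week 2":
        weeks.remove(w)
print(visited_copy)        # ['week 1', 'week 2', 'week x', 'week 4']

# Pitfall 2: lstrip strips any run of the given characters, not a prefix.
print("week 5".lstrip("week "))         # 5
print("weekend 5".lstrip("week "))      # nd 5
print("week 5".removeprefix("week "))   # 5  (Python 3.9+)
```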

There are more inefficiencies:

  • Don’t replace(";", ","). Split once on ;, then split the right side on ,.

  • Convert numbers to int once and reuse.

  • Don’t repeatedly call entries.remove(result) throughout; validate then keep or drop.
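A sketch of splitting once and converting once. parse_entry is a hypothetical helper name, and the 1 to 39 range is assumed from the question's own code:

```python
def parse_entry(line):
    """Split 'week N;a,b,c,...' into the week label and a list of ints.

    Raises ValueError if there is no ';' (unpacking fails) or a number is not an integer.
    """
    week, numbers = line.strip().split(";", 1)      # split once on ';'
    values = [int(n) for n in numbers.split(",")]   # convert each number once
    return week, values

week, values = parse_entry("week 1;17,19,35,23,8,20,36\n")
print(week, values)  # week 1 [17, 19, 35, 23, 8, 20, 36]

# Later checks reuse the ints instead of calling int() again.
print(len(values) == 7, all(1 <= v <= 39 for v in values), len(set(values)) == len(values))
# True True True
```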

A compact, correct rewrite that keeps only valid rows:

def filter_incorrect(src="lottery_numbers.csv", dst="correct_numbers.csv"):
    good = []
    with open(src) as f:
        for line in f:
            line = line.strip()
            if not line or ";" not in line:
                continue

            left, right = line.split(";", 1)

            # week prefix and numeric week id
            prefix = "week "
            if not left.startswith(prefix):
                continue
            wk_str = left[len(prefix):]
            if not wk_str.isdigit():
                continue  # rejects 'x', '1a5', etc.

            nums = right.split(",")
            if len(nums) != 7:
                continue

            try:
                xs = [int(n) for n in nums]
            except ValueError:
                continue  # rejects non-integers like '15aa'

            if any(x < 1 or x > 39 for x in xs):
                continue
            if len(set(xs)) != 7:
                continue  # duplicates

            good.append((left, xs))

    with open(dst, "w") as out:
        for left, xs in good:
            out.write(f"{left};{','.join(map(str, xs))}\n")
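To check the rules without touching any files, the same validation can be pulled out into a pure function and run on the sample data from the question. This is a sketch: is_valid_line is a hypothetical name, it uses str.isdecimal instead of str.isdigit, and the 1 to 39 range is taken from the question's code.

```python
PREFIX = "week "

def is_valid_line(line, count=7, low=1, high=39):
    """True if line looks like 'week N;n1,...,n7' with distinct ints in [low, high]."""
    line = line.strip()
    if ";" not in line:
        return False
    left, right = line.split(";", 1)
    if not left.startswith(PREFIX) or not left[len(PREFIX):].isdecimal():
        return False  # rejects 'week x', 'week 1a5'
    try:
        xs = [int(n) for n in right.split(",")]
    except ValueError:
        return False  # rejects non-integers like '15aa'
    return (
        len(xs) == count                          # exactly 7 numbers
        and len(set(xs)) == count                 # no duplicates
        and all(low <= x <= high for x in xs)     # within range
    )

SAMPLE = """week 1;17,19,35,23,8,20,36
week 2;24,28,35,8,3,22
week x;23,29,38,1,35,18,25
week 4;21,2,22,14,4,28,38
week 5;5,37,20,15,3,14,9
week 6;6,29,7,14,16,18,1
week 7;24,31,14,23,4,3,29
week 8;32,21,26,1,15aa,14,17
week 9;8,13,25,12,33,34,35
week 10;29,27,30,13,7,38,26
week 11;34,3,7,24,16,20,38
week 12;15,28,2,29,16,10,8
week 13;32,22,13,14,21,28,26
week 14;37,4,20,3,1,33,10
week 1a5;17,8,38,18,9,32,25"""

rejected = [line for line in SAMPLE.splitlines() if not is_valid_line(line)]
rejected_weeks = [line.split(";")[0] for line in rejected]
print(rejected_weeks)  # ['week 2', 'week x', 'week 8', 'week 1a5']
```

Keeping the checks in a function like this means the file-handling code only decides what to do with each line, and the rules can be tested on their own.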

If you want to minimally patch your current approach without the full rewrite:

  • iterate over a copy: for result in entries[:]

  • replace lstrip("week ") with:

if not result[0].startswith("week "):
    entries.remove(result)
    continue
wk = result[0].split(" ", 1)[1]
try:
    int(wk)
except ValueError:
    entries.remove(result)
    continue

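But the first rewrite is simpler and avoids in-loop mutation altogether. With only these two patches, the later checks can still call entries.remove(result) on a row that is already gone (for example, the duplicate check matches each repeated number twice, as (i, j) and as (j, i)), and list.remove raises ValueError when the item is no longer in the list.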
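Why removing the same row twice is a problem, reproduced with a hypothetical row that repeats the number 6. The pair comprehension mirrors the question's nested duplicate loop:

```python
row = ["week 16", "1", "2", "3", "4", "5", "6", "6"]  # hypothetical row with a repeated number
entries = [row]

# Same condition as the nested i/j loop: both (6, 7) and (7, 6) match.
matches = [(i, j) for i in range(1, len(row)) for j in range(1, len(row))
           if row[i] == row[j] and i != j]
print(matches)  # [(6, 7), (7, 6)]

entries.remove(row)          # first match: removal succeeds
try:
    entries.remove(row)      # second match: the row is already gone
    second_removal_failed = False
except ValueError:
    second_removal_failed = True
print(second_removal_failed)  # True
```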


  • Waa, thank you for all the feedback and advice.

You're over-complicating this.

You should start by splitting each line on semicolon. You expect 2 tokens. The first should be a string of the form "week N". Check that it starts with "week" and that N is a valid integer.

The second token should be a string of 7 numbers separated by comma. Split that string (on comma) and check that all resulting tokens are integers and that they are all different.

You should also probably check that the numbers are all within a certain range (not implemented here as I have no idea what that range should be).

IN = "lottery_numbers.csv"
OUT = "correct_numbers.csv"

with open(IN) as source, open(OUT, "w") as target:
    for line in map(str.rstrip, source):
        try:
            week, numbers = line.split(";") # split on semicolon
            w, n = week.split() # split on whitespace
            if w == "week" and n.isdecimal():
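                ns = numbers.split(",") # split on comma
                xs = list(map(int, ns)) # int() raises ValueError for tokens like '15aa'
                if len(xs) == 7 and len(set(xs)) == 7: # exactly 7 numbers, all different
                    print(line, file=target)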
        except ValueError:
            # This exception could arise for one of 2 reasons in this case
            # 1. A string split may not result in the expected number of tokens (unpacking)
            # 2. A string may not convert to int
            pass
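If the range from the question's code (1 to 39) is the right one, here is a sketch of the same approach with that range check added, plus an explicit count check (len(set(...)) == 7 on its own would also accept eight numbers containing one duplicate). filter_lines is a hypothetical name; it works on any iterable of lines, so it can be tried on a list before pointing it at the file:

```python
def filter_lines(lines, low=1, high=39):
    """Yield lines of the form 'week N;n1,...,n7' with 7 distinct ints in [low, high].

    The default bounds 1..39 are an assumption taken from the question's code.
    """
    for line in map(str.rstrip, lines):
        try:
            week, numbers = line.split(";")   # ValueError if not exactly 2 tokens
            w, n = week.split()               # ValueError if not exactly 2 tokens
            if w == "week" and n.isdecimal():
                xs = list(map(int, numbers.split(",")))  # ValueError for '15aa'
                if (len(xs) == 7 and len(set(xs)) == 7
                        and all(low <= x <= high for x in xs)):
                    yield line
        except ValueError:
            pass  # malformed line: skip it

sample = [
    "week 1;17,19,35,23,8,20,36\n",
    "week 2;24,28,35,8,3,22\n",
    "week x;23,29,38,1,35,18,25\n",
    "week 8;32,21,26,1,15aa,14,17\n",
    "week 16;1,2,3,4,5,6,40\n",   # hypothetical out-of-range row
]
print(list(filter_lines(sample)))  # ['week 1;17,19,35,23,8,20,36']
```

With the IN and OUT files opened as above, it could be used as target.writelines(line + "\n" for line in filter_lines(source)).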

Note the use of str.isdecimal rather than str.isdigit. This is deliberate: n.isdigit() can be True for strings that cannot be converted to a Python int, such as the superscript digit "²".
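A quick check of the difference between the two string methods, using the superscript two character as the example:

```python
for s in ["15", "\u00b2", "1a5"]:  # "\u00b2" is the superscript two, '²'
    print(repr(s), s.isdigit(), s.isdecimal())
# '15' True True
# '²' True False
# '1a5' False False

try:
    int("\u00b2")
    int_failed = False
except ValueError:
    int_failed = True
print(int_failed)  # True
```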
