In this python script that checks for integer values, why is one entry not removed? It's the letter x. Works for every other line [closed]

Question


This script takes lines from a file of lottery numbers, and is supposed to check for errors. For example, in the following:

week 1;17,19,35,23,8,20,36
week 2;24,28,35,8,3,22
week x;23,29,38,1,35,18,25
week 4;21,2,22,14,4,28,38
week 5;5,37,20,15,3,14,9
week 6;6,29,7,14,16,18,1
week 7;24,31,14,23,4,3,29
week 8;32,21,26,1,15aa,14,17
week 9;8,13,25,12,33,34,35
week 10;29,27,30,13,7,38,26
week 11;34,3,7,24,16,20,38
week 12;15,28,2,29,16,10,8
week 13;32,22,13,14,21,28,26
week 14;37,4,20,3,1,33,10
week 1a5;17,8,38,18,9,32,25

Weeks 3 and 15 should be removed for the errors in week number.

I used an integer check to do this, and that worked with every week except week 3. I added a "print True" step, and it's not printed, but then also not removed. I asked AI, which said it was because it didn't bring up an error, but it seems to, and I get the same error result for integer checks with 'x' and '1a5'.

def filter_incorrect():
    with open("lottery_numbers.csv") as sourcefile:
        entries = []
        for line in sourcefile:
            line = line.replace("\n","")
            line = line.replace(";",",")
            parts = line.split(",")
            entries.append(parts)
        for result in entries:
            
            try:
                int(result[0].lstrip("week ")) == result[0].lstrip("week ")
                print(result[0] + " True")
                pass
            except:
                entries.remove(result)
                print (len(entries))
            try:
                for i in range(1,len(result)):
                    int(result[i]) == result[i]
                    pass
            except:
                entries.remove(result)
            
            if len(result) != 8:
                entries.remove(result)
            try:
                for i in range(1,len(result)):
                    if int(result[i]) > 39 or int(result[i]) < 1:
                        entries.remove(result)
            except:
                pass
            for i in range(1,len(result)):
                for j in range(1,len(result)):
                    if result[j] == result[i] and i != j:
                        entries.remove(result)
        
        print(len(entries))
        with open("correct_numbers.csv","w") as resultsfile:
            for entry in entries:
                resultsfile.write(f"{entry[0]};{entry[1]},{entry[2]},{entry[3]},{entry[4]},{entry[5]},{entry[6]},{entry[7]}\n")

I included the whole thing but it's just lines 12 to 18 that I think I'm looking at now. I think AI gave me a different way to do it, but I'd also like to know what's wrong with this. Actually it misses some other errors, and I'm struggling to learn from my mistakes.

Looks like week 2 is also invalid, as it only has 6 numbers whereas 7 seems to be a prerequisite.
– jackal, Commented Aug 13, 2025 at 11:26

Dmitry543 · Accepted Answer · 2025-08-13 10:51:29Z

You’re hitting two separate issues:

1. You’re removing from the list you’re iterating over. With for result in entries: and entries.remove(result) inside the loop, every removal shifts the remaining elements one position to the left, so the loop skips the element that immediately follows the removed one. That’s exactly why the bad “week x” line survives: the row just before it, week 2, is removed (it has only 6 numbers), so “week x” slides into week 2’s slot and is never checked.
2. Your week parsing/check is brittle:
   - lstrip("week ") does not remove the exact prefix; it treats its argument as a set of characters and strips any of them from the left until a different character appears. Use startswith and slice, or split.
   - int(result[0].lstrip("week ")) == result[0].lstrip("week ") compares an int to a str, and the result of the comparison is discarded anyway. If int(...) succeeds, that’s all you need; otherwise it raises ValueError.
   - A bare except: hides real bugs. Catch ValueError explicitly.
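Both points can be reproduced in isolation. The following is a minimal sketch, not the asker's code: the helper name `remove_while_iterating` and the shortened week labels are made up for illustration, with "week 2" standing in for the row that has too few numbers.

```python
def remove_while_iterating(items, bad):
    """Mimic the question's pattern: remove bad items while looping over the same list."""
    items = list(items)   # private copy so the caller's list is untouched
    visited = []          # records which items the loop actually looked at
    for item in items:
        visited.append(item)
        if item in bad:
            # Removal shifts later items one slot left, so the item that
            # followed the removed one is never visited by this loop.
            items.remove(item)
    return items, visited


weeks = ["week 1", "week 2", "week x", "week 4"]
remaining, visited = remove_while_iterating(weeks, bad={"week 2", "week x"})
print(remaining)  # ['week 1', 'week x', 'week 4']
print(visited)    # ['week 1', 'week 2', 'week 4']

# lstrip strips a set of characters, not a prefix:
print("weekend 3".lstrip("week "))  # 'nd 3'
```

"week x" is never visited, which matches the symptom in the question: no "True" is printed for it and it is not removed either.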

There are more inefficiencies:

- Don’t replace(";", ","). Split once on ;, then split the right side on ,.
- Convert numbers to int once and reuse.
- Don’t repeatedly call entries.remove(result) throughout; validate then keep or drop.

A compact, correct rewrite that keeps only valid rows:

def filter_incorrect(src="lottery_numbers.csv", dst="correct_numbers.csv"):
    good = []
    with open(src) as f:
        for line in f:
            line = line.strip()
            if not line or ";" not in line:
                continue

            left, right = line.split(";", 1)

            # week prefix and numeric week id
            prefix = "week "
            if not left.startswith(prefix):
                continue
            wk_str = left[len(prefix):]
            if not wk_str.isdigit():
                continue  # rejects 'x', '1a5', etc.

            nums = right.split(",")
            if len(nums) != 7:
                continue

            try:
                xs = [int(n) for n in nums]
            except ValueError:
                continue  # rejects non-integers like '15aa'

            if any(x < 1 or x > 39 for x in xs):
                continue
            if len(set(xs)) != 7:
                continue  # duplicates

            good.append((left, xs))

    with open(dst, "w") as out:
        for left, xs in good:
            out.write(f"{left};{','.join(map(str, xs))}\n")

If you want to minimally patch your current approach without the full rewrite:

- Iterate over a copy: for result in entries[:]
- Replace the lstrip("week ") check with:

if not result[0].startswith("week "):
    entries.remove(result); continue
wk = result[0].split(" ", 1)[1]
try:
    int(wk)
except ValueError:
    entries.remove(result); continue

But the first rewrite is simpler and avoids in-loop mutation altogether.
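A middle ground between the patch and the full rewrite is to keep the question's `parts` layout (week label first, then the number tokens) and move every check into one predicate. This is only a sketch: the name `is_valid_entry` is hypothetical, and the 1 to 39 range and the count of 7 are assumptions read off the code in the question.

```python
def is_valid_entry(parts, count=7, low=1, high=39):
    """Return True if parts looks like ['week N', n1, ..., n7] with valid values."""
    if len(parts) != count + 1:
        return False  # wrong number of fields
    label = parts[0]
    if not label.startswith("week ") or not label[len("week "):].isdecimal():
        return False  # rejects 'week x', 'week 1a5'
    try:
        numbers = [int(token) for token in parts[1:]]
    except ValueError:
        return False  # rejects tokens such as '15aa'
    in_range = all(low <= n <= high for n in numbers)
    return in_range and len(set(numbers)) == count  # no duplicates


lines = [
    "week 1;17,19,35,23,8,20,36",
    "week 2;24,28,35,8,3,22",
    "week x;23,29,38,1,35,18,25",
    "week 8;32,21,26,1,15aa,14,17",
]
entries = [line.replace(";", ",").split(",") for line in lines]
# Build a new list instead of removing from the one being iterated.
valid = [parts for parts in entries if is_valid_entry(parts)]
print([parts[0] for parts in valid])  # ['week 1']
```

Because the list comprehension builds a new list, nothing is removed during iteration and each row is judged exactly once.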

jackal · Accepted Answer · 2025-08-13 16:44:13Z

You're over-complicating this.

You should start by splitting each line on semicolon. You expect 2 tokens. The first should be a string of the form "week N". Check that it starts with "week" and that N is a valid integer.

The second token should be a string of 7 numbers separated by comma. Split that string (on comma) and check that all resulting tokens are integers and that they are all different.

You should also probably check that the numbers are all within a certain range (not implemented here as I have no idea what that range should be).

IN = "lottery_numbers.csv"
OUT = "correct_numbers.csv"

with open(IN) as source, open(OUT, "w") as target:
    for line in map(str.rstrip, source):
        try:
            week, numbers = line.split(";") # split on semicolon
            w, n = week.split() # split on whitespace
            if w == "week" and n.isdecimal():
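                ns = numbers.split(",") # split on comma
                # require exactly 7 tokens, all of them distinct integers
                # (the set check alone would accept 8 tokens with one duplicate)
                if len(ns) == 7 and len(set(map(int, ns))) == 7:
                    print(line, file=target)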
        except ValueError:
            # This exception could arise for one of 2 reasons in this case
            # 1. A string split may not result in the expected number of tokens (unpacking)
            # 2. A string may not convert to int
            pass

Note the use of str.isdecimal rather than str.isdigit. This is deliberate: if n.isdigit() is True, n cannot necessarily be converted to a Python int. For example, "²".isdigit() is True, but int("²") raises ValueError.
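The difference between the two string methods can be checked directly. A small sketch follows; the helper `int_or_none` is a made-up name used only to keep the comparison on one line per sample.

```python
def int_or_none(text):
    """Return int(text), or None if the conversion raises ValueError."""
    try:
        return int(text)
    except ValueError:
        return None


# Each value is (isdigit, isdecimal, converted value or None).
samples = ["15", "²", "x"]
report = {s: (s.isdigit(), s.isdecimal(), int_or_none(s)) for s in samples}

print(report["15"])  # (True, True, 15)
print(report["²"])   # (True, False, None)
print(report["x"])   # (False, False, None)
```

The superscript two passes isdigit but cannot be converted, so guarding int() with isdigit alone would still let a ValueError through.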
