2

I have Python code in which I am reading data from MongoDB and insert it into a table in Oracle database. I have removed some parts of the code (importing libraries, connecting to MongoDB and Oracle, etc.) for the sake of simplicity:

from_date_sms_journals = " select nvl(max(s.creation_date_ms),1710793800939)\
                           from sms_journals s "
oracle_cursor.execute(from_date_sms_journals)
for row in oracle_cursor.fetchall():
    from_date = int(row[1])


# today
to_date_sms_journals = 1764102600000
oracle_cursor.execute(to_date_sms_journals)
for row in oracle_cursor.fetchall():
    to_date = int(row[0])


fields = {"_id": 1,
          "body": 1,
          "recipient": 1,
          }


sms_insert_query = """ insert /*+ APPEND */ into sms_journals \
                      (S_ID,S_BODY,RECIPIENT,gd_creation_date)\
              values (:1, :2, :3)"""


batch_size = 10000
counter = 0


print(f'-----------------------------New 2 - Insert Data to Oracle : START at {update}')
sms_journals_query = mycol_act.find({"date": {"$gt": from_date, "$lte": to_date}}, projection=fields,batch_size=batch_size)


rows_to_insert = []
for doc in sms_journals_query:
    s_id = str(doc["_id"])
    s_body = str(doc.get("body", ''))
    recipient = str(doc.get("recipient"))
    gd_creation_date = str(datetime.datetime.fromtimestamp(float(doc["date"]) / 1000).astimezone(pytz.timezone('Asia/Tehran')).strftime('%d-%b-%y'))
    rows_to_insert.append((s_id, s_body, recipient,gd_creation_date ))
 
    counter += 1
    if counter % batch_size == 0:
        oracle_cursor.executemany(sms_insert_query, rows_to_insert)
        oracleDB.commit()
        rows_to_insert = []
if rows_to_insert:
    oracle_cursor.executemany(sms_insert_query, rows_to_insert)
    oracleDB.commit()
print("Insert data finished!")

The code works correctly and insert data from date > from_date to date <= to_date and there are no duplicates and the count of records match the source table. The weird thing is, it seems like this code inserts data from to_date to from_date, by which I mean it starts from last date (to_date). When I check the destination table in Oracle (during the execution of the code with select max(gd_creation_date) from table), the first date that is inserted is not the 'from_date', it is 'to_date'. Imagine the start is 2025/01/01 and the end is 2025/01/15. Then the code works in reverse order. The first date that is being inserted is 2025/01/15.

What is happening?

2
  • The order should not matter, since rows in a table in oracle are not inherently ordered. When data is selected from the table, the ORDER BY clause in the SELECT statement determines in what order the data is represented. If no order by clause is set, the order is indeterminate Commented Nov 26 at 13:53
  • @KoenLostrie, the table is empty at first. During the execution, I select the table select max(date) from table and the result is the last date, it must show the first date and then increments but I see the to_date ! Commented Nov 26 at 14:01

2 Answers 2

2

As mentioned, like you need an ORDER BY to guarantee sorted results, you need a .sort() in MongoDB if you want guaranteed results.

Now, if you don't need specific ordering and you just wonder why, I guess you have an index on {from_date: -1}. In the execution plan you probably see something like:

direction: 'forward', indexBounds: { from_date: [ '[inf, 1710793800939)' ] }

If you add a sort ({from_date: 1}), you will see:

direction: 'backward', indexBounds: { from_date: [ '(1710793800939, inf]' ] }

The performance of forward scan is slightly better, so that's who the query planner chooses that.

Additional note: you are using the /*+ APPEND */ hint to insert into Oracle. I think this hint applies only to the INSERT ... SELECT statements. If you want to do that here, you should use /*+ APPEND_VALUES */ but be careful that it locks the table - may be ok if there's no concurrent inserts - and may waste some space

Sign up to request clarification or add additional context in comments.

1 Comment

Awesome, I did not know that. I just learned a new thing. As you said I got the plan and I saw exactly this : "indexBounds": { "date": [ "(1761424200000.0, 1761165000000.0]" ] which shows that the index is in reverse order and the reason may be because the most recent records (documents) are more important than the older.
2

As mentioned in the comments, when you want the records in a specific order, use an ORDER BY clause. This also applies to MongoDB. Your find query to get the records in the required date range does not specify the sort order:

mycol_act.find({"date": {"$gt": from_date, "$lte": to_date}}, projection=fields,batch_size=batch_size)

So the correct records (documents) get selected but the order of records is not guaranteed (random or as stored on disk).

Add a sort= clause in find or use sort on the cursor:

# sort clause in find
sms_journals_query = mycol_act.find(
    {"date": {"$gt": from_date, "$lte": to_date}},
    projection=fields,
    batch_size=batch_size,
    sort={"date": 1},  # or `{"date": pymongo.ASCENDING}`
)

# sort on cursor
sms_journals_query = mycol_act.find(
    {"date": {"$gt": from_date, "$lte": to_date}},
    projection=fields,
    batch_size=batch_size,
).sort({"date": 1})

Ascending order (or 1) for dates means oldest first (2025/01/01 before 2025/01/15).

Comments

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.