I retrieve some data from my MySQL database. One column holds a date (not a datetime) and the other columns hold unrelated data. Let's say dtf is my DataFrame. It has no index yet, so I set one:

dtf.set_index('date', inplace=True)

Now I would like to get data from a specific date, so I write for example:

dtf.loc['2000-01-03']

or just

dtf['2000-01-03']

This gives me a KeyError:

KeyError: '2000-01-03'

But I know it's in there; dtf.head() shows me that.
So I took a look at the type of the index of the first row:

type(dtf.index[0])

and it tells me: datetime.date. All good. Now if I just type

dtf.index

the output is

Index([2000-01-03, 2000-01-04, 2000-01-05, 2000-01-06, 2000-01-07, 2000-01-10,
       2000-01-11, 2000-01-12, 2000-01-13, 2000-01-14,
       ...
       2015-09-09, 2015-09-10, 2015-09-11, 2015-09-14, 2015-09-15, 2015-09-16,
       2015-09-17, 2015-09-18, 2015-09-21, 2015-09-22],
       dtype='object', name='date', length=2763)

I am a bit confused about the dtype='object'. Shouldn't this read datetime.date?

If I use datetime in my MySQL table instead of date, everything works like a charm. Is this a bug or a feature? I would really like to use datetime.date because it describes my data best.

My pandas version is 0.17.0.
I am using Python 3.5.0.
My OS is Arch Linux.

2 Answers

You should use datetime64/Timestamp rather than datetime.datetime:

dtf.index = pd.to_datetime(dtf.index)

will mean you have a DatetimeIndex and can do nifty things like loc by strings.

dtf.loc['2000-01-03']

You won't be able to do that with datetime.datetime.
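
To make the difference concrete, here is a minimal sketch. It assumes a small hand-built frame standing in for the data loaded from MySQL (the column 'A' and the five dates are made up for illustration), and it has only been thought through against recent pandas releases, not 0.17.0.

```python
import datetime
import pandas as pd

# Hypothetical stand-in for the frame loaded from MySQL:
# a 'date' column holding datetime.date objects plus one data column.
dates = [datetime.date(2000, 1, d) for d in range(1, 6)]
dtf = pd.DataFrame({'date': dates, 'A': range(5)})
dtf.set_index('date', inplace=True)

# pandas has no dedicated dtype for datetime.date, so the index is a generic object Index.
index_dtype_before = str(dtf.index.dtype)

# A string label is not equal to any datetime.date label, so the lookup raises KeyError.
try:
    dtf.loc['2000-01-03']
    string_lookup_failed = False
except KeyError:
    string_lookup_failed = True

# Converting the index yields a DatetimeIndex, which understands date strings.
dtf.index = pd.to_datetime(dtf.index)
row = dtf.loc['2000-01-03']
window = dtf.loc['2000-01-03':'2000-01-05']
```

After the conversion, both the single-date lookup and the string slice resolve against the DatetimeIndex instead of raising.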

2 Comments

Also, note numpy only has special support for a few dtypes, not one for every type of object. Generally these are ones where it's more efficient to pack them as a C-array of that data type, rather than with object dtype where it's a C-array of pointers to python objects (which will always be slower).
Thanks for your answer, but using datetime.datetime works perfectly well and I can do the query you tell me I can not. Furthermore I do not have a problem with datetime.datetime, but with datetime.DATE. I tried your solution but it did not change anything. I get the same KeyError :( Any other ideas?
When you convert df.index to dtype datetime64 using pd.to_datetime, each element of the index becomes a pandas Timestamp, which is a subclass of datetime.datetime. You can verify:

import datetime
import pandas as pd

# sample data: an index made of datetime.date objects
df = pd.DataFrame({'A': range(5)}, index=pd.date_range('2000-01-01', '2000-01-05', periods=5).date)

df.index = pd.to_datetime(df.index)
isinstance(df.index[0], datetime.datetime)       # True

As Andy Hayden mentioned, once you convert the index into datetime64, you can do the sort of indexing OP wants, such as

df.loc['2000-01-03']
# or for range of dates
df.loc['2000-01-03':'2000-01-05']

Besides, when every timestamp falls at midnight (00:00:00), the time part is not rendered even though the dtype is datetime64, so visually the index looks exactly like an index of plain dates.
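
As a rough illustration of that point, the sketch below builds a two-row frame (the values are made up) and checks that the printed frame carries no time component, then recovers plain datetime.date objects through the index's .date attribute for the cases where they are needed.

```python
import datetime
import pandas as pd

# Hypothetical two-row frame whose index started out as datetime.date objects.
idx = pd.to_datetime([datetime.date(2000, 1, 3), datetime.date(2000, 1, 4)])
df = pd.DataFrame({'A': [10, 20]}, index=idx)

# Every timestamp is at midnight, so the printed frame shows only the date part.
rendered = str(df)
shows_time_part = '00:00:00' in rendered

# DatetimeIndex.date gives back plain datetime.date objects.
plain_dates = list(df.index.date)
```

So the DatetimeIndex can be kept for lookups while datetime.date values remain one attribute access away.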

That said, if you want to use datetime.date, you can still do so by explicitly using datetime.date. For example, to select values on 2000-01-03, you can use either loc or query:

df = pd.DataFrame({'A': range(5)}, index=pd.date_range('2000-01-01', '2000-01-05', periods=5).date)

df.loc[datetime.date(2000, 1, 3)]
# or
df.query("index == @datetime.date(2000, 1, 3)")

If you need to select a range of dates between dates, query is very convenient (or between works too):

date1 = datetime.date(2000, 1, 3)
date2 = datetime.date(2000, 1, 5)

df.query("@date1 <= index <= @date2")
# or
df[df.index.to_series().between(date1, date2)]
