50

Considering a pandas dataframe in python having a column named time of type integer, I can convert it to a datetime format with the following instruction.

df['time'] = pandas.to_datetime(df['time'], unit='s')

The column now has entries like 2019-01-15 13:25:43.

What is the command to convert the datetime back to an integer timestamp (the number of seconds elapsed since 1970-01-01 00:00:00)?

I checked pandas.Timestamp but could not find a conversion utility and I was not able to use pandas.to_timedelta for this.

Is there any utility for this conversion?


5 Answers

47

You can cast the column with astype(int), which gives nanoseconds since the Unix epoch, and divide by 10**9 to get the number of seconds since the epoch.

import pandas as pd

df = pd.DataFrame({'time': [pd.to_datetime('2019-01-15 13:25:43')]})
# astype(int) yields nanoseconds since the epoch; true division by 10**9 gives seconds as float64
df_unix_sec = pd.to_datetime(df['time']).astype(int) / 10**9
print(df_unix_sec)
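If an integer result is wanted, floor division avoids the float. The sketch below is only an illustration: the second sample date and the explicit cast to datetime64[ns] are my own additions (the cast is an assumption made so that the 10**9 divisor is valid whatever resolution pandas picked for the column).

```python
import pandas as pd

df = pd.DataFrame(
    {"time": pd.to_datetime(["2019-01-15 13:25:43", "1970-01-01 00:00:01"])}
)

# Pin the resolution to nanoseconds so that dividing by 10**9 really yields seconds.
ns_since_epoch = df["time"].astype("datetime64[ns]").astype("int64")

# Floor division keeps the integer dtype; true division (/) would give float64.
unix_sec = ns_since_epoch // 10**9
print(unix_sec.tolist())

# Going back the other way uses the same call as in the question.
restored = pd.to_datetime(unix_sec, unit="s")
print(restored.tolist())
```

Note that on pandas 1.3 to 1.5 the astype("int64") step may emit a FutureWarning.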

9 Comments

This would be fantastic, but it's not giving the expected result. I tried the following lines: df = pd.DataFrame({'time': [pd.to_datetime('2019-01-15 13:25:43')]}) df['time'] = pandas.to_datetime(df['time'], unit='s', origin='unix'). It does not return any error, but I cannot see any change in the column.
Psst, casting to int is in my answer ;-)
@FrancescoBoi actually, I initially misunderstood the to_datetime parameters. Have a look: I also asked a question on SO here: stackoverflow.com/questions/54313463/…. So if you cast it to int, it'll work for you :)
Well, if you can just add that you need to divide by 10 ** 9 to get a Unix timestamp, I'll just delete my answer then.
Since I was getting a float after dividing by 10**9, I think it is better to add another cast: res = (pd.to_datetime(df['time'], unit='s').astype(int)/10**9).astype(int)
41

The easiest and fastest way is to use .view(int):

df['time'] = df['time'].view(int) // 10**9

Other options:

df['time'] = df['time'].apply(lambda x: x.value) // 10**9
df['time'] = df['time'].astype(int) // 10**9

Using %%timeit on 1000 dates I measured:

  • .view: 119 µs ± 998 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
  • .astype: 129 µs ± 676 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
  • .apply: 629 µs ± 5.38 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
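For anyone who wants to reproduce this kind of comparison outside a notebook, here is a rough sketch using the standard-library timeit module instead of the %%timeit magic. The sample data, the helper names (with_astype, with_apply) and the number of calls are my own assumptions, not the setup behind the figures above, and .view is left out because newer pandas versions may warn about it or no longer offer it.

```python
import timeit

import pandas as pd

# 1000 sample dates (made-up data), pinned to nanosecond resolution.
dates = pd.Series(pd.date_range("2019-01-01", periods=1000, freq="D"))
dates = dates.astype("datetime64[ns]")


def with_astype():
    # Vectorized cast to nanoseconds, then floor division down to seconds.
    return dates.astype("int64") // 10**9


def with_apply():
    # Element-wise: Timestamp.value is the nanosecond count since the epoch.
    return dates.apply(lambda x: x.value) // 10**9


N_CALLS = 200  # arbitrary; raise it for steadier numbers
for fn in (with_astype, with_apply):
    total = timeit.timeit(fn, number=N_CALLS)
    print(f"{fn.__name__}: {total / N_CALLS * 1e6:.1f} us per call")
```

Absolute numbers will differ by machine and pandas version, so only the relative ordering is worth comparing.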

 

5 Comments

.astype(int) was faster for my dask.compute(): df[col] = df[col].apply(lambda x: x.value, meta=(col, int)) # 1.0s vs df[col] = df[col].astype(int) # 0.8s
@WestonA.Greene for such a small time difference, you'd want to run each command over several iterations and calculate the mean and standard error for each estimate. Or test on a much larger set of data.
@DryLabRebel exactly. You can check this using %%timeit in a notebook. I will share the results in the answer.
I just updated my answer with the best option and measured run times. I actually came up with a faster solution than .astype!
Thank you DryLabRebel and Ignacio! If you have the time, Ignacio, including in your post the full %%timeit code would help other beginners like me more quickly perform testing in future speed related questions.
9

Use .dt.total_seconds() on a timedelta64:

import pandas as pd
df = pd.DataFrame({'time': [pd.to_datetime('2019-01-15 13:25:43')]})

# pd.to_timedelta(df.time).dt.total_seconds() # Is deprecated
(df.time - pd.to_datetime('1970-01-01')).dt.total_seconds()

Output

0    1.547559e+09
Name: time, dtype: float64
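If integers are preferred over the float64 output above, the same subtraction can be floor-divided by a one-second Timedelta. This is a small sketch of that variant, assuming a timezone-naive column; the variable names are mine.

```python
import pandas as pd

df = pd.DataFrame({"time": [pd.to_datetime("2019-01-15 13:25:43")]})

epoch = pd.Timestamp("1970-01-01")

# timedelta64 // Timedelta counts whole seconds and returns an integer Series.
unix_sec = (df["time"] - epoch) // pd.Timedelta("1s")
print(unix_sec)
```

Because the division is done on timedeltas, this does not depend on the column being stored in nanoseconds.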


9

One can also use .view(...):

import pandas as pd
df = pd.DataFrame({'time': [pd.to_datetime('2019-01-15 13:25:43')]})
df_unix_sec = pd.to_datetime(df['time']).view(int) // 10 ** 9
print(df_unix_sec)

Casting with .astype(int), as recommended above, is deprecated as of pandas 1.3.0 and emits a warning:

FutureWarning: casting datetime64[ns] values to int64 with .astype(...) is deprecated and will raise in a future version. Use .view(...) instead.
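Which of the two calls warns seems to depend on the pandas version: as far as I know, Series.view was itself deprecated later (pandas 2.2) in favor of astype. A sketch that sidesteps both Series methods by converting on the underlying NumPy array, assuming the column is timezone-naive:

```python
import pandas as pd

df = pd.DataFrame({"time": [pd.to_datetime("2019-01-15 13:25:43")]})

# For a timezone-naive column, to_numpy() returns a NumPy datetime64 array.
values = df["time"].to_numpy()

# Cast to second resolution first, then reinterpret the counts as plain integers.
seconds = values.astype("datetime64[s]").astype("int64")

unix_sec = pd.Series(seconds, index=df.index, name="time")
print(unix_sec)
```

Casting to datetime64[s] drops any sub-second part instead of rounding it.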


3

As @Ignacio recommends, this is what I am using to cast to integer:

df['time'] = df['time'].apply(lambda x: x.value)

Then, to get it back:

df['time'] = df['time'].apply(pd.Timestamp)
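One thing to keep in mind is that x.value is in nanoseconds, not seconds. A short round-trip sketch (variable names are mine) that shows both units and uses the vectorized pd.to_datetime for the way back:

```python
import pandas as pd

df = pd.DataFrame({"time": [pd.to_datetime("2019-01-15 13:25:43")]})
original = df["time"].copy()

# Timestamp.value is the number of nanoseconds since the Unix epoch.
as_ns = df["time"].apply(lambda x: x.value)

# Floor-divide to get whole seconds, which is what the question asks for.
as_sec = as_ns // 10**9

# The unit argument tells to_datetime how to interpret the integers.
back_from_ns = pd.to_datetime(as_ns, unit="ns")
back_from_sec = pd.to_datetime(as_sec, unit="s")
print(as_sec.tolist(), back_from_sec.tolist())
```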

