Century handling in Pandas
I have the following data in one of my columns:
df['DOB']
0 01-01-84
1 31-07-85
2 24-08-85
3 30-12-93
4 09-12-77
5 08-09-90
6 01-06-88
7 04-10-89
8 15-11-91
9 01-06-68
Name: DOB, dtype: object
I want to convert this to a datetime column.
I tried the following:
print(pd.to_datetime(df['DOB']))
0 1984-01-01
1 1985-07-31
2 1985-08-24
3 1993-12-30
4 1977-09-12
5 1990-08-09
6 1988-01-06
7 1989-04-10
8 1991-11-15
9 2068-01-06
Name: DOB, dtype: datetime64[ns]
What should I change in my code to get 1968-01-06 instead of 2068-01-06?
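The 2068 is the usual two-digit-year pivot at work. Python's `strptime` documents it (00 to 68 become 2000 to 2068, 69 to 99 become 1969 to 1999), and pandas applies the same rule when given an explicit `%y` format. A small demonstration on two sample strings:

```python
from datetime import datetime

import pandas as pd

# Python's strptime maps two-digit years 00-68 to 2000-2068 and 69-99 to 1969-1999
for s in ["01-06-68", "01-06-69"]:
    print(s, "->", datetime.strptime(s, "%d-%m-%y").date())

# pandas follows the same pivot when given an explicit %y format
years = pd.to_datetime(pd.Series(["01-06-68", "01-06-69"]), format="%d-%m-%y").dt.year
print(years.tolist())  # [2068, 1969]
```

Passing an explicit format also settles the day/month order: `01-06-68` is read as 1 June, whereas the format-less parse above read it month-first as 6 January.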
python pandas
asked Apr 18 at 5:46 by Madan
5 Answers
In this specific case, I would use this:
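pd.to_datetime(df['DOB'].str[:-2] + '19' + df['DOB'].str[-2:], format='%d-%m-%Y')
Note that this assumes every birth year is in the 1900s; it will give wrong dates for anyone born in 2000 or later. The explicit format also makes pandas read the day first, so 01-06-68 becomes 1 June 1968 rather than 6 January.
Output:
0 1984-01-01
1 1985-07-31
2 1985-08-24
3 1993-12-30
4 1977-12-09
5 1990-09-08
6 1988-06-01
7 1989-10-04
8 1991-11-15
9 1968-06-01
dtype: datetime64[ns]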
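If the file might one day contain births from 2000 onward, one way to relax the all-1900s assumption is a pivot year: two-digit years above the pivot go to the 1900s, the rest to the 2000s. A minimal sketch; `parse_dob` and the default pivot of 19 are illustrative choices, not part of pandas:

```python
import pandas as pd


def parse_dob(s: pd.Series, pivot: int = 19) -> pd.Series:
    """Parse 'DD-MM-YY' strings with an explicit century.

    Two-digit years greater than `pivot` are placed in the 1900s, the rest in
    the 2000s. `parse_dob` and the default pivot are hypothetical, not a pandas API.
    """
    yy = s.str[-2:].astype(int)
    # "20" where yy <= pivot, otherwise "19"
    century = pd.Series("20", index=s.index).where(yy <= pivot, "19")
    # insert the century before the last two characters: "01-06-68" -> "01-06-1968"
    full = s.str[:-2] + century + s.str[-2:]
    return pd.to_datetime(full, format="%d-%m-%Y")


dob = pd.Series(["01-06-68", "31-07-85", "15-03-05"])
print(parse_dob(dob))
```

The pivot has to be chosen from knowledge of the data; two-digit years alone cannot tell 1905 from 2005.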
Getting error: series not defined. I hope that was a typo and I should use the column name.
– Madan
Apr 18 at 6:27
@Madan Yup, I wanted to change my answer to fit the question and forgot to modify the second reference. Fixed.
– gmds
Apr 18 at 6:36
@jezrael Yup, will edit question to specify that clearly
– gmds
Apr 18 at 6:38
Thanks @jezrael. I will not get dates with year > 1999 in my file.
– Madan
Apr 18 at 6:38
answered Apr 18 at 6:12 by gmds (edited Apr 18 at 6:39)
You can first convert to datetimes and then, wherever the year is 2020 or later, subtract 100 years with DateOffset:
df['DOB'] = pd.to_datetime(df['DOB'], format='%d-%m-%y')
df.loc[df['DOB'].dt.year >= 2020, 'DOB'] -= pd.DateOffset(years=100)
# equivalent to:
# mask = df['DOB'].dt.year >= 2020
# df.loc[mask, 'DOB'] = df.loc[mask, 'DOB'] - pd.DateOffset(years=100)
print(df)
DOB
0 1984-01-01
1 1985-07-31
2 1985-08-24
3 1993-12-30
4 1977-12-09
5 1990-09-08
6 1988-06-01
7 1989-10-04
8 1991-11-15
9 1968-06-01
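The `.loc` line in the first solution works in two steps: a boolean mask selects the rows whose parsed year is 2020 or later, and only those rows are shifted back 100 years by `pd.DateOffset`. The same logic written with `Series.mask`, on a small made-up sample:

```python
import pandas as pd

s = pd.to_datetime(pd.Series(["01-06-68", "31-07-85"]), format="%d-%m-%y")
# %y maps 68 to 2068 and 85 to 1985
too_late = s.dt.year >= 2020  # boolean mask: [True, False]
# replace only the masked entries with the date 100 years earlier
fixed = s.mask(too_late, s - pd.DateOffset(years=100))
print(fixed)
```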
Alternatively, prepend 19 or 20 to the year with Series.str.replace and choose between the two results with numpy.where, depending on a condition.
Note: this also handles two-digit years 00 to 20, mapping them to 2000 to 2020.
import numpy as np
s1 = df['DOB'].str.replace(r'-(\d+)$', r'-19\1', regex=True)
s2 = df['DOB'].str.replace(r'-(\d+)$', r'-20\1', regex=True)
mask = df['DOB'].str[-2:].astype(int) <= 20
df['DOB'] = pd.to_datetime(np.where(mask, s2, s1), format='%d-%m-%Y')
print(df)
DOB
0 1984-01-01
1 1985-07-31
2 1985-08-24
3 1993-12-30
4 1977-12-09
5 1990-09-08
6 1988-06-01
7 1989-10-04
8 1991-11-15
9 1968-06-01
If all years are before 2000:
s1 = df['DOB'].str.replace(r'-(\d+)$', r'-19\1', regex=True)
df['DOB'] = pd.to_datetime(s1, format='%d-%m-%Y')
print(df)
DOB
0 1984-01-01
1 1985-07-31
2 1985-08-24
3 1993-12-30
4 1977-12-09
5 1990-09-08
6 1988-06-01
7 1989-10-04
8 1991-11-15
9 1968-06-01
Can you please explain this line: df.loc[df['DOB'].dt.year >= 2020, 'DOB'] -= pd.DateOffset(years=100)
– Madan
Apr 18 at 6:25
@Madan - first convert the values to datetimes, and then, for years of 2020 or later, subtract 100 years with DateOffset
– jezrael
Apr 18 at 6:27
answered Apr 18 at 5:48 by jezrael (edited Apr 18 at 6:28)
Another solution is to treat the DOB as a date, and take it back to the previous century only if it is in the future (i.e. after "now"). Example:
import pandas as pd
from datetime import datetime
df = pd.DataFrame.from_dict({'DOB': ['01-06-68', '01-06-08']})
df['DOB'] = df['DOB'].apply(lambda x: datetime.strptime(x, '%d-%m-%y'))
# Timestamp.replace keeps every value a datetime instead of mixing in date objects
df['DOB'] = df['DOB'].apply(lambda x: x if x < datetime.now() else x.replace(year=x.year - 100))
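The same "move future dates back a century" rule can also be applied without a per-row `apply`: compare the whole parsed column with `pd.Timestamp.now()` once and shift only the future entries with `pd.DateOffset`. A sketch under the same assumption as the answer, namely that nobody in the data was born in the future:

```python
import pandas as pd

df = pd.DataFrame({"DOB": ["01-06-68", "01-06-08"]})

dob = pd.to_datetime(df["DOB"], format="%d-%m-%y")  # 68 -> 2068, 08 -> 2008
in_future = dob > pd.Timestamp.now()
# keep past dates as they are; move future ones back 100 years
df["DOB"] = dob.where(~in_future, dob - pd.DateOffset(years=100))
print(df)
```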
answered Apr 18 at 6:13 by Itamar Mushkin
In general (when in doubt), it is better to specify the century explicitly:
pd.to_datetime(data['Date.of.Birth'].apply(lambda x: '-'.join(x.split('-')[:-1] + ['19' + x.split('-')[2]])), format='%d-%m-%Y')
I ran this with the following data frame:
0 1
0 0 01-01-84
1 1 31-07-85
2 2 24-08-85
3 3 30-12-93
4 4 09-12-77
5 5 08-09-90
6 6 01-06-88
7 7 04-10-89
8 8 15-11-91
9 9 01-06-68
pd.to_datetime(data[1].apply(lambda x: '-'.join(x.split('-')[:-1] + ['19' + x.split('-')[2]])), format='%d-%m-%Y')
0 1984-01-01
1 1985-07-31
2 1985-08-24
3 1993-12-30
4 1977-12-09
5 1990-09-08
6 1988-06-01
7 1989-10-04
8 1991-11-15
9 1968-06-01
Name: 1, dtype: datetime64[ns]
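The lambda splits each string twice; a named helper can split once and fail loudly on input that is not in DD-MM-YY form. `add_century` is a hypothetical helper, and its default `"19"` prefix keeps the answer's assumption that every year is in the 1900s:

```python
import pandas as pd


def add_century(value: str, century: str = "19") -> str:
    """Turn 'DD-MM-YY' into 'DD-MM-CCYY' (hypothetical helper)."""
    day, month, yy = value.split("-")  # ValueError unless there are exactly three parts
    if len(yy) != 2:
        raise ValueError(f"expected a two-digit year in {value!r}")
    return f"{day}-{month}-{century}{yy}"


data = pd.DataFrame({1: ["01-01-84", "01-06-68"]})
parsed = pd.to_datetime(data[1].map(add_century), format="%d-%m-%Y")
print(parsed)
```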
answered Apr 18 at 5:58 by bubble
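If every year should start with 19 or 20, you can parse the dates first and then replace a leading 20 with 19 in their string form:
df['DOB'] = pd.to_datetime(df['DOB'], format='%d-%m-%y')
df['DOB'] = pd.to_datetime(df['DOB'].dt.strftime('%Y-%m-%d').str.replace(r'^20', '19', regex=True))
And if there is no 20 anywhere else in the dates, the second line can use a plain replacement:
df['DOB'] = pd.to_datetime(df['DOB'].dt.strftime('%Y-%m-%d').str.replace('20', '19'))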
Now:
print(df['DOB'])
gives:
0 1984-01-01
1 1985-07-31
2 1985-08-24
3 1993-12-30
4 1977-12-09
5 1990-09-08
6 1988-06-01
7 1989-10-04
8 1991-11-15
9 1968-06-01
Name: DOB, dtype: datetime64[ns]
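Whichever replacement is used, it matters whether the pattern is anchored to the start of the date. A small check on made-up ISO date strings shows that an anchored `^20` touches only the century, while a plain `'20'` replacement also rewrites a day of 20:

```python
import pandas as pd

iso = pd.Series(["2068-06-01", "1985-05-20"])

anchored = iso.str.replace(r"^20", "19", regex=True)  # only a leading "20"
plain = iso.str.replace("20", "19", regex=False)      # every "20" in the string

print(anchored.tolist())  # ['1968-06-01', '1985-05-20']
print(plain.tolist())     # ['1968-06-01', '1985-05-19']
```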
answered Apr 18 at 9:27 by U9-Forward