Century handling in Pandas
















I have the following data in one of my columns:



df['DOB']

0 01-01-84
1 31-07-85
2 24-08-85
3 30-12-93
4 09-12-77
5 08-09-90
6 01-06-88
7 04-10-89
8 15-11-91
9 01-06-68
Name: DOB, dtype: object


I want to convert this to a datetime column.
I tried the following:



print(pd.to_datetime(df['DOB']))
0 1984-01-01
1 1985-07-31
2 1985-08-24
3 1993-12-30
4 1977-09-12
5 1990-08-09
6 1988-01-06
7 1989-04-10
8 1991-11-15
9 2068-01-06
Name: DOB, dtype: datetime64[ns]


What should I change in my code to get 1968-01-06 instead of 2068-01-06? Thanks in advance.
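A two-digit year does not say which century it belongs to, so the parser has to pick one. As a minimal illustration (assuming the column is dd-mm-yy, which the 31-07-85 entry suggests), Python's %y directive uses a fixed pivot, and pandas.to_datetime follows the same rule when given an explicit format; the rule pandas applies when it has to guess the format may differ between versions. The explicit format also stops ambiguous values such as 09-12-77 from being read month-first, as happened for rows 4 to 7 and 9 in the output above.

```python
from datetime import datetime

import pandas as pd

# Python's %y directive resolves two-digit years with a fixed pivot:
# 69-99 become 1969-1999, and 00-68 become 2000-2068.
print(datetime.strptime('01-06-68', '%d-%m-%y').year)  # 2068
print(datetime.strptime('01-06-69', '%d-%m-%y').year)  # 1969

# An explicit day-first format applies the same pivot and removes the
# day/month ambiguity of values such as 09-12-77.
dob = pd.Series(['09-12-77', '01-06-68', '01-06-69'], name='DOB')
parsed = pd.to_datetime(dob, format='%d-%m-%y')
print(parsed.dt.strftime('%Y-%m-%d').tolist())
# ['1977-12-09', '2068-06-01', '1969-06-01']
```

So the 2068 is expected behaviour, not a bug: any fix has to either supply the century yourself or move dates past the pivot back by 100 years, which is what the answers below do.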










      asked Apr 18 at 5:46 by Madan






















          5 Answers














          In this specific case, I would use this:



          pd.to_datetime(df['DOB'].str[:-2] + '19' + df['DOB'].str[-2:])


          Note that this maps every year to 19xx, so it gives wrong dates for anyone born after 1999!



          Output:



          0 1984-01-01
          1 1985-07-31
          2 1985-08-24
          3 1993-12-30
          4 1977-09-12
          5 1990-08-09
          6 1988-01-06
          7 1989-04-10
          8 1991-11-15
          9 1968-01-06
          dtype: datetime64[ns]
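The same prefix trick can be combined with an explicit day-first format, so that 09-12-77 is read as 9 December rather than 12 September as in the output above. A minimal sketch; prefix_century is a hypothetical helper name, and it assumes every value looks like dd-mm-yy and every birth year falls in the same century:

```python
import pandas as pd


def prefix_century(dob: pd.Series, century: str = '19') -> pd.Series:
    """Hypothetical helper: insert a fixed century before the two-digit year.

    Assumes every value is a dd-mm-yy string and all years share one century.
    """
    four_digit = dob.str[:-2] + century + dob.str[-2:]  # '01-01-84' -> '01-01-1984'
    # An explicit format parses day-first instead of guessing.
    return pd.to_datetime(four_digit, format='%d-%m-%Y')


df = pd.DataFrame({'DOB': ['01-01-84', '09-12-77', '01-06-68']})
df['DOB'] = prefix_century(df['DOB'])
print(df['DOB'].dt.strftime('%Y-%m-%d').tolist())
# ['1984-01-01', '1977-12-09', '1968-06-01']
```

The caveat above still applies: a fixed century cannot represent anyone born after 1999.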






























          • Getting an error: series is not defined. I hope that was a typo and I have to use the column name.

            – Madan
            Apr 18 at 6:27











          • @Madan Yup, I wanted to change my answer to fit the question and forgot to modify the second reference. Fixed.

            – gmds
            Apr 18 at 6:36











          • @jezrael Yup, will edit question to specify that clearly

            – gmds
            Apr 18 at 6:38











          • Thanks @jezrael. I will not get dates with year > 1999 in my file.

            – Madan
            Apr 18 at 6:38
































          You can first convert to datetimes, then subtract 100 years (created by DateOffset) from every date whose year is 2020 or later:



          df['DOB'] = pd.to_datetime(df['DOB'], format='%d-%m-%y')
          df.loc[df['DOB'].dt.year >= 2020, 'DOB'] -= pd.DateOffset(years=100)
          # equivalent to:
          # mask = df['DOB'].dt.year >= 2020
          # df.loc[mask, 'DOB'] = df.loc[mask, 'DOB'] - pd.DateOffset(years=100)
          print(df)
          DOB
          0 1984-01-01
          1 1985-07-31
          2 1985-08-24
          3 1993-12-30
          4 1977-12-09
          5 1990-09-08
          6 1988-06-01
          7 1989-10-04
          8 1991-11-15
          9 1968-06-01



          Or you can prefix the two-digit years with 19 or 20 using Series.str.replace, and pick between the two results with numpy.where based on a condition.



          Note: this solution also works for the years 00 to 20, which it maps to 2000 to 2020.



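          import numpy as np

          # Prefix the trailing two-digit year with 19 or 20
          s1 = df['DOB'].str.replace(r'-(\d+)$', r'-19\1', regex=True)
          s2 = df['DOB'].str.replace(r'-(\d+)$', r'-20\1', regex=True)
          # Years 00-20 belong to the 2000s, everything else to the 1900s
          mask = df['DOB'].str[-2:].astype(int) <= 20
          df['DOB'] = pd.to_datetime(np.where(mask, s2, s1))

          print(df)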
          DOB
          0 1984-01-01
          1 1985-07-31
          2 1985-08-24
          3 1993-12-30
          4 1977-09-12
          5 1990-08-09
          6 1988-01-06
          7 1989-04-10
          8 1991-11-15
          9 1968-01-06



          If all years are below 2000:



          s1 = df['DOB'].str.replace(r'-(\d+)$', r'-19\1', regex=True)
          df['DOB'] = pd.to_datetime(s1, format='%d-%m-%Y')
          print(df)
          DOB
          0 1984-01-01
          1 1985-07-31
          2 1985-08-24
          3 1993-12-30
          4 1977-12-09
          5 1990-09-08
          6 1988-06-01
          7 1989-10-04
          8 1991-11-15
          9 1968-06-01
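The threshold of 20 in the numpy.where version acts as a pivot year. A sketch that makes the pivot an explicit argument and parses day-first; parse_with_pivot and its pivot parameter are hypothetical names, and the right pivot depends on what you know about your data:

```python
import pandas as pd


def parse_with_pivot(dob: pd.Series, pivot: int) -> pd.Series:
    """Hypothetical helper: parse dd-mm-yy strings, reading yy <= pivot as 20yy
    and every other yy as 19yy."""
    yy = dob.str[-2:].astype(int)
    # Choose the century per row from the two-digit year.
    century = (yy <= pivot).map({True: '20', False: '19'})
    four_digit = dob.str[:-2] + century + dob.str[-2:]
    return pd.to_datetime(four_digit, format='%d-%m-%Y')


dates = parse_with_pivot(pd.Series(['01-06-68', '15-11-05', '30-12-93']), pivot=20)
print(dates.dt.strftime('%Y-%m-%d').tolist())
# ['1968-06-01', '2005-11-15', '1993-12-30']
```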






























          • Can you please explain this line: df.loc[df['DOB'].dt.year >= 2020, 'DOB'] -= pd.DateOffset(years=100)

            – Madan
            Apr 18 at 6:25












          • @Madan - first convert the values to datetimes, then subtract 100 years with DateOffset from every date whose year is 2020 or later.

            – jezrael
            Apr 18 at 6:27
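To unpack the line Madan asked about, here is a sketch that splits it into its three steps on a small two-row example (the data is illustrative, not taken from the question):

```python
import pandas as pd

df = pd.DataFrame({'DOB': ['01-01-84', '01-06-68']})
df['DOB'] = pd.to_datetime(df['DOB'], format='%d-%m-%y')  # 68 -> 2068

# Step 1: boolean mask of the rows whose parsed year is 2020 or later.
mask = df['DOB'].dt.year >= 2020
print(mask.tolist())  # [False, True]

# Step 2: select only those rows of the DOB column with .loc.
selected = df.loc[mask, 'DOB']

# Step 3: shift them back by 100 calendar years and write them back in place.
df.loc[mask, 'DOB'] = selected - pd.DateOffset(years=100)
print(df['DOB'].dt.strftime('%Y-%m-%d').tolist())  # ['1984-01-01', '1968-06-01']
```

The one-liner with -= does steps 2 and 3 in a single statement.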
































          Another solution is to treat the DOB as a date, and take it back to the previous century only if it is in the future (i.e. after "now"). Example:



          from datetime import datetime

          import pandas as pd

          df = pd.DataFrame({'DOB': ['01-06-68', '01-06-08']})
          df['DOB'] = df['DOB'].apply(lambda x: datetime.strptime(x, '%d-%m-%y'))
          # Move any date that lies in the future back by one century
          df['DOB'] = df['DOB'].apply(lambda x: x if x < datetime.now() else x.replace(year=x.year - 100))
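The same idea can be expressed without row-by-row apply. A minimal vectorized sketch, again assuming that a date of birth can never be in the future; Series.mask swaps in the shifted date only where the condition is true:

```python
import pandas as pd

df = pd.DataFrame({'DOB': ['01-06-68', '01-06-08']})
df['DOB'] = pd.to_datetime(df['DOB'], format='%d-%m-%y')  # 2068-06-01, 2008-06-01

# Assumption: a birth date after today must belong to the previous century.
in_future = df['DOB'] > pd.Timestamp.today()
df['DOB'] = df['DOB'].mask(in_future, df['DOB'] - pd.DateOffset(years=100))
print(df['DOB'].dt.strftime('%Y-%m-%d').tolist())
# ['1968-06-01', '2008-06-01']
```

Unlike a hard-coded year threshold, comparing against today does not need updating as time passes, though it will still misplace anyone over 100 years old.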

















































            In general, when in doubt, it is better to specify the century explicitly:



            pd.to_datetime(df['DOB'].apply(lambda x: '-'.join(x.split('-')[:-1] + ['19' + x.split('-')[2]])))


            I ran this with the following data frame:



             0 1
            0 0 01-01-84
            1 1 31-07-85
            2 2 24-08-85
            3 3 30-12-93
            4 4 09-12-77
            5 5 08-09-90
            6 6 01-06-88
            7 7 04-10-89
            8 8 15-11-91
            9 9 01-06-68


            pd.to_datetime(data[1].apply(lambda x: '-'.join(x.split('-')[:-1] + ['19' + x.split('-')[2]])))


            0 1984-01-01
            1 1985-07-31
            2 1985-08-24
            3 1993-12-30
            4 1977-09-12
            5 1990-08-09
            6 1988-01-06
            7 1989-04-10
            8 1991-11-15
            9 1968-01-06
            Name: 1, dtype: datetime64[ns]
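The split-and-join can also be done with the vectorized .str accessor instead of apply. A sketch assuming dd-mm-yy strings and 19xx years, parsed day-first with an explicit format (so 09-12-77 becomes 9 December, unlike the month-first result above):

```python
import pandas as pd

dob = pd.Series(['01-01-84', '09-12-77', '01-06-68'])

# Split once from the right: column 0 holds 'dd-mm', column 1 the two-digit year.
parts = dob.str.rsplit('-', n=1, expand=True)
four_digit = parts[0] + '-19' + parts[1]

# Parse day-first explicitly instead of letting pandas guess the order.
parsed = pd.to_datetime(four_digit, format='%d-%m-%Y')
print(parsed.dt.strftime('%Y-%m-%d').tolist())
# ['1984-01-01', '1977-12-09', '1968-06-01']
```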

















































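              If the DOB values are strings that begin with a four-digit year starting with 19 or 20 (such as 2068-01-06, the form of the converted output in the question), you can replace a leading 20 with 19:



              df['DOB'] = pd.to_datetime(df['DOB'].str.replace(r'^20', '19', regex=True))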


              And if 20 appears nowhere else in the values (not in a day, a month, or a year such as 1920):



              df['DOB'] = pd.to_datetime(df['DOB'].str.replace('20', '19'))


              And now:



              print(df['DOB'])


              Is:



              0 1984-01-01
              1 1985-07-31
              2 1985-08-24
              3 1993-12-30
              4 1977-09-12
              5 1990-08-09
              6 1988-01-06
              7 1989-04-10
              8 1991-11-15
              9 1968-01-06
              dtype: datetime64[ns]
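To see why anchoring the pattern matters, this sketch compares an anchored and an unanchored replacement on hypothetical ISO-formatted strings, one of which has 20 as its day:

```python
import pandas as pd

# Hypothetical ISO-formatted date strings; the second has 20 as its day.
iso = pd.Series(['2068-01-06', '1990-08-20'])

anchored = iso.str.replace(r'^20', '19', regex=True)   # only a leading 20
unanchored = iso.str.replace('20', '19', regex=False)  # every occurrence of 20

print(anchored.tolist())    # ['1968-01-06', '1990-08-20']
print(unanchored.tolist())  # ['1968-01-06', '1990-08-19']
print(pd.to_datetime(anchored).dt.year.tolist())  # [1968, 1990]
```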





                Answer by gmds: answered Apr 18 at 6:12, edited Apr 18 at 6:39
                • Getting error series not defined. Hope that was a typo and have to use column name.

                  – Madan
                  Apr 18 at 6:27











                • @Madan Yup, I wanted to change my answer to fit the question and forgot to modify the second reference. Fixed.

                  – gmds
                  Apr 18 at 6:36











                • @jezrael Yup, will edit question to specify that clearly

                  – gmds
                  Apr 18 at 6:38











                • Thanks @jezrael. I will not get dates with year > 1999 in my file.

                  – Madan
                  Apr 18 at 6:38

















                • Getting error series not defined. Hope that was a typo and have to use column name.

                  – Madan
                  Apr 18 at 6:27











                • @Madan Yup, I wanted to change my answer to fit the question and forgot to modify the second reference. Fixed.

                  – gmds
                  Apr 18 at 6:36











                • @jezrael Yup, will edit question to specify that clearly

                  – gmds
                  Apr 18 at 6:38











                • Thanks @jezrael. I will not get dates with year > 1999 in my file.

                  – Madan
                  Apr 18 at 6:38
















                Getting error series not defined. Hope that was a typo and have to use column name.

                – Madan
                Apr 18 at 6:27





                Getting error series not defined. Hope that was a typo and have to use column name.

                – Madan
                Apr 18 at 6:27













                @Madan Yup, I wanted to change my answer to fit the question and forgot to modify the second reference. Fixed.

                – gmds
                Apr 18 at 6:36





                @Madan Yup, I wanted to change my answer to fit the question and forgot to modify the second reference. Fixed.

                – gmds
                Apr 18 at 6:36













                @jezrael Yup, will edit question to specify that clearly

                – gmds
                Apr 18 at 6:38





                @jezrael Yup, will edit question to specify that clearly

                – gmds
                Apr 18 at 6:38













                Thanks @jezrael. I will not get dates with year > 1999 in my file.

                – Madan
                Apr 18 at 6:38





                Thanks @jezrael. I will not get dates with year > 1999 in my file.

                – Madan
                Apr 18 at 6:38













                4














                You can first convert to datetimes and if years are above or equal 2020 then subtract 100 years created by DateOffset:



                df['DOB'] = pd.to_datetime(df['DOB'], format='%d-%m-%y')
                df.loc[df['DOB'].dt.year >= 2020, 'DOB'] -= pd.DateOffset(years=100)
                #same like
                #mask = df['DOB'].dt.year >= 2020
                #df.loc[mask, 'DOB'] = df.loc[mask, 'DOB'] - pd.DateOffset(years=100)
                print (df)
                DOB
                0 1984-01-01
                1 1985-07-31
                2 1985-08-24
                3 1993-12-30
                4 1977-12-09
                5 1990-09-08
                6 1988-06-01
                7 1989-10-04
                8 1991-11-15
                9 1968-06-01



                Or you can add 19 or 20 to years by Series.str.replace and set valuies by numpy.where with condition.



                Notice: Solution working also for years 00 for 2000, up to 2020.



                s1 = df['DOB'].str.replace(r'-(d+)$', r'-191')
                s2 = df['DOB'].str.replace(r'-(d+)$', r'-201')
                mask = df['DOB'].str[-2:].astype(int) <= 20
                df['DOB'] = pd.to_datetime(np.where(mask, s2, s1))

                print (df)
                DOB
                0 1984-01-01
                1 1985-07-31
                2 1985-08-24
                3 1993-12-30
                4 1977-09-12
                5 1990-08-09
                6 1988-01-06
                7 1989-04-10
                8 1991-11-15
                9 1968-01-06



                If all years are below 2000:



                s1 = df['DOB'].str.replace(r'-(d+)$', r'-191')
                df['DOB'] = pd.to_datetime(s1, format='%d-%m-%Y')
                print (df)
                DOB
                0 1984-01-01
                1 1985-07-31
                2 1985-08-24
                3 1993-12-30
                4 1977-12-09
                5 1990-09-08
                6 1988-06-01
                7 1989-10-04
                8 1991-11-15
                9 1968-06-01





                share|improve this answer

























                • Can you please explain this line: df.loc[df['DOB'].dt.year >= 2020, 'DOB'] -= pd.DateOffset(years=100)

                  – Madan
                  Apr 18 at 6:25












                • @Madan - first convert values to datetimes and then if some years is higher as 2020 subtract 100 years with dateoffset

                  – jezrael
                  Apr 18 at 6:27















                4














                You can first convert to datetimes and if years are above or equal 2020 then subtract 100 years created by DateOffset:



                df['DOB'] = pd.to_datetime(df['DOB'], format='%d-%m-%y')
                df.loc[df['DOB'].dt.year >= 2020, 'DOB'] -= pd.DateOffset(years=100)
                #same like
                #mask = df['DOB'].dt.year >= 2020
                #df.loc[mask, 'DOB'] = df.loc[mask, 'DOB'] - pd.DateOffset(years=100)
                print (df)
                DOB
                0 1984-01-01
                1 1985-07-31
                2 1985-08-24
                3 1993-12-30
                4 1977-12-09
                5 1990-09-08
                6 1988-06-01
                7 1989-10-04
                8 1991-11-15
                9 1968-06-01



                Or you can add 19 or 20 to years by Series.str.replace and set valuies by numpy.where with condition.



                Notice: Solution working also for years 00 for 2000, up to 2020.



                s1 = df['DOB'].str.replace(r'-(d+)$', r'-191')
                s2 = df['DOB'].str.replace(r'-(d+)$', r'-201')
                mask = df['DOB'].str[-2:].astype(int) <= 20
                df['DOB'] = pd.to_datetime(np.where(mask, s2, s1))

                print (df)
                DOB
                0 1984-01-01
                1 1985-07-31
                2 1985-08-24
                3 1993-12-30
                4 1977-09-12
                5 1990-08-09
                6 1988-01-06
                7 1989-04-10
                8 1991-11-15
                9 1968-01-06



                If all years are below 2000:



                s1 = df['DOB'].str.replace(r'-(d+)$', r'-191')
                df['DOB'] = pd.to_datetime(s1, format='%d-%m-%Y')
                print (df)
                DOB
                0 1984-01-01
                1 1985-07-31
                2 1985-08-24
                3 1993-12-30
                4 1977-12-09
                5 1990-09-08
                6 1988-06-01
                7 1989-10-04
                8 1991-11-15
                9 1968-06-01





                share|improve this answer

























                • Can you please explain this line: df.loc[df['DOB'].dt.year >= 2020, 'DOB'] -= pd.DateOffset(years=100)

                  – Madan
                  Apr 18 at 6:25












                • @Madan - first convert values to datetimes and then if some years is higher as 2020 subtract 100 years with dateoffset

                  – jezrael
                  Apr 18 at 6:27













                4












                4








                4







                You can first convert to datetimes and if years are above or equal 2020 then subtract 100 years created by DateOffset:



                df['DOB'] = pd.to_datetime(df['DOB'], format='%d-%m-%y')
                df.loc[df['DOB'].dt.year >= 2020, 'DOB'] -= pd.DateOffset(years=100)
                #same like
                #mask = df['DOB'].dt.year >= 2020
                #df.loc[mask, 'DOB'] = df.loc[mask, 'DOB'] - pd.DateOffset(years=100)
                print (df)
                DOB
                0 1984-01-01
                1 1985-07-31
                2 1985-08-24
                3 1993-12-30
                4 1977-12-09
                5 1990-09-08
                6 1988-06-01
                7 1989-10-04
                8 1991-11-15
                9 1968-06-01



                Or you can add 19 or 20 to years by Series.str.replace and set valuies by numpy.where with condition.



                Notice: Solution working also for years 00 for 2000, up to 2020.



                s1 = df['DOB'].str.replace(r'-(d+)$', r'-191')
                s2 = df['DOB'].str.replace(r'-(d+)$', r'-201')
                mask = df['DOB'].str[-2:].astype(int) <= 20
                df['DOB'] = pd.to_datetime(np.where(mask, s2, s1))

                print (df)
                DOB
                0 1984-01-01
                1 1985-07-31
                2 1985-08-24
                3 1993-12-30
                4 1977-09-12
                5 1990-08-09
                6 1988-01-06
                7 1989-04-10
                8 1991-11-15
                9 1968-01-06



                If all years are below 2000:



                s1 = df['DOB'].str.replace(r'-(d+)$', r'-191')
                df['DOB'] = pd.to_datetime(s1, format='%d-%m-%Y')
                print (df)
                DOB
                0 1984-01-01
                1 1985-07-31
                2 1985-08-24
                3 1993-12-30
                4 1977-12-09
                5 1990-09-08
                6 1988-06-01
                7 1989-10-04
                8 1991-11-15
                9 1968-06-01





                share|improve this answer















                You can first convert to datetimes and if years are above or equal 2020 then subtract 100 years created by DateOffset:



                df['DOB'] = pd.to_datetime(df['DOB'], format='%d-%m-%y')
                df.loc[df['DOB'].dt.year >= 2020, 'DOB'] -= pd.DateOffset(years=100)
                #same like
                #mask = df['DOB'].dt.year >= 2020
                #df.loc[mask, 'DOB'] = df.loc[mask, 'DOB'] - pd.DateOffset(years=100)
                print (df)
                DOB
                0 1984-01-01
                1 1985-07-31
                2 1985-08-24
                3 1993-12-30
                4 1977-12-09
                5 1990-09-08
                6 1988-06-01
                7 1989-10-04
                8 1991-11-15
                9 1968-06-01



                Or you can add 19 or 20 to years by Series.str.replace and set valuies by numpy.where with condition.



                Notice: Solution working also for years 00 for 2000, up to 2020.



                s1 = df['DOB'].str.replace(r'-(d+)$', r'-191')
                s2 = df['DOB'].str.replace(r'-(d+)$', r'-201')
                mask = df['DOB'].str[-2:].astype(int) <= 20
                df['DOB'] = pd.to_datetime(np.where(mask, s2, s1))

                print (df)
                DOB
                0 1984-01-01
                1 1985-07-31
                2 1985-08-24
                3 1993-12-30
                4 1977-09-12
                5 1990-08-09
                6 1988-01-06
                7 1989-04-10
                8 1991-11-15
                9 1968-01-06



                If all years are below 2000:



                s1 = df['DOB'].str.replace(r'-(d+)$', r'-191')
                df['DOB'] = pd.to_datetime(s1, format='%d-%m-%Y')
                print (df)
                DOB
                0 1984-01-01
                1 1985-07-31
                2 1985-08-24
                3 1993-12-30
                4 1977-12-09
                5 1990-09-08
                6 1988-06-01
                7 1989-10-04
                8 1991-11-15
                9 1968-06-01














                edited Apr 18 at 6:28

























                answered Apr 18 at 5:48









                jezrael












                • Can you please explain this line: df.loc[df['DOB'].dt.year >= 2020, 'DOB'] -= pd.DateOffset(years=100)

                  – Madan
                  Apr 18 at 6:25












                • @Madan - first the values are converted to datetimes; then, for every row whose year is 2020 or later, 100 years are subtracted with a DateOffset.

                  – jezrael
                  Apr 18 at 6:27
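To unpack that line step by step, here is a minimal sketch on a two-row frame; the sample values are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({'DOB': ['01-06-68', '15-11-91']})
df['DOB'] = pd.to_datetime(df['DOB'], format='%d-%m-%y')  # 2068-06-01, 1991-11-15

# 1) Boolean mask: True where the parsed year landed in the wrong century.
mask = df['DOB'].dt.year >= 2020
print(mask.tolist())  # [True, False]

# 2) df.loc[mask, 'DOB'] selects only those rows of the DOB column.
# 3) "-=" subtracts 100 calendar years and writes the result back into the
#    same rows; rows outside the mask are left untouched.
df.loc[mask, 'DOB'] -= pd.DateOffset(years=100)
print(df['DOB'].dt.strftime('%Y-%m-%d').tolist())  # ['1968-06-01', '1991-11-15']
```

DateOffset(years=100) shifts by calendar years, whereas a fixed Timedelta of 36500 days would drift by the number of leap days in between.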































                Another solution is to treat the DOB as a date, and take it back to the previous century only if it is in the future (i.e. after "now"). Example:



                import pandas as pd
                from datetime import datetime

                df = pd.DataFrame({'DOB': ['01-06-68', '01-06-08']})
                df['DOB'] = df['DOB'].apply(lambda x: datetime.strptime(x, '%d-%m-%y'))
                df['DOB'] = df['DOB'].apply(lambda x: x if x < datetime.now() else x.replace(year=x.year - 100))
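The same rule can be applied without a row-by-row apply, by comparing the whole column with one timestamp. A minimal sketch; the function name `roll_back_future_dates` and its `now` parameter are hypothetical, and `now` is passed explicitly here only to make the result reproducible.

```python
import pandas as pd

def roll_back_future_dates(s, now=None):
    # Reference point: the current time unless a fixed date is supplied.
    now = pd.Timestamp.now() if now is None else pd.Timestamp(now)
    out = pd.to_datetime(s, format='%d-%m-%y')
    # Any date later than the reference point is moved back one century.
    out.loc[out > now] -= pd.DateOffset(years=100)
    return out

res = roll_back_future_dates(pd.Series(['01-06-68', '01-06-08']), now='2019-04-18')
print(res.dt.strftime('%Y-%m-%d').tolist())  # ['1968-06-01', '2008-06-01']
```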















                    answered Apr 18 at 6:13









                    Itamar Mushkin



































                        In general, when in doubt, it is better to write out the full four-digit year explicitly and give the format:



                        pd.to_datetime(data['Date.of.Birth'].apply(lambda x: '-'.join(x.split('-')[:-1] + ['19' + x.split('-')[2]])), format='%d-%m-%Y')


                        I ran this with the following data frame:



                         0 1
                        0 0 01-01-84
                        1 1 31-07-85
                        2 2 24-08-85
                        3 3 30-12-93
                        4 4 09-12-77
                        5 5 08-09-90
                        6 6 01-06-88
                        7 7 04-10-89
                        8 8 15-11-91
                        9 9 01-06-68


                        pd.to_datetime(data[1].apply(lambda x: '-'.join(x.split('-')[:-1] + ['19' + x.split('-')[2]])), format='%d-%m-%Y')


                        0 1984-01-01
                        1 1985-07-31
                        2 1985-08-24
                        3 1993-12-30
                        4 1977-12-09
                        5 1990-09-08
                        6 1988-06-01
                        7 1989-10-04
                        8 1991-11-15
                        9 1968-06-01
                        Name: 1, dtype: datetime64[ns]
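The string surgery inside the apply can also be done with vectorised .str slicing. A sketch that assumes every value ends in a two-digit year and that all births are in the 1900s:

```python
import pandas as pd

s = pd.Series(['01-01-84', '09-12-77', '01-06-68'])

# Insert the literal century "19" before the last two characters (the year),
# then parse with an explicit day-first format so 09-12 stays 9 December.
four_digit = s.str[:-2] + '19' + s.str[-2:]
dob = pd.to_datetime(four_digit, format='%d-%m-%Y')
print(dob.dt.strftime('%Y-%m-%d').tolist())  # ['1984-01-01', '1977-12-09', '1968-06-01']
```

Without the format argument, pandas may guess month-first for ambiguous strings such as 09-12-1977.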















                            answered Apr 18 at 5:58









                            bubble



































                                If every DOB string starts with its year, i.e. with either 19 or 20, you can replace a leading 20 with 19:



                                df['DOB'] = pd.to_datetime(df['DOB'].str.replace(r'^20', '19', regex=True))


                                And if 20 does not appear anywhere else in the strings, a plain replacement is enough:



                                df['DOB'] = pd.to_datetime(df['DOB'].str.replace('20', '19', regex=False))


                                Now:



                                print(df['DOB'])


                                gives:



                                0 1984-01-01
                                1 1985-07-31
                                2 1985-08-24
                                3 1993-12-30
                                4 1977-09-12
                                5 1990-08-09
                                6 1988-01-06
                                7 1989-04-10
                                8 1991-11-15
                                9 1968-01-06
                                dtype: datetime64[ns]
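An unanchored str.replace('20', '19') rewrites every occurrence of 20, not only the century. A small sketch with made-up YYYY-MM-DD strings compares it with a pattern anchored to the start of the string:

```python
import pandas as pd

s = pd.Series(['2068-12-20', '2068-01-06'])

# Unanchored: the day "20" is rewritten as well.
print(s.str.replace('20', '19', regex=False).tolist())
# ['1968-12-19', '1968-01-06']

# Anchored with ^: only the leading century changes.
print(s.str.replace(r'^20', '19', regex=True).tolist())
# ['1968-12-20', '1968-01-06']
```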















                                    answered Apr 18 at 9:27









                                    U9-Forward


























