Identify and count spells (Distinctive events within each group)R - list to data frameCount number of rows within each groupCounting unique / distinct values by group in a data frameR: find relative weight within each group and within the entire dataframeR: how to calculate summary for each group and all the data?count the number of distinct variables in a groupusing tidyverse; counting after and before change in value, within groups, generating new variables for each unique shiftDistinct in r within groups of datahow to get count and distinct count with group by in dataframe RNest a dataframe by group, but include extra rows within each groupChange value by group based in reference within group
In Romance of the Three Kingdoms why do people still use bamboo sticks when papers are already invented?
Can one be a co-translator of a book, if he does not know the language that the book is translated into?
Stopping power of mountain vs road bike
Should I tell management that I intend to leave due to bad software development practices?
Why does Kotter return in Welcome Back Kotter
Were any external disk drives stacked vertically?
Alternative to sending password over mail?
How do I write bicross product symbols in latex?
Why doesn't H₄O²⁺ exist?
Is there a hemisphere-neutral way of specifying a season?
Today is the Center
Doing something right before you need it - expression for this?
Why is consensus so controversial in Britain?
Neighboring nodes in the network
What reasons are there for a Capitalist to oppose a 100% inheritance tax?
Why is the 'in' operator throwing an error with a string literal instead of logging false?
How can I tell someone that I want to be his or her friend?
Blender 2.8 I can't see vertices, edges or faces in edit mode
Why does the EU insist on the backstop when it is clear in a no deal scenario they still intend to keep an open border?
Assassin's bullet with mercury
What's the point of deactivating Num Lock on login screens?
Where does SFDX store details about scratch orgs?
Can a rocket refuel on Mars from water?
Combinations of multiple lists
Identify and count spells (Distinctive events within each group)
R - list to data frameCount number of rows within each groupCounting unique / distinct values by group in a data frameR: find relative weight within each group and within the entire dataframeR: how to calculate summary for each group and all the data?count the number of distinct variables in a groupusing tidyverse; counting after and before change in value, within groups, generating new variables for each unique shiftDistinct in r within groups of datahow to get count and distinct count with group by in dataframe RNest a dataframe by group, but include extra rows within each groupChange value by group based in reference within group
.everyoneloves__top-leaderboard:empty,.everyoneloves__mid-leaderboard:empty,.everyoneloves__bot-mid-leaderboard:empty height:90px;width:728px;box-sizing:border-box;
I'm looking for an efficient way to identify spells/runs in a time series. In the image below, the first three columns is what I have, the fourth column, spell is what I'm trying to compute. I've tried using dplyr's lead and lag, but that gets too complicated. I've tried rle but got nowhere.

ReprEx
df <- structure(list(time = structure(c(1538876340, 1538876400,
1538876460,1538876520, 1538876580, 1538876640, 1538876700, 1538876760, 1526824800,
1526824860, 1526824920, 1526824980, 1526825040, 1526825100), class = c("POSIXct",
"POSIXt"), tzone = "UTC"), group = c("A", "A", "A", "A", "A", "A", "A", "A", "B",
"B", "B", "B", "B", "B"), is.5 = c(0, 1, 1, 0, 1, 0, 0, 1, 0, 0, 1, 1, 0, 1)),
class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, -14L))
I prefer a tidyverse solution.
Assumptions
Data is sorted by
groupand then bytimeThere are no gaps in
timewithin each group
Update
Thanks for the contributions. I've timed some of the proposed approaches on the full data (n=2,583,360)
- the
rleapproach by @markus took 0.53 seconds - the
cumsumapproach by @M-M took 2.85 seconds - the function approach by @MrFlick took 0.66 seconds
- the
rleanddense_rankby @tmfmnk took 0.89
I ended up choosing (1) by @markus because it's fast and still somewhat intuitive (subjective). (2) by @M-M best satisfied my desire for a dplyr solution, though it is computationally inefficient.
r dataframe dplyr time-series tidyverse
add a comment |
I'm looking for an efficient way to identify spells/runs in a time series. In the image below, the first three columns is what I have, the fourth column, spell is what I'm trying to compute. I've tried using dplyr's lead and lag, but that gets too complicated. I've tried rle but got nowhere.

ReprEx
df <- structure(list(time = structure(c(1538876340, 1538876400,
1538876460,1538876520, 1538876580, 1538876640, 1538876700, 1538876760, 1526824800,
1526824860, 1526824920, 1526824980, 1526825040, 1526825100), class = c("POSIXct",
"POSIXt"), tzone = "UTC"), group = c("A", "A", "A", "A", "A", "A", "A", "A", "B",
"B", "B", "B", "B", "B"), is.5 = c(0, 1, 1, 0, 1, 0, 0, 1, 0, 0, 1, 1, 0, 1)),
class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, -14L))
I prefer a tidyverse solution.
Assumptions
Data is sorted by
groupand then bytimeThere are no gaps in
timewithin each group
Update
Thanks for the contributions. I've timed some of the proposed approaches on the full data (n=2,583,360)
- the
rleapproach by @markus took 0.53 seconds - the
cumsumapproach by @M-M took 2.85 seconds - the function approach by @MrFlick took 0.66 seconds
- the
rleanddense_rankby @tmfmnk took 0.89
I ended up choosing (1) by @markus because it's fast and still somewhat intuitive (subjective). (2) by @M-M best satisfied my desire for a dplyr solution, though it is computationally inefficient.
r dataframe dplyr time-series tidyverse
5
For someone who is not familiar with how thespellis computed, can you share a formula or description?
– nsinghs
Apr 1 at 20:55
@nsinghs I think they mean "hospital spell"
– zx8754
Apr 1 at 21:29
Curious for the results if you timed my answer? You should also consider accepting the best answer.
– Hector Haffenden
2 days ago
add a comment |
I'm looking for an efficient way to identify spells/runs in a time series. In the image below, the first three columns is what I have, the fourth column, spell is what I'm trying to compute. I've tried using dplyr's lead and lag, but that gets too complicated. I've tried rle but got nowhere.

ReprEx
df <- structure(list(time = structure(c(1538876340, 1538876400,
1538876460,1538876520, 1538876580, 1538876640, 1538876700, 1538876760, 1526824800,
1526824860, 1526824920, 1526824980, 1526825040, 1526825100), class = c("POSIXct",
"POSIXt"), tzone = "UTC"), group = c("A", "A", "A", "A", "A", "A", "A", "A", "B",
"B", "B", "B", "B", "B"), is.5 = c(0, 1, 1, 0, 1, 0, 0, 1, 0, 0, 1, 1, 0, 1)),
class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, -14L))
I prefer a tidyverse solution.
Assumptions
Data is sorted by
groupand then bytimeThere are no gaps in
timewithin each group
Update
Thanks for the contributions. I've timed some of the proposed approaches on the full data (n=2,583,360)
- the
rleapproach by @markus took 0.53 seconds - the
cumsumapproach by @M-M took 2.85 seconds - the function approach by @MrFlick took 0.66 seconds
- the
rleanddense_rankby @tmfmnk took 0.89
I ended up choosing (1) by @markus because it's fast and still somewhat intuitive (subjective). (2) by @M-M best satisfied my desire for a dplyr solution, though it is computationally inefficient.
r dataframe dplyr time-series tidyverse
I'm looking for an efficient way to identify spells/runs in a time series. In the image below, the first three columns is what I have, the fourth column, spell is what I'm trying to compute. I've tried using dplyr's lead and lag, but that gets too complicated. I've tried rle but got nowhere.

ReprEx
df <- structure(list(time = structure(c(1538876340, 1538876400,
1538876460,1538876520, 1538876580, 1538876640, 1538876700, 1538876760, 1526824800,
1526824860, 1526824920, 1526824980, 1526825040, 1526825100), class = c("POSIXct",
"POSIXt"), tzone = "UTC"), group = c("A", "A", "A", "A", "A", "A", "A", "A", "B",
"B", "B", "B", "B", "B"), is.5 = c(0, 1, 1, 0, 1, 0, 0, 1, 0, 0, 1, 1, 0, 1)),
class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, -14L))
I prefer a tidyverse solution.
Assumptions
Data is sorted by
groupand then bytimeThere are no gaps in
timewithin each group
Update
Thanks for the contributions. I've timed some of the proposed approaches on the full data (n=2,583,360)
- the
rleapproach by @markus took 0.53 seconds - the
cumsumapproach by @M-M took 2.85 seconds - the function approach by @MrFlick took 0.66 seconds
- the
rleanddense_rankby @tmfmnk took 0.89
I ended up choosing (1) by @markus because it's fast and still somewhat intuitive (subjective). (2) by @M-M best satisfied my desire for a dplyr solution, though it is computationally inefficient.
r dataframe dplyr time-series tidyverse
r dataframe dplyr time-series tidyverse
edited yesterday
Thomas Speidel
asked Apr 1 at 20:44
Thomas SpeidelThomas Speidel
366216
366216
5
For someone who is not familiar with how thespellis computed, can you share a formula or description?
– nsinghs
Apr 1 at 20:55
@nsinghs I think they mean "hospital spell"
– zx8754
Apr 1 at 21:29
Curious for the results if you timed my answer? You should also consider accepting the best answer.
– Hector Haffenden
2 days ago
add a comment |
5
For someone who is not familiar with how thespellis computed, can you share a formula or description?
– nsinghs
Apr 1 at 20:55
@nsinghs I think they mean "hospital spell"
– zx8754
Apr 1 at 21:29
Curious for the results if you timed my answer? You should also consider accepting the best answer.
– Hector Haffenden
2 days ago
5
5
For someone who is not familiar with how the
spell is computed, can you share a formula or description?– nsinghs
Apr 1 at 20:55
For someone who is not familiar with how the
spell is computed, can you share a formula or description?– nsinghs
Apr 1 at 20:55
@nsinghs I think they mean "hospital spell"
– zx8754
Apr 1 at 21:29
@nsinghs I think they mean "hospital spell"
– zx8754
Apr 1 at 21:29
Curious for the results if you timed my answer? You should also consider accepting the best answer.
– Hector Haffenden
2 days ago
Curious for the results if you timed my answer? You should also consider accepting the best answer.
– Hector Haffenden
2 days ago
add a comment |
6 Answers
6
active
oldest
votes
One option using rle
library(dplyr)
df %>%
group_by(group) %>%
mutate(
spell =
r <- rle(is.5)
r$values <- cumsum(r$values) * r$values
inverse.rle(r)
)
# A tibble: 14 x 4
# Groups: group [2]
# time group is.5 spell
# <dttm> <chr> <dbl> <dbl>
# 1 2018-10-07 01:39:00 A 0 0
# 2 2018-10-07 01:40:00 A 1 1
# 3 2018-10-07 01:41:00 A 1 1
# 4 2018-10-07 01:42:00 A 0 0
# 5 2018-10-07 01:43:00 A 1 2
# 6 2018-10-07 01:44:00 A 0 0
# 7 2018-10-07 01:45:00 A 0 0
# 8 2018-10-07 01:46:00 A 1 3
# 9 2018-05-20 14:00:00 B 0 0
#10 2018-05-20 14:01:00 B 0 0
#11 2018-05-20 14:02:00 B 1 1
#12 2018-05-20 14:03:00 B 1 1
#13 2018-05-20 14:04:00 B 0 0
#14 2018-05-20 14:05:00 B 1 2
You asked for a tidyverse solution but if speed is your concern, you might use data.table. The syntax is very similar
library(data.table)
setDT(df)[, spell :=
r <- rle(is.5)
r$values <- cumsum(r$values) * r$values
inverse.rle(r)
, by = group][] # the [] at the end prints the data.table
explanation
When we call
r <- rle(df$is.5)
the result we get is
r
#Run Length Encoding
# lengths: int [1:10] 1 2 1 1 2 1 2 2 1 1
# values : num [1:10] 0 1 0 1 0 1 0 1 0 1
We need to replace values with the cumulative sum where values == 1 while values should remain zero otherwise.
We can achieve this when we multiple cumsum(r$values) with r$values; where the latter is a vector of 0s and 1s.
r$values <- cumsum(r$values) * r$values
r$values
# [1] 0 1 0 2 0 3 0 4 0 5
Finally we call inverse.rle to get back a vector of the same length as is.5.
inverse.rle(r)
# [1] 0 1 1 0 2 0 0 3 0 0 4 4 0 5
We do this for every group.
1
I understand why and how that works, but it'd be nice if you could draw your line of thoughts into the logic. Cheers.
– M-M
2 days ago
1
@M-M Added some explanation. Thanks for the comment.
– markus
2 days ago
add a comment |
Here's a helper function that can return what you are after
spell_index <- function(time, flag)
change <- time-lag(time)==1 & flag==1 & lag(flag)!=1
cumsum(change) * (flag==1)+0
And you can use it with your data like
library(dplyr)
df %>%
group_by(group) %>%
mutate(
spell = spell_index(time, is.5)
)
Basically the helper functions uses lag() to look for changes. We use cumsum() to increment the number of changes. Then we multiply by a boolean value so zero-out the values you want to be zeroed out.
add a comment |
Here is one option with rleid from data.table. Convert the 'data.frame' to 'data.table' (setDT(df)), grouped by 'group', get the run-length-id (rleid) of 'is.5' and multiply with the values of 'is.5' so as to replace the ids corresponding to 0s in is.5 to 0, assign it to 'spell', then specify the i with a logical vector to select rows that have 'spell' values not zero, match those values of 'spell' with unique 'spell' and assign it to 'spell'
library(data.table)
setDT(df)[, spell := rleid(is.5) * as.integer(is.5), group
][!!spell, spell := match(spell, unique(spell))][]
# time group is.5 spell
# 1: 2018-10-07 01:39:00 A 0 0
# 2: 2018-10-07 01:40:00 A 1 1
# 3: 2018-10-07 01:41:00 A 1 1
# 4: 2018-10-07 01:42:00 A 0 0
# 5: 2018-10-07 01:43:00 A 1 2
# 6: 2018-10-07 01:44:00 A 0 0
# 7: 2018-10-07 01:45:00 A 0 0
# 8: 2018-10-07 01:46:00 A 1 3
# 9: 2018-05-20 14:00:00 B 0 0
#10: 2018-05-20 14:01:00 B 0 0
#11: 2018-05-20 14:02:00 B 1 1
#12: 2018-05-20 14:03:00 B 1 1
#13: 2018-05-20 14:04:00 B 0 0
#14: 2018-05-20 14:05:00 B 1 2
Or after the first step, use .GRP
df[!!spell, spell := .GRP, spell]
add a comment |
This works,
The data,
df <- structure(list(time = structure(c(1538876340, 1538876400, 1538876460,1538876520, 1538876580, 1538876640, 1538876700, 1538876760, 1526824800, 1526824860, 1526824920, 1526824980, 1526825040, 1526825100), class = c("POSIXct", "POSIXt"), tzone = "UTC"), group = c("A", "A", "A", "A", "A", "A", "A", "A", "B", "B", "B", "B", "B", "B"), is.5 = c(0, 1, 1, 0, 1, 0, 0, 1, 0, 0, 1, 1, 0, 1)), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, -14L))
We split our data by group,
df2 <- split(df, df$group)
Build a function we can apply to the list,
my_func <- function(dat)
rst <- dat %>%
mutate(change = diff(c(0,is.5))) %>%
mutate(flag = change*abs(is.5)) %>%
mutate(spell = ifelse(is.5 == 0
Then apply it,
l <- lapply(df2, my_func)
We can now turn this list back into a data frame:
do.call(rbind.data.frame, l)
add a comment |
One options is using cumsum:
library(dplyr)
df %>% group_by(group) %>% arrange(group, time) %>%
mutate(spell = is.5 * cumsum( c(0,lag(is.5)[-1]) != is.5 & is.5!=0) )
# # A tibble: 14 x 4
# # Groups: group [2]
# time group is.5 spell
# <dttm> <chr> <dbl> <dbl>
# 1 2018-10-07 01:39:00 A 0 0
# 2 2018-10-07 01:40:00 A 1 1
# 3 2018-10-07 01:41:00 A 1 1
# 4 2018-10-07 01:42:00 A 0 0
# 5 2018-10-07 01:43:00 A 1 2
# 6 2018-10-07 01:44:00 A 0 0
# 7 2018-10-07 01:45:00 A 0 0
# 8 2018-10-07 01:46:00 A 1 3
# 9 2018-05-20 14:00:00 B 0 0
# 10 2018-05-20 14:01:00 B 0 0
# 11 2018-05-20 14:02:00 B 1 1
# 12 2018-05-20 14:03:00 B 1 1
# 13 2018-05-20 14:04:00 B 0 0
# 14 2018-05-20 14:05:00 B 1 2
c(0,lag(is.5)[-1]) != is.5 this takes care of assigning a new id (i.e. spell) whenever is.5 changes; but we want to avoid assigning new ones to those rows is.5 equal to 0 and that's why I have the second rule in cumsum function (i.e. (is.5!=0)).
However, that second rule only prevents assigning a new id (adding 1 to the previous id) but it won't set the id to 0. That's why I have multiplied the answer by is.5.
add a comment |
A somehow different possibility (not involving cumsum()) could be:
df %>%
group_by(group) %>%
mutate(spell = with(rle(is.5), rep(seq_along(lengths), lengths))) %>%
group_by(group, is.5) %>%
mutate(spell = dense_rank(spell)) %>%
ungroup() %>%
mutate(spell = ifelse(is.5 == 0, 0, spell))
time group is.5 spell
<dttm> <chr> <dbl> <dbl>
1 2018-10-07 01:39:00 A 0 0
2 2018-10-07 01:40:00 A 1 1
3 2018-10-07 01:41:00 A 1 1
4 2018-10-07 01:42:00 A 0 0
5 2018-10-07 01:43:00 A 1 2
6 2018-10-07 01:44:00 A 0 0
7 2018-10-07 01:45:00 A 0 0
8 2018-10-07 01:46:00 A 1 3
9 2018-05-20 14:00:00 B 0 0
10 2018-05-20 14:01:00 B 0 0
11 2018-05-20 14:02:00 B 1 1
12 2018-05-20 14:03:00 B 1 1
13 2018-05-20 14:04:00 B 0 0
14 2018-05-20 14:05:00 B 1 2
Here it, first, groups by "group" and then gets the run-length-ID of "is.5". Second, it groups by "group" and "is.5" and ranks the values on the run-length-ID. Finally, it assigns 0 to rows where "is.5" == 0.
add a comment |
Your Answer
StackExchange.ifUsing("editor", function ()
StackExchange.using("externalEditor", function ()
StackExchange.using("snippets", function ()
StackExchange.snippets.init();
);
);
, "code-snippets");
StackExchange.ready(function()
var channelOptions =
tags: "".split(" "),
id: "1"
;
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function()
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled)
StackExchange.using("snippets", function()
createEditor();
);
else
createEditor();
);
function createEditor()
StackExchange.prepareEditor(
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader:
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
,
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
);
);
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f55463310%2fidentify-and-count-spells-distinctive-events-within-each-group%23new-answer', 'question_page');
);
Post as a guest
Required, but never shown
6 Answers
6
active
oldest
votes
6 Answers
6
active
oldest
votes
active
oldest
votes
active
oldest
votes
One option using rle
library(dplyr)
df %>%
group_by(group) %>%
mutate(
spell =
r <- rle(is.5)
r$values <- cumsum(r$values) * r$values
inverse.rle(r)
)
# A tibble: 14 x 4
# Groups: group [2]
# time group is.5 spell
# <dttm> <chr> <dbl> <dbl>
# 1 2018-10-07 01:39:00 A 0 0
# 2 2018-10-07 01:40:00 A 1 1
# 3 2018-10-07 01:41:00 A 1 1
# 4 2018-10-07 01:42:00 A 0 0
# 5 2018-10-07 01:43:00 A 1 2
# 6 2018-10-07 01:44:00 A 0 0
# 7 2018-10-07 01:45:00 A 0 0
# 8 2018-10-07 01:46:00 A 1 3
# 9 2018-05-20 14:00:00 B 0 0
#10 2018-05-20 14:01:00 B 0 0
#11 2018-05-20 14:02:00 B 1 1
#12 2018-05-20 14:03:00 B 1 1
#13 2018-05-20 14:04:00 B 0 0
#14 2018-05-20 14:05:00 B 1 2
You asked for a tidyverse solution but if speed is your concern, you might use data.table. The syntax is very similar
library(data.table)
setDT(df)[, spell :=
r <- rle(is.5)
r$values <- cumsum(r$values) * r$values
inverse.rle(r)
, by = group][] # the [] at the end prints the data.table
explanation
When we call
r <- rle(df$is.5)
the result we get is
r
#Run Length Encoding
# lengths: int [1:10] 1 2 1 1 2 1 2 2 1 1
# values : num [1:10] 0 1 0 1 0 1 0 1 0 1
We need to replace values with the cumulative sum where values == 1 while values should remain zero otherwise.
We can achieve this when we multiple cumsum(r$values) with r$values; where the latter is a vector of 0s and 1s.
r$values <- cumsum(r$values) * r$values
r$values
# [1] 0 1 0 2 0 3 0 4 0 5
Finally we call inverse.rle to get back a vector of the same length as is.5.
inverse.rle(r)
# [1] 0 1 1 0 2 0 0 3 0 0 4 4 0 5
We do this for every group.
1
I understand why and how that works, but it'd be nice if you could draw your line of thoughts into the logic. Cheers.
– M-M
2 days ago
1
@M-M Added some explanation. Thanks for the comment.
– markus
2 days ago
add a comment |
One option using rle
library(dplyr)
df %>%
group_by(group) %>%
mutate(
spell =
r <- rle(is.5)
r$values <- cumsum(r$values) * r$values
inverse.rle(r)
)
# A tibble: 14 x 4
# Groups: group [2]
# time group is.5 spell
# <dttm> <chr> <dbl> <dbl>
# 1 2018-10-07 01:39:00 A 0 0
# 2 2018-10-07 01:40:00 A 1 1
# 3 2018-10-07 01:41:00 A 1 1
# 4 2018-10-07 01:42:00 A 0 0
# 5 2018-10-07 01:43:00 A 1 2
# 6 2018-10-07 01:44:00 A 0 0
# 7 2018-10-07 01:45:00 A 0 0
# 8 2018-10-07 01:46:00 A 1 3
# 9 2018-05-20 14:00:00 B 0 0
#10 2018-05-20 14:01:00 B 0 0
#11 2018-05-20 14:02:00 B 1 1
#12 2018-05-20 14:03:00 B 1 1
#13 2018-05-20 14:04:00 B 0 0
#14 2018-05-20 14:05:00 B 1 2
You asked for a tidyverse solution but if speed is your concern, you might use data.table. The syntax is very similar
library(data.table)
setDT(df)[, spell :=
r <- rle(is.5)
r$values <- cumsum(r$values) * r$values
inverse.rle(r)
, by = group][] # the [] at the end prints the data.table
explanation
When we call
r <- rle(df$is.5)
the result we get is
r
#Run Length Encoding
# lengths: int [1:10] 1 2 1 1 2 1 2 2 1 1
# values : num [1:10] 0 1 0 1 0 1 0 1 0 1
We need to replace values with the cumulative sum where values == 1 while values should remain zero otherwise.
We can achieve this when we multiple cumsum(r$values) with r$values; where the latter is a vector of 0s and 1s.
r$values <- cumsum(r$values) * r$values
r$values
# [1] 0 1 0 2 0 3 0 4 0 5
Finally we call inverse.rle to get back a vector of the same length as is.5.
inverse.rle(r)
# [1] 0 1 1 0 2 0 0 3 0 0 4 4 0 5
We do this for every group.
1
I understand why and how that works, but it'd be nice if you could draw your line of thoughts into the logic. Cheers.
– M-M
2 days ago
1
@M-M Added some explanation. Thanks for the comment.
– markus
2 days ago
add a comment |
One option using rle
library(dplyr)
df %>%
group_by(group) %>%
mutate(
spell =
r <- rle(is.5)
r$values <- cumsum(r$values) * r$values
inverse.rle(r)
)
# A tibble: 14 x 4
# Groups: group [2]
# time group is.5 spell
# <dttm> <chr> <dbl> <dbl>
# 1 2018-10-07 01:39:00 A 0 0
# 2 2018-10-07 01:40:00 A 1 1
# 3 2018-10-07 01:41:00 A 1 1
# 4 2018-10-07 01:42:00 A 0 0
# 5 2018-10-07 01:43:00 A 1 2
# 6 2018-10-07 01:44:00 A 0 0
# 7 2018-10-07 01:45:00 A 0 0
# 8 2018-10-07 01:46:00 A 1 3
# 9 2018-05-20 14:00:00 B 0 0
#10 2018-05-20 14:01:00 B 0 0
#11 2018-05-20 14:02:00 B 1 1
#12 2018-05-20 14:03:00 B 1 1
#13 2018-05-20 14:04:00 B 0 0
#14 2018-05-20 14:05:00 B 1 2
You asked for a tidyverse solution but if speed is your concern, you might use data.table. The syntax is very similar
library(data.table)
setDT(df)[, spell :=
r <- rle(is.5)
r$values <- cumsum(r$values) * r$values
inverse.rle(r)
, by = group][] # the [] at the end prints the data.table
explanation
When we call
r <- rle(df$is.5)
the result we get is
r
#Run Length Encoding
# lengths: int [1:10] 1 2 1 1 2 1 2 2 1 1
# values : num [1:10] 0 1 0 1 0 1 0 1 0 1
We need to replace values with the cumulative sum where values == 1 while values should remain zero otherwise.
We can achieve this when we multiple cumsum(r$values) with r$values; where the latter is a vector of 0s and 1s.
r$values <- cumsum(r$values) * r$values
r$values
# [1] 0 1 0 2 0 3 0 4 0 5
Finally we call inverse.rle to get back a vector of the same length as is.5.
inverse.rle(r)
# [1] 0 1 1 0 2 0 0 3 0 0 4 4 0 5
We do this for every group.
One option using rle
library(dplyr)
df %>%
group_by(group) %>%
mutate(
spell =
r <- rle(is.5)
r$values <- cumsum(r$values) * r$values
inverse.rle(r)
)
# A tibble: 14 x 4
# Groups: group [2]
# time group is.5 spell
# <dttm> <chr> <dbl> <dbl>
# 1 2018-10-07 01:39:00 A 0 0
# 2 2018-10-07 01:40:00 A 1 1
# 3 2018-10-07 01:41:00 A 1 1
# 4 2018-10-07 01:42:00 A 0 0
# 5 2018-10-07 01:43:00 A 1 2
# 6 2018-10-07 01:44:00 A 0 0
# 7 2018-10-07 01:45:00 A 0 0
# 8 2018-10-07 01:46:00 A 1 3
# 9 2018-05-20 14:00:00 B 0 0
#10 2018-05-20 14:01:00 B 0 0
#11 2018-05-20 14:02:00 B 1 1
#12 2018-05-20 14:03:00 B 1 1
#13 2018-05-20 14:04:00 B 0 0
#14 2018-05-20 14:05:00 B 1 2
You asked for a tidyverse solution but if speed is your concern, you might use data.table. The syntax is very similar
library(data.table)
setDT(df)[, spell :=
r <- rle(is.5)
r$values <- cumsum(r$values) * r$values
inverse.rle(r)
, by = group][] # the [] at the end prints the data.table
explanation
When we call
r <- rle(df$is.5)
the result we get is
r
#Run Length Encoding
# lengths: int [1:10] 1 2 1 1 2 1 2 2 1 1
# values : num [1:10] 0 1 0 1 0 1 0 1 0 1
We need to replace values with the cumulative sum where values == 1 while values should remain zero otherwise.
We can achieve this when we multiple cumsum(r$values) with r$values; where the latter is a vector of 0s and 1s.
r$values <- cumsum(r$values) * r$values
r$values
# [1] 0 1 0 2 0 3 0 4 0 5
Finally we call inverse.rle to get back a vector of the same length as is.5.
inverse.rle(r)
# [1] 0 1 1 0 2 0 0 3 0 0 4 4 0 5
We do this for every group.
edited 2 days ago
answered Apr 1 at 21:05
markusmarkus
15.1k11336
15.1k11336
1
I understand why and how that works, but it'd be nice if you could draw your line of thoughts into the logic. Cheers.
– M-M
2 days ago
1
@M-M Added some explanation. Thanks for the comment.
– markus
2 days ago
add a comment |
1
I understand why and how that works, but it'd be nice if you could draw your line of thoughts into the logic. Cheers.
– M-M
2 days ago
1
@M-M Added some explanation. Thanks for the comment.
– markus
2 days ago
1
1
I understand why and how that works, but it'd be nice if you could draw your line of thoughts into the logic. Cheers.
– M-M
2 days ago
I understand why and how that works, but it'd be nice if you could draw your line of thoughts into the logic. Cheers.
– M-M
2 days ago
1
1
@M-M Added some explanation. Thanks for the comment.
– markus
2 days ago
@M-M Added some explanation. Thanks for the comment.
– markus
2 days ago
add a comment |
Here's a helper function that can return what you are after
spell_index <- function(time, flag)
change <- time-lag(time)==1 & flag==1 & lag(flag)!=1
cumsum(change) * (flag==1)+0
And you can use it with your data like
library(dplyr)
df %>%
group_by(group) %>%
mutate(
spell = spell_index(time, is.5)
)
Basically the helper functions uses lag() to look for changes. We use cumsum() to increment the number of changes. Then we multiply by a boolean value so zero-out the values you want to be zeroed out.
add a comment |
Here's a helper function that can return what you are after
spell_index <- function(time, flag)
change <- time-lag(time)==1 & flag==1 & lag(flag)!=1
cumsum(change) * (flag==1)+0
And you can use it with your data like
library(dplyr)
df %>%
group_by(group) %>%
mutate(
spell = spell_index(time, is.5)
)
Basically the helper functions uses lag() to look for changes. We use cumsum() to increment the number of changes. Then we multiply by a boolean value so zero-out the values you want to be zeroed out.
add a comment |
Here's a helper function that can return what you are after
spell_index <- function(time, flag)
change <- time-lag(time)==1 & flag==1 & lag(flag)!=1
cumsum(change) * (flag==1)+0
And you can use it with your data like
library(dplyr)
df %>%
group_by(group) %>%
mutate(
spell = spell_index(time, is.5)
)
Basically the helper functions uses lag() to look for changes. We use cumsum() to increment the number of changes. Then we multiply by a boolean value so zero-out the values you want to be zeroed out.
Here's a helper function that can return what you are after
spell_index <- function(time, flag)
change <- time-lag(time)==1 & flag==1 & lag(flag)!=1
cumsum(change) * (flag==1)+0
And you can use it with your data like
library(dplyr)
df %>%
group_by(group) %>%
mutate(
spell = spell_index(time, is.5)
)
Basically the helper functions uses lag() to look for changes. We use cumsum() to increment the number of changes. Then we multiply by a boolean value so zero-out the values you want to be zeroed out.
answered Apr 1 at 20:57
MrFlickMrFlick
125k11141174
125k11141174
add a comment |
add a comment |
Here is one option with rleid from data.table. Convert the 'data.frame' to 'data.table' (setDT(df)), grouped by 'group', get the run-length-id (rleid) of 'is.5' and multiply with the values of 'is.5' so as to replace the ids corresponding to 0s in is.5 to 0, assign it to 'spell', then specify the i with a logical vector to select rows that have 'spell' values not zero, match those values of 'spell' with unique 'spell' and assign it to 'spell'
library(data.table)
setDT(df)[, spell := rleid(is.5) * as.integer(is.5), group
][!!spell, spell := match(spell, unique(spell))][]
# time group is.5 spell
# 1: 2018-10-07 01:39:00 A 0 0
# 2: 2018-10-07 01:40:00 A 1 1
# 3: 2018-10-07 01:41:00 A 1 1
# 4: 2018-10-07 01:42:00 A 0 0
# 5: 2018-10-07 01:43:00 A 1 2
# 6: 2018-10-07 01:44:00 A 0 0
# 7: 2018-10-07 01:45:00 A 0 0
# 8: 2018-10-07 01:46:00 A 1 3
# 9: 2018-05-20 14:00:00 B 0 0
#10: 2018-05-20 14:01:00 B 0 0
#11: 2018-05-20 14:02:00 B 1 1
#12: 2018-05-20 14:03:00 B 1 1
#13: 2018-05-20 14:04:00 B 0 0
#14: 2018-05-20 14:05:00 B 1 2
Or after the first step, use .GRP
df[!!spell, spell := .GRP, spell]
add a comment |
Here is one option with rleid from data.table. Convert the 'data.frame' to 'data.table' (setDT(df)), grouped by 'group', get the run-length-id (rleid) of 'is.5' and multiply with the values of 'is.5' so as to replace the ids corresponding to 0s in is.5 to 0, assign it to 'spell', then specify the i with a logical vector to select rows that have 'spell' values not zero, match those values of 'spell' with unique 'spell' and assign it to 'spell'
library(data.table)
setDT(df)[, spell := rleid(is.5) * as.integer(is.5), group
][!!spell, spell := match(spell, unique(spell))][]
# time group is.5 spell
# 1: 2018-10-07 01:39:00 A 0 0
# 2: 2018-10-07 01:40:00 A 1 1
# 3: 2018-10-07 01:41:00 A 1 1
# 4: 2018-10-07 01:42:00 A 0 0
# 5: 2018-10-07 01:43:00 A 1 2
# 6: 2018-10-07 01:44:00 A 0 0
# 7: 2018-10-07 01:45:00 A 0 0
# 8: 2018-10-07 01:46:00 A 1 3
# 9: 2018-05-20 14:00:00 B 0 0
#10: 2018-05-20 14:01:00 B 0 0
#11: 2018-05-20 14:02:00 B 1 1
#12: 2018-05-20 14:03:00 B 1 1
#13: 2018-05-20 14:04:00 B 0 0
#14: 2018-05-20 14:05:00 B 1 2
Or after the first step, use .GRP
df[!!spell, spell := .GRP, spell]
add a comment |
Here is one option with rleid from data.table. Convert the 'data.frame' to 'data.table' (setDT(df)), grouped by 'group', get the run-length-id (rleid) of 'is.5' and multiply with the values of 'is.5' so as to replace the ids corresponding to 0s in is.5 to 0, assign it to 'spell', then specify the i with a logical vector to select rows that have 'spell' values not zero, match those values of 'spell' with unique 'spell' and assign it to 'spell'
library(data.table)
setDT(df)[, spell := rleid(is.5) * as.integer(is.5), group
][!!spell, spell := match(spell, unique(spell))][]
# time group is.5 spell
# 1: 2018-10-07 01:39:00 A 0 0
# 2: 2018-10-07 01:40:00 A 1 1
# 3: 2018-10-07 01:41:00 A 1 1
# 4: 2018-10-07 01:42:00 A 0 0
# 5: 2018-10-07 01:43:00 A 1 2
# 6: 2018-10-07 01:44:00 A 0 0
# 7: 2018-10-07 01:45:00 A 0 0
# 8: 2018-10-07 01:46:00 A 1 3
# 9: 2018-05-20 14:00:00 B 0 0
#10: 2018-05-20 14:01:00 B 0 0
#11: 2018-05-20 14:02:00 B 1 1
#12: 2018-05-20 14:03:00 B 1 1
#13: 2018-05-20 14:04:00 B 0 0
#14: 2018-05-20 14:05:00 B 1 2
Or after the first step, use .GRP
df[!!spell, spell := .GRP, spell]
Here is one option with rleid from data.table. Convert the 'data.frame' to 'data.table' (setDT(df)), grouped by 'group', get the run-length-id (rleid) of 'is.5' and multiply with the values of 'is.5' so as to replace the ids corresponding to 0s in is.5 to 0, assign it to 'spell', then specify the i with a logical vector to select rows that have 'spell' values not zero, match those values of 'spell' with unique 'spell' and assign it to 'spell'
library(data.table)
setDT(df)[, spell := rleid(is.5) * as.integer(is.5), group
][!!spell, spell := match(spell, unique(spell))][]
# time group is.5 spell
# 1: 2018-10-07 01:39:00 A 0 0
# 2: 2018-10-07 01:40:00 A 1 1
# 3: 2018-10-07 01:41:00 A 1 1
# 4: 2018-10-07 01:42:00 A 0 0
# 5: 2018-10-07 01:43:00 A 1 2
# 6: 2018-10-07 01:44:00 A 0 0
# 7: 2018-10-07 01:45:00 A 0 0
# 8: 2018-10-07 01:46:00 A 1 3
# 9: 2018-05-20 14:00:00 B 0 0
#10: 2018-05-20 14:01:00 B 0 0
#11: 2018-05-20 14:02:00 B 1 1
#12: 2018-05-20 14:03:00 B 1 1
#13: 2018-05-20 14:04:00 B 0 0
#14: 2018-05-20 14:05:00 B 1 2
Or after the first step, use .GRP
df[!!spell, spell := .GRP, spell]
edited 2 days ago
answered 2 days ago
akrunakrun
419k13207283
419k13207283
add a comment |
add a comment |
This works,
The data,
df <- structure(list(time = structure(c(1538876340, 1538876400, 1538876460,1538876520, 1538876580, 1538876640, 1538876700, 1538876760, 1526824800, 1526824860, 1526824920, 1526824980, 1526825040, 1526825100), class = c("POSIXct", "POSIXt"), tzone = "UTC"), group = c("A", "A", "A", "A", "A", "A", "A", "A", "B", "B", "B", "B", "B", "B"), is.5 = c(0, 1, 1, 0, 1, 0, 0, 1, 0, 0, 1, 1, 0, 1)), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, -14L))
We split our data by group,
df2 <- split(df, df$group)
Build a function we can apply to the list,
my_func <- function(dat)
rst <- dat %>%
mutate(change = diff(c(0,is.5))) %>%
mutate(flag = change*abs(is.5)) %>%
mutate(spell = ifelse(is.5 == 0
Then apply it,
l <- lapply(df2, my_func)
We can now turn this list back into a data frame:
do.call(rbind.data.frame, l)
add a comment |
This works,
The data,
df <- structure(list(time = structure(c(1538876340, 1538876400, 1538876460,1538876520, 1538876580, 1538876640, 1538876700, 1538876760, 1526824800, 1526824860, 1526824920, 1526824980, 1526825040, 1526825100), class = c("POSIXct", "POSIXt"), tzone = "UTC"), group = c("A", "A", "A", "A", "A", "A", "A", "A", "B", "B", "B", "B", "B", "B"), is.5 = c(0, 1, 1, 0, 1, 0, 0, 1, 0, 0, 1, 1, 0, 1)), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, -14L))
We split our data by group,
df2 <- split(df, df$group)
Build a function we can apply to the list,
my_func <- function(dat)
rst <- dat %>%
mutate(change = diff(c(0,is.5))) %>%
mutate(flag = change*abs(is.5)) %>%
mutate(spell = ifelse(is.5 == 0
Then apply it,
l <- lapply(df2, my_func)
We can now turn this list back into a data frame:
do.call(rbind.data.frame, l)
add a comment |
This works,
The data,
df <- structure(list(time = structure(c(1538876340, 1538876400, 1538876460,1538876520, 1538876580, 1538876640, 1538876700, 1538876760, 1526824800, 1526824860, 1526824920, 1526824980, 1526825040, 1526825100), class = c("POSIXct", "POSIXt"), tzone = "UTC"), group = c("A", "A", "A", "A", "A", "A", "A", "A", "B", "B", "B", "B", "B", "B"), is.5 = c(0, 1, 1, 0, 1, 0, 0, 1, 0, 0, 1, 1, 0, 1)), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, -14L))
We split our data by group,
df2 <- split(df, df$group)
Build a function we can apply to the list,
my_func <- function(dat)
rst <- dat %>%
mutate(change = diff(c(0,is.5))) %>%
mutate(flag = change*abs(is.5)) %>%
mutate(spell = ifelse(is.5 == 0
Then apply it,
l <- lapply(df2, my_func)
We can now turn this list back into a data frame:
do.call(rbind.data.frame, l)
This works,
The data,
df <- structure(list(time = structure(c(1538876340, 1538876400, 1538876460,1538876520, 1538876580, 1538876640, 1538876700, 1538876760, 1526824800, 1526824860, 1526824920, 1526824980, 1526825040, 1526825100), class = c("POSIXct", "POSIXt"), tzone = "UTC"), group = c("A", "A", "A", "A", "A", "A", "A", "A", "B", "B", "B", "B", "B", "B"), is.5 = c(0, 1, 1, 0, 1, 0, 0, 1, 0, 0, 1, 1, 0, 1)), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, -14L))
We split our data by group,
df2 <- split(df, df$group)
Build a function we can apply to the list,
my_func <- function(dat)
rst <- dat %>%
mutate(change = diff(c(0,is.5))) %>%
mutate(flag = change*abs(is.5)) %>%
mutate(spell = ifelse(is.5 == 0
Then apply it,
l <- lapply(df2, my_func)
We can now turn this list back into a data frame:
do.call(rbind.data.frame, l)
edited Apr 1 at 21:13
answered Apr 1 at 21:02
Hector HaffendenHector Haffenden
604216
604216
add a comment |
add a comment |
One options is using cumsum:
library(dplyr)
df %>% group_by(group) %>% arrange(group, time) %>%
mutate(spell = is.5 * cumsum( c(0,lag(is.5)[-1]) != is.5 & is.5!=0) )
# # A tibble: 14 x 4
# # Groups: group [2]
# time group is.5 spell
# <dttm> <chr> <dbl> <dbl>
# 1 2018-10-07 01:39:00 A 0 0
# 2 2018-10-07 01:40:00 A 1 1
# 3 2018-10-07 01:41:00 A 1 1
# 4 2018-10-07 01:42:00 A 0 0
# 5 2018-10-07 01:43:00 A 1 2
# 6 2018-10-07 01:44:00 A 0 0
# 7 2018-10-07 01:45:00 A 0 0
# 8 2018-10-07 01:46:00 A 1 3
# 9 2018-05-20 14:00:00 B 0 0
# 10 2018-05-20 14:01:00 B 0 0
# 11 2018-05-20 14:02:00 B 1 1
# 12 2018-05-20 14:03:00 B 1 1
# 13 2018-05-20 14:04:00 B 0 0
# 14 2018-05-20 14:05:00 B 1 2
c(0,lag(is.5)[-1]) != is.5 this takes care of assigning a new id (i.e. spell) whenever is.5 changes; but we want to avoid assigning new ones to those rows is.5 equal to 0 and that's why I have the second rule in cumsum function (i.e. (is.5!=0)).
However, that second rule only prevents assigning a new id (adding 1 to the previous id) but it won't set the id to 0. That's why I have multiplied the answer by is.5.
add a comment |
One options is using cumsum:
library(dplyr)
df %>% group_by(group) %>% arrange(group, time) %>%
mutate(spell = is.5 * cumsum( c(0,lag(is.5)[-1]) != is.5 & is.5!=0) )
# # A tibble: 14 x 4
# # Groups: group [2]
# time group is.5 spell
# <dttm> <chr> <dbl> <dbl>
# 1 2018-10-07 01:39:00 A 0 0
# 2 2018-10-07 01:40:00 A 1 1
# 3 2018-10-07 01:41:00 A 1 1
# 4 2018-10-07 01:42:00 A 0 0
# 5 2018-10-07 01:43:00 A 1 2
# 6 2018-10-07 01:44:00 A 0 0
# 7 2018-10-07 01:45:00 A 0 0
# 8 2018-10-07 01:46:00 A 1 3
# 9 2018-05-20 14:00:00 B 0 0
# 10 2018-05-20 14:01:00 B 0 0
# 11 2018-05-20 14:02:00 B 1 1
# 12 2018-05-20 14:03:00 B 1 1
# 13 2018-05-20 14:04:00 B 0 0
# 14 2018-05-20 14:05:00 B 1 2
c(0,lag(is.5)[-1]) != is.5 this takes care of assigning a new id (i.e. spell) whenever is.5 changes; but we want to avoid assigning new ones to those rows is.5 equal to 0 and that's why I have the second rule in cumsum function (i.e. (is.5!=0)).
However, that second rule only prevents assigning a new id (adding 1 to the previous id) but it won't set the id to 0. That's why I have multiplied the answer by is.5.
add a comment |
One options is using cumsum:
library(dplyr)
df %>% group_by(group) %>% arrange(group, time) %>%
mutate(spell = is.5 * cumsum( c(0,lag(is.5)[-1]) != is.5 & is.5!=0) )
# # A tibble: 14 x 4
# # Groups: group [2]
# time group is.5 spell
# <dttm> <chr> <dbl> <dbl>
# 1 2018-10-07 01:39:00 A 0 0
# 2 2018-10-07 01:40:00 A 1 1
# 3 2018-10-07 01:41:00 A 1 1
# 4 2018-10-07 01:42:00 A 0 0
# 5 2018-10-07 01:43:00 A 1 2
# 6 2018-10-07 01:44:00 A 0 0
# 7 2018-10-07 01:45:00 A 0 0
# 8 2018-10-07 01:46:00 A 1 3
# 9 2018-05-20 14:00:00 B 0 0
# 10 2018-05-20 14:01:00 B 0 0
# 11 2018-05-20 14:02:00 B 1 1
# 12 2018-05-20 14:03:00 B 1 1
# 13 2018-05-20 14:04:00 B 0 0
# 14 2018-05-20 14:05:00 B 1 2
c(0,lag(is.5)[-1]) != is.5 this takes care of assigning a new id (i.e. spell) whenever is.5 changes; but we want to avoid assigning new ones to those rows is.5 equal to 0 and that's why I have the second rule in cumsum function (i.e. (is.5!=0)).
However, that second rule only prevents assigning a new id (adding 1 to the previous id) but it won't set the id to 0. That's why I have multiplied the answer by is.5.
One options is using cumsum:
library(dplyr)
df %>% group_by(group) %>% arrange(group, time) %>%
mutate(spell = is.5 * cumsum( c(0,lag(is.5)[-1]) != is.5 & is.5!=0) )
# # A tibble: 14 x 4
# # Groups: group [2]
# time group is.5 spell
# <dttm> <chr> <dbl> <dbl>
# 1 2018-10-07 01:39:00 A 0 0
# 2 2018-10-07 01:40:00 A 1 1
# 3 2018-10-07 01:41:00 A 1 1
# 4 2018-10-07 01:42:00 A 0 0
# 5 2018-10-07 01:43:00 A 1 2
# 6 2018-10-07 01:44:00 A 0 0
# 7 2018-10-07 01:45:00 A 0 0
# 8 2018-10-07 01:46:00 A 1 3
# 9 2018-05-20 14:00:00 B 0 0
# 10 2018-05-20 14:01:00 B 0 0
# 11 2018-05-20 14:02:00 B 1 1
# 12 2018-05-20 14:03:00 B 1 1
# 13 2018-05-20 14:04:00 B 0 0
# 14 2018-05-20 14:05:00 B 1 2
c(0,lag(is.5)[-1]) != is.5 this takes care of assigning a new id (i.e. spell) whenever is.5 changes; but we want to avoid assigning new ones to those rows is.5 equal to 0 and that's why I have the second rule in cumsum function (i.e. (is.5!=0)).
However, that second rule only prevents assigning a new id (adding 1 to the previous id) but it won't set the id to 0. That's why I have multiplied the answer by is.5.
answered 2 days ago
M-MM-M
7,19962146
7,19962146
add a comment |
add a comment |
A somehow different possibility (not involving cumsum()) could be:
df %>%
group_by(group) %>%
mutate(spell = with(rle(is.5), rep(seq_along(lengths), lengths))) %>%
group_by(group, is.5) %>%
mutate(spell = dense_rank(spell)) %>%
ungroup() %>%
mutate(spell = ifelse(is.5 == 0, 0, spell))
time group is.5 spell
<dttm> <chr> <dbl> <dbl>
1 2018-10-07 01:39:00 A 0 0
2 2018-10-07 01:40:00 A 1 1
3 2018-10-07 01:41:00 A 1 1
4 2018-10-07 01:42:00 A 0 0
5 2018-10-07 01:43:00 A 1 2
6 2018-10-07 01:44:00 A 0 0
7 2018-10-07 01:45:00 A 0 0
8 2018-10-07 01:46:00 A 1 3
9 2018-05-20 14:00:00 B 0 0
10 2018-05-20 14:01:00 B 0 0
11 2018-05-20 14:02:00 B 1 1
12 2018-05-20 14:03:00 B 1 1
13 2018-05-20 14:04:00 B 0 0
14 2018-05-20 14:05:00 B 1 2
Here it, first, groups by "group" and then gets the run-length-ID of "is.5". Second, it groups by "group" and "is.5" and ranks the values on the run-length-ID. Finally, it assigns 0 to rows where "is.5" == 0.
add a comment |
A somehow different possibility (not involving cumsum()) could be:
df %>%
group_by(group) %>%
mutate(spell = with(rle(is.5), rep(seq_along(lengths), lengths))) %>%
group_by(group, is.5) %>%
mutate(spell = dense_rank(spell)) %>%
ungroup() %>%
mutate(spell = ifelse(is.5 == 0, 0, spell))
time group is.5 spell
<dttm> <chr> <dbl> <dbl>
1 2018-10-07 01:39:00 A 0 0
2 2018-10-07 01:40:00 A 1 1
3 2018-10-07 01:41:00 A 1 1
4 2018-10-07 01:42:00 A 0 0
5 2018-10-07 01:43:00 A 1 2
6 2018-10-07 01:44:00 A 0 0
7 2018-10-07 01:45:00 A 0 0
8 2018-10-07 01:46:00 A 1 3
9 2018-05-20 14:00:00 B 0 0
10 2018-05-20 14:01:00 B 0 0
11 2018-05-20 14:02:00 B 1 1
12 2018-05-20 14:03:00 B 1 1
13 2018-05-20 14:04:00 B 0 0
14 2018-05-20 14:05:00 B 1 2
Here it, first, groups by "group" and then gets the run-length-ID of "is.5". Second, it groups by "group" and "is.5" and ranks the values on the run-length-ID. Finally, it assigns 0 to rows where "is.5" == 0.
add a comment |
A somehow different possibility (not involving cumsum()) could be:
df %>%
group_by(group) %>%
mutate(spell = with(rle(is.5), rep(seq_along(lengths), lengths))) %>%
group_by(group, is.5) %>%
mutate(spell = dense_rank(spell)) %>%
ungroup() %>%
mutate(spell = ifelse(is.5 == 0, 0, spell))
time group is.5 spell
<dttm> <chr> <dbl> <dbl>
1 2018-10-07 01:39:00 A 0 0
2 2018-10-07 01:40:00 A 1 1
3 2018-10-07 01:41:00 A 1 1
4 2018-10-07 01:42:00 A 0 0
5 2018-10-07 01:43:00 A 1 2
6 2018-10-07 01:44:00 A 0 0
7 2018-10-07 01:45:00 A 0 0
8 2018-10-07 01:46:00 A 1 3
9 2018-05-20 14:00:00 B 0 0
10 2018-05-20 14:01:00 B 0 0
11 2018-05-20 14:02:00 B 1 1
12 2018-05-20 14:03:00 B 1 1
13 2018-05-20 14:04:00 B 0 0
14 2018-05-20 14:05:00 B 1 2
Here it, first, groups by "group" and then gets the run-length-ID of "is.5". Second, it groups by "group" and "is.5" and ranks the values on the run-length-ID. Finally, it assigns 0 to rows where "is.5" == 0.
A somehow different possibility (not involving cumsum()) could be:
df %>%
group_by(group) %>%
mutate(spell = with(rle(is.5), rep(seq_along(lengths), lengths))) %>%
group_by(group, is.5) %>%
mutate(spell = dense_rank(spell)) %>%
ungroup() %>%
mutate(spell = ifelse(is.5 == 0, 0, spell))
time group is.5 spell
<dttm> <chr> <dbl> <dbl>
1 2018-10-07 01:39:00 A 0 0
2 2018-10-07 01:40:00 A 1 1
3 2018-10-07 01:41:00 A 1 1
4 2018-10-07 01:42:00 A 0 0
5 2018-10-07 01:43:00 A 1 2
6 2018-10-07 01:44:00 A 0 0
7 2018-10-07 01:45:00 A 0 0
8 2018-10-07 01:46:00 A 1 3
9 2018-05-20 14:00:00 B 0 0
10 2018-05-20 14:01:00 B 0 0
11 2018-05-20 14:02:00 B 1 1
12 2018-05-20 14:03:00 B 1 1
13 2018-05-20 14:04:00 B 0 0
14 2018-05-20 14:05:00 B 1 2
Here it, first, groups by "group" and then gets the run-length-ID of "is.5". Second, it groups by "group" and "is.5" and ranks the values on the run-length-ID. Finally, it assigns 0 to rows where "is.5" == 0.
edited 2 days ago
answered Apr 1 at 21:37
tmfmnktmfmnk
3,6561516
3,6561516
add a comment |
add a comment |
Thanks for contributing an answer to Stack Overflow!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f55463310%2fidentify-and-count-spells-distinctive-events-within-each-group%23new-answer', 'question_page');
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
5
For someone who is not familiar with how the
spellis computed, can you share a formula or description?– nsinghs
Apr 1 at 20:55
@nsinghs I think they mean "hospital spell"
– zx8754
Apr 1 at 21:29
Curious for the results if you timed my answer? You should also consider accepting the best answer.
– Hector Haffenden
2 days ago