How to delete two lines after every three lines in a file that contains a very large number of lines? [duplicate]
This question already has an answer here:
How to print lines number 15 and 25 out of each 50 lines?
4 answers
For example, if I have:
1st line (keep)
2nd line (keep)
3rd line (keep)
4rth lines (delete)
5th (del)
6th (keep)
7nth (keep)
8th lines (keep)
9th (del)
10th (del)
11th (keep)
12th (keep)
13th (keep)
14th (del)
15th (del)
etc....
bash shell awk sed
asked 2 days ago by Jaguar Jom (new contributor); edited yesterday by Prvt_Yadv
marked as duplicate by Sundeep, jimmij, Prvt_Yadv, forcefsck, BitsOfNix 21 hours ago
This question has been asked before and already has an answer. If those answers do not fully address your question, please ask a new question.
increment a line counter (zero-indexed) for each line read, print when (line counter modulo 5) < 3 – ChuckCottrill, 2 days ago
can you please clarify more? – Jaguar Jom, 2 days ago
the duplicate is worded slightly differently, but it is the same problem looked at in a different way. This question would be "print lines 1, 2, 3 out of each 5 lines", for example: seq 15 | awk 'BEGIN{a[1]; a[2]; a[3]} NR % 5 in a' and seq 15 | sed -n 'p;n;p;n;p;n;n' – Sundeep, 2 days ago
also, the sed version above might be faster than the awk one for large files – Sundeep, 2 days ago
6 Answers
Try:
awk '(NR-1)%5<3' file
For example:
$ awk '(NR-1)%5<3' file
1st line (keep)
2nd line (keep)
3rd line (keep)
6th (keep)
7nth (keep)
8th lines (keep)
11th (keep)
12th (keep)
13th (keep)
How it works
The condition (NR-1)%5<3 tells awk to print any line for which (NR-1)%5<3 is true. In awk, NR is the line number, with the first line counting as 1. For every five lines in the file, the condition is true for the first three.
answered 2 days ago by John1024; edited 2 days ago by Kusalananda♦
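To see why the test selects the first three lines of each group of five, it can help to print the remainder next to each line number. This is a minimal sketch, assuming `seq` is available to generate sample input; the `show_remainders` function name is only for illustration.

```shell
# show_remainders N
# For line numbers 1..N, print NR, (NR-1)%5, and whether the line is kept.
show_remainders() {
    seq 1 "$1" | awk '{ r = (NR-1) % 5; print NR, r, (r < 3 ? "keep" : "delete") }'
}

show_remainders 10
```

The remainder runs 0, 1, 2, 3, 4 and then starts again at 0 on line 6, so the lines with remainders 0 to 2 are the ones that pass the `< 3` test.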
Thank you, but I found that this script starts deleting after the 3rd line, which is okay, but on the next round it counts again from the beginning, where the line order has changed and decreased by two lines, so I can't delete the lines I want. Any suggestions? – Jaguar Jom, yesterday
@JaguarJom OK. I showed in the answer the output from your sample input data. Is that not the output that you wanted? Or is it that, when you run the code, you get something different? – John1024, yesterday
Yes, I actually got a different result. – Jaguar Jom, 22 hours ago
To check, I just copied and pasted your input and my command, ran it, and I get the same result as shown in the answer. Are you copying and pasting the same things? Have you modified the code? Are you testing the code on different input data? Can you use pastebin.com or similar to show me exactly what you are seeing? – John1024, 22 hours ago
@JaguarJom In another comment, you hinted that your pattern is six lines long, not five, and you want to delete the last two of every six lines. If that is the case, use awk '(NR-1)%6<4' file. – John1024, 21 hours ago
A simple command is:
awk '{if((NR-1) % 5<=2)print $0}' file
It prints only the first 3 lines in each sequence of 5 lines. (NR-1)%5 cycles through the values 0 1 2 3 4, and for the first 3 lines of each group that value is less than or equal to 2, so only those lines are printed.
I have file with contents:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
The output is:
1
2
3
6
7
8
11
12
13
Or, as suggested in the comments, you can use:
awk '(NR - 1) % 5 <= 2' file
answered 2 days ago by Prvt_Yadv
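The same test works for other group sizes if the two numbers are passed in as awk variables. This is a minimal sketch: `keep_first` is a hypothetical helper name, and `keep` and `period` are variable names chosen here for illustration.

```shell
# keep_first KEEP PERIOD [FILE...]
# Print the first KEEP lines of every PERIOD lines and drop the rest.
# With no FILE arguments, awk reads standard input.
keep_first() {
    keep=$1
    period=$2
    shift 2
    awk -v keep="$keep" -v period="$period" '(NR - 1) % period < keep' "$@"
}

# Keep 3 of every 5 lines, as in the question.
seq 1 15 | keep_first 3 5
```

Calling `keep_first 4 6 file` would keep four of every six lines, which is the six-line variant raised in the comments on another answer.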
Or, with idiomatic use of awk syntax: awk '(NR - 1) % 5 <= 2' file – Kusalananda♦, 2 days ago
Thanks, I didn't know that. – Prvt_Yadv, 2 days ago
awk '{if((NR-1) % 5<=2)print $0}' file Thank you, this works very well for me, but after increasing the line count by 1: awk '{if((NR-1) % 6<=2)print $0}' file – Jaguar Jom, yesterday
Basically, you want something like 'Fizz-Buzz' in awk:
awk '{ if (i++%5 < 3) print $0; }'
To show that this works:
for x in 1 2 3 4 5 6 7 8 9 10 ; do echo $x; done |
awk '{ if (i++%5 < 3) print $0; }'
When your file is named 'mybigfile.csv':
awk '{ if (i++%5 < 3) print $0; }' < mybigfile.csv > mybigfile-123.csv
answered 2 days ago by ChuckCottrill
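The last command writes the result to a second file. If the original file should be replaced instead, one common approach is to write to a temporary file and rename it only when awk succeeds. This is a minimal sketch, assuming `mktemp` is available; `filter_in_place` is a hypothetical helper name.

```shell
# filter_in_place FILE
# Replace FILE with a copy that keeps the first 3 of every 5 lines.
filter_in_place() {
    file=$1
    # Create the temporary file next to the original so the rename
    # stays on the same filesystem.
    tmp=$(mktemp "${file}.XXXXXX") || return 1
    if awk '(NR - 1) % 5 < 3' "$file" > "$tmp"; then
        mv "$tmp" "$file"
    else
        rm -f "$tmp"
        return 1
    fi
}
```

The replacement file gets the permissions of the temporary file, so adjust them afterwards if that matters.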
You could use NR, or just rely on i defaulting to zero :-) (code golf) – ChuckCottrill, 2 days ago
A generic solution for masking out a particular pattern of lines from a file:
#!/bin/sh
# The pattern is given on the command line.
pattern=$1
# The period is simply the length of the pattern.
period=${#pattern}
# Use bc to convert the binary pattern to an integer.
mask=$( printf 'ibase=2; %s\n' "$pattern" | bc )
awk -v mask="$mask" -v period="$period" '
BEGIN { p = lshift(1, period-1) }
and(rshift(p, (FNR-1) % period), mask)'
This relies on awk implementing the non-standard functions and() (bitwise AND), rshift() and lshift() (bitwise right and left shift), which both GNU awk and some BSD implementations of awk do, but mawk does not.
This takes a pattern, which is a binary number representing both the cyclic period and which lines within each period should be kept or masked out. A 1 means "keep" and a 0 means "delete".
For example, the pattern that should be applied in your question is 11100, which means "for each set of five lines, keep the first three and delete the others".
Using 01001000 would delete all but the 2nd and 5th lines in every 8 lines.
The awk program could also be written without the BEGIN block as
and(lshift(1, (period-1) - (FNR-1) % period), mask)
Left-shifting 1 by (period-1) - (FNR-1) % period positions is the same as calculating 2 to that power, but I'm using lshift() since awk does its arithmetic in floating point rather than in exact integer arithmetic.
Since the code relies on the binary representation of the pattern, very long patterns may not work well.
Testing:
Removing the lines you want to remove:
$ sh script.sh 11100 <file
1st line (keep)
2nd line (keep)
3rd line (keep)
6th (keep)
7nth (keep)
8th lines (keep)
11th (keep)
12th (keep)
13th (keep)
Inverting the pattern:
$ sh script.sh 00011 <file
4rth lines (delete)
5th (del)
9th (del)
10th (del)
14th (del)
15th (del)
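Where the bitwise functions are not available (for example with mawk), the same keep/delete pattern can be applied by indexing into the pattern string instead of converting it to a number. This is a different technique from the script above, not the answer author's method. It is a minimal sketch, with `mask_lines` as a hypothetical function name.

```shell
# mask_lines PATTERN [FILE...]
# PATTERN is a string of 1s (keep) and 0s (delete), for example 11100.
mask_lines() {
    pattern=$1
    shift
    awk -v pattern="$pattern" '
        BEGIN { period = length(pattern) }
        # Pick the character for this line position within the period.
        substr(pattern, (FNR - 1) % period + 1, 1) == "1"
    ' "$@"
}

seq 1 10 | mask_lines 11100
```

Because the pattern stays a string, its length is not limited by the precision of awk's numbers.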
This can be solved using GNU sed:
sed '4~5,5~5d' file
Note that this uses a GNU-specific extension to the sed standard, and thus doesn't work with e.g. BSD sed on macOS. However, GNU sed can be installed on macOS using brew, after which it can be used as gsed. On most Linux distributions, GNU sed is the default.
This prints every line that does not fall in the fourth to fifth line of every five lines. For a clearer example: sed '3~10,6~10d' will select lines 1, 2, 7, 8, 9, 10 of every group of 10 lines by deleting lines 3 to 6.
The top-voted answer suggests using awk '(NR-1)%5<3'. On my machine, on a file containing the numbers 1 to 2 million, this takes about 0.6 seconds, while the sed solution in this answer takes about 0.35 seconds. This is reasonable, since sed is in general a simpler tool, and can thus work faster than the more complicated, but more full-featured, awk.
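Since `first~step` is a GNU extension, a sed without it needs a different script. The `p;n` sequence mentioned in the comments on the question should work with any POSIX sed. The sketch below checks that it selects the same lines as the awk command; the function names are hypothetical, and `seq` is assumed to be available.

```shell
# Keep lines 1-3 of every 5 using only standard sed commands:
# print three lines, then read past two lines without printing.
keep3of5_sed() {
    sed -n 'p;n;p;n;p;n;n' "$@"
}

# The awk command from the top-voted answer, for comparison.
keep3of5_awk() {
    awk '(NR-1)%5<3' "$@"
}

# Compare the two on generated input; prints "same" if they agree.
if [ "$(seq 1 1000 | keep3of5_sed)" = "$(seq 1 1000 | keep3of5_awk)" ]; then
    echo same
else
    echo different
fi
```

To compare speed on your own data, prefix each command with `time`. The figures will depend on the machine and on the sed and awk implementations installed.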
+1 ... or 4~5{N;d;} – steeldriver, 2 days ago
I tried the command below and it worked fine:
for((i=1;i<=20;i++)); do j=$(($i+2)); sed -n ''$i','$j'p' filename;i=$(($j+2)); done
Output:
1st line (keep)
2nd line (keep)
3rd line (keep)
6th (keep)
7nth (keep)
8th lines (keep)
11th (keep)
12th (keep)
13th (keep)
That is nice, but you have to know in advance how many lines you have, and you're looping back from the beginning on each round. It cannot be used on a stream, and it gets more inefficient the bigger the data gets, so since the OP says the number of lines is very large, this is not the best solution. – Law29, 2 days ago
6 Answers
6
active
oldest
votes
6 Answers
6
active
oldest
votes
active
oldest
votes
active
oldest
votes
Try:
awk '(NR-1)%5<3' file
For example:
$ awk '(NR-1)%5<3' file
1st line (keep)
2nd line (keep)
3rd line (keep)
6th (keep)
7nth (keep)
8th lines (keep)
11th (keep)
12th (keep)
13th (keep)
How it works
The command (NR-1)%5<3
tells awk
to print any line for which (NR-1)%5<3
is true. In awk
, NR
is the line number with the first line counting as 1
. For every five lines in the file, that statement will be true for the first three.
Thank you, but I found this script will start to delete after the 3rd line which is okay, but for the next turn it will count again from the beginning where the lines order are changed and decreased by two lines so I can't delete the lines I want . any suggestion
– Jaguar Jom
yesterday
@JaguarJom OK. I showed in the answer the output from your sample input data. Is that not the output that you wanted? Or, is it that, when you run the code, you get something different?
– John1024
yesterday
yes actually i got different result actually,
– Jaguar Jom
22 hours ago
To check, I just copied-and-pasted your input and copied-and-pasted my command and run it and I get the same result as shown in the answer. Are you copying-and-pasting the same things? Have you modified the code? Are you testing the code on different input data? Can you use pastebin.com or similar to show me exactly what you are seeing?
– John1024
22 hours ago
@JaguarJom In another comment, you hinted that your pattern is six lines long, not five, and you want to delete the last two of every six lines. If that is the case, useawk '(NR-1)%6<4' file
.
– John1024
21 hours ago
|
show 2 more comments
Try:
awk '(NR-1)%5<3' file
For example:
$ awk '(NR-1)%5<3' file
1st line (keep)
2nd line (keep)
3rd line (keep)
6th (keep)
7nth (keep)
8th lines (keep)
11th (keep)
12th (keep)
13th (keep)
How it works
The command (NR-1)%5<3
tells awk
to print any line for which (NR-1)%5<3
is true. In awk
, NR
is the line number with the first line counting as 1
. For every five lines in the file, that statement will be true for the first three.
Thank you, but I found this script will start to delete after the 3rd line which is okay, but for the next turn it will count again from the beginning where the lines order are changed and decreased by two lines so I can't delete the lines I want . any suggestion
– Jaguar Jom
yesterday
@JaguarJom OK. I showed in the answer the output from your sample input data. Is that not the output that you wanted? Or, is it that, when you run the code, you get something different?
– John1024
yesterday
yes actually i got different result actually,
– Jaguar Jom
22 hours ago
To check, I just copied-and-pasted your input and copied-and-pasted my command and run it and I get the same result as shown in the answer. Are you copying-and-pasting the same things? Have you modified the code? Are you testing the code on different input data? Can you use pastebin.com or similar to show me exactly what you are seeing?
– John1024
22 hours ago
@JaguarJom In another comment, you hinted that your pattern is six lines long, not five, and you want to delete the last two of every six lines. If that is the case, useawk '(NR-1)%6<4' file
.
– John1024
21 hours ago
|
show 2 more comments
Try:
awk '(NR-1)%5<3' file
For example:
$ awk '(NR-1)%5<3' file
1st line (keep)
2nd line (keep)
3rd line (keep)
6th (keep)
7nth (keep)
8th lines (keep)
11th (keep)
12th (keep)
13th (keep)
How it works
The command (NR-1)%5<3
tells awk
to print any line for which (NR-1)%5<3
is true. In awk
, NR
is the line number with the first line counting as 1
. For every five lines in the file, that statement will be true for the first three.
Try:
awk '(NR-1)%5<3' file
For example:
$ awk '(NR-1)%5<3' file
1st line (keep)
2nd line (keep)
3rd line (keep)
6th (keep)
7nth (keep)
8th lines (keep)
11th (keep)
12th (keep)
13th (keep)
How it works
The command (NR-1)%5<3
tells awk
to print any line for which (NR-1)%5<3
is true. In awk
, NR
is the line number with the first line counting as 1
. For every five lines in the file, that statement will be true for the first three.
edited 2 days ago
Kusalananda♦
139k17259430
139k17259430
answered 2 days ago
John1024John1024
48.2k5113128
48.2k5113128
Thank you, but I found this script will start to delete after the 3rd line which is okay, but for the next turn it will count again from the beginning where the lines order are changed and decreased by two lines so I can't delete the lines I want . any suggestion
– Jaguar Jom
yesterday
@JaguarJom OK. I showed in the answer the output from your sample input data. Is that not the output that you wanted? Or, is it that, when you run the code, you get something different?
– John1024
yesterday
yes actually i got different result actually,
– Jaguar Jom
22 hours ago
To check, I just copied-and-pasted your input and copied-and-pasted my command and run it and I get the same result as shown in the answer. Are you copying-and-pasting the same things? Have you modified the code? Are you testing the code on different input data? Can you use pastebin.com or similar to show me exactly what you are seeing?
– John1024
22 hours ago
@JaguarJom In another comment, you hinted that your pattern is six lines long, not five, and you want to delete the last two of every six lines. If that is the case, useawk '(NR-1)%6<4' file
.
– John1024
21 hours ago
|
show 2 more comments
Thank you, but I found this script will start to delete after the 3rd line which is okay, but for the next turn it will count again from the beginning where the lines order are changed and decreased by two lines so I can't delete the lines I want . any suggestion
– Jaguar Jom
yesterday
@JaguarJom OK. I showed in the answer the output from your sample input data. Is that not the output that you wanted? Or, is it that, when you run the code, you get something different?
– John1024
yesterday
yes actually i got different result actually,
– Jaguar Jom
22 hours ago
To check, I just copied-and-pasted your input and copied-and-pasted my command and run it and I get the same result as shown in the answer. Are you copying-and-pasting the same things? Have you modified the code? Are you testing the code on different input data? Can you use pastebin.com or similar to show me exactly what you are seeing?
– John1024
22 hours ago
@JaguarJom In another comment, you hinted that your pattern is six lines long, not five, and you want to delete the last two of every six lines. If that is the case, useawk '(NR-1)%6<4' file
.
– John1024
21 hours ago
Thank you, but I found this script will start to delete after the 3rd line which is okay, but for the next turn it will count again from the beginning where the lines order are changed and decreased by two lines so I can't delete the lines I want . any suggestion
– Jaguar Jom
yesterday
Thank you, but I found this script will start to delete after the 3rd line which is okay, but for the next turn it will count again from the beginning where the lines order are changed and decreased by two lines so I can't delete the lines I want . any suggestion
– Jaguar Jom
yesterday
@JaguarJom OK. I showed in the answer the output from your sample input data. Is that not the output that you wanted? Or, is it that, when you run the code, you get something different?
– John1024
yesterday
@JaguarJom OK. I showed in the answer the output from your sample input data. Is that not the output that you wanted? Or, is it that, when you run the code, you get something different?
– John1024
yesterday
yes actually i got different result actually,
– Jaguar Jom
22 hours ago
yes actually i got different result actually,
– Jaguar Jom
22 hours ago
To check, I just copied-and-pasted your input and copied-and-pasted my command and run it and I get the same result as shown in the answer. Are you copying-and-pasting the same things? Have you modified the code? Are you testing the code on different input data? Can you use pastebin.com or similar to show me exactly what you are seeing?
– John1024
22 hours ago
To check, I just copied-and-pasted your input and copied-and-pasted my command and run it and I get the same result as shown in the answer. Are you copying-and-pasting the same things? Have you modified the code? Are you testing the code on different input data? Can you use pastebin.com or similar to show me exactly what you are seeing?
– John1024
22 hours ago
@JaguarJom In another comment, you hinted that your pattern is six lines long, not five, and you want to delete the last two of every six lines. If that is the case, use
awk '(NR-1)%6<4' file
.– John1024
21 hours ago
@JaguarJom In another comment, you hinted that your pattern is six lines long, not five, and you want to delete the last two of every six lines. If that is the case, use
awk '(NR-1)%6<4' file
.– John1024
21 hours ago
|
show 2 more comments
A simple command is:
awk 'if((NR-1) % 5<=2)print $0' file
It will only print first 3 lines in sequence of 5 lines. Because (NR-1)%5
will give output like 0 1 2 3 4
, and first 3 lines are less than equal to 2. So it will only print them.
I have file with contents:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
The output is:
1
2
3
6
7
8
11
12
13
Or as suggested in comments you can use:
awk '(NR - 1) % 5 <= 2' file
3
Or, with idiomatic use ofawk
syntax:awk '(NR - 1) % 5 <= 2' file
– Kusalananda♦
2 days ago
Thanks I didnt know it.
– Prvt_Yadv
2 days ago
awk 'if((NR-1) % 5<=2)print $0' file Thank you, this work very good for me but increasing 1 to line awk 'if((NR-1) % 6<=2)print $0' file
– Jaguar Jom
yesterday
add a comment |
A simple command is:
awk 'if((NR-1) % 5<=2)print $0' file
It will only print first 3 lines in sequence of 5 lines. Because (NR-1)%5
will give output like 0 1 2 3 4
, and first 3 lines are less than equal to 2. So it will only print them.
I have file with contents:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
The output is:
1
2
3
6
7
8
11
12
13
Or as suggested in comments you can use:
awk '(NR - 1) % 5 <= 2' file
3
Or, with idiomatic use ofawk
syntax:awk '(NR - 1) % 5 <= 2' file
– Kusalananda♦
2 days ago
Thanks I didnt know it.
– Prvt_Yadv
2 days ago
awk 'if((NR-1) % 5<=2)print $0' file Thank you, this work very good for me but increasing 1 to line awk 'if((NR-1) % 6<=2)print $0' file
– Jaguar Jom
yesterday
add a comment |
A simple command is:
awk 'if((NR-1) % 5<=2)print $0' file
It will only print first 3 lines in sequence of 5 lines. Because (NR-1)%5
will give output like 0 1 2 3 4
, and first 3 lines are less than equal to 2. So it will only print them.
I have file with contents:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
The output is:
1
2
3
6
7
8
11
12
13
Or as suggested in comments you can use:
awk '(NR - 1) % 5 <= 2' file
A simple command is:
awk 'if((NR-1) % 5<=2)print $0' file
It will only print first 3 lines in sequence of 5 lines. Because (NR-1)%5
will give output like 0 1 2 3 4
, and first 3 lines are less than equal to 2. So it will only print them.
I have file with contents:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
The output is:
1
2
3
6
7
8
11
12
13
Or as suggested in comments you can use:
awk '(NR - 1) % 5 <= 2' file
edited 2 days ago
answered 2 days ago
Prvt_YadvPrvt_Yadv
3,06031329
3,06031329
3
Or, with idiomatic use ofawk
syntax:awk '(NR - 1) % 5 <= 2' file
– Kusalananda♦
2 days ago
Thanks I didnt know it.
– Prvt_Yadv
2 days ago
awk 'if((NR-1) % 5<=2)print $0' file Thank you, this work very good for me but increasing 1 to line awk 'if((NR-1) % 6<=2)print $0' file
– Jaguar Jom
yesterday
add a comment |
3
Or, with idiomatic use ofawk
syntax:awk '(NR - 1) % 5 <= 2' file
– Kusalananda♦
2 days ago
Thanks I didnt know it.
– Prvt_Yadv
2 days ago
awk 'if((NR-1) % 5<=2)print $0' file Thank you, this work very good for me but increasing 1 to line awk 'if((NR-1) % 6<=2)print $0' file
– Jaguar Jom
yesterday
3
3
Or, with idiomatic use of
awk
syntax: awk '(NR - 1) % 5 <= 2' file
– Kusalananda♦
2 days ago
Or, with idiomatic use of
awk
syntax: awk '(NR - 1) % 5 <= 2' file
– Kusalananda♦
2 days ago
Thanks I didnt know it.
– Prvt_Yadv
2 days ago
Thanks I didnt know it.
– Prvt_Yadv
2 days ago
awk 'if((NR-1) % 5<=2)print $0' file Thank you, this work very good for me but increasing 1 to line awk 'if((NR-1) % 6<=2)print $0' file
– Jaguar Jom
yesterday
awk 'if((NR-1) % 5<=2)print $0' file Thank you, this work very good for me but increasing 1 to line awk 'if((NR-1) % 6<=2)print $0' file
– Jaguar Jom
yesterday
add a comment |
Basically, you want something like 'Fizz-Buzz' in awk ...
awk ' if (i++%5 < 3) print $0;'
To show this works...
for x in 1 2 3 4 5 6 7 8 9 10 ; do echo $x; done |
awk ' if (i++%5 < 3) print $0;'
When your file is named, 'mybigfile.csv',
awk ' if (i++%5 < 3) print $0;' < mybigfile.csv > mybigfile-123.csv
You could use NR, or just rely on i defaulting to zero :-) (code golf)
– ChuckCottrill
2 days ago
add a comment |
Basically, you want something like 'Fizz-Buzz' in awk ...
awk ' if (i++%5 < 3) print $0;'
To show this works...
for x in 1 2 3 4 5 6 7 8 9 10 ; do echo $x; done |
awk ' if (i++%5 < 3) print $0;'
When your file is named, 'mybigfile.csv',
awk ' if (i++%5 < 3) print $0;' < mybigfile.csv > mybigfile-123.csv
You could use NR, or just rely on i defaulting to zero :-) (code golf)
– ChuckCottrill
2 days ago
add a comment |
Basically, you want something like 'Fizz-Buzz' in awk ...
awk ' if (i++%5 < 3) print $0;'
To show this works...
for x in 1 2 3 4 5 6 7 8 9 10 ; do echo $x; done |
awk ' if (i++%5 < 3) print $0;'
When your file is named, 'mybigfile.csv',
awk ' if (i++%5 < 3) print $0;' < mybigfile.csv > mybigfile-123.csv
Basically, you want something like 'Fizz-Buzz' in awk ...
awk ' if (i++%5 < 3) print $0;'
To show this works...
for x in 1 2 3 4 5 6 7 8 9 10 ; do echo $x; done |
awk ' if (i++%5 < 3) print $0;'
When your file is named, 'mybigfile.csv',
awk ' if (i++%5 < 3) print $0;' < mybigfile.csv > mybigfile-123.csv
answered 2 days ago
ChuckCottrillChuckCottrill
732814
732814
You could use NR, or just rely on i defaulting to zero :-) (code golf)
– ChuckCottrill
2 days ago
add a comment |
You could use NR, or just rely on i defaulting to zero :-) (code golf)
– ChuckCottrill
2 days ago
You could use NR, or just rely on i defaulting to zero :-) (code golf)
– ChuckCottrill
2 days ago
You could use NR, or just rely on i defaulting to zero :-) (code golf)
– ChuckCottrill
2 days ago
add a comment |
A generic solution for masking out a particular pattern of lines from a file:
#!/bin/sh
# The pattern is given on the command line.
pattern=$1
# The period is simply the length of the pattern.
period=$#pattern
# Use bc to convert the binary pattern to an integer.
mask=$( printf 'ibase=2; %sn' "$pattern" | bc )
awk -v mask="$mask" -v period="$period" '
BEGIN p = lshift(1, period-1)
and(rshift(p, (FNR-1) % period), mask)'
This relies on awk
implementing the non-standard functions and()
(bitwise AND), rshift()
and lshift()
(bitwise right and left shift), which both GNU awk
and some BSD implementations of awk
does, but not mawk
.
This takes a pattern, which is a binary number representing both the cyclic period and what lines within each period should be kept or masked out. A 1
means "keep" and a 0
means "delete".
For example: The pattern of line that should be applied in your question is 11100
, which means "for each set of five lines, keep the first three and delete the others".
Using 01001000
would delete all but the 2nd and 5th lines in every 8 lines.
The awk
program could also be written without the BEGIN
block as
and(lshift(1, (period-1) - (FNR-1) % period), mask)
Left-shifting 1 by (period-1) - (FNR-1) % period
positions is the same as calculating 2 to that power, but I'm using lshift()
since awk
does its arithmetics using floating point operations rather than in exact integer arithmetics.
Since the code relies on the binary representation of the pattern, very long patterns may not work well.
Testing:
Removing the lines you want to remove:
$ sh script.sh 11100 <file
1st line (keep)
2nd line (keep)
3rd line (keep)
6th (keep)
7nth (keep)
8th lines (keep)
11th (keep)
12th (keep)
13th (keep)
Inverting the pattern:
$ sh script.sh 00011 <file
4rth lines (delete)
5th (del)
9th (del)
10th (del)
14th (del)
15th (del)
edited 2 days ago
answered 2 days ago
Kusalananda♦
This can be solved using GNU sed:
sed '4~5,5~5d' file
Note that this uses a GNU-specific extension to the sed standard, and thus doesn't work with e.g. BSD sed on macOS. However, GNU sed can be installed on macOS using brew, after which it can be used as gsed. On most Linux distributions, GNU sed is the default.
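Where GNU sed is not available, a portable sketch along the lines of the n-based command suggested in the comments on the question may do instead. It uses only standard sed commands. The function name keep3of5 is invented for the example.

```shell
#!/bin/sh
# Hypothetical wrapper around a POSIX-only sed script:
# print three lines, then read past two, and repeat.
keep3of5() {
    sed -n 'p;n;p;n;p;n;n'
}

seq 15 | keep3of5
```

Unlike the address form, this hard-codes the three-then-two rhythm in the command sequence, so a different pattern needs a different string of p and n commands.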
This prints every line that does not fall in the fourth through fifth line of every five lines. For a clearer example, sed '3~10,6~10d' will select lines 1, 2, 7, 8, 9, 10 of every group of 10 lines by deleting lines 3 through 6.
The top-voted answer suggests using awk '(NR-1)%5<3'. On my machine, on a file containing the numbers 1 through 2 million, this takes about 0.6 seconds, while the sed solution in this answer takes about 0.35 seconds. This is reasonable, since sed is in general a simpler tool, and can thus work faster than the more complicated, but more full-featured, awk.
answered 2 days ago
tomsmeding (new contributor)
3
+1 ... or 4~5{N;d;}
– steeldriver
2 days ago
I tried the command below (bash, with the line count 20 hard-coded) and it worked fine:
for ((i = 1; i <= 20; i += 5)); do sed -n "${i},$((i + 2))p" filename; done
Output:
1st line (keep)
2nd line (keep)
3rd line (keep)
6th (keep)
7nth (keep)
8th lines (keep)
11th (keep)
12th (keep)
13th (keep)
answered 2 days ago
Praveen Kumar BS
1
That is nice, but you have to know how many lines you have in advance, and you're rescanning the file from the beginning on each round. It cannot be used on a stream, and it gets more inefficient the bigger the data gets. Since the OP says the number of lines is very large, this is not the best solution.
– Law29
2 days ago
1
Increment a zero-indexed line counter for each line read, and print the line when the counter modulo 5 is less than 3.
– ChuckCottrill
2 days ago
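The counter idea maps directly onto awk, where NR already counts lines starting from 1. A minimal sketch follows, with the made-up function name keep_first and with the number of kept lines and the group size passed in as parameters (both are assumptions of this example, not part of the comment).

```shell
#!/bin/sh
# Hypothetical helper: keep the first KEEP lines of every PERIOD lines.
# Usage: keep_first KEEP PERIOD <file
keep_first() {
    awk -v keep="$1" -v period="$2" '(NR - 1) % period < keep'
}

seq 12 | keep_first 3 5
```

This reads the input once and works on a stream, so the total number of lines does not need to be known in advance.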
Can you please clarify more?
– Jaguar Jom
2 days ago
1
The duplicate is worded slightly differently, but it is the same problem looked at in a different way. This question would be "print lines 1, 2, 3 out of each 5 lines", for example:
seq 15 | awk 'BEGIN { a[1]; a[2]; a[3] } NR % 5 in a'
and
seq 15 | sed -n 'p;n;p;n;p;n;n'
– Sundeep
2 days ago
Also, the sed version above might be faster than the awk one for large files.
– Sundeep
2 days ago