Why does tar appear to skip file contents when output file is /dev/null?
I have a directory with over 400 GiB of data in it. I wanted to check that all the files can be read without errors, so a simple way I thought of was to tar it into /dev/null. But instead I see the following behavior:

$ time tar cf /dev/null .
real 0m4.387s
user 0m3.462s
sys 0m0.185s

$ time tar cf - . > /dev/null
real 0m3.130s
user 0m3.091s
sys 0m0.035s

$ time tar cf - . | cat > /dev/null
^C
real 10m32.985s
user 0m1.942s
sys 0m33.764s

The third command above was forcibly stopped by Ctrl+C after having run for quite long already. Moreover, while the first two commands were working, activity indicator of the storage device containing . was nearly always idle. With the third command the indicator is constantly lit up, meaning extreme busyness.

So it seems that, when tar is able to find out that its output file is /dev/null, i.e. when /dev/null is directly opened to have the file handle which tar writes to, file body appears skipped. (Adding v option to tar does print all the files in the directory being tar'red.)

So I wonder, why is this so? Is it some kind of optimization? If yes, then why would tar even want to do such a dubious optimization for such a special case?

I'm using GNU tar 1.26 with glibc 2.27 on Linux 4.14.105 amd64.

tar null
asked Apr 14 at 6:27
Ruslan
As a practical alternative, consider something like find . -type f -exec shasum -a256 -b '{}' + . Not only does it actually read and checksum all the data, but if you store the output, you can re-run it later to check that the content of the files hasn't changed.
– Ilmari Karonen, Apr 14 at 11:00

To measure things you can also use pv: tar -cf - | pv >/dev/null. That sidesteps the issue and gives you progress information (see the various pv options).
– xenoid, Apr 14 at 16:22
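The checksum approach suggested in the comment above can be sketched roughly as follows. This is only an illustration under some assumptions: it uses sha256sum from GNU coreutils in place of shasum, and the helper names make_manifest and verify_manifest are made up for the example.

```shell
#!/bin/sh
# Sketch: build a checksum manifest, then re-check it later.
# make_manifest and verify_manifest are hypothetical helper names.

# $1 = directory to scan, $2 = absolute path of the manifest to write.
# Keep the manifest outside the scanned directory so it does not list itself.
make_manifest() {
    ( cd "$1" && find . -type f -exec sha256sum -b {} + ) > "$2"
}

# Re-read every listed file and compare it against the stored checksum.
# The exit status is non-zero if any file is unreadable or has changed.
verify_manifest() {
    ( cd "$1" && sha256sum -c --quiet "$2" )
}

# Small demonstration on a throwaway directory.
demo_dir=$(mktemp -d)
manifest=$(mktemp)
printf 'hello\n' > "$demo_dir/a.txt"
make_manifest "$demo_dir" "$manifest"
verify_manifest "$demo_dir" "$manifest" && echo "manifest verified"
```

Because every file has to be read in full to compute its checksum, a successful run also answers the original "can everything be read" question.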
2 Answers
It is a documented optimization:

"When the archive is being created to /dev/null, GNU tar tries to minimize input and output operations. The Amanda backup system, when used with GNU tar, has an initial sizing pass which uses this feature."
answered Apr 14 at 6:45
muru
Ah, this wasn't described in the man page I had installed. Should have tried info tar instead...
– Ruslan, Apr 14 at 7:00

They should really keep the man & info pages in sync, it's practically a bug that they're not
– Xen2050, Apr 14 at 9:10

@Ruslan With most GNU utilities, the man page only contains a brief summary, basically only good enough when you remember that it has an option to do something but don't remember the option's name. The complete documentation is in a format that doesn't translate well to man pages and is available with info or as HTML in a browser.
– Gilles, Apr 14 at 9:47

It's a recognized problem.
– Owen, Apr 14 at 11:26
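A sizing pass of the kind mentioned in the answer above might look something like this sketch. It assumes GNU tar and its --totals option, which reports "Total bytes written" on stderr; the helper name estimate_archive_bytes is invented for the example, and this is not claimed to be the exact command Amanda runs.

```shell
#!/bin/sh
# Sketch: ask tar how large an archive would be without writing it.
# estimate_archive_bytes is a hypothetical helper name.

# $1 = directory to size. Prints the byte count taken from tar's --totals
# report, which GNU tar writes to stderr as "Total bytes written: N (...)".
estimate_archive_bytes() {
    LC_ALL=C tar -cf /dev/null --totals -C "$1" . 2>&1 |
        sed -n 's/^Total bytes written: \([0-9][0-9]*\).*/\1/p'
}

# Small demonstration: 1 MiB of zeros as sample payload.
demo_dir=$(mktemp -d)
dd if=/dev/zero of="$demo_dir/payload" bs=1024 count=1024 2>/dev/null
estimated=$(estimate_archive_bytes "$demo_dir")
echo "estimated archive size: $estimated bytes"
```

Since the archive goes to /dev/null, this should finish quickly even on large trees, which is exactly why it is useless as a read test.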
This can happen with a variety of programs, for example, I had that behavior once when just using cp file /dev/null; instead of getting an estimate of my disk read speed, the command returned after a few milliseconds.

As far as I remember, that was on Solaris or AIX, but the principle applies to all kinds of unix-y systems.

In the old times, when a program copied a file to somewhere, it'd alternate between read calls that get some data from disk (or whatever the file descriptor is referring to) to memory (with a guarantee everything is there when read returns) and write calls (which take the chunk of memory and send the content to the destination).

However, there are at least two newer ways to achieve the same:

- Linux has system calls copy_file_range (not portable to other unixes at all) and sendfile (somewhat portable; originally intended to send a file to the network, but can use any destination now). They're intended to optimize transfers; if the program uses one of those, it's easily conceivable the kernel recognizes the target is /dev/null and turns the system call into a no-op.
- Programs can use mmap to get the file contents instead of read; this basically means "make sure the data is there when I try to access that chunk of memory" instead of "make sure the data is there when the system call returns". So a program can mmap the source file, then call write on that chunk of mapped memory. However, as writing /dev/null doesn't need to access the written data, the "make sure it's there" condition isn't ever triggered, resulting in the file not being read either.

Not sure if gnu tar uses any, and which, of these two mechanisms when it detects it's writing to /dev/null, but they're the reason why any program, when used to check read-speeds, should be run with | cat > /dev/null instead of > /dev/null - and why | cat > /dev/null should be avoided in all other cases.
answered Apr 14 at 9:51
Guntram Blohm
I think the implication in the GNU tar info page (see other answer) is that it has a special mode for this, which presumably just stats files without opening them. In fact I just checked with tar cf /dev/null foo* on a couple files and yeah, just newfstatat(..., AT_SYMLINK_NOFOLLOW) system calls, not even an open() that might update the atime. But +1 for describing mechanisms where this can happen without having to specially detect it.
– Peter Cordes, Apr 14 at 16:22

Should the mmap explanation read "access the read data" instead of "access the written data?"
– Wayne Conrad, Apr 14 at 21:07

See also splice(2) on Linux. Actually, replacing cat > /dev/null with pv -q > /dev/null (which uses splice() on Linux) would likely reduce the overhead. Or dd bs=65536 skip=9999999999 2> /dev/null, or wc -c > /dev/null or tail -c1 > /dev/null...
– Stéphane Chazelas, 2 days ago
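Putting the pipe-based suggestions from this thread together, a read test could be sketched like this. It relies on tar's standard output being a pipe rather than /dev/null, with wc -c as the consumer; the helper name read_all_via_tar is hypothetical.

```shell
#!/bin/sh
# Sketch: force tar to read file contents by writing the archive to a pipe.
# read_all_via_tar is a hypothetical helper name.

# $1 = directory to read. Prints the number of archive bytes produced.
# Because stdout is a pipe rather than /dev/null, tar has to read every
# file; wc -c consumes the stream and counts it.
# Note: in plain sh the pipeline's exit status is that of the last command,
# so read errors from tar show up on stderr, not in the exit status.
read_all_via_tar() {
    tar -cf - -C "$1" . | wc -c | tr -d ' '
}

# Small demonstration: 1 MiB of zeros as sample payload.
demo_dir=$(mktemp -d)
dd if=/dev/zero of="$demo_dir/payload" bs=1024 count=1024 2>/dev/null
archive_bytes=$(read_all_via_tar "$demo_dir")
echo "tar streamed $archive_bytes bytes"
```

On a large tree, watching tar's stderr for messages such as "Cannot open" or "Read error" is what actually answers the question of whether every file is readable.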
I think the implication in the GNUtar
info page (see other answer) is that it has a special mode for this, which presumably just stats files without opening them. In fact I just checked withtar cf /dev/null foo*
on a couple files and yeah, justnewfstatat(..., AT_SYMLINK_NOFOLLOW)
system calls, not even anopen()
that might update the atime. But +1 for describing mechanisms where this can happen without having to specially detect it.
– Peter Cordes
Apr 14 at 16:22
Should the mmap explanation read "access the read data" instead of "access the written data?"
– Wayne Conrad
Apr 14 at 21:07
See alsosplice(2)
on Linux. Actually, replacing,cat > /dev/null
withpv -q > /dev/null
(which usessplice()
on Linux) would likely reduce the overhead. Ordd bs=65536 skip=9999999999 2> /dev/null
, orwc -c > /dev/null
ortail -c1 > /dev/null
...
– Stéphane Chazelas
2 days ago
I think the implication in the GNU
tar
info page (see other answer) is that it has a special mode for this, which presumably just stats files without opening them. In fact I just checked with tar cf /dev/null foo*
on a couple files and yeah, just newfstatat(..., AT_SYMLINK_NOFOLLOW)
system calls, not even an open()
that might update the atime. But +1 for describing mechanisms where this can happen without having to specially detect it.– Peter Cordes
Apr 14 at 16:22
I think the implication in the GNU
tar
info page (see other answer) is that it has a special mode for this, which presumably just stats files without opening them. In fact I just checked with tar cf /dev/null foo*
on a couple files and yeah, just newfstatat(..., AT_SYMLINK_NOFOLLOW)
system calls, not even an open()
that might update the atime. But +1 for describing mechanisms where this can happen without having to specially detect it.– Peter Cordes
Apr 14 at 16:22
Should the mmap explanation read "access the read data" instead of "access the written data?"
– Wayne Conrad
Apr 14 at 21:07
Should the mmap explanation read "access the read data" instead of "access the written data?"
– Wayne Conrad
Apr 14 at 21:07
See also
splice(2)
on Linux. Actually, replacing, cat > /dev/null
with pv -q > /dev/null
(which uses splice()
on Linux) would likely reduce the overhead. Or dd bs=65536 skip=9999999999 2> /dev/null
, or wc -c > /dev/null
or tail -c1 > /dev/null
...– Stéphane Chazelas
2 days ago
See also
splice(2)
on Linux. Actually, replacing, cat > /dev/null
with pv -q > /dev/null
(which uses splice()
on Linux) would likely reduce the overhead. Or dd bs=65536 skip=9999999999 2> /dev/null
, or wc -c > /dev/null
or tail -c1 > /dev/null
...– Stéphane Chazelas
2 days ago
add a comment |
As a practical alternative, consider something like find . -type f -exec shasum -a256 -b '{}' +. Not only does it actually read and checksum all the data, but if you store the output, you can re-run it later to check that the content of the files hasn't changed.
– Ilmari Karonen
Apr 14 at 11:00
To measure things you can also use pv: tar -cf - | pv >/dev/null. That sidesteps the issue and gives you progress information (see the various pv options).
– xenoid
Apr 14 at 16:22
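For illustration, a sketch of the pv variant with a few display options turned on, assuming pv is installed (the script skips the measurement otherwise) and using a hypothetical scratch file as the data to archive.

```shell
#!/bin/sh
# Hypothetical scratch directory and file name, used only for illustration.
workdir=$(mktemp -d)
dd if=/dev/zero of="$workdir/sample.bin" bs=1024 count=1024 2>/dev/null

if command -v pv >/dev/null 2>&1; then
    # -t shows elapsed time, -r the current rate, -b the byte count so far.
    # pv reports on stderr, so sending its stdout to /dev/null keeps the display.
    tar -cf - -C "$workdir" sample.bin | pv -trb > /dev/null
    pv_status=$?
else
    echo "pv is not installed; skipping the measurement" >&2
    pv_status=skipped
fi

rm -rf "$workdir"
```

Because tar writes to a pipe here, it cannot tell that the data is eventually discarded and has to read the files.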