Why does tar appear to skip file contents when output file is /dev/null?



I have a directory with over 400 GiB of data in it. I wanted to check that all the files can be read without errors, so a simple way I thought of was to tar it into /dev/null. But instead I see the following behavior:



$ time tar cf /dev/null .

real 0m4.387s
user 0m3.462s
sys 0m0.185s
$ time tar cf - . > /dev/null

real 0m3.130s
user 0m3.091s
sys 0m0.035s
$ time tar cf - . | cat > /dev/null
^C

real 10m32.985s
user 0m1.942s
sys 0m33.764s


The third command above was forcibly stopped with Ctrl+C after it had already been running for quite a long time. Moreover, while the first two commands were running, the activity indicator of the storage device containing . was nearly always idle; with the third command, the indicator was constantly lit, indicating heavy activity.



So it seems that when tar can detect that its output file is /dev/null, i.e. when /dev/null is opened directly as the file handle that tar writes to, it skips reading the file contents. (Adding the v option to tar does still print the names of all the files in the directory being tarred.)



So I wonder: why is this so? Is it some kind of optimization? If so, why would tar implement such a dubious optimization for such a special case?



I'm using GNU tar 1.26 with glibc 2.27 on Linux 4.14.105 amd64.










tar null

asked Apr 14 at 6:27 by Ruslan
  • As a practical alternative, consider something like find . -type f -exec shasum -a256 -b '{}' +. Not only does it actually read and checksum all the data, but if you store the output, you can re-run it later to check that the content of the files hasn't changed. – Ilmari Karonen, Apr 14 at 11:00

  • To measure things you can also use pv: tar -cf - . | pv > /dev/null. That sidesteps the issue and gives you progress information (via the various pv options). – xenoid, Apr 14 at 16:22

2 Answers
It is a documented optimization:




When the archive is being created to /dev/null, GNU tar tries to
minimize input and output operations. The Amanda backup system, when
used with GNU tar, has an initial sizing pass which uses this feature.
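
A quick way to confirm this on a live system, assuming strace is available: running strace -e trace=openat,newfstatat,read tar cf /dev/null . should show the archived files being stat()ed but never opened or read (the directories themselves still have to be opened so their entries can be listed).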







answered Apr 14 at 6:45 by muru

  • Ah, this wasn't described in the man page I had installed. Should have tried info tar instead... – Ruslan, Apr 14 at 7:00

  • They should really keep the man & info pages in sync; it's practically a bug that they're not. – Xen2050, Apr 14 at 9:10

  • @Ruslan With most GNU utilities, the man page only contains a brief summary, basically only good enough when you remember that it has an option to do something but don't remember the option's name. The complete documentation is in a format that doesn't translate well to man pages and is available with info or as HTML in a browser. – Gilles, Apr 14 at 9:47

  • It's a recognized problem. – Owen, Apr 14 at 11:26


This can happen with a variety of programs. For example, I once had that behavior when simply using cp file /dev/null; instead of getting an estimate of my disk read speed, the command returned after a few milliseconds.



As far as I remember, that was on Solaris or AIX, but the principle applies to all kinds of unix-y systems.



In the old times, when a program copied a file somewhere, it would alternate between read calls, which get some data from disk (or whatever else the file descriptor refers to) into memory, with a guarantee that everything is there when read returns, and write calls, which take that chunk of memory and send its content to the destination.



However, there are at least two newer ways to achieve the same thing:



  • Linux has the system calls copy_file_range (not portable to other unixes at all) and sendfile (somewhat portable; originally intended for sending a file to the network, but usable with any destination now). They're intended to optimize transfers; if a program uses one of those, it's easily conceivable that the kernel recognizes the target is /dev/null and turns the system call into a no-op.


  • Programs can use mmap to get at the file contents instead of read. This basically means "make sure the data is there when I try to access that chunk of memory" rather than "make sure the data is there when the system call returns". A program can mmap the source file and then call write on the mapped memory; but since writing to /dev/null never needs to access the data being written, the "make sure it's there" condition is never triggered, and the file is never actually read either (see the sketch below).


I'm not sure whether GNU tar uses either of these mechanisms when it detects that it's writing to /dev/null, but they are the reason why any program used to check read speeds should be run with | cat > /dev/null instead of > /dev/null, and why | cat > /dev/null should be avoided in all other cases.
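
To make the second mechanism concrete, here is a minimal sketch of that mmap-then-write pattern (my illustration, not code from any particular tool; it assumes Linux, where the null device's write handler returns without ever touching the caller's buffer):

/* mmap_null.c (hypothetical name) - "copy" a file to /dev/null via
 * mmap+write. Pages are only faulted in from disk when something
 * touches them, and /dev/null's write handler never does, so the
 * file's contents are never actually read. */
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    if (argc != 2) {
        fprintf(stderr, "usage: %s <file>\n", argv[0]);
        return 1;
    }

    int in = open(argv[1], O_RDONLY);
    int out = open("/dev/null", O_WRONLY);
    struct stat st;
    if (in < 0 || out < 0 || fstat(in, &st) < 0) {
        perror("open/fstat");
        return 1;
    }
    if (st.st_size == 0)            /* mmap() rejects zero-length maps */
        return 0;

    char *p = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, in, 0);
    if (p == MAP_FAILED) {
        perror("mmap");
        return 1;
    }

    /* Looks like a normal copy, but no page of p is ever accessed:
     * the null device discards the "data" without reading the buffer. */
    if (write(out, p, st.st_size) != st.st_size) {
        perror("write");
        return 1;
    }

    munmap(p, st.st_size);
    return 0;
}

Run against a large file, this should finish almost immediately with the storage device staying idle, which is exactly the symptom described in the question.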






answered Apr 14 at 9:51 by Guntram Blohm

  • I think the implication in the GNU tar info page (see the other answer) is that it has a special mode for this, which presumably just stats files without opening them. In fact I just checked with tar cf /dev/null foo* on a couple of files and yeah, just newfstatat(..., AT_SYMLINK_NOFOLLOW) system calls, not even an open() that might update the atime. But +1 for describing mechanisms where this can happen without tar having to specially detect it. – Peter Cordes, Apr 14 at 16:22

  • Should the mmap explanation read "access the read data" instead of "access the written data"? – Wayne Conrad, Apr 14 at 21:07

  • See also splice(2) on Linux. Actually, replacing cat > /dev/null with pv -q > /dev/null (which uses splice() on Linux) would likely reduce the overhead. Or dd bs=65536 skip=9999999999 2> /dev/null, or wc -c > /dev/null, or tail -c1 > /dev/null... – Stéphane Chazelas, 2 days ago