Extracting Files from a Gzipped Tar Archive using Go
Learn how to work with gzipped tar archives in Go. This guide walks you through a Go program that extracts and prints file contents from a .tgz file—ideal for managing compressed data. We’ll also show you how to create a .tgz archive to test the program.
Introduction
Working with compressed files is a common task in software development, especially when dealing with backups, file transfers, or package distributions.
In this post, we will learn how to read and extract files from a gzipped tar archive (.tgz file) and output their content to the console.
Let’s get to it.
Create a .tar.gz File
To start with, use the following steps to create a .tar.gz (.tgz) file. We will use this file to test our Go code:
- First, gather the files you wish to compress into a single directory. For this example, we’ll create two simple text files to include in the archive.
    mkdir example_dir
    echo "This is a sample file" > example_dir/file1.txt
    echo "This is another sample file" > example_dir/file2.txt
- Use the tar command to create a tar file (.tar) from the directory. This command combines multiple files into a single archive without compression.
    tar -cvf example.tar -C example_dir .
  The flags are:
    -c: Create a new archive.
    -v: Verbosely list files as they are added to the archive.
    -f: Specify the name of the archive file.
    -C: Change to the example_dir directory.
- Now, compress the tar file using gzip to create a .tar.gz file. gzip replaces example.tar with example.tar.gz.
    gzip example.tar
- It’s good practice to verify the contents of the newly created archive. You can do so by listing the files in the .tar.gz archive without extracting them:
    tar -tzf example.tar.gz
The Go Application
Importing Packages
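We start off by importing the required packages, all from the standard library: archive/tar and compress/gzip for the archive itself, plus helpers such as fmt, io, and os.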
main Function
The program opens a gzipped tar file named example.tar.gz in read-only mode. The deferred call to f.Close() ensures the file is closed properly once it’s no longer needed or in case of an error.
Next, we create a gzip reader:
gzip.NewReader(f) wraps the opened .tar.gz file with a gzip.Reader that will handle the decompression of the gzip stream. If an error occurs during this operation, the program will panic and terminate.
Next comes the tar reader, created with tar.NewReader on top of the gzip reader. It reads from the decompressed stream, enabling successive extraction of each file within the tar archive.
Now we are ready to read files from the archive:
The loop iterates over the entries in the tar archive. Each call to tarReader.Next() advances to the next entry and returns its header. The loop breaks when Next returns io.EOF, indicating the end of the archive.
As a final step, we want to filter and output the file content:
This step ensures that entries are processed only if they are regular files (tar.TypeReg). AppleDouble resource fork files, often found in archives created on macOS and recognizable by the ._ in their names, are ignored. The file content is read by io.ReadAll(tarReader) and printed to the console using fmt.Println.
Here is the complete code:
Run
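Now it’s time to put our application to the test. Run the program with go run from the directory that contains example.tar.gz. It should print the contents of the two sample files created earlier: This is a sample file and This is another sample file.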
Congratulations!! 🥳
Conclusion
This Go program demonstrates how to handle gzipped tar archives by performing decompression and extraction using the gzip and tar packages. Every step checks its error and panics on failure, which keeps the example short and makes problems visible immediately.
The use of Go’s deferred function calls provides a clean way to manage resource cleanup, such as closing the file. The program can serve as a starting point for more advanced archive handling in larger applications.
Thanks for reading!! 🙇🏻♂️