Extracting Files from a Gzipped Tar Archive using Go

Learn how to work with gzipped tar archives in Go. This guide walks you through a Go program that extracts and prints file contents from a .tgz file—ideal for managing compressed data. We’ll also show you how to create a .tgz archive to test the program.

Introduction

Working with compressed files is a common task in software development, especially when dealing with backups, file transfers, or package distributions.

In this post, we will learn how to read and extract files from a gzipped tar archive (.tgz file) and output their content to the console.

Let’s get to it.


Create a .tar.gz File

To start, follow these steps to create a .tar.gz (.tgz) file that we will use to test our Go code:

  • First, gather the files you wish to compress into a single directory. For this example, we’ll create a simple text file to include in the archive.

    mkdir example_dir
    echo "This is a sample file" > example_dir/file1.txt
    echo "This is another sample file" > example_dir/file2.txt
    

  • Use the tar command to create a tar file (.tar) from the directory. This command combines multiple files into a single archive without compression.

    tar -cvf example.tar -C example_dir .
    

    -c: Create a new archive.

    -v: Verbosely list files as they are added to the archive.

    -f: Specify the name of the archive file.

    -C: Change to the example_dir directory before adding files, so the archive stores paths relative to it (for example, ./file1.txt).


  • Now, compress the tar file using gzip to create a .tar.gz file. By default, gzip replaces example.tar with example.tar.gz.

    gzip example.tar
    

  • It’s good practice to verify the contents of the newly created archive. You can do so by listing the files in the tar.gz archive without extracting them:

    tar -tzf example.tar.gz
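
The same kind of test archive could also be built from Go itself, which is handy when the tar and gzip commands are not available. The following is a minimal sketch, not part of the original program: the entry type and the buildTarGz function are hypothetical names introduced here for illustration, and the archive is kept in memory rather than written to disk.

```go
package main

import (
	"archive/tar"
	"bytes"
	"compress/gzip"
	"fmt"
	"log"
)

// entry is a hypothetical helper type: one file to place in the archive.
type entry struct {
	Name string
	Body string
}

// buildTarGz writes the entries into a tar stream wrapped in gzip and
// returns the compressed bytes.
func buildTarGz(entries []entry) ([]byte, error) {
	var buf bytes.Buffer
	gw := gzip.NewWriter(&buf)
	tw := tar.NewWriter(gw)

	for _, e := range entries {
		hdr := &tar.Header{
			Name:     e.Name,
			Mode:     0o644,
			Size:     int64(len(e.Body)),
			Typeflag: tar.TypeReg,
		}
		if err := tw.WriteHeader(hdr); err != nil {
			return nil, err
		}
		if _, err := tw.Write([]byte(e.Body)); err != nil {
			return nil, err
		}
	}

	// Close the tar writer first so its trailer is flushed into gzip,
	// then close gzip so the compressed stream is complete.
	if err := tw.Close(); err != nil {
		return nil, err
	}
	if err := gw.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

func main() {
	data, err := buildTarGz([]entry{
		{Name: "file1.txt", Body: "This is a sample file\n"},
		{Name: "file2.txt", Body: "This is another sample file\n"},
	})
	if err != nil {
		log.Fatalln(err)
	}
	fmt.Printf("built a %d-byte archive\n", len(data))
}
```

To use the result with the reader program, the returned bytes could be saved with os.WriteFile("example.tar.gz", data, 0o644). The order of the two Close calls matters: closing gzip before tar would cut off the end of the tar stream.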
    

The Go Application

Importing Packages

We start off by importing the required packages:

// main.go
package main

import (
  // Provides functionality for dealing with tar archives.
  "archive/tar"

  // Offers gzip compression and decompression facilities.
  "compress/gzip"

  // Used for formatting strings and outputting them.
  "fmt"

  // Contains fundamental interfaces for input/output operations.
  "io"

  // Provides simple logging capabilities.
  "log"

  // Offers platform-independent interface to operating system functionality.
  "os"

  // Provides functions for manipulating UTF-8 encoded strings.
  "strings"
)

main Function

func main() {
	f, err := os.OpenFile("example.tar.gz", os.O_RDONLY, 0o666)
	if err != nil {
		log.Fatalln(err)
	}
	defer f.Close()

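The program opens the gzipped tar file example.tar.gz in read-only mode. The permission argument (0o666) only matters when a file is created, so it has no effect here. The deferred call to f.Close() closes the file when main returns. Note that log.Fatalln exits through os.Exit, which skips deferred calls; that is harmless in this program because the operating system releases the file handle when the process ends.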
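
One common way to make sure deferred cleanup always runs is to move the work into a function that returns an error, and to call log.Fatalln only in main, since log.Fatalln exits via os.Exit without running deferred calls. The sketch below illustrates that pattern under a few assumptions: the run function is a hypothetical name, it only opens the file and reports its size, and main uses an empty temporary file as a stand-in for a real archive so the example runs anywhere.

```go
package main

import (
	"fmt"
	"log"
	"os"
)

// run opens the file and returns any error to the caller instead of
// exiting, so the deferred Close executes on every return path.
func run(path string) error {
	f, err := os.Open(path)
	if err != nil {
		return fmt.Errorf("open archive: %w", err)
	}
	defer f.Close()

	info, err := f.Stat()
	if err != nil {
		return fmt.Errorf("stat archive: %w", err)
	}
	fmt.Printf("%s: %d bytes\n", info.Name(), info.Size())
	return nil
}

func main() {
	// Demo input (an assumption of this sketch): an empty temporary
	// file standing in for example.tar.gz.
	tmp, err := os.CreateTemp("", "example-*.tar.gz")
	if err != nil {
		log.Fatalln(err)
	}
	tmp.Close()

	err = run(tmp.Name())
	os.Remove(tmp.Name())

	// log.Fatalln is reached only after run has returned and its
	// deferred Close has finished.
	if err != nil {
		log.Fatalln(err)
	}
}
```

In the tutorial's program, the body of run would hold the gzip and tar reading logic, and main would pass "example.tar.gz" as the path.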


Next, we create a gzip reader:

	gzipReader, err := gzip.NewReader(f)
	if err != nil {
		panic(err)
	}

gzip.NewReader(f) wraps the opened .tar.gz file in a gzip.Reader, which decompresses the stream as it is read. NewReader reads the gzip header immediately, so a file that is not valid gzip data fails at this point; the program then panics and terminates.
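
To see this behavior in isolation, here is a small sketch that works on in-memory data instead of a file. The roundTrip and looksLikeGzip functions are hypothetical helpers written for this illustration. It also closes the gzip reader with a deferred call, which the tutorial's program omits.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"errors"
	"fmt"
	"io"
	"log"
)

// roundTrip compresses s with gzip, then decompresses it again.
func roundTrip(s string) (string, error) {
	var buf bytes.Buffer
	gw := gzip.NewWriter(&buf)
	if _, err := gw.Write([]byte(s)); err != nil {
		return "", err
	}
	// Close flushes the remaining data and writes the gzip footer.
	if err := gw.Close(); err != nil {
		return "", err
	}

	gr, err := gzip.NewReader(&buf)
	if err != nil {
		return "", err
	}
	defer gr.Close()

	out, err := io.ReadAll(gr)
	if err != nil {
		return "", err
	}
	return string(out), nil
}

// looksLikeGzip reports whether gzip.NewReader accepts the header of data.
func looksLikeGzip(data []byte) bool {
	gr, err := gzip.NewReader(bytes.NewReader(data))
	if err != nil {
		return false
	}
	gr.Close()
	return true
}

func main() {
	out, err := roundTrip("This is a sample file")
	if err != nil {
		log.Fatalln(err)
	}
	fmt.Println(out)

	// NewReader reads the header right away, so bad input is rejected
	// here, before any Read call is made.
	_, err = gzip.NewReader(bytes.NewReader([]byte("plain text, not gzip data")))
	fmt.Println("invalid header:", errors.Is(err, gzip.ErrHeader))
}
```

Checking for gzip.ErrHeader with errors.Is makes it possible to report "this is not a gzip file" separately from I/O failures, if a program needs that distinction.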


Creating a tar reader:

	tarReader := tar.NewReader(gzipReader)

This creates a tar reader that reads from the decompressed stream, enabling successive extraction of each file within the tar archive.


Now we are ready to read files from the archive:

	for {
		header, err := tarReader.Next()
		if err == io.EOF {
			break
		}

		if err != nil {
			log.Fatalln(err)
		}

The loop iterates over the entries in the tar archive. Each call to tarReader.Next() advances to the next entry and returns its header. When Next returns io.EOF, the archive has been read to the end and the loop breaks; any other error stops the program.
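
The header returned by Next carries more than the name. As a rough illustration, the sketch below builds a tiny in-memory archive with one directory and one regular file, then walks its headers without reading any file content. The entryInfo type and the sampleArchive and listEntries functions are hypothetical names introduced for this example.

```go
package main

import (
	"archive/tar"
	"bytes"
	"compress/gzip"
	"fmt"
	"io"
	"log"
)

// entryInfo is a hypothetical summary of one tar header.
type entryInfo struct {
	Name    string
	Regular bool
	Size    int64
}

// sampleArchive builds a small in-memory .tar.gz holding one directory
// and one regular file.
func sampleArchive() ([]byte, error) {
	var buf bytes.Buffer
	gw := gzip.NewWriter(&buf)
	tw := tar.NewWriter(gw)

	body := "hello\n"
	headers := []*tar.Header{
		{Name: "docs/", Typeflag: tar.TypeDir, Mode: 0o755},
		{Name: "docs/readme.txt", Typeflag: tar.TypeReg, Mode: 0o644, Size: int64(len(body))},
	}
	for _, h := range headers {
		if err := tw.WriteHeader(h); err != nil {
			return nil, err
		}
		if h.Typeflag == tar.TypeReg {
			if _, err := tw.Write([]byte(body)); err != nil {
				return nil, err
			}
		}
	}
	if err := tw.Close(); err != nil {
		return nil, err
	}
	if err := gw.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// listEntries collects name, type, and size from every header.
func listEntries(data []byte) ([]entryInfo, error) {
	gr, err := gzip.NewReader(bytes.NewReader(data))
	if err != nil {
		return nil, err
	}
	defer gr.Close()

	tr := tar.NewReader(gr)
	var entries []entryInfo
	for {
		// Next skips any unread content of the previous entry.
		h, err := tr.Next()
		if err == io.EOF {
			return entries, nil // end of archive
		}
		if err != nil {
			return nil, err
		}
		entries = append(entries, entryInfo{
			Name:    h.Name,
			Regular: h.Typeflag == tar.TypeReg,
			Size:    h.Size,
		})
	}
}

func main() {
	data, err := sampleArchive()
	if err != nil {
		log.Fatalln(err)
	}
	entries, err := listEntries(data)
	if err != nil {
		log.Fatalln(err)
	}
	for _, e := range entries {
		fmt.Printf("%-16s regular=%-5v size=%d\n", e.Name, e.Regular, e.Size)
	}
}
```

Because directories appear as entries of their own, a loop that wants only file content needs to check the type flag before reading.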


As a final step, we want to filter and output the file content:

		if header.Typeflag == tar.TypeReg && !strings.Contains(header.Name, "._") {
			content, err := io.ReadAll(tarReader)
			if err != nil {
				log.Fatalln(err)
			}

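			fmt.Println("FILE_NAME: ", header.Name)
			fmt.Println(string(content))
		}
	}
}

The code processes an entry only if it is a regular file (tar.TypeReg), which skips directories, symbolic links, and other entry types. It also skips any entry whose name contains "._", the prefix macOS uses for AppleDouble files that hold resource forks and extended attributes. For each remaining entry, io.ReadAll(tarReader) reads the content, which is limited to the current entry, and fmt.Println prints the file name followed by the content.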
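
One caveat with strings.Contains is that it matches "._" anywhere in the path, so an ordinary file such as notes._draft.txt would be skipped too. A stricter check, sketched below as an alternative to the tutorial's filter, looks only at the start of the final path element. The isAppleDouble function is a hypothetical helper, not something the original program defines.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// isAppleDouble reports whether the final element of a slash-separated
// archive path starts with "._", the AppleDouble naming convention.
func isAppleDouble(name string) bool {
	return strings.HasPrefix(path.Base(name), "._")
}

func main() {
	names := []string{"./file1.txt", "dir/._file1.txt", "notes._draft.txt"}
	for _, name := range names {
		fmt.Printf("%-18s skip=%v\n", name, isAppleDouble(name))
	}
}
```

The path package is used instead of path/filepath because names inside a tar archive use forward slashes regardless of the operating system.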


Here is the complete code:

package main

import (
	"archive/tar"
	"compress/gzip"
	"fmt"
	"io"
	"log"
	"os"
	"strings"
)

func main() {
	f, err := os.OpenFile("example.tar.gz", os.O_RDONLY, 0o666)
	if err != nil {
		log.Fatalln(err)
	}
	defer f.Close()

	gzipReader, err := gzip.NewReader(f)
	if err != nil {
		panic(err)
	}

	tarReader := tar.NewReader(gzipReader)

	for {
		header, err := tarReader.Next()
		if err == io.EOF {
			break
		}

		if err != nil {
			log.Fatalln(err)
		}

		if header.Typeflag == tar.TypeReg && !strings.Contains(header.Name, "._") {
			content, err := io.ReadAll(tarReader)
			if err != nil {
				log.Fatalln(err)
			}

			fmt.Println("FILE_NAME: ", header.Name)
			fmt.Println(string(content))
		}
	}
}

Run

Now, it’s time to put our application to the test:

➜ go run main.go
FILE_NAME:  ./file2.txt
This is another sample file

FILE_NAME:  ./file1.txt
This is a sample file

Congratulations!! 🥳


Conclusion

This Go program demonstrates how to read gzipped tar archives by chaining the gzip and tar packages: the gzip reader decompresses the stream, and the tar reader walks its entries. Every error is checked, and the program stops at the first failure instead of continuing with bad data.

Deferring f.Close() keeps the cleanup next to the code that opens the file. The program can be integrated into larger applications or serve as a foundation for more advanced archive handling.

Thanks for reading!! 🙇🏻‍♂️