Building a Rust Command-Line Utility - wc-rs

Delving into Rust’s capabilities, this post guides you through the process of crafting a command-line utility, wc-rs, mirroring the functionality of the classic wc tool with a modern twist.


Introduction

The motivation to build wc-rs comes from John Crickett’s build your own wc tool coding challenge. Solving these challenges is a great way of learning different concepts, in my opinion. So, here we are starting with our first one.

The challenge is to build your own version of the Unix command-line tool wc. The functional requirements for wc are concisely described by its man page - give it a go in your local terminal now:

man wc

The TL;DR version is: wc – word, line, character, and byte count.

So, let’s get Rusty!!


Code Walkthrough

The wc-rs program is designed to feel familiar to anyone who has used the original wc command. Under the hood, it relies on Rust features such as derive macros, traits, and Result-based error handling.

The complete code is available at gauravgahlot/getting-rustywc-rs.


Dependencies

use clap::Parser;
use std::{fs::{self, File}, io::{self, BufRead}};

We use the clap crate to simplify command-line argument parsing, making it effortless to define and handle the flags and options our program accepts.


The CLI Struct

#[derive(Parser)]
#[command(name = "wc-rs")]
#[command(version = "0.1.0")]
#[command(about="The wc-rs utility displays the count of lines, words, characters, and bytes contained in each input file", long_about=None)]
struct CLI {
    /// The number of bytes in each input file is written to the
    /// standard output. This will cancel out any prior usage of the
    /// -m option.
    #[arg(short = 'c')]
    bytes: bool,

    /// The number of lines in each input file is written to the
    /// standard output.
    #[arg(short)]
    lines: bool,

    /// The number of words in each input file is written to the
    /// standard output.
    #[arg(short)]
    words: bool,

    /// The number of characters in each input file is written to the
    /// standard output.
    #[arg(short = 'm')]
    chars: bool,
    files: Option<Vec<String>>,
}

Our CLI struct provides the skeleton for the command-line interface of wc-rs. With clap, beyond just defining the struct, we annotate it with information like the program’s name, version, and a brief description.

Flags and Options

Each field in the CLI struct represents a command-line flag or option, complete with descriptive comments that clap uses to generate help messages:

  • -c for byte count
  • -l for line count
  • -w for word count
  • -m for character count

The files field holds an optional list of files to process. If no files are given, wc-rs reads from standard input.


The Output Struct

#[derive(Default)]
struct Output {
    bytes: u64,
    lines: u64,
    words: u64,
    chars: u64,
    file: Option<String>,
}

As we prepare to crunch numbers, we store our results in the Output struct. This is where the counts of bytes, lines, words, and characters will be accumulated, along with an optional filename for display purposes.
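
The main function shown later calls Output::new(file), which is not part of the listing above. A minimal sketch of what such a constructor could look like follows; the body is an assumption on our part (zeroed counts plus the file name), not necessarily what the repository contains.

```rust
#[derive(Default)]
struct Output {
    bytes: u64,
    lines: u64,
    words: u64,
    chars: u64,
    file: Option<String>,
}

impl Output {
    /// Assumed constructor: all counts start at zero and the file name
    /// is stored so it can be printed next to the counts later.
    fn new(file: &str) -> Self {
        Output {
            file: Some(file.to_string()),
            ..Default::default()
        }
    }
}

fn main() {
    let out = Output::new("test.txt");
    println!(
        "{:?}: {} lines, {} words, {} chars, {} bytes",
        out.file, out.lines, out.words, out.chars, out.bytes
    );
}
```

Taking a &str keeps the call site flexible: a &String, as produced by iterating over the files vector, coerces to &str automatically.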


The Main Event

The main function is where we tie everything together. We parse the command-line arguments, iterate over the files (or fall back to standard input), and calculate the statistics. I/O errors from opening a file or reading its metadata are propagated with the ? operator, so main returns early with the error, reflecting Rust’s commitment to safe and explicit error management.

fn main() -> io::Result<()> {
    let cli = CLI::parse();
    let mut output: Vec<Output> = vec![];

    if let Some(files) = &cli.files {
        for file in files.iter() {
            let f = File::open(file)?;
            let reader = io::BufReader::new(f);

            let mut out = Output::new(file);
            out.bytes = fs::metadata(file)?.len();

            process_lines(reader, false, &mut out);
            output.push(out);
        }
    } else {
        let stdin = io::stdin();
        let reader = stdin.lock();
        let mut out = Output::default();

        process_lines(reader, true, &mut out);
        output.push(out);
    }

    print_output(&cli, output);

    Ok(())
}
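
Because of the ? operator, the main function above stops at the first file that cannot be opened. If you would rather report the problem and carry on with the remaining files, one option is to handle the Result per file. The sketch below only counts lines to stay short; count_lines and count_all are hypothetical helpers, not part of wc-rs.

```rust
use std::fs::File;
use std::io::{self, BufRead, BufReader};

/// Hypothetical helper: count the lines of a single file.
fn count_lines(path: &str) -> io::Result<u64> {
    let reader = BufReader::new(File::open(path)?);
    let mut lines = 0;
    for line in reader.lines() {
        line?; // surface read errors instead of ignoring them
        lines += 1;
    }
    Ok(lines)
}

/// Hypothetical helper: process every path, reporting failures on stderr
/// and continuing. Returns the successful counts and the failure count.
fn count_all(paths: &[String]) -> (Vec<(String, u64)>, usize) {
    let mut counts = Vec::new();
    let mut failures = 0;
    for path in paths {
        match count_lines(path) {
            Ok(n) => counts.push((path.clone(), n)),
            Err(e) => {
                eprintln!("wc-rs: {}: {}", path, e);
                failures += 1;
            }
        }
    }
    (counts, failures)
}

fn main() {
    let paths: Vec<String> = std::env::args().skip(1).collect();
    let (counts, failures) = count_all(&paths);
    for (path, n) in &counts {
        println!("\t{}\t{}", n, path);
    }
    // Signal partial failure through the exit code.
    if failures > 0 {
        std::process::exit(1);
    }
}
```

Whether to stop early or keep going is a design choice; stopping early is simpler, while continuing is friendlier when many files are passed at once.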

Processing the Input

The process_lines function is at the heart of wc-rs. It takes a reader (anything that implements the BufRead trait), a from_stdin flag, and an Output struct by mutable reference, updating the counts as it iterates over the lines in the text.

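fn process_lines<T: BufRead>(reader: T, from_stdin: bool, out: &mut Output) {
    // lines() strips the trailing newline from every line it yields.
    for line in reader.lines() {
        match line {
            Ok(input) => {
                out.lines += 1;

                // For files the byte count comes from fs::metadata in main,
                // so bytes are only accumulated here for standard input.
                if from_stdin {
                    out.bytes += input.len() as u64;
                }

                // split_whitespace treats tabs and runs of spaces as one separator.
                for word in input.split_whitespace() {
                    out.words += 1;
                    // chars().count() counts characters, not UTF-8 bytes.
                    out.chars += word.chars().count() as u64;
                }
            }
            Err(e) => eprintln!("{}", e),
        }
    }
}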

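Bytes are handled differently depending on the source: for files, the size comes from fs::metadata in main, while for standard input we add up the length of each line as we go. Word counts are obtained by splitting each line into words, and the character count is the total length of those words, which shows off Rust’s iterator capabilities. Because lines() strips line terminators, newline characters are not included in the character count or in the byte count for standard input, so these numbers can be slightly lower than what the system wc reports.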
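
If you want counts that also include line terminators, one possible approach is to use read_line, which keeps the trailing newline in the buffer. The sketch below is an alternative under that assumption, not the code wc-rs ships with; the Counts struct and the count function are names made up for illustration.

```rust
use std::io::{self, BufRead};

#[derive(Debug, Default, PartialEq)]
struct Counts {
    lines: u64,
    words: u64,
    chars: u64,
    bytes: u64,
}

/// Hypothetical helper: count lines, words, characters, and bytes,
/// including the newline characters themselves.
fn count<R: BufRead>(mut reader: R) -> io::Result<Counts> {
    let mut counts = Counts::default();
    let mut buf = String::new();
    loop {
        buf.clear();
        // read_line returns the number of bytes read, newline included.
        let n = reader.read_line(&mut buf)?;
        if n == 0 {
            break; // end of input
        }
        counts.bytes += n as u64;
        counts.chars += buf.chars().count() as u64;
        counts.words += buf.split_whitespace().count() as u64;
        // Count newline characters, so an unterminated last line
        // does not add to the line count.
        if buf.ends_with('\n') {
            counts.lines += 1;
        }
    }
    Ok(counts)
}

fn main() -> io::Result<()> {
    let counts = count(io::Cursor::new("data from std input\n"))?;
    println!("{:?}", counts);
    Ok(())
}
```

Note that read_line returns an error on input that is not valid UTF-8, so a tool meant for arbitrary binary files would need to work on raw bytes instead.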


Showing the Numbers

Finally, print_output is responsible for displaying the collected counts. Following the flags provided, it either prints the requested stats or, if no flags are specified, defaults to lines, words, and bytes. When more than one input is processed, a total row is printed at the end.

fn print_output(cli: &CLI, output: Vec<Output>) {
    // With no flags given, print lines, words, and bytes.
    let print_all = !cli.lines && !cli.words && !cli.chars && !cli.bytes;
    let mut total = Output::default();

    for out in &output {
        if print_all {
            if let Some(f) = &out.file {
                println!("\t{}\t{}\t{}\t{}", out.lines, out.words, out.bytes, f);
            } else {
                println!("\t{}\t{}\t{}", out.lines, out.words, out.bytes);
            }
        } else {
            if cli.lines {
                print!("\t{}", out.lines);
            }
            if cli.words {
                print!("\t{}", out.words);
            }
            // -c takes precedence over -m when both are given.
            if cli.bytes {
                print!("\t{}", out.bytes);
            } else if cli.chars {
                print!("\t{}", out.chars);
            }
            if let Some(f) = &out.file {
                println!("\t{}", f);
            } else {
                println!();
            }
        }

        total.lines += out.lines;
        total.words += out.words;
        total.bytes += out.bytes;
        total.chars += out.chars;
    }

    // A total row is only printed when more than one input was processed.
    if output.len() > 1 {
        if print_all {
            println!("\t{}\t{}\t{}\ttotal", total.lines, total.words, total.bytes);
        } else {
            if cli.lines {
                print!("\t{}", total.lines);
            }
            if cli.words {
                print!("\t{}", total.words);
            }
            if cli.bytes {
                print!("\t{}", total.bytes);
            } else if cli.chars {
                print!("\t{}", total.chars);
            }

            println!("\ttotal");
        }
    }
}
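
The per-file rows and the total row follow the same rules, so the flag checks could be written once in a helper that builds a row as a String. The sketch below shows one way to do that; Flags, Counts, and format_row are hypothetical names introduced for the example and stand in for the CLI and Output structs.

```rust
/// Hypothetical stand-in for the four boolean flags on the CLI struct.
struct Flags {
    lines: bool,
    words: bool,
    bytes: bool,
    chars: bool,
}

/// Hypothetical stand-in for the numeric fields of Output.
struct Counts {
    lines: u64,
    words: u64,
    bytes: u64,
    chars: u64,
}

/// Build one tab-separated row, used for both file rows and the total row.
fn format_row(flags: &Flags, counts: &Counts, label: Option<&str>) -> String {
    // With no flags given, fall back to lines, words, and bytes.
    let print_all = !(flags.lines || flags.words || flags.bytes || flags.chars);
    let mut row = String::new();

    if print_all || flags.lines {
        row.push_str(&format!("\t{}", counts.lines));
    }
    if print_all || flags.words {
        row.push_str(&format!("\t{}", counts.words));
    }
    // Bytes take precedence over characters when both are requested.
    if print_all || flags.bytes {
        row.push_str(&format!("\t{}", counts.bytes));
    } else if flags.chars {
        row.push_str(&format!("\t{}", counts.chars));
    }
    if let Some(label) = label {
        row.push_str(&format!("\t{}", label));
    }
    row
}

fn main() {
    let flags = Flags { lines: false, words: true, bytes: false, chars: true };
    let counts = Counts { lines: 5, words: 5, bytes: 22, chars: 16 };
    println!("{}", format_row(&flags, &counts, Some("test.txt")));
    println!("{}", format_row(&flags, &counts, Some("total")));
}
```

Returning a String instead of printing directly also makes the formatting easy to check in a unit test.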

Getting wc-rs Up and Running

To try out wc-rs, you’ll compile and install it with cargo, Rust’s build system and package manager. Once installed, running the program is just like using the traditional wc.

Installation

You can install the CLI by running the command below from the project directory:

cargo install --path .

Help

By using the --help option you can obtain the usage help for the CLI:

wc-rs --help
The wc-rs utility displays the count of lines, words, characters, and bytes contained in each input file

Usage: wc-rs [OPTIONS] [FILES]...

Arguments:
  [FILES]...

Options:
  -c             The number of bytes in each input file is written to the standard output. This will cancel out any prior usage of the -m option
  -l             The number of lines in each input file is written to the standard output
  -w             The number of words in each input file is written to the standard output
  -m             The number of characters in each input file is written to the standard output
  -h, --help     Print help
  -V, --version  Print version

Examples

  • Getting details of a single file:
wc-rs test.txt
        5       5       22      test.txt
      #lines  #words  #bytes
  • Getting details for multiple files:
wc-rs -wm test.txt Cargo.toml
        5       16      test.txt
        25      138     Cargo.toml
        30      154     total
  • Getting details for data from standard input:
wc-rs
data from std input
        1       4       19
  • Displaying the number of lines and characters only:
wc-rs -lm test.txt
        5       16      test.txt

Conclusion

wc-rs might be a simple tool, but it embodies the elegance and robustness of Rust for command-line applications. Through an exploration of this utility, we’ve seen the power of meticulous error handling, the convenience of clap for argument parsing, and how straightforward it can be to work with files and strings in Rust. Whether you’re an experienced developer or new to the command line, wc-rs is a testament to Rust’s capability to reinvent classic tools with a modern and reliable twist.