Commit 85f90e4

added first pass on reading and writing files page
1 parent 3e51284 commit 85f90e4

2 files changed

+304
-0
lines changed

@@ -0,0 +1,304 @@
1+
# Writing and Reading Files
2+
3+
:::note[Overview]
4+
Questions
5+
- How do I create/edit text files?
6+
- How do I move/copy/delete files?
7+
8+
Objectives
9+
- Learn to use the `nano` text editor.
10+
- Understand how to move, create, and delete files.
11+
:::
12+
13+
Now that we know how to move around and look at things, let’s learn how to read, write, and handle files! We’ll start by moving back to our home directory and creating a scratch directory:
14+
```bash
15+
$ cd ~
16+
$ mkdir hpc-test
17+
$ cd hpc-test
18+
```
19+
20+
## Creating and Editing Text Files
21+
When working on an HPC system, we will frequently need to create or edit text files. Plain text is one of the simplest file formats: it is just a sequence of lines of characters.
22+
23+
What if we want to make a file? There are a few ways of doing this, the easiest of which is simply using a text editor. For this lesson, we are going to use `nano`, since it’s more intuitive than many other terminal text editors.
24+
25+
To create or edit a file, type `nano <filename>` in the terminal, where `<filename>` is the name of the file. If the file does not already exist, it will be created. Let’s make a new file now, type whatever you want in it, and save it.
26+
27+
```bash
28+
$ nano draft.txt
29+
```
30+
31+
![Nano in action](static/nano-screenshot.png)
32+
33+
Nano defines a number of _shortcut keys_ (prefixed by the `Control` or `Ctrl` key) to perform actions such as saving the file or exiting the editor. Here are the shortcut keys for a few common actions:
34+
35+
- `Ctrl`+`O` — save the file (under its current name or a new name).
36+
- `Ctrl`+`X` — exit the editor. If you have not saved your file upon exiting, `nano` will ask you if you want to save.
37+
- `Ctrl`+`K` — cut (“kill”) a text line. This command deletes a line and saves it on a clipboard. If repeated multiple times without any interruption (key typing or cursor movement), it will cut a chunk of text lines.
38+
- `Ctrl`+`U` — paste the cut text line (or lines). This command can be repeated to paste the same text elsewhere.
39+
40+
41+
| Option | Explanation |
42+
| ------------ | ---------------- |
43+
| `Ctrl` + `O` | Save the changes |
44+
| `Ctrl` + `X` | Exit nano |
45+
| `Ctrl` + `K` | Cut single line |
46+
| `Ctrl` + `U` | Paste the text |
47+
48+
49+
:::tip[Using `vim` as a text editor]
50+
From time to time, you may encounter the `vim` text editor. Although `vim` isn’t the easiest or most user-friendly of text editors, you’ll find it (or its predecessor `vi`) on almost any UNIX-like system, and it has many more features than `nano`.
51+
52+
`vim` has several modes, including a “command” mode (for doing big operations, like saving and quitting) and an “insert” mode (for typing text). You can switch to insert mode with the `i` key, and back to command mode with `Esc`.
53+
54+
In insert mode, you can type more or less normally. In command mode there are a few commands you should be aware of:
55+
56+
- `:q!` — quit, without saving
57+
- `:wq` — save and quit
58+
- `dd` — cut/delete a line
59+
- `p` — paste a line
60+
:::
61+
62+
Do a quick check to confirm our file was created.
63+
```bash
64+
$ ls
65+
draft.txt
66+
```
67+
68+
## Reading Files
69+
Let’s read the file we just created now. There are a few different ways of doing this, one of which is reading the entire file with `cat`.
70+
```bash
71+
$ cat draft.txt
72+
It's not "publish or perish" any more,
73+
it's "share and thrive".
74+
```
75+
76+
By default, `cat` prints the contents of the given file. The name may not seem intuitive for a command that reads files, but it stands for “concatenate”: given multiple file names, it prints their contents one after another, in the order they appear on the command line. For example:
77+
```bash
78+
$ cat draft.txt draft.txt
79+
It's not "publish or perish" any more,
80+
it's "share and thrive".
81+
It's not "publish or perish" any more,
82+
it's "share and thrive".
83+
```
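
To see that the order of the arguments controls the order of the output, here is a minimal sketch using two throwaway files. The file names `first.txt` and `second.txt` and the temporary directory are illustrative assumptions, not part of the lesson data, and the `>` operator (which sends the output of `echo` into a file) is only used here to avoid opening an editor.

```bash
# Work in a temporary directory so nothing in hpc-test is touched.
workdir=$(mktemp -d)
cd "$workdir"

# Create two small files; ">" writes the output of echo into a file.
echo "line from first" > first.txt
echo "line from second" > second.txt

# cat prints its arguments in the order they are given.
forward=$(cat first.txt second.txt)
reverse=$(cat second.txt first.txt)

echo "$forward"
echo "---"
echo "$reverse"
```

Swapping the two file names on the command line swaps the two lines in the output.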
84+
85+
:::tip[Reading Multiple Text Files]
86+
Create two more files using `nano`, giving them different names such as `chap1.txt` and `chap2.txt`. Then use a single `cat` command to read and print the contents of `draft.txt`, `chap1.txt`, and `chap2.txt`.
87+
:::
88+
89+
## Creating Directories
90+
We’ve successfully created a file. What about a directory? We’ve actually done this before, using `mkdir`.
91+
```bash
92+
$ mkdir files
93+
$ mkdir documents
94+
$ ls
95+
documents  draft.txt  files
96+
```
97+
98+
## Moving, Renaming, Copying Files
99+
**Moving** — We will move `draft.txt` to the `files` directory with the `mv` (“move”) command. The same syntax works for both files and directories:<br />
100+
`mv <file/directory> <new-location>`
101+
102+
```bash
103+
$ mv draft.txt files
104+
$ cd files
105+
$ ls
106+
draft.txt
107+
```
108+
109+
| Command | Explanation |
110+
| --------------------------------- | -------------------------------------------------------------- |
111+
| `mv dummy_file.txt test_file.txt` | Renames dummy_file.txt as test_file.txt |
112+
| `mv subdir new_subdir`            | Renames the directory “subdir” to “new_subdir” (if “new_subdir” already exists, “subdir” is moved inside it instead) |
113+
114+
**Renaming** — `draft.txt` isn’t a very descriptive name. How do we go about changing it? It turns out that `mv` is also used to rename files and directories. Although this may not seem intuitive at first, think of it as _moving_ a file to be stored under a different name. The syntax is the same as for moving files: `mv oldName newName`
115+
```bash
116+
$ mv draft.txt newname.testfile
117+
$ ls
118+
newname.testfile
119+
```
120+
121+
:::tip[File extensions are arbitrary]
122+
In the last example, we changed both a file’s name and extension at the same time. On UNIX systems, file extensions (like `.txt`) are arbitrary. A file is a `.txt` file only because we say it is. Changing the name or extension of a file never changes its contents, so you are free to rename things as you wish. That said, file extensions are a useful tool for keeping track of what type of data a file contains. A `.txt` file typically contains text, for instance.
123+
:::
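
As a quick check of the claim that renaming leaves the contents alone, the sketch below compares a checksum before and after a rename. The file names and the temporary directory are made up for illustration; `cksum` is a standard utility that prints a checksum and byte count for its input.

```bash
# Work in a temporary directory so nothing in hpc-test is touched.
workdir=$(mktemp -d)
cd "$workdir"

echo "some text" > notes.txt

# Record a checksum of the contents before renaming.
before=$(cksum < notes.txt)

# Change only the extension.
mv notes.txt notes.data

# Record the checksum again under the new name.
after=$(cksum < notes.data)

if [ "$before" = "$after" ]; then
    result="identical"
else
    result="different"
fi
echo "$result"
```

Because `mv` only changes the name, the two checksums match and the script reports `identical`.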
124+
125+
**Copying** — What if we want to copy a file, instead of simply renaming or moving it? Use the `cp` command (an abbreviated name for “copy”). This command has two different uses that work in the same way as `mv`:
126+
127+
- Copy to same directory (copied file is renamed): `cp file newFilename`
128+
- Copy to other directory (copied file retains original name): `cp file directory`
129+
130+
You can also combine these two operations in one command to copy a file to a different directory with a new name: `cp file directory/newFilename`
131+
132+
| Command | Explanation |
133+
| ---------------------------------- | ------------|
134+
| `cp test_file1.txt test_file2.txt` | Creates a copy of test_file1.txt with the new name test_file2.txt |
135+
| `cp -r subdir subdir2` | Recursively copies the directory “subdir” to a new directory “subdir2”. That is, a new directory “subdir2” is created, and each file and directory under “subdir” is replicated in “subdir2”. |
136+
137+
Let’s try this out:
138+
```bash
139+
$ cp newname.testfile copy.testfile
140+
$ ls
141+
copy.testfile  newname.testfile
142+
$ cp newname.testfile ..
143+
$ cd ..
144+
$ ls
145+
documents  files  newname.testfile
146+
```
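
The forms of `cp` described above (into a directory, into a directory under a new name, and recursive) can be sketched together. The names `results.txt`, `backup`, and `backup2` are hypothetical and exist only inside a temporary directory.

```bash
# Work in a temporary directory so nothing in hpc-test is touched.
workdir=$(mktemp -d)
cd "$workdir"

mkdir backup
echo "data" > results.txt

# Copy into another directory; the copy keeps its original name.
cp results.txt backup/

# Copy into another directory and rename the copy in one step.
cp results.txt backup/results-old.txt

# Copy a whole directory and everything in it.
cp -r backup backup2

# List both directories to confirm what was copied.
ls backup backup2
```

The original `results.txt` stays where it was in every case; `cp` never removes its source.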
147+
148+
## Removing Files
149+
We’ve begun to clutter up our workspace with all of the directories and files we’ve been making. Let’s learn how to get rid of them. One important note before we start: when you delete a file on a UNIX system, it is gone _forever_. There is no “recycle bin” or “trash”. Once a file is deleted, it is gone, never to return. So be ***very*** careful when deleting files.
150+
151+
Files are deleted with `rm file [moreFiles]`. To delete the `newname.testfile` in our current directory:
152+
```bash
153+
$ ls
154+
documents  files  newname.testfile
155+
$ rm newname.testfile
156+
$ ls
157+
documents  files
158+
```
159+
160+
That was simple enough. Directories are deleted in a similar manner using `rmdir` if the directory is empty or `rm -r` (the `-r` option stands for ‘recursive’) if the directory has contents.
161+
```bash
162+
$ ls
163+
documents  files
164+
$ rmdir documents
165+
$ rmdir files
166+
rmdir: failed to remove 'files': Directory not empty
167+
$ ls
168+
files
169+
$ rm -r files
170+
$ ls
171+
172+
```
173+
174+
| Command | Explanation |
175+
| ------------------------ | ------------- |
176+
| `rm dummy_file.txt` | Remove a file |
177+
| `rm -i dummy_file.txt` | If you use `-i` you will be prompted for confirmation before each file is deleted. |
178+
| `rm -f serious_file.txt` | Removes a file without asking for confirmation, even if the file is write-protected (provided you have write permission on the directory that contains it). |
179+
| `rmdir subdir/` | Removes “subdir” if it is already empty. Otherwise, the command fails. |
180+
| `rm -r subdir/` | Recursively deletes the directory “subdir” and everything in it. **Use it with care!** |
181+
182+
What happened when we ran `rmdir files`? As it turns out, `rmdir` is unable to remove directories that have anything in them. To delete a directory and everything inside it, we use `rm -r directory`, often written as `rm -rf directory`. This is probably the scariest command on UNIX: it force-deletes a directory and all of its contents without prompting. **ALWAYS** double check your typing before using it. If you mistype the path, for example by adding a stray space or by giving `/` by mistake, it will attempt to delete everything on your file system that you have permission to delete. So when deleting directories be _very, very_ careful.
183+
184+
:::danger[What happens when you use `rm -rf` accidentally]
185+
Steam is a major online sales platform for PC video games with over 125 million users. Despite this, it hasn’t always had the most stable or error-free code.
186+
187+
In January 2015, user kevyin on GitHub [reported that Steam’s Linux client had deleted every file on his computer](https://github.com/ValveSoftware/steam-for-linux/issues/3671). It turned out that one of the Steam programmers had added the following line: `rm -rf "$STEAMROOT/"*`. Due to the way that Steam was set up, the variable `$STEAMROOT` could end up empty, meaning the statement evaluated to `rm -rf /*`. This coding error in the Linux client meant that Steam deleted every file the user had permission to delete when run in certain scenarios (including files on connected external hard drives). Moral of the story: be very careful when using `rm -rf`!
188+
:::
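
One common way to guard against this kind of mistake in a script is the `${VAR:?message}` parameter expansion, which makes the shell stop with an error if the variable is unset or empty. The sketch below is an illustration under stated assumptions, not the fix that Steam shipped: `TARGET_DIR` and the file names are hypothetical, and everything happens inside a temporary directory.

```bash
# Set up a temporary directory containing a file we do not want to lose.
workdir=$(mktemp -d)
mkdir "$workdir/keep"
touch "$workdir/keep/important.txt"

# Simulate a variable that was never set (TARGET_DIR is a made-up name).
unset TARGET_DIR

# ${TARGET_DIR:?...} aborts with an error when the variable is unset or
# empty, so rm is never run. The parentheses run the command in a
# subshell so that the error does not end this whole script.
status=0
( rm -rf "${TARGET_DIR:?TARGET_DIR is not set}/"* ) || status=$?

echo "exit status of the guarded command: $status"
ls "$workdir/keep"
```

The guarded command fails with a non-zero exit status and an error message instead of expanding to `rm -rf /*`, and `important.txt` is still there afterwards.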
189+
190+
## Looking at Files
191+
Sometimes it’s not practical to read an entire file with `cat`. The file might be way too large, take a long time to open, or maybe we want to only look at a certain part of the file. As an example, we are going to look at a large and complex file type used in bioinformatics, a `.gtf` file. The GTF2 format is commonly used to describe the location of genetic features in a genome.
192+
193+
Let’s grab and unpack a set of demo files for use later. To do this, we’ll use [`wget`](https://www.gnu.org/software/wget/).
194+
```bash
195+
$ wget https://nyuhpc.github.io/hpc-shell/files/bash-lesson.tar.gz
196+
```
197+
198+
:::warning[Problems with wget?]
199+
wget is a stand-alone application for downloading things over HTTP/HTTPS and FTP/FTPS connections, and it does the job admirably — when it is installed.
200+
201+
Some operating systems instead come with cURL, which is the command-line interface to libcurl, a powerful library for programming interactions with remote resources over a wide variety of network protocols. If you have curl but not wget, then try this command instead:
202+
```bash
203+
$ curl -O https://nyuhpc.github.io/hpc-shell/files/bash-lesson.tar.gz
204+
```
205+
For very large downloads, you might consider using Aria2, which has support for downloading the same file from multiple mirrors. You have to install it separately, but if you have it, try this to get it faster than your neighbors:
206+
```bash
207+
$ aria2c https://nyuhpc.github.io/hpc-shell/files/bash-lesson.tar.gz
208+
```
209+
<details>
210+
<summary>
211+
Install wget
212+
</summary>
213+
- **Linux**:
214+
- Debian, Ubuntu, Mint: `sudo apt install wget`
215+
- CentOS, Red Hat: `sudo yum install wget`; openSUSE: `sudo zypper install wget`
216+
- Fedora: `sudo dnf install wget`
217+
- **macOS**: `brew install wget`
218+
- **Windows**:
219+
1. Download the Wget executable (wget.exe) from a reliable source.
220+
1. Place the `wget.exe` file in a directory that's included in your system's PATH environment variable (e.g., `C:\Windows\System32`).
221+
1. Open a command prompt and verify the installation by running `wget --version`
222+
</details>
223+
<details>
224+
<summary>
225+
Install cURL
226+
</summary>
227+
- **Linux**: curl is packaged for every major distribution. You can install it through the usual means.
228+
- Debian, Ubuntu, Mint: `sudo apt install curl`
229+
- CentOS, Red Hat: `sudo yum install curl`; openSUSE: `sudo zypper install curl`
230+
- Fedora: `sudo dnf install curl`
231+
- **macOS**: curl is preinstalled on macOS. If you must have the latest version you can brew install it, but only do so if the stock version has failed you.
232+
- **Windows**:
233+
- curl comes preinstalled for the Windows 10 command line.
234+
- For earlier Windows systems, you can download the executable directly; run it in place.
235+
- curl comes preinstalled in Git for Windows and Windows Subsystem for Linux.
236+
- On Cygwin, run the setup program again and select the curl package to install it.
237+
</details>
238+
<details>
239+
<summary>
240+
Install Aria2
241+
</summary>
242+
- **Linux**: every major distribution has an aria2 package. Install it by the usual means.
243+
- Debian, Ubuntu, Mint: `sudo apt install aria2`
244+
- CentOS, Red Hat: `sudo yum install aria2`; openSUSE: `sudo zypper install aria2`
245+
- Fedora: `sudo dnf install aria2`
246+
- **macOS**: aria2c is available through homebrew: `brew install aria2`
247+
- **Windows**: you have the following 2 options:
248+
- download the latest [release](https://github.com/aria2/aria2/releases) and run `aria2c` in place.
249+
- Use the [Windows Subsystem for Linux](https://learn.microsoft.com/en-us/windows/wsl/install)
250+
</details>
251+
:::
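
If you are not sure which of these tools is installed, a short script can check for you. This is a sketch rather than part of the lesson materials: the variable names `downloader` and `fetch_cmd` are made up, and the download itself is left commented out so the script only reports what it would run.

```bash
# The archive used in this lesson.
url="https://nyuhpc.github.io/hpc-shell/files/bash-lesson.tar.gz"

# "command -v NAME" succeeds only if NAME is an available command.
if command -v wget >/dev/null 2>&1; then
    downloader="wget"
    fetch_cmd="wget $url"
elif command -v curl >/dev/null 2>&1; then
    downloader="curl"
    fetch_cmd="curl -O $url"
else
    downloader="none"
    fetch_cmd=""
fi

if [ "$downloader" = "none" ]; then
    echo "Neither wget nor curl was found; install one of them first."
else
    echo "Would run: $fetch_cmd"
    # Uncomment the next line to actually download the file.
    # $fetch_cmd
fi
```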
252+
253+
You’ll commonly encounter `.tar.gz` archives while working in UNIX. To extract the files from a `.tar.gz` file, we run the command `tar -xvf filename.tar.gz`:
254+
```bash
255+
$ tar -xvf bash-lesson.tar.gz
256+
dmel-all-r6.19.gtf
257+
dmel_unique_protein_isoforms_fb_2016_01.tsv
258+
gene_association.fb
259+
SRR307023_1.fastq
260+
SRR307023_2.fastq
261+
SRR307024_1.fastq
262+
SRR307024_2.fastq
263+
SRR307025_1.fastq
264+
SRR307025_2.fastq
265+
SRR307026_1.fastq
266+
SRR307026_2.fastq
267+
SRR307027_1.fastq
268+
SRR307027_2.fastq
269+
SRR307028_1.fastq
270+
SRR307028_2.fastq
271+
SRR307029_1.fastq
272+
SRR307029_2.fastq
273+
SRR307030_1.fastq
274+
SRR307030_2.fastq
275+
```
276+
277+
:::tip[Unzipping files]
278+
We just unzipped a `.tar.gz` file for this example. What if we run into other file formats that we need to unzip? Just use the handy reference below:
279+
280+
- `gunzip` extracts the contents of `.gz` files
281+
- `unzip` extracts the contents of `.zip` files
282+
- `tar -xvf` extracts the contents of `.tar.gz`, `.tgz` and `.tar.bz2` files
283+
:::
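
Before extracting an unfamiliar archive, it can help to list its contents first. The sketch below builds a tiny archive of its own so it does not depend on the lesson download; the directory and file names are illustrative. It uses the `-t` (list), `-z` (gzip), and `-C` (extract into a given directory) options of `tar`.

```bash
# Work in a temporary directory so nothing in hpc-test is touched.
workdir=$(mktemp -d)
cd "$workdir"

# Build a small demo archive: c = create, z = gzip, f = archive file name.
mkdir demo
echo "hello" > demo/a.txt
echo "world" > demo/b.txt
tar -czf demo.tar.gz demo

# t = list the contents without extracting anything.
listing=$(tar -tzf demo.tar.gz)
echo "$listing"

# x = extract; -C chooses the directory to extract into.
mkdir unpacked
tar -xzf demo.tar.gz -C unpacked
ls unpacked/demo
```

Listing first shows whether the archive unpacks into its own directory or scatters files into the current one.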
284+
285+
That is a lot of files! One of these files, `dmel-all-r6.19.gtf`, is extremely large: it contains every annotated feature in the _Drosophila melanogaster_ genome. What happens if we run `cat` on it? (Press `Ctrl` + `C` to stop it.)
286+
287+
So, `cat` is a really bad option when reading big files… it scrolls through the entire file far too quickly! What are the alternatives? Try all of these out and see which ones you like best!
288+
289+
- `head file`: Prints the first 10 lines of a file to the console. You can control the number of lines you see with the `-n numberOfLines` flag.
290+
- `tail file`: Same as `head`, but prints the last 10 lines in a file to the console.
291+
- `less file`: Opens a file and displays as much as possible on-screen. You can scroll with `Enter` or the arrow keys on your keyboard. Press `q` to close the viewer.
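
To try `head` and `tail` on something predictable before pointing them at the `.gtf` file, here is a small sketch. It assumes the `seq` utility is available (it is on most Linux and macOS systems), and the file name `numbers.txt` is made up.

```bash
# Work in a temporary directory so nothing in hpc-test is touched.
workdir=$(mktemp -d)
cd "$workdir"

# Make a 100-line file: seq prints the numbers 1 to 100, one per line.
seq 1 100 > numbers.txt

# -n sets how many lines to show.
first_three=$(head -n 3 numbers.txt)   # lines 1, 2 and 3
last_two=$(tail -n 2 numbers.txt)      # lines 99 and 100

echo "$first_three"
echo "$last_two"
```

The same `-n` flag works on the large lesson file, for example `head -n 5 dmel-all-r6.19.gtf`.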
292+
293+
Out of `cat`, `head`, `tail`, and `less`, which method of reading files is your favourite? Why?
294+
295+
:::tip[Key Points]
296+
- Use `nano` to create or edit text files from a terminal.
297+
- Use `cat file1 [file2 ...]` to print the contents of one or more files to the terminal.
298+
- Use `mv old dir` to move a file or directory `old` to another directory `dir`.
299+
- Use `mv old new` to rename a file or directory `old` to a `new` name.
300+
- Use `cp old new` to copy a file under a new name or location.
301+
- Use `cp old dir` to copy a file `old` into a directory `dir`.
302+
- Use `rm old` to delete (remove) a file.
303+
- File extensions are entirely arbitrary on UNIX systems.
304+
:::