Commit 043f5d0

RobJY authored and s-sajid-ali committed
first pass of last section of first tutorial
1 parent ea68ef8 commit 043f5d0

1 file changed: 393 additions & 0 deletions

@@ -0,0 +1,393 @@
1+
import { Smile } from "lucide-react";
2+
3+
# Scripts, variables, and loops
4+
5+
:::note[Overview]
6+
Questions
7+
- How do I turn a set of commands into a program?
8+
9+
Objectives
10+
- Write a shell script
11+
- Understand and manipulate UNIX permissions
12+
- Understand shell variables and how to use them
13+
- Write a simple `for` loop.
14+
:::
15+
16+
We now know a lot of UNIX commands! Wouldn’t it be great if we could save certain commands so that we could run them later or not have to type them out again? As it turns out, this is straightforward to do. A “shell script” is essentially a text file containing a list of UNIX commands to be executed in a sequential manner. These shell scripts can be run whenever we want, and are a great way to automate our work.
17+
18+
## Writing a Script
19+
So how do we write a shell script, exactly? It turns out we can do this with a text editor. Start editing a file called `demo.sh` (to recap, we can do this with `nano demo.sh`). The `.sh` is the standard file extension for shell scripts that most people use (you may also see `.bash` used).
20+
21+
Our shell script will have two parts:
22+
23+
- On the very first line, add `#!/bin/bash`. The `#!` (pronounced “hash-bang”) tells our computer what program to run our script with. In this case, we are telling it to run our script with our command-line shell (what we’ve been doing everything in so far). If we wanted our script to be run with something else, like Perl, we could use `#!/usr/bin/perl` instead.
24+
- Now, anywhere below the first line, add `echo "Our script worked!"`. When our script runs, `echo` will happily print out "Our script worked!".
25+
26+
Our file should now look like this:
27+
```bash
#!/bin/bash

echo "Our script worked!"
```
32+
33+
Ready to run our program? Let’s try running it:
34+
```bash
$ demo.sh
bash: demo.sh: command not found...
```
38+
39+
Strangely enough, Bash can’t find our script. As it turns out, Bash only looks for commands in the directories listed in its `PATH` variable. To run anything else, we need to tell Bash exactly where to look by giving the path to the file. We could do this in one of two ways:
- with the absolute path `/home/yourUsername/demo.sh`
- with the relative path `./demo.sh`
42+
43+
```bash
$ ./demo.sh
bash: ./demo.sh: Permission denied
```
47+
There’s one last thing we need to do. Before a file can be run, it needs 'permission' to run. In the next section we'll look at how Linux file permissions work, which will finally allow us to run our script.
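As an aside on where Bash looks for commands: the search list is stored in a variable named `PATH` (variables are covered later in this lesson). The sketch below prints that list one directory per line and then asks the shell where it finds `ls`. The directories shown will differ from system to system, so no output is given here.

```bash
#!/bin/bash

# Print each directory on the search path on its own line.
# PATH separates directories with colons, so we swap them for newlines.
echo "$PATH" | tr ':' '\n'

# Ask the shell where it would find the ls command
command -v ls
```

A home directory is usually not in this list, which is why typing `demo.sh` on its own fails while `./demo.sh` is at least found.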
48+
49+
## Permissions
50+
Let’s look at our file’s permissions with `ls -l`:
51+
52+
```bash
$ ls -l
-rw-rw-r-- 1 yourUsername tc001 12534006 Jan 16 18:50 bash-lesson.tar.gz
-rw-rw-r-- 1 yourUsername tc001       40 Jan 16 19:41 demo.sh
-rw-rw-r-- 1 yourUsername tc001 77426528 Jan 16 18:50 dmel-all-r6.19.gtf
-rw-r--r-- 1 yourUsername tc001   721242 Jan 25  2016 dmel_unique_protein_is...
drwxrwxr-x 2 yourUsername tc001     4096 Jan 16 19:16 fastq
-rw-r--r-- 1 yourUsername tc001  1830516 Jan 25  2016 gene_association.fb.gz
-rw-rw-r-- 1 yourUsername tc001       15 Jan 16 19:17 test.txt
-rw-rw-r-- 1 yourUsername tc001      245 Jan 16 19:24 word_counts.txt
```
63+
64+
That’s a huge amount of output: a full listing of everything in the directory. Let’s see if we can understand what each field of a given row represents, working from the left to right.
65+
66+
### Column 1: File/Directory Permissions
67+
This column contains a block of subcolumns that define the permissions for a file or directory given in each row. The permissions are shown for three user types to perform three actions each.
68+
69+
The user types are:
70+
- user (`u`): This refers to the permissions of the file's owner (shown in the 3rd column). For files you create, that is you.
71+
- group (`g`): This refers to the permissions for people in the same group as this file/directory. You will see the group in the 4th column.
72+
- other (`o`): This refers to the permissions for all other users.
73+
74+
The actions are:
75+
- read (`r`): This refers to the permission to read this file.
76+
- write (`w`): This refers to the permission to write to this file.
77+
- execute (`x`): This refers to the permission to execute this file.
78+
79+
The following table shows what each of the subcolumns refers to and its possible values:
80+
81+
| directory  | user read  | user write | user execute | group read | group write | group execute | other read | other write | other execute |
| ---------- | ---------- | ---------- | ------------ | ---------- | ----------- | ------------- | ---------- | ----------- | ------------- |
| `d` or `-` | `r` or `-` | `w` or `-` | `x` or `-`   | `r` or `-` | `w` or `-`  | `x` or `-`    | `r` or `-` | `w` or `-`  | `x` or `-`    |
84+
85+
If there is a `-` in the directory column, the row refers to a file. If it contains a `d`, the row refers to a directory. The following columns behave in a similar manner. If they contain a `-`, the associated action is not allowed for the associated user type.
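To make the layout concrete, here is a small sketch that takes the permission string from the `demo.sh` row of the listing and cuts it into its four parts. It assumes the `cut` command with its `-c` (character positions) option. The string is typed in by hand for illustration rather than read from a real file.

```bash
#!/bin/bash

# Permission string copied from the demo.sh row of the listing
echo "-rw-rw-r--" | cut -c 1      # type:  -    (a regular file)
echo "-rw-rw-r--" | cut -c 2-4    # user:  rw-  (read and write)
echo "-rw-rw-r--" | cut -c 5-7    # group: rw-  (read and write)
echo "-rw-rw-r--" | cut -c 8-10   # other: r--  (read only)
```

None of the three blocks contains an `x`, which is why nobody, not even the owner, can execute the file yet.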
86+
87+
### Column 2: References
88+
This counts the number of references (hard links) to the item (file, folder, symbolic link or “shortcut”).
89+
90+
### Column 3: Owner
91+
This is the username of the user who owns the file.
92+
:::note
93+
The owner's permissions are shown in the first block of three characters after the directory column.
94+
:::
95+
96+
### Column 4: Group
97+
Each user has a primary group and is optionally a member of other groups. When a user creates a file, it is normally associated with the user’s primary group. At NYU HPC, all users are in a group named ‘users’, so group permission has little meaning.
98+
:::note
99+
Other members of this group have the permissions shown in the second block of three characters after the directory column.
100+
:::
101+
102+
### Column 5: Size of item
103+
This is the number of bytes in a file, or the number of filesystem blocks occupied by the contents of a folder.
104+
:::note
105+
We can add the `-h` option (`ls -lh`) to get human-readable file sizes in kilobytes, megabytes, gigabytes, etc.
106+
:::
107+
108+
### Column 6: Time last modified
109+
This is the last time the file was modified.
110+
111+
### Column 7: Filename
112+
This is the name of the file/directory.
113+
114+
## Changing Permissions
115+
As previously mentioned, in Unix a file has three basic permissions, each of which can be set for three types of user. Each of those three permissions also has a numeric value:
116+
117+
- Read permission (“r”) - numeric value 4.
118+
- Write permission (“w”) - numeric value 2.
119+
- Execute permission (“x”) - numeric value 1.
120+
:::note
121+
When applied to a directory, execute permission refers to whether the directory can be entered with `cd`.
122+
:::
123+
124+
You'll need to use the `chmod` command to modify permissions. You grant permissions with `chmod who+what file` and revoke them with `chmod who-what file`. (Notice that the first has `+` and the second `-`). Here, “who” is some combination of “u”, “g”, and “o”, and “what” is some combination of “r”, “w”, and “x”. Leaving out the `who` part of the command applies it to all user types.
125+
126+
So, to set execute permission we use:
127+
```bash
$ chmod +x demo.sh
$ ls -l
-rw-rw-r-- 1 yourUsername tc001 12534006 Jan 16 18:50 bash-lesson.tar.gz
-rwxrwxr-x 1 yourUsername tc001       40 Jan 16 19:41 demo.sh
-rw-rw-r-- 1 yourUsername tc001 77426528 Jan 16 18:50 dmel-all-r6.19.gtf
-rw-r--r-- 1 yourUsername tc001   721242 Jan 25  2016 dmel_unique_protein_is...
drwxrwxr-x 2 yourUsername tc001     4096 Jan 16 19:16 fastq
-rw-r--r-- 1 yourUsername tc001  1830516 Jan 25  2016 gene_association.fb.gz
-rw-rw-r-- 1 yourUsername tc001       15 Jan 16 19:17 test.txt
-rw-rw-r-- 1 yourUsername tc001      245 Jan 16 19:24 word_counts.txt
```
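The `who+what` and `who-what` forms can also target a single user type. The sketch below tries a few combinations on a throwaway file. The name `scratch_perms.txt` is made up for this illustration, and the mode in the final comment assumes a typical setup where new files start out readable and writable by their owner.

```bash
#!/bin/bash

# scratch_perms.txt is a hypothetical throwaway file used only for this demo
touch scratch_perms.txt

chmod u+x scratch_perms.txt      # grant execute to the owner only
chmod go-rwx scratch_perms.txt   # revoke everything from group and other
chmod g+r scratch_perms.txt      # give read back to the group

# With a typical setup the listing now starts with -rwxr-----
ls -l scratch_perms.txt

# Clean up
rm scratch_perms.txt
```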
139+
140+
## Executing a Script
141+
Now that we have executable permissions for that file, we can run it.
142+
```bash
$ ./demo.sh
Our script worked!
```

Fantastic, we’ve written our first program!
147+
148+
## Comments
149+
Before we go any further, let’s learn how to take notes inside our program using comments. A comment is indicated by the `#` character, followed by whatever we want. Comments do not get run. Let’s try out some comments in the console, then add one to our script!
150+
151+
```bash
# This won't show anything.
```
154+
155+
Now let's try adding this to our script with nano. Edit your script to look something like this:
156+
```bash
#!/bin/bash

# This is a comment... they are nice for making notes!
echo "Our script worked!"
```
162+
163+
When we run our script, the output should be unchanged from before!
164+
165+
## Shell variables
166+
One important concept that we’ll need to cover is shell variables. Variables are a great way of saving information under a name you can access later. In programming languages like Python and R, variables can store pretty much anything you can think of. In the shell, they usually just store text. The best way to understand how they work is to see them in action.
167+
168+
To set a variable, type a name containing only letters, numbers, and underscores (it cannot start with a number), followed by an `=` and whatever you want to put in the variable, with no spaces on either side of the `=`. Shell variable names are often uppercase by convention (but do not have to be).
169+
```bash
$ VAR="This is our variable"
```
172+
To use a variable, prefix its name with a `$` sign. Note that if we want to simply check what a variable is, we should use `echo` (or else the shell will try to run the contents of a variable).
173+
```bash
$ echo $VAR
This is our variable
```
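A common stumbling block is putting spaces around the `=`. The following sketch shows a working assignment and, commented out, the form that fails. The variable names `SAMPLE` and `N_FILES` are made up for this illustration.

```bash
#!/bin/bash

# Correct: no spaces around the equals sign
SAMPLE="dmel"
N_FILES=3

# Variables can be used inside double quotes
echo "$SAMPLE has $N_FILES files"

# Incorrect (left commented out): with spaces, the shell would try to run
# a command named SAMPLE with "=" and "dmel" as its arguments
# SAMPLE = "dmel"
```

Running this prints `dmel has 3 files`.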
177+
Let’s try setting a variable in our script and then recalling its value as part of a command. We’re going to make it so our script runs `wc -l` on whichever file we specify with `FILE`.
178+
179+
Our script:
180+
```bash
#!/bin/bash

# set our variable to the name of our GTF file
FILE=dmel-all-r6.19.gtf

# call wc -l on our file
wc -l $FILE
```
189+
```bash
$ ./demo.sh
542048 dmel-all-r6.19.gtf
```
193+
194+
What if we wanted to do our little `wc -l` script on other files without having to change `$FILE` every time we want to use it? There is actually a special shell variable we can use in scripts that allows us to use arguments in our scripts (arguments are extra information that we can pass to our script, like the `-l` in `wc -l`).
195+
196+
To use the first argument to a script, use `$1` (the second argument is `$2`, and so on). Let’s change our script to run `wc -l` on `$1` instead of `$FILE`. Note that we can also pass all of the arguments using `$@` (not going to use it in this lesson, but it’s something to be aware of).
197+
198+
Our script:
199+
```bash
#!/bin/bash

# call wc -l on our first argument
wc -l $1
```
205+
```bash
$ ./demo.sh dmel_unique_protein_isoforms_fb_2016_01.tsv
22129 dmel_unique_protein_isoforms_fb_2016_01.tsv
```
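To see how `$1`, `$2`, and `$@` relate to each other, here is a minimal sketch. It uses `set --` to fake the arguments so the example can run on its own; in a real script they would come from the command line. The file names are hypothetical, and `$#` (the number of arguments) is an extra that this lesson does not otherwise use.

```bash
#!/bin/bash

# Simulate calling the script as: ./demo.sh first.txt second.txt
# (hypothetical file names; `set --` replaces the script's arguments)
set -- first.txt second.txt

echo "First argument: $1"
echo "Second argument: $2"
echo "All arguments: $@"
echo "Number of arguments: $#"
```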
209+
210+
Nice! One thing to be aware of when using variables: they are all treated as pure text. How do we save the output of an actual command like `ls -l`?
211+
212+
First, a demonstration of what doesn’t work:
213+
```bash
$ TEST=ls -l
-bash: -l: command not found
```
217+
218+
What does work? We need to surround any command with `$(command)`:
219+
```bash
$ TEST=$(ls -l)
$ echo $TEST
total 90372 -rw-rw-r-- 1 jeff jeff 12534006 Jan 16 18:50 bash-lesson.tar.gz -rwxrwxr-x. 1 jeff jeff 40 Jan 16 19:41 demo.sh -rw-rw-r-- 1 jeff jeff 77426528 Jan 16 18:50 dmel-all-r6.19.gtf -rw-r--r-- 1 jeff jeff 721242 Jan 25 2016 dmel_unique_protein_isoforms_fb_2016_01.tsv drwxrwxr-x. 2 jeff jeff 4096 Jan 16 19:16 fastq -rw-r--r-- 1 jeff jeff 1830516 Jan 25 2016 gene_association.fb.gz -rw-rw-r-- 1 jeff jeff 15 Jan 16 19:17 test.txt -rw-rw-r-- 1 jeff jeff 245 Jan 16 19:24 word_counts.txt
```
224+
:::note
225+
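Everything got printed on one line. The variable itself still contains the line breaks; they disappear because an unquoted `$TEST` is split into separate words, which `echo` then joins with single spaces. This is handy when we use `$(command)` inside a line of a script, since the result expands in place without breaking the line. To print the output with its original line breaks, quote the variable: `echo "$TEST"`.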
226+
:::
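The difference between the unquoted and quoted forms is easy to check with a small, self-contained sketch. Here `printf` stands in for a command that produces several lines of output, and the variable name `LISTING` is made up for this illustration.

```bash
#!/bin/bash

# Capture three lines of output in a variable
LISTING=$(printf 'one\ntwo\nthree')

# Unquoted: the line breaks are lost and everything lands on one line
echo $LISTING

# Quoted: the line breaks are preserved, so this prints three lines
echo "$LISTING"
```

The first `echo` prints `one two three`; the second prints each word on its own line.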
227+
228+
## Loops
229+
To end our lesson on scripts, we are going to learn how to write a for-loop to execute a lot of commands at once. This will let us do the same string of commands on every file in a directory (or other stuff of that nature).
230+
231+
A for-loop generally has the following syntax:
232+
```bash
#!/bin/bash

for VAR in first second third
do
    echo $VAR
done
```
240+
241+
When a for-loop gets run, the loop will run once for everything following the word `in`. In each iteration, the variable `$VAR` is set to a particular value for that iteration. In this case it will be set to `first` during the first iteration, `second` on the second, and so on. During each iteration, the code between `do` and `done` is performed.
242+
243+
Let’s run the script we just wrote (I saved mine as `loop.sh`).
244+
```bash
$ chmod +x loop.sh
$ ./loop.sh
first
second
third
```
251+
252+
What if we wanted to loop over a shell variable, such as every file in the current directory? Shell variables work perfectly in for-loops. In this example, we’ll save the result of `ls` and loop over each file:
253+
```bash
#!/bin/bash

FILES=$(ls)
for VAR in $FILES
do
    echo $VAR
done
```
262+
```bash
$ ./loop.sh
bash-lesson.tar.gz
demo.sh
dmel_unique_protein_isoforms_fb_2016_01.tsv
dmel-all-r6.19.gtf
fastq
gene_association.fb.gz
loop.sh
test.txt
word_counts.txt
```
274+
275+
There’s a shortcut to run on all files of a particular type, say all `.gz` files:
276+
```bash
#!/bin/bash

for VAR in *.gz
do
    echo $VAR
done
```
284+
```bash
bash-lesson.tar.gz
gene_association.fb.gz
```
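Looping over a wildcard has one more advantage over looping over the output of `ls`: a file name containing a space stays in one piece, whereas an unquoted `$(ls)` would split it into two words. The sketch below demonstrates this in a scratch directory. The directory name `glob_demo` and the file names are made up for this illustration, and the `$(( ))` arithmetic used for counting is not covered in this lesson.

```bash
#!/bin/bash

# Build a hypothetical scratch directory with three files,
# one of which has a space in its name
mkdir -p glob_demo
touch "glob_demo/a.gz" "glob_demo/b c.gz" "glob_demo/notes.txt"

COUNT=0
for FILE in glob_demo/*.gz
do
    # Quoting "$FILE" keeps names with spaces intact
    echo "found: $FILE"
    COUNT=$((COUNT + 1))
done

# Two .gz files were created, so the loop runs twice
echo "$COUNT files ending in .gz"

# Clean up
rm -r glob_demo
```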
288+
<details>
289+
<summary>
290+
:::info[Writing our own scripts and loops]
291+
`cd` to our `fastq` directory from earlier and write a loop to print off the name and top 4 lines of every fastq file in that directory.
292+
293+
Is there a way to only run the loop on fastq files ending in `_1.fastq`? <br />
294+
**[Click for Solution]**
295+
:::
296+
</summary>
297+
:::tip[Solution]
298+
Create the following script in a file called `head_all.sh`:
299+
```bash
#!/bin/bash

for FILE in *.fastq
do
    echo $FILE
    head -n 4 $FILE
done
```
308+
The `for` line could be modified to be `for FILE in *_1.fastq` to achieve the second aim.
309+
:::
310+
</details>
311+
312+
<details>
313+
<summary>
314+
:::info[Concatenating variables]
315+
Concatenating (i.e. mashing together) variables is quite easy to do. Enclose the variable's name in `{}` characters, then add whatever you want to concatenate directly before or after it.
316+
```bash
$ FILE=stuff.txt
$ echo ${FILE}.example
stuff.txt.example
```
321+
Can you write a script that prints off the name of every file in a directory with `.processed` added to it? <br />
322+
**[Click for Solution]**
323+
:::
324+
</summary>
325+
::::tip[Solution]
326+
Create the following script in a file called `process.sh`:
327+
```bash
#!/bin/bash

for FILE in *
do
    echo ${FILE}.processed
done
```
335+
:::note
336+
This will also print directories appended with `.processed`.
337+
:::
338+
To truly only get files and not directories, we need to modify this to use the `find` command to give us only files in the current directory:
339+
```bash
#!/bin/bash

for FILE in $(find . -maxdepth 1 -type f)
do
    echo ${FILE}.processed
done
```
347+
However, this has the side effect of listing hidden files too. We can fix this by making a small change to the `find` command:
348+
```bash
#!/bin/bash

for FILE in $(find . -maxdepth 1 -type f ! -name ".*")
do
    echo ${FILE}.processed
done
```
356+
We've added `! -name ".*"` to the `find` command. It means `not` (`!`) a name that starts with `.`.<br />
357+
As you can see, programming is often iterative in more ways than one. <Smile />
358+
::::
359+
</details>
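The braces matter most when the text we append could itself be part of a variable name. In this sketch, `FILE_backup` is a second, unrelated variable created only to show what the shell picks up when the braces are left out. Both names are made up for this illustration.

```bash
#!/bin/bash

FILE=stuff
FILE_backup="a different variable"

# With braces, the shell knows the name ends at FILE
echo "${FILE}_backup"   # prints: stuff_backup

# Without braces, the shell reads the longer name FILE_backup
echo "$FILE_backup"     # prints: a different variable
```

If `FILE_backup` had not been set at all, the second `echo` would print an empty line, which can be a confusing bug to track down.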
360+
361+
<details>
362+
<summary>
363+
:::info[Special permissions]
364+
What if we want to give different sets of users different permissions? `chmod` also accepts numeric codes in place of forms like `chmod +x`. As mentioned above, the numeric codes are: read = 4, write = 2, execute = 1. For each user type we assign the sum of the permissions we want to grant (a value between 0 and 7), giving three digits in the order user, group, other.
365+
366+
Let’s make an example file and give everyone permission to do everything with it.
367+
```bash
$ touch example
$ ls -l example
-rw-r--r-- 1 yourUsername users 0 May 30 14:50 example
$ chmod 777 example
$ ls -l example
-rwxrwxrwx 1 yourUsername users 0 May 30 14:50 example
```
375+
376+
How might we give ourselves permission to do everything with a file, but allow no one else to do anything with it?
377+
**[Click for Solution]**
378+
:::
379+
</summary>
380+
:::tip[Solution]
381+
```bash
$ chmod 700 example
$ ls -l example
-rwx------ 1 yourUsername users 0 May 30 14:50 example
```
386+
We want all permissions, so: 4 (read) + 2 (write) + 1 (execute) = 7 for user (first position), no permissions, i.e. 0, for group (second position) and other (third position).
387+
:::
388+
</details>
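The same arithmetic works for any combination, not just 7 and 0. As a further sketch, the script below sets mode `640` on a throwaway file: 4 + 2 = 6 (read and write) for the user, 4 (read only) for the group, and 0 for everyone else. The file name `octal_demo.txt` is made up for this illustration, and `cut -c 1-10` is used only to keep the permission column of the listing.

```bash
#!/bin/bash

# octal_demo.txt is a hypothetical throwaway file used only for this demo
touch octal_demo.txt

# user: 4+2 = 6 (rw-), group: 4 (r--), other: 0 (---)
chmod 640 octal_demo.txt

# Keep only the first ten characters of the listing (the permission column)
MODE=$(ls -l octal_demo.txt | cut -c 1-10)
echo "$MODE"   # prints: -rw-r-----

# Clean up
rm octal_demo.txt
```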
389+
390+
:::tip[Key Points]
391+
- A shell script is just a list of bash commands in a text file.
392+
- To make a shell script file executable, run `chmod +x script.sh`.
393+
:::
