- 6.828 is largely about design and implementation of system
call interface. let's start by looking at how programs use that
interface. example: the Unix shell.
- the shell is the Unix command UI
- the shell is also a programming/scripting language
- typically handles login session, runs other processes
- look at some simple examples of shell operations, how they
use different O/S abstractions, and how those abstractions
fit together.
See Unix paper
if you are unfamiliar with the shell.
- Basic structure: see sh.c
- Basic organization: parsing and executing commands (e.g., ls, ls | wc, ls > out)
- Shell implemented using system calls (e.g., read, write, fork, exec, wait)
conventions: -1 return value signals error,
error code stored in
errno
,
perror
prints out a descriptive error
message based on errno
.
- Many systems calls are encapsulated in libc calls (e.g., fgets vs read)
- what does fork() do?
copies user memory
copies process kernel state (e.g. user id)
child gets a different PID
child state contains parent PID
returns twice, with different values
- what does wait() do?
waits for any child to exit
what if child exits before parent calls wait?
- What does parscmd() do?
- Example:
$ ls
- The parse functions form a simple recursive-descent parser for the sh
scripting language
- Do exercise 1 of shell assignment
- You need a variant of exec() (type "man 3 exec"):
replaces memory of current process with instrs/data from file
i.e. runs a file created by compiler/linker
still the same process, keeps most state (e.g. user id)
- the fork/exec split looks wasteful, but it turns out to be useful.
- how does "ls" know which directory to look at?
cwd in kernel-maintained process state, copied during fork
- how does it know what to do with its output?
- I/O: process has file descriptors, numbered starting from 0.
index into table in process's kernel state
- system calls: open, read, write, close
- numbering conventions:
- so ls writes output to file descriptor 1
- on fork, child inherits open file descriptors from parent (show in
process diagram).
- on exec, process retains file descriptors.
- This shell command sends ls's output to the file out:
$ ls > out
- Q: how could our simple shell implement output redirection?
- A: Do exercise 2.
- Good illustration of why it's nice to have separate fork and exec.
- Many commands use 0/1 by default, so they work with redirection. For example,
see xv6 cat.c.
- system call interface simple, just ints and char buffers.
why not have open() return a pointer reference to a kernel file object?
- Linux has a nice representation of a process and its FDs, under /proc/PID/
- maps: VA range, perms (p=private, s=shared), offset, dev, inode, pathname
- fd: symlinks to files pointed to by each fd.
(what's missing in this representation?)
- try exec 3>/tmp/xx ; ls -l /proc/$$/fd
- One often wants to run a series of programs on some data:
$ sort < in > out
$ uniq out > out2
$ wc out2
$ rm out out2
the shell supports this more concisely with "piping" of "filters":
$ sort < in | uniq | wc
- A pipe is a one-way communication channel. Here is a simple example:
int fds[2];
char buf[512];
int n;
pipe(fds);
write(fds[1], "hello", 5);
n = read(fds[0], buf, sizeof(buf));
// buf[] now contains 'h', 'e', 'l', 'l', 'o'
- file descriptors are inherited across
fork()
, so this also works:
int fds[2];
char buf[512];
int n, pid;
pipe(fds);
pid = fork();
if(pid > 0){
write(fds[1], "hello", 5);
} else {
n = read(fds[0], buf, sizeof(buf));
exit(0);
}
- How does the shell implement pipelines (i.e., cmd 1 | cmd 2 |..)?
We want to arrange that the output of cmd 1 is the input of cmd 2.
The way to achieve this goal is to manipulate stdout and stdin.
- The shell creates processes for each command in
the pipeline, hooks up their stdin and stdout,
and waits for the last process of the
pipeline to exit.
- Do exercise 3.
- Who waits for whom? (draw a tree of processes)
- Why close read-end and write-end? ensure that
every process starts with 3 file descriptors, and that
reading from the pipe returns end of file after the first command exits.
- You can run the shell, redirect its stdin/stdout, etc.
- I'll run this shell script with sh < script:
echo one
echo two
- Q: what will this shell command do?
$ sh < script > out
- the script itself didn't redirect the echo output,
but it did inherit a fd 1 that was redirected to out.
- Q: I'll run the following directly; is it the same as above?
echo one > out
echo two > out
- Notes about the file descriptor design:
- nice interaction with fork
- FDs help make programs more general purpose: don't need special
cases for files vs console vs pipe
- shell pipelines only work for programs w/ common formats (lines of text)
- How do you create a background job?
$ sleep 2 &
- Q: How does the shell implement "&"?
- Q: What if a background process exits while sh waits for a foreground process?
- Do challenge exercises