GitHub - imnotlistening/rsh: Bash clone for RIT project


                                     RSH
1. Overview

    RSH is meant to be as close to bash syntax as possible. However, since I am
human, attended university, and am not a CS major, there is only so much I
could do. RSH consists of several main parts that work together to create what
someone might call a 'shell'. The parts are as follows: a terminal interface, a
lexer generated by `flex', a parser, a command interpreter, and finally an
exec() syscall interface. Each of these sections contributes to the flow of
command -> action.
    There are also some other nice features that will be discussed such as the
symbol table and the environment table. Piping and IO redirection are also
included in the shell, though it is more limited than what a real shell would
be capable of mainly due to the parser implementation.

2. The Terminal Interface.

    In order to have some of the most basic of shell behaviors, a terminal
interface is required. Normally a programmer would not reinvent the wheel but
use a library like gnu-term-readline, which supports history, sophisticated
command line editing, basic tab completion, etc. However, as wonderful as
gnu-term-readline is, it would not fulfill the project requirements, so instead
I wrote my own simplified version of gnu-term-readline. The results are in the
source file src/readterm.c. Essentially, one must read each byte of the input
stream and process said byte. If an escape character is detected then a special
handler for the following escape sequence must be called. This is implemented
in the function _rsh_handle_escape_seq(). This function deals with all
understood escape sequences. It has sub handlers for particular escape
sequences such as the up arrow key or the DEL key. These sequences are handled
by: _rsh_do_history_completion(), _rsh_do_move_cursor(), _rsh_do_delete(). As
of now the following non printing characters or sequences can be processed:
DEL, the arrow keys, backspace (ASCII 0x7f).
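
The dispatch described above can be sketched roughly as follows. This is an
illustration only, not the code in src/readterm.c: enum rsh_key and
classify_key() are made-up names, and the byte sequences are assumed to be the
common VT100/xterm ones (ESC [ A for up, ESC [ 3 ~ for DEL), which other
terminals may not send.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical key codes; the real handlers in src/readterm.c may differ. */
enum rsh_key {
  KEY_NONE, KEY_UP, KEY_DOWN, KEY_RIGHT, KEY_LEFT, KEY_DEL, KEY_BACKSPACE
};

/* Classify the bytes in seq[0..len). Returns KEY_NONE for anything that is
 * not one of the non-printing keys listed above. */
enum rsh_key classify_key(const unsigned char *seq, size_t len)
{
  assert(seq != NULL || len == 0);

  if (len == 1 && seq[0] == 0x7f)
    return KEY_BACKSPACE;

  /* Everything else of interest starts with ESC '['. */
  if (len < 3 || seq[0] != 0x1b || seq[1] != '[')
    return KEY_NONE;

  if (len == 3) {
    switch (seq[2]) {
    case 'A': return KEY_UP;
    case 'B': return KEY_DOWN;
    case 'C': return KEY_RIGHT;
    case 'D': return KEY_LEFT;
    default:  return KEY_NONE;
    }
  }

  /* Assumed xterm-style DEL: ESC [ 3 ~ */
  if (len == 4 && seq[2] == '3' && seq[3] == '~')
    return KEY_DEL;

  return KEY_NONE;
}
```

A caller would hand the result to the matching sub handler (history, cursor
movement, or delete).
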
    Terminal history is handled in this section of the code as well. All of the
incoming data is stored into buffers. These buffers are implemented via

    struct rsh_buff {

      /* The underlying buffer itself. */
      char *buf;

      /* The offset to write the next character to. */
      int offset; 

      /* The number of characters in the buffer. */
      int len;
	    
      /* The size of the buffer. */
      int size;

      /* Does this buffer have stuff? */
      int used;

    };

This data type is used to effectively keep track of what has been typed, which
allows us to redisplay the terminal data with a cursor location so editing can
be performed in arbitrary places in the data (also implies left/right arrow
keys, del, backspace). The history itself is a circular buffer of just the text
in a buffer. The buffer itself can be regenerated from the history so that the
history can be edited once placed on the command line via the up/down arrow
keys. However, the history itself is not changed, just the temporary buffer
holding a copy of the history item.
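
As a rough sketch of editing at an arbitrary cursor position, here is one way
a character could be inserted into such a buffer. It assumes `offset' is the
cursor and `len' the character count, as the struct comments suggest;
buff_insert() and the doubling growth policy are assumptions for illustration,
not what src/readterm.c necessarily does.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Trimmed copy of struct rsh_buff: only the fields this sketch needs. */
struct rsh_buff {
  char *buf;   /* The underlying buffer itself. */
  int offset;  /* Where the next character is written (the cursor). */
  int len;     /* The number of characters in the buffer. */
  int size;    /* The allocated size of the buffer. */
};

/* Insert c at the cursor, shifting the tail right. Returns 0 on success and
 * -1 if the buffer could not be grown. */
int buff_insert(struct rsh_buff *b, char c)
{
  assert(b != NULL && b->offset >= 0 && b->offset <= b->len);

  if (b->len + 1 >= b->size) {          /* keep room for a trailing NUL */
    int nsize = b->size ? b->size * 2 : 16;
    char *nbuf = realloc(b->buf, (size_t)nsize);
    if (!nbuf)
      return -1;
    b->buf = nbuf;
    b->size = nsize;
  }

  /* Open a one-character gap at the cursor. */
  memmove(b->buf + b->offset + 1, b->buf + b->offset,
          (size_t)(b->len - b->offset));
  b->buf[b->offset++] = c;
  b->buf[++b->len] = '\0';
  return 0;
}
```

Moving the cursor with the left/right arrow keys then only changes `offset';
the text itself is untouched until the next insert or delete.
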
    History itself is implemented as a circular buffer of pointers to char:

    char *history[_HIST_BUFFER_SIZE];

This allows extremely efficient usage of memory since there is only 1 pointer
worth of overhead: the pointer that delimits the end of the table. If the 
history is full, then the end pointer is simply incremented and the first entry
put into the history will be replaced. In that regard the history is like a 
FIFO queue. The strings themselves are stored on the heap via malloc() via 
strdup().
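
A minimal sketch of that scheme, assuming a single end index into the table.
HIST_SIZE stands in for _HIST_BUFFER_SIZE, and hist_add() and hist_get() are
hypothetical names rather than the functions in src/readterm.c.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define HIST_SIZE 4   /* stand-in for _HIST_BUFFER_SIZE */

static char *history[HIST_SIZE];
static int hist_end;  /* slot the next entry will be written to */

/* Store a copy of line, replacing the oldest entry once the table is full.
 * Returns 0 on success, -1 if the copy could not be allocated. */
int hist_add(const char *line)
{
  char *copy;

  assert(line != NULL);
  copy = strdup(line);
  if (!copy)
    return -1;
  free(history[hist_end]);           /* free(NULL) is a no-op */
  history[hist_end] = copy;
  hist_end = (hist_end + 1) % HIST_SIZE;
  return 0;
}

/* Return the entry `back' steps before the newest one (0 = newest), or NULL
 * if there is no such entry. */
const char *hist_get(int back)
{
  if (back < 0 || back >= HIST_SIZE)
    return NULL;
  return history[(hist_end - 1 - back + 2 * HIST_SIZE) % HIST_SIZE];
}
```

With HIST_SIZE set to 4, adding a fifth line overwrites the first one, which
is the FIFO behavior described above.
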
    The readline functionality must be able to interface with the lexer
generated by `flex'. In order to do this, the `flex' input macro

    YY_INPUT(buf, result, max_size)

had to be overridden. This is done by defining a macro in the parser.lex
preamble code. The function that this macro uses is defined in src/readterm.c
as

    rsh_readbuf(char *buffer, size_t max)

This function must translate between line reading (which is what we do on the
terminal) and buffer reading which is what `flex' does. Flex will attempt to 
read at most max characters. However, if for whatever reason the user has
typed in more than 'max' characters, rsh_readbuf() must be able to handle this.
rsh_readbuf() must also be able to handle reading from a file; as such it
really just determines what actual function to call based on whether the shell
is interactive or not. For more information regarding this code, peruse the
lex.yy.c (this is the generated lexer) and the parser.lex files.
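
The line-to-chunk translation can be sketched like this. It is not the actual
rsh_readbuf() code: struct line_src and chunk_read() are hypothetical, and the
sketch only covers the interactive case where one complete line is handed to
the lexer in pieces of at most max bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical holder for one line typed at the prompt. */
struct line_src {
  const char *line;  /* the full line, including its trailing newline */
  size_t len;        /* length of line */
  size_t pos;        /* how much has already been handed to the lexer */
};

/* Copy at most max bytes of the pending line into buffer. Returns the number
 * of bytes copied; 0 means the line has been fully consumed. */
size_t chunk_read(struct line_src *src, char *buffer, size_t max)
{
  size_t left, n;

  assert(src != NULL && buffer != NULL && src->pos <= src->len);
  left = src->len - src->pos;
  n = left < max ? left : max;

  memcpy(buffer, src->line + src->pos, n);
  src->pos += n;
  return n;
}

/* In a flex preamble the hook would then look roughly like this (assumed
 * wiring, shown as a comment so the sketch compiles on its own):
 *
 *   #define YY_INPUT(buf, result, max) result = chunk_read(&cur, buf, max)
 */
```

A line longer than max simply takes several calls; flex keeps asking until it
gets a result of 0.
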


3. The Parser

    RSH's parser really sucks. I tried to implement something via yacc but
realized that would not work since I didn't really understand how to write a
proper grammar. As such, instead of implementing my own shift-reduce parser or
the like, I wrote a really simple, bad, and at least mostly functional parser.
Since it is completely ad hoc code, it does not support all the cool things
that a real shell like ZSH supports. But it gets the job done. That code is all
in parser.c.

4. Command Interpreter

    The command interpreter is a fairly simple block of code. It reads through
each passed command and figures out what to do: variable declarations,
commands, background or foreground, builtins, etc. Once it knows what to do, it
asks the exec block of code to actually do it. The command interpreter is in
command.c.

5. Exec()...

    The first thing I want to say is this code was not easy to debug. Nor was
the documentation all that good. The GNU libc documentation (and example shell)
was an invaluable resource. In any event, the basic idea is this: rsh needs to
make sure it is in its own process group, which has to do with signal handling.
When a signal is passed to a child process, rsh does not want to get that
signal as well (unless we are running as a script, but we will get to that in a
bit). In general, rsh really only wants to get a few signals from the children:
SIGCHLD primarily. This allows the wait() syscall and friends to get
notifications about child state changes. This is crucial for the shell.
    RSH keeps track of all child processes via the following data structure:

    struct rsh_process_group {

      /* List related stuff. */
      struct rsh_process **pgroup;
      int max_procs;

      /* The standard I/O streams that point to the controlling terminal. Or a 
       * pipe if configured that way, but these should never not be 0, 1, & 2.*/
      int stdin;
      int stdout;
      int stderr;
  
      /* The pid of the shell. Why not? */
      pid_t pid;

      /* The process group ID for the shell's process group. */
      gid_t pgid;

    };

This struct has a list of processes. These processes are tracked via the 
following data type:

    struct rsh_process {

      /* Process ID and process group ID. */
      pid_t pid;
      gid_t pgid;
      
      /* Process' standard I/O streams. */
      int stdin;
      int stdout;
      int stderr;

      /* Zero if this process is in the foreground. */
      int background;
  
      /* Non-zero if the process is actually running. */
      int running;

      /* The first 128 chars (if there are that many) of the commands actual
       * file name (i.e: /usr/bin/ping). */
      char name[128];

      /* Pipe related stuff. */
      int pipe[2];
      int pipe_used;
      int pipe_lane; /* Which (stdin, stdout, stderr) is being piped. */

      /* This is cosmetic; if set, display a message if this process terminates. */
      int display_exit;
      /* Terminal settings. */
      struct termios term;

      /* These will probably not remain valid forever, but as long as they last
       * up to the fork() call, they will be good for the child process to use.
       */
      char *command;
      char **argv;
      int argc;

    };

Each process that gets spawned gets its own struct rsh_process allocated. This
allows rsh to keep track of background processes. The builtin command `dproc'
displays the list of currently running/paused processes.
    By using the process list, jobs can be controlled fairly easily. Each
spawned process is put into its own process group. This isn't what should
happen: each group of processes making up a job should be given the same
process group, but as of now, that isn't implemented. In any event, the exec.c
code implements typical job control. For example, given a program called run
that just sits in a for (;;); loop:

    [rsh]$ /home/alex/tmp/run
    CTRL-z
    Process (4786) stopped [sig=20].
    [rsh]$ bg
    [rsh]$ dproc
    pid 4767  (running): rsh
    pid 4786  (running): /home/alex/tmp/run
    [rsh]$ killall -SIGSTOP run
    + stopped             /home/alex/tmp/run
    [rsh]$ dproc
    pid 4767  (running): rsh
    pid 4786  (stopped): /home/alex/tmp/run
    [rsh]$ fg

    Process (4786) terminated by signal.
    [rsh]$ dproc
    pid 4767  (running): rsh

In any event, as can be seen, the process run (pid=4786) was started normally,
got a SIGTSTP (Terminal stop) from the CTRL-z, got backgrounded, then got
stopped via a killall -SIGSTOP, was foregrounded again, and finally killed with
a CTRL-c. The dproc command was used to display the contents of the rsh shell's
process list. However, to really see this in action, open up another shell
(even rsh would work for this) and run top. Now you will be able to see the
process use the CPU when it's running.
    Outside of the shell, signals are returned to normal. That is to say,
CTRL-c does not print the shell history if the shell is not in the foreground.
This also allows us to kill a process that we don't like without having to open
another shell.
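
The "each spawned process gets its own process group" step can be sketched as
below. This follows the pattern from the GNU libc job control example rather
than the actual exec.c code: run_in_own_group() is a made-up name, the child
just exits instead of calling exec(), and the terminal handoff (tcsetpgrp())
is left out entirely.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child, move it into a process group of its own, and wait for it.
 * Returns the child's exit code, or -1 on error. *pgid_out receives the
 * process group the child ended up in. */
int run_in_own_group(int exit_code, pid_t *pgid_out)
{
  int gate[2];
  int status;
  pid_t pid;
  char c;

  assert(pgid_out != NULL);
  if (pipe(gate) < 0)
    return -1;

  pid = fork();
  if (pid < 0) {
    close(gate[0]);
    close(gate[1]);
    return -1;
  }
  if (pid == 0) {
    /* Child: both sides call setpgid() so it does not matter who runs
     * first. */
    setpgid(0, 0);
    close(gate[1]);
    if (read(gate[0], &c, 1) < 0)   /* returns at EOF once the parent is done */
      _exit(127);
    _exit(exit_code);
  }

  close(gate[0]);
  setpgid(pid, pid);                /* new group whose ID is the child's pid */
  *pgid_out = getpgid(pid);
  close(gate[1]);                   /* let the child go */

  if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
    return -1;
  return WEXITSTATUS(status);
}
```

Giving a whole job one shared group would mean passing the first child's pid
as the group ID to setpgid() for every later process in the job.
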

6. The Symbol Table and Environment.

    Like all shells, rsh supports the defining of scalar variables. However,
there are some limitations that should be noted. Here is a basic example of a
scalar definition:

    [rsh]$ PROMPT='[rsh]$ '
    
This sets the $PROMPT variable to '[rsh]$ '. $PROMPT is used by rsh to generate
the prompt. In any event, this adds a symbol to the rsh symbol table. This
symbol can be used later on like so:

    [rsh]$ echo $PROMPT
    [rsh]$ 
    [rsh]$ 
    
Again this is just typical shell behavior. However, some more complex examples
will not work the way you expect them to. For instance:

    [rsh]$ echo blah=10
    
    [rsh]$ 

This occurs because variable definitions may occur anywhere in the command 
line. In the ZSH shell however, you would see something like this:

    [11:35AM alex@australia src]$ echo blah=10
    blah=10
    [11:38AM alex@australia src]$

    The symbol table used by rsh also interfaces with the environment list that
libc maintains. If you try to use a variable that is in the environment, the
environment variable will be chosen over the symbol table definition. For
instance:

    [rsh]$ PATH=$PATH:/my/new/path
    [rsh]$ echo $PATH
    /usr/local/bin:/usr/bin:/bin:/home/alex/bin:/usr/local/sbin:/usr/sbin:/sbin
    :/my/new/path
    [rsh]$

This is a rather useful feature, especially since the environment variables
carry over into sub processes:

    [rsh]$ PATH=$PATH:/my/new/path
    [rsh]$ env
    ...
    PATH=/usr/local/bin:/usr/bin:/bin:/home/alex/bin:/usr/local/sbin:/usr/sbin:
    /sbin:/my/new/path
    [rsh]$ 

There are a few limitations to dealing with variables. Don't try to export a
symbol from the symbol table to the environment at the same time you try and
define it.

    [rsh]$ export BLAH="hello world"

This will not work the way you expect it to. Instead:
    
    [rsh]$ BLAH="hello world"
    [rsh]$ export BLAH
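
The lookup order and the two-step export can be sketched as follows. This is a
guess at the behavior from the examples above, not the real symbol table code:
var_set(), var_get(), var_export() and the fixed-size table are all
hypothetical, and it is an assumption that assigning to a name already in the
environment updates the environment directly.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MAX_SYMS 32

struct sym { char *name; char *value; };
static struct sym symtab[MAX_SYMS];
static int nsyms;

static struct sym *sym_find(const char *name)
{
  for (int i = 0; i < nsyms; i++)
    if (strcmp(symtab[i].name, name) == 0)
      return &symtab[i];
  return NULL;
}

/* NAME=value: update the environment if NAME lives there, else the table. */
int var_set(const char *name, const char *value)
{
  struct sym *s;
  char *copy, *ncopy;

  assert(name != NULL && value != NULL);
  if (getenv(name))
    return setenv(name, value, 1);

  copy = strdup(value);
  if (!copy)
    return -1;
  s = sym_find(name);
  if (!s) {
    if (nsyms == MAX_SYMS || !(ncopy = strdup(name))) {
      free(copy);
      return -1;
    }
    s = &symtab[nsyms++];
    s->name = ncopy;
    s->value = NULL;
  }
  free(s->value);
  s->value = copy;
  return 0;
}

/* $NAME: the environment wins over the symbol table. */
const char *var_get(const char *name)
{
  const char *v = getenv(name);
  struct sym *s;

  if (v)
    return v;
  s = sym_find(name);
  return s ? s->value : NULL;
}

/* export NAME: copy an already defined symbol into the environment. */
int var_export(const char *name)
{
  struct sym *s = sym_find(name);
  return s ? setenv(name, s->value, 1) : -1;
}
```

Once a name has been exported, later assignments go straight to the
environment, so child processes see the new value.
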

7. Pipes.

    RSH supports piping output from one process to another process. This is
accomplished via the pipe() system call. pipe() generates a pair of file
descriptors: one for reading, the other for writing. The write descriptor is
duped into the first process's stdout (or stderr) and the read descriptor is
duped into the second process's stdin. When the first process writes, the
second may then read that data. This is mostly implemented in exec.c though the
code in command.c also needs to make sure the pipe is actually created. The
biggest difficulty with pipes is making sure all ends get closed the
appropriate number of times. If the write end of the pipe never gets fully
closed (by rsh, first process and second process) then the pipe never
terminates and the receiving process hangs while waiting for the stream
to EOF. On my computer here is an example demonstrating a 3 stage pipe:

    [rsh]$ ps aux | grep alex | grep zsh
    alex      2038  0.0  0.0 124136  2652 pts/0    Ss   Oct10   0:00 zsh
    alex      2073  0.0  0.0 124124  2452 pts/1    Ss   Oct10   0:00 zsh
    alex      2339  0.0  0.0 124128  3256 pts/7    Ss   16:43   0:00 zsh
    alex      2819  0.0  0.0 124268  3384 pts/8    Ss+  16:50   0:00 zsh
    alex      2905  0.0  0.0 124256  3084 pts/2    Ss+  Oct10   0:00 zsh
    alex      2940  0.0  0.0 124128  2924 pts/3    Ss   Oct10   0:00 zsh
    alex      4815  0.0  0.0 124136  3232 pts/9    Ss+  17:05   0:00 zsh
    alex      6582  0.0  0.0 121880  2872 pts/10   Ss   17:26   0:00 zsh
    alex      8616  0.0  0.0 124132  3280 pts/4    Ss+  Oct10   0:00 zsh
    alex     11339  0.0  0.0 124128  3276 pts/11   Ss   18:15   0:00 zsh
    alex     11687  0.0  0.0 124280  3376 pts/12   Ss+  18:18   0:00 zsh
    alex     13374  0.0  0.0 121880  2872 pts/13   Ss   18:30   0:00 zsh
    alex     18846  0.0  0.0 124152  3352 pts/6    Ss   19:33   0:00 -zsh
    alex     19433  0.0  0.0 123900  3024 pts/14   Ss   19:37   0:00 -zsh
    alex     20413  0.0  0.0 102732   760 pts/14   S+   19:47   0:00 grep zsh
    alex     25264  0.0  0.0 124280  3364 pts/5    Ss   07:32   0:00 zsh
    [rsh]$

Wow, I have a lot of terminals open. In any event, output from `ps aux' is
piped to `grep alex' which finds all processes that I have run. Then, the
result from the first grep is piped to a second grep to find all occurrences of
zsh. This command is simply for illustration purposes; it could be made without
grep pipes, but that would not be as interesting.

This was probably the most complex thing to implement in the entire shell. Job
control was hard, but this was a nightmare.
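
For reference, here is a small two stage sketch of the pipe()/dup2()/close()
dance, including the "close every end everywhere" part that makes EOF work. It
is not the exec.c code: spawn() and run_pipe() are hypothetical helpers, the
commands are run through /bin/sh for brevity, and a second pipe is added only
so the result can be captured.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child running `sh -c cmd' with the given stdin/stdout descriptors.
 * Every descriptor in to_close[0..n) is closed in the child after the dup2()
 * calls so no stray pipe ends survive the exec. */
static pid_t spawn(const char *cmd, int in, int out, const int *to_close, int n)
{
  pid_t pid = fork();

  if (pid != 0)
    return pid;               /* parent, or fork() failure (-1) */

  if (in != STDIN_FILENO)
    dup2(in, STDIN_FILENO);
  if (out != STDOUT_FILENO)
    dup2(out, STDOUT_FILENO);
  for (int i = 0; i < n; i++)
    close(to_close[i]);
  execl("/bin/sh", "sh", "-c", cmd, (char *)NULL);
  _exit(127);
}

/* Run `left | right', collecting right's stdout into out (NUL terminated).
 * Returns the number of bytes captured, or -1 on error. */
long run_pipe(const char *left, const char *right, char *out, size_t outsz)
{
  int fds[4];                 /* fds[0..1]: left->right, fds[2..3]: capture */
  size_t used = 0;
  ssize_t n;
  pid_t a, b;

  assert(left != NULL && right != NULL && out != NULL && outsz > 0);
  if (pipe(fds) < 0)
    return -1;
  if (pipe(fds + 2) < 0) {
    close(fds[0]);
    close(fds[1]);
    return -1;
  }

  a = spawn(left, STDIN_FILENO, fds[1], fds, 4);
  b = spawn(right, fds[0], fds[3], fds, 4);

  /* The shell must drop its own copies, or `right' never sees EOF. */
  close(fds[0]);
  close(fds[1]);
  close(fds[3]);

  while (used < outsz - 1 &&
         (n = read(fds[2], out + used, outsz - 1 - used)) > 0)
    used += (size_t)n;
  out[used] = '\0';
  close(fds[2]);

  if (a > 0)
    waitpid(a, NULL, 0);
  if (b > 0)
    waitpid(b, NULL, 0);
  return (a > 0 && b > 0) ? (long)used : -1;
}
```

Commenting out any one of the three close() calls in the parent is an easy way
to reproduce the hang described above.
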

8. Random other stuff.

    A. Builtins.

        There are several built in functions that you can use. These are:
	
	    cd
	    exit
	    exec
	    history
	    fg
	    bg
	    dproc
	    export

        Most of these work as they would under a normal shell. The exceptions
	are 'exec' and 'dproc'. Exec is not implemented yet at all and, when
	implemented, will not be able to do shell file descriptor manipulation.
	dproc is used to display processes that the shell is aware of and their
	state. This is slightly different functionality than the default
	behavior of `ps'.

	Built in functions are mostly implemented in src/builtin.c. Each built
	in function must match the prototype defined in the builtin struct:
	
	    typedef int (* builtin_func)(int argc, char **argv,
	                                 int in, int out, int err);
	    struct builtin {

	        char *name;
	        builtin_func func;

	    };

        Each built in function is then also placed in the 

	    struct builtin builtins[];

	array. This facilitates looking up built in functions by checking a
	command against the list of built in functions in the builtins[] array.
	This can lead to overhead if a huge number of built in functions were
	to be defined but that doesn't seem like a big enough problem to
	warrant making a hashtable of built ins.
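
The linear lookup can be sketched as below. The typedef and struct are copied
from above; the table contents, the NULL terminator, builtin_noop() and
builtin_lookup() are assumptions made for the example and may not match
src/builtin.c.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef int (*builtin_func)(int argc, char **argv, int in, int out, int err);

struct builtin {
  char *name;
  builtin_func func;
};

/* Placeholder builtin; the real ones live in src/builtin.c. */
static int builtin_noop(int argc, char **argv, int in, int out, int err)
{
  (void)argv; (void)in; (void)out; (void)err;
  return argc;
}

/* Hypothetical table contents, terminated by a NULL name. */
static struct builtin builtins[] = {
  { "cd",   builtin_noop },
  { "exit", builtin_noop },
  { "fg",   builtin_noop },
  { NULL,   NULL },
};

/* Linear scan: returns the handler for name, or NULL if it is not a
 * builtin. */
builtin_func builtin_lookup(const char *name)
{
  assert(name != NULL);
  for (struct builtin *b = builtins; b->name; b++)
    if (strcmp(b->name, name) == 0)
      return b->func;
  return NULL;
}
```

With only a handful of entries the scan costs a few strcmp() calls per
command, which is why a hashtable is not worth the trouble here.
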

    B. PROMPT Variable.

        Like any shell, the prompt is customizable. The prompt for RSH is not
	nearly as customizable as the prompt for a real shell, but I am working
	on making as much customizable as I can. Look at src/prompt.c for more
	information on how to define a prompt. Simple overview: define it like
	you would in bash, but look at src/prompt.c to see what escapes I have
	actually implemented and what the escapes are. I tried to keep them
	close to bash syntax, but I did make some changes.

    C. Solaris Sucks.
    
        Apparently the developers of the Solaris C API decided that it would
	be a good idea to have stdin, stderr, and stdout be defined as macros.
	These macros expand to some array reference which I presume contains
	the value of the FILE data structure that you want to use when doing
	fprintf(). This makes any code like this a pain in the ass:

	    struct rsh_process proc;
	    proc.stdin  = 0;
	    proc.stdout = 1;
	    proc.stderr = 2;

	This leads to compile errors. So yeah, I hate Solaris now.

    D. Initialization Files.

        If you wish, you can make an RC file for RSH. RSH looks for ~/.rshrc
	and sources it. Don't just use your ~/.bashrc file or the like since
	bash syntax will make RSH cry. Primarily things like if clauses and
	for loops and name globbing will just not work and you will be very
	unhappy about the results. On the other hand you can set up changes
	to your path variable or maybe set JAVA_HOME or whatever else you want
	to do. For the most part you won't need anything since almost everything
	of relevance will be handled by the login shell (presumably bash) and
	RSH will just get the results in the environment.

    E. CTRL-c

        CTRL-c when in RSH will display the history. However, child processes
	have their signal handlers reset so that CTRL-c works as expected for
	an errant child.
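
A sketch of the handler reset, assuming the shell installs its handler with
sigaction() and the child restores the default before doing anything else.
child_killed_by() and on_signal() are hypothetical names, and the child raises
the signal on itself instead of waiting for a real CTRL-c.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static volatile sig_atomic_t shell_saw_signal;

/* Shell-side handler; rsh would print its history here. */
static void on_signal(int sig)
{
  (void)sig;
  shell_saw_signal = 1;
}

/* Install the shell's handler for sig, then fork a child that restores the
 * default action and sends itself sig. Returns the signal that killed the
 * child, 0 if it exited normally, or -1 on error. */
int child_killed_by(int sig)
{
  struct sigaction sa;
  int status;
  pid_t pid;

  assert(sig > 0);
  sa.sa_handler = on_signal;
  sigemptyset(&sa.sa_mask);
  sa.sa_flags = SA_RESTART;
  if (sigaction(sig, &sa, NULL) < 0)
    return -1;

  pid = fork();
  if (pid < 0)
    return -1;
  if (pid == 0) {
    signal(sig, SIG_DFL);      /* undo the handler inherited from the shell */
    raise(sig);
    _exit(0);                  /* only reached if sig was not fatal */
  }

  if (waitpid(pid, &status, 0) < 0)
    return -1;
  return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```

Without the signal(sig, SIG_DFL) line the child would run the shell's handler
and then exit normally instead of being terminated.
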

    F. File redirection.

        File redirection works. Well, it works so far as I have tested it.
	
	    echo HELLO WORLD > tmp.txt

	will do as you expect. Likewise with stderr. Redirection does not work
	for higher file descriptors; my lexer is not smart enough at the
	moment to figure that out. It would be implementable, but that would
	warrant a complete reimplementation of the parser, which would be
	really really time consuming. Piping as of now has not been implemented
	though should actually be pretty easy; just needs the code to be
	written.
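
Redirecting stdout or stderr to a file boils down to open() plus dup2() in
the child before the exec. The sketch below is an illustration, not the rsh
implementation: run_redirected() is a made-up helper, it always truncates the
file, and it runs the command through /bin/sh.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run `sh -c cmd' with descriptor target_fd (1 for stdout, 2 for stderr)
 * redirected to path. Returns the command's exit code, or -1 on error. */
int run_redirected(const char *cmd, int target_fd, const char *path)
{
  int status;
  pid_t pid;

  assert(cmd != NULL && path != NULL);
  assert(target_fd == STDOUT_FILENO || target_fd == STDERR_FILENO);

  pid = fork();
  if (pid < 0)
    return -1;
  if (pid == 0) {
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0 || dup2(fd, target_fd) < 0)
      _exit(126);
    if (fd != target_fd)
      close(fd);              /* target_fd now refers to the file */
    execl("/bin/sh", "sh", "-c", cmd, (char *)NULL);
    _exit(127);
  }

  if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
    return -1;
  return WEXITSTATUS(status);
}
```

Calling run_redirected("echo HELLO WORLD", 1, "tmp.txt") has the same effect
as the command line shown above.
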

    G. Signal handling by RSH

        The only 2 signals explicitly handled by RSH are SIGCHLD and SIGINT.
	SIGINT is for the assignment requirements. SIGCHLD is used to clean up
	background processes that terminate. A common example of this is in
	pipes. Only the last process in a pipe is foregrounded, the rest 
	execute in the background. As such those background processes must be
	cleaned up (mostly this is just a deallocation of the process's
	struct rsh_process data structure). SIGCHLD solves this problem since
	it is delivered whenever a child process changes state. When we get a
	SIGCHLD signal, rsh goes through the process list and does a 
	non-blocking waitpid() to see if a given process has terminated, 
	stopped, etc (we skip foreground processes here since those are handled
	by foreground()). If we detect a change we act accordingly.
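
The SIGCHLD plus non-blocking waitpid() pattern can be sketched like this. It
is a simplified stand-in for the real handler: check_child() looks at a single
pid rather than walking the process list, reap_demo() exists only to make the
example self-contained, and the message formats are copied from the sample
output in this section.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static volatile sig_atomic_t got_sigchld;

static void on_sigchld(int sig)
{
  (void)sig;
  got_sigchld = 1;
}

/* Non-blocking check of one child. Returns 1 and fills msg if its state
 * changed, 0 if nothing happened (or on error). */
int check_child(pid_t pid, char *msg, size_t n)
{
  int status;

  assert(msg != NULL && n > 0);
  if (waitpid(pid, &status, WNOHANG | WUNTRACED) != pid)
    return 0;

  if (WIFEXITED(status))
    snprintf(msg, n, "Process (%ld) terminated (%d)",
             (long)pid, WEXITSTATUS(status));
  else if (WIFSIGNALED(status))
    snprintf(msg, n, "Process (%ld) terminated by signal (%d)",
             (long)pid, WTERMSIG(status));
  else if (WIFSTOPPED(status))
    snprintf(msg, n, "Process (%ld) stopped [sig=%d].",
             (long)pid, WSTOPSIG(status));
  else
    msg[0] = '\0';
  return 1;
}

/* Demo: start a background child that exits with `code', sleep until SIGCHLD
 * arrives, then poll it. Returns what check_child() returned. */
int reap_demo(int code, char *msg, size_t n)
{
  struct sigaction sa;
  sigset_t block, old, wait_mask;
  pid_t pid;

  sa.sa_handler = on_sigchld;
  sigemptyset(&sa.sa_mask);
  sa.sa_flags = SA_RESTART;
  if (sigaction(SIGCHLD, &sa, NULL) < 0)
    return 0;

  /* Block SIGCHLD around fork() so the signal cannot slip in unnoticed. */
  sigemptyset(&block);
  sigaddset(&block, SIGCHLD);
  sigprocmask(SIG_BLOCK, &block, &old);
  wait_mask = old;
  sigdelset(&wait_mask, SIGCHLD);
  got_sigchld = 0;

  pid = fork();
  if (pid == 0)
    _exit(code);
  while (pid > 0 && !got_sigchld)
    sigsuspend(&wait_mask);    /* atomically unblock SIGCHLD and wait */
  sigprocmask(SIG_SETMASK, &old, NULL);

  return pid > 0 ? check_child(pid, msg, n) : 0;
}
```

The handler itself only sets a flag; the waitpid() call and the message
formatting happen outside of signal context.
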

	RSH does not display every signal it receives, since this would bog
	down the terminal command line. However, if a process exits abnormally
	then the cause of that exit is displayed. For instance if a process is
	killed by a SIGINT, RSH alerts the user to that.

	    [rsh]$ ./run_forever
	    ^C
	    Process (19049) terminated by signal (2)
	    [rsh]$

	Likewise, if a job is paused, RSH will alert the user to this event.
	
	    [rsh]$ ./run_forever
	    ^Z
	    Process (19397) stopped [sig=20].
	    [rsh]$
	
	Also for background processes, a message will be displayed:

	    [rsh]$ sleep 5 &
	    [rsh]$
	    Process (19556) terminated (0)
	    [rsh]$
	
	Here, the sleep program just sleeps for 5 seconds in the background. 
	When done, it terminates, RSH receives a SIGCHLD, searches through the
	process list, finds out that `sleep' has terminated, and displays a
	message accordingly. It recreates any buffer that is being edited on
	the command line to make sure input stays coherent.