imnotlistening/rsh
RSH

1. Overview

RSH is meant to be as close to bash syntax as possible. However, since I am human, attend university, and am not a CS major, there is only so much I could do. RSH consists of several main parts that work together to create what someone might call a 'shell'. The parts are as follows: a terminal interface, a lexer (generated by `flex'), a parser, a command interpreter, and finally an exec() syscall interface. Each of these sections contributes to the flow of command -> action. There are also some other nice features that will be discussed, such as the symbol table and the environment table. Piping and IO redirection are also included in the shell, though they are more limited than what a real shell would be capable of, mainly due to the parser implementation.

2. The Terminal Interface.

In order to have some of the most basic of shell behaviors, a terminal interface is required. In general, a programmer would not reinvent the wheel and would instead use a library like GNU Readline, which supports history, sophisticated command line editing, basic tab completion, etc. However, as wonderful as GNU Readline is, it would not fulfill the project requirements, so instead I wrote my own simplified version of it. The results are in the source file src/readterm.c.

Essentially, one must read each byte of the input stream and process said byte. If an escape character is detected, then a special handler for the following escape sequence must be called. This is implemented in the function _rsh_handle_escape_seq(), which deals with all understood escape sequences. It has sub handlers for particular escape sequences such as the up arrow key or the DEL key. These sequences are handled by: _rsh_do_history_completion(), _rsh_do_move_cursor(), _rsh_do_delete(). As of now the following non-printing characters or sequences can be processed: DEL, the arrow keys, backspace (ASCII 0x7f). Terminal history is handled in this section of the code as well.
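To make the byte-by-byte idea concrete, here is a minimal sketch of decoding the keys listed above. This is not the code in src/readterm.c: the names decode_key() and enum key are made up for illustration, and the sketch assumes the common ANSI/VT-style sequences (ESC [ A/B/C/D for the arrow keys, ESC [ 3 ~ for DEL, and 0x7f for backspace), which is what most terminal emulators send.

```c
#include <stddef.h>

/* Hypothetical key codes; rsh's real handlers are named differently. */
enum key {
    KEY_CHAR,        /* an ordinary byte, consumed as-is */
    KEY_BACKSPACE,   /* 0x7f */
    KEY_UP, KEY_DOWN, KEY_RIGHT, KEY_LEFT,
    KEY_DELETE,      /* ESC [ 3 ~ */
    KEY_INCOMPLETE,  /* need more bytes before deciding */
    KEY_UNKNOWN      /* an escape sequence we do not understand */
};

/* Decode one key from the front of buf. *consumed is set to the number of
 * bytes that belong to the key (0 when more input is needed). */
enum key decode_key(const unsigned char *buf, size_t len, size_t *consumed)
{
    *consumed = 0;
    if (len == 0)
        return KEY_INCOMPLETE;
    if (buf[0] == 0x7f) {
        *consumed = 1;
        return KEY_BACKSPACE;
    }
    if (buf[0] != 0x1b) {
        *consumed = 1;
        return KEY_CHAR;
    }

    /* We saw ESC; the sequences handled here all continue with '['. */
    if (len < 2)
        return KEY_INCOMPLETE;
    if (buf[1] != '[') {
        *consumed = 1;           /* drop the ESC, leave the rest alone */
        return KEY_UNKNOWN;
    }
    if (len < 3)
        return KEY_INCOMPLETE;

    switch (buf[2]) {
    case 'A': *consumed = 3; return KEY_UP;
    case 'B': *consumed = 3; return KEY_DOWN;
    case 'C': *consumed = 3; return KEY_RIGHT;
    case 'D': *consumed = 3; return KEY_LEFT;
    case '3':
        if (len < 4)
            return KEY_INCOMPLETE;
        *consumed = 4;
        return buf[3] == '~' ? KEY_DELETE : KEY_UNKNOWN;
    default:
        *consumed = 3;
        return KEY_UNKNOWN;
    }
}
```

A caller would read bytes from the terminal into a small buffer, call decode_key() in a loop, and dispatch on the result (move the cursor, delete a character, walk the history, and so on).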
All of the incoming data is stored into buffers. These buffers are implemented via struct rsh_buff:

    struct rsh_buff {

      /* The underlying buffer itself. */
      char *buf;

      /* The offset to write the next character to. */
      int offset;

      /* The number of characters in the buffer. */
      int len;

      /* The size of the buffer. */
      int size;

      /* Does this buffer have stuff? */
      int used;

    };

This data type is used to keep track of what has been typed, which allows us to redisplay the terminal data with a cursor location so editing can be performed in arbitrary places in the data (which also means supporting the left/right arrow keys, DEL, and backspace).

The history itself is a circular buffer of just the text in a buffer. The buffer itself can be regenerated from the history so that a history entry can be edited once placed on the command line via the up/down arrow keys. However, the history itself is not changed, just the temporary buffer holding a copy of the history item. History is implemented as a circular buffer of pointers to char:

    char *history[_HIST_BUFFER_SIZE];

This allows extremely efficient usage of memory since there is only 1 pointer worth of overhead: the pointer that delimits the end of the table. If the history is full, then the end pointer is simply incremented and the oldest entry in the history is replaced. In that regard the history is like a FIFO queue. The strings themselves are stored on the heap via strdup(), which allocates them with malloc().

The readline functionality must be able to interface with the lexer generated by `flex'. In order to do this, the `flex' input macro YY_INPUT(buf, result, max_size) had to be overridden. This is done by defining a macro in the parser.lex preamble code. The function that this macro uses is defined in src/readterm.c as

    rsh_readbuf(char *buffer, size_t max)

This function must translate between line reading (which is what we do on the terminal) and buffer reading (which is what `flex' does). Flex will attempt to read at most max characters.
However, if for whatever reason the user has typed in more than 'max' characters, rsh_readbuf() must be able to handle this. rsh_readbuf() must also be able to handle reading from a file; as such, rsh_readbuf() really just determines what actual function to call based on whether the shell is interactive or not. For more information regarding this code, peruse the lex.yy.c (this is the generated lexer) and the parser.lex files.

3. The Parser

RSH's parser really sucks. I tried to implement something via yacc but realized that would not work since I didn't really understand how to write a proper grammar. As such, instead of implementing my own shift-reduce parser or the like, I wrote a really simple, bad, and at least mostly functional parser. Since it is completely ad hoc code, it does not support all the cool things that a real shell like ZSH supports. But it gets the job done. That code is all in parser.c.

4. Command Interpreter

The command interpreter is a fairly simple block of code. It reads through each passed command and figures out what to do: variable declarations, commands, background or foreground, builtins, etc. Once it knows what to do, it asks the exec block of code to actually do it. The command interpreter is in command.c.

5. Exec()...

The first thing I want to say is this code was not easy to debug. Nor was the documentation all that good. The GNU libc documentation (and example shell) was an invaluable resource. In any event, the basic idea is this: rsh needs to make sure it is in its own process group, which has to do with signal handling. When a signal is passed to a child process, rsh does not want to get that signal as well (unless we are running as a script, but we will get to that in a bit). In general, rsh really only wants to get a few signals from the children: SIGCHLD primarily. This allows the wait() syscall and friends to get notifications about child state changes. This is crucial for the shell.
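As a rough illustration of how process groups keep terminal signals away from the shell, here is a minimal sketch that runs a command in its own process group and waits for it. It is not the code in exec.c; the function name spawn_in_own_group() and the choice of return values are assumptions made for this example. A real interactive shell would additionally hand the terminal to the child's group with tcsetpgrp(), as the GNU libc manual's example shell does, which this sketch leaves out.

```c
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run argv[0] in a fresh process group and wait for it.
 * Returns the exit status, 128 + signal number if the child was killed by a
 * signal, 127 if the exec failed, or -1 on fork/wait errors. */
int spawn_in_own_group(char *const argv[])
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child: become the leader of a new process group, so signals sent
         * to this group are not delivered to the shell's group. */
        setpgid(0, 0);

        /* Restore default handling for the job control signals. */
        signal(SIGINT, SIG_DFL);
        signal(SIGQUIT, SIG_DFL);
        signal(SIGTSTP, SIG_DFL);

        execvp(argv[0], argv);
        _exit(127);              /* only reached if exec failed */
    }

    /* Parent: set the group here too, so it is in place no matter which
     * process runs first. Failure is ignored on purpose, since the child
     * may already have called exec. */
    setpgid(pid, pid);

    int status;
    while (waitpid(pid, &status, 0) < 0) {
        if (errno != EINTR)
            return -1;
    }

    if (WIFEXITED(status))
        return WEXITSTATUS(status);
    if (WIFSIGNALED(status))
        return 128 + WTERMSIG(status);
    return -1;
}
```

Calling setpgid() in both the parent and the child is the usual way to avoid a race between the two; whichever call happens second is harmless.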
RSH keeps track of all child processes via the following data structure:

    struct rsh_process_group {

      /* List related stuff. */
      struct rsh_process **pgroup;
      int max_procs;

      /* The standard I/O streams that point to the controlling terminal. Or a
       * pipe if configured that way, but these should never not be 0, 1, & 2.*/
      int stdin;
      int stdout;
      int stderr;

      /* The pid of the shell. Why not? */
      pid_t pid;

      /* The process group ID for the shell's process group. */
      gid_t pgid;

    };

This struct has a list of processes. These processes are tracked via the following data type:

    struct rsh_process {

      /* Process ID and process group ID. */
      pid_t pid;
      gid_t pgid;

      /* Process' standard I/O streams. */
      int stdin;
      int stdout;
      int stderr;

      /* Zero if this process is in the foreground. */
      int background;

      /* Non-zero if the process is actually running. */
      int running;

      /* The first 128 chars (if there are that many) of the commands actual
       * file name (i.e: /usr/bin/ping). */
      char name[128];

      /* Pipe related stuff. */
      int pipe[2];
      int pipe_used;
      int pipe_lane; /* Which (stdin, stdout, stderr) is being piped. */

      /* This is cosmetic; if set, display a message if this process terminates. */
      int display_exit;

      /* Terminal settings. */
      struct termios term;

      /* These will probably not remain valid forever, but as long as they last
       * up to the fork() call, they will be good for the child process to use. */
      char *command;
      char **argv;
      int argc;

    };

Each process that gets spawned gets its own struct rsh_process allocated. This allows rsh to keep track of background processes. The builtin command `dproc' displays the list of currently running/paused processes. By using the process list, jobs can be controlled fairly easily. Each spawned process is put into its own process group. This isn't what should happen: each group of processes making up a job should be given the same process group, but as of now, that isn't implemented. In any event, the exec.c code implements typical job control.
For example, given a program called run that just sits in a for (;;); loop:

    [rsh]$ /home/alex/tmp/run
    CTRL-z
    Process (4786) stopped [sig=20].
    [rsh]$ bg
    [rsh]$ dproc
    pid 4767 (running): rsh
    pid 4786 (running): /home/alex/tmp/run
    [rsh]$ killall -SIGSTOP run
    + stopped /home/alex/tmp/run
    [rsh]$ dproc
    pid 4767 (running): rsh
    pid 4786 (stopped): /home/alex/tmp/run
    [rsh]$ fg
    Process (4786) terminated by signal.
    [rsh]$ dproc
    pid 4767 (running): rsh

As can be seen, the process run (pid=4786) was started normally, got a SIGTSTP (terminal stop) from the CTRL-z, got backgrounded, then got stopped via a killall -SIGSTOP, was foregrounded again, and was finally killed with a CTRL-c. The dproc command was used to display the contents of the rsh shell's process list. However, to really see this in action, open up another shell (even rsh would work for this) and run top. Now you will be able to see the process use the CPU when it's running. Outside of the shell itself, signals are returned to normal; that is to say, CTRL-c does not print the shell history if the shell is not in the foreground. This also allows us to kill a process that we don't like without having to open another shell.

6. The Symbol Table and Environment.

Like all shells, rsh supports the defining of scalar variables. However, there are some limitations that should be noted. Here is a basic example of a scalar definition:

    [rsh]$ PROMPT='[rsh]$ '

This sets the $PROMPT variable to '[rsh]$ '. $PROMPT is used by rsh to generate the prompt. In any event, this adds a symbol to the rsh symbol table. This symbol can be used later on like so:

    [rsh]$ echo $PROMPT
    [rsh]$
    [rsh]$

Again, this is just typical shell behavior. However, some more complex examples will not work the way you expect them to. For instance:

    [rsh]$ echo blah=10
    [rsh]$

This occurs because variable definitions may occur anywhere in the command line, so blah=10 is treated as a definition rather than as an argument to echo.
In the ZSH shell, however, you would see something like this:

    [11:35AM alex@australia src]$ echo blah=10
    blah=10
    [11:38AM alex@australia src]$

The symbol table used by rsh also interfaces with the environment list that libc maintains. If you try to use a variable that is in the environment, the environment variable will be chosen over the symbol table definition. For instance:

    [rsh]$ PATH=$PATH:/my/new/path
    [rsh]$ echo $PATH
    /usr/local/bin:/usr/bin:/bin:/home/alex/bin:/usr/local/sbin:/usr/sbin:/sbin:/my/new/path
    [rsh]$

This is a rather useful feature, especially since the environment variables carry over into sub processes:

    [rsh]$ PATH=$PATH:/my/new/path
    [rsh]$ env
    ...
    PATH=/usr/local/bin:/usr/bin:/bin:/home/alex/bin:/usr/local/sbin:/usr/sbin:/sbin:/my/new/path
    [rsh]$

There are a few limitations to dealing with variables. Don't try to export a symbol from the symbol table to the environment at the same time you define it:

    [rsh]$ export BLAH="hello world"

This will not work the way you expect it to. Instead, do:

    [rsh]$ BLAH="hello world"
    [rsh]$ export BLAH

7. Pipes.

RSH supports piping output from one process to another process. This is accomplished via the pipe() system call. pipe() generates a pair of file descriptors: one for reading, the other for writing. The write descriptor is duped onto the first process's stdout (or stderr) and the read descriptor is duped onto the second process's stdin. When the first process writes, the second may then read that data. This is mostly implemented in exec.c, though the code in command.c also needs to make sure the pipe is actually created. The biggest difficulty with pipes is making sure all ends get closed the appropriate number of times. If the write end of the pipe never gets fully closed (by rsh, the first process, and the second process), then the pipe never reports EOF and the receiving process hangs while waiting for the stream to end.
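The close-every-end rule is easier to see in code. Below is a minimal sketch of a two-stage pipe using pipe(), fork(), dup2(), and execvp(). It is not the implementation in exec.c or command.c; the function name run_pipeline2() is made up for this example, and it only connects stdout of the left command to stdin of the right command.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run "left | right" and return the exit status of the
 * right-hand command, or -1 on error or abnormal termination. */
int run_pipeline2(char *const left[], char *const right[])
{
    int fds[2];                  /* fds[0] is the read end, fds[1] the write end */
    if (pipe(fds) < 0)
        return -1;

    pid_t lpid = fork();
    if (lpid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (lpid == 0) {
        /* Left child: stdout becomes the write end of the pipe. */
        dup2(fds[1], STDOUT_FILENO);
        close(fds[0]);
        close(fds[1]);
        execvp(left[0], left);
        _exit(127);
    }

    pid_t rpid = fork();
    if (rpid < 0) {
        close(fds[0]);
        close(fds[1]);
        waitpid(lpid, NULL, 0);
        return -1;
    }
    if (rpid == 0) {
        /* Right child: stdin becomes the read end of the pipe. */
        dup2(fds[0], STDIN_FILENO);
        close(fds[0]);
        close(fds[1]);
        execvp(right[0], right);
        _exit(127);
    }

    /* Parent: close both ends. If the parent kept the write end open, the
     * right-hand command would never see EOF and would hang. */
    close(fds[0]);
    close(fds[1]);

    int status = 0;
    waitpid(lpid, NULL, 0);
    if (waitpid(rpid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Note that each of the three processes closes its own copies of both original descriptors after the dup2() calls; only the duplicated stdout and stdin remain open in the children.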
On my computer, here is an example demonstrating a 3 stage pipe:

    [rsh]$ ps aux | grep alex | grep zsh
    alex      2038  0.0  0.0 124136  2652 pts/0    Ss   Oct10   0:00 zsh
    alex      2073  0.0  0.0 124124  2452 pts/1    Ss   Oct10   0:00 zsh
    alex      2339  0.0  0.0 124128  3256 pts/7    Ss   16:43   0:00 zsh
    alex      2819  0.0  0.0 124268  3384 pts/8    Ss+  16:50   0:00 zsh
    alex      2905  0.0  0.0 124256  3084 pts/2    Ss+  Oct10   0:00 zsh
    alex      2940  0.0  0.0 124128  2924 pts/3    Ss   Oct10   0:00 zsh
    alex      4815  0.0  0.0 124136  3232 pts/9    Ss+  17:05   0:00 zsh
    alex      6582  0.0  0.0 121880  2872 pts/10   Ss   17:26   0:00 zsh
    alex      8616  0.0  0.0 124132  3280 pts/4    Ss+  Oct10   0:00 zsh
    alex     11339  0.0  0.0 124128  3276 pts/11   Ss   18:15   0:00 zsh
    alex     11687  0.0  0.0 124280  3376 pts/12   Ss+  18:18   0:00 zsh
    alex     13374  0.0  0.0 121880  2872 pts/13   Ss   18:30   0:00 zsh
    alex     18846  0.0  0.0 124152  3352 pts/6    Ss   19:33   0:00 -zsh
    alex     19433  0.0  0.0 123900  3024 pts/14   Ss   19:37   0:00 -zsh
    alex     20413  0.0  0.0 102732   760 pts/14   S+   19:47   0:00 grep zsh
    alex     25264  0.0  0.0 124280  3364 pts/5    Ss   07:32   0:00 zsh
    [rsh]$

Wow, I have a lot of terminals open. In any event, output from `ps aux' is piped to `grep alex', which finds all processes that I have run. Then, the result from the first grep is piped to a second grep to find all occurrences of zsh. This command is simply for illustration purposes; it could be written without the grep pipes, but that would not be as interesting. This was probably the most complex thing to implement in the entire shell. Job control was hard, but this was a nightmare.

8. Random other stuff.

A. Builtins.

There are several built in functions that you can use. These are:

    cd
    exit
    exec
    history
    fg
    bg
    dproc
    export

Most of these work as they would under a normal shell. The exceptions are 'exec' and 'dproc'. Exec is not implemented yet at all and, when implemented, will not be able to do shell file descriptor manipulation. dproc is used to display processes that the shell is aware of and their state. This is slightly different functionality than the default behavior of `ps'.
Built in functions are mostly implemented in src/builtin.c. Each built in function must match the prototype defined alongside the builtin struct:

    typedef int (* builtin_func)(int argc, char **argv, int in, int out, int err);

    struct builtin {

      char *name;
      builtin_func func;

    };

Each built in function is then also placed in the struct builtin builtins[]; array. This facilitates looking up built in functions by checking a command against the list of built in functions in the builtins[] array. This could lead to overhead if a huge number of built in functions were to be defined, but that doesn't seem like a big enough problem to warrant making a hashtable of built ins.

B. PROMPT Variable.

Like any shell, the prompt is customizable. The prompt for RSH is not nearly as customizable as the prompt for a real shell, but I am working on supporting as much as I can. Look at src/prompt.c for more information on how to define a prompt. Simple overview: define it like you would in bash, but look at src/prompt.c to see what escapes I have actually implemented and what those escapes are. I tried to keep them close to bash syntax, but I did make some changes.

C. Solaris Sucks.

Apparently the developers of the Solaris C API decided that it would be a good idea to have stdin, stderr, and stdout be defined as macros. These macros expand to some array reference which I presume contains the value of the FILE data structure that you want to use when doing fprintf(). This makes any code like this a pain in the ass:

    struct rsh_process proc;
    proc.stdin = 0;
    proc.stdout = 1;
    proc.stderr = 2;

This leads to compile errors. So yeah, I hate Solaris now.

D. Initialization Files.

If you wish, you can make an RC file for RSH. RSH looks for ~/.rshrc and sources it. Don't just use your ~/.bashrc file or the like, since bash syntax will make RSH cry. Primarily, things like if clauses, for loops, and name globbing will just not work, and you will be very unhappy about the results.
On the other hand, you can set up changes to your path variable or maybe set JAVA_HOME or whatever else you want to do. For the most part you won't need anything, since almost everything of relevance will be handled by the login shell (presumably bash) and RSH will just get the results in the environment.

E. CTRL-c

CTRL-c when in RSH will display the history. However, child processes have their signal handlers reset so that CTRL-c works as expected for an errant child.

F. File redirection.

File redirection works. Well, it works so far as I have tested it.

    echo HELLO WORLD > tmp.txt

will do as you expect. Likewise with stderr. Redirection does not work for higher file descriptors; my lexer is not smart enough at the moment to figure that out. It would be implementable, but that would warrant a complete reimplementation of the parser, which would be really, really time consuming. Piping as of now has not been implemented, though it should actually be pretty easy; it just needs the code to be written.

G. Signal handling by RSH

The only 2 signals explicitly handled by RSH are SIGCHLD and SIGINT. SIGINT is for the assignment requirements. SIGCHLD is used to clean up background processes that terminate. A common example of this is in pipes. Only the last process in a pipe is foregrounded; the rest execute in the background. As such, those background processes must be cleaned up (mostly this is just a deallocation of the process's struct rsh_process data structure). SIGCHLD solves this problem since it is delivered whenever a child process changes state. When we get a SIGCHLD signal, rsh goes through the process list and does a non-blocking waitpid() to see if a given process has terminated, stopped, etc. (we skip foreground processes here since those are handled by foreground()). If we detect a change, we act accordingly. RSH does not display every signal it receives, since this would bog down the terminal command line.
However, if a process exits abnormally, then the cause of that exit is displayed. For instance, if a process is killed by a SIGINT, RSH alerts the user to that:

    [rsh]$ ./run_forever
    ^C
    Process (19049) terminated by signal (2)
    [rsh]$

Likewise, if a job is paused, RSH will alert the user to this event:

    [rsh]$ ./run_forever
    ^Z
    Process (19397) stopped [sig=20].
    [rsh]$

Also, for background processes, a message will be displayed:

    [rsh]$ sleep 5 &
    [rsh]$
    Process (19556) terminated (0)
    [rsh]$

Here, the sleep program just sleeps for 5 seconds in the background. When it terminates, RSH receives a SIGCHLD, searches through the process list, finds out that `sleep' has terminated, and displays a message accordingly. It then recreates any buffer that is being edited on the command line to make sure input stays coherent.
About
Bash clone for RIT project