Tutorials - Parser

To implement the assembler, you may use the prepared classes available in the e-classroom.

Our goal is to write a program that loads source code written in SIC/XE assembly, transforms it into an internal representation (an abstract syntax tree, AST), and finally prints the generated structure. The end result will be a so-called "pretty print" program.

Code representation

a) There are several kinds of commands (e.g., instructions, directives, comments, etc.). For each kind, write a class that inherits from the Node base class. You will need at least the following classes:

  • Comment - comments;
  • InstructionF1 - instructions of format 1;
  • InstructionF2 - instructions of format 2;
  • InstructionF3 - instructions of format 3;
  • InstructionF4 - instructions of format 4;
  • Directive - directives such as START, END, ...;
  • Storage - memory directives such as BYTE, WORD, RESB, RESW.

For each command class, think about which data (operands etc.) it needs to store.
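
As an illustration, a small part of such a hierarchy might look like the sketch below. The field names (label, mnemonic, r1, r2, value), the name() and operands() helpers, and the column layout used by __str__ are assumptions made for this example; they are not taken from the prepared classes.

```python
class Node:
    """Base class for one command of the source program."""

    def __init__(self, label=""):
        self.label = label

    def name(self):
        """Mnemonic text for printing; subclasses override this."""
        return ""

    def operands(self):
        """Operand text for printing; subclasses override this."""
        return ""

    def __str__(self):
        # Label and mnemonic are padded to 8 columns each (arbitrary choice).
        return f"{self.label:<8}{self.name():<8}{self.operands()}".rstrip()


class Comment(Node):
    def __init__(self, text):
        super().__init__()
        self.text = text

    def __str__(self):
        # SIC/XE comment lines start with a period.
        return f". {self.text}"


class InstructionF2(Node):
    """Format 2 instruction: the data needed are two register operands."""

    def __init__(self, mnemonic, r1, r2, label=""):
        super().__init__(label)
        self.mnemonic = mnemonic
        self.r1 = r1
        self.r2 = r2

    def name(self):
        return self.mnemonic

    def operands(self):
        return f"{self.r1},{self.r2}"


class Storage(Node):
    """Memory directive: the data needed is a single value."""

    def __init__(self, mnemonic, value, label=""):
        super().__init__(label)
        self.mnemonic = mnemonic
        self.value = value

    def name(self):
        return self.mnemonic

    def operands(self):
        return str(self.value)


print(InstructionF2("ADDR", "S", "A", label="LOOP"))
print(Storage("RESW", 1, label="BUF"))
print(Comment("end of example"))
```

Keeping the printing logic in the base class means each subclass only has to say what its mnemonic and operands are.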

b) You will also need a class Code, which represents the whole assembly program. It should store the name of the program, the list of commands, the location counter etc.
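
A minimal sketch of such a class follows. The attribute names (name, nodes, locctr) and the methods append() and pretty_print() are hypothetical, and plain strings stand in for Node objects so that the example is self-contained.

```python
class Code:
    """Container for a whole assembly program."""

    def __init__(self, name=""):
        self.name = name    # program name, e.g. the label of START
        self.nodes = []     # commands in source order
        self.locctr = 0     # location counter, needed by later exercises

    def append(self, node):
        self.nodes.append(node)

    def pretty_print(self):
        # Relies on every stored object having a sensible str() form.
        return "\n".join(str(node) for node in self.nodes)


code = Code("DEMO")
code.append("DEMO    START   0")   # strings stand in for Node objects here
code.append("        RSUB")
code.append("        END     DEMO")
print(code.pretty_print())
```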

In this exercise, also think about how you will later (in the next exercises) extend the class with methods for resolving symbols, generating code etc.

Representing mnemonics

A mnemonic is a symbolic name for a command. There are many kinds of mnemonics, depending on the format, operands etc. We will again use an abstract class, Mnemonic, which holds the mnemonic name, operation code (opcode), description etc.

An important part will also be the parse() method, which reads and parses any operands following the mnemonic. It then returns a representation of the command as an object whose type is a subclass of Node (defined in the previous exercise).
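
The idea can be sketched as follows. The parameter of parse() (a list of operand tokens) is an assumption made for this example, since the signature used by the prepared classes is not shown here, and InstructionF1 is only a stand-in for the Node subclass of the same name.

```python
from abc import ABC, abstractmethod


class InstructionF1:
    """Stand-in for the Node subclass of the same name."""

    def __init__(self, mnemonic):
        self.mnemonic = mnemonic


class Mnemonic(ABC):
    def __init__(self, name, opcode, description=""):
        self.name = name
        self.opcode = opcode
        self.description = description

    @abstractmethod
    def parse(self, operands):
        """Parse the operand tokens and return a node for this command."""


class MnemonicF1(Mnemonic):
    """Format 1 instructions take no operands."""

    def parse(self, operands):
        if operands:
            raise SyntaxError(f"{self.name} takes no operands")
        return InstructionF1(self)


fix = MnemonicF1("FIX", 0xC4, "convert the value in F to an integer in A")
node = fix.parse([])
print(type(node).__name__, node.mnemonic.name, hex(node.mnemonic.opcode))
```

Because parse() is abstract, the base class cannot be instantiated directly, which forces every mnemonic kind to state how its operands are read.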

For each kind of mnemonic and operand we will write a separate class. For example:

  • MnemonicD - directive without operands (NOBASE, LTORG);
  • MnemonicDn - directive with one numeric operand, which can also be a symbol (START, END, ...);
  • MnemonicF1 - instructions of format 1 (without operands) (FIX, FLOAT, ...);
  • MnemonicF2n - instructions of format 2 with one numeric operand (SVC);
  • MnemonicF2r - instructions of format 2 with one register operand (CLEAR, TIXR);
  • MnemonicF2rn - instructions of format 2 with one register and one numeric operand (SHIFTL, SHIFTR);
  • MnemonicF2rr - instructions of format 2 with two register operands (ADDR, ...);
  • MnemonicF3 - instructions of format 3 without operands (RSUB);
  • MnemonicF3m - instructions of format 3 with one operand (LDA, ...);
  • MnemonicF4m - instructions of format 4 with one operand (+LDA, ...);
  • MnemonicSd - memory directive with data operand (BYTE, WORD);
  • MnemonicSn - memory directive with numeric operand (RESB, RESW).

For each of these mnemonic kinds you have to override the parse() method.
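
As a rough sketch of two such overrides, the example below parses register operands for format 2 instructions. It assumes that parse() receives the operand text as a string and returns a plain tuple in place of an InstructionF2 node; both choices are made only to keep the example short. The register numbers are the standard SIC/XE ones.

```python
# Standard SIC/XE register numbers.
REGISTERS = {"A": 0, "X": 1, "L": 2, "B": 3, "S": 4, "T": 5,
             "F": 6, "PC": 8, "SW": 9}


class Mnemonic:
    def __init__(self, name, opcode):
        self.name = name
        self.opcode = opcode

    def parse(self, operands):
        raise NotImplementedError


def register(text):
    """Translate a register name into its number."""
    key = text.strip().upper()
    if key not in REGISTERS:
        raise SyntaxError(f"unknown register: {text!r}")
    return REGISTERS[key]


class MnemonicF2r(Mnemonic):
    """Format 2 with one register operand, e.g. CLEAR X."""

    def parse(self, operands):
        return ("F2", self.opcode, register(operands), 0)


class MnemonicF2rr(Mnemonic):
    """Format 2 with two register operands, e.g. ADDR S,A."""

    def parse(self, operands):
        parts = operands.split(",")
        if len(parts) != 2:
            raise SyntaxError(f"{self.name} expects two registers")
        return ("F2", self.opcode, register(parts[0]), register(parts[1]))


print(MnemonicF2rr("ADDR", 0x90).parse("S,A"))
print(MnemonicF2r("CLEAR", 0xB4).parse("X"))
```

The shared register() helper keeps the individual parse() methods down to one or two lines each.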

Syntax analysis (parsing)

For syntax analysis you may use the Lexer and Parser classes provided at the tutorials. Study both of them before you start implementing. Your main task is to write the parsing of operands (see the previous exercises).
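
To make the task more concrete, the sketch below shows one possible way to split a source line into label, mnemonic and operand fields before the operands are handed to a parse() method. It does not use the provided Lexer and Parser classes, whose interface is not described here. The function split_line() is hypothetical, and it assumes that a label, if present, starts in the first column and that comment lines start with a period. Comments that follow the operands on the same line are not handled.

```python
def split_line(line):
    """Return (label, mnemonic, operands), or None for blank and comment lines."""
    text = line.rstrip()
    if not text.strip() or text.lstrip().startswith("."):
        return None
    # Assumption: a label is present only if the line does not start
    # with whitespace.
    has_label = not text[0].isspace()
    fields = text.split(None, 2 if has_label else 1)
    label = fields.pop(0) if has_label else ""
    mnemonic = fields.pop(0) if fields else ""
    operands = fields[0] if fields else ""
    return label, mnemonic, operands


source = [
    ". demo program",
    "DEMO    START   0",
    "LOOP    ADDR    S,A",
    "        +LDA    #5",
    "        RSUB",
]
for line in source:
    print(split_line(line))
```

In a full parser, a comment line would produce a Comment node instead of None, and the mnemonic field would be looked up in a table of Mnemonic objects whose parse() method then consumes the operands.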

Last modified: Saturday, 1 December 2018, 4:09 PM