Copyright (C) 2001 Michael Leonhard

Mike Leonhard
mike at tamale dot net
http://tamale.net/

ascorbic version 0.1

Ascorbic is a compiler for the Vitamin C language. Vitamin C is a simple
programming language. The Vitamin C language is defined in vitaminc.txt.

I needed a high-level programming language to prove Sebae. I decided to
write a compiler for a simple language, so I tried to get the ANSI Minimal
BASIC specification. Unfortunately, ANSI charges money to download their
documents. Being a poor student, I decided to make up my own simple
programming language. Originally I called this language QuickScript. This
is a cheesy name, and I soon chose to name the language after something
that is very important to me, vitamin C. Please note that Vitamin C does
not attempt to emulate the C language.

ascorbic compiles .asc files into Sebae assembly files. You can find more
about Sebae at Mike_L's website, http://tamale.net/

Decimal integers: 123, 088172
Hexadecimal integers: 0x1234, 0xDEADBEEF
Octal integers: unsupported

-Mike_L

TODO
    if statement + block
    consider removing ; block terminator
    sub parameter
    sub return value
    definition lists
    multiple parameters
    multiple return values

CHANGELOG

11-10-01
    removed `expected semicolonsymbol' error from Parse_Block
    added support for negative numbers: -1, 5 + -5, -(1 + 2), - 0x1234 =)
    implemented nested comments, /* comment /*nested*/ */

11-07-01
    completed Assembly_Shift, see shiftest.c
    folded Assembly_Shift into Assembly_Generic
    completed support for lsh and rsh

11-06-01
    changed Assembly_Add to Assembly_Generic
    added support for subtraction
    added support for or, xor, gt, lt, equal, multiply, divide
    created Assembly_Shift

11-04-01
    added duplicate symbol name check to Assembly_AddSymbol
    added support for hexadecimal integers (0xFFFFFFF) to Tokenize_Integer
    added support for octal integers (037777777777) to Tokenize_Integer
    commented out octal integer support in Tokenize_Integer
    created Tokenize_Comment, supports // comments

10-17-01
    created Assembly_PurgeLocalScope, Assembly_Value, Assembly_VariablePlace
    created Assembly_Variable, Assembly_Add, Assembly_Assign
    rewrote Assembly_Integer
    created Assembly_SetupSymbolTable, Assembly_EntryExit

10-16-01
    created Assembly_FindSymbol
    created entry and exit defns
    created File_OpenOutput, File_CloseOutput
    changed code to use fprintf() instead of printf()
    Ascorbic produces first working assembly code!!! =)

10-15-01
    created Assembly_WalkForBlocks, Assembly_AddSymbol
    overhauled Assembly_Write, Assembly_Sub
    renamed *definition to *type

10-07-01
    created Ascorbic_ErrorParticle

10-05-01
    Added optional parameter definition to sub, updated grammar
    created Assembly_Write

09-29-01
    fixed stack corruption bug, particle.c: char milk[32]; milk[32] = 0;
    Thanks to Zhivago@OPN for finding this bug
    ascorbic uses 28MB of RAM to create the parse tree of the 1MB big.asc

09-29-01
    fixed Tokenize_Source where whitespace was not handled correctly
    Tokenizer errors are now prefixed with "token error"
    renamed `codeblock' to `block'
    created Parse_Block, Parse_Sub, Parse_Ret
    it seems that I have implemented all of the parser grammar
    ascorbic segfaults after being compiled with optimization, -O

09-25-01
    created Parse_Definition, Parse_Statement

09-23-01
    created Parse_Precedence
    fixed operator precedence
    added Particle.data as void *
    created Parse_Identifier
    created Tokenize_Endline
    quick test results:
        parser uses 30MB of memory to process 1MB of text
        this takes 3 seconds on my AMD K6-2 450MHz

09-22-01
    created Parse_Expression, Parse_Generic
    need to fix operator precedence: 1+2/3+4 -> (1 + (2/3)) + 4

09-22-01
    I have come to the conclusion that my voyage into the realm of
    general parsers has taken too much time. I will now implement a
    tokenizer and recursive descent parser as I had originally planned.
    Created tokenizer and parser grammars in vitaminc.txt
    renamed match.c to tokenize.c
    reworked particle.c
    particles no longer carry their string names
    reworked ascorbic.c
    PrintLine uses a better algorithm now
    PrintLine now converts tabs into space for line printout, arrow aligns
    created Tokenize_Identifier, Tokenize_Integer, Tokenize_Floatingpoint
    created Tokenize_Source that finishes the tokenizer

09-19-01
    JustRay@OPN sent me to this beautiful URL:
    http://icl.pku.edu.cn/bswen/pls/ParsingTechs-APracticalGuide.pdf

09-19-01
    added more rules
    added Particle->handle and updated functions
    cleaning up code
    testing with basic set of rules
    bug is eluding me, not all possible parses are being checked

09-19-01
    created Match_ONEORMORE
    Match_CreateTree now removes useless incomplete rules

09-18-01
    created Match_CreateTree, Match_Rule, Match_Or, Match_Sequence
    changed MatchStruct to doubly linked list
    created Match_Append, Match_Remove
    reformed loop in CreateTree to properly process completed matches
    match.c seems to be a working Earley parser ;)

09-15-01
    changed Particle->place to Particle->start, added Particle->end
    updated particle.c

09-15-01
    *sigh* my pattern matcher is flawed
    rewriting it as an Earley parser

09-13-01
    Added sub definition
    Fixed bug with `keep' rules where children were being added even if
    the rule didn't match

09-10-01
    Added Rule.handle to allow extraneous tree nodes to be eliminated
    mem usage is dramatically decreased, processed 250kb input file
    segfaults on 1MB input file
    found the bug in match.c, setting place to input->child[p] where p
    could be greater than or equal to input->childnum
    1MB file takes 87MB to parse... that's an 83:1 ratio =)

09-09-01
    asc->place is now set at the point where a rule fails
    changed matching functions to return -1 if no matches, this allows
    the ZEROORMORE rule to return a successful 0 matches

09-09-01
    fixed bug in Match_Sequence where pattern items were being skipped
    testing on a 60kb source code file... uses 56MB of RAM!?!
    changed Particle->child realloc() increments from 16 to to, now 36MB
    removed ival and fval from struct Particle, brought down to 32MB
    commented out code to duplicate particle names, down to 23MB
    this thing is a memory hog, but that's ok for my first compiler =)
    nice to know that it fails assert() when out of RAM

09-08-01
    removed Match_Nodify
    changed Match functions to return int = num particles consumed
    Match_CreateTree() now just requests a single node of type vitaminc
    YAY!

09-08-01
    Fixed bug in Match_CreateTree where processed particles were not
    getting skipped
    Bug alert!

09-08-01
    chopped parse.c, Parse_Tokenize now does characters only
    changed Match_Pattern to Match_CreateTree
    created Match_Rule, Match_Multiple
    created Particle_Copy, Particle_AddCopy
    removed unneeded asc parameter from Particle functions
    created Match_Sequence, Match_Or
    tested the rules... they work! yay!

09-06-01
    fixed Makefile so BUILDNUM will be updated
    found 195 makes of ascorbic in ~/.bash_history, setting BUILDNUM to 195
    created Parse_Identifier, Parse_Hexadecimal, Parse_Integer
    created Parse_FloatingPoint, Parse_Character
    fixed Parse_Tokenize
    fixed bug in Match_Pattern where last matching token was not being kept

09-05-01
    created Match_Pattern, RuleStruct, CheckPattern

09-04-01
    changed Parse_Whiteout to Parse_Tokenize
    created particle.c
    created Particle_Add, Particle_Free, Particle_New, Particle_Print
    created Particle_AddNew
    Particle_Tokenize now handles string literals and escaped chars

09-04-01
    renamed QuickScript to Vitamin C
    renamed qsc to ascorbic
    rewriting compiler
    created main, File_ReadSource
    created Ascorbic_Error, Ascorbic_PError, Ascorbic_PrintError
    reads source code
    created Ascorbic_PrintLine, Ascorbic_ErrorHere
    created Ascorbic_PrintErrorOnLine
    created Parse_Whiteout

09-01-01
    updated bytecode.c to produce proper bytecode

08-01-01
    updated bytecode.txt

07-10-01
    added filename and line numbers to LexicalParticle
    added filename to makeparticlelist(), ReadParticles()
    modified makeparticle(), addparticle(), printparticle()

07-08-01 21:01
    writebytecode() now emits file header and footer
    produce*() now emit bytecode
    first Sebae bytecode!!!

07-08-01 15:12
    fixed stack in the produce*() funcs
    created producesubtract(), subtract rule, added subtract to value rule
    moved symbol table functions to symbol.c
    added stringliteral to value rule
    commented out some printfs
    fixed bug in QuotedString()

07-08-01 12:50
    created SingleCharParticle(), QuotedString(), Number(), Identifier()
    fixed ReadParticles(), ParticlizeText(), ExtractParticle()
    removed particle types alpha, alphanum, digit
    added particle type stringliteral
    removed lexDigit, lexNumber, lexIdentifier
    mixed number and letter variable names are now valid
    string literals are particlized properly

07-07-01 23:22
    adding tokenizer code... currently broken

07-05-01 21:11
    created populatesymbollist()
    created producestatement()
    created produceassign()
    created producevalue()
    created produceadd()
    qsc now outputs basic assembler code =)

07-03-01 20:07
    fixed flaw in grammar matching
    added ONEORMORE rule type and updated CheckWithRule()
    made a bunch of generic GrammarFunc()s
    added name and defaultproduct to GrammarRule struct and makerule()
    rewrote quickscript.c to use the generic funcs and new makerule

06-31-01 22:31
    add, assign, and print are all parsed correctly
    fixed quickscript.c

06-23-01 13:52
    changed rules to free up unneeded particles
    created intval particle type
    added int val to makeparticle() parameter list
    renamed LexicalParticle.intval to val

06-23-01 00:39
    rewrote GrammarMatch(), it is not so fast now, but works perfectly

06-22-01 10:05
    found flaw in grammar matching... trying to fix

06-22-01 07:39
    changed GrammarMatch to make GrammarFunc returned particle be optional

06-21-01 08:58
    added explanatory blurb to readme.txt
    fixed bug in ReadParticles() where it was processing text backwards
    updated printparticle to show name if present, tree printout is NICE! =)
    updated ReadParticles() and quickscript.c to send particle name
    added char *name to LexicalParticle
    simplified main() to allow only one input file and one output file
    Changed ConverText into ReadParticles, now reads text from file handle
    split code in match.c into match.c, particle.c, quickscript.c, rule.c
    removed old token code from qsc.c

06-20-01 00:41
    match.c is now building trees of LexicalElements!!!

06-17-01 01:51
    Developed preliminary GrammarMatching engine in match.c

06-16-01 23:53
    Made temporary WriteStatementList() and WriteBytecode()
    Created ExtractSymantics(), ExtractSymanticsFromLine()
    Created makesyme(), makesymelist(), addsyme()
    Created SymE and SymEList structures

06-16-01 00:37
    Defined syme types and syme structure

06-15-01 02:55
    Started planning grammars
    Moved token functions to token.c

06-14-01 20:27
    Temporarily hacked ReadTokens() to split tokens that are >1024 bytes

06-14-01 02:30
    Added hash function; findkey() performance is immensely improved
    Converted TokenList struct to store tokens as int
    Renamed MakeTokenList() to maketokenlist()
    Renamed FreeTokenList() to freetokenlist()
    Allowed escaped quotes in quoted string
    Created addtoken()
    Fixed bug in ReadTokens() where memcpy() was called with the wrong parms

06-13-01 02:29
    Created endoftoken(), slow findkey(), addkey(), TokenizeText()

06-13-01 00:51
    Defined struct TokenList in qsc.h
    Created MakeTokenList(), FreeTokenList(), ReadTokens()
    Updated main()

06-12-01 02:59
    Created qsc project and Makefile. Started qsc.c.
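
EXAMPLE

The integer literal forms listed near the top of this file (decimal
with leading zeros allowed, 0x hexadecimal, no octal) can be sketched in
C roughly as follows. This is only an illustration, not the actual
Tokenize_Integer from ascorbic. The name scan_integer and its signature
are made up for this sketch, and accepting lowercase hex digits is an
assumption, since vitaminc.txt is the real definition of the language.

```c
#include <ctype.h>
#include <stddef.h>

/*
 * scan_integer: hypothetical helper, NOT ascorbic's Tokenize_Integer.
 * Reads a decimal or 0x-prefixed hexadecimal literal from the start of s.
 * A leading zero does not select octal, so "088172" is decimal 88172.
 * Stores the value in *value and returns the number of characters
 * consumed, or 0 if s does not begin with an integer literal.
 * Overflow is not detected; the unsigned arithmetic simply wraps.
 */
size_t scan_integer(const char *s, unsigned long *value)
{
    size_t i = 0;
    unsigned long v = 0;

    /* Hexadecimal: "0x" followed by at least one hex digit. */
    if (s[0] == '0' && s[1] == 'x' && isxdigit((unsigned char)s[2])) {
        for (i = 2; isxdigit((unsigned char)s[i]); i++) {
            int c = tolower((unsigned char)s[i]);
            int digit = isdigit(c) ? c - '0' : c - 'a' + 10;
            v = v * 16 + (unsigned long)digit;
        }
        *value = v;
        return i;
    }

    /* Decimal: one or more digits; leading zeros add nothing. */
    while (isdigit((unsigned char)s[i])) {
        v = v * 10 + (unsigned long)(s[i] - '0');
        i++;
    }
    if (i == 0)
        return 0;   /* not an integer literal, *value is left untouched */

    *value = v;
    return i;
}
```

With this sketch, scan_integer("088172", &v) consumes 6 characters and
sets v to 88172, and scan_integer("0x1234", &v) sets v to 4660. Negative
numbers are left out on purpose; the changelog examples such as -(1 + 2)
suggest the minus sign is handled as an operator, not as part of the
literal.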