Firefly! Now in technicolor!

Posted by DesertFox on Jan. 16, 2015, 7:29 p.m.

Also, look! A blog!

As the IRC chat regulars and readers of my last blog know, I'm developing a language. Some of you think I'm crazy! But the Banana God tells me YOU are the crazy ones! Wahahahaha!

The serious issue of mental health aside, I've kept this project a little quiet - not really talking about it or getting into my goals and plans for it. Part of the reason I've been so quiet is that I have a history of starting things and never finishing them. I always get distracted by something else; my hard drive is littered with old projects in a half-dozen languages, all gathering metaphorical dust. I've made A* and game engines in everything from ActionScript to Objective-C, but it's been years since I 'published' anything outside of for-someone-else work. I didn't want to talk about a project I wasn't sure I was going to follow through on.

Then, I realized I simply don't care about making content - I just want to make things - engines, frameworks, tools. And what is a programming language but a tool? I've spent the last four months dripping knowledge about language parsing and virtual machines into my mind. There have been nights where I could not sleep because my mind kept iterating on some small facet, wouldn't stop pulling on a thread of an idea until I'd laid bare the consequences. It's the cliche of the mad artist, unable to stop himself from composing - but the composition is of logic, not music. There is a strange beauty in the carefully placed domino sets that we call interpreters and compilers - and that is exactly what they are. Complex, for sure, but they only do what you tell them to do.

So I'm making a language, and its name is 'Firefly'. I have reasons, and I almost called it 'Serenity'. Interestingly, the TV show is not the origin of either possibility.

The language is just a part of a set of things I'm building, but it is the cornerstone, the foundation of the project. It is my magnum opus. I've become fascinated with the self-hosting and bootstrapping of compilers and interpreters. It's an idea that just feels so right, as if it were the satisfying *click* of the last Lego piece snapping into place. It's the programmer's version of origami - "see my program, watch it unfold!" You use a simpler grammar or restricted subset of a language to write a compiler that you then use to make a more complicated version of your language. There are several benefits to doing this, chiefly that you can build your language piecemeal and that the compiler doubles as a test of itself, which is really cool.


One of the most common pieces of feedback I've received is something along the lines of 'how does this improve on existing things?' In other words, I know about the XKCD comic.

But sometimes, you can make valid improvements, even if they are subtle. Sometimes, that 15th standard is good enough to switch to.

So what am I trying to improve upon?

Mostly, Python.

Python has so many itty bitty teensy weensy annoying quirks, mostly due to the whole 2.x/3.x schism. And "Argh!" they are everywhere. Things like string formatting, the `except:` syntax being screwy (looking at you 2vs3), bitwise manipulation, decorators and decorator generators being just a wee bit more annoying to mess with than they should be, and the god-fucking-awful module management. I like Python a lot, but sometimes it feels like there are just a few things missing, and I want to remedy this. Basically, I want to bring Ruby's 'principle of least astonishment' to a more compact and syntax-flexible Python. I am also handling things like scope differently.

Python is like this, only you're stubbing your toe on some stupid legacy syntax.

Also Python has a GIL. I know some libraries for and forks of Python to deal with this, but *stubs toe before he can complete the sentence*

"Argh!"

My language is also a functional language. I've got planned constructs for mixins, lazy evaluation, and something that is supposed to be monads. Theoretically, objects don't even exist, and the role of classes is played by factory closures that produce instance closures. Each instance closure wraps around a hashmap/dict, which it accesses differently depending on its arguments. Technically you could even implement dicts and lists with nothing but closures (the speed would be terrible), but I'm balancing philosophy with practicality - hence the 'theoretically'.
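As a rough illustration of the closures-as-classes idea, here is a minimal Python sketch (not Firefly code, and not how Firefly actually implements it). The names `make_point_class`, `new`, `instance` and `fields` are all made up for the example, and the "one argument reads, two arguments write" convention is just one assumed way the closure could behave differently depending on its arguments.

```python
def make_point_class():
    """Factory closure: plays the role of a class (hypothetical example)."""
    def new(x, y):
        fields = {"x": x, "y": y}   # the wrapped hashmap/dict

        def instance(name, *args):
            # One argument reads a field; two arguments write it.
            if not args:
                return fields[name]
            fields[name] = args[0]
            return args[0]

        return instance
    return new


Point = make_point_class()
p = Point(1, 2)
p("x", 10)                 # write: x becomes 10
print(p("x"), p("y"))      # prints: 10 2
```

Each call to `Point` creates a fresh `fields` dict, so instances don't share state even though no class or object syntax is involved.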

I also intend to make parsing a native construct within the language. I want to reduce the code written in C down to the VM, API-hooks, and a very simple hybrid symbolic/pseudo-EBNF parser. I'll then use that to define a simple version of my language, which in turn will add the final syntax. Then, not only do I have my language, I can in turn embed *that* ability directly into my language, allowing on-the-fly syntax-extension or parser-generation - all compiling down to the base VM.

The best part is that all of this is to build the platform for the actual idea. I have a specific end-game in mind, and writing a VM + programming language is simply the easiest way of achieving it to my picky specifications. I'm actually trying to figure out how to get investors, because I plan on making this my full-time job. Either self-employment or convince Chaotic Moon to let me work on this and still keep control of the idea. Tough stuff. More on this specific subject (the grand end-game, as well as investmonies) in a future blog.


Now for some language snippets! Here, have a Fizzbuzz test!

-- Firefly fizzbuzz test
for n in 1...100:
	if n % 3 == 0:
		print('Woof' if n % 5 == 0 else 'Fizz')
	elif n % 5 == 0:
		print('Buzz')
	else:
		print(n)

At a very basic level, the snippet looks awfully like Python. If it weren’t for the `...` range operator and the Lua-style `--` comments, it’d compile and run under Python. Most of the syntactic and semantic differences will pop up in upcoming progress blogs (stuff such as scope handling and classes).
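For comparison, here is what a Python 3 version might look like. This is only a sketch: it assumes `1...100` is an inclusive range (the `irange` node in the tree and the final `Buzz` for 100 in the output both suggest that), and the `fizzbuzz` helper function is added here purely so the logic can be checked on its own.

```python
# Python 3 version of the Firefly fizzbuzz test
def fizzbuzz(n):
    if n % 3 == 0:
        return 'Woof' if n % 5 == 0 else 'Fizz'
    elif n % 5 == 0:
        return 'Buzz'
    else:
        return n


for n in range(1, 101):    # Firefly: for n in 1...100 (assumed inclusive)
    print(fizzbuzz(n))
```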

The produced syntax tree is a bit different from (and more complicated than) before - it has more information for things like scope:

(*module 'interpreter' -> interpreter
	body=[
		(for-loop
			var=(name 'n')
			iterable=(irange
				first=(literal
					value=(int '1')
				)
				second=(literal
					value=(int '100')
				)
			)
			body=[
				(branch
					condition=(infix '=='
						left=(infix '%'
							left=(name 'n')
							right=(literal
								value=(int '3')
							)
						)
						right=(literal
							value=(int '0')
						)
					)
					true-branch=[
						(call
							target=(name 'print')
							args=[
								(ternary-if
									first=(literal
										value=(string 'Woof')
									)
									second=(infix '=='
										left=(infix '%'
											left=(name 'n')
											right=(literal
												value=(int '5')
											)
										)
										right=(literal
											value=(int '0')
										)
									)
									third=(literal
										value=(string 'Fizz')
									)
								)
							]
						)
					]
					false-branch=(branch
						condition=(infix '=='
							left=(infix '%'
								left=(name 'n')
								right=(literal
									value=(int '5')
								)
							)
							right=(literal
								value=(int '0')
							)
						)
						true-branch=[
							(call
								target=(name 'print')
								args=[
									(literal
										value=(string 'Buzz')
									)
								]
							)
						]
						false-branch=[
							(call
								target=(name 'print')
								args=[
									(name 'n')
								]
							)
						]
					)
				)
			]
		)
	]
)
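To give a feel for how a tree like this can be executed, here is a small tree-walking evaluator sketched in Python. It is not the actual Firefly interpreter: the `Node` class, `evaluate` and `run_block` are hypothetical, and only the node ids and child names (`for-loop`, `irange`, `branch`, `true-branch` and so on) are taken from the printout above.

```python
import operator


class Node:
    """Hypothetical AST node: an id, optional text, and named children."""
    def __init__(self, sym_id, text=None, children=None):
        self.id = sym_id
        self.text = text
        self.children = children or {}


INFIX = {'==': operator.eq, '%': operator.mod}


def evaluate(node, env):
    c = node.children
    if node.id == 'literal':
        inner = c['value']
        return int(inner.text) if inner.id == 'int' else inner.text
    if node.id == 'name':
        return env[node.text]
    if node.id == 'infix':
        return INFIX[node.text](evaluate(c['left'], env), evaluate(c['right'], env))
    if node.id == 'ternary-if':
        # first = value if true, second = condition, third = value if false
        chosen = 'first' if evaluate(c['second'], env) else 'third'
        return evaluate(c[chosen], env)
    if node.id == 'call':
        args = [evaluate(arg, env) for arg in c['args']]
        return evaluate(c['target'], env)(*args)
    if node.id == 'irange':
        # Inclusive range, hence the + 1
        return range(evaluate(c['first'], env), evaluate(c['second'], env) + 1)
    if node.id == 'branch':
        taken = 'true-branch' if evaluate(c['condition'], env) else 'false-branch'
        return run_block(c[taken], env)
    if node.id == 'for-loop':
        for item in evaluate(c['iterable'], env):
            env[c['var'].text] = item
            run_block(c['body'], env)
        return None
    raise ValueError('unknown node id: ' + node.id)


def run_block(block, env):
    # A false-branch may be a single nested branch (elif) or a list of statements.
    statements = block if isinstance(block, list) else [block]
    for statement in statements:
        evaluate(statement, env)


def lit(kind, text):
    return Node('literal', children={'value': Node(kind, text)})


def call_print(arg):
    return Node('call', children={'target': Node('name', 'print'), 'args': [arg]})


# for n in 1...6: print 'Fizz' when n % 3 == 0, otherwise print n
condition = Node('infix', '==', {
    'left': Node('infix', '%', {'left': Node('name', 'n'), 'right': lit('int', '3')}),
    'right': lit('int', '0'),
})
loop = Node('for-loop', children={
    'var': Node('name', 'n'),
    'iterable': Node('irange', children={'first': lit('int', '1'), 'second': lit('int', '6')}),
    'body': [Node('branch', children={
        'condition': condition,
        'true-branch': [call_print(lit('string', 'Fizz'))],
        'false-branch': [call_print(Node('name', 'n'))],
    })],
})

output = []
evaluate(loop, {'print': output.append})   # capture "printed" values in a list
print(output)   # [1, 2, 'Fizz', 4, 5, 'Fizz']
```

Passing `print` in through the environment keeps the evaluator free of built-ins, which also makes the result easy to inspect.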

Also, it now can actually execute stuff!

1
2
Fizz
4
Buzz
Fizz
7
8
Fizz
Buzz
11
Fizz
13
14
Woof
16
17
Fizz
19
Buzz
Fizz
22
23
Fizz
Buzz
26
Fizz
28
29
Woof
31
32
Fizz
34
Buzz
Fizz
37
38
Fizz
Buzz
41
Fizz
43
44
Woof
46
47
Fizz
49
Buzz
Fizz
52
53
Fizz
Buzz
56
Fizz
58
59
Woof
61
62
Fizz
64
Buzz
Fizz
67
68
Fizz
Buzz
71
Fizz
73
74
Woof
76
77
Fizz
79
Buzz
Fizz
82
83
Fizz
Buzz
86
Fizz
88
89
Woof
91
92
Fizz
94
Buzz
Fizz
97
98
Fizz
Buzz

I’m still working on crystallizing my language’s syntax (I have a bunch of spec sheets, but a lot of it is now out of date), but as expected the end result is like a fine blend of Ruby and Python. The code so far is very flexible, and in a few months I’ll be rewriting it all in C because it is currently written in (you guessed it) Python, and so it inherits a lot of Python’s quirks - and damnit if part of the reason I'm doing this isn't to fix shit like that!

My current goal is to get a syntactically complete version of my language running, after which I will start the whole language-definition-language, get the C code running as an interpreter, then get it compiling to bytecode and integrate it with version 2.0 of the VM. Lots of work ahead of me, I know.

Also, the technicolor! I've added syntax highlighting to the process. It's currently far from perfect, because it only highlights the terms that are part of the final tree. This is HTML output by my parser, unmodified.

I had it spit out HTML, of which this is a screenshot! Because lazy and couldn't remember how to embed HTML into 64digits! Also this is the snippet from the last blog!

I've got a long way to go, but I've made a lot of progress. Keep moving forwards.

Comments

Josea 9 years, 3 months ago

Congratulations on working on a programming language! I love everything about programming languages and love to see other people like it too. Back at uni I took a whole bunch of courses on it (wrote a compiler) and then taught an interpreters class for several quarters. It seems to be a pretty popular topic lately, god, everybody and their pet is making a new programming language these days.

Making a language looks simple at first. Data structures look simple enough, parsing algorithms are well known, ditto for a whole bunch of algorithms for analysis, code generation and optimization. It looks as if it were just a matter of designing the syntax and plugging all the pieces together. I worry that this apparent simplicity makes it all too easy to just whip out a new language that just grabs and tweaks the syntax of existing languages without actually innovating in the way we write and reason about programs. It also makes it easy to pretend to be innovating by just piling up a bunch of smaller language features.

I say all that because that's exactly what I did last month. I was tired of Python's dynamic types, found no other language that satisfied me, so I decided to write a new one just for the hell of it. In the end I scrapped it because I realized I was doing nothing but Python with static types, and a pile of smaller features I grabbed from other languages. I figured, if I'm just going to repackage existing concepts, what's the point? What am I actually bringing to the table? And that took out all the fun from it for me…

I'm still looking for that 'thing', that new concept that makes me think 'wow, that's an interesting way of doing things'.

Sorry for turning this comment into a miniblog…

So, are you familiar with LLVM?

DesertFox 9 years, 3 months ago

My thoughts on apparent simplicity:

If it looks simple, and is, it is simple.

If it looks simple, and isn't, it is complex.

If it looks complex, it could just be complex, but it is more likely complicated.

Languages are complex. They are information-dense creatures; some of them are succinct (Ruby and Python), and others are absolute beasts (C++). Minimalism and density are part of the mindset of my language. Anything I consider adding as a feature must integrate naturally: it must not interrupt the design of the language, nor create clutter. Any further syntax sugar can later be implemented by extending the parser from within the language. That's the point of meta-programming.

Part of my interests lie in exploring language-oriented programming with this project, as that sort of thing could be incredibly useful for me.

That being said, I'm trying to avoid making my project into a frankenlanguage. I'm focused on getting the simplest viable syntax up and running for the virtual machine. For instance, I'm currently absolutely agonizing over how I want to handle scope syntaxwise. It is one of the few real remaining puzzle pieces before I can lock it all down and begin writing the parser in C.

To answer your end-of-comment question - I'm a little familiar with LLVM as it was part of my research on virtual machines. I focused more on LuaVM/YARV as they were more what I was looking for. Plus, the temptation of implementing a VM myself was impossible to say no to :P

s 9 years, 3 months ago

Mind explaining that AST? Would've assumed textual AST to be more minimalist lisp looking

Acid 9 years, 3 months ago

This also allows you to be creative in your execution of the language. Will it be self-compiling or need a compiler/interpreter? I know you already pretty much stated that you have a compiler that essentially tested itself upon use, but it's a thought.

I enjoy the theory and ideas behind computer science, but I generally just use programming as a means to an end - I would NOT enjoy writing my own language. :P

DesertFox 9 years, 3 months ago

@s It's a bit more hefty than a standard AST I suppose, though for a reason. I'm using a Pratt parser, which affords me a lot of flexibility mid-parse - it means I can do fiddly stuff like recasting, named children, and all sorts of other bits of logic that are more difficult to do with BNF/EBNF.
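For anyone who hasn't met one, a Pratt parser drives expression parsing from a table of per-operator binding powers rather than from grammar rules. The following is a minimal Python sketch of the technique, not Firefly's parser: the token regex, the `BINDING_POWER` values, the `Parser` class and the tuple-shaped output are all assumptions made for illustration.

```python
import re

TOKEN = re.compile(r'\s*(\d+|[A-Za-z_]\w*|==|[-+*/%()])')


def tokenize(source):
    tokens, pos = [], 0
    source = source.rstrip()
    while pos < len(source):
        match = TOKEN.match(source, pos)
        if match is None:
            raise SyntaxError('unexpected character at %d' % pos)
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


# Binding power per infix operator: higher binds tighter.
BINDING_POWER = {'==': 10, '+': 20, '-': 20, '*': 30, '/': 30, '%': 30}


class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self):
        token = self.peek()
        self.pos += 1
        return token

    def expression(self, min_power=0):
        # Prefix position: a literal, a name, or a parenthesised group.
        token = self.advance()
        if token is None:
            raise SyntaxError('unexpected end of input')
        if token == '(':
            left = self.expression(0)
            if self.advance() != ')':
                raise SyntaxError("expected ')'")
        elif token.isdigit():
            left = ('int', token)
        elif token in BINDING_POWER or token == ')':
            raise SyntaxError('unexpected token %r' % token)
        else:
            left = ('name', token)
        # Infix position: consume operators that bind tighter than the caller's.
        while BINDING_POWER.get(self.peek(), 0) > min_power:
            op = self.advance()
            right = self.expression(BINDING_POWER[op])
            left = ('infix', op, left, right)
        return left


def parse(source):
    parser = Parser(tokenize(source))
    tree = parser.expression()
    if parser.peek() is not None:
        raise SyntaxError('unexpected token %r' % parser.peek())
    return tree


print(parse('n % 3 == 0'))
# ('infix', '==', ('infix', '%', ('name', 'n'), ('int', '3')), ('int', '0'))
```

Because each token's handling is ordinary code rather than a grammar production, it is easy to attach extra logic at the point where a node is built, which is the kind of mid-parse flexibility described above.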

What follows are just the formatting rules I use when printing the AST; the printout doesn't contain every bit of information, but it would be enough for reconstruction. It's not exactly perfect (missing commas between items in a list, no quotes around namespace items, other symptoms of a work-in-progress).

A symbol must have an id, and may have a value.

(eol)			-- Symbol with only id
(name 'foo')	-- Symbol with id and value

Symbols can also have named children

(infix '*'
	left=(name 'foo')	-- For formatting, named children happen on the next line, indented
	right=(literal
		value=(int '5')	-- Same indenting rules for nested children
	)
)

List children are enclosed in brackets

(foo 'foo'
	bar=[
		(baz)
		(qux)
	]
)
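The printing rules so far are simple enough to sketch in a few lines of Python. This is an illustration only: `Symbol` and `format_symbol` are hypothetical names, and the real printer presumably handles more cases (such as the scope markers described next).

```python
class Symbol:
    """Hypothetical symbol: required id, optional value, optional named children."""
    def __init__(self, sym_id, value=None, children=None):
        self.id = sym_id
        self.value = value
        self.children = children or {}   # name -> Symbol, or list of Symbols


def format_symbol(symbol, depth=0):
    pad = '\t' * depth
    head = '(' + symbol.id
    if symbol.value is not None:
        head += " '%s'" % symbol.value
    if not symbol.children:
        return head + ')'                # (eol) or (name 'foo')
    lines = [head]
    for name, child in symbol.children.items():
        if isinstance(child, list):
            # List children are enclosed in brackets, one item per line.
            lines.append('%s\t%s=[' % (pad, name))
            for item in child:
                lines.append('%s\t\t%s' % (pad, format_symbol(item, depth + 2)))
            lines.append('%s\t]' % pad)
        else:
            # Named children go on the next line, indented one level.
            lines.append('%s\t%s=%s' % (pad, name, format_symbol(child, depth + 1)))
    lines.append(pad + ')')
    return '\n'.join(lines)


tree = Symbol('infix', '*', {
    'left': Symbol('name', 'foo'),
    'right': Symbol('literal', children={'value': Symbol('int', '5')}),
})
print(format_symbol(tree))
```

Run on the `infix` example, this reproduces the layout shown above, minus the explanatory comments.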

Symbols can create new scope as well.

A symbol that creates scope is prefaced with a '*', and lists the names in its namespace after the '->'. This is mostly for debugging: the same info could be reconstructed from the AST, since traversing it top-down tells you what creates a scope as well as which names are in it.

atom Foo:
	?
end
atom Bar:
	?
end

produces this as an AST (the interpreter inherently runs inside a module named 'interpreter'; also, things register themselves in their own namespace for things like recursion):

(*module 'interpreter' -> interpreter, Bar, Foo
	body=[
		(*atom 'Foo' -> Foo
			from=(name 'Atom')
			body=[
				(operator '?')
			]
		)
		(*atom 'Bar' -> Bar
			from=(name 'Atom')
			body=[
				(operator '?')
			]
		)
	]
)
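The claim that the namespace info can be rebuilt by a top-down traversal can be sketched like this in Python. Everything here is an assumption for illustration: the dict-based node shape, the `SCOPE_CREATORS` set, and the rule that a scope registers its own name plus the names of the scopes directly inside it. It also doesn't try to reproduce the ordering of names in the printout above.

```python
# Assumption: only these symbol ids open a new scope.
SCOPE_CREATORS = {'module', 'atom'}


def collect_namespaces(node, enclosing=None, found=None):
    """Walk top-down, returning {scope name: [names registered in it]}."""
    if found is None:
        found = {}
    if node['id'] in SCOPE_CREATORS:
        name = node['value']
        found[name] = [name]                  # a scope registers itself (for recursion)
        if enclosing is not None:
            found[enclosing].append(name)     # and is visible in its parent scope
        enclosing = name
    for child in node.get('body', []):
        collect_namespaces(child, enclosing, found)
    return found


tree = {'id': 'module', 'value': 'interpreter', 'body': [
    {'id': 'atom', 'value': 'Foo', 'body': [{'id': 'operator', 'value': '?'}]},
    {'id': 'atom', 'value': 'Bar', 'body': [{'id': 'operator', 'value': '?'}]},
]}
print(collect_namespaces(tree))
# {'interpreter': ['interpreter', 'Foo', 'Bar'], 'Foo': ['Foo'], 'Bar': ['Bar']}
```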

It's the named children and the namespace stuff that make it not-minimalist-lisp-looking :P

@Acid

Quote:
Will it be self-compiling or need a compiler/interpreter? I know you already pretty much stated that you have a compiler that essentially tested itself upon use, but it's a thought.

I don't have anything self-compiling/interpreting yet - it's merely planned. There will always need to be the initial bootstrap from C, but I want to minimize the boilerplate code.