
MarkedUpSourceCode

Filed in: Ideas.MarkedUpSourceCode · Modified on : Sun, 11 Oct 09

There is a problem with source code: it has exactly two audiences, humans and machines, and must cater to both simultaneously. However, the main audience is machines, as non-functioning code is worthless.

Herein, I propose MUSC: Marked-Up Source Code, where ordinary source is embiggened with markup!

The main goal is to separate the machine parts from the human parts. The source is meant for the machine, and the markup is meant for the human. Yet although each part targets one audience, neither is closed to the other: the source should still be readable by a human, and the markup by a machine.

For example, let's take this spicy Python sample:

'''
This module is a MUSC example.  Not useful, not meant to be.
'''
import sys

def foo(bar, baz=None):
    '''
    This function prints bar, and ignores baz.
    '''
    print bar

if __name__ == '__main__':
    foo('bar?')

Instead, it should be represented as this, in MUSC:

<musc xmlns='http://nfirvine.com/musc' xmlns:mc='http://nfirvine.com/musc-convert'>
    <mc:originalsource>
'''
This module is a MUSC example.  Not useful, not meant to be.
'''
import sys

def foo(bar, baz=None):
    '''
    This function prints bar, and ignores baz.
    '''
    print bar

if __name__ == '__main__':
    foo('bar?')
</mc:originalsource>

    <source lang='text/x-python' xmlns:py='http://nfirvine.com/musc-python'>
        <py:docstring level='module'>
            This module is a MUSC example.  Not useful, not meant to be.
        </py:docstring>
        import sys

        def foo(bar, baz=None):
            <py:docstring level='function'>
                This function prints <py:paramname>bar</py:paramname>, and ignores <py:paramname>baz</py:paramname>.
            </py:docstring>
            print bar

        if __name__ == '__main__':
            foo('bar?')
    </source>
</musc>

(Note that I might have unintentionally/lazily not escaped XML entities.)
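
To illustrate the "markup readable by machine" half of the goal, here is a minimal Python 3 sketch that reads a shortened copy of the example with the standard library's xml.etree.ElementTree. The function name summarize and the prefix map NS are my own inventions for illustration; only the namespace URIs and element names come from the example.

```python
import xml.etree.ElementTree as ET

# Prefix map for lookups; the URIs are copied from the MUSC example,
# the short prefixes are arbitrary.
NS = {
    "m": "http://nfirvine.com/musc",
    "py": "http://nfirvine.com/musc-python",
}

# A shortened copy of the MUSC example (the original-source copy is omitted).
MUSC_DOC = """\
<musc xmlns='http://nfirvine.com/musc'>
    <source lang='text/x-python' xmlns:py='http://nfirvine.com/musc-python'>
        <py:docstring level='module'>
            This module is a MUSC example.  Not useful, not meant to be.
        </py:docstring>

        def foo(bar, baz=None):
            <py:docstring level='function'>
                This function prints <py:paramname>bar</py:paramname>, and ignores <py:paramname>baz</py:paramname>.
            </py:docstring>
            print bar
    </source>
</musc>
"""


def summarize(musc_text):
    """Return (docstrings, param_names) found in a MUSC document.

    Hypothetical helper: docstrings is a list of (level, text) pairs.
    """
    root = ET.fromstring(musc_text)
    docstrings = []
    for node in root.iterfind(".//py:docstring", NS):
        # itertext() flattens nested markup such as <py:paramname>;
        # split/join collapses the indentation whitespace.
        text = " ".join("".join(node.itertext()).split())
        docstrings.append((node.get("level"), text))
    params = [p.text for p in root.iterfind(".//py:paramname", NS)]
    return docstrings, params


docstrings, params = summarize(MUSC_DOC)
```

The parameter names come out as structured data rather than as words that merely happen to appear in a docstring, which is what a tool would need in order to link them to the function definition.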

So far, not too impressive. It's basically just the code with bits of the AST converted to XML. However, we've added something in the docstring for foo: by marking bar and baz as parameters to the function, we could link back to the function definition. This of course has already been done 100 times: RST, ctags, epydoc, javadoc, etc. But this is just the beginning:

  • We could style elements with more than ASCII art for our eventual target (HTML).
  • We could add arbitrary metadata to AST nodes, like revisioning or bugtracking information.
  • We could add non-text things, like images.
  • We could mark that something is a table, such as a multi-line list, so that we could size it dynamically in the output.
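
As a rough sketch of the metadata idea in the list above, the following Python 3 snippet attaches revision and bug-tracking attributes to a docstring node. Everything about the metadata is an assumption made for illustration: the namespace http://example.org/musc-meta, the attribute names revision and bug, the sample values, and the helper annotate are all hypothetical and not part of any MUSC definition.

```python
import xml.etree.ElementTree as ET

PY = "http://nfirvine.com/musc-python"
META = "http://example.org/musc-meta"  # hypothetical namespace, not part of MUSC

SNIPPET = """\
<source xmlns='http://nfirvine.com/musc'
        xmlns:py='http://nfirvine.com/musc-python'
        lang='text/x-python'>
def foo(bar, baz=None):
    <py:docstring level='function'>This function prints bar, and ignores baz.</py:docstring>
    print bar
</source>
"""


def annotate(root, level, **metadata):
    """Attach metadata attributes to every docstring node of the given level.

    Returns the number of nodes that were annotated.
    """
    touched = 0
    for node in root.iter("{%s}docstring" % PY):
        if node.get("level") == level:
            for key, value in metadata.items():
                # Namespaced attribute, so it cannot clash with core MUSC names.
                node.set("{%s}%s" % (META, key), value)
            touched += 1
    return touched


root = ET.fromstring(SNIPPET)
# "r42" and "1234" are made-up sample values.
count = annotate(root, "function", revision="r42", bug="1234")
node = root.find("{%s}docstring" % PY)
serialized = ET.tostring(root, encoding="unicode")
```

Because the metadata sits in its own namespace, a tool that does not understand it can ignore it, and the code text around the node is untouched.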

Furthermore, we could perform various transforms on the code, afforded to us by the magic of XML:

  • We could export super-formatted Python code: as you're writing the code, you don't have to put breaks in except where it matters to the structure of the program, since the exporter takes care of making it pretty (with some help from some user-defined styling).
  • We could export obfuscated code (if we really wanted to).
  • We could export an import tree.
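
The simplest transform of this kind is exporting plain Python back out of the marked-up source. Below is a minimal Python 3 sketch under several assumptions: the helper export_python is hypothetical, docstrings are collapsed onto one line, docstring text is assumed not to contain triple quotes, and the sample uses print(bar) instead of print bar so that the result can actually be executed under Python 3.

```python
import textwrap
import xml.etree.ElementTree as ET

PY_NS = "http://nfirvine.com/musc-python"

# Variant of the example's <source> element, with print(bar) for Python 3.
MUSC_SOURCE = """\
<source xmlns='http://nfirvine.com/musc'
        xmlns:py='http://nfirvine.com/musc-python'
        lang='text/x-python'>
    <py:docstring level='module'>
        This module is a MUSC example.  Not useful, not meant to be.
    </py:docstring>
    import sys

    def foo(bar, baz=None):
        <py:docstring level='function'>
            This function prints <py:paramname>bar</py:paramname>, and ignores <py:paramname>baz</py:paramname>.
        </py:docstring>
        print(bar)
</source>
"""


def export_python(source_elem):
    """Turn a MUSC <source> element back into plain Python text."""
    parts = [source_elem.text or ""]
    for child in source_elem:
        if child.tag == "{%s}docstring" % PY_NS:
            # Flatten nested markup and collapse whitespace to one line.
            body = " ".join("".join(child.itertext()).split())
            parts.append("'''%s'''" % body)
        else:
            # Unknown markup: keep its text, drop the tags.
            parts.append("".join(child.itertext()))
        # The tail is the raw code that follows the element.
        parts.append(child.tail or "")
    # Remove the indentation that came from nesting inside the XML.
    return textwrap.dedent("".join(parts)).strip() + "\n"


exported = export_python(ET.fromstring(MUSC_SOURCE))

# The exported text is ordinary Python, so the machine audience can run it.
namespace = {}
exec(compile(exported, "<musc-export>", "exec"), namespace)
```

A prettier or an obfuscating exporter would follow the same shape: walk the tree, decide how to render each kind of node, and emit text.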

Prior Art

As mentioned above, existing systems already do most of what I'm describing by embedding yet another markup language in comments or docstrings (not necessarily YAML, which I think is rather nice). But each has its own syntax to learn, its own escape codes, and its own way of embedding into the programming language, which means you need a separate parser for each tool. MUSC would consolidate features that are common to most languages into the MUSC core and outsource language-specific things to language-specific namespaces and plugins.
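
One way the core/plugin split might look is a core that walks the tree and hands each non-core element to whichever plugin registered for its namespace. This is only a sketch in Python 3: the registry PLUGINS, the decorator plugin, the handler signature, and the http://example.org/musc-unknown namespace are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

CORE_NS = "http://nfirvine.com/musc"
PY_NS = "http://nfirvine.com/musc-python"

# Hypothetical registry: namespace URI -> handler(local_name, element) -> str
PLUGINS = {}


def plugin(namespace):
    """Decorator that registers a handler for one language namespace."""
    def register(handler):
        PLUGINS[namespace] = handler
        return handler
    return register


@plugin(PY_NS)
def python_plugin(local_name, element):
    """Language-specific knowledge lives here, not in the core."""
    if local_name == "paramname":
        return "parameter %r" % element.text
    if local_name == "docstring":
        return "%s docstring" % element.get("level")
    return "unknown Python element %r" % local_name


def describe(root):
    """Core walker: dispatch every non-core element to its plugin."""
    found = []
    for element in root.iter():
        if not element.tag.startswith("{"):
            continue  # element without a namespace; nothing to dispatch on
        namespace, _, local_name = element.tag[1:].partition("}")
        if namespace == CORE_NS:
            continue  # core elements are handled by the core itself
        handler = PLUGINS.get(namespace)
        if handler is None:
            found.append("no plugin for %s" % namespace)
        else:
            found.append(handler(local_name, element))
    return found


DOC = """\
<musc xmlns='http://nfirvine.com/musc'
      xmlns:py='http://nfirvine.com/musc-python'
      xmlns:x='http://example.org/musc-unknown'>
  <source lang='text/x-python'>
    <py:docstring level='function'>Prints <py:paramname>bar</py:paramname>.</py:docstring>
    <x:thing/>
  </source>
</musc>
"""

report = describe(ET.fromstring(DOC))
```

The XML parser is shared by every tool; only the small per-language handlers differ, which is the consolidation argued for above.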

Part of the problem lies with the way current solutions attempt to shoehorn something into the language they're augmenting: javadocs live in special comments (like ctags), and RST lives in docstrings (which might as well be special comments, except that they're added to the object for introspection purposes). The code is a first-class citizen and the metadata is second-class. With MUSC, the code and the metadata are on the same level: neither is nested inside the other.

