Friday, January 25, 2013

Where the real power of computers lies - raw text

There are two things that really drive computers:
  1. Conventions/Standards
  2. Raw Text
Almost every PC operation (with the exception of rendering images, UI controls, and sound) ultimately uses simple raw text as the true container for information.  These simple text messages are sent across your machine and the Internet to transfer information.
Often data is stored in binary form, and there are complex protocols for transferring and interpreting it, but when you unpack everything and see where it is ultimately used, it is generally just text.
Even the languages used to express programs consist of text.
This is because text embodies human-readable and interpretable information.  It is the ability of humans to read text that ultimately gives text its value and capacity to hold information.
This is a significant point which I think our modern gadget/image/sound-driven world is forgetting as it designs software.
Many times I see information (ultimately in the form of simple text, of course) packaged into forms that are difficult to extract.  Even UI that presents information to you, on say a web page or a PDF file, often prevents extraction of the raw text for transfer to other places.
Just think about it.  Where would you be without Cut/Copy and Paste?  The clipboard is probably the most-used feature of any GUI.  In the old command-line days (still alive and well today) it was the ability to pipe text between commands that gave UNIX and DOS users the power to combine disparate tools efficiently to perform complex tasks.
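The piping idea can be sketched in a few lines.  Here three pure-Python "filters" stand in for command-line tools (the names echo grep/sort/uniq but are just illustrative functions): each one takes raw text in and gives raw text out, so they compose in any order, exactly like a shell pipeline.

```python
# Sketch of the UNIX pipe idea: each "tool" consumes raw text and
# produces raw text, so disparate tools chain together freely.

def grep(pattern, text):
    """Keep only the lines containing pattern."""
    return "\n".join(l for l in text.splitlines() if pattern in l)

def sort_lines(text):
    """Sort the lines alphabetically."""
    return "\n".join(sorted(text.splitlines()))

def unique(text):
    """Drop repeated lines, keeping the first occurrence."""
    seen, out = set(), []
    for line in text.splitlines():
        if line not in seen:
            seen.add(line)
            out.append(line)
    return "\n".join(out)

# Equivalent of: cat names.txt | grep a | sort | uniq
data = "carol\nalice\nbob\nalice\ndave\n"
result = unique(sort_lines(grep("a", data)))
```

Because the only interface is text, any of these stages could be swapped for a real external program without the others noticing.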
Can you imagine where we would be without piping and clipboards?
With standards like UTF-8 there just aren't any more excuses.  We can pass virtually all information as human-readable text and still build perfectly efficient systems.
So here's my tip for the day: are you coding up tons of UI or interface code just to get some text across a wire, into another program, or in front of a user?  If so, you may be approaching the problem wrong.

Tuesday, January 8, 2013

Ideas on building software

Over my many years of programming, I have often had ideas on how tools should be made to make building software more efficient and pleasurable.
There have been lots of advances over the years, but I, being the old fart that I am, still think very highly of Kernighan and Ritchie's C and the early UNIX toolchain that grew up around it.
I long for the days of simplicity.
The creation of the IDE (Integrated Development Environment) was more of a boon to software companies wanting to rope developers into their pet APIs.  After all, there is only so much a developer can learn, so if you can keep him doing things your way you can sell him software forever.
There is so much creativity out there it's quite hard to corral these guys into a single paradigm, but for most it has been done.
I admit there is synergy in having a bunch of programmers all working with the same language and tools; I'm not against that.  But I am against a single paradigm being boxed up and shoved down your throat.  You want to work differently?  Then make your own tools.
I still like the openness of a simple compiler, linker, and make system best.  The tools are individually comprehensible and manageable, especially when the input and output are mostly text files.
I wish they had gone further in developing the multi-phased approach to building software.
Here's a list of the parts I see as being nice to have separated and individually accessible:
A Pre-compile Phase Macro Package
The old C compiler had this built in: you could invoke just the preprocessing step and inspect the expanded output, which helped with buggy macros.  But I think such an intermediate file is actually quite useful to have in the build process as a separate phase.
It is incredibly useful to place things in one place and have other files build from those definitions in different ways. Maybe you have common constants you want to use across php, javascript, C and SQL.  Maybe you have some key core data that would really speed things up if it were hard-coded in each layer of your distributed design.  Keeping things identical can only be done easily if the parts are derived from a single source combined with formatting or other options to create the output.
These pre-compiled output files can themselves become checked-in source that is rebuilt only rarely, when the core definitions change.  Why precompile EVERY time, right?
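The single-source idea above can be sketched with a tiny generator.  The constants and table name here are invented for illustration; the point is that one definition drives identical output for C, JavaScript, and SQL.

```python
# Sketch: derive identical constant definitions for several languages
# from a single source of truth.  The constants are made up.

CONSTANTS = {"MAX_USERS": 500, "TIMEOUT_MS": 3000}

def emit_c(consts):
    """Render the constants as C #define lines."""
    return "\n".join(f"#define {k} {v}" for k, v in consts.items())

def emit_js(consts):
    """Render the constants as JavaScript const declarations."""
    return "\n".join(f"const {k} = {v};" for k, v in consts.items())

def emit_sql(consts):
    """Seed a (hypothetical) settings table with the same values."""
    rows = ", ".join(f"('{k}', {v})" for k, v in consts.items())
    return f"INSERT INTO settings (name, value) VALUES {rows};"

c_header = emit_c(CONSTANTS)
js_module = emit_js(CONSTANTS)
sql_seed = emit_sql(CONSTANTS)
```

Each emitted file can then be checked in and regenerated only when CONSTANTS changes, exactly as described above.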
File Dependency Modules
The real nightmare of makefiles is keeping dependencies straight.  When maintained manually, they create a kind of straitjacket that eventually ties you in knots.  I envision tools that inspect the source tree for how files are related.  It could be as simple as looking for #include statements, but it could get quite sophisticated as well.  Imagine building each function in a separate file with no references to any headers or other modules of your code; the dependency tool simply figures out, based on symbolic analysis of the modules, what files each one depends on.  The tools would be language-specific, but they could also be invoked further down the build process, such as at link time.  Their output would be dependency description files.  Date/time stamps can be used to re-scan only the things that changed, and naturally, these intermediate files can be checked in as well.
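A dependency scanner of the kind described could start as simply as this sketch.  It only handles the quoted local-include case; a real tool would do the symbolic analysis described above.

```python
# Sketch: textual dependency scan of a C source file.
import re

# Matches local includes like: #include "util.h"
# (System includes in <...> are deliberately ignored.)
INCLUDE_RE = re.compile(r'^\s*#\s*include\s*"([^"]+)"', re.MULTILINE)

def scan_dependencies(source_text):
    """Return the local files a C source depends on, by textual scan."""
    return INCLUDE_RE.findall(source_text)

deps = scan_dependencies(
    '#include "util.h"\n'
    '#include <stdio.h>\n'
    '#include "net/socket.h"\n'
)
```

The output of such a scan, written to a dependency description file and stamped with the scan date, is exactly the kind of checked-in intermediate artifact described above.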
File location based on dependencies
I have long thought of this as one of my cooler ideas.  Suppose you had a set of directories titled Layer0, Layer1, ... where Layer0 constitutes all the base OS APIs you have at your disposal - things that you know will always be there and you can't change.  Above that in Layer1 is only code that depends on Layer0 code.  Linking this code need know nothing above Layer1 to work.  Layer2 is just code that depends on Layer1 and lower code, etc.
If you make a change that adds a dependency on an equal or higher layer, the file gets moved to the higher layer automatically, or a warning gets issued and it won't build until you put it in the right place.
Programmers become very aware of how their changes affect the dependencies of a project, and they instinctively know that the lower layers are likely the simpler and faster bits of code.
I would think such a system would have a natural ordering to it that promotes faster and better-factored code.
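The layer rule could be enforced mechanically.  Here is a sketch; the directory names and the dependency map are invented for illustration, and real input would come from the dependency-scanning phase above.

```python
# Sketch: verify that a file in LayerN depends only on strictly lower layers.

def layer_of(path):
    """Extract the layer number from a path like 'Layer2/net.c' -> 2."""
    return int(path.split("/")[0][len("Layer"):])

def violations(dep_map):
    """Return (file, dependency) pairs that break the layering rule."""
    bad = []
    for src, deps in dep_map.items():
        for dep in deps:
            if layer_of(dep) >= layer_of(src):  # equal or higher: not allowed
                bad.append((src, dep))
    return bad

deps = {
    "Layer2/net.c": ["Layer1/buffer.c", "Layer0/syscalls.c"],  # fine
    "Layer1/buffer.c": ["Layer1/alloc.c"],  # same-layer dependency: flagged
}
bad = violations(deps)
```

A build could fail (or auto-move the offending file up a layer) whenever this list is non-empty.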
Compilers to a common AST structure
AST stands for Abstract Syntax Tree, an internal structure that compilers use as an intermediate form of the code as they parse the source.  What if you could output the AST in text form?  You could see how the parser is interpreting your code, and errors would be much better understood if you could see the associated AST.  What is nice is that this intermediate form of the build process could also help editors do smarter editing, and a syntax error would carry so much context that the problem would most likely be clear without anyone needing to craft really great error messages.
What is even more powerful is the ability to write parsers for multiple languages that output the exact same AST format.  Now we have cross-language support without runtime or target machine dependencies!
And, as suggested in other areas of this article, the AST files could also be checked in and the re-parsing of unchanged files skipped in the build process.
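This isn't entirely hypothetical: Python's standard library already exposes its parser's AST as plain text, which gives a feel for what such a checked-in intermediate file would look like.

```python
# Python's ast module renders the parse tree of source code as text.
import ast

source = "x = 1 + 2"
tree = ast.parse(source)

# indent= requires Python 3.9+; omit it on older versions.
text_form = ast.dump(tree, indent=2)
# text_form shows nodes like Assign, BinOp, and Add, making it visible
# exactly how the parser interpreted the source line.
```

Two different front-end languages emitting this same textual node format would give precisely the cross-language leverage described above.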
AST to target object compilers
The next phase takes input from the AST files and builds linkable objects or IL (Intermediate Language) output files.  It's not a long move from AST form to IL or linkable form; it could be a very fast process.  Optimization options belong in this phase, I should think.
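Python again gives a concrete taste of this phase: its built-in compile() accepts the AST from the previous step and produces bytecode, an IL-like form, and the dis module renders that IL as a readable text listing.

```python
# Sketch: AST -> bytecode (an IL-like form), then a textual IL listing.
import ast
import dis
import io

tree = ast.parse("x = 1 + 2")
code = compile(tree, filename="<ast>", mode="exec")  # AST -> bytecode object

buf = io.StringIO()
dis.dis(code, file=buf)
listing = buf.getvalue()  # human-readable listing of the bytecode

# The bytecode is directly executable, like a linkable object would be.
ns = {}
exec(code, ns)
```

Note how fast the AST-to-IL step is once parsing is already done, which is the point made above.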
Run-Time and Link-Time tools
I recall sometimes trying to understand .obj files to help me work out link dependencies.  We just didn't have the tools.  But it seems at this point one could start testing, profiling, and debugging your code before it's even compiled into final form.  Unreachable code could be found and dependencies analyzed here.  These tools could feed back into the file-location-based dependencies by noting where and how many link ends there are.  Graphical models could be constructed to see dependencies across modules, and suggestions made on better ways to pull parts together into modules.
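The unreachable-code idea reduces to graph reachability over the call graph.  A sketch, with an invented call graph in place of real link-time data:

```python
# Sketch: find functions unreachable from the program's entry points.

def reachable(call_graph, roots):
    """Walk the call graph from the given roots; return every function seen."""
    seen, stack = set(), list(roots)
    while stack:
        fn = stack.pop()
        if fn not in seen:
            seen.add(fn)
            stack.extend(call_graph.get(fn, []))
    return seen

# Hypothetical call graph: function -> functions it calls.
graph = {
    "main": ["parse", "run"],
    "run": ["emit"],
    "old_debug_dump": ["emit"],  # nothing calls this
}
live = reachable(graph, ["main"])
dead = set(graph) - live
```

The same walk, annotated with counts of incoming edges, would supply the "where and how many link ends" data mentioned above.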
Final Compilation and post compilation optimization
Another pass can be used to construct stand-alone .exes, .dlls, or IL assemblies.  Here is where extensive configuration options can be used for proper packaging.
I have always hated COM because of the way it forces you to create so many layers of overhead that you might never need.  COM has a great way of abstracting things so you can treat everything the same, but many times it's just not needed.  I have often wished I had the choice of how to connect with other code: use COM with IDispatch interfaces, use COM with core interfaces, use direct DLL export linking, or just paste the code into mine.  These are all legitimate ways to go, and I wish I had a world where I could use all of them together and change them as needed.
Build analysis tools
How long did each phase of the build of parts of a big project take?  How many times did the same module have to be linked in for others to use?  Can I see a dependency map of all parts of the process from start to finish?
Graphical Programming
Finally, the ultimate dream: building software like Legos.  Imagine a nice way to represent code as a hierarchy of ORM models with different-shaped knobs on them to represent interfaces.  You pick up a piece, instantly see a list of other parts it can connect to, and just drag them together graphically to form a new part.
It's all a dream, but these are ideas that I hope might inspire someone to think of a way to do this better.
I'm not a compiler writer, so I can't claim any real expertise here, but it just seems we could do things better.