The Art of Programming

Sunday, May 20, 2007

Some thoughts on MySQL optimization

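First, the BENCHMARK function. It evaluates a scalar expression a given number of times, so you can compare how long different expressions take. The result is always 0; the interesting number is the elapsed time reported by the mysql client.

e.g. SELECT BENCHMARK(1000000, MD5('some text'))

Note that a whole query passed as a quoted string, such as "SELECT COUNT(*) FROM categories", is treated as a string literal and is never executed.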
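Since BENCHMARK only evaluates scalar expressions, one way to time a whole statement is to repeat it from the client side. The sketch below is only an illustration: it uses Python's built-in sqlite3 module as a stand-in for a MySQL connection so that it runs anywhere, and the time_query helper and the sample rows are hypothetical. The same loop should work with any DB-API driver.

```python
import sqlite3
import time


def time_query(conn, sql, runs=1000):
    """Run `sql` `runs` times; return (rows of the last run, elapsed seconds)."""
    cursor = conn.cursor()
    rows = []
    start = time.perf_counter()
    for _ in range(runs):
        cursor.execute(sql)
        rows = cursor.fetchall()
    return rows, time.perf_counter() - start


# Stand-in database (assumption): an in-memory SQLite table named like the
# one in the post. Against MySQL you would pass a real connection instead.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE categories (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO categories (name) VALUES (?)", [("a",), ("b",), ("c",)]
)

rows, elapsed = time_query(conn, "SELECT COUNT(*) FROM categories")
print(rows, f"{elapsed:.4f}s")
```

Keep in mind that this measures the round trip as well as the query itself, so it is better suited to comparing two versions of a query than to absolute numbers.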

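The EXPLAIN keyword, followed by a SELECT query, shows how MySQL plans to execute that query: which tables it reads, which indexes it considers and uses, and roughly how many rows it expects to examine. I have to dig more into this.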

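Analyzing your tables seems to be very important. It does not re-create keys or indexes; it updates the key distribution statistics that the optimizer uses when choosing indexes and join order. Someone reported that it made his queries 1000 times faster, though I have not verified that. To analyze every table in every database:
mysqlcheck -Aa -uroot -p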

Use persistent connections to avoid connection overhead.

Make sure all your foreign keys are indexed.

Make sure your foreign keys match the type and size of the main keys.

Use the smallest data type/data type size possible.

Monday, November 06, 2006

Conquering complexity

Techniques to defeat the biggest problem you have in software development: complexity. Adapted (again) from "Code Complete 2".

divide the system into subsystems at architecture level so you can focus on a smaller amount at a time
design class interfaces so that callers can ignore the inner workings (encapsulation, one of the best tools one can get)
keep abstractions consistent, so you don't have to remember arbitrary details
avoid global data because it adds to the amount of code you have to juggle at one time
avoid deep inheritance. It is intellectually demanding
avoid deep nesting of loops and conditionals, to avoid killing processor time and gray matter
avoid gotos - they break the linearity, making code hard to follow
carefully define your approach to error handling. Otherwise you will end up with a proliferation of arbitrary techniques
do not allow classes to become monster classes
keep the routines short
use clear, self-explanatory variable names
minimize the number of parameters passed to a routine
use conventions, to spare your brain the challenge of remembering arbitrary, accidental differences between different sections of code
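As an illustration of the point about deep nesting, here is a minimal sketch of the same routine written twice: once with nested conditionals and once flattened with early exits. The order dictionary, its keys and the shipping prices are made up for the example.

```python
def shipping_cost_nested(order):
    """Nested version: the happy path is buried three levels deep."""
    if order is not None:
        if order["items"]:
            if order["country"] == "US":
                return 5.0
            else:
                return 15.0
        else:
            raise ValueError("empty order")
    else:
        raise ValueError("no order")


def shipping_cost_flat(order):
    """Flat version: reject the bad cases first, then handle the normal one."""
    if order is None:
        raise ValueError("no order")
    if not order["items"]:
        raise ValueError("empty order")
    return 5.0 if order["country"] == "US" else 15.0
```

Both versions behave the same, but in the flat one each condition can be read and forgotten before moving on to the next line.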

Saturday, October 28, 2006

Reasons to refactor your code

Whenever reading your code, if you stumble upon one of the following cases, it is probably better to stop and refactor that piece of code.

Duplicate code - There is no reason to have duplicate code. Try to respect the DRY principle (Don't Repeat Yourself). As Parnas said best, "Copy and Paste is a design error". Besides, repeating the same code makes coding absolutely boring.
A routine is too long - In OOP, you rarely need a routine longer than one screen. Consider breaking it into multiple routines.
A loop is too long or too deeply nested - Consider refactoring part of the code as routines, or changing the algorithm. Deeply nested loops can also be expensive, because the work multiplies with each level of nesting.
A class has poor cohesion - if a class has unrelated responsibilities, consider changing it.
A parameter list has too many parameters - If you need to pass too many parameters, consider merging them in a cohesive class or rethinking the problem.
Changes in a class tend to be compartmentalized - if changes usually touch either one part of the class or another, but rarely both, this may be a sign that the class should be broken into smaller ones.
Changes require parallel modifications in different classes - this is a sign that they are tied together. Try to cut most of the dependencies. This kind of refactoring can be a real challenge, but it is worth it.
Inheritance hierarchies need parallel changes - This is a special case of the problem above.
Case statements need parallel changes - consider using inheritance with polymorphism instead of case.
Related data items that are used together are not tied into classes - the first time you code/design, you may overlook some classes. Take your time and create them.
A routine uses more features of another class than of its own - probably it should be moved into the other class
A primitive data type is overloaded - For example, you may use int to represent both money and temperature. It is better to create a Money and a Temperature class. By doing so, you will be able to impose custom conditions on the types. Also the compiler will not allow you to mix money with temperatures.
A class doesn't do much - Maybe it should be merged with another.
A chain of routines passes tramp data - if a routine takes some data only to pass it to another, you should probably eliminate it.
A middle man object does nothing - same as above.
One class is very intimate to another - this works against one of your most powerful complexity management tools: encapsulation.
A routine has a poor name - In the best case, you can rename it. In the worst, the problem is the design(see a previous post called "About routines"). The name is just a sign. Anyway, take your time to solve this one.
Data members are public - This is plain wrong. Today, you can use properties in many programming languages, so hiding data behind them is very easy.
A subclass uses a small percentage of the parent class - usually, this denotes wrong inheritance design.
Comments are used to explain difficult code - Comments are very good, but creating difficult-to-understand code and commenting it is plain wrong.
Global variables are used - There are few cases when global variables are the only logical option.
You need setup/cleanup code before/after calling a routine - try to merge this code into the routine.
A program contains code that might be needed someday - The only way to write code taking into account future releases is to write it as clear and obvious as possible, enabling quick understanding and modification.
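To make the "overloaded primitive type" item more concrete, here is a small sketch with separate Money and Temperature types. The field names and the validation rules (no negative amounts, nothing below absolute zero) are assumptions chosen for the example. Python does its type checks at run time rather than at compile time, so mixing the two types raises a TypeError instead of failing to compile.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Money:
    cents: int

    def __post_init__(self):
        # Custom condition (assumed for the example): no negative amounts.
        if self.cents < 0:
            raise ValueError("money cannot be negative")

    def __add__(self, other):
        # Only Money can be added to Money; anything else is rejected.
        if not isinstance(other, Money):
            return NotImplemented
        return Money(self.cents + other.cents)


@dataclass(frozen=True)
class Temperature:
    kelvin: float

    def __post_init__(self):
        # Custom condition (assumed for the example): nothing below 0 K.
        if self.kelvin < 0:
            raise ValueError("temperature below absolute zero")


total = Money(100) + Money(250)
print(total)

try:
    Money(100) + Temperature(293.0)
except TypeError as error:
    print("refused to mix types:", error)
```

With plain integers, both the negative amount and the money-plus-temperature sum would have gone through silently.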

Friday, October 27, 2006

Learn from your bugs

We are humans, so we make errors. There is no perfect software, meaning that there is no perfect programmer. Yet, there are immense differences between us. At one end there are the ones that never release software because it is too buggy to be of real use, at the other end the ones that release almost-perfect software.
If you want to get closer to perfection, the first step is to learn how to learn from your mistakes. Only then will you be able to make progress. Every error is an opportunity to:

Learn about the framework you're using - a bug may appear because you do not understand correctly the underlying technology. So it is an opportunity to improve your knowledge about it
Learn about the kind of mistakes you make - there are patterns in your errors, like repeatedly forgetting to initialize an array, making typos etc. Learn about them, so you can pay more attention to those points when developing
Learn about the quality of your code from a reader's perspective - while debugging, you must read the code. You will observe its readability and the points that need improvement
Learn about how you solve problems - debugging is problem solving. It requires systematic approach, deep thinking etc. You can learn more about the effectiveness of your method, gradually improving it

So, you should be very happy when encountering an error! Not really, but try to treat errors as opportunities rather than problems, because in the long run it makes all the difference in the world.

Thursday, October 26, 2006

Undo closed tab in Firefox 2

Firefox 2 has a very handy shortcut that undoes the closing of a tab: Ctrl+Shift+T. Very neat.

Sunday, October 22, 2006

Formal inspections

Formal inspections are an efficient and easy way to find errors in software. A formal inspection is a meeting where the code is reviewed. It is planned, moderated and must have a concrete follow-up. When doing an inspection, there are some principles to keep in mind:

the purpose of an inspection is finding errors, not correcting them
an inspection is not a personnel evaluation, so it is better to keep management out of this

People that participate in the inspection are assigned roles. Here they are:
Moderator: keeps the inspection running at the required pace, neither so slow that attention drifts nor so fast that errors are missed. Must be technically competent, not necessarily an expert, but must understand the important details. He organizes the meeting by providing checklists for the others, setting the date, preparing the work environment etc. Also, he must make sure that there is action following the inspection. He is not directly involved in the inspection; instead he makes sure that it runs as planned.
Author: The author of the software. In case the reviewers are not familiar with the project, he holds an introductory session, providing general knowledge of it. Besides that, he has the duty to explain parts of the code that are difficult to understand when asked and to explain things that are treated as errors and are actually acceptable.
Reviewer: The one that finds the errors. Must prepare beforehand by reading the materials that are supplied by the moderator. Must keep the focus on error finding, not error repairing.
Scribe: The person that records all the errors that are found.

It is recommended to never have fewer than 3 people in an inspection (at least the roles of reviewer, author and moderator should be played by different people). More than 6 people is a bad idea, because the group becomes hard to manage. Also, I repeat that management should stay out of this. The presence of a manager will make it seem like an evaluation with consequences for the author. Also, since this is a technical activity, the presence of a manager will add little if any value to the inspection.

The inspection has some predefined steps.

Planning - the author gives the code to the moderator. The moderator must select the reviewers and provide them with information that keeps them focused on the important aspects.
Overview - when the reviewers are not familiar with the project, the author can give an overview of the system. Where possible, though, this step is better left out, because the overview creates a mindset and this mindset might hide errors. The code and design should speak for themselves and need no introduction.
Preparation - The reviewers take some time alone to read the materials handed over by the moderator. They also have a look at the code, to get an initial idea about what comes next. They should be assigned perspectives, e.g. one should keep an eye on security, another on data validation etc.
Meeting - The actual meeting. Its rhythm is crucial. A slow meeting will decrease concentration; a fast one will let errors slip away. Even if there is no widely accepted inspection rhythm, 150-200 non-blank, non-comment lines of code per hour is probably a good starting point. Yet it greatly depends on complexity, design goals, experience and many other factors.
Report - In less than 24 hours, the moderator creates a report with all the discovered errors and their importance, making it public. That ensures that there will be no forgotten errors.
Rework - The initial author or another programmer is assigned to fix the discovered errors.
Follow-up - The moderator must decide whether another inspection is needed to re-check the program. He should also look at the previous inspections and try to find patterns in the errors and the causes behind them.
Third-hour meeting - Even if solutions are not to be discussed in the inspection, some might feel the need to talk about them. In this case, the moderator should organize an informal meeting for this purpose.
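For the meeting step, the 150-200 lines per hour figure can be turned into a rough estimate of how long a meeting will take. The sketch below counts non-blank, non-comment lines and divides by the rate. It is a simplification: the helper names are made up, only full-line comments starting with a given prefix are recognized, and the sample source is invented.

```python
def countable_lines(source, comment_prefix="#"):
    """Count lines that are neither blank nor full-line comments."""
    count = 0
    for line in source.splitlines():
        stripped = line.strip()
        if stripped and not stripped.startswith(comment_prefix):
            count += 1
    return count


def meeting_hours(lines, rate_per_hour):
    """Estimated meeting length at a given inspection rate."""
    return lines / rate_per_hour


SAMPLE = """# compute the total
total = 0

for price in prices:
    total += price
print(total)
"""

lines = countable_lines(SAMPLE)
print(lines, "countable lines")
# A 300-line routine at the two ends of the suggested range:
print(meeting_hours(300, 200), "to", meeting_hours(300, 150), "hours")
```

If the estimate comes out much longer than a couple of hours, it is probably a hint to split the material across several inspections.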

The main objective of an inspection is to find errors. The author should not feel threatened. So, the reviewers must be trained in making good comments ("I've never seen something so stupid" is not a good comment). Their job is to find errors, not to evaluate the programmer. They should also not suggest solutions, respecting the author's right to do this.
Mixing evaluation with formal inspection is a very bad idea. The author will try to hide the errors and he will probably succeed, undermining the inspection’s primary objective.
Skipping steps or mixing roles will greatly affect quality. If you cannot measure the effect of a change, you should not do it. The formal inspection is a qualitative process, so decreasing its quality will render it useless.

Saturday, October 21, 2006

Measuring routine complexity McCabe's way

A well-known technique for measuring a routine's complexity was proposed by Tom McCabe. It works by counting the "decision points". The technique is described below:

Start with 1 for the straight path through the routine.
Add 1 for each of the following keywords, or their equivalents: if, while, repeat, for, and, or.
Add 1 for each case in a case statement

If the score is:

0-5: the routine is probably fine
6-10: start to think how to simplify the routine
>10: break the routine into smaller ones

That being said, I personally think this method is far from perfect. Yet it is easy to understand and apply. Also, studies show an improvement in code quality when it is used. Wouldn't it be nice to have it integrated in your IDE?
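The counting rules are simple enough to automate. Below is a rough sketch for Python source, built on the standard ast module. It counts if, while and for (treating for as the equivalent of the loop keywords), plus one for each and/or operator. Conditional expressions, comprehensions and match statements are ignored to keep it short, so treat the result as an approximation. The function names and the sample routine are invented for the example.

```python
import ast


def decision_points(source):
    """Count decision points in Python source, starting from 1."""
    score = 1  # the straight path through the routine
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.If, ast.While, ast.For)):
            score += 1
        elif isinstance(node, ast.BoolOp):
            # "a and b and c" is one BoolOp with three values: two operators
            score += len(node.values) - 1
    return score


def verdict(score):
    """Map a score to the thresholds listed above."""
    if score <= 5:
        return "probably fine"
    if score <= 10:
        return "think about simplifying"
    return "break into smaller routines"


SAMPLE = '''
def classify(n, strict):
    if n < 0 and strict:
        return "invalid"
    while n > 10:
        n -= 10
    if n == 0 or n == 1:
        return "small"
    return "other"
'''

score = decision_points(SAMPLE)
print(score, "->", verdict(score))
```

The sample routine scores 6: 1 for the straight path, 2 for the two ifs, 1 for the while, and 1 each for the and and the or.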