Oracle, index-organized tables, and overflow tablespaces

I’ve been rewriting a utility that moves certain tables and their indexes from their old tablespaces (many, many old tablespaces) to a new one. This week I ran into a new (to me) error message:

SQL Error: ORA-25191: cannot reference overflow table of an index-organized table

My utility wasn’t providing enough special handling for index-organized tables (IOTs), which were throwing up this error (and a few others). So what’s an index-organized table, and why are they so difficult to work with?

A refresher: in a standard table, rows are stored in no particular order; Oracle calls these heap-organized tables. For a given SQL query, unless an ORDER BY clause is used, results can be returned in any order. Oracle also has IOTs, which are tables stored in a B*Tree index structure instead. This structure stores all columns of the table, sorted by primary key; because the index is the table, looking up rows by primary key is faster and less storage space is needed than for a heap-organized table with a separate primary key index.

There are, however, exceptions to the rule that everything is stored in one place. If some columns are rarely used, they can be moved into an overflow area (which could be in a different tablespace) to speed up access to the main segment. Additionally, any columns that cannot fit into a single block must go into the overflow area.

When an IOT is created using an overflow segment, Oracle will automatically create a second table with a name like SYS_IOT_OVER_XXXXX. Trying to use this table directly results in the above error.
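The data dictionary makes these easy to spot; a query along these lines (a sketch – use ALL_TABLES instead if you don’t have access to the DBA views) lists each overflow table and the IOT it belongs to:

select owner, table_name, iot_name, tablespace_name
from dba_tables
where iot_type = 'IOT_OVERFLOW';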

If I query dba_tables, I see that SYS_IOT_OVER_XXXXX has an IOT_TYPE of IOT_OVERFLOW (confirming that it’s an overflow table, as expected) and the IOT_NAME column gives me the name of the original table that this is holding overflow from. I can then resolve the issue by moving the IOT, specifying both the regular and overflow tablespaces:

alter table SCHEMA."TABLE_NAME" move tablespace TABLESPACE_NAME overflow tablespace OVERFLOW_TABLESPACE_NAME;

Now Oracle moves the overflow segment along with the index-organized table, and everyone is happy.

Comma placement in SQL scripts

It’s a truism that old habits are hard to break. Turn down the same street every day, and you’d better pay attention if you want to actually make it past that intersection without turning! One habit I feel that I need to change is my comma placement in a select statement.

Today I’ve been working on debugging a new view I’m writing to simplify some Oracle code. I’ve taken some existing code and removed the columns I don’t need, leading to something like this:

SELECT owner,
segment_name,
segment_type,
partition_name,
tablespace_name,
FROM

Running the script, I get this error:

SQL Error: ORA-00936: missing expression

This is actually more helpful than many Oracle error messages. If you look back at the select statement, you’ll see that when I deleted the final columns, I neglected to remove the comma after tablespace_name, so Oracle is looking for another column and failing to find it. If the statement had been written like this:

SELECT owner
,segment_name
,segment_type
,partition_name
,tablespace_name
FROM

then I wouldn’t be able to make that mistake. The nice thing about this format is that no line depends on the next line, and if you add another column to the end the previous line doesn’t need to be modified. I think I have a hard time getting myself to use this format because it’s not good English – but it does seem like good programming practice.

Building a Pugly-Linked List

Meet Janie. Janie has an ever-changing menagerie of pugs, and would like to use a data structure to keep track of them. What data structure would be most suitable?

[Image: Sad pug, by Wikimedia user DodosD; reused under a Creative Commons license.]

Janie could keep her pugs in an array, but this isn’t a good choice for several reasons. Arrays are best for collections of constant size, while the size of her menagerie varies. Further, keeping the pugs in any meaningful order will require re-sorting the array as pugs enter and leave, and if we don’t keep track of which index holds which pug, we lose the array’s main benefit: finding a given pug in O(1) time by indexing into it.

Queues and stacks don’t have the constant-size problem – we expect them to grow and shrink. However, they’re unsuitable because they dictate the order in which pugs must be removed after they’re added; the oldest pug may often be the first to come off the queue, but not always, and we certainly hope the newest pug won’t be popped off the stack right away.

Consider a linked list, however. A linked list is a data structure in which we have some number of nodes, each of which points to the next node on the list; by maintaining a pointer to the head of the list, we can obtain the rest of the objects by following the next pointers. In a doubly-linked list, each node contains a pointer to the next node and a pointer to the previous node.

We’ll modify this slightly to get a pugly-linked list. The head of each pug (its head pointer) will point towards the previous pug in the list, and the tail of each pug (its tail pointer) will point to the next pug in the list. (Normally, we’d keep one head and one tail pointer, pointing to the first and last elements of the doubly-linked list, while each node would have previous and next pointers…but we’ll abuse the terminology a bit to make it pug-specific). If the tail pointers are curly this will, naturally, give us a circular list.

If we need to find a specific pug, we can iterate through the list, which is the same thing we’ll need to do with an unsorted array – but we won’t have any empty cells from missing pugs. We just look at the first pug and follow the tail pointers until we find the pug we want. We can add pugs in constant time – just set the new pug’s head to point in the same direction as the first pug’s head, set its tail to point at the first pug, and set the first pug’s head to point to the new pug, which has now become the first pug in this pugly-linked list. We can remove pugs (provided we have a pointer to them) in constant time as well, by updating the next and previous pugs to point at each other rather than the pug being removed.
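For the curious, here’s a minimal C# sketch of the idea (the Pug and PuglyLinkedList names, and the Head/Tail fields, are just my own illustration, not any standard library type):

class Pug
{
    public string Name;
    public Pug Head; // points to the previous pug (toward the front of the list)
    public Pug Tail; // points to the next pug (toward the back of the list)

    public Pug(string name) { Name = name; }
}

class PuglyLinkedList
{
    private Pug first;

    // O(1): the new pug becomes the first pug.
    public void Add(Pug pug)
    {
        pug.Head = first?.Head;              // point the same way the old first pug's head did
        pug.Tail = first;                    // tail points at the old first pug
        if (first != null) first.Head = pug; // old first pug's head now points at the new pug
        first = pug;
    }

    // O(1), given a reference to the pug being removed.
    public void Remove(Pug pug)
    {
        if (pug.Head != null) pug.Head.Tail = pug.Tail;
        else first = pug.Tail;               // we removed the first pug
        if (pug.Tail != null) pug.Tail.Head = pug.Head;
        pug.Head = pug.Tail = null;
    }

    // O(n): start at the first pug and follow the tail pointers.
    public Pug Find(string name)
    {
        for (var p = first; p != null; p = p.Tail)
            if (p.Name == name) return p;
        return null;
    }
}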

And this is how Janie can keep track of all her pets with a pugly-linked list.

Interfaces in C#

Consider test-driving a new car, of a make and model you’re not familiar with. It’s unlikely you’ll need any instruction in how to do it: rightmost pedal makes the car go, pedal next to it makes the car stop. Even though there are many different cars, they have a reasonably consistent interface that allows most people to drive most cars.

Subclassing Car

Suppose we want to implement this in code. Let’s say I want to use a Camry object. A Camry “is a” car, so it makes sense that the Camry class inherits (possibly indirectly) from the Car class. We would never actually instantiate a generic “car” object, so Car can be an abstract class.

What methods would Car have? Two obvious ones could be ApplyGas and ApplyBrake. Maybe they’re implemented something like this:

[Image: Car class]
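Roughly speaking, something like this (a sketch – the comments stand in for the hardware-level code):

public abstract class Car
{
    public void ApplyGas()
    {
        // physically allow more gas into the engine
    }

    public void ApplyBrake()
    {
        // physically apply the brakes
    }
}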

In this case, the comments are standing in for some code that physically allows more gas into the engine or applies the brakes. This works fine… as long as every car works exactly the same way. But what happens if we have, for example, a Prius? Now stomping on the gas pedal may or may not actually be sending more gas to the engine, which means that we need to override the behavior in the derived class. More generally, for each type of car that derives from the Car class and uses different hardware, we’ll need to rewrite the methods for that car’s hardware – and if we forget, and the method tries to use hardware that isn’t there, then things can go horribly, horribly wrong.

Ok, so we’ll make ApplyGas an abstract method; now each derived class will be forced to implement it appropriately. The same will go for StartCar (do we need a key?), ChangeGear (which gears are available?), and many of the other methods, but the Car class is accomplishing our goal of ensuring that each derived class has a consistent interface.

However, we should step back and ask – what are we gaining here? We’re not sharing implementations; we simply want a consistent interface. So…maybe we should consider using an interface?

Defining an Interface

While inheritance allows us to share behavior between related objects, an interface is simply a contract that defines certain methods each class implementing that interface promises to provide. As a driver, I don’t care how each method is implemented; I simply want to know that, given an instantiation of any class which implements ICar, stomping on the gas will make me go faster and using the brake will make me slow down. I want to learn the interface, and from that, be able to handle any new car that might be released.

Let’s define a simple interface for car:

[Image: Car interface]
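Something along these lines (the exact member names here are my own guesses, echoing the methods discussed below):

public interface ICar
{
    void StartCar();
    void StopCar();
    void ApplyGas();
    void ApplyBrake();
    int GetCurrentSpeed();
    void Turn(int degrees);
}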

Here we’ve defined various methods that we’d expect to apply to any car. There’s a way to turn it on and off, which could be by means of a key (most vehicles), by the location of the key (Tesla), or some other method entirely – we don’t know what it is, just that there is one. There’s a way to accelerate and a way to slow down. We can get the current speed and we can change direction. As is standard practice, we’ve named the interface starting with a capital I.

Notice two things that none of the methods have: an access modifier and an implementation. Since an interface is just a contract, all methods are public by default – an implementing class can’t agree to implement the methods if it doesn’t know what they are! Similarly, the interface doesn’t define how any method is to be implemented, just that they must be.

Implementing the Interface

Suppose we go ahead and create a class that implements ICar:

[Image: Car class without implementation]
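In code, the claim looks something like this (a sketch):

class Program : ICar
{
    static void Main(string[] args)
    {
        // Nothing from ICar is implemented yet, so the compiler complains
        // that Program does not implement the interface members.
    }
}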

Here we’ve made the claim that this class (unhelpfully named Program – we should really choose a better name) implements ICar, but we haven’t actually provided any method definitions, which gives us the red squigglies. Hovering over ICar shows all the methods that we’re required to implement before the program will compile. This is one of the advantages of using interfaces – we can’t possibly forget to provide a custom implementation for one of the methods and end up using the default implementation without meaning to.

If we hit control-., Visual Studio will offer to implement the interface for us, either implicitly or explicitly.

[Image: Car class implement interface]

If we choose to implement the interface implicitly, then each of the required methods will be created; since all interface methods are public, that access modifier is automatically assigned. If we implement the interface explicitly, then instead of the access modifier the method is explicitly called out as being the implementation of an interface. This is important because it’s possible that our class implements several interfaces which have methods with the same signature that need different implementations; declaring which interface a method belongs to allows us to have several methods with the same name and call the appropriate method for the interface we’re currently using.

[Image: Car class notimplementedexception]
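Either way, the generated stubs just throw NotImplementedException until we fill them in. With the implicit option, the result looks roughly like this (a sketch based on the ICar above; with the explicit option each signature would instead read, for example, void ICar.ApplyGas() and have no access modifier):

using System;

class Program : ICar
{
    // One public stub per interface member, so the compile error goes away.
    public void StartCar() { throw new NotImplementedException(); }
    public void StopCar() { throw new NotImplementedException(); }
    public void ApplyGas() { throw new NotImplementedException(); }
    public void ApplyBrake() { throw new NotImplementedException(); }
    public int GetCurrentSpeed() { throw new NotImplementedException(); }
    public void Turn(int degrees) { throw new NotImplementedException(); }

    static void Main(string[] args) { }
}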

Multiple Interfaces

Let’s look at an example. Here I’ve added an interface IBoat which also has a GetCurrentSpeed method, and updated that method in ICar so that both methods return strings. Since I’ve implemented ICar implicitly, IBoat must be implemented explicitly (and Visual Studio will do this no matter what option I choose when I tell it to implement the new interface). I’ve renamed Program to something that makes more sense (Vehicle) and written a Main() routine that creates two vehicle objects, one for each of the two interfaces. When I run the program, each call to GetCurrentSpeed will automatically find the correct method for its interface, and in fact the boat object cannot even see the methods which are specific to the car interface.

[Image: Vehicle class multiple interfaces]
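A condensed sketch of the same thing (ICar trimmed to two members to keep it short; the speed strings are made up):

using System;

interface ICar
{
    string GetCurrentSpeed();
    void ApplyGas();
}

interface IBoat
{
    string GetCurrentSpeed();
}

class Vehicle : ICar, IBoat
{
    // Implicit implementation: used when the object is treated as a Vehicle or an ICar.
    public string GetCurrentSpeed() { return "35 mph"; }

    public void ApplyGas() { /* open the throttle */ }

    // Explicit implementation: used only when the object is treated as an IBoat.
    string IBoat.GetCurrentSpeed() { return "12 knots"; }
}

class Program
{
    static void Main(string[] args)
    {
        ICar car = new Vehicle();
        IBoat boat = new Vehicle();

        Console.WriteLine(car.GetCurrentSpeed());   // 35 mph
        Console.WriteLine(boat.GetCurrentSpeed());  // 12 knots
        // boat.ApplyGas();  // won't compile: IBoat knows nothing about ApplyGas
    }
}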

Programming to the Interface

You may have heard that you should program to the interface, not to the implementation. This means that if we know what interface an object implements, we don’t need to know or care what the object actually is. In our example above, we know that car implements ICar, but we don’t actually need to care that it instantiates Vehicle (as opposed to ToyotaCamry, FordTaurus, etc). If in the future we decide to change the implementation of car to be TeslaModelS instead of Vehicle, we only have to change one line of code. Because car is defined as an ICar, we can only call methods defined in ICar (or inherited from Object, since whatever implements ICar is still an object) on it, and those methods are guaranteed to exist in every class that implements ICar.

Thus, although interfaces can be used as a type of multiple inheritance, their real value is in allowing components to be loosely coupled: when one part of the program changes, it becomes less likely that this will result in other parts of the program needing to change, which means less work and (hopefully) fewer errors. By restricting what we’re allowed to do (in this case, restricting objects to using only methods required by the interface), we actually make those objects more usable.

When not to optimize

Software development has some consistent goals. We want the software to be as fast as possible, as small as possible, to work as well as possible, and to be done as soon as possible. Naturally there tend to be trade-offs between these goals: the fastest algorithm may require more storage space, the algorithm that requires the least space may be complicated and prone to errors, extensive testing may delay the release. We decide which goals take priority and what values are acceptable for each. But how to choose?

In some cases, optimizing for speed and size makes a lot of sense. If a loop will execute ten thousand times, a one-millisecond delay in each loop is likely to be unacceptable. If there are ten million items to be stored, requiring an extra byte for each item can make a difference, even with today’s storage allocations. Even when not under tight constraints, all things being equal, we’d like to optimize our performance.

In some cases, though, it’s better not to optimize. I’m not talking about avoiding premature optimization – I’m talking about not optimizing at all. For example, suppose you have an object which will be instantiated a few dozen times (with each instantiation being stored on disk) and there are two ways you can code it. One will require an additional string value, the other will simply perform some calculations (perhaps a dozen lines of code) each time the object is loaded. The time to do this calculation will be unnoticeable. Should the value be included or not?

At first glance we might say, sure, the calculation time is insignificant so we might as well do it. On the other hand, since the object will only be instantiated a few dozen times, the extra space requirements are also insignificant; it simply makes no difference to the performance whether we store the extra item or not.

Does that mean it doesn’t matter which decision we make? In this case… no. We also have to factor in the complexity of the calculation: not to the computer, but to the programmer. Adding extra code makes it that much more likely that a bug will slip in someplace; given two options where neither has a performance benefit, we should opt for the simpler one. In this case, then, I would choose to use the extra item, not to save the calculation time for the computer, but to save the mental energy of the programmer by making the code that much simpler.

Delegation, flexibility, and software development

Something that occurred to me yesterday is how much less structured my work week is than it used to be.

When I was a new developer, my TL (team lead) liked to have everyone maintain a spreadsheet showing everything that we were currently working on, with an estimate of how many hours we were planning to put into each item for the coming week. I was never convinced of the ability of the average developer (or, more specifically, of myself) to accurately estimate how long things would take, but it at least showed how much time would be taken up by meetings, vacation, etc, and how much work everyone had on their plate.

Over time, I stopped doing the spreadsheet, and my workplan meetings largely became me saying “I’m currently working on this, this, and this” and my TL saying “ok, sounds good” or “I need you to prioritize this other thing.” This worked for me – I still had a good idea of what I was going to be working on, but I wasn’t spending time typing up estimates of how long each change would take.

These days, that’s still how my workplan meetings go, but two things have changed. One is that I now bring written notes, because even though I don’t feel the need to do the formal plans we used to use, I have too many things going on to remember everything I want to talk about without them. The other is that my plans for the week are much more open-ended: rather than having six fixes I plan to do, I’ll have two or three things at the top of my priority list and other stuff hanging around for “when I get to it.”

Largely, this is because I’ve taken on more responsibility: a lot of my time goes to either dealing with important issues that come in and get routed to me that I wouldn’t know about in advance, or helping newer members of the team. As a result, it’s not unusual that only a minority of my time is actually devoted to what I’d planned to spend the week working on. The other reason is just that I tend to work on larger projects now, which take several weeks or longer to complete, with smaller development thrown in when I need a break; this means it’s rare that I have a half-dozen fixes on my “will absolutely do this week” list.

At the same time, while I’m not directly responsible for any other people on the team, I am in charge of a project that several people are working on, which means I’m setting priorities and making decisions for multiple developers. My impulse so far has pretty much been to say "here are the team standards and the company standards; I expect you to follow these. Otherwise, work how you like and let me know when you need more." If I can agree on a design with one of my developers and then not hear from him again until he’s finished coding three weeks later, I’m fine with that; I have no interest in knowing how many hours he’s spending on a given part of his activity each week.

My own feeling is that if you give someone work to do and let them figure out how to get it done (assuming that person is reasonably competent and professional), with the occasional check-in, work will get done more effectively and people will be happier than if you try to micro-manage what’s being done when. At least, that’s always been my experience – if I have multiple things I need to get done, I’ll be more productive if I can just choose to work on whichever item I happen to most feel like working on at the time, not what the schedule says I should be doing. The nice thing about being a software developer is that this is quite possible; as long as I’m hitting my deadlines, how I’m managing my time doesn’t need to concern anyone else.

But it’s still a good thing the computer keeps track of everything I need to do without my having to do anything. My memory isn’t that good…

Checklists and Software Development

In 2011, Atul Gawande published The Checklist Manifesto: How to Get Things Right. I first became aware of it when I had the opportunity to hear him speak during his book tour for his latest book, Being Mortal, at the end of 2014, but only got around to reading it this week. The book is about managing complexity in the medical field, but can also be applied to software development.

[Image: Checklist Manifesto book cover]

The central thesis of the book is that modern knowledge has made the world too complex for humans to handle unaided. Take any complex profession – medicine, architecture, etc – and you find specialization and sub-specialization: there’s simply too much expertise required for anyone to be a generalist. Even within the sub-specialties, there’s so much to keep track of that it’s easy to miss a step, and that one missed step can result in a collapsed building, crashed airplane, or dead patient. We have finally advanced enough that "errors of ignorance" – mistakes because of what we don’t know – are often less important than "errors of ineptitude" – mistakes from not making proper use of what we know.

The solution is the simple checklist: a written list of the steps to take each time an action (whether it’s landing a plane or checking a patient) is performed. Interestingly enough, the checklists aren’t useful only for those who are new to an area, but even for experts with many years of training and experience who may nonetheless overlook a minor, routine step under stress.

So how can we apply this to software development? I personally use a checklist (in the form of a spreadsheet) that I compare against every new activity my team creates. The list contains around a hundred items to check, from making sure that CSS files are linked in the correct order to ensuring that our naming conventions are followed and strings are properly internationalized. It’s a lot of minor details, the vast majority of which we can expect to get right on any given day without the aid of the checklist, but I find that I usually catch a few things in each new activity. Unlike the steps on a doctor’s checklist, these aren’t going to kill anybody if they’re missed – most of them would simply result in a slight inconvenience to us (the developers) in the future – but running through the checklist before anyone else sees the software lets us fix the issues while they’re still cheap to fix, and makes the software look more polished to testers and end users.

Of course, we can also look at non-technical checklists: does the code do what it’s supposed to do, regardless of the implementation? To some extent this is what we’re doing when we write unit tests: we define what each unit should do, and then check that it actually does what we expect. I find that one of the best ways to avoid getting a lot of change requests is to have a checklist of exactly how the finished activity should work, and verify that it meets every one of the requirements (many of which might be too simple to remember when coding) before marking the development as complete.

What processes do you use for finding bugs in code? Can they become a checklist?

Error! Error!

One of my least favorite errors is ORA-01775: looping chain of synonyms. This often means that you’ve referred to something that can’t be found for one reason or another – a table doesn’t exist, a table is owned by a different user, there actually is a synonym loop…

The nice thing about Oracle error messages is they all come with descriptions to tell you what’s going on and numbers so you can look up more information. The bad thing is that the messages aren’t necessarily all that helpful, and the error may be too general to let you know exactly what’s going on, as in the case above.

[Image: an error dialog that says only "The action could not be completed."]
Ok, what do I do next?

In 2013, I spent a good chunk of time going through the software I work on and looking for any error or warning messages that were not helpful. We used to have one internal error message that really meant you needed to have the Oracle Client Tools installed on your machine, but that wasn’t what it actually said, and we got questions about it pretty regularly. While those questions only took five seconds to respond to, it would have been a lot better for the message to actually be meaningful to the person experiencing the problem! Thankfully, we got those cleaned up so that now our error messages actually tell you what you need to do to resolve them.

This applies to help text as well; much of our help text has been rewritten to be actually, y’know, helpful. By definition, help text will be used by someone who’s not an expert in the software or problem domain, so it needs to be very clear about what’s expected of the user.

While this isn’t directly related to coding, it’s still something programmers should consider as part of usability. Someone has to use your software; part of your job is to make sure they can do so effectively. Include technical details if it’ll be helpful in debugging, but remember that every error, warning, or help box should make sense to the end user of the software. Give that person clear direction so that he or she knows what to do next, and doesn’t attempt to defenestrate the computer.

Never mind the looping chain of synonyms; just tell me you can’t find the darned table!

Preparing for Microsoft exams: how and why

The value of certification is a constant hot topic for many people; a search for certifications on Stack Overflow alone returns over 27,000 results, and adding the exact phrase "worth it" still returns almost 800. I’ve never been responsible for hiring people or been asked about certifications, so I have absolutely no comment to make about whether they’re helpful for finding a job.

Where I find certifications helpful is in setting a deadline for myself. Starting in 2011, I occasionally needed to write SQL code and wanted to learn more about SQL, but I never did because it was never a priority. A few years ago, I signed up to take exam 70-461: Querying Microsoft SQL Server 2012; because I was paying for it (my company will reimburse exams only if you pass), I finally had the incentive I needed to sit down and learn the material. I never bothered with the other exams in the certificate, because the certificate wasn’t what was important to me; what I needed was to learn the material that was covered in that first exam.

So that’s my primary motivation for looking at certifications: as a way to encourage myself to study. After five years as a professional programmer, I feel like I still have a ton to learn, so I’m in favor of anything that helps me to focus.

Right now I’m looking at Microsoft’s exam 70-483: Programming in C#. Looking over the list of topics covered, I see some stuff I know reasonably well, and some stuff that I’ve never learned because it’s never been something I’ve needed to use. Here’s the key part, though: it all looks like stuff I would like to know, that could very well be useful to me at some point. So this is actually stuff I’m interested in learning; the test just helps me find the areas I need to know more about.

For example, I see that one of the topics covered in the exam is performing symmetric and asymmetric encryption:

Choose an appropriate encryption algorithm; manage and create certificates; implement key management; implement the System.Security namespace; hashing data; encrypt streams

The theory behind this is all stuff I studied in school, but I’ve never had any reason to use it in practice. As it happens, Pluralsight has a two-hour course called Introduction to Cryptography in .NET, which I expect will go into way more detail than the exam will. This is what I mean about focus: Pluralsight has just a ton of content that I find interesting, so having a list of topics that I need to cover helps me decide which courses to take first.

When I took the SQL exam, one thing that helped me relax was having the second shot offer, which lets you retake an exam for free; even though I didn’t need it, it was helpful to know that if I somehow screwed up and failed the exam it wasn’t going to automatically cost me $150. The most recent second shot offer expired a few weeks ago, so my plan is probably going to be to start reviewing this month and be ready to schedule the exam next time the offer comes around again. Or if it’s not available by this summer, I’ll probably take the exam then anyway.

In areas as diverse as finishing my PhD and learning to write a SQL query, setting a deadline is what helped me get it done, and that’s why I plan to pursue certifications.

Clarity in programming and the conditional operator

The interesting thing about the conditional operator (?:) is that hardly anybody knows what it’s called (it’s often just referred to as the ternary operator), but everyone seems to have an opinion on whether it should be used.

The conditional operator is shorthand for an if/else statement; rather than writing

if (condition) {expression 1}
else {expression 2}

we can just write

condition ? expression 1 : expression 2

[Image: coworkers arguing]

I’ve had one code reviewer who was extremely against ever using this, to the point where I replaced the one I was using with an if/else statement (at the expense of clarity) just to end the argument (knowing that I was unlikely to work with him again anyway). The conditional operator can certainly be misused; consider something like the following:

condition1 ? expr1 : condition2 ? expr2 : condition3 ? expr3 : expr4

That’s a bad use of the conditional operator, because the reader has to stop and think through it to understand what’s going on. The equivalent if/else block, on the other hand, is longer but much easier to read:

if (condition1) { expr1 }
else if (condition2) { expr2 }
else if (condition3) { expr3 }
else { expr4 }

Understanding the second block is trivial. Side note: I would normally prefer to have each expression on its own line, but that’s a whole different argument and I’m condensing it a bit for this post. Also, I’m using C# here, in which the conditional operator is right-associative; that’s not true in all languages.

On the other hand, if you have a block that looks like this:

if (var1 % 2 == 0) { var2 = Parity.Even; }
else { var2 = Parity.Odd; }

Then the conditional operator version is just as easy to read, if not easier:

var2 = (var1 % 2 == 0) ? Parity.Even : Parity.Odd;

This version also has the advantage of making it obvious that, regardless of the outcome of the test, var2 is being set. There’s no way to accidentally set var2 in one case and var3 in the other, which removes one small possibility for bugs. It emphasizes the operation being done, while the if/else version emphasizes the condition.

Long story short: use the conditional operator when it improves the readability of the code; don’t use it when it doesn’t. Here are some more bad examples:

a = expression ? true : false;

a = b != null ? b : null;

a = (b != null ? b : c); // use a = b ?? c;

Again, this just comes down to a simple rule: given two or more ways to code something, tend towards whichever method requires the least amount of concentration from the reader. The sanity you save could be your own.