Wednesday, November 11, 2009

Branch and Bound, Structured Grids, and Graphical Models

It is nice to see some discussion of graphical processing, and it makes sense then that what we would see is a whole-picture processing by effect on the image. Also, what's with that particular image being some sort of academic standard. It seems... sleazy.

Anyway. I find it strange that the Structural Grid pattern (perfect for that sort of application as is indicated in the paper) starts out presenting itself as though it is going to be a discussion of a multidimensional version of the multicore array we discussed earlier and then begins to try to phrase itself as some sort of master conductor unit when it is clearly exactly a tool to help apply basic operations over multiple dimensions quickly. Multiple threads are of course going to be needed for that to work, and the data structure should be the one to dispatch but I kept reading the phrasing as though it was overstating what was actually being accomplished. The discussion on the ghost nodes would have benefited from a little more depth and a few more examples. Other than that a good description of what was necessary.

I liked the amount of coverage of the different approach styles in the Branch & Bound paper and the description of the pros and cons of each. I also appreciated a concrete example of when the multiqueue structure would be used to gain efficiency, and the fact that the queue technique in the previous paper was used directly. It seems to me that using the branch and bound technique to drive statistical models (like the problems Monte-Carlo is being used for in most examples) would be a viable route as we can limit the total number of runs using this structure and then collapse the results to find the actual best and the conditions under which it occurs.

Shared Queue, Speculation, Digital Circuits

Shared Queue:
I suppose this is a sign that I haven't really been thinking things through, because we've had several patterns discussed already that talked about using a queue to create multiple tasks or threads with the option to use task stealing. And the idea of concurrency problems happening at the dequeue and enqueue operations scares me to death. Multiple counts of the same tasks being created, multiples getting added to the queue, and the list goes on. That we have waited until now to discuss this issue is worrisome, but what's scarier is that those papers we read before seemed to be willing to sweep these possibilities under the rug. And the deep problems at the task generation and retrieval stage can cause such extreme problems with data and process integrity.

Speculation:
It's a strange but interesting idea. I'm much more used to stages of programs flowing into each other in a way of tight dependency. I can see the possible uses for this showing up as a part of setup of methods and difficulty during other stages of execution. Additionally, decomposing the various tasks of interest into the FSMs for easy charting is going to be complex. The parsing example is a very good one, one where multiple stages can be broken cleanly into non-dependent parts, but getting another example showing where it could be useful would help easy my mind. Also, and this is probably just my lack of understanding, but I'm still not clear on the recovery process when a decomposed thing doesn't execute correctly. It's nice that this has been considered and covered I'm still confused as to how it's actually going to do its job.

Circuits:
Strange that this paper is almost 90% examples with very little definition. While I understand that there are only so many bitwise operations, seeing the examples of each with a clear explanation would be nice because I've only got a vague understanding of this stuff because it has been a while since I've last looked at it. Other than that, it seems sensible, though I personally haven't really worked with bitmasks there are very few tools to do so easily, and so doing CRCs on large disc images cannot be an easy task.

Tuesday, November 10, 2009

PRES

I think the funniest moment in this entire paper was the accurate yet obvious statement that if it takes a large amount of time to locate a bug in a program it will take a longer time to fix the bug. I understand the direction that this statement is heading towards: that concurrency bugs can be maddening to locate, and if it takes years to find them, it may take years just to correctly reproduce the conditions they occur under, and therefore it will take even more time to figure out how to protect against that condition. It is a very important point, as these concurrency issues are extremely important to solve before concurrency becomes standard.

What I find interesting is the approach of a lite-data retention in the first passes to build an outline of where the problems lie. I remember a previous paper where we were using technical abilities to predict areas of conflict in concurrency. This seems like a great possibility for combination of tools in order to better produce techniques for correct development. Again it seems that we have a great idea, with good technical setup and a good tool base, but since it succeeds at its own aims the authors haven't looked beyond to create a stronger technique for us to use to solve these complex problems with greater efficacy.

Monday, November 9, 2009

Task Queue, Graph Partitioning Etc.

Task Queue confuses me. It seems that all it is attempting to do is build a list of executable units, and then, as space opens, distribute those units amongst the available resources. But this seems like something the Operating System is supposed to as part of its normal operation. And since the queue system needs a system like Fork Join in order to distribute the tasks and do things like task stealing I'm not sure I understand what this is trying to accomplish that standard system doesn't already accomplish during its standard work. It also doesn't seem like there is much difference in intent than the pipes and filters pattern of dealing with multiple tasks of varying lengths and deciding how to move between them and begin those steps.

The Loop Parallelism paper reminds me of the discussions in the computer architecture courses that go over how to unroll loops and the complexities involved in that. It seems like this paper wants to use those same techniques whenever possible and assign the unrolled portions to different processors. Unfortunately I think this would severely limit the kinds of loops that this can be applied to, especially since many loops involve cumulative work on the components. When we read about the various multicore data structures like the parallel array there was a realization that scans of it would be easy to parallelize, but some alterations would be too difficult to do and maintain data integrity. Additionally, the architecture classes point out the amount of additional work that must be done to make the answers from the various stages of the loop pipeline available elsewhere. Combine these two ideas, of the necessary hardware additions for efficiency, and the restrictions of the way to use the structures in the code itself and I'm not sure I'm grasping why this would be more useful than either the techinques in wide use today or some of the tools and techniques we've already examined.

Thursday, November 5, 2009

Armstrong Thesis Chapter 6

It was nice to finally see what an application using this structure would look like, and to get an idea how one would actually use the supervisor idea presented in the last chapter to create an effective hierarchy. I'll admit I'm fuzzy on why the finite state machine was constructed in that way and built into the structure the way it was.

To answer the point about why would you construct a separate event handler from the server, I would assume it's because by splitting the two into independent modules we can allow for the full purpose of the structure to work for us. If both modules were combined into a single asset then any modules we built to deal with an error would need to be able to handle server and event errors, which may be being caused by such different conditions and solved in such different ways that we would be better off splitting them into two separate error handlers. And since we would need two totally separate error handlers that should be a huge clue that what we're trying to capture errors from is too complicated. Just because they seem like they serve similar functions doesn't mean we can lump them together without considering all possibilities and repercussions.

Wednesday, November 4, 2009

Pipeline, Geometric Decomposition, Data Parallelism

What's interesting about the pipeline pattern is that, when properly constructed it works as a wonderful platform to use of further patterns. The paper mentions that we can use things like the discrete events pattern from earlier this week to pass messages between the stages of the pipeline, which of course also allows us to use the Fork-Join framework to ease the overhead as pointed in the earlier paper. This idea could also be used, depending on the resource requirements, to have multiple full pipelines running in multiple processors, though without investigation I would assume that it is actually more efficient to have multiple instances of the individual stages working in concert. Perhaps it is just my misunderstanding, but I'm not sure what the differences are between pipes and filters and pipeline. It seems to me that one implements non-hardware pipelines through the use of pipes and filters, by using that groundwork to lay out the connections and the data handling between the linkages.

The Geometric Decomposition is interesting, and I would have loved to have a different example than one with pictures, as the 2D format makes them an obvious application. I would hope that in applications that need the additional efficiency is actually a complicated data set which would allow this, and although the example restricted itself to just the 2D data set I feel it is obvious that expansion into higher dimensional data sets would be a matter of defining cuts more closely and could be accomplished with a minimum of thought.

Tuesday, November 3, 2009

Armstrong chapter 5: Fault Tolerance

The first thing that jumped out at me during reading of this was the idea of decomposing tasks into simpler and simpler pieces so that correct, or at least safe, execution could be achieved in all cases. I originally misinterpreted this to have to do with trying to perform only subtasks of the original task in order to complete as much as possible before exiting. This of course would be very dangerous, for example, if at an online store the money is subtracted from the account but the items aren't credited to the user or even the other way around. As I read on it seems that the system being proposed is actually focusing in two related areas: recovery and dependence. Firstly, it sounds that if a task fails to execute we either restart that task hoping it returns correctly on a new run, or run an alternative task to ensure that the system is restored to the correct state from before the failed task was run. This would ensure that there are no instabilities in the system, since we are able to explicitely design the heirarchy to be able to recover from any set of errors with the input or the execution.

Reading further it seems that the greatest use, thanks to this And/Or Supervisor technique is to create a heirarchy that guarantees the proper execution of tasks in a specified order (done through careful construction of the supervisor tree). This still helps wtih the overall chapter goal of designing fault-tolerant software as we can build using the correct Ands and Ors a tree that will prevent the program from getting into a state that causes problems because of unfulfilled preconditions. Building to use this structure will take careful management and a mindful hand, but in systems where the ability to correctly handle all adverse conditions of execution and input designing for safety needs to already be something in the forefront of the designer's mind. It is much easier to design a system to use this approach from the ground up rather than trying to go back and reconfigure everything to work in this format. There is even a chance to design some parallelism into the system, if the subtasks are all dependence related rather than fault-recovery, we can create new threads or processes to execute them in an efficient manner.

Monday, November 2, 2009

Recursive Splitting, Task Parallelism, and Discrete Event

The title of the Task Parallelism brought to mind many of the similar papers and patterns we have already discussed in the class, ones that talked about how to correctly leverage the splitting of large complicated tasks into ones able to take advantage of multiple processors. The paper did indeed end up discussing this, not to my surprise, but seemed to function more as an overview or index of other patterns, and a discussion of what is necessary to get them to work efficiently. It makes sense then that we have held off on reading this until now, as discussions of things like the Monte Carlo Simulation do require enough of a working knowledge to gain the full impact. Because this level of parallelism has to be kept in mind from the beginning in order to ensure that proper utilization occurs this pattern is firmly placed in the structural category. I don't think hardware is the issue here, the bigger hurdle seems to be that we engineers and developers are not properly creating the software to take full advantage of the hardware available, and as I've mentioned in earlier posts I'm not expecting this to change anytime soon without a concentration of effort to provide some tools.

The Recursive splitting paper made me question why we were assigned this paper. As it seems like yet another paper that discusses using a Fork Join framework to make computation faster, and I could swear we had already had at least one paper that discussed using recursive calls as the basis for producing new threads. Or maybe it's just that the frequent extolling of Fork-Join's low overhead, good switching, and safety have made it so that I automatically associate its use with any paper that begins discussing taking advantage of parallelism. I also question the use of insisting that this paper utilizes the Divide and Conquer pattern when the paper clearly says that it is that same class of problem that it is intending to solve.

For the Discrete Events paper, I would initially say that I have yet to work in a memory sharing environment, but have clocked a decent amount of time working with message passing systems, typically over networks. Since deadlocks can occur for many reasons, especially when we're waiting for tasks that can occur in different orders every time most systems would prefer to analyze expected runtimes for the various components that will allow for a sensible timout to be set. With the technologies that underlay most networks, busses, and similar message passing structures we should be able to trust that lack of returns in this reasonable amount of time is the result of failure and not inefficiency. Of course, this assumption won't work in all cases, such as ones where modules exist on multiple platforms and those platforms experience unexpectedly heavy loads. But in those cases the tasks may never finish, or may arrive too late anyway, so discarding and attempting again may alleviate these problems. Since an in-depth knowledge of the problem space and the solution possibilities exist already in the mind of the designer they should be able to recognize those few cases where the formal deadlock condition is preferred and plan accordingly, but I don't expect this to be a frequent thing.