I have lived this life for more than six years now, and while these situations occur often, don't despair. There is hope! Getting by this many years was only possible by relying on those who share their knowledge online. Stack Overflow may be one of the most useful websites around, and many blog posts exist to help one solve a specific problem. I created this blog to pass along some of the knowledge and skills I have accrued over the years, to help others who work in the computational sciences or conduct scientific research solve common problems and learn the necessary skills.
A little about my background: I have a BS in Chemistry and a BS in Computer Science. The past six years of my life have involved computational sciences of many kinds. I am now pursuing a PhD in Chemistry, though the research is more accurately described as computational biophysics. I have experience with a handful of programming languages and many Unix tools. My experience in computational modeling is rather diverse, spanning computational fluid dynamics, continuum mechanics for material modeling, stochastic biological simulations, molecular dynamics, and some quantum chemical modeling. In the past I used and developed code for the Uintah Computational Framework, which many use to simulate materials and fluids problems. Currently, I develop code for Lattice Microbes, a problem-solving environment for stochastic biochemical simulations. Solving these problems has required both workstations and supercomputers.
This blog will draw on these experiences and I intend to cover such topics as:
- Data analysis/processing
- Using supercomputers
- Compiling software on different architectures
- Math techniques in scientific computing
- Setting up simulations
- Programming techniques
- and many more...
With that, I hope you all will find this useful.
You mention workstations and supercomputers a couple times. Do you see these as completely different systems, with different calculations going to one or another, or is it more of a gradual increase in computational power, and you select a system size based on the size of the computation you want to run?
While simulation size is often the deciding factor in which computational resource gets selected, in principle the setup and solution of a simulation should be independent of the machine doing the calculating. (This is mostly true for POSIX systems.)
The distinction, in my mind, is largely based on the mode and level of interaction with the computer. For instance, running a simulation on a workstation is much easier than running it on a supercomputer: in one case you just execute the program, while in the other you build a submission script that has to have all the right parameters and submit it to a queuing system. Batch systems also have their idiosyncrasies. For instance, some throw away the user environment, which means the job may not be able to find the libraries and executables that were readily available in the user's interactive environment. These problems are a pain to debug.
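To make this concrete, here is a minimal sketch of a submission script for a PBS-style batch system; the job name, resource requests, and paths are hypothetical and will vary from cluster to cluster:

```bash
#!/bin/bash
#PBS -N my_simulation          # job name shown in the queue
#PBS -l nodes=4:ppn=8          # request 4 nodes with 8 processors each
#PBS -l walltime=24:00:00      # maximum run time before the job is killed
#PBS -j oe                     # merge stdout and stderr into one log file

# Batch systems may discard your interactive environment, so set up
# paths explicitly and start from the directory you submitted from.
cd "$PBS_O_WORKDIR"
export LD_LIBRARY_PATH=/path/to/required/libs:$LD_LIBRARY_PATH

mpirun -np 32 ./my_simulation input.conf
```

The script is then handed to the queuing system with something like `qsub submit.pbs`, and the scheduler decides when and where it actually runs.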
The level of interaction of a user with a workstation is also more intimate. Oftentimes you will be compiling your own copy of a common simulation package (unless you are lucky enough to have a really helpful system administrator). Compiling your own code can be a real beast, especially when getting all the dependencies in place. Supercomputers, on the other hand, will often have the software preinstalled as modules, which you can merely load and use with little work. However, these modules are often compiled several times over with different compilers, and deciding which particular version of the software to use is also difficult. For example, one might be optimized for a low number of nodes, while another is compiled against a parallelization library that is optimized for a large number of nodes.
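As an illustration, a typical session with the Environment Modules system looks something like this (the package names and versions here are made up):

```bash
$ module avail gromacs                  # list the available builds
gromacs/4.5.5-gnu  gromacs/4.5.5-intel  gromacs/4.5.5-intel-mpi
$ module load gromacs/4.5.5-intel-mpi   # pick the MPI build for multi-node runs
$ module list                           # confirm what is now in your environment
```

Loading a module typically just prepends the right directories to variables like PATH and LD_LIBRARY_PATH, which is why picking the build that matches your compiler and MPI library matters.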
And finally, there is the issue of etiquette. While it is rarely written down and even more rarely discussed, there are certain ways to behave when using a supercomputer, because it is a shared resource. A workstation may be shared between one or two people who can merely walk down the hall and hold each other accountable for their actions (taking up the whole hard disk, running too many simulations at once, breaking software, etc.). That clearly will not work on a resource shared by dozens or perhaps thousands of users.
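Part of that etiquette is simply keeping an eye on your own footprint, and a couple of standard commands usually suffice (the exact commands depend on the scheduler and the site):

```bash
$ qstat -u $USER    # see how many of your jobs are queued or running
$ quota -s          # check your disk usage against the filesystem quota
```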
I hope this clarifies things.