About six months ago, I started using ParFlow.CLM, a 3D variably saturated overland flow-subsurface model, for my research in small urban watersheds. In this post, I list the skills I needed to start using ParFlow and the attitude that helps going into this type of project.
- Some connection to a ParFlow user community and the ability to search for help. You will run into problems when trying to apply ParFlow to your domain. The key is figuring out how to resolve them. There are always resources: online, in the user group community, in the ParFlow manual, in published articles, on the ParFlow blog, etc. The main thing here is attitude. If something isn’t working, you CAN find a way to fix it, as long as you are willing to keep searching, are not afraid to try many different things, and keep track of what you try.
- Organization. Keep a good “lab notebook” of the things you try, how long runs take, which keys you turn off and on, etc. There are so many different things to try in order to “optimize” your ParFlow runs for speed and convergence that you must have a system for managing all the files you generate, as well as the different approaches you try to get your domain represented the way you want.
- Familiarity with GIS software and spatial data. I started out as a pretty proficient ESRI software user (about 4 years of professional experience as a civil engineer, and 3 years of research experience), but I had never used other GIS software. I ended up doing much of the pre-processing of ParFlow input files in GRASS GIS, because it was easier to automate the raster-based processing tasks there. The skills that helped here were familiarity with ESRI software, coordinate systems, and projections, and the ability to code simple for-loops (I used the R package for GRASS GIS, for example, because I am more comfortable coding in R than in Python).
- Familiarity with a high-level, interpreted computer language, such as R, MATLAB, or Python. While doing any heavy lifting in these languages will take way too long, they are very useful both for pre-processing inputs and for making sense of your outputs. To give you an idea of my level of R expertise, I have been using it exclusively for data cleaning, statistical analysis, and simulations for the past three years in my research, and I TA an applied stats course for master’s students who are first-time users of R. I understand enough of R to stumble through Python (Googling often) and other computer languages. Some code snippets for pre-processing inputs (for example, NLDAS climate data) are provided in NCL, another high-level, interpreted language. Understanding at least one of these languages makes it easy to apply code that others have written. Confidence in some high-level language, especially in writing for-loops, will also help you get familiar with the ParFlow tools you can call from Tcl scripts, which you will definitely need in post-processing.
- Willingness to learn the Linux computing environment. Prior to deciding to apply ParFlow in my research, I had installed a dual operating system (Windows/Ubuntu) on an old laptop *once*. I think the main thing here is attitude: if you are excited about learning to use the command line interface and to communicate with your computer directly, i.e., without icons, buttons, and graphical user interfaces, then starting from zero in this area is just fine. You can find almost anything out by Googling. Willingness to learn a Unix-based computing environment includes things like bash shell scripting, actually compiling and building ParFlow on your machine, and, if you move to a shared parallel resource (like Stampede, the XSEDE high-performance computing resource, or “supercomputer”, that I am on), reading manuals on how to communicate with the job scheduler.
- Willingness to learn a compiled computer language, such as Fortran. I recommend a compiled language for any pre- and post-processing steps that run over all the cells of your domain. My domains are on the order of millions of cells, so doing cell-by-cell calculations in an interpreted language would just not be practical.
- Hardware. To run ParFlow in parallel, you will need access to a parallel computing resource. Access and rented computing time can be expensive, so I also recommend that you start off doing your test ParFlow runs locally. I started out on a 2-core laptop (X230 ThinkPad with an i7 processor and 8GB RAM) while I was going through the Getting Started with ParFlow document and starting to do simple tests on my own domain, then moved over to a 4-core workstation at my lab to continue testing. You want to stay local while you can, because shared parallel computing resources often have job queues, so you’ll have to wait a while before your run gets started. If you start to find yourself waiting too long between your tests though (like, hours), then move to your parallel resource.
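
As one way to keep the “lab notebook” mentioned under Organization, here is a minimal sketch in Python that appends one row per run to a CSV file. This is not part of ParFlow; the function names (`log_run`, `read_log`), the column names, and the file name are all hypothetical, and the key names in the usage example are only illustrative. You would fill in the wall-clock time and changed keys yourself after each run.

```python
import csv
from datetime import datetime
from pathlib import Path

# Columns of the run log (hypothetical layout; adapt to what you track).
FIELDS = ["timestamp", "run_name", "wall_seconds", "changed_keys", "notes"]


def log_run(log_path, run_name, wall_seconds, changed_keys, notes=""):
    """Append one run record to a CSV log, writing the header if the file is new."""
    log_path = Path(log_path)
    is_new = not log_path.exists()
    with log_path.open("a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        if is_new:
            writer.writeheader()
        writer.writerow({
            "timestamp": datetime.now().isoformat(timespec="seconds"),
            "run_name": run_name,
            "wall_seconds": f"{wall_seconds:.1f}",
            # Store the keys changed for this run as "key=value; key=value".
            "changed_keys": "; ".join(f"{k}={v}" for k, v in changed_keys.items()),
            "notes": notes,
        })


def read_log(log_path):
    """Return all logged runs as a list of dictionaries."""
    with Path(log_path).open(newline="") as f:
        return list(csv.DictReader(f))


if __name__ == "__main__":
    # Illustrative entry: the run name, timing, and key values are made up.
    log_run(
        "parflow_runs.csv",
        run_name="test_domain_01",
        wall_seconds=5400,
        changed_keys={"Solver.MaxIter": 100, "Process.Topology.P": 2},
        notes="laptop test, 2 cores",
    )
    for row in read_log("parflow_runs.csv"):
        print(row["timestamp"], row["run_name"], row["wall_seconds"], row["changed_keys"])
```

A plain CSV keeps the log readable in R, a spreadsheet, or a text editor, which matters more here than anything clever; a paper notebook or a text file does the same job if you keep it up to date.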