Tuesday, August 13, 2013

C Programming Memory Debugging: Valgrind

     I wanted to spread the word about Valgrind, a memory debugging tool for C programs. I just found out about it, and in the last hour or so it has already helped me identify numerous bugs in my CFD program. My CFD program (an unsteady, multi-block, structured-grid, incompressible Navier-Stokes solver) is the largest program I have ever written, and bugs occasionally pop up when the inputs change (in my case, larger grids and thus larger memory allocations). It was in my search for solutions that I came across Valgrind. It is also very easy to use and well documented. Go to the Valgrind website and check out the 'Quick Start' section to try it out yourself.
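     For anyone who wants a quick taste before reading the Quick Start, the basic workflow is simply to compile with debugging symbols and then run your program under Valgrind (the file names here are just placeholders for your own code):

gcc -g -O0 -o my_solver my_solver.c
valgrind --leak-check=full --track-origins=yes ./my_solver

Valgrind then reports invalid reads and writes, uses of uninitialized values, and leaked allocations, each with a stack trace pointing at the offending source lines.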

Tuesday, August 6, 2013

Writing an Incompressible Flow Solver

     To all who have requested help from me directly by sending messages: I hope your issues are resolved. Sorry if I missed your message and never got back to you.
     For the past few months I have been enthusiastically writing my own CFD code. It is an incompressible Navier-Stokes (INS) solver. The two reasons why I decided to write my own code instead of using OpenFOAM are the following:
     1. Nested Sliding Interface Capability.
     2. Time Accuracy.
I detail these reasons in the following paragraphs.

The First Reason
OpenFOAM's AMI/GGI is hard-coded to handle only one rotating region. Users have figured out a way to use more than one in the same simulation (it is described somewhere on the CFD Online forums), but to my knowledge there is no way to nest them. What I mean by nesting is putting one rotating region inside another. AFAIK Star-CCM+ is capable of this, but that is out of my budget (which is non-existent), and the second reason listed above would still remain a problem.

The Second Reason
     You would think that, compared to the compressible NS (CNS) equations, the incompressible equations would be easier, since you get to assume that density does not change and set all of the density derivatives to zero. It is a simplifying assumption to some degree, but in fact the solution method for the INS is less straightforward. With the CNS, the equation of state (the pressure-density-temperature relation) closes the equation set. With the INS, you can no longer use this closing equation. What most people use to address this is a bit of a dirty solution: operator splitting. Without the equation of state there is no explicit equation for pressure, yet pressure still appears in the INS momentum equations. The widely employed workaround is to split the equations so that you solve for an approximate pressure and an approximate velocity and couple them so that continuity is satisfied. As you might guess, this has been shown to reduce the temporal order of accuracy of whatever scheme you use. Temporal accuracy is especially important for turbomachinery-type simulations.
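     For reference (this is just the standard constant-density form of the equations, nothing specific to my code), the INS in primitive variables is

\nabla \cdot \mathbf{u} = 0,
\qquad
\frac{\partial \mathbf{u}}{\partial t} + (\mathbf{u} \cdot \nabla)\mathbf{u} = -\frac{1}{\rho}\nabla p + \nu \nabla^{2} \mathbf{u}.

Continuity is a constraint on the velocity field rather than an evolution equation, and pressure enters only through its gradient in the momentum equation, which is exactly why there is no obvious equation with which to advance the pressure in time.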
     There is a way to solve the INS without operator splitting: vorticity formulations. However, these methods have major practical drawbacks, including difficulty specifying boundary conditions and complications that arise in three dimensions.
     After some research I came across the Exact Fractional Step (EFS) method by J. B. Perot. With this method you can solve the INS exactly, because pressure is eliminated. Furthermore, primitive variables are used, including in the specification of boundary conditions. Basically, the biggest issues of both the vorticity formulations and the traditional operator-splitting methods are fixed, while their relative advantages are retained. To me it seems like this is THE way to solve the INS. Why is it not already widely used? According to Dr. Perot, whom I have contacted personally by email, humans are creatures of habit. Ha ha. Another reason I can immediately think of is that a particular arrangement of flow quantities is required. I am using a staggered variable arrangement, where velocity components are stored at the cell edges and scalar quantities (such as pressure or turbulence variables) are stored at cell centers. The staggered arrangement is less common in CFD solvers than the co-located arrangement, where all velocity components and scalar quantities are stored at the nodes, because staggered arrangements are less straightforward to transform accurately on structured grids. As I understand it, using staggered arrangements on unstructured grids is a topic of Dr. Perot's research. I forgot to mention that I am using structured grids in my code and am only interested in them, as opposed to unstructured grids. Anyway, I did find papers by Pieter Wesseling, who is now retired and has published a textbook on CFD, that show how to transform staggered arrangements for good accuracy with relative ease and practicality.
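     As a rough illustration (this is a hypothetical sketch of a staggered storage layout for a single 2D structured block, not the actual data structures in my code):

/* Staggered arrangement on one 2D structured block: velocity components
   stored on the cell edges, scalar quantities at the cell centers. */
typedef struct {
    int ni, nj;     /* number of cells in the i and j directions          */
    double *u;      /* (ni+1) * nj   x-velocity on edges normal to i      */
    double *v;      /* ni * (nj+1)   y-velocity on edges normal to j      */
    double *p;      /* ni * nj       pressure at cell centers             */
    double *nu_t;   /* ni * nj       turbulence scalar at cell centers    */
} Block;

/* row-major index helpers (hypothetical names) */
#define U(b,i,j) ((b)->u[(j) * ((b)->ni + 1) + (i)])  /* i = 0..ni,   j = 0..nj-1 */
#define V(b,i,j) ((b)->v[(j) * (b)->ni + (i)])        /* i = 0..ni-1, j = 0..nj   */
#define P(b,i,j) ((b)->p[(j) * (b)->ni + (i)])        /* i = 0..ni-1, j = 0..nj-1 */

The point is simply that the velocity unknowns do not live at the same locations as the pressure unknowns, which is what makes coordinate transformations on curvilinear structured grids more delicate than in the co-located case.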

Status
     So far I have implemented a laminar, multi-block, structured-grid explicit solver and tested it on a lid-driven cavity simulation. The results are good. Right now I am in the finishing stages of implementing a Spalart-Allmaras turbulence model, and I hope soon to run and post results of a high-Reynolds-number airfoil simulation.
     Although the scheme is fully explicit, I still have to solve a linear system (i.e. a matrix problem Ax = b) as part of the EFS. As a side note, traditional operator-splitting methods always have to solve a system for the pressure as well, and their error shows up in continuity. With the EFS, the error is in vorticity instead, and continuity is always satisfied to machine precision. It stands to reason that continuity error is more detrimental than vorticity error.
     Right now I am using a canned Conjugate Gradients (CG) algorithm by Richard Shewchuk as the solver. I plan, further down the road, to implement this algorithm on the GPU, and CG seems well suited to that. I am also interested in algebraic multigrid preconditioning, which tends to reduce the number of iterations by an order of magnitude. But until all of the desired solver features are done, vanilla CG will be good enough.
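     For anyone curious what "vanilla" CG amounts to, here is a minimal sketch (my own illustration of the textbook algorithm for a dense symmetric positive-definite matrix, not Shewchuk's code, and not the sparse, matrix-free version a CFD solver actually needs):

#include <math.h>

/* Solve A x = b for a symmetric positive-definite A (dense, row-major n x n)
   with unpreconditioned conjugate gradients. x holds the initial guess on
   entry and the solution on exit. Work arrays r, p, Ap must hold n doubles. */
void conjugate_gradient(int n, const double *A, const double *b, double *x,
                        double *r, double *p, double *Ap,
                        double tol, int max_iter)
{
    /* r = b - A*x, p = r */
    for (int i = 0; i < n; i++) {
        double Ax = 0.0;
        for (int j = 0; j < n; j++) Ax += A[i*n + j] * x[j];
        r[i] = b[i] - Ax;
        p[i] = r[i];
    }
    double rs_old = 0.0;
    for (int i = 0; i < n; i++) rs_old += r[i] * r[i];

    for (int iter = 0; iter < max_iter && sqrt(rs_old) > tol; iter++) {
        /* Ap = A*p */
        for (int i = 0; i < n; i++) {
            Ap[i] = 0.0;
            for (int j = 0; j < n; j++) Ap[i] += A[i*n + j] * p[j];
        }
        double pAp = 0.0;
        for (int i = 0; i < n; i++) pAp += p[i] * Ap[i];
        double alpha = rs_old / pAp;

        /* update solution and residual */
        double rs_new = 0.0;
        for (int i = 0; i < n; i++) {
            x[i] += alpha * p[i];
            r[i] -= alpha * Ap[i];
            rs_new += r[i] * r[i];
        }

        /* new search direction */
        double beta = rs_new / rs_old;
        for (int i = 0; i < n; i++) p[i] = r[i] + beta * p[i];
        rs_old = rs_new;
    }
}

In a real CFD code the dense matrix-vector products would be replaced by a sparse or matrix-free operator, and that operator is also the part that maps naturally onto a GPU.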

This has been quite a journey and I have learned a great deal. Thanks for reading.

Thursday, November 22, 2012

Boundary Conditions For an Open Wind Tunnel

An open wind tunnel is a wind tunnel that takes in air from the surroundings and exhausts it back to the surroundings. This is in contrast to a closed wind tunnel, which recirculates the air (i.e. the outlet is connected back to the inlet). The main practical difference is power consumption.

This article will describe the boundary conditions necessary to simulate an open wind tunnel.

An important thing to realize is that the static pressure in a wind tunnel is not constant from upstream to downstream, as it is for an actual vehicle in flight. A small but significant pressure gradient is required to drive the flow, and it is created by the fan powering the wind tunnel (a fan produces a pressure difference across itself, which is also how it provides thrust).

In aerodynamics, because air is invisible, many flow properties are measured indirectly. Wind tunnel velocity is typically inferred from the dynamic pressure, which is the additional pressure generated when a moving flow is brought to a halt. For incompressible flow (below about Mach 0.3), the dynamic pressure is 0.5*density*velocity^2.

Knowing this, we can set up proper boundary conditions for an open wind tunnel. The following are the boundary conditions for U (velocity) and p (pressure), with V denoting the desired wind tunnel velocity (a quick numeric check follows the list).

Walls: U = 0, p = zero gradient
Inlet: U = zero gradient, p = zero gradient, p0 (total pressure) = 1/2*density*V^2
Outlet: U = zero gradient, p = 0
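
As a quick numeric check of the inlet total-pressure value (the numbers here are only illustrative, not from any particular case), a few lines of C:

#include <stdio.h>

int main(void)
{
    const double rho = 1.225;   /* air density in kg/m^3 (roughly sea level) */
    const double V   = 40.0;    /* desired test-section velocity in m/s (example value) */

    /* total pressure at the inlet, relative to the outlet reference p = 0 */
    double p0 = 0.5 * rho * V * V;

    printf("p0 = %.1f Pa above the outlet static pressure\n", p0);
    return 0;
}

If your solver works with kinematic pressure (p divided by density), as many incompressible codes do, the value to specify is simply 0.5*V^2.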

Sunday, November 11, 2012

Getting Started with GPU Coding: CUDA 5.0

Hello everyone, this is a quick post detailing how to get started with harnessing the potential of your GPU via CUDA. Of course, I am planning to use it for OpenFOAM; I want to port the pimpleDyMFoam solver to CUDA.

Something especially awesome about CUDA 5.0 is the new Nsight IDE for Eclipse. If you have ever programmed in Eclipse (I have, in Java), you know how spoiled you can get by all of the real-time automated correction and prediction.

Anyways, here are the steps I took:

I have Ubuntu 12.04, and even though the CUDA toolkit web page only has downloads for Ubuntu 11.10 and 10.04, it really does not matter; I used the 11.10 package without a hitch.

Download the CUDA toolkit at:
https://developer.nvidia.com/cuda-downloads

Before proceeding make sure you have all required packages:
sudo apt-get install freeglut3-dev build-essential libx11-dev libxmu-dev libxi-dev libgl1-mesa-glx libglu1-mesa libglu1-mesa-dev

Then, in a terminal, go to the directory containing the download and type (without the angle brackets):
chmod +x <whatever the name of the download is>
sudo ./<whatever the name of the download is>

If you get an error of some sort (as I did), type:
sudo ln -s /usr/lib/x86_64-linux-gnu/libglut.so /usr/lib/libglut.so
and then try the 'sudo ./...' command again.

The 'sudo ./...' command actually installs three components: the driver, the toolkit, and the samples. I installed the NVIDIA accelerated graphics driver separately via:
sudo apt-add-repository ppa:ubuntu-x-swat/x-updates
sudo apt-get update
sudo apt-get install nvidia-current

But if the included driver installation works for you, then by all means continue.

Now add the CUDA paths to ~/.bashrc:
gedit ~/.bashrc
Scroll down to the bottom of the file and add:
export PATH=/usr/local/cuda-5.0/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda-5.0/lib64:/usr/local/cuda-5.0/lib:$LD_LIBRARY_PATH
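
Then either open a new terminal or reload the file so the changes take effect:
source ~/.bashrc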

Now to test everything out, go to your home directory:
cd ~/
Then go into the CUDA samples folder (type 'ls' to view the contents).
Once in the samples folder, type 'make'. This will compile the sample codes, which may take a while. If you get an error (something like 'cannot find -lcuda', as I did), then type:
sudo ln -s /usr/lib/nvidia-current/libcuda.so /usr/lib/libcuda.so
 
Then go into the sample folders and run the executables (whatever catches your fancy)! Executables are colored green when you view the folder contents via 'ls', and you run an executable by typing './' before the name of the file.
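
From there, compiling your own CUDA source files is just a matter of invoking nvcc (hello.cu below is a placeholder for whatever file you write):

nvcc hello.cu -o hello
./hello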

Tuesday, August 14, 2012

OpenFOAM 2.1.1 GGI / AMI Parallel Efficiency Test, Part II

These figures are pretty self-explanatory. Again, these runs are on single computers. My i7-2600 only has four cores, and the other data set represents a computer with 8 cores (2 x Opteron 2378). It is clear there is a significant parallelizability bottleneck (a serial component) in the AMI. However, it seems this bottleneck may be controllable by minimizing the AMI interface patch face count, if the interface patch interpolation is the serial component.



Sunday, August 12, 2012

Connecting Two Posts: AMI Parallelizability and Hyperthreading Investigation

http://lordvon64.blogspot.com/2012/08/openfoam-211-ggi-ami-parallel.html

http://lordvon64.blogspot.com/2011/01/hyperthreading-and-cfd.html

The most recent post shows that with AMI, parallel efficiency stays roughly constant for any number of cores beyond one. In an older post testing the effect of hyperthreading, efficiency went down instead of staying constant. These results are still consistent, because the two tests were fundamentally different: one used hyperthreading and the other true parallelism. Hyperthreading creates more virtual cores, but with fewer physical resources per core (which makes sense; you cannot get something for nothing). Since we hypothesized that the lower efficiency is due to a serial component, having weaker cores leads to a slower overall simulation, because the serial component (which can only run on one processor) has fewer resources available. Thus the lower performance with hyperthreading is due to the serial component running on a weaker single core. Also, the software could have made a difference, as that test was run with GGI on OpenFOAM 1.5-dev.
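
This is just the standard Amdahl's-law picture (an estimate, not something I measured directly): if a fraction s of the work is serial, the best possible speedup on N cores is

S(N) = \frac{1}{s + (1 - s)/N},

so for moderately large N the runtime is dominated by the serial fraction, and what matters most is how fast a single core can execute it.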

This has practical implications for building or choosing a computer for AMI simulations: pick a processor with the strongest single-core performance. Choosing a processor with a larger number of cores but weaker individual cores would be much more detrimental to AMI parallel simulations than one would expect without considering this post. This applies to all simulations, since there is always some serial component, but the lower the parallel efficiency, the more important this guideline becomes.

Saturday, August 11, 2012

OpenFOAM 2.1.1 GGI / AMI Parallel Efficiency Test

A 2D incompressible turbulent transient simulation featuring rotating impellers was run on varying numbers of processors to test the parallel capability of the AMI feature in OpenFOAM. The case was small, about 180,000 cells total, which ensured that memory would not be a bottleneck. The exact same case was run on each processor count. The computer used had 2 x AMD Opteron 2378 (2.4 GHz), for a total of 8 cores. I am not sure which other CPU specs are relevant in this case, so you can look them up; let me know in the comments what other specs matter and why.
The following are the results:

The efficiency is the most revealing metric here. There seems to be a serial component in the AMI bottlenecking the performance. I will post some more results from a different computer shortly.