Saturday, April 28, 2012

Construction of a New Basis Set. Part I. Planning.

As the title suggests, I'd like to share my experience (over several parts) with constructing a new basis set for the calculation of NMR spin-spin coupling constants (SSCC). There are several key aspects that one must take into consideration. To name just a few of them:
  1. Which molecule(s) am I interested in developing a new basis set for?
  2. What basis set will I use to create my new basis set? Will I generate it from scratch?
  3. What geometry will I use?
  4. What is my approach for making the new basis set? (Interestingly, this point has many sub-points)
    1. Uncontract, add specific functions and then recontract?
    2. Uncontract, remove specific functions and then recontract?
  5. Publish my results.
This post is about points one and two and consequently will not contain any data. That will change in the next post.
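
To make the "uncontract and add specific functions" idea from point 4 a bit more concrete, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the dictionary layout, the function names uncontract and add_tight_exponents, the made-up exponents, and the choice of extending the series geometrically with the ratio of the two largest exponents. The recontraction step needs coefficients from an actual calculation, so it is left out.

```python
def uncontract(shell):
    """Split one contracted shell into single-primitive shells (coefficient 1.0)."""
    return [
        {"l": shell["l"], "primitives": [(exponent, 1.0)]}
        for exponent, _coefficient in shell["primitives"]
    ]


def add_tight_exponents(exponents, n_new):
    """Prepend n_new tighter exponents by continuing the series geometrically.

    The step is the ratio of the two largest existing exponents
    (an illustrative choice, not necessarily the one used in the papers).
    """
    ordered = sorted(exponents, reverse=True)
    ratio = ordered[0] / ordered[1]
    tight = []
    largest = ordered[0]
    for _ in range(n_new):
        largest *= ratio
        tight.append(largest)
    return sorted(tight, reverse=True) + ordered


# Made-up s shell; the numbers are NOT from any published basis set.
contracted_s = {"l": 0, "primitives": [(100.0, 0.1), (10.0, 0.4), (1.0, 0.6)]}

uncontracted = uncontract(contracted_s)
exponents = [shell["primitives"][0][0] for shell in uncontracted]
extended = add_tight_exponents(exponents, n_new=2)
print(extended)
```

Removing functions instead (sub-point 2) would amount to dropping entries from the exponent list before recontracting.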

This whole project fell out of the sky and hit me in the head because I attended a course on molecular electromagnetism with Associate Professor Stephan Sauer and, instead of solving the exercises, I opted to do a mini-project. There is nothing wrong with broadening your academic abilities, I thought, and here I am. The project was: "Make new spin-spin coupling constant basis sets for Gallium, Germanium, Arsenic, Selenium and Bromine atoms". Since Stephan had participated in publishing one paper on H2Se and the corresponding basis set (warning: paywall), I thought, why not use that as inspiration for making my own and see if I could improve on what he did in the first place? Given the project, item number one was pretty clear: use the most basic molecules you can think of (or draw) as your starting point. I chose GaH, GeH4, AsH3, H2Se and HBr as starting points.

Since Stephan has also built basis sets starting from Dunning's correlation-consistent aug-cc-pVTZ basis set, that was a natural starting point for me too. I guess you need somewhat of an iron will if you want to start from scratch, but I suppose in some cases, why not?

In any case, the plan was:

  1. See what was done for H2Se and, equally important, how it was done. I also had another paper for the rows of elements just above my row (warning: paywall), which turned out to help a great deal.
  2. Try and be systematic about it. (I guess everyone is a bit sporadic with their placement of data once it starts rolling, am I right?)
  3. Get started, since these calculations do not scale very well with the number of basis functions we use, and let's face it, even for hydrogen the aug-cc-pVTZ-J basis set is HUGE.
  4. Share the data with the world when I get it.
It'll be a grand experience.
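
To put the remark about basis set size in perspective, here is a small Python sketch that counts basis functions from a contraction pattern. It assumes spherical functions (2l + 1 per shell) and uses the contracted [4s3p2d] pattern of the parent aug-cc-pVTZ set for hydrogen; the function name and dictionary layout are my own. The -J variant carries additional functions on top of this, which I do not list here.

```python
ANGULAR_MOMENTUM = {"s": 0, "p": 1, "d": 2, "f": 3, "g": 4}


def count_spherical_functions(shells):
    """Count basis functions, assuming 2l + 1 spherical functions per shell."""
    return sum(
        n_shells * (2 * ANGULAR_MOMENTUM[label] + 1)
        for label, n_shells in shells.items()
    )


# Contracted aug-cc-pVTZ for hydrogen: [4s3p2d]
aug_cc_pvtz_h = {"s": 4, "p": 3, "d": 2}

per_hydrogen = count_spherical_functions(aug_cc_pvtz_h)
print(per_hydrogen)       # 4*1 + 3*3 + 2*5 = 23
print(4 * per_hydrogen)   # the four hydrogens in GeH4 alone: 92
```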

Tuesday, April 24, 2012

The Purpose of This Blog

Inspired by blog posts from around the web and by people around me, specifically my supervisor and his fight for doing "science in the open", I've decided that I also want to take part in this scientific movement, in a form that is more subtle but which will still benefit other people.

I'm particularly interested in providing the data from my own research for anyone to view or to do their own calculations on the data. Maybe someone discovers a better result hidden away in the data that I've not thought about or found. There is no real standard or golden path to follow when it comes to sharing data, so I guess I just have to see how it goes and present it in a form that I like and which is hopefully understandable for others. I think my greatest hope with this sharing of the data is that people can see how they could attack a scientific problem and first learn by following what I've done, and then extend it or take the research in another direction.

Sometimes the data will be published here, and the scripts might show up on my blog on python in chemistry. Maybe there will be data shared via the excellent slideshare.net. I've also been recommended figshare.com/, which lets you store data sets too. One can also use services that store information in plain text, such as pastebin.com, or even use gists (see a discussion on that in a blog post I wrote on the python in chemistry blog).

We'll see how it goes. I've thought about it for a while, and I'd at least like to try it and discover a reasonable way to share data.

This is my attempt at contributing to science in the open.