These are some R functions I have written over time to make certain tasks easier. If you find them useful, that’s great! Please let me know. :) I do not believe there are packages or functions that already do these things; if there are, tell me! In that case these would end up on the pile of “I struggled, solved it, and oh, wait, there’s a super easy package for that”.


Improving regression coefficient plots

Following the great advice in Kieran Healy’s Data Visualization, when plotting marginal effects (or coefficients) one should, among other things, order them by size. However, with a large number of categorical variables, each with several levels, the plot can be quite tricky to read, and once several models are included it becomes hell. This function prepares a data frame of coefficients for plotting with facets. Effectively, it first orders whole variables by the number of models in which they appear; within those groups, it orders variables by their average value; and finally it orders the levels within each variable (a sample is included). It’s not complicated, but it uses several for loops, and there is probably an easier way to do some of these things. It’s here.
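The ordering described above can be sketched in base R without for loops. This is only an illustration, not the linked function: the column names (model, variable, level, estimate), the helper name order_coefs, the toy numbers, and the choice of descending order are all assumptions, and level labels are assumed to be unique across variables.

```r
# Toy coefficient table; all values are made up for illustration.
coefs <- data.frame(
  model    = c("m1", "m1", "m1", "m2", "m2", "m2", "m2"),
  variable = c("age", "educ", "educ", "age", "educ", "educ", "region"),
  level    = c("age", "college", "high school",
               "age", "college", "high school", "south"),
  estimate = c(0.10, 0.80, 0.30, 0.20, 0.60, 0.40, 0.90),
  stringsAsFactors = FALSE
)

# Hypothetical helper: returns the table with `variable` and `level`
# turned into factors whose level order encodes the plotting order.
order_coefs <- function(d) {
  # 1. Number of distinct models each variable appears in
  n_models <- tapply(d$model, d$variable, function(m) length(unique(m)))
  # 2. Average estimate per variable
  var_mean <- tapply(d$estimate, d$variable, mean)
  vars <- names(n_models)
  var_order <- vars[order(-n_models[vars], -var_mean[vars])]

  # 3. Average estimate per level, ordered within each variable
  terms <- unique(d[, c("variable", "level")])
  terms$lvl_mean <- mapply(
    function(v, l) mean(d$estimate[d$variable == v & d$level == l]),
    terms$variable, terms$level
  )
  terms <- terms[order(match(terms$variable, var_order), -terms$lvl_mean), ]

  d$variable <- factor(d$variable, levels = var_order)
  d$level    <- factor(d$level, levels = unique(terms$level))
  d[order(d$variable, d$level, d$model), ]
}

ordered <- order_coefs(coefs)
levels(ordered$variable)
levels(ordered$level)
```

Because the order lives in the factor levels, a plot that facets by variable and maps level to an axis should pick it up without further sorting.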

Recoding GSS religion

UPDATE: The function has been turned into the package resurrectionr.

The function recodes religion from three variables (relig, denom, and other) according to the classification provided by Sherkat and Lehman in “After The Resurrection: The Field of the Sociology of Religion in the United States”. The main goal is to address four issues:

  1. Recode religion successfully.
  2. Easily (automatically) recode whole family of religion variables in GSS (religion, religion at 16, parents’ religion, and special modules on friends’ religion).
  3. Recode no matter how GSS data is imported to R.
  4. Provide two ways of grouping religion (12 and 7 groups).

At this point, only 1 works well, while 2 and 3 are partial. Besides the huge number of categories (about 200 for other), the main problem is that the punch codes from the GSS codebook do not correspond to factor values once the data is imported (some codes and blocks of numbers are just skipped). You can see more here.
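The mismatch between punch codes and factor values can be reproduced with a toy example. The codes and labels below are invented, not taken from the GSS codebook, and the label-based lookup is just one possible workaround, not necessarily what resurrectionr does.

```r
# Hypothetical codebook: note the gaps in the punch codes (3, 4, 6-9 unused).
codebook <- data.frame(
  code  = c(1, 2, 5, 10),
  label = c("group a", "group b", "group c", "group d"),
  stringsAsFactors = FALSE
)

# A column as it might look after being imported as a factor
x <- factor(c("group c", "group a", "group d"), levels = codebook$label)

# as.integer() returns the position of each level, not the punch code
positions <- as.integer(x)

# Matching on the label recovers the original codes
codes <- codebook$code[match(as.character(x), codebook$label)]

positions  # 3 1 4
codes      # 5 1 10
```

Recoding on labels rather than on integer values avoids depending on how a particular import routine numbers the levels.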

Mean on at least

Does the same thing as SPSS mean.#: calculates the mean of a case if at least n values are not NA, and otherwise returns NA. Handy for building indexes from several variables, since you can require that, e.g., at least 75% or 90% of the values are present in each case. See it here.
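A minimal sketch of the idea, assuming the SPSS convention that a case needs at least n valid values. The name mean_n and the toy data are made up for illustration; the linked function may differ in its details.

```r
# Hypothetical helper: row means that require at least `n` non-missing values.
mean_n <- function(data, n) {
  m <- as.matrix(data)
  valid <- rowSums(!is.na(m))        # non-missing values per case
  out <- rowMeans(m, na.rm = TRUE)   # mean over the values that are present
  out[valid < n] <- NA               # too few valid values: return NA
  unname(out)
}

items <- data.frame(
  a = c(1, 2, NA),
  b = c(3, NA, NA),
  c = c(5, 6, 9)
)

# Require at least 2 valid items per case
at_least_2 <- mean_n(items, 2)

# Require at least 75% of the items (here ceiling(2.25) = 3)
at_least_75 <- mean_n(items, ceiling(0.75 * ncol(items)))

at_least_2   # 3 4 NA
at_least_75  # 3 NA NA
```

Expressing the threshold as a share of ncol(items) keeps the rule the same when items are added to or dropped from the index.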

Scaling thousands and millions

Nothing special, but nice. Using the scales package, numbers on ggplot2 axes get a K or M suffix depending on their size (3000 becomes 3K, 7000000 becomes 7M). See it here.
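As a dependency-free sketch of the same labelling idea, the hand-rolled function below formats numbers with K and M suffixes. It is a stand-in written for illustration, not the linked code; the name label_km and the rounding to one decimal are assumptions.

```r
# Hypothetical labeller: 3000 -> "3K", 7000000 -> "7M"; smaller numbers unchanged.
label_km <- function(x) {
  out <- as.character(x)
  k <- !is.na(x) & abs(x) >= 1e3 & abs(x) < 1e6
  m <- !is.na(x) & abs(x) >= 1e6
  out[k] <- paste0(round(x[k] / 1e3, 1), "K")  # thousands
  out[m] <- paste0(round(x[m] / 1e6, 1), "M")  # millions
  out
}

labels <- label_km(c(500, 3000, 1500, 7000000))
labels  # "500" "3K" "1.5K" "7M"
```

Since the labels argument of ggplot2 scale functions accepts a function, a labeller like this can be passed as, for example, scale_y_continuous(labels = label_km).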