How to Convert String Variables With Non-Numeric Values to Numeric Variables in Stata

We can convert string variables with non-numeric values to numeric variables in Stata using the encode or egen commands.

Many survey questionnaires use a Likert or Likert-like scale, e.g.:

  1. Strongly Agree
  2. Agree
  3. Neutral
  4. Disagree
  5. Strongly Disagree

or

  1. Always
  2. Usually
  3. About Half the Time
  4. Seldom
  5. Never

Below is another example of non-numeric values in a variable:

  1. A
  2. B
  3. C
  4. D
  5. E

When analyzing data, it is often desirable to have numeric values (e.g., 0, 1, 2, 3, 4 or 1, 2, 3, 4, 5) instead of non-numeric ones. Stata recognizes these non-numeric values as “string” values, and their variables are called “string variables.”
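
Before converting anything, it can help to confirm that a variable really is stored as a string. The lines below are a minimal sketch, assuming Stata's bundled auto dataset (used here only for illustration, not part of the original example), in which make is a string variable:

```stata
* Load the example dataset shipped with Stata (an assumption for illustration)
sysuse auto, clear

* The "storage type" column shows str# for string variables
describe make

* List every string variable in the dataset
ds, has(type string)
```

Any variable reported with a str storage type is a candidate for the conversions described here.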

In Stata, there are a few ways of converting string variables (with non-numeric values) to numeric variables (with numeric values). Probably the most common way is to use the encode command, i.e.:

. encode oldvar, generate(newvar)

where oldvar is the name of the old variable and newvar is the name of the new variable. If we use the encode command, the new numeric variable will have value labels added to it.
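
As a rough illustration of the point above, the sketch below builds a small made-up dataset (the variable names response, response_num, response_ord and the label name agree_lbl are hypothetical) and encodes it. By default encode numbers the categories in alphabetical order of the strings, which may not match the intended Likert order, so the second half defines the value label first and passes it through the label() option:

```stata
* Hypothetical five-row dataset with a Likert-style string variable
clear
input str20 response
"Strongly Agree"
"Agree"
"Neutral"
"Disagree"
"Strongly Disagree"
end

* Default behaviour: codes follow the alphabetical order of the strings
encode response, generate(response_num)
label list response_num

* To control the coding, define the value label first, then reuse it
label define agree_lbl 1 "Strongly Agree" 2 "Agree" 3 "Neutral" 4 "Disagree" 5 "Strongly Disagree"
encode response, generate(response_ord) label(agree_lbl)

* Show the underlying numbers instead of the labels
list response response_num response_ord, nolabel
```

With the default call, "Agree" is coded 1 and "Strongly Agree" is coded 4; with the predefined label, the codes follow the order given in label define.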

Another way of doing the same thing is by using the egen command, i.e.:

. egen newvar = group(oldvar)

The new variable will have numeric values without value labels.
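
A minimal sketch of the egen approach, again on made-up data (grade, grade_num and grade_lab are hypothetical names). group() numbers the distinct values consecutively in sorted order, so a category that never occurs in the data leaves no gap in the codes:

```stata
* Hypothetical dataset of letter grades; note that "D" never occurs
clear
input str1 grade
"C"
"A"
"B"
"A"
"E"
end

* Consecutive integer codes in sorted order: A=1, B=2, C=3, E=4
egen grade_num = group(grade)
list grade grade_num

* The label option attaches the original strings as value labels
egen grade_lab = group(grade), label
list grade grade_num grade_lab
```

If value labels are wanted after all, the label option shown in the last step is one way to get them without switching to encode.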

3 comments

  1. Dear Dr. Andy;
    When I was searching for how to convert string variables into numeric variables in Stata, I found your document. It was really helpful for me. Thank you so much for sharing your knowledge with others.

    Warm Regards,
    Shantha

  2. Respected sir,
    I have non-numeric codes in a variable in Stata, and I want to recode those non-numeric codes (under that particular variable) into numeric values. Please tell me, how can I solve my problem?
