In my last project I had to find theoretical limits for a *psychometric* index involving *Likert scale data* (aka *categorical data*). After successfully finding it, I decided to test the results in a simple *Monte-Carlo simulation*.

I was surprised to find out that there is no built-in categorical data generator in *R*. What I was looking for, was something like `runif(100)`

which would generate a vector of length *100* where every element is drawn from a *multinomial distribution* in general or a categorical distribution in particular.

The first idea was to use `sample`

function with given probabilities: `sample(c(0,1,2),1,prob=c(0.33, 0.33, 0.34))`

but you couldn’t repeat this procedure for *N* participants without using loops, which is very inefficient, or you would end up with `rep`

repeating the same random pick *N* times.

I didn’t want to use any third-party libraries just for this small application either, so I came up with this simple trick.

Suppose, you want to generate a 5-category data *(x1, x2, x3, x4, x5)* for *N* participants with probabilities *(1/10, 2/10, 4/10, 2/10, 1/10)*. The following formula will work:

```
distribution <- c(rep(x1,1),rep(x2,2),rep(x3,4),rep(x4,2),rep(x5,1))
potential <- rep(distribution, M)
likert_data <- sample(potential, N)
```

or, as one-liner:

```
likert_data <- sample(rep(c(rep(x1,1),rep(x2,2),rep(x3,4),rep(x4,2),rep(x5,1)), M), N)
```

Notice that `distribution`

sets the probabilities, `potential`

repeats this *M* times (where *M* is any number greater than or equal to *N* — I personally used *M = N*), and `likert_data`

(*uniformly*) randomly picks *N* elements and returns the required vector.

Notice how in the screenshot above, we obtain almost exact probabilities we wanted: *(1, 2, 4, 2, 1)/10*. Since every time is a random draw, there are some deviations, but repeating this formula and averaging, gives the desired values.

**UPDATE:** A StackExchange user suggested a better hack — to randomly sample *with replacement*. This would make my solution obsolete, but it’s brilliant:

```
likert_data <- sample(c(x1,x2,x3,x4,x5), N, replace = TRUE, prob=c(1/10, 2/10, 4/10, 2/10, 1/10))
```