Wordle, Wordle on the Wall | Who Is the Most Probable of Them All?

Ibrezm
3 min read · Feb 5, 2022
Photo by Andreas Fickl on Unsplash

While playing Wordle some time back, I wondered which word is the best one to start with. Your gut feel would mostly say the vowels (a, e, i, o, u), so a word packed with vowels should be the best opening guess. I will take you through the journey instead of jumping straight to the solution. If you are restless, jump to the end of the article :)

So my first gut feel would be to pick a word with the most vowels. Is that enough? Let's dig deeper!

If you simply google the most used letters in English, you can get an initial analysis here. According to the Concise Oxford Dictionary (9th edition, 1995), the most used letters in English words are as below (E, A, R, I .. and so on).

Things are getting a little more interesting now! "R" has jumped into a list that our gut feel said should contain only vowels. But is this really the best estimate? And we still need an actual word to play, not just a list of the most used letters, right?

Here is one more thought: are we really interested in the frequency of letters across the whole language, or just a subset? We should be looking at letter frequency within five-letter words only, right? If we ever get a six-letter Wordle we can go there, but for now let's stick with five letters and find the frequencies.
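To make this step concrete, here is a rough sketch of counting letter frequencies over five-letter words only. It is not the exact notebook code: the function name `letter_frequencies` and the tiny `sample_words` list are made up for illustration, and in practice you would pass in a full dictionary word list.

```python
from collections import Counter

def letter_frequencies(words, length=5):
    """Return each letter's share of all letters in words of the given length."""
    # Keep only alphabetic words of exactly the requested length.
    selected = [w.lower() for w in words if len(w) == length and w.isalpha()]
    counts = Counter("".join(selected))
    total = sum(counts.values())
    if total == 0:
        return {}
    # most_common() orders letters from most to least frequent.
    return {letter: n / total for letter, n in counts.most_common()}

# Hypothetical toy list; "wordle" and "cat" are dropped by the length filter.
sample_words = ["pacts", "casas", "xerox", "sores", "arise", "wordle", "cat"]
freqs = letter_frequencies(sample_words)
print(list(freqs.items())[:5])
```

With a real word list, the top of this ranking is the five-letter counterpart of the whole-language ranking above.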

Hmmm, going by these five-letter frequencies we should use "AESOR", right? But there are two problems with that. First, we haven't considered positional occurrence probability (where to put 'a', where to put 'e'). Second, we have to form an actual word!

Let's try to capture letter occurrence by position.

Seems good. Let's summarize what we have done so far: across all five-letter words, we have calculated how often each letter occurs at each position, which will help us calculate an overall probability for any given word.
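The position-wise counting can be sketched roughly as below. Again, this is an illustration under my own naming (`positional_probabilities`, `sample_words`), not the notebook's code: for each of the five positions, it counts how often each letter appears there and divides by the number of words.

```python
from collections import Counter

def positional_probabilities(words, length=5):
    """Return one {letter: probability} table per letter position."""
    selected = [w.lower() for w in words if len(w) == length and w.isalpha()]
    if not selected:
        return [{} for _ in range(length)]
    tables = []
    for pos in range(length):
        # Count which letters occur at this position across all words.
        counts = Counter(w[pos] for w in selected)
        tables.append({letter: n / len(selected) for letter, n in counts.items()})
    return tables

# Hypothetical toy list; a real run would use the full five-letter dictionary.
sample_words = ["pacts", "casas", "xerox", "sores", "arise", "wordle"]
tables = positional_probabilities(sample_words)
for pos, table in enumerate(tables, start=1):
    best = max(table, key=table.get)
    print(f"position {pos}: most common letter {best!r} ({table[best]:.2f})")
```

Each table sums to 1, so an entry can be read directly as the probability of seeing that letter at that position.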

Partial snippet
Full file extract for location probabilities

The table above gives the probability of each letter at each position. A few interesting things here:

S is the most likely starting letter, with a probability of 0.11

S is also the most likely ending letter, with a probability of 0.198

Q is the least likely letter in the 4th or 5th position!

If the dictionary constraint were not present, "SARES" would be the most likely word
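Without the dictionary constraint, the most likely string is simply the top letter at each position. A minimal sketch of that idea follows; the helper `most_likely_string` and the three-position `toy_tables` are invented for illustration, and the numbers in them are placeholders, not values from the table above.

```python
def most_likely_string(tables):
    """Pick the highest-probability letter at each position, ignoring the dictionary."""
    return "".join(max(table, key=table.get) for table in tables)

# Placeholder tables with made-up probabilities (NOT the article's numbers).
toy_tables = [
    {"s": 0.5, "c": 0.3, "p": 0.2},
    {"a": 0.6, "o": 0.4},
    {"s": 0.7, "e": 0.3},
]
unconstrained = most_likely_string(toy_tables)
print(unconstrained)
```

Feeding in the real five-position tables is how a string like "SARES" falls out, whether or not it is a playable word.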

Now let's find which dictionary word is the most probable, given our constraints: five letters and a real dictionary word.

"CASAS" is the most likely five-letter dictionary word, but it has repeating characters

"XEROX" is the least likely dictionary word based on letter probabilities

"PACTS" is the most likely five-letter dictionary word with all unique letters, and probably the best option to start with
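Ranking dictionary words can be sketched as follows. One loud assumption here: I score a word as the product of its per-position letter probabilities, which is one reasonable way to combine them but is not spelled out above. The names `position_tables`, `word_score`, `rank_words` and the five-word `toy_words` list are hypothetical, and such a small list will not reproduce the results above.

```python
from collections import Counter
from math import prod

def position_tables(words):
    """One {letter: probability} table per position (words must share a length)."""
    n = len(words)
    return [{ch: c / n for ch, c in Counter(col).items()} for col in zip(*words)]

def word_score(word, tables):
    """Assumed score: product of the per-position letter probabilities."""
    return prod(table.get(ch, 0.0) for ch, table in zip(word, tables))

def rank_words(words, unique_only=False):
    """Sort words from most to least probable, optionally dropping repeated letters."""
    tables = position_tables(words)
    pool = [w for w in words if not unique_only or len(set(w)) == len(w)]
    return sorted(pool, key=lambda w: word_score(w, tables), reverse=True)

# Hypothetical toy dictionary; swap in a full five-letter word list.
toy_words = ["pacts", "casas", "xerox", "sores", "arise"]
ranking = rank_words(toy_words)
best_unique = rank_words(toy_words, unique_only=True)[0]
print("ranking:", ranking)
print("best word with unique letters:", best_unique)
```

The `unique_only` flag mirrors the last step above: a word with five distinct letters tests more letters in a single guess, which is why it makes the better opener.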

Here is the Colab link to my notebook. Go ahead and have some more fun!
