I don't got it
Some people have a knack for languages. They say annoying things like "Portuguese is just Spanish with an accent", or "we speak French to the kids". Fuck those people.
I was not allowed to take my French exams in secondary school for fear I would bring down the school test scores as well as my own (hi there hothouse British school). Instead I was made to study Latin (did I mention my ridiculous school?) because it had no speaking component to fail. Nevertheless, I barely passed, scoring the worst mark in my class.
Languages? I just don't got it.
When the love of my life informed me, therefore, that in order for us to marry and start a family I would need to learn Dutch, I realised I would die alone. I had already brushed off her family's requests - the Dutch have higher English literacy rates than the British! Isn't Dutch a dead language? I'm pretty sure even Dutch people just revert back to English when no one is looking. If a tree falls in a forest and no one is around to hear it, does it make a sound? No matter how I protested, my girlfriend remained dead serious.
So, I had to learn Dutch. But how? I had some major constraints:
I don't like teachers,
Other ways of learning a language don't work,
And I need to learn a language, sharpish.
I'm sure if you find your Mr Miyagi, teachers are great. But I never found mine. I was always getting in trouble at school. My primary school teacher said I needed 'squashing', but unfortunately she was so unsuccessful that I wound up getting kicked out of secondary school. Teachers were off the menu.
And everything else kind of sucks for actually learning. We all have a friend with a 500 day Duolingo streak who can't have a basic conversation. We all used those textbooks that tell you the items in your pencil case before they tell you how to order a beer. We all know that weeb whose identity became teaching abroad. I ain't doing it.
And why should I? Do the old ways even work? Estimates say it takes about 1000 hours to learn a language. That's three years doing an hour a day! Even if I had three years to spare I wouldn't spend it on Dutch.
I therefore had no choice but to examine, from first principles, how do you learn a language in the least amount of time possible, when you don’t want to?
An example of my then girlfriend’s ideal family arrangement.
What is a language, anyway?
Lots of time in language classrooms and textbooks and apps is spent on grammar and pronunciation and spelling and cultural context. I am going to claim that these are, in the scheme of things, complete bullshit.
A language is just words. While this sounds trivial, when you think about it it is a startling and underutilised fact. If you don't know the words, you can never speak the language. If you do know the words, you might get sentences upside down or say them funny, but you have a chance of saying something. There's no escaping it: to learn a language you have to learn the words.
So what words do you have to learn?
The good news is, you don't need to learn a lot of words to get started. Something like 1000 words make up about 75% of all the words used in regular sentences, and 2-3000 words will cover 95% of everyday conversation. The bad news is, you need to know a lot of words to get near 100%. Not a single native speaker knows 100% of the words in their own language, and they generally know 10-20,000.
Language schools have a system, CEFR, aimed at measuring fluency. And research suggests 3000 words are necessary for B1 aka 'Bad'. The final boss of CEFR, C2 aka 'Completed', needs about 5000 words. Knowing a word in this context means recognising and comprehending at least one word in a word family (run, ran, running = 1 word family). These aren't words you can actively use off the top of your head and conjugate perfectly. Research suggests your active vocabulary might be less than half your passive.
Want to speak a language? You've got to learn at least a few thousand words.
So, I thought, may as well start there.
Memorising 10,000 sentences
I would be going to my in laws for Christmas, and I wanted to surprise them by speaking Dutch.
After thinking long and hard about ways of learning Dutch that didn't require me to know any Dutch words, I realised it could not be avoided. So I committed like a ten year old microwaves a fork. Over the next few months, I went on a vocabulary spree and memorised roughly 5,000 Dutch sentences, each testing a different fill-in-the-blank word. There was no fancy gamification, no teacher, no cultural exchange. Just raw dogging rote memory.
Christmas came, and I was ready to test my theory. I felt quietly confident. I gave my usual apologies for only speaking English at the start of the evening, and waited for conversation to switch to Dutch as is both reasonable and predictable at a Dutch family Christmas. I had my interjection rehearsed - I would crack a timely joke in Dutch for the big reveal. Unfortunately, I understood nothing. What my girlfriend failed to mention is that her family speaks Twents, the native tongue of the Dutch region Twente. Though technically a Dutch dialect, Twents is gobbledygook even to native Dutch speakers. It is perhaps gobbledygook even to Twents speakers - interactions in Twents seem more like a mutual dare than a genuine form of communication. Fail to understand a random mutter of guttural vowels? You must be a poser from the city.
I remained silent for the meal, and when we got in the car home, explained to my girlfriend that it was a lost cause - I had memorised thousands of words but didn't recognise a single one. She explained that her family spoke Twents, and said I should try again, with her parents, in Dutch. Annoyed at her omission but not wanting to have wasted my time memorising 5,000 useless sentences, the next day I tried again. And this time it worked. I switched into Dutch mid sentence, and her father cried in joy. I spoke Dutch!
Now, I don't want to overstate things - I spoke Dutch like an idiot. But I spoke Dutch. I had, for some definition, learned a language. This had never happened to me before. And my excitement about this carried me to memorise >10,000 sentences, which got me fluent enough to stop caring. I still make mistakes all the time (de or het anyone?), but I speak Dutch enough to go on holiday with friends and family, and for us to never have to switch to English even if we could. I am now successfully married to my Dutch wife with a kid who speaks Dutch at home, even if I'm a little rusty. I don't know what CEFR this is but, for me, that is enough.
You're probably thinking, "this Alex guy is full of shit. Who would learn a language by rote memorising sentences?". Me. I would do that. It took something like 120 hours over a year to learn 10,500 fill in the blank sentences, and about 60 more to get them into long term memory, roughly a minute per sentence overall. I reviewed more than 115,000 flashcards, studying 80% of the days and averaging >237 reviews at a time. This is a lot of time, but 150 hours normally gets you to A1 aka Amoeba in CEFR. Here are the receipts from the flashcard software I used:
The first principles of language learning
You might think that memorising sentences is about the dumbest possible way to learn a language. Why else do we have all these teachers and apps and textbooks? My theory is that learning a language is very simple but very boring. Teachers and apps and textbooks are much less boring, and make you feel like you're learning without you having to do the painful actually learning part. Duolingo doesn't care whether you learn French, they care whether you use Duolingo, and so over time Duolingo has gravitated towards whatever makes you keep your subscription. People don't subscribe to very boring apps.
To understand why my method worked, and how to do it yourself, you need to understand some fundamental truths of language learning. I'll explain them below, from most important to least important.
Vocabulary is King
You can't speak a language fluently without somehow memorising thousands of words. So your initial focus should be on memorising those words. Here's a rough guide:
# of Word Families / Words | % Coverage of Text | Description / Text Type |
---|---|---|
3,000 | ~95% | Informal spoken / scripted dialog (movies, TV, soap operas) |
4,000 | ~96% | Academic spoken English (lectures/seminars) |
5,000+ | ~98% | Informal spoken / scripted dialog |
8,000 | ~98% | Academic spoken English |
Frequency is Queen
You can get more coverage from more common words. If you learn the 1,000 rarest words in a language, your coverage will still be close to 0. It makes sense, then, to learn the words according to a Frequency List. Frequency Lists are lists of words from most to least common in a language, and they're easy to find for most languages.
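To make the coverage idea concrete, here is a minimal sketch that builds a frequency list from some text and reports what share of all word occurrences the top N words account for. The `tokenize` and `coverage` helpers and the tiny sample corpus are made up for illustration; a real run would use a large corpus in your target language and a proper tokenizer.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and pull out runs of letters (a crude tokenizer)."""
    return re.findall(r"[^\W\d_]+", text.lower())


def coverage(counts, top_n):
    """Share of all word occurrences covered by the top_n most common words."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    covered = sum(count for _, count in counts.most_common(top_n))
    return covered / total


# Hypothetical toy corpus; swap in a real sentence dump for meaningful numbers.
corpus = "the cat sat on the mat. the dog sat on the floor"
counts = Counter(tokenize(corpus))

for n in (1, 3, len(counts)):
    print(f"top {n} words cover {coverage(counts, n):.0%} of the text")
```

Even in this toy corpus the curve is steep at the start and flat at the end, which is the whole argument for learning in frequency order.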
There is some dilemma with Frequency Lists about whether to count every word form separately (run, running, ran = 3) or to count lemmas, the dictionary form that groups them (run, running, ran = 1). From my own experience, you can just learn the most common word forms, roughly in order, until around 5k. At this point you'll start to be able to converse and read things. 10k was enough for me to feel that my vocabulary was not an issue.
You do not need to start with words that 'feel' basic but aren't common, or avoid words that feel harder but are common. Children learn words in order of conceptual complexity, because they are limited by their conceptual understanding. As an adult, you have all the concepts. So you should learn words in the most logical way. If 'enterprise' is more common than 'rhino', learn enterprise.
NB. This is one of the reasons why learning Chinese/Japanese characters is tough. Kids have to learn them in order of their conceptual complexity, which means they learn the Kanji in a somewhat random order. Systems like Heisig's Remembering the Kanji allow you to learn radical (component) by radical.
Learn vocab in context
You never want to learn vocabulary in isolation, only in sentences. "Miss" might mean "not have" or "fail to catch" or "young lady" or "long for". You need to intuit that you can't say "I missed my keys" and you can say "I missed the bus". These intuitions only come from seeing words in their proper context, in sentences.
Memorise actively
The most important lever you have in memorisation is the Testing Effect. Anything passive, like reading, will be much less useful for memorisation than anything active, like being repeatedly asked "what's dog in Spanish?".
Ultimately, speaking is a test of your active vocabulary, vocabulary you can spontaneously produce. Passive vocabulary, vocabulary you can comprehend, is useful but should be a byproduct of trying to increase your active vocabulary.
I recommend that you use Cloze Test type cards: that way you can learn vocabulary in context while being tested on it. Think "Today, I went to the ___________ and bought some milk and eggs. I knew it was going to rain, but I forgot to take my ________, and ended up getting wet on the way." I like to have an English translation as the blank, as it removes ambiguity and research is mixed on whether staying in your target language entirely is good or bad.
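As a rough illustration of that card style, the sketch below swaps the target word for its English translation to make the front of a card, and keeps the full sentence for the back. The function name `make_cloze` is hypothetical, not something from Anki or any other tool.

```python
import re


def make_cloze(sentence, target, hint):
    """Return (front, back): front has `target` replaced by `hint`, back is the full sentence."""
    pattern = re.compile(r"\b" + re.escape(target) + r"\b", re.IGNORECASE)
    if not pattern.search(sentence):
        raise ValueError(f"{target!r} not found in sentence")
    # A function replacement keeps any backslashes in the hint literal.
    front = pattern.sub(lambda match: hint, sentence, count=1)
    return front, sentence


front, back = make_cloze(
    "Ze moeten hun regenlaarzen niet vergeten.", "regenlaarzen", "wellies"
)
print(front)  # Ze moeten hun wellies niet vergeten.
print(back)   # Ze moeten hun regenlaarzen niet vergeten.
```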
Memorise with Spaced Repetition
Memorising thousands of words is very time consuming. We can make it take 10-30% of the time of conventional methods by using Spaced Repetition Software (SRS). Basically, SRS will test you over and over, and learn both how good you are at memorising things, and how tough the individual things you're trying to memorise are.
For discrete, large scale memorisation tasks like thousands of vocabulary in context, SRS is kind of a miracle. It's hard to overstate how useful SRS is if you want to memorise things. It is the most important innovation in memorisation and learning, and you should be using it.
Amazingly, the best software for this is open source - Anki running the FSRS algorithm. Sadly, Anki has quite a steep learning curve. No one has kept what makes Anki great while making SRS more accessible, though most of the big apps started out with this premise.
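If you have never seen spaced repetition up close, the toy scheduler below shows the basic shape: get a card right and the gap before you see it again grows, get it wrong and it comes straight back. This is emphatically not FSRS or anything Anki actually runs; the doubling rule, the `Card` class and the `review` function are assumptions chosen to keep the sketch short.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Card:
    front: str
    back: str
    interval_days: int = 0  # 0 means never answered correctly (or just failed)
    due: date = date.min    # new cards are due immediately


def review(card, remembered, today, growth=2):
    """Toy rule: multiply the interval by `growth` on success, reset it on failure."""
    if remembered:
        card.interval_days = 1 if card.interval_days == 0 else card.interval_days * growth
    else:
        card.interval_days = 0  # failed cards are due again the same day
    card.due = today + timedelta(days=card.interval_days)
    return card


def due_cards(cards, today):
    """Cards whose due date has arrived."""
    return [card for card in cards if card.due <= today]


card = Card("Ze moeten hun wellies niet vergeten.", "regenlaarzen")
day = date(2024, 1, 1)
for remembered in (True, True, True, False):
    review(card, remembered, day)
    print(day, "->", card.due, f"(interval {card.interval_days} days)")
    day = card.due
```

Real schedulers also model how hard each card is and how forgetful you are, which is where the time savings come from.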
Learn incrementally
In order to memorise items quickly, we want to learn the least amount of new information possible per sentence. We don't want sprawling sentences with lots of new words. Ideally, we have reasonable length sentences (you can look up the average for your language) with only one word we don't know. This is sometimes called i+1 learning.
When you don't have any vocabulary at all, this is hard to achieve. But it gets easier as you go along. Early sentences might look something like this: The cat sat on the mat. > The dog sat on the mat. > The dog sat on the floor. > The dogs sat on the floor.
At the start this is near impossible. When your i is close to 0, learning i+1 is incredibly tough. Many sentences will contain multiple words you don't understand. Do not be disheartened. Brute force the first 100-500 words and it will get easier.
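Picking i+1 sentences is easy to automate once you have a set of words you already know. Here is a small sketch, with hypothetical helper names, that keeps only the sentences containing exactly one unknown word, using the cat and dog sentences from above.

```python
import re


def words(sentence):
    """Set of lowercase words in a sentence (crude letter-run tokenizer)."""
    return set(re.findall(r"[^\W\d_]+", sentence.lower()))


def unknown_words(sentence, known):
    """Words in the sentence that are not in the `known` set."""
    return words(sentence) - known


def next_sentences(sentences, known):
    """Sentences with exactly one unknown word: the i+1 candidates."""
    return [s for s in sentences if len(unknown_words(s, known)) == 1]


known = words("The cat sat on the mat.")
candidates = [
    "The dog sat on the mat.",
    "The dog sat on the floor.",
    "The dogs sat on the floor.",
]

print(next_sentences(candidates, known))
known |= {"dog"}  # pretend we have now learned "dog"
print(next_sentences(candidates, known))
```

Note that this treats "dog" and "dogs" as different words, which matches learning by word form rather than by word family.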
Go fast
We should be trying to go fast when testing if we know vocab, even if that means we fail more often. We’re trying to make recalling words automatic, not deliberate. Studies show automaticity is a key measure of fluency and we should be measuring retrieval speed.
As per my stats, my average review time for a piece of Dutch vocabulary is <5s. I either know it or I don't.
Never listen to non natives
If you want to learn a native accent, you should only ever hear native speakers. Every sentence you encounter should have audio to go with it and you should try to emulate that when you read it. You want to bake the lilt of the language into your brain.
One reason foreigners sound so foreign is because they practice together. If you are copying mispronunciations, you're destined to mispronounce.
What about grammar?
Don't worry about it. Whether or not you can spontaneously produce correct grammar doesn't depend on whether you can slowly get grammar exercises right in a book. The best way to get grammar is to 'feel' grammar through massive amounts of exposure. If you can't figure out how a sentence works by reading through it, look up how it works. But don't practice grammar - your time is going to be much better spent expanding your vocabulary in context, and thus also your exposure to grammar, than by completing grammatical exercises.
What's the catch?
Following these principles is, to my knowledge, absurdly boring. And adherence - sticking to the plan - is necessary to make progress. Learning a language is like bodybuilding: you might know how to pick things up and put things down, but the only way to get strong is to do it. That's the catch.
Making this practical
You might have noticed that teachers, textbooks, and apps break the first principles of language learning. There is no perfect solution to this (yet). Learning is hard work, and people don't pay to work hard. The good news is, you are not people. You are here and therefore you are different. I believe in you.
The test-it-out-to-see-if-you-like-it way
Clozemaster is the closest approximation to my method that's available. Sadly it tests multiple choice (passive recall) or typing (slow) rather than Cloze deletion, is automatically generated so can be janky, and doesn't use FSRS, but it's otherwise not bad.
The nerdy way
This is pretty manual, but free and you can customise it to your heart's content. For an example or a starting point, you can download my English to Dutch deck and a Dutch/English to Spanish deck here.
Creating a deck
Download a sentence list from Tatoeba.org or Open Subs in your target language.
Create a frequency list based on this, or use one available online.
In a spreadsheet (example here) or with code:
Sort the sentences by the hardest word in them, with some additional variables for sentence length and average word difficulty. Call this the ‘difficulty score’.
For each word in the frequency list, find the lowest difficulty score sentence for that word.
Make a cloze sentence by blanking out the word.
You now have a frequency list with example cloze sentences for each word.
If you are missing translations, add them in with something like DeepL or even Sheets translate. It doesn't matter much because you won't rely much on them.
Set up your Anki cards in the below format.
Import as a CSV to Anki.
Generate AI audio with this Anki Add On. AI audio is pretty good now. Much closer to the pronunciation you want than non natives and Twents speakers.
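If you would rather do the spreadsheet part with code, the steps above can be sketched roughly as follows. Everything here is an assumption made for illustration: the function names, the difficulty weights, the `_____` blank, the column order, and the three toy sentence pairs. A real run would read a Tatoeba or subtitle export instead, and you would tune the weights until the early cards feel right.

```python
import csv
import io
import re
from collections import Counter

LETTER = r"[^\W\d_]"  # any Unicode letter


def tokenize(sentence):
    return re.findall(LETTER + "+", sentence.lower())


def build_ranks(sentences):
    """Frequency list from the sentences themselves: most common word gets rank 1."""
    counts = Counter(word for s in sentences for word in tokenize(s))
    return {word: rank for rank, (word, _) in enumerate(counts.most_common(), start=1)}


def difficulty(sentence, ranks, length_weight=0.1, mean_weight=0.5):
    """Rank of the hardest word, nudged by length and mean rank (arbitrary weights)."""
    tokens = tokenize(sentence)
    if not tokens:
        return float("inf")
    token_ranks = [ranks.get(word, len(ranks) + 1) for word in tokens]
    mean_rank = sum(token_ranks) / len(token_ranks)
    return max(token_ranks) + length_weight * len(tokens) + mean_weight * mean_rank


def build_deck(pairs, ranks):
    """pairs: (sentence, translation). One row per word, using its easiest sentence."""
    easiest_first = sorted(pairs, key=lambda pair: difficulty(pair[0], ranks))
    rows = []
    for word in sorted(ranks, key=ranks.get):
        # Match the word only when it is not glued to other letters.
        pattern = re.compile(
            f"(?<!{LETTER}){re.escape(word)}(?!{LETTER})", re.IGNORECASE
        )
        for sentence, translation in easiest_first:
            if word in tokenize(sentence):
                cloze = pattern.sub("_____", sentence, count=1)
                rows.append((cloze, sentence, word, translation))
                break
    return rows


def to_csv(rows):
    """Serialise rows as CSV text, ready to save and import."""
    buffer = io.StringIO()
    csv.writer(buffer).writerows(rows)
    return buffer.getvalue()


# Hypothetical sample data standing in for a real sentence list.
pairs = [
    ("De kat zit op de mat.", "The cat sits on the mat."),
    ("De hond zit op de mat.", "The dog sits on the mat."),
    ("De hond zit.", "The dog sits."),
]
ranks = build_ranks([sentence for sentence, _ in pairs])
deck = build_deck(pairs, ranks)
print(to_csv(deck))
```

Ranking by the hardest word first means a sentence only shows up once its rarest word is the thing being taught, which keeps most cards close to one new word each.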
Reviewing
Review diligently for some amount of time you can sustain almost every day (eg 10 minutes a day). Don't do more, the SRS backlog will punish you for your ambition.
Ask ChatGPT for explanations and don't be afraid to change / delete sentences you don't like.
Do reviews quickly, maximum 10 seconds per review.
Only use "Again" and "Good".
Again is for when you didn't understand the sentence or couldn't produce the right word quickly.
Good is for when you understood the sentence and could produce the right word quickly.
Read the sentence out loud as you review and listen to the native speaker.
Don't give up.
Card formats
You want the front of your card to look like this, using the Dutch word "regenlaarzen" (a sadly very common word in the Netherlands):
_____
Ze moeten hun wellies niet vergeten.
_____
And the back to look like this:
_____
Ze moeten hun regenlaarzen niet vergeten.
regenlaarzen
wellies
A hint / translation
*The audio of the sentence plays.*
_____
The flip from English to target language for the target word can be done automatically. You can download my English to Dutch deck and a Dutch to Spanish deck here. You can copy the card formatting and setup to get the automatic language switching.
FAQ
Doesn't [Insert App / Teacher / Thing] do it better?
Potentially. I'd love to hear about it if they do! If you can find a real flesh and blood adult who actually learned to speak a language in a reasonable time through using an app / teacher / thing, tell me.
Isn't memorising lots of words boring?
Reeeeallly boring, yep. It's hard to keep up.
Isn't it a problem that it's boring?
Definitely. In the end, if there were a way to make it not boring, that would be a very successful tool.
Won't AI make learning languages irrelevant?
Quite probably.
Won't AI make learning languages easier?
I hope so!