The L.A.R.R.I. Project
I have always struggled to understand how my own voice functions. Over the years, I have found it difficult to acquire the procedural knowledge needed to get my voice to do what I want. I often avoided the subject of vocal anatomy, and the question of exactly which parts are responsible for which actions when I vocalize. At the same time, I was attempting to “learn” my own voice, guided by the theory that it is an instrument, and that “playing” this instrument necessarily involves having some concept of how it functions. Most of the vocal tract is, after all, invisible not only to my sense of vision, but also to my senses of touch, proprioception (sensing where my body parts are), and kinesthesia (sensing my own movements). After two decades of study and practice as a professional singer, and even as the author of a technical approach grounded in position and motion, I must admit that I still rely primarily on hearing to consistently elicit a particular coordination. It’s true that I can change my behavior as a vocalist more efficiently by inferring the actual movements of the relevant body parts. But my voice isn’t an instrument after all, and consequently, knowing the parts of my own vocal tract doesn’t serve the same purpose as knowing the different parts of the violin. A violinist can use knowledge of this sort to make a clear decision to play near the bridge, for instance. In that case, the knowledge directly informs the player’s decision, and applying it to any musical situation is perfectly straightforward: see the bridge, now just play close to the bridge. Even if we remove the player’s sense of sight, it’s no less straightforward: locate the bridge by feel, play close to the bridge. Playing close to the bridge always produces the same effect, and any outside observer can confirm that the player is indeed playing near the bridge. Nobody argues about whether a player really played near the bridge or not. 
The opposite is true for singing. Before ever picking up “the instrument”, a singer is already vocalizing in everyday life. These vocalizations involve specific and consistent positions and motions, but the vocalist is not necessarily consciously aware of either. A vocalist can learn which body parts need to move, how they need to move, and where they need to go – but when they attempt to do it, they simply revert to whatever vocal coordination they already associate with the phrase.

No system or training method I have come across can alter the way in which I seem to store and recall the memories of my “vocal actions” and their associated sounds, which I call “vocalizations” or “vocal coordinations”. These compound memories are sounds with appended sensations, emotions, and intentions. As a student, I was exposed to all the usual visual aids and explanations meant to teach how the larynx and the rest of the vocal tract function, and I was given lists of (often conflicting) sensations, the experience of which I was told would indicate that I was safely pursuing the correct approach. One teacher insisted I should “feel it speak”, while another suggested that I “spin the sound”. A third presented classical singing as a Zen koan: open the throat, but do not sing from the throat (the true voice can only be released when the student finally achieves (and does not achieve) non-duality as the throatless throat, which acts without action). Yet another insisted I should place the sound in my mask, while relaxing everything. These instructions (or lack thereof) to somehow compel and then “experience” ineffable sensations by putting on inscrutable intentions seemed to help precisely nobody achieve exactly nothing, but they were nonetheless superior in most cases to the results of pedagogies whose authors claimed to have a “biomechanical” or “acoustic” basis for recommending this or that approach. Vague imagery might be useless, but bad science can be downright hazardous, especially when combined with big expectations.
So what is the function of modeling in the pursuit of vocal coordinations? If it doesn’t directly inform the vocalist’s decisions about position and motion, why do we need it anyway? My answer is that although it’s not necessary for vocalists to have personal knowledge of vocal anatomy and function, it is an obligation for pedagogues to say precisely what they mean, so that it can be tested by others. It is therefore beneficial to the entire community if someone makes it easier for anyone to show precisely what they mean, in 3-D. The L.A.R.R.I. project aims to make it not only easy and free to precisely illustrate intended vocal coordinations – it aims to make it so simple that it becomes virtually compulsory.
What is modeling in the first place? A model is an imaginary construct that represents objects, systems, and their functions. If bad modeling is a problem, can’t we simply not use models at all? Unfortunately, we don’t get to decide whether we will use some sort of mental modeling to think about and practice singing. Although we can’t use our mental model of the vocal tract in the same way a violinist can use their own map to play close to the bridge, we can use it to influence our development, and we can use an improved mental map to make improved guesses when we are “solving” a particular phrase or even an entire role. Improved guesses are still guesses, and the basic process of trial and error, listening and imitation, is not replaced. It is improved.
There are reasons why bad modeling of real physical properties can hinder the user so much. In my experience, and as reported by countless others, proprioception, kinesthesia, and interoception in general seem to be somewhat plastic. In the short and medium term, we might experience them as consistent from moment to moment, with some notably abrupt shifts. But especially over the long term, it’s possible (and almost inevitable) to influence how we perceive and feel the position and motion of (especially) unseen body parts. Pedagogues who rely on vague imagery avoid any problems arising from the spoiled proprioception and kinesthesia that can result from flawed or misinterpreted anatomical and biomechanical models of the vocal tract and its functions. But this is not to say that they and their students do not rely on the same kind of mental mapping as their “science-based” counterparts. Following one method or another can influence the map, but it cannot alter the fundamental relationship between that map and the user’s sense of where they are and what shape they are in; the former is a direct and immutable consequence of the latter, and a fact of existence for the user. If the teacher’s influence on the student’s proprioception is weak, there is some chance that new (and commercially viable) vocal coordinations arise (more or less) spontaneously in the studio or during solitary practice, no matter what the teacher is actually “saying”. But if the teacher’s influence is strong, then the student’s results will be more closely tied to the quality of the teacher’s modeling and the specific details therein. Anatomical diagrams carry the weight of authority, and few students would (or should) challenge the validity of models created by experts. Medical diagrams, therefore, greatly enhance the teacher’s influence over the results of the student, for better or for worse. 
The unfortunate result is that the better educated a professional voice teacher is, the more consistently they seem to lead their students towards poor performance habits. New School Singing aims to reinvigorate the natural tension and competition between subjective (or “artistic”) pedagogies and their objective (or “scientific”) counterparts, by offering better modeling to improve the competitiveness of facts. “Acoustic Vocal Pedagogy” and all the previous iterations of the classical singing pedagogy I call “New York School” have consistently failed to get good results for the vast majority of their clients, not because science is powerless to help singers, but because pedagogues in general have consistently failed to rigorously field-test their theories. As a result, there has been no incentive for the pedagogy industry to improve its modeling over time. Its members are too busy justifying the results and defending the inclusion of anything scientific in the first place. Objective truth, as a result, has lost its credibility in the marketplace of ideas, particularly in the somewhat specialized field of classical singing.
When I was a university student, I was introduced to the same medical diagrams as everyone else, and as a result I was just as dumbfounded when it came to understanding the form and function of all the parts of my voice. The problem was not me, and it wasn’t the teachers. It was the models. We were told, for example, that the thyroarytenoid (TA) and cricothyroid (CT) muscles work in opposition to decide whether a sound is more “chesty” or more “heady”, according to the relative “dominance” of each. The CT and TA, in this model, are compared to other muscles with antagonistic relationships, for instance the biceps (flexion) and the triceps (extension). Rather than flexing and extending the arms, these muscles are said to shorten (TA) and lengthen (CT) the vocal folds.

The TA, in this model, shortens the vocal folds, and the CT lengthens the vocal folds, and this is analogous to the action of the biceps and triceps. Frustratingly, when I looked at anatomy diagrams expecting to see these muscles configured to pull the same joint in opposite directions, I instead noticed that the CT and TA don’t control the same joints. The TA muscles, as their name implies, connect the thyroid cartilage to the arytenoid cartilages, one on either side.
The shape of the vocal folds is governed by five cartilages that form four joints, or, if one prefers, one compound joint: the thyroid cartilage, the two arytenoid cartilages, the cricoid cartilage, and the epiglottic cartilage. This joint “complex” can be thought of as having four subsystems that govern the “basis” for vocal fold shaping: thyroarytenoid (working as a pair), cricoarytenoid (working as a pair), aryepiglottic (working as a pair), and cricothyroid.
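To make the point about joints concrete, the four subsystems can be written down as a small lookup table of which cartilages each one connects. This is a minimal Python sketch under a simplifying assumption: each subsystem is reduced to the pair of cartilages it spans, and the left and right arytenoids are collapsed into a single entry. The names `ATTACHMENTS`, `spans_same_joint`, and `muscles_acting_on` are hypothetical and are not part of L.A.R.R.I. In this table no two subsystems connect the same pair of cartilages, which is the sense in which the CT and TA cannot act like a simple antagonist pair across one joint.

```python
from typing import Dict, FrozenSet, List

# Hypothetical, simplified attachment table: each subsystem is mapped to the
# two cartilages it connects. Left and right arytenoids share one entry.
ATTACHMENTS: Dict[str, FrozenSet[str]] = {
    "thyroarytenoid": frozenset({"thyroid", "arytenoid"}),
    "cricothyroid": frozenset({"cricoid", "thyroid"}),
    "cricoarytenoid": frozenset({"cricoid", "arytenoid"}),
    "aryepiglottic": frozenset({"arytenoid", "epiglottis"}),
}


def spans_same_joint(muscle_a: str, muscle_b: str) -> bool:
    """True if both subsystems connect exactly the same pair of cartilages,
    i.e. they could in principle pull one joint in opposite directions."""
    return ATTACHMENTS[muscle_a] == ATTACHMENTS[muscle_b]


def muscles_acting_on(cartilage: str) -> List[str]:
    """Alphabetical list of every subsystem in the table attached to a cartilage."""
    return sorted(m for m, ends in ATTACHMENTS.items() if cartilage in ends)


if __name__ == "__main__":
    print(spans_same_joint("thyroarytenoid", "cricothyroid"))
    print(muscles_acting_on("arytenoid"))
```

Even this crude table shows three separate subsystems pulling on the arytenoids, none of which is the CT.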
The arytenoids are shaped like 3-sided pyramids, and are attached at their base to the top of the cricoid cartilage. At the base of the anterior (front) side of each arytenoid cartilage, a needle-like process (a technical term for a projection or outgrowth that is a feature of a body part), the vocal process, provides an attachment point for each vocal fold. When the vocal process moves away from the central notch located at the front of the thyroid, the tissue that connects them (the vocal folds) becomes stretched. This is one way (but not the only way) that the pitch of the voice can be raised.
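The stretching described above can be sketched as a change in distance between two points. This minimal Python sketch assumes that a vocal fold can be approximated as a straight line from a fixed anterior attachment to the vocal process; real folds are not strings, and every coordinate below is an illustrative placeholder in millimetres, not a measurement. The names `ANTERIOR_ATTACHMENT`, `fold_length`, and `stretch` are hypothetical. The sketch only computes length; it says nothing about pitch.

```python
import math
from typing import Tuple

# (x: left-right, y: front-back, z: up-down), in millimetres.
Point3 = Tuple[float, float, float]

# Hypothetical anterior attachment near the front of the thyroid cartilage.
ANTERIOR_ATTACHMENT: Point3 = (0.0, 0.0, 0.0)


def fold_length(vocal_process: Point3) -> float:
    """Straight-line distance from the anterior attachment to one vocal process."""
    return math.dist(ANTERIOR_ATTACHMENT, vocal_process)


def stretch(before: Point3, after: Point3) -> float:
    """Change in fold length when the vocal process moves (positive = stretched)."""
    return fold_length(after) - fold_length(before)


# Placeholder positions for illustration only.
rest: Point3 = (4.0, 15.0, 0.0)
moved_back: Point3 = (4.0, 17.0, 0.0)  # vocal process 2 mm further from the front

print(f"stretch: {stretch(rest, moved_back):.2f} mm")
```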
The thyroid cartilage is free to move in any direction, but only a very small distance, since the thyroid and cricoid cartilages are connected by ligaments as well as muscles, and the cricoid itself is attached to the trachea. Both cartilages are also attached to the pharynx, so elevating or depressing the larynx is by definition a movement of the pharynx, which further limits their ability to move independently of one another. The arytenoid cartilages are set on top of the cricoid, but the surface on which they glide and rotate is not flat, so that “when the arytenoids glide away from each other they also glide in an inferior direction. When they glide towards each other, the opposite occurs. Consequently, this leads to shortening or lengthening of the vocal cords during vocal adjustments” (source: see hyperlink). Although this perhaps explains why thyroid tilting is observed during lighter vocalizations, it is plainly evident, even before taking vocal acoustics into consideration, that it’s not logical to explain all laryngeal gestures involving pitch change as a balancing act between the thyroid and cricoid alone. This may also explain why nobody can actually feel the CT-TA antagonism they claim to use in their training and in performance. You can’t feel it because it doesn’t exist. But if a pedagogue creates the expectation that you ought to be able to feel it, there is essentially a perception waiting for an event, and whatever that event is, it will be identified as CT-TA antagonism in some way, and the vocalist will build a theory from there. If necessary, the theory can simply carry a contradiction, relying on the mind’s tendency to adapt to the situation (mainly via cognitive dissonance). Often, this appears to “work” for a time, because it is sometimes better to do something than to wait for the perfect idea of what to do. 
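The quoted description of arytenoids gliding on a sloped surface can be sketched as motion along an inclined plane. This minimal Python sketch assumes a single flat facet tilted downward by a hypothetical angle `slope_deg`, with +x pointing away from the midline and -z pointing inferiorly; the function name `glide_on_facet` and all coordinates are illustrative assumptions, not anatomical data. Gliding away from the midline lowers the point, and gliding toward it raises the point, as the quotation states.

```python
import math
from typing import Tuple

# (x: lateral, y: front-back, z: up-down), in millimetres; -z is inferior.
Point3 = Tuple[float, float, float]


def glide_on_facet(position: Point3, distance: float, slope_deg: float) -> Point3:
    """Move a point along a facet that slopes downward as it runs laterally.

    distance > 0 glides away from the midline (and therefore downward);
    distance < 0 glides toward the midline (and therefore upward).
    slope_deg is a hypothetical facet angle below the horizontal.
    """
    theta = math.radians(slope_deg)
    x, y, z = position
    return (x + distance * math.cos(theta), y, z - distance * math.sin(theta))


# Placeholder values for illustration only.
start: Point3 = (4.0, 15.0, 0.0)
apart = glide_on_facet(start, 2.0, 30.0)
together = glide_on_facet(start, -2.0, 30.0)

origin = (0.0, 0.0, 0.0)  # hypothetical anterior attachment of the fold
for label, p in (("start", start), ("apart", apart), ("together", together)):
    print(f"{label}: z = {p[2]:.2f} mm, fold length = {math.dist(origin, p):.2f} mm")
```

Because the vertical change is tied to the lateral glide, the same movement changes the straight-line fold length in this toy geometry, which is the point the quotation makes.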
The human voice is directly attached to a human brain, and it often finds a way to bend itself in the direction of our goals, even though we do our best to hinder it. This is not a good reason to be satisfied with inadequate modeling! The sensations felt by the vocalist are real, and therefore any corrupted perceptions are necessarily stealing perfectly good sensory feedback from the vocalist. Those sensations could be helpful, if only they could come to represent real events. That potential is progressively unlocked when models are continuously updated according to the latest information.
Expressing the theory of TA-CT antagonism in three dimensions can help us understand why the thyroarytenoids and cricothyroids alone don’t control the behavior of the vocal folds. If we remove the thyroid cartilage, it is rather easy to see that the popular theory of CT-TA antagonism is problematic not because it is too simple, but because it is not plausible. There are obviously quite a few bare faces here, especially on the back and sides of the cricoid and arytenoid cartilages. The thyroarytenoids (yellow) and cricoarytenoids (fuchsia) don’t seem to be in a position to accomplish what is required: positioning the vocal process in relation to the central notch of the thyroid. The CT and TA shown above can’t explain, for instance, how the arytenoids keep from bending uncontrollably forward, or how they are rotated to establish contact between the vocal folds.

The cricothyroids connect the cricoid cartilage to the thyroid cartilage from the front and sides. They surely work together with the thyroarytenoids to HELP control the vocal folds, but the same can be said of the cricoarytenoids (lateral AND posterior), the interarytenoid muscle, and the oblique arytenoid and aryepiglottic muscles. The animated depiction above can help us visualize these missing components. What sort of gestures can you picture that might require tension on some or all of these muscles?
If the goal is control of the vocal folds, it simply can’t be expressed as a balancing act between the cricothyroid and thyroarytenoid muscles. They are certainly included and are important players, but they are not, by themselves, a complete team. Certain vocal pedagogies hold that the cricoid itself is capable of tilting to stretch the vocal folds. However, this is simply not plausible, given that the cricoid is not free at its base but is instead attached to the trachea and to the pharynx. The esophagus passes between the cricoid and the pharynx, and, together with the cricopharyngeus (alternatively, the inferior pharyngeal constrictor), forms the upper esophageal sphincter (the term refers to its function rather than its form). It’s also helpful to keep in mind that the thyroid cartilage, while it is certainly able to tilt to some extent, has a very small range of motion relative to the cricoid.

It’s also valid to extend our consideration to the extrinsic muscles of the larynx: the tongue, the pharyngeal constrictors, and the infra- and suprahyoid muscles. Some pedagogues, particularly in classical singing, have even taken to advising students not to assist laryngeal gestures with extrinsic musculature, claiming (without any basis in fact) both that doing so is inherently unhealthy and that avoiding it is possible in the first place.
The arytenoids are influenced by the intrinsic and extrinsic muscles we have seen so far: the thyroarytenoids, the cricoid on which they sit, the cricoarytenoids (lateral for closing and posterior for opening), and the interarytenoids (which help them glide together). But we are not finished yet, as we’ve left the apices (tops) of the arytenoids rather bare. Each apex is also controlled through a connection to the epiglottis (and therefore to the hyoid and tongue) via the aryepiglotticus muscles and mucosa (and, as a consequence, the thyroepiglotticus muscles), and through the oblique arytenoids (whose fibers extend to form the aryepiglotticus). In some anatomical descriptions, the aryepiglotticus and oblique arytenoid are treated as parts of a single muscle; taken as a pair, they form an X-shaped structure, somewhat like a chromosome. When the muscle fibers of this X shape are shortened, they must contribute not only to narrowing the interarytenoid gap, but also to narrowing the laryngeal vestibule, as part of the same gesture. Behaving essentially like the mouthpiece of a trumpet, this “epiglottic funnel” is responsible for seemingly magical acoustic phenomena. It’s these same nonlinear interactions, I believe, that lead to such far-fetched theories of what is actually going on during clear, loud, and resonant singing. Although nonlinear source-filter interactions are outside the scope of this article, it is worth mentioning that the structures described here do not seem to produce these effects by accident. It’s a fascinating natural construction, one that implies nature really means it when it comes to supraglottic narrowing. That all sounds rather complicated, but if we observe the animated diagram below (an animation depicting narrowing of the “epiglottic funnel”), we can see that all this makes intuitive sense. 
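The claim that shortening the X-shaped fibers narrows two gaps in one gesture can be illustrated with a toy geometry. This minimal Python sketch assumes two mirror-image straight fibers whose lower ends stand in for the arytenoid level and whose upper ends stand in for the epiglottic level, with the heights held fixed while every end point is drawn toward the midline by a single factor `k`. The function `contract_x` and all numbers are hypothetical placeholders, not measurements or a description of how the real muscles attach.

```python
import math
from typing import Tuple


def contract_x(lower_gap: float, upper_gap: float, height: float,
               k: float) -> Tuple[float, float, float]:
    """Toy model of two mirror-image fibres forming an X.

    Fibre 1 runs from (-lower_gap/2, -height) to (upper_gap/2, height);
    fibre 2 is its mirror image, so the two cross on the midline.
    Contraction keeps the heights fixed and pulls every end point toward
    the midline by the factor k (1.0 = resting, smaller = more contracted).

    Returns (new lower gap, new upper gap, new length of one fibre).
    """
    if not 0.0 < k <= 1.0:
        raise ValueError("k must be in the interval (0, 1]")
    new_lower_gap = k * lower_gap
    new_upper_gap = k * upper_gap
    # Horizontal and vertical extent of one fibre after contraction.
    dx = (new_lower_gap + new_upper_gap) / 2
    dy = 2 * height
    return new_lower_gap, new_upper_gap, math.hypot(dx, dy)


# Placeholder dimensions for illustration only.
rest = contract_x(4.0, 20.0, 10.0, 1.0)
contracted = contract_x(4.0, 20.0, 10.0, 0.5)
print("rest:", rest)
print("contracted:", contracted)
```

In this toy version there is only one control parameter, so a shorter fiber and a narrower gap at both levels are the same event rather than two coordinated events.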
It can be explained as a long and confusing list of tiny muscles, or demonstrated as a gesture similar to cupping your hands around your mouth in a “megaphone” shape to yell something as loudly as possible.

It is not plausible that the CT and TA muscles could accomplish much at all by themselves, and it ought to be obvious to even a casual observer that nature agrees wholeheartedly, having provided us with a system with two winning qualities that are not easy to find in the same natural system: versatility (via complexity) and reliability (via full or partial functional redundancy). But the model persists, because even the very limited descriptions of vocal tract function listed here are extremely hard to think through. Our minds are simply not good at translating these written descriptions into accurate mental images. If the results of thinking about and trying to understand anatomy are going to be worse anyway, why bother trying to make sense of such a complicated description of something most of us can do already? Consumers of vocal pedagogy, like most people, are cognitive misers. They do not like to invest more energy into thinking than is necessary, especially if the results are only going to be worse for it. I am no different: although I can now describe a rather nuanced model of the vocal tract, and interpret written descriptions of its physical features, I only developed that ability because I began to spend time building a 3-D model of the vocal tract. It is possible to understand the form and function of the various parts without necessarily being able to describe them, and it can be much easier to do with proper visual aids. This kind of understanding is attractive to the cognitive miser, which is to say it is attractive to you and me. And yet, it can also be more profitable, because only one’s intuitive understanding of the vocal tract can be applied to one’s practice of singing. Words and descriptions are only useful insofar as they help students attain an intuitive understanding that can be applied with good results.
Similar problems of description and illustration affect the general understanding of the tongue, pharynx, hyoid, jaw, and respiratory system, and of the relationships that exist permanently or form conditionally between the rest of the body and different elements of the vocal tract. Although the capabilities of this model are still crude, it has the potential to incorporate ever more realistic and detailed movement constraints, and to integrate further subsystems to create a model that is internally consistent enough to serve as a test for any given biomechanical theory offered by researchers and pedagogues; if the proposed pattern of movement described in one subsystem results in a conflict or absurdity in another, the conflict can potentially be resolved with further research or debate. Because Blender is open source, it is possible to develop task-specific programming in the Python programming language, and to distribute a basic, free-for-individual-use demonstration model (using Blender’s native game engine), while simultaneously developing L.A.R.R.I. as an open-source platform on which anyone is free to develop and offer professional services such as pedagogy and research. All of this is possible, but highly improbable for an individual working alone. This article, therefore, is not only an introduction to the project – it is also an invitation. L.A.R.R.I. is currently seeking individuals and organizations who would like to take part. If you would like to contribute, please follow the contact instructions at newschoolsinger.com.
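The idea of using the model as a consistency test can be sketched in plain Python, independent of Blender. In this minimal sketch, a proposed movement pattern is a dictionary of named parameters, each subsystem carries its own constraints, and a proposal is checked against every subsystem so that a claim made about one part can be flagged by another. The class `Subsystem`, the functions `within` and `check_pose`, the parameter names, and every numeric limit are hypothetical placeholders, not L.A.R.R.I.’s actual design or anatomical data.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

Pose = Dict[str, float]                       # hypothetical parameter name -> value
Constraint = Callable[[Pose], Optional[str]]  # None if satisfied, else a message


def within(param: str, low: float, high: float) -> Constraint:
    """Build a constraint requiring pose[param] to lie in [low, high]."""
    def check(pose: Pose) -> Optional[str]:
        value = pose.get(param)
        if value is None:
            return f"{param} is not specified"
        if not low <= value <= high:
            return f"{param}={value} outside [{low}, {high}]"
        return None
    return check


@dataclass
class Subsystem:
    name: str
    constraints: List[Constraint] = field(default_factory=list)


def check_pose(pose: Pose, subsystems: List[Subsystem]) -> List[str]:
    """Return every violation, labeled by subsystem (empty list = consistent)."""
    conflicts: List[str] = []
    for sub in subsystems:
        for constraint in sub.constraints:
            problem = constraint(pose)
            if problem is not None:
                conflicts.append(f"{sub.name}: {problem}")
    return conflicts


# Placeholder limits for illustration only; they are not anatomical data.
cricothyroid = Subsystem("cricothyroid", [within("thyroid_tilt_deg", 0.0, 15.0)])
# Raising the larynx also moves the pharynx, so the pharynx gets a say.
pharynx = Subsystem("pharynx", [within("larynx_height_mm", -10.0, 10.0)])

proposed = {"thyroid_tilt_deg": 5.0, "larynx_height_mm": 14.0}
print(check_pose(proposed, [cricothyroid, pharynx]))
# ['pharynx: larynx_height_mm=14.0 outside [-10.0, 10.0]']
```

Here the proposal looks acceptable to the subsystem it was written for, and the conflict only appears when a neighboring subsystem is consulted.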
-Philippe Castagner
newschoolsinger.com
