Introduction
In parts 1 and 2, we covered Python’s scalar, sequence, and set types, and learned about their mutability and hashability.
We also saw that sequence types let us look up values quickly by their position, using the square bracket syntax. What if we could look up values just as quickly, but by a key of our choosing instead of an integer position? That is exactly what storing data as key-value pairs gives us.
In this part, we will first explore Python’s mutable mapping type, the dictionary (dict). Then we will move on to its immutable counterpart, the frozen dictionary (frozendict), which is new in Python 3.15.
An interesting historical note: the idea of a frozendict is not new at all. We saw in part 2 that some data types have frozen counterparts, like set and frozenset. More surprisingly, frozendict was already proposed in 2012 (PEP 416) for Python 3.3, but it was rejected at the time. We will see what has changed since then that led to its acceptance in Python 3.15 (PEP 814).
We will also wrap up our discussion of Python’s built-in data types with a guide on when to use which type. This will help you choose the right data type with real understanding, instead of just memorizing the differences between types or their syntax.
Dictionary
When writing code, we will often need to store and manipulate data that is organized in key-value pairs. For example, we might want to store information about a country, such as its name, population, and capital city.
We learned about sequence and set types that allow us to deal with collections of Python objects with different types. So, we could organize this information in a list of lists.
```python
countries = [
    ["USA", 340000000, "Washington, D.C."],
    ["UK", 70000000, "London"],
    ["France", 69000000, "Paris"],
    ["Germany", 83000000, "Berlin"],
    ["Turkey", 88000000, "Ankara"],
    ["Russia", 146000000, "Moscow"],
    ["India", 1450000000, "New Delhi"],
    ["China", 1400000000, "Beijing"],
    ["Japan", 120000000, "Tokyo"],
]
```

Python allows us to use underscores in numeric literals for better readability, so let’s do that.
```python
countries = [
    ["USA", 340_000_000, "Washington, D.C."],
    ["UK", 70_000_000, "London"],
    ["France", 69_000_000, "Paris"],
    ["Germany", 83_000_000, "Berlin"],
    ["Turkey", 88_000_000, "Ankara"],
    ["Russia", 146_000_000, "Moscow"],
    ["India", 1_450_000_000, "New Delhi"],
    ["China", 1_400_000_000, "Beijing"],
    ["Japan", 120_000_000, "Tokyo"],
]
```

Now, to answer the question “What is the capital of France?”, we would have to loop through the list of countries, check whether each country’s name is “France”, and then get its capital.
```python
countries = [
    ["USA", 340_000_000, "Washington, D.C."],
    ["UK", 70_000_000, "London"],
    ["France", 69_000_000, "Paris"],
    ["Germany", 83_000_000, "Berlin"],
    ["Turkey", 88_000_000, "Ankara"],
    ["Russia", 146_000_000, "Moscow"],
    ["India", 1_450_000_000, "New Delhi"],
    ["China", 1_400_000_000, "Beijing"],
    ["Japan", 120_000_000, "Tokyo"],
]

for country in countries:
    if country[0] == "France":
        print(country[2])
        break
```

```
$ python countries.py
Paris
```

We could make this a little more readable by using sequence unpacking.
```python
countries = [
    ["USA", 340_000_000, "Washington, D.C."],
    ["UK", 70_000_000, "London"],
    ["France", 69_000_000, "Paris"],
    ["Germany", 83_000_000, "Berlin"],
    ["Turkey", 88_000_000, "Ankara"],
    ["Russia", 146_000_000, "Moscow"],
    ["India", 1_450_000_000, "New Delhi"],
    ["China", 1_400_000_000, "Beijing"],
    ["Japan", 120_000_000, "Tokyo"],
]

for name, _population, capital in countries:
    if name == "France":
        print(capital)
        break
```

But this is a lot of code just to look up a country’s capital. It also relies on control-flow constructs like for loops, if, and break statements, which we haven’t even covered yet. On top of that, it doesn’t scale well to a long list of countries: the loop checks them one by one, so the more countries we have, the longer a lookup can take.
A Python dict solves this directly: it stores key-value pairs and looks up values by their keys.
```python
countries = {
    "USA": (340_000_000, "Washington, D.C."),
    "UK": (70_000_000, "London"),
    "France": (69_000_000, "Paris"),
    "Germany": (83_000_000, "Berlin"),
    "Turkey": (88_000_000, "Ankara"),
    "Russia": (146_000_000, "Moscow"),
    "India": (1_450_000_000, "New Delhi"),
    "China": (1_400_000_000, "Beijing"),
    "Japan": (120_000_000, "Tokyo"),
}

print(countries["France"][1])
```

```
$ python countries.py
Paris
```

Here, we put the information about each country in a tuple. We then stored the tuples as values in a dictionary, using each country’s name as its key. In part 2 we said tuples are like immutable lists. Here, though, we are using a tuple as a record. That relies on its order being fixed (position 0 is always the population and position 1 the capital), and on it being immutable. This is another, often overlooked, way of using tuples.
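Because each value is a fixed-shape record, we can also unpack it by position instead of indexing with [1]. The sketch below uses a trimmed-down copy of the countries dictionary, for illustration only. It also shows the in operator and dict.get(). Both look up a key without raising KeyError when that key is missing.

```python
# A trimmed-down copy of the countries dictionary, for illustration only.
countries = {
    "France": (69_000_000, "Paris"),
    "Japan": (120_000_000, "Tokyo"),
}

# Unpack the (population, capital) record by position.
population, capital = countries["France"]
print(capital)  # Paris

# Membership tests look at the keys, not the values.
print("Japan" in countries)  # True
print("Tokyo" in countries)  # False

# .get() returns a default instead of raising KeyError for a missing key.
print(countries.get("Spain", "no entry"))  # no entry
```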
To make the code more readable, we could even use a named tuple, a tuple whose values can be accessed by names, instead of a plain tuple.
```python
from collections import namedtuple

Country = namedtuple("Country", ["population", "capital"])

countries = {
    "USA": Country(340_000_000, "Washington, D.C."),
    "UK": Country(70_000_000, "London"),
    "France": Country(69_000_000, "Paris"),
    "Germany": Country(83_000_000, "Berlin"),
    "Turkey": Country(88_000_000, "Ankara"),
    "Russia": Country(146_000_000, "Moscow"),
    "India": Country(1_450_000_000, "New Delhi"),
    "China": Country(1_400_000_000, "Beijing"),
    "Japan": Country(120_000_000, "Tokyo"),
}

print(countries["France"].capital)
```

We could also use a nested dictionary.
```python
countries = {
    "USA": {"population": 340_000_000, "capital": "Washington, D.C."},
    "UK": {"population": 70_000_000, "capital": "London"},
    "France": {"population": 69_000_000, "capital": "Paris"},
    "Germany": {"population": 83_000_000, "capital": "Berlin"},
    "Turkey": {"population": 88_000_000, "capital": "Ankara"},
    "Russia": {"population": 146_000_000, "capital": "Moscow"},
    "India": {"population": 1_450_000_000, "capital": "New Delhi"},
    "China": {"population": 1_400_000_000, "capital": "Beijing"},
    "Japan": {"population": 120_000_000, "capital": "Tokyo"},
}

print(countries["France"]["capital"])
```

Dictionaries make these quick lookups possible by storing their keys in a hash table, the same idea that powers the set types we saw in part 2. Each key is hashed to find the slot that holds the key and its value. A lookup therefore jumps, on average, straight to the right place instead of scanning every entry the way our loop did. There are two straightforward consequences of this.
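Before looking at them, here is a small sketch that makes the hashing visible. LoudKey is a hypothetical class written only for this demonstration. It records every call Python makes to its __hash__ and __eq__ methods. In CPython, the log shows that a lookup hashes the key once and then compares it only with the stored entry whose hash matches.

```python
log = []


class LoudKey:
    """Hypothetical key type that records each __hash__ and __eq__ call."""

    def __init__(self, name: str) -> None:
        self.name = name

    def __hash__(self) -> int:
        log.append(f"hash {self.name}")
        return hash(self.name)

    def __eq__(self, other: object) -> bool:
        if not isinstance(other, LoudKey):
            return NotImplemented
        log.append(f"eq {self.name} == {other.name}")
        return self.name == other.name


capitals = {LoudKey("France"): "Paris", LoudKey("Japan"): "Tokyo"}
log.clear()  # forget the hashing done while building the dict

capital = capitals[LoudKey("France")]
print(capital)  # Paris
print(log)      # ['hash France', 'eq France == France']
```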
First, the keys of a dictionary are unique, just like the elements of a set. If you assign to a key that already exists, the new value overwrites the previous one instead of adding a second entry.

```
>>> favorite_shows = {
...     "Alice": "Mr. Robot",
...     "Bob": "Suits",
...     "Charlie": "Monty Python's Flying Circus",
... }
>>> favorite_shows["Alice"] = "Person of Interest"
>>> favorite_shows["Alice"]
'Person of Interest'
```

Second, you can use other types as keys too, as long as they are hashable. For example, you could use integers as keys, or tuples whose elements are all hashable.
```
>>> id_to_person = {
...     0: {"name": "Alice", "age": 30, "city": "New York"},
...     1: {"name": "Bob", "age": 25, "city": "Los Angeles"},
...     2: {"name": "Charlie", "age": 35, "city": "Chicago"},
... }
```

But a list or a set cannot be used as a key, since they are mutable and not hashable.
```
>>> my_dict = {[1, 2, 3]: "list key, str value"}
Traceback (most recent call last):
  File "<python-input-1>", line 1, in <module>
    my_dict = {[1, 2, 3]: "list key, str value"}
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: cannot use 'list' as a dict key (unhashable type: 'list')
>>> my_dict = {{1, 2, 3}: "set key, str value"}
Traceback (most recent call last):
  File "<python-input-2>", line 1, in <module>
    my_dict = {{1, 2, 3}: "set key, str value"}
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: cannot use 'set' as a dict key (unhashable type: 'set')
```

Frozen Dictionary
Sometimes we don’t want the information stored in a dictionary to change after it is created. The frozen dictionary, frozendict, is a new built-in type introduced in Python 3.15. It is an immutable version of the dictionary: unlike a dict, once you create a frozendict, you cannot add, remove, or replace any of its entries.
Countries are a good example. A country’s attributes are not expected to change often, and we don’t expect new countries to form, or existing ones to cease to exist, very often either. Because frozendict is a built-in, you can use it directly without any import.
```python
countries = frozendict({
    "USA": (340_000_000, "Washington, D.C."),
    "UK": (70_000_000, "London"),
    "France": (69_000_000, "Paris"),
    "Germany": (83_000_000, "Berlin"),
    "Turkey": (88_000_000, "Ankara"),
    "Russia": (146_000_000, "Moscow"),
    "India": (1_450_000_000, "New Delhi"),
    "China": (1_400_000_000, "Beijing"),
    "Japan": (120_000_000, "Tokyo"),
})
```

In 2012, the proposal to introduce frozendict in Python 3.3 was rejected because people rarely needed an immutable mapping type at the time. In the few cases where one was needed, third-party libraries already provided this functionality, so there was no pressing need to add it to the standard library.
Instead, the proposal’s idea of a read-only view of a dictionary was added to the standard library as types.MappingProxyType. But MappingProxyType is not hashable, and it does not make the data itself immutable. The original dictionary behind the proxy can still be mutated, and those changes show through the proxy.
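A short sketch of both limitations, using only the standard library (the dictionary contents are just illustrative):

```python
from types import MappingProxyType

original = {"France": "Paris"}
proxy = MappingProxyType(original)

# Reads go through to the underlying dict.
print(proxy["France"])  # Paris

# Writes through the proxy are rejected.
try:
    proxy["France"] = "Lyon"
except TypeError as error:
    print("write blocked:", error)

# But the original dict is still mutable, and the proxy reflects the change.
original["Germany"] = "Berlin"
print(proxy["Germany"])  # Berlin

# A proxy over a dict cannot be hashed, so it cannot serve as a dict key.
try:
    hash(proxy)
except TypeError as error:
    print("not hashable:", error)
```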
Since 2012, the Python ecosystem has evolved to have more use cases for an immutable mapping type. For example, asyncio was added in 2014 (Python 3.4), free threading in 2024 (Python 3.13), and concurrent.interpreters in 2025 (Python 3.14). Don’t worry if you don’t know what these are; we will cover them in later chapters. What they have in common is that they let your code deal with, or do, more than one thing at the same time. When several tasks share the same data, an immutable object is safe to share because no task can change it while another is reading it. The rise of concurrent and parallel programming has increased the need for immutable mapping structures, which led to the acceptance of frozendict in Python 3.15.
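As a rough sketch of that idea, several worker threads below read from one shared mapping without any locks. The sketch assumes Python 3.15 for the built-in frozendict. On older versions it falls back to a types.MappingProxyType over a private copy of the data, which is read-only in practice only because nothing else holds that copy. The describe function and the worker count are hypothetical choices for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from types import MappingProxyType

CAPITAL_DATA = {"France": "Paris", "Japan": "Tokyo", "Turkey": "Ankara"}

try:
    # Built-in immutable mapping, assuming Python 3.15+ (PEP 814).
    CAPITALS = frozendict(CAPITAL_DATA)
except NameError:
    # Fallback for older versions: a read-only proxy over a private copy
    # that no other code references, so nothing can mutate it.
    CAPITALS = MappingProxyType(dict(CAPITAL_DATA))


def describe(country: str) -> str:
    # Threads only read CAPITALS, so no lock is needed around the lookup.
    return f"The capital of {country} is {CAPITALS[country]}."


with ThreadPoolExecutor(max_workers=3) as pool:
    # pool.map returns results in the same order as the inputs.
    sentences = list(pool.map(describe, ["France", "Japan", "Turkey"]))

print(sentences)
```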
When to Use Which Data Type
So which data type should you actually choose to use, and when?
The basic flat types like None, booleans, numbers, strings, and bytes are usually a straightforward choice for representing simple data, since they are all immutable and hashable. Just keep in mind that, because these types are immutable, any operation that seems to modify a value actually creates a new object. That can add up to a performance cost if you do it many times.
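A quick way to see this is with strings. String methods and += never change an existing str object; they hand you a new one. Any other name still bound to the old object keeps seeing the old value. The variable names here are illustrative.

```python
greeting = "hello"
alias = greeting           # a second name for the same str object
print(alias is greeting)   # True

shout = greeting.upper()   # returns a new string; greeting is untouched
greeting += "!"            # rebinds greeting to a new string object

print(shout)               # HELLO
print(greeting)            # hello!
print(alias)               # hello
print(alias is greeting)   # False
```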
But more often than not, you don’t have to worry about this. Take building up a long string in a loop.
```python
def build_string(length: int) -> str:
    my_long_string = ""
    for i in range(length):
        my_long_string += str(i)
    return my_long_string
```

The advice you’ll usually hear is to never write this. People will tell you that += on strings is slow, so you should always collect the strings in a list and .join() them at the end.
```python
def build_string(length: int) -> str:
    return "".join([str(i) for i in range(length)])
```

Using .join() is good advice, but the claim that += is slow is mostly a myth. CPython optimizes the += pattern for strings as long as you stay inside a function. Suppose the string you’re growing is a local variable that nothing else refers to. Then CPython doesn’t build a brand-new string and copy everything across on every step. Instead, it grows the buffer it already has, in place. You can watch this by printing the string’s id as the loop runs. The id stays the same across most steps, because the same buffer is being extended. It only jumps when the string outgrows its current memory block and has to move to a new place in the heap.
```python
def build_string(length: int) -> str:
    my_long_string = ""
    for i in range(length):
        my_long_string += str(i)
        print(id(my_long_string))
    return my_long_string
```

```
$ python -i string_concatenation.py
>>> build_string(10)
4380210592
4380210592
4380210592
4380210592
4380210592
4380210592
4380210592
4380307888
4380307888
4380307888
'0123456789'
```

Note that the += optimization is a CPython implementation detail, not a language guarantee. PEP 8 explicitly advises against relying on it: the optimization is fragile even in CPython and isn’t present at all in implementations that don’t use reference counting. That is why .join() remains the more predictable choice. But the optimized case is common, and the simple loop is rarely the disaster it’s made out to be. On the rare occasions where it does matter, Python gives you tools like joining, slicing, and bytearrays.
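As an example of one of those tools, a bytearray is a mutable sequence of bytes. Extending it with += grows the same object in place by design, rather than relying on an interpreter optimization. The build_bytes function below is a hypothetical counterpart to build_string, and it assumes ASCII-only content.

```python
def build_bytes(length: int) -> bytes:
    buffer = bytearray()                   # mutable: += extends this object
    for i in range(length):
        buffer += str(i).encode("ascii")   # in-place extend, guaranteed by the type
    return bytes(buffer)                   # convert to immutable bytes at the end


result = build_bytes(10)
print(result)  # b'0123456789'
```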
Beyond the simple data types, the choice of data type becomes more nuanced when working with collections of objects. A common beginner’s mistake is to use lists for everything, but as we have seen, there are many other data types that may fit your specific use case better than lists. For example, if you need an ordered collection of items that do not need to be changed after creation, like a record, a tuple could be a better choice than a list.
```python
def oprahify(iterable):
    """Everybody gets a list!"""
    return list(iterable)
```

When anyone reading the code sees a tuple, whether that reader is you, another developer, or an AI assistant, it’s immediately clear that its length will never change. If the tuple’s elements are themselves immutable, its contents will never change either. This makes code easier to understand and reason about, since you don’t have to worry about the data changing unexpectedly. Tuples also have performance benefits over lists. They use less memory, and CPython can optimize them, for example by storing a tuple of constants as a single precomputed object. The same goes for set and frozenset, or dict and frozendict. If you don’t need mutability, prefer the frozen versions of these data types for better readability and performance.
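The sketch below shows both halves of that claim. A tuple rejects item assignment, but a mutable element inside it can still change, which also makes the tuple unhashable. The sys.getsizeof comparison is CPython-specific, and the exact byte counts vary between versions, so no numbers are shown.

```python
import sys

record = ("France", 69_000_000, "Paris")

# The tuple's slots cannot be reassigned.
try:
    record[2] = "Lyon"
except TypeError as error:
    print("TypeError:", error)

# A tuple only fixes which objects it holds; a mutable element can still change.
shows = ("Alice", ["Mr. Robot"])
shows[1].append("Suits")
print(shows)  # ('Alice', ['Mr. Robot', 'Suits'])

# Hashing works only when every element is hashable.
print(hash(record) == hash(("France", 69_000_000, "Paris")))  # True
try:
    hash(shows)
except TypeError as error:
    print("TypeError:", error)

# Memory: in CPython a tuple is smaller than a list holding the same items.
print(sys.getsizeof(record), sys.getsizeof(list(record)))
```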
We also saw that if you will be doing a lot of membership testing and the order of objects doesn’t matter, a set type like set or frozenset could be a better choice than a sequence like list or tuple, since sets use hash tables for fast lookups. If your data also comes as key-value pairs, a mapping type like dict or frozendict is the better fit, for the same reason.
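A rough way to see the difference is with timeit from the standard library. The data is made up, and the timings depend entirely on your machine, so none are shown. The list has to scan its elements one by one. The set instead hashes the target and jumps, on average, straight to the matching slot.

```python
import timeit

# Made-up data: 100,000 distinct strings.
names_list = [f"country-{i}" for i in range(100_000)]
names_set = set(names_list)
target = "country-99999"  # worst case for the list: the very last element

# Both containers give the same answer...
print(target in names_list, target in names_set)  # True True

# ...but at very different costs.
list_seconds = timeit.timeit(lambda: target in names_list, number=100)
set_seconds = timeit.timeit(lambda: target in names_set, number=100)
print(f"list: {list_seconds:.4f}s  set: {set_seconds:.6f}s")
```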
Once you learn about all the data types available, it’s tempting to over-optimize by always reaching for the most specific data type for a given use case. But readability and maintainability also matter when choosing a data type. We’ve seen this famous quote before:
“Premature optimization is the root of all evil.” - Donald Knuth
Optimization may be important, but it should not be done prematurely. You should first write code that is correct and easy to understand, and then optimize only if it is necessary and you’ve actually identified the performance bottleneck through profiling. In many cases, the performance difference between different data types may be too small to justify the added complexity of using a more specific data type.
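In that spirit, measure before you optimize. Here is a minimal sketch that compares the two build_string variants from earlier using timeit. For this example they are renamed build_with_plus and build_with_join. Results vary by machine and Python version, so no numbers are shown. For whole programs, the standard library’s cProfile module can show where time is actually spent.

```python
import timeit
from functools import partial


def build_with_plus(length: int) -> str:
    my_long_string = ""
    for i in range(length):
        my_long_string += str(i)
    return my_long_string


def build_with_join(length: int) -> str:
    return "".join([str(i) for i in range(length)])


# First make sure both versions are correct and agree.
assert build_with_plus(1_000) == build_with_join(1_000)

# Then time each one; number=20 repeats the call to smooth out noise.
for func in (build_with_plus, build_with_join):
    seconds = timeit.timeit(partial(func, 10_000), number=20)
    print(f"{func.__name__}: {seconds:.4f}s")
```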
Finally, beyond the built-in types we covered here, there are many other data types that may fit your specific use case better. Here we show a table of some of these data types without going into details, so that you can get a sense of the options available instead of reinventing the wheel.
| Data Type / Family | Why Use It | Mutability | Availability | Notes |
|---|---|---|---|---|
| memoryview | Work with binary data without copying it | Depends on underlying object | Built-in | Zero-copy view over objects that support the buffer protocol (influenced by NumPy). Mutable only if the underlying buffer is mutable. |
| array.array | Store same-type primitive numeric data compactly | Mutable | Standard library | More memory-efficient than a list of Python numbers, but less powerful than NumPy. |
| queue.Queue | Thread-safe queues for producer-consumer workflows | Mutable | Standard library | Use for multi-threaded FIFO workflows. For LIFO or priority behavior, use queue.LifoQueue or queue.PriorityQueue. |
| collections.deque | Fast append/pop from both ends | Mutable | Standard library | Good for queues, stacks, sliding windows, and BFS. Its appends and pops are already thread-safe, and it is faster than queue.Queue when you don’t need Queue’s blocking producer-consumer behavior. |
| collections.Counter | Count hashable objects | Mutable | Standard library | A dict subclass for frequencies, histograms, and multiset-like operations. |
| collections.defaultdict | Provide default values for missing keys | Mutable | Standard library | Useful for grouping, counting, and nested data structures. |
| types.MappingProxyType | Expose a read-only dynamic view of a mapping | Read-only view | Standard library | Prevents mutation through the proxy, but reflects changes made to the original mapping. |
| Array / tensor types: numpy.ndarray, torch.Tensor, jax.Array | Efficient numerical computing over large arrays | Varies | Third-party | NumPy is the baseline array type; PyTorch is common for deep learning; JAX is common for compiled, functional, accelerator-oriented numerical code. |
| DataFrame / tabular types: pandas.DataFrame, polars.DataFrame | Work with tabular data | Varies | Third-party | pandas is the common default; Polars is a fast, Arrow-based alternative. DuckDB is a useful SQL companion for querying DataFrames and files. |
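The table deliberately skips the details, but a quick taste of three of the standard-library entries may help. This sketch reuses the capital cities from earlier as sample data.

```python
from collections import Counter, defaultdict, deque

capitals = ["Washington, D.C.", "London", "Paris", "Berlin", "Ankara",
            "Moscow", "New Delhi", "Beijing", "Tokyo"]

# Counter: count hashable items (here, first letters).
first_letters = Counter(city[0] for city in capitals)
print(first_letters.most_common(1))  # [('B', 2)]

# defaultdict: a missing key gets a fresh default (an empty list) on first access.
by_letter = defaultdict(list)
for city in capitals:
    by_letter[city[0]].append(city)
print(by_letter["B"])  # ['Berlin', 'Beijing']

# deque with maxlen: a sliding window that keeps only the newest items.
recent = deque(maxlen=3)
for city in capitals:
    recent.append(city)
print(list(recent))  # ['New Delhi', 'Beijing', 'Tokyo']
```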
Conclusion
So far, we’ve looked at Python’s object model and data types: what objects and values are, how they behave, and how Python represents them.
In the next chapter, we’ll ask a different question: once we have objects and values, how do we tell Python to decide what happens next?
That brings us to control flow. We’ll cover familiar constructs like if, for, while, break, continue, and return, but from a deeper perspective. Our goal is never to memorize syntax. Instead, we’ll look at how these statements shape the path of execution, how conditions redirect a program, how loops repeat, and what these ideas eventually reduce to in a computer program.
See you there!
