Consider the three Dell machines described in the following table.
Model | Latitude 3440 | Precision Mobile 7780 | Precision 7960 |
---|---|---|---|
Type | Laptop | Laptop/Workstation | Workstation |
Processor | Intel Core i5-1335U | Intel Core i9-13950HX | Intel Xeon w9-3495X |
Memory (RAM) | 8GB - 3 200 MT/s - DDR4 (1 × 8GB) | 32GB - 5 600 MHz - DDR5 (1 × 32GB) | 512 GB - 4 800 MHz - DDR5 (8 × 64 GB) |
GPU | Intel Iris Xe Graphics G7 80EUs (integrated) | NVIDIA RTX 3500 | NVIDIA RTX A6000 |
Storage | SSD 256 GB | SSD 1 TB | RAID 5 : 4 × SSD 4 TB |
Dimensions | 14" - 1920 × 1080 | 15.6" - 3840 × 2160 | -- |
Weight | 1.5 kg | 2.5 kg | -- |
Price | 900 € | 4 600 € | 32 100 € |
Specs | click here | click here | click here |
The first computer is a light laptop suitable for personal and/or office use. The second is a laptop workstation that can support compute-intensive and graphics-intensive applications. The third is a powerful workstation capable of handling massive calculations, such as deep learning.
Computers compared in 2020
Model | Latitude 3410 | Precision Mobile 7550 | Precision 7920 |
---|---|---|---|
Type | Laptop | Laptop/Workstation | Workstation |
Processor | Intel Core i5-10210U | Intel Core i9-10885H | 2 × Intel Xeon Gold-5220 |
Memory (RAM) | 8 GB 2667 MHz DDR4 (1 × 8 GB) | 32 GB 2933 MHz DDR4 (2 × 16 GB) | 128 GB 2933 MHz DDR4 (8 × 16 GB) |
GPU | UHD620 (integrated) | NVIDIA Quadro T2000 | NVIDIA Quadro GV100 |
Storage | SSD 128 GB | SSD 512 GB | RAID 5 : 8 × SSD 1 TB - SATA |
Dimensions | 14" 1366 × 768 | 15.6" - 3840 × 2160 | -- |
Weight | 1.6 kg | 2.5 kg | -- |
Price | 500 € | 1 500 € | 7 500 € |
Specs | click here | click here | click here |
Look at the specification of the first two computers (link in the last row of the table). How many screens can be simultaneously used with these laptops, regardless of the characteristics of the GPU?
ANSWER ELEMENTS
- Latitude 3440: 3
  - Laptop screen
  - HDMI 1.4
  - USB 3.2 Gen 2 Type-C port with DisplayPort
- Precision Mobile 7780: 7
  - Laptop screen
  - HDMI 2.0a or 2.1
  - USB 3.2 Gen 2 Type-C port with DisplayPort
  - 2 × Thunderbolt 4 ports with USB Type-C (each supports two 4K displays)
What are the data transfer rates on the available connections of the two laptops?
ANSWER ELEMENTS
- Latitude 3440
  - USB 3.2 Gen 1: 5 Gbit/s
  - USB 3.2 Gen 2 Type-C: 10 Gbit/s
  - RJ45 (Ethernet): 10/100/1000 Mbit/s
  - Wi-Fi: up to 2400 Mbit/s
  - Bluetooth 5.3: up to 2 Mbit/s
  - WWAN module: up to 1 Gbit/s download, 150 Mbit/s upload
- Precision Mobile 7780: same connections as the Latitude 3440, with the following additions and changes
  - Thunderbolt 4 ports with USB Type-C: 40 Gbit/s
  - WWAN module: up to 3 Gbit/s download, 250 Mbit/s upload
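To get a feeling for these rates, the following sketch estimates the ideal time needed to move a file over each link. The 10 GB file size and the function name `transfer_time_s` are illustrative assumptions; real transfers are slower because of protocol overhead and storage speed.

```python
# Nominal link rates in Gbit/s, copied from the answer elements above.
LINK_RATES_GBIT_S = {
    "USB 3.2 Gen 1": 5.0,
    "USB 3.2 Gen 2 Type-C": 10.0,
    "Ethernet (RJ45)": 1.0,
    "Wi-Fi": 2.4,
    "Bluetooth 5.3": 0.002,
    "Thunderbolt 4": 40.0,
}


def transfer_time_s(size_gbytes: float, rate_gbit_s: float) -> float:
    """Ideal time in seconds to move size_gbytes (decimal GB) at rate_gbit_s."""
    if rate_gbit_s <= 0:
        raise ValueError("rate must be positive")
    size_gbit = size_gbytes * 8  # 1 byte = 8 bits
    return size_gbit / rate_gbit_s


if __name__ == "__main__":
    file_size_gb = 10.0  # hypothetical file size, not from the exercise
    for name, rate in LINK_RATES_GBIT_S.items():
        print(f"{name:22s} {transfer_time_s(file_size_gb, rate):10.1f} s")
```

Note the factor of 8: link rates are given in bits per second, while file sizes are usually given in bytes.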
We now compare the processors of the three computers. The following table summarizes the key points. An official and exhaustive comparison is available here.
Feature | Intel Core i5-1335U | Intel Core i9-13950HX | Intel Xeon w9-3495X |
---|---|---|---|
Number of cores | 2 + 8 | 8 + 16 | 56 |
Base frequency | 1.70 GHz | 2.20 GHz | 1.90 GHz |
Turbo frequency | 4.60 GHz | 5.50 GHz | 4.80 GHz |
Cache | 12 MB | 36 MB | 105 MB |
Power | 15W | 55W | 350W |
Max. memory size | 64 GB | 128 GB | 4 TB |
Max. memory channels | 2 | 2 | 8 |
Recommended price | $340.00 | $590.00 | $5889.00 |
Geekbench 5 single-core score (cpu-monkey) | 1628 | 2108 | 1734 |
Geekbench 5 multi-core score (cpu-monkey) | 7240 | 19759 | 56911 |
Computers proposed for comparison in 2020
Feature | Intel Core i5-10210U | Intel Core i9-10885H | Intel Xeon Gold 5220 |
---|---|---|---|
Number of cores | 4 | 8 | 18 |
Base frequency | 1.60 GHz | 2.40 GHz | 2.20 GHz |
Turbo frequency | 4.20 GHz | 5.30 GHz | 3.90 GHz |
Cache | 6 MB | 16 MB | 25 MB |
Power | 15 W | 45 W | 125 W |
Max. memory size | 64 GB | 128 GB | 1 TB |
Max. memory frequency | 2 667 MHz | 2 933 MHz | 2 667 MHz |
Max memory channels | 2 | 2 | 6 |
- Discuss the differences between the three processors. Can you guess what the different features listed in the table mean?
- Look at the CPU limitations with memory. Are these limitations respected in the hardware configuration of the three computers? Can we still add memory to the three configurations?
ANSWER ELEMENTS
Here is a description of the features:
- Number of cores. A core is a single processing unit of a CPU. A core can read and execute program instructions. A multi-core CPU can execute multiple instructions at the same time.
- Base frequency. The frequency of the CPU clock under typical utilization.
- Turbo frequency. The maximum frequency of the CPU clock under heavy utilization. The Turbo Boost technology dynamically increases the clock speed to handle heavy workloads. The actual clock frequency depends on the system temperature and the number of cores in use; it can never exceed the turbo frequency.
- Cache. Indicates the amount of cache memory integrated in the CPU chip.
- Power. The average level of heat generated under heavy utilization while the CPU is running at its base frequency.
- Max. memory size. The maximum amount of memory that the CPU can address.
- Max. memory channels. The maximum number of independent channels between the CPU and the memory modules. Modules placed on different channels can be accessed in parallel.
- Geekbench. A benchmark, that is, a set of performance tests that assigns a performance score to a CPU. The scores in the table are taken from the website cpu-monkey.
In general, a higher clock speed means a faster CPU. However, many other factors affect performance. The CPU has several ways to optimize the execution of program instructions, and today's technology distributes the execution of instructions among the cores in an intelligent way. It is therefore possible that an older CPU with a higher clock frequency performs worse than a newer CPU with a lower clock speed.
The cache size is also important: a larger cache keeps more data close to the cores and reduces slow accesses to main memory.
Benchmarks are used to compare CPU performance. From the table we learn that the Intel Core i9 has a higher score than the Intel Xeon (which costs more) when only a single core is used. This holds for the selected benchmark; another one may give a different result. Performance scores should therefore be considered with caution.
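One way to read the two Geekbench rows together is to compute how much each CPU gains when all its cores are used. The sketch below uses the scores and core counts from the 2024 table; the ratio `multi / single` and the function names are illustrative choices, not metrics defined by the benchmark.

```python
# (single-core score, multi-core score, total number of cores), from the table.
SCORES = {
    "Intel Core i5-1335U": (1628, 7240, 2 + 8),
    "Intel Core i9-13950HX": (2108, 19759, 8 + 16),
    "Intel Xeon w9-3495X": (1734, 56911, 56),
}


def multi_core_speedup(single: int, multi: int) -> float:
    """Ratio between the multi-core and the single-core score."""
    return multi / single


def best_single_core(scores: dict) -> str:
    """Name of the CPU with the highest single-core score."""
    return max(scores, key=lambda name: scores[name][0])


if __name__ == "__main__":
    for name, (single, multi, cores) in SCORES.items():
        speedup = multi_core_speedup(single, multi)
        print(f"{name:24s} cores={cores:3d} speedup={speedup:5.1f}")
    print("Best single-core score:", best_single_core(SCORES))
```

The speedup stays below the number of cores for all three CPUs, which illustrates that adding cores does not increase performance proportionally.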
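As for the second question, the three hardware configurations respect the memory limits of their CPUs: 8 GB out of 64 GB, 32 GB out of 128 GB, and 512 GB out of 4 TB. The first two computers have a single memory module and therefore use only one memory channel; since their CPUs support two channels, we can add one more module. The third computer already uses the 8 channels of its CPU (8 × 64 GB), but its 512 GB remain well below the 4 TB limit.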
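The check on the memory limits can be written down as a small sketch. The class and field names are made up for illustration, 4 TB is taken as 4096 GB, and `free_channels` assumes one module per channel, which is a simplification.

```python
from dataclasses import dataclass


@dataclass
class MemoryConfig:
    name: str
    installed_gb: int  # memory installed in the computer
    modules: int       # number of installed modules
    max_gb: int        # maximum memory size supported by the CPU
    channels: int      # maximum memory channels supported by the CPU

    def within_limit(self) -> bool:
        """True if the installed memory does not exceed the CPU limit."""
        return self.installed_gb <= self.max_gb

    def headroom_gb(self) -> int:
        """Memory that could still be added before reaching the CPU limit."""
        return self.max_gb - self.installed_gb

    def free_channels(self) -> int:
        """Unused channels, assuming one module per channel."""
        return max(self.channels - self.modules, 0)


CONFIGS = [
    MemoryConfig("Latitude 3440", 8, 1, 64, 2),
    MemoryConfig("Precision Mobile 7780", 32, 1, 128, 2),
    MemoryConfig("Precision 7960", 512, 8, 4096, 8),
]

if __name__ == "__main__":
    for c in CONFIGS:
        print(c.name, c.within_limit(), c.headroom_gb(), c.free_channels())
```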
Finally, you might have noticed that the CPUs of the first two computers have two types of cores. You can see it in the detailed specifications. Recent Intel Core CPUs are generally equipped with P-cores (Performance-cores) and E-cores (Efficient-cores). The former are used for heavy workloads; they are faster, but they cost more and consume more power (as a result, they generate more heat). The latter are used for background tasks that do not require high computing power. They are slower, but they cost less and consume less power. The goal of having two types of cores is to strike a balance between the performance, the cost and the power consumption of the CPU.
We now look at the graphics processing unit (GPU). The following table describes the features of the GPUs in the three computers.
Feature | Intel Iris Xe Graphics G7 80EUs (integrated) | NVIDIA RTX 3500 | NVIDIA RTX A6000 |
---|---|---|---|
Number of cores (CUDA) | 80 | 5 120 | 10 752 |
Clock speed | 1 250 MHz | 1 545 MHz | 1 800 MHz |
Memory | 8 GB (shared) | 12 GB | 48 GB |
FP16 performance | 768.0 GFLOPS | 15.82 TFLOPS | 38.7 TFLOPS |
FP32 performance | 384.0 GFLOPS | 15.82 TFLOPS | 38.7 TFLOPS |
FP64 performance | 96.0 GFLOPS | 247.2 GFLOPS | 1.21 TFLOPS |
Tensor cores (Deep learning) | -- | 160 | 336 |
Max power consumption | < 15W | 100W | 300 W |
Computers compared in 2020
Feature | UHD620 (integrated) | NVIDIA Quadro T2000 | NVIDIA Quadro GV100 |
---|---|---|---|
Number of cores (CUDA) | 192 | 1 024 | 5 120 |
Clock speed | 1 150 MHz | 1 785 MHz | 1 627 MHz |
Memory | 32 GB (shared) | 4 GB | 32 GB |
FP16 performance | 768.0 GFLOPS | 7.3 TFLOPS | 29.6 TFLOPS |
FP32 performance | 384.0 GFLOPS | 3.6 TFLOPS | 14.8 TFLOPS |
FP64 performance | 96.0 GFLOPS | 114.2 GFLOPS | 7.4 TFLOPS |
Tensor cores (Deep learning) | -- | -- | 640 |
Tensor performance | -- | -- | 118.5 TFLOPS |
Max power consumption | < 15 W | 60 W | 250 W |
Compare the features of the three GPUs. Can you understand the meaning of each feature listed in the table?
ANSWER ELEMENTS
- GPUs execute identical computations simultaneously (data parallelism) on a vector of data and produce a corresponding vector of outputs. Their primary purpose is to render three-dimensional images but they are also used to execute compute-intensive applications where data parallelism is involved.
- In the first computer, the GPU is integrated into the CPU. As a result, the CPU and the GPU share the same memory. While integrating the GPU into the CPU results in lower performance, for most users this is a convenient and cheap solution. Gamers and developers of graphics applications need a configuration with a dedicated GPU.
- The second and third computers are equipped with dedicated NVIDIA GPUs. These GPUs have a dedicated memory.
- NVIDIA GPUs consist of several CUDA cores. CUDA (Compute Unified Device Architecture) is an NVIDIA platform that enables the development of applications for GPUs. A CUDA core executes one floating-point operation per clock cycle. Therefore, the number of cores and the clock frequency are good indicators of a GPU's performance.
- FP16, FP32 and FP64 stand respectively for half precision (16 bits), single precision (32 bits) and double precision (64 bits) floating-point formats.
- FP16, FP32 and FP64 performance refers to the number of floating-point operations per second at the given precision. This is measured in GFLOPS (Giga floating-point operations per second) or TFLOPS (Tera floating-point operations per second). The values in the table are useful for comparison, but they are theoretical. For a concrete comparison, it is better to look at benchmarks such as the ones available here.
- Since CUDA cores are limited to the execution of one operation per clock cycle, NVIDIA developed more advanced cores, called tensor cores. A tensor core can carry out an entire small matrix operation per clock cycle, which brings deep learning applications to GPUs.
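The relation between the number of cores, the clock speed and the theoretical FP32 performance can be checked with a rough calculation. The sketch below assumes that each core completes one fused multiply-add per cycle and that this counts as two floating-point operations, a common convention that the text above does not state; the function name is illustrative. Under that assumption, the RTX 3500 figures from the table (5 120 cores at 1 545 MHz) give about 15.82 TFLOPS, which matches the FP32 row.

```python
def peak_fp32_tflops(cores: int, clock_mhz: float,
                     flops_per_core_per_cycle: int = 2) -> float:
    """Theoretical peak in TFLOPS: cores x clock x operations per cycle.

    flops_per_core_per_cycle=2 is an assumption (one fused multiply-add
    counted as two floating-point operations).
    """
    flops = cores * clock_mhz * 1e6 * flops_per_core_per_cycle
    return flops / 1e12


if __name__ == "__main__":
    # Cores and clock speeds copied from the GPU tables.
    print(f"NVIDIA RTX 3500:     {peak_fp32_tflops(5120, 1545):.2f} TFLOPS")
    print(f"NVIDIA Quadro T2000: {peak_fp32_tflops(1024, 1785):.2f} TFLOPS")
```

These are upper bounds: they ignore memory bandwidth and assume that every core is busy on every cycle.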
Consider a typical laptop battery of 50 Wh. A laptop screen consumes 10 W to 20 W depending on its size and brightness.
Considering that the three main power-consuming components in a laptop are the CPU, the screen and the GPU, discuss the battery life and power consumption of the two laptops.
ANSWER ELEMENTS
- The first laptop consumes 15 W for both the processor and its integrated GPU, and 10 W for the screen, that is, 25 W in total. Assuming that the battery works properly, its battery life is 50 Wh / 25 W = 2 hours under high load conditions.
- The second laptop consumes 55 W (processor) + 100 W (GPU) + 20 W (screen) = 175 W. Its battery life is 50 Wh / 175 W, about 17 minutes, which is less than 30 minutes under high load conditions.
In modern computer architectures, the frequency of each component is dynamically adjusted based on the actual computing load, to save power and reduce the amount of heat generated by the circuits.
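The battery estimate above is a simple division of the battery capacity by the total power draw. A minimal sketch follows, using the power figures from the 2024 processor and GPU tables and the screen consumption given in the question; the function name is illustrative and the result is a worst case under full load.

```python
def battery_life_hours(battery_wh: float, *loads_w: float) -> float:
    """Battery life in hours for a battery of battery_wh under the given loads."""
    total_w = sum(loads_w)
    if total_w <= 0:
        raise ValueError("total power draw must be positive")
    return battery_wh / total_w


if __name__ == "__main__":
    battery_wh = 50.0
    # Latitude 3440: CPU with integrated GPU (15 W) + screen (10 W).
    latitude = battery_life_hours(battery_wh, 15, 10)
    # Precision Mobile 7780: CPU (55 W) + GPU (100 W) + screen (20 W).
    precision = battery_life_hours(battery_wh, 55, 100, 20)
    print(f"Latitude 3440:         {latitude * 60:.0f} min")
    print(f"Precision Mobile 7780: {precision * 60:.0f} min")
```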
The storage of the Precision 7960 is RAID 5 : 4 × SSD 4 TB. What is RAID?
ANSWER ELEMENTS
RAID is an acronym that stands for Redundant Array of Independent Disks. It is a technology that allows the storage of data across multiple disks to prevent data loss in case of disk failures. One of the benefits of RAID is to increase the fault tolerance of an information system.
There are different versions of RAID, known as levels, each identified by a number.
- RAID 0, also known as disk striping. In this configuration, data is split into different independent partitions (also called stripes); partitions are then stored evenly across several disks. The configuration is illustrated here. The benefit of this configuration is performance : we can read data in parallel from many disks. The disadvantage is that we may lose a set of data partitions if the disk where they are stored fails.
- RAID 1, also known as disk mirroring. In this configuration, a dataset is stored redundantly (or, mirrored) on multiple disks. The configuration is illustrated here. This configuration guarantees fault tolerance, but at the price of using many disks to store one single dataset.
- RAID 4. In this configuration data is split into different independent partitions, and then distributed evenly across several disks, like in RAID 0. Unlike RAID 0, however, an additional disk stores parity data. When a disk fails, the partitions stored in it can be recomputed by using the other partitions and the parity data. The configuration is illustrated here. RAID 5, used in the Precision 7960, works in the same way, except that the parity data is distributed across all the disks instead of being stored on a dedicated one.
Other levels exist. Their discussion is beyond the scope of this course. More details are available here.
Importantly, RAID disks are seen by the operating system as one single disk.
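The parity mechanism described for RAID 4 can be sketched with a byte-wise XOR, which is the usual way single parity is computed. The block contents and function names below are made up for illustration; a real RAID controller works on fixed-size stripes and handles many more cases.

```python
from functools import reduce
from operator import xor


def xor_blocks(blocks: list) -> bytes:
    """Byte-wise XOR of equally sized blocks of bytes."""
    if len({len(b) for b in blocks}) != 1:
        raise ValueError("all blocks must have the same length")
    return bytes(reduce(xor, column) for column in zip(*blocks))


def usable_capacity_tb(disks: int, disk_size_tb: float) -> float:
    """Usable capacity with single parity: one disk's worth holds parity."""
    return (disks - 1) * disk_size_tb


if __name__ == "__main__":
    # Three data partitions stored on three disks (hypothetical contents).
    data = [b"AAAA", b"BBBB", b"CCCC"]
    parity = xor_blocks(data)  # stored as parity data

    # The second disk fails: rebuild its partition from the others and parity.
    rebuilt = xor_blocks([data[0], data[2], parity])
    print(rebuilt == data[1])

    # Precision 7960: 4 disks of 4 TB with single parity.
    print(usable_capacity_tb(4, 4), "TB usable")
```

XOR works here because applying it twice with the same value cancels out, so XORing the surviving partitions with the parity leaves exactly the missing partition.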