The answer is quite straightforward: responsiveness and simplicity.
Communication between the server and client in real-world scenarios (outside of a Studio simulation) always takes a certain amount of time, typically measured as ping. We want to bridge this gap between the server and the connected clients to ensure a seamless gaming experience.
Effects, animations, sounds and UI are generally handled on the client side because the results are immediate and there are no network delays. Otherwise, a player with an average ping time of 250 ms would receive visual updates at least 0.25 seconds later.
In Roblox, certain properties, such as humanoid states, animation status, sound status, and the player’s position and orientation, are replicated directly from the client to the server. As long as this is the case, working in the opposite direction would be counter-intuitive.
If animations were not replicated, the workflow would become more complex: achieving the same fluidity would likely require playing the animation locally and then separately for every other player involved.
Another reason for handling visuals locally is that it lightens the server’s burden. Furthermore, local handling is practical. An animation can be loaded a single time, and you keep good control over playing, stopping and managing it. A very common mistake I see is playing animations from the server and loading the same animation onto the animator over and over again.
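To illustrate the load-once idea, here is a minimal sketch of a LocalScript, assuming it is placed somewhere client scripts run (for example StarterPlayerScripts). The asset ID is a placeholder, and `playAnimation` and `stopAnimation` are hypothetical helper names, not part of any Roblox API.

```lua
-- LocalScript (placement in StarterPlayerScripts is an assumption)
local Players = game:GetService("Players")

local player = Players.LocalPlayer

-- Placeholder ID: replace with an animation owned by the game's creator.
local animation = Instance.new("Animation")
animation.AnimationId = "rbxassetid://0"

local track -- cached AnimationTrack for the current character

local function onCharacterAdded(character)
	local humanoid = character:WaitForChild("Humanoid")
	local animator = humanoid:WaitForChild("Animator")
	-- Load once per character; the same track is reused for every later play.
	track = animator:LoadAnimation(animation)
end

player.CharacterAdded:Connect(onCharacterAdded)
if player.Character then
	onCharacterAdded(player.Character)
end

-- Hypothetical helpers: call these from input handlers, UI, and so on.
local function playAnimation()
	if track and not track.IsPlaying then
		track:Play()
	end
end

local function stopAnimation()
	if track then
		track:Stop()
	end
end
```

The track is reloaded only when the character respawns, since the old Animator is destroyed with the old character. Everything else just calls `Play` and `Stop` on the cached track.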
Fortunately, loading and playing locally is reasonably safe thanks to FilteringEnabled and other protective measures. Players cannot load or play just any animation: it has to be owned by the game’s creator.