Hacker News
Serving Open Source Models 4x faster than vLLM by quantizing with ~no tradeoffs (fireworks.ai)
3 points by georgehill on Jan 10, 2024
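The headline claims faster serving via quantization with "~no tradeoffs". As a minimal, hypothetical sketch of the general idea (not Fireworks' actual method), symmetric int8 quantization stores weights in 8 bits with a single scale factor, and the round-trip error is bounded by half a quantization step:

```python
import numpy as np

# Toy weight matrix standing in for a model layer (assumed example data).
rng = np.random.default_rng(0)
w = rng.standard_normal((4, 8)).astype(np.float32)

scale = np.abs(w).max() / 127.0          # per-tensor scale for int8 range
q = np.round(w / scale).astype(np.int8)  # quantized 8-bit weights
w_hat = q.astype(np.float32) * scale     # dequantized approximation

max_err = np.abs(w - w_hat).max()
print(max_err <= scale / 2 + 1e-6)  # rounding error <= half a step
```

Production systems typically use finer-grained (per-channel or per-group) scales to keep accuracy loss near zero; this per-tensor version is the simplest illustration.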


